[cvsnt] Re: Branch merging - this seems wrong...

Tue Jun 6 22:08:15 BST 2006

Tony Hoyle wrote:

> Tony Eva wrote:
>> What changes have been lost?  All the original commits to A are
>> still there, and the head of A now has the necessary changes to
>> allow A and B to work together.  What would the merge from the
>> branchpoint have saved?
> 
> If you used the merge point *all* changes on branch A would remain on branch A 
> and would not be merged.
> 
> If your intent is to promote the development to test then those changes are 
> effectively lost.

I think this whole struggle about what to call a branch ('development',
'test', 'stable', ...) doesn't do much good. Let's just stick with branch
A, branch B and branch C and a defined scenario...

Branch A: that's where all other branches eventually end up. May be HEAD or
anything else.
Tag A1: developer B starts branch B.
Tag A2: developer C starts branch C.
Tag A3: developer C merges his changes from branch C back to A.
Tag B1: dev. B needs the code from dev. C and merges it in from branch A.
(repeat the cycle A2,A3,B1 ad lib)
Tag A4: dev. B merges his code back to A.

A
|
A1----->B
|       |
A2-->C  |
|    |  |
A3<--+  |
|       |
+------>B1
|       |
A4<-----+
|

If B and C don't go on a branch, every commit they do affects everybody on
branch A. Say in a product with 30 modules, each of them works on 3
modules. Their changes are not immediately necessary for the rest of the
team (10 other developers working on other modules), but they need the
modules to be working for their own work. During both B's and C's work, the
affected modules will be in various intermediate, non-working states. 

I see the following options:

1- Don't create branches B and C; all work happens on branch A. Since the
non-working code in their sandboxes would make it impossible for the others
(on the same branch) to work, B and C (and everybody else) don't commit
until their code is complete. Admins generally don't like this :)  The
merges happen when the developers update their sandboxes; either regularly
(then they are similar to option 4) or just once at the end (then they are
similar to option 3).

2- Don't create branches B and C, all work on branch A. They do commit
daily. Merges happen every day before the commit (similar to option 4, but
without the control when exactly the merge happens: merging is dictated by
the backup function of the commits). This creates the situation that the
whole application may not work until B is done. Everybody else just has to
work without a running application. (When B is done, probably dev. F is
right in the middle, and again nothing works :) This enforces that all code
is always close together. This may be good, or may be disastrous, depending
on the project situation.

3- Create the branches B and C, but don't do the merge B1. Since we have
the branches, the devs can commit daily and the admin is happy :) -- but
the final merge of B may be bigger. And anybody working on another module
who has something to contribute to B has to do that on the B branch. May
make sense, or not.

4- Create the branches B and C and do the merge B1 as described. The
advantage compared to 3 is that the final merge of B may be much easier.
Tony Hoyle called that "ouch messy" in another message :) and it may be,
sometimes. But it also may be less messy than option 3, because the
(possibly multiple) merges from A to B keep B close to A while the
(possibly extensive) feature or refactoring gets implemented. The final
merge from B to A then has branch B changes that are already within the
current branch A structure. I've had situations where this was (IMO, of
course -- we didn't run a parallel team that did it the other way) much
easier than if we had put off the merges until the final moment of branch
B. The changes were clearly structured and fit right into what was on A at
that time -- because of the many previous, smaller merges from A to B. 
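
A rough way to picture the difference between options 3 and 4 is to count
how many changes on A branch B has never seen at the moment of the final
merge. This is only a toy model with made-up numbers; the function name and
its parameters are hypothetical, and a change count is a crude stand-in for
real merge effort.

```python
def unseen_at_final_merge(weeks, a_changes_per_week, sync_every=None):
    """Count changes on A that B has never merged when B finally merges back.

    sync_every=None models option 3 (no intermediate merges from A to B).
    sync_every=n models option 4 with a merge from A to B every n weeks.
    """
    unseen = 0
    for week in range(1, weeks + 1):
        unseen += a_changes_per_week          # A keeps moving every week
        if sync_every and week % sync_every == 0:
            unseen = 0                        # B just merged everything on A
    return unseen


# Made-up numbers: a 10-week branch, 5 changes per week landing on A.
option_3 = unseen_at_final_merge(weeks=10, a_changes_per_week=5)
option_4 = unseen_at_final_merge(weeks=10, a_changes_per_week=5, sync_every=3)

print(option_3)  # 50
print(option_4)  # 5
```

The total amount of merging is not reduced in this model, it is only spread
over several smaller steps, which is the point of option 4.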

Most of my experience with extensive merges is with cvs servers (not cvsnt
-- no merge points), so I cannot really comment much on how merge points
work in any of these scenarios. But I'm pretty sure that there are many
situations where option 4 (and I think that's what Tony Eva is talking
about) makes perfect sense. Also note that the merging work is independent
of whether separate branches exist or not -- the requirement to merge is
not created by branches, it is created by several people affecting the same
files. Merging in a branch with a certain change set is the same work as
merging in a sandbox with that change set. And you can do it all at once or
in smaller steps, with both approaches. The only difference is that with
the sandbox, it's not called "merging", it's called "updating" :)

But I see the problem with merge points here. I think you can look at a
merge point as the starting point for a collection of diffs. Of course it
is not enough to use the diff from B1 to the tip of B for the final merge
of B into A (in option 4) -- you need the relevant changes all the way from
the start of B. So using B1 as the merge point for the final merge doesn't
work: the diffs have to start at the start of B and go all the way to the
tip of B. (Note that when merging, you do not merge the difference between
the tip of B and A4 into A; you merge the difference between the root of B
and the tip of B into A. When Tony Eva says that all the information is
there in the last revision on B, he is talking about the difference between
the tip of B and A4. But this is not the change set used for the merge.)
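
To make the two change sets concrete, here is a minimal sketch that treats
each revision as a plain set of change labels. The labels (a0, b1, c1, ...)
and the helper change_set are invented for illustration; real diffs are of
course not sets, and this says nothing about how cvsnt stores merge points
internally.

```python
# Toy model: each revision is the set of change labels it contains.
# All labels are made up for illustration.
root_of_b = frozenset({"a0"})                # state of A at tag A1
b_before_b1 = root_of_b | {"b1", "b2"}       # dev. B's own commits before B1
at_b1 = b_before_b1 | {"c1", "c2"}           # B1: dev. C's work merged in from A
tip_of_b = at_b1 | {"b3"}                    # dev. B's commits after B1


def change_set(start, end):
    """Changes present in `end` but not in `start` (a stand-in for a diff)."""
    return end - start


from_b1 = change_set(at_b1, tip_of_b)        # diff starting at the merge point
from_root = change_set(root_of_b, tip_of_b)  # diff starting at the start of B

print(sorted(from_b1))    # ['b3']
print(sorted(from_root))  # ['b1', 'b2', 'b3', 'c1', 'c2']
```

Starting at B1 drops b1 and b2, which is the work B did before the merge
from A. Starting at the root keeps them, but also carries c1 and c2, which
are already on A.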

What might work is to use the change set of branch B /minus/ the change set
from A1 to A4 -- that is, having a merge command option to tell cvsnt that
all changes on A between the start of B and the tip of A should already be
incorporated in B and that it should only consider the /other/ changes on
B. (That would be a command for supporting specifically a scenario like
option 4.) I'm not sure how realistic it would be to expect such an
algorithm to work.
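
As a sketch of the idea only, again with revisions modelled as sets of
made-up change labels: the function other_changes_on_b is hypothetical and
does not correspond to any existing cvs or cvsnt option. It also assumes
that changes merged from A arrive on B unmodified, which is exactly the
part that may not hold in practice (conflict resolution during B1 can alter
them).

```python
def other_changes_on_b(root_of_b, tip_of_b, tip_of_a):
    """Change set of B minus the changes made on A since the start of B."""
    changes_on_b = tip_of_b - root_of_b   # everything B gained since A1
    changes_on_a = tip_of_a - root_of_b   # everything A gained since A1
    return changes_on_b - changes_on_a    # only the /other/ changes on B


# Made-up labels: a0 is the state at A1, c1/c2 are dev. C's work (merged
# into A and then into B), d1 is some unrelated commit on A, b1..b3 are
# dev. B's own commits.
root_of_b = frozenset({"a0"})
tip_of_a = root_of_b | {"c1", "c2", "d1"}
tip_of_b = root_of_b | {"b1", "b2", "c1", "c2", "b3"}

to_merge = other_changes_on_b(root_of_b, tip_of_b, tip_of_a)
print(sorted(to_merge))  # ['b1', 'b2', 'b3']
```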

Gerhard