[cvsnt] Re: CVSNT to CVS

Wed Apr 8 23:02:57 BST 2009

On Wed, 08 Apr 2009 19:13:31 +0000, Tony Hoyle wrote:
> Andreas Krey wrote:
> >It comes close to the best thing since version control in general, IMHO.
> >That way I can basically do everything offline, even complicated merges,
> >and only need a connection for pushing back the results. I first considered
> >it overkill just as well but then the whole git repository is typically
> >smaller than all the 'pristine copy' files that svn puts in the sandbox.
> 
> Only for a new repository.  Try pulling down something like redhat, for 
> example.

Surprisingly, not. Pulling the linux kernel really takes some time,
but take one of our repos which now contains about 1600 commits:
It contains 3000kBytes of sources, and the git repo is 2100kBytes
big. The same in CVS(nt) is three times that size, and svn needs
almost ten times that size. Also note that the repo is smaller
than the source which means that the svn sandbox is (because
of the pristine copy) bigger than a git 'sandbox' including the
repo.
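
As a rough check of the comparison above, here is a minimal sketch of the arithmetic. It assumes that "three times" and "almost ten times" refer to the git repo size, and that an svn sandbox holds every file twice (working file plus pristine copy); neither is spelled out in the post.

```python
# Sizes in kBytes, taken from the figures quoted above.
sources_kb = 3000
git_repo_kb = 2100

# Assumption: the CVS(nt) and svn factors are relative to the git repo size.
cvs_repo_kb = 3 * git_repo_kb
svn_repo_kb = 10 * git_repo_kb  # upper bound; the post says "almost ten times"

# A git working copy holds the checked-out files plus the whole repository.
git_sandbox_kb = sources_kb + git_repo_kb

# Assumption: an svn working copy holds each file twice
# (the working file and its pristine copy).
svn_sandbox_kb = 2 * sources_kb

print(f"git sandbox incl. repo: {git_sandbox_kb} kB")
print(f"svn sandbox:            {svn_sandbox_kb} kB")
print(f"CVS(nt) repo on server: {cvs_repo_kb} kB")
print(f"svn repo on server:     {svn_repo_kb} kB (at most)")
```

Under these assumptions the git sandbox with full history stays below the svn sandbox as long as the repo is smaller than the sources.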

Most software is written, then modified a bit, and then stays as it is,
which means repositories grow not so much over time as with
the size of the code they contain. That explains why git often manages
to keep the repository smaller than the (uncompressed) tarball
of the head.

...
> it's debatable how much you're actually saving).  If you're only on a 
> project for a short period it's a huge waste especially if it's 
> happening remotely.

I'm cloning the linux kernel just to see the numbers (and
to look a bit into the history), and that does take some time,
and isn't through yet...now it is (at 1Mbit/s it takes a while).

Sandbox:               400M (as per 'du -k')
Repository:            320M (ditto)
Uncompressed tar file: 353M
     Gzipped tar file:  76M (sandbox and tars of head/master)

This basically means that you need to enable compression if a
cvs checkout of the head is to be faster than a git clone of the
*entire* repo.
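
To put rough numbers on that, here is a small sketch that turns the sizes from the table into transfer times on the 1Mbit/s link. It assumes "M" means MiB, that the link delivers exactly 10**6 bit/s, and that the bytes on the wire are about the sizes listed; protocol overhead and server CPU time are ignored.

```python
MIB = 1024 * 1024            # assumption: "M" in the table means MiB
LINK_BITS_PER_S = 1_000_000  # assumption: 1 Mbit/s, no protocol overhead


def transfer_minutes(size_mib: float) -> float:
    """Time in minutes to move size_mib over the assumed link."""
    return size_mib * MIB * 8 / LINK_BITS_PER_S / 60


sizes_mib = {
    "git clone (whole repo)": 320,
    "uncompressed head": 353,
    "gzipped head": 76,
}

for label, size in sizes_mib.items():
    print(f"{label:24s} {transfer_minutes(size):5.1f} min")
```

The uncompressed head comes out slower than the whole git repo, and only the gzipped head is clearly faster, which is the point made above.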

git buys its space savings with CPU power; on our lowly sparc
the processor is the bottleneck, not the WAN.

...
> Something we considered adding to cvsnt is offline commits, so you could 
> commit locally then sync later.  It's not really a common case though - 
> it's rare you don't have *some* internet connectivity.

I happen to travel by rail, and while there is GPRS etc., it is
a bit flaky when you actually move, and it's not exactly fast.
You definitely don't want to do lots of 'cvs diff's there.

...
> People use a lot less of version control systems than sometimes forums 
> like this would have you believe.  They're a tool.  They don't spend all 
> day debating the best way to work..

Those people would also be pretty indifferent to switching tools;
give them a list of old and new commands, and that's it.

> heck, a significant proportion never 
> even merge in the way you or I would.. they fire up WinMerge, view the 
> differences and merge manually, then commit the changes without any 
> merge information.  That's the way they like it

The question is whether they like it because there was hardly any other
way with cvs. Also I can't quite believe they don't run into problems
pretty soon when they seriously start using branches.

...
> If you get 90% of the functionality right then you cover 99% of the 
> users.  The others are the noisy ones that complain on forums :)

Unfortunately, those (the mergemeisters and generally those who
do the difficult stuff) need to be on board. If you omit these
10% you've got an acceptance problem, at least where such people exist.

...
> Pulling down an entire cvs repository is a long process (even more so 
> for svn).  We do it for migrate and it can take many hours.

I know. :-(

...
> I presume git is based on arch,

git was started from scratch, and has taken ideas from bitkeeper,
which was previously used to version control the linux kernel.

> so had an existing system to base itself around.

Not really. But git sprouts lots of useful little things that are
all in the base distribution, like clean (delete untracked files, optionally
those matched by .gitignore), archive (make a tar/zip file from the current commit),
bisect (find the commit that caused a problem, without manual bookkeeping),
and lots of general (command line) usability stuff. Just how often you
have to type repository urls to svn is annoying. (Although the prize for the
biggest shittiness*popularity product goes to ant.)
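
The bookkeeping that bisect takes over is essentially a binary search through the history. The following is only a sketch of that idea, not git's implementation: it assumes a linear history, and the commit ids and the `is_bad` test are made up for illustration.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str],
                     is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumes a linear history ordered oldest to newest, where commits[0]
    is known good, commits[-1] is known bad, and every commit after the
    first bad one is bad as well.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid    # the problem was introduced at mid or earlier
        else:
            good = mid   # the problem was introduced after mid
    return commits[bad]


# Hypothetical history c1..c16 in which c11 introduced the problem.
history = [f"c{i}" for i in range(1, 17)]
broken_since = history.index("c11")
tested = []


def is_bad(commit: str) -> bool:
    tested.append(commit)  # record which commits had to be checked
    return history.index(commit) >= broken_since


culprit = first_bad_commit(history, is_bad)
print(culprit, "found after testing", len(tested), "commits")
```

The number of commits to test grows with the logarithm of the history length, which is why doing this by hand is tedious but letting the tool track the good/bad range is cheap.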

...
> >No, but if I have a svn repository that makes heavy use of svn:external,
> >and EVS doesn't support it, I'm rather unlikely to switch.
> 
> That one we haven't done yet.. I don't see much (or any, really) use of 
> it in the wild.

It only became really useful in 1.5, and we rely on it heavily to
track which version of which library is used in which project/module.

> The same goes for git support.. nobody's actually asked for it yet.  It 
> hasn't been beyond the discussion stage.. maybe never will, or maybe a 
> customer will turn up with a 10,000 user order tomorrow and make it a 
> requirement.  I can dream :p

I note a trend there. :-) Namely the reliance on the commercial
business case (which is nothing to boo at).

> >But you can't commit to each one of them when they are disconnected
> >from each other?
> 
> Yes you can - it's designed for intermittent/poor connections.  If they 
> were connected all the time anyway you wouldn't need two servers.

Conflicting commit resolution?

> >Ehm, nope. He could have given me the patch but then the svn/evs
> >commit would carry my name, not his.
> 
> Obviously the patch bears the name of the person who does the commit. 
> If he doesn't have commit rights then it could never have his name, 
> except perhaps in the comments.

git takes a different stand on this. If I just forward the commit to the
official repository, by default it keeps the original committer name.
There won't even necessarily be a (visible) branch.
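
This works because a git commit records an author and a committer as separate fields. The sketch below only models that idea with a hypothetical `Commit` class and made-up names; it is not git's data format or API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Commit:
    author: str      # who wrote the change
    committer: str   # who created this commit object
    message: str


def forward(commit: Commit) -> Commit:
    """Passing an existing commit on unchanged: nothing is rewritten."""
    return commit


def apply_as_patch(commit: Commit, applied_by: str) -> Commit:
    """Re-creating the change from a patch: author kept, committer changes."""
    return replace(commit, committer=applied_by)


# Hypothetical contributor without commit rights on the official repository.
original = Commit(author="alice", committer="alice", message="fix the build")

forwarded = forward(original)
patched = apply_as_patch(original, applied_by="bob")

print(forwarded.author, forwarded.committer)
print(patched.author, patched.committer)
```

Either way the person who wrote the change stays visible in the history, which is the difference from the svn/evs case described above.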

> The normal way to handle that kind of thing is to give certain people 
> their own branch and promote changes from that branch to the development 
> branch as permission is granted by the admin.

'normal' as in 'centralized VCS', of course. :-) The problem is the
administrative overhead in that; the distributed VCSs are a lot more
democratic (or anarchic) there.

Andreas