[cvsnt] Re: Performance problems

Tony Hoyle tmh at nodomain.org
Tue Dec 28 04:34:49 GMT 2004


Nitzan Shaked wrote:
> Secondly -- I would increase repository size, but not by much at all! I
> propose a hierarchical, or logarithmic, storage of the full versions. So the
> difference in the number of times the full version would be saved between
> 10,000 revisions and 20,000 is... 1. Not terrible at all.

If you're only saving 1 extra revision there isn't any point in the 
complexity of trying to handle something like that.
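To make the "difference of 1" in the quoted proposal concrete, here is a small sketch. It assumes one hypothetical placement rule: a full copy is kept at every revision number that is a power of two (1, 2, 4, 8, ...). The function name `full_copies_up_to` is illustrative only. Neither CVS nor cvsnt stores revisions this way.

```python
def full_copies_up_to(n: int) -> int:
    """Count revisions in 1..n that would get a full copy under a
    hypothetical power-of-two checkpoint rule (not how CVS works)."""
    # The powers of two <= n are 2**0 .. 2**(n.bit_length() - 1).
    return n.bit_length() if n > 0 else 0

print(full_copies_up_to(10_000))  # 14
print(full_copies_up_to(20_000))  # 15
```

Doubling the history adds exactly one stored full copy. That single extra copy is what the reply weighs against the added complexity.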

The diff rebuild in cvs is *very* fast.  Checking out rev. 1.1 of a 
10,000 revision file would probably take less than a second to rebuild.  
Those are the extremes.  In the real world most people work on revisions 
much nearer to HEAD than that - often just a couple of revisions away.
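For readers wondering why the rebuild is cheap: in an RCS-format `,v` file the HEAD revision of the trunk is stored in full, and each older trunk revision is stored as a reverse delta. A reverse delta is an edit script of `dL N` (delete N lines starting at line L) and `aL N` (append the following N lines after line L) commands. Checking out an old revision means applying those scripts one after another, starting from HEAD. The sketch below is a simplified illustration, not cvsnt code. It assumes the `@`-quoting of the real file format has already been stripped and that each delta is given as a plain list of lines; `apply_delta` and `checkout` are hypothetical names.

```python
from typing import List

def apply_delta(lines: List[str], script: List[str]) -> List[str]:
    """Apply one RCS-style edit script to `lines` and return the result.

    Line numbers in the script are 1-based, refer to `lines` (the text
    the delta is applied to), and appear in ascending order.
    """
    out: List[str] = []
    pos = 0  # lines[:pos] have already been copied or deleted
    i = 0
    while i < len(script):
        op, args = script[i][0], script[i][1:]
        start, count = (int(x) for x in args.split())
        i += 1
        if op == "d":
            # Copy everything before line `start`, then skip `count` lines.
            out.extend(lines[pos:start - 1])
            pos = start - 1 + count
        elif op == "a":
            # Copy everything up to and including line `start` (if not
            # already consumed), then insert the next `count` script lines.
            out.extend(lines[pos:start])
            pos = max(pos, start)
            out.extend(script[i:i + count])
            i += count
        else:
            raise ValueError(f"unknown edit command: {script[i - 1]!r}")
    out.extend(lines[pos:])
    return out

def checkout(head_text: List[str], reverse_deltas: List[List[str]]) -> List[str]:
    """Rebuild an older trunk revision by applying reverse deltas to the
    HEAD text, newest delta first."""
    text = head_text
    for script in reverse_deltas:
        text = apply_delta(text, script)
    return text

head = ["alpha", "beta", "gamma"]
# Hypothetical delta from 1.2 back to 1.1: drop "beta", add "delta" after "gamma".
print(checkout(head, [["d2 1", "a3 1", "delta"]]))  # ['alpha', 'gamma', 'delta']
```

Each step is a single linear pass over the text, so even thousands of steps stay cheap. Branch revisions in real `,v` files are stored as forward deltas instead; this sketch only covers the trunk.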

In the example of the CVSNT_2_0_x branch, that's close to a worst case 
(since it has been branched for longer than most branches normally are) 
and the slowdown isn't noticeable.

> there is a user on the other side, and commits are not done on hundreds of
> files usually, the small time per file will not be noticed by the user who

Commits are often done on thousands or tens of thousands of files in 
large repositories.

> Tagging at a file level is important, at least for me. But isn't there a way
> to do so without writing to each file? I suppose you could store the tags in
> a linked list in the file, so that adding a tag to the file won't have to

How do you suggest doing this?

> re-write the whole file. Or: could be stored in a different file altogether

You still have to rewrite the file.  CVS *never* just modifies a file in 
place - that would be unsafe in the event of a disk failure, power cut, 
etc.  It builds a completely new file (mostly by copying the unchanged 
elements and patching in the new ones), then at the last moment does a 
(hopefully) atomic rename of the new file on top of the old one.
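The write-new-then-rename pattern can be sketched in Python as follows. This illustrates the general technique, not cvsnt's code; `rewrite_atomically` and its `transform` callback are hypothetical names. `os.replace` is atomic on POSIX when the source and target are on the same filesystem, which is why the temporary file is created in the same directory as the original. On other platforms and on network filesystems the guarantee can be weaker, so "hopefully" is apt.

```python
import os
import tempfile
from typing import Callable

def rewrite_atomically(path: str, transform: Callable[[bytes], bytes]) -> None:
    """Replace the contents of `path` with transform(old contents) without
    ever modifying the original file in place."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, "rb") as f:
        old = f.read()
    new = transform(old)

    # Build the complete new file next to the old one (same filesystem).
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the data is on disk first
        # Last step: swap the new file over the old one with a single rename.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure the original file is untouched; discard the temp copy.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If the process dies before `os.replace`, readers still see the complete old file. Afterwards they see the complete new one. Either way there is never a half-written file, but every byte is rewritten, even for a one-line change such as adding a tag.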

> As an idea: the client knows which revisions of which files it is currently
> holding. Just send that information to the server (recurse over all
> client-side directories) and call that a tag. Put in a file of its own.
> Scalable, and quite fast.

Not really.  You still have to write the file, which is the slow part.  
You'd be saving little or nothing over the current scheme unless the RCS 
files are *really* big, and in that case other factors are already 
slowing you down.

> What's the idea behind hierarchical tags? And slightly related: is there a
> place I can read about ,v structure?

With a hierarchical tag, you don't recurse down directories on rtag; you 
just tag the directory with the exact moment of the tag (this requires 
high-granularity timestamps in the files... per-second isn't nearly good 
enough).  Every file/directory below that is deemed to carry the tag on 
whatever version was current at that moment, unless overridden by a tag 
lower down (on a subdirectory or on the file).
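A minimal sketch of how such a tag could be resolved for one file. The data model is hypothetical and is not cvsnt's storage format: directory tags map to a float timestamp (the repository root is `""`), a per-file tag overrides everything, and each file's history is a time-sorted list of `(commit time, revision)` pairs. `resolve_tag` is an illustrative name. The sketch ignores branches and files added or removed after the tag moment.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

def resolve_tag(
    path: str,
    tag: str,
    dir_tags: Dict[Tuple[str, str], float],
    file_tags: Dict[Tuple[str, str], str],
    history: Dict[str, List[Tuple[float, str]]],
) -> Optional[str]:
    """Return the revision of `path` that `tag` refers to, or None.

    dir_tags:  (directory, tag) -> moment the directory was tagged
               ("" is the repository root)
    file_tags: (file path, tag) -> explicit per-file revision override
    history:   file path -> [(commit time, revision), ...] sorted by time
    """
    # A tag placed directly on the file always wins.
    if (path, tag) in file_tags:
        return file_tags[(path, tag)]

    # Otherwise use the nearest tagged directory, walking up towards the root.
    parts = path.split("/")[:-1]
    for depth in range(len(parts), -1, -1):
        directory = "/".join(parts[:depth])
        if (directory, tag) in dir_tags:
            moment = dir_tags[(directory, tag)]
            commits = history.get(path, [])
            # Latest commit made at or before the tagged moment.
            idx = bisect_right([t for t, _ in commits], moment)
            return commits[idx - 1][1] if idx else None
    return None

history = {"src/main.c": [(100.0, "1.1"), (200.0, "1.2"), (300.0, "1.3")]}
dir_tags = {("", "REL_1"): 250.0, ("src", "REL_1"): 150.0}
print(resolve_tag("src/main.c", "REL_1", dir_tags, {}, history))  # 1.1
```

The `src` tag overrides the root tag, so `src/main.c` resolves to the revision current at time 150. With only one-second timestamp resolution, a commit made in the same second as the tag could not be placed reliably on either side of it.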

You have to be careful with branches done this way... it's not as easy 
as I make it sound.

Doing 'man rcsfile' on a Unix system gives you the basic structure. 
cvsnt extends this somewhat but is still mostly compatible.

Tony


