[cvsnt] Re: Performance problems

Mon Dec 27 21:40:41 GMT 2004

Thanks for replying, Tony. I can by no means compare my knowledge of CVS
internals with yours. Still, allow me to argue the following:

> That would slow down the commit considerably....  it would also add
> quite a lot to the repository size - You may easily have 10,000
> revisions in a file...

First of all, it is arguable whether commit or update should be the faster
operation, if there's a tradeoff. I for one think that update is by far the
more frequent one. And with the BUILD tag method described here, I would
prefer an update that is fast not only on the HEAD. But I suppose that should
be settled by a "survey" of which operations are most common.

Secondly -- I would increase repository size, but not by much at all! I
propose a hierarchical, or logarithmic, storage of the full versions. So the
difference in the number of times the full version would be saved between
10,000 revisions and 20,000 is... 1. Not terrible at all.
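
To put rough numbers on that, here is a minimal Python sketch. It assumes one
particular reading of "logarithmic" storage, namely that a full copy is kept
at revisions 1, 2, 4, 8, ... (every power of two). The function name
full_copies is made up for illustration and is not anything in cvs.

```python
def full_copies(revisions: int) -> int:
    """Number of full copies if one is kept at every power-of-two revision.

    Assumption for illustration only: full copies live at revisions
    1, 2, 4, 8, ... up to the current revision count.
    """
    if revisions < 1:
        return 0
    # For n >= 1, bit_length() equals floor(log2(n)) + 1, which is the
    # number of powers of two that are <= n.
    return revisions.bit_length()


if __name__ == "__main__":
    for n in (10_000, 20_000):
        print(n, full_copies(n))
```

Under that assumption, doubling the number of revisions adds exactly one full
copy.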

Finally -- I don't agree that I would slow commit considerably. If the diffs
are stored hierarchically (with a limit on depth), then a new commit only
needs to store the diff from one version, which is itself a diff from no more
than (depth) other versions. Besides, a commit is an "MMI action": there is a
user on the other side, and commits are not usually done on hundreds of
files, so the small extra time per file will not be noticed by a user who is
also waiting for TCP sockets, the GUI to update, etc. As a percentage it
would be less than the time saved on updating. I'm guessing. :-)
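
One possible layout along these lines, and only a sketch, is what is sometimes
called a "skip-delta" scheme: revision n is stored as a diff against the
revision you get by clearing the lowest set bit of n, and revision 0 holds the
full text. That numbering (plain integers rather than cvs-style 1.x revisions)
and the names delta_base and delta_chain are assumptions made for the example.

```python
from __future__ import annotations


def delta_base(rev: int) -> int:
    """Revision that `rev` is stored as a diff against (0 holds full text)."""
    # Clearing the lowest set bit: 12 (0b1100) -> 8 (0b1000) -> 0.
    return rev & (rev - 1)


def delta_chain(rev: int) -> list[int]:
    """Revisions whose stored data must be read to rebuild `rev`."""
    chain = [rev]
    while rev != 0:
        rev = delta_base(rev)
        chain.append(rev)
    return chain


if __name__ == "__main__":
    print(delta_chain(12))
    print(len(delta_chain(10_000)) - 1, "diffs to apply for revision 10000")
```

A commit writes a single diff, and rebuilding any revision applies one diff
per set bit in its number, so the depth stays around log2 of the revision
count instead of growing with it.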

> Revision storage is unreleated to tagging.  One of the advantages of cvs
> is you can make any tag point to any revision on any file, and can
> branch different parts of the repository at different times.  The cost
> of this is the tag is stored at file level - requiring the tag to be
> rewritten to each file.

Tagging at a file level is important, at least for me. But isn't there a way
to do so without writing to each file? I suppose you could store the tags in
a linked list in the file, so that adding a tag to the file won't have to
re-write the whole file. Or the tags could be stored in a different file
altogether (I would still have to see how this affects 'update by tag' for
files without that tag).

As an idea: the client knows which revisions of which files it is currently
holding. Just send that information to the server (recursing over all
client-side directories) and call that a tag. Put it in a file of its own.
Scalable, and quite fast.
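
A rough Python sketch of what I mean, assuming the client reads the revisions
from its CVS/Entries files (file lines there look like
/name/revision/timestamp/options/tagdate). The helpers parse_entries and
build_manifest are hypothetical, not part of any cvs client, and the Entries
text is passed in as strings to keep the example self-contained.

```python
from __future__ import annotations


def parse_entries(text: str) -> dict[str, str]:
    """Map file name -> revision from the text of one CVS/Entries file."""
    revisions = {}
    for line in text.splitlines():
        # Directory lines start with "D" and carry no revision; skip them.
        if not line.startswith("/"):
            continue
        fields = line.split("/")
        if len(fields) < 3:
            continue
        name, revision = fields[1], fields[2]
        revisions[name] = revision
    return revisions


def build_manifest(entries_by_dir: dict[str, str]) -> dict[str, str]:
    """Combine per-directory Entries text into one path -> revision map."""
    manifest = {}
    for directory, text in entries_by_dir.items():
        for name, revision in parse_entries(text).items():
            path = f"{directory}/{name}" if directory else name
            manifest[path] = revision
    return manifest


if __name__ == "__main__":
    working_copy = {
        "": "/README/1.4/Mon Dec 27 21:40:41 2004//\nD/src////\n",
        "src": "/main.c/1.12/Mon Dec 27 21:40:41 2004//\n",
    }
    print(build_manifest(working_copy))
```

The server would store the resulting map under the tag name. A file that is
missing from the map simply does not carry the tag, which is the 'update by
tag' case I still have to think through.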

> It's possible to optimise the cvs way of doing it, using tag heirarchies
> for example (which would make rtag instant or at least very fast), but
> it's a fairly major change so is waiting for some other things to be
> done before anything like that goes in.

What's the idea behind hierarchical tags? And slightly related: is there
somewhere I can read about the structure of the ,v files?

... and once again: sorry if I'm being a little presumptuous -- thought I'd
take advantage of the fact that you took the time to reply :-)

Cheers,
Nitzan