[cvsnt] Forcing binary behavior

Glen Starrett glen.starrett at march-hare.com
Thu Feb 15 22:19:32 GMT 2007


Jim Hyslop wrote:
> You raise an interesting point, though, which maybe Arthur or one of the
> others from March Hare can comment on: which algorithm is more efficient
> at storing text files, the standard RCS diff or the binary algorithm
> CVSNT uses?

That depends on the nature of the file and the changes...

CVSNT (and CVS) uses a line-based difference on text files *and* binary 
files by default.  That is, it splits the file at each line ending, and 
if a line is not exactly the same as before, it records the whole line 
as a change in the RCS file.  Binary files are stored inefficiently 
because the "lines" can be extremely long (there is normally no pattern 
to how far you'll have to look to find the next line ending).

CVSNT adds 'binary deltas' (the 'B' keyword), which is a more efficient 
algorithm for binaries.  With the 'z' keyword it will also compress 
those deltas.
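
As a rough illustration of the idea (not CVSNT's actual binary delta 
algorithm or RCS storage format), a byte-level delta can be sketched in 
Python as copy ranges from the old revision plus the literal bytes that 
are new.  The names byte_delta and apply_delta are made up for this 
example, and difflib stands in for whatever matching CVSNT really does.

```python
import difflib


def byte_delta(old: bytes, new: bytes) -> list:
    """Describe `new` as copies from `old` plus literal new bytes.

    Illustrative only: this is a generic copy/insert delta, not the
    format CVSNT writes into the RCS file.
    """
    ops = []
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged run: store only the offsets into the old revision.
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # Changed or added run: store the new bytes themselves.
            ops.append(("data", new[j1:j2]))
        # "delete" needs no entry: those old bytes are simply not copied.
    return ops


def apply_delta(old: bytes, ops: list) -> bytes:
    """Rebuild the new revision from the old one and the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


def literal_bytes(ops: list) -> int:
    """Count the bytes the delta has to carry verbatim."""
    return sum(len(op[1]) for op in ops if op[0] == "data")
```

For a file where only a couple of bytes change, a delta like this 
carries those few bytes plus some offsets, regardless of where the 
line endings happen to fall.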

If you have a text file with very few, very long lines, storage will be 
as inefficient as for a binary.  Likewise, if a normal text file has as 
little as one change on every line, the delta will store the entire file.
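
To make the cost concrete, here is a small Python sketch that counts how 
many bytes a line-based delta has to store.  It uses difflib rather than 
the real RCS diff, and line_delta_size is a hypothetical helper, so 
treat the numbers as illustrative only.

```python
import difflib


def line_delta_size(old: str, new: str) -> int:
    """Bytes of new text a line-based delta must store.

    Any line that differs at all is stored whole, which is the
    behaviour described above for the default diff.
    """
    old_lines = old.splitlines(keepends=True)
    new_lines = new.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines,
                                      autojunk=False)
    stored = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            stored += sum(len(line) for line in new_lines[j1:j2])
    return stored


# Many short lines, one of them edited: only that line is stored.
short_old = "".join(f"line {i}\n" for i in range(100))
short_new = short_old.replace("line 50\n", "line 5O\n")
print(line_delta_size(short_old, short_new))   # 8

# One very long "line" with a single character changed: the whole
# line, i.e. the whole file, is stored again.
long_old = "x" * 1000 + "\n"
long_new = "x" * 999 + "y\n"
print(line_delta_size(long_old, long_new))     # 1001

# A small change on every line: again the whole file is stored.
every_new = "".join(f"line {i}!\n" for i in range(100))
print(line_delta_size(short_old, every_new) == len(every_new))  # True
```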

Metadata overhead is about the same for both binaries and text.

Don't forget to consider the advantages of text -- the ability to merge, 
automatic translation of line endings based on the client OS, etc.


Regards,

-- 
Glen Starrett
Technical Account Manager, North America
March Hare Software, LLC

http://march-hare.com/cvspro/

