[cvsnt] Re: check CVS repository integrity

Michael Wojcik Michael.Wojcik at microfocus.com
Fri May 19 17:56:58 BST 2006


> From: cvsnt-bounces at cvsnt.org 
> [mailto:cvsnt-bounces at cvsnt.org] On Behalf Of Glen Starrett
> Sent: Friday, 19 May, 2006 11:49
> 
> If you were thinking of doing an overall checksum for the file it would
> be impractical -- having to calculate the checksum on the entire RCS 
> file after every commit, tag, etc. would kill performance.

I find that claim highly dubious.  CVS has to rewrite the whole file for
most operations, since the RCS file format is plain-text (and
consequently full of, in effect, variable-length records).  Computing a
checksum over the contents before writing them out would be a matter of
a handful of cycles for typical algorithms.  I'm willing to bet that for
the vast majority of users even a cryptographic hash like MD5 or
(better) SHA-256 would not have a noticeable effect on performance.
It's negligible relative to disk I/O time - not to mention network I/O
for remote repositories, which is what most people use anyway.
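
To illustrate hashing the contents on their way to disk, here is a
minimal sketch in Python.  It is not taken from the CVS or CVSNT
sources; the function names write_with_digest and matches_digest are
hypothetical, and where the digest would be stored (in the RCS file
itself or alongside it) is left open.

```python
import hashlib

def write_with_digest(path, chunks):
    """Write the byte chunks to path and return the SHA-256 hex digest
    of exactly the bytes written.  The hash is updated as each chunk
    goes out, so no second pass over the data is needed."""
    h = hashlib.sha256()
    with open(path, "wb") as f:
        for chunk in chunks:
            h.update(chunk)
            f.write(chunk)
    return h.hexdigest()

def matches_digest(path, expected_hex, block_size=65536):
    """Re-read the file in blocks and compare its digest to expected_hex."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest() == expected_hex
```

Since the file is being rewritten in full anyway, the only extra work
per commit is the h.update() call on data that is already in memory.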

I wrote an implementation of MD5 in COBOL (!) recently, and on an IBM
laptop that's a couple of years old (so hardly top-of-the-line hardware)
it gets about 5MB/s throughput - including the disk I/O.  And COBOL is
not an ideal language for this operation.  Calculating checksums is not
expensive.
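
Anyone who wants to check the cost on their own hardware can measure it
in a few lines.  This is a rough sketch in Python (not the COBOL
implementation mentioned above); the function name and its parameters
are made up for illustration, it hashes an in-memory buffer so disk I/O
is excluded, and the figure it reports will vary from machine to
machine.

```python
import hashlib
import time

def md5_throughput_mb_per_s(total_mb=8, chunk_kb=64):
    """Hash roughly total_mb megabytes of in-memory data with MD5 and
    return the observed throughput in MB/s."""
    chunk = b"\x5a" * (chunk_kb * 1024)
    n_chunks = (total_mb * 1024) // chunk_kb
    h = hashlib.md5()
    start = time.perf_counter()
    for _ in range(n_chunks):
        h.update(chunk)
    h.digest()
    # Guard against a zero interval on very coarse timers.
    elapsed = max(time.perf_counter() - start, 1e-9)
    hashed_mb = n_chunks * len(chunk) / (1024 * 1024)
    return hashed_mb / elapsed

if __name__ == "__main__":
    rate = md5_throughput_mb_per_s()
    print(f"MD5: {rate:.0f} MB/s (in memory, no disk I/O)")
```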

Frankly, I think Walter Tichy should have included at least a CRC (the
output of the "sum" utility would have been fine) in the original RCS
file format.  Even for machines of the day the cost would have been
negligible.
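
For a sense of how little code a CRC takes, here is a sketch using
Python's zlib.crc32.  Note the assumption: this is the common CRC-32,
which is not the checksum that the "sum" utility computes, and the
function name crc32_of_chunks is hypothetical.

```python
import zlib

def crc32_of_chunks(chunks):
    """Compute a CRC-32 incrementally over a sequence of byte chunks,
    so the data never has to be held in memory all at once."""
    crc = 0
    for chunk in chunks:
        # zlib.crc32 accepts the running value as its second argument.
        crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF

# The standard check value for CRC-32 over the ASCII digits "123456789".
assert crc32_of_chunks([b"1234", b"56789"]) == 0xCBF43926
```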

I'd patch it in myself, but I'm not using a current build (we're still
stuck on 2.0.51d at the moment), so there wouldn't be much point in my
hacking the version I use.

-- 
Michael Wojcik
Principal Software Systems Developer, Micro Focus


