[cvsnt] Unicode bug

Tony Hoyle tmh at nodomain.org
Tue Dec 9 23:48:41 GMT 2003


Glen Starrett wrote:

>>But we have trouble with Unicode file with BOM
>>(byte order mark).
>>We don't expect cvs server handles unicode specially,
>>so we register it with binary mode.
>>First commit(for add) is OK, but next commit is horrible.
>>Repository file will be broken.
> 
> 
> Cvsnt is supposed to be able to handle unicode files, but I have no direct
> knowledge of how well it works (sounds like you've checked it out a bit).
> 
Unicode works fine - I use it occasionally myself, and the Java people 
seem to prefer it to anything else.

The OP registered the files as binary, which means that there is no 
Unicode processing at all (so the BOM makes no difference).  With -kb you 
always get back exactly what you put in.  Since he's using a 
non-standard server, though, it's impossible to tell what might be going 
on (the patch on the website is dozens of files, including some that can 
cause major breakage if it's not handled carefully).
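
If you want to check the "exactly what you put in" claim on your own 
setup, comparing raw bytes is enough.  A minimal sketch in Python, 
assuming you have the file you committed and a fresh checkout of it side 
by side (file_digest and same_bytes are names made up for this example, 
not anything in cvsnt):

```python
import hashlib

def file_digest(path):
    """Return the SHA-256 hex digest of the raw bytes in a file."""
    h = hashlib.sha256()
    # Open in binary mode so Python does no newline or encoding translation.
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def same_bytes(path_a, path_b):
    """True if both files hold identical bytes, BOM included."""
    return file_digest(path_a) == file_digest(path_b)
```

If same_bytes() is False for a -kb file, something between the commit 
and the checkout has altered the data; the sketch can't say what.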

I kind of agree that it's not ideal to insist on the BOM, but it *is* 
part of the Unicode standard, and anything that can't handle it is buggy 
anyway - plus removing it is extremely non-trivial (it would require 
quite a bit of intelligence, or about half a dozen -ku type options, both 
of which really aren't worth the risk to the rest of the system).
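
To give a feel for what even the detection half involves, here is a 
rough sketch in Python of sniffing a leading BOM.  It is only an 
illustration, not how cvsnt does it; sniff_bom is a made-up name, and 
the BOM constants come from Python's standard codecs module.

```python
import codecs

# Longest marks first: the UTF-32 LE BOM (FF FE 00 00) begins with the
# UTF-16 LE BOM (FF FE), so the order of the checks matters.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data):
    """Return (encoding, bom_length) for a leading BOM, or (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

For example, sniff_bom(b"\xff\xfeh\x00i\x00") returns ("utf-16-le", 2).  
Note that a UTF-16 LE file whose first character is U+0000 is 
byte-for-byte indistinguishable from a UTF-32 LE BOM here, which is the 
sort of guessing a file without a BOM would need for every encoding.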

Tony



