[cvsnt] 200+ modules in one repo

Tue Mar 20 18:35:22 GMT 2007

Aleksander Pahor wrote:
> In the environment where I installed CVSNT, new projects are moving
> to it: 20 projects, each composed of 5 to 10 units (testing,
> integ_testing, release_test, docs, develop, integration1, 
> integration2 ...).
Alarm bells have just started ringing.

Why are you creating separate modules for development, integration,
testing, and so on? Modules should be based on a set of functionality,
not the current stage of development. For keeping track of stages of
development, use tags.
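To make "use tags" concrete, here is a rough Python sketch. It assumes a made-up naming convention of one tag per stage and build (e.g. INTEGRATION_B0042); that scheme, and the function names stage_tag and rtag_command, are my own illustration, not anything CVS or CVSNT prescribes. It only builds argument lists for `cvs rtag TAG MODULE`, which tags the repository copy of a module; you would pass them to subprocess.run() yourself.

```python
import re

# CVS tag names must start with a letter and may contain only
# letters, digits, '-' and '_'; anything else is replaced with '_'.
_INVALID = re.compile(r"[^A-Za-z0-9_-]")


def stage_tag(stage: str, build: int) -> str:
    """Build a CVS-legal tag for a stage (hypothetical naming convention)."""
    name = _INVALID.sub("_", stage.upper())
    if not name or not name[0].isalpha():
        raise ValueError(f"stage name must start with a letter: {stage!r}")
    return f"{name}_B{build:04d}"


def rtag_command(tag: str, module: str) -> list[str]:
    """Argument list for tagging a module in the repository with `cvs rtag`."""
    return ["cvs", "rtag", tag, module]


if __name__ == "__main__":
    tag = stage_tag("integration", 42)
    print(rtag_command(tag, "billing"))  # 'billing' is a placeholder module
```

The point is that "integration" becomes a label on a set of revisions of one functional module, rather than a separate module you have to keep in sync.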

> More than 200 new modules will be needed
> (virtual modules in the "modules" file; I will use the -d option to
> create them in subdirectories to organize them better).
> 
> Did anyone try this amount of modules on one repo? Does the 
> performance deteriorate with this many modules? Do you have 
> any recommendations for me?

I've used that many modules with no problems (mind you, that was with
the original CVS, but I don't think CVSNT has changed things so much
that my experiences would be completely invalid).

Some commands, especially those based on tags, may search through the
entire repository. One of the more knowledgeable folks can correct me if
I'm wrong, but I believe CVSNT has taken steps to reduce the searches,
so that may not be an issue.

Other than that, the only noticeable performance degradation was,
predictably, when you issued a command that works on the entire
repository. If your modules are well thought out, then you will rarely,
if ever, need to run a command against the entire repository, but only
against the subset you are currently working on.

We disabled history logging. With such a large repository, the history
would have grown extremely large quite quickly, and would be difficult
to break down by module.

We had a lot of modules that were shared (for example, libraries). To
keep testing and release procedures within a tolerable level of sanity,
all libraries were treated as third party items. If a library was
modified, it would go through the usual release procedures. Each release
would be tagged, and checked in. In addition to its CVS module, each
project had a makefile which would check out the specific revision of
the library required. That way, if developer A modifies a library, and
there are subtle differences or problems with it, developer B won't be
held up until the problems are resolved. 
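We did this with makefiles. The same idea, sketched in Python for illustration, is a per-project manifest mapping each library module to the release tag it was built against. The manifest entries and directory names below are hypothetical. Each pin turns into one `cvs checkout -r TAG -d DIR MODULE`, run from the project's third-party directory.

```python
# Hypothetical manifest: library module -> release tag this project uses.
PINNED_LIBRARIES = {
    "libs/netcore": "NETCORE_REL_2_3",
    "libs/xmlutil": "XMLUTIL_REL_1_0",
}


def checkout_commands(pins: dict[str, str]) -> list[list[str]]:
    """One `cvs checkout -r TAG -d DIR MODULE` per pinned library.

    DIR is the last path component of the module, so the project's own
    layout does not depend on where the library lives in the repository.
    Sorted by module name so the output is stable.
    """
    cmds = []
    for module, tag in sorted(pins.items()):
        dest = module.rsplit("/", 1)[-1]
        cmds.append(["cvs", "checkout", "-r", tag, "-d", dest, module])
    return cmds


if __name__ == "__main__":
    for cmd in checkout_commands(PINNED_LIBRARIES):
        print(" ".join(cmd))
```

Moving a project to a newer library release then means changing one tag in its manifest, which is a deliberate, reviewable step rather than something that happens to you when someone else commits.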

If you have a lot of people accessing the repository, consider beefing
up the server. We had a Sun/Solaris workstation, with gigabit ethernet
plugged directly into the switch, and RAID storage (we also had
somewhere around 100 people using the repository, with about 35GB of
source data).

Broadening the scope of this somewhat: backups are essential. Every once
in a while, make sure you can restore your repository from backups (not
into the live production repository, of course, in case something goes
wrong).
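One way to spot-check a trial restore is to compare the RCS ",v" files in the restored copy against the live repository. The sketch below is my own illustration, not a CVSNT tool. It assumes you restored into a scratch directory, and it hashes each ",v" file with SHA-256. Expect files committed since the backup was taken to show up as "changed" or "missing". Anything else deserves a closer look.

```python
import hashlib
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each RCS ',v' file under root (as a relative path) to its SHA-256."""
    digests = {}
    for path in sorted(root.rglob("*,v")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_trees(live: Path, restored: Path) -> dict[str, list[str]]:
    """Report ',v' files missing from, extra in, or different in the restore."""
    a, b = tree_digests(live), tree_digests(restored)
    return {
        "missing": sorted(a.keys() - b.keys()),
        "extra": sorted(b.keys() - a.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }


if __name__ == "__main__":
    import sys

    print(compare_trees(Path(sys.argv[1]), Path(sys.argv[2])))
```

This reads each file whole, which is fine for typical ",v" files but would want chunked hashing for very large binaries.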

Your server *WILL* grow. Plan for that now. Don't use the IP address, or
the actual machine name in your CVSROOT. Instead, give the machine a DNS
alias, and use that instead. For example, at my last company, all the
servers were named after Greek or Roman gods, but my CVSROOT was
'[protocol]:cvs.company.com'. 'cvs' was a DNS alias to 'apollo'. Why go
through this step? For the very simple reason that when it comes time to
upgrade your server, you can configure a new box, copy the repository
over, test it out, and when you are sure the upgrade process works,
switch DNS entries, and the upgrade is complete with minimal
interruptions to the users.
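If you want to catch CVSROOTs that point at a raw IP address (in scripts, build files, and so on), here is a rough Python sketch. It assumes the common form `:method:[user[:password]@]host[:[port]]/path` and treats `:local:` and plain paths as local. It does not handle every CVSNT variant. The function names are my own. Telling an alias from a real machine name needs a DNS lookup (e.g. socket.gethostbyname_ex() reports the canonical name), so that part is left out.

```python
import ipaddress
from typing import Optional


def cvsroot_host(cvsroot: str) -> Optional[str]:
    """Return the server host named in a CVSROOT, or None for a local repository."""
    rest = cvsroot
    if rest.startswith(":"):
        # Strip the ':method:' prefix (any ';options' inside it are ignored).
        method, _, rest = rest[1:].partition(":")
        if method.split(";", 1)[0] == "local":
            return None
    if rest.startswith("/"):
        return None  # plain local path
    # Drop 'user[:password]@', then cut host off at ':port' or '/path'.
    rest = rest.rpartition("@")[2]
    host = rest.split(":", 1)[0].split("/", 1)[0]
    return host or None


def uses_ip_literal(cvsroot: str) -> bool:
    """True if the CVSROOT names its server by IP address instead of a DNS name."""
    host = cvsroot_host(cvsroot)
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False
```

With cvs.company.com as the alias, `uses_ip_literal(":pserver:jim@cvs.company.com:/cvsroot")` is False. A CVSROOT written against 10.0.0.5 would be flagged.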

-- 
Jim