[kepler-dev] draft conversion of ptII tree from svn to cvs and svn nits

Wed Jun 4 09:26:58 PDT 2008

Edward writes:   
>       It's always tempting to spend time switching software
>    infrastructure to the "next new thing," but I want to point out
>    that every hour we spend converting to SVN is one hour less that
>    we spend improving Ptolemy II and Kepler.  I have not yet heard a
>    compelling argument for this conversion.  I would really rather
>    spend the time improving customizability of the UI (to have custom
>    front panels for workflows), leveraging Netbeans and/or Eclipse,
>    improving our testing infrastructure, etc.

Hi Edward,

I agree with you, and part of the reason I'm pointing out the drawbacks
is to make the point that newer is not always better.

However, I do think we have an obligation to train students to use
recent tools.  If I were starting a new company, I probably would not
choose make and cvs; I would go with Maven and svn.

I'm pretty much committed to moving to svn.  I don't think there
is that much more work to do.  Since refactoring the Ptolemy Kore is a
possibility, now is a good time to change version control systems so
that new build systems and nightly builds can use svn.  A good analogy
is a large remodeling project: when remodeling, it is better to fix
foundational issues early than to wait.  We went through similar pain
and suffering converting from SCCS to CVS.  There were probably
downsides then too, but they have been lost to the sands of time.

Remaining tasks:
* Anonymous svn access
* Eclipse instructions
* Change nightly build

I think I can pretty much finish these today.  

Your point is well taken, and I'm very aware that the time I spend
on svn could go to other things.

Another issue is that I'm well suited for the task of updating the
version control system and less well suited for doing UI work.

Ian writes:

> Christopher,
>         the benefits I see from SVN are different than what you
> highlight.  In particular they are:

Thanks for your feedback, I really appreciate the discussion.
I've attempted to address your points below.

Edward wrote:   
> [Ian writes]  
> > 1. Atomic commits. The directory itself is versioned and so when you
> > commit a number of files to implement an issue they naturally stay
> > together.  Also, if a network problem happens you don't get into a
> > situation where only some of the files are committed.
>
> Isn't this easily accomplished by using a date?

Yep, checking out the repository by date is how I narrow down bugs.
In fact, atomic commits could make it more difficult for me to find
problems in other people's code if they encourage developers to check
in a large change all at once.  If a developer checks in code in small
incremental changes, then I'm more likely to be able to help, as the
tree is likely to work between their changes.  When huge changes go
in, it can be harder to find the bug.  Huge changes fly in the face of
Agile programming, though I realize they are sometimes necessary.
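Narrowing down a bug by date is essentially a binary search over
dates.  A minimal Python sketch, assuming a hypothetical
`tree_works(day)` predicate that would check out the tree as of `day`
(for example with `cvs checkout -D`), build it and run the tests; here
it is stubbed out so the sketch runs on its own:

```python
from datetime import date, timedelta
from typing import Callable


def first_bad_date(good: date, bad: date,
                   tree_works: Callable[[date], bool]) -> date:
    """Binary search for the first date on which the tree is broken.

    Assumes tree_works(good) is True, tree_works(bad) is False, and
    that once the tree breaks it stays broken up to the bad date.
    """
    while (bad - good).days > 1:
        mid = good + timedelta(days=(bad - good).days // 2)
        if tree_works(mid):
            good = mid  # the bug went in after mid
        else:
            bad = mid   # the bug went in on or before mid
    return bad


if __name__ == "__main__":
    # Hypothetical stand-in for a real checkout + build + test run:
    # pretend the bug went in on 2008-05-20.
    breakage = date(2008, 5, 20)
    print(first_bad_date(date(2008, 5, 1), date(2008, 6, 4),
                         lambda d: d < breakage))
```

Each probe costs one checkout, so a month-long window takes about
five or six probes.  The search only pins down a day; small
incremental commits are what make the changes within that day easy
to read.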

In ~50k cvs commits over 10 years, I have not seen non-atomic commits
cause a serious problem.  As Edward pointed out, Eclipse handles this.
However, I have broken the tree at times because I check in
incremental commits.  This is partly due to the way I work within
Emacs, and it is not likely to change unless I switch IDEs.

Edward writes:   
> [Ian writes]  
> > 2. Better binary file handling. Binary diffs are stored and
> > transmitted making updates faster and repository storage smaller.
>
>  The real problem with CVS and binaries, IMHO, is that users get
>  it wrong and commit them as ASCII files, and they are corrupted
>  in the repository.  Does SVN prevent this?  It's hard to see how
>  it could...  Eclipse helps a great deal because it makes smart
>  guesses, and usually gets it right...

Agreed.  CVS is very poor with binary data.  SVN uses heuristics to
guess whether a file is binary and probably does a better job than CVS.
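The kind of guess involved can be illustrated with a simple content
sniff.  This is a hypothetical heuristic, not svn's actual code: it
calls a file binary if a leading sample contains a NUL byte or too
large a fraction of non-text bytes.  The 1024-byte sample and 15%
threshold are assumptions chosen for illustration.

```python
# Bytes treated as "text": printable ASCII plus common control
# characters (tab, newline, carriage return, form feed, backspace).
TEXT_BYTES = set(range(0x20, 0x7F)) | {0x08, 0x09, 0x0A, 0x0C, 0x0D}


def looks_binary(data: bytes, sample_size: int = 1024,
                 max_nontext_ratio: float = 0.15) -> bool:
    """Guess whether file contents are binary (hypothetical heuristic)."""
    sample = data[:sample_size]
    if not sample:
        return False  # treat empty files as text
    if b"\x00" in sample:
        return True   # NUL bytes almost never appear in text files
    nontext = sum(1 for b in sample if b not in TEXT_BYTES)
    return nontext / len(sample) > max_nontext_ratio


print(looks_binary(b"all:\n\tmake -C src\n"))          # False
print(looks_binary(b"\x89PNG\r\n\x1a\n\x00\x00\x00"))  # True
```

Because it looks at content rather than file extension, a check like
this classifies a Tcl script or a makefile as text.  Note that bytes
above 0x7F count as non-text here, so heavily non-ASCII UTF-8 text
could be misclassified; a real implementation has to be more careful.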

However, I was surprised to find that the svn repository is larger,
not smaller.

Also, the default Eclipse installation incorrectly checks in .tcl
files and makefiles as binary.

Another issue is that if someone checks a file into svn under a file
name with the wrong case, correcting the case appears to be really
tricky.  We'll have to see how this unfolds.

Edward writes:
> [Ian writes]  
> > 3. Updates / commits are faster because diffs are sent both
> > directions on the network whilst CVS tends to only send diffs from
> > server -> client and sends entire files in the other direction.
>   
> I guess on performance issues, the tradeoff is: Do we save more
> time than we spent on the conversion to SVN?  This does not seem
> clear to me.

The conversion will not take that much more time.  I've probably
spent 8 hours, and estimate another 8 hours, maybe less.

However, I checked, and checkout speed is roughly the same for svn+ssh
and cvs.  I don't see update and commit speed as a big issue.  My home
DSL connection is very slow, and I don't expect svn to improve that much.

Note that the svn docs say that svn with Apache + mod_dav_svn is
slower than svn+ssh.

Edward writes:
> [Ian writes]
> > I've not seen the disk hog stuff before. I assume this is the
> > repository size on the server and not the client. You expect the
> > client size to be twice that of CVS because SVN makes a backup of
> > each file so that it can do local diff and rollback without needing
> > to contact the server over the network. On the server, are you using
> > the file storage configuration?
>
>       The local size bloat seems problematic to me...  Particularly
>   since I use Eclipse, which already provides this rollback... So
>   presumably I will pay the price twice.  I'm chronically out of disk
>   space on my laptop these days (yes, I should upgrade my laptop, but
>   that requires two weeks of time that will not be spent improving
>   Ptolemy II... :-)

The size I'm talking about is the size on the server, not on
the local machine.  I was just surprised that svn is a hog on the
server. 

Interestingly, checking out ptII via cvs and via svn shows that the
svn working copy is more than twice as large.

On a Linux machine, I ran:
cvs -d :ext:source:/home/cvs co ptII
svn co svn+ssh://source.eecs.berkeley.edu/home/svn_chess/ptII/trunk

du -ks shows the ptII cvs checkout on the local disk is 329 MB
du -ks shows the ptII svn checkout on the local disk is 725 MB

And this is without doing a cvs update -P -d, which would
remove deadwood from the cvs tree. 

I'll look into why this is happening.
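One quick check is to split each total into the version control
metadata directories (.svn or CVS) and the working files themselves.
If Ian's explanation holds (svn keeps a pristine copy of each file for
local diff and revert), the .svn share should be roughly the size of
the working files.  A minimal Python sketch; the function name and the
set of metadata directory names are assumptions, and it sums apparent
file sizes, so totals will differ somewhat from du -ks, which counts
disk blocks.

```python
import os
from typing import Dict

# Directory names treated as version control metadata (an assumption).
ADMIN_DIRS = {".svn", "CVS"}


def split_sizes(root: str) -> Dict[str, int]:
    """Sum file sizes under root, separating metadata from working files."""
    totals = {"admin": 0, "working": 0}
    for dirpath, _dirnames, filenames in os.walk(root):
        # A file counts as metadata if any directory on its path
        # (relative to root) is a metadata directory.
        parts = set(os.path.relpath(dirpath, root).split(os.sep))
        bucket = "admin" if parts & ADMIN_DIRS else "working"
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                totals[bucket] += os.path.getsize(path)
    return totals


if __name__ == "__main__":
    import sys
    # Usage: python split_sizes.py ptII-cvs ptII-svn
    for root in sys.argv[1:]:
        sizes = split_sizes(root)
        print(root, {k: f"{v / 2**20:.1f} MB" for k, v in sizes.items()})
```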

I'm running svn under Solaris, so I'm using Berkeley DB.  I'd prefer
to use fsfs, but that appears not to be an option under Solaris.
Using a database here adds complexity and fragility and makes backups
harder.  A file system is a fine database for files; using Berkeley DB
was a poor design choice.

_Christopher