[kepler-dev] Re: [SDM-SPA] RFC new directory structure

Matt Jones jones at nceas.ucsb.edu
Wed Mar 17 16:15:02 PST 2004


Xiaowen,

Your suggestions overall sound good to me.  A few comments below. 
Please don't make these changes until we've all agreed that they are 
good, which shouldn't take too long.

Xiaowen Xin wrote:
> Hi,
> 
> David, Ilkay, Zhengang, Dan and I have discussed on the phone and over
> email the last couple of days about the directory hierarchy
> reorganization and have come to a rough consensus.  This will be a long
> email, so please bear with me =)
> 
> The problem basically is that there is currently no clear organization
> of the files in the CVS repository.  Workflows are scattered around the
> lib/ directory for example, and it's not clear, looking at the
> repository, which files relate to SPA and which to one of the other
> projects.
> 
> Here's a pictorial view of how we would like to reorganize the
> repository.  I will be talking mostly about SPA, but the concepts should
> carry over to the other Kepler projects as well.
> 
> - copyright.txt
> - README
> - build.xml
> - bin
>         - runVergil.bat
>         - runVergil.sh
> - build (directory used for compiling the sources)
> - docs
> - lib
>         - jar (directory for all the jars)
>         - dll (directory for all the dlls)
> - src
>         - org
>                 - ecoinformatics
>                 - geon
>                 - sdm
>                         - spa (directory for all the spa-related actors)
>                         - util
> - test
> - workflows
>         - spa (directory for spa workflows)
>         - seek
>         - geon
> 
> 
> So all the SPA related actors will be in the org.sdm.spa package. 
> Currently there are some SPA actors in edu.ncsu.sdm, but this doesn't
> make much conceptual sense.  There's no reason to divide up SPA source
> files according to the organization that developed it, since all of SPA
> will be working closely together to create _one_ set of interrelated
> actors.  Zhengang will work on moving the actors from edu.ncsu.sdm to
> org.sdm.spa.  It's better to divide up the source files by project
> rather than by organization because the boundary between organizations
> is artificial and only serves to confuse things.

Good.  If we must divide things up by project, then it's best to have one 
tree per project.  A functional classification of the actors would be 
better (because it would reduce inter-project redundancy and make things 
far easier to find).  Have you considered a package hierarchy like this 
(this is rough, it would need much more careful thought, but I wanted to 
throw it out there):
- src/org/ecoinformatics/kepler/
        - statistics
          - univariate
          - multivariate
            - clustering
          - nonparametric
        - models
          - simulation
            - ecological
            - geological
            - molecular
          - analytical
            - ecological
            - geological
            - molecular
        - utilities
        - dataaccess                   <== alternatively, this might be
          - genomics                       organized by protocol,
          - museumcollections              like jdbc, gridftp, etc.
          - ecological
           - ...

On top of this, in SEEK we've been contemplating ways of classifying the 
actors and data sources in an ontology so that scientists can easily 
locate the actors of interest to them in the tree.  We have developed 
some ideas about dynamically organizing the tree of actors in the UI 
according to functional lines based on a preferred ontological view that 
a given scientist might have.  Thus, an ecologist might want to view the 
models arranged in a different order than a molecular biologist, and 
this would be expressed in the interface as a different view on the 
ontology.
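
To make the idea of different views concrete, here is a rough sketch, not 
anything that exists in Kepler or SEEK.  It assumes each actor carries a 
couple of classification tags and that a "view" is just the order in which 
those tags are used to build the tree.  The actor names, the tag names 
("function", "domain") and build_tree are all made up for illustration.

```python
# Hypothetical actors, each tagged along two hypothetical facets.
ACTORS = [
    ("LogisticGrowth", {"function": "simulation", "domain": "ecological"}),
    ("SequenceAligner", {"function": "analytical", "domain": "molecular"}),
    ("PlateFlexure", {"function": "simulation", "domain": "geological"}),
]


def build_tree(actors, facet_order):
    """Nest actors one level per facet, in the order the view prefers."""
    tree = {}
    for name, tags in actors:
        node = tree
        # Descend through every facet except the last, creating levels as needed.
        for facet in facet_order[:-1]:
            node = node.setdefault(tags[facet], {})
        # The last facet holds the list of actor names.
        node.setdefault(tags[facet_order[-1]], []).append(name)
    return tree


if __name__ == "__main__":
    # Same actors, two different views of the tree.
    print(build_tree(ACTORS, ["function", "domain"]))
    print(build_tree(ACTORS, ["domain", "function"]))
```

The point is only that the actors themselves stay put while the tree shown 
to the user is derived from whichever ordering that user prefers.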

Finally, people should be sure to review the changes we are proposing in 
SEEK to Kepler's UI regarding data access.  The details are in 
kepler/docs/dev (both UML diagrams and screenshots of proposed changes).

> Currently, there's a util/ directory in src/.  This isn't really
> consistent with our naming convention so far (i.e. putting source files
> in packages that reflect which project made them).  The argument for
> having a util/ directory has been that we'd like to put useful classes
> in there that can be shared between the projects.  However, this
> argument doesn't make much sense because theoretically, all of our
> actors could be shared between the projects.  So we'd like to move the
> files in util/ to org/sdm/spa/util.  If another project requires the
> functionality of those two classes, then it would have to include the
> org.sdm.spa.util package instead of simply the util package.
> 
> After making these changes, all the SPA source files will be in
> org/sdm/spa, thus making it much easier to distinguish SPA and its
> contribution to Kepler.
> 
> Currently, all the workflows are in lib/ and there are some that are not
> in CVS at all.  We would like to create a top-level workflows/ directory to
> store all of the workflows.  spa, seek, and geon would be subdirectories
> under there.  Thus all SPA workflows will be put in workflows/spa/. 
> Similarly, GEON and SEEK should probably do the same with their
> workflows.

I think a top level workflows directory is a great idea.  Again, I think 
a functional classification is best, but the organizational one you 
propose would be an improvement.  Does it really matter which 
organization developed which workflow, and that the package structure 
reflect that?  I think not, but I come from a functional rather than 
political perspective on this.

> With this directory structure, it would be easy to tell which workflows
> are designed for which project, but we must also remember to check all
> of our workflows into CVS, and update them when/if they break.  Having
> PIW-full.xml, PIW-full_new_matt.xml, PIW-int-ex0.xml, and
> PIW-full_new.xml, as we do right now is just plain confusing!
> 
> Currently the lib/ directory is a mess.  It appears to be the garbage
> bin, where everything is dumped if the author can't find a better
> container for it.  So we propose a series of steps to clean this up.
> 
> 1. There are two dll's in lib/.  There should probably be a subdirectory
> called dll/ under lib/ that contains these dll's.  The person who put
> these there should probably move them ...

I think this is Chad's work.  He'll be back soon and will address this.

> 
> 2. demos.htm and ptolemy-index.html should probably be moved out of lib/
> and into a more appropriate folder, probably into src/.

I guess I conceive of lib as the location of resources (such as required 
libraries, html, xsl files, etc. that are needed for an app to run), 
while src is for Java code, etc.  I'm not convinced this is a good 
proposal yet.

> 
> 3. Ilkay will dispose of lib/forBerkeley/ and lib/forSB/ folders or move
> them into the top-level test/ folder since they contain testing material
> and so don't belong in lib/.  Whoever's responsible for
> lib/ecoPipelines/ should probably do the same because that's testing
> material also as I understand it.
Agreed.  The "test" folder should contain 1) directory structures that 
lead to JUnit source files, and 2) the resources, such as testing data, 
that are needed to run those or other tests.

> 
> 4. Is everyone ok with our deleting makefile and makefile.lib from
> lib/?  These are also not library files, and we're not using makefiles
> any more.

Yes.

> 
> 5. We should move runVergil.bat and runVergil.sh into a top-level bin/
> directory.
Fine.

> 
> 6. Does anyone know what lib/sample.dat and lib/scew-0.3.1.tar.gz are? 
> Can we delete them?
Ask Chad.  He will know.

> 
> 7. We will delete the lib/soap directory because it's empty, and there's
> already a lib/jar/soap that contains jar files.
You can't actually delete directories in CVS.  It's a shortcoming of the 
software.  When you checkout or update, be sure to set the "prune" 
option and it will remove any empty directories (e.g., cvs update -d -P -A).
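
If you want to see ahead of time which directories a pruning update would 
drop, something like the following rough sketch might help.  It is only a 
local approximation and not how cvs itself decides: it assumes a checked-out 
working copy in which each directory has a CVS/ admin subdirectory, and it 
treats a directory as prunable when it holds no files of its own and all of 
its other subdirectories are prunable too.  The function name prunable_dirs 
is made up for illustration.

```python
import os


def prunable_dirs(root):
    """List directories under root that hold nothing but CVS/ admin data."""
    prunable = set()
    # Walk bottom-up so children are classified before their parents.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        if os.path.basename(dirpath) == "CVS":
            continue  # admin directories are never reported themselves
        if filenames:
            continue  # a real file lives here, so the directory stays
        children = [os.path.join(dirpath, d) for d in dirnames if d != "CVS"]
        # Report the directory only if every remaining child is prunable.
        if dirpath != root and all(c in prunable for c in children):
            prunable.add(dirpath)
    return sorted(prunable)


if __name__ == "__main__":
    for path in prunable_dirs("."):
        print(path)
```

Run from the top of a working copy, this should list things like an empty 
lib/soap without touching anything.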
> 
> 8. We need to do something about lib/testdata/ because it's not a
> library file.  I personally think it should be moved into the global
> test/ directory.
Agreed.

> 
> 9. We will move lib/workflow/ into the top-level workflows/ directory.
OK. See comments above.

> 
> The existence of src/exp/ seems a bit questionable.  It seems to stand
> for "experimental".  Maybe it's time to either make it stable and
> incorporate it into an existing project, or delete it ...
Sure.  exp is a questionable practice.  But at times it is useful as a 
scratch area to get stuff into CVS for sharing that is not (or should 
not be) in the build process.  If it is put into "src", it'll be built 
automatically by ant unless the build.xml is modified to specifically 
exclude it.

> 
> Please comment!  If nobody objects to the proposed restructuring here, I
> can do nothing but assume everybody loves it =)  We'd like to get this
> finalized as soon as possible, which would make it easier to create a
> distribution.  Matt will be back next week from travel I believe, and it
> would be wonderful to have some kind of rudimentary installer for him.

These are huge (but good) changes.  Finalizing it quickly is good, but 
be sure to give everyone time to react, considering that some key 
developers (e.g., Chad) are traveling.  I think at least 1 week for 
comments after the final revisions to the scheme is appropriate.  I 
think a formal proposal and vote is warranted for a change of this 
magnitude.

Matt

> 
> Xiaowen

-- 
-------------------------------------------------------------------
Matt Jones                                     jones at nceas.ucsb.edu
http://www.nceas.ucsb.edu/    Fax: 425-920-2439    Ph: 907-789-0496
National Center for Ecological Analysis and Synthesis (NCEAS)
University of California Santa Barbara
Interested in ecological informatics? http://www.ecoinformatics.org
-------------------------------------------------------------------