[kepler-dev] Duplicated jar contents.

Wed Dec 14 14:00:47 PST 2005

Shawn,

Kepler right now is very much a monolithic system.  And if it works now,
it's more by luck than planning.  The current classpath is not
controlled at all and nobody knows where the individual jars
came from.  Most of the cvs commit logs for jars are not useful.
Sometimes the packager of the jar is nice enough to include a version
number in the manifest.  Most often one can only guess by comparing
commit dates in our repository to release dates for the packages.
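
As a rough illustration of the manifest check, the sketch below prints
whatever version a jar's manifest declares.  The class name
JarVersionProbe and the helper versionOf are made up for this example.
The attributes it reads are the standard Implementation-Version and
Specification-Version entries, which a packager may or may not have
filled in.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper, not part of Kepler.
class JarVersionProbe {

    /** Returns the version a manifest declares, or an "unknown" marker. */
    static String versionOf(Manifest manifest) {
        if (manifest == null) {
            return "unknown (no manifest)";
        }
        Attributes main = manifest.getMainAttributes();
        // Standard attribute names; many packagers leave them out.
        String version = main.getValue(Attributes.Name.IMPLEMENTATION_VERSION);
        if (version == null) {
            version = main.getValue(Attributes.Name.SPECIFICATION_VERSION);
        }
        return version != null ? version : "unknown (no version attribute)";
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JarVersionProbe first.jar second.jar ...
        for (String path : args) {
            try (JarFile jar = new JarFile(path)) {
                System.out.println(path + ": " + versionOf(jar.getManifest()));
            }
        }
    }
}
```

A jar that reports "unknown" still has to be dated by hand against the
commit history.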

Because Kepler does not support classloader isolation in any form, we are
given very few options.  In addition, the current discussions of
multiple classloaders cover only actors and do not support separation
of integral subsystems (I'm including sms here).  The only recourse we
are left with is to document where each jar comes from.  When somebody
wants to add a new jar, we need to determine if it breaks anything else,
etc.  This is nothing more than good old Software Engineering practice.
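
For readers unfamiliar with the term, here is a minimal sketch of what
classloader isolation means in plain Java.  It is not how Kepler works
today and not a proposed design; the names IsolationSketch, canSee and
isolatedLoader are invented for the example.  A URLClassLoader created
with a null parent resolves only the JRE's own classes plus the jars it
is handed, so nothing else on the application classpath leaks in.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical illustration, not Kepler code.
class IsolationSketch {

    /** True if the given loader can resolve the named class. */
    static boolean canSee(ClassLoader loader, String className) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** Builds a loader that sees only the given jars plus the JRE's own classes. */
    static URLClassLoader isolatedLoader(URL... jars) {
        // A null parent skips the application classpath entirely.
        return new URLClassLoader(jars, null);
    }

    public static void main(String[] args) throws Exception {
        try (URLClassLoader isolated = isolatedLoader()) {
            // JRE classes are still visible through the bootstrap loader.
            System.out.println(canSee(isolated, "java.lang.String"));
            // This class lives on the application classpath, so it is hidden.
            System.out.println(canSee(isolated, IsolationSketch.class.getName()));
        }
    }
}
```

A subsystem loaded this way could carry its own copies of conflicting
jars, at the cost of having to share types with the rest of the system
through a common parent loader.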

1)  If two existing jars are somehow incompatible or a proposed
additional jar is incompatible with the current set of jars, we are left
with a few options:

a) Not use the new jars.  This might mean abandoning development of a
feature or it might mean we need to do more work in order to implement
the feature.

b) Attempt to port the new jars to be compatible - or ask the maintainer
of the jar to do it.

c) Use the new jars, abandon the old jars, and rewrite our code to work
with the new jars.

It is not our decision how some 3rd party package is written.  But it is
our decision which 3rd party packages to use.

2)  No, testing is not fun.  But it is very useful for exactly these
kinds of things.

3)  The collection of jars required by Kepler has never been
documented.  There are many jars in the collection which are not needed
at all (jaxb-xjc.jar for example).  There are some jars which are
duplicates.  It would be trivial to remove those jars.  For the record,
I was never suggesting we just yank out jars with abandon.  I am
attempting to determine systematically which jars are actually used and
which are not and, where there is duplication, which copy is being used.

There are some jars which I am absolutely certain about.  There are
others that I have no clue about and that is why I was asking for the
help of the experts who use these jars.
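
The kind of overlap check described in point 3 can be sketched roughly
as follows.  This is not the actual tool used for the report; the class
DuplicateClassReport and its methods are invented for illustration, and
it assumes a modern Java and a single flat classpath where the first jar
listed wins.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical sketch, not the tool that produced the report.
class DuplicateClassReport {

    /** Records which jar holds each .class entry, preserving classpath order. */
    static void index(Map<String, List<String>> owners, String jarName,
                      Iterable<String> entryNames) {
        for (String entry : entryNames) {
            if (entry.endsWith(".class")) {
                owners.computeIfAbsent(entry, k -> new ArrayList<>()).add(jarName);
            }
        }
    }

    /** Lists the entry names in one jar file. */
    static List<String> entriesOf(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            for (Enumeration<JarEntry> e = jar.entries(); e.hasMoreElements();) {
                names.add(e.nextElement().getName());
            }
        }
        return names;
    }

    /** Formats one line per class that appears in more than one jar. */
    static List<String> report(Map<String, List<String>> owners) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : owners.entrySet()) {
            List<String> jars = e.getValue();
            if (jars.size() > 1) {
                // Assumption: the first jar in classpath order supplies the class.
                lines.add(e.getKey() + " used from " + jars.get(0)
                        + ", shadowed in " + jars.subList(1, jars.size()));
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Pass the jars in the same order they appear on the classpath.
        Map<String, List<String>> owners = new LinkedHashMap<>();
        for (String jarPath : args) {
            index(owners, jarPath, entriesOf(jarPath));
        }
        report(owners).forEach(System.out::println);
    }
}
```

Note that a scan like this only covers the jars it is given.  Classes
supplied by the JRE itself take precedence over the classpath and would
have to be checked separately.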

Kevin

Shawn Bowers wrote:

>
> Some problems (I'm in a hurry, so it's a bit terse):
>
>  1. Other projects might not appreciate us writing them
>     asking for them to rewrite their code so that it works
>     with the "latest" release of a jar -- or some dependency
>     on a jar that another project uses (this point doesn't
>     seem to be addressed in the emails so far; that there
>     are complex dependencies not controlled by the "contributor"
>     of the jars ...)
>
>  2. Figuring out when code breaks because we swap out some
>     "old" code (jar) with "new" code will require major/extensive
>     testing
>
>  3. Yanking out all but a few core jars will cripple Kepler and
>     will probably result in major schedule slips for any type of
>     alpha/beta/full release we want to do
>
> -shawn
>
> Matt Jones wrote:
>
>> I've asked Kevin to try and work this out so that we have something
>> workable for the freeze date, but he can't do it without the help of
>> the people that might have contributed the jars in the first place. 
>> If Kevin has to work it out by himself, I think the only sensible
>> solution would be to remove everything except a few core jars, and
>> then work with the original contributors to add them back in.  But it
>> might be easier if those developers worked with him now to identify
>> requirements for their code.
>>
>> Once we identify a non-conflicting set of jars, we should probably
>> institute a policy of no new jars added to CVS without a discussion
>> first on the mailing list, and maybe even lock down the jars
>> directory to only allow one person write access, and that person
>> becomes responsible for conflict review.
>>
>> Over the longer term we'd like to utilize different classloaders for
>> each actor, which would allow actors to have conflicting dependencies
>> and would make things easier for actor developers.  But the
>> underlying Kepler engine would still have a shared loader, so some
>> degree of interoperability must be achieved here.
>>
>> Matt
>>
>> Kevin Ruland wrote:
>>
>>> Shawn Bowers wrote:
>>>
>>>> Kevin Ruland wrote:
>>>>
>>>>> The SMS system jars have significant overlap with other jars and
>>>>> are also
>>>>> very complex.  I would appreciate it if you would document what the
>>>>> requirements of the code are.  This includes revisions of specific
>>>>> jars
>>>>> which are known to be compatible and those which are known not to work.
>>>>
>>>> I think you are completely missing the point: Many of the jars are
>>>> required
>>>> exactly by the packages used by code in kepler -- not by the kepler
>>>> code
>>>> directly!
>>>>
>>> The analysis I did which produced this report was essentially a diff
>>> of the jar's contents looking for overlap.  All it states is that
>>> certain classes appear in more than one jar file on the classpath
>>> and indicates which particular copy of that class would be used if
>>> it's used at all.  It does not attempt to distinguish which source
>>> in kepler requires which particular classes.  That report will be
>>> out soon.
>>>
>>> The unfortunate situation is if another class in kepler or some
>>> other actor requires a class, it can be provided by a different jar
>>> than anticipated.  Even if you think your jar is only used by your
>>> piece of code, in fact, you might be using somebody else's jar. 
>>> This is exactly what I'm trying to point out.  As you can see from
>>> my analysis below,  even though the sms developers might have
>>> thought they were using the edu.stanford.db.* classes from
>>> lib/jar/sms/rdf-api-2001-01-19.jar, in fact they were picked up from
>>> lib/jar/scia/sf_edu.jar.
>>>
>>> And since Java 1.4 was nice enough to bundle org.apache.xerces in
>>> rt.jar, where you thought you were using lib/jar/sms/xercesImpl.jar,
>>> in fact you were using the copy in rt.jar.  What version of xerces that
>>> is, I do not know.
>>>
>>>> For SMS, the only code we rely on right now is Jena -- you can go
>>>> to their
>>>> web-page to find out which packages are required for the particular
>>>> release
>>>> we are currently using ...
>>>>
>>> I have never developed with Jena but I do know there are about 4 or
>>> 5 different released versions.  I have no clue which version was
>>> placed in lib/jar so I have no idea how to determine its runtime
>>> requirements.  Instead of trying to figure this out, I would be
>>> inclined to just blow away the existing jars and place a known
>>> version in the repository.  Since that is probably not what you
>>> want, and could very likely break things, I would appreciate it if
>>> you could track this down.
>>>
>>> Kevin
>>>
>>
>