[kepler-dev] [Bug 2572] - Export KAR can produce several actors with the same lsid

Sun Jun 17 18:24:52 PDT 2007

> Date: Fri, 15 Jun 2007 12:27:44 -0700 (PDT)
> From: bugzilla-daemon at ecoinformatics.org
> Subject: [kepler-dev] [Bug 2572] - Export KAR can produce several
> 	actors	with the same lsid
> To: kepler-dev at ecoinformatics.org
> Message-ID: <20070615192744.519112FE72 at ceres.ecoinformatics.org>
> 
> http://bugzilla.ecoinformatics.org/show_bug.cgi?id=2572
> 
> ------- Comment #1 from berkley at nceas.ucsb.edu  2007-06-15 12:27 -------
> This is a problem with how we store lsids in the cache.  The problem is that
> the generated kar file does not get cached when it is created (although it
> could be cached later).  If we store the lsid in the database upon kar
> creation, it will have to get a new lsid when it is inserted into the cache
> (because there is no way to know that the object being put into the cache is
> the same as the one that was created earlier since we lose control of it).
> Another way around this is to ask the user if they wish to get a unique lsid
> or not.
> 
> The heart of this is that we still need a centralized way to keep track of
> lsids.  We are still dependent on the local machine to generate ids, which in
> no way guarantees they are unique.  To really fix a lot of these ID issues,
> we need a central ID repository (or we need to use
> library.kepler-project.org as one).

I can see the benefits of using LSIDs, but I think many end users will be
wary of being forced to connect to a central repository to register a new ID
every time they wish to create a new actor or export a KAR.  It forces Kepler
users to be connected to the Internet and requires your ID service to be up
before they can complete the task.  I suspect these constraints will also be
impractical for many scientific and military projects.  It also introduces a
serious risk when you need to demonstrate Kepler in environments where
Internet connections can be unpredictable (e.g. at conferences).

I had a quick look at the LSID specification at OMG ...
  http://www.omg.org/cgi-bin/doc?dtc/04-05-01
... and realised that the ObjectID field can be any arbitrary string.  

So my suggestion is that you consider using a UUID for the ObjectID field of
an LSID.  In other words, LSIDs and UUIDs need not be mutually exclusive.
You get all the benefits of LSIDs but remove the need for a centralised ID
generator and the other associated ID server problems.  Users can still
*optionally* elect to register their actors with your centralised repository
but it is no longer mandatory.  
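
As a rough sketch of what I mean (not a patch against Kepler), the Java below
builds an LSID whose ObjectID is a random UUID generated on the local machine.
It assumes the usual urn:lsid:authority:namespace:object:revision layout.  The
class name, the "actor-uuid" namespace and the choice of kepler-project.org as
authority are made up for illustration only.

```java
import java.util.UUID;

public class UuidLsid {
    // Hypothetical authority and namespace, chosen only for this example.
    static final String AUTHORITY = "kepler-project.org";
    static final String NAMESPACE = "actor-uuid";

    // Assemble an LSID from an object ID and a revision number.
    static String build(String objectId, int revision) {
        return "urn:lsid:" + AUTHORITY + ":" + NAMESPACE + ":"
                + objectId + ":" + revision;
    }

    // Create an LSID for a new actor without contacting any server.
    // The UUID string contains hyphens but no colons, so it does not
    // clash with the LSID field separator.
    static String newLsid(int revision) {
        return build(UUID.randomUUID().toString(), revision);
    }
}
```

Calling UuidLsid.newLsid(1) on two disconnected machines should give two
different LSIDs without any coordination, which is the property the central
repository was meant to provide.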

You can preserve existing ObjectIDs so existing models don't break.  You
could perhaps use UUIDs only for temporary/local/unregistered actors, but
this may introduce other problems should you later decide to replace them.
There might be an argument for introducing a new LSID domain that uses the
new UUIDs.

Actor developers don't need to generate a new UUID for new versions of
existing actor classes, since they can use the version field provided by LSID
for this purpose (an important LSID feature).  Developers familiar with
Windows COM components, where a changed interface conventionally gets a new
GUID, might need this reminder.
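
To illustrate the versioning point, here is a minimal sketch that keeps the
UUID and only bumps the trailing field.  It assumes the LSID ends in a numeric
revision (the optional last field of the LSID syntax, which is what I mean by
the version field).  The class and method names are hypothetical.

```java
public class LsidRevision {
    // Return the same LSID with its trailing numeric revision incremented.
    // Assumes the last colon-separated field is an integer revision.
    static String nextRevision(String lsid) {
        int idx = lsid.lastIndexOf(':');
        int revision = Integer.parseInt(lsid.substring(idx + 1));
        return lsid.substring(0, idx + 1) + (revision + 1);
    }
}
```

The ObjectID (the UUID) stays the same across revisions, so all versions of an
actor class remain recognisably the same object.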

Cheers,
Tony.