[SEEK-Taxon] Thoughts on GUIDs

dave thau thau at learningsite.com
Tue May 25 13:53:21 PDT 2004


Hmm... so is there consensus that we don't need the versioning information
of LSIDs?  That's fine with me!

Anyone feel strongly that we do need it?

Dave


> Rich --
>
> You are correct, my issues with the information content and structure of
> the GUIDs are indeed secondary to the issue of what is a concept in the
> SEEK sense.  That is by far the more fundamental question, and I also
> feel it is *the* core question SEEK Taxon needs to answer.  Thankfully,
> everyone has an opinion to contribute!
>
> I agree with your view that if there is any subjectivity possible in the
> information contained within the GUID, then we will be at higher risk of
> the enterprise data model breaking down.  It makes no sense in my view,
> once we have agreed to the requirement and need of a surrogate key
> identifier, to then regress and allow multiple 'versions' of those
> identifiers to exist on the basis of some implied relationships
> (represented by a version number).  This is such a critical design
> feature for the taxon part of SEEK.
>
> One could back up a step and argue that we do not need GUIDs at all,
> that the identity of the concept should be algorithmically derivable by
> ad hoc analysis of the metadata (which is conceptually nicer), but that
> choice would seem to me to bring on all sorts of practical perils and
> risks when dealing with so many 'uncontrolled' sources of data.  I think
> it makes sense to use GUIDs because it recognizes these practical issues
> with incomplete and ambiguous data.  It allows almost anyone to 'create'
> concepts, get them into the concept space, and allow people to use them
> or ignore them.  Hopefully quality will determine which concepts survive
> through usage.
>
> --Jim
>
>
>
> _____________________________
> James H. Beach
> Biodiversity Research Center
> University of Kansas
> 1345 Jayhawk Boulevard
> Lawrence, KS 66045, USA
> T 785 864-4645, F 785 864-5335
>
>
>
>
>
> -----Original Message-----
> From: Richard Pyle [mailto:deepreef at bishopmuseum.org]
> Sent: 25 May, 2004 3:02 PM
> To: SEEK Taxon
> Subject: RE: [SEEK-Taxon] Thoughts on GUIDs
>
>
> Hi Jim,
>
>> Completely independent of the choice of identifier schemes, is the
>> question Nico, Rich and Dave have been tangoing around
>> -- whether the identifier should contain explicitly or implicitly any
>> information about the identity or the relationship of a concept to
>> something else.
>
> That is an important issue (which I have opinions on), but it is not the
> issue I was asking about.  My question was more generally about whether
> there is such a notion of "versions" of the same concept (as opposed to
> different concepts), where each "version" requires its own GUID (whether
> or not the GUID itself contains information about the relationship of a
> version to its concept).  And if so, then what distinguishes a case
> where there are two versions of the same concept from a case where
> there are two distinct concepts?  The fundamental question is whether
> there is one and only one GUID per concept (except in cases of
> inadvertent duplication), or if the system needs to accommodate
> potentially more than one (intentional) GUID per concept (i.e., one
> unique GUID for each "version" of the same concept).
>
>> Embedding version numbers in IDs adds information, i.e. metadata,
>> about the taxon concept that may be present nowhere else. One
>> of the strongest arguments for the evaluation of 'artificial' or
>> 'surrogate' key fields in a database context is that the 'key' should
>> not contain any implicit or explicit information about the object being
>> identified, other than its identity!
>
> We are in FULL agreement on this issue!
>
>> If the key itself has information then you will inevitably run into a
>> situation where the key will need to be changed because something about
>> the information represented by the key value has changed or is in doubt
>> or is a matter of interpretation (thus losing the temporal uniqueness
>> of the GUID).
>
> Yes - EXACTLY!  I do understand the value of preserving *some*
> information within the content of the GUID string (e.g., the server
> domain that issued the number).  But in my opinion, the GUID should not
> attempt to include metadata/information about the concept itself (only
> metadata about the GUID -- like where it was issued).
>
>> If, for example, we decide to embed version numbers within the GUID,
>> then there will be relationships between GUIDs that need to be
>> maintained and respected and modeled as a consequence of the version
>> numbers themselves (sort of an embedded data model within the ID),
>> which adds another layer of abstraction to the whole enterprise of
>> managing concepts.  Instead of just worrying about mapping the
>> taxonomic relationships among concepts using unique IDs as the
>> handles, such as in the recent examples, one now has to verify that the
>> subkey/version identifiers are accurate (and that may be a matter of
>> differing interpretations) and related in the appropriate way that
>> corresponds to the taxonomy.
>
> Again, we seem to be in full agreement on this.
>
>> I would recommend that versioning be handled outside of the key or ID.
>
>> Let resolver services deal with version differences based on the
>> metadata; don't hard-code relationships among concept versions in the
>> identifier.
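
A rough Python sketch of what "versioning outside the key" might look like.
Everything here is made up for illustration (the ConceptStore class, the
"revises" relation, and the placeholder names "Aus bus" / "Smith 1990");
none of it is something SEEK has specified.  The IDs are opaque UUIDs, and
whether one record is a "version" of another lives in a separate relation
that can be revised without touching any identifier.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ConceptStore:
    # Metadata keyed by an opaque ID; the ID itself says nothing about the concept.
    metadata: dict = field(default_factory=dict)
    # (newer_id, older_id) pairs, kept outside the identifiers (hypothetical relation).
    revises: set = field(default_factory=set)

    def register(self, **meta) -> str:
        """Issue a new opaque ID for a concept record and store its metadata."""
        guid = str(uuid.uuid4())
        self.metadata[guid] = meta
        return guid

    def mark_revision(self, newer: str, older: str) -> None:
        self.revises.add((newer, older))

    def unmark_revision(self, newer: str, older: str) -> None:
        self.revises.discard((newer, older))

    def earlier_versions(self, guid: str) -> list:
        """Resolver-style lookup: which records does this one revise?"""
        return [old for new, old in self.revises if new == guid]


store = ConceptStore()
first = store.register(name="Aus bus", according_to="Smith 1990")
second = store.register(name="Aus bus", according_to="Smith 1995")

# Today's interpretation: the second record is a revision of the first.
store.mark_revision(second, first)
linked = store.earlier_versions(second)

# Later reinterpretation: treat them as two distinct concepts instead.
store.unmark_revision(second, first)
unlinked = store.earlier_versions(second)
```

Changing one's mind about the relationship only edits the relation; both
IDs stay exactly as issued.
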
>
> Yes, exactly -- this is one of the points I was originally trying to make.
> But more fundamentally, I wanted to first understand what a "version"
> is, and how it differed from a case where you would simply identify two
> distinct concepts (and then secondarily map their congruencies).  The
> important/relevant question is what does a GUID represent?  It makes the
> most sense to me that the GUID represents one concept, and not
> potentially multiple versions of one concept. But my position on this
> may be based on a flawed understanding of what a "version" is, and how
> it differs from a case of two distinct concepts (hence my refined
> questions in later posts).
>
> Examples seem to be helpful for communication for these sorts of
> discussions, so I'll go back to Dave's example GUIDs:
>
> urn:lsid:taxaserver.org:3232:1
> urn:lsid:taxaserver.org:3232:2
>
> These constitute two distinct GUIDs, pertaining to one concept.  The
> concept ID is 3232, within the context of taxaserver.org's LSID series.
> In this case, two GUIDs have been assigned to two different versions of
> the same concept.
>
> There seem to me to be two kinds of metadata/information embedded within
> the GUIDs themselves.  First, there is metadata about the GUID:  it
> self-identifies as an LSID, and states that it was issued by
> taxaserver.org.  I see no real harm in including this sort of
> information embedded within the GUID.
> Second, there is, as Jim described, information about the relationship
> between a version of a concept, and a concept. In other words, the GUID
> refers to two discrete entities: the concept (3232), and the version (1
> or 2), and the implied relationship between them.  Therefore, there is
> no single GUID for the "concept".
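
To make the two kinds of embedded information concrete, here is a small
Python sketch that splits Dave's example strings into fields.  The field
names (authority, concept_id, version) follow Rich's reading of the
examples in this thread and are an assumption; they may not match how the
LSID specification itself labels those positions.

```python
from typing import NamedTuple


class ParsedGuid(NamedTuple):
    scheme: str      # metadata about the GUID: what kind of identifier it is
    authority: str   # metadata about the GUID: who issued it
    concept_id: str  # information about the concept (per this thread's reading)
    version: str     # information about the version (per this thread's reading)


def parse_example_guid(guid: str) -> ParsedGuid:
    """Split a five-part 'urn:lsid:authority:id:version' string into fields."""
    parts = guid.split(":")
    if len(parts) != 5 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not a five-part urn:lsid string: {guid!r}")
    _, scheme, authority, concept_id, version = parts
    return ParsedGuid(scheme.lower(), authority, concept_id, version)


a = parse_example_guid("urn:lsid:taxaserver.org:3232:1")
b = parse_example_guid("urn:lsid:taxaserver.org:3232:2")

# Two distinct GUIDs, but the "same concept" can only be seen by looking
# inside the strings and comparing everything except the version field.
distinct_guids = a != b
same_concept = (a.authority, a.concept_id) == (b.authority, b.concept_id)
```

Note that nothing in this scheme yields a single string that stands for
concept 3232 on its own, which is the point made above.
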
>
> My concern about the distinction between different versions of the same
> concept, vs. different concepts, is that if there is *any* subjectivity
> at all in making that distinction, you may potentially be tempted to
> interpret it a different way later on, so that you instead have:
>
> urn:lsid:taxaserver.org:3232:1
> urn:lsid:taxaserver.org:3233:1
>
> This would require a change in GUID (and consequent need for propagation
> of that change), which, as Jim states, is one of the main things you're
> trying to avoid when establishing a surrogate key.
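
A toy Python illustration of the propagation problem, using the example
GUIDs from this thread.  The record names (specimen-001 and so on) and the
reclassify helper are invented for the sketch; the idea is only that every
downstream record holding the old string has to be found and rewritten
once "version 2 of concept 3232" is reinterpreted as "concept 3233".

```python
def reclassify(references: dict, old_guid: str, new_guid: str):
    """Return an updated copy of the references and the number of rewrites."""
    updated = {
        record: (new_guid if guid == old_guid else guid)
        for record, guid in references.items()
    }
    changed = sum(1 for guid in references.values() if guid == old_guid)
    return updated, changed


# Hypothetical downstream records that cite a concept by its GUID.
references = {
    "specimen-001": "urn:lsid:taxaserver.org:3232:1",
    "specimen-002": "urn:lsid:taxaserver.org:3232:2",
    "checklist-17": "urn:lsid:taxaserver.org:3232:2",
}

updated, changed = reclassify(
    references,
    old_guid="urn:lsid:taxaserver.org:3232:2",
    new_guid="urn:lsid:taxaserver.org:3233:1",
)
```

Here two of the three records have to be touched, and any copy held by
someone who never hears about the change keeps pointing at a GUID that no
longer means what it did.
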
>
> Even if there would never be any ambiguity about whether two records
> should be treated as different versions of the same concept, or two
> separate concepts, I still feel uneasy about extending the meaning of
> the GUID to include concept versions, rather than simply representing
> distinct concepts (1:1 Concept:GUID).
>
> So really there are (at least) two subtly different, but I think
> fundamentally important, questions here:  What, if any, kinds of
> information should be embedded within the GUID string itself; and whether
> the minimal unit of a GUID is a concept, or a version of a concept.
>
> If I hadn't thoroughly confused the issue before, certainly I have done
> so now!
>
> Aloha,
> Rich
>
> ======================================================
> Richard L. Pyle, PhD
> Natural Sciences Database Coordinator, Bishop Museum
> 1525 Bernice St., Honolulu, HI 96817
> Ph: (808)848-4115, Fax: (808)847-8252
> email: deepreef at bishopmuseum.org
> http://www.bishopmuseum.org/bishop/HBS/pylerichard.html
>
>
>
>
> _______________________________________________
> seek-taxon mailing list
> seek-taxon at ecoinformatics.org
> http://www.ecoinformatics.org/mailman/listinfo/seek-taxon
>
>





