[SEEK-Taxon] Thoughts on GUIDs
Shawn Bowers
bowers at sdsc.edu
Tue May 25 12:38:36 PDT 2004
Beach, James H wrote:
> One of the strongest arguments for the evaluation of 'artificial' or
> 'surrogate' key fields in a database context is that the 'key' should
> not contain any implicit or explicit information about the object being
> identified, other than its identity!

The comment above doesn't seem quite right. How can something have an
identity (i.e., a surrogate) that is independent of the identity of the
thing itself? In other words, if the key doesn't contain any information
about the object being identified, it surely can't uniquely describe or
identify the object, right?

In general, surrogate keys are used for purely pragmatic purposes within
a database management system, e.g., so that a clustered B+-tree index
can be constructed for the table, or to identify certain relationships
in an ER or OO database. But surrogate keys always "pass the buck" of
identity to something else. For example, in OODBs there are two notions
of equality, where objects can be deep-equal (value-equal) or
shallow-equal (id-equal). Id-equality only tells you that two references
point to the same stored object; deciding whether two stored objects
describe the same thing still falls back on comparing their values.

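As a rough illustration of the two notions of equality, here is a small
Python sketch. The TaxonConcept class, its fields, and the placeholder
values are made up for the example and are not taken from any SEEK
schema; Python's "is" stands in for id-equality and "==" for
value-equality.

```python
from dataclasses import dataclass


@dataclass
class TaxonConcept:
    # Hypothetical fields, chosen only for illustration.
    name: str
    according_to: str


# Two separately created objects that carry the same values.
a = TaxonConcept("Aus bus", "Example 2004")
b = TaxonConcept("Aus bus", "Example 2004")
# A second reference to the very same object as `a`.
c = a

# Value-equality: the dataclass-generated __eq__ compares the fields.
value_equal_ab = (a == b)
# Id-equality: `is` compares object identity, not contents.
id_equal_ab = (a is b)
id_equal_ac = (a is c)

print(value_equal_ab, id_equal_ab, id_equal_ac)
```

Here a and b are value-equal but not id-equal, so the object ids alone
do not say whether they denote the same concept; that decision is pushed
back onto the values.
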
Another problem with surrogate keys is that they are arbitrarily
assigned and, conceptually, don't provide any information to a user
about the corresponding object (other than the MAC address used to
construct the identifier, or the time the thing was put into the system,
etc.). Often, surrogate keys are "hidden" from the user, which gets back
to the problem of how to really identify objects. Also, with
surrogates, true uniqueness is always in question. Hence the arguments
for "globally" unique identifiers versus "universally" unique
identifiers, and so on.

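To make the point about MAC addresses and timestamps concrete, here is a
minimal sketch using Python's standard uuid module. It assumes a
time-based (version 1) identifier; FAKE_NODE is a made-up stand-in for a
real MAC address, and nothing here is meant as the identifier scheme
SEEK would actually use.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Hypothetical 48-bit node value standing in for a real MAC address.
FAKE_NODE = 0x001122334455

# Version 1 UUIDs are built from a node id and a timestamp.
u = uuid.uuid1(node=FAKE_NODE, clock_seq=0)

# The timestamp counts 100-nanosecond intervals since 1582-10-15 (UTC).
UUID_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)
created = UUID_EPOCH + timedelta(microseconds=u.time // 10)

# Version 4 UUIDs are random, so uniqueness is only probabilistic.
r = uuid.uuid4()

print(u, hex(u.node), created.isoformat())
print(r, r.version)
```

Everything that can be read back out of such an identifier concerns how
and when it was generated, not the object it is attached to.
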
> If the key itself has information then you will inevitably run into a
> situation where the key will need to be changed because something about
> the information represented by the key value has changed or is in doubt
> or is a matter of interpretation, (thus losing the temporal uniqueness
> of the GUID).

Again, in that case the information used as the key isn't really
"identifying" information, and you have a problem anyway.

There is a very interesting article that people may want to read
concerning properties of things and classification, including identity
and unity, that may be relevant to what taxon is trying to accomplish
with concepts.

The paper was published in the Communications of the ACM in 2002 and can
be found at the URL below. There are longer, more detailed versions
available, but this is a good primer.

http://www.loa-cnr.it/Papers/CACM2002.pdf

> If for example, we decide to embed version numbers within
> the GUID, then there will be relationships between GUIDs that need to be
> maintained and respected and modeled as a consequence of the version
> numbers themselves (sort of an embedded data model within the ID), which
> adds another layer of abstraction to the whole enterprise of managing
> concepts. Instead of just worrying about mapping the taxonomic
> relationships among concepts using unique IDs as the handles, such as in
> the recent examples, one now has to verify that the subkey/version
> identifiers are accurate (and that may be a matter of differing
> interpretations) and related in the appropriate way that corresponds to
> the taxonomy.
>
> I would recommend that versioning be handled outside of the key or ID.
> Let resolver services deal with version differences based on the
> metadata, don't hard code relationships among concept versions in the
> identifier.
>
> _____________________________
> James H. Beach
> Biodiversity Research Center
> University of Kansas
> 1345 Jayhawk Boulevard
> Lawrence, KS 66045, USA
> T 785 864-4645, F 785 864-5335
>
>
>