[SEEK-Taxon] Re: first cut at guid decision document

Richard Pyle deepreef at bishopmuseum.org
Thu Mar 11 19:00:46 PST 2004


First of all...Dave...nice job on the GUID report!  I'm on the road at the
moment, but will review it in detail when I get a chance. After a brief
read, I tend to think that the Handle System seems best, but I want to look
into it some more. A couple of questions about the numbers themselves:  Are the
post-slash numbers integers?  Necessarily sequentially assigned, or can they
be random? Positive values only? How many bits? (i.e., how large is the
domain/scope of assignable numbers?) Final question -- what about the issue
of fee-per-number?  I thought you said that the Handle system charged some
small fee for each issued number...?

James Ytow wrote:

> The GUID issue is a bit unclear to me.
> As you wrote in the document, a taxon concept can be specified by
> a combination of a name, a name authority and a citation including
> the publication (here I ignore the details of each component).
> The combination of them, e.g. a string concatenated with '_', seems
> sufficiently unique.  Why do we need more than that?

Hi James -- good to see you on this discussion group!  I think there are two
fundamental reasons why arbitrary GUIDs are helpful (indeed necessary):

1) To be absolutely unique, the concatenated text string approach would need
to include a lot of detail, e.g.:

OriginalGenusName_originalspeciesname_Describers_Year_Page_ConceptAuthors_ConceptYear_ConceptPage

Even this may yield a handful of duplicates among the couple of million
names that have already been established, so additional citation details
would probably need to be included.  That in turn means a somewhat
cumbersome standard protocol to ensure that the concatenation is formed
consistently for all names & concepts.

2) Much more significantly (in my perspective), any attempt to use
information-bearing primary keys is an invitation to problems.  For example,
any typographical error or other variation of any character (e.g., as you
cited, different ways of representing the name Linnaeus) introduced at the
time the primary key is "christened" would either need to be preserved
permanently as such, or would need to be corrected in such a way that the fix
is reliably propagated to all data sources and links to that concept across
all databases.
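To make the two points above a bit more concrete, here is a rough Python sketch of the concatenated-key approach. The field names, the sample values, and the lowercase/whitespace rule standing in for a "standard protocol" are all hypothetical, chosen only for illustration.

```python
# Hypothetical sketch: an information-bearing key built by joining citation
# details with "_". Field names and values are made up for illustration.
FIELDS = (
    "original_genus", "original_species", "describers", "year", "page",
    "concept_authors", "concept_year", "concept_page",
)


def raw_key(record: dict) -> str:
    """Join the fields exactly as a given database happens to store them."""
    return "_".join(str(record[f]) for f in FIELDS)


def normalized_key(record: dict) -> str:
    """A toy 'standard protocol': trim and collapse whitespace and lowercase
    each field before joining. A real protocol would need many more rules."""
    return "_".join(" ".join(str(record[f]).split()).lower() for f in FIELDS)


source_a = {
    "original_genus": "Genus", "original_species": "species",
    "describers": "Linnaeus", "year": 1758, "page": 12,
    "concept_authors": "Brown", "concept_year": 1950, "concept_page": 34,
}
# Same concept, stored with different capitalization and stray spaces.
source_b = dict(source_a, original_genus="GENUS", describers=" Linnaeus ")
# Same concept, with the describer's name abbreviated.
source_c = dict(source_a, describers="L.")

print(raw_key(source_a) == raw_key(source_b))                # False
print(normalized_key(source_a) == normalized_key(source_b))  # True
print(normalized_key(source_a) == normalized_key(source_c))  # False
```

The toy normalization rule absorbs the capitalization and spacing differences, but not the abbreviated describer.  Each such variant would need yet another rule, and any variant that slips through at christening time becomes a permanent part of the key.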

In my opinion, a primary key should be permanent -- NEVER subject to
subsequent alteration.  Any time the key itself contains information (which
an arbitrary integer does not), there is the potential for error in the
original assignment of the key value, and thus a motivation to change the
key.

Of course, the main problem with an arbitrary GUID approach is that there
needs to be some mechanism to ensure that each key is reliably married to
the information it is intended to represent -- but that, to me, seems a much
more manageable problem to solve.
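A similar sketch of the alternative: an arbitrary integer key plus a registry that marries each key to its data.  Again, `info_key`, `register`, the registry dict and the sample values are hypothetical illustrations, not a proposed implementation.

```python
import itertools

# Hypothetical sketch: compare an information-bearing key with an arbitrary
# integer key when a typo made at "christening" time is later corrected.
# Function names and values are illustrative only.


def info_key(describer: str, year: int, page: int) -> str:
    """Information-bearing key: the key is built from the data itself."""
    return f"{describer}_{year}_{page}"


# Registry that marries each arbitrary integer key to the data it stands for.
registry: dict[int, dict] = {}
_counter = itertools.count(1)


def register(record: dict) -> int:
    """Issue the next arbitrary integer and store a copy of the record under it."""
    guid = next(_counter)
    registry[guid] = dict(record)
    return guid


record = {"describer": "Linaeus", "year": 1758, "page": 123}  # typo: "Linaeus"
old_key = info_key(**record)  # "Linaeus_1758_123"
guid = register(record)       # 1

# A second database links to the concept using both kinds of key.
links = {"by_info_key": old_key, "by_guid": guid}

# The typo is corrected. The integer key is untouched; only the data changes,
# and only in one place (the registry).
record["describer"] = "Linnaeus"
registry[guid]["describer"] = "Linnaeus"

print(links["by_info_key"] == info_key(**record))  # False: link is now stale
print(registry[links["by_guid"]]["describer"])     # Linnaeus
```

Correcting the typo invalidates every stored copy of the information-bearing key, whereas the integer key stays valid and the fix lives only in the registry.  Keeping that registry authoritative is the "marriage" problem described above.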

> The above combination is a potential taxon rather than a taxon
> concept.  Are you talking about an identifier for the taxon concept
> rather than for potential taxa?

This is an excellent question, and one that was left unresolved at the
meeting I attended at UCSB.  I am (via Taxonomer) at one extreme, where
every single usage instance of a name (my notion of an "Assertion" -- see
http://www2.bishopmuseum.org/schema/Taxonomer-Pyle.pdf) gets its own unique
ID, no matter how trivial the circumstance of the name usage.  I then define
various subtypes of this pool of Assertions.  One important subtype is the
set of Assertions that constitute Code-compliant original descriptions of
names (I apply the word "Protonym" to these).  Another subtype could be
the concept-bearing Assertions (which would include all Protonyms, as well
as all subsequent usages of names that are deemed to represent unique taxon
concepts as applied to those names).  There are
potentially a number of other kinds of Assertion subtypes.  All Assertion
instances would be "potential taxa" (potential concepts), and would "plug
in" to the network of "defined" concepts via an Assertion-to-Assertion
mapping index, with each mapped link qualified by one of the five kinds of
concept relationship (congruent, includes, included in, overlaps, excludes).
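A minimal Python sketch of the Assertion model as described in this paragraph, assuming a flat in-memory representation.  The class names, the `is_protonym` and `defines_new_concept` flags, and the sample data are hypothetical stand-ins, not the actual Taxonomer schema (the PDF linked above is the authoritative description).

```python
from dataclasses import dataclass
from enum import Enum


class Relationship(Enum):
    """The five kinds of concept relationship named in the message."""
    CONGRUENT = "congruent"
    INCLUDES = "includes"
    INCLUDED_IN = "included in"
    OVERLAPS = "overlaps"
    EXCLUDES = "excludes"


@dataclass(frozen=True)
class Assertion:
    """One usage instance of a name; every instance gets its own ID.
    The boolean fields are hypothetical stand-ins for Assertion subtypes."""
    guid: int
    name: str
    is_protonym: bool = False          # Code-compliant original description
    defines_new_concept: bool = False  # later usage treated as a distinct concept

    @property
    def is_concept_bearing(self) -> bool:
        # All Protonyms are concept-bearing, plus flagged subsequent usages.
        return self.is_protonym or self.defines_new_concept


@dataclass(frozen=True)
class ConceptLink:
    """An Assertion-to-Assertion mapping qualified by one relationship."""
    subject: int
    target: int
    relationship: Relationship


assertions = {
    1: Assertion(1, "Genus species", is_protonym=True),
    2: Assertion(2, "Genus species", defines_new_concept=True),
    3: Assertion(3, "Genus species"),  # e.g. an identification on a specimen
}
links = [
    ConceptLink(2, 1, Relationship.INCLUDED_IN),
    ConceptLink(3, 2, Relationship.CONGRUENT),
]

concept_bearing = sorted(g for g, a in assertions.items() if a.is_concept_bearing)
print(concept_bearing)  # [1, 2]
```

In this sketch every usage instance is an Assertion, the subtypes are simply predicates over the same pool, and each link between Assertions carries exactly one of the five relationship kinds.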

My request to the SEEK community was that if they are going to establish a
GUID system for concepts, they should allow numbers drawn from the
same GUID pool to be assigned to even the most minute potential taxon (e.g.,
a specific identification tag on a specimen) -- even if SEEK is only
interested in the (comparatively small) subset of "Assertion" instances that
are concept-bearing.
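To illustrate the single-pool idea, here is a rough sketch assuming a simple in-memory issuer.  `GuidPool`, the kind labels and the sequential counter are hypothetical; a real system such as Handle might issue identifiers quite differently, which is the open question about sequential versus random numbers raised at the top of this message.

```python
import itertools


class GuidPool:
    """Hypothetical single issuer: every Assertion, whatever its subtype,
    draws its identifier from this one sequence."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self.kind_of: dict[int, str] = {}

    def issue(self, kind: str) -> int:
        """Hand out the next identifier and remember what kind of Assertion got it."""
        guid = next(self._counter)
        self.kind_of[guid] = kind
        return guid


pool = GuidPool()
pool.issue("protonym")                 # 1
pool.issue("specimen identification")  # 2
pool.issue("concept-bearing usage")    # 3
pool.issue("specimen identification")  # 4

# A consumer interested only in concepts works with a subset of the same pool,
# so its IDs can never collide with IDs issued for finer-grained Assertions.
concept_kinds = {"protonym", "concept-bearing usage"}
seek_ids = [g for g, k in pool.kind_of.items() if k in concept_kinds]
print(seek_ids)  # [1, 3]
```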

Aloha (from chilly Columbus Ohio),
Rich

=======================================================
Richard L. Pyle, PhD
Natural Sciences Database Coordinator, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deepreef at bishopmuseum.org
http://www.bishopmuseum.org/bishop/HBS/pylerichard.html




