[SEEK-Taxon] GUID (was Re: first cut at guid decision document)

Nozomi Ytow nozomi at biol.tsukuba.ac.jp
Fri Mar 12 11:10:21 PST 2004


Hi Rich,

Since either a numeric ID or a concatenated text-style ID can be
cumbersome, depending on one's point of view, and since we can
generate a numeric ID from a text_ID by choosing a hash function, I
think the real issues are the stability of a text_ID and what a GUID
is expected to identify.
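
As a minimal sketch of the hash idea, assuming SHA-256 truncated to
64 bits (both are illustrative choices, and numeric_id and the
example text_ID format are hypothetical, not part of any SEEK or
Nomencurator specification):

```python
import hashlib


def numeric_id(text_id: str, bits: int = 64) -> int:
    """Derive a fixed-width integer ID from a text ID (hypothetical helper).

    The same text_ID always yields the same number, so the numeric ID
    is exactly as stable as the text it was derived from.
    """
    if bits <= 0 or bits % 8 != 0 or bits > 256:
        raise ValueError("bits must be a positive multiple of 8, at most 256")
    digest = hashlib.sha256(text_id.encode("utf-8")).digest()
    # Keep only the leading bytes; a shorter width raises collision risk.
    return int.from_bytes(digest[: bits // 8], "big")


# Assumed, purely illustrative text_ID format.
example = numeric_id("Canis_Linnaeus_1758")
```

The number is only a shorthand for the text: if the text_ID changes,
so does the number, which is why stability of the text_ID matters.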

You are right that a text_ID that depends on variable contents can't
be the primary key of a tuple (or an object).  However, that does not
mean that no text_ID works; if the contents are fixed, it (or its
hashed friend) should work fine.  Different tuples in distributed
databases may indicate the same potential taxon.  In that case I
prefer to identify these tuples (or, more precisely, their contents)
as the same potential taxon.  Isn't that one of the reasons why we
need a GUID?
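
To illustrate identifying tuples by their contents, here is a rough
sketch in which two records held in different databases receive the
same ID because their fixed contents agree.  The field names, the
separator and the content_id helper are all assumptions made for the
example, not Nomencurator's actual schema:

```python
import hashlib

# Assumed set of fixed fields that define the identity of a record.
FIELDS = ("name", "authority", "year")


def content_id(record: dict) -> str:
    """Build an ID from a record's fixed contents only (hypothetical helper).

    Local details such as a database's own primary key are ignored, so
    equal contents give equal IDs wherever the record is stored.
    """
    canonical = "|".join(str(record.get(field, "")).strip() for field in FIELDS)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two tuples in different databases, each with its own local key.
record_a = {"local_key": 17, "name": "Canis", "authority": "Linnaeus", "year": 1758}
record_b = {"local_key": "x-204", "name": "Canis", "authority": "Linnaeus", "year": 1758}
same_taxon = content_id(record_a) == content_id(record_b)
```

This only works while the chosen fields stay fixed; any edit to them
would produce a different ID.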

Nomencurator is based on a publication model: once you have published
an article you can't modify it, only publish an amendment, whereas
you have complete control over a manuscript that is not yet
published.  Likewise, once you contribute a data entry to a public
Nomencurator server, you can't modify that entry.  You can contribute
a new entry with a reference to the previous one that it amends.
This reference is held in Annotation data.  The mechanism allows us
to record mistakes, including typographical ones.

Nomencurator is also designed to link fragmentary data, e.g. "Canis,
but I do not know the authority" or "Canis L., but I do not know the
citation".  Nomencurator is expected to create integrated data to
manage these 'raw' data.  The integrated data has references to the
raw data used to determine its contents.  We need only the 'cooked'
data for ordinary use; if you doubt its contents, you can examine the
'raw' data.  This implies that we need two modes in Nomencurator,
cooked and raw.

While public 'raw' data can have a fixed text_ID, the text_ID of
'cooked' data changes whenever new, relevant 'raw' data is
contributed.  We need a sophisticated method, including N-grams, to
manage cooked text_IDs.  Misspellings in Latin names and variants of
author names are inevitable, so we need such an intelligent mechanism
even with a fixed text_ID, or with a numeric ID referring to such
contents.

I have strong sympathy with Taxonomer's design, especially its
'extreme' positioning.  Just as Taxonomer does with 'Assertion',
Nomencurator distinguishes each name usage as 'NameUsage' data
(called NameRecord in our paper).  We also recognise the importance
of rich linkage types in 'Annotation', and expect this to be similar
to the subtyping of 'Assertion'.  I think we need to compare
Taxonomer and Nomencurator in more detail; or are they similar enough
that one of them would suffice?  Could be...  porting data from one
to the other may be a good exercise.

Cheers,
JMS


