[SEEK-Taxon] identifiers for taxonomic concepts

Hannu Saarenmaa hsaarenmaa at gbif.org
Thu Nov 27 07:05:45 PST 2003


Hello Dave, cc others

I am delighted to see you raising this question.  About one year ago,
when we scoped ECAT, I was wrestling with the same issue.  For the time
being, ECAT has been happy to deal only with simple synonym resolution, and
the issue has been buried here at GBIF.  However, I drafted a document that
you may not have seen.  It is here:
http://circa.gbif.net/Public/irc/gbif/ict/library?l=/programme_strategy/taxonomicobjectservicere/

The paper is not yet finished.  There are several things which I would like
to improve, but time has not allowed.  So please do not get excited about
some soft spots in the paper...

Basically, I surveyed the same options that you have.  My conclusion is
that LSID is the most attractive form for taxon GUIDs because it maps nicely
onto the global identification scheme we use for specimen/observation data.
That is, LSID has a pattern like
NetworkName:InstitutionCode:CollectionCode:CatalogNumber.  This would allow
us to treat name providers very much in the same way we treat other data
providers, which would be nice and would build on existing infrastructure
(well... sort of... there is very little infrastructure yet to guarantee the
uniqueness of the said codes).  It would also allow the name providers to
insert some sign of authorship into the concepts, so that when someone uses
a concept conceived elsewhere, they would immediately know whodunnit.
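
To make the idea concrete, here is a minimal Python sketch of composing and
parsing an identifier in the four-part pattern as written above.  It follows
that pattern only, not any formal URN syntax, and the class name, function
name and sample values are all hypothetical.

```python
from typing import NamedTuple


class ConceptId(NamedTuple):
    """Four-part identifier following the pattern described in this message."""
    network: str
    institution: str
    collection: str
    catalog_number: str

    def __str__(self) -> str:
        # Join the four parts with colons, e.g. "GBIF:NCC:Taxa:930825".
        return ":".join(self)


def parse_concept_id(text: str) -> ConceptId:
    """Split a colon-separated identifier into its four named parts."""
    parts = text.split(":")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected four non-empty parts, got {text!r}")
    return ConceptId(*parts)


# Hypothetical sample values, for illustration only.
example = ConceptId("GBIF", "NCC", "Taxa", "930825")
```

The institution part is what would carry the "sign of authorship": anyone
holding the identifier can read off which provider coined the concept.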

On the other hand, using just running numbers or UUIDs would probably work
fine, too.  Running numbers are how the Nordic Code Centre (1985-93 R.I.P.
http://www.nrm.se/ncc/) identified their concepts.  They basically had an
8-character code for each name, used in the field on PDAs and the like, e.g.
  "APTE AUS","Apteryx australis","","","NCC930825"
and a running number like NCC930825 for the taxonomic concept that the names
were supposed to map into.
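
A record in that shape could be read roughly as follows.  This is only a
sketch: the meaning of the first, second and last columns is taken from the
description above, the two empty columns are left uninterpreted, and the
function and key names are my own.

```python
import csv
import io

# The sample record quoted above.
RECORD = '"APTE AUS","Apteryx australis","","","NCC930825"'


def parse_ncc_record(line: str) -> dict:
    """Map one quoted, comma-separated record onto named fields."""
    row = next(csv.reader(io.StringIO(line.strip())))
    if len(row) != 5:
        raise ValueError(f"expected 5 columns, got {len(row)}")
    return {
        "field_code": row[0],   # short code used when recording in the field
        "name": row[1],         # scientific name the code stands for
        "concept_id": row[4],   # running number of the taxonomic concept
    }


record = parse_ncc_record(RECORD)
```

Several names can then point at the same concept simply by sharing the
value in the last column.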

Regards, Hannu

-----------
Hannu Saarenmaa / Deputy Director for Informatics / GBIF - Global
Biodiversity Information Facility / Universitetsparken 15 / DK-2100
Copenhagen / tel +45-35321479 / gsm +45-28751479 / hsaarenmaa at gbif.org /
http://www.gbif.org/ 


> -----Original Message-----
> From: seek-taxon-admin at ecoinformatics.org 
> [mailto:seek-taxon-admin at ecoinformatics.org] On Behalf Of 
> thau at learningsite.com
> Sent: 26 November 2003 18:06
> To: seek-taxon at ecoinformatics.org
> Subject: [SEEK-Taxon] identifiers for taxonomic concepts
> 
> 
> Hello everyone,
> 
> I've been thinking a bit about what kind of identifiers to use for
> representing taxonomic concepts throughout SEEK.  I have a longish note,
> which is attached here as a text file and as a word document.
> 
> Here's the summary:
> 
> -----
> 
> SEEK is storing data using grid technologies.  The draft mechanism for
> identifying resources in the ecogrid uses URIs of the form: 
> 
> ecogrid://registered.naming.authority/local_id
> 
> I think the local_ids for taxonomic concepts should be UUIDs -
> semantic-free, globally unique strings generated following a specific
> algorithm.  UUIDs look like this:
> 
> 5c2775f0-1f59-11d8-a2da-b8a03c50a862
> 
> Libraries for generating UUIDs already exist in most major languages.  If
> UUIDs are too ugly, ids should at least be semantics free, all lower
> case, and draw from the following character set if we want the flexibility
> of using the ids for systems outside of the ecogrid:
> 
> | "(" | ")" | "-" | "." |
> | "_" | "!" | "*" | 
> | "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" |
> | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |
> | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |
> | "y" | "z" |
> | "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
> | "8" | "9" |
> 
> -----
> 
> The attached document has justifications for, and expansions of, these
> opinions, which you may want to digest on Friday, along with your
> Thanksgiving and/or Eid leftovers, or wait until next week if you're
> afraid of upsetting your stomach.
> 
> Dave
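
For illustration, the summary quoted above could be sketched in Python along
these lines, assuming the standard uuid module is an acceptable generator.
The function names and the naming authority "example.org" are hypothetical;
the URI form and the character set are the ones given in the summary.

```python
import re
import uuid

# Character set from the summary: ( ) - . _ ! * plus a-z and 0-9.
ALLOWED = re.compile(r"[a-z0-9()\-._!*]+")


def is_valid_local_id(local_id: str) -> bool:
    """True if local_id is non-empty and uses only the allowed characters."""
    return ALLOWED.fullmatch(local_id) is not None


def new_concept_uri(authority: str) -> str:
    """Build an ecogrid URI whose local id is a freshly generated UUID."""
    local_id = str(uuid.uuid4())  # lower-case hex digits and hyphens
    return f"ecogrid://{authority}/{local_id}"


# Hypothetical naming authority, for illustration only.
uri = new_concept_uri("example.org")
```

Because the string form of a UUID is lower-case hexadecimal with hyphens, it
already falls inside the proposed character set, so the same check covers
both UUIDs and any "less ugly" hand-made ids.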
> 



