[tcs-lc] Names as Objects

Sat Mar 5 12:57:33 PST 2005

OK guys and gals, I am continuing the discussion from Rich and 
Jessie's exchange re: Names as Objects. I won't include all their 
interwoven messages here as the thread is getting almost impossible to 
follow.

Here is a link to the start of the thread in the archives:

http://www.ecoinformatics.org/pipermail/tcs-lc/2005-March/000016.html

Rich basically proposes that ScientificName (or whatever the element is 
called) should be a top level structure along with taxon concepts, 
vouchers and publications. I'll call this the 'modular' approach.

Jessie defends the names embedded in the TaxonConcept approach that is 
currently in TCS.
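
To make the two options concrete, here is a rough sketch of the two 
shapes in Python. None of the class or field names below come from TCS; 
they are made up for illustration, and "Aus bus L." is a placeholder 
name, not real data.

```python
from dataclasses import dataclass


# Modular approach: names are top-level objects and concepts point at them.
@dataclass(frozen=True)
class ScientificName:
    name_id: str      # hypothetical identifier for the name object
    full_name: str


@dataclass(frozen=True)
class ModularConcept:
    concept_id: str
    name_ref: str     # pointer to a ScientificName.name_id
    according_to: str


# Embedded approach: every concept carries its own copy of the name data.
@dataclass(frozen=True)
class EmbeddedConcept:
    concept_id: str
    full_name: str
    according_to: str


def resolve_name(concept: ModularConcept, names: dict) -> str:
    """Follow the pointer from a modular concept to its name string."""
    return names[concept.name_ref].full_name


# Two concepts that use the same name, expressed both ways.
names = {"n1": ScientificName("n1", "Aus bus L.")}
modular = [
    ModularConcept("c1", "n1", "Smith 1990"),
    ModularConcept("c2", "n1", "Jones 2001"),
]
embedded = [
    EmbeddedConcept("c1", "Aus bus L.", "Smith 1990"),
    EmbeddedConcept("c2", "Aus bus L.", "Jones 2001"),
]
```

In the modular form the name is stored once and every reader has to 
resolve a pointer; in the embedded form the name string is repeated per 
concept but each concept can be read on its own.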

Most of the arguments have been put forward and all parties seem to 
agree that either method would 'work'. There is no right and wrong here; 
we are just trying to pick the better of two options.

One way to look at this kind of situation is to do a 'regret analysis'. 
If we were all chatting in 10 years' time, what would we regret about 
choosing the modular over the embedded approach, or vice versa? We are 
trying to guess which of the options will cause us fewer headaches in 
the future.

Currently my money is on the embedded approach causing fewer problems. I 
imagine someone who hasn't been involved in the discussions here (and 
probably isn't even a taxonomist) implementing a system to publish 
checklists from surveys. They look at the schema and think 
"ScientificNames, that is what I've got! I'll just map the names in the 
database to those elements in the schema". Months, maybe years, later 
someone else realizes that the data being published by this 
organization is useless because it is a list of names, not taxa, and has 
to go and work out how to get it corrected and how to correct any 
decisions that have been taken on the basis of the data. I imagine this 
happening quite a lot.

The downside of the embedded approach is that it is slightly less 
convenient for taxonomists. If you are publishing 15 different concepts 
that use the same name there will be a lot of redundancy, but this 
redundancy is only in a document instance that may exist briefly, 
not in a database. In 10 years' time I would hope this will go away, 
as ALL published names will be cataloged and have GUIDs. Anyway, one 
would hope so.

Rich proposes that there could be a brief summary of a name in the 
TaxonConcept element as well as a pointer to the full scientific name. 
This leads us into the several-ways-to-achieve-the-same-thing situation, 
which is hell for a programmer. Which do we display to the user? Which 
do we use to make judgments about whether two concepts from different 
data sources are talking about the same Name (my hobby horse)?
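
A small sketch of why the summary-plus-pointer design worries me as a 
programmer. The field names (name_summary, name_ref) and the sample 
values are invented for illustration and are not taken from TCS or from 
Rich's proposal. The point is only that two reasonable comparison rules 
can give opposite answers for the same pair of concepts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concept:
    concept_id: str
    name_summary: Optional[str]   # brief name text carried inside the concept
    name_ref: Optional[str]       # pointer to a full name record elsewhere


def same_name_by_ref(a: Concept, b: Concept) -> bool:
    """Rule 1: trust the pointer, ignore the summary text."""
    return a.name_ref is not None and a.name_ref == b.name_ref


def same_name_by_summary(a: Concept, b: Concept) -> bool:
    """Rule 2: trust the summary text, ignore the pointer."""
    return a.name_summary is not None and a.name_summary == b.name_summary


# Two data sources point at the same name record but summarize it differently.
a = Concept("c1", name_summary="Aus bus", name_ref="n1")
b = Concept("c2", name_summary="Aus bus L.", name_ref="n1")

print(same_name_by_ref(a, b))      # True
print(same_name_by_summary(a, b))  # False
```

Whichever rule a consumer picks, some other consumer will pick the 
other one, and the two will disagree about the same data.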

Basically my point is that taxonomists can handle the concept of a NULL 
or nominal concept that just contains name data much more easily than 
non-taxonomists can grasp the subtle difference between taxon concepts 
and the names we use for them. It is, after all, our job to think about 
these things but we need to produce a schema that is used by people who 
aren't us.
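
As a sketch of what I mean by a nominal concept under the embedded 
approach: a concept that carries name data but no definition. The 
representation below (an optional according_to field that is left 
empty) is my own assumption for illustration, not how TCS spells it, 
and the names are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TaxonConcept:
    concept_id: str
    full_name: str
    according_to: Optional[str] = None   # None marks a nominal concept here


def is_nominal(concept: TaxonConcept) -> bool:
    """A nominal concept has a name but no stated definition."""
    return concept.according_to is None


def split_checklist(
    concepts: List[TaxonConcept],
) -> Tuple[List[TaxonConcept], List[TaxonConcept]]:
    """Separate name-only entries from concepts with a definition."""
    nominal = [c for c in concepts if is_nominal(c)]
    defined = [c for c in concepts if not is_nominal(c)]
    return nominal, defined


checklist = [
    TaxonConcept("c1", "Aus bus L."),                # name only
    TaxonConcept("c2", "Aus bus L.", "Smith 1990"),  # name plus definition
]
nominal, defined = split_checklist(checklist)
```

A consumer can then see at a glance that a survey checklist contains 
only names, rather than mistaking it for a list of defined taxa.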

So currently I am in the embedded camp. I could defect at any moment, but 
I am still looking for a good reason to. The arguments are closely 
related to another thread that I am just about to start: "Are we passing 
the product of taxonomic research or raw taxonomic data?"

Can anyone give a scenario in which we would regret going with the 
embedded approach? I am sure someone can!

Roger
