[tcs-lc] Names as Objects
Roger Hyam
roger at hyam.net
Sat Mar 5 12:57:33 PST 2005
OK guys and gals, I am continuing the discussion from Rich and
Jessie's exchange re: Names as Objects. I won't include all their
interwoven messages here, as it is getting almost impossible to follow.
Here is a link to the start of the threads in the archives:
http://www.ecoinformatics.org/pipermail/tcs-lc/2005-March/000016.html
Rich basically proposes that ScientificName (or whatever the element is
called) should be a top-level structure alongside taxon concepts,
vouchers and publications. I'll call this the 'modular' approach.
Jessie defends the approach currently in TCS, where names are embedded
in the TaxonConcept. I'll call this the 'embedded' approach.
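To make the difference concrete, here is a rough sketch of the two layouts as Python data classes. These are hypothetical, heavily simplified models, not the real TCS element definitions; the field names (`according_to`, `name_ref`) and the "Source A"/"Source B" data are made up purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical, simplified models for illustration only; these are NOT the
# real TCS element definitions, just a picture of where the name data lives.

@dataclass
class ScientificName:
    simple: str       # e.g. "Bellis perennis"
    authorship: str   # e.g. "L."

# Embedded approach (current TCS): the name sits inside each TaxonConcept.
@dataclass
class EmbeddedTaxonConcept:
    concept_id: str
    according_to: str          # hypothetical field: whose concept this is
    name: ScientificName       # a full copy of the name in every concept

# Modular approach (Rich's proposal): names are top-level objects and
# concepts refer to them by an identifier.
@dataclass
class ModularTaxonConcept:
    concept_id: str
    according_to: str
    name_ref: str              # hypothetical key into ModularDocument.names

@dataclass
class ModularDocument:
    names: Dict[str, ScientificName] = field(default_factory=dict)
    concepts: List[ModularTaxonConcept] = field(default_factory=list)

    def name_of(self, concept: ModularTaxonConcept) -> ScientificName:
        # Follow the pointer; raises KeyError if the name record is missing.
        return self.names[concept.name_ref]

# Two concepts from two hypothetical sources that share one name.
embedded = [
    EmbeddedTaxonConcept("c1", "Source A", ScientificName("Bellis perennis", "L.")),
    EmbeddedTaxonConcept("c2", "Source B", ScientificName("Bellis perennis", "L.")),
]
modular = ModularDocument(
    names={"n1": ScientificName("Bellis perennis", "L.")},
    concepts=[ModularTaxonConcept("c1", "Source A", "n1"),
              ModularTaxonConcept("c2", "Source B", "n1")],
)
```

In the embedded form the name is copied once per concept; in the modular form it is stored once and shared. Note that nothing in the modular form stops a document from carrying a populated `names` table with an empty `concepts` list, i.e. a list of names with no taxa at all.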
Most of the arguments have been put forward and all parties seem to
agree that either method would 'work'. There is no right or wrong here;
we are just trying to pick the better of two options.
One way to look at this kind of situation is to do a 'regret analysis'.
If we were all chatting in 10 years' time, what would we regret about
choosing the modular over the embedded approach or vice versa? We are
trying to guess which of the options will cause us fewer headaches in
the future.
Currently my money is on the embedded approach causing fewer problems. I
imagine someone who hasn't been involved in the discussions here (and
probably isn't even a taxonomist) implementing a system to publish
checklists from surveys. They look at a modular schema, see a top-level
ScientificName element and think "ScientificNames, that is what I've
got! I'll just map the names in the database to those elements in the
schema." Months, maybe years, later someone else realizes that the data
being published by this organization is useless because it is a list of
names, not taxa, and has to work out how to get it corrected and how to
revisit any decisions that have been taken on the basis of the data. I
imagine this happening quite a lot.
The downside of the embedded approach is that it is slightly less
convenient for taxonomists. If you are publishing 15 different concepts
that use the same name there will be a lot of redundancy, but this
redundancy exists only in an instance document, which may exist only
briefly, not in a database. In 10 years' time I would hope this will go
away, as ALL published names will be cataloged and have GUIDs; at least,
one would hope so.
Rich proposes that there could be a brief summary of a name in the
TaxonConcept element as well as a pointer to the full scientific name.
This leads us into the several-ways-to-achieve-the-same-thing situation,
which is hell for a programmer. Which do we display to the user? Which
do we use to judge whether two concepts from different data sources are
talking about the same Name (my hobby horse)?
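A rough sketch of why the hybrid worries me, again in Python. This is a hypothetical model of the "summary plus pointer" idea, not anything defined in TCS; `name_summary`, `name_ref` and the helper functions are invented for illustration. Once a concept carries two copies of its name, a client has two candidate answers to show and two possible tests for "same name", and they need not agree.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

# Hypothetical sketch of the hybrid: a TaxonConcept carrying a brief name
# summary AND a pointer to a full ScientificName. Field names are
# illustrative assumptions, not TCS elements.

@dataclass
class ScientificName:
    simple: str
    authorship: str

@dataclass
class HybridTaxonConcept:
    concept_id: str
    name_summary: str            # brief copy of the name kept in the concept
    name_ref: Optional[str]      # pointer to a full name record, if any

def display_candidates(c: HybridTaxonConcept,
                       names: Dict[str, ScientificName]) -> Set[str]:
    """Every distinct name string a client could show for this concept."""
    candidates = {c.name_summary}
    full = names.get(c.name_ref) if c.name_ref is not None else None
    if full is not None:
        candidates.add(full.simple)
    return candidates

def same_name_by_summary(x: HybridTaxonConcept, y: HybridTaxonConcept) -> bool:
    # Compare the embedded summaries only.
    return x.name_summary == y.name_summary

def same_name_by_ref(x: HybridTaxonConcept, y: HybridTaxonConcept) -> bool:
    # Compare the pointers only; a missing pointer never matches.
    return x.name_ref is not None and x.name_ref == y.name_ref

names = {"n1": ScientificName("Bellis perennis", "L.")}
a = HybridTaxonConcept("a1", "Bellis perennis", "n1")
b = HybridTaxonConcept("b1", "Bellis perenis", "n1")   # summary mistyped
```

Here `display_candidates(b, names)` returns two different strings, and the two "same name?" tests disagree about `a` and `b`: by pointer they match, by summary they don't. The schema alone can't tell the programmer which one to trust.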
Basically my point is that taxonomists can handle the concept of a NULL
or nominal concept that just contains name data much more easily than
non-taxonomists can grasp the subtle difference between taxon concepts
and the names we use for them. It is, after all, our job to think about
these things, but we need to produce a schema that is used by people who
aren't us.
So currently I am in the embedded camp. I could defect at any moment,
but I am looking for a good reason to. The arguments are closely related
to another thread that I am just about to start: "Are we passing the
product of taxonomic research or raw taxonomic data?"
Can anyone give a scenario in which we would regret going with the
embedded approach? I am sure someone can!
Roger