[tcs-lc] Modularisation of standards - identification of names

Tue Mar 8 03:24:48 PST 2005

> One use case that springs to mind is the separation of homonyms,
> particularly where it comes to homonym genera.

That *should* be discernible as long as authorships are included, but I
wonder how often authorships of genera will be provided?  Even species-level
authorships are not consistently provided.  I think that relying on a simple
string comparison is too weak.
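
A rough sketch of what I mean, in Python. The records, field names and
function names here are all invented for illustration; they are not taken
from any schema:

```python
from collections import defaultdict

# Hypothetical records: three uses of the genus name "Aus", one of which
# arrives without any authorship.
records = [
    {"id": "r1", "name": "Aus", "authorship": "Black, 1965"},
    {"id": "r2", "name": "Aus", "authorship": "White, 1980"},
    {"id": "r3", "name": "Aus", "authorship": None},
]

def group_by_name_only(recs):
    """Group on the bare name string; homonyms collapse into one group."""
    groups = defaultdict(list)
    for rec in recs:
        groups[rec["name"]].append(rec["id"])
    return dict(groups)

def group_by_name_and_author(recs):
    """Group on (name, authorship); a missing authorship gets its own key."""
    groups = defaultdict(list)
    for rec in recs:
        groups[(rec["name"], rec.get("authorship"))].append(rec["id"])
    return dict(groups)
```

Grouping on the name alone lumps all three records together.  Adding the
authorship separates the two homonyms, but the record with no authorship
ends up under ("Aus", None) and cannot be assigned to either one, which is
exactly the case I am worried about.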

> In the canonical names part of the Linnean Core we included (I'm
> not sure if it's disappeared in the latest version, but I don't think so)
> scope for a reference attribute in the separate atoms of the names.

I believe these are important, and would be included among the "[...and all
the other LC bits]" in the email I just sent.

> There's a third way (sorry to introduce a note of domestic UK
> politics, but Rich and Nico started it) which is to take the LC
> approach and embed both identifiers and data:
>
>  <TaxonConcepts>
>    <TaxonConcept id="tc1">
>      <Name id="123-1">
>        <Label>Aus bus</Label>
>        <CanonicalAuthorship>Black, 1965</CanonicalAuthorship>
>      </Name>
>      <AccordingTo>Smith</AccordingTo>
>    </TaxonConcept>
>    <TaxonConcept id="tc2">
>      <Name id="123-1">
>        <Label>Aus bus</Label>
>        <CanonicalAuthorship>Black, 1965</CanonicalAuthorship>
>      </Name>
>      <AccordingTo>Jones</AccordingTo>
>    </TaxonConcept>
>  </TaxonConcepts>

Where would the "123-1" values point to?  Somewhere else internal to the
DataSet package, or an external GUID reference?

> Of course, to a computer the inclusion of the second set of
> information within the name is redundant but we shouldn't
> underestimate the amount of human eyeballing of XML data that goes
> on.

More significantly, there needs to be a place for human-readable versions of
the information when the id link is not provided (part of the function of my
NameVerbatim element).

> Also if the name id has an existence outside the transient life
> of the xml document instance (for instance, if it were an id from a
> nomenclator) then the processing power involved in producing that
> sort of document on demand as part of a web service would be
> reduced.

Agreed -- that's why I favor inclusion of the human-readable data within the
package, but normalized somewhat by an internal reference (with the option of
expanding to an external reference identifier, when Name Registration
eventually comes online).
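
To make that concrete, here is a minimal sketch in Python of how a consumer
might resolve such a package.  The layout is an assumption made up for this
example: the Names list, the "ref" attribute, the placement of NameVerbatim
inside TaxonConcept, and the "Brown" concept are all hypothetical and do not
come from any agreed version of TCS or the Linnean Core:

```python
import xml.etree.ElementTree as ET

# Hypothetical package: names are stored once and referenced internally;
# tc3 has no id link and carries only a verbatim, human-readable name.
DOC = """
<DataSet>
  <Names>
    <Name id="n1">
      <Label>Aus bus</Label>
      <CanonicalAuthorship>Black, 1965</CanonicalAuthorship>
    </Name>
  </Names>
  <TaxonConcepts>
    <TaxonConcept id="tc1">
      <Name ref="n1"/>
      <AccordingTo>Smith</AccordingTo>
    </TaxonConcept>
    <TaxonConcept id="tc2">
      <Name ref="n1"/>
      <AccordingTo>Jones</AccordingTo>
    </TaxonConcept>
    <TaxonConcept id="tc3">
      <NameVerbatim>Aus bus Black</NameVerbatim>
      <AccordingTo>Brown</AccordingTo>
    </TaxonConcept>
  </TaxonConcepts>
</DataSet>
"""

def resolve_names(xml_text):
    """Map each concept id to (readable name, AccordingTo)."""
    root = ET.fromstring(xml_text)
    # Index the internal name list by id.
    names = {n.get("id"): n for n in root.findall("./Names/Name")}
    resolved = {}
    for tc in root.findall("./TaxonConcepts/TaxonConcept"):
        ref_el = tc.find("Name")
        if ref_el is not None and ref_el.get("ref") in names:
            name = names[ref_el.get("ref")]
            label = name.findtext("Label")
            author = name.findtext("CanonicalAuthorship")
            text = f"{label} {author}" if author else label
        else:
            # No usable id link: fall back to the verbatim string.
            text = tc.findtext("NameVerbatim")
        resolved[tc.get("id")] = (text, tc.findtext("AccordingTo"))
    return resolved
```

Here tc1 and tc2 share one stored name, while tc3 still yields something a
human can read even though it has no link.  Swapping the internal "n1" for
an external identifier later would only change how the lookup table is
built.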

> Possibly we've been missing a trick in how we implement these
> things, but trying to create a document on the fly using templates
> where we were keeping track of a list of publications, a list of
> vouchers and now a list of names, and put ad hoc ids (1, 2, 3 ...)
> into the main schema referring to the separate lists at the bottom
> of the document was the one thing that made implementing TCS a
> bit of a challenge for IPNI

Where on the "difficulty scale" would the implementation of the approach I
just sent fall?

> One thing about XML that I've found, if you try and approach it with
> an OO programmer hat on and make it enforce business rules,
> then you very quickly get frustrated, or end up with very
> complicated schemas.

So I am learning!!! (Special thanks to Bob Morris for opening my eyes on
this!)

Aloha,
Rich