[tcs-lc] nameObjects, spellings, vernaculars, etc

Thu May 5 11:50:44 PDT 2005

> From my memory of Christchurch, we agreed that the Simple name form
> (which came under a number of different labels) would be a canonical
> form, with space given elsewhere for the author's own particular
> and correctable spelling of it.

Yes, exactly -- but I thought the "Label" element of LC was created for this
purpose, leaving me to assume that NameSimple (part of TCS, outside of LC)
was intended for the verbatim spelling.  But I couldn't get confirmation.

> I would hope that the TCS doesn't need an extra simple name element
> of its own, or if it does that it shouldn't be different from the LC
> one.

Exactly!  Since LC has "Label", I assumed NameSimple would be verbatim
spelling.

> of letters to represent it. The 'as published' name should also be
> reproducible.
>
> As far as I am concerned, all other orthographic variants that might
> have been published elsewhere won't be going into IPNI. (apart, of
> course, from IPNI's own scanning errors, which we will be correcting
> as we find them).

But if you are recording the 'as published' name in your database for each
usage, and you are linking those usages back to a proper canonical name,
then you are, in effect, building a list of orthographic variants.  The main
thing here, though (which comes back to my earlier comments to Gregor), is
that the orthographic variants exist only in the context of their actual
usage -- they do not exist as a list of variants disconnected from their
usage instances.  At least, that's how I think the core LC/TCS should be set
up.  Maybe defined lists of variants attached directly to a canonical name
(disconnected from any usages) could be a later extension to the schema, but
I don't see it as part of the core.

> If we want to allow for users entering typos
> (whether their own or somebody else's) into the search term, we'll
> use fuzzy searching (like Google's 'did you mean') which should catch
> all possible typos not just those that have been published.

That would be a great feature as well (not of concern to the schema,
though).  But I think we ought to hard-encode verbatim "name as published"
attached to each TaxonConcept instance. To me, that is a fundamental piece
of information, from which a list of published variants can eventually be
built.
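
To make that concrete, here is a minimal sketch in Python of deriving the
variant list from usages instead of storing it on the name.  The class and
field names (Usage, canonical_name, verbatim_name, reference) are purely my
own illustrative assumptions, not actual LC/TCS elements, and the two records
are just the joke example quoted further down.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Hypothetical usage (concept instance); field names are illustrative only."""
    canonical_name: str   # canonical form this usage is linked back to
    verbatim_name: str    # name exactly as published in this usage
    reference: str        # publication in which the usage appears


def variants_by_canonical(usages):
    """Derive orthographic variants from usages; no standalone list is stored."""
    variants = defaultdict(set)
    for u in usages:
        # A usage contributes a variant only when its verbatim spelling
        # differs from the canonical form it is linked to.
        if u.verbatim_name != u.canonical_name:
            variants[u.canonical_name].add(u.verbatim_name)
    return {name: sorted(spellings) for name, spellings in variants.items()}


usages = [
    Usage("Mygenus myspecies", "Mygenus myspecies",
          "Linnaeus, Species Plantarum 1753"),
    Usage("Mygenus myspecies", "Migenus myspecies",
          "R.Pyle, Ladies Home Journal 2005"),
]
print(variants_by_canonical(usages))
# prints {'Mygenus myspecies': ['Migenus myspecies']}
```

The point of the design is that deleting a usage also removes its variant:
the list never exists apart from the usage instances behind it.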

> Of course if anyone else wanted to do the (sometimes considerable)
> research involved  in working out that one orthographic variant was
> actually the same as this other orthographic variant then that is
> useful information and it should probably be stored somewhere (don't
> uBio have an interest in this?) But isn't it the case that all of
> these orthographic variants must have appeared in a publication
> somewhere (otherwise why bring them up?) and so the mapping must
> always be 'Migenus myspecies L. sec. R.Pyle, Ladies Home Journal

:-)

> 2005' = 'Mygenus myspecies L. sec Linnaeus Species Plantarum 1753' -
> in which case we are mapping concepts to concepts are we not? Because
> name objects in themselves don't have a publication other than the
> original protologue that is recorded in the Original Taxon Concept.

I'm not sure I exactly understand your point here, but I certainly agree
that name variants are a property of usage (=concept instance, in this
context), not of the NameObject per se.

> My 2p worth. This discussion is rapidly getting over my head so if
> I've opened  a whole can of worms that someone else had previously
> closed, please just shout me down

No shouts from me!!

Also, the topic of orthographic variants by itself is not all that heady.
The most heady thing we need to pin down is "What constitutes a NameObject?"
The differences between Botany and Zoology on this question are greater than
I originally thought.  As far as I can tell, the botany definition consists
of a set of these parts:

Genus or Monomial Name-unit + [species Name-Unit + [tertiary Name-Unit +
tertiary Name-Rank]]

"Name-Unit" has a 1:1 correlation with a protonym/protologue. For example,
the Basionym "Mygenus myspecies" implies two protologues: one for "Mygenus",
and one for "myspecies"; hence, two Name-Units.  Variants/misspellings do
not count as a separate Name-unit from their Code-correct version, because
both share the same protologue. For example, "Mygenus mispecies" consists of
exactly the same two Name-Units as "Mygenus myspecies".

Bracketed items in the formula above are optional.  As rendered above, it
implies that a "Genus or Monomial Name-unit" is required for all
NameObjects. There may optionally be a species Name-Unit (binomials). There
can only be a tertiary Name-Unit if there is also a species Name-Unit --
hence the tertiary one is secondarily optional in the context of a provided
species Name-Unit.

The part that was new to me (thanks to Paul Kirk's helpful clarification) is
the "+ tertiary Name-Rank" bit.  I originally thought that a set of one to
three Name-Units was enough to define a botanical NameObject.  But treating
the terminal Name-Unit as a subspecies rather than a variety changes the
authorship (and therefore really represents a distinct NameObject), so the
rank of the terminal epithet is required to minimally distinguish a
trinomial "NameObject" in botany.

For comparison, a "NameObject" in zoology is identical to a "Name-Unit", as
defined above.
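
Here is a rough Python sketch of the two definitions as I currently
understand them.  Everything in it is an assumption for illustration: the
class names, the protologue_id field, the rank strings, and the "myinfra"
epithet are invented, and none of it is proposed schema.  A Name-Unit is
identified by its protologue only, so spelling plays no part in equality.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NameUnit:
    """Hypothetical stand-in for one protonym/protologue (spelling excluded)."""
    protologue_id: str


@dataclass(frozen=True)
class BotanicalNameObject:
    """Genus/Monomial + [species + [tertiary + tertiary rank]]."""
    genus_or_monomial: NameUnit
    species: Optional[NameUnit] = None
    tertiary: Optional[NameUnit] = None
    tertiary_rank: Optional[str] = None

    def __post_init__(self):
        # The tertiary Name-Unit is only allowed inside the species bracket.
        if self.tertiary is not None and self.species is None:
            raise ValueError("a tertiary Name-Unit requires a species Name-Unit")
        # The tertiary Name-Unit and its rank are given together or not at all.
        if (self.tertiary is None) != (self.tertiary_rank is None):
            raise ValueError("tertiary Name-Unit and Name-Rank go together")


# In zoology the NameObject is simply the Name-Unit itself.
ZoologicalNameObject = NameUnit

genus = NameUnit("protologue:Mygenus")
species = NameUnit("protologue:myspecies")
infra = NameUnit("protologue:myinfra")  # invented tertiary epithet

as_subspecies = BotanicalNameObject(genus, species, infra, "subspecies")
as_variety = BotanicalNameObject(genus, species, infra, "variety")

# Same three Name-Units, different rank: two distinct botanical NameObjects.
print(as_subspecies == as_variety)  # prints False
```

Because "Mygenus mispecies" and "Mygenus myspecies" resolve to the same two
protologues, they would build equal objects here, which is the behaviour
described above for variants and misspellings.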

Clear as mud.

Aloha,
Rich