[tcs-lc] nameObjects, spellings, vernaculars, etc

Richard Pyle deepreef at bishopmuseum.org
Wed May 4 02:42:52 PDT 2005


Hi Gregor:

> We must be misunderstanding each other, or at least one of us. I
> think that the
> above proposal is fine as a child of both a NameObject and a
> TaxonConceptObject.

How does a TaxonConceptObject have more than one scientific name?  How often
does an AccordingTo author cite multiple scientific names for the same
concept definition?  Very rarely, I would think, and usually by lapsus.

On the other hand, Concept AccordingTo authors may well often provide
one scientific name, and one or more non-scientific names.  In most cases
(but possibly not all), there can be assumed to be congruency of concept
circumscriptions of the scientific and non-scientific names.  At present,
TCS allows the attachment of only one "Name" to each concept definition.  I
asked a specific question about this some time ago, but I don't think anyone
responded.

> It may be desirable to have source and quality
> designations, but basically these would be optional.
>
> <NameObject>
>   <Label xml:lang="la" type="concise">Euonymus europaeus</Label>
>   <Label xml:lang="de" type="concise">Pfaffenhütchen</Label>
>   <VariantSpellings>
>     <VariantSpelling source="doi:10.12812878" location="201"
> revisionstatus="2">Evonymus europaeus</VariantSpelling>
>     <VariantSpelling>Euonymus europaeeus</VariantSpelling>
>   </VariantSpellings>

I can see that -- so existing variant spellings would then be thought of as
properties of a single "NameObject".  I think I would still prefer to leave
them primarily as properties of usage instances (~=TaxonConcept
instances) -- that is, the specific usage/concept instances that applied
each variant spelling.  However, I can also see some value in "harvesting"
and "caching" the existing known set of variant spellings and attaching them
as direct properties of NameObjects, as you have done above.

> > Or...would you treat each variant as a separate NameObject (with its own
> > GUID, and its own set of LC elements for canonicalName,
> CanonicalAuthorship,
> > original orthography, etc., etc.)?
>
> No. I see orthographic variants on a distinctly different level.

Can you offer your own personal definition of a "NameObject" as you see it?

> I am sorry that I am at the moment not able to fully express that
> as an TCS
> example, perhaps someone may help. I assumed that for a concept
> there would be
> a recommended label (recommended by the provider, in the absence
> of a code of
> taxon-concept suffixes there cannot be a canonical form) as part
> of the concept
> object. Is there not such a label?

I'm not sure.  Several times I have asked the question whether the original
"NameSimple" element of TCS was intended as a "verbatim" label (i.e.,
exactly how the AccordingTo author spelled it), or as some sort of
corrected/recommended/concatenated [from canonical bits] name; not
necessarily how the AccordingTo Author spelled it. So far, I don't think
anyone has answered.

As far as I am concerned, there is only one "name" (string of characters)
that should be attached directly to a TaxonConcept instance:  the verbatim
spelling as used by the AccordingTo Author.  The "Code-correct" rendering of
the "intended" name should come via LC -- whether embedded in NameDetailed
of a TaxonConcept (v.0.95.0), or linked to a top-level NameObject (v0.95.2),
or better yet inherited from a linked corresponding Nominal TaxonConcept
instance containing the LC elements. But nobody else seems to like that
perspective.
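
To make that perspective concrete, here is a minimal sketch in Python. It
is only an illustration: the class and field names (NameObject,
TaxonConcept, verbatim_name, according_to) are hypothetical and are not
taken from any TCS version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameObject:
    """Hypothetical holder of the Code-correct rendering (the LC elements)."""
    canonical_name: str
    canonical_authorship: str


@dataclass(frozen=True)
class TaxonConcept:
    """Hypothetical concept instance: one verbatim string plus a linked name."""
    verbatim_name: str   # exactly as spelled by the AccordingTo author
    according_to: str    # the SEC. author/publication
    name: NameObject     # link to the Code-correct name

    def label(self) -> str:
        """Show both spellings, unambiguously distinguished."""
        return (f"{self.verbatim_name} SEC. {self.according_to} "
                f"[= {self.name.canonical_name} "
                f"{self.name.canonical_authorship}]")


euonymus = NameObject("Euonymus europaeus", "Linnaeus")
concept = TaxonConcept("Evonymus europaeus", "Pyle 2000", euonymus)
print(concept.label())
```

The only string stored on the concept is the verbatim one; the Code-correct
spelling is reached through the link, so neither overwrites the other.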

> If as a data consumer I obtain concept data and want to have a
> user pick from a
> list, do I have to create my own rules what is a label for the
> object for human
> consumption? That is not to say that based on atomic data, a user
> interface may
> create different labels, following local rules. But I believe a generic
> provided label would be hugely useful.

Yes, but should that generic label include verbatim spelling as by the
AccordingTo author?  Or the Code-correct spelling of the corresponding
NameObject?  I would prefer both, unambiguously distinguished, as described
above.

> > In the above list, the variations of "Richard Pyle 2000" are
> variations of
> > the AccordingTo author (post-SEC.).  PLEASE don't tell me that
> you think that
> > the schema needs to accommodate every possible way that every
> author who has ever
> > cited a taxonomic name in a concept definition has or might be
> represented!!
>
> Sorry I do.

Are we sure we're talking about the same thing here?  *NOT* the author of
the name (e.g., "Smith" in "Aus bus Smith").  We're talking about the
AccordingTo Author of the concept (e.g., "Jones" in "Aus bus Smith SEC.
Jones").

If you really believe that the schema needs to accommodate variants of the
*SEC* author ("Jones" above), then I guess we stand in firm disagreement.

I even disagree that the TCS schema needs to keep track of variants of the
*name* author ("Smith" above).  The exact orthography of the author name has
no bearing either on the concept, or on the name. These, to me, are
properties of a human (Agent), not of a ConceptObject or of a NameObject.
You may want to model it in your local database, but I really don't think it
belongs in the transfer schema.  I would like to read what others think of
this.

> Accommodate implies no contract with data providers to
> provide even
> a single such variant. But if they are there, they may be hugely
> useful, I
> believe. I may be wrong, if you believe that it is plain
> impossible to map
> strings as used in checklists to denote a concept to unambiguously
> map to a
> specific concept.

I certainly don't think it is impossible, or even worthless -- just that it
is beyond the scope of a transfer schema for concepts and names.  I would
rather see that in a transfer schema of humans (Agents).  I see it as akin
to providing various misspellings and abbreviations of a title of an article
in which a name was described.  I.e., maybe of interest to a transfer schema
for publication data, but too far afield for inclusion in a concept/name
transfer schema.

One possible compromise solution to our polar views on this would be to
include a bare-bones simple CanonicalName and CanonicalAuthor embedded within
the <Name> element of <TaxonConcept>.  But I would favor this only if these
Canonical elements were defined as "verbatim" as they were rendered by the
TaxonConcept's AccordingTo author.

> Imagine the red book lists - how often do people try to copy the
> spelling from
> there. If the list contains concept-specific names (which I
> believe most of us,
> me included, hope will pick up in the future), would it not be
> useful to be
> able to give that spelling for all the sources re-using the concept?

Spelling of the genus/species/subspecies name components -- yes.  Spelling
of the Code-regulated author(s) of the applied name -- maybe.  Spelling of
the Author(s) of the concept definition -- no.

Consider:

Mygenus myspecies Linnaeus

Publication1 (Pyle, 2000) cites:
Mygenus myspecies Linnaeus SEC. Linnaeus
Mygenus myspecies Linnaeus SEC. Cuvier and Valenciennes
Mygenus myspecies Linnaeus SEC. Pyle

Publication2 (Hyam, 2001) cites:
Mygenus myspecies Linnaeus SEC. Linnaeus
Mygenus myspecies Linnaeus SEC. Cuvier & Valenciennes
Mygenus myspecies Linnaeus SEC. Hyam

Publication3 (Hagedorn, 2002) cites:
Migenus myspecies L. SEC. L.
Migenus myspecies L. SEC. C. & V.
Migenus myspecies L. SEC. Pile
Migenus myspecies L. SEC. Hyam
Migenus myspecies L. SEC. Hagedorn


- I do think it is important for TCS to record the fact that Hagedorn (2002)
used a different spelling of the genus name from the other two pubs.

- I do not think it is important for TCS to record the fact that Hagedorn
(2002) abbreviated the authorship of the name "Migenus myspecies" as "L.".

- I definitely do not think it is important for TCS to record the fact that
Hagedorn (2002) abbreviated the SEC authorships as "L." and "C. & V.", nor
that he misspelled the SEC author "Pyle" as "Pile".

- I super-duper definitely do not think it is important for TCS to record
the fact that Hyam (2001) rendered the SEC authorship of Cuvier &
Valenciennes with an "&"; as opposed to Pyle (2000), who rendered it with an
"and".
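
The distinction drawn in the list above can be sketched in a few lines of
Python: keep only the genus/species spelling from each citation string and
discard everything about how the name author and the SEC author were
rendered. This is a rough illustration, not a proposed TCS mechanism; the
helper name_part is hypothetical and assumes a plain binomial followed by
" SEC. ".

```python
def name_part(citation: str) -> str:
    """Return 'Genus species' from 'Genus species Author SEC. Author'.

    Assumes a plain binomial; subgenera and infraspecific ranks are ignored.
    """
    before_sec = citation.split(" SEC. ")[0]
    genus, species = before_sec.split()[:2]
    return f"{genus} {species}"


citations = {
    "Pyle 2000": [
        "Mygenus myspecies Linnaeus SEC. Linnaeus",
        "Mygenus myspecies Linnaeus SEC. Cuvier and Valenciennes",
        "Mygenus myspecies Linnaeus SEC. Pyle",
    ],
    "Hyam 2001": [
        "Mygenus myspecies Linnaeus SEC. Linnaeus",
        "Mygenus myspecies Linnaeus SEC. Cuvier & Valenciennes",
        "Mygenus myspecies Linnaeus SEC. Hyam",
    ],
    "Hagedorn 2002": [
        "Migenus myspecies L. SEC. L.",
        "Migenus myspecies L. SEC. C. & V.",
        "Migenus myspecies L. SEC. Pile",
        "Migenus myspecies L. SEC. Hyam",
        "Migenus myspecies L. SEC. Hagedorn",
    ],
}

# One set of verbatim name spellings per publication; author renderings
# ("L.", "C. & V.", "Pile", "&" versus "and") are deliberately dropped.
verbatim_spellings = {
    pub: sorted({name_part(c) for c in cites})
    for pub, cites in citations.items()
}
for pub, spellings in verbatim_spellings.items():
    print(pub, spellings)
```

The only variance that survives is the one I think matters: Hagedorn (2002)
spelled the genus "Migenus" where the other two publications have "Mygenus".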

Again, I would like to see what others think about recording this sort of
variance within TCS.

> > I even have serious problems with designing the schema to accommodate
> > variations of *name* authorships -- let alone concept authorships.
>
> That is a good point I am silently thinking about myself. It
> makes significant
> sense to keep separate lists for name-without-author variation
> and authorship
> citation variation. The process trying to map strings to name
> objects would
> then have to multiply them out. Or you just have a single list
> and try it the
> other way round.

Maybe the "compromise" view of having a very SIMPLE set of CanonicalName and
CanonicalAuthorship within <TaxonConcept> to capture the *verbatim* spelling
of each TC instance as rendered by the AccordingTo author/publication will
work efficiently?

> > For the concepts, there should be only one instance for
> "Evonymus europaeus sec.
> > Richard Pyle 2000", and the "Evonymus europaeus" part should be
> spelled exactly
> > the way Pyle spelled it in his 2000 publication.
>
> If I were to cite this concept, I would correct the name to the
> ICBN-canonical
> one, and would still add the "sec..." and I would mean to refer
> to exactly your
> concept.

I would like to think that you would preserve Pyle's actual spelling of the
name-string in connection with his SEC. concept of it, provided that it was
clearly linked to a NameObject instance that represented the "Code-correct"
spelling of the name.  If you are talking about a paper-published presentation,
most synonymy listings preserve exact spelling as used by each author.  If
you are talking about an electronic dataset, then presumably you have internal
links to the NameObject from the Sec. Pyle instance, so you can represent
either Pyle's verbatim spelling, or the Code-correct spelling, or both:

EUONYMUS EUROPAEUS Linnaeus
	- Euonymus europaeus L. SEC. L.
	- Evonymus europaeus L. SEC. Pyle
	- etc...

> I think this may again be a different tradition in
> botany and zoology.
> In botany the importance of "original spelling" is relatively low.

In zoology, it depends on what you mean by "original".  The Code-correct
spelling may not be the exact spelling as used in the original description
(=protologue).  If, by "original", you mean "verbatim" as I have used it
(non-protologue subsequent usages), then this is where I think you capture
your list of "variant spellings in use".

> > If datasets exist out there that record the concept to which they are
> > mapping biological data as "Evonymus europaeus sec. R.Pyle 2000", that
> > should *not* be something for TCS to accommodate.  That's a
> problem at the
> > dataset side; not the transfer schema side.
>
> Can you justify that statement? Why do you think it is NOT
> important to be able
> to find, e.g. GenBank molecular sequences for a taxon concept???

I *do* think it's important to establish bridges between GenBank sequences
and the imagined "ConceptBank".  I just do not think it is the job of TCS to
provide a tool to be able to match every possible "GenusName (SubgenusName)
speciesname subspeciesname varietyname (NameAuthor) CombinationAuthor SEC.
ConceptAuthor" text string that might be "out there" (including GenBank and
elsewhere).  I think TCS can accommodate a broad set of "GenusName
(SubgenusName) speciesname subspeciesname varietyname" text strings, via
capture of verbatim usage with each TaxonConcept instance.  But beyond that,
I think it is the job of GenBank (or the job of the people who have provided
sequences to it) to build the bridge (hard link) between their dataset and
ConceptBank.

I can imagine tools that, using just the "GenusName (SubgenusName)
speciesname subspeciesname varietyname", would narrow it down to the
"correct" name pretty damn quickly, with only the occasional confusing
homonym -- by which point a pair of human eyes should complete the link.
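
As a rough illustration of the kind of tool I have in mind, here is a
sketch that uses approximate string matching from Python's standard
difflib module. The list of known names, the 0.8 similarity cutoff, and
the function name narrow_down are all assumptions made for the example.

```python
import difflib


def narrow_down(query: str, known_names: list[str],
                cutoff: float = 0.8) -> list[str]:
    """Return known names whose spelling is close to the query string."""
    return difflib.get_close_matches(query, known_names, n=5, cutoff=cutoff)


# Hypothetical list of Code-correct names held by some name service.
known_names = ["Euonymus europaeus", "Euonymus japonicus", "Mygenus myspecies"]

candidates = narrow_down("Evonymus europaeus", known_names)
if len(candidates) == 1:
    print("matched:", candidates[0])
else:
    # Zero or several candidates (e.g. homonyms): hand over to a human.
    print("needs a pair of human eyes:", candidates)
```

Note that the query carries only the name components; no name author or
SEC author strings are involved in the match.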

> It may be placed in a separate standard, of course, but I think
> you cannot just
> move it to the information (rather than concept definition)
> provider. I think
> the name variant issue is a natural extension of a GBIF
> name-standard and I
> propose to place it there. It is optional in any case.

O.K., I have no problem with the idea of a name-variant extension, that
essentially serves as a harvester of all VerbatimSpellings as used in
TaxonConcept instances (for example).  But we need to focus on creating TCS
v1.0 before the next TDWG meeting.  I do agree that the question of variant
spellings (of names, at least) is relevant to "NameObjects" (and hence to
the distinction between v0.95.0 and v0.95.2), so we need to consider these
things.  But you answered my primary question early on, when you confirmed
that you would not consider each spelling variant to be a distinct
"NameObject".
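
A harvester of that sort could be sketched as follows in Python. The tuple
layout and the function name harvest_variants are hypothetical; the point
is only that the variant list is derived from the verbatim spellings
stored with each usage instance, and cached per name, rather than
maintained by hand.

```python
from collections import defaultdict

# (Code-correct name, verbatim spelling, SEC. author) per usage instance.
usages = [
    ("Euonymus europaeus", "Euonymus europaeus", "L."),
    ("Euonymus europaeus", "Evonymus europaeus", "Pyle"),
]


def harvest_variants(usages):
    """Collect the verbatim spellings that differ from the Code-correct name."""
    variants = defaultdict(set)
    for code_correct, verbatim, _sec_author in usages:
        if verbatim != code_correct:
            variants[code_correct].add(verbatim)
    return {name: sorted(spellings) for name, spellings in variants.items()}


print(harvest_variants(usages))
```

Each variant stays attached to one name; it never becomes a name in its
own right.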

Almost midnight here in the Pacific -- time to get some other work done (how
badly I wish I could say "time for bed" -- but sleep is a luxury I haven't
enjoyed much of lately...)

Aloha,
Rich



