[tcs-lc] nameObjects, spellings, vernaculars, etc

Paul Kirk p.kirk at cabi.org
Wed May 4 04:59:25 PDT 2005


Sorry, that last post wasn't altogether clear: #7 and #8 are different names in botany.

-----Original Message-----
From: tcs-lc-bounces at ecoinformatics.org
[mailto:tcs-lc-bounces at ecoinformatics.org]On Behalf Of Richard Pyle
Sent: 04 May 2005 12:52
To: tcs-lc at ecoinformatics.org
Subject: Re: [tcs-lc] nameObjects, spellings, vernaculars, etc


> A name under a regulation such as ICBN, ICZN, etc., but in the future also
> perhaps other codes of nomenclature (perhaps BAYER/EPPO codes for pathogenic
> species?). I think I agree with an earlier post on this list.

Can you be more specific?

For example, given:

1. Mygenus
2. Migenus (alternate spelling of Mygenus)
3. Yourgenus
4. Mygenus myspecies
5. Mygenus mispecies (alternate spelling of myspecies)
6. Yourgenus myspecies
7. Yourgenus yourspecies subsp. myspecies
8. Yourgenus yourspecies var. myspecies

How many of these constitute distinct "NameObjects", and which would be
combined together under the same NameObject?

In the zoological perspective, there are four distinct NameObjects:
"Mygenus", "Yourgenus", "myspecies", and "yourspecies".

Each of the above-listed combinations represents merely a different treatment
of its terminal name (a different spelling, a different classification, a
different rank assignment, etc.).

In the botanical perspective, I believe there are at least five different
NameObjects:
#1, 3, 4, 6, 7.
I don't know whether botanists would consider #8 to be a distinct name
object from #7.
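
The two counts above can be sketched in a few lines of Python. This is only
an illustration: the tuple layout, the spelling table, and the function names
are assumptions made for the example, and none of them come from TCS.

```python
# Illustrative only: this data layout is an assumption, not part of TCS.
NAME_STRINGS = {
    1: ("Mygenus",),
    2: ("Migenus",),
    3: ("Yourgenus",),
    4: ("Mygenus", "myspecies"),
    5: ("Mygenus", "mispecies"),
    6: ("Yourgenus", "myspecies"),
    7: ("Yourgenus", "yourspecies", "subsp.", "myspecies"),
    8: ("Yourgenus", "yourspecies", "var.", "myspecies"),
}

# Alternate spellings mapped to the spelling treated as correct.
SPELLING = {"Migenus": "Mygenus", "mispecies": "myspecies"}

RANK_MARKERS = {"subsp.", "var."}


def normalize(part):
    return SPELLING.get(part, part)


def zoological_name_objects(name_strings):
    """Every distinct genus name or epithet, whatever combination it sits in."""
    return {
        normalize(part)
        for parts in name_strings.values()
        for part in parts
        if part not in RANK_MARKERS
    }


def botanical_name_objects(name_strings, rank_matters=True):
    """Group name-string numbers by whole combination.

    When rank_matters is False the rank markers are dropped, so #7 and #8
    fall into the same group.
    """
    groups = {}
    for number, parts in name_strings.items():
        key = tuple(
            normalize(p) for p in parts if rank_matters or p not in RANK_MARKERS
        )
        groups.setdefault(key, []).append(number)
    return groups


print(sorted(zoological_name_objects(NAME_STRINGS)))
print(botanical_name_objects(NAME_STRINGS))
print(botanical_name_objects(NAME_STRINGS, rank_matters=False))
```

Under these assumptions the zoological view yields four names, and the
botanical view yields six groups, or five if #7 and #8 are not distinguished
by rank.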

I'm not entirely clear how many NameObject instances would be created in the
current version of TCS v0.95.2 (assuming a dataset included TaxonConcept
instances that used all eight Name-strings to represent defined taxon
concepts).

> > Yes, but should that generic label include verbatim spelling as used by
> > the AccordingTo author?  Or the Code-correct spelling of the corresponding
> > NameObject?  I would prefer both, unambiguously distinguished, as
> > described above.
>
> I think neither, just the concept-name-string under the canonicalization
> rules preferred by the data provider. To my knowledge there is no code for
> how to express a concept-specific name, so this needs to be open.

Would the verbatim name-string as used by the AccordingTo author be recorded
anywhere in TCS?

> I argue this not from a taxonomy-producer/editor standpoint, but from the
> practical consumer standpoint that I would like to work with the data. And
> in all the name/concept-based information I have, names are strings with
> nomenclatural authors, and occasionally (<1%) with a concept suffix.

I believe I understand your viewpoint -- I just do not agree with it in
terms of shaping TCS.  Comments from others???

> This is not stuff for the local database insofar as I would like to share
> this huge work of connecting information based on strings with others,
> rather than everybody doing it him or herself in a "local database". Just my
> desire of where GBIF could come in and really increase our efficiency.

Would you agree that building a thesaurus of name-variants (disconnected from
usage instances) and of author-spelling variants is outside the "Core" TCS,
and ought to be relegated to an extension of TCS?

> > > Imagine the red book lists - how often do people try to copy the
> > > spelling from there. If the list contains concept-specific names (which
> > > I believe most of us, me included, hope will pick up in the future),
> > > would it not be useful to be able to give that spelling for all the
> > > sources re-using the concept?
> >
> > Spelling of the genus/species/subspecies name components -- yes.
> > Spelling of the Code-regulated author(s) of the applied name -- maybe.
> > Spelling of the Author(s) of the concept definition -- no.
>
> Without spelling of the authors, a tcs data file would be close to
> worthless to me.

Certainly it should *spell* the authors!  All I am saying is that it
shouldn't be designed to accommodate *multiple* variant author spellings
within the same TaxonConcept instance, or the same NameObject instance. In
other words, it should not have the built-in structure to accommodate a
thesaurus of alternate spellings of the same person name (except, perhaps, in
the sense of having a single "VerbatimAuthor" for each TC instance, so that a
collection of many TC instances may yield a set of alternate spellings for
the same Author).

It should have:

<Authors>
  <Author>Jones</Author>
  <Author>Smith</Author>
</Authors>

I don't think it should have:
<Authors>
  <Author>
    <AuthorSpelling type="correct">Jones</AuthorSpelling>
    <AuthorSpelling type="variant">Jonas</AuthorSpelling>
    <AuthorSpelling type="abbreviation">Jon.</AuthorSpelling>
  </Author>
  <Author>
    <AuthorSpelling type="correct">Smith</AuthorSpelling>
  </Author>
</Authors>

> > - I do think it is important for TCS to record the fact that Hagedorn
> > (2002) used a different spelling of the genus name from the other two pubs.
>
> Why?

So we can build a list of alternate spellings, each anchored to the authors
that spelled it that way.
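
As a rough sketch of what such a list could look like, the Python below
collects verbatim spellings per NameObject and records who used each one. The
record layout, the id "name:1", and the placeholders "Publication A" and
"Publication B" are invented for the example; only the "Migenus" spelling by
Hagedorn (2002) comes from this thread.

```python
from collections import defaultdict

# Hypothetical usage records: (NameObject id, verbatim spelling, AccordingTo).
# The id and the two "Publication" sources are placeholders, not real data.
USAGES = [
    ("name:1", "Mygenus", "Publication A"),
    ("name:1", "Mygenus", "Publication B"),
    ("name:1", "Migenus", "Hagedorn (2002)"),
]


def spelling_index(usages):
    """Map each NameObject id to {verbatim spelling: sorted list of sources}."""
    index = defaultdict(lambda: defaultdict(set))
    for name_id, spelling, according_to in usages:
        index[name_id][spelling].add(according_to)
    # Convert to plain dicts and sorted lists so the result is easy to read.
    return {
        name_id: {s: sorted(sources) for s, sources in spellings.items()}
        for name_id, spellings in index.items()
    }


index = spelling_index(USAGES)
for spelling, sources in index["name:1"].items():
    print(spelling, "<-", ", ".join(sources))
```

Each TaxonConcept instance contributes one verbatim spelling here, so the
variants only appear once many instances are collected together.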

> Assuming that the purpose of names is about linking two pieces of
> information

I see this as the job of the NameObject, which in v0.95.2 is a ref link
attribute of <Name>.

> - why do you care for the spelling of the linking goal but not for the
> spelling of the link originators, i.e. those publications that desire to
> link to "xxx sec Hagedorn (2002)"?

I don't understand this question.

> > - I do not think it is important for TCS to record the fact that Hagedorn
> > (2002) abbreviated "L." for the authorship of the name,
> > "Migenus myspecies".
> >
> > - I definitely do not think it is important for TCS to record the fact
> > that Hagedorn (2002) abbreviated the SEC authorships of "L." and
> > "C. & V.", nor that he misspelled "Pile" as a SEC author.
>
> I agree that this is not important as "facts", and in the case of the "&"
> it would be simple to solve this algorithmically. However, I think the name
> variants are a good compromise between the need to be able to associate
> name-strings

I think that our core difference of opinion may come from (what I think is)
your view that the strings serve as the computer-based links between
information instances; whereas I am assuming such links will be established
via LUIDs or GUIDs.
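
A small Python sketch of that difference, assuming two datasets that refer to
the same name with different spellings. The field names and the identifier
format are made up for the example and do not reflect any actual LUID or GUID
scheme.

```python
# Hypothetical records. "name_id" stands in for a LUID or GUID; its format is
# invented, as are the field names.
DATASET_A = [{"name_id": "guid-0001", "name_string": "Mygenus myspecies"}]
DATASET_B = [{"name_id": "guid-0001", "name_string": "Mygenus mispecies"}]


def join(left, right, field):
    """Pair up records from the two datasets that agree on the given field."""
    by_key = {}
    for record in right:
        by_key.setdefault(record[field], []).append(record)
    return [(l, r) for l in left for r in by_key.get(l[field], [])]


# Linking on the string misses the pair because the spellings differ.
print(len(join(DATASET_A, DATASET_B, "name_string")))

# Linking on the identifier finds it regardless of spelling.
print(len(join(DATASET_A, DATASET_B, "name_id")))
```

With string links, every spelling variant has to be reconciled before the
join works; with identifier links, that reconciliation happens once, when the
identifier is assigned.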

> > spelling of the name.  If you are talking about a paper-published
> > presentation, most synonymy listings preserve exact spelling as used by
> > each author.
>
> I think this assumption is fundamental to your model, and I know
> of no such tradition.

Perhaps it is a Zoology/Botany difference?  This is a very common practice
in Zoology.

> The synonymy lists and checklists I know are NOT the source spellings but
> are corrected to the best current knowledge of the author. Can you provide
> some details in which taxonomic areas people follow the rule you outline?

Certainly in fishes, but I believe widely in Zoology.

> > I can imagine tools that, using just the "GenusName (SubgenusName)
> > speciesname subspeciesname varietyname", would narrow it down to the
> > "correct" name pretty damn quickly, with only the occasional confusing
> > homonym -- by which point a pair of human eyes should complete the link.
>
> My own experience is that this is significant work worth sharing. Maybe
> that is special to pathogenic fungi which are known to have a much higher
> than average share of homonyms (because of the tradition to name new
> species Genus genitive-of-host-plant). Of the 200 000 GLOPP names to
> connect to Index Fungorum, 30% worked with authors. I then connected all
> those blind that did not have a homonym in Index Fungorum and did have the
> same spelling of the name without authors. Further, I produced an
> algorithmically generated list of name variants, and got to 70%. The rest
> was still manual work, and it would be good to check that piece and share
> it. Much of it is currently unchecked, and quite a number of false
> connections are found continuously.
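
The staged matching described in the quoted passage could be sketched roughly
as follows in Python. This is not the procedure actually used for GLOPP: the
index entries, the stage labels, and in particular the i/y swap rule for
generating variants are assumptions chosen to keep the example small.

```python
from collections import defaultdict

# Hypothetical index entries: (name without authors, authors).
INDEX = [
    ("Mygenus myspecies", "L."),
    ("Yourgenus myspecies", "Jones"),
    ("Yourgenus myspecies", "Smith"),  # homonym of the entry above
]


def build_lookup(index):
    by_name = defaultdict(list)
    for name, authors in index:
        by_name[name].append((name, authors))
    return by_name


def variants(name):
    """Assumed variant rule: swap a single 'i' for 'y' or 'y' for 'i'."""
    swap = {"i": "y", "y": "i"}
    return {
        name[:pos] + swap[ch] + name[pos + 1:]
        for pos, ch in enumerate(name)
        if ch in swap
    }


def match(name, authors, by_name):
    """Return (stage, entry); entry is None when manual work is needed."""
    candidates = by_name.get(name, [])
    # Stage 1: name and authors both agree.
    for entry in candidates:
        if entry[1] == authors:
            return "with authors", entry
    # Stage 2: same spelling without authors, and no homonym in the index.
    if len(candidates) == 1:
        return "name only", candidates[0]
    if candidates:
        return "manual", None  # homonyms that the authors cannot separate
    # Stage 3: generated spelling variants, accepted only if unambiguous.
    hits = [e for v in variants(name) for e in by_name.get(v, [])]
    if len(hits) == 1:
        return "variant", hits[0]
    return "manual", None


lookup = build_lookup(INDEX)
print(match("Mygenus myspecies", "L.", lookup))
print(match("Migenus myspecies", "L.", lookup))
print(match("Yourgenus myspecies", "", lookup))
```

Anything that falls through to "manual" is the share of the work that, per
the passage above, would be worth checking once and sharing.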

I guess I'm looking forward to the day when people generate datasets using
software tools (which I believe SEEK is developing) that make it easier to
link directly from their dataset to the ConceptBank instances. With good
software and a robust ConceptBank, this task can be made relatively painless
and easy.

> I have no issue of delegating this to another standard, but I think the
> effort to support this in TCS is very little. To me it's worth including in
> 1.0. All depends on what others do. However, I believe the GBIF situation
> is such that GBIF exactly wants to do the connections, not wait until first
> the nomenclators and concept bases are finished, and finally GenBank uses
> them as a standard.

I think you and I understand each other well (though we may not be in
complete agreement).  It would be great if others shared their perspectives.

Aloha,
Rich

