[tcs-lc] nameObjects, spellings, vernaculars, etc

Richard Pyle deepreef at bishopmuseum.org
Fri Apr 29 20:10:24 PDT 2005


Hi Bob,

Great post!

> We need to define what we mean by name in the sense of various uses and
> places in the TCS.

EXACTLY!!!!  This is the point I am trying to get at. We still seem to have
some confusion at very fundamental levels here.

> I see four primary forms being important. (1) The raw
> character string, what ever it is, including vernaculars,

I've used the terms "VerbatimNameString" to refer to this, but for LC, James
proposed "Name-literal" (I think this is what he meant by that term).

http://wiki.cs.umb.edu/twiki/bin/view/UBIF/LinneanCoreDefinitions


> (2) Character
> string of a supposed code-compliant name without authorities,

For LC I originally proposed "Name-String" to mean this.

(see above link).

> (3)
> Character string of a supposed code-compliant name with authorities,

I proposed the cumbersome "Name-string with authorship" for this.

> (4) a
> code-compliant name including all spellings and variants.

If I understand you correctly, I *think* this is what I mean by "NameObject"
(i.e., an abstract object, with complex properties like Canonical
authorship, original orthography, Code-Correct orthography, etc.) In XMLese,
I see this as a ComplexType, whereas the others would be SimpleTypes (I
think?).
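
To make the SimpleType/ComplexType distinction concrete, here is a minimal
sketch in Python. All of the class and field names are hypothetical (they are
not taken from TCS or LC), and the GUID and authorship values are
placeholders.

```python
from dataclasses import dataclass, field
from typing import List, NewType

# Forms 1-3: plain character strings with no properties of their own.
NameLiteral = NewType("NameLiteral", str)  # (1) raw string, as written
NameString = NewType("NameString", str)    # (2) name without authorities
NameStringWithAuthorship = NewType("NameStringWithAuthorship", str)  # (3)


@dataclass
class NameObject:
    """Form 4: an abstract, Code-governed name with complex properties."""
    guid: str                              # placeholder identifier
    code_correct_orthography: NameString
    original_orthography: NameString
    canonical_authorship: str
    variant_spellings: List[NameLiteral] = field(default_factory=list)


# Many strings can point at one NameObject (many-to-one).
aus_bus = NameObject(
    guid="example:name:1",                 # hypothetical GUID
    code_correct_orthography=NameString("Aus bus"),
    original_orthography=NameString("Aus bus"),
    canonical_authorship="(authorship placeholder)",
    variant_spellings=[NameLiteral("Aus buus")],
)
```

The three string forms stay ordinary strings at runtime; only the NameObject
carries structure.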

> Each of these
> has different functions and needs to in some way be supported by a
> combination of TCS and related TDWG standards.

Agreed!!!

> From the perspective of
> documenting concepts, it would be most convenient to have the first of
> these be our nameObject upfront.

I Disagree!!!  :-)

> However, Jessie has appropriately placed
> what appears to be either 3 or 4 as the nameObject to enhance
> compatibility with the Linnaean Core effort.

Unfortunately, I think the only way we can reconcile with LC is to treat 4 as
the "NameObject".  In my mind, the major question within LC is whether such
objects are defined sensu Botany (unique combination of from one to three
"name units", = basionyms+newcombinations), or sensu zoology (only the
terminal name units; binomials and trinomials being a function of usage).
I suspect the botany perspective will probably win-out, for various
reasons -- not all of them having to do with informationally optimized data
structuring.

But I think the basic issue I have is that your items 1-3 above seem to me
to be all variants of text strings, without really any complex properties of
their own (and hence difficult to justify as distinct "objects").

> The listserve conversation
> seems to be drifting toward #4 alone.  But let me reiterate, the TCS has
> to accommodate the other types of "names" in some way.

I think we all agree to that -- at this point we're really just trying to
figure out where to draw the most optimal line between "properties" of
objects, and unique "objects".  In my mind, a text string representing a
name and/or authorship of a name is a property of a Name-usage/Concept
"object".  A "Name object", on the other hand, is something that has complex
properties such as original description, Code-regulated objective
relationships with other name objects, more structured representations of
authorships via publication objects and agent objects (contrasted with
simple name strings).

> In particular, if
> we accept the fourth type for our meaning of nameObject, then we need to
> have this use of name as an optional link to a concept, and the raw string
> in the reference (#1 or #2) should be required within the concept
> definition.

That's EXACTLY how I think it should be resolved!  First we have Name Usage
instances.  Some of these represent defined "concept objects", and some are
simply "identifications" of data/observation/specimen instances, that are
mapped to pre-existing defined concept objects (for the time being, let's
postpone the debate about how, exactly, that distinction is made).

Among the defined "concept objects", presumably all of them have a property
of a "Name".  Some of these "Name" properties will be in the form of
less-structured vernacular names, and others of these "Name" properties will
be in the form of highly-structured scientific names.  A certain subset will
lie somewhere at the border between these two domains.

At this point, I am referring to "Names" as properties -- that is, text
strings (using whatever mechanism of text-string representation is
necessary, e.g., unicode characters, or even symbols that are not covered in
unicode).  I believe it is non-optional that a defined "concept object" must
have at least some sort of value for this "Name" property (even if it is
just a number or other non-traditional "name").  In other words, I believe
that all defined "concept objects" must at minimum have some sort of "Name"
that can be rendered as a series of symbols (most often in the form of
text).

But NOT all concept objects would have to be linked to a Name object.  It
makes perfect sense to me to define a "Name object" (as opposed to the "Name
property" of a defined "concept object") as an abstract object that is
defined in terms of Code-governed rules, and has certain properties that are
completely different from the sorts of properties that apply to defined
"concept objects". Its uniqueness is established in terms of its original
Code-regulated description, its type specimen (or type species), etc., etc.

The part I don't understand is, what properties does a name-string (as used
in a TaxonConcept) have, other than the sequence of characters & symbols
that form that Name-String, such that it requires a full-blown Name-Object
instance (presumably with its own unique GUID)?  Sure, you could parse Name
"units" (GenusName, SpeciesEpithet, SubspecificEpithet, OriginalAuthor,
CombinationAuthor) as separate elements, but I don't understand how doing so
contributes to the informatic value of a TC instance.  It seems clear to me
that you either know the Code-governed name-object (sensu me) that was
intended by the AccordingTo author (in which case you provide both a
Name-string and a ref link), or you don't (in which case you only provide
the name-string). My gut feeling is that perhaps it is wise to separate the
"name" part of a name-string from the "authorship" part of a name-string
(for a variety of reasons).  But to go beyond that probably means you know
what the intended Name object (sensu me) was.
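
As a rough illustration of that either/or (a sketch only; the class and
field names are hypothetical and are not the TCS v0.95.2 element names, and
the author value and GUID are placeholders): the name-string is required,
while the authorship and the link to a Name object are optional.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaxonConcept:
    name_string: str                        # required: string used in the source
    according_to: str                       # the reference that used the name
    name_author: Optional[str] = None       # optional: authorship, kept separate
    name_object_ref: Optional[str] = None   # optional: GUID of the Name object

    def __post_init__(self):
        # Every defined concept must carry at least some renderable "Name".
        if not self.name_string.strip():
            raise ValueError("a concept must have a non-empty name string")


# Intended Name object is known: provide both the string and the ref link.
linked = TaxonConcept(
    name_string="Aus bus",
    according_to="Jones, 1995",
    name_author="(author placeholder)",
    name_object_ref="example:name:1",       # hypothetical GUID
)

# Intended Name object is not known: provide only the string.
unlinked = TaxonConcept(name_string="Yellow-legged Aus",
                        according_to="Jones, 1995")
```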

> If a concept is a "name" as used in a reference as we have previously
> asserted, we want to be as unambiguous as possible about that name used in
> the reference.  To support all forms of taxonomic concept documentation,
> the name needs to be the character string used in the source, be it Aus
> bus, or Aus buus, or some random misspelling of the original.

Agreed!  That is what I see the function of the "Name-string" or
"Name-literal" within the TaxonConcept structure serving ("Name" in
v0.95.2).

> This use of
> name is different from a code-compliant nomenclatural unit, which might
> include alternative spellings and the like. There is a many-to-one
> relationship between name and nameObject.  Ultimately the nameObject might
> be stable while the name applied to it might be a moving target. To me,
> documenting spellings of the name associated with a nameObject is outside
> the scope of TCS and should not be handled by linking concepts as Roger
> had suggested yesterday..

Agreed!

> In short, I agree with Rich that we should
> someplace allow name-name relationships, even if this is not part of TCS.

DEFINITELY agreed!!! :-)

> The authority is not technically part of the name but only documentation
> of the name. I grant that if our goal is to document names sensu
> ICBN/ICZN, it can be helpful to include the authority, but there is no
> standardization as to how to represent these and our list of names will
> explode with the thousands of alternative abbreviations unless we stick to
> the name string per se and recognize that the authority string is a
> separate thing, albeit required for certain nomenclatural acts and often
> useful to separate the more extreme of the different concepts to which the
> name string has been applied.

Right -- this is exactly why my gut tells me that we should break out
<NameString> from <NameAuthor> within TaxonConcept/Name (v0.95.2).

> Having recognized that the character string is what is important for
> supporting concepts, we can take a new look at vernaculars. Should not the
> vernacular "Yellow-legged Aus" be a valid name for the purposes or
> reporting a concept?

Yes, I think so!  But one thing that the TCS folks could help me to
understand: The TaxonConcept/Name element in v0.95.2 has a boolean attribute
"scientific".  What criteria would one use to decide whether to set this
value to "True" or "False"?

Some examples (following Roger's style):

Name                            RLP
===================================
Mygenus                         Y
Mygenus myspecies               Y
Mygenus cf. myspecies           Y
Mygenus n.sp. cf. myspecies     Y
Mygenus sp.1                    Y
Mygenus sp.1 of Jones, 1995     Y
Mygenus n.sp. Hawaii            Y
Mygenus [non-italics]           N
Yellow-legged Mygenus           N
Yellow-legged sap-sucker        N
TSN 123456                      N

RLP=How I would set the "scientific" attribute of the TCS v0.95.2 "Name"
element
Y=Yes/True
N=No/False

How would others do it?

> Not all datasets or publications we might wish to
> mark up use scientific names and in these cases all we have to work with
> is the vernacular.  While the vernacular usually represents an
> identification to an unspecified (or unknown) concept that is in turn
> based on a code-compliant name, we often do not know what concept or
> code-compliant name was intended and the reported vernacular is the best
> we can do.

I certainly agree with the above, and I further agree that TCS should not be
restricted to scientific names only (it's just that the VernacularNameCore
designers don't seem to be as willing to stay up in the wee hours of the
morning writing overly passionate email diatribes...)

But here's a question that's been mulling about in my mind:  Yes, many times
a datasource will use a vernacular name without an associated scientific
name, and there should be TC instances in place to link to vernacular-only
names.  BUT!!!  How do you deal with a situation (which we have somewhat
regularly in fishes) where a taxonomic worker applies BOTH a scientific name
to a well-defined concept, AND one or more vernacular names to the same
concept?  The only logical way to deal with this situation, it seems to me,
is to allow multiple "Names" for a single TaxonConcept instance.  TCS has
not been, nor is it currently, designed to accommodate more than one
namestring (or NameObject) for each TaxonConcept instance.  Something deep
in my reptilian databaser's brain tells me that this is the way it *should*
be (and should stay).  But the only way to work around this in the scenario
I describe is to create multiple congruent TC instances by the same
AccordingTo; one for each of the different names applied.  I guess that's
not a terrible thing -- but I mainly wanted to see if others assume the same
thing.
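
A minimal sketch of that workaround, assuming one name per TaxonConcept
instance. The class, the function, and the "is congruent to" label are all
hypothetical here, not taken from the TCS schema.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class TaxonConcept:
    name_string: str
    according_to: str
    scientific: bool


def congruent_concepts(names, according_to):
    """Build one concept per (name_string, scientific) pair, all by the same
    AccordingTo, plus pairwise congruence assertions between them."""
    concepts = [TaxonConcept(n, according_to, s) for n, s in names]
    relationships = [(a, "is congruent to", b)
                     for a, b in combinations(concepts, 2)]
    return concepts, relationships


concepts, relationships = congruent_concepts(
    [("Mygenus myspecies", True), ("Yellow-legged Mygenus", False)],
    "Jones, 1995",
)
```

The number of congruence assertions grows quadratically with the number of
names applied, which is one cost of keeping a single name per instance.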

> An individual investigator might synonymize such a vernacular
> concept to a concept based on a code-compliant name, but that is a
> separate act and typically unrelated to the efforts of the original
> investigator who's work we are trying to document.  I see acceptance of
> the concept "Yellow-legged Aus" as no better or worse than acceptance of
> "Aus aff. bus".  We have previously agreed that best practice is to avoid
> these, and we might not want to clutter up taxonomic databases with weak,
> vernacular concepts, but for purposes of providing the capacity to
> document organisms referenced in a broad range of dataset types, concepts
> based on vernacular names seem essential.

This sounds good to me.

> Of course, if you buy into this
> argument,

...I do....

> the next logical step is to move all the vernaculars into the
> name entity of the TCS concept object The alternative is to accept
> concepts that have a null scientific name but do have a "has vernacular"
> relationship.

Yeah -- I've been thinking this one through as well, but I'm not sure how
best to deal with it in the schema.

> Having moved the code-compliant nameObjects up front, why do we need rank
> to be part of the concept?

VERY good question -- one that I forgot to ask earlier.  There really are
two different meanings of "Rank".  In the Code-governed nomenclatural sense,
there are three "Rank-Groups" (Family-group, Genus-group, and
Species-group).  These are important, but can be easily derived from the
more widely used sense of "Rank", which is any taxonomic rank under the sun.
I'm not sure what my preference is, but I do agree it's something that's a
non-trivial issue to sort out.


> As I see it, rank belongs as an attribute of
> the nameObject and not the taxonomic concept.

I can see it that way, but I can also see it the other way too.

'nuff for now....


Aloha,
Rich



