[tcs-lc] Misspelled Names and Orthographic Variants (Issue 005)

Richard Pyle deepreef at bishopmuseum.org
Thu Apr 28 15:15:15 PDT 2005


> 1) All names that look, sound and smell like a scientific name should be
> created as NameObjects. This is because:
> a) Someone may have used them as part of a concept or concepts somewhere.
> b) They may or may not be erroneous. We can't say don't mark up the
> erroneous ones as NameObjects because you might not know they are
> erroneous.

That's what I was afraid of -- and the specific reason I sent that list.
So, if I understand you correctly, NameObject=NameString (where NameString
is any unique sequence of Unicode characters, which somewhat resembles an
attempt at a scientific name).

Is this correct?

> 2) The PublicationStatus element can be used as a human readable note in
> NameObjects to indicate that a name is an orthographic variant of another
> name. Other than this there should not be name-name links to indicate
> misspellings.

But, but, but....I thought the whole point of making names as objects was to
allow direct name-name relationships???

> 3) To mark up a misspelling one should create a link between two
> TaxonConcepts.

Well...does that mean that all misspellings must be attached to "Defined
Concepts"?  Or, does it mean that not all misspellings will be represented
as TCS objects?

> a) For an author to misspell a name they must have used it to refer to a
> concept of some kind that it would be useful to reason about.

Agreed -- but I thought that not all name-usages rose to the level of
"defined concepts"?

> b) The person who misspelled the name should be in the AccordingTo.

I'm happy with that -- provided that there is a liberal allowance for
representing usages as concepts (that are mapped congruently to other more
well-defined concepts).

In fact, part of the reason for my previous query was to get at exactly this
issue:  I think if you are going to define "names" as Objects, there should
be a 1:1 ratio between unique NameObject instances, and
Basionyms+NewCombinations (botanical perspective), or perhaps even a 1:1
ratio between NameObject instances and terminal epithets of Basionyms alone
(zoological perspective).  That way, misspellings & such are captured in a
human-readable "VerbatimSpelling" element of each TC instance, but point to
a well-defined NameObject that excludes misspellings (i.e., "AccordingTo
Author used this text string, but really meant this NameObject").

It makes no sense to me to create a full structure for top-level NameObjects
if NameObject=NameString.  A text string is better represented as a simple
element within the TC substructure -- you don't need a defined object for
that.  The whole point of defining NameObjects (I thought) was to treat them
as complex properties, with myriad Name-Name relationships, not to be
confused with TaxonConcepts, which have only one or a very few Concept-Name
relationships, but potentially many Concept-Concept relationships.

> c) If the author misspelled a name when they initially published it
> (e.g. wrong gender)  there may be some concepts that use the incorrect
> spelling and some that use the correct spelling. All these concepts
> should be capable of being related to each other in terms of set
> relationships.

Exactly!!!!  But there is no special need to relate the "Aus bea" concepts
to other "Aus bea" concepts, exclusive of "Aus bus" concepts.  So it makes
no sense to me to define two separate name objects ("Aus bea" and "Aus
bus").  Rather, there should be ONE Name Object (which has at minimum
attributes detailing original orthography and Code-correct orthography, but
not necessarily all possible orthographies), and then all "Aus bea" and "Aus
bus" TC objects would point to the SAME NameObject.  Whether the AccordingTo
author spelled the species epithet "bea" or "bus" is trivial both
nomenclaturally and conceptually, and is therefore relegated to a
"VerbatimSpelling" element within each TC instance (which would probably be
the element used for text-match searches).  What really matters is that the
"Aus bea" authors and the "Aus bus" authors intended to refer to the same
"name object" -- and it just seems like a no-brainer to me that you would
represent this fact by linking both sets of TC instances to the same
NameObject instance.
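To make the proposal concrete, here is a minimal Python sketch of the model described above: a single NameObject shared by every "Aus bea" and "Aus bus" TaxonConcept instance, with each concept keeping its own VerbatimSpelling for text-match searches. The field names (name_id, original_orthography, code_correct_orthography, according_to, verbatim_spelling), the helper functions and the author placeholders are hypothetical illustrations, not TCS schema elements. The example also assumes that "Aus bea" is the original spelling and "Aus bus" the Code-correct one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameObject:
    # Hypothetical fields: the proposal only says a NameObject carries at
    # minimum the original and the Code-correct orthography.
    name_id: str
    original_orthography: str
    code_correct_orthography: str


@dataclass
class TaxonConcept:
    according_to: str       # hypothetical placeholder for the AccordingTo author
    verbatim_spelling: str  # the text string exactly as it appeared in print
    name: NameObject        # the name the author intended to use


def find_by_verbatim(concepts, text):
    """Text-match search runs against VerbatimSpelling only."""
    return [tc for tc in concepts if text in tc.verbatim_spelling]


def group_by_name(concepts):
    """Group concepts by the NameObject they point to, ignoring spelling."""
    groups = {}
    for tc in concepts:
        groups.setdefault(tc.name.name_id, []).append(tc)
    return groups


# One NameObject covers both spellings (assumed: "bea" original, "bus" correct).
aus_bus = NameObject("n1", original_orthography="Aus bea",
                     code_correct_orthography="Aus bus")

concepts = [
    TaxonConcept("Author A (hypothetical)", "Aus bea", aus_bus),
    TaxonConcept("Author B (hypothetical)", "Aus bus", aus_bus),
    TaxonConcept("Author C (hypothetical)", "Aus bea", aus_bus),
]
```

In this sketch a search for "Aus bea" returns only the concepts whose authors printed that spelling, while grouping by name puts all three concepts under the same NameObject, so no separate "is misspelling of" relationship is needed to connect them.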

> In order to do 3 there needs to be a 'is misspelling of' concept
> relationship type that is currently missing from the schema.

I think it would be a mistake to go that route.  We don't need that level of
complexity, when a simple "VerbatimSpelling" would both capture the
human-readable reality of the text string that appeared in the publication,
and serve as the perfect field to search through for text matches.

Anyway, I've got other things on my plate today, so will have to come back
to this later.

My sincerest apologies if I have misunderstood something!

Aloha,
Rich
