[tcs-lc] Misspelled Names and Orthographic Variants (Issue 005)

Fri Apr 29 05:09:33 PDT 2005

> No. If I had meant that a NameObject was just a string I would
> have written that.  It has to be a set of characters (not all
> the Unicode ones but just those acceptable by the codes) that
> some one might believe to be a published scientific name of a
> biological organism governed by one of the codes. Yes a name
> object could be constructed as a random set of characters but
> there would be little point in doing it. One has to have an
> intent to use the construct for something.

We seem to be miscommunicating here. I didn't mean to confuse the issue with
unicode, and I didn't mean to imply that any random set of text characters
could be created as a name object.

What I meant was, if I understand you correctly, your primary criterion for
deciding whether two uses of a "name" should point to the same NameObject
boils down to whether or not both name instances consist of the exact same
sequence of text characters (ignoring homonyms for a moment).  Stated
another way, two different text character strings can never share the same
NameObject ID number; regardless of whether they share the same original
description and differ only due to lapsus, gender matching, or represent
some other orthographic variant.  Correct?

If so, then every unique name-string applied to a Concept gets a unique
NameObject instance and a unique NameObjectID.

If so, then this does NOT conform to the generally agreed need for name
objects by LC folks, as per previous discussions on the LC Wiki and
elsewhere.  I believe the LC crowd agreed that the "name object" should be
defined as basionyms+new combinations, and orthographic variants would *not*
be represented as distinct name objects (but rather would be attributes of
individual name usage instances).

I *REALLY* think we should stick with the simple "Aus bus" examples until we
sort out this very basic fundamental stuff, but I will provide you with a
real-world example that touches on several of these issues (below).

> > Well...does that mean that all misspellings must be attached to "Defined
> > Concepts"?  Or, does it mean that not all misspellings will be represented
> > as TCS objects?
>
>
> If you read the next three points I explain...

Sorry!  My bad.  I saw that you answered my question and I intended to
delete my line quoted above, but I guess I forgot...

> > > a) For an author to misspell a name they must have used it to refer to a
> > > concept of some kind that it would be useful to reason about.
> >
> > Agreed -- but I thought that not all name-usages rose to the level of
> > "defined concepts"?
>
> Who misspelled the name?

Let's say the author of a checklist, who did not intend to define a new
concept.

> If the person that did it was circumscribing a taxon then you create a concept
> and possibly NameObject etc.

Certainly the author of the checklist *implied* a circumscribed taxon, but
he didn't really create one -- he was just using the name in the same sense
that the most recent revision (cited by the checklist) used it.  So the
checklist implied concept would be congruent with the revision implied
concept, but the checklist author mis-spelled the name.  I would like to see
a TC instance created for the checklist use of the name, and secondarily
establish it as congruent with the revision usage of the name.  But I
thought the TCS folks opposed this "inflation" of concepts?

> You could create a empty concept and have them as the according to if you don't
> know anything about their circumscription.

That would be fine with me too -- I have no fear of concept inflation (it
only costs us GUIDs, and GUIDs are cheap!)

> On the other hand if they misspelled it in hand writing, on a debt slip, on a
> Friday afternoon 120 years ago and you don't believe anyone else has used that
> spelling I would be tempted not to create a concept or a name.

I tend to agree.  But there is a LOT of stuff between a hand-written note
120 years old, and a full-blown concept definition.

> This kind of thing is more likely to be defined in ABCD anyhow. If  you believe
> that the hand written scrawl could be a 'real name' as in it might actually
> have been published somewhere then you could create a NameObject and possibly
> a nominal concept if you want to debt the specimen to it.

So does this mean that every single orthographic variant of every single
name (if used to apply to a concept somewhere) would have its own unique
Nominal concept?  Asked another way, is there a firm 1:1 ratio between
unique NameObjects, and unique Nominal Concepts? (this question stands
regardless of how we define a "unique NameObject")  If the answer is "yes",
then my case for embedding LC and NameObject identifiers within Nominal TC
instances is further strengthened.

> It is horses for courses on this.

I have absolutely no idea what that means. :-)

> That is why I would like to start talking about real life examples now.

Real-world examples are only going to make this conversation more confusing,
I think...but I'll comply below.

> The only trouble is you can't assume that some one knows they have a wrongly
> spelled name. They may mark it up as a accepted name because they think it is.
> Then we have to deal with it and the way we deal with it depends on who has
> concepts relating to it.

Certainly we have to allow for situations where the nomenclature is not
formally resolved. The way I would prefer to handle those cases is to enter
the VerbatimNameString in the TC instance, and if you can't point to a
well-defined NameObject (defined in the sense of populated LC elements),
then you don't include a reference to a NameObject - you simply provide the
VerbatimNameString.  If you don't know what the intended NameObject was,
then what's the point of linking to a NameObject, if all the elements of the
NameObject are empty?

My feeling is to reserve NameObjects for well-defined names, and rely on the
VerbatimNameString by itself for all TC instances that cannot be linked to a
well-defined name object.
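
To make that concrete, here is a rough sketch in Python of what I have in mind. The class, the field names, and the lookup table are all just my illustration (assumptions, not anything taken from the TCS or LC schemas): the verbatim string is always recorded, and the NameObject reference is filled in only when we actually know which well-defined name was intended.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TaxonConcept:
    """Hypothetical TC instance; field names are illustrative only."""
    according_to: str                     # who used the name
    verbatim_name_string: str             # name exactly as spelled in that source
    name_object_id: Optional[str] = None  # set only if a well-defined NameObject is known


def make_concept(according_to: str, verbatim: str,
                 resolved: Dict[str, str]) -> TaxonConcept:
    """Link to a NameObject only when the spelling has been resolved.

    `resolved` is an assumed lookup from verbatim spellings to NameObject IDs.
    Unresolved spellings get no NameObject reference at all.
    """
    return TaxonConcept(according_to, verbatim, resolved.get(verbatim))


# Both spellings are known to refer to the same (hypothetical) name object "N1".
resolved = {"Aus bus": "N1", "Aus bea": "N1"}

known = make_concept("Jones", "Aus bus", resolved)
unknown = make_concept("Some checklist", "Aus xyz", resolved)

print(known.name_object_id)    # N1
print(unknown.name_object_id)  # None
```

The unresolved case still carries everything a text search needs, which is the same situation as having AccordingToSimple without AccordingToDetailed.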

The analogy would be when you have AccordingToSimple, but no
AccordingToDetailed.  Indeed, in my mind "VerbatimNameString" is what I
thought "NameSimple" was intended to be (the only reason I didn't use
"NameSimple" in place of "VerbatimNameString" is that some people
interpreted "NameSimple" as a concatenation of elements within NameDetailed,
and I wanted to explicitly define an element for "Name as spelled in the
AccordingTo publication").

You said:
> > c) If the author misspelled a name when they initially published it
> > (e.g. wrong gender)  there may be some concepts that use the incorrect
> > spelling and some that use the correct spelling. All these concepts
> > should be capable of being related to each other in terms of set
> > relationships.

Then I jumped the gun and said:

> Exactly!!!!

...but now I see I read your words too quickly.  The relationship between
"Aus bus" and its orthographic variant "Aus bea" is *not* a "set"
relationship.  It is a singleobject-->singleobject relationship.  Concept
circumscriptions represent sets of individuals, and therefore can relate to
each other with things like "includes" and "overlaps".  Names are unique
individuals themselves, and have only two kinds of relationships: "equal",
and "not equal".
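
Here is a small Python sketch of the distinction I am drawing, treating a circumscription as a set of individuals and a name as a single identifier. The function names and the "is included in" and "excludes" labels are my own assumptions for illustration; only "congruent", "includes" and "overlaps" come from this discussion.

```python
def concept_relationship(a: frozenset, b: frozenset) -> str:
    """Compare two circumscriptions, each modelled as a set of individuals."""
    if a == b:
        return "is congruent to"
    if a > b:           # proper superset
        return "includes"
    if a < b:           # proper subset
        return "is included in"
    if a & b:           # some shared members, neither contains the other
        return "overlaps"
    return "excludes"


def name_relationship(name_id_a: str, name_id_b: str) -> str:
    """Names are single objects, so only identity can be tested."""
    return "equal" if name_id_a == name_id_b else "not equal"


revision = frozenset({"spec1", "spec2", "spec3"})
checklist = frozenset({"spec2", "spec3"})

print(concept_relationship(revision, checklist))  # includes
print(name_relationship("N1", "N1"))              # equal
```

Nothing like "includes" or "overlaps" is even expressible in the second function, which is the point.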

> > What really matters is that the
> > "Aus bea" authors and the "Aus bus" authors intended to refer to the same
> > "name object" -- and it just seems like a no-brainer to me that you would
> > represent this fact by linking both sets of TC instances to the same
> > NameObject instance.
>
> That would be great. All we need to do is know ahead of time whether a name
> is a misspelling so we could decide not to mark it up.

No we don't.  We only need to know ahead of time whether we have
nomenclatural details about the name object the AccordingTo author intended,
or we don't.  If we do, then we provide a link to the appropriate name
object (no matter whether the AccordingTo author spelled it correctly or
not).  If we do not know what NameObject was intended, we populate
"VerbatimNameString" and stop there.  As I said before, if you don't know
anything about the NameObject that the AccordingTo author intended, then
there's no point in linking to a NameObject.

> In reality these objects will be created.

Why?  Why not stop at VerbatimNameString if you don't know what defined
NameObject to link to? Isn't that exactly what you would do in v0.95 (i.e.,
provide a NameSimple, but not provide any NameDetailed stuff?)

> They can be linked together if need be, either at the concept level (if we
> are comparing taxa based on the names) or at the nomenclatural level. I
> just can't see what your problem is here.

I guess my problem is that one of the few things that the LC group did agree
on (I think unanimously), was that orthographic variants, lapsuses (lapsi?),
etc. should *not* be treated as separate name objects.  What it sounds like
you are proposing in v0.95.2 is to disregard this, and go back to the
earlier notion that "if it's a different string of text characters, it's a
different name object".  It's fine if TCS finds this the most practical
solution for exchanging concepts and the names applied to those concepts,
but it is then very likely that LC will need to develop its own schema to
deal with the exchange of nomenclatural data.  That is the scenario I am
*desperately* trying to avoid.

> > I think it would be a mistake to go that route.  We don't need that level of
> > complexity, when a simple "VerbatimSpelling" would both capture the
> > human-readable reality of the text string that appeared in the publication,
> > and serve as the perfect field to search through for text matches.
>
> We need the misspelling relationship because two Concepts (with different
> circumscriptions) may have names that have that relationship.

Concept 1: Aus bea Smith SEC. Smith
Concept 2: Aus bus Smith SEC. Jones

You propose two NameObjects:
Name 1: Aus bea Smith
Name 2: Aus bus Smith

Concept 1 links to Name 1
Concept 2 links to Name 2
Name 1 links to Name 2 via "is misspelling of" relationship.

Correct?

I think it makes more sense to have one name object, which would have LC
elements to document both "Original Orthography" and "Code-Correct
Orthography" (among other LC elements to document nomenclaturally important
stuff).

Both Concepts (1 & 2) would then link to the same NameObject (if we already
know that one is a misspelling of the other, then we already know what the
"correct" Name/spelling is). Each concept would have its own
VerbatimNameString to track how each spelled the name, but the important
thing is that they would be automatically linked to each other via a shared
NameObjectID.

> We can't say they are the same thing simply because they both use different
> versions of the same name.

We wouldn't need to!  Sharing the same NameObjectID does not in any way
imply congruency between the two circumscriptions.

> We have to relate them with 'is congruent' or 'overlaps' or somethings

Exactly!  That's a different part of the TCS.

> but it would also be useful to say the equivalent of "we believe this taxon
> uses a different spelling of the same scientific name as that taxon"

...which you would automatically have by virtue of the fact that both
concepts are linked to the same NameObject, but have unequal
VerbatimNameString values. Which of the two VerbatimNameString values is
correct (if either), is irrelevant to either Concept instance, but could be
determined from elements of the NameObject.
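
As a rough Python sketch of the single-NameObject approach (class names, field names, and the ID "N1" are assumptions of mine, and I have arbitrarily taken "Aus bea" to be the original orthography and "Aus bus" the Code-correct one):

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class NameObject:
    """One nomenclatural name; orthography lives here, not in the concepts."""
    name_object_id: str
    original_orthography: str
    code_correct_orthography: str


@dataclass(frozen=True)
class Concept:
    label: str
    verbatim_name_string: str
    name_object_id: str


def same_name_different_spelling(a: Concept, b: Concept) -> bool:
    """True when two concepts share a NameObject but spelled it differently."""
    return (a.name_object_id == b.name_object_id
            and a.verbatim_name_string != b.verbatim_name_string)


def uses_correct_spelling(c: Concept, names: Dict[str, NameObject]) -> bool:
    """Correctness is read from the NameObject, never from the concept."""
    return c.verbatim_name_string == names[c.name_object_id].code_correct_orthography


names = {"N1": NameObject("N1", "Aus bea", "Aus bus")}
concept1 = Concept("Aus bea Smith SEC. Smith", "Aus bea", "N1")
concept2 = Concept("Aus bus Smith SEC. Jones", "Aus bus", "N1")

print(same_name_different_spelling(concept1, concept2))  # True
print(uses_correct_spelling(concept1, names))            # False
print(uses_correct_spelling(concept2, names))            # True
```

Note that nothing here says anything about whether the two circumscriptions are congruent; that would be asserted separately, in the concept-relationship part of TCS.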

> You could do this without even creating a name object for one of the
> TaxonConcepts if you like. The thing is pretty damn flexible.

If you didn't create a NameObject for both, then how would you know that
they used different name spellings? I don't see any place within
TaxonConcept where a namestring (NameSimple) could be stored.

> Things will become clear as we work through real examples. I am currently doing
> the ones on the LC wiki and will post them when complete. Can't guarantee I will
> get through them all though as it is quite time consuming.

O.K., here's one for you.  I didn't choose it for any reason other than I
happened to have a need to sort through it yesterday.  It has a few odd
things, but is not especially thorny or problematic (believe me -- if you
want tough ones, I can give you tough ones!) I can send you full details for
publications, vouchers, etc. -- but this should be enough to get you
started.

ICZN (fishes)

Linnaeus (1758) established the genus "Gobius" on p. 262.
He included these new species in this genus:
G. anguillaris (p.264)
G. aphya (p.263)
G. eleotris (p. 263)
G. jozo (p. 263)
G. niger (p. 262)
G. paganellus (p. 263)
G. pectinirostris (p. 264)

Linnaeus (1766) added this species to his genus "Gobius":
G. barbarus (p.450)

The type species of this genus was subsequently designated as Gobius niger
Linnaeus 1758:262 by Gill (1863:268).
This genus has been placed on the ICZN "Official List" (Opinion 77,
Direction 56).

Gmelin (1789) added these new species in the genus Gobius Linnaeus:
G. arabicus (p. 1198)
G. bicolor (p. 1197)
G. cruentatus (p. 1197)
G. gronovii (p. 1205)
G. melanuros (p. 1201)
G. pisonis (p. 1206)

Gronow (1763) used the genus name "Eleotris". However, this work has been
rejected by the ICZN, and thus this name has been placed on the ICZN
"Official Index" (Direction 56).

Bloch & Schneider (1801) used Gronow's name "Eleotris" on p. 65.
They described these new species in this genus:
E. lanceolata (p. 67)
E. mauritii (p. 66)
They included Gobius pisonis Gmelin 1789 among the species in this genus.
Because Gronow's name has been rejected, Bloch & Schneider are considered the
authors of this genus name.
In Opinion 93, Direction 56, ICZN used its plenary powers to establish
Gobius pisonis Gmelin 1789 as the type species of this genus.

Rüppell (1830) established the new genus "Asterropterix" on p. 138.
In the same publication (same page), he also described the new species,
"Asterropterix semipunctatus".  Because this was the only species Rüppell
included in his new genus, this species is established as the type species
of the genus (monotypy).
In the same publication, he included an illustration of a specimen that now
bears the catalog number SMF 1691, and is the Holotype of his new species A.
semipunctatus.  In the caption for that figure, Rüppell spelled the name
"Asterropteryx semipunctatus" ("-yx" for the genus, instead of "-ix" for the
genus).

Rüppell (1835) listed the same genus and species, but consistently spelled
the genus "Asterropteryx". In doing so, by ICZN rules he serves as the
"first reviser", and thereby establishes the "-yx" spelling of the genus as
the "correct original spelling".  We can only assume that Rüppell's 1935
concept circumscription of the species is congruent to his 1830 concept
circumscription. Because the genus is monotypic in both publications, we can
also assume congruency between the two genus concept circumscriptions.

Bleeker (1855) described these new species within the genus Eleotris Bl. &
Sch. (although Bleeker attributed the genus to Gronow):
E. cyanostigma (p. 452)
E. heteropterus (p. 422)

Bleeker (1874a) established the new genus Brachyeleotris (p. 306). He
designated Eleotris cyanostigma Bleeker (1855) as the type species.

Bleeker (1874b) described the new species ensifera (p. 375), and included it
within his genus Brachyeleotris.

Snyder (1904) placed the species Eleotris cyanostigma Bleeker (1855) within
the genus "Asterropterix" (incorrect spelling).

Whitley (1932) described the new subspecies, "Asterropterix semipunctatus
quisqualis" (incorrect spelling of genus).

Dor (1984) regarded Eleotris cyanostigma Bleeker (1855) to be a junior
synonym of Asterropteryx semipunctatus Rüppell (1830).
In doing so, he (by definition) also considered the genus Brachyeleotris
Bleeker (1874a) to be a junior synonym of Asterropteryx Rüppell (1830).

Randall et al. (1997) placed Brachyeleotris ensifera Bleeker (1874b) in the
genus "Asterropteryx" (correct spelling), and spelled the species epithet
"ensiferus".

Privitera (2001) published on the reproductive biology of Asterropteryx
semipunctatus Rüppell, but pointed out that the genus "Asterropteryx" is
feminine, and thus spelled A. semipunctatus as "Asterropteryx semipunctata".

Nakabo (2002) also recognized the feminine gender of Asterropteryx, and
followed Dor in placing Brachyeleotris ensifera Bleeker (1874b) in
Asterropteryx, and thus spelled it "Asterropteryx ensifera".

Allen & Adrim (2003) also placed Brachyeleotris ensifera Bleeker (1874b) in
the genus Asterropteryx, but spelled the species epithet "ensifer".

Randall et al. (2004) used the spelling "Asterropteryx ensiferus".

Greenfield & Randall (2004) mistakenly used the spelling "Asterropterix
semipunctatus" in their treatment of that species.

O.K., that's probably enough.  Believe it or not, this is highly
simplified -- the real situation is about ten times more complicated!!  And
here's the amazing part:  I did NOT deliberately pick a very complicated
case.  All I wanted to do was illustrate the real-world
Asterropteryx/Asterropterix and semipunctatus/semipunctata spelling issues.
All the other stuff emerged as I was writing the above, just trying to
represent type species and such.  I didn't even include homonyms in this
case.  The point is, this level of complexity is NOT unusual -- in fact,
it's disproportionately simplified.
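
To put numbers on the spelling issue, here is a small Python sketch using four of the usages listed above (for Rüppell 1830 I use the spelling from the text, not the figure caption). The name ID string is made up, and assigning all four usages to one name object is my own hand-made assumption about what the LC approach would do.

```python
from collections import defaultdict

# (source, name as spelled in that source), taken from the account above.
usages = [
    ("Rüppell 1830", "Asterropterix semipunctatus"),
    ("Rüppell 1835", "Asterropteryx semipunctatus"),
    ("Privitera 2001", "Asterropteryx semipunctata"),
    ("Greenfield & Randall 2004", "Asterropterix semipunctatus"),
]

# Model A: every distinct character string is its own name object.
string_model = {verbatim for _, verbatim in usages}

# Model B: one name object, with spellings kept as attributes of each usage.
NAME_ID = "name:semipunctatus-Rueppell-1830"  # hypothetical identifier
shared_model = defaultdict(set)
for source, verbatim in usages:
    shared_model[NAME_ID].add(verbatim)

print(len(string_model))           # 3 name objects under model A
print(len(shared_model))           # 1 name object under model B
print(len(shared_model[NAME_ID]))  # 3 recorded spellings under model B
```

Both models keep all three spellings; the difference is only whether they multiply name objects or hang off a single one.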

Aloha,
Rich