[tcs-lc] Misspelled Names and Orthographic Variants (Issue 005)

Bob Morris ram at cs.umb.edu
Fri Apr 29 05:56:06 PDT 2005


There is a small gotcha that bit us once. The horticultural code, which 
seems to be incorporated in ICBN by reference, is underspecified on the 
use of the symbol for cross, which in print is sometimes rendered with 
the "multiplication" sign, 0xD7 in Unicode hex (U+00D7). Not only is this 
outside 7-bit ASCII, but it is unclear to me whether ICBN would treat the 
same string using the ASCII character "X" as an orthographic variant or 
not in a printed environment and hence(?) in an electronic environment.
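
One way to sidestep the question when comparing strings electronically is to
normalize the cross symbol before matching. The sketch below is only an
illustration: the function names are made up, "Aus X bus" is a placeholder
name, and the rule that a standalone "x" or "X" token means the cross is an
assumption, not something either code states.

```python
import unicodedata

# U+00D7 MULTIPLICATION SIGN, the printed form of the cross symbol.
MULTIPLICATION_SIGN = "\u00d7"


def normalize_hybrid_marker(name: str) -> str:
    """Map a standalone ASCII "x" or "X" token to U+00D7.

    Assumption: only whitespace-delimited tokens count as the cross symbol;
    an "x" inside a word (e.g. "xanthus") is left alone.
    """
    tokens = unicodedata.normalize("NFC", name).split()
    return " ".join(
        MULTIPLICATION_SIGN if token in ("x", "X") else token
        for token in tokens
    )


def same_after_normalizing(a: str, b: str) -> bool:
    """True if the two strings differ at most in how the cross is typed."""
    return normalize_hybrid_marker(a) == normalize_hybrid_marker(b)


print(same_after_normalizing("Aus X bus", "Aus \u00d7 bus"))
```

Whether the two spellings should count as the same name is still a question
for the codes; this only makes the comparison explicit.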

Bob


Richard Pyle wrote:
>>No. If I had meant that a NameObject was just a string I would
>>have written that.  It has to be a set of characters (not all
>>the Unicode ones but just those acceptable by the codes) that
>>some one might believe to be a published scientific name of a
>>biological organism governed by one of the codes. Yes a name
>>object could be constructed as a random set of characters but
>>there would be little point in doing it. One has to have an
>>intent to use the construct for something.
> 
> 
> We seem to be miscommunicating here. I didn't mean to confuse the issue with
> unicode, and I didn't mean to imply that any random set of text characters
> could be created as a name object.
> 
> What I meant was, if I understand you correctly, your primary criterion for
> deciding whether two uses of a "name" should point to the same NameObject
> boils down to whether or not both name instances consist of the exact same
> sequence of text characters (ignoring homonyms for a moment).  Stated
> another way, two different text character strings can never share the same
> NameObject ID number; regardless of whether they share the same original
> description and differ only due to lapsus, gender matching, or represent
> some other orthographic variant.  Correct?
> 
> If so, then every unique name-string applied to a Concept gets a unique
> NameObject instance and a unique NameObjectID.
> 
> If so, then this does NOT conform to the generally agreed need for name
> objects by LC folks, as per previous discussions on the LC Wiki and
> elsewhere.  I believe the LC crowd agreed that the "name object" should be
> defined as basionyms+new combinations, and orthographic variants would *not*
> be represented as distinct name objects (but rather would be attributes of
> individual name usage instances).
> 
> I *REALLY* think we should stick with the simple "Aus bus" examples until we
> sort out this very basic fundamental stuff, but I will provide you with a
> real-world example that touches on several of these issues (below).
> 
> 
>>>Well...does that mean that all misspellings must be attached to "Defined
>>>Concepts"?  Or, does it mean that not all misspellings will be represented
>>>as TCS objects?
>>
>>
>>If you read the next three points I explain...
> 
> 
> Sorry!  My bad.  I saw that you answered my question and I intended to
> delete my line quoted above, but I guess I forgot...
> 
> 
>>>>a) For an author to misspell a name they must have used it to refer to a
>>>>concept of some kind that it would be useful to reason about.
>>>
>>>Agreed -- but I thought that not all name-usages rose to the level of
>>>"defined concepts"?
>>
>>Who misspelled the name?
> 
> 
> Let's say the author of a checklist, who did not intend to define a new
> concept.
> 
> 
>>If the person that did it was circumscribing a taxon then you create a
>>concept and possibly NameObject etc.
> 
> 
> Certainly the author of the checklist *implied* a circumscribed taxon, but
> he didn't really create one -- he was just using the name in the same sense
> that the most recent revision (cited by the checklist) used it.  So the
> checklist implied concept would be congruent with the revision implied
> concept, but the checklist author mis-spelled the name.  I would like to see
> a TC instance created for the checklist use of the name, and secondarily
> establish it as congruent with the revision usage of the name.  But I
> thought the TCS folks opposed this "inflation" of concepts?
> 
> 
>>You could create an empty concept and have them as the according to if you
>>don't know anything about their circumscription.
> 
> 
> That would be fine with me too -- I have no fear of concept inflation (it
> only costs us GUIDs, and GUIDs are cheap!)
> 
> 
>>On the other hand if they misspelled it in hand writing, on a det slip, on
>>a Friday afternoon 120 years ago and you don't believe anyone else has used
>>that spelling I would be tempted not to create a concept or a name.
> 
> 
> I tend to agree.  But there is a LOT of stuff between a hand-written note
> 120 years old, and a full-blown concept definition.
> 
> 
>>This kind of thing is more likely to be defined in ABCD anyhow. If you
>>believe that the hand written scrawl could be a 'real name' as in it might
>>actually have been published somewhere then you could create a NameObject
>>and possibly a nominal concept if you want to det the specimen to it.
> 
> 
> So does this mean that every single orthographic variant of every single
> name (if used to apply to a concept somewhere) would have its own unique
> Nominal concept?  Asked another way, is there a firm 1:1 ratio between
> unique NameObjects, and unique Nominal Concepts? (this question stands
> regardless of how we define a "unique NameObject")  If the answer is "yes",
> then my case for embedding LC and NameObject identifiers within Nominal TC
> instances is further strengthened.
> 
> 
>>It is horses for courses on this.
> 
> 
> I have absolutely no idea what that means. :-)
> 
> 
>>That is why I would like to start talking about real life examples now.
> 
> 
> Real-world examples are only going to make this conversation more confusing,
> I think...but I'll comply below.
> 
> 
>>The only trouble is you can't assume that some one knows they have a
>>wrongly spelled name. They may mark it up as an accepted name because they
>>think it is. Then we have to deal with it and the way we deal with it
>>depends on who has concepts relating to it.
> 
> 
> Certainly we have to allow for situations where the nomenclature is not
> formally resolved. The way I would prefer to handle those cases is to enter
> the VerbatimNameString in the TC instance, and if you can't point to a
> well-defined NameObject (defined in the sense of populated LC elements),
> then you don't include a reference to a NameObject - you simply provide the
> VerbatimNameString.  If you don't know what the intended NameObject was,
> then what's the point of linking to a NameObject, if all the elements of the
> NameObject are empty?
> 
> My feeling is to reserve NameObjects for well-defined names, and rely on the
> VerbatimNameString by itself for all TC instances that cannot be linked to a
> well-defined name object.
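
The rule described above (always record the verbatim string, link to a
NameObject only when the intended name is known) can be sketched as follows.
This is a rough illustration, not TCS or LC schema: the class, the field
names, the lookup table and the IDs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TaxonConcept:
    according_to: str
    verbatim_name_string: str             # name as spelled in the publication
    name_object_id: Optional[str] = None  # None: intended name not resolved


def make_concept(according_to: str, verbatim: str,
                 resolved: Dict[str, str]) -> TaxonConcept:
    """Build a TC instance; link a NameObject only if one is known.

    `resolved` is a hypothetical table from spellings (correct or not) to
    the ID of the well-defined NameObject the author intended.
    """
    return TaxonConcept(according_to, verbatim, resolved.get(verbatim))


resolved = {"Aus bus": "name-1", "Aus bea": "name-1"}
linked = make_concept("Jones", "Aus bea", resolved)
unlinked = make_concept("Checklist author", "Aus cus", resolved)
print(linked.name_object_id, unlinked.name_object_id)
```

The unresolved usage still carries its VerbatimNameString, so it stays
searchable as text even though it points at no NameObject.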
> 
> The analogy would be when you have AccordingToSimple, but no
> AccordingToDetailed.  Indeed, in my mind "VerbatimNameString" is what I
> thought "NameSimple" was intended to be (the only reason I didn't use
> "NameSimple" in place of "VerbatimNameString" is that some people
> interpreted "NameSimple" as a concatenation of elements within NameDetailed,
> and I wanted to explicitly define an element for "Name as spelled in the
> AccordingTo publication").
> 
> You said:
> 
>>>c) If the author misspelled a name when they initially published it
>>>(e.g. wrong gender)  there may be some concepts that use the incorrect
>>>spelling and some that use the correct spelling. All these concepts
>>>should be capable of being related to each other in terms of set
>>>relationships.
> 
> 
> Then I jumped the gun and said:
> 
> 
>>Exactly!!!!
> 
> 
> ...but now I see I read your words too quickly.  The relationship between
> "Aus bus" and its orthographic variant "Aus bea" is *not* a "set"
> relationship.  It is a singleobject-->singleobject relationship.  Concept
> circumscriptions represent sets of individuals, and therefore can relate to
> each other with things like "includes" and "overlaps".  Names are unique
> individuals themselves, and have only two kinds of relationships: "equal",
> and "not equal".
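
The distinction drawn above can be made concrete with a small sketch. It
assumes, purely for illustration, that a circumscription is a set of specimen
identifiers and that a name is identified by a NameObject ID. The labels
"is included in" and "excludes" are additions of this sketch; the specimen and
name IDs are invented.

```python
def concept_relationship(a: frozenset, b: frozenset) -> str:
    """Set relationship between two circumscriptions."""
    if a == b:
        return "is congruent"
    if a > b:
        return "includes"
    if a < b:
        return "is included in"
    if a & b:
        return "overlaps"
    return "excludes"


def name_relationship(id_a: str, id_b: str) -> str:
    """Names are single objects: they are either the same one or not."""
    return "equal" if id_a == id_b else "not equal"


smith = frozenset({"spec-1", "spec-2", "spec-3"})
jones = frozenset({"spec-2", "spec-3", "spec-4"})
print(concept_relationship(smith, jones))   # circumscriptions compared as sets
print(name_relationship("name-1", "name-1"))  # names compared by identity
```

Two concepts can share a name (names "equal") while their circumscriptions
merely overlap, which is why the two kinds of relationship are kept apart.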
> 
> 
>>>What really matters is that the
>>>"Aus bea" authors and the "Aus bus" authors intended to refer to the same
>>>"name object" -- and it just seems like a no-brainer to me that you would
>>>represent this fact by linking both sets of TC instances to the same
>>>NameObject instance.
>>
>>That would be great. All we need to do is know ahead of time whether a name
>>is a misspelling so we could decide not to mark it up.
> 
> 
> No we don't.  We only need to know ahead of time whether we have
> nomenclatural details about the name object the AccordingTo author intended,
> or we don't.  If we do, then we provide a link to the appropriate name
> object (no matter whether the AccordingTo author spelled it correctly or
> not).  If we do not know what NameObject was intended, we populate
> "VerbatimNameString" and stop there.  As I said before, if you don't know
> anything about the NameObject that the AccordingTo author intended, then
> there's no point in linking to a NameObject.
> 
> 
>>In reality these objects will be created.
> 
> 
> Why?  Why not stop at VerbatimNameString if you don't know what defined
> NameObject to link to? Isn't that exactly what you would do in v0.95 (i.e.,
> provide a NameSimple, but not provide any NameDetailed stuff?)
> 
> 
>>They can be linked together if need be, either at the concept level (if we
>>are comparing taxa based on the names) or at the nomenclatural level. I
>>just can't see what your problem is here.
> 
> 
> I guess my problem is that one of the few things that the LC group did agree
> on (I think unanimously), was that orthographic variants, lapsuses (lapsi?),
> etc. should *not* be treated as separate name objects.  What it sounds like
> you are proposing in v0.95.2 is to disregard this, and go back to the
> earlier notion that "if it's a different string of text characters, it's a
> different name object".  It's fine if TCS finds this the most practical
> solution for exchanging concepts and the names applied to those concepts,
> but it is then very likely that LC will need to develop its own schema to
> deal with the exchange of nomenclatural data.  That is the scenario I am
> *desperately* trying to avoid.
> 
> 
>>>I think it would be a mistake to go that route.  We don't need that level of
>>>complexity, when a simple "VerbatimSpelling" would both capture the
>>>human-readable reality of the text string that appeared in the publication,
>>>and serve as the perfect field to search through for text matches.
>>
>>We need the misspelling relationship because two Concepts (with different
>>circumscriptions) may have names that have that relationship.
> 
> 
> Concept 1: Aus bea Smith SEC. Smith
> Concept 2: Aus bus Smith SEC. Jones
> 
> You propose two NameObjects:
> Name 1: Aus bea Smith
> Name 2: Aus bus Smith
> 
> Concept 1 links to Name 1
> Concept 2 links to Name 2
> Name 1 links to Name 2 via "is misspelling of" relationship.
> 
> Correct?
> 
> I think it makes more sense to have one name object, which would have LC
> elements to document both "Original Orthography" and "Code-Correct
> Orthography" (among other LC elements to document nomenclaturally important
> stuff).
> 
> Both Concepts (1 & 2) would then link to the same NameObject (if we already
> know that one is a misspelling of the other, then we already know what the
> "correct" Name/spelling is). Each concept would have in its
> VerbatimNameString to track how each spelled the name, but the important
> thing is that they would be automatically linked to each other via a shared
> NameObjectID.
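
A minimal sketch of the single-NameObject arrangement just described, using
the "Aus bea" / "Aus bus" example. The class, field names and the ID "N1" are
hypothetical; the point is only that the "different spelling of the same name"
fact falls out of the data without a separate "is misspelling of" link.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass(frozen=True)
class TaxonConcept:
    label: str
    verbatim_name_string: str
    name_object_id: str


def spelling_variant_pairs(
    concepts: List[TaxonConcept],
) -> List[Tuple[str, str]]:
    """Pairs of concepts that share a NameObject but spell it differently."""
    return [
        (a.label, b.label)
        for a, b in combinations(concepts, 2)
        if a.name_object_id == b.name_object_id
        and a.verbatim_name_string != b.verbatim_name_string
    ]


concepts = [
    TaxonConcept("Concept 1 (SEC. Smith)", "Aus bea", "N1"),
    TaxonConcept("Concept 2 (SEC. Jones)", "Aus bus", "N1"),
]
print(spelling_variant_pairs(concepts))
```

Nothing here says the two circumscriptions are congruent; that would still be
asserted separately through concept relationships.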
> 
> 
>>We can't say they are the same thing simply because they both use different
>>versions of the same name.
> 
> 
> We wouldn't need to!  Sharing the same NameObjectID does not in any way
> imply congruency between the two circumscriptions.
> 
> 
>>We have to relate them with 'is congruent' or 'overlaps' or something
> 
> 
> Exactly!  That's a different part of the TCS.
> 
> 
>>but it would also be useful to say the equivalent of "we believe this taxon
>>uses a different spelling of the same scientific name as that taxon"
> 
> 
> ...which you would automatically have by virtue of the fact that both
> concepts are linked to the same NameObject, but have unequal
> VerbatimNameString values. Which of the two VerbatimNameString values is
> correct (if either), is irrelevant to either Concept instance, but could be
> determined from elements of the NameObject.
> 
> 
>>You could do this without even creating a name object for one of the
>>TaxonConcepts if you like. The thing is pretty damn flexible.
> 
> 
> If you didn't create a NameObject for both, then how would you know that
> they used different name spellings? I don't see any place within
> TaxonConcept where a namestring (NameSimple) could be stored.
> 
> 
>>Things will become clear as we work through real examples. I am currently
>>doing the ones on the LC wiki and will post them when complete. Can't
>>guarantee I will get through them all though as it is quite time consuming.
> 
> 
> O.K., here's one for you.  I didn't choose it for any reason other than I
> happened to have a need to sort through it yesterday.  It has a few odd
> things, but is not especially thorny or problematic (believe me -- if you
> want tough ones, I can give you tough ones!) I can send you full details for
> publications, vouchers, etc. -- but this should be enough to get you
> started.
> 
> ICZN (fishes)
> 
> Linnaeus (1758) established the genus "Gobius" on p. 262.
> He included these new species in this genus:
> G. anguillaris (p.264)
> G. aphya (p.263)
> G. eleotris (p. 263)
> G. jozo (p. 263)
> G. niger (p. 262)
> G. paganellus (p. 263)
> G. pectinirostris (p. 264)
> 
> Linnaeus (1766) added this species to his genus "Gobius":
> G. barbarus (p.450)
> 
> The type species of this genus was subsequently designated as Gobius niger
> Linnaeus 1758:262 by Gill (1863:268).
> This genus has been placed on the ICZN "Official List" (Opinion 77,
> Direction 56).
> 
> Gmelin (1789) added these new species in the genus Gobius Linnaeus:
> G. arabicus (p. 1198)
> G. bicolor (p. 1197)
> G. cruentatus (p. 1197)
> G. gronovii (p. 1205)
> G. melanuros (p. 1201)
> G. pisonis (p. 1206)
> 
> Gronow (1763) used the genus name "Eleotris". However, this work has been
> rejected by the ICZN, and thus this name has been placed on the ICZN
> "Official Index" (Direction 56).
> 
> Bloch & Schneider (1801) used Gronow's name "Eleotris" on p. 65.
> They described these new species in this genus:
> E. lanceolata (p. 67)
> E. mauritii (p. 66)
> They included Gobius pisonis Gmelin 1789 among the species in this genus.
> Because Gronow's name has been rejected, Bloch & Schneider are considered the
> authors of this genus name.
> In Opinion 93, Direction 56, ICZN used its plenary powers to establish
> Gobius pisonis Gmelin 1789 as the type species of this genus.
> 
> Rüppell (1830) established the new genus "Asterropterix" on p. 138.
> In the same publication (same page), he also described the new species,
> "Asterropterix semipunctatus".  Because this was the only species Rüppell
> included in his new genus, this species is established as the type species
> of the genus (monotypy).
> In the same publication, he included an illustration of a specimen that now
> bears the catalog number SMF 1691, and is the Holotype of his new species A.
> semipunctatus.  In the caption for that figure, Rüppell spelled the name
> "Asterropteryx semipunctatus" ("-yx" for the genus, instead of "-ix" for the
> genus).
> 
> Rüppell (1835) listed the same genus and species, but consistently spelled
> the genus "Asterropteryx". In doing so, by ICZN rules he serves as the
> "first reviser", and thereby establishes the "-yx" spelling of the genus as
> the "correct original spelling".  We can only assume that Rüppell's 1835
> concept circumscription of the species is congruent to his 1830 concept
> circumscription. Because the genus is monotypic in both publications, we can
> also assume congruency between the two genus concept circumscriptions.
> 
> Bleeker (1855) described these new species within the genus Eleotris Bl. &
> Sch. (although Bleeker attributed the genus to Gronow):
> E. cyanostigma (p. 452)
> E. heteropterus (p. 422)
> 
> Bleeker (1874a) established the new genus Brachyeleotris (p. 306). He
> designated Eleotris cyanostigma Bleeker (1855) as the type species.
> 
> Bleeker (1874b) described the new species ensifera (p. 375), and included it
> within his genus Brachyeleotris.
> 
> Snyder (1904) placed the species Eleotris cyanostigma Bleeker (1855) within
> the genus "Asterropterix" (incorrect spelling).
> 
> Whitley (1932) described the new subspecies, "Asterropterix semipunctatus
> quisqualis" (incorrect spelling of genus).
> 
> Dor (1984) regarded Eleotris cyanostigma Bleeker (1855) to be a junior
> synonym of Asterropteryx semipunctatus Rüppell (1830).
> In doing so, he (by definition) also considered the genus Brachyeleotris
> Bleeker (1874a) to be a junior synonym of Asterropteryx Rüppell (1830).
> 
> Randall et al. (1997) placed Brachyeleotris ensifera Bleeker (1874b) in the
> genus "Asterropteryx" (correct spelling), and spelled the species epithet
> "ensiferus".
> 
> Privitera (2001) published on the reproductive biology of Asterropteryx
> semipunctatus Rüppell, but pointed out that the genus "Asterropteryx" is
> feminine, and thus spelled A. semipunctatus as "Asterropteryx semipunctata".
> 
> Nakabo (2002) also recognized the feminine gender of Asterropteryx, and
> followed Dor in placing Brachyeleotris ensifera Bleeker (1874b) in
> Asterropteryx, and thus spelled it "Asterropteryx ensifera".
> 
> Allen & Adrim (2003) also placed Brachyeleotris ensifera Bleeker (1874b) in
> the genus Asterropteryx, but spelled the species epithet "ensifer".
> 
> Randall et al. (2004) used the spelling "Asterropteryx ensiferus".
> 
> Greenfield & Randall (2004) mistakenly used the spelling "Asterropterix
> semipunctatus" in their treatment of that species.
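
As a rough illustration of how the genus spellings listed above could hang off
one NameObject, the sketch below records only the usages whose genus spelling
is stated explicitly in the account, and groups the sources by verbatim
spelling. The identifier and the data layout are hypothetical, not TCS or LC
structures.

```python
from collections import defaultdict

GENUS_NAME_OBJECT_ID = "name:asterropteryx"  # hypothetical identifier

# (source, verbatim genus spelling), taken from the account above.
USAGES = [
    ("Rüppell (1830), p. 138", "Asterropterix"),
    ("Rüppell (1830), figure caption", "Asterropteryx"),
    ("Rüppell (1835)", "Asterropteryx"),
    ("Snyder (1904)", "Asterropterix"),
    ("Whitley (1932)", "Asterropterix"),
    ("Randall et al. (1997)", "Asterropteryx"),
    ("Privitera (2001)", "Asterropteryx"),
    ("Nakabo (2002)", "Asterropteryx"),
    ("Randall et al. (2004)", "Asterropteryx"),
    ("Greenfield & Randall (2004)", "Asterropterix"),
]


def spellings_by_name_object(usages, name_object_id):
    """Group the sources of one NameObject by the spelling each one used."""
    grouped = defaultdict(list)
    for source, verbatim in usages:
        grouped[verbatim].append(source)
    return {name_object_id: dict(grouped)}


result = spellings_by_name_object(USAGES, GENUS_NAME_OBJECT_ID)
for spelling, sources in result[GENUS_NAME_OBJECT_ID].items():
    print(spelling, len(sources))
```

Which spelling is the correct original one (here "-yx", per the first-reviser
action) would be an attribute of the NameObject, not of any single usage.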
> 
> O.K., that's probably enough.  Believe it or not, this is highly
> simplified -- the real situation is about ten times more complicated!!  And
> here's the amazing part:  I did NOT deliberately pick a very complicated
> case.  All I wanted to do was illustrate the real-world
> Asterropteryx/Asterropterix and semipunctatus/semipunctata spelling issues.
> All the other stuff emerged as I was writing the above, just trying to
> represent type species and such.  I didn't even include homonyms in this
> case.  The point is, this level of complexity is NOT unusual -- in fact,
> it's disproportionately simplified.
> 
> Aloha,
> Rich

-- 
Robert A. Morris
Professor of Computer Science
UMASS-Boston
http://www.cs.umb.edu/~ram
phone (+1)617 287 6466

