[tcs-lc] Misspelled Names and Orthographic Variants (Issue 005)

Fri Apr 29 14:52:40 PDT 2005

Yes, I've come up against that gotcha as well.  For internal purposes, I try
to use the multiplication symbol as the "real" data, but have a routine that
swaps it for lower-case "x" whenever rendering in an ISO Latin-based output
product.  That's easy enough for internal purposes, but perhaps suboptimal
for a general standard.
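
As a rough illustration of that kind of output-time swap, here is a minimal
Python sketch.  The function name and the sample name "Aus ×alba" are
hypothetical (in the spirit of the "Aus bus" examples); this is not the
actual routine, just one way such a routine could look.

```python
# Hypothetical sketch: the stored ("real") data uses the multiplication
# sign; plain-text output gets a lower-case "x" instead.
MULTIPLICATION_SIGN = "\u00d7"


def render_for_plain_output(name: str) -> str:
    """Return the name with every multiplication sign replaced by "x"."""
    return name.replace(MULTIPLICATION_SIGN, "x")


stored = "Aus \u00d7alba"               # made-up nothospecies name
print(render_for_plain_output(stored))  # prints: Aus xalba
```

The stored value is never modified; only the rendered copy changes, so the
distinction between the hybrid marker and an ordinary "x" survives in the
database.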

Where it can be meaningful and important is for names of nothotaxa.
It's not always obvious whether a species epithet like "xalba"
represents a nothospecies.  Using a multiplication symbol for the first
character makes it self-evident.  As a quick-fix, I have sometimes gone with
"x_alba" for nothotaxon names in the database, converted appropriately at
output time.
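
A minimal Python sketch of the "x_" quick-fix, assuming the placeholder is
always a lower-case "x" plus underscore at the start of an epithet.  The
function name, the regular expression, and the sample names are hypothetical
and only illustrate the idea; the real conversion may differ.

```python
import re

# Hypothetical placeholder pattern: "x_" at the start of a word, followed
# by a lower-case letter (e.g. "x_alba").  An ordinary epithet such as
# "xalba" has no underscore and is left alone.
HYBRID_PLACEHOLDER = re.compile(r"\bx_(?=[a-z])")


def expand_placeholder(name: str, use_multiplication_sign: bool = True) -> str:
    """Convert the stored "x_" marker to the output form of the hybrid sign."""
    marker = "\u00d7" if use_multiplication_sign else "x"
    return HYBRID_PLACEHOLDER.sub(marker, name)


print(expand_placeholder("Aus x_alba"))         # prints: Aus ×alba
print(expand_placeholder("Aus x_alba", False))  # prints: Aus xalba
print(expand_placeholder("Aus xalba"))          # unchanged: Aus xalba
```

Because the underscore never occurs in a valid epithet, the stored string
stays unambiguous even though it is pure ASCII.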

Rich

> -----Original Message-----
> From: Bob Morris [mailto:ram at cs.umb.edu]
> Sent: Friday, April 29, 2005 2:56 AM
> To: Richard Pyle
> Cc: Roger Hyam; tcs-lc at ecoinformatics.org
> Subject: Re: [tcs-lc] Misspelled Names and Orthographic Variants (Issue
> 005)
>
>
> There is a small gotcha that bit us once. The horticultural code, which
> seems to be incorporated in ICBN by reference, is underspecified on the
> use of the symbol for cross, which in print is sometimes rendered with
> the "multiplication" sign, 0xD7 in Unicode hex. Not only is this not in
> the ISO Latin set, but it is unclear to me whether ICBN would treat the
> same string using the ASCII character "X" as an orthographic variant or
> not in a printed environment and hence(?) in an electronic environment.
>
> Bob
>
>
> Richard Pyle wrote:
> >>No. If I had meant that a NameObject was just a string I would
> >>have written that.  It has to be a set of characters (not all
> >>the Unicode ones but just those acceptable by the codes) that
> >>some one might believe to be a published scientific name of a
> >>biological organism governed by one of the codes. Yes a name
> >>object could be constructed as a random set of characters but
> >>there would be little point in doing it. One has to have an
> >>intent to use the construct for something.
> >
> >
> > We seem to be miscommunicating here. I didn't mean to confuse the issue
> > with unicode, and I didn't mean to imply that any random set of text
> > characters could be created as a name object.
> >
> > What I meant was, if I understand you correctly, your primary criterion
> > for deciding whether two uses of a "name" should point to the same
> > NameObject boils down to whether or not both name instances consist of
> > the exact same sequence of text characters (ignoring homonyms for a
> > moment).  Stated another way, two different text character strings can
> > never share the same NameObject ID number; regardless of whether they
> > share the same original description and differ only due to lapsus,
> > gender matching, or represent some other orthographic variant.  Correct?
> >
> > If so, then every unique name-string applied to a Concept gets a unique
> > NameObject instance and a unique NameObjectID.
> >
> > If so, then this does NOT conform to the generally agreed need for name
> > objects by LC folks, as per previous discussions on the LC Wiki and
> > elsewhere.  I believe the LC crowd agreed that the "name object" should
> > be defined as basionyms+new combinations, and orthographic variants
> > would *not* be represented as distinct name objects (but rather would
> > be attributes of individual name usage instances).
> >
> > I *REALLY* think we should stick with the simple "Aus bus"
> examples until we
> > sort out this very basic fundamental stuff, but I will provide
> you with a
> > real-world example that touches on several of these issues (below).
> >
> >
> >>>Well...does that mean that all misspellings must be attached
> to "Defined
> >>>Concepts"?  Or, does it mean that not all misspellings will be
> >
> > represented
> >
> >>>as TCS objects?
> >>
> >>
> >>If you read the next three points I explain...
> >
> >
> > Sorry!  My bad.  I saw that you answered my question and I intended to
> > delete my line quoted above, but I guess I forgot...
> >
> >
> >>>>a) For an author to misspell a name they must have used it to refer to
> >
> > a
> >
> >>>>concept of some kind that it would be useful to reason about.
> >>>
> >>>Agreed -- but I thought that not all name-usages rose to the level of
> >>>"defined concepts"?
> >>
> >>Who misspelled the name?
> >
> >
> > Let's say the author of a checklist, who did not intend to define a new
> > concept.
> >
> >
> >>If the person that did it was circumscribing a taxon then you create a
> >
> > concept
> >
> >>and possibly NameObject etc.
> >
> >
> > Certainly the author of the checklist *implied* a circumscribed
> taxon, but
> > he didn't really create one -- he was just using the name in
> the same sense
> > that the most recent revision (cited by the checklist) used it.  So the
> > checklist implied concept would be congruent with the revision implied
> > concept, but the checklist author mis-spelled the name.  I
> would like to see
> > a TC instance created for the checklist use of the name, and secondarily
> > establish it as congruent with the revision usage of the name.  But I
> > thought the TCS folks opposed this "inflation" of concepts?
> >
> >
> >>You could create an empty concept and have them as the according to
> >>if you don't know anything about their circumscription.
> >
> >
> > That would be fine with me too -- I have no fear of concept
> inflation (it
> > only costs us GUIDs, and GUIDs are cheap!)
> >
> >
> >>On the other hand if they misspelled it in hand writing, on a det slip,
> >>on a Friday afternoon 120 years ago and you don't believe anyone else
> >>has used that spelling I would be tempted not to create a concept or a
> >>name.
> >
> >
> > I tend to agree.  But there is a LOT of stuff between a
> hand-written note
> > 120 years old, and a full-blown concept definition.
> >
> >
> >>This kind of thing is more likely to be defined in ABCD anyhow. If you
> >>believe that the hand written scrawl could be a 'real name' as in it
> >>might actually have been published somewhere then you could create a
> >>NameObject and possibly a nominal concept if you want to det the
> >>specimen to it.
> >
> >
> > So does this mean that every single orthographic variant of every single
> > name (if used to apply to a concept somewhere) would have its own unique
> > Nominal concept?  Asked another way, is there a firm 1:1 ratio between
> > unique NameObjects, and unique Nominal Concepts? (this question stands
> > regardless of how we define a "unique NameObject")  If the
> answer is "yes",
> > then my case for embedding LC and NameObject identifiers within
> Nominal TC
> > instances is further strengthened.
> >
> >
> >>It is horses for courses on this.
> >
> >
> > I have absolutely no idea what that means. :-)
> >
> >
> >>That is why I would like to start talking about real life examples now.
> >
> >
> > Real-world examples are only going to make this conversation
> more confusing,
> > I think...but I'll comply below.
> >
> >
> >>The only trouble is you can't assume that some one knows they have a
> >>wrongly spelled name. They may mark it up as an accepted name because
> >>they think it is. Then we have to deal with it and the way we deal with
> >>it depends on who has concepts relating to it.
> >
> >
> > Certainly we have to allow for situations where the nomenclature is not
> > formally resolved. The way I would prefer to handle those cases
> is to enter
> > the VerbatimNameString in the TC instance, and if you can't point to a
> > well-defined NameObject (defined in the sense of populated LC elements),
> > then you don't include a reference to a NameObject - you simply
> provide the
> > VerbatimNameString.  If you don't know what the intended NameObject was,
> > then what's the point of linking to a NameObject, if all the
> elements of the
> > NameObject are empty?
> >
> > My feeling is to reserve NameObjects for well-defined names,
> and rely on the
> > VerbatimNameString by itself for all TC instances that cannot
> be linked to a
> > well-defined name object.
> >
> > The analogy would be when you have AccordingToSimple, but no
> > AccordingToDetailed.  Indeed, in my mind "VerbatimNameString" is what I
> > thought "NameSimple" was intended to be (the only reason I didn't use
> > "NameSimple" in place of "VerbatimNameString" is that some people
> > interpreted "NameSimple" as a concatenation of elements within
> NameDetailed,
> > and I wanted to explicitly define an element for "Name as spelled in the
> > AccordingTo publication").
> >
> > You said:
> >
> >>>c) If the author misspelled a name when they initially published it
> >>>(e.g. wrong gender)  there may be some concepts that use the incorrect
> >>>spelling and some that use the correct spelling. All these concepts
> >>>should be capable of being related to each other in terms of set
> >>>relationships.
> >
> >
> > Then I jumped the gun and said:
> >
> >
> >>Exactly!!!!
> >
> >
> > ...but now I see I read your words too quickly.  The relationship
> > between "Aus bus" and its orthographic variant "Aus bea" is *not* a
> > "set" relationship.  It is a singleobject-->singleobject relationship.
> > Concept circumscriptions represent sets of individuals, and therefore
> > can relate to each other with things like "includes" and "overlaps".
> > Names are unique individuals themselves, and have only two kinds of
> > relationships: "equal", and "not equal".
> >
> >
> >>>What really matters is that the
> >>>"Aus bea" authors and the "Aus bus" authors intended to refer to the
> >
> > same
> >
> >>>"name object" -- and it just seems like a no-brainer to me that you
> >
> > would
> >
> >>>represent this fact by linking both sets of TC instances to the same
> >>>NameObject instance.
> >>
> >>That would be great. All we need to do is know ahead of time whether a
> >
> > name
> >
> >>is a misspelling so we could decide not to mark it up.
> >
> >
> > No we don't.  We only need to know ahead of time whether we have
> > nomenclatural details about the name object the AccordingTo
> author intended,
> > or we don't.  If we do, then we provide a link to the appropriate name
> > object (no matter whether the AccordingTo author spelled it correctly or
> > not).  If we do not know what NameObject was intended, we populate
> > "VerbatimNameString" and stop there.  As I said before, if you
> don't know
> > anything about the NameObject that the AccordingTo author intended, then
> > there's no point in linking to a NameObject.
> >
> >
> >>In reality these objects will be created.
> >
> >
> > Why?  Why not stop at VerbatimNameString if you don't know what defined
> > NameObject to link to? Isn't that exactly what you would do in
> v0.95 (i.e.,
> > provide a NameSimple, but not provide any NameDetailed stuff?)
> >
> >
> >>They can be linked together if need be, either at the concept
> level (if we
> >>are comparing taxa based on the names) or at the nomenclatural level. I
> >>just can't see what your problem is here.
> >
> >
> > I guess my problem is that one of the few things that the LC
> group did agree
> > on (I think unanimously), was that orthographic variants,
> lapsuses (lapsi?),
> > etc. should *not* be treated as separate name objects.  What it
> sounds like
> > you are proposing in v0.95.2 is to disregard this, and go back to the
> > earlier notion that "if it's a different string of text
> characters, it's a
> > different name object".  It's fine if TCS finds this the most practical
> > solution for exchanging concepts and the names applied to those
> concepts,
> > but it is then very likely that LC will need to develop its own
> schema to
> > deal with the exchange of nomenclatural data.  That is the scenario I am
> > *desperately* trying to avoid.
> >
> >
> >>>I think it would be a mistake to go that route.  We don't need that
> >>>level of complexity, when a simple "VerbatimSpelling" would both
> >>>capture the human-readable reality of the text string that appeared in
> >>>the publication, and serve as the perfect field to search through for
> >>>text matches.
> >>
> >>We need the misspelling relationship because two Concepts (with
> >>different circumscriptions) may have names that have that relationship.
> >
> >
> > Concept 1: Aus bea Smith SEC. Smith
> > Concept 2: Aus bus Smith SEC. Jones
> >
> > You propose two NameObjects:
> > Name 1: Aus bea Smith
> > Name 2: Aus bus Smith
> >
> > Concept 1 links to Name 1
> > Concept 2 links to Name 2
> > Name 1 links to Name 2 via "is misspelling of" relationship.
> >
> > Correct?
> >
> > I think it makes more sense to have one name object, which would have LC
> > elements to document both "Original Orthography" and "Code-Correct
> > Orthography" (among other LC elements to document nomenclaturally
> > important stuff).
> >
> > Both Concepts (1 & 2) would then link to the same NameObject (if we
> > already know that one is a misspelling of the other, then we already
> > know what the "correct" Name/spelling is). Each concept would have its
> > own VerbatimNameString to track how each spelled the name, but the
> > important thing is that they would be automatically linked to each
> > other via a shared NameObjectID.
> >
> >
> >>We can't say they are the same thing simply because they both use
> >
> > different
> >
> >>versions of the same name.
> >
> >
> > We wouldn't need to!  Sharing the same NameObjectID does not in any way
> > imply congruency between the two circumscriptions.
> >
> >
> >>We have to relate them with 'is congruent' or 'overlaps' or something
> >
> >
> > Exactly!  That's a different part of the TCS.
> >
> >
> >>but it would also be useful to say the equivalent of "we believe this
> >
> > taxon
> >
> >>uses a different spelling of the same scientific name as that taxon"
> >
> >
> > ...which you would automatically have by virtue of the fact that both
> > concepts are linked to the same NameObject, but have unequal
> > VerbatimNameString values. Which of the two VerbatimNameString values
> > is correct (if either) is irrelevant to either Concept instance, but
> > could be determined from elements of the NameObject.
> >
> >
> >>You could do this without even creating a name object for one of the
> >>TaxonConcepts if you like. The thing is pretty damn flexible.
> >
> >
> > If you didn't create a NameObject for both, then how would you know that
> > they used different name spellings? I don't see any place within
> > TaxonConcept where a namestring (NameSimple) could be stored.
> >
> >
> >>Things will become clear as we work through real examples. I am
> currently
> >
> > doing
> >
> >>the ones on the LC wiki and will post them when complete. Can't
> guarantee
> >
> > I will
> >
> >>get through them all though as it is quite time consuming.
> >
> >
> > O.K., here's one for you.  I didn't choose it for any reason other than
> > I happened to have a need to sort through it yesterday.  It has a few
> > odd things, but is not especially thorny or problematic (believe me --
> > if you want tough ones, I can give you tough ones!) I can send you full
> > details for publications, vouchers, etc. -- but this should be enough
> > to get you started.
> >
> > ICZN (fishes)
> >
> > Linnaeus (1758) established the genus "Gobius" on p. 262.
> > He included these new species in this genus:
> > G. anguillaris (p.264)
> > G. aphya (p.263)
> > G. eleotris (p. 263)
> > G. jozo (p. 263)
> > G. niger (p. 262)
> > G. paganellus (p. 263)
> > G. pectinirostris (p. 264)
> >
> > Linnaeus (1766) added this species to his genus "Gobius":
> > G. barbarus (p.450)
> >
> > The type species of this genus was subsequently designated as
> Gobius niger
> > Linnaeus 1758:262 by Gill (1863:268).
> > This genus has been placed on the ICZN "Official List" (Opinion 77,
> > Direction 56).
> >
> > Gmelin (1789) added these new species in the genus Gobius Linnaeus:
> > G. arabicus (p. 1198)
> > G. bicolor (p. 1197)
> > G. cruentatus (p. 1197)
> > G. gronovii (p. 1205)
> > G. melanuros (p. 1201)
> > G. pisonis (p. 1206)
> >
> > Gronow (1763) used the genus name "Eleotris". However, this
> work has been
> > rejected by the ICZN, and thus this name has been placed on the ICZN
> > "Official Index" (Direction 56).
> >
> > Bloch & Schneider (1801) used Gronow's name "Eleotris" on p. 65.
> > They described these new species in this genus:
> > E. lanceolata (p. 67)
> > E. mauritii (p. 66)
> > They included Gobius pisonis Gmelin 1789 among the species in this
> > genus.
> > Because Gronow's name has been rejected, Bloch & Schneider are
> > considered the authors of this genus name.
> > In Opinion 93, Direction 56, ICZN used its plenary powers to establish
> > Gobius pisonis Gmelin 1789 as the type species of this genus.
> >
> > Rüppell (1830) established the new genus "Asterropterix" on p. 138.
> > In the same publication (same page), he also described the new species,
> > "Asterropterix semipunctatus".  Because this was the only
> species Rüppell
> > included in his new genus, this species is established as the
> type species
> > of the genus (monotypy).
> > In the same publication, he included an illustration of a
> specimen that now
> > bears the catalog number SMF 1691, and is the Holotype of his
> new species A.
> > semipunctatus.  In the caption for that figure, Rüppell spelled the name
> > "Asterropteryx semipunctatus" ("-yx" for the genus, instead of
> "-ix" for the
> > genus).
> >
> > Rüppell (1835) listed the same genus and species, but consistently
> > spelled the genus "Asterropteryx". In doing so, by ICZN rules he serves
> > as the "first reviser", and thereby establishes the "-yx" spelling of
> > the genus as the "correct original spelling".  We can only assume that
> > Rüppell's 1835 concept circumscription of the species is congruent to
> > his 1830 concept circumscription. Because the genus is monotypic in
> > both publications, we can also assume congruency between the two genus
> > concept circumscriptions.
> >
> > Bleeker (1855) described these new species within the genus
> Eleotris Bl. &
> > Sch. (although Bleeker attributed the genus to Gronow):
> > E. cyanostigma (p. 452)
> > E. heteropterus (p. 422)
> >
> > Bleeker (1874a) established the new genus Brachyeleotris (p. 306). He
> > designated Eleotris cyanostigma Bleeker (1855) as the type species.
> >
> > Bleeker (1874b) described the new species ensifera (p. 375),
> and included it
> > within his genus Brachyeleotris.
> >
> > Snyder (1904) placed the species Eleotris cyanostigma Bleeker
> (1855) within
> > the genus "Asterropterix" (incorrect spelling).
> >
> > Whitley (1932) described the new subspecies, "Asterropterix
> semipunctatus
> > quisqualis" (incorrect spelling of genus).
> >
> > Dor (1984) regarded Eleotris cyanostigma Bleeker (1855) to be a junior
> > synonym of Asterropteryx semipunctatus Rüppell (1830).
> > In doing so, he (by definition) also considered the genus Brachyeleotris
> > Bleeker (1874a) to be a junior synonym of Asterropteryx Rüppell (1830).
> >
> > Randall et al. (1997) placed Brachyeleotris ensifera Bleeker
> (1874b) in the
> > genus "Asterropteryx" (correct spelling), and spelled the
> species epithet
> > "ensiferus".
> >
> > Privitera (2001) published on the reproductive biology of Asterropteryx
> > semipunctatus Rüppell, but pointed out that the genus "Asterropteryx" is
> > feminine, and thus spelled A. semipunctatus as "Asterropteryx
> semipunctata".
> >
> > Nakabo (2002) also recognized the feminine gender of Asterropteryx, and
> > followed Dor in placing Brachyeleotris ensifera Bleeker (1874b) in
> > Asterropteryx, and thus spelled it "Asterropteryx ensifera".
> >
> > Allen & Adrim (2003) also placed Brachyeleotris ensifera
> Bleeker (1874b) in
> > the genus Asterropteryx, but spelled the species epithet "ensifer".
> >
> > Randall et al. (2004) used the spelling "Asterropteryx ensiferus".
> >
> > Greenfield & Randall (2004) mistakenly used the spelling "Asterropterix
> > semipunctatus" in their treatment of that species.
> >
> > O.K., that's probably enough.  Believe it or not, this is highly
> > simplified -- the real situation is about ten times more complicated!!
> > And here's the amazing part:  I did NOT deliberately pick a very
> > complicated case.  All I wanted to do was illustrate the real-world
> > Asterropteryx/Asterropterix and semipunctatus/semipunctata spelling
> > issues.  All the other stuff emerged as I was writing the above, just
> > trying to represent type species and such.  I didn't even include
> > homonyms in this case.  The point is, this level of complexity is NOT
> > unusual -- in fact, it's disproportionately simplified.
> >
> > Aloha,
> > Rich
> >
> >
> >
> > _______________________________________________
> > Tcs-lc mailing list
> > Tcs-lc at ecoinformatics.org
> > http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/tcs-lc
> >
>
> --
> Robert A. Morris
> Professor of Computer Science
> UMASS-Boston
> http://www.cs.umb.edu/~ram
> phone (+1)617 287 6466