[tcs-lc] Misspelling db response

Gregor Hagedorn G.Hagedorn at BBA.DE
Tue May 3 03:45:37 PDT 2005


I agree, relating orthographic variants and misspellings to accepted names is 
THE problem that I have when working with name data. My own statistics are that 
only 30% of names have the same spelling as in a standard name source.

For botanical names, theoretically one may want to distinguish between 
eligible variants, the choice of which depends on your knowledge of Greek and 
Latin (and I prefer this to be someone else's knowledge) or on nomenclatural 
name canonicalization rules in ICBN, and plain stupid typographic mistakes. 
However, Paul said correctly that one person's spelling is another person's 
misspelling, and I believe it is not very fruitful to distinguish between these 
two categories; at least this should be optional.

I think "orthographic variant" should be interpreted in a neutral sense, to 
encompass all of this. And over time an orthographic variant may become the 
correct name, and vice versa.

However, a quality issue does exist. Therefore, when Charles asks:

> Who should put in this effort? -
> 1) Data provider (list compiler)
> 2) Data collator (manager of database incorporating several lists, e.g.
> Fauna Europaea, ERMS)
> 3) Third party body (a nameserver e.g. GBIF ECAT)
> 4) Individual user (up to user to research possible alternatives that they
> may need to use as search terms)
> 5) Users collectively (through online editing tool - e.g. IPNI project)

I believe we should have 

a) the nomenclators provide their knowledge about orthographic variants, 
flagging by some means the name used in the original publication (which - I 
believe in contrast to ICZN - in ICBN may NOT be the correct one). These are 
high-quality name variants checked by editors.

b) Wherever name-based knowledge is related to standard name objects, 
knowledge about name variants is generated automatically. Whether the name data 
are in a molecular database (names as submitted to GenBank!), in a specimen 
collection, or taken from the literature as in checklists or host-parasite 
lists - as soon as a URI to some name service is added alongside the original 
name in the source, a name variant is implicitly known.

c) to improve the efficiency of biodiversity informatics, a separate service 
aggregating misspellings from various sources would be highly desirable. This 
could be run perhaps by GBIF. 
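
To make (b) and (c) concrete, here is a minimal Python sketch of how an 
aggregating service might derive name variants from records that carry both 
the original name and a name-service URI. The record layout, the function name 
collect_variants, the urn:example URIs and the sample data are all assumptions 
made for illustration only; no existing service is implied.

```python
from collections import defaultdict

def collect_variants(records, standard_names):
    """Group verbatim spellings that differ from the standard spelling.

    records: iterable of (verbatim_name, name_uri, source) tuples (assumed layout).
    standard_names: dict mapping name_uri -> spelling in the standard name source.
    Returns {name_uri: {variant_spelling: set_of_sources}}.
    """
    variants = defaultdict(lambda: defaultdict(set))
    for verbatim, uri, source in records:
        standard = standard_names.get(uri)
        if standard is None or verbatim == standard:
            continue  # unknown URI, or spelling already matches the standard
        variants[uri][verbatim].add(source)
    return variants

# Hypothetical data: the URI and the source labels are invented for this sketch.
standard_names = {"urn:example:name:1": "Puccinia graminis"}
records = [
    ("Puccinia graminis", "urn:example:name:1", "checklist A"),
    ("Puccinia gramminis", "urn:example:name:1", "specimen database B"),
    ("Puccinia gramminis", "urn:example:name:1", "host-parasite list C"),
]

variants = collect_variants(records, standard_names)
for uri, spellings in variants.items():
    for spelling, sources in spellings.items():
        print(uri, spelling, sorted(sources))
```

Note that the sketch is deliberately neutral: it records every spelling that 
differs from the standard source as a variant and does not try to decide 
whether it is an eligible variant or a plain typing mistake.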

A major task of the integrator service is to inform about "name homonyms", i.e. 
names that ambiguously point to multiple nomenclatural objects. This is much 
more efficiently done once data are integrated. In my own work I find that 
many names can NOT be assigned unambiguously, at least not out of context. 
Where homonyms are frequent (as they are in fungi), it is not unusual that a 
name with an unusual or missing author abbreviation (many old names use 
non-standard one-letter abbreviations) can be mapped only in a 
context-dependent way.
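
As a rough illustration of the homonym problem, the following Python sketch 
indexes nomenclatural objects by name string and reports every candidate a 
name could point to. The names "Aus bus", the authors, the urn:example URIs 
and the functions build_index and resolve are placeholders invented for this 
example; a real integrator would need proper author-abbreviation matching and 
contextual information.

```python
from collections import defaultdict

def build_index(name_objects):
    """Index nomenclatural objects by name string (without author).

    name_objects: iterable of (uri, name, author) tuples (assumed layout).
    """
    index = defaultdict(list)
    for uri, name, author in name_objects:
        index[name].append((uri, author))
    return index

def resolve(index, name, author=None):
    """Return the URIs of all candidate objects for a name.

    If an author string is given and matches exactly, the candidates are
    narrowed. A missing or non-matching author leaves all candidates in
    place, so the caller still sees that the name is ambiguous.
    """
    candidates = index.get(name, [])
    if author is not None:
        narrowed = [c for c in candidates if c[1] == author]
        if narrowed:
            candidates = narrowed
    return [uri for uri, _ in candidates]

# Placeholder data: two homonyms and one unambiguous name.
index = build_index([
    ("urn:example:name:10", "Aus bus", "Smith"),
    ("urn:example:name:11", "Aus bus", "Jones"),
    ("urn:example:name:12", "Aus cus", "Smith"),
])

print(resolve(index, "Aus bus"))           # two candidates: a homonym
print(resolve(index, "Aus bus", "Smith"))  # narrowed by author
print(resolve(index, "Aus bus", "S."))     # non-standard author: still two
```

A result with more than one URI is exactly the case where the mapping can only 
be decided from context.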

The integrator should be able to deal with "false assignments" and allow users 
to contradict them. Plain stupid errors are made not only when typing a name, 
but also when relating names to each other. A feedback mechanism to the 
original source is desirable, but I would hope that data can be contradicted 
at the integrator level to achieve immediate results.

Gregor
----------------------------------------------------------
Gregor Hagedorn (G.Hagedorn at bba.de)
Institute for Plant Virology, Microbiology, and Biosafety
Federal Research Center for Agriculture and Forestry (BBA)
Königin-Luise-Str. 19           Tel: +49-30-8304-2220
14195 Berlin, Germany           Fax: +49-30-8304-2203


