[tcs-lc] Misspelling db response
Gregor Hagedorn
G.Hagedorn at BBA.DE
Tue May 3 03:45:37 PDT 2005
I agree, relating orthographic variants and misspellings to accepted names is
THE problem that I have when working with name data. My own statistics are that
only 30% of names have the same spelling as in a standard name source.
For botanical names, theoretically one may want to distinguish between
eligible variants, the choice of which depends on your knowledge of Greek and
Latin (and I prefer this to be someone else's knowledge) or on nomenclatural
name canonicalization rules in ICBN, and plain stupid typographic mistakes.
However, Paul said correctly that one person's spelling is another person's
misspelling, and I believe it is not very fruitful to distinguish between these
two categories; at least this should be optional.
I think orthographic variant should be interpreted in a neutral sense, to
encompass all this. And over time an orthographic variant may become the
correct name, and vice versa.
However, a quality issue does exist. Therefore, when Charles asks:
> Who should put in this effort? -
> 1) Data provider (list compiler)
> 2) Data collator (manager of database incorporating several lists, e.g.
> Fauna Europaea, ERMS)
> 3) Third party body (a nameserver e.g. GBIF ECAT)
> 4) Individual user (up to user to research possible alternatives that they
> may need to use as search terms)
> 5) Users collectively (through online editing tool - e.g. IPNI project)
I believe we should have
a) the nomenclators provide their knowledge about orthographic variants,
flagging by some means the name used in the original publication (which - I
believe in contrast to ICZN - in ICBN may NOT be the correct one). These are
high-quality name variants checked by editors.
b) Wherever name-based knowledge is related to standard name objects,
automatically knowledge about name variants is generated. Whether the name data
are in a molecular database (names as submitted to GenBank!), in a specimen
collection, or taken from the literature as in checklists or host-parasite
lists - as soon as in addition to the original name in the source a URI to some
name service is added, a name variant is implicitly known.
c) to improve the efficiency of biodiversity informatics, a separate service
aggregating misspellings from various sources would be highly desirable. This
could be run perhaps by GBIF.
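
To make point b) concrete, here is a minimal sketch of how name variants fall
out of such links. It assumes each source record carries the name as written
plus a URI of a standard name object; the record layout, the function name
aggregate_variants, and the example URIs are hypothetical, not part of any
existing service.

```python
from collections import defaultdict


def aggregate_variants(records):
    """Group the spellings found in sources by the name URI they were linked to."""
    variants = defaultdict(set)
    for verbatim_name, name_uri in records:
        variants[name_uri].add(verbatim_name)
    return dict(variants)


# Hypothetical records: (name as written in the source, URI of the name object).
records = [
    ("Puccinia graminis", "urn:example:name:1"),
    ("Puccinia graminnis", "urn:example:name:1"),  # typo found in some source
    ("Erysiphe graminis", "urn:example:name:2"),
]

variants_by_uri = aggregate_variants(records)
```

Every spelling collected under one URI is a known variant of that name,
whether it came from a nomenclator, a collection, or a checklist.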
A major task of the integrator service is to inform about "name homonyms", i.e.
names that ambiguously point to multiple nomenclatural objects. This is much
more efficiently done once data are integrated. In my own work I find that many
names can NOT be assigned unambiguously, at least not out of context. Where
homonyms are frequent (as they are in fungi), it is not unusual that a name
with an unusual or missing author abbreviation (many old names use non-standard
one-letter abbreviations) can be mapped only in a context-dependent way.
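
As a rough sketch of this task, assuming the integrator holds pairs of (name as
written, name URI), the ambiguous names are simply those linked to more than
one URI. The function name find_ambiguous_names, the placeholder names, and the
URIs are hypothetical.

```python
from collections import defaultdict


def find_ambiguous_names(records):
    """Return the names that have been linked to more than one name object."""
    targets = defaultdict(set)
    for verbatim_name, name_uri in records:
        targets[verbatim_name].add(name_uri)
    return {name: uris for name, uris in targets.items() if len(uris) > 1}


# Hypothetical records; "Aus bus" without author points to two different objects.
records = [
    ("Aus bus", "urn:example:name:10"),
    ("Aus bus", "urn:example:name:11"),
    ("Aus bus Smith", "urn:example:name:10"),
]

ambiguous = find_ambiguous_names(records)
```

Such names can then be reported as needing context (author, host, source)
before they are mapped.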
The integrator should be able to deal with "false assignments" and allow users
to contradict them. Plain stupid errors are made not only when typing a name,
but also when relating names. A feedback mechanism is desirable, but I would
hope that data are contradictable on the integrator level to achieve immediate
results.
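
One possible way to keep assignments contradictable, sketched under the
assumption that the integrator stores each assignment as its own record. The
Assignment class, its fields, and active_assignments are hypothetical names
used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Assignment:
    """A claim that a name as written refers to a given name object."""
    verbatim_name: str
    name_uri: str
    contradicted_by: list = field(default_factory=list)

    def contradict(self, who, reason):
        """Record an objection without deleting the original claim."""
        self.contradicted_by.append((who, reason))

    @property
    def is_contradicted(self):
        return bool(self.contradicted_by)


def active_assignments(assignments):
    """Return only the assignments nobody has contradicted."""
    return [a for a in assignments if not a.is_contradicted]


# Hypothetical data: the second assignment is a relating error.
assignments = [
    Assignment("Aus bus", "urn:example:name:10"),
    Assignment("Aus bus", "urn:example:name:99"),
]
assignments[1].contradict("some user", "wrong name object")
remaining = active_assignments(assignments)
```

The contradiction takes effect at once on the integrator level, while the
original claim stays available for feedback to the data provider.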
Gregor
----------------------------------------------------------
Gregor Hagedorn (G.Hagedorn at bba.de)
Institute for Plant Virology, Microbiology, and Biosafety
Federal Research Center for Agriculture and Forestry (BBA)
Königin-Luise-Str. 19 Tel: +49-30-8304-2220
14195 Berlin, Germany Fax: +49-30-8304-2203