[tcs-lc] nameObjects, spellings, vernaculars, etc

Roger Hyam roger at hyam.net
Fri May 6 07:29:53 PDT 2005


The 'Simple' elements are for holding summaries of what is in the 
Canonical elements. This is so that data can be exchanged between 
machines that can't handle the Canonical form. This is a straightforward 
practicality to make the schema as widely applicable as possible. (They 
may have had other reasons for existence at birth, but that is what they 
have turned into.)
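
To make the idea concrete, here is a minimal sketch in Python rather than 
XML. The class and field names (CanonicalName, genus, epithet, authorship, 
to_simple) are made up for illustration and are not taken from the TCS 
schema. The only point is that the simple form is derived from the 
structured one rather than maintained separately.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CanonicalName:
    """Hypothetical structured ('canonical') form of a name; not the real TCS layout."""
    genus: str
    epithet: Optional[str] = None
    authorship: Optional[str] = None

    def to_simple(self) -> str:
        """Flatten the structured parts into one summary string."""
        parts = [self.genus, self.epithet, self.authorship]
        # Skip any part that is missing so a bare genus still works.
        return " ".join(p for p in parts if p)


name = CanonicalName(genus="Bellis", epithet="perennis", authorship="L.")
print(name.to_simple())
```

A consumer that cannot parse the structured form would read only the 
flattened string.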

If you want to represent an alternative spelling of a name use a 
TaxonConcept and/or a NameObject and relationships to express it as a 
misspelling of the name. This avoids any new constructs. I posted an 
instance document example of this, and there is an open issue about whether 
we introduce new relationship types in the next version to make this clearer.
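
Roughly, the approach could look like the sketch below. It is Python, not 
a TCS instance document, and everything in it is an assumption made for 
illustration: the NameObject and Dataset classes, the ids, the 
relationship type string "is misspelling of", and the misspelt example 
name itself.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class NameObject:
    """Hypothetical stand-in for a NameObject: an id plus the name string."""
    id: str
    label: str


@dataclass
class Dataset:
    names: List[NameObject] = field(default_factory=list)
    # (from_id, relationship_type, to_id); the type string is an assumption
    relationships: List[Tuple[str, str, str]] = field(default_factory=list)

    def misspellings_of(self, name_id: str) -> List[str]:
        """Labels of every name recorded as a misspelling of name_id."""
        labels = {n.id: n.label for n in self.names}
        return [labels[src] for src, rel, dst in self.relationships
                if rel == "is misspelling of" and dst == name_id]


ds = Dataset()
ds.names.append(NameObject("n1", "Bellis perennis"))
ds.names.append(NameObject("n2", "Bellis perenis"))  # variant spelling, stored as-is
ds.relationships.append(("n2", "is misspelling of", "n1"))
print(ds.misspellings_of("n1"))
```

The variant spelling is just another name with a relationship attached, so 
no extra "verbatim" field is needed on either object.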

Adding a Verbatim element is a slippery slope to providing alternative 
versions of every field we have, which would be far more complex than 
just using NameObjects when we need to.

Just for the record, a NameObject is a data structure for holding pieces 
of data that resemble scientific biological names governed by one of the 
codes (ICBN, ICZN, ICNCP etc.). It is not just for 
validly/effectively/accepted/nice names; you can store anything you like 
in it that looks like it might be code compliant. You can then make up 
TaxonConcepts that use it and synonymise these concepts into the ones 
you like. It really is pretty simple.
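
As a rough sketch of that last step (Python, with a made-up TaxonConcept 
class, a made-up synonym_of field and a hypothetical resolve helper, none 
of which come from the schema), concepts built on any name string can be 
chained by synonymy and followed back to the one you prefer. The names 
used are the usual placeholders, not real taxa.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TaxonConcept:
    """Hypothetical concept: a name used in some sense, optionally a synonym."""
    id: str
    name_label: str                   # whatever string the NameObject holds, unchecked
    synonym_of: Optional[str] = None  # id of the concept it is synonymised into


def resolve(concepts: Dict[str, TaxonConcept], concept_id: str) -> TaxonConcept:
    """Follow synonym links until reaching a concept that is not a synonym."""
    seen = set()
    current = concepts[concept_id]
    while current.synonym_of is not None:
        if current.id in seen:
            raise ValueError("circular synonymy at " + current.id)
        seen.add(current.id)
        current = concepts[current.synonym_of]
    return current


concepts = {
    "c1": TaxonConcept("c1", "Aus bus"),                    # the concept you like
    "c2": TaxonConcept("c2", "Aus buss", synonym_of="c1"),  # odd spelling, still stored
    "c3": TaxonConcept("c3", "Xus bus", synonym_of="c2"),
}
print(resolve(concepts, "c3").name_label)
```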

Been working on ICNCP coverage yesterday/today and they make the 
zoological code actually look quite sensible! Just kidding. Hope to have 
coverage in the next version with only very minor changes.

Hope this all makes sense,

Roger


Sally Hinchcliffe wrote:

>Rich
>
>I think we agree with pretty much everything (hurray!). It does seem 
>to me that BOTH name-string-things (canonical name and verbatim name) 
>belong in the LC part of the schema
>The only thing I would ask is whether, with the LC objects pulled out 
>of the body of the TCS element, the TCS still needs a placeholder 
>string with the _canonical_ name just for readability.
>
>Rich - yes, I mean scientific-name-as-spelled-by-the-author not 
>author-name-as-spelled-by-the-author (the sooner you zoologists get 
>the equivalent of the botanists' standard abbreviations the better!)
>
>Paul - as far as I remember canonical name means the scientific name 
>as correctly rendered according to the code: so no mis-spellings of 
>the genus, mismatched genders, quadrinomials or other extra ranks
>
>Sally
>  
>
>>>If the NameSimple element of TCS IS intended for verbatim spelling
>>>then it needs another name, for clarity.
>>>      
>>>
>>The history of "NameSimple" in TCS precedes LC/Christchurch.  I'm not sure
>>of its original intent, but I suspect it was created without a lot of
>>thought to the distinction between "Code-correct" and "verbatim"
>>name-strings (no criticism intended -- to be honest, I hadn't thought much
>>about it either before we really started discussing LC).  In Christchurch, I
>>think the LC breakout groups sort of assumed it would be a canonical
>>concatenation....but that was before "Label" was introduced as a root
>>element in LC.  I've asked the question a couple of times (what is the
>>specific function of NameSimple), and I even re-christened it "NameVerbatim"
>>in the version of TCS/LC that I sent (it didn't seem to get much
>>traction...), to achieve exactly what you are suggesting.
>>
>>    
>>
>>>Otherwise it should be
>>>exactly the same as the Label element of LC and be in the canonical
>>>form, with another element there for the verbatim or as-published
>>>spelling.
>>>      
>>>
>>Why does there need to be two elements, with different names, in different
>>parts of the overall schema, that share exactly the same purpose?  One could
>>argue that it would serve the function of canonical name in cases where one
>>only has the name, and has not (yet) established a link with a full
>>canonical name object.  But I would counter that argument with the point
>>that, such cases imply that one has not identified the proper canonical
>>name, and as such, what else would one have, besides the verbatim name?
>>
>>That said, I would STRONGLY advocate re-naming to "NameVerbatim", or
>>"VerbatimSpelling", or something like that.
>>
>>    
>>
>>>A well designed schema should have element names which 'do
>>>exactly what it says on the tin'
>>>      
>>>
>>Actually, I think "NameSimple" leaves a lot of latitude for interpretation.
>>
>>    
>>
>>>- OK - I see your point. As it happens, IPNI will only be recording
>>>the first publication of a name, so the number of orthographic
>>>variants is limited to the original spelling of the author, plus any
>>>corrections (or mistakes) made by IPNI rendering that into canonical
>>>form. But other databases of course will record more than the first
>>>use of a name. In that case each publication instance can come with
>>>only one orthographic variant (unless the author has been
>>>inconsistent within the article or book).
>>>      
>>>
>>There are cases (more than just a rare few) where a single author will use
>>more than one spelling in the same publication.  Sometimes this is clearly a
>>lapsus or printer's error, and can be safely ignored or mentioned in a
>>human-readable comment somewhere.  At the other extreme are cases where the
>>author used two different spellings where it's not so obviously a lapsus. I
>>can give examples, if you're interested.  But I don't think this is
>>something that needs to shape the structure of LC.  I would say it should
>>support only one verbatim spelling per AccordingTo publication/NameObject
>>instance.
>>
>>Also, when you say "original spelling of the author" -- can I safely assume
>>you mean "original spelling of the scientific name by the original author",
>>and not "original spelling of the author's name" (e.g., "L." vs.
>>"Linnaeus")?
>>
>>    
>>
>>>To me it seems simple (I know you will correct me on this point) -
>>>each concept will have one publication instance and hence one
>>>orthographic rendering, which may be reproducibly correctable to one
>>>canonical form.
>>>      
>>>
>>No corrections!  This is exactly what I feel as well!
>>
>>    
>>
>>>Therefore the LC part of the schema needs to have a
>>>place where the (single) 'as published' name goes, plus a place
>>>(Label) where the canonical form goes.
>>>      
>>>
>>In LC, I assume you mean the "as published" name is the verbatim name as it
>>appeared in the original description/protologue?  If so, yes!
>>
>>    
>>
>>>I thought this was in the schema already.
>>>      
>>>
>>I thought that's what "OriginalOrthography" was for (an element I
>>wholeheartedly support, because this is a special-case "Verbatim" spelling,
>>separate from the concept instance).
>>
>>    
>>
>>>Multiple versions of the same name-object will be
>>>mapped onto each other by mapping concepts to concepts, because each
>>>version should have a publication-instance of some sort.
>>>      
>>>
>>Yes, but if Names are treated as stand-alone objects (as in v0.95.5), then
>>the multiple verbatim renderings of the same "name" will also be
>>cross-linked to each other by virtue of the fact that all of these concept
>>instances will point to the same "NameObject" (LC instance).  Thus, the
>>name-links would exist even without the concept-concept mappings.
>>
>>    
>>
>>>I think Rich and I are in agreement here ...
>>>      
>>>
>>As do I!! :-)
>>
>>    
>>
>>>as to what constitutes a name object, I leave that to the real
>>>taxonomists
>>>      
>>>
>>So far (as in my previous), it seems to be:
>>
>>Botany View:
>>"GenusOrMonomial Name-Unit [+ species Name-Unit [+ tertiary Name-Unit +
>>tertiary Name-Rank]]"
>>
>>Zoological view:
>>"Name-Unit"
>>
>>Just so everyone is clear, "Name-Unit" is not simply the string of
>>characters that form a single component of a scientific name.  Rather,
>>"Name-Unit" implies a well-defined "object", with multiple inherent
>>properties such as the creation event (=protologue), and many/most of the
>>elements in LC.  It's what I would call a "Protonym".
>>
>>Any other candidates to define a "NameObject"???
>>
>>Aloha,
>>Rich
>>
>>
>>_______________________________________________
>>Tcs-lc mailing list
>>Tcs-lc at ecoinformatics.org
>>http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/tcs-lc
>>    
>>
>
>*** Sally Hinchcliffe
>*** Computer section, Royal Botanic Gardens, Kew
>*** tel: +44 (0)20 8332 5708
>*** S.Hinchcliffe at rbgkew.org.uk
>

-- 

==============================================
 Roger Hyam
----------------------------------------------
 Biodiversity Informatics
 Independent Web Development 
----------------------------------------------
 http://www.hyam.net  roger at hyam.net
----------------------------------------------
 2 Janefield Rise, Lauder, TD2 6SP, UK.
 T: +44 (0)1578 722782 M: +44 (0)7890 341847
==============================================

