[tcs-lc] nameObjects, spellings, vernaculars, etc

Nozomi Ytow nozomi at biol.tsukuba.ac.jp
Fri Apr 29 21:04:01 PDT 2005


Rich,

> > I see four primary forms being important. (1) The raw
> > character string, whatever it is, including vernaculars,
> I've used the term "VerbatimNameString" to refer to this, but for LC, James
> proposed "Name-literal" (I think this is what he meant by that
> term).

I'm not sure. If 'raw' means the XML document representation of a
character string as it appears in a database, then it is a name-string.
A TCS document (an XML document, I mean, not the documentation on TCS) is
encoded in UTF-8, and Unicode allows multiple code points to represent
the same character, e.g. the hybrid symbol or the so-called half- and
full-width Katakana. These have different code values, and hence
different UTF-8 strings, but they represent the same name-literal.
Variants at this level can be resolved by Unicode normalisation.
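As a rough illustration of that last point (a sketch, not part of TCS), the following shows how NFKC normalisation in Python's standard `unicodedata` module folds half-width Katakana into full-width Katakana. The choice of NFKC, the helper name `same_name_literal`, and the sample vernacular are assumptions made for illustration only.

```python
import unicodedata


def same_name_literal(a: str, b: str, form: str = "NFKC") -> bool:
    """Return True if two name-strings are equal after Unicode normalisation.

    Assumption: NFKC is the normalisation form used; it folds compatibility
    variants such as half-width Katakana into their full-width equivalents.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


half = "ｿﾒｲﾖｼﾉ"   # half-width Katakana
full = "ソメイヨシノ"  # full-width Katakana

# Different code points, hence different UTF-8 byte strings...
print(half.encode("utf-8") == full.encode("utf-8"))  # False
# ...but the same name-literal once normalised.
print(same_name_literal(half, full))  # True
# A half-width voiced mark is a separate code point; NFKC composes it.
print(unicodedata.normalize("NFKC", "ｶﾞ") == "ガ")  # True
```

NFC alone would keep the half- and full-width forms distinct, because they are compatibility (not canonical) variants; that is why the sketch defaults to NFKC. NFKC also folds other compatibility characters, so whether it is too aggressive for name-literals would need checking.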

However, in Bob's context, 'raw' seems to imply each name-usage, in my
terminology. I suspect Bob's category #4 is what should be covered
by nameObject, so its 'value' could be a name-literal, while #1-#3 are
name-usages/TaxonConcept, so their 'value' could be a
name-string. Categories #2 and #3 could also be potential taxon names.
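To make the nameObject / name-usage split a little more concrete, here is a minimal Python sketch under hypothetical assumptions: a nameObject is keyed by its name-literal, each name-usage keeps its verbatim name-string, and NFKC normalisation is the only step between the two. The class and function names are illustrative, not TCS elements.

```python
import unicodedata
from dataclasses import dataclass, field


@dataclass
class NameUsage:
    """Hypothetical name-usage: its value is the name-string as found in a source."""
    name_string: str


@dataclass
class NameObject:
    """Hypothetical nameObject: its value is a name-literal."""
    literal: str
    usages: list[NameUsage] = field(default_factory=list)


def to_literal(name_string: str) -> str:
    # Assumption: NFKC normalisation is the only step from name-string to name-literal.
    return unicodedata.normalize("NFKC", name_string)


def group_usages(usages: list[NameUsage]) -> dict[str, NameObject]:
    """Attach each name-usage to the nameObject for its name-literal."""
    objects: dict[str, NameObject] = {}
    for usage in usages:
        literal = to_literal(usage.name_string)
        if literal not in objects:
            objects[literal] = NameObject(literal)
        objects[literal].usages.append(usage)
    return objects


usages = [NameUsage("ソメイヨシノ"), NameUsage("ｿﾒｲﾖｼﾉ")]
objects = group_usages(usages)
print(len(objects))  # 1
print([u.name_string for u in objects["ソメイヨシノ"].usages])
```

Here two distinct name-strings end up under a single nameObject; a real model would likely need more than normalisation (e.g. spelling variants) to decide what counts as the same name-literal.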

Cheers,
JMS
--
Dr. Nozomi "James" Ytow
Institute of Biological Sciences / Gene research center
University of Tsukuba
Tsukuba, Ibaraki 305-8572
Japan

