[tcs-lc] Modularisation of standards - identification of names

Tue Mar 8 07:26:37 PST 2005

Sally wrote:
>
>On the second part of Donald's email:
>
>> (Now that I think of it) I guess there is one other possible reason why we
>> may wish to be able to separate names out.  This is a data processing issue
>> and Bob Morris may just tell me that my problem comes from assuming a
>> particular implementation, but here goes...
>> 
>> Assume a document in which two concepts refer to the same published name
>> (using an abbreviated representation of TCS data):
>> 
>> <TaxonConcepts>
>>   <TaxonConcept id="tc1">
>>     <Name>
>>       <Label>Aus bus</Label>
>>       <CanonicalAuthorship>Black, 1965</CanonicalAuthorship>
>>     </Name>
>>     <AccordingTo>Smith</AccordingTo>
>>   </TaxonConcept>
>>   <TaxonConcept id="tc2">
>>     <Name>
>>       <Label>Aus bus</Label>
>>       <CanonicalAuthorship>Black, 1965</CanonicalAuthorship>
>>     </Name>
>>     <AccordingTo>Jones</AccordingTo>
>>   </TaxonConcept>
>> </TaxonConcepts>
>> 
>> The nomenclatural data here under name is rather simple and there may be
>> little problem with denormalising it.  A DiGIR-style search could allow a
>> user to find all TaxonConcepts based on "Aus bus Black, 1965".
>> 
>> However will it ever matter to an application processing such a document
>> that the two <Name> elements are the same?  Do we need a better way to
>> indicate this than simply relying on the byte-identity of the XML content?
>
>One use case that springs to mind is the separation of homonyms, 
>particularly where it comes to homonym genera.
>In the canonical names part of the Linnean Core we included (I'm 
>not sure if it's disappeared in the latest version, but I 
>don't think so) 
>scope for a reference attribute in the separate atoms of the names. 
>So in a binomial, the <genus> object could have a reference (id) 
>that would allow the output to unambiguously identify _which_ Aus 
>we had in mind when we said Aus bus. 

If you modelled names as concepts, there would be no ambiguity as to which Aus you meant: each Aus would be a concept with its own AccordingTo, which I think is what Sally is saying they needed to put into LC?
>
>when passing information about uninomials, there is a lot more 
>scope for ambiguity between byte identical XML content (or 
>'homonyms' as I old-fashionedly like to call them)
>
>There's a third way (sorry to introduce a note of domestic UK 
>politics, but Rich and Nico started it) which is to take the LC 
>approach and embed both identifiers and data:
>
> <TaxonConcepts>
>   <TaxonConcept id="tc1">
>     <Name id="123-1">
>       <Label>Aus bus</Label>
>       <CanonicalAuthorship>Black, 1965</CanonicalAuthorship>
>     </Name>
>     <AccordingTo>Smith</AccordingTo>
>   </TaxonConcept>
>   <TaxonConcept id="tc2">
>     <Name id="123-1">
>       <Label>Aus bus</Label>
>       <CanonicalAuthorship>Black, 1965</CanonicalAuthorship>
>     </Name>
>     <AccordingTo>Jones</AccordingTo>
>   </TaxonConcept>
> </TaxonConcepts>
>
I agree with Sally that we would probably want to have both the GUID and what we've talked about as the primary key, in this case the name plus the AccordingTo. My only different take on this would be that you could, if you really wanted, set the Name id to a TaxonConcept GUID and keep the label as Sally has it. Maybe this would give the reuse that people seem to want, but of course all we would be saying is that the id in Name means the name element of the TaxonConcept referred to by the GUID.
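
To make the reuse point concrete, here is a rough sketch (Python, standard library only) of how an application might group concepts by the id on <Name> in Sally's example instead of comparing the XML content byte for byte. The function name concepts_by_name_id is made up for illustration, and the element names are just the abbreviated ones used in this thread, not the real TCS schema.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Sally's example from this thread (abbreviated, not the real TCS schema).
DOC = """
<TaxonConcepts>
  <TaxonConcept id="tc1">
    <Name id="123-1">
      <Label>Aus bus</Label>
      <CanonicalAuthorship>Black, 1965</CanonicalAuthorship>
    </Name>
    <AccordingTo>Smith</AccordingTo>
  </TaxonConcept>
  <TaxonConcept id="tc2">
    <Name id="123-1">
      <Label>Aus bus</Label>
      <CanonicalAuthorship>Black, 1965</CanonicalAuthorship>
    </Name>
    <AccordingTo>Jones</AccordingTo>
  </TaxonConcept>
</TaxonConcepts>
"""


def concepts_by_name_id(xml_text):
    """Map each Name id to the (concept id, AccordingTo) pairs that use it.

    Hypothetical helper: it trusts the id attribute rather than the
    byte-identity of the <Name> content.
    """
    root = ET.fromstring(xml_text.strip())
    groups = defaultdict(list)
    for concept in root.findall("TaxonConcept"):
        name = concept.find("Name")
        groups[name.get("id")].append(
            (concept.get("id"), concept.findtext("AccordingTo"))
        )
    return dict(groups)


if __name__ == "__main__":
    for name_id, users in concepts_by_name_id(DOC).items():
        print(name_id, users)
```

If the Name id were a TaxonConcept GUID instead, the same grouping would apply; only the meaning of the key would change.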

>On a slightly related subject, Gregor and I did some thinking and 
>discussing on ids - it's on the LC wiki somewhere - trying to come 
>up with a structure that would allow ids to have different scopes: 
>either transient ones (1, 2, 3 ...) that were created for the 
>life of a 
>document and only made sense within that context, or ones which 
>referred to ids that were unique and immutable in the context of a 
>dataset (e.g. IPNI ids) all the way up to full blown LSIDs. Looking 
>at that might help futureproof any schema we do come up with so 
>that if LSIDs or whatever do take off we're able to deal with them
>
Do we need to model these separately, or have some inbuilt mechanism (in the software) for determining what we've been passed?
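
As a rough idea of what the "inbuilt mechanism" option might look like, here is a small Python sketch that guesses the scope of an id from its form alone. It assumes LSIDs follow the usual urn:lsid:authority:namespace:object[:revision] layout; the "ipni:" prefix for dataset-scoped ids and the function name id_scope are purely hypothetical conventions for illustration, not anything agreed on the list or the LC wiki.

```python
import re

# LSID layout: urn:lsid:<authority>:<namespace>:<object>[:<revision>]
LSID_PATTERN = re.compile(
    r"^urn:lsid:[^:\s]+:[^:\s]+:[^:\s]+(:[^:\s]+)?$", re.IGNORECASE
)


def id_scope(identifier, dataset_prefixes=("ipni:",)):
    """Guess the scope of an id from its form.

    Returns "lsid", "dataset" or "document". The dataset prefixes are an
    assumed convention; anything unrecognised is treated as a transient,
    document-scoped id.
    """
    identifier = identifier.strip()
    if LSID_PATTERN.match(identifier):
        return "lsid"
    if identifier.lower().startswith(tuple(dataset_prefixes)):
        return "dataset"
    return "document"


if __name__ == "__main__":
    for example in ("1", "ipni:123-1", "urn:lsid:example.org:names:123-1"):
        print(example, "->", id_scope(example))
```

The alternative, modelling the scopes separately, would put the scope in the schema (for example as an attribute) so that software never has to guess.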

Jessie