[tcs-lc] Names as Objects

Mon Mar 7 05:58:41 PST 2005

(I've a horrible feeling that the tcs-lc mailing list is down, & that's 
why things have gone quiet ...)

anyway - can I throw in my two-pence worth here:

As someone who has actually implemented a very old version of 
the TCS as an output from IPNI, can I just say that it's quite hard to 
produce separate lists of publications, vouchers etc. and reference 
them from within the TaxonConcepts. So adding in a third set of 
references to names would be an additional burden.
The schema might look more modular as a design but it's a bugger 
to implement so on practical grounds I'd reject this suggestion...
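
To make the burden concrete, here is a rough Python sketch (not the IPNI
code; the field names, reference ids and sample rows are all made up for
illustration). The embedded output is a straight copy of each row, while
the modular output needs de-duplicated lists and stable references first:

```python
# Hypothetical input rows; "Aus bus" etc. are placeholder names, not real data.
rows = [
    {"concept_id": "c1", "name": "Aus bus", "publication": "Publication X"},
    {"concept_id": "c2", "name": "Aus bus", "publication": "Publication Y"},
    {"concept_id": "c3", "name": "Aus cus", "publication": "Publication X"},
]


def embedded(rows):
    """Embedded approach: every concept carries its own copy of the name data."""
    return [dict(r) for r in rows]


def modular(rows):
    """Modular approach: separate name and publication lists, with the
    concepts holding only references into those lists."""
    names, pubs, concepts = {}, {}, []
    for r in rows:
        # setdefault returns the existing reference if the value was seen before
        name_ref = names.setdefault(r["name"], f"n{len(names) + 1}")
        pub_ref = pubs.setdefault(r["publication"], f"p{len(pubs) + 1}")
        concepts.append(
            {"concept_id": r["concept_id"], "name_ref": name_ref, "pub_ref": pub_ref}
        )
    return {
        "names": {ref: value for value, ref in names.items()},
        "publications": {ref: value for value, ref in pubs.items()},
        "concepts": concepts,
    }
```

A top-level list of names would add a third such lookup to build and keep
consistent with the other two.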

Sally

> 
> Thanks, Roger --
> 
> > Rich basically proposes that ScientificName (or whatever the element is
> > called) should be a top level structure along with taxon concepts,
> > vouchers and publications. I'll call this the 'modular' approach.
> 
> Just to be clear, I tried to emphasize that I'm *not* (necessarily)
> proposing this as an actual alternative for TCS (yet).  I just wanted to use
> it as a discussion tool for teasing apart what the advantages &
> disadvantages would be of treating names as separate, stand-alone objects.
> The top-level "TaxonName" element essentially forces it to be treated as a
> separate object, and allows us to examine the "what if" consequences of
> doing so.  I do like the term "Modular" (as inspired by James) to label this
> approach.
> 
> > Most of the arguments have been put forward and all parties seem to
> > agree that either method would 'work'. There is no right and wrong here
> > we are just trying to pick the better of two options.
> 
> That's my contention, but I'm not sure I'm right about this, and I'm not
> sure everyone agrees yet.  What I think we *do* all agree on is that the
> final TCS should attempt to meet the largest/broadest set of needs, while
> also maintaining some optimal level of structural "elegance". (I use that
> word "elegance" as a catch-all descriptor that implies low processing
> overhead requirements, inherently enforced data integrity, general
> simplicity of design, convenience of modularity, and a number of other
> qualities that database programmers generally aspire to.)
> 
> > One way to look at this kind of situation is to do a 'regret analysis'.
> > If we were all chatting in 10 years time what would we regret about
> > choosing the modular over the embedded approach or vice versa.  We are
> > trying to guess which of the options will cause us fewer headaches in
> > the future.
> 
> AGREED!!!!  That's a great way to look at it.  However, it does need to be
> tempered a bit by the fact that we need a working draft ASAP.
> 
> > Currently my money is on the embedded approach causing fewer problems. I
> > imagine someone who hasn't been involved in the discussions here (and
> > probably isn't even a taxonomist) implementing a system to publish
> > checklists from surveys and they look at the schema and think
> > "ScientificNames that is what I've got!
> > I'll just map the names in the database to those elements in the schema".
> 
> Well...maybe, but this is balanced by the same fellow who might encounter
> the embedded approach and say "TaxonConcepts -- what's that?  This schema is
> of no use to me."
> 
> Also, as I tried to hint at (but haven't yet thought through), there might
> be design "elegance" in defining a strict 1:1 correspondence between a
> "ScientificName" object, and a "Nominal" Concept object, in which case we
> could consider identical GUID values for both.  In that circumstance, your
> hypothetical naive user would be doing no harm by plugging in directly to
> names, because that would simultaneously plug into the corresponding Nominal
> Concept (which is exactly what we want to do if they have name-only data).
> 
> Now, having just proposed that idea (matching GUIDs between Nominal concepts
> and name objects), the database dude in me prefers to embed Names within
> Concepts in the schema -- but in a way that compartmentalizes (i.e., keeps
> modular) the name-relevant data as separate from the concept data.  After
> this "TaxonNames as top-level elements" thought experiment runs its course,
> I'll shift gears into advocating an "all nomenclatural information embedded
> within a Nominal Concept" argument.  For now, though, I don't want to
> clutter the discussion any more than I have already cluttered it.
> 
> I do have one question for the XML-gurus:  How do you represent a "Subtype"
> in XML?  By "Subtype", I mean an unambiguously defined specific subset of a
> larger set of more generalized records.  I.e., "Person" and "Organization"
> are each subtypes of "Agent".
> 
> Stated another way, if TaxonConcepts can be one of, say, six different
> types -- how do you represent a set of elements in XML that says "these
> elements only apply to TaxonConcept instances of Type 1, but not to
> instances of Types 2-6"?
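
As far as I know, the usual XML Schema answer is type derivation (a base
complex type extended via xs:extension), with the instance document naming
the subtype in an xsi:type attribute. The Python sketch below only mimics
that rule by hand so the idea can be seen running. The "Revision" type and
the rule table are hypothetical, and a real xsi:type value would be a
qualified name pointing at a type defined in the schema:

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical rule table: children allowed for every concept type,
# plus children allowed only for one specific type.
COMMON_CHILDREN = {"NameSimple", "AccordingTo"}
TYPE_ONLY_CHILDREN = {
    "Nominal": {"NameDetailed"},
    "Revision": {"Relationships"},
}

DOC = """
<Concepts xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <TaxonConcept id="c1" xsi:type="Nominal">
    <NameSimple>Aus bus</NameSimple>
    <NameDetailed>Aus bus Author, Year</NameDetailed>
  </TaxonConcept>
  <TaxonConcept id="c2" xsi:type="Revision">
    <NameSimple>Aus bus</NameSimple>
    <NameDetailed>Aus bus Author, Year</NameDetailed>
  </TaxonConcept>
</Concepts>
""".strip()


def misplaced_children(xml_text):
    """Return (concept id, concept type, child tag) for every child element
    that is not allowed for the concept's declared type."""
    root = ET.fromstring(xml_text)
    problems = []
    for concept in root.iter("TaxonConcept"):
        ctype = concept.get(f"{{{XSI}}}type")
        allowed = COMMON_CHILDREN | TYPE_ONLY_CHILDREN.get(ctype, set())
        for child in concept:
            if child.tag not in allowed:
                problems.append((concept.get("id"), ctype, child.tag))
    return problems
```

Here misplaced_children(DOC) flags the NameDetailed element inside the
"Revision" concept and accepts the one inside the "Nominal" concept. With
derived types in the schema, a validating parser would make this check
without any hand-written code.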
> 
> > Months, maybe years, later
> > someone else realizes that the data being published by this
> > organization is useless because it is a list of names not taxa and has
> > to go and work out how to get it corrected and correct any decisions that
> > have been taken on the basis of the data. I imagine this happening quite
> > a lot.
> 
> BUT!!! If there is an unambiguous connection between each name object and
> its corresponding Nominal concept (as there absolutely must be -- and
> shared GUIDs would be only one way of achieving this), then the task of
> converting data linked directly to names over to links to Nominal concepts
> would be extremely trivial.
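
A minimal sketch of that conversion, assuming each record links to a name
by GUID and that a name-to-Nominal-concept lookup is available (the record
layout and the function names are invented for illustration):

```python
def shared_guid_map(name_guids):
    """With shared GUIDs, each name maps to a Nominal concept with the same id."""
    return {guid: guid for guid in name_guids}


def relink_to_nominal_concepts(records, name_to_nominal):
    """Return copies of the records with the name link replaced by a link
    to the corresponding Nominal concept."""
    relinked = []
    for rec in records:
        rec = dict(rec)  # leave the caller's records untouched
        name_guid = rec.pop("name_guid")
        # A KeyError here means a name has no Nominal concept, which is
        # exactly the case the 1:1 rule is meant to exclude.
        rec["concept_guid"] = name_to_nominal[name_guid]
        relinked.append(rec)
    return relinked
```

If the GUIDs are not shared, only the lookup table changes; the relinking
step stays the same.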
> 
> > The down side of the embedded approach is that it is slightly less
> > convenient for taxonomists.
> 
> I'm not sure this is a downside, because the vast, VAST majority of
> taxonomists will be accessing the data via some UI that hides the structural
> complexity of the data.
> 
> The downsides I see have to do with structural elegance -- specifically,
> mixing what I see as apples and oranges (relationships between names, vs.
> relationships between concepts) in one place as though they meant more or
> less the same thing.  Clearly, the designers of TCS as it currently exists
> appreciate the value of structurally separating "similar" sorts of data into
> different structures, as indicated by the separation of "Relationships"
> (within TaxonConcepts) from "RelationshipAssertions".  Both structures do the
> same thing (establish relationships between a pair of concepts in the
> context of an "AccordingTo") -- but they exist in different parts of the
> schema because there is a structural elegance in unambiguously separating
> those relationships that form part of the *definition* of a concept, from
> those elements that represent secondary *interpretations* of concept
> relationships.
> 
> My basic point is that name-object data elements (and intra-name
> relationships) are sufficiently different from concept-object (and
> intra-concept relationships) that they warrant compartmentalization
> (modularization) in the data schema.  Exactly how that modularization is
> optimally achieved is another topic of discussion.
> 
> > If you are publishing 15 different concepts
> > that use the same name there will be a lot of redundancy but this
> > redundancy is only in the instance of a document that may appear briefly
> >   - not in a database. In 10 years time I would hope this will go away
> > as ALL published names will be cataloged and have GUIDs - anyway one
> > would hope so.
> 
> I would certainly hope so, but I'm not sure I get what you mean in the
> paragraph above.
> 
> > Rich proposes that there could be a brief summary of a name in the
> > TaxonConcept element as well as a pointer to the full scientific name.
> 
> ...as currently exists with "NameSimple" and "NameDetailed".  The only other
> element to consider is "NameVerbatim", which is necessary if you are going
> to decouple "literal string of characters as appears in the concept
> definition" from "name".  I get the sense from Jessie's recent posts that
> TCS assumes that "unique string of characters" *defines* "new name".  If
> that's true, and if the TCS schema is designed around that premise, then it
> is of limited use to nomenclators, and thus encourages the nomenclators to
> abandon TCS as a mechanism for exchanging name (sensu nomenclaturalist)
> data.  I can't imagine a scenario where anyone benefits from such a
> separation.
> 
> > This leads us into the several-ways-to-achieve-the-same-thing situation
> > which is hell for a programmer. Which do we display to the user? Which
> > do we use to make judgments about whether two concepts from different
> > data sources are talking about the same Name (my hobby horse)?
> 
> I certainly understand and agree with the first sentence, but I don't quite
> understand why the two questions relate to the discussion at hand.  I mean,
> of course they "relate" -- but no matter which approach (modular vs.
> embedded) we end up with, those questions will still need to be answered.  I
> can't see any intrinsic reason why one approach or the other necessarily
> makes those questions easier to answer.
> 
> > Basically my point is that taxonomists can handle the concept of a NULL
> > or nominal concept that just contains name data much more easily than
> > non-taxonomists can grasp the subtle difference between taxon concepts
> > and the names we use for them. It is, after all, our job to think about
> > these things but we need to produce a schema that is used by people who
> > aren't us.
> 
> I would agree that a schema design that attaches/embeds name data into
> Nominal concepts such that there is an unambiguous 1:1 match between a
> "name" (sensu nomenclaturalist) and "Nominal" concept (sensu TCS) would
> probably be acceptable to the nomenclatural users.  However, I am not
> convinced that a schema that embeds the name info in a pseudo-concept
> instance is necessarily more comprehensible to a non-taxonomist than one
> that modularizes name data as distinct from concept data.  BOTH approaches
> (at the XML schema level) would be difficult to grasp by a non-taxonomist
> (hell, I'm a taxonomist who specializes in electronic data management and
> I'm *still* not sure I understand as much about TCS as I need to).
> 
> The point here is that the data will have to be rendered from an XML schema
> into a screen-load of information via some sort of UI; and as long as the UI
> programmers understand the schema, the difference between the two structural
> approaches is really irrelevant (provided they both contain the same
> complement of information).
> 
> So...I don't accept that the naive user is relevant in this discussion about
> the schema structure.  What is relevant are questions about package size,
> informational flexibility, and processing performance.  These are the things
> that affect how broad the user base is that finds the exchange schema
> "useful" to their particular needs.
> 
> > So currently I am in the embedded camp. I could defect at any moment but
> > I am looking for a good reason to. The arguments are closely related to
> > another thread that I am just about to start. "Are we passing the
> > product of taxonomic research or raw taxonomic data?"
> 
> > Can anyone give a scenario of regretting going with the embedded approach? I
> > am sure someone can!
> 
> At one level, there is the hypothetical regret that the people who manage
> taxonomic names data did not find the embedded approach workable (in a
> practical sense -- not in a technical sense) to serve their data needs, and
> therefore developed their own separate name-based schema.  I know that *I*
> would regret this.
> 
> There is also the regret of adopting a "system of convenience" in a world
> that preceded universal taxon name registration, which the post
> taxon-name-registration world got stuck with as a legacy mechanism of data
> exchange.
> 
> I would also deeply regret the adoption of an international standard that was
> generated without a full mutual understanding of the issues.  I know that I,
> for one, do not fully understand all the issues yet; and if I had reason to
> believe that I was the only one in this situation, I would certainly not be
> spending so much time in articulating the stuff that I do understand (or at
> least *think* I understand).
> 
> As I wrote this morning in an off-list email:
> 
> There are several VERY complex issues that all have to be considered
> simultaneously: Nomenclatural rules & practice (separate for Botany &
> Zoology), Taxon Concept Circumscriptions, general information structure and
> management theory, and specific computer technologies (like XML).  Any one
> of these has a very steep learning curve; I seriously doubt that anyone on
> the CC list of this conversation has a mastery of ALL of these things (e.g.,
> I'm very weak on botanical nomenclature rules & practice and on XML, and
> have varying degrees of comprehension of the others).
> 
> So...I would regret it if a standard was adopted that did not satisfy the
> respective experts in all of these complex disciplines.
> 
> Aloha,
> Rich
> 
> 

*** Sally Hinchcliffe
*** Computer section, Royal Botanic Gardens, Kew
*** tel: +44 (0)20 8332 5708
*** S.Hinchcliffe at rbgkew.org.uk