[Tcs-lc] Human Readable - the thread formally known as 'Name of NomenCode'

Tue Mar 29 18:12:14 PST 2005

Hi Roger,

My thoughts:

1) I think "is child of" and "is parent of" should be banished from the
enumerated list of RelationshipTypes.  I cannot think of a single case where
this ambiguous RelationshipType cannot be better represented by a more
explicit RelationshipType; for example:

Name-name relationships:
 - is type species of/typifies
 - is type genus of/typifies
 - is first combination within/is first combined with
 - is original genus of/originally placed in genus
 - etc.

Concept-Concept relationships
 - is included in/includes

What else would "is parent of/is child of" mean other than one of these
things?  What value is it to allow an ambiguous "is parent of/is child of",
rather than a more explicit RelationshipType?
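
To make point 1 concrete, here is a minimal sketch in Python of an enumerated list that carries only the explicit labels listed above and rejects the ambiguous pair. The class name `RelationshipType` follows the term used in this thread; the function `parse_relationship_type`, the constant `AMBIGUOUS_LABELS` and the choice of which labels to include are my own assumptions for illustration, not anything defined by the schema.

```python
from enum import Enum


class RelationshipType(Enum):
    """Illustrative subset of explicit relationship labels from the list above."""

    # Name-name relationships
    IS_TYPE_SPECIES_OF = "is type species of"
    IS_TYPE_GENUS_OF = "is type genus of"
    IS_FIRST_COMBINATION_WITHIN = "is first combination within"
    IS_ORIGINAL_GENUS_OF = "is original genus of"
    # Concept-concept relationships
    IS_INCLUDED_IN = "is included in"
    INCLUDES = "includes"


# Labels deliberately left out of the enumeration (hypothetical constant).
AMBIGUOUS_LABELS = {"is parent of", "is child of"}


def parse_relationship_type(label: str) -> RelationshipType:
    """Map a label to an explicit type, refusing the ambiguous parent/child pair."""
    normalized = " ".join(label.lower().split())
    if normalized in AMBIGUOUS_LABELS:
        raise ValueError(
            f"ambiguous relationship type {label!r}; use an explicit type instead"
        )
    # Raises ValueError for any label that is not in the enumeration.
    return RelationshipType(normalized)
```

With this shape, a producer that emits "is child of" fails early and has to say which explicit relationship it actually means.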

As for the reciprocal relationships, these apply to the
TaxonConcept/Relationships only for references to other concepts within the
same AccordingTo.  TaxonConcept/Relationships that point to earlier-defined
concepts (i.e., to TaxonConcepts attached to a different AccordingTo) cannot
be reciprocal. For the intra-AccordingTo Relationships, my feeling is to
include them both as "best practices".  Same applies to
RelationshipAssertions, where inter-AccordingTo Relationships will be
reciprocal.

As for infrageneric ranks (subgenus, section, etc.), I think we agreed these
are not part of a "Name" object, except when the infrageneric name unit is
the terminal unit.  In other words, a "Name" object would be limited to one
genus-rank basionym/protonym reference, plus (optionally) one species-rank
basionym/protonym *OR* one infrageneric-rank basionym/protonym reference,
plus (optionally) one infraspecific-rank basionym/protonym reference.  All
names at ranks above genus would be treated as monomials, with one and only
one basionym/protonym reference.

Thus, there would be no name-name relationships involving infrageneric
(supraspecific) relationships, *except* when the terminal name itself is an
infrageneric name (in which case there would be only an additional
genus-rank basionym/protonym reference). That's how LC is currently
structured, anyway.
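
The constraints described above can be sketched as a small Python data class. This is only an illustration under stated assumptions: the field names are hypothetical, names above genus rank (the monomial case) are left out, and the rule that an infraspecific reference requires a species-rank reference is my reading rather than something spelled out above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Name:
    """Hypothetical Name object; each field holds a basionym/protonym reference id."""

    genus_ref: str                           # exactly one genus-rank reference
    species_ref: Optional[str] = None        # optional species-rank reference
    infrageneric_ref: Optional[str] = None   # optional, only as the terminal unit
    infraspecific_ref: Optional[str] = None  # optional infraspecific-rank reference

    def __post_init__(self) -> None:
        if not self.genus_ref:
            raise ValueError("a genus-rank reference is required")
        # Species-rank *OR* infrageneric-rank reference, never both.
        if self.species_ref and self.infrageneric_ref:
            raise ValueError("species and infrageneric references are exclusive")
        # Assumption (not stated above): an infraspecific unit needs a species.
        if self.infraspecific_ref and not self.species_ref:
            raise ValueError("an infraspecific reference requires a species reference")
```

For example, `Name("g1", species_ref="s1")` is accepted, while `Name("g1", species_ref="s1", infrageneric_ref="sg1")` raises a `ValueError`.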

I just read Sally's & Paul's responses to you on this post, and it seems we
are all saying basically the same thing (but coming at it from different
directions).

Thus, full construction of the name including infrageneric components would
require iterating through "is included in/includes" concept-concept
relationships, and inserting the infrageneric names that way.
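
A rough sketch of that iteration, in Python. Everything here is assumed for illustration: the dictionary layout, the function name `name_units`, and the placeholder name units "Aus", "Bus" and "cus", which are not real taxa. Each concept records the concept it "is included in", and the walk stops once the genus is reached.

```python
from typing import Dict, List, Optional, Tuple

# concept id -> (rank, name unit, id of the concept it "is included in")
Concepts = Dict[str, Tuple[str, str, Optional[str]]]


def name_units(
    concepts: Concepts, start_id: str, stop_rank: str = "genus"
) -> List[Tuple[str, str]]:
    """Follow "is included in" links upward and return units from genus downward."""
    units: List[Tuple[str, str]] = []
    seen = set()
    current: Optional[str] = start_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in 'is included in' links at {current!r}")
        seen.add(current)
        rank, unit, parent = concepts[current]
        units.append((rank, unit))
        if rank == stop_rank:
            break
        current = parent
    units.reverse()
    return units


# Placeholder data: a species included in a subgenus included in a genus.
concepts: Concepts = {
    "c1": ("genus", "Aus", None),
    "c2": ("subgenus", "Bus", "c1"),
    "c3": ("species", "cus", "c2"),
}
full = name_units(concepts, "c3")
```

Here `full` lists the genus, subgenus and species units in that order, so the infrageneric unit is picked up from the concept relationships rather than from the Name object itself.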

Rich

P.S. Except for my distaste for "is parent of/is child of", I agree with
everything Bob Peet said in his response to you.

-----Original Message-----
From: tcs-lc-bounces at ecoinformatics.org
[mailto:tcs-lc-bounces at ecoinformatics.org]On Behalf Of Roger Hyam
Sent: Tuesday, March 29, 2005 2:29 AM
To: S.Hinchcliffe at kew.org
Cc: tcs-lc at ecoinformatics.org
Subject: Re: [Tcs-lc] Human Readable - the thread formally known as 'Name of
NomenCode'

I am working with Jessie and Robert on the schema at the moment and hope to
get fancier pointers in there. They make sense to me.  I hope we can have
something to throw open for discussion in the next couple of days.

I am just trying to generate an instance document for a couple of species by
hand at the moment and I am getting very confused.

What is the relationship type between a species and its genus? Is it
'included in' or 'child of', and would the user agent expect the reciprocal
relationships to be marked up, i.e. 'includes' and 'is parent of'?
Would all software agents:

 - expect only up-pointing links;
 - expect only down-pointing links;
 - expect both to be present for the link to be valid; or
 - allow the use of a mixture of the two, with a relationship sometimes
   shown with 'includes' and sometimes with 'included in'?

I define relationships between a species concept and its genus concept, but
what about the subgenus and sectional stuff?
I could:

 - mark the species as belonging to the section, the section to the subgenus
   and the subgenus to the genus, and not specify any other relationships;
 - join the species to section, subgenus and genus as well as joining up the
   section to subgenus and genus and subgenus to genus, i.e. all the includes
   relationships; or
 - do a mixture of the two: species always in genus but other things not.
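
The first two options above differ only in whether the indirect links are written out. As a hedged illustration in Python, the full set of inclusions can be derived from the minimal chain, so a consumer could accept the minimal form and compute the rest. The function name `all_inclusions` and the concept ids are hypothetical.

```python
from typing import Dict, Set, Tuple


def all_inclusions(direct: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Expand direct "is included in" links into every (member, container) pair.

    `direct` maps a concept id to the id of the concept it is directly included in.
    """
    pairs: Set[Tuple[str, str]] = set()
    for member in direct:
        seen = {member}
        container = direct.get(member)
        while container is not None:
            if container in seen:
                raise ValueError(f"cycle in inclusion links at {container!r}")
            seen.add(container)
            pairs.add((member, container))
            container = direct.get(container)
    return pairs


# Minimal encoding: species -> section -> subgenus -> genus.
minimal = {"species": "section", "section": "subgenus", "subgenus": "genus"}
expanded = all_inclusions(minimal)
```

For this chain the three direct links expand to six pairs, which is the "all the includes relationships" encoding of the second option.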

Confused? I am and I am just doing this manually! I find the thought of
writing software to consume this more scary than handling the links to the
publications and specimens. Even if we define how these relationships should
be encoded there is no way for the schema to validate it so we will have to
write checking code and try and handle graceful degradation etc. I think we
need to nail this thing down a bit - there should only really be one way of
encoding a basic taxonomic hierarchy and that should be enforced by the
schema I think. What do you all think?
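
As an illustration of the kind of checking code meant here, the following Python sketch reports reciprocal links that are missing from a set of relationships. The tuple layout, the `RECIPROCAL` table and the function name `missing_reciprocals` are assumptions made for the example; only the 'included in'/'includes' pair is covered.

```python
from typing import Iterable, List, Set, Tuple

# (from concept id, relationship type, to concept id)
Relationship = Tuple[str, str, str]

# Hypothetical table of relationship types and their reciprocals.
RECIPROCAL = {"is included in": "includes", "includes": "is included in"}


def missing_reciprocals(relationships: Iterable[Relationship]) -> List[Relationship]:
    """Return the reciprocal links that would have to be added for completeness."""
    present: Set[Relationship] = set(relationships)
    missing: List[Relationship] = []
    for source, rel_type, target in sorted(present):
        inverse = RECIPROCAL.get(rel_type)
        if inverse is None:
            continue  # no reciprocal defined for this type
        expected = (target, inverse, source)
        if expected not in present:
            missing.append(expected)
    return missing


links = [
    ("species1", "is included in", "genus1"),
    ("genus1", "includes", "species1"),
    ("species2", "is included in", "genus1"),
]
gaps = missing_reciprocals(links)
```

In this example the only gap is the down-pointing link from `genus1` to `species2`. Whether such a gap is an error or merely a shorthand is exactly the question the schema would need to settle.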

Has anyone generated instances of recent versions of the TCS (0.9+) using
real data? If so, could they send me one?

Roger

Sally Hinchcliffe wrote:
Roger wrote:

Yes I agree that it should be easy to implement but there also needs to
be a gradient of implementations. It should be easy to do simple things
but it should be possible to do more complex things with a bit more effort.

I think we are resigned to having some level of normalization in the
schema but I imagine this will rarely be used in a single document
instance. I am thinking that if we could have 'fancy pointers' of some
kind then the work for this reduces greatly. If the pointer in the
schema can contain a string summary of the publication, for example, as
well as a reference to that publication then in a great many
implementations the details of the publication can simply be retrieved
with a second call if they are required.
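
A minimal sketch of such a pointer, in Python. The class name `FancyPointer`, its fields, the function `resolve` and the `fetch` callback standing in for the "second call" are all hypothetical; nothing here is taken from the schema.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class FancyPointer:
    """A reference that also carries a readable summary of its target."""

    ref: str      # identifier of the referenced object, e.g. a publication
    summary: str  # string summary that can be shown without a lookup


def resolve(
    pointer: FancyPointer,
    fetch: Optional[Callable[[str], Dict[str, str]]] = None,
) -> Dict[str, str]:
    """Return the summary alone, or full details when a fetch function is given."""
    if fetch is None:
        return {"summary": pointer.summary}
    details = dict(fetch(pointer.ref))  # the optional "second call"
    details.setdefault("summary", pointer.summary)
    return details
```

A simple consumer would call `resolve(pointer)` and display the summary; a more demanding one would pass a `fetch` function and retrieve the full record only when it is needed.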

 - yes with fancy pointers (I'm sure Gregor had a more technical
sounding phrase for these but I quite like it!), everything gets a
lot simpler all round. And I think this covers Gregor's point re SDD
as well. It also solves my problem that if the user asks for taxon 1 &
2 they also get taxon 3 & 4 just to make the links come out.

But can we do it with TCS/LC as it stands now? From my understanding
the referred to objects (publication references, other taxon
concepts) have to be included within the instance document & can't be
available via fancy pointers from somewhere else.
Or have I misread the schema?

Sally

Sally Hinchcliffe wrote:

Roger wrote:

I am all for readability and it is something I am just sitting down to
look at in the TCS today. This is partially inspired by trying to put
together some instance documents over the weekend. This matter does not
only include field names but also general structure.

How important do people consider it is to be able to read/hand-craft TCS
instances - at least simple ones?

I vote for readable, but not necessarily hand-writable. Another thing
to look out for is making sure it's easy for programs to generate the
stuff.

Readability enhances acceptance - if the instance documents look
readable then people are more likely to use them, and they will feel
confident that they will be able to troubleshoot any problems.

For writability, the main impact is on the wrappers producing the
XML. When we did this in IPNI, producing the data via templates, the
problem was keeping track of references within the document - for
instance references to publications. The way a website like IPNI
serves data up is as a stream of names with a header at the top and a
footer at the bottom. It's easiest if each name and its associated
data can be totally self contained with no need to keep track of a
second set of data that's being referred to internally within the
document. It's not impossible (we did handle references to
publications in the TCS data we served) but the more internal
references there are to keep track of the harder it is. Unfortunately
recent discussion seems to be sending us down the internal reference
route more and more.
So from a generator's point of view this is easy (and I think also
more human readable):

start stuff - headers etc.
- taxonname 1
  - interesting facts about taxonname 1
  - publication information about taxonname 1
  - other names related to taxonname 1
- taxonname 2
  - interesting facts about taxonname 2
  - publication information about taxonname 2
  - other names related to taxonname 2
end stuff

whereas this is hard (but not impossible):

start stuff
- taxonname 1
  - interesting facts about taxonname 1
  - taxonname 1 published in reference 1
  - taxonname 1 related to taxonname 3
- taxonname 2
  - interesting facts about taxonname 2
  - taxonname 2 published in reference 2
  - taxonname 2 related to taxonname 4
- taxonname 3
  - interesting facts about taxonname 3
  - taxonname 3 published in reference 3
- taxonname 4
  - interesting facts about taxonname 4
  - taxonname 4 published in reference 4
- reference 1
  - details for reference 1
- reference 2
  - details for reference 2
- reference 3
  - details for reference 3
- reference 4
  - details for reference 4
end stuff

Of course it may be we're not generating the data in the most
efficient way ...
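
The difference between the two layouts above can be sketched from the generator's side in Python. This is an illustration only: the record layout and both function names are hypothetical, and the output lines mimic the outlines above rather than real TCS XML. The first function can emit each name as it arrives; the second has to remember every publication it has referred to until the end of the stream.

```python
from typing import Dict, Iterable, Iterator, Tuple

NameRecord = Tuple[str, str]  # (taxon name, publication string)


def self_contained(records: Iterable[NameRecord]) -> Iterator[str]:
    """Each name carries its own publication details; no state is kept."""
    yield "start stuff"
    for name, publication in records:
        yield f"- {name}"
        yield f"  - published in: {publication}"
    yield "end stuff"


def with_internal_references(records: Iterable[NameRecord]) -> Iterator[str]:
    """Names point at references that are written out after all the names."""
    ref_ids: Dict[str, str] = {}  # state that must survive the whole stream
    yield "start stuff"
    for name, publication in records:
        ref_id = ref_ids.setdefault(publication, f"reference {len(ref_ids) + 1}")
        yield f"- {name}"
        yield f"  - published in {ref_id}"
    for publication, ref_id in ref_ids.items():
        yield f"- {ref_id}"
        yield f"  - {publication}"
    yield "end stuff"
```

The bookkeeping in `ref_ids` is small here, but it grows with every kind of internal reference (publications, related names, specimens) that the document has to keep consistent.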

ps my vote would be for NomenclaturalCode. Does exactly what it says
on the tin...
Sally

*** Sally Hinchcliffe
*** Computer section, Royal Botanic Gardens, Kew
*** tel: +44 (0)20 8332 5708
*** S.Hinchcliffe at rbgkew.org.uk

_______________________________________________
Tcs-lc mailing list
Tcs-lc at ecoinformatics.org
http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/tcs-lc

--

==============================================
 Roger Hyam
----------------------------------------------
 Biodiversity Informatics
 Independent Web Development
----------------------------------------------
 http://www.hyam.net  roger at hyam.net
----------------------------------------------
 2 Janefield Rise, Lauder, TD2 6SP, UK.
 T: +44 (0)1578 722782 M: +44 (0)7890 341847
==============================================
