[tcs-lc] Nominal Concepts as bearers of Nomenclatural information

Richard Pyle deepreef at bishopmuseum.org
Wed Mar 9 03:27:49 PST 2005


O.K., time to put my schema where my mouth is (err...fingers are...).

I have attached four files in two sets:

Set 1:
  v090b-RLP1.xsd
  lc015_napier1-RLP1.xsd

Set 2:
  v090b-RLP2.xsd
  lc015_napier1-RLP2.xsd

In both sets, I have made no changes to the v090b file outside of the
"TaxonConcept".  Most of the changes inside the TaxonConcept are within the
"Name" element.  The only changes I have made outside the "Name" element
apply to both sets, and are:

1. Moved "Rank" to within "Name" element (this is because, as I understand
it, Rank only applies to scientific names). Also, used the LC version of
this element.

2. Deleted "Kingdom" (I think everyone agrees that this is better represented
as "NomenclaturalCode", which falls under "Name").

3. Changed the Annotations of "AccordingTo", "Relationships",
"SpecimenCircumscription", and "CharacterCircumscription" (but made no other
changes to these).

In both sets, I have made various changes to the lc015_napier1 file within
ScientificName and CanonicalName.  I did not scrutinize any of the other
stuff in this file.

In general, among the bits that I did edit, I removed a lot of "for
discussion" elements (inherited from LC), or elements that I felt were of
secondary importance to a first-generation schema.  If any of the LC folks
feel that something I removed was of fundamental importance to the
first-generation Names schema, please feel free to say so -- I was mainly
trying to avoid getting bogged down in the details, rather than focusing on
the fundamental questions.

Before I go any further, let me preface by saying that I am most decidedly
NOT an expert in XML schema design.  If I have committed some faux pas in
how I designed this, please be merciful and make the necessary
correction(s).  Many of my decisions about whether or not an element should
be optional were not made with a lot of deep insight, so consider those
highly provisional (that is, more highly provisional than the rest of it).
Also, I didn't spend a lot of time mulling over the super-detailed stuff --
a lot of that was carried over directly from LC.  It would all need to be
scrutinized and brought into consistency with the rest of TCS.  But let's
not focus on those details just yet -- let's focus mainly on whether the
general idea of "Nominal Concept instances as bearers of Nomenclatural
Information" is worth putting more thought into, or if it should be scrapped
right at the outset, before we waste any more time on it.

Finally, before I go into detail about the two different sets, let me
describe the major structural changes within the "Name" element that are
shared by both sets.

Overriding Philosophy:
The "Nominal concept" will be re-defined as a "Name Object".  I think that
the smartest way to define it is, "The Code-corrected representation of a
complete scientific name, which may consist of as few as one, and as many as
three, nomenclatural units". That is, excluding orthographic and other
spelling variants, one name "object" equals one unique set of from one to
three name parts. Subgenera and other infrageneric names would only be
considered as one of the parts if the name itself is of rank subgenus.
Hence, the following would each be considered as a distinct "name object":

#   Name                                 Rank
------------------------------------------------------------------
1.  Anthias                              Genus
2.  Pseudanthias                         Genus
3.  Anthias (Pseudanthias)               Subgenus
4.  Anthias ventralis                    Species
5.  Pseudanthias ventralis               Species
6.  Anthias hawaiiensis                  Species
7.  Pseudanthias hawaiiensis             Species
8.  Anthias ventralis hawaiiensis        any infraspecific rank
9.  Pseudanthias ventralis hawaiiensis   any infraspecific rank

These would be considered Orthographic or other variants, and would appear
in the "NameVerbatim" element of the TaxonConcept instance that used them.
There would be *NO* separate Nominal Concept instances for any of these:

Anthis [misspelling of #1]
Anthias (Pseudanthias) ventralis [points to name object #4]
Anthias ventralis var. hawaiiensis [points to name object #8]
Anthias ventralis f. hawaiiensis [Rank=forma; points to name object #8]
Any gender mis-match or improper latin usage

In keeping with the current definition of "Nominal Concept", these instances
would have no "AccordingTo", and would never be interpreted as defining a
concept circumscription beyond the primary type specimen.
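
As a rough illustration of the name-object rule described above, here is a
minimal Python sketch that reduces a verbatim name string to its one-to-three
name parts. The function name, the list of rank markers, and the tuple
representation are all assumptions made for this example; none of them come
from the attached schemas. It does not handle authorship strings or
misspellings such as "Anthis", which would still need human interpretation.

```python
# Rank markers that may sit between name parts (an assumed, incomplete list).
RANK_MARKERS = {"var.", "f.", "forma", "subsp.", "ssp."}


def name_object_key(verbatim, rank):
    """Return the 1-3 name parts that identify a name object (hypothetical helper)."""
    parts = []
    for token in verbatim.split():
        if token in RANK_MARKERS:
            # Rank markers are not name parts, so they never distinguish name objects.
            continue
        if token.startswith("(") and token.endswith(")"):
            # A parenthesised subgenus counts as a part only when the name
            # itself is of rank subgenus.
            if rank.lower() == "subgenus":
                parts.append(token.strip("()"))
            continue
        parts.append(token)
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"expected 1 to 3 name parts, got {len(parts)}: {verbatim!r}")
    return tuple(parts)


species_key = name_object_key("Anthias (Pseudanthias) ventralis", "Species")
variety_key = name_object_key("Anthias ventralis var. hawaiiensis", "Variety")
subgenus_key = name_object_key("Anthias (Pseudanthias)", "Subgenus")
```

Under these assumptions the first call yields the same key as plain "Anthias
ventralis", and the second yields the same key as "Anthias ventralis
hawaiiensis", so both verbatim strings would point at an existing name object
rather than creating a new one.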

Specific Elements and Structure:

- I am assuming from what I have read in the documentation that the "type"
attribute of the "Name" element is really just a flag for "Scientific" vs.
"Vernacular".  There was discussion somewhere of making this a Boolean
distinction, but I'm not sure where that went.  The point is, I represented
the first-tier content of "Name" as a Choice between Vernacular and
Scientific.  If I am misguided in this, please just ignore this part of the
schema, and pretend it doesn't exist -- it's not relevant to the major point
I'm trying to get across.

- I placed "Rank" within the ScientificName element, because I'm pretty sure
that it has no real context outside of scientific names (at least not the
enumerated version).  Also, the more I think about it, the more I realize
that "Rank" is not a property of a concept -- it is a property of a name.  A
concept circumscription of a specific set of individual organisms is not
affected by the rank of name that is applied to that concept
circumscription. Again, if I'm wrong about this, don't get hung up on it --
for it is also not the main point.

- Here's where it starts to get interesting:  I have implemented a Choice
within ScientificName that is determined by whether the "type" value of the
TaxonConcept instance is "Nominal", or "not Nominal".  Bob Morris just
posted a link to his work in illustrating how this Choice could be enforced
by the schema itself.  However, as I said, a lot of that is over my head; so
for now I am just representing it as a structurally disconnected choice, and
am relying on data providers to understand and implement the decision of
which set of elements to include under ScientificName, depending on the value
of the TaxonConcept "type". I also don't know if I need a pair of elements
for "NominalConceptName" and "NonNominalConceptName" between the Choice and
the pair of Sequences -- but I was trying not to clutter the diagram with
more new elements, and XMLSpy let me do it, so....

- If TaxonConcept "type" is any (non-null?) value other than "Nominal", two
elements are available under ScientificName.  A required "NameVerbatim"
stores the string of text used in the "AccordingTo" publication of the
TaxonConcept instance to represent the name.  In my mind, this is a verbatim
string that would include authors and abbreviations and such (if included).
Basically, it is the way of capturing the actual text representation of a
scientific name as used in the publication.  The other element for
non-Nominal TaxonConcept instances is "NominalConcept".  This is simply a
reference to another TaxonConcept that is of Type Nominal.  This is where a
pointer is made to a "Name object" as contained within a TaxonConcept
instance of type "Nominal".  I'm assuming that this is only provided when it
is essentially certain which name object the AccordingTo author intended
when displaying the NameVerbatim text in the publication.  It is an optional
reference, because there may be times when it's not clear what name object
is intended, or maybe no details about the name object are available beyond
the simple string represented in NameVerbatim.  If needed, additional
elements could be added under NominalConcept to detail who/when made the
interpretation that this is the name object that was intended by the
AccordingTo author of the TaxonConcept, and at what confidence level.
Personally, I don't think this level of detail is necessary.

- If Concept "type" is "Nominal" then the other set of elements are used.
The first is a required "NameSimple" element, which I have redefined as
basically "Code-corrected name without authorships".  The second is
"NameComplex", which I created to represent essentially what the top-level
"Label" element represented in LC 0.1.5 (i.e., a code-corrected name with
authorship elements). Then there is the optional "NameDetailed", which is
essentially LC.  I will describe the internal NameDetailed more below in the
context of each particular set of files.  Also, for Nominal concepts, the
"AccordingTo" and "CharacterCircumscription" elements of the TaxonConcept
are never used.
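
Since the Choice above is not enforced by the schema, a data provider would
have to check it separately. The following Python sketch shows one way such a
check might look. It assumes un-namespaced instance documents, a hypothetical
"id" attribute on TaxonConcept and a hypothetical "ref" attribute on
NominalConcept, and it treats only NameSimple and NameVerbatim as required;
the real instance format may well differ.

```python
import xml.etree.ElementTree as ET

# Illustrative instance data only; attribute names "id" and "ref" are assumed.
EXAMPLE = """<DataSet>
  <TaxonConcept id="n1" type="Nominal">
    <Name><ScientificName>
      <Rank>Species</Rank>
      <NameSimple>Anthias ventralis</NameSimple>
    </ScientificName></Name>
  </TaxonConcept>
  <TaxonConcept id="c1" type="Original">
    <Name><ScientificName>
      <NameVerbatim>Anthias (Pseudanthias) ventralis</NameVerbatim>
      <NominalConcept ref="n1"/>
    </ScientificName></Name>
    <AccordingTo>some publication</AccordingTo>
  </TaxonConcept>
  <TaxonConcept id="bad" type="Nominal">
    <Name><ScientificName>
      <NameVerbatim>Anthis</NameVerbatim>
    </ScientificName></Name>
    <AccordingTo>some publication</AccordingTo>
  </TaxonConcept>
</DataSet>"""


def check_concept(concept):
    """Return a list of business-rule violations for one TaxonConcept element."""
    sci = concept.find("Name/ScientificName")
    if sci is None:
        return ["no ScientificName element"]
    present = {child.tag for child in sci}
    problems = []
    if concept.get("type") == "Nominal":
        if "NameSimple" not in present:
            problems.append("NameSimple is required for Nominal concepts")
        for tag in ("NameVerbatim", "NominalConcept"):
            if tag in present:
                problems.append(f"{tag} is only for non-Nominal concepts")
        for tag in ("AccordingTo", "CharacterCircumscription"):
            if concept.find(tag) is not None:
                problems.append(f"{tag} is never used for Nominal concepts")
    else:
        if "NameVerbatim" not in present:
            problems.append("NameVerbatim is required for non-Nominal concepts")
        for tag in ("NameSimple", "NameComplex", "NameDetailed"):
            if tag in present:
                problems.append(f"{tag} is only for Nominal concepts")
    return problems


results = {c.get("id"): check_concept(c) for c in ET.fromstring(EXAMPLE)}
```

In this sketch the "bad" concept is flagged three times: it lacks NameSimple,
carries a NameVerbatim, and has an AccordingTo.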

Differences between Set 1 and Set 2:

Set 1 is basically intended to make use of the existing "Relationships" and
"SpecimenCircumscription" elements within the existing TaxonConcept, with
imposed business rules that are not easily enforced via the XML schema
itself.  In this set, when TaxonConcept type="Nominal", all Name-Name
relationship types are available within the "Relationships" element (but no
concept-concept relationships may be used) and the ToTaxonConcept reference
*must* point to another Nominal Concept.  Conversely, when TaxonConcept is
of any type other than "Nominal", only the concept-concept relationship
types are available (but none of the name-name relationships). One way to
enforce this more effectively is to establish two different "Relationships"
containers:  "NameRelationships" and "ConceptRelationships" -- each with a
different enumerated set of available relationships; the former used only
for "Nominal" concept types, and the latter used only for non-"Nominal"
concept types.  That's a discussion for the XML experts.  Also in this set,
it is assumed that "SpecimenCircumscriptions" may *only* point to a primary
type specimen, or a Syntype series of specimens -- but never any specimens
that are not considered part of the primary type series.  Again, this is a
business rule that would rely on the data providers to self-impose. Finally,
set 1 also has fewer elements in the "NameDetailed" part of the schema.
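
To make the Set 1 relationship rule concrete, here is a small Python sketch
of how a provider might self-check it. Only "IsReplacementFor" is taken from
this message as an example of a name-name relationship; the other
relationship names, the class names, and the field names are hypothetical
placeholders, not the enumerations in the attached files.

```python
from dataclasses import dataclass

# Placeholder enumerations; the real lists would come from the schema.
NAME_NAME = {"IsReplacementFor", "HypotheticalNameRelation"}
CONCEPT_CONCEPT = {"HypotheticalIncludes", "HypotheticalIsCongruentTo"}


@dataclass
class Relationship:
    rel_type: str
    to_concept: str  # id of the ToTaxonConcept target


@dataclass
class Concept:
    concept_id: str
    concept_type: str
    relationships: list


def relationship_problems(concepts):
    """Check the Set 1 rules on relationship types and their targets."""
    by_id = {c.concept_id: c for c in concepts}
    problems = []
    for concept in concepts:
        nominal = concept.concept_type == "Nominal"
        allowed = NAME_NAME if nominal else CONCEPT_CONCEPT
        for rel in concept.relationships:
            if rel.rel_type not in allowed:
                problems.append(
                    f"{concept.concept_id}: {rel.rel_type} not allowed here")
            target = by_id.get(rel.to_concept)
            # A Nominal concept may only point at another Nominal concept.
            if nominal and (target is None or target.concept_type != "Nominal"):
                problems.append(
                    f"{concept.concept_id}: target {rel.to_concept} is not Nominal")
    return problems


concepts = [
    Concept("n1", "Nominal", [Relationship("IsReplacementFor", "n2")]),
    Concept("n2", "Nominal", []),
    Concept("c1", "Original", [Relationship("IsReplacementFor", "n1")]),
    Concept("n3", "Nominal", [Relationship("IsReplacementFor", "c1")]),
]
problems = relationship_problems(concepts)
```

Here "c1" is flagged for using a name-name relationship on a non-Nominal
concept, and "n3" is flagged because its target is not a Nominal concept.
Splitting the container into "NameRelationships" and "ConceptRelationships"
would move the first of these checks into the schema itself.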

Set 2 assumes that neither "Relationships" nor "SpecimenCircumscriptions"
within TaxonConcept is ever used for concepts of type="Nominal" (just as
"AccordingTo" is never used for Nominal Concepts).  Instead, all name-name
relationships are embedded within the NameDetailed part of the schema -- as
they are with LC.  There are really only two versions of this set.  The
simple version (not represented in the attached files) would simply create
"NameRelationships" and "PrimaryTypeSpecimen" elements within NameDetailed
container.  These two elements would have the same basic structure, and
serve the same basic function, as "Relationships" and
"SpecimenCircumscriptions" would in Set 1, except by breaking them out as
separate elements, different enumerations could be applied to each
respective pair of elements (i.e., only concept-concept relationship types
and any specimens within the TaxonConcept elements, and only name-name
relationship types and only primary type specimens within the NameDetailed
elements).  The less simple version of this set (represented in the attached
files) treats "PrimaryTypeSpecimen" as just described, but breaks out the
various nomenclatural relationships as explicit elements, rather than within
a generic "NameRelationships" container. In this case, only two such
Name-Name relationships are shown (NominalType and Protonym), but more would
be established for other kinds of name-name relationships (e.g.,
IsReplacementFor).
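
For the less simple version of Set 2, a NameDetailed fragment might be built
along the following lines. This is only a sketch: the element names
PrimaryTypeSpecimen, NominalType, and Protonym are the ones mentioned above,
but the "ref" attribute, the function name, and the decision to make every
child optional are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def name_detailed_set2(primary_type_ref=None, nominal_type_ref=None,
                       protonym_ref=None):
    """Build a NameDetailed element with explicit name-name relationship children."""
    detailed = ET.Element("NameDetailed")
    if primary_type_ref is not None:
        ET.SubElement(detailed, "PrimaryTypeSpecimen", ref=primary_type_ref)
    if nominal_type_ref is not None:
        ET.SubElement(detailed, "NominalType", ref=nominal_type_ref)
    if protonym_ref is not None:
        ET.SubElement(detailed, "Protonym", ref=protonym_ref)
    return detailed


# A species-rank name object with a type specimen and a pointer to its protonym.
element = name_detailed_set2(primary_type_ref="specimen-1", protonym_ref="n4")
serialized = ET.tostring(element, encoding="unicode")
```

Each further kind of name-name relationship would become one more explicit
child element here, which is the trade-off against a generic
"NameRelationships" container with an enumerated type.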

In both cases, there may need to be more elements within the NameDetailed
section, as determined by the LC group.  But whatever those elements are,
they should never have any direct bearing on concept circumscriptions, and
should *only* apply to truly nomenclatural information.

Comments on NameDetailed elements:

Most of the comments apply to both Set 1 & Set 2.

- What used to be the "Label" element in LC has moved up a level (outside of
NameDetailed) to "NameComplex"

- I changed the "Nomenclature" element of LC around a bit.  It is renamed to
"NomenclaturalCode" and is now a required root element of NameDetailed, and
the enumeration of NomenclaturalCode has been converted to a required
attribute.  Not sure if this move to attribute is right, but it made sense
to me, so I just went ahead and did it.

- I removed a few of the less obvious elements of Nomenclatural Code stuff,
just to keep things simple. They can be restored later if desired.

- I moved NomenclaturalStatus out to the root of ScientificName -- just
seemed to make more sense there.

- I removed NameExtensions temporarily, just because we haven't fully sorted
out how those will work.  They will need to be accommodated somewhere.

- The proposed "Text" element within CanonicalName has been removed, and
replaced by "NameSimple" outside of NameDetailed.

- The elements dealing with hybrids and Notho taxa have been temporarily
removed, because they require a lot more discussion, and I didn't want to be
distracted.  They WILL be needed somewhere (just not sure where).

- Restored IsNovum

- Created IsProtonym to flag subset of name objects which are original
descriptions (and hence, represent original combinations, and are potential
basionyms)

- Created FirstPublication element to point to a publication instance that
was the first code-recognized appearance of this particular name or
combination.  Would point to original description if IsProtonym=True.

- I did not restore a number of the less-well defined and discussed LC
elements, but they can be restored from LC as needed.

The second batch of comments applies only to Set 2.

- Restored the "Protonym" element from LC (representing one important kind
of name-name relationship)

- Created “


Here are a few of the advantages of this basic approach to the overall
schema (i.e., embedding nomenclatural data in Nominal Concepts).

1) The Nominal Concept retains the same "concept" definition intended by the
original TCS schema.  There is no implied concept circumscription beyond the
primary type specimen.  All information contained within the NameDetailed
and other elements of a Concept circumscription are assumed to be strictly
nomenclatural, and of no direct bearing on the size/shape/position of a
concept circumscription (other than the inclusion of the primary type).

2) All nomenclatural data can be contained in a "modular" way; that is,
within Nominal concepts.  If someone wanted to pass only nomenclatural data,
they would only send concepts that conform to type="Nominal".  If someone
wanted to extract only the nomenclatural information from a Dataset, they
can very easily filter by TaxonConcept type="Nominal".

3) All name-only identifications must link to a Nominal concept (rather than
directly to a name), so there is a built-in enforcement/encouragement to
always present biological data in the context of a concept (even if just a
Nominal concept).

4) No additional top-level elements are needed (hence, no fundamental
changes in the basic structure of TCS).

5) Nomenclatural extensions are clearly either in the form of different
enumerated name-name relationships (Set 1), or additional elements defined
within NameDetailed (Set 2), and do not affect the rest of the TCS
structure.
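
Advantages 2 and 3 can be illustrated with a short Python sketch that pulls
the Nominal concepts out of a dataset and checks that every NominalConcept
pointer resolves to one of them. The "id" and "ref" attributes and the
un-namespaced element names are assumptions for the example, not part of the
attached schemas.

```python
import xml.etree.ElementTree as ET

# Illustrative dataset; c2 deliberately points at a non-Nominal concept.
DATASET = """<DataSet>
  <TaxonConcept id="n1" type="Nominal"/>
  <TaxonConcept id="c1" type="Original">
    <Name><ScientificName>
      <NameVerbatim>Anthias ventralis</NameVerbatim>
      <NominalConcept ref="n1"/>
    </ScientificName></Name>
  </TaxonConcept>
  <TaxonConcept id="c2" type="Original">
    <Name><ScientificName>
      <NameVerbatim>Anthias hawaiiensis</NameVerbatim>
      <NominalConcept ref="c1"/>
    </ScientificName></Name>
  </TaxonConcept>
</DataSet>"""

root = ET.fromstring(DATASET)

# Advantage 2: the nomenclatural subset is a simple filter on the type attribute.
nominal_concepts = root.findall(".//TaxonConcept[@type='Nominal']")
nominal_ids = {concept.get("id") for concept in nominal_concepts}

# Advantage 3: every name pointer should land on a Nominal concept.
bad_refs = [pointer.get("ref")
            for pointer in root.iter("NominalConcept")
            if pointer.get("ref") not in nominal_ids]
```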

Potential disadvantages:

1) Possible appearance of redundancy of information (e.g., when an
"Original" concept points to the corresponding "Nominal" concept, there will
often be duplication of links to publication instances, etc.)

2) Both sets depend somewhat on data-provider compliance with stated (but
unenforced) business rules of which data elements belong where (Set 1
especially).

3) Schema-level enforcement options are complex (reference recent post from
Bob Morris).

4) May represent the "worst of both worlds", rather than the "best of both
worlds" (too tired to tell right now).

I'm sure there are many, many more disadvantages, but I have high confidence
that these will be pointed out to me in exquisite detail.

Also, I have not thought through all the elements, so more discussion is
needed on a wide variety of topics.

Well...it’s almost 1:30am, so I better get working on those instance
documents....

Aloha,
Rich

Richard L. Pyle, PhD
Database Coordinator for Natural Sciences
Department of Natural Sciences, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deepreef at bishopmuseum.org
http://www.bishopmuseum.org/bishop/HBS/pylerichard.html



-------------- next part --------------
A non-text attachment was scrubbed...
Name: lc015_napier1-RLP1.xsd
Type: text/xml
Size: 113212 bytes
Desc: not available
Url : http://mercury.nceas.ucsb.edu/ecoinformatics/pipermail/tcs-lc/attachments/20050309/1da6cef6/lc015_napier1-RLP1.xml
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lc015_napier1-RLP2.xsd
Type: text/xml
Size: 114601 bytes
Desc: not available
Url : http://mercury.nceas.ucsb.edu/ecoinformatics/pipermail/tcs-lc/attachments/20050309/1da6cef6/lc015_napier1-RLP2.xml
-------------- next part --------------
A non-text attachment was scrubbed...
Name: v090b-RLP1.xsd
Type: text/xml
Size: 32274 bytes
Desc: not available
Url : http://mercury.nceas.ucsb.edu/ecoinformatics/pipermail/tcs-lc/attachments/20050309/1da6cef6/v090b-RLP1.xml
-------------- next part --------------
A non-text attachment was scrubbed...
Name: v090b-RLP2.xsd
Type: text/xml
Size: 35976 bytes
Desc: not available
Url : http://mercury.nceas.ucsb.edu/ecoinformatics/pipermail/tcs-lc/attachments/20050309/1da6cef6/v090b-RLP2.xml

