[tcs-lc] Modularisation of standards

Tue Mar 8 03:04:50 PST 2005

I add my strong agreement to the philosophy expressed below, but warn 
that not only in TDWG but in many communities attempting to develop 
exchange standards, there is a big risk of relying on XML-Schema alone 
to achieve the goals of robustness, extensibility, and reuse that the 
philosophy seeks. XML-Schema by itself is rather weak in its 
modularization facilities. These are largely limited to an inheritance 
mechanism (type derivation by extension and restriction) that lacks 
multiple inheritance or any substitute for it, plus some name scoping 
via namespaces and xs:include/xs:import.

Aside from these, relationships between data types are mostly limited to 
the key/keyref mechanism and its infrastructure. These are elegant but 
technically difficult to embody in schemata. Worst of all, the 
nefarious <xs:any>, and to a lesser extent the substitutionGroup 
mechanism, can easily defeat the designer's best intentions to support 
reusability, sometimes in the very pursuit of it.
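To make concrete what a key/keyref pair asks a validator to enforce, here is a minimal sketch that performs the two checks by hand with Python's standard-library ElementTree: every key value must be present and unique, and every reference that is present must match some key. The element and attribute names (TaxonConcept/@id, NameRef/@ref) are hypothetical, chosen only for illustration, and values are compared as plain strings rather than as typed XML Schema values.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document: TaxonConcept/@id plays the role of an
# xs:key, NameRef/@ref the role of an xs:keyref pointing at it.
DOC = """
<Dataset>
  <TaxonConcept id="tc1"/>
  <TaxonConcept id="tc2"/>
  <NameRef ref="tc1"/>
  <NameRef ref="tc9"/>
</Dataset>
"""

def check_key_keyref(root, key_path, key_attr, ref_path, ref_attr):
    """Return a list of problems, roughly mimicking xs:key + xs:keyref."""
    problems = []
    keys = set()
    # xs:key: every selected element must carry the field, and values are unique.
    for elem in root.findall(key_path):
        value = elem.get(key_attr)
        if value is None:
            problems.append(f"missing key on <{elem.tag}>")
        elif value in keys:
            problems.append(f"duplicate key {value!r}")
        else:
            keys.add(value)
    # xs:keyref: a reference that is present must match some key value.
    for elem in root.findall(ref_path):
        value = elem.get(ref_attr)
        if value is not None and value not in keys:
            problems.append(f"dangling reference {value!r}")
    return problems

root = ET.fromstring(DOC)
print(check_key_keyref(root, "TaxonConcept", "id", "NameRef", "ref"))
# ["dangling reference 'tc9'"]
```

In a schema the same relationship is declared once, with xs:selector and xs:field XPaths, rather than coded; the sketch only shows the semantics those declarations carry.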

Finally, there are also a few anti-modularization landmines in Schema, 
a legacy of its design by committee. One that XMLSpy didn't handle 
correctly until (I believe) XMLSpy 2005 is the misguided, annoying 
"unique particle attribution" constraint 
http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/#cos-nonambig
Intended entirely to make validating parsers technically easier to 
build, this constraint sometimes prohibits using the same name for 
elements in different structural variants, even when it is reasonable 
to do so (e.g. "TaxonConceptSchema" and "Name"). Too often, in pursuit 
of ubiquitous element names (and other social goals), schema designers 
simply make the variation rest on optional elements, and in annotations 
plead with producers of instance documents to do the right thing.
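A rough illustration of why two variants sharing an element name collide: in an xs:choice between two xs:sequence branches that can both begin with the same element, a validator reading that element cannot tell which particle it belongs to without looking ahead, and that is what unique particle attribution forbids. The sketch below is a deliberately simplified check, assuming a model that is just a choice of sequences and inspecting only the first position of each branch (a real validator checks every position of the content model). The element names (Name, AccordingTo, Rank, Publication, ScientificName) are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Particle:
    name: str               # element name
    optional: bool = False  # True corresponds to minOccurs="0"

def first_names(sequence):
    """Element names that can appear first in an xs:sequence of particles."""
    names = set()
    for p in sequence:
        names.add(p.name)
        if not p.optional:  # a required particle hides everything after it
            break
    return names

def upa_conflicts_at_choice_start(branches):
    """Simplified unique-particle-attribution check for a choice of sequences.

    Reports pairs of branches that can start with the same element name.
    """
    conflicts = []
    for i in range(len(branches)):
        for j in range(i + 1, len(branches)):
            shared = first_names(branches[i]) & first_names(branches[j])
            if shared:
                conflicts.append((i, j, sorted(shared)))
    return conflicts

# Two hypothetical structural variants that both start with <Name>.
variant_a = [Particle("Name"), Particle("AccordingTo")]
variant_b = [Particle("Name"), Particle("Rank", optional=True), Particle("Publication")]
print(upa_conflicts_at_choice_start([variant_a, variant_b]))
# [(0, 1, ['Name'])]
```

Renaming the first element of one variant removes the conflict, which is exactly the pressure toward distinct names, or toward a single sequence of optional elements, described above.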

Some of these problems are discussed in the rather nice paper 
"Generating Data Bindings for an XML Schema-Based Language" by Eric M. 
Dashofy. http://www.ics.uci.edu/~edashofy/papers/xse2001.pdf
A few of its criticisms of XML Schema have since been addressed, but it 
is still pretty current.

Bob

Donald Hobern wrote:
> Gregor Hagedorn wrote:
> 
> 
>>Closely related, my feeling about much of the discussion whether LC should 
>>be in TCS is that it misses the point. I think that rather than embedding 
>>LC in TCS, both name and concept issues, together with specimen and 
>>publication, and description and ecology, and... issues all belong 
>>together. That is what UBIF aims at. SDD does not work without much 
>>peripheral infrastructure of names, taxonomic hierarchy, publications, 
>>geography, agents, etc. Rather than considering it as part of descriptive 
>>data, we tried to push it out into UBIF. 
>>
>>[...]
>
> 
> I am happy for the discussion around the separate use of names outside
> concepts to be resolved either way (provided we end up with something that
> can handle nomenclatural resources as well as taxonomic resources).  However,
> I would like very much to support modularisation of the kind that Gregor
> outlined here (and which Rich mentioned in one of his earlier posts).
> 
> Modularisation will be really important for the long-term success of the
> TDWG standards.  It may be a long way from where we are today, but the TDWG
> standards could (and probably should) ultimately become a library of
> reusable data types.  Better still there should also be a set of defined
> inter-type relationships.  This would not be to restrict the relationships
> that providers could define, but it would help to provide structure to some
> of the core connections within our information domain.  
> [...]

-- 
Robert A. Morris
Professor of Computer Science
UMASS-Boston
http://www.cs.umb.edu/~ram
phone (+1)617 287 6466