[Bug 544] - issues about storageType and attributeDomain

Mon Sep 2 10:29:37 PDT 2002

http://bugzilla.ecoinformatics.org/show_bug.cgi?id=544

jones at nceas.ucsb.edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |eml-dev at ecoinformatics.org

------- Additional Comments From jones at nceas.ucsb.edu  2002-09-02 10:29 -------
Thanks for the comments on these data typing issues, Dan.  There are two
distinct issues you raised, which I will address separately:

1) Enumerated domain doesn't allow a simple list without definitions

   This is true, and intentional.  When data are distributed, it is critical to 
   know the definitions for the string values that are present in the data 
   entity.  String values in enumerated lists are generally codes that represent
   some type of measurement (e.g., HIGH, MEDIUM, LOW), or are names of
   sampling locations (e.g., SUBPLOT4).  In either case, the code by itself does
   not tell a later user what it means, so it is critical to have the
   definition.  From a data re-use or data preservation perspective, can you
   show a case where it would be acceptable to not have a definition of an
   enumerated value?  If so, I would agree that we should consider relaxing
   this requirement, but for now I think it is a fundamental part of the
   definition of an enumerated attribute.
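
   To make the point concrete, here is a rough Python sketch of why a bare
   list of codes is not enough.  Everything in it is an assumption made for
   illustration: the function name, the variable names, and the placeholder
   definitions are invented here and are not part of EML.

```python
from typing import Dict, List


def undefined_codes(values: List[str], domain: Dict[str, str]) -> List[str]:
    """Return the distinct values that have no non-empty definition in the domain."""
    missing: List[str] = []
    for value in values:
        definition = domain.get(value, "")
        if not definition.strip() and value not in missing:
            missing.append(value)
    return missing


# Hypothetical enumerated domain: each code maps to its definition.
# The definitions are placeholders, not real metadata.
level_domain = {
    "HIGH": "placeholder definition for HIGH",
    "MEDIUM": "placeholder definition for MEDIUM",
    "LOW": "placeholder definition for LOW",
}

# Values as they might appear in a data entity.
observed = ["HIGH", "LOW", "SUBPLOT4", "LOW"]

# SUBPLOT4 appears in the data but nobody has said what it means.
print(undefined_codes(observed, level_domain))
```

   A consumer of the data can run a check like this only if the definitions
   travel with the enumerated list, which is the reason for the requirement.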

2) XML Schema data types used in storageType overlap with attributeDomain

   Also true, but the two fields serve different purposes.  
   The storageType of an attribute is an indication of the type that might be 
   used to represent the value in a data management system, such as 
   a database or programming language.  It is not actually an 
   expression of the true domain, as it may in fact be defined slightly 
   differently than the attributeDomain (e.g., storageType might be "character" 
   while the domain might be a restricted list of character values).

   That we recommend XML Schema Datatypes (which allow restrictions) for the 
   storageType does not change the need for an independent specification of the 
   domain.  If someone were to use a different type system for the storageType, 
   especially one that lacks the restriction capabilities of XML Schema
   Datatypes, then eliminating attributeDomain would be problematic.
   So, basically, attributeDomain is a required expression of the domain, while
   storageType is an optional expression of the likely type from some 
   (hopefully common) type system (e.g., Oracle datatypes, Java datatypes,
   XML Schema data types). One might think of storageType as a hint to 
   automated processing systems as to how one might represent the values of 
   the attribute.  storageType was originally repeatable, and one might
   argue that it should be made repeatable again so that the types from
   multiple systems can be indicated.  I think that would be a positive change.
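
   The distinction can be sketched in Python as follows.  This is only an
   illustration under my own assumptions: the Attribute class, its field
   names, and the placeholder definitions are hypothetical and do not
   mirror the actual schema; the type names are merely examples from the
   type systems mentioned above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Attribute:
    """Hypothetical stand-in for an attribute description (not the EML schema)."""

    name: str
    # Required: the true domain, here an enumerated list of code -> definition.
    domain: Dict[str, str]
    # Optional and repeatable: (type system, type name) hints for processors.
    storage_types: List[Tuple[str, str]] = field(default_factory=list)

    def in_domain(self, value: str) -> bool:
        """Only the domain decides whether a value is legal."""
        return value in self.domain


level = Attribute(
    name="level",
    domain={
        "HIGH": "placeholder definition for HIGH",
        "MEDIUM": "placeholder definition for MEDIUM",
        "LOW": "placeholder definition for LOW",
    },
    storage_types=[
        ("XML Schema Datatypes", "string"),
        ("Java", "String"),
        ("Oracle", "VARCHAR2"),
    ],
)

# "VERY HIGH" fits every storage type listed above, yet it is outside the domain.
print(level.in_domain("HIGH"))       # True
print(level.in_domain("VERY HIGH"))  # False
```

   The storage types tell a processing system how it might hold the values;
   they say nothing about which values are valid.  That is why the domain
   stays required even when a storage type is given.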

In summary, although you make cogent points, I don't think that we should make
substantial changes to the model at this time.  I will, however, revise the
schemas to try to clarify the documentation with respect to these issues, and to
make storageType repeatable.  Comments?  In the absence of further comments,
I'll close this bug this week.  Thanks.