[eml-dev] [Bug 5308] New: Data Manager Library: storageType content should be stored and used

bugzilla-daemon at ecoinformatics.org bugzilla-daemon at ecoinformatics.org
Tue Feb 15 07:57:28 PST 2011


http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5308

           Summary: Data Manager Library: storageType content should be
                    stored and used
           Product: EML
           Version: 2.1.0
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: datamanager
        AssignedTo: tao at nceas.ucsb.edu
        ReportedBy: dcosta at lternet.edu
         QAContact: eml-dev at ecoinformatics.org
   Estimated Hours: 0.0


'storageType' is an optional, repeatable element within the EML 'attribute'
element. In addition to the documentation available in the EML normative
documents, several old bug tickets describe the rationale behind this element:
#484, #544, #599.

When the Data Manager Library parses EML attributes, it does not record any
'storageType' content that may be present. As a result, any hints from the
metadata provider about how the attribute is best stored (say, in a relational
database table) are completely ignored. The Data Manager Library instead relies
entirely on the 'measurementScale' content for this purpose.
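
As a rough illustration of the missing parsing step, the sketch below collects
the text of every storageType child of an attribute element using the standard
Java DOM API. The class and method names (StorageTypeReader, readStorageTypes,
parseAttribute) are hypothetical and are not taken from the Data Manager
Library; the sketch also assumes the attribute's child elements are
unqualified, and it ignores any XML attributes on storageType itself.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only; not Data Manager Library code.
class StorageTypeReader {

    /**
     * Returns the trimmed text of every direct storageType child of the
     * given attribute element, in document order. Empty values are skipped.
     */
    static List<String> readStorageTypes(Element attribute) {
        List<String> types = new ArrayList<>();
        NodeList children = attribute.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && "storageType".equals(child.getNodeName())) {
                String text = child.getTextContent().trim();
                if (!text.isEmpty()) {
                    types.add(text);
                }
            }
        }
        return types;
    }

    /** Parses an XML string and returns its root element (assumed to be an attribute). */
    static Element parseAttribute(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse attribute XML", e);
        }
    }
}
```

Because storageType is repeatable, the helper returns a list rather than a
single value, leaving the choice among several hints to the data loading
phase.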

To cite a specific example of how 'storageType' content can be helpful, the
document knb-lter-gce.1.9
(http://metacat.lternet.edu/knb/metacat/knb-lter-gce.1.9) contains three
attributes for year, month, and day, respectively. Each of the attributes has
storageType set to 'integer' and measurementScale set to 'dateTime'. When
loading the data table into a relational database, the Data Manager Library
sets the corresponding database fields to type 'timestamp' (in Postgres). It
has no knowledge of the storageType hint, which called for type integer
('int4' in Postgres). As a result, the fields in the original data table
entity appear like this:

2000 8 26

while in the relational database, they appear like this:

        year         |         month          |          day           
---------------------+------------------------+------------------------
 2000-01-01 00:00:00 | 0001-08-01 00:00:00 BC | 0001-01-26 00:00:00 BC

It's clear that in this particular case, the Data Manager Library could have
used the storageType hint to select a more appropriate data type for these
attributes.


The goal of this task is to:

1. Enhance the EML parsing phase of the Data Manager Library, so that it parses
and stores all storageType elements that are provided for an attribute.

2. Enhance the data loading phase of the Data Manager Library, so that it uses
storageType content, if provided, to make a more informed decision about which
data type to define for the attribute. This may require heuristics to determine
which data type is most appropriate under a given set of circumstances,
particularly in cases where more than one storageType element is provided for
an attribute.
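
One possible heuristic for the second step is sketched below: use the first
storageType value that maps to a known database type, and fall back to the
measurementScale only when no hint is recognized. The class ColumnTypeChooser,
its method names, and the type mappings are all assumptions made for
illustration; only the 'integer' to 'int4' and 'dateTime' to 'timestamp'
pairings come from the example above, and the real rules would need to be
worked out as part of this task.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; the mappings below are illustrative assumptions.
class ColumnTypeChooser {

    /** Maps one storageType hint to a Postgres type, or null if unrecognized. */
    static String fromStorageType(String storageType) {
        switch (storageType.trim().toLowerCase(Locale.ROOT)) {
            case "integer":
            case "int":
                return "int4";
            case "long":
                return "int8";
            case "float":
            case "double":
            case "decimal":
                return "float8";
            case "string":
                return "text";
            case "date":
                return "date";
            default:
                return null;
        }
    }

    /** Fallback that looks only at the measurementScale. */
    static String fromMeasurementScale(String measurementScale) {
        switch (measurementScale) {
            case "dateTime":
                return "timestamp";
            case "ratio":
            case "interval":
                return "float8";
            default:
                return "text";
        }
    }

    /**
     * Chooses a column type: the first recognized storageType hint wins;
     * otherwise the measurementScale decides.
     */
    static String choose(List<String> storageTypes, String measurementScale) {
        for (String hint : storageTypes) {
            String mapped = fromStorageType(hint);
            if (mapped != null) {
                return mapped;
            }
        }
        return fromMeasurementScale(measurementScale);
    }
}
```

Under these assumed rules, the year, month, and day attributes of
knb-lter-gce.1.9 (storageType 'integer', measurementScale 'dateTime') would be
loaded as 'int4', while an attribute with no storageType would keep its
measurementScale-based type. Taking the first recognized hint is only one
option; a fuller heuristic might prefer hints from a particular type system.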

