[eml-dev] [Bug 5308] New: Data Manager Library: storageType content should be stored and used
bugzilla-daemon at ecoinformatics.org
Tue Feb 15 07:57:28 PST 2011
http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5308
Summary: Data Manager Library: storageType content should be
stored and used
Product: EML
Version: 2.1.0
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: datamanager
AssignedTo: tao at nceas.ucsb.edu
ReportedBy: dcosta at lternet.edu
QAContact: eml-dev at ecoinformatics.org
Estimated Hours: 0.0
'storageType' is an optional, repeatable element within the EML 'attribute'
element. In addition to the documentation available in the EML normative
documents, several old bug tickets describe the rationale behind this element:
#484, #544, #599.
When the Data Manager Library parses EML attributes, it does not record any
'storageType' content that may be present. As a result, any hints the metadata
provider has given about how the attribute is best stored (say, in a relational
database table) are ignored by the Data Manager Library, which instead relies
entirely on the 'measurementScale' content for this purpose.
To cite a specific example of how 'storageType' content can be helpful, the
document knb-lter-gce.1.9
(http://metacat.lternet.edu/knb/metacat/knb-lter-gce.1.9) contains three
attributes for year, month, and day, respectively. Each of the attributes has
storageType set to 'integer' and measurementScale set to 'dateTime'. When
loading the data table into a relational database, the Data Manager Library
sets the corresponding database fields to type 'timestamp' (in Postgres),
having no knowledge that the storage type "hint" was to set the fields to type
integer ('int4' in Postgres). The result is that in the original data table
entity, the fields appear like this:
2000 8 26
while in the relational database, they appear like this:
        year         |         month          |          day
---------------------+------------------------+------------------------
 2000-01-01 00:00:00 | 0001-08-01 00:00:00 BC | 0001-01-26 00:00:00 BC
It's clear that in this particular case, the Data Manager Library could have
used the storageType hint to select a more appropriate data type for these
attributes.
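
As a rough illustration of the point above, the sketch below prefers a
storageType hint when one is recognised and falls back to measurementScale
otherwise. The function name and both mapping tables are hypothetical and
chosen only for this example; they are not the Data Manager Library's API or
its actual type mappings.

```python
# Hypothetical mapping from storageType hints to Postgres column types.
# Illustrative assumption only, not taken from the Data Manager Library.
STORAGE_TYPE_TO_POSTGRES = {
    "integer": "int4",
    "int": "int4",
    "float": "float8",
    "double": "float8",
    "string": "text",
    "date": "date",
}

# Hypothetical fallback keyed on the measurementScale child element name.
SCALE_TO_POSTGRES = {
    "dateTime": "timestamp",
    "interval": "float8",
    "ratio": "float8",
    "nominal": "text",
    "ordinal": "text",
}


def choose_column_type(measurement_scale, storage_types=()):
    """Return a Postgres type, preferring the first recognised storageType hint."""
    for hint in storage_types:
        pg_type = STORAGE_TYPE_TO_POSTGRES.get(hint.strip().lower())
        if pg_type is not None:
            return pg_type
    # No usable hint: fall back to measurementScale, as happens today.
    return SCALE_TO_POSTGRES.get(measurement_scale, "text")


print(choose_column_type("dateTime"))               # timestamp (no hint)
print(choose_column_type("dateTime", ["integer"]))  # int4 (hint, as in knb-lter-gce.1.9)
```

With the hint applied, the year, month, and day columns would be created as
'int4' and the values 2000, 8, and 26 could be loaded unchanged.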
The goals of this task are to:
1. Enhance the EML parsing phase of the Data Manager Library, so that it parses
and stores all storageType elements that are provided for an attribute.
2. Enhance the data loading phase of the Data Manager Library, so that it uses
storageType content, if provided, to make a more informed decision about which
data type to define for the attribute. This may require heuristics to determine
which data type is most appropriate under a given set of circumstances,
particularly in cases where more than one storageType element is provided for
an attribute.
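
The two steps above might be sketched as follows. This is a minimal Python
illustration, not the library's implementation: the function names, the sample
XML, and the "Postgres" typeSystem label are all hypothetical, and the default
typeSystem value is an assumption about the EML schema. The heuristic shown
(prefer a hint written for the target type system, otherwise take the first
hint) is only one of many possible choices.

```python
import xml.etree.ElementTree as ET

# Assumed default for the typeSystem attribute when it is omitted.
DEFAULT_TYPE_SYSTEM = "http://www.w3.org/2001/XMLSchema-datatypes"

# Hypothetical attributeList fragment with a repeated storageType element.
SAMPLE = """
<attributeList>
  <attribute id="year">
    <attributeName>year</attributeName>
    <storageType>integer</storageType>
    <storageType typeSystem="Postgres">int4</storageType>
  </attribute>
  <attribute id="site">
    <attributeName>site</attributeName>
  </attribute>
</attributeList>
"""


def parse_storage_types(attribute_list_xml):
    """Map attributeName to a list of (typeSystem, storageType) pairs, in document order."""
    root = ET.fromstring(attribute_list_xml)
    result = {}
    for attr in root.iter("attribute"):
        name = attr.findtext("attributeName", default="").strip()
        result[name] = [
            (st.get("typeSystem", DEFAULT_TYPE_SYSTEM), (st.text or "").strip())
            for st in attr.findall("storageType")
        ]
    return result


def pick_hint(hints, preferred_type_system):
    """Prefer a hint in the target type system, else the first hint, else None."""
    for type_system, value in hints:
        if type_system == preferred_type_system:
            return value
    return hints[0][1] if hints else None


parsed = parse_storage_types(SAMPLE)
print(pick_hint(parsed["year"], "Postgres"))  # int4
print(pick_hint(parsed["site"], "Postgres"))  # None, so fall back to measurementScale
```

Keeping every parsed hint, rather than only the first, leaves the choice of
heuristic to the data loading phase.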