Measurement scale in EML
Xiaoping Wang
xiaoping.y.wang at noaa.gov
Mon Feb 28 14:16:01 PST 2005
Dear Matt and Peter,
Thank you very much for your useful input. The following is the
revised piece of my EML document. Please note that I am not using the
values from standardUnit/unitDictionary to describe the units of the
variables. I think the unitDictionary is used when you use the <unit>
element to describe the unit. Here I use whatever actually appears
in the varUnit column of the table as the "code" (for example, PPT as
the code for the unit of Salinity) for the <code> element and give
further explanation of the unit in the <definition> element of each
<codeDefinition>. Please let me know if you have further advice.
<attribute>
  <attributeName>varName</attributeName>
  <attributeDefinition>Name of chemical or physical property measured</attributeDefinition>
  <storageType>String</storageType>
  <measurementScale>
    <nominal>
      <nonNumericDomain>
        <enumeratedDomain>
          <codeDefinition>
            <code>T</code>
            <definition>Temperature</definition>
          </codeDefinition>
          <codeDefinition>
            <code>S</code>
            <definition>Salinity</definition>
          </codeDefinition>
          <codeDefinition>
            <code>ST</code>
            <definition>Sigma-T</definition>
          </codeDefinition>
        </enumeratedDomain>
      </nonNumericDomain>
    </nominal>
  </measurementScale>
</attribute>
<attribute>
  <attributeName>varUnit</attributeName>
  <attributeDefinition>Unit of chemical or physical property measured</attributeDefinition>
  <storageType>String</storageType>
  <measurementScale>
    <nominal>
      <nonNumericDomain>
        <enumeratedDomain>
          <codeDefinition>
            <code>C</code>
            <definition>Degree, the unit for Temperature</definition>
          </codeDefinition>
          <codeDefinition>
            <code>PPT</code>
            <definition>Unit for Salinity</definition>
          </codeDefinition>
          <codeDefinition>
            <code>KG/M**3</code>
            <definition>Kilogram per cubic meter, the unit for Sigma-T</definition>
          </codeDefinition>
        </enumeratedDomain>
      </nonNumericDomain>
    </nominal>
  </measurementScale>
</attribute>
Thank you!
Xiaoping Wang
PMEL / NOAA
Matt Jones wrote:
> Hi Xiaoping,
>
> As Peter mentioned, your problems have arisen before. See below for
> some additional recommendations beyond Peter's from my personal
> perspective.
>
> Xiaoping Wang wrote:
>
>> Dear Matt and Peter:
>>
>> I have seen a lot of discussions recently on issues about measurement
>> scale and temporal coverage. They are very helpful for our better
>> understanding of EML. The following are questions and concerns that
>> arose during my work on our EML-based metadata.
>>
>> 1. About the Measurement scale
>>
>> The measurementScale is a little bit confusing. I spent a lot of
>> time working on the measurementScale for nominal data. Here I want
>> to give you an example of how I use the measurementScale to
>> describe nominal data in our dataset, so you can see whether my
>> implementation is based on a correct understanding of this element.
>>
>> We have a data table with four columns (attributes): recordID,
>> variable_name, variable_unit, and variable_value. The values for the
>> variable_name column include certain measurements of the chemical
>> and physical properties of sea water such as temperature, salinity,
>> nitrate... The following is a sample piece of my EML file for
>> this dataset.
>> <attribute>
>>   <attributeName>varName</attributeName>
>>   <attributeDefinition>Name of chemical or physical property measured</attributeDefinition>
>>   <storageType>String</storageType>
>>   <measurementScale>
>>     <nominal>
>>       <nonNumericDomain>
>>         <enumeratedDomain>
>>           <codeDefinition>
>>             <code>T</code>
>>             <definition>Temperature, unit: C</definition>
>>           </codeDefinition>
>>           <codeDefinition>
>>             <code>S</code>
>>             <definition>Salinity, unit: PPT</definition>
>>           </codeDefinition>
>>           <codeDefinition>
>>             <code>ST</code>
>>             <definition>Sigma-T, unit: KG/M**3</definition>
>>           </codeDefinition>
>>         </enumeratedDomain>
>>       </nonNumericDomain>
>>     </nominal>
>>   </measurementScale>
>> </attribute>
>> <attribute>
>>   <attributeName>varUnit</attributeName>
>>   <attributeDefinition>Unit of chemical or physical property measured</attributeDefinition>
>>   <storageType>String</storageType>
>>   <measurementScale>
>>     <nominal>
>>       <nonNumericDomain>
>>         <textDomain>
>>           <definition>*</definition>
>>         </textDomain>
>>       </nonNumericDomain>
>>     </nominal>
>>   </measurementScale>
>> </attribute>
>>
>> My questions / concerns are:
>> (1) Is it suitable to use enumeratedDomain element to describe varName?
>
> Yes, that is fine, although if you wanted it to be free text that
> would be ok too (just use textDomain instead of enumeratedDomain).
> Encoding the unit information in the variable name is somewhat
> repetitive if you have the same unit information in the varUnit column.
>
>>
>> (2) For the varUnit, I don't think it is necessary to include the
>> measurementScale element. However, since the measurementScale is a
>> required field, I have to put something there in order to pass the
>> EML validation. So I put a "*" sign for the definition element. I
>> have seen some other similar cases in which the EML metadata
>> developers use a "*" for the definition element. Obviously, the
>> measurementScale content described here gives no useful information
>> about the varUnit.
>
> The use of the '*' is inappropriate. The field is required because
> the authors of EML thought the information was important. In this
> case, I think you should put in the definition something that
> indicates that the values are names of units. One major thing that is
> missing here is that you don't use the EML Unit Dictionary when
> choosing your unit definitions. This eliminates the major advantage
> of EML in being able to provide quantitative information about units.
> If there is a 1:1 correspondence between your units and the EML unit
> dictionary, I think it would be good if you defined varUnit as an
> enumerated domain and for each of your units provide the EML standard
> name for the unit in the definition. This would help in translating,
> although it is unlikely that anyone could use this in automated
> systems because it's such a non-standard use of the EML descriptors.
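
The enumerated-domain approach suggested above might look roughly like the
sketch below, where each definition also names the EML standard unit. This
assumes that C means degrees Celsius and that celsius and
kilogramsPerCubicMeter are entries in the unit dictionary of the EML version
in use; both should be checked. No standard name is assumed for PPT, so that
definition stays descriptive.

```xml
<!-- Sketch only: varUnit as an enumerated domain whose definitions
     name the EML standard unit where one is assumed to exist -->
<attribute>
  <attributeName>varUnit</attributeName>
  <attributeDefinition>Unit of chemical or physical property measured</attributeDefinition>
  <storageType>String</storageType>
  <measurementScale>
    <nominal>
      <nonNumericDomain>
        <enumeratedDomain>
          <codeDefinition>
            <code>C</code>
            <!-- assumes C denotes degrees Celsius -->
            <definition>Unit for Temperature; EML standard unit: celsius</definition>
          </codeDefinition>
          <codeDefinition>
            <code>PPT</code>
            <!-- no standard unit name assumed here -->
            <definition>Unit for Salinity</definition>
          </codeDefinition>
          <codeDefinition>
            <code>KG/M**3</code>
            <definition>Unit for Sigma-T; EML standard unit: kilogramsPerCubicMeter</definition>
          </codeDefinition>
        </enumeratedDomain>
      </nonNumericDomain>
    </nominal>
  </measurementScale>
</attribute>
```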
>
> In general, this model of variablename, varunit, value is a
> non-standard use of the relational model as the attributes do not
> really represent a single type. The relational model is generally
> intended to have attributes that contain a semantically homogeneous set
> of values. In your case this is not true, unless considered from a
> meta-level. So, I think you are using the relational model as a
> schema language itself. This significantly complicates use of the data
> in standard analytical systems (e.g., SAS, Splus, R, Matlab) -- they
> basically all require different views of the data as described in
> Peter's note. Personally I think that documenting these more
> traditional views if you have them would be far more useful to
> scientists who wish to analyze the data. That would have the added
> benefit of being better described by EML structures. Documenting your
> "meta-level" schema isn't particularly informative because the
> information in one attribute is so heterogeneous.
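
In a more traditional view of this kind, each measured property would get its
own column, so the unit can be stated with the <unit> element instead of being
stored as data. A minimal sketch for a temperature column follows. It assumes
that C means degrees Celsius; the attribute name, the definition text, and the
precision value are hypothetical placeholders, not taken from the dataset.

```xml
<!-- Sketch only: one column per measured property, unit taken from
     the EML standard unit dictionary -->
<attribute>
  <!-- hypothetical column name -->
  <attributeName>temperature</attributeName>
  <attributeDefinition>Sea water temperature</attributeDefinition>
  <storageType>float</storageType>
  <measurementScale>
    <!-- interval scale: Celsius has no true zero point -->
    <interval>
      <unit>
        <standardUnit>celsius</standardUnit>
      </unit>
      <!-- placeholder precision; replace with the instrument's actual value -->
      <precision>0.01</precision>
      <numericDomain>
        <numberType>real</numberType>
      </numericDomain>
    </interval>
  </measurementScale>
</attribute>
```

Salinity and Sigma-T would each get a similar attribute of their own, which
makes every column semantically homogeneous.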
>
>>
>> 2. About the information of metadata itself
>>
>> Based on my understanding of the EML schemas, the only information
>> associated with the metadata itself is the information about the
>> metadata provider(s). However, my supervisors and I think that it is
>> important to provide other metadata information, such as when the
>> metadata document was created, whether further updates of the metadata
>> are needed, and if so, the metadata update frequency and
>> the date of the last update. Those pieces of information are
>> particularly important when the endDate value for a
>> dataset from an on-going project is going to change, because first they
>> can remind metadata providers / developers when they should update
>> their metadata, and second they can tell metadata users whether the
>> metadata document provides the most current information about the
>> dataset described.
>
> Sure. In hindsight, I think we should have included these metadata
> information fields, particularly the timestamp fields. But we do have
> some related fields that describe ongoing data collection. Take a
> look at /eml/dataset/maintenance/description and
> /eml/dataset/maintenance/maintenanceUpdateFrequency. The latter is
> probably what you want. Any fields that you want but that don't exist
> in the schema can be put in the "/eml/additionalMetadata" field, so
> you always have that as a recourse. If you have specific
> recommendations for fields that are needed you could send them to
> eml-dev at ecoinformatics.org and we'll try to get them into plans for a
> future release.
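
The two maintenance fields mentioned above, plus an additionalMetadata block
for anything the schema lacks, could be sketched as follows. The description
text and the frequency value are placeholders, and continually is assumed to
be one of the values the schema allows. The element names inside
additionalMetadata are hypothetical, and its exact content model should be
checked against the EML version in use.

```xml
<!-- Sketch only: goes under /eml/dataset -->
<maintenance>
  <description>
    <para>New data are loaded into the dataset continuously; the
    metadata is revised after each load.</para>
  </description>
  <!-- placeholder value; assumed to be in the schema's enumerated list -->
  <maintenanceUpdateFrequency>continually</maintenanceUpdateFrequency>
</maintenance>

<!-- Sketch only: goes under /eml, after the dataset element -->
<additionalMetadata>
  <!-- hypothetical element names and placeholder dates -->
  <metadataCreated>2005-02-28</metadataCreated>
  <metadataLastUpdated>2005-02-28</metadataLastUpdated>
</additionalMetadata>
```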
>
>>
>> 3. About the temporal coverage
>>
>> We have many metadata records with an uncertain endDate because new
>> data are being continuously loaded into the dataset. Whenever new
>> data are loaded, we have to change the values for end date, number of
>> records, and/or size of table... I am wondering when you can
>> provide a solution for this issue.
>
> Personally I think this is a good thing. At any given point in time
> there is a finite amount of data available, and the metadata should
> describe that. If you have an automated data collection process, then
> you would simply have to update your metadata as part of that process.
> The number of records, table size, and checksum are useful when people
> get your data to validate that they got the data without error. The
> end date for temporal coverage provides valuable discovery
> information, and should simply be made to match the data that you
> release.
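
For reference, the temporal coverage that an automated process would rewrite
at each data release might look like the sketch below. Both dates are
placeholders, not values from the dataset.

```xml
<!-- Sketch only: goes under /eml/dataset -->
<coverage>
  <temporalCoverage>
    <rangeOfDates>
      <beginDate>
        <!-- placeholder start date -->
        <calendarDate>2004-01-01</calendarDate>
      </beginDate>
      <endDate>
        <!-- placeholder: date of the last record in the released data -->
        <calendarDate>2005-02-28</calendarDate>
      </endDate>
    </rangeOfDates>
  </temporalCoverage>
</coverage>
```

Only the endDate value changes from one release to the next, so it is a
natural field to update in the same step that loads the new data.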
>
>>
>> In addition, I found from John's email that you had a KNB data
>> management workshop early this year. I am very interested in this
>> kind of workshop, particularly workshops on the use of
>> Metacat. If you have this type of workshop in the future, please let
>> me know.
>
> Yeah, we had one in February. We announce these opportunities on
> various web sites and mailing lists. You should subscribe to
> ecoinfo at ecoinformatics.org and watch http://seek.ecoinformatics.org in
> particular for announcements.
>
> Like Peter I also recommend that you get involved in the ongoing
> improvements related to EML. Your feedback and contributions would be
> extremely valuable. Good luck. Let us know if you have more questions.
>
> Matt
>
>>
>> Thank you very much for your support!
>>
>> Xiaoping Wang
>>
>> PMEL / NOAA