Measurement scale in EML

Xiaoping Wang xiaoping.y.wang at noaa.gov
Mon Feb 28 14:16:01 PST 2005


Dear Matt and Peter,

Thank you very much for your useful input.  The following is the 
revised piece of my EML document.  Please note that I am not using the 
values from standardUnit/unitDictionary to describe the units of the 
variables.  I think the unitDictionary is used when you use the <unit> 
element to describe the unit.  Here I use whatever actually appears 
in the varUnit column of the table as the "code" (for example, PPT as 
the code for the unit of Salinity) in the <code> element and give 
further explanation of the unit in the <definition> element.  
Please let me know if you have any further advice.

<attribute>   
    <attributeName>varName</attributeName>
    <attributeDefinition>Name of chemical or physical property 
measured</attributeDefinition>
    <storageType>String</storageType>
    <measurementScale>
        <nominal>
            <nonNumericDomain>
                <enumeratedDomain>
                    <codeDefinition>
                        <code>T</code>
                        <definition>Temperature</definition>
                    </codeDefinition>
                    <codeDefinition>
                        <code>S</code>
                        <definition>Salinity</definition>
                    </codeDefinition>
                    <codeDefinition>
                        <code>ST</code>
                        <definition>Sigma-T</definition>
                    </codeDefinition>
                </enumeratedDomain>
            </nonNumericDomain>
        </nominal>
    </measurementScale>
</attribute>
<attribute>
    <attributeName>varUnit</attributeName>
    <attributeDefinition>Unit of chemical or physical property 
measured</attributeDefinition>
    <storageType>String</storageType>
    <measurementScale>
        <nominal>
            <nonNumericDomain>
                <enumeratedDomain>
                    <codeDefinition>
                        <code>C</code>
                        <definition>Degrees Celsius, the unit for 
Temperature</definition>
                    </codeDefinition>
                    <codeDefinition>
                        <code>PPT</code>
                        <definition>Parts per thousand, the unit for 
Salinity</definition>
                    </codeDefinition>
                    <codeDefinition>
                        <code>KG/M**3</code>
                        <definition>Kilogram per cubic meter, the unit 
for Sigma-T</definition>
                    </codeDefinition>
                </enumeratedDomain>
            </nonNumericDomain>
        </nominal>
    </measurementScale>
</attribute>

Thank you!

Xiaoping Wang

PMEL / NOAA

Matt Jones wrote:

> Hi Xiaoping,
>
> As Peter mentioned, your problems have arisen before.  See below for 
> some additional recommendations beyond Peter's from my personal 
> perspective.
>
> Xiaoping Wang wrote:
>
>> Dear Matt and Peter:
>>
>> I have seen a lot of discussion recently about measurement scale 
>> and temporal coverage.  It has been very helpful for our 
>> understanding of EML.  The following are questions and concerns that 
>> came up during my work on our EML-based metadata.
>>
>> 1. About the Measurement scale
>>
>> The measurementScale is a little bit confusing.  I spent a lot of 
>> time working on the measurementScale for nominal data.  Here I want 
>> to give you an example of how I use the measurementScale to 
>> describe nominal data in our dataset, so you can see whether my 
>> implementation is based on a correct understanding of this element.
>>
>> We have a data table with four columns (attributes): recordID, 
>> variable_name, variable_unit, and variable_value.  The values in the 
>> variable_name column include measurements of the chemical 
>> and physical properties of sea water such as temperature, salinity, 
>> nitrate, and so on.  The following is a sample piece of my EML file 
>> for this dataset.
>> <attribute>
>>     <attributeName>varName</attributeName>
>>     <attributeDefinition>Name of chemical or physical property 
>> measured</attributeDefinition>
>>     <storageType>String</storageType>
>>     <measurementScale>
>>         <nominal>
>>             <nonNumericDomain>
>>                 <enumeratedDomain>
>>                     <codeDefinition>
>>                         <code>T</code>
>>                         <definition>Temperature, unit: C</definition>
>>                     </codeDefinition>
>>                     <codeDefinition>
>>                         <code>S</code>
>>                         <definition>Salinity, unit: PPT</definition>
>>                     </codeDefinition>
>>                     <codeDefinition>
>>                         <code>ST</code>
>>                         <definition>Sigma-T, unit: KG/M**3</definition>
>>                     </codeDefinition>
>>                 </enumeratedDomain>
>>             </nonNumericDomain>
>>         </nominal>
>>     </measurementScale>
>> </attribute>
>> <attribute>
>>     <attributeName>varUnit</attributeName>
>>     <attributeDefinition>Unit of chemical or physical property 
>> measured</attributeDefinition>
>>     <storageType>String</storageType>
>>     <measurementScale>
>>         <nominal>
>>             <nonNumericDomain>
>>                 <textDomain>
>>                     <definition>*</definition>
>>                 </textDomain>
>>             </nonNumericDomain>
>>         </nominal>
>>     </measurementScale>
>> </attribute>
>>
>> My questions / concerns are:
>> (1) Is it suitable to use the enumeratedDomain element to describe 
>> varName?
>
> Yes, that is fine, although if you wanted it to be free text that 
> would be ok too (just use textDomain instead of enumeratedDomain).  
> Encoding the unit information in the variable name is somewhat 
> repetitive if you have the same unit information in the varUnit column.
>
>>
>> (2) For the varUnit, I don't think it is necessary to include a 
>> measurementScale element.  However, since measurementScale is a 
>> required field, I have to put something there in order to pass 
>> EML validation.  So I put a "*" in the definition element.  I 
>> have seen other similar cases in which EML metadata 
>> developers use a "*" for the definition element.  Obviously, the 
>> measurementScale content described here conveys no useful 
>> information about the varUnit.
>
> The use of the '*' is inappropriate.  The field is required because 
> the authors of EML thought the information was important.  In this 
> case, I think you should put in the definition something that 
> indicates that the values are names of units.  One major thing that is 
> missing here is that you don't use the EML Unit Dictionary when 
> choosing your unit definitions.  This eliminates the major advantage 
> of EML in being able to provide quantitative information about units.  
> If there is a 1:1 correspondence between your units and the EML unit 
> dictionary, I think it would be good if you defined varUnit as an 
> enumerated domain and for each of your units provided the EML standard 
> name for the unit in the definition.  This would help in translating, 
> although it is unlikely that anyone could use this in automated 
> systems because it's such a non-standard use of the EML descriptors.
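>
> A minimal sketch of this suggestion, assuming "celsius" and 
> "kilogramsPerCubicMeter" are the unit dictionary names that match 
> the C and KG/M**3 codes (check them against the dictionary before 
> relying on them).  PPT is left descriptive here because it is not 
> clear which standard unit, if any, corresponds to it:
>
> ```xml
> <!-- Sketch only: the standard unit names below are assumptions to
>      verify against the EML unit dictionary. -->
> <enumeratedDomain>
>     <codeDefinition>
>         <code>C</code>
>         <definition>celsius</definition>
>     </codeDefinition>
>     <codeDefinition>
>         <code>KG/M**3</code>
>         <definition>kilogramsPerCubicMeter</definition>
>     </codeDefinition>
>     <codeDefinition>
>         <code>PPT</code>
>         <definition>Parts per thousand (no standard unit name 
> assumed)</definition>
>     </codeDefinition>
> </enumeratedDomain>
> ```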
>
> In general, this model of variable name, variable unit, and value is a 
> non-standard use of the relational model, as the attributes do not 
> really represent a single type.  The relational model is generally 
> intended to have attributes that contain a semantically homogeneous 
> set of values.  In your case this is not true, unless considered from 
> a meta-level.  So, I think you are using the relational model as a 
> schema language itself.  This significantly complicates use of the 
> data in standard analytical systems (e.g., SAS, S-PLUS, R, Matlab) -- 
> they basically all require different views of the data, as described 
> in Peter's note.  Personally I think that documenting these more 
> traditional views, if you have them, would be far more useful to 
> scientists who wish to analyze the data.  That would have the added 
> benefit of being better described by EML structures.  Documenting your 
> "meta-level" schema isn't particularly informative because the 
> information in one attribute is so heterogeneous.
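>
> As a rough illustration of such a traditional view, here is how one 
> column of a table with one measured property per column might be 
> described.  The column name "temperature", the storage type, and the 
> precision value are hypothetical, not taken from your dataset:
>
> ```xml
> <!-- Sketch only: attribute name, storageType, and precision are
>      illustrative assumptions. -->
> <attribute>
>     <attributeName>temperature</attributeName>
>     <attributeDefinition>Sea water temperature</attributeDefinition>
>     <storageType>float</storageType>
>     <measurementScale>
>         <interval>
>             <unit>
>                 <standardUnit>celsius</standardUnit>
>             </unit>
>             <precision>0.01</precision>
>             <numericDomain>
>                 <numberType>real</numberType>
>             </numericDomain>
>         </interval>
>     </measurementScale>
> </attribute>
> ```
>
> Here the unit travels with the attribute itself, so a separate 
> varUnit column is no longer needed for that property.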
>
>>
>> 2. About the information of metadata itself
>>
>> Based on my understanding of the EML schemas, the only information 
>> associated with the metadata itself is the information about the 
>> metadata provider(s).  However, my supervisors and I think that it is 
>> important to provide other metadata information, such as when the 
>> metadata document was created, whether further updates of the 
>> metadata are needed, and if so, the metadata update frequency and 
>> the date of the last update.  Those pieces of information are 
>> particularly important when the endDate value for a 
>> dataset from an ongoing project is going to change, because first they 
>> can remind metadata providers / developers when they should update 
>> their metadata, and second they can tell metadata users whether the 
>> metadata document provides the most current information about the 
>> dataset described.
>
> Sure.  In hindsight, I think we should have included these metadata 
> information fields, particularly the timestamp fields.  But we do have 
> some related fields that describe ongoing data collection.  Take a 
> look at /eml/dataset/maintenance/description and 
> /eml/dataset/maintenance/maintenanceUpdateFrequency.  The latter is 
> probably what you want.  Any fields that you want but that don't exist 
> in the schema can be put in the "/eml/additionalMetadata" field, so 
> you always have that as a recourse.  If you have specific 
> recommendations for fields that are needed, you could send them to 
> eml-dev at ecoinformatics.org and we'll try to get them into plans for a 
> future release.
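>
> A rough sketch of the maintenance fields, with placeholder 
> description text and an assumed update frequency of "monthly" (pick 
> whichever allowed value fits your loading schedule):
>
> ```xml
> <!-- Sketch only: belongs under /eml/dataset; the text and the
>      frequency value are placeholders. -->
> <maintenance>
>     <description>
>         <para>New data are loaded as they arrive, and the metadata 
> is revised with each load.</para>
>     </description>
>     <maintenanceUpdateFrequency>monthly</maintenanceUpdateFrequency>
> </maintenance>
> ```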
>
>>
>> 3. About the temporal coverage
>>
>> We have many metadata records with an uncertain endDate because new 
>> data are continuously being loaded into the dataset.  Whenever new 
>> data are loaded, we have to change the values for the end date, 
>> number of records, and/or table size.  I am wondering when you can 
>> provide a solution for this issue.
>
> Personally I think this is a good thing.  At any given point in time 
> there is a finite amount of data available, and the metadata should 
> describe that.  If you have an automated data collection process, then 
> you would simply have to update your metadata as part of that process. 
> The number of records, table size, and checksum are useful when people 
> get your data to validate that they got the data without error.  The 
> end date for temporal coverage provides valuable discovery 
> information, and should simply be made to match the data that you 
> release.
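>
> For illustration, the part of the metadata that such an update would 
> touch could look like the sketch below.  Both dates are made-up 
> placeholders; only the endDate would need to change on each release:
>
> ```xml
> <!-- Sketch only: dates are hypothetical placeholders. -->
> <coverage>
>     <temporalCoverage>
>         <rangeOfDates>
>             <beginDate>
>                 <calendarDate>2004-01-01</calendarDate>
>             </beginDate>
>             <endDate>
>                 <calendarDate>2005-02-28</calendarDate>
>             </endDate>
>         </rangeOfDates>
>     </temporalCoverage>
> </coverage>
> ```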
>
>>
>> In addition, I learned from John's email that you held a KNB data 
>> management workshop early this year.  I am very interested in this 
>> kind of workshop, particularly workshops on the use of 
>> Metacat.  If you hold this type of workshop in the future, please let 
>> me know.
>
> Yeah, we had one in February.  We announce these opportunities on 
> various web sites and mailing lists.  You should subscribe to 
> ecoinfo at ecoinformatics.org and watch http://seek.ecoinformatics.org in 
> particular for announcements.
>
> Like Peter, I also recommend that you get involved in the ongoing 
> improvements related to EML.  Your feedback and contributions would be 
> extremely valuable.  Good luck.  Let us know if you have more questions.
>
> Matt
>
>>
>> Thank you very much for your support!
>>
>> Xiaoping Wang
>>
>> PMEL /NOAA
>>
>



