Measurement scale in EML

Peter McCartney peter.mccartney at asu.edu
Sat Feb 26 09:56:51 PST 2005


We have several data tables organized this way in our CAP LTER project,
so I can certainly sympathize with the efficiency of this structure
and with the frustrations you are encountering trying to describe this
table with EML. In ours, each row contains a column for the STORET code
of one of several analytes measured on the sample, another column for the
value, and yet another for the measurement units. This structure made
for very easy storage and entry forms, but it is clearly beyond EML's
capacity to describe properly, since EML tends to lose its link with the
data when record-level metadata are embedded in the data file
itself. The best you can do is more or less what you have done: add
descriptive information to indicate what's going on. This might be made a
little clearer by describing your varUnit as nominal with an
enumeration listing the different units, plus some text in the definition
element to explain its relationship to your value column. I did note
that you included the units in the varName code definitions, so even as
you have it, someone could make sense of the data. Note that no other
popular metadata schema (FGDC, GCMD, ISO, etc.) would do a better job of
describing this sort of structure, where there are dependencies between
attributes.
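
As a rough sketch of that suggestion, the unit attribute might look
something like the fragment below. The three unit codes are simply the
ones that appear in your varName definitions; the wording of each
definition and of the attributeDefinition is only my illustration, and I
have not run this fragment through a validator.

```xml
<attribute>
  <attributeName>varUnit</attributeName>
  <attributeDefinition>Unit of the number in the value column; the unit that
    applies to a given row depends on that row's varName</attributeDefinition>
  <storageType>String</storageType>
  <measurementScale>
    <nominal>
      <nonNumericDomain>
        <enumeratedDomain>
          <!-- Codes assumed from the units listed in the varName definitions -->
          <codeDefinition>
            <code>C</code>
            <definition>Degrees Celsius (rows where varName is T)</definition>
          </codeDefinition>
          <codeDefinition>
            <code>PPT</code>
            <definition>Parts per thousand (rows where varName is S)</definition>
          </codeDefinition>
          <codeDefinition>
            <code>KG/M**3</code>
            <definition>Kilograms per cubic meter (rows where varName is ST)</definition>
          </codeDefinition>
        </enumeratedDomain>
      </nonNumericDomain>
    </nominal>
  </measurementScale>
</attribute>
```

This still leaves the dependency between columns expressed only in prose,
but it at least replaces the "*" placeholder with something a reader can
use.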

The easy answer would be to say that you should consider writing some
views or queries that reorganize your data for publishing in a more
normalized way, such that each variable appears in its own column and can
thus be described in the manner EML is designed for. However, it was
our desire to make EML an aid, not a constraint, to how people manage
data, and I can assure you that the problems you are having accommodating
your data structure are not unique and will have to be considered in our
continuing improvement of EML.
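
For comparison, here is roughly what one column of such a reorganized
view could look like, with the unit attached to the attribute itself
rather than stored in a separate column. This is only a sketch: the
column name "temperature", its definition, the storage type, and the
precision of 0.01 are placeholders I made up, and I am assuming "celsius"
is the appropriate name in the EML standard unit dictionary.

```xml
<attribute>
  <!-- Hypothetical column produced by a view that gives each variable its own column -->
  <attributeName>temperature</attributeName>
  <attributeDefinition>Sea water temperature</attributeDefinition>
  <storageType>float</storageType>
  <measurementScale>
    <interval>
      <unit>
        <standardUnit>celsius</standardUnit>
      </unit>
      <!-- Placeholder precision; use the real precision of the measurements -->
      <precision>0.01</precision>
      <numericDomain>
        <numberType>real</numberType>
      </numericDomain>
    </interval>
  </measurementScale>
</attribute>
```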

Difficulty accommodating large, continuously updated datasets has also
been raised by several sites, my own included. Most sites have adopted a
short-term workaround of publishing annual "snapshots" of these data as a
series, although again, I think it's the responsibility of the team to
address this. There is an active bug on this issue posted at:
http://bugzilla.ecoinformatics.org/show_bug.cgi?id=1794
which you may wish to read and comment on.

It is of tremendous value to have users like you contribute your
experience of working with EML. EML is intended to be a community
product, so we encourage you to become as involved as you can.

On Fri, 2005-02-25 at 16:25 -0800, Xiaoping Wang wrote:
> Dear Matt and Peter:
> 
> I have seen a lot of discussion recently about measurement 
> scale and temporal coverage.  It has been very helpful for our 
> understanding of EML.  The following are questions and concerns that 
> came up during my work on our EML-based metadata.
> 
> 1. About the Measurement scale
> 
> The measurementScale element is a little bit confusing.  I spent a lot 
> of time working on the measurementScale for nominal data.  Here I want 
> to give you an example of how I use measurementScale to describe nominal 
> data in our dataset, so you can see whether my implementation is based 
> on a correct understanding of this element.
> 
> We have a data table with four columns (attributes): recordID, 
> variable_name, variable_unit, and variable_value.  The values for the 
> variable_name column include certain measurements of the chemical and 
> physical properties of sea water such as temperature, salinity, 
> nitrate......  The following is a sample piece of my EML file for this 
> dataset.
> <attribute>
>   <attributeName>varName</attributeName>
>   <attributeDefinition>Name of chemical or physical property measured</attributeDefinition>
>   <storageType>String</storageType>
>   <measurementScale>
>     <nominal>
>       <nonNumericDomain>
>         <enumeratedDomain>
>           <codeDefinition>
>             <code>T</code>
>             <definition>Temperature, unit: C</definition>
>           </codeDefinition>
>           <codeDefinition>
>             <code>S</code>
>             <definition>Salinity, unit: PPT</definition>
>           </codeDefinition>
>           <codeDefinition>
>             <code>ST</code>
>             <definition>Sigma-T, unit: KG/M**3</definition>
>           </codeDefinition>
>         </enumeratedDomain>
>       </nonNumericDomain>
>     </nominal>
>   </measurementScale>
> </attribute>
> <attribute>
>   <attributeName>varUnit</attributeName>
>   <attributeDefinition>Unit of chemical or physical property measured</attributeDefinition>
>   <storageType>String</storageType>
>   <measurementScale>
>     <nominal>
>       <nonNumericDomain>
>         <textDomain>
>           <definition>*</definition>
>         </textDomain>
>       </nonNumericDomain>
>     </nominal>
>   </measurementScale>
> </attribute>
> 
> My questions / concerns are:
> (1) Is it suitable to use the enumeratedDomain element to describe varName?
> 
> (2) For varUnit, I don't think it is necessary to include a 
> measurementScale element.  However, since measurementScale is a 
> required field, I have to put something there in order to pass EML 
> validation.  So I put a "*" sign in the definition element.  I have 
> seen some other similar cases in which EML metadata developers use a 
> "*" for the definition element.  Obviously, the measurementScale content 
> described here conveys no useful information about varUnit.
> 
> 2. About information on the metadata itself
> 
> Based on my understanding of the EML schemas, the only information 
> associated with the metadata itself is the information about the metadata 
> provider(s).  However, my supervisors and I think that it is important 
> to provide other metadata information, such as when the metadata document 
> was created, whether further updates of the metadata are needed, and if 
> so, the metadata update frequency and the date of the last update.  
> Those pieces of information are particularly important when 
> the endDate value for a dataset from an ongoing project is going to 
> change, because first they can remind metadata providers / developers 
> when they should update their metadata, and second they can tell 
> metadata users whether the metadata document provides the most current 
> information about the dataset described.
> 
> 3. About the temporal coverage
> 
> We have many metadata records with an uncertain endDate because new 
> data are being continuously loaded into the dataset.  Whenever new data 
> are loaded, we have to change the values for end date, number of 
> records, and/or size of table......  I am wondering when you can 
> provide a solution for this issue.
> 
> In addition, I found from John's email that you had a KNB data 
> management workshop early this year.  I am very interested in this kind 
> of workshop, particularly one associated with the use of Metacat.  If 
> you hold this type of workshop in the future, please let me know.
> 
> Thank you very much for your support!
> 
> Xiaoping Wang
> 
> PMEL/NOAA
> 