Recommendation about information associated with the EML metadata document itself
Matt Jones
jones at nceas.ucsb.edu
Mon Feb 28 13:08:27 PST 2005
Xiaoping,
I entered your request into our bug tracking system and targeted it at
EML 2.1.0. You can follow its progress and our decisions on it here:
http://bugzilla.ecoinformatics.org/show_bug.cgi?id=1991
Thanks for the feedback.
Matt
Xiaoping Wang wrote:
> Dear eml-dev:
>
> As mentioned in my email to Matt and Peter (see below), providing
> the necessary timestamp information for the EML metadata document is
> important not only to metadata generators but also to metadata users.
> Both /eml/dataset/maintenance/description and
> /eml/dataset/maintenance/maintenanceUpdateFrequency in the EML schemas
> describe the dataset, not the metadata document itself.
> Although we can use /eml/additionalMetadata to say something about the
> metadata document, I believe that timestamp information about the
> EML metadata document is important enough to deserve its own elements
> in the schema. The following is my recommendation for how EML could
> provide more information about the metadata itself.
>
> In the /eml/dataset/res:ResourceGroup, instead of using metadataProvider
> as one of the elements, use metadataInformation as suggested below:
> <metadataInformation>, a sequence of:
>   <metadataProvider>  required
>       (information about the metadata providers is listed here)
>   <metadataCreationDate>  required
>       (the date when the metadata document was originally created)
>   <metadataMaintenance>  optional
>       (used when the metadata document will need to be updated in
>       the future), a sequence of:
>       <lastUpdateDate>  required
>           (the date of the last metadata update)
>       <oldValue>  required
>           (for example, the endDate of rangeOfDates, numberOfRecords
>           for an entity (table), or the size of an entity (table).
>           These values change after new data are loaded into the
>           dataset.)
>       <updateFrequency>  required
>           (by comparing updateFrequency and lastUpdateDate, metadata
>           developers know when they need to update their metadata
>           document, and metadata users know whether the metadata
>           document describes the most current information about the
>           dataset)
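For illustration, an instance of the proposed structure might look like the sketch below. It is hypothetical: metadataInformation and its children are the elements proposed here, not part of the EML 2.0.x schemas, the metadataProvider content is assumed to mirror an EML responsible party, and every date and value is a placeholder.

```xml
<!-- Hypothetical: proposed elements, not valid against EML 2.0.x -->
<metadataInformation>
  <metadataProvider>
    <!-- assumed to take the same content as an EML responsible party -->
    <organizationName>PMEL/NOAA</organizationName>
  </metadataProvider>
  <metadataCreationDate>2004-06-01</metadataCreationDate>
  <metadataMaintenance>
    <lastUpdateDate>2005-02-01</lastUpdateDate>
    <!-- value before the last update, e.g. the previous endDate of rangeOfDates -->
    <oldValue>2005-01-01</oldValue>
    <updateFrequency>monthly</updateFrequency>
  </metadataMaintenance>
</metadataInformation>
```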
>
> These are the elements that I think should be provided in the
> EML metadata document. I hope my recommendation helps.
>
> Thank you very much for your support.
>
> Xiaoping Wang
>
> PMEL /NOAA
>
> Matt Jones wrote:
>
>> Hi Xiaoping,
>>
>> As Peter mentioned, your problems have arisen before. See below for
>> some additional recommendations beyond Peter's from my personal
>> perspective.
>>
>> Xiaoping Wang wrote:
>>
>>> Dear Matt and Peter:
>>>
>>> I have seen a lot of discussion recently about measurement scale
>>> and temporal coverage. It has been very helpful for our
>>> understanding of EML. The following are questions and concerns that
>>> came up during my work on our EML-based metadata.
>>>
>>> 1. About the measurement scale
>>>
>>> The measurementScale element is a little confusing. I spent a lot of
>>> time working on the measurementScale for nominal data. Here is an
>>> example of how I use measurementScale to describe nominal data in our
>>> dataset, so you can see whether my implementation is based on a
>>> correct understanding of this element.
>>>
>>> We have a data table with four columns (attributes): recordID,
>>> variable_name, variable_unit, and variable_value. The values in the
>>> variable_name column name chemical and physical properties of sea
>>> water, such as temperature, salinity, and nitrate. The following is
>>> a sample piece of my EML file for this dataset.
>>> <attribute>
>>>   <attributeName>varName</attributeName>
>>>   <attributeDefinition>Name of chemical or physical property measured</attributeDefinition>
>>>   <storageType>String</storageType>
>>>   <measurementScale>
>>>     <nominal>
>>>       <nonNumericDomain>
>>>         <enumeratedDomain>
>>>           <codeDefinition>
>>>             <code>T</code>
>>>             <definition>Temperature, unit: C</definition>
>>>           </codeDefinition>
>>>           <codeDefinition>
>>>             <code>S</code>
>>>             <definition>Salinity, unit: PPT</definition>
>>>           </codeDefinition>
>>>           <codeDefinition>
>>>             <code>ST</code>
>>>             <definition>Sigma-T, unit: KG/M**3</definition>
>>>           </codeDefinition>
>>>         </enumeratedDomain>
>>>       </nonNumericDomain>
>>>     </nominal>
>>>   </measurementScale>
>>> </attribute>
>>> <attribute>
>>>   <attributeName>varUnit</attributeName>
>>>   <attributeDefinition>Unit of chemical or physical property measured</attributeDefinition>
>>>   <storageType>String</storageType>
>>>   <measurementScale>
>>>     <nominal>
>>>       <nonNumericDomain>
>>>         <textDomain>
>>>           <definition>*</definition>
>>>         </textDomain>
>>>       </nonNumericDomain>
>>>     </nominal>
>>>   </measurementScale>
>>> </attribute>
>>>
>>> My questions / concerns are:
>>> (1) Is it suitable to use the enumeratedDomain element to describe varName?
>>
>>
>> Yes, that is fine, although if you wanted it to be free text that
>> would be OK too (just use textDomain instead of enumeratedDomain).
>> Encoding the unit information in the code definitions for varName is
>> somewhat redundant if the same unit information is in the varUnit column.
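A minimal sketch of the free-text alternative, assuming the rest of the attribute stays as in the example above: varName keeps its nominal scale but uses textDomain instead of enumeratedDomain. The definition text is illustrative wording, not taken from the original file, and storageType is omitted for brevity.

```xml
<attribute>
  <attributeName>varName</attributeName>
  <attributeDefinition>Name of chemical or physical property measured</attributeDefinition>
  <measurementScale>
    <nominal>
      <nonNumericDomain>
        <textDomain>
          <!-- say what the free-text values are instead of using a placeholder -->
          <definition>Free-text name of the sea water property measured, such as temperature, salinity, or nitrate</definition>
        </textDomain>
      </nonNumericDomain>
    </nominal>
  </measurementScale>
</attribute>
```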
>>
>>>
>>> (2) For varUnit, I don't think it is necessary to include the
>>> measurementScale element. However, since measurementScale is a
>>> required field, I have to put something there in order to pass
>>> EML validation. So I put a "*" in the definition element. I
>>> have seen some other similar cases in which EML metadata
>>> developers use a "*" for the definition element. Obviously, the
>>> measurementScale content described here conveys no useful
>>> information about varUnit.
>>
>>
>> The use of '*' is inappropriate. The field is required because
>> the authors of EML thought the information was important. In this
>> case, I think the definition should state that the values are names
>> of units. One major thing that is missing here is that you don't use
>> the EML Unit Dictionary when choosing your unit definitions. This
>> forfeits a major advantage of EML: being able to provide quantitative
>> information about units. If there is a 1:1 correspondence between
>> your units and the EML unit dictionary, I think it would be good to
>> define varUnit as an enumerated domain and, for each of your units,
>> provide the EML standard name for the unit in the definition. This
>> would help in translating, although it is unlikely that anyone could
>> use this in automated systems because it's such a non-standard use of
>> the EML descriptors.
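One way to follow this suggestion is sketched below for varUnit, reusing the unit codes from the varName example. The EML standard unit names used here (celsius, kilogramsPerCubicMeter) are assumptions to be checked against the EML unit dictionary; PPT is left out because its dictionary name is not assumed here.

```xml
<attribute>
  <attributeName>varUnit</attributeName>
  <attributeDefinition>Unit of chemical or physical property measured</attributeDefinition>
  <measurementScale>
    <nominal>
      <nonNumericDomain>
        <enumeratedDomain>
          <!-- standard unit names are assumptions: verify against the EML unit dictionary -->
          <codeDefinition>
            <code>C</code>
            <definition>Degrees Celsius; EML standard unit: celsius</definition>
          </codeDefinition>
          <codeDefinition>
            <code>KG/M**3</code>
            <definition>Kilograms per cubic meter; EML standard unit: kilogramsPerCubicMeter</definition>
          </codeDefinition>
          <!-- add PPT and any other codes in the same way -->
        </enumeratedDomain>
      </nonNumericDomain>
    </nominal>
  </measurementScale>
</attribute>
```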
>>
>> In general, this model of variable name, unit, and value is a
>> non-standard use of the relational model, as the attributes do not
>> really represent a single type. The relational model is generally
>> intended to have attributes that contain a semantically homogeneous
>> set of values. In your case this is not true, unless considered from a
>> meta-level. So, I think you are using the relational model as a
>> schema language itself. This significantly complicates use of the data
>> in standard analytical systems (e.g., SAS, S-Plus, R, Matlab), which
>> basically all require different views of the data, as described in
>> Peter's note. Personally I think that documenting these more
>> traditional views (e.g., one column per measured property), if you
>> have them, would be far more useful to scientists who wish to analyze
>> the data. That would have the added benefit of being better described
>> by EML structures. Documenting your "meta-level" schema isn't
>> particularly informative because the information in one attribute is
>> so heterogeneous.
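As a rough sketch of such a traditional view, assume a hypothetical wide table with one column per property (the column name temperature is an assumption). Each numeric column can then carry an interval or ratio scale with a standard unit, which the name/unit/value layout cannot express; optional details such as precision and bounds are omitted.

```xml
<!-- hypothetical wide table: one attribute per measured property -->
<attribute>
  <attributeName>temperature</attributeName>
  <attributeDefinition>Sea water temperature</attributeDefinition>
  <measurementScale>
    <interval>
      <unit>
        <standardUnit>celsius</standardUnit>
      </unit>
      <numericDomain>
        <numberType>real</numberType>
      </numericDomain>
    </interval>
  </measurementScale>
</attribute>
<!-- salinity, sigmaT, nitrate, ... follow the same pattern with their own units -->
```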
>>
>>>
>>> 2. About the information of metadata itself
>>>
>>> Based on my understanding of the EML schemas, the only information
>>> associated with the metadata itself is the information about the
>>> metadata provider(s). However, my supervisors and I think that it is
>>> important to provide other information about the metadata, such as
>>> when the metadata document was created, whether further updates of
>>> the metadata are needed, and if so, the metadata update frequency and
>>> the date of the last update. This information is particularly
>>> important when the endDate value for a dataset from an ongoing
>>> project is going to change, because first it can remind metadata
>>> providers/developers when they should update their metadata, and
>>> second it can tell metadata users whether the metadata document
>>> provides the most current information about the dataset described.
>>
>>
>> Sure. In hindsight, I think we should have included these metadata
>> information fields, particularly the timestamp fields. But we do have
>> some related fields that describe ongoing data collection. Take a
>> look at /eml/dataset/maintenance/description and
>> /eml/dataset/maintenance/maintenanceUpdateFrequency. The latter is
>> probably what you want. Any fields that you want but that don't exist
>> in the schema can be put in the "/eml/additionalMetadata" field, so
>> you always have that as a recourse. If you have specific
>> recommendations for fields that are needed, you could send them to
>> eml-dev at ecoinformatics.org and we'll try to get them into plans for a
>> future release.
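A minimal sketch of both recourses, assuming EML 2.0.1 element names as understood here. The maintenance values are illustrative, and metadataTimestamps under additionalMetadata is a hypothetical element with no schema behind it. The content model of additionalMetadata has changed between EML releases, so check it against the schema version you validate with.

```xml
<!-- inside /eml/dataset, after purpose if present -->
<maintenance>
  <description>
    <para>New observations are appended to the dataset as they arrive.</para>
  </description>
  <!-- one of the schema's enumerated values, e.g. monthly -->
  <maintenanceUpdateFrequency>monthly</maintenanceUpdateFrequency>
</maintenance>

<!-- /eml/additionalMetadata, a sibling of dataset under the root element -->
<additionalMetadata>
  <!-- hypothetical element: EML defines no schema for it -->
  <metadataTimestamps>
    <metadataCreationDate>2004-06-01</metadataCreationDate>
    <lastUpdateDate>2005-02-01</lastUpdateDate>
  </metadataTimestamps>
</additionalMetadata>
```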
>>
>>>
>>> 3. About the temporal coverage
>>>
>>> We have many metadata records with an uncertain endDate because new
>>> data are continuously being loaded into the dataset. Whenever new
>>> data are loaded, we have to change the values for the end date, the
>>> number of records, and/or the size of the table. I am wondering when
>>> you might provide a solution for this issue.
>>
>>
>> Personally I think this is a good thing. At any given point in time
>> there is a finite amount of data available, and the metadata should
>> describe that. If you have an automated data collection process, then
>> you would simply have to update your metadata as part of that process.
>> The number of records, table size, and checksum are useful when people
>> get your data to validate that they got the data without error. The
>> end date for temporal coverage provides valuable discovery
>> information, and should simply be made to match the data that you
>> release.
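A sketch of the elements that an automated loading process would rewrite after each load, following the suggestion above. Element names and nesting follow EML 2.0.1 as understood here (an assumption worth checking against the schema); the entity and file names, dates, size, checksum, and record count are all hypothetical placeholders.

```xml
<!-- /eml/dataset/coverage: move endDate forward to the last observation released -->
<coverage>
  <temporalCoverage>
    <rangeOfDates>
      <beginDate>
        <calendarDate>1998-01-01</calendarDate>
      </beginDate>
      <endDate>
        <calendarDate>2005-02-28</calendarDate>
      </endDate>
    </rangeOfDates>
  </temporalCoverage>
</coverage>

<!-- /eml/dataset/dataTable: size, checksum, and record count change on each load -->
<dataTable>
  <entityName>observations</entityName>
  <physical>
    <objectName>observations.csv</objectName>
    <size unit="byte">1048576</size>
    <authentication method="MD5">0123456789abcdef0123456789abcdef</authentication>
    <!-- dataFormat, distribution, ... unchanged -->
  </physical>
  <!-- attributeList, ... unchanged -->
  <numberOfRecords>25000</numberOfRecords>
</dataTable>
```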
>>
>>>
>>> In addition, I learned from John's email that you held a KNB data
>>> management workshop early this year. I am very interested in this
>>> kind of workshop, particularly workshops on the use of Metacat. If
>>> you hold this type of workshop in the future, please let
>>> me know.
>>
>>
>> Yeah, we had one in February. We announce these opportunities on
>> various web sites and mailing lists. You should subscribe to
>> ecoinfo at ecoinformatics.org and watch http://seek.ecoinformatics.org in
>> particular for announcements.
>>
>> Like Peter I also recommend that you get involved in the ongoing
>> improvements related to EML. Your feedback and contributions would be
>> extremely valuable. Good luck. Let us know if you have more questions.
>>
>> Matt
>>
>>>
>>> Thank you very much for your support!
>>>
>>> Xiaoping Wang
>>>
>>> PMEL /NOAA
>>
>
> _______________________________________________
> eml-dev mailing list
> eml-dev at ecoinformatics.org
> http://www.ecoinformatics.org/mailman/listinfo/eml-dev
--
-------------------------------------------------------------------
Matt Jones jones at nceas.ucsb.edu
http://www.nceas.ucsb.edu/ Fax: 425-920-2439 Ph: 907-789-0496
National Center for Ecological Analysis and Synthesis (NCEAS)
University of California Santa Barbara
Interested in ecological informatics? http://www.ecoinformatics.org
-------------------------------------------------------------------