EML 2 Beta 9 Feedback From LTER EML Workshop #2
Ken Ramsey
kramsey at jornada.nmsu.edu
Thu Jun 27 14:35:37 PDT 2002
I find the new schema much easier to work with in mapping the Jornada
Basin LTER site metadata documentation files to EML. I also realize that
the triple concept will most likely reappear in future releases of EML
in order for more advanced integration and usefulness with higher level
application development, such as a Semantic Web. I believe the new model
will ease the burden of mapping for sites currently participating in the
EML project and allow centralized development of conversion style sheets
to migrate to future EML versions that may support the triple packaging
method. Please forgive any misuse of any XML terminology, as I am
relatively new to XML.
I have the following comments to submit regarding EML 2 Beta 9, which
come largely from a breakout working group session at the second LTER
EML Workshop held in Phoenix, AZ last week:
Dataset Element:
1. It would be useful to add an optional Protocol element under the
Dataset element.
2. It would be useful to add an optional Status element to indicate
whether the dataset is Ongoing or Closed. This element would enable
users to determine whether there may be more current data available than
is currently on-line, which could possibly be accessed by contacting the
Contact for the dataset.
3. It would be useful to add an optional DesignDescription element
under the Dataset element. The current theme of having DesignDescription
under project may fit those projects that contain multiple datasets
using the same design, but it does not work so easily with projects
that contain multiple datasets that have different designs.
4. It would be useful if the Dataset element could be embedded as an
optional zero-to-many element within the Dataset element. This would be
useful in documenting datasets derived from multiple datasets which may
belong to multiple projects (for example, cross-site synthesis
studies).
5. It would be useful to create a new entity type, PhotoImage, to
support aerial photos that are not georectified as well as still or
video images. As an example, a collection of non-georectified aerial
photos or photographs of plant line transects taken over time could be
considered datasets. It would be nice to be able to document these types
of datasets within EML.
ResearchProject Element:
1. It would be useful if there were an optional Protocol element under
the ResearchProject element.
2. It would be useful if the Project Entity could be embedded as an
optional zero-to-many element within the Project element. This would
allow projects to reference multiple projects. It was mentioned at the
workshop that this was intended for Beta 9, but somehow got removed in
the rush to prepare Beta 9 for the workshop.
AttributeList Element:
1. Precision needs to be expanded to cover other levels of precision
besides base 10 numerical values. For example, quarters of an hour
(base4).
2. It would be useful if there were some mechanism to associate or
comment on row (or table) level domain constraints. For example, how
would you document that the valid values for column1 are either A-E or
1-9, depending on whether the column2 value is between 1-3 or 4-6? I am
not sure of the 'best' way to document these domain constraints. One
could add optional paragraph elements or add a constraints element under
AttributeDomain. Another approach would be to extend Constraints to
include Domain element(s).
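To make the row-level constraint in item 2 concrete, here is a minimal
sketch in Python of the rule that any EML construct would have to
express. It assumes that A-E pairs with column2 values 1-3 and 1-9 pairs
with column2 values 4-6; the function names are hypothetical and nothing
here is part of EML.

```python
# Hypothetical row-level constraint: the valid domain of column1
# depends on the value of column2. Column names and ranges follow the
# example in the text; the pairing of ranges is an assumption.

def column1_domain(column2):
    """Return the set of valid column1 values for a given column2 value."""
    if 1 <= column2 <= 3:
        return set("ABCDE")
    if 4 <= column2 <= 6:
        return {str(d) for d in range(1, 10)}
    return set()  # column2 outside 1-6: no column1 value is valid


def row_is_valid(row):
    """Check one row (a dict with 'column1' and 'column2') against the rule."""
    return row["column1"] in column1_domain(row["column2"])


rows = [
    {"column1": "B", "column2": 2},
    {"column1": "7", "column2": 5},
    {"column1": "7", "column2": 2},
]
results = [row_is_valid(r) for r in rows]
```

The point of the sketch is that the domain of one attribute is a
function of another attribute's value, which a per-attribute domain
description alone cannot capture.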
Coverage Element:
1. It would be useful if there were a DateType element under the
TemporalCoverage/SingleDateTime element to allow classifying dates as
start or end dates, as one example.
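As a rough illustration of this request, the following Python sketch
builds the kind of fragment I have in mind. DateType is the proposed
element and does not exist in Beta 9; the CalendarDate child name and
the date values are assumptions made only for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: DateType is the proposed element, not part of
# EML 2 Beta 9. The CalendarDate child name and the dates are made up
# for illustration.
coverage = ET.Element("TemporalCoverage")
for date_value, date_type in [("2002-06-01", "start"), ("2002-06-27", "end")]:
    single = ET.SubElement(coverage, "SingleDateTime")
    ET.SubElement(single, "CalendarDate").text = date_value
    ET.SubElement(single, "DateType").text = date_type

# Serialize the fragment so it can be inspected or printed.
fragment = ET.tostring(coverage, encoding="unicode")
print(fragment)
```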
ResponsibleParty Resource Group:
1. It would be useful if an optional (zero-to-many) Coverage element
were added under the ResponsibleParty resource group. This would allow
associating a person's role in a dataset or project with a date range. I
believe that many LTER sites track this type of metadata.
2. It would also be useful if an optional ContactInformationValid
element were added under ResponsibleParty to denote whether the contact
information is current, inaccurate, or suspect. This could also allow
flagging whether the person is retired, deceased, or still with the
organization.
General Comments/Questions:
1. Shouldn't the Sampling element be considered a Protocol?
2. It would be very useful if something similar to
Eml/AdditionalMetadata were added to most, if not all elements (Resource
Groups/Entities ?). I was under the mistaken impression that EML would
be extensible. Chris Jones informed us that it was, in the sense that a
site could add new Resource Groups (?), but could not add new elements to
existing Resource Groups (?) without producing invalid EML. I think that
adding the capability to at least add paragraphs describing additional
metadata that is not addressed in EML would make Morpho/Metacat and EML
more attractive to sites without an RDBMS metadata system that want to
adopt Morpho/Metacat as their primary metadata management system instead
of files stored on a traditional filing system. It would also allow
sites to ensure that they can document metadata as fully in EML as
with current ASCII files, HTML files, or RDBMS tables.
3. It would be useful if a tool or API call could be made for Metacat
to allow sites to compact versions of EML files, if desired. Not all
sites care about versioning metadata files, let alone data files. As
described to me on the last day of the workshop, there is already a
mechanism in Metacat to allow sites to control access to versions of EML
files. This is a good feature, but I believe that some researchers will
be reluctant to archive their data or metadata in a Metacat repository
if they cannot be certain that secondary users of their data/metadata
are not publishing synthetic studies based on some out-of-date data or
associated metadata. Having the ability to compress versions could be
much easier to manage than keeping track of access control to previous
versions of metadata or data.
4. It would be extremely useful if a glossary were developed to
accompany the EML schema/dtd files to better define terms such as
dataset and project (or ResearchProject) in order to minimize confusion
when mapping to EML. The definition of dataset and project appeared to
be a huge impediment to even starting the process of mapping to EML. I
am not sure that simply changing the 'project' or 'research project'
element name would alleviate this problem. At least by providing a
glossary, sites would have a starting point in populating EML. Our
working group spent an inordinate amount of time trying to simply define
dataset and project. We came up with almost as many definitions as there
were people in the working group.
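Returning to the version compaction request in item 3 above, this is a
minimal sketch in Python of the behavior I have in mind. It assumes
identifiers of the form scope.number.revision; the compact_versions
helper and the sample identifiers are hypothetical and are not an
existing Metacat API call.

```python
def compact_versions(doc_ids):
    """Keep only the highest revision for each document.

    Assumes identifiers of the form 'scope.number.revision'
    (for example 'jornada.12.3'). This helper is hypothetical
    and is not a Metacat API call.
    """
    latest = {}
    for doc_id in doc_ids:
        # Split off the trailing revision number from the rest of the id.
        base, _, rev = doc_id.rpartition(".")
        rev = int(rev)
        if base not in latest or rev > latest[base]:
            latest[base] = rev
    return sorted(f"{base}.{rev}" for base, rev in latest.items())


stored = ["jornada.12.1", "jornada.12.2", "jornada.12.10", "jornada.40.1"]
kept = compact_versions(stored)
```

Revisions are compared as integers, so revision 10 is treated as newer
than revision 2.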
Let me know if you need clarification on any of the points I have
raised.
Ken
----------------------------------------
Ken Ramsey
Data Manager
Jornada Basin LTER Project
New Mexico State University
Box 30003, MSC 3JER
Las Cruces, NM 88003
(505)646-7918 (office)
(505)646-5665 (fax)
keramsey at nmsu.edu