EML beta 9 suggestions
Tim Bergsma
tbergsma at kbs.msu.edu
Mon Jun 24 10:34:13 PDT 2002
Dear eml-dev:
I like EML beta9 much better than the previous models. Below are some
remaining issues. First are major issues that I feel require change.
Then some minor issues that are either easy fixes or represent
inconveniences that I can more easily accommodate.
Major Issues
1. <dataset> should have an optional <protocol> child. Currently it
does not. <project>, <dataTable>, and <attribute> have the <protocol>
option, but not <dataset>. Peter McCartney defined 'dataset' as "the
product of a discrete research activity" (6-20-2002). It is very
natural to suppose that a discrete research activity has a protocol. As
things are now, dataset protocols must be associated with their
entities, which makes it awkward to represent a protocol which
effectively corresponds to several entities. For instance, a bird
survey protocol could generate a table of weather conditions and a table
of sightings, maybe even a set of audio recordings. Such a protocol is
represented more naturally at the dataset level than at the entity
level.
2. There is a great need in the research community to maintain a
distinction between 'protocol' and 'method'. In my thinking, 'protocol'
is a prescriptive procedure, and 'method' is a descriptive procedure.
When interpreting data, it is critical to know whether a stated
procedure represents a record of what was done, or is only an
expectation of what should have been done under normal circumstances.
For small research efforts where procedures are not repetitive, it is
sufficient merely to document that which was done. For large efforts
with repetitive procedures, it is convenient to create a prescriptive
protocol, but then essential to maintain a record asserting that the
protocol was followed, and noting the (inevitable) deviations from the
protocol. Beta9 does nothing to sharpen the distinction between
protocol and method, using the words interchangeably, as many of us do.
Perhaps we should adopt 'procedure' in place of 'protocol', with a
required switch indicating whether the procedure is prescriptive or
descriptive.
3. In Beta9, <protocol> is modeled as a sequence of <methodSteps>
(which can be described by means of paragraphs, citations, or protocol
references). This is insufficient. It is quite natural for a protocol
to have introductory material (for instance) which is not a method
step. It is also possible to write a protocol which is a network of
contingencies rather than a list of steps. For example, 'fertilizer
recommendations based on soil nitrate tests' properly constitute a
protocol, but are better represented as prose rather than as a series of
steps. Perhaps protocol (or its successor) should be a sequence in which
each item is a choice of paragraph or methodStep. At present, I can't use
the EML model to represent fully the protocols that exist for our site.
4. EML has no facility for representing hierarchically structured
text. The structure of text is itself information. Consider the very
real situation where PIs write a project abstract that contains an
outline of the global hypotheses, complete with sub-hypotheses. To
represent this as a paragraph, I have to strip out the structure of the
outline and represent it as an ASCII stream, or I have to hard-code some
paragraph returns and tabs (which are actually display features rather
than logical features--a great leap backward). HTML handles structured
text very nicely (I think) with fully-nestable ordered and unordered
lists. I propose that a paragraph should be modeled as a series (1 to
many) of choices between textString and paragraph.
5. There should be an explicit statement of the legal content of a
paragraph (if there is not already...sorry if I missed it). I suspect
there is the implicit expectation that a paragraph cannot contain tags:
it is almost certain (has already happened) that paragraph content will
end up in a web page, where unrecognized tags will be dropped from the
display, thus corrupting the content. The texts of issues 1 and 3 above
are examples of paragraphs that will not display correctly in a web
page without some reworking. EML providers and consumers need a common
expectation of content in order to avoid such errors. For instance, if
tags are legal, a consumer generating HTML needs to escape the gt/lt
symbols before display.
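To make issues 4 and 5 concrete, here is a minimal Python sketch. It is
only an illustration under stated assumptions, not part of EML: the
nested-list model of a paragraph and the function name render are
hypothetical. A paragraph is taken to be a list whose items are either
text strings or nested paragraphs, and the consumer escapes the gt/lt
symbols while generating HTML lists.

```python
from html import escape
from typing import List, Union

# Hypothetical model (not defined by EML): a paragraph is a list whose
# items are either plain text strings or nested paragraphs.
Paragraph = List[Union[str, "Paragraph"]]


def render(paragraph: Paragraph) -> str:
    """Render a nested paragraph as nested HTML unordered lists."""
    items = []
    for part in paragraph:
        if isinstance(part, str):
            # Escape &, < and > so tag names remain visible as text.
            items.append("<li>" + escape(part, quote=False) + "</li>")
        else:
            # A nested paragraph becomes a nested list.
            items.append("<li>" + render(part) + "</li>")
    return "<ul>" + "".join(items) + "</ul>"


# Made-up abstract: one global hypothesis with two sub-hypotheses.
abstract = [
    "Global hypothesis",
    ["Sub-hypothesis A", "Sub-hypothesis B mentions <dataset>"],
]
html_text = render(abstract)
```

The outline structure survives as list nesting rather than as tabs and
returns, and the literal <dataset> in the text reaches the browser as
&lt;dataset&gt; instead of being dropped.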
Minor Issues
6. I'd like to see a non-spatial Image entity type.
7. Spelling: parentOccurances and childOccurances should be
-Occurrences.
8. Regarding <attribute>: there is in science a classic distinction
between precision and accuracy. <accuracy> is used here in that sense;
we should be aware that <precision> is not, at least not strictly.
<precision> is used here in the sense of 'least significant digit',
which may be related to but is not identical to the classical sense in
which precision represents the repeatability of a measurement, and is
statistically qualified. Precision is a messy issue. Suppose rainfall
is measured to the nearest quarter of an inch. Converted to a decimal,
quarter inches are represented as 0.25, which misleadingly suggests that
precision is at the level of hundredths of an inch. Perhaps EML should
allow a statement of precision that is not decimal-oriented.
9. Should <unit> be optional under <attribute>? Many attributes do not
have units, such as <skyCondition>sunny</skyCondition>.
10. Didn't <dataTable> have <ResponsibleParty> associated with it in an
earlier EML draft? It should not be too hard to move all my dataTable
ResponsibleParties to <dataset>, if <dataTable> will not have
<ResponsibleParty>.
11. It looks from my printout as though <distribution> is defined
somewhat differently under <resourceGroup> vs. <physical>, i.e. no
<inline> option.
12. An <alternateIdentifier> for <dataTable> would be useful.
13. An <additionalMetadata> for <dataTable> would be useful. Several
of us have a 'Comments' field associated with our dataTables, but no
natural place to put them.
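As an illustration of the precision point in issue 8, here is a small
Python sketch. It is a sketch only: the function name round_to_step is
hypothetical and nothing like it exists in EML. It states precision as
an exact step (a quarter of an inch) rather than as a count of decimal
digits.

```python
from fractions import Fraction


def round_to_step(value, step):
    """Round value to the nearest multiple of step, as an exact fraction."""
    step = Fraction(step)
    # round() on a Fraction returns an int (ties go to the even integer).
    return round(Fraction(value) / step) * step


quarter = Fraction(1, 4)                 # precision: a quarter of an inch
reading = round_to_step("1.3", quarter)  # a made-up raw reading of 1.3 inches
```

Here reading is the exact fraction 5/4. Stored together with the step
1/4, it does not suggest hundredths-of-an-inch precision the way the
decimal 1.25 does.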
Regards,
Tim Bergsma
--
Tim Bergsma
LTER Information Manager
W.K. Kellogg Biological Station
Michigan State University
Hickory Corners, MI 49060
616/671-2337
tbergsma at kbs.msu.edu
http://lter.kbs.msu.edu