units sidebar
Tim Bergsma
tbergsma at kbs.msu.edu
Mon Nov 4 10:32:41 PST 2002
Chad, Matt:
attached and appended is a sidebar about units. I don't care what the
title is or where it goes, but probably someone should edit it for
length, clarity, and accuracy.
Tim
<sidebar>
<title>Philosophy of Attribute Units</title>
<para>
The concept of "unit" represents one of the most fundamental categories
of metadata. The classic example of data entropy is the case in which a
reported numeric value loses meaning due to lack of associated units.
Much of ecology is driven by measurement, and most measurements are
inherently comparative. Good data description requires a representation
of the basis for comparison, i.e., the unit. In modelling the attribute
element, the authors of EML drew inspiration from the NIST Reference on
Constants, Units, and Uncertainty
(http://physics.nist.gov/cuu/Units/introduction.html). This document
defines a unit as "a particular physical quantity, defined and adopted
by convention, with which other particular quantities of the same kind
are compared to express their value." The authors of the EML 2.0
specification (hereafter "the authors") decided to make the unit element
required wherever possible.
</para>
<para>
Units may also be one of the most problematic categories of metadata.
For instance, there are many candidate attributes that clearly have no
units, such as named places and letter grades. There are other
candidate attributes for which units are difficult to identify, despite
some suspicion that they should exist (e.g. pH, dates, times). In still
other cases, units are meaningful but appear to vanish because they
cancel under dimensional analysis (e.g. grams of carbon per gram of
soil). The relationship between units and dimensions is likewise not
completely clear.
</para>
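<para>
The cancellation case can be illustrated with a minimal sketch. The
representation of a dimension as a mapping of exponents, and the names
divide, GRAM, and METER, are assumptions made for this illustration;
they are not part of EML.
</para>

```python
from collections import Counter


def divide(numerator, denominator):
    """Subtract dimension exponents and drop any that cancel to zero."""
    result = Counter(numerator)
    result.subtract(denominator)
    return {dim: exp for dim, exp in result.items() if exp != 0}


# Hypothetical dimension signatures: base dimension name to exponent.
GRAM = {"mass": 1}
METER = {"length": 1}

# Grams of carbon per gram of soil: mass / mass leaves no dimension.
carbon_per_soil = divide(GRAM, GRAM)

# Grams per meter: nothing cancels.
mass_per_length = divide(GRAM, METER)

print(carbon_per_soil)
print(mass_per_length)
```

<para>
The first result is empty even though "grams of carbon per gram of soil"
still tells the reader something that a bare number does not.
</para>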
<para>
The authors decided to sharpen the attribute model by nesting unit
under measurementScale. Measurement scale is a data typology, borrowed
from statistics, that was introduced in the 1940s. Under the adopted
model, each attribute is classified as nominal, ordinal, interval, or
ratio. Though widely criticized, this classification is well known and
provides at least first-order utility in EML. For example, nesting unit
under measurementScale allows EML to prevent the meaningless inclusion
of units for categorical data -- an approach judged superior to making
unit universally required or universally optional.
</para>
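<para>
The effect of the nesting can be sketched in a few lines. This is an
illustration of the idea only, not the EML schema: the class names, the
helper unit_of, and the sample attributes are assumptions. The point is
that a unit can be required on some scales and impossible on others
without any "optional" flag.
</para>

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Nominal:
    """Categorical scale: there is no unit field to fill in."""


@dataclass(frozen=True)
class Ordinal:
    """Ranked categories: likewise no unit field."""


@dataclass(frozen=True)
class Interval:
    unit: str  # required: construction fails without it


@dataclass(frozen=True)
class Ratio:
    unit: str  # required: construction fails without it


MeasurementScale = Union[Nominal, Ordinal, Interval, Ratio]


@dataclass(frozen=True)
class Attribute:
    name: str
    scale: MeasurementScale


def unit_of(attribute):
    """Return the unit if the scale carries one, otherwise None."""
    return getattr(attribute.scale, "unit", None)


# Hypothetical attributes, for illustration only.
site = Attribute("siteName", Nominal())
depth = Attribute("soilDepth", Ratio(unit="meter"))

print(unit_of(site), unit_of(depth))
```

<para>
In this sketch, Nominal(unit="meter") and Ratio() both raise TypeError,
which mirrors "neither universally required nor universally optional".
</para>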
<para>
The sharpening of the attribute model allowed the elimination of the
unit type "undefined" from the standard unit dictionary (see
eml-dictionaryUnits.xml). It seemed self-defeating to require the unit
element exactly where appropriate, yet still allow its content to be
undefined. An attribute that requires a unit definition is malformed
until one is provided. The unit type "dimensionless" is preserved,
however. In EML 2.0, it is synonymous with "unitless" and represents
the case in which units cannot be associated with an attribute for some
reason, despite the proper classification of that attribute as interval
or ratio. The "dimensionless" type may itself be an anomaly arising from the
limitations of the adopted measurement scale typology.
</para>
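<para>
A hedged sketch of the resulting rule follows. The set STANDARD_UNITS is
a hypothetical excerpt standing in for eml-dictionaryUnits.xml, and
check_unit is an invented helper; the sketch ignores any units not
listed here.
</para>

```python
# Hypothetical excerpt of a unit dictionary. "dimensionless" is present;
# "undefined" deliberately is not.
STANDARD_UNITS = {"meter", "gram", "second", "dimensionless"}

# Scales under which a unit is required in this sketch.
UNIT_BEARING_SCALES = {"interval", "ratio"}


def check_unit(scale, unit):
    """Return a list of problems; an empty list means the pair is acceptable."""
    problems = []
    if scale in UNIT_BEARING_SCALES:
        if unit is None:
            problems.append("unit is required for interval and ratio attributes")
        elif unit not in STANDARD_UNITS:
            problems.append(f"unit {unit!r} is not in the unit dictionary")
    elif unit is not None:
        problems.append(f"unit is not allowed for {scale} attributes")
    return problems


print(check_unit("ratio", "dimensionless"))
print(check_unit("ratio", "undefined"))
print(check_unit("nominal", "meter"))
```

<para>
Only the first call returns an empty list: a ratio attribute may be
declared dimensionless, but it may not leave its unit undefined.
</para>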
<para>
Closely related to the concept of unit is the concept of attribute
domain. The authors decided that a well-formed description of an
attribute must include some indication of the set of possible values for
that attribute. The set of possible values is useful, perhaps
necessary, for interpreting any particular observed value. While
universally required, attribute domain has different forms, depending on
the associated measurement scale.
</para>
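<para>
One way to picture "different forms" is sketched below. The two classes
and their fields are assumptions chosen for illustration (an enumerated
list for categorical scales, optional bounds for numeric ones); the
sidebar itself does not spell out the forms.
</para>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EnumeratedDomain:
    """Domain for nominal or ordinal attributes: an explicit set of codes."""
    codes: frozenset

    def contains(self, value):
        return value in self.codes


@dataclass(frozen=True)
class NumericDomain:
    """Domain for interval or ratio attributes: optional lower and upper bounds."""
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    def contains(self, value):
        if self.minimum is not None and self.minimum > value:
            return False
        if self.maximum is not None and value > self.maximum:
            return False
        return True


# Hypothetical domains, for illustration only.
letter_grade = EnumeratedDomain(frozenset({"A", "B", "C", "D", "F"}))
kelvin = NumericDomain(minimum=0.0)

print(letter_grade.contains("B"), letter_grade.contains("E"))
print(kelvin.contains(273.15), kelvin.contains(-1.0))
```

<para>
Both objects answer the same question, "is this observed value
possible?", which is what makes a domain useful for interpreting data
regardless of scale.
</para>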
<para>
The element storageType has an obvious relationship to domain. It gives
some indication of the range of possible values of an attribute, and
also gives some (potentially critical) operability information about the
way the attribute is represented or construed in the local storage
system. The storageType element seems to fall in a gray area between
the logical and physical aspects of stored data. Comfortable neither
with eliminating it nor with making it required, the authors left it
available but optional under attribute.
</para>
<para>
Attributes representing dates, times, or combinations thereof (hereafter
"dateTime") were the most difficult to model in EML. Is dateTime of
type interval or ordinal? Does it have units or not? Strong cases can
be made on each side of the issue. The confusion may reflect the
limitations of the measurement scale typology. The final resolution of
the dateTime model is probably somewhat arbitrary. There was clearly a
need, however, to allow for the interoperability of dateTime formats.
EML 2.0 tries to provide an unambiguous mechanism for describing the
format of dateTime values.</para>
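<para>
To show why an explicit format description helps interoperability, here
is a minimal sketch. The token vocabulary (YYYY, MM, DD, hh, mm, ss) and
the helper names are assumptions made for this example and should not be
read as the EML 2.0 format syntax; the parsing itself uses Python's
standard datetime.strptime.
</para>

```python
import re
from datetime import datetime

# Assumed token vocabulary, for illustration only.
TOKENS = {
    "YYYY": "%Y",
    "MM": "%m",
    "DD": "%d",
    "hh": "%H",
    "mm": "%M",
    "ss": "%S",
}
TOKEN_PATTERN = re.compile("|".join(TOKENS))


def to_strptime(format_string):
    """Translate the assumed tokens into strptime directives in one pass."""
    return TOKEN_PATTERN.sub(lambda match: TOKENS[match.group(0)], format_string)


def parse_value(value, format_string):
    """Parse a stored dateTime value using its declared format string."""
    return datetime.strptime(value, to_strptime(format_string))


iso_style = parse_value("2002-11-04 10:32:41", "YYYY-MM-DD hh:mm:ss")
day_first = parse_value("04/11/2002", "DD/MM/YYYY")
month_first = parse_value("04/11/2002", "MM/DD/YYYY")

print(iso_style, day_first.date(), month_first.date())
```

<para>
The value "04/11/2002" is read as 4 November or as 11 April depending
solely on the declared format, which is the ambiguity a format
description is meant to remove.
</para>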
</sidebar>
--
Tim Bergsma
LTER Information Manager
W.K. Kellogg Biological Station
Michigan State University
Hickory Corners, MI 49060
616/671-2337
tbergsma at kbs.msu.edu
http://lter.kbs.msu.edu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sidebar.xml
Type: text/xml
Size: 4578 bytes
Desc: not available
Url : http://mercury.nceas.ucsb.edu/ecoinformatics/pipermail/eml-dev/attachments/20021104/7f3600c2/sidebar.xml