units sidebar

Chad Berkley berkley at nceas.ucsb.edu
Mon Nov 4 14:31:38 PST 2002


Hi,

I've attached the rendered html of Tim's sidebar including a couple of
corrections that I made.  Please look it over and tell me if you see any
errors. Look at section 2.5.2 under the heading "Philosophy of Attribute
Units".

Great job on that Tim!

thanks,
chad

On Mon, 2002-11-04 at 10:32, Tim Bergsma wrote:
> Chad, Matt:
> 
> attached and appended is a sidebar about units.  I don't care what the
> title is or where it goes, but probably someone should edit it for
> length, clarity, and accuracy.
> 
> Tim
> 
> 
> <sidebar>
>  <title>Philosophy of Attribute Units</title>
>   <para>
> The concept of "unit" represents one of the most fundamental categories
> of metadata. The classic example of data entropy is a reported numeric
> value that loses its meaning because no units were recorded with it.
> Much of ecology is driven by measurement, and most measurements are
> inherently comparative.  Good data description requires a representation
> of the basis for comparison, i.e., the unit.  In modelling the attribute
> element, the authors of EML drew inspiration from the NIST Reference on
> Constants, Units, and Uncertainty
> (http://physics.nist.gov/cuu/Units/introduction.html).  This document
> defines a unit as "a particular physical quantity, defined and adopted
> by convention, with which other particular quantities of the same kind
> are compared to express their value."  The authors of the EML 2.0
> specification (hereafter "the authors") decided to make the unit element
> required, wherever possible.
>  </para>
>  <para>
> Units may also be one of the most problematic categories of metadata. 
> For instance, there are many candidate attributes that clearly have no
> units, such as named places and letter grades.  There are other
> candidate attributes for which units are difficult to identify, despite
> some suspicion that they should exist (e.g., pH, dates, times).  In still
> other cases, units may be meaningful but appear to be absent because
> they cancel under dimensional analysis (e.g., grams of carbon per gram
> of soil).  The relationship between units and dimensions is likewise not
> completely clear.
>  </para>
>  <para>
> The authors decided to sharpen the model of attribute by nesting unit
> under measurementScale.  Measurement scale is a data typology, borrowed
> from statistics, that was introduced in the 1940s.  Under the adopted
> model, each attribute is classified as nominal, ordinal, interval, or
> ratio.  Though widely criticized, this classification is well known and
> provides at least first-order utility in EML.  For example, nesting unit
> under measurementScale allows EML to require a unit for interval and
> ratio attributes while preventing its meaningless inclusion for
> categorical (nominal and ordinal) data, an approach judged superior to
> making unit universally required or universally optional.
>  </para>
>  <para>
> The sharpening of the attribute model allowed the elimination of the
> unit type "undefined" from the standard unit dictionary (see
> eml-dictionaryUnits.xml).  It seemed self-defeating to require the unit
> element exactly where appropriate, yet still allow its content to be
> undefined.  An attribute that requires a unit definition is malformed
> until one is provided.  The unit type "dimensionless" is preserved,
> however.  In EML 2.0, it is synonymous with "unitless" and represents
> the case in which units cannot be associated with an attribute for some
> reason, despite the proper classification of that attribute as interval
> or ratio.  The "dimensionless" unit may itself be an anomaly arising
> from the limitations of the adopted measurement scale typology.
>  </para>
>  <para>
> Closely related to the concept of unit is the concept of attribute
> domain.  The authors decided that a well-formed description of an
> attribute must include some indication of the set of possible values for
> that attribute.  The set of possible values is useful, perhaps
> necessary, for interpreting any particular observed value.  Although
> domain is universally required, its form depends on the associated
> measurement scale.
>  </para>
>  <para>
> The element storageType has an obvious relationship to domain.  It gives
> some indication of the range of possible values of an attribute, and
> also gives some (potentially critical) information about how the
> attribute is represented or interpreted in the local storage system.
> The storageType element seems to fall in a gray area between the
> logical and physical aspects of stored data.  Comfortable neither with
> eliminating it nor with making it required, the authors left it
> available but optional under attribute.
>  </para>
>  <para>
> Attributes representing dates, times, or combinations thereof (hereafter
> "dateTime") were the most difficult to model in EML.  Is dateTime of
> type interval or ordinal?  Does it have units or not?  Strong cases can
> be made on each side of the issue.  The confusion may reflect the
> limitations of the measurement scale typology.  The final resolution of
> the dateTime model is probably somewhat arbitrary.  There was clearly a
> need, however, to allow for the interoperability of dateTime formats. 
> EML 2.0 tries to provide an unambiguous mechanism for describing the
> format of dateTime values.</para>
> 
> </sidebar>
> 
> -- 
> Tim Bergsma
> LTER Information Manager
> W.K. Kellogg Biological Station
> Michigan State University
> Hickory Corners, MI   49060
> 616/671-2337
> tbergsma at kbs.msu.edu
> http://lter.kbs.msu.edu
> ----
> 
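
The unit rule described in the sidebar can be sketched as a small validation check. Under this rule, a unit is required for interval and ratio attributes and is meaningless for nominal and ordinal ones. "undefined" is no longer an acceptable unit, and "dimensionless" remains available. This is a minimal Python sketch under stated assumptions. The `Attribute` class, the `check_unit` function, and the plain-string scale names are hypothetical illustrations, not the EML 2.0 schema or any EML tool's API. In EML itself the rule is expressed structurally, by where the unit element may appear under measurementScale.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical, simplified model of an EML-style attribute (not the EML schema).
SCALES_WITH_UNITS = {"interval", "ratio"}
SCALES_WITHOUT_UNITS = {"nominal", "ordinal"}


@dataclass
class Attribute:
    name: str
    measurement_scale: str        # one of: nominal, ordinal, interval, ratio
    unit: Optional[str] = None    # e.g. "meter" or "dimensionless"


def check_unit(attr: Attribute) -> List[str]:
    """Return a list of unit problems for the attribute (empty if none)."""
    problems = []
    scale = attr.measurement_scale
    if scale in SCALES_WITH_UNITS:
        if attr.unit is None:
            # Interval and ratio attributes are malformed until a unit is given.
            problems.append(f"{attr.name}: {scale} attribute requires a unit")
        elif attr.unit == "undefined":
            # "undefined" was dropped from the unit dictionary, so reject it.
            problems.append(f"{attr.name}: 'undefined' is not an acceptable unit")
    elif scale in SCALES_WITHOUT_UNITS:
        if attr.unit is not None:
            # Units are meaningless for categorical (nominal/ordinal) data.
            problems.append(f"{attr.name}: {scale} attribute cannot have a unit")
    else:
        problems.append(f"{attr.name}: unknown measurement scale {scale!r}")
    return problems
```

For instance, a ratio attribute measured in grams of carbon per gram of soil could carry the unit "dimensionless" and pass. The same attribute with no unit, or with "undefined", would be reported as malformed. Keying the check on the measurement scale mirrors the sidebar's argument. A unit is demanded exactly where it is meaningful, rather than being universally required or universally optional.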

-- 
-----------------------
Chad Berkley
National Center for 
Ecological Analysis 
and Synthesis (NCEAS)
berkley at nceas.ucsb.edu
-----------------------
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mercury.nceas.ucsb.edu/ecoinformatics/pipermail/eml-dev/attachments/20021104/663258ef/index.html


More information about the Eml-dev mailing list