[eml-dev] Revisiting the <measurementScale> categories

Fri Sep 19 09:56:34 PDT 2008

Hi Inigo,

As I'm sure you're aware, we had extensive discussions and debates on this
when setting up EML's schema.  None of us were totally happy with it, but it
seemed reasonable at the time, even though there were issues it couldn't
address.  In retrospect, I now think that it would be reasonable to collapse
the interval and ratio measurements into one category.  I think the others
(nominal, ordinal, datetime) need to stay, as one needs to collect different
metadata for each, so we need a way to differentiate them.  So I could
easily see a four-way distinction in measurement scale.  But even with that,
there would still be confusion over things like counts and how to classify
them, as they are technically dimensionless (there is no constant unit by
which a count is scaled).  Some people have extended Stevens's typology and
called counts an 'Absolute' scale.
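Here is a minimal Python sketch of the four-way distinction. It assumes that interval and ratio simply merge into one numeric category and that the other three map one-to-one. The names (`EmlScale`, `ProposedScale`, `NUMERIC`, `collapse`) are hypothetical and are not part of the EML schema. Counts are deliberately left out because, as noted above, they don't fit cleanly into either numeric category.

```python
from enum import Enum


class EmlScale(Enum):
    """The five measurementScale categories discussed in this thread."""
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    INTERVAL = "interval"
    RATIO = "ratio"
    DATETIME = "datetime"


class ProposedScale(Enum):
    """Hypothetical four-way scale with interval and ratio merged."""
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    NUMERIC = "numeric"  # hypothetical name for the merged interval/ratio category
    DATETIME = "datetime"


# Interval and ratio both collapse into NUMERIC; the others map one-to-one.
_COLLAPSE = {
    EmlScale.NOMINAL: ProposedScale.NOMINAL,
    EmlScale.ORDINAL: ProposedScale.ORDINAL,
    EmlScale.INTERVAL: ProposedScale.NUMERIC,
    EmlScale.RATIO: ProposedScale.NUMERIC,
    EmlScale.DATETIME: ProposedScale.DATETIME,
}


def collapse(scale: EmlScale) -> ProposedScale:
    """Map a current category onto the hypothetical four-way scheme."""
    return _COLLAPSE[scale]
```

For example, `collapse(EmlScale.RATIO)` and `collapse(EmlScale.INTERVAL)` both return `ProposedScale.NUMERIC`. The nominal, ordinal, and datetime categories stay distinct because each needs different metadata.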

The problems in classifying measurements and representing units go much
deeper though.  People have struggled with the mechanisms we provided in EML
for creating new units, and even with the definitions of the units
themselves.  Part of the problem is a lack of differentiation in our
current system between the entity that is being quantified and the
unit in which that quantity is represented, especially for derived units.
For example, it's simple to say that someone measured mass in grams,
which EML supports.  However, we don't have a way to indicate that the
measurement was of the mass of Carbon in grams.  Moreover, researchers
sometimes measure the ratio between the mass of Carbon in grams and the
mass of Nitrogen in grams.  Technically this is g/g and therefore is a
dimensionless ratio.  However, it is critical to know what was being
measured in the numerator and denominator, and there is no way to represent
this in EML outside of natural language fields.
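To make the Carbon/Nitrogen example concrete, here is a minimal Python sketch. It assumes a hypothetical model in which each quantity records the entity, the characteristic measured, and the unit as separate fields. The class and field names are illustrative only and are not EML or OBOE constructs. In this model the ratio's unit cancels to dimensionless, but the numerator and denominator stay attached to it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """What was measured (entity + characteristic), kept separate from its unit."""
    entity: str          # e.g. "Carbon"
    characteristic: str  # e.g. "Mass"
    unit: str            # e.g. "gram"


@dataclass(frozen=True)
class RatioQuantity:
    """A ratio of two quantities that remembers both sides."""
    numerator: Quantity
    denominator: Quantity

    @property
    def unit(self) -> str:
        # Identical units cancel, so g/g reduces to a dimensionless ratio...
        if self.numerator.unit == self.denominator.unit:
            return "dimensionless"
        return f"{self.numerator.unit}/{self.denominator.unit}"

    def describe(self) -> str:
        # ...but the description still says what was divided by what.
        n, d = self.numerator, self.denominator
        return (f"{n.characteristic} of {n.entity} / "
                f"{d.characteristic} of {d.entity} ({self.unit})")


carbon = Quantity("Carbon", "Mass", "gram")
nitrogen = Quantity("Nitrogen", "Mass", "gram")
cn_ratio = RatioQuantity(carbon, nitrogen)
```

Here `cn_ratio.unit` is `"dimensionless"`, while `cn_ratio.describe()` still reads `"Mass of Carbon / Mass of Nitrogen (dimensionless)"`. A unit string alone would lose that information.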

We've been working on providing solutions to these problems in our
Extensible Observation Ontology (OBOE), and have developed a better
mechanism for describing both what was measured and various aspects of its
measurement standard such as the units used.  These 'semantic units' are
much better at representing the edge cases like counts and dimensionless
quantities than EML is.  But it's also still a work in progress.  We'll be
discussing these issues at the TDWG meeting in the Observations Data Model
session in Perth in October if you happen to be there.

Matt

On Fri, Sep 19, 2008 at 7:31 AM, inigo <isangil at lternet.edu> wrote:

>
>
> For the record, too: the <measurementScale> categories (nominal, ordinal,
> ratio, interval and datetime) are not the best way to classify the
> recorded measurables in a data table.  The categories are not free of
> controversy, and the guidelines and other attempts to explain how to
> classify variable types properly have not produced consistent usage.
>
> For example, a "date" has sometimes been documented as "datetime", but also
> as "nominal or ordinal" and even as "interval or ratio": a wonderful spread.
> This kind of classification ambiguity is normal to a certain extent, but
> something can be done to improve both the efficiency of data
> documentation and post-processing, and community agreement on practices.
> A good number of LTER sites have in practice simplified these categories
> further.  Essentially, there are 'dates', 'quantifiable measurables' and
> 'all the rest' (all the rest includes free text such as comments,
> identifiers, code/definition pairs, nominals and ordinals).  This
> practice has been adopted to remove some of the ambiguity that the original
> categories present, and for clarity of use.  A few people may feel, no
> doubt, that the differences between those categories are crystal clear, but
> I have not found many of those people.
> If exploring different categories (identifiers, codes, quantifiable
> measures, text, flags, dates...) for EML is something that no one on eml-dev
> is even willing to consider, perhaps reducing the number from 5 to 3 would be
> a humbler goal for the sake of efficiency.  Sure, someone may be tempted to
> divide an interval type by a ratio type, with undesirable results, but I
> think that risk already exists now (because of how the categories are used
> in practice).
> cheers, inigo
>
> _______________________________________________
> Eml-dev mailing list
> Eml-dev at ecoinformatics.org
> http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/eml-dev
>

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Matthew B. Jones
Director of Informatics Research and Development
National Center for Ecological Analysis and Synthesis (NCEAS)
UC Santa Barbara
jones at nceas.ucsb.edu Ph: 1-907-523-1960
http://www.nceas.ucsb.edu/ecoinfo
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~