the relationship of NIST conceptology to EML

Tim Bergsma tbergsma at kbs.msu.edu
Tue Nov 5 09:27:24 PST 2002


Matt, and others interested,

The "something important" I was missing in the email quoted below was
the NIST distinction between quantities in the general sense and units. 
Length and elevation are quantities in the general sense.  Meters is a
unit:  a particular physical quantity.  Measurement scale can be
specified for a quantity in the general sense.  But a given unit can be
invoked under different measurement scales.

Mostly to promote my own understanding of EML, I'll try to map the NIST
concept definitions onto the EML specification.

NIST "quantity in the general sense" is EML unitType at name
(unitDictionary).  Mass, length, speed, transmissivity.  It is also
suggested by attributeName if the attribute is quantitative.

NIST "quantity in the particular sense" is represented indirectly in EML
as the association of a quantitative attributeName with a particular
record.  NIST gives the examples "mass of the moon" and "charge of a
proton".  In a data table environment, an example might be
"[StreamDischarge] for [Watershed3]".  This is not to be confused with
the content of the corresponding cell:  see below.

NIST "unit" is EML unit at name (unitDictionary) and customUnit
equivalents.  Kilogram, meter, milesPerSecond, tesla.  

NIST "value of a physical quantity" is represented indirectly in EML as
the combination of a quantitative datum (a number from a table) with the
associated unit listed in the metadata.  The two are logically
inseparable (number and unit) but are usually physically separated for
efficiency.  Yet the following is legal:

Bird	length
1	2.5 cm
2	3.5 cm
3	4.8 cm
4	2.0 cm

While extremely awkward for machine parsing, this format is technically
perfect, prevents data entropy, and may actually exist out there
somewhere.
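
To illustrate how the number and unit in such cells could be pulled apart again, here is a minimal Python sketch. It assumes each cell is a number and a unit separated by whitespace; the names `rows` and `split_value` are made up for the illustration and are not part of EML.

```python
# Hypothetical table in which every cell carries its own unit.
rows = [("1", "2.5 cm"), ("2", "3.5 cm"), ("3", "4.8 cm"), ("4", "2.0 cm")]

def split_value(cell):
    """Split a cell such as '2.5 cm' into (number, unit)."""
    number, unit = cell.split()
    return float(number), unit

parsed = {bird: split_value(cell) for bird, cell in rows}

# If every cell uses the same unit, the unit can be factored out of the
# data and recorded once in the metadata.
units = {unit for _, unit in parsed.values()}
column_unit = units.pop() if len(units) == 1 else None
lengths = {bird: value for bird, (value, _) in parsed.items()}
```

Here `column_unit` ends up as "cm" and `lengths` holds bare numbers, which is the usual physical separation of number and unit described above.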

Note on EML "standard quantities in the general sense".  I believe that
for practical reasons we collapse two derived unitTypes if they have the
same derivation.  For instance, "capacity" is expressed in length^3, so
it is redundant with "volume" and we don't need it.  I'm certainly not
recommending any changes, but I suspect there are cases where the
assumption does not hold.  For instance, the newton-meter (length *
force) is a measure of energy, but torque is also measured in
newton-meters, and has a very different physical meaning. 
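
A rough sketch of the point, in Python: if a unitType is identified only by its derivation (here, exponents of mass, length, and time), then energy and torque cannot be told apart, just as capacity and volume cannot. The tuple encoding and the names below are assumptions made for the illustration, not how unitDictionary stores anything.

```python
# Dimension encoded as exponents of (mass, length, time).
# This encoding is illustrative only.
FORCE = (1, 1, -2)   # kg * m / s^2
LENGTH = (0, 1, 0)   # m

def multiply(a, b):
    """Multiplying two quantities adds their dimension exponents."""
    return tuple(x + y for x, y in zip(a, b))

NEWTON_METER = multiply(FORCE, LENGTH)

unit_types = {
    "volume": (0, 3, 0),
    "capacity": (0, 3, 0),
    "energy": NEWTON_METER,
    "torque": NEWTON_METER,
}

def same_derivation(a, b):
    """True when two unitTypes share the same derivation."""
    return unit_types[a] == unit_types[b]
```

Under this encoding `same_derivation("capacity", "volume")` and `same_derivation("energy", "torque")` are both true, although only the first pair is interchangeable in meaning.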

Note also that "elevation", a perfectly legal "quantity in the general
sense", could be considered a (trivial) derivation from "length". 
Length seems to be a simple, extensive quantity, whereas elevation seems
to be a coordinate system assembled from length elements.  Calendar and
Celsius seem to have similar behaviors.  Seems like extensive quantities
are usually ratio scales, because negative extension is meaningless: 
length, time, mass, volume, speed.  But coordinate systems may be
interval scales, such as elevation.  Volume can also be the basis for an
interval scale:  even when my gas gauge reads dead empty, there's still
enough fuel to get to the next service station! 

regards,

Tim.

-------- Original Message --------
Subject: measurement scale typology is inadequate
Date: Mon, 04 Nov 2002 10:14:14 -0500
From: Tim Bergsma <tbergsma at kbs.msu.edu>
Organization: W. K. Kellogg Biological Station
To: Matt Jones <jones at nceas.ucsb.edu>,"Eml-Dev (E-mail)"
<eml-dev at ecoinformatics.org>,David Blankman <dblankman at lternet.edu>
References: <3DC2D91C.73FC5B3 at kbs.msu.edu>
<3DC2DDBA.3010505 at nceas.ucsb.edu>

Matt,

I retract my silly idea of preclassifying standard units by measurement
scale!

Consider meters.  Obviously ratio, right?  But elevation is measured in
meters, often relative to sea level.  So what is the ratio of Mt.
Everest to sea level?  You get a divide-by-zero error.  What about the
elevation of the Dead Sea, or Holland?  Suddenly, meters is looking
rather interval.
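
A small Python sketch of why elevation behaves like an interval scale: moving the reference datum changes ratios of elevations but leaves differences alone. The elevations are round illustrative numbers, not survey data, and `shift_datum` is a hypothetical helper.

```python
def shift_datum(elevation_m, datum_m):
    """Re-express an elevation relative to a datum located at datum_m."""
    return elevation_m - datum_m

# Round illustrative elevations in meters relative to sea level.
peak, basin = 8000.0, -400.0

# Move the datum to a point 1000 m below sea level.
peak2 = shift_datum(peak, -1000.0)    # 9000.0
basin2 = shift_datum(basin, -1000.0)  # 600.0

diff_before = peak - basin      # 8400.0
diff_after = peak2 - basin2     # 8400.0, unchanged by the datum

ratio_before = peak / basin     # -20.0
ratio_after = peak2 / basin2    # 15.0, depends entirely on the datum
```

The difference (a distance) survives the change of datum; the ratio does not, and it is undefined whenever the denominator sits exactly at the datum.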

Seems to me like I'm missing something important.  What is the
relationship between "elevation" and "meters"?  How about "distance" and
"meters"?  When you subtract two elevations, you get a distance, not a
third elevation.  When you subtract two dates, you get a duration, not a
third date.  It is meaningless to add two dates:  What is 1 July 2002
plus 1 September 2002?  Similarly, it is meaningless to add two GPS
locations, but meaningful to subtract them (but you get a distance, not
a third GPS point).  What about celsius?  You can certainly add 20
degrees and 30 degrees, but what does it mean?  You can subtract two
temperatures, but do you get a third temperature?  No.  You can express
the result in the same units, but there is a big difference between the
idea that something is 10 celsius degrees colder and the idea that
something is 10 celsius degrees.  One is an offset, like duration and
distance, and the other is a reference to a coordinate system, like
calendar and geoposition.
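
Python's standard `datetime` module happens to encode the same distinction for dates, so it makes a convenient sketch: a `date` is a point in a coordinate system, a `timedelta` is an offset, and the library refuses to add two points. The variable names are only for illustration.

```python
from datetime import date, timedelta

july = date(2002, 7, 1)
september = date(2002, 9, 1)

# Subtracting two dates gives a duration, not a third date.
duration = september - july          # timedelta(days=62)

# Adding a duration to a date gives another date.
later = july + duration              # date(2002, 9, 1)

# Adding two dates is not defined at all.
try:
    july + september
    can_add_dates = True
except TypeError:
    can_add_dates = False
```

The same point/offset split would apply to elevation versus distance and to a Celsius reading versus a Celsius difference, although no standard library type enforces it for those.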

At this point, I've done enough damage to eml to disqualify myself from
commenting further on where to put dateTime.  I'd like to suggest,
however, that the problem may be the typology itself.  Surfing the web,
I stumbled across an article "Nominal, Ordinal, Interval, and Ratio
Typologies are Misleading"
http://www.spss.com/research/wilkinson/Publications/Stevens.pdf
(attached), which should be required reading for eml application
developers.  With respect to EML 2.0, I think the ability to describe
the format of a dateTime is absolutely critical, if for no other reason
than to trap the Old World habit of using day-month rather than
month-day. If we are specific now, we can map 2.0 dateTimes to a better
location when 3.0 comes out.  I'd like to hear what David thinks about
all this.

Tim.

P.S. why not make dateTime its own measurementScale?  (oops!  I said I
wasn't gonna comment.  Sorry.)



Matt Jones wrote:
> 
> Tim,
> 
> So, are you saying that "unit" implies measurement scale?  That is
> probably true, but it would be much harder to nest measurement scale in
> unit as not all measurement scales have units, and there are infinite
> numbers of units.  An application that digests the units and thereby can
> suggest (or even automatically fill in) a measurement scale would make
> the whole thing more usable.  But I think that is an application issue
> more than an EML one.  I don't see a need for any changes to the schemas
> to enable applications to guide users based on units.
> 
> Can you outline a bit more concretely what you were proposing?
> 
> BTW, I found that I fully agreed with your treatise on datetime as a
> coordinate system.  You said "Arbitrariness is handled by projecting the
> points onto an evenly divided, "true" interval scale, such as a string
> of named seconds starting in 1970."  I fully agree that this is
> the case here. Datetimes are ordinals (given any two datetime values,
> they always have a pre-determined ordering).  Datetimes can be projected
> onto an interval scale such as seconds since 1970.  The fact that most
> scientists consider datetimes as interval values is more from pragmatism
> than theoretical correctness.  We could move the datetime format string
> out of unit and into ordinal if people prefer.  This would be more
> technically correct, would also eliminate our problem with the precision
> field for datetime values, and would remove the format string from
> UnitType so it would no longer show up under interval (and ratio, which
> is good).  It would introduce a need for dateTimeDomain in ordinal,
> which would be ok too.
> 
> Does anyone have a preference about whether this should in fact be in
> ordinal or in interval?
> 
> Matt
> 
> Tim Bergsma wrote:
> > Matt,
> >
> > We know already the measurementScale for every dictionary unit.  But all
> > the units are available under interval and ratio (in fact,
> > formattedDateTimeUnit also appears under ratio!).  Is there a way to
> > represent our prior knowledge of scale so that applications can provide
> > some guidance to users?
> >
> > Tim.
> 
> --
> *******************************************************************
> Matt Jones                                    jones at nceas.ucsb.edu
> http://www.nceas.ucsb.edu/    Fax: 425-920-2439   Ph: 907-789-0496
> National Center for Ecological Analysis and Synthesis (NCEAS)
> 
> Interested in ecological informatics? http://www.ecoinformatics.org
> *******************************************************************
> 

-- 
Tim Bergsma
LTER Information Manager
W.K. Kellogg Biological Station
Michigan State University
Hickory Corners, MI   49060
616/671-2337
tbergsma at kbs.msu.edu
http://lter.kbs.msu.edu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Stevens.pdf
Type: application/pdf
Size: 62862 bytes
Desc: not available
Url : http://mercury.nceas.ucsb.edu/ecoinformatics/pipermail/eml-dev/attachments/20021105/e93b9b81/Stevens.pdf

