measurementScale Decision Tree

Mon Feb 3 13:36:28 PST 2003

In many cases the relationship between measurement scale and units is
obvious enough to make a reasonable guess, but it really depends on
what you do with the values. Meters used as a length measurement are ratio, but
meters used as the unit in a coordinate system like UTM are interval, since
zero is arbitrary. 
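
A small Python sketch of that distinction, using made-up coordinate values
(the numbers and the `shift` helper are hypothetical, not taken from any
dataset). Moving the origin leaves differences alone but changes ratios,
which is why a coordinate in meters behaves as interval rather than ratio:

```python
# Hypothetical coordinates in meters, for illustration only.
easting_a, easting_b = 400_000.0, 600_000.0


def shift(value, offset):
    """Re-express a coordinate against a different (arbitrary) origin."""
    return value + offset


offset = -300_000.0  # an arbitrary change of origin

# Differences survive the change of origin (interval property).
diff_before = easting_b - easting_a
diff_after = shift(easting_b, offset) - shift(easting_a, offset)

# Ratios do not, so "b is 1.5 times a" carries no meaning for coordinates.
ratio_before = easting_b / easting_a
ratio_after = shift(easting_b, offset) / shift(easting_a, offset)

print(diff_before, diff_after)
print(ratio_before, ratio_after)
```

A plain length has a true zero, so its ratios are meaningful and there is
no sensible origin shift to apply in the first place.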

Our decision tree pretty much follows what you have. I'm not sure you can go
much farther without having some semantic info to go on, especially for
recognizing interval and ordinal scale data. 

I think I was the one that said these were the least common in ecology,
based on the frequency I've been using them as I mark up Cap data. Most of
what I encounter has been measurements of distance, area, volume, rates,
etc. or nominal classifications (or plain text). 

A great deal of legacy data is likely to use numeric codes for what are
inherently nominal or ordinal data (all our taxon codes are represented as
numeric IDs, for example). So no, I wouldn't expect ordinal or nominal to
always be text.

Coming from a state that produced Ev Meacham and Fife Symington, I have no
comment on Governor's statements. But I would offer a quote from E. B.
White: "Some Pig". 

Peter McCartney (peter.mccartney at asu.edu)
Center for Environmental-Studies
Arizona State University

-----Original Message-----
From: Scott Chapal [mailto:scott.chapal at jonesctr.org] 
Sent: Monday, February 03, 2003 2:12 PM
To: eml-dev
Subject: measurementScale Decision Tree

My original query was intended to see how/if people are auto-assigning
measurementScale more so than to call into question design decisions.
Sorry.

I am having trouble sorting this out myself.  Looks like Dan is too. (BTW,
Dan, thanks for the long and thoughtful email).

As a voter (in hindsight), I got impatient. The typology 'seemed' like a
satisfactory way to classify and solve the unit conundrum...but I didn't
spend sufficient effort creating EML 2.0 instance documents at that stage.
I am now.  Like Ann Richards once said about a challenger to the
governorship, if you put lipstick on a pig, it's still a *PIG*.

However, rather than brand this an unhealthy or counterproductive debate
over EML 2.0 design decisions, I'd rather:

1) View these discussions as future-EML-version pre-planning.

2) Develop a data-driven decision tree for picking measurementScale.

Here's what EML-attribute presents:

                    measurementScale
                   /               \ 
      dateTimeDomain             Stevens
                                /      \
                      numericDomain   nonNumericDomain
                        /    \            /    \
                    ratio   interval    nominal ordinal

So let's apply some lipstick.

Decision Tree:

(Assuming that we are starting from structured data or metadata).

1) Isolate dates/times.

   - Test: "Are the values dates, times, datetimes?"

     - Implies: Corresponding storageType:
       dateTime
       duration
       date
       time
       gYearMonth
       gYear
       gMonthDay
       gDay
       gMonth

Otherwise:

2) numeric or nonNumeric?
   - Test: "Is it a number?"
     - Implies: Corresponding storageType:
      float
      decimal (or derived type)
      double

   2.1) Numeric
   2.1.1) Is it measured from an origin?  RATIO
   - Test: "Are negative values permissible?"
     - Implies: Corresponding storageType:
      decimal (or nonNegative derived type)
   Examples:
       Temperature K
       Volumetric flow (cubic m/s)

   2.1.2) Are the permissible values "evenly spaced"?  INTERVAL
   - Test: "Has definable precision, negative values permissible."
     - Implies: precision and range check.

     Examples:
     Temperature C, F

   Otherwise:

   2.2) nonNumeric
   - Implies: Corresponding storageType
     Boolean
     String

   2.2.1) "Are the values ordered?". ORDINAL
   - Test: "Values comprise an enumerated, sortable list"
     -Implies: List exists.

    Examples:
     Species Code
     Quality Scale ( A, B, C etc.)

   Otherwise:

   2.2.2) Has identifiable, distinguishable values (numerals, strings
   etc.). NOMINAL

   - Test: "Is a string?"

     - Implies: Corresponding storageType
     String

    Examples:
    Species Name         
    Comment
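
As a rough illustration, the tree above could be sketched in Python along
these lines. This is only a sketch under stated assumptions: the function
name, the `ordered_codes` parameter, and the returned labels are all
hypothetical, and the "negative values permissible" test is approximated
by looking at the values actually present, so a Celsius column with no
sub-zero readings would be mislabeled as ratio.

```python
from datetime import date, datetime, time


def guess_measurement_scale(values, ordered_codes=None):
    """Guess a measurement scale label for one column of values.

    `ordered_codes` is a hypothetical parameter: an enumerated list,
    in sort order, that marks the column as ordinal when supplied.
    """
    values = [v for v in values if v is not None]

    # 1) Isolate dates/times.
    if values and all(isinstance(v, (date, time, datetime)) for v in values):
        return "datetime"

    # 2) Numeric or nonNumeric? (bool is an int subclass, so exclude it.)
    def is_number(v):
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    if values and all(is_number(v) for v in values):
        # 2.1.1) No negative values observed: treat as RATIO.
        if all(v >= 0 for v in values):
            return "ratio"
        # 2.1.2) Negative values occur: treat as INTERVAL.
        return "interval"

    # 2.2.1) An enumerated, sortable list exists: ORDINAL.
    if ordered_codes is not None and all(v in ordered_codes for v in values):
        return "ordinal"

    # 2.2.2) Otherwise: NOMINAL.
    return "nominal"


print(guess_measurement_scale([273.15, 300.0]))
print(guess_measurement_scale([-5.0, 12.5]))
print(guess_measurement_scale(["A", "C", "B"], ordered_codes=["A", "B", "C"]))
print(guess_measurement_scale(["Quercus alba", "Pinus palustris"]))
```

Note that the sketch sends every all-numeric column down the numeric
branch, so numeric codes standing in for categories would come out as
ratio unless they are screened out beforehand.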
_________________________________________________________________

Questions/Issues.

   - Ordinal data can be numeric, can't it?
     That is: "Has numeric order (that can withstand transformation)."?
     So that means that ordinal data must be represented as 'string'
     for EML's purposes?

   - Do we really know that the following are true?

> > "...interval and ordinal scales are much less common than ratio and  
> > nominal scales in ecological data"

> > Someone has pointed out choice of measurement scale on the Stevens 
> > typology depends in part on what you want to do with the data. So, 
> > for instance, measurement scale can't be predicted from, say, 
> > DictionaryUnit

Should we try to associate probable measurementScale choice with each
DictionaryUnit?  It seems to me that many of them are pretty obvious.
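
One way such an association might look, as a sketch only: a lookup table
of default guesses plus an override for the coordinate-system case. The
table contents, the unit names used as keys, and the `is_coordinate` flag
are assumptions for illustration, not part of EML.

```python
# Hypothetical default scale per unit name (illustrative, not exhaustive).
PROBABLE_SCALE = {
    "meter": "ratio",
    "kelvin": "ratio",
    "celsius": "interval",
    "fahrenheit": "interval",
}


def probable_scale(unit, is_coordinate=False):
    """Return a default scale guess for a unit, or None if it is unknown.

    A ratio unit used as a coordinate (arbitrary zero) is downgraded to
    interval, since the unit alone cannot settle the scale.
    """
    guess = PROBABLE_SCALE.get(unit)
    if guess == "ratio" and is_coordinate:
        return "interval"
    return guess


print(probable_scale("meter"))
print(probable_scale("meter", is_coordinate=True))
print(probable_scale("celsius"))
```

The guess is only a default; anything not in the table comes back as None
and would still need a human decision.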

I am not trying to cause consternation among the ranks.  I want to be able
to auto-generate EML from well-structured data.  This is based on my belief
that EML must be [almost] entirely auto-generated for it to succeed.
Anything less will present a barrier-to-adoption which will be very hard to
overcome.  If this means that the data model needs to be enhanced to
explicitly accommodate Stevens typology, then fine.

-Scott

-- 
\SEC
_______________________________________________
eml-dev mailing list
eml-dev at ecoinformatics.org
http://www.ecoinformatics.org/mailman/listinfo/eml-dev