measurementScale Decision Tree

Scott Chapal scott.chapal at jonesctr.org
Mon Feb 3 13:11:46 PST 2003


My original query was intended to see how/if people are auto-assigning
measurementScale more so than to call into question design decisions.
Sorry.

I am having trouble sorting this out myself.  Looks like Dan is too.
(BTW, Dan, thanks for the long and thoughtful email).

As a voter (in hindsight), I got impatient. The typology 'seemed' like
a satisfactory way to classify and solve the unit conundrum...but I
didn't spend sufficient effort creating EML 2.0 instance documents at
that stage.  I am now.  Like Ann Richards once said about a challenger
to the governorship, if you put lipstick on a pig, it's still a *PIG*.

However, rather than brand this an unhealthy or counterproductive
debate over EML 2.0 design decisions, I'd rather:

1) View these discussions as future-EML-version pre-planning.

2) Develop a data-driven decision tree for picking measurementScale.

Here's what EML-attribute presents:

                               
                    measurementScale
                   /               \ 
      dateTimeDomain             Stevens
                                /      \
                      numericDomain   nonNumericDomain
                        /    \            /    \
                    ratio   interval    nominal ordinal


So let's apply some lipstick.

Decision Tree:

(Assuming that we are starting from structured data or metadata).

1) Isolate dates/times.

   - Test: "Are the values dates, times, datetimes?"

     - Implies: Corresponding storageType:
       dateTime
       duration
       date
       time
       gYearMonth
       gYear
       gMonthDay
       gDay
       gMonth

Otherwise:

2) numeric or nonNumeric?
   - Test: "Is it a number?"
     - Implies: Corresponding storageType:
      float
      decimal (or derived type)
      double

   2.1) Numeric
   2.1.1) Is it measured from an origin?  RATIO
   - Test: "Are negative values permissible?"  (If not, RATIO.)
     - Implies: Corresponding storageType:
       decimal (or nonNegative derived type)
   Examples:
       Temperature K
       Volumetric flow (cubic m/s)

   2.1.2) Are the permissible values "evenly spaced"?  INTERVAL
   - Test: "Has definable precision, negative values permissible."
     - Implies: precision and range check.

     Examples:
     Temperature C, F

   Otherwise:

   2.2) nonNumeric
   - Implies: Corresponding storageType
     Boolean
     String

   2.2.1) "Are the values ordered?". ORDINAL
   - Test: "Values comprise an enumerated, sortable list"
     -Implies: List exists.

    Examples:
     Species Code
     Quality Scale ( A, B, C etc.)

   Otherwise:

   2.2.2) Has identifiable, distinguishable values (numerals, strings etc.). NOMINAL

   - Test: "Is a string?"

     - Implies: Corresponding storageType
     String
     
    Examples:
    Species Name         
    Comment
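
The tree above could be sketched in code roughly as follows.  This is
only a minimal illustration, not an existing EML tool: the function
names, the handful of date/time formats, and the optional ordered_codes
argument are all assumptions made for the example, and the returned
labels are plain strings that mirror the tree in this message rather
than verified schema element names.

```python
from datetime import datetime

# Assumed date/time layouts for the sketch; real data will need more.
_DATETIME_FORMATS = ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d", "%H:%M:%S")


def _is_datetime(text):
    """Step 1 test: does the value parse as a date, time, or datetime?"""
    for fmt in _DATETIME_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return True
        except ValueError:
            continue
    return False


def _is_number(text):
    """Step 2 test: "Is it a number?" (float() is used as the check)."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def guess_measurement_scale(values, ordered_codes=None):
    """Walk the decision tree for one column of values.

    ordered_codes is a hypothetical, caller-supplied sortable list of
    codes (step 2.2.1: "List exists").
    """
    texts = [str(v).strip() for v in values]
    if not texts:
        raise ValueError("need at least one value")

    # 1) Isolate dates/times.
    if all(_is_datetime(t) for t in texts):
        return "dateTime"

    # 2.1) Numeric: ratio if no negative values were seen, else interval.
    if all(_is_number(t) for t in texts):
        numbers = [float(t) for t in texts]
        return "ratio" if min(numbers) >= 0 else "interval"

    # 2.2.1) nonNumeric with a known ordered list of codes: ordinal.
    if ordered_codes is not None and set(texts) <= set(ordered_codes):
        return "ordinal"

    # 2.2.2) Everything else: nominal.
    return "nominal"
```

Note that the ratio/interval branch only looks at the values present,
so a column of Celsius readings that happen to be above zero would be
guessed as ratio.  The sample alone cannot settle that case; some extra
hint, such as the unit, would be needed.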
_________________________________________________________________

Questions/Issues.

   - Ordinal data can be numeric, can't it?
     That is: "Has numeric order (that can withstand transformation)."?
     So does that mean that ordinal data must be represented as 'string'
     for EML's purposes?

   - Do we really know that the following are true?

> > "...interval and ordinal scales are much less common than ratio and
> >  nominal scales in ecological data"

> > Someone has pointed out choice of measurement scale on the Stevens
> > typology depends in part on what you want to do with the data.
> > So, for instance, measurement scale can't be predicted from, say,
> > DictionaryUnit

Should we try to associate a probable measurementScale choice with each
DictionaryUnit?  It seems to me that many of them are pretty obvious.
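
One way such an association might look is a simple lookup table.  The
sketch below is an assumption throughout: the unit names are
illustrative spellings that have not been checked against the EML unit
dictionary, and the scale choices just restate the temperature and
flow examples from the decision tree.

```python
# Hypothetical unit names; not checked against the EML unit dictionary.
LIKELY_SCALE_BY_UNIT = {
    "kelvin": "ratio",                # measured from an absolute origin
    "cubicMetersPerSecond": "ratio",  # volumetric flow
    "celsius": "interval",            # zero point is arbitrary
    "fahrenheit": "interval",
}


def likely_scale(unit_name):
    """Return a probable scale for a unit, or None if no default is known."""
    return LIKELY_SCALE_BY_UNIT.get(unit_name)
```

A generator could treat the result as a default that the data owner
may override, since the choice of scale also depends on what one wants
to do with the data.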

I am not trying to cause consternation among the ranks.  I want to be
able to auto-generate EML from well-structured data.  This is based on
my belief that EML must be [almost] entirely auto-generated for it to
succeed.  Anything less will present a barrier-to-adoption which will
be very hard to overcome.  If this means that the data model needs to
be enhanced to explicitly accommodate Stevens typology, then fine.

-Scott

-- 
\SEC


