measurementScale Decision Tree
Scott Chapal
scott.chapal at jonesctr.org
Mon Feb 3 13:11:46 PST 2003
My original query was intended to see how/if people are auto-assigning
measurementScale more so than to call into question design decisions.
Sorry.
I am having trouble sorting this out myself. Looks like Dan is too.
(BTW, Dan, thanks for the long and thoughtful email).
As a voter (in hindsight), I got impatient. The typology 'seemed' like
a satisfactory way to classify and solve the unit conundrum...but I
didn't spend sufficient effort creating EML 2.0 instance documents at
that stage. I am now. Like Ann Richards once said about a challenger
to the governorship, if you put lipstick on a pig, it's still a *PIG*.
However, rather than brand this an unhealthy or counterproductive
debate over EML 2.0 design decisions, I'd rather:
1) View these discussions as future-EML-version pre-planning.
2) Develop a data-driven decision tree for picking measurementScale.
Here's what EML-attribute presents:
                 measurementScale
                  /            \
       dateTimeDomain          Stevens
                              /       \
                 numericDomain         nonNumericDomain
                   /      \               /      \
               ratio    interval      nominal   ordinal
So let's apply some lipstick.
Decision Tree:
(Assuming that we are starting from structured data or metadata.)
1) Isolate dates/times.
   - Test: "Are the values dates, times, datetimes?"
   - Implies: Corresponding storageType:
       dateTime
       duration
       date
       time
       gYearMonth
       gYear
       gMonthDay
       gDay
       gMonth
Otherwise:
2) Numeric or nonNumeric?
   - Test: "Is it a number?"
   - Implies: Corresponding storageType:
       float
       decimal (or derived type)
       double
   2.1) Numeric
      2.1.1) Is it measured from an origin? RATIO
         - Test: "Are negative values permissible?" (If not, RATIO.)
         - Implies: Corresponding storageType:
             decimal (or nonNegative derived type)
         Examples:
             Temperature K
             Volumetric flow (cubic m/s)
      2.1.2) Are the permissible values "evenly spaced"? INTERVAL
         - Test: "Has definable precision, negative values permissible."
         - Implies: Precision and range check.
         Examples:
             Temperature C, F
   Otherwise:
   2.2) nonNumeric
      - Implies: Corresponding storageType:
          Boolean
          String
      2.2.1) "Are the values ordered?" ORDINAL
         - Test: "Values comprise an enumerated, sortable list"
         - Implies: List exists.
         Examples:
             Species Code
             Quality Scale (A, B, C, etc.)
      Otherwise:
      2.2.2) Has identifiable, distinguishable values (numerals, strings, etc.). NOMINAL
         - Test: "Is a string?"
         - Implies: Corresponding storageType:
             String
         Examples:
             Species Name
             Comment
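
For what it's worth, the tree above could be sketched in Python roughly as
follows. This is only a minimal sketch under stated assumptions: the function
names, the date formats tried, and the returned labels are all made up for
illustration and have not been checked against the EML schema. It takes one
column of raw string values and, optionally, a known code list for the
ordinal test.

```python
from datetime import datetime

# Assumption: only these formats are tried; real data would need more.
_DATE_FORMATS = ("%Y-%m-%d", "%Y-%m-%dT%H:%M:%S", "%H:%M:%S")


def _is_datetime(value):
    """Return True if the string parses with one of the assumed formats."""
    for fmt in _DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            pass
    return False


def _is_number(value):
    """Return True if the string can be read as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def guess_measurement_scale(values, ordered_codes=None):
    """Walk the decision tree for one column of string values.

    ordered_codes is a hypothetical, caller-supplied sortable code list.
    Returns an illustrative label, or None for an empty column.
    """
    values = [v.strip() for v in values if v.strip()]
    if not values:
        return None
    # Step 1: isolate dates/times.
    if all(_is_datetime(v) for v in values):
        return "dateTime"
    # Step 2: numeric or nonNumeric?
    if all(_is_number(v) for v in values):
        # 2.1.1: no negative values observed, so guess ratio.
        if all(float(v) >= 0 for v in values):
            return "ratio"
        # 2.1.2: negative values are present, so guess interval.
        return "interval"
    # 2.2.1: every value comes from a known, sortable list.
    if ordered_codes is not None and set(values) <= set(ordered_codes):
        return "ordinal"
    # 2.2.2: anything else is treated as nominal.
    return "nominal"
```

Note that a Celsius column that happens to contain no negative readings
would come out as "ratio" here, so the data alone cannot always settle the
ratio/interval question.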
_________________________________________________________________
Questions/Issues.
- Ordinal data can be numeric, can't it?
That is: "Has numeric order (that can withstand transformation)."?
So that means that ordinal data must be represented as 'string'
for EML's purposes?
- Do we really know that the following are true?
> > "...interval and ordinal scales are much less common than ratio and
> > nominal scales in ecological data"
> > Someone has pointed out choice of measurement scale on the Stevens
> > typology depends in part on what you want to do with the data.
> > So, for instance, measurement scale can't be predicted from, say,
> > DictionaryUnit
Should we try to associate probable measurementScale choice with each
DictionaryUnit? It seems to me that many of them are pretty obvious.
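
One way to picture such an association is a plain lookup table. The sketch
below is hypothetical: the unit names are placeholders that have not been
checked against the actual unit dictionary, and the scales are just the
examples given in the tree (Kelvin and volumetric flow as ratio, Celsius and
Fahrenheit as interval).

```python
# Illustrative entries only; keys are placeholder unit names, not verified
# against the EML unit dictionary.
PROBABLE_SCALE = {
    "kelvin": "ratio",
    "cubicMetersPerSecond": "ratio",
    "celsius": "interval",
    "fahrenheit": "interval",
}


def probable_scale_for_unit(unit, default=None):
    """Return the probable scale for a unit, or default if it is unknown."""
    return PROBABLE_SCALE.get(unit, default)
```

An unknown unit returns the default, which leaves room for a human to make
the call when the choice depends on how the data will be used.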
I am not trying to cause consternation among the ranks. I want to be
able to auto-generate EML from well-structured data. This is based on
my belief that EML must be [almost] entirely auto-generated for it to
succeed. Anything less will present a barrier-to-adoption which will
be very hard to overcome. If this means that the data model needs to
be enhanced to explicitly accommodate Stevens typology, then fine.
-Scott
--
\SEC