Calendar dates are interval scale

Tim Bergsma tbergsma at kbs.msu.edu
Wed Oct 30 08:02:17 PST 2002


I'm not on IRC, so if you want to hash this out there, call me at
269-671-2337.

We can't rehash forever, but this is a usability issue of the first
order.

There are two problems with yesterday's conference call consensus
regarding datetime:  1) we provide no mechanism for handling durations;
2) calendar dates are interval scale not ordinal scale.

Regarding durations, one might argue that we provide xs:duration in the
kludge of the ordinal measurementScale.  But I looked at the
representation of xs:duration
(http://www.w3.org/TR/xmlschema-2/#duration), and quite frankly, no one
has duration data in that format!  EML has to handle data like this:

Watershed     YearOfClearCut   YearsToReforestation
W3            1887             40
LittleCreek   1910             35
JasperRidge   1950             52

-or-

EggMass       DateOfLaying     DaysTillHatching
DuckPond      5-15-2000        30
LittleCreek   4-31-2000        18
GullLake      6-1-2000         16

Recommendation:  we should provide categories in the unitDictionary such
as nominalYears, nominalDays, nominalMonths, nominalHours, etc. (or
YearsDuration, DaysDuration, etc.) and define them in conventional terms,
explicitly acknowledging lack of precision.  For instance, a
nominalMinute is 60 seconds, +/- 1 second. A nominalYear is 365
nominalDays, +/- 1 day.  xs:gYear is fine for YearOfClearCut, but
xs:duration will not be adequate for YearsToReforestation.
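A minimal sketch of what such unit definitions might look like, in
Python.  The class name NominalUnit, its field names, and the helper
to_base_range are hypothetical, invented here for illustration; nothing
like them exists in the unitDictionary today.  Only the two definitions
given above (nominalMinute and nominalYear) are encoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NominalUnit:
    """Hypothetical record for a conventionally defined duration unit."""
    name: str        # e.g. "nominalYear"
    base_unit: str   # the unit the definition is expressed in
    size: int        # conventional size, in base units
    tolerance: int   # acknowledged imprecision, in base units


# The two definitions proposed in the text above.
NOMINAL_MINUTE = NominalUnit("nominalMinute", "second", 60, 1)
NOMINAL_YEAR = NominalUnit("nominalYear", "nominalDay", 365, 1)


def to_base_range(value, unit):
    """Return (low, high) bounds for `value` units, in the base unit."""
    low = value * (unit.size - unit.tolerance)
    high = value * (unit.size + unit.tolerance)
    return (low, high)


# YearsToReforestation for watershed W3 is 40 nominalYears.
w3_days = to_base_range(40, NOMINAL_YEAR)  # (14560, 14640) nominalDays
```

The point of carrying the tolerance is that an application can state
the imprecision explicitly instead of pretending to an exactness the
data never had.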

Regarding scale: I'm convinced that ordinal scales are simply ranked
categories.  You don't do math on ranked categories, other than to test
for order relations.  But we do lots of math on CalendarDates, such as
taking the difference between two dates, or adding a duration to a
date.  The objection is raised that the durations of the Calendar's
sub-units are not constant.  True, but we do the math all the same, so
it must be an interval scale.  Actually, it is a deeply nested
concatenation of interval scales of varying domain.  But the scale is
completely determined, and even naive calculations are valid, albeit
with qualified precision, while sophisticated calculations are exact.  I
found one webpage that explicitly assigns calendar dates to interval
scale: http://www.rattlesnake.com/notions/guttman-scales.html.
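To illustrate the kind of math meant here, a short Python sketch using
the standard datetime module.  The function names are made up for this
example, and treating a bare year as January 1 of that year is an
assumption made only so the comparison can be computed.

```python
from datetime import date, timedelta


def days_between(start, end):
    """Exact difference between two calendar dates, in days."""
    return (end - start).days


def add_days(start, days):
    """Add a duration in days to a calendar date."""
    return start + timedelta(days=days)


# DuckPond egg mass: laid 5-15-2000, 30 days till hatching.
hatch_date = add_days(date(2000, 5, 15), 30)  # date(2000, 6, 14)

# Naive calculation: 1887 to 1950 is 63 years of 365 days each.
naive_days = (1950 - 1887) * 365  # 22995

# Exact calculation over the same span accounts for leap years.
exact_days = days_between(date(1887, 1, 1), date(1950, 1, 1))  # 23010
```

The naive and exact answers differ by the 15 leap days in the span,
which is the qualified precision referred to above.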

So, modeling DateTime etc. under ordinal is wrong.  But if we provide
DateTime etc. under interval MeasurementScale, what are the units? 
DateTime does have units (year-month-day-hour-min-sec), but they are
concatenated.  The concatenation is a mechanism for traversing the
nested tree of (arbitrary, often non-periodic) interval scales that
comprise the calendar.  I think, as someone suggested yesterday, we will
have to provide a notation for indicating date format, such as
CCYY-MM-DD or MM-DD-YY, etc.  Applications will need the notation as a
key for digesting date strings.  We can't expect EML authors to change
their data to conform to some format. Given the ubiquity of date/time
data, we either have to enumerate some common formats (unit
concatenations) or provide a notation for describing formats.
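As a rough sketch of how an application might use such a notation as a
key, here is one way to do it in Python.  The token set (CCYY, YY, MM,
DD) is taken from the two examples above; the mapping onto strptime
directives and the function names are assumptions of this sketch, not
an agreed EML notation.

```python
import re
from datetime import datetime

# Assumed token set; CCYY is listed before YY so it is matched first.
TOKENS = {"CCYY": "%Y", "YY": "%y", "MM": "%m", "DD": "%d"}
TOKEN_PATTERN = re.compile("CCYY|YY|MM|DD")


def to_strptime(notation):
    """Translate a notation such as 'CCYY-MM-DD' into a strptime format."""
    return TOKEN_PATTERN.sub(lambda m: TOKENS[m.group(0)], notation)


def parse_date(text, notation):
    """Parse a date string using the format notation as the key."""
    return datetime.strptime(text, to_strptime(notation)).date()


# DateOfLaying values in the example table are written MM-DD-CCYY.
laid = parse_date("5-15-2000", "MM-DD-CCYY")  # date(2000, 5, 15)
```

With the author's data left untouched, the same string can be read by
any application that knows the notation.  A value that is not a real
calendar date, such as 4-31-2000, raises ValueError here, so the
notation also gives applications a way to flag bad entries.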

And this just in...Campbell data loggers everywhere are storing dates
as a field pair:  Year and DayOfYear.  This just proves that there are
alternate ways of traversing a nested interval scale.  This is perhaps
our last opportunity to trap DayOfYear and do something meaningful with
it.  It is not a duration.  It has exactly the same properties as
xs:gMonthDay:

"[Definition:]   gMonthDay is a gregorian date that recurs, specifically
a day of the year such as the third of May. Arbitrary recurring dates
are not supported by this datatype. The ·value space· of gMonthDay is
the set of calendar dates, as defined in § 3 of [ISO 8601].
Specifically, it is a set of one-day long, annually periodic instances."
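For what it's worth, a Year and DayOfYear pair can be resolved to a
calendar date mechanically.  The Python sketch below assumes DayOfYear
counts from 1 on January 1; the function name is hypothetical, and I
have not checked the numbering against any particular logger's output.

```python
from datetime import date, timedelta


def from_year_and_day_of_year(year, day_of_year):
    """Resolve a (Year, DayOfYear) pair to a calendar date.

    Assumes day_of_year is 1 on January 1 of the given year.
    """
    days_in_year = (date(year + 1, 1, 1) - date(year, 1, 1)).days
    if not 1 <= day_of_year <= days_in_year:
        raise ValueError("day_of_year out of range for %d" % year)
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)


# Day 136 of 2000 (a leap year) is May 15.
logged = from_year_and_day_of_year(2000, 136)  # date(2000, 5, 15)
```

Note that the Year field is needed to resolve the pair: day 136 is
May 15 in 2000 but May 16 in 2001.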

Solutions welcome.

Tim.
-- 
Tim Bergsma
LTER Information Manager
W.K. Kellogg Biological Station
Michigan State University
Hickory Corners, MI   49060
616/671-2337
tbergsma at kbs.msu.edu
http://lter.kbs.msu.edu


