Calendar dates are interval scale

Matt Jones jones at nceas.ucsb.edu
Thu Oct 31 12:08:45 PST 2002


Peter,

I agree that this is a hack.  I disagree that it is easy to deal with as 
a unit.  The underlying problem is that datetime values are 
fundamentally non-deterministic (because the intervals between divisions 
on the scale change through time and are not fixed for future values, 
e.g., the number of seconds in the year 2005 has not yet been 
determined), and can't really be said to represent a quantifiable aspect 
of some phenomenon, substance or body.  Thus, it doesn't even fit within 
the definition of "unit" as SI, NIST, and STMML are defining them.  And 
so a fixed conversion to an SI unit of time like second is not at all 
possible -- any conversion to seconds must fluidly change its algorithm 
as the calendar changes each year.
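To make the point about non-fixed conversions concrete, here is a minimal Python sketch. It assumes a hand-maintained leap-second table. `LEAP_SECOND_BOUNDARIES` is a hypothetical and deliberately partial excerpt: three known positive leap seconds, inserted at the end of 1998-12-31, 2005-12-31 and 2008-12-31 UTC. A real table must be complete and keeps growing as new leap seconds are announced. Python's `datetime` arithmetic ignores leap seconds, so the correction has to be added separately. The function names are illustrative only.

```python
from datetime import datetime, timezone

# Hypothetical, deliberately partial table: the UTC instant immediately
# after each listed positive leap second (i.e. the following midnight).
# A complete table only grows as new leap seconds are announced, which
# is why no fixed date-to-seconds conversion factor can exist.
LEAP_SECOND_BOUNDARIES = [
    datetime(1999, 1, 1, tzinfo=timezone.utc),  # after 1998-12-31 23:59:60
    datetime(2006, 1, 1, tzinfo=timezone.utc),  # after 2005-12-31 23:59:60
    datetime(2009, 1, 1, tzinfo=timezone.utc),  # after 2008-12-31 23:59:60
]


def naive_seconds(start: datetime, end: datetime) -> int:
    """Seconds between two instants, assuming every day has 86,400 s."""
    return int((end - start).total_seconds())


def elapsed_si_seconds(start: datetime, end: datetime) -> int:
    """Add one second for each table boundary b with start < b <= end."""
    leaps = sum(1 for b in LEAP_SECOND_BOUNDARIES if start < b <= end)
    return naive_seconds(start, end) + leaps


if __name__ == "__main__":
    start = datetime(2005, 1, 1, tzinfo=timezone.utc)
    end = datetime(2006, 1, 1, tzinfo=timezone.utc)
    print(naive_seconds(start, end))       # 31536000
    print(elapsed_si_seconds(start, end))  # 31536001
```

The "same" calendar year can thus contain a different number of SI seconds, and that number is only known once the leap-second table has been updated.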

Our practical problem is that people essentially ignore these
idiosyncrasies and use dates *as if they were deterministic*.  So,
because they use dates as a hack, we too must provide a hack to
accommodate real-world data that doesn't fit the theory of measurement.
I do not think we will uncover any other exceptions that don't
fit in the STMML unit framework -- date-times are *the* exception.  The
use of format strings to describe calendar dates is well established in
other software, and so we've adapted it here.

Comments?

Matt

Peter McCartney wrote:
> grrrrr.... Coming up with a new element every time we find something
> that STMML can't handle is not a very extensible solution. It tells me
> more about the inadequacies of STMML than anything else. It's either
> one of a standard set of unit enumerations or it's not. Just put these
> in the dictionary and extend stmml.xsd to accommodate
> them.
> 
> -----Original Message-----
> From: Matt Jones [mailto:jones at nceas.ucsb.edu]
> Sent: Wednesday, October 30, 2002 1:16 PM
> To: Tim Bergsma
> Cc: Eml-Dev (E-mail)
> Subject: Re: Calendar dates are interval scale
> 
> 
> Tim and other date-time fanatics,
> 
> Thanks for the comments. I was having similar problems myself. I agree
> date-times are probably interval scale in some contorted way. I
> discussed this on IRC with Chad, and we decided to try out your
> recommendation, with some slight changes.  Here's what we ended up with:
> 
> 1) a bunch of new units for expressing durations according to their
> nominal length (e.g., nominalMinute = 60 seconds, nominalHour = 3600
> seconds, nominalDay = 86,400 seconds).  People should use these for
> attributes that contain durations like "18 minutes" or "4.56 days"
> 
> 2) A new way of expressing unit and domain for date-time values.
> The unit is now a choice of standardUnit, customUnit, or
> formattedDateTimeUnit.  The formattedDateTimeUnit takes as content a
> format representation of the date-time value that complies with the ISO
> 8601 format string rules (e.g., YYYY-MM-DD).  This should be sufficient
> information to allow software that understands the Gregorian calendar
> and all of its idiosyncrasies to calculate differences between date-time
> values.  The precision for these values should always be 1 (it's
> effectively implied by the format string).  The domain of interval scale attributes
> can now be of type DateTimeDomainType, which allows one to use date-time
> values in the expression of the domain min and max.
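As a rough illustration of how software might use such a format string, here is a Python sketch. It translates a small, assumed subset of ISO 8601-style tokens (`YYYY`, `MM`, `DD`, `hh`, `mm`, `ss`) into `datetime.strptime` directives and then subtracts two values. The token subset and the helpers `to_strptime` and `difference` are hypothetical. They are not the EML schema's definition of allowed format strings, which should be checked in eml-attribute.xsd.

```python
import re
from datetime import datetime, timedelta

# Hypothetical mapping from a small subset of ISO 8601-style format
# tokens to Python strptime directives; a literal '%' is escaped.
_TOKENS = {"YYYY": "%Y", "MM": "%m", "DD": "%d",
           "hh": "%H", "mm": "%M", "ss": "%S", "%": "%%"}
_TOKEN_RE = re.compile("|".join(map(re.escape, _TOKENS)))


def to_strptime(format_string: str) -> str:
    """Translate e.g. 'YYYY-MM-DD' into '%Y-%m-%d'; other characters pass through."""
    return _TOKEN_RE.sub(lambda m: _TOKENS[m.group(0)], format_string)


def difference(a: str, b: str, format_string: str) -> timedelta:
    """Return b - a, parsing both values with the declared format string."""
    fmt = to_strptime(format_string)
    return datetime.strptime(b, fmt) - datetime.strptime(a, fmt)


if __name__ == "__main__":
    print(to_strptime("YYYY-MM-DD"))                                  # %Y-%m-%d
    print(difference("2002-02-28", "2002-03-01", "YYYY-MM-DD").days)  # 1
    print(difference("2000-02-28", "2000-03-01", "YYYY-MM-DD").days)  # 2
```

In this design the calendar's irregularities, such as leap years, are handled by the date library once the format string says how to read each value. They are not encoded in any fixed unit conversion factor.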
> 
> Take a look at eml-attribute.xsd and let me know what you think.  The
> lib/sample/eml-sample.xml has an example of the use of these structures
> that I think would be common in datasets.
> 
> We're still cleaning up loose ends, but we're close.
> 
> Matt
> 
> Tim Bergsma wrote:
>  > I'm not on IRC, so if you want to hash this there, call me at
>  > 269-671-2337.
>  >
>  > We can't rehash forever, but this is a usability issue of the first
>  > order.
>  >
>  > There are two problems with yesterday's conference call consensus
>  > regarding datetime:  1) we provide no mechanism for handling durations;
>  > 2) calendar dates are interval scale not ordinal scale.
>  >
>  > Regarding durations, one might argue that we provide xs:duration in the
>  > kludge of the ordinal measurementScale.  But I looked at the
>  > representation of xs:duration
>  > (http://www.w3.org/TR/xmlschema-2/#duration), and quite frankly, no one
>  > has duration data in that format!  EML has to handle data like this:
>  >
>  > Watershed     YearOfClearCut  YearsToReforestation
>  > W3            1887            40
>  > LittleCreek   1910            35
>  > JasperRidge   1950            52
>  >
>  > -or-
>  >
>  > EggMass       DateOfLaying    DaysTillHatching
>  > DuckPond      5-15-2000       30
>  > LittleCreek   4-31-2000       18
>  > GullLake      6-1-2000        16
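Given a format key for DateOfLaying, software can do the date arithmetic that tables like these call for. The following is a minimal Python sketch. It assumes the dates are month-day-year strings and that DaysTillHatching counts whole days. The names `ROWS`, `DATE_FORMAT` and `hatch_date` are hypothetical. Any string that is not a valid calendar date in that format yields `None` rather than a date.

```python
from datetime import date, datetime, timedelta
from typing import Optional

# Rows from the egg-mass table: (EggMass, DateOfLaying, DaysTillHatching).
ROWS = [
    ("DuckPond", "5-15-2000", 30),
    ("LittleCreek", "4-31-2000", 18),
    ("GullLake", "6-1-2000", 16),
]

# Assumed layout of DateOfLaying: month-day-year. strptime accepts
# single-digit months and days without leading zeros.
DATE_FORMAT = "%m-%d-%Y"


def hatch_date(laid: str, days: int) -> Optional[date]:
    """Add a whole number of days to a date string; None if it is not a valid date."""
    try:
        start = datetime.strptime(laid, DATE_FORMAT).date()
    except ValueError:
        return None
    return start + timedelta(days=days)


if __name__ == "__main__":
    for name, laid, days in ROWS:
        print(name, hatch_date(laid, days))
```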
>  >
>  > Recommendation:  we should provide categories in the unitDictionary such
>  > as nominalYears, nominalDays, nominalMonths, nominalHours, etc. (or
>  > YearsDuration, DaysDuration, etc) and define them in conventional terms,
>  > explicitly acknowledging lack of precision.  For instance, a
>  > nominalMinute is 60 seconds, +/- 1 second. A nominalYear is 365
>  > nominalDays, +/- 1 day.  xs:gYear is fine for YearOfClearCut, but
>  > xs:duration will not be adequate for YearsToReforestation.
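Because nominal units are fixed conventions, converting a duration such as YearsToReforestation or DaysTillHatching to SI seconds is a single multiplication. The Python sketch below uses the nominal lengths discussed in this thread (60, 3,600 and 86,400 seconds, and 365 nominalDays per nominalYear). The `NOMINAL_SECONDS` table and the `to_seconds` helper are hypothetical and are not the EML unit dictionary itself.

```python
# Hypothetical lookup table of nominal duration units, in SI seconds.
# Each value is a fixed convention, not the length of any particular
# calendar span (hence the "+/-" precision caveat above).
NOMINAL_SECONDS = {
    "nominalMinute": 60,
    "nominalHour": 3_600,
    "nominalDay": 86_400,
    "nominalYear": 365 * 86_400,  # 365 nominalDays
}


def to_seconds(value: float, unit: str) -> float:
    """Convert a duration given in a nominal unit to seconds."""
    if unit not in NOMINAL_SECONDS:
        raise ValueError(f"unknown nominal unit: {unit!r}")
    return value * NOMINAL_SECONDS[unit]


if __name__ == "__main__":
    print(to_seconds(18, "nominalMinute"))  # 1080
    print(to_seconds(40, "nominalYear"))    # 1261440000
```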
>  >
>  > Regarding scale: I'm convinced that ordinal scales are simply ranked
>  > categories.  You don't do math on ranked categories, other than to test
>  > for order relations.  But we do lots of math on CalendarDates, such as
>  > taking the difference between two dates, or adding a duration to a
>  > date.  The objection is raised that the durations of sub-units of the
>  > calendar are not constant.  True, but we do the math all the same, so
>  > it must be an interval scale.  Actually, it is a deeply nested
>  > concatenation of interval scales of varying domain.  But the scale is
>  > completely determined, and even naive calculations are valid, albeit
>  > with qualified precision, while sophisticated calculations are exact.  I
>  > found one webpage that explicitly assigns calendar dates to interval
>  > scale: http://www.rattlesnake.com/notions/guttman-scales.html.
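The claim that naive calculations are valid with qualified precision while sophisticated ones are exact can be checked with a short Python sketch. It assumes the naive method treats every year as one nominalYear of 365 days, as recommended above, and compares that with exact Gregorian arithmetic from the standard `datetime` module. The names `naive_days` and `exact_days` are hypothetical.

```python
from datetime import date

NOMINAL_YEAR_DAYS = 365  # one nominalYear = 365 nominalDays (+/- 1 day)


def naive_days(start_year: int, end_year: int) -> int:
    """Days between 1 January of two years, assuming nominal 365-day years."""
    return (end_year - start_year) * NOMINAL_YEAR_DAYS


def exact_days(start_year: int, end_year: int) -> int:
    """Days between 1 January of two years on the Gregorian calendar."""
    return (date(end_year, 1, 1) - date(start_year, 1, 1)).days


if __name__ == "__main__":
    print(naive_days(2000, 2004), exact_days(2000, 2004))  # 1460 1461
```

For any single year, the two results differ by at most one day. Over longer spans the naive error accumulates, roughly one day per leap year crossed.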
>  >
>  > So, modeling DateTime etc. under ordinal is wrong.  But if we provide
>  > DateTime etc. under interval MeasurementScale, what are the units?
>  > DateTime does have units (year-month-day-hour-min-sec), but they are
>  > concatenated.  The concatenation is a mechanism for traversing the
>  > nested tree of (arbitrary, often non-periodic) interval scales that
>  > comprise the calendar.  I think, as someone suggested yesterday, we will
>  > have to provide a notation for indicating date format, such as
>  > CCYY-MM-DD or MM-DD-YY, etc.  Applications will need the notation as a
>  > key for digesting date strings.  We can't expect EML authors to change
>  > their data to conform to some format. Given the ubiquity of date/time
>  > data, we either have to enumerate some common formats (unit
>  > concatenations) or provide a notation for describing formats.
>  >
>  > And this just in... Campbell data loggers everywhere are storing dates
>  > as a field pair:  Year and DayOfYear.  This just proves that there are
>  > alternate ways of traversing a nested interval scale.  This is perhaps
>  > our last opportunity to trap DayOfYear and do something meaningful with
>  > it.  It is not a duration.  It has exactly the same properties as
>  > xs:gMonthDay:
>  >
>  > "[Definition:]   gMonthDay is a gregorian date that recurs, specifically
>  > a day of the year such as the third of May. Arbitrary recurring dates
>  > are not supported by this datatype. The ·value space· of gMonthDay is
>  > the set of calendar dates, as defined in § 3 of [ISO 8601].
>  > Specifically, it is a set of one-day long, annually periodic instances."
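A Year/DayOfYear field pair is another path through the nested calendar scales, and it can be mapped back to an ordinary calendar date. This is a minimal Python sketch. It assumes DayOfYear counts from 1 on 1 January, the same convention as strptime's `%j` directive. The function name `from_year_day` is hypothetical.

```python
import calendar
from datetime import date, timedelta


def from_year_day(year: int, day_of_year: int) -> date:
    """Map a (Year, DayOfYear) field pair to a Gregorian calendar date.

    Assumes day_of_year is 1-based (1 = 1 January) and rejects values
    outside the length of the given year (365 or 366 days).
    """
    length = 366 if calendar.isleap(year) else 365
    if not 1 <= day_of_year <= length:
        raise ValueError(f"day {day_of_year} is out of range for {year}")
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)


if __name__ == "__main__":
    print(from_year_day(2000, 60))  # 2000-02-29 (leap year)
    print(from_year_day(2001, 60))  # 2001-03-01
```

The same DayOfYear value lands on different month-day pairs depending on the year. This is why the pair needs the calendar rules to be interpreted, just like a formatted date string.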
>  >
>  > Solutions welcome.
>  >
>  > Tim.
> 
> 
> -- 
> *******************************************************************
> Matt Jones                                    jones at nceas.ucsb.edu
> http://www.nceas.ucsb.edu/    Fax: 425-920-2439   Ph: 907-789-0496
> National Center for Ecological Analysis and Synthesis (NCEAS)
> 
> Interested in ecological informatics? http://www.ecoinformatics.org
> *******************************************************************
> 
> _______________________________________________
> eml-dev mailing list
> eml-dev at ecoinformatics.org
> http://www.ecoinformatics.org/mailman/listinfo/eml-dev
> 

