Calendar dates are interval scale

Peter McCartney peter.mccartney at asu.edu
Thu Oct 31 07:33:50 PST 2002


grrrrr....this is not a very extensible solution, to come up with a new
element every time we come up with something different that stmml can't
handle. It tells me more about the inadequacies of stmml than anything else.
It's either one of a standard set of unit enumerations or it's not. Just put
these in the dictionary and extend stmml.xsd to accommodate them.

-----Original Message-----
From: Matt Jones [mailto:jones at nceas.ucsb.edu]
Sent: Wednesday, October 30, 2002 1:16 PM
To: Tim Bergsma
Cc: Eml-Dev (E-mail)
Subject: Re: Calendar dates are interval scale


Tim and other date-time fanatics,

Thanks for the comments. I was having similar problems myself. I agree 
date-times are probably interval scale in some contorted way. I 
discussed this on IRC with Chad, and we decided to try out your 
recommendation, with some slight changes.  Here's what we ended up with:

1) a bunch of new units for expressing durations according to their 
nominal length (e.g., nominalMinute = 60 seconds, nominalHour = 3600 
seconds, nominalDay = 86,400 seconds).  People should use these for 
attributes that contain durations like "18 minutes" or "4.56 days"

2) A new way of expressing unit and domain for date-time values. 
Basically, now unit is a choice of standardUnit, customUnit, and 
formattedDateTimeUnit.  The formattedDateTimeUnit takes as content a 
format representation of the date-time value that complies with the ISO 
8601 format string rules (e.g., YYYY-MM-DD).  This should be sufficient 
information to allow software that understands the Gregorian calendar 
and all of its idiosyncrasies to calculate differences between date-time 
values.  The precision for these values should always be 1 (it's sort of 
implied by the format string).  The domain of interval scale attributes 
can now be of type DateTimeDomainType, which allows one to use date-time 
values in the expression of the domain min and max.
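
As a rough illustration of how software might use such a format string, 
here is a minimal Python sketch.  It is not part of EML: the token table 
and the function names (to_strptime, days_between) are hypothetical, and 
only a handful of tokens are covered.  It translates a format string such 
as YYYY-MM-DD into a strptime pattern and lets the standard library handle 
the calendar arithmetic.

```python
import re
from datetime import datetime

# Hypothetical token table: format-string tokens mapped to strptime
# directives. Only a few common tokens are handled in this sketch.
TOKENS = {
    "YYYY": "%Y", "CCYY": "%Y", "YY": "%y",
    "MM": "%m", "DD": "%d",
    "hh": "%H", "mm": "%M", "ss": "%S",
}

# Try longer tokens first so "YYYY" is not read as two "YY" tokens.
_TOKEN_RE = re.compile("|".join(sorted(TOKENS, key=len, reverse=True)))


def to_strptime(format_string):
    """Translate a format string like 'YYYY-MM-DD' into a strptime pattern."""
    return _TOKEN_RE.sub(lambda m: TOKENS[m.group(0)], format_string)


def days_between(first, second, format_string):
    """Whole days from `first` to `second`, both written in `format_string`."""
    pattern = to_strptime(format_string)
    delta = datetime.strptime(second, pattern) - datetime.strptime(first, pattern)
    return delta.days


# The leap day in 2000 is counted by the calendar-aware arithmetic.
print(days_between("2000-02-28", "2000-03-01", "YYYY-MM-DD"))
```

Separators and any other characters in the format string are passed 
through as literals, which is enough for simple cases like the ones above.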

Take a look at eml-attribute.xsd and let me know what you think.  The 
lib/sample/eml-sample.xml has an example of the use of these structures 
that I think would be common in datasets.

We're still cleaning up loose ends, but we're close.

Matt

Tim Bergsma wrote:
> I'm not on IRC, so if you want to hash this there, call me at
> 269-671-2337.
> 
> We can't rehash forever, but this is a usability issue of the first
> order.
> 
> There are two problems with yesterday's conference call consensus
> regarding datetime:  1) we provide no mechanism for handling durations;
> 2) calendar dates are interval scale not ordinal scale.
> 
> Regarding durations, one might argue that we provide xs:duration in the
> kludge of the ordinal measurementScale.  But I looked at the
> representation of xs:duration
> (http://www.w3.org/TR/xmlschema-2/#duration), and quite frankly, no one
> has duration data in that format!  EML has to handle data like this:
> 
> Watershed    YearOfClearCut  YearsToReforestation
> W3           1887            40
> LittleCreek  1910            35
> JasperRidge  1950            52
> 
> -or-
> 
> EggMass      DateOfLaying  DaysTillHatching
> DuckPond     5-15-2000     30
> LittleCreek  4-31-2000     18
> GullLake     6-1-2000      16
> 
> Recommendation:  we should provide categories in the unitDictionary such
> as nominalYears, nominalDays, nominalMonths, nominalHours, etc. (or
> YearsDuration, DaysDuration, etc) and define them in conventional terms,
> explicitly acknowledging lack of precision.  For instance, a
> nominalMinute is 60 seconds, +/- 1 second. A nominalYear is 365
> nominalDays, +/- 1 day.  xs:gYear is fine for YearOfClearCut, but
> xs:duration will not be adequate for YearsToReforestation.
> 
> Regarding scale: I'm convinced that ordinal scales are simply ranked
> categories.  You don't do math on ranked categories, other than to test
> for order relations.  But we do lots of math on CalendarDates, such as
> taking the difference between two dates, or adding a duration to a
> date.  The objection is raised that the duration of sub-units of the
> Calendar are not constant.  True, but we do the math all the same, so
> it must be an interval scale.  Actually, it is a deeply nested
> concatenation of interval scales of varying domain.  But the scale is
> completely determined, and even naive calculations are valid, albeit
> with qualified precision, while sophisticated calculations are exact.  I
> found one webpage that explicitly assigns calendar dates to interval
> scale: http://www.rattlesnake.com/notions/guttman-scales.html.
> 
> So, modeling DateTime etc. under ordinal is wrong.  But if we provide
> DateTime etc. under interval MeasurementScale, what are the units? 
> DateTime does have units (year-month-day-hour-min-sec), but they are
> concatenated.  The concatenation is a mechanism for traversing the
> nested tree of (arbitrary, often non-periodic) interval scales that
> comprise the calendar.  I think, as someone suggested yesterday, we will
> have to provide a notation for indicating date format, such as
> CCYY-MM-DD or MM-DD-YY, etc.  Applications will need the notation as a
> key for digesting date strings.  We can't expect EML authors to change
> their data to conform to some format. Given the ubiquity of date/time
> data, we either have to enumerate some common formats (unit
> concatenations) or provide a notation for describing formats.
> 
> And this just in...Campbell data loggers everywhere are storing dates as
> a field pair:  Year and DayOfYear.  This just proves that there are
> alternate ways of traversing a nested interval scale.  This is perhaps
> our last opportunity to trap DayOfYear and do something meaningful with
> it.  It is not a duration.  It has exactly the same properties as
> xs:gMonthDay:
> 
> "[Definition:]   gMonthDay is a gregorian date that recurs, specifically
> a day of the year such as the third of May. Arbitrary recurring dates
> are not supported by this datatype. The ·value space· of gMonthDay is
> the set of calendar dates, as defined in § 3 of [ISO 8601].
> Specifically, it is a set of one-day long, annually periodic instances."
> 
> Solutions welcome.
> 
> Tim.
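
To make the Year/DayOfYear point concrete, here is a small Python sketch 
under stated assumptions.  The function names are hypothetical, and a 
nominal day is taken to be one calendar day (86,400 seconds, as defined 
above).  It shows that a (Year, DayOfYear) pair resolves to an ordinary 
calendar date, and that a count of nominal days can be added to a date.

```python
from datetime import date, timedelta


def from_year_and_day_of_year(year, day_of_year):
    """Resolve a (Year, DayOfYear) pair to a calendar date.

    DayOfYear is 1-based: day 1 is January 1.
    """
    result = date(year, 1, 1) + timedelta(days=day_of_year - 1)
    # Reject values that fall outside the given year (e.g. day 366 in 2001).
    if day_of_year < 1 or result.year != year:
        raise ValueError(f"day {day_of_year} does not exist in {year}")
    return result


def add_nominal_days(start, days):
    """Add a whole number of nominal days to a calendar date."""
    return start + timedelta(days=days)


# The same DayOfYear lands on different dates in leap and non-leap years.
print(from_year_and_day_of_year(2000, 136))
print(from_year_and_day_of_year(2001, 136))

# DuckPond row from the example table: laid 5-15-2000, 30 days to hatching.
print(add_nominal_days(date(2000, 5, 15), 30))
```

The same DayOfYear maps to different month/day values depending on the 
year, which is why it cannot be interpreted without its Year field.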


-- 
*******************************************************************
Matt Jones                                    jones at nceas.ucsb.edu
http://www.nceas.ucsb.edu/    Fax: 425-920-2439   Ph: 907-789-0496
National Center for Ecological Analysis and Synthesis (NCEAS)

Interested in ecological informatics? http://www.ecoinformatics.org
*******************************************************************
