STMML units and date/time values
Matt Jones
jones at nceas.ucsb.edu
Thu Oct 24 10:25:50 PDT 2002
Hi Scott,
I've been thinking about this date-time stuff because it really isn't
very obvious or consistent as we have it now in EML/STMML. Thinking
about it has clarified a few things about "dataType" for me, and muddied
the waters elsewhere. My previous email was only partly exposing my
thoughts on the matter as I was trying to get info from the STMML folks
about how they handle it. In my mind, they don't yet deal with it
properly. So, here's a bit of rambling to set the stage for a
discussion of dates and times in EML. You'll find I disagree with one of
the points that you made in your email, namely that datetimes don't have
units.
First, we need to think carefully about what a unit is in relationship
to the quantity being measured. From the NIST SI Units page
(http://physics.nist.gov/cuu/Units/introduction.html), we have three
definitions:
1) "A *quantity in the particular sense* is a quantifiable or assignable
property ascribed to a particular phenomenon, body, or substance.
Examples are the mass of the moon and the electric charge of the proton."
2) "A *unit* is a particular physical quantity, defined and adopted by
convention, with which other particular quantities of the same kind are
compared to express their value."
3) "The *value of a physical quantity* is the quantitative expression of
a particular physical quantity as the product of a number and a unit,
the number being its numerical value. Thus, the numerical value of a
particular physical quantity depends on the unit in which it is expressed."
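Definition (3) can be sketched in a few lines of Python. This is a minimal illustration, not an STMML or EML construct: the `Quantity` class and the `UNIT_IN_SECONDS` table are hypothetical names, and the table includes only time units of fixed length. The quantity stays the same; only its numerical value changes with the unit.

```python
from dataclasses import dataclass

# Hypothetical table of fixed-length time units, expressed in SI seconds.
# Gregorian months and years are deliberately absent: their length varies.
UNIT_IN_SECONDS = {
    "s": 1.0,
    "min": 60.0,
    "h": 3600.0,
    "d": 86400.0,  # day taken as exactly 86400 s (leap seconds ignored)
    "Gs": 1e9,     # gigasecond: SI prefix giga = 10^9
}


@dataclass(frozen=True)
class Quantity:
    """Value of a physical quantity: a numerical value times a unit."""
    number: float
    unit: str

    def to(self, unit: str) -> "Quantity":
        # Convert through SI seconds; the quantity itself is unchanged,
        # only its numerical value depends on the unit chosen.
        seconds = self.number * UNIT_IN_SECONDS[self.unit]
        return Quantity(seconds / UNIT_IN_SECONDS[unit], unit)


day = Quantity(1, "d")
print(day.to("s"))    # Quantity(number=86400.0, unit='s')
print(day.to("min"))  # Quantity(number=1440.0, unit='min')
```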
Applying these rules to dates, one might talk about a "datetime"
quantity such as "the amount of time elapsed since the beginning of the
Gregorian Calendar" (or since 1960 if you're a SAS user :) which might
be an attribute of a phenomenon such as a measurement event (e.g., when
I shoved a probe in the ground and pressed the "go" button). Such a
quantity might have a value measured in seconds.
Now, if we have a datetime value as defined above that represents a
phenomenon that happened yesterday, that's a lot of seconds. So, it
would be convenient to use a prefix on seconds, such as, in the SI,
gigaseconds (Gs), which is 10^9 seconds. There have been about 63.1 Gs
since the year of our Lord.
Of course, we don't usually express time in such even quantities as that,
and instead use things like minutes, hours, days, months, years,
centuries, millennia, and eons! To complicate things, these traditional
time periods are not constant through time (e.g., this month contains a
different number of seconds than last month). Thus, we have a
complicated thing we call a calendar system that tells us, for any
given period, how many seconds were or will be contained in that period.
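The variable length of calendar periods is easy to demonstrate with Python's standard `calendar` module. A small sketch, assuming the proleptic Gregorian calendar and ignoring leap seconds; `seconds_in_month` is a hypothetical helper name:

```python
import calendar


def seconds_in_month(year: int, month: int) -> int:
    """Length of a Gregorian calendar month in SI seconds (leap seconds ignored)."""
    # calendar.monthrange returns (weekday of first day, number of days in month)
    _, days = calendar.monthrange(year, month)
    return days * 86400


for month in (2, 9, 10):
    print(2002, month, seconds_in_month(2002, month))
# 2002 2 2419200
# 2002 9 2592000
# 2002 10 2678400
print(seconds_in_month(2000, 2))  # 2505600: February of a leap year
```

The same label ("one month") maps to three different numbers of seconds here, which is exactly why a calendar is needed to convert between the two.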
So, theoretically at least, Gregorian years and gigaseconds are
interconvertible by using a calendar. As a datetime value is an elapsed
time value, it too is expressible in gigaseconds. So, it seems to me
that datetime values do, can, and should have units to go along with
them, because they are quantities. It seems to me that whenever we have
a numeric domain for a value (i.e., numerical calculations are
legitimate), we should also have a unit for that value. The unit would
pertain to the value itself, to the domain of the value, and to its
precision. All three depend on the definition of "unit".
If expressed in gigaseconds, the datetime value "2002" is approximately
63.1 Gs. Higher levels of precision could be used, such as 63.135072
Gs (counting 2002 years of 365 days each). In Gregorian calendar units,
a higher level of precision might be expressed as "2002-01-01 00:00:00".
This traditional format for a datetime value is still the value of a
quantity, even though it is expressed in a unit that requires a detailed
understanding of the length of every period between the value and
Gregorian calendar value "0". The calendar *is* the conversion formula
between Gregorian units and SI seconds.
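To make "the calendar is the conversion formula" concrete, here is a sketch that lets Python's `datetime` (which implements the proleptic Gregorian calendar, has no year 0, and ignores leap seconds) do the conversion to gigaseconds. The reference point 0001-01-01 and the function name `gregorian_to_gigaseconds` are assumptions for illustration. The naive figure treats every one of 2002 years as 365 days; the calendar-based figure counts 2001 complete years including their leap days.

```python
from datetime import datetime

# Assumed reference point: 0001-01-01 00:00:00, proleptic Gregorian calendar.
REFERENCE = datetime(1, 1, 1)


def gregorian_to_gigaseconds(value: datetime) -> float:
    """Convert a Gregorian datetime to elapsed gigaseconds since REFERENCE."""
    elapsed = value - REFERENCE            # the calendar supplies period lengths
    return elapsed.total_seconds() / 1e9   # 1 Gs = 10^9 s


calendar_gs = gregorian_to_gigaseconds(datetime(2002, 1, 1))
naive_gs = 2002 * 365 * 86400 / 1e9        # every year treated as 365 days
print(f"{calendar_gs:.6f} Gs")  # 63.145440 Gs
print(f"{naive_gs:.6f} Gs")     # 63.135072 Gs
```

Both round to about 63.1 Gs; the digits beyond that depend entirely on which calendar rules are applied.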
Now, let's turn the corner and start looking at things from a different
perspective. The STMML folks say that the datetime value "2002-01-01
01:13:45" has a datatype of "xs:dateTime", but doesn't have any units.
So, what is a dataType? This gets back to our old discussion. To
me, a datatype is a shorthand mechanism for expressing the domain of a
quantity. For example, if a quantity could contain integer values
between 0 and 65535, in some languages you could label the domain of
that quantity as a 16-bit "unsigned int" and people would know what you
mean. Thus, it is a useful shorthand. The "dataType" that is typically
assigned to values in many processing systems often does not correspond
to the actual domain of the quantity, as it is usually larger than the
domain so that it can contain all of the domain's values. For example,
an attribute might have a domain of integer values where 0 <= value <=
1000, but its "dataType" for the C language might be assigned as "int"
so that all possible values from the domain can be represented, even
though some values outside of the domain (e.g., 1003) can also be
represented by that dataType. This is because we have a small set of
common dataTypes for a given language or system for pragmatic and not
theoretical reasons. Some database theoreticians (e.g., C.J. Date) have
argued strongly that existing database systems are messed up because
they don't allow you to model the domain precisely, and instead just
give you a limited set of "data types" as surrogates for the domain.
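The gap between a precise domain and its surrogate dataType can be sketched as two separate checks. This is a hypothetical model, assuming a 32-bit C `int`; `IntegerDomain` and `fits_int32` are illustrative names, not part of EML or STMML.

```python
from dataclasses import dataclass

# Assumption: C "int" modeled as a 32-bit two's-complement integer.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


@dataclass(frozen=True)
class IntegerDomain:
    """Precise integer domain with inclusive bounds: lower <= value <= upper."""
    lower: int
    upper: int

    def contains(self, value: int) -> bool:
        return self.lower <= value <= self.upper


def fits_int32(value: int) -> bool:
    """True if the value is representable by the surrogate dataType."""
    return INT32_MIN <= value <= INT32_MAX


domain = IntegerDomain(0, 1000)
for v in (500, 1003):
    print(v, "dataType ok:", fits_int32(v), "domain ok:", domain.contains(v))
# 500 dataType ok: True domain ok: True
# 1003 dataType ok: True domain ok: False
```

The dataType accepts 1003 while the domain rejects it, which is the imprecision described above.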
So, if we say that the "dataType" of a value is "xs:gYear", what does
this say about how we should interpret it as a quantity (i.e., what's its
implied unit), and what does this say about its implied domain? I think
it says nothing precise about the domain, and everything about the unit
(i.e., how to convert the value to seconds since 1960, for example). (In
contrast, most other dataTypes (e.g., int) say nothing precise about the
domain, and nothing about the unit.)
Finally, about formats and notations. The physical quantity "1800 km"
can equally well be written as 1.8x10^3 km in scientific notation.
Thus, to interpret the "numerical value" part of the value, one must know
how to parse and interpret scientific notation. The datetime value
"2002-10-10 17:45:32 gregorians" :-) is the same thing in that you need
to know how to parse the datetime value to extract its components in
order to look up the lengths of the periods involved in order to find
the number of seconds it represents. Scientific notation is of course
simpler, but it is nevertheless the same concept.
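The parallel between the two notations can be sketched with Python built-ins: `float` parses scientific notation, and `datetime.strptime` parses a datetime given an explicit format string. The format string `"%Y-%m-%d %H:%M:%S"` is an assumption about how the value is written, playing the same role as knowing that "1.8e3" is scientific notation.

```python
from datetime import datetime

# Scientific notation: parsing recovers the numerical value; the unit (km)
# is carried separately.
distance_km = float("1.8e3")

# Datetime notation: parsing recovers the calendar components, which a
# calendar must then convert to elapsed seconds.
FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed layout of the datetime string
parsed = datetime.strptime("2002-10-10 17:45:32", FORMAT)
components = (parsed.year, parsed.month, parsed.day,
              parsed.hour, parsed.minute, parsed.second)

print(distance_km)  # 1800.0
print(components)   # (2002, 10, 10, 17, 45, 32)
```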
OK, to summarize... Datetime values are values of a physical quantity.
According to SI, these need to be expressed with both their numerical
value and their unit (see definition (3) above). People generally don't think
of datetime values this way, mainly because they often don't use them in
a quantitative manner like we as scientists expect to. If a scientist
uses an event recorder to timestamp events (such as datetime values
every time a cell divides), it would be nice to know what units are used
so that mathematical calculations and comparisons can be made with those
values.
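As a sketch of the event-recorder case, the timestamps below are hypothetical values written in Gregorian notation with no explicit unit. Assuming they share one time zone and span no daylight-saving transition, converting them through the calendar yields differences in SI seconds, on which ordinary arithmetic and comparisons are meaningful.

```python
from datetime import datetime

# Hypothetical event-recorder timestamps (e.g., cell divisions).
timestamps = [
    "2002-10-10 17:34:45",
    "2002-10-10 18:02:15",
    "2002-10-11 01:10:00",
]

FORMAT = "%Y-%m-%d %H:%M:%S"
events = [datetime.strptime(t, FORMAT) for t in timestamps]

# Differences between consecutive events, expressed in SI seconds.
intervals_s = [(b - a).total_seconds() for a, b in zip(events, events[1:])]
print(intervals_s)  # [1650.0, 25665.0]
```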
But the question still remains: what is the unit? And how does it
relate to SI? The major complicating factor is that the Gregorian
calendar actually relies on many different units, one for each time
period that varies in length. As there are many thousands of time
periods in the Gregorian calendar that vary in length, the conversion of
a compound unit that uses many of these to an SI value is complex. For
example, the datetime value "2002-10-10" could be re-expressed as "10
GregorianOctober2002Days" + "10 Gregorian2002Months" + "2
Gregorian3rdMillenniumYears" + "2 GregorianMillennia". I probably
haven't even been fine enough in my divisions to accurately represent
the variability inherent in Gregorian time periods, but you get the idea,
right? It certainly isn't a simple constant or linear relationship with SI.
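The compound-unit conversion can be sketched by summing the lengths of each complete year and month, using the Gregorian leap-year rule. This is a hand-rolled illustration assuming the proleptic Gregorian calendar starting at 0001-01-01 and ignoring leap seconds; `is_leap` and `days_since_0001` are hypothetical helpers, cross-checked against Python's `date.toordinal`.

```python
from datetime import date


def is_leap(year: int) -> bool:
    """Gregorian leap-year rule."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def days_since_0001(year: int, month: int, day: int) -> int:
    """Days elapsed from 0001-01-01 to the given proleptic Gregorian date.

    Each complete year and month is a period of its own length, so the
    conversion is a sum of variable-length periods, not a constant factor.
    """
    days = sum(366 if is_leap(y) else 365 for y in range(1, year))
    for m in range(1, month):
        days += DAYS_IN_MONTH[m - 1] + (1 if m == 2 and is_leap(year) else 0)
    return days + (day - 1)


d = days_since_0001(2002, 10, 10)
print(d, d * 86400)  # elapsed days and SI seconds (leap seconds ignored)
# Cross-check against Python's built-in proleptic Gregorian calendar.
assert d == date(2002, 10, 10).toordinal() - 1
```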
This leads us, finally, to your comment about SAS datetime values. SAS
"knows" the Gregorian calendar. When any datetime value is passed to
SAS, you can tell SAS what format it is in (akin to describing
scientific notation), and it will be able to parse that value, convert
it to the number of seconds the value represents using its internal
calendar, subtract the seconds before 1960 (to allow for higher
precision), and store the value in seconds. The fact that SAS stores
datetime values in seconds highlights even more strongly that 1)
datetime values are quantities, and 2) those quantities have units (in
SAS, they use SI seconds as the unit).
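The SAS convention (datetime values stored as seconds since midnight, January 1, 1960) can be mimicked in Python as a sketch. This is not SAS code: `to_sas_seconds` and `from_sas_seconds` are hypothetical names, the format string is an assumption, and time zones and leap seconds are ignored.

```python
from datetime import datetime, timedelta

# Reference point used by SAS datetime values: 1960-01-01 00:00:00.
SAS_EPOCH = datetime(1960, 1, 1)
FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed input/output notation


def to_sas_seconds(text: str, fmt: str = FORMAT) -> float:
    """Parse a formatted datetime and return elapsed seconds since 1960-01-01."""
    return (datetime.strptime(text, fmt) - SAS_EPOCH).total_seconds()


def from_sas_seconds(seconds: float, fmt: str = FORMAT) -> str:
    """Render stored seconds back into Gregorian notation."""
    return (SAS_EPOCH + timedelta(seconds=seconds)).strftime(fmt)


stored = to_sas_seconds("2002-10-10 17:34:45")
print(stored)                    # 1349890485.0
print(from_sas_seconds(stored))  # 2002-10-10 17:34:45
```

Storage is a plain number in seconds; the Gregorian notation only reappears at presentation time, through the format.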
So, as I said, my thinking has led me down a difficult path. It's
clarified some things (datetimes are quantities and so should have
units) and muddied others (what precisely should the datetime unit(s) be
for Gregorian dates?). The problem seems to rest with the variable
length of Gregorian calendar periods. What we should probably do is
recommend that everyone use SI seconds from a particular reference point
(say, 1960) for storing their datetime values, but that is somewhat
impractical. The alternative is to store the datetime values in
Gregorian units, but then STMML is not up to the task of describing the
conversion formula to SI seconds for a "gregorian".
Food for thought. Chew well. I'd love to hear alternative views.
Matt
Scott Chapal wrote:
> Matt Jones <jones at nceas.ucsb.edu> writes:
>
>
>>OK, so, the question is:
>> What is the unit for a scalar date-time value like "2002-10-10
>>17:34:45"? How can this unit be defined in STMML? How is this unit
>>related to the fundamental SI unit of second?
>
>
> Considering the example of using SAS for date processing, the numeric
> representation for datetime is seconds (elapsed since Jan. 1, 1960),
> but the presentation (expression) of that value is merely a
> user-specified format. It is in the 'format' that the complexity of
> the calendar is encoded.
>
> I agree with Peter Murray-Rust:
>
>>I don't think it has units - it has a dataType of xsd:dateTime
>
>
> There are no units for "2002-10-10 17:34:45"; date-time
> doesn't have units. Elapsed time units are derived from seconds.
>
> -Scott
>
--
*******************************************************************
Matt Jones jones at nceas.ucsb.edu
http://www.nceas.ucsb.edu/ Fax: 425-920-2439 Ph: 907-789-0496
National Center for Ecological Analysis and Synthesis (NCEAS)
Interested in ecological informatics? http://www.ecoinformatics.org
*******************************************************************
More information about the Eml-dev mailing list