STMML units and date/time values

Matt Jones jones at nceas.ucsb.edu
Thu Oct 24 10:25:50 PDT 2002


Hi Scott,

I've been thinking about this date-time stuff because it really isn't 
very obvious or consistent as we have it now in EML/STMML.  Thinking 
about it has clarified a few things about "dataType" for me, and muddied 
the waters elsewhere.  My previous email was only partly exposing my 
thoughts on the matter as I was trying to get info from the STMML folks 
about how they handle it.  In my mind, they don't yet deal with it 
properly.  So, here's a bit of rambling to set the stage for a 
discussion of dates and times in EML.  You'll find I disagree with one of 
the points that you made in your email, namely that datetimes don't have 
units.

First, we need to think carefully about what a unit is in relationship 
to the quantity being measured.  From the NIST SI Units page 
(http://physics.nist.gov/cuu/Units/introduction.html), we have three 
definitions:

1) "A *quantity in the particular sense* is a quantifiable or assignable 
property ascribed to a particular phenomenon, body, or substance. 
Examples are the mass of the moon and the electric charge of the proton."

2) "A *unit* is a particular physical quantity, defined and adopted by 
convention, with which other particular quantities of the same kind are 
compared to express their value."

3) "The *value of a physical quantity* is the quantitative expression of 
a particular physical quantity as the product of a number and a unit, 
the number being its numerical value. Thus, the numerical value of a 
particular physical quantity depends on the unit in which it is expressed."

Applying these rules to dates, one might talk about a "datetime" 
quantity such as "the amount of time elapsed since the beginning of the 
Gregorian Calendar" (or since 1960 if you're a SAS user :) which might 
be an attribute of a phenomenon such as a measurement event (e.g., when 
I shoved a probe in the ground and pressed the "go" button).  Such a 
quantity might have a value measured in seconds.

Now, if we have a datetime value as defined above that represents a 
phenomenon that happened yesterday, that's a lot of seconds. So, it 
would be convenient to use a prefix on seconds, such as, in the SI, 
gigaseconds (Gs), which is 10^9 seconds.  There have been about 63.1 Gs 
since the year of our Lord.
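
A minimal sketch of this arithmetic in Python, under stated assumptions: the reference point is taken to be 0001-01-01 00:00:00 in the proleptic Gregorian calendar (the one Python's datetime module implements), and leap seconds are ignored.  The function name and the choice of epoch are illustrative only; the result depends on both the epoch and the calendar rules assumed.

```python
from datetime import datetime

# Assumption: epoch is 0001-01-01 00:00:00 in the proleptic Gregorian
# calendar used by Python's datetime; leap seconds are ignored.
EPOCH = datetime(1, 1, 1)


def gigaseconds_since(moment: datetime, epoch: datetime = EPOCH) -> float:
    """Return the elapsed time from epoch to moment in gigaseconds."""
    return (moment - epoch).total_seconds() / 1e9


if __name__ == "__main__":
    # Elapsed time up to the start of 2002, printed in Gs.
    print(f"{gigaseconds_since(datetime(2002, 1, 1)):.2f} Gs")
```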

Of course, we don't usually count time in such tidy powers of ten, 
and instead use things like minutes, hours, days, months, years, 
centuries, millennia, and eons!  To complicate things, these traditional 
time periods are not constant through time (e.g., this month contains a 
different number of seconds than last month).  Thus, we have a 
complicated thing we call a calendar system that tells us, for any 
given period, how many seconds were or will be contained in that period.
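
To make the "calendar as lookup" idea concrete, here is a small Python sketch that asks the standard library's Gregorian calendar how many seconds a given month contains.  It assumes 86,400-second days (no leap seconds), and the function name is hypothetical.

```python
import calendar

SECONDS_PER_DAY = 86_400  # assumption: leap seconds are ignored


def seconds_in_month(year: int, month: int) -> int:
    """Return the number of seconds in the given Gregorian month."""
    # monthrange returns (weekday of first day, number of days in month).
    _, days = calendar.monthrange(year, month)
    return days * SECONDS_PER_DAY


if __name__ == "__main__":
    # September and October 2002 differ by one day's worth of seconds.
    print(seconds_in_month(2002, 9), seconds_in_month(2002, 10))
```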

So, theoretically at least, Gregorian years and gigaseconds are 
interconvertible by using a calendar.  As a datetime value is an elapsed 
time value, it too is expressible in gigaseconds.  So, it seems to me 
that datetime values do, can, and should have units to go along with 
them, because they are quantities.  It seems to me that whenever we have 
a numeric domain for a value (i.e., numerical calculations are 
legitimate), we should also have a unit for that value.  The unit would 
pertain to the value itself, to the domain of the value, and to the 
precision.  All three depend on the definition of "unit".

If expressed in gigaseconds, the datetime value "2002" is approximately 
63.1 Gs.  Higher levels of precision could be used, such as 63.135072 
Gs.  In Gregorian calendar units, a higher level of precision might be 
expressed as "2002-01-01 00:00:00".  This traditional format for a 
datetime value is still the value of a quantity, even though it is 
expressed in a unit that requires a detailed understanding of the length 
of every period between the value and Gregorian calendar value "0".

The calendar *is* the conversion formula between Gregorian units and SI 
seconds.

Now, let's turn the corner and start looking at things from a different 
perspective.  The STMML folks say that the datetime value "2002-01-01 
01:13:45" has a datatype of "xs:dateTime", but doesn't have any units.

So, what is a dataType?  This gets back to our old discussion.  To 
me, a datatype is a shorthand mechanism for expressing the domain of a 
quantity.  For example, if a quantity could contain integer values 
between 0 and 65535, in some languages you could label the domain of 
that quantity as "unsigned int" and people would know what you mean. 
Thus, it is a useful shorthand.  The "dataType" that is typically 
assigned to values in many processing systems often does not correspond 
to the actual domain of the quantity; it is usually larger than the 
domain so that it can contain all of the domain's values.  For example, 
an attribute might have a domain of integer values where 0 <= value <= 
1000, but its "dataType" for the C language might be assigned as "int" 
so that all possible values from the domain can be represented, even 
though some values outside of the domain (e.g., 1003) can also be 
represented by that dataType.  This is because we have a small set of 
common dataTypes for a given language or system for pragmatic and not 
theoretical reasons.  Some database theoreticians (e.g., C.J. Date) have 
argued strongly that existing database systems are messed up because 
they don't allow you to model the domain precisely, and instead just 
give you a limited set of "data types" as surrogates for the domain.

So, if we say that the "dataType" of a value is "xs:gYear", what does 
this say about how we should interpret it as a quantity (i.e., what's its 
implied unit) and what does this say about its implied domain?  I think 
it says nothing precise about the domain, and everything about the unit 
(i.e., how to convert the value to seconds since 1960, for example).  (In 
contrast, most other dataTypes (e.g., int) say nothing precise about the 
domain, and nothing about the unit.)
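
The gap between a dataType and a domain can be sketched in a few lines of Python.  The class below is purely illustrative (it is not part of EML, STMML, or any library): it models the 0 <= value <= 1000 domain from the example, while the built-in int plays the role of the broader dataType.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntegerDomain:
    """Hypothetical model of an attribute domain: integers in [lower, upper]."""
    lower: int
    upper: int

    def contains(self, value: int) -> bool:
        # The dataType check (is it an int?) is weaker than the domain check.
        return isinstance(value, int) and self.lower <= value <= self.upper


# The domain from the example in the text: integers from 0 to 1000.
attribute_domain = IntegerDomain(lower=0, upper=1000)

if __name__ == "__main__":
    for candidate in (0, 1000, 1003):
        print(candidate, isinstance(candidate, int),
              attribute_domain.contains(candidate))
```

The value 1003 passes the dataType test but fails the domain test, which is the distinction being drawn here.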

Finally, about formats and notations.  The physical quantity "1800 km" 
can be written equally well as 1.8x10^3 km in scientific notation. 
Thus, to interpret the "numerical value" part of the value one must know 
how to parse and interpret scientific notation.  The datetime value 
"2002-10-10 17:45:32 gregorians" :-) is the same thing: you need 
to know how to parse the datetime value to extract its components, then 
look up the lengths of the periods involved, in order to find 
the number of seconds it represents.  Scientific notation is of course 
simpler, but it is nevertheless the same concept.
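
As a rough illustration of the parallel, the Python sketch below parses a number written in plain and in scientific notation, and then parses a datetime string with an explicit format.  The helper names and the format string are assumptions chosen for this example; both helpers do the same job of turning a notation into components that can be computed with.

```python
from datetime import datetime

# Assumption: datetime strings follow this layout, as in the example above.
DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_number(text: str) -> float:
    """Parse a numerical value written in plain or scientific notation."""
    return float(text)  # float() accepts both "1800" and "1.8e3"


def parse_datetime(text: str, fmt: str = DATETIME_FORMAT) -> datetime:
    """Parse a datetime value into its calendar components."""
    return datetime.strptime(text, fmt)


if __name__ == "__main__":
    # Two notations, one numerical value.
    print(parse_number("1800") == parse_number("1.8e3"))
    moment = parse_datetime("2002-10-10 17:45:32")
    print(moment.year, moment.month, moment.day,
          moment.hour, moment.minute, moment.second)
```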

OK, to summarize...  Datetime values are values of a physical quantity. 
According to SI, these need to be expressed with both their numerical 
value and their unit (see definition 3 above).  People generally don't think 
of datetime values this way, mainly because they often don't use them in 
a quantitative manner like we as scientists expect to.  If a scientist 
uses an event recorder to timestamp events (such as datetime values 
every time a cell divides), it would be nice to know what units are used 
so that mathematical calculations and comparisons can be made with those 
values.

But the question still remains, what is the unit?  And how does it 
relate to SI?  The major complicating factor is that the Gregorian 
calendar actually relies on many different units, one for each time 
period that varies in length.  As there are many thousands of time 
periods in the Gregorian calendar that vary in length, the conversion of 
a compound unit that uses many of these to an SI value is complex.  For 
example, the datetime value "2002-10-10" could be re-expressed as "10 
GregorianOctober2002Days" + "10 Gregorian2002Months" + "2 
Gregorian3rdMillenniumYears" + "2 GregorianMillennia".  I probably 
haven't even been fine enough in my divisions to accurately represent 
the variability inherent in Gregorian time periods, but you get the idea, 
right?  It certainly isn't a simple constant or linear relationship with SI.
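
A simplified sketch of that decomposition in Python follows.  It splits a date into whole years, whole months, and whole days elapsed, and looks up the length of each period individually.  It uses coarser divisions than the compound unit above (no separate millennium term), assumes the proleptic Gregorian calendar starting at 0001-01-01, and ignores leap seconds; the function name is hypothetical.

```python
import calendar
from datetime import date

SECONDS_PER_DAY = 86_400  # assumption: leap seconds are ignored


def seconds_by_period(year: int, month: int, day: int) -> dict:
    """Seconds from 0001-01-01 to the start of the given date, by period."""
    # Every earlier year contributes 365 or 366 days.
    year_days = sum(366 if calendar.isleap(y) else 365
                    for y in range(1, year))
    # Every earlier month of this year contributes 28 to 31 days.
    month_days = sum(calendar.monthrange(year, m)[1]
                     for m in range(1, month))
    # Whole days already elapsed in the current month.
    elapsed_days = day - 1
    return {
        "years": year_days * SECONDS_PER_DAY,
        "months": month_days * SECONDS_PER_DAY,
        "days": elapsed_days * SECONDS_PER_DAY,
    }


if __name__ == "__main__":
    parts = seconds_by_period(2002, 10, 10)
    print(parts, sum(parts.values()))
```

Each term needs its own lookup of period lengths, which is why no single multiplicative factor relates the compound value to SI seconds.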

This leads us, finally, to your comment about SAS datetime values.  SAS 
"knows" the Gregorian calendar.  When any datetime value is passed to 
SAS, you can tell SAS what format it is in (akin to describing 
scientific notation), and it will be able to parse that value, convert 
to the number of seconds the value represents using its internal 
calendar, and subtract the seconds before 1960 (to allow for higher 
precision), and store the value in seconds.  The fact that SAS stores 
datetime values in seconds highlights even more strongly that 1) 
datetime values are quantities, and 2) those quantities have units (in 
SAS, they use SI seconds as the unit).
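
The following Python sketch mimics that behaviour; it is not SAS code and makes no claim about how SAS is implemented internally.  It assumes a 1960-01-01 00:00:00 reference point, naive datetimes with no time zone, and no leap seconds.  The function names and the default format string are illustrative.

```python
from datetime import datetime, timedelta

# Assumption: reference point of 1960-01-01 00:00:00, no time zones.
REFERENCE = datetime(1960, 1, 1)
DEFAULT_FORMAT = "%Y-%m-%d %H:%M:%S"


def to_reference_seconds(text: str, fmt: str = DEFAULT_FORMAT) -> float:
    """Parse a formatted datetime and return seconds since the reference."""
    return (datetime.strptime(text, fmt) - REFERENCE).total_seconds()


def from_reference_seconds(seconds: float, fmt: str = DEFAULT_FORMAT) -> str:
    """Format a count of seconds since the reference as a datetime string."""
    return (REFERENCE + timedelta(seconds=seconds)).strftime(fmt)


if __name__ == "__main__":
    first = to_reference_seconds("2002-10-10 17:34:45")
    second = to_reference_seconds("2002-10-10 17:45:32")
    # Once both values are in seconds, ordinary arithmetic applies.
    print(first, second - first, from_reference_seconds(first))
```

Once two timestamps are stored this way, their difference is an ordinary quantity in seconds, which is the kind of calculation a scientist would want to make with event-recorder data.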

So, as I said, my thinking has led me down a difficult path.  It's 
clarified some things (datetimes are quantities and so should have 
units) and muddied others (what precisely should the datetime unit(s) be 
for Gregorian dates?).  The problem seems to rest with the variable 
length of Gregorian calendar periods.  What we should probably do is 
recommend that everyone use SI seconds from a particular reference point 
(say, 1960) for storing their datetime values, but that is somewhat 
impractical.  The alternative is to store the datetime values in 
Gregorian units, but then STMML is not up to the task of describing the 
conversion formula to SI seconds for a "gregorian".

Food for thought.  Chew well.  I'd love to hear alternative views.

Matt

Scott Chapal wrote:
> Matt Jones <jones at nceas.ucsb.edu> writes:
> 
> 
>>OK, so, the question is:
>>   What is the unit for a scalar date-time value like "2002-10-10
>>17:34:45"? How can this unit be defined in STMML? How is this unit
>>related to the fundamental SI unit of second?
> 
> 
> Considering the example of using SAS for date processing, the numeric
> representation for datetime is seconds (elapsed since Jan. 1, 1960),
> but the presentation (expression) of that value is merely a
> user-specified format.  It is in the 'format' that the complexity of
> the calendar is algorithmed.
> 
> I agree with Peter Murray-Rust:
> 
>>I don't think it has units - it has a dataType of xsd:dateTime
> 
> 
> There are no units for "2002-10-10 17:34:45", date-time
> doesn't have units.  Elapsed time units are derived from seconds.
> 
> -Scott
> 


-- 
*******************************************************************
Matt Jones                                    jones at nceas.ucsb.edu
http://www.nceas.ucsb.edu/    Fax: 425-920-2439   Ph: 907-789-0496
National Center for Ecological Analysis and Synthesis (NCEAS)

Interested in ecological informatics? http://www.ecoinformatics.org
*******************************************************************



