[Fwd: Re: STMML units and date/time values]
Peter Murray-Rust
pm286 at cam.ac.uk
Fri Oct 25 08:22:53 PDT 2002
At 09:43 24/10/2002 -0800, Matt Jones wrote:
>Peter,
>
>We're continuing our discussion of date-time values wrt STMML on
>eml-dev at ecoinformatics.org. Here's the latest message from the thread if
>you'd be interested in commenting. It's kind of long, but captures some of
>my thoughts in a bit more detail. The other messages from the thread can
>be seen in the archive for eml-dev:
Thanks,
I have been working hard on units and have found the eml-unitDictionary
very valuable. (I think I discovered one or two typos in the values?). I
will return it to yourselves shortly.
>I've been thinking about this date-time stuff because it really isn't very
>obvious or consistent as we have it now in EML/STMML. Thinking about it
>has clarified a few things about "dataType" for me, and muddied the waters
>elsewhere. My previous email was only partly exposing my thoughts on the
>matter as I was trying to get info from the STMML folks about how they
>handle it. In my mind, they don't yet deal with it properly.
I am sure this is true in some cases! The experience of processing your
dictionary has highlighted some fuzzy areas in the original STMML
publication. It is a good time for us to correct errors.
> So, here's a bit of rambling to set the stage for a discussion of
> dates and times in eml. You'll find I disagree with one of the points
> that you made in your email, namely that datetimes don't have units.
>
>First, we need to think carefully about what a unit is in relationship to
>the quantity being measured. From the NIST SI Units page
>(http://physics.nist.gov/cuu/Units/introduction.html), we have three
>definitions:
I am also using this as a "bible" and rewriting in places.
>1) "A *quantity in the particular sense* is a quantifiable or assignable
>property ascribed to a particular phenomenon, body, or substance. Examples
>are the mass of the moon and the electric charge of the proton."
>
>2) "A *unit* is a particular physical quantity, defined and adopted by
>convention, with which other particular quantities of the same kind are
>compared to express their value."
>
>3) "The *value of a physical quantity* is the quantitative expression of a
>particular physical quantity as the product of a number and a unit, the
>number being its numerical value. Thus, the numerical value of a
>particular physical quantity depends on the unit in which it is expressed."
I think the STMML use of <unit> adheres to these definitions. It is
arguable that the <unitType> might be better expressed as <kindOfQuantity>
(cf "quantities of the same kind"). It is difficult to know how to treat
chargeOfProton - it is a quantity when observed, but it can also be used as
a unit.
STMML uses <scalar units="foo">1.23</scalar> to represent scalar
quantities. It becomes slightly harder when the object is complex. For
homogeneous axes we can still use single units. We could write:
The car was moving at a speed of <scalar units="kph">100</scalar> with
velocity <array units="kph" size="3" dictRef="velocity">60 80 0</array>
(where the velocity is a vector quantity referred to ...some coordinate
system ...)
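As a rough illustration of how software might read such markup, here is a minimal Python sketch using only the standard library. The wrapping <p> element and the helper name read_quantity are hypothetical (they are not part of STMML), and namespaces are left out for brevity. It also checks that the magnitude of the example velocity agrees with the quoted speed.

```python
import math
import xml.etree.ElementTree as ET

# The <p> wrapper is a hypothetical container; real STMML would be namespaced.
FRAGMENT = (
    '<p>The car was moving at a speed of <scalar units="kph">100</scalar> '
    'with velocity <array units="kph" size="3" dictRef="velocity">60 80 0</array></p>'
)


def read_quantity(element):
    """Return (values, units) for an STMML-style <scalar> or <array> element."""
    values = [float(token) for token in element.text.split()]
    if element.tag == "array" and len(values) != int(element.get("size")):
        # The size attribute should match the number of whitespace-separated items.
        raise ValueError("array size attribute does not match its content")
    return values, element.get("units")


root = ET.fromstring(FRAGMENT)
speed, speed_units = read_quantity(root.find("scalar"))
velocity, velocity_units = read_quantity(root.find("array"))

# Magnitude of the velocity vector, in the same units as its components.
magnitude = math.sqrt(sum(component ** 2 for component in velocity))
print(speed[0], speed_units, magnitude, velocity_units)
```

The single units attribute applies to every component of the array, which is what "homogeneous axes" amounts to here.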
[There is nothing above that says the quantity must be a scalar nor that it
must be real. A complex number could have a magnitude on some measurement
scale and a phase angle on another. I can talk of a diffracted X-ray as
having a magnitude of 50 electrons and a phase angle of 1.2 radians.
Neither is valid without the other. This isn't the current problem but it
shows that some concepts are not easy to represent.]
>Applying these rules to dates, one might talk about a "datetime" quantity
>such as "the amount of time elapsed since the beginning of the Gregorian
>Calendar" (or since 1960 if you're a SAS user :) which might be an
>attribute of a phenomenon such as a measurement event (e.g., when I shoved
>a probe in the ground and pressed the "go" button). Such a quantity might
>have a value measured in seconds.
There is a difference between a point on a measurement scale and the
difference between two points. Thus 0 deg C = 273.15 K, but kJ/K is the same
as kJ/C, because a difference of one kelvin equals a difference of one
degree Celsius.
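The point-versus-difference distinction can be sketched in a few lines of Python. The function names are hypothetical and only illustrate the idea: converting a point on the scale applies the offset, while converting a difference does not.

```python
ZERO_CELSIUS_IN_KELVIN = 273.15


def celsius_point_to_kelvin(temperature_c):
    """Convert a point on the Celsius scale to the Kelvin scale (offset applies)."""
    return temperature_c + ZERO_CELSIUS_IN_KELVIN


def celsius_difference_to_kelvin(delta_c):
    """Convert a temperature difference; both scales share a degree size, so no offset."""
    return delta_c


# A point: 0 deg C corresponds to 273.15 K.
print(celsius_point_to_kelvin(0.0))

# A difference: the offsets cancel when two points are subtracted.
difference_k = celsius_point_to_kelvin(30.0) - celsius_point_to_kelvin(20.0)
print(difference_k, celsius_difference_to_kelvin(10.0))
```

The same split applies to time: a dateTime is a point on a scale, a duration is a difference between two points.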
>Now, if we have a datetime value as defined above that represents a
>phenomenon that happened yesterday, that's a lot of seconds. So, it would
>be convenient to use a prefix on seconds, such as, in the SI, gigaseconds
>(Gs), which is 10^9 seconds. There have been about 63.1 Gs since the year
>of our Lord.
>
>Of course, we don't represent seconds in such even quantities as that, and
>instead use things like minutes, hours, days, months, years, centuries,
>millennia, and eons! To complicate things, these traditional time periods
>are not constant through time (e.g., this month contains a different
>number of seconds than last month). Thus, we have a complicated thing we
>call a calendar system that tells us, for any given period, how many
>seconds were or will be contained in that period.
>
>So, theoretically at least, gregorian years and gigaseconds are
>interconvertible by using a calendar.
Agreed. But it is critical to know what the conversion is.
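To make "knowing what the conversion is" concrete, here is a small Python sketch that uses the standard library's Gregorian calendar rules to look up how many seconds a given month or year contains. The function names are hypothetical, and the sketch assumes 86 400 seconds per day, so leap seconds are ignored.

```python
import calendar

SECONDS_PER_DAY = 86_400  # assumption: leap seconds are ignored


def seconds_in_month(year, month):
    """Length of a Gregorian calendar month in seconds."""
    days = calendar.monthrange(year, month)[1]  # (weekday of day 1, days in month)
    return days * SECONDS_PER_DAY


def seconds_in_year(year):
    """Length of a Gregorian calendar year in seconds."""
    days = 366 if calendar.isleap(year) else 365
    return days * SECONDS_PER_DAY


# Consecutive months need not contain the same number of seconds.
print(seconds_in_month(2002, 9), seconds_in_month(2002, 10))

# One calendar year expressed in gigaseconds.
print(seconds_in_year(2002) / 1e9, "Gs")
```

The conversion factor is therefore a lookup that depends on which month or year is meant, not a single constant.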
>As a datetime value is an elapsed time value
I may be wrong, but I think that "datetime" in ISO 8601 is for absolute times;
W3C uses "duration" for differences. The STMML usage of datetime and
duration is defined to be that required by the W3C XML Schema specification.
http://www.w3.org/TR/xmlschema-2
Appendix E, "Adding durations to dateTimes":
Given a dateTime S and a duration D, this appendix specifies how to compute
a dateTime E where E is the end of the time period with start S and
duration D i.e. E = S + D. Such computations are used, for example, to
determine whether a dateTime is within a specific time period. This
appendix also addresses the addition of durations to the datatypes date,
gYearMonth, gYear, gDay and gMonth, which can be viewed as a set of
dateTimes. In such cases, the addition is made to the first or starting
dateTime in the set.
This is a logical explanation of the process. Actual implementations are
free to optimize as long as they produce the same results. The calculation
uses the notation S[year] to represent the year field of S, S[month] to
represent the month field, and so on. It also depends on the following
functions:
<algorithm snipped>
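For readers who want a feel for E = S + D without the full appendix, here is a simplified Python sketch. It is not the W3C algorithm verbatim: it assumes naive (timezone-free) datetimes, ignores leap seconds, and the function name add_duration is hypothetical. It follows the general order of adding years and months first, pinning the day to the length of the resulting month, then adding the day and time fields.

```python
import calendar
from datetime import datetime, timedelta


def add_duration(start, years=0, months=0, days=0, hours=0, minutes=0, seconds=0):
    """Return start + duration, handling variable month lengths."""
    # Step 1: add years and months, carrying whole years out of the month field.
    month_index = (start.month - 1) + months
    year = start.year + years + month_index // 12
    month = month_index % 12 + 1

    # Step 2: pin the day so that e.g. 31 January + 1 month stays inside February.
    last_day = calendar.monthrange(year, month)[1]
    pinned = start.replace(year=year, month=month, day=min(start.day, last_day))

    # Step 3: days and time fields are fixed-length, so plain addition is enough.
    return pinned + timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


print(add_duration(datetime(2000, 1, 31), months=1))
print(add_duration(datetime(2000, 1, 12, 12, 13, 14),
                   years=1, months=3, days=5, hours=7, minutes=10, seconds=3))
```

The year and month fields need the calendar; the day, hour, minute and second fields do not, which is why the two groups are handled separately.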
>, it too is expressible in gigaseconds. So, it seems to me that datetime
>values do, can, and should have units to go along with them, because they
>are quantities. It seems to me that whenever we have a numeric domain for
>a value (ie, numerical calculations are legitimate), we should also have a
>unit for that value. The unit would pertain to the value itself, to the
>domain of the value, and to the precision. All three depend on the
>definition of "unit".
>
>If expressed in gigaseconds, the datetime value "2002" is approximately
>63.1 Gs. Higher levels of precision could be used, such as 63.135072
>Gs. In gregorian calendar units, a higher level of precision might be
>expressed as "2002-01-01 00:00:00". This traditional format for a
>datetime value is still the value of a quantity, even though it is
>expressed in a unit that requires a detailed understanding of the length
>of every period between the value and gregorian calendar value "0".
>
>The calendar *is* the conversion formula between gregorian units and SI
>seconds.
I don't think there are gregorian units... I think W3C XML Schema shows how
to extract SI seconds from two gregorian datetimes.
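As a hedged illustration of getting seconds out of two datetimes, here is a Python sketch using the standard datetime module rather than an XML Schema processor. It assumes naive datetimes in the proleptic Gregorian calendar and ignores leap seconds and time zones, so the result is only an approximation to elapsed SI seconds. The function name elapsed_seconds is hypothetical.

```python
from datetime import datetime


def elapsed_seconds(start, end):
    """Seconds between two calendar datetimes (no leap seconds, no time zones)."""
    return (end - start).total_seconds()


# Two points on the calendar scale; only their difference is a quantity in seconds.
first = datetime(2002, 10, 10, 17, 34, 45)
second = datetime(2002, 10, 25, 8, 22, 53)
print(elapsed_seconds(first, second), "s")

# The same subtraction against a chosen origin, expressed in gigaseconds.
origin = datetime(1, 1, 1)
print(elapsed_seconds(origin, datetime(2002, 1, 1)) / 1e9, "Gs")
```

Note that the origin has to be chosen and stated; the datetime value itself does not carry one as a unit would.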
>Now, let's turn the corner and start looking at things from a different
>perspective. The STMML folks say that the datetime value "2002-01-01
>01:13:45" has a datatype of "xs:dateTime", but doesn't have any
>units. So, what is a dataType? This gets back to our old
>discussion. To me, a datatype is a shorthand mechanism for expressing the
>domain of a quantity.
dataType is used in the sense of W3C Schema. It includes things like URLs,
strings, etc. which have no units and are not quantities.
> For example, if a quantity could contain integer values between 0 and
> 32767, in some languages you could label the domain of that quantity as
> "unsigned int" and people would know what you mean. Thus, it is a useful
> shorthand. The "dataType" that is typically assigned to values in many
> processing systems often does not correspond to the actual domain of the
> quantity as it is usually larger than the domain so that it can contain
> all of the domain's values. For example, an attribute might have a domain
> of integer values where 0 <= value <= 1000, but its "dataType" for the C
> language might be assigned as "int" so that all possible values from the
> domain can be represented, even though some values outside of the domain
> (e.g., 1003) can also be represented by that dataType. This is because
> we have a small set of common dataTypes for a given language or system
> for pragmatic and not theoretical reasons. Some database theoreticians
> (e.g., C.J. Date) have argued strongly that existing database systems are
> messed up because they don't allow you to model the domain precisely, and
> instead just give you a limited set of "data types" as surrogates for the
> domain. So, if we say that the "dataType" of a value is "xs:gYear", what
> does this say about how we should interpret it as a quantity (ie, what's
> its implied unit) and what does this say about its implied domain? I
> think it says nothing precise about the domain, and everything about the
> unit (ie, how to convert the value to seconds since 1960, for example).
> (In contrast, most other dataTypes (e.g., int) say nothing precise about
> the domain, and nothing about the unit.)
We have separated xsd:dataTypes from the units (if any). The dataTypes can
only have units in our scheme if they are numeric (integer,
nonNegativeInteger, etc.). DataTypes can be restricted by lexical and
numeric constraints.
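A minimal Python sketch of the idea, assuming a hypothetical IntegerDomain class of my own naming: the dataType (integer) is narrowed by numeric bounds, in the spirit of the minInclusive and maxInclusive facets in XML Schema, and the units are carried as a separate, optional piece of information.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IntegerDomain:
    """Hypothetical stand-in for an integer dataType restricted by numeric bounds."""
    min_inclusive: int
    max_inclusive: int
    units: Optional[str] = None  # units are recorded separately from the dataType

    def accepts(self, lexical_value: str) -> bool:
        try:
            value = int(lexical_value)  # lexical check: must parse as a Python integer
        except ValueError:
            return False
        # Numeric check: the value must fall inside the declared domain.
        return self.min_inclusive <= value <= self.max_inclusive


# The example domain from the quoted message: integers with 0 <= value <= 1000.
count_domain = IntegerDomain(0, 1000)
print(count_domain.accepts("1000"), count_domain.accepts("1003"), count_domain.accepts("abc"))
```

A plain "int" would accept 1003; the restricted type does not, which is the gap between a dataType and a domain described in the quoted text.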
>Finally, about formats and notations. The physical quantity "1800 km" can
>be written equally as well as 1.8x10^3 km in scientific notation. Thus, to
>interpret the "numerical value" part of the value one must know how to
>parse and interpret scientific notation. The datetime value "2002-10-10
>17:45:32 gregorians" :-) is the same thing in that you need to know how to
>parse the datetime value to extract its components in order to look up the
>lengths of the periods involved in order to find the number of seconds it
>represents. Scientific notation is of course simpler, but it is
>nevertheless the same concept.
>
>OK, to summarize... Datetime values are values of a physical
>quantity. According to SI, these need to be expressed with both their
>numerical value and their unit (see the (3) above). People generally
>don't think of datetime values this way, mainly because they often don't
>use them in a quantitative manner like we as scientists expect to. If a
>scientist uses an event recorder to timestamp events (such as datetime
>values every time a cell divides), it would be nice to know what units are
>used so that mathematical calculations and comparisons can be made with
>those values.
>
>But the question still remains, what is the unit? And how does it relate
>to SI? The major complicating factor is that the gregorian calendar
>actually relies on many different units, one for each time period that
>varies in length. As there are many thousands of time periods in the
>gregorian calendar that vary in length, the conversion of a compound unit
>that uses many of these to an SI value is complex. For example, the
>datetime value "2002-10-10" could be re-expressed as "10
>GregorianOctober2002Days" + "10 Gregorian2002Months" + "2
>Gregorian3rdMillenniumYears" + "2 GregorianMillennia". I probably haven't
>even been fine enough in my divisions to accurately represent the
>variability inherent in gregorian time periods, but you get the idea,
>right? It certainly isn't a simple constant or linear relationship with SI.
The same problem occurs with currencies. Thus 2/6 (two shillings and
sixpence) = 2.5 shillings = 12.5 new pence (UK). There cannot be a single
unit in 2/6.
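The arithmetic behind that example, as a short Python sketch with hypothetical function names. It uses 12 old pence to the shilling and 5 new pence to the shilling (20 shillings = 1 pound = 100 new pence).

```python
from fractions import Fraction

OLD_PENCE_PER_SHILLING = 12
NEW_PENCE_PER_SHILLING = 5  # 20 shillings = 1 pound = 100 new pence


def to_shillings(shillings, old_pence):
    """Collapse a compound shillings/pence amount into a single unit (shillings)."""
    return shillings + Fraction(old_pence, OLD_PENCE_PER_SHILLING)


def to_new_pence(shillings, old_pence):
    """Express the same amount in decimal (new) pence."""
    return to_shillings(shillings, old_pence) * NEW_PENCE_PER_SHILLING


# 2/6 has two fields with two different units; a value in one unit needs a conversion.
print(float(to_shillings(2, 6)), float(to_new_pence(2, 6)))
```

Here the conversion between fields is a fixed ratio; with calendar dates the corresponding ratios vary from month to month and year to year.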
>This leads us, finally, to your comment about SAS datetime values. SAS
>"knows" the gregorian calendar. When any datetime value is passed to SAS,
>you can tell SAS what format it is in (akin to describing scientific
>notation), and it will be able to parse that value, convert to the number
>of seconds the value represents using its internal calendar, and subtract
>the seconds before 1960 (to allow for higher precision), and store the
>value in seconds. The fact that SAS stores datetime values in seconds
>highlights even more strongly that 1) datetime values are quantities, and
>2) those quantities have units (in SAS, they use SI seconds as the unit).
>
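To make the stored representation described above concrete, here is a minimal Python sketch that mimics it: parse a formatted datetime, then keep only the number of seconds since 1 January 1960. This is not SAS code and does not call any SAS interface; the names are hypothetical, and it assumes naive datetimes with no leap seconds.

```python
from datetime import datetime, timedelta

REFERENCE_POINT = datetime(1960, 1, 1)  # the 1960 origin mentioned in the message


def to_seconds_since_1960(moment):
    """Numeric representation: seconds elapsed since the reference point."""
    return (moment - REFERENCE_POINT).total_seconds()


def from_seconds_since_1960(seconds):
    """Inverse step: turn the stored number back into a calendar datetime."""
    return REFERENCE_POINT + timedelta(seconds=seconds)


# The format string plays the role of the user-specified format: it only says
# how to parse or present the value, not what is stored.
stamp = datetime.strptime("2002-10-10 17:45:32", "%Y-%m-%d %H:%M:%S")
stored = to_seconds_since_1960(stamp)
print(stored, from_seconds_since_1960(stored))
```

The stored number is a duration measured from a stated origin, which is where the seconds come from.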
>So, as I said, my thinking has led me down a difficult path. It's
>clarified some things (datetimes are quantities and so should have units)
>and muddied others (what precisely should the datetime unit(s) be for
>gregorian dates?). The problem seems to rest with the variable length of
>gregorian calendar periods. What we should probably do is recommend that
>everyone use SI seconds from a particular reference point (say, 1960) for
>storing their datetime values, but that is somewhat impractical. The
>alternative is to store the datetime values in gregorian units, but then
>STMML is not up to the task of describing the conversion formula to SI
>seconds for a "gregorian".
STMML offers the approaches:
(a) use SI seconds, minutes and hours FOR DURATIONS. Everything else
requires careful definition (day, month, year...)
(b) use XSD durations for durations. This is precise in the XSD document;
you just have to have software that processes it
(c) use XSD/ISO8601 for datetimes (for actual dates and times). This is
almost universally regarded as the best way
If you have metersPerSecond - no problem. If you have metersPerYear, YOU
have to say what a year is :-)
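A small Python sketch of why the definition matters. The three year lengths below are common conventions chosen for illustration, not values taken from STMML or the eml-unitDictionary, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400

# Three candidate definitions of "year", in days. Which one applies is an
# assumption the data provider has to state.
YEAR_LENGTH_DAYS = {
    "common year (365 d)": 365,
    "Julian year (365.25 d)": 365.25,
    "mean Gregorian year (365.2425 d)": 365.2425,
}


def meters_per_year_to_meters_per_second(rate, year_days):
    """Convert a rate in meters per year to meters per second for a given year length."""
    return rate / (year_days * SECONDS_PER_DAY)


for name, days in YEAR_LENGTH_DAYS.items():
    print(name, meters_per_year_to_meters_per_second(1.0, days))
```

The results differ only in the fourth significant figure, but a unit dictionary entry for metersPerYear still has to pick one.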
P.
>Food for thought. Chew well. I'd love to hear alternative views.
>
>Matt
>
>Scott Chapal wrote:
>>Matt Jones <jones at nceas.ucsb.edu> writes:
>>
>>>OK, so, the question is:
>>> What is the unit for a scalar date-time value like "2002-10-10
>>>17:34:45"? How can this unit be defined in STMML? How is this unit
>>>related to the fundamental SI unit of second?
>>
>>Considering the example of using SAS for date processing, the numeric
>>representation for datetime is seconds (elapsed since Jan. 1, 1960),
>>but the presentation (expression) of that value is merely a
>>user-specified format. It is in the 'format' that the complexity of
>>the calendar is algorithmed.
>>I agree with Peter Murray-Rust:
>>
>>>I don't think it has units - it has a dataType of xsd:dateTime
>>
>>There are no units for "2002-10-10 17:34:45", date-time
>>doesn't have units. Elapsed time units are derived from seconds.
>>-Scott
>
>
>--
>*******************************************************************
>Matt Jones jones at nceas.ucsb.edu
>http://www.nceas.ucsb.edu/ Fax: 425-920-2439 Ph: 907-789-0496
>National Center for Ecological Analysis and Synthesis (NCEAS)
>
>Interested in ecological informatics? http://www.ecoinformatics.org
>*******************************************************************