semantics of counting and measuring things
jones at nceas.ucsb.edu
Wed Mar 26 09:45:39 PST 2003
Here's my naive perspective on this set of thorny issues that have risen
to the surface yet again...
We put 'nominalDay' into EML as a 'day' unit of constant length just to
accommodate those people who wanted to refer to time durations without
reference to a calendar and all of its associated problems. Thus, a
nominal day is a unit of time that is exactly 60*60*24 seconds. And so
it does not really correspond to the concept of Julian day in any
meaningful way, in that Julian day is tied to a calendar and cannot be
unambiguously converted to 'seconds' without consulting a calendar to
determine how many seconds were in each of those days. That is, each
Julian day can contain a different number of seconds, while nominalDays
are constant duration. Thus, 'nominalDay' is not a count as I think
Peter was saying. It would be perfectly legitimate to refer to "1.87
nominalDay", which isn't really a count. Counts are usually integral.
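To make the distinction concrete, here is a minimal Python sketch. It is only an illustration, not anything defined by EML: the constant and function names are made up for this example. Because a nominalDay is a fixed number of seconds, the conversion needs no calendar, and fractional values work fine.

```python
# Hypothetical helper, not part of EML: a nominalDay is a fixed-length
# unit of exactly 60*60*24 seconds, so no calendar lookup is needed.
SECONDS_PER_NOMINAL_DAY = 60 * 60 * 24


def nominal_days_to_seconds(days: float) -> float:
    """Convert a duration expressed in nominalDay to seconds."""
    return days * SECONDS_PER_NOMINAL_DAY


# Fractional durations are legitimate, which is why this behaves like a
# measurement unit rather than a count.
print(nominal_days_to_seconds(1.87))
```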
Once you bring up the 'count issue' things get murky quickly. Whether or
not we are going to "start naming units for all the things we might
count integer numbers of" is a tough issue. On the one hand, I agree
that it is silly to do so -- we all know that a count is a count is a
count. On the other, sometimes you do need to know exactly what was
counted. And the issue extends far outside counts, and includes the
other types of measurements such as measurements of mass or length.
Basically, we need to determine what is the relationship between
measurement unit and the more complex semantics of the substance or
phenomenon that was measured, and how this gets encoded in EML.
Here are some example quantities that might be found in an ecological
data set (albeit some aren't very realistic):
1 (just a dimensionless number)
4 meters per second
6 grams of soil
7 micrograms of Carbon
8 liters of water
9 micrograms of Carbon per liter of water
10 micrograms of Carbon per liter of ethanol
11 micrograms of Carbon per microgram of Nitrogen
12 micrograms of Carbon per microgram of Potassium
13 cells of human blood
14 cells of E. coli
15 cells of E. coli per cell of human blood
17 mountain lions
18 antelope per mountain lion
19 square centimeters
20 square centimeters of algal turf
30 square centimeters of mussels
40 square centimeter quadrat
0.5 square centimeters of algal turf per square centimeter of quadrat
50% areal cover of algal turf in a 40 square centimeter quadrat
0.67 square centimeters of algal turf per square centimeter of mussels
OK, so....maybe you see where I'm going with this. I could go on with
the list, but I'll restrain myself :)
Things like meters are pretty straightforward all by themselves. Even
ratios such as meters per second are pretty clear. For "8 liters of
water", do we need the "of water" phrase? Is a "liter of water" a
different measurement unit from a "liter of ethanol"? I honestly don't
know, but I think there is something fundamentally different about them
(i.e., you can't willy-nilly add them together in an analysis without
somehow taking into account the difference). Is "cells of human blood"
a different measurement unit from "cells of E. coli", or are they both
just "cells"? Is "grams of soil" a different measurement unit from
"grams of Nitrogen", or are they both just "grams"?
It seems the difficulties crop up when either 1) the counts are of
different 'things' (e.g., 13, 14, 17), or 2) you try to combine or
compare measurements or counts on different substances (e.g., 9-12, 15,
18). What is the difference between 4 Carbon atoms, 4 bacterial cells,
4 antelope, and 4 mountain lions? Is the 'measurement unit' the
same for all of them (i.e., a dimensionless count)? If you take a ratio of
two of these is the result dimensionless (e.g., 4 antelope/2 mountain
lions = 2) or not (e.g., 4 antelope/2 mountain lions = 2 antelope per
mountain lion)? I think not, but this is just from my gut. How about
for more abstract things like grams or moles of atoms (e.g., 10
micrograms of Carbon / 5 micrograms of Nitrogen = 2, or does it equal 2
micrograms of Carbon per microgram of Nitrogen)? How about moles of
atoms (5 moles Carbon / 10 moles Nitrogen = 0.5 moles, or is it 0.5
moles of Carbon per mole of Nitrogen)? How about atoms, which really is
the same as moles (5 atoms of Carbon / 10 atoms of Nitrogen = 0.5, or is
it 0.5 Carbon atoms per Nitrogen atom)?
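One way to explore these questions is to carry "what was measured" alongside the unit and see what falls out. The Python sketch below is purely illustrative: the `Quantity` class, its `thing` field, and the `ratio` function are hypothetical names invented here, and the rule it applies (labels cancel only when both unit and thing match) is just one possible convention, not something EML or STMML specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    value: float
    unit: str   # e.g. "liter", "microgram", or "count"
    thing: str  # what was measured or counted, e.g. "water", "antelope"

    def label(self) -> str:
        # For counts the thing itself serves as the label.
        if self.unit == "count":
            return self.thing
        return f"{self.unit} of {self.thing}"

    def __add__(self, other: "Quantity") -> "Quantity":
        # Refuse to add unless both the unit and the thing agree.
        if (self.unit, self.thing) != (other.unit, other.thing):
            raise ValueError(f"cannot add {self.label()} to {other.label()}")
        return Quantity(self.value + other.value, self.unit, self.thing)


def ratio(num: Quantity, den: Quantity) -> str:
    """Divide two quantities; labels cancel only on an exact match."""
    result = num.value / den.value
    if (num.unit, num.thing) == (den.unit, den.thing):
        return f"{result:g}"  # same unit of the same thing: dimensionless
    return f"{result:g} {num.label()} per {den.label()}"


water = Quantity(8, "liter", "water")
ethanol = Quantity(3, "liter", "ethanol")
print(water + Quantity(2, "liter", "water"))  # same thing: allowed
try:
    water + ethanol
except ValueError as err:
    print(err)  # same unit, different thing: refused

print(ratio(Quantity(4, "count", "antelope"),
            Quantity(2, "count", "mountain lion")))
print(ratio(Quantity(10, "microgram", "Carbon"),
            Quantity(5, "microgram", "Nitrogen")))
print(ratio(Quantity(4, "count", "antelope"),
            Quantity(2, "count", "antelope")))
```

Under this convention only the last ratio comes out as a bare number; the others keep their "per" labels, which matches the gut feeling described above but is by no means the only defensible choice.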
I think these are very tricky, and I can come up with a lot of tough
examples. Percentages, ppm, and ppb come quickly to mind from our
previous discussions because, on first inspection, the units cancel and
result in a dimensionless number. But I'm not sure they truly do.
The bottom line to me is that it is really impossible to segregate the
measurement from what is being measured. Thus, to really understand the
'measurement unit' for a quantity, one has to have a typology of what is
being measured as well as how it is measured. For example, such a
typology would know that a 'Carbon atom' is a type of 'atom', and that
both a 'blood cell' and an 'E. coli cell' are types of 'cell'. This is
what I think of when I start thinking about 'semantic types' for
measured values. Of course, I haven't thought this out enough to figure
out if 'semantic type' is just an extension of measurement unit or if it
is something fundamentally orthogonal to it. I think the SEEK project
will be tackling these issues head on, so I at least plan on addressing
some of these thorns in the SEEK Knowledge Representation group, and the
SEEK Semantic Mediation System. I think it should be a while before we
let these considerations have a direct impact on the released version of
EML, as this is very unstable ground.
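As a very rough picture of what such a typology might look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `TYPOLOGY` table and the `is_a` function are hypothetical and do not come from EML or SEEK. It only encodes the examples mentioned above as parent links and walks them.

```python
from typing import Optional

# Hypothetical typology: each entry maps a type to its parent type.
TYPOLOGY = {
    "Carbon atom": "atom",
    "Nitrogen atom": "atom",
    "blood cell": "cell",
    "E. coli cell": "cell",
}


def is_a(thing: str, ancestor: str) -> bool:
    """Return True if `thing` is `ancestor` or descends from it."""
    current: Optional[str] = thing
    while current is not None:
        if current == ancestor:
            return True
        current = TYPOLOGY.get(current)  # None once we run out of parents
    return False


print(is_a("Carbon atom", "atom"))
print(is_a("blood cell", "cell"))
print(is_a("blood cell", "atom"))
```

With a structure like this, a system could decide that counts of blood cells and E. coli cells are both counts of 'cell' while still telling them apart when that matters.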
Peter McCartney wrote:
> I guess the 'nominalDay' unit slipped past my notice.
> This question is drifting out of the datetime issue and into the count
> issue which I don't think we ever resolved very well. To my
> recollection, counts (and percents, etc) were not considered
> measurements by STMML and are all dimensionless units. I don't think we
> are going to start naming units for all the things we might count
> integer numbers of, are we? I would have encoded day of the year as
> simply ratio with a unit of dimensionless and a domain of 0 to 365.
> Peter McCartney (peter.mccartney at asu.edu)
> Center for Environmental-Studies
> Arizona State University
> -----Original Message-----
> From: Matt Jones [mailto:jones at nceas.ucsb.edu]
> Sent: Wednesday, March 19, 2003 9:58 AM
> To: eml-dev at ecoinformatics.org
> Cc: Henshaw, Don; Spycher, Gody
> Subject: Re: Julian Date format -- interval not dateTime (my thought)
> I agree. We created the unit 'nominalDay' precisely for this purpose.
> It represents an integer number of days.
> Tim Bergsma wrote:
> > Scott,
> > I was also wondering about "this advice". I was taught somewhere not
> > to confuse Julian Day with day-of-year. I use day-of-year, but I
> > don't really know what Julian Day is, and therefore hesitate to say
> > too much. With regard to "saying that something takes 200 Julian
> > Days", this is clearly the same concept as eml dictionary unit
> > nominalDay.
> > Tim.
> > Scott Chapal wrote:
> >>Was there ever any determination about Julian Day?
> >>What are we calling Julian Day in EML any way?
> >>YYYYddd or ddd ??
> >>David's numbers are what? dddYYYY?
> >>Or does this advice pertain? "The system of Julian days should not be
> >>confused with the simpler system of the same name which associates a
> >>date with the number of days elapsed since January 1st of the same
> >>year (according to which 2000-12-31 is day 366 of the year 2000)."
> >>Because the 'real' Julian Day is used in astronomy to number
> >>chronological days...
> >>ddd - RATIO
> >>YYYYddd - ORDINAL
> >>Or just other dateTime formats??
> >>David Blankman <dblankman at lternet.edu> writes:
> >>>I am not sure what the correct representation of Julian dates would
> >>>be. My sense is that the Julian date scale is actually an INTERVAL
> >>>scale not a dateTIME scale; arithmetic calculations are consistent,
> >>>that is, 2451919 - 2451819 gives the same value as 2351919 - 2351819.
> >>>It probably also makes sense to say that something that takes 200
> >>>julian days = 2 * 100 julian days. My first thought was that it was
> >>>a ratio scale, but it is more like the Celsius scale than the Kelvin
> >>>scale in that the 0 on the julian scale is an arbitrary one.
> >>>The julian date scale does not suffer from the problems that are
> >>>associated with a standard calendar scale, that is, the only unit is
> >>>the day and the fraction of a day; there is nothing like Feb 20 - Jan
> >>>20 representing a different number of days than Aug 20 - July 20.
> >>>I would appreciate eml-dev feedback on that.
> >>>Henshaw, Don wrote:
> >>>> On another topic:
> >>>>Can a julian date be represented in the format string for
> >>>>measurementScale of datetime
> >>>>i.e., YYYYddd
> >>>> Other notes (being rather picky): pertaining to
> >>>>eml-unitDictionary.xml (2.0.0)
> >>eml-dev mailing list
> >>eml-dev at ecoinformatics.org