semantics of counting and measuring things

Matt Jones jones at nceas.ucsb.edu
Wed Mar 26 09:45:39 PST 2003


Here's my naive perspective on this set of thorny issues that have risen 
to the surface yet again...

We put 'nominalDay' into EML as a 'day' unit of constant length just to 
accommodate those people who wanted to refer to time durations without 
reference to a calendar and all of its associated problems.  Thus, a 
nominal day is a unit of time that is exactly 60*60*24 = 86400 seconds.  
And so it does not really correspond to the concept of Julian day in any 
meaningful way, in that Julian day is tied to a calendar and cannot be 
unambiguously converted to 'seconds' without consulting a calendar to 
determine how many seconds were in each of those days.  That is, each 
Julian day can contain a different number of seconds, while nominalDays 
have constant duration.  Thus, contrary to what I think Peter was 
saying, 'nominalDay' is not a count.  It would be perfectly legitimate 
to refer to "1.87 nominalDay", which isn't really a count.  Counts are 
usually integral.
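
To make the constant-length definition concrete, here is a minimal 
Python sketch.  The constant and function names are made up for 
illustration and are not part of EML; the only fact taken from the 
paragraph above is that one nominalDay is exactly 60*60*24 seconds.

```python
# Hypothetical helper names; only the 60*60*24 factor comes from the text.
SECONDS_PER_NOMINAL_DAY = 60 * 60 * 24  # fixed by definition, no calendar needed


def nominal_days_to_seconds(days: float) -> float:
    """Convert a duration in nominalDay to seconds with a constant factor."""
    return days * SECONDS_PER_NOMINAL_DAY


def seconds_to_nominal_days(seconds: float) -> float:
    """Convert a duration in seconds back to nominalDay."""
    return seconds / SECONDS_PER_NOMINAL_DAY


# Fractional values are fine, which is why nominalDay is not a count.
duration_s = nominal_days_to_seconds(1.87)
```

No such constant-factor function can be written for calendar days, since 
the number of seconds in each one has to be looked up.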

Once you bring up the 'count issue' things get murky quickly. Whether or 
not we are going to "start naming units for all the things we might 
count integer numbers of" is a tough issue. On the one hand, I agree 
that it is silly to do so -- we all know that a count is a count is a 
count.  On the other, sometimes you do need to know exactly what was 
counted.  And the issue extends far outside counts, and includes the 
other types of measurements such as measurements of mass or length. 
Basically, we need to determine what is the relationship between 
measurement unit and the more complex semantics of the substance or 
phenomenon that was measured, and how this gets encoded in EML.

Here are some example quantities that might be found in an ecological 
data set (although some aren't very realistic):

    1         (just a dimensionless number)
    2 seconds
    3 meters
    4 meters per second
    5 joules
    6 grams of soil
    7 micrograms of Carbon
    8 liters of water
    9 micrograms of Carbon per liter of water
   10 micrograms of Carbon per liter of ethanol
   11 micrograms of Carbon per microgram of Nitrogen
   12 micrograms of Carbon per microgram of Potassium
   13 cells of human blood
   14 cells of E. coli
   15 cells of E. coli per cell of human blood
   16 antelope
   17 mountain lions
   18 antelope per mountain lion
   19 square centimeters
   20 square centimeters of algal turf
   30 square centimeters of mussels
   40 square centimeter quadrat
  0.5 square centimeters of algal turf per square centimeter of quadrat
  50% areal cover of algal turf in a 40 square centimeter quadrat
0.67 square centimeters of algal turf per square centimeter of mussels

OK, so... maybe you see where I'm going with this.  I could go on with 
the list, but I'll restrain myself :)

Things like meters are pretty straightforward all by themselves.  Even 
ratios such as meters per second are pretty clear.  For "8 liters of 
water", do we need the "of water" phrase?  Is a "liter of water" a 
different measurement unit from a "liter of ethanol"?  I honestly don't 
know, but I think there is something fundamentally different about them 
(i.e., you can't willy-nilly add them together in an analysis without 
somehow taking into account the difference).  Is "cells of human blood" 
a different measurement unit from "cells of E. coli", or are they both 
just "cells"?  Is "grams of soil" a different measurement unit from 
"grams of Nitrogen", or are they both just "grams"?

It seems the difficulties crop up when either 1) the counts are of 
different 'things' (e.g., 13, 14, 16, 17), or 2) you try to combine or 
compare measurements or counts on different substances (e.g., 9-12, 15, 
18).  What is the difference between 4 Carbon atoms, 4 bacterial cells, 
4 antelope, and 4 mountain lions?  Is the 'measurement unit' the same 
for all of them (i.e., a dimensionless count)?  If you take a ratio of 
two of these, is the result dimensionless (e.g., 4 antelope / 2 mountain 
lions = 2) or not (e.g., 4 antelope / 2 mountain lions = 2 antelope per 
mountain lion)?  I think it is not dimensionless, but this is just from 
my gut.  How about more abstract things like grams of an element (e.g., 
10 micrograms of Carbon / 5 micrograms of Nitrogen = 2, or does it equal 
2 micrograms of Carbon per microgram of Nitrogen)?  How about moles of 
atoms (5 moles of Carbon / 10 moles of Nitrogen = 0.5, or is it 0.5 
moles of Carbon per mole of Nitrogen)?  How about atoms, which are 
really the same as moles up to a constant factor (5 atoms of Carbon / 
10 atoms of Nitrogen = 0.5, or is it 0.5 Carbon atoms per Nitrogen 
atom)?
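
One possible way to keep track of 'what was measured' alongside the unit 
is sketched below in Python.  It only illustrates the second reading of 
each ratio above (the one that keeps the substances).  The Quantity class 
and the ratio function are hypothetical, and this is not how EML or STMML 
represent units.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Quantity:
    """A value with a unit and a label for what was measured (hypothetical)."""
    value: float
    unit: str       # e.g. "count", "microgram", "mole"
    substance: str  # e.g. "antelope", "Carbon"

    def label(self) -> str:
        # Counts read naturally without the word "count" in front.
        if self.unit == "count":
            return self.substance
        return f"{self.unit} of {self.substance}"


def ratio(numerator: Quantity, denominator: Quantity) -> Tuple[float, str]:
    """Divide two quantities; cancel to 'dimensionless' only when both the
    unit and the substance match, otherwise keep both labels."""
    value = numerator.value / denominator.value
    same_unit = numerator.unit == denominator.unit
    same_substance = numerator.substance == denominator.substance
    if same_unit and same_substance:
        return value, "dimensionless"
    return value, f"{numerator.label()} per {denominator.label()}"


antelope = Quantity(4, "count", "antelope")
lions = Quantity(2, "count", "mountain lion")
print(ratio(antelope, lions))  # (2.0, 'antelope per mountain lion')
```

Under this assumption, 10 micrograms of Carbon over 5 micrograms of 
Nitrogen comes out as 2 "microgram of Carbon per microgram of Nitrogen" 
rather than a bare 2.  Whether that is the right rule is exactly the 
open question.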

I think these are very tricky, and I can come up with a lot of tough 
examples.  Percentages, ppm, and ppb come quickly to mind from our 
previous discussions because, on first inspection, the units cancel and 
result in a dimensionless number.  But I'm not sure they truly do.

The bottom line to me is that it is really impossible to separate the 
measurement from what is being measured.  Thus, to really understand the 
'measurement unit' for a quantity, one has to have a typology of what is 
being measured as well as how it is measured. For example, such a 
typology would know that a 'Carbon atom' is a type of 'atom', and that 
both a 'blood cell' and an 'E. coli cell' are types of 'cell'.  This is 
what I think of when I start thinking about 'semantic types' for 
measured values.  Of course, I haven't thought this out enough to figure 
out if 'semantic type' is just an extension of measurement unit or if it 
is something fundamentally orthogonal to it.  I think the SEEK project 
will be tackling these issues head on, so I at least plan on addressing 
some of these thorns in the SEEK Knowledge Representation group, and the 
SEEK Semantic Mediation System.  I think it should be a while before we 
let these considerations have a direct impact on the released version of 
EML, as this is very unstable ground.
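
A toy version of such a typology, again in Python and purely as an 
assumption about what the structure might look like (it is not a SEEK 
design), could be a simple parent lookup.  The table contains only the 
types named above; the function names are hypothetical.

```python
from typing import List, Optional

# Each type maps to its more general parent type.
PARENT = {
    "Carbon atom": "atom",
    "blood cell": "cell",
    "E. coli cell": "cell",
}


def lineage(kind: str) -> List[str]:
    """Return the type followed by its ancestors, most specific first."""
    chain = [kind]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def is_a(kind: str, ancestor: str) -> bool:
    """True if `kind` is `ancestor` or a descendant of it."""
    return ancestor in lineage(kind)


def common_type(a: str, b: str) -> Optional[str]:
    """Most specific type shared by both, or None if they share nothing."""
    b_lineage = set(lineage(b))
    for kind in lineage(a):
        if kind in b_lineage:
            return kind
    return None


print(common_type("blood cell", "E. coli cell"))  # cell
```

With something like this, a tool could tell that "cells of human blood" 
and "cells of E. coli" are both counts of 'cell', while a Carbon atom 
and a blood cell have no common type in this small table.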

Cheers,
Matt


Peter McCartney wrote:
> I guess the 'nominalDay' unit slipped past my notice.
> 
> This question is drifting out of the datetime issue and into the count 
> issue which I don't think we ever resolved very well. To my 
> recollection, counts (and percents, etc) were not considered 
> measurements by STMML and are all dimensionless units. I don't think we 
> are going to start naming units for all the things we might count 
> integer numbers of, are we? I would have encoded day of the year as 
> simply ratio with a unit of dimensionless and a domain of 0 to 365.
> 
> Peter McCartney (peter.mccartney at asu.edu)
> Center for Environmental-Studies
> Arizona State University
>  
> 
> 
> -----Original Message-----
> From: Matt Jones [mailto:jones at nceas.ucsb.edu]
> Sent: Wednesday, March 19, 2003 9:58 AM
> To: eml-dev at ecoinformatics.org
> Cc: Henshaw, Don; Spycher, Gody
> Subject: Re: Julian Date format -- interval not dateTime (my thought)
> 
> 
> I agree.  We created the unit 'nominalDay' precisely for this purpose. 
> It represents an integer number of days.
> 
> Matt
> 
> Tim Bergsma wrote:
>  > Scott,
>  >
>  > I was also wondering about "this advice".  I was taught somewhere not
>  > to confuse Julian Day with day-of-year.  I use day-of-year, but I
>  > don't really know what Julian Day is, and therefore hesitate to say
>  > too much. With regard to "saying that something takes 200 Julian
>  > Days", this is clearly the same concept as eml dictionary unit
>  > nominalDay.
>  >
>  > Tim.
>  >
>  > Scott Chapal wrote:
>  >
>  >>Was there ever any determination about Julian Day?
>  >>
>  >>What are we calling Julian Day in EML any way?
>  >>
>  >>YYYYddd or ddd ??
>  >>
>  >>David's numbers are what?  dddYYYY?
>  >>
>  >>Or does this advice pertain?  "The system of Julian days should not be
>  >>confused with the simpler system of the same name which associates a
>  >>date with the number of days elapsed since January 1st of the same
>  >>year (according to which 2000-12-31 is day 366 of the year 2000)."
>  >>
>  >>Because the 'real' Julian Day is used in astronomy to number
>  >>chronological days...
>  >>
>  >>So,
>  >>
>  >>ddd     - RATIO
>  >>YYYYddd - ORDINAL
>  >>
>  >>Or just other dateTime formats??
>  >>
>  >>-Scott
>  >>
>  >>David Blankman <dblankman at lternet.edu> writes:
>  >>
>  >>
>  >>>Don,
>  >>>
>  >>>I am not sure what the correct representation of Julian dates would
>  >>>be. My sense is that the Julian date scale is actually an INTERVAL
>  >>>scale not a dateTIME scale; arithmetic calculations are consistent,
>  >>>that is, 2451919 - 2451819 gives the same value as 2351919 - 2351819.
>  >>>It probably also makes sense to say that something that takes 200
>  >>>julian days  = 2 * 100 julian days. My first thought was that it was
>  >>>a ratio scale, but it is more like the Celsius scale than the Kelvin
>  >>>scale in that the 0 on the julian scale is an arbitrary one.
>  >>>
>  >>>
>  >>>The julian date scale does not suffer from the problems that are
>  >>>associated with a standard calendar scale, that is, the only unit is
>  >>>the day and the fraction of a day; there is nothing like Feb 20 - Jan
>  >>>20 representing a different number of days than Aug 20 - July 20.
>  >>
>  >>>I would appreciate eml-dev feedback on that.
>  >>>
>  >>>David
>  >>>
>  >>>Henshaw, Don wrote:
>  >>
>  >>>> On another topic:
>  >>>
>  >>>>Can a julian date be represented in the format string for
>  >>>>measurementScale of datetime
>  >>>
>  >>>>i.e., YYYYddd
>  >>>> Other notes (being rather picky): pertaining to
>  >>>>eml-unitDictionary.xml (2.0.0)
>  >>
>  >>--
>  >>\SEC
>  >>_______________________________________________
>  >>eml-dev mailing list
>  >>eml-dev at ecoinformatics.org
>  >>http://www.ecoinformatics.org/mailman/listinfo/eml-dev
>  >
>  >
> 
> 
