semantics of counting and measuring things

Tim Bergsma tbergsma at kbs.msu.edu
Thu Mar 27 08:36:05 PST 2003


At this point, we should re-read the NIST page:
http://physics.nist.gov/cuu/Units/introduction.html.  One problem we're
getting into is that, in part, we are speaking more than one language,
but using the same words.  Translation is in order.

In NIST-speak, what is "7 micrograms of Carbon"?
quantity in the general sense:  mass
quantity in the particular sense:  mass of carbon-in-the-sample
unit: the microgram
value of the physical quantity: 7 * the microgram

Peter's syntax "<value><units>" is equivalent to NIST
"value-of-the-physical-quantity".
Peter's syntax "<quantity>" is equivalent to NIST "particular
phenomenon, body, or substance".

In NIST-speak, what is "16 counts of antelope"?  
The exercise is doomed, because counting is FUNDAMENTALLY DIFFERENT from
measuring.  Measurement is inherently comparative:  "a unit is a
particular physical quantity, defined and adopted by convention, with
which other particular quantities of the same kind are compared to
express their value."  Counting antelope is not comparative.  It is
simply enumerative.  The number 16 is, in NIST-speak, unitless; in fact,
there is not even a physical quantity.  There is no phenomenon, body, or
substance to which "16 whatever" is ascribed.  You could certainly
recast this as a measure of herd size, and define the unit
"standardAntelope", but that is a research-level decision.  "Counts of"
is equivalent to multiplying by one: it is not really part of the unit.

When, however, we report complex observations or calculations, such as
grams of carbon per gram of soil, or antelope per mountain lion, we are
sorely tempted to embed the physical qualifiers, e.g. soil, antelope. 
To do so is practical, legitimate, and often necessary to prevent naive
manipulation of the observations.  However, soil and antelope function
as DIMENSIONS, not as units.  They can be used effectively in
dimensional analysis both with a cancelation intent (10 square
kilometers * 16 antelope per square kilometer / 5 antelope per mountain
lion  = 32 mountain lions) or to prevent cancelation (5 g-Carbon per 100
g-soil does not equal .05 nothings).  
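
As a rough illustration of the two cases above, here is a minimal Python
sketch in which dimension labels carry integer exponents and cancel only
when the labels match.  The class name Quantity and the label strings
("km2", "antelope", "g-Carbon", ...) are made up for this example; they
are not part of EML or STMML.

```python
from fractions import Fraction


class Quantity:
    """A number tagged with labelled dimensions (label -> integer exponent)."""

    def __init__(self, value, dims=None):
        self.value = Fraction(value)
        # Drop zero exponents so that cancelled labels disappear.
        self.dims = {k: e for k, e in (dims or {}).items() if e != 0}

    def _combine(self, other, sign):
        # Add (multiply) or subtract (divide) exponents label by label.
        dims = dict(self.dims)
        for label, exp in other.dims.items():
            dims[label] = dims.get(label, 0) + sign * exp
        return dims

    def __mul__(self, other):
        return Quantity(self.value * other.value, self._combine(other, +1))

    def __truediv__(self, other):
        return Quantity(self.value / other.value, self._combine(other, -1))


# Cancelation intent: area * density / ratio leaves only mountain lions.
area = Quantity(10, {"km2": 1})
density = Quantity(16, {"antelope": 1, "km2": -1})
ratio = Quantity(5, {"antelope": 1, "mountain_lion": -1})
lions = area * density / ratio

# Preventing cancelation: g-Carbon and g-soil are different labels,
# so the result keeps both and is not a bare 0.05.
carbon = Quantity(5, {"g-Carbon": 1}) / Quantity(100, {"g-soil": 1})
```

Here lions ends up with value 32 and the single label mountain_lion,
while carbon keeps both g-Carbon and g-soil.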

A meaningful distinction between dimensions and units was acknowledged
as lacking from eml's data typology.  I now propose that
1.  the concept "unit" is a subclass of the concept "dimension"
2.  counts have dimensionality but not units, whether presented simply
or in complex with other counts and measurements
3.  units can be standardized because they are by definition
conventional; whereas dimensions cannot be practically standardized, due
to the inexhaustible set of phenomena, bodies, and substances which
qualify as meaningful dimensions
4.  EML should be understood to document units formally (e.g. STMML;
machine processable) and dimensions informally (e.g. supporting
metadata; intervention required for processing).
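
One way to picture points 2 and 4, as a sketch only: the unit is checked
against a fixed dictionary, while the dimension is free text that a
program carries along but does not interpret.  The names Observation and
KNOWN_UNITS, and the contents of the set, are assumptions made for this
example and are not taken from the EML unit dictionary.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for a standardized, machine-processable unit list.
KNOWN_UNITS = {"microgram", "gram", "liter", "squareKilometer", "nominalDay"}


@dataclass(frozen=True)
class Observation:
    value: float
    unit: Optional[str]  # formal: must be a known unit, or None for a count
    dimension: str       # informal: free text naming what was measured/counted

    def __post_init__(self):
        # Units are conventional, so they can be validated mechanically.
        if self.unit is not None and self.unit not in KNOWN_UNITS:
            raise ValueError(f"unknown unit: {self.unit}")

    @property
    def is_count(self):
        # A count has dimensionality (the free text) but no unit.
        return self.unit is None


carbon = Observation(7, "microgram", "mass of carbon in the sample")
antelope = Observation(16, None, "antelope")
```

In this sketch an attempt to use an unlisted unit such as
"standardAntelope" raises ValueError, which mirrors the idea that adding
such a unit is a deliberate, research-level decision.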


-Tim.

Another problem is the case where a datum has a hybrid type involving
count and measurement, e.g.  antelope per square kilometer.  In
NIST-speak, the antelope are simply invisible, and the units are
"perSquareKilometer".

> Peter McCartney wrote:
> 
> I've edited some of Matt's examples (and added some) to make them all
> comparably expressed with the form: <value> <unit>s of <quantity>. I've
> then grouped them into what my naïve perception of categories would
> be, some of which are clearly supported in STMML already. This is
> hardly a serious consideration of the problem, but my point is that we
> are not so far from some common framework for all these things even if
> it means recognizing that STMML only covers a portion of it.
> 
> 
>     counts:
>    16 counts of antelope
>    13 counts of   human blood cells
>    14 counts of  E. coli cells
> 
>    measurements:
>     7 micrograms of Carbon
>     8 liters of water
>   19 square centimeters of ground surface
> 
>     Proportions (similar quantities, similar units):
>     20 counts of hispanic students per 100 counts of students
>     6 dollars of taxed income per 100 dollars of income
> 
>         ratios (similar units, different quantities):
>     1 count of partridge per 1 count of pear tree
>    18 counts of antelope per 1 count of mountain lion
>     15 counts of cells of E. coli per 1 count of cells of human blood
>   11 micrograms of carbon per 1 microgram of Nitrogen
>    12 micrograms of Carbon per 1 microgram of Potassium
> 
>    concentrations/densities (different quantities, different units):
>    10 micrograms of Carbon per 1 liter of ethanol
>     5 counts of palo verde tree per 1 square kilometer of land surface
> 
> 
> 
> 
> 
> Peter McCartney (peter.mccartney at asu.edu)
> Center for Environmental-Studies
> Arizona State University
> 
> 
> -----Original Message-----
> From: Matt Jones [mailto:jones at nceas.ucsb.edu]
> Sent: Wednesday, March 26, 2003 10:46 AM
> To: eml-dev at ecoinformatics.org
> Cc: Henshaw, Don; Spycher, Gody; Bertram Ludaescher
> Subject: semantics of counting and measuring things
> 
> Here's my naive perspective on this set of thorny issues that have risen
> to the surface yet again...
> 
> We put 'nominalDay' into EML as a 'day' unit of constant length just to
> accommodate those people that wanted to refer to time durations without
> reference to a calendar and all of its associated problems.  Thus, a
> nominal day is a unit of time that is exactly 60*60*24 seconds.  And so
> it does not really correspond to the concept of Julian day in any
> meaningful way, in that Julian day is tied to a calendar and cannot be
> unambiguously converted to 'seconds' without consulting a calendar to
> determine how many seconds were in each of those days. That is, each
> Julian day can contain a different number of seconds, while nominalDays
> are constant duration.  Thus, 'nominalDay' is not a count as I think
> Peter was saying.  It would be perfectly legitimate to refer to "1.87
> nominalDay", which isn't really a count.  Counts are usually integral.
> 
> Once you bring up the 'count issue' things get murky quickly. Whether or
> not we are going to "start naming units for all the things we might
> count integer numbers of" is a tough issue. On the one hand, I agree
> that it is silly to do so -- we all know that a count is a count is a
> count.  On the other, sometimes you do need to know exactly what was
> counted.  And the issue extends far outside counts, and includes the
> other types of measurements such as measurements of mass or length.
> Basically, we need to determine what is the relationship between
> measurement unit and the more complex semantics of the substance or
> phenomenon that was measured, and how this gets encoded in EML.
> 
> Here are some example quantities that might be found in an ecological
> data set (albeit some aren't very realistic):
> 
>     1         (just a dimensionless number)
>     2 seconds
>     3 meters
>     4 meters per second
>     5 joules
>     6 grams of soil
>     7 micrograms of Carbon
>     8 liters of water
>     9 micrograms of Carbon per liter of water
>    10 micrograms of Carbon per liter of ethanol
>    11 micrograms of carbon per microgram of Nitrogen
>    12 micrograms of Carbon per microgram of Potassium
>    13 cells of human blood
>    14 cells of E. coli
>    15 cells of E. coli per cell of human blood
>    16 antelope
>    17 mountain lions
>    18 antelope per mountain lion
>    19 square centimeters
>    20 square centimeters of algal turf
>    30 square centimeters of mussels
>    40 square centimeter quadrat
>   0.5 square centimeters of algal turf per square centimeter of quadrat
>   50% areal cover of algal turf in a 40 square centimeter quadrat
>  0.67 square centimeters of algal turf per square centimeter of mussels
> 
> OK, so....maybe you see where I'm going with this.  I could go on with
> the list, but I'll restrain myself :)
> 
> Things like meters are pretty straightforward all by themselves.  Even
> ratios such as meters per second are pretty clear.  For "8 liters of
> water", do we need the "of water" phrase?  Is a "liter of water" a
> different measurement unit from a "liter of ethanol"?  I honestly don't
> know, but I think there is something fundamentally different about them
> (i.e., you can't willy-nilly add them together in an analysis without
> somehow taking into account the difference).  Is "cells of human blood"
> a different measurement unit from "cells of E. coli", or are they both
> just "cells"?  Is "grams of soil" a different measurement unit from
> "grams of Nitrogen", or are they both just "grams"?
> 
> It seems the difficulties crop up when either 1) the counts are of
> different 'things' (e.g., 13, 14, 16, 17), or 2) you try to combine or
> compare measurements or counts on different substances (e.g., 9-12, 15,
> 18).  What is the difference between 4 Carbon atoms, 4 bacterial cells,
> 4 antelope, and 4 mountain lions?  Is the 'measurement unit' the same
> for all of them (i.e., a dimensionless count)? If you take a ratio of
> two of these, is the result dimensionless (e.g., 4 antelope / 2 mountain
> lions = 2) or not (e.g., 4 antelope / 2 mountain lions = 2 antelope per
> mountain lion)?  I think not, but this is just from my gut.  How about
> for more abstract things like grams or moles of atoms (e.g., 10
> micrograms of Carbon / 5 micrograms of Nitrogen = 2, or does it equal 2
> micrograms of Carbon per microgram of Nitrogen)?  How about moles of
> atoms (5 moles Carbon / 10 moles Nitrogen = 0.5 moles, or is it 0.5
> moles of Carbon per mole of Nitrogen)?  How about atoms, which really is
> the same as moles (5 atoms of Carbon / 10 atoms of Nitrogen = 0.5, or is
> it 0.5 Carbon atoms per Nitrogen atom)?
> 
> I think these are very tricky, and I can come up with a lot of tough
> examples.  Percentages, ppm, and ppb come quickly to mind from our
> previous discussions because, on first inspection, the units cancel and
> result in a dimensionless number.  But I'm not sure they truly do.
> 
> The bottom line to me is that it is really impossible to segregate the
> measurement from what is being measured.  Thus, to really understand the
> 'measurement unit' for a quantity, one has to have a typology of what is
> being measured as well as how it is measured. For example, such a
> typology would know that a 'Carbon atom' is a type of 'atom', and that
> both a 'blood cell' and an 'E. coli cell' are types of 'cell'.  This is
> what I think of when I start thinking about 'semantic types' for
> measured values.  Of course, I haven't thought this out enough to figure
> out if 'semantic type' is just an extension of measurement unit or if it
> is something fundamentally orthogonal to it.  I think the SEEK project
> will be tackling these issues head on, so I at least plan on addressing
> some of these thorns in the SEEK Knowledge Representation group, and the
> SEEK Semantic Mediation System.  I think it should be a while before we
> let these considerations have a direct impact on the released version of
> EML, as this is very unstable ground.
> 
> Cheers,
> Matt
> 
> Peter McCartney wrote:
> > I guess the 'nominalDay' unit slipped past my notice.
> >
> > This question is drifting out of the datetime issue and into the count
> > issue which I don't think we ever resolved very well. To my
> > recollection, counts (and percents, etc) were not considered
> > measurements by STMML and are all dimensionless units. I don't think we
> > are going to start naming units for all the things we might count
> > integer numbers of, are we? I would have encoded day of the year as
> > simply ratio with a unit of dimensionless and a domain of 0 to 365.
> >
> > Peter McCartney (peter.mccartney at asu.edu)
> > Center for Environmental-Studies
> > Arizona State University
> >
> >
> >
> > -----Original Message-----
> > From: Matt Jones [mailto:jones at nceas.ucsb.edu]
> > Sent: Wednesday, March 19, 2003 9:58 AM
> > To: eml-dev at ecoinformatics.org
> > Cc: Henshaw, Don; Spycher, Gody
> > Subject: Re: Julian Date format -- interval not dateTime (my thought)
> >
> >
> > I agree.  We created the unit 'nominalDay' precisely for this purpose.
> > It represents an integer number of days.
> >
> > Matt
> >
> > Tim Bergsma wrote:
> >  > Scott,
> >  >
> >  > I was also wondering about "this advice".  I was taught somewhere
> >  > not to confuse Julian Day with day-of-year.  I use day-of-year, but I
> >  > don't really know what Julian Day is, and therefore hesitate to say
> >  > too much. With regard to "saying that something takes 200 Julian
> >  > Days", this is clearly the same concept as eml dictionary unit
> >  > nominalDay.
> >  >
> >  > Tim.
> >  >
> >  > Scott Chapal wrote:
> >  >
> >  >>Was there ever any determination about Julian Day?
> >  >>
> >  >>What are we calling Julian Day in EML any way?
> >  >>
> >  >>YYYYddd or ddd ??
> >  >>
> >  >>David's numbers are what?  dddYYYY?
> >  >>
> >  >>Or does this advice pertain?  "The system of Julian days should not be
> >  >>confused with the simpler system of the same name which associates a
> >  >>date with the number of days elapsed since January 1st of the same
> >  >>year (according to which 2000-12-31 is day 366 of the year 2000)."
> >  >>
> >  >>Because the 'real' Julian Day is used in astronomy to number
> >  >>chronological days...
> >  >>
> >  >>So,
> >  >>
> >  >>ddd     - RATIO
> >  >>YYYYddd - ORDINAL
> >  >>
> >  >>Or just other dateTime formats??
> >  >>
> >  >>-Scott
> >  >>
> >  >>David Blankman <dblankman at lternet.edu> writes:
> >  >>
> >  >>
> >  >>>Don,
> >  >>>
> >  >>>I am not sure what the correct representation of Julian dates would
> >  >>>be. My sense is that the Julian date scale is actually an INTERVAL
> >  >>>scale not a dateTIME scale; arithmetic calculations are consistent,
> >  >>>that is, 2451919 - 2451819 gives the same value as 2351919 - 2351819.
> >  >>>It probably also makes sense to say that something that takes 200
> >  >>>julian days  = 2 * 100 julian days. My first thought was that it was
> >  >>>a ratio scale, but it is more like the Celsius scale than the Kelvin
> >  >>>scale in that the 0 on the julian scale is an arbitrary one.
> >  >>>
> >  >>>
> >  >>>The julian date scale does not suffer from the problems that are
> >  >>>associated with a standard calendar scale, that is, the only unit is
> >  >>>the day and the fraction of a day; there is nothing like Feb 20 - Jan
> >  >>>20 representing a different number of days than Aug 20 - July 20.
> >  >>
> >  >>>I would appreciate eml-dev feedback on that.
> >  >>>
> >  >>>David
> >  >>>
> >  >>>Henshaw, Don wrote:
> >  >>
> >  >>>> On another topic:
> >  >>>
> >  >>>>Can a julian date be represented in the format string for
> >  >>>>measurementScale of datetime
> >  >>>
> >  >>>>i.e., YYYYddd
> >  >>>> Other notes (being rather picky): pertaining to
> >  >>>>eml-unitDictionary.xml (2.0.0)
> >  >>
> >  >>--
> >  >>\SEC
> >  >>_______________________________________________
> >  >>eml-dev mailing list
> >  >>eml-dev at ecoinformatics.org
> >  >>http://www.ecoinformatics.org/mailman/listinfo/eml-dev
> >  >
> >  >
> >

-- 
Tim Bergsma
LTER Information Manager
W.K. Kellogg Biological Station
Michigan State University
Hickory Corners, MI   49060
269/671-2337
tbergsma at kbs.msu.edu
http://lter.kbs.msu.edu


