[LTER-im] measurmentScale/precision - what definition? how to handle?

Mon Aug 4 07:47:42 PDT 2003

Peter, others,

regarding the previous comment...

> However we go, it's obvious that we need to rewrite the definition of
> precision since, as David points out, it doesn't define the term
> precision. Is it significant digits or an interval? And does that
> refer only to the minimum reported digit or interval, or is it a
> statement of accuracy?

I agree.  We need to rewrite the definition.  It is interval, not
significant digits, because significant digits is just a special case of
interval, and we need to stay general.  It is not a statement of
accuracy OR statistically qualified "precision", because there are no
universally received definitions for these things, and no way to attach
a custom definition to the precision element.
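
To illustrate why significant digits reduce to an interval: for any one
reported value, n significant digits implies an interval of one unit in
the last significant place.  A minimal Python sketch, assuming values
arrive as decimal strings (the function name interval_from_sig_digits is
made up for illustration and is not part of EML):

```python
from decimal import Decimal


def interval_from_sig_digits(value: str, sig_digits: int) -> Decimal:
    """Interval implied by reporting `value` to `sig_digits` significant digits."""
    number = Decimal(value)
    if number == 0 or sig_digits < 1:
        raise ValueError("need a non-zero value and at least one significant digit")
    # adjusted() is the power of ten of the leading digit (2 for 127.5).
    return Decimal(1).scaleb(number.adjusted() - sig_digits + 1)


# David's four-digit data logger: the implied interval changes with magnitude.
for reading in ["12.75", "127.5", "1.275", "1275"]:
    print(reading, interval_from_sig_digits(reading, 4))
```

Note that the interval varies with magnitude, so a fixed count of
significant digits maps to a single precision value only when the values
in a column share an order of magnitude.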

Tim

P.S.  This doesn't solve Barbara's two-fish-scales problem.  Alas!  I
fear it is unsolvable.  Perhaps she should just report the worst of the
two relevant precisions, or split the table.  

> 
> 
> 
> Peter McCartney (peter.mccartney at asu.edu)
> Center for Environmental Studies
> Arizona State University
> 
> 
>      -----Original Message-----
>      From: Wade Sheldon [mailto:sheldon at uga.edu]
>      Sent: Friday, August 01, 2003 7:11 AM
>      To: dblankman at lternet.edu; Matt Jones
>      Cc: im at lternet.edu; eml-dev at ecoinformatics.org
>      Subject: Re: [LTER-im] measurmentScale/precision - what
>      definition? how to handle?
> 
>      David and all,
> 
>      This is an important point to nail down, because it
>      bears on both statistical analysis and display of data
>      set values by eml-savvy software (i.e. when the data are
>      stored in an RDBMS field or program variable using a single
>      or double-precision floating point storage type that
>      supports arbitrary scale and precision).
> 
>      In my experience, most researchers use "precision" to
>      reflect the number of significant decimal places to display
>      based on the stated or perceived accuracy of the analytical
>      procedure, or instrument readability if that information is
>      not known. In other words this is used as a surrogate
>      for significant digits, which is generally a more accurate
>      way of conveying this information but poorly supported in
>      most computational software (i.e. without resorting to
>      scientific notation).
> 
>      When I read the eml spec I interpreted "precision" to
>      be what I more commonly see described as "accuracy", or the
>      smallest difference between two measurements that can be
>      resolved using the stated analytical method. This is closely
>      related to the significant digits concept but allows values
>      that are not even powers of 10 (e.g. .005).
> 
>      At GCE we store precision information for all numerical
>      attributes in data sets as integers indicating the number of
>      significant decimal places to display (i.e. our approach is
>      most consistent with your mathematics definition below).
>      This value is based on the accuracy/readability reported by
>      the investigators on metadata forms, or is determined by
>      instrument specifications or value inspection if the
>      investigator didn't provide the information and couldn't be
>      contacted. For data that span many orders of magnitude (e.g.
>      bacterial abundances ranging from 10^4 to 10^8) we use an
>      exponential data storage type and report precision as
>      significant digits. This precision information is used to
>      generate input masks for data editing forms and output
>      format commands when data sets are exported in ASCII format.
>      It is also used to (optionally) round or truncate values
>      following calculations of derived attributes to remove
>      spurious trailing decimal places. To support eml precision I
>      am just using the inverse power of 10 of my precision values
>      (i.e. 10^-x, so GCE precision = 2 becomes eml precision =
>      .01), and software writers will presumably have to reverse
>      this process (using common logs and rounding) when integer
>      decimal place tokens are needed for formatted output
>      statement arguments.
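
The round trip Wade describes (decimal places to 10^-x, then back via
common log and rounding) could look something like the following in
Python.  This is only a sketch: the function names are hypothetical, and
it assumes the precision arrives as a positive decimal string.

```python
import math
from decimal import Decimal


def places_to_eml_precision(places: int) -> Decimal:
    """GCE-style decimal-place count -> interval, e.g. 2 -> 0.01."""
    return Decimal(1).scaleb(-places)


def eml_precision_to_places(precision: str) -> int:
    """Interval -> decimal-place count via common log and rounding."""
    interval = float(precision)
    if interval <= 0:
        raise ValueError("precision must be a positive interval")
    # Rounding absorbs floating-point error in the log; intervals of 1 or
    # coarser need no decimal places, hence the floor at zero.
    return max(0, round(-math.log10(interval)))


def format_value(value: float, precision: str) -> str:
    """Fixed-point output using the recovered decimal-place count."""
    return f"{value:.{eml_precision_to_places(precision)}f}"


print(places_to_eml_precision(2))
print(format_value(3.14159, "0.01"))
```

For intervals that are not powers of 10 (e.g. .005), rounding the log
gives 2 places where taking the ceiling would give 3; which one is
appropriate depends on what the formatted output is meant to convey.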
> 
>      I am interested to hear other comments on this, but in the
>      absence of reported precision I think using 0 would be worse
>      than nothing as it could definitely lead to inappropriate
>      data handling and analysis. I think the only legitimate way
>      to "fudge" precision in the absence of contributor feedback
>      is value inspection for flat files (i.e. look up maximum
>      number of digits past the decimal point) or maximum number
>      of "used" decimal places for RDBMS entries. It appears to me
>      that precision and units-dictionary compliance are clearly
>      going to be the make-or-break issues in the decision to
>      provide attribute-level metadata for legacy data sets, and
>      where the most effort and resources will be required.
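
The value-inspection fallback for flat files might be sketched as below.
It assumes the values are read as text (so trailing zeros survive) and
are plain finite numerals; the function name is hypothetical.  The result
says only how the data were written, not how they were measured.

```python
from decimal import Decimal


def inferred_precision(values: list[str]) -> Decimal:
    """Finest interval implied by the digits actually written in `values`."""
    if not values:
        raise ValueError("no values to inspect")
    max_places = 0
    for text in values:
        # The exponent of a parsed decimal string is minus its count of
        # digits after the decimal point ("11.765" -> -3).
        exponent = Decimal(text.strip()).as_tuple().exponent
        max_places = max(max_places, -exponent)
    return Decimal(1).scaleb(-max_places)


# David's sample column, unit = meter.
print(inferred_precision(["1.75", "10.6", "11.765"]))
```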
> 
>      Wade Sheldon
>      GCE-LTER Information Manager
> 
> 
>      ----- Original Message -----
> 
>           From: David Blankman
>           To: Matt Jones
>           Cc: im at lternet.edu ; eml-dev at ecoinformatics.org
>           Sent: Thursday, July 31, 2003 9:38 PM
>           Subject: [LTER-im] measurmentScale/precision -
>           what definition? how to handle?
> 
>           Matt & IMs & EML-Dev
> 
>           How to Handle Missing Precision Information
>           Most of the metadata files that I have been
>           working with and most of those from sites like NTL
>           do not have precision information. While XML Spy
>           seems to validate empty elements, the EML
>           Validator service does a better job and will not
>           allow empty elements.
> 
>           Because many, if not most, of the LTER Information
>           Managers have told me that they need to check with
>           researchers to get precision information, it may be
>           some time before we are able to get precision
>           information.
> 
>           Initially I thought that we could handle precision
>           by just using empty elements, but that does not seem
>           possible.
> 
>           It seems to me that we have two alternatives:
> 
>             1. Use a precision of "0" to indicate that
>                precision is missing.
>             2. Put in metadata without dataTable.
> 
>           Perhaps the problem with precision is that
>           different people are interpreting precision
>           differently.
> 
>           The eml documentation states:
>           <doc:description>The precision element represents the
>           precision of the measurement, in the same unit as the
>           measurement. For example, for an attribute with unit
>           "meter", a precision of "0.1" would be interpreted as
>           precise to the nearest 1/10th of a meter, and a
>           precision of "1" would be interpreted as precise to
>           the nearest 1 meter.
>           </doc:description>
> 
>           This description does not help since it does not
>           define precision, but rather assumes that you
>           know what precision means.  I remember that we
>           discussed the definition, but I cannot remember
>           what definition we decided to use.
> 
>           Some definitions:
>           b. The number of significant digits to which a
>           value has been reliably measured.
> 
>           precision: 1. The degree of mutual agreement among
>           a series of individual measurements, values, or
>           results; often, but not necessarily, expressed by
>           the standard deviation. 2. With respect to a set
>           of independent devices of the same design, the
>           ability of these devices to produce the same value
>           or result, given the same input conditions and
>           operating in the same environment. 3. With respect
>           to a single device, put into operation repeatedly
>           without adjustments, the ability to produce the
>           same value or result, given the same input
>           conditions and operating in the same environment.
>           Synonym (for defs. 1, 2, and 3) reproducibility.
>           4. In computer science, a measure of the ability
>           to distinguish between nearly equal values. (188)
>           5. The degree of discrimination with which a
>           quantity is stated; for example, a three-digit
>           numeral to the base 10 discriminates among 1000
>           possibilities.
> 
>           <mathematics> The number of decimal places to
>           which a number is computed.
> 
>           What concept are we trying to capture?
> 
>           Can the precision be simply a statement of the
>           number of decimal places in the data? For example,
>           with unit = meter:
>           DATA
>           1.75
>           10.6
>           11.765
> 
>           Can we say that the precision is .001 without
>           knowing anything about the source of the data?
> 
>           Or are we making a statement about the number of
>           significant digits, for example, a data logger can
>           record 4 digits, e.g.
> 
>           The following can be recorded:
> 
>           12.75
>           127.5
>           1.275
>           1275
> 
>           but NOT 127.53
> 
>           Is the precision here also .001?
> 
>           If the data is derived data, is the precision
>           dependent on the precision of the original data?
>           For example, an instrument can only discriminate
>           to .1 meter, but the data involves some statistical
>           operation and is reported with additional
>           decimal places.
> 
>           unit = meter
> 
>           Original Data
> 
>           12.1
>           11.5
>           26.4
> 
>           Reported/Derived DATA
>           11.75
>           10.6
>           21.765
> 
>           Is the precision 0.1 or 0.001?
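
The thread does not settle this.  If one takes the view that derived
values should not claim a finer interval than their inputs (an
assumption, not an agreed rule), rounding back to the source interval
could be sketched like this in Python, with a hypothetical helper name
and the mean standing in for "some statistical operation":

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_interval(value: Decimal, interval: str) -> Decimal:
    """Round `value` to a power-of-ten `interval` such as "0.1"."""
    return value.quantize(Decimal(interval), rounding=ROUND_HALF_UP)


originals = [Decimal("12.1"), Decimal("11.5"), Decimal("26.4")]
mean = sum(originals) / len(originals)

print(mean)                            # many spurious decimal places
print(round_to_interval(mean, "0.1"))  # reported at the source interval
```

Under the other view, the reported column would simply carry its own,
finer precision value and the rounding step would be skipped.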
> 
>           David
> 
>           --
>           David Blankman
>           EML Integration Developer
>           LTER Network Office
>           801 University, SE #104
>           Albuquerque, NM 87106
>           (505) 272-7346

-- 
Tim Bergsma
LTER Information Manager
W.K. Kellogg Biological Station
Michigan State University
Hickory Corners, MI   49060
269/671-2337
tbergsma at kbs.msu.edu
http://lter.kbs.msu.edu