[LTER-im] measurmentScale/precision - what definition? how to handle?

Sun Aug 3 08:15:12 PDT 2003

Peter,

Thanks for the perspective. In response to your point:

"What I'm hearing, however, is the use of precision as a means of conveying accuracy by stating the interval (or significant digit, depending on your definition) that spans the perceived error. Implicit in this perspective is the expectation that the data have been truncated or rounded according to that precision."

That's correct, and in my experience that process is typically carried out prior to data submission by investigators or is intrinsic to the data logging or post-processing routines. The major exception is calculation of secondary/derived attributes after data submission, and in those cases we report precision based on the significant digits of the primary attributes used for the calculation.

--Wade Sheldon

----- Original Message ----- 
  From: Peter McCartney 
  To: 'Wade Sheldon' ; dblankman at lternet.edu ; Matt Jones 
  Cc: im at lternet.edu ; eml-dev at ecoinformatics.org 
  Sent: Friday, August 01, 2003 1:53 PM
  Subject: RE: [LTER-im] measurmentScale/precision - what definition? how to handle?

  My impression is that these debates over precision involve people looking at essentially the same beast from different perspectives. To clear the record: I didn't write the precision element, but I did contribute the measurement accuracy element (from FGDC). My own personal understanding of the difference between them was that precision merely identified the recorded resolution of the data ("values represent meters to the nearest 100th"), corresponding to FGDC 5.1.2.4.2.4, Attribute Resolution, whereas accuracy reflected some assessment of the likelihood that the reported value corresponds to the actual value (usually determined through some statistical test either on the actual data stream or on some calibration data stream; FGDC 5.1.2.7). What I'm hearing, however, is the use of precision as a means of conveying accuracy by stating the interval (or significant digit, depending on your definition) that spans the perceived error. Implicit in this perspective is the expectation that the data have been truncated or rounded according to that precision.

  In reading the description for the precision element, I can see how Wade would arrive at the conclusion that this latter description is the intended use. According to my understanding, precision is merely a qualifier to units to show the lowest increment at which values are reported, and everything being debated here should be focused on Accuracy rather than Precision.

  On the one hand, it could be seen as pointless to release data to three decimal places but state that they carry an error of 1.2. On the other hand, I could see an argument for releasing data as they are and allowing the end user to make their own adjustments according to the accuracy information rather than rounding the data in advance.

  However we go, it's obvious that we need to rewrite the definition of precision since, as David points out, it doesn't define the term precision. Is it significant digits or an interval? And does that refer only to the minimum reported digit or interval, or is it a statement of accuracy?

  Peter McCartney (peter.mccartney at asu.edu)
  Center for Environmental-Studies
  Arizona State University

    -----Original Message-----
    From: Wade Sheldon [mailto:sheldon at uga.edu] 
    Sent: Friday, August 01, 2003 7:11 AM
    To: dblankman at lternet.edu; Matt Jones
    Cc: im at lternet.edu; eml-dev at ecoinformatics.org
    Subject: Re: [LTER-im] measurmentScale/precision - what definition? how to handle?

    David and all,

    This is an important point to nail down, because it has a bearing on both statistical analysis and display of data set values by eml-savvy software (i.e. when the data are stored in an RDBMS field or program variable using a single or double-precision floating point storage type that supports arbitrary scale and precision).

    In my experience, most researchers use "precision" to reflect the number of significant decimal places to display based on the stated or perceived accuracy of the analytical procedure, or instrument readability if that information is not known. In other words this is used as a surrogate for significant digits, which is generally a more accurate way of conveying this information but poorly supported in most computational software (i.e. without resorting to scientific notation). 

    When I read the eml spec I interpreted "precision" to be what I more commonly see described as "accuracy", or the smallest difference between two measurements that can be resolved using the stated analytical method. This is closely related to the significant digits concept but allows values that are not even powers of 10 (e.g. .005).

    At GCE we store precision information for all numerical attributes in data sets as integers indicating the number of significant decimal places to display (i.e. our approach is most consistent with your mathematics definition below). This value is based on the accuracy/readability reported by the investigators on metadata forms, or is determined by instrument specifications or value inspection if the investigator didn't provide the information and couldn't be contacted. For data that span many orders of magnitude (e.g. bacterial abundances ranging from 10^4 to 10^8) we use an exponential data storage type and report precision as significant digits. This precision information is used to generate input masks for data editing forms and output format commands when data sets are exported in ASCII format. It is also used to (optionally) round or truncate values following calculations of derived attributes to remove spurious trailing decimal places. To support eml precision I am just using the inverse power of 10 of my precision values (i.e. 10^-x, so GCE precision = 2 becomes eml precision = .01), and software writers will presumably have to reverse this process (using common logs and rounding) when integer decimal place tokens are needed for formatted output statement arguments.
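
The round trip described in the paragraph above (decimal places to 10^-x, and back via a common log and rounding) might look roughly like the following Python sketch. The function names are hypothetical and are not taken from GCE software or from any EML tool. Note that for a precision that is not a power of 10, such as .005, rounding the log gives 2 places while a ceiling would give 3; which is appropriate depends on the definition of precision being debated in this thread.

```python
import math


def places_to_eml_precision(places: int) -> float:
    """Convert a count of decimal places (e.g. 2) to an interval (e.g. 0.01)."""
    return 10.0 ** -places


def eml_precision_to_places(precision: float) -> int:
    """Reverse the conversion using a common log and rounding.

    Assumes precision is a positive number; values of 1 or more map to 0 places.
    """
    if precision <= 0:
        raise ValueError("precision must be positive")
    return max(0, round(-math.log10(precision)))


def format_value(value: float, precision: float) -> str:
    """Format a value using the decimal-place count derived from the precision."""
    places = eml_precision_to_places(precision)
    return f"{value:.{places}f}"
```

For example, `format_value(12.3456, 0.01)` formats the value with two decimal places.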

    I am interested to hear other comments on this, but in the absence of reported precision I think using 0 would be worse than nothing as it could definitely lead to inappropriate data handling and analysis. I think the only legitimate way to "fudge" precision in the absence of contributor feedback is value inspection for flat files (i.e. look up maximum number of digits past the decimal point) or maximum number of "used" decimal places for RDBMS entries. It appears to me that precision and units-dictionary compliance are clearly going to be the make-or-break issues in the decision to provide attribute-level metadata for legacy data sets, and where the most effort and resources will be required.
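
As a rough illustration of the value-inspection fallback for flat files, the sketch below scans the text of each value and takes the maximum number of digits past the decimal point. It assumes every entry is a finite number written as text (missing-value codes would have to be filtered out first), and the function names are hypothetical. The sample values are the ones from David's original message.

```python
from decimal import Decimal


def decimal_places(text: str) -> int:
    """Count digits past the decimal point, keeping trailing zeros as written."""
    exponent = Decimal(text.strip()).as_tuple().exponent
    return max(0, -exponent)


def inferred_precision(values) -> Decimal:
    """Return 10^-n, where n is the largest decimal-place count seen."""
    places = max((decimal_places(v) for v in values), default=0)
    return Decimal(10) ** -places


column = ["1.75", "10.6", "11.765"]
precision = inferred_precision(column)  # interval implied by the longest value
```

This only describes how the values were written; as the thread notes, it says nothing about how reliably they were measured.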

    Wade Sheldon
    GCE-LTER Information Manager

    ----- Original Message ----- 
      From: David Blankman 
      To: Matt Jones 
      Cc: im at lternet.edu ; eml-dev at ecoinformatics.org 
      Sent: Thursday, July 31, 2003 9:38 PM
      Subject: [LTER-im] measurmentScale/precision - what definition? how to handle?

      Matt & IMs & EML-Dev

      How to Handle Missing Precision Information
      Most of the metadata files that I have been working with and most of those from sites like NTL do not have precision information. While XML Spy seems to validate empty elements, the EML Validator service does a better job and will not allow empty elements.

      Because many, if not most, of the LTER Information Managers have told me that they need to check with researchers to get precision information, it may be some time before we are able to get precision information.

      Initially I thought that we could handle precision by just using empty elements but that seems not possible.

      It seems to me that we have two alternatives:

        1. Use a precision of "0" to indicate that precision is missing.
        2. Put in metadata without dataTable.

      Perhaps the problem with precision is that different people are interpreting precision differently.

      The eml documentation states: 
      <doc:description>The precision element represents the precision
              of the measurement, in the same unit as the measurement. For
              example, for an attribute with unit "meter", a precision of "0.1"
              would be interpreted as precise to the nearest 1/10th of a
              meter, and a precision of "1" would be interpreted as precise
              to the nearest 1 meter.
      </doc:description>

      This description does not help since it does not define precision, but rather assumes that you know what precision means.  I remember that we discussed the definition, but I cannot remember what definition we decided to use.

      Some definitions:
      b. The number of significant digits to which a value has been reliably measured.

      precision: 1. The degree of mutual agreement among a series of individual measurements, values, or results; often, but not necessarily, expressed by the standard deviation. 2. With respect to a set of independent devices of the same design, the ability of these devices to produce the same value or result, given the same input conditions and operating in the same environment. 3. With respect to a single device, put into operation repeatedly without adjustments, the ability to produce the same value or result, given the same input conditions and operating in the same environment. Synonym (for defs. 1, 2, and 3) reproducibility. 4. In computer science, a measure of the ability to distinguish between nearly equal values. (188) 5. The degree of discrimination with which a quantity is stated; for example, a three-digit numeral to the base 10 discriminates among 1000 possibilities. 

      <mathematics> The number of decimal places to which a number
      is computed.

      What concept are we trying to capture?

      Can the precision be simply a statement of the number of decimal places in the data, e.g. unit = meter
      DATA
      1.75
      10.6
      11.765

      Can we say that the precision is .001 without knowing anything about the source of the data?

      Or are we making a statement about the number of significant digits, for example, a data logger can record 4 digits, e.g.

      The following can be recorded:

      12.75
      127.5
      1.275
      1275

      but NOT 127.53

      Is the precision here also .001?
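
The difference between the two readings can be made concrete with a short Python sketch. It counts significant digits and decimal places separately for the data-logger values listed above. The helper names are hypothetical, and the digit count is a simplification: it treats every written digit after any leading zeros as significant, so trailing zeros in a value like "1200" are counted.

```python
from decimal import Decimal


def significant_digits(text: str) -> int:
    """Number of digits in the coefficient of the value as written."""
    return len(Decimal(text).as_tuple().digits)


def decimal_places(text: str) -> int:
    """Number of digits written past the decimal point."""
    return max(0, -Decimal(text).as_tuple().exponent)


readings = ["12.75", "127.5", "1.275", "1275"]
# Maps each reading to (significant digits, decimal places).
summary = {r: (significant_digits(r), decimal_places(r)) for r in readings}
```

Every reading has four significant digits, but the decimal-place count ranges from 0 to 3, so a single interval such as .001 does not describe the whole column.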

      If the data is derived data, is the precision dependent on the precision of the original data, e.g. an instrument can only discriminate to .1 meter, but the data involves some statistical operation and the data is reported with additional decimal places?

      unit = meter

      Original Data

      12.1
      11.5
      26.4

      Reported/Derived DATA
      11.75
      10.6
      21.765

      Is the precision 0.1 or 0.001?
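
One possible answer, following the practice Wade describes of basing the precision of derived attributes on the primary attributes, is to round the derived value back to the source precision. The Python sketch below does this for a mean of the original data above. The mean is only an assumed example of a statistical operation (the message does not say how the derived values were computed), and the function name is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_source_precision(value: Decimal, source_precision: Decimal) -> Decimal:
    """Round a derived value to the interval of the primary measurements."""
    return value.quantize(source_precision, rounding=ROUND_HALF_UP)


original = [Decimal("12.1"), Decimal("11.5"), Decimal("26.4")]
mean = sum(original) / len(original)  # carries many spurious decimal places
reported = round_to_source_precision(mean, Decimal("0.1"))
```

The alternative raised in Peter's reply is to leave the derived value unrounded and state the accuracy separately, letting the end user decide.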

      David

-- 
David Blankman
EML Integration Developer
LTER Network Office
801 University, SE #104
Albuquerque, NM 87106
(505) 272-7346