[LTER-im] measurmentScale/precision - what definition? how to handle?

Wade Sheldon sheldon at uga.edu
Fri Aug 1 07:11:27 PDT 2003


David and all,

This is an important point to nail down, because it bears on both statistical analysis and the display of data set values by EML-savvy software (i.e. when the data are stored in an RDBMS field or program variable using a single- or double-precision floating-point storage type that supports arbitrary scale and precision).

In my experience, most researchers use "precision" to reflect the number of significant decimal places to display based on the stated or perceived accuracy of the analytical procedure, or instrument readability if that information is not known. In other words, precision is used as a surrogate for significant digits, which are generally a more accurate way of conveying this information but are poorly supported in most computational software (i.e. without resorting to scientific notation).

When I read the EML spec I interpreted "precision" to be what I more commonly see described as "accuracy", or the smallest difference between two measurements that can be resolved using the stated analytical method. This is closely related to the significant digits concept but allows values that are not integer powers of 10 (e.g. .005).

At GCE we store precision information for all numerical attributes in data sets as integers indicating the number of significant decimal places to display (i.e. our approach is most consistent with your mathematics definition below). This value is based on the accuracy/readability reported by the investigators on metadata forms, or is determined by instrument specifications or value inspection if the investigator didn't provide the information and couldn't be contacted. For data that span many orders of magnitude (e.g. bacterial abundances ranging from 10^4 to 10^8) we use an exponential data storage type and report precision as significant digits. This precision information is used to generate input masks for data editing forms and output format commands when data sets are exported in ASCII format. It is also used to (optionally) round or truncate values following calculations of derived attributes to remove spurious trailing decimal places. To support EML precision I am just using the inverse power of 10 of my precision values (i.e. 10^-x, so GCE precision = 2 becomes EML precision = .01), and software writers will presumably have to reverse this process (using common logs and rounding) when integer decimal place tokens are needed for formatted output statement arguments.
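
The round trip described above can be sketched roughly as follows. This is only an illustration in Python, not the GCE code: the function names are hypothetical, and rounding the log result up (so that a precision such as .005 maps to 3 decimal places) is an assumption on my part.

```python
import math
from decimal import Decimal

def places_to_eml_precision(places: int) -> str:
    """Integer decimal places -> precision string, i.e. 10^-places."""
    # Decimal avoids float artifacts when the value is written as text.
    return format(Decimal(1).scaleb(-places), "f")

def eml_precision_to_places(precision: str) -> int:
    """Precision string -> integer decimal places via the common log."""
    exponent = -math.log10(float(precision))
    # Round first to absorb floating-point noise, then round up so that
    # values that are not integer powers of 10 (e.g. 0.005) keep the
    # digits needed to display them. This choice is an assumption.
    return max(0, math.ceil(round(exponent, 9)))

places = eml_precision_to_places("0.01")
formatted = f"{3.14159:.{places}f}"  # decimal-place token in a format spec
```

The integer returned by the second function is the sort of token that could be dropped into a formatted output statement, as in the last line.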

I am interested to hear other comments on this, but in the absence of reported precision I think using 0 would be worse than nothing as it could definitely lead to inappropriate data handling and analysis. I think the only legitimate way to "fudge" precision in the absence of contributor feedback is value inspection for flat files (i.e. look up maximum number of digits past the decimal point) or maximum number of "used" decimal places for RDBMS entries. It appears to me that precision and units-dictionary compliance are clearly going to be the make-or-break issues in the decision to provide attribute-level metadata for legacy data sets, and where the most effort and resources will be required.
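
As a rough sketch of the value-inspection fallback mentioned above, assuming the values are available as plain fixed-point text (no exponential notation), something like the following Python would do. The function name and the ignore_trailing_zeros option are hypothetical; the option is meant to approximate counting "used" decimal places when a database pads values with zeros.

```python
def max_decimal_places(values, ignore_trailing_zeros=False):
    """Return the largest number of digits past the decimal point.

    Assumes each value is a fixed-point string such as "11.765".
    """
    most = 0
    for text in values:
        # Everything after the first "." is the fractional part
        # (an empty string when there is no decimal point).
        _, _, fraction = text.strip().partition(".")
        if ignore_trailing_zeros:
            fraction = fraction.rstrip("0")
        most = max(most, len(fraction))
    return most

flat_file_places = max_decimal_places(["1.75", "10.6", "11.765"])
padded_places = max_decimal_places(["2.500", "3.10"], ignore_trailing_zeros=True)
```

Note that this only describes how the values were written, not how they were measured, so the result is still a stand-in for precision reported by the investigator.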

Wade Sheldon
GCE-LTER Information Manager


----- Original Message ----- 
  From: David Blankman 
  To: Matt Jones 
  Cc: im at lternet.edu ; eml-dev at ecoinformatics.org 
  Sent: Thursday, July 31, 2003 9:38 PM
  Subject: [LTER-im] measurmentScale/precision - what definition? how to handle?


  Matt & IMs & EML-Dev


  How to Handle Missing Precision Information
  Most of the metadata files that I have been working with and most of those from sites like NTL do not have precision information. While XML Spy seems to validate empty elements, the EML Validator service does a better job and will not allow empty elements.

  Because many, if not most, of the LTER Information Managers have told me that they need to check with researchers to get precision information, it may be some time before we are able to get precision information.

  Initially I thought that we could handle precision by just using empty elements but that seems not possible.

  It seems to me that we have two alternatives:

    1. Use a precision of "0" to indicate that precision is missing.
    2. Put in metadata without dataTable.

  Perhaps the problem with precision is that different people are interpreting precision differently.

  The eml documentation states: 
  <doc:description>The precision element represents the precision
          of the measurement, in the same unit as the measurement. For
          example, for an attribute with unit "meter", a precision of "0.1"
          would be interpreted as precise to the nearest 1/10th of a
          meter, and a precision of "1" would be interpreted as precise
          to the nearest 1 meter.
  </doc:description>

  This description does not help since it does not define precision, but rather assumes that you know what precision means.  I remember that we discussed the definition, but I cannot remember what definition we decided to use.

  Some definitions:
  b. The number of significant digits to which a value has been reliably measured.


  precision: 1. The degree of mutual agreement among a series of individual measurements, values, or results; often, but not necessarily, expressed by the standard deviation. 2. With respect to a set of independent devices of the same design, the ability of these devices to produce the same value or result, given the same input conditions and operating in the same environment. 3. With respect to a single device, put into operation repeatedly without adjustments, the ability to produce the same value or result, given the same input conditions and operating in the same environment. Synonym (for defs. 1, 2, and 3) reproducibility. 4. In computer science, a measure of the ability to distinguish between nearly equal values. (188) 5. The degree of discrimination with which a quantity is stated; for example, a three-digit numeral to the base 10 discriminates among 1000 possibilities. 

  <mathematics> The number of decimal places to which a number
  is computed.


  What concept are we trying to capture?


  Can the precision be simply a statement of the number of decimal places in the data, e.g. unit = meter
  DATA
  1.75
  10.6
  11.765


  Can we say that the precision is .001 without knowing anything about the source of the data?


  Or are we making a statement about the number of significant digits, for example, a data logger can record 4 digits, e.g.


  The following can be recorded:


  12.75
  127.5
  1.275
  1275


  but NOT 127.53


  Is the precision here also .001?
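
One way to look at the data logger case is to compute the smallest step each reading can resolve when the number of significant digits is fixed. The Python below is a minimal sketch under that assumption; the function name is hypothetical.

```python
import math

def resolution_for_sig_digits(value: float, sig_digits: int) -> float:
    """Smallest resolvable step at this value's magnitude.

    Assumes a nonzero value recorded with a fixed number of
    significant digits.
    """
    # Position of the leading digit: 0 for 1.275, 1 for 12.75, and so on.
    magnitude = math.floor(math.log10(abs(value)))
    return 10.0 ** (magnitude - sig_digits + 1)

readings = [12.75, 127.5, 1.275, 1275]
steps = [resolution_for_sig_digits(r, 4) for r in readings]
```

For the four readings above the steps come out as 0.01, 0.1, 0.001 and 1 respectively, so under this reading a single precision value such as .001 would describe only the smallest reading.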


  If the data is derived data, is the precision dependent on the precision of the original data, e.g. an instrument can only discriminate to .1 meter, but the data involves some statistical operation and the data is reported with additional decimal places?


  unit = meter

  Original Data


  12.1
  11.5
  26.4


  Reported/Derived DATA
  11.75
  10.6
  21.765

  Is the precision 0.1 or 0.001?


  David





-- 
David Blankman
EML Integration Developer
LTER Network Office
801 University, SE #104
Albuquerque, NM 87106
(505) 272-7346