[LTER-im] measurmentScale/precision - what definition? how to handle?

Mon Aug 4 09:28:08 PDT 2003

Yes, we get the same thing here. Data contributors are quite adamant that
they do NOT want any digits reported below the significant digit that they
have chosen based on whatever criteria they have used to determine
"resolution". As you point out, that's rather hard to persist when you load
the data into some binary storage format that supports much higher
precision, so we should rely on the precision metadata to suppress those
digits when we output data.

So what's the relationship between precision and accuracy to be in EML? Is
there any difference? According to the EML docs, precision is expressed as
an interval, not as a significant digit. This can be confusing for people who
think only in terms of a number of decimal places. Furthermore, if I have
reported a precision of .025, it's not clear to me whether I am advising the
user to round the data to that interval or telling the user that the data
have already been rounded to that interval.
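
To make the .025 case concrete, here is a minimal sketch of what "rounding
to that interval" would mean if a consumer applied it. The function name
round_to_interval and the half-up rounding rule are assumptions for
illustration only; the EML docs do not prescribe either.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_to_interval(value: str, interval: str) -> Decimal:
    """Round a value to the nearest multiple of an interval.

    Both arguments are decimal strings so that no binary floating-point
    error creeps in. Half-up rounding is an assumption, not an EML rule.
    """
    v = Decimal(value)
    step = Decimal(interval)
    # Count how many whole steps the value spans, then scale back.
    steps = (v / step).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return steps * step

print(round_to_interval("1.2380", "0.025"))  # 1.250
print(round_to_interval("1.2630", "0.025"))  # 1.275
```

Note that the result is a multiple of .025, not a value with a fixed number
of decimal places, which is exactly where the interval reading and the
decimal-places reading of precision part ways.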

Accuracy can be either an interval or a text statement and can be applied to
all types of measurement scales. Unlike accuracy, precision gives no place
to identify the basis for that assessment - presumably this is covered in a
data quality statement in the methods. (Have you ever filled out a methods
statement at the attribute level yet?!) If I have a nominal variable, I
can use accuracy to convey some assessment of the classification accuracy
(e.g. I could say 86% correct based on classification of known cases). If I
have a measurement of concentration of some chemical, then it seems that if
we use precision as we agree most scientists understand it, my value for
precision and accuracy should be the same. Can anyone think of a case where
they would be different?

-----Original Message-----
From: Wade Sheldon [mailto:sheldon at uga.edu]
Sent: Sunday, August 03, 2003 8:15 AM
To: Peter McCartney; dblankman at lternet.edu; Matt Jones
Cc: im at lternet.edu; eml-dev at ecoinformatics.org
Subject: Re: [LTER-im] measurmentScale/precision - what definition? how
to handle?

Peter,

Thanks for the perspective. In response to your point:

"What I'm hearing, however, is the use of precision as a means of conveying
accuracy by stating the interval (or significant digit, depending on your
definition) that spans the perceived error. Implicit in this perspective is
the expectation that the data have been truncated or rounded according to
that precision."

That's correct, and in my experience that process is typically carried out
prior to data submission by investigators or is intrinsic to the data
logging or post-processing routines. The major exception is calculation of
secondary/derived attributes after data submission, and in those cases we
report precision based on the significant digits of the primary attributes
used for the calculation.

--Wade Sheldon

----- Original Message ----- 

From: Peter McCartney <peter.mccartney at asu.edu>
To: 'Wade Sheldon' <sheldon at uga.edu>; dblankman at lternet.edu;
Matt Jones <jones at nceas.ucsb.edu>
Cc: im at lternet.edu; eml-dev at ecoinformatics.org
Sent: Friday, August 01, 2003 1:53 PM
Subject: RE: [LTER-im] measurmentScale/precision - what definition? how
to handle?

My impression is that these debates over precision involve people looking at
essentially the same beast from different perspectives. To clear the record
- I didn't write the precision element, but I did contribute the measurement
accuracy element (from FGDC). My own personal understanding of the
difference between them was that precision merely identified the recorded
resolution of the data ("values represent meters to the nearest 100th"),
corresponding to FGDC 5.1.2.4.2.4, Attribute Resolution, whereas accuracy
reflected some assessment of the likelihood that the reported value
corresponds to the actual value (usually determined through some statistical
test either on the actual data stream or on some calibration data stream -
FGDC 5.1.2.7). What I'm hearing, however, is the use of precision as a means
of conveying accuracy by stating the interval (or significant digit,
depending on your definition) that spans the perceived error. Implicit in
this perspective is the expectation that the data have been truncated or
rounded according to that precision.

In reading the description for the precision element, I can see how Wade
would arrive at the conclusion that this latter description is the intended
use. According to my understanding, precision is merely a qualifier to units
to show the smallest increment in which values are reported, and everything
being debated here should be focused on Accuracy rather than Precision.

On the one hand, it could be seen as pointless to release data to three
decimal places but state that they carry an error of 1.2. On the other hand,
I could see an argument for releasing data as they are and allowing the end
user to make their own adjustments according to the accuracy information
rather than rounding the data in advance.

However we go, it's obvious that we need to rewrite the definition of
precision since, as David points out, it doesn't define the term precision
- is it significant digits or an interval? And does that refer only to the
minimum reported digit or interval, or is it a statement of accuracy?

Peter McCartney (peter.mccartney at asu.edu)
Center for Environmental-Studies
Arizona State University

-----Original Message-----
From: Wade Sheldon [mailto:sheldon at uga.edu] 
Sent: Friday, August 01, 2003 7:11 AM
To: dblankman at lternet.edu; Matt Jones
Cc: im at lternet.edu; eml-dev at ecoinformatics.org
Subject: Re: [LTER-im] measurmentScale/precision - what definition? how to
handle?

David and all,

This is an important point to nail down, because it bears on both
statistical analysis and display of data set values by EML-savvy software
(i.e. when the data are stored in an RDBMS field or program variable using a
single or double-precision floating point storage type that supports
arbitrary scale and precision).

In my experience, most researchers use "precision" to reflect the number of
significant decimal places to display based on the stated or perceived
accuracy of the analytical procedure, or instrument readability if that
information is not known. In other words this is used as a surrogate for
significant digits, which is generally a more accurate way of conveying this
information but poorly supported in most computational software (i.e.
without resorting to scientific notation). 

When I read the EML spec I interpreted "precision" to be what I more
commonly see described as "accuracy", or the smallest difference between two
measurements that can be resolved using the stated analytical method. This
is closely related to the significant digits concept but allows values that
are not exact powers of 10 (e.g. .005).

At GCE we store precision information for all numerical attributes in data
sets as integers indicating the number of significant decimal places to
display (i.e. our approach is most consistent with your mathematics
definition below). This value is based on the accuracy/readability reported
by the investigators on metadata forms, or is determined by instrument
specifications or value inspection if the investigator didn't provide the
information and couldn't be contacted. For data that span many orders of
magnitude (e.g. bacterial abundances ranging from 10^4 to 10^8) we use an
exponential data storage type and report precision as significant digits.
This precision information is used to generate input masks for data editing
forms and output format commands when data sets are exported in ASCII
format. It is also used to (optionally) round or truncate values following
calculations of derived attributes to remove spurious trailing decimal
places. To support EML precision I am just using the inverse power of 10 of
my precision values (i.e. 10^-x, so GCE precision = 2 becomes EML precision
= .01), and software writers will presumably have to reverse this process
(using common logs and rounding) when integer decimal place tokens are
needed for formatted output statement arguments.
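
The round trip described above can be sketched as follows. This is only an
illustration of the 10^-x conversion and its reversal with common logs and
rounding; the function names are hypothetical and are not taken from GCE
software or from any EML library.

```python
import math

def places_to_eml_precision(places: int) -> float:
    """Integer decimal places (GCE style) -> interval (EML style), i.e. 10^-x."""
    return 10.0 ** -places

def eml_precision_to_places(precision: float) -> int:
    """Interval (EML style) -> integer decimal places for output formatting.

    Uses a common log and rounding, as described in the message. For
    intervals that are not powers of 10 (e.g. .005) this rounds to the
    nearest power, so a caller may prefer to round up instead.
    """
    return max(0, round(-math.log10(precision)))

def format_value(value: float, precision: float) -> str:
    """Format a stored float with only the digits the precision allows."""
    places = eml_precision_to_places(precision)
    return f"{value:.{places}f}"

print(places_to_eml_precision(2))     # GCE precision = 2 -> EML precision .01
print(format_value(3.14159, 0.01))    # 3.14
```

The same decimal place count could also drive an input mask; only the
formatted-output use is shown here.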

I am interested to hear other comments on this, but in the absence of
reported precision I think using 0 would be worse than nothing as it could
definitely lead to inappropriate data handling and analysis. I think the
only legitimate way to "fudge" precision in the absence of contributor
feedback is value inspection for flat files (i.e. look up maximum number of
digits past the decimal point) or maximum number of "used" decimal places
for RDBMS entries. It appears to me that precision and units-dictionary
compliance are clearly going to be the make-or-break issues in the decision
to provide attribute-level metadata for legacy data sets, and where the most
effort and resources will be required.
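
For what it is worth, the flat-file value inspection mentioned above could
look something like the sketch below. The function name is hypothetical,
and the approach assumes plain decimal notation (no exponents) and treats
trailing zeros as meaningful, which may or may not match what the
contributor intended.

```python
def inferred_decimal_places(values):
    """Return the maximum number of digits past the decimal point.

    `values` is an iterable of the raw text fields from one column of a
    flat file. Fields without a decimal point count as zero places.
    """
    places = 0
    for text in values:
        text = text.strip()
        if "." in text:
            # Everything after the first decimal point is the fraction.
            places = max(places, len(text.split(".", 1)[1]))
    return places

# The sample column from David's message, unit = meter.
column = ["1.75", "10.6", "11.765"]
print(inferred_decimal_places(column))  # 3, i.e. an EML precision of .001
```

This only recovers the reported resolution; it says nothing about whether
those digits are justified by the method.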

Wade Sheldon
GCE-LTER Information Manager

----- Original Message ----- 

From: David Blankman <dblankman at lternet.edu>
To: Matt Jones <jones at nceas.ucsb.edu>
Cc: im at lternet.edu; eml-dev at ecoinformatics.org
Sent: Thursday, July 31, 2003 9:38 PM
Subject: [LTER-im] measurmentScale/precision - what definition? how to
handle?

Matt & IMs & EML-Dev

How to Handle Missing Precision Information
Most of the metadata files that I have been working with and most of those
from sites like NTL do not have precision information. While XML Spy seems
to validate empty elements, the EML Validator service does a better job and
will not allow empty elements.

Because many, if not most, of the LTER Information Managers have told me
that they need to check with researchers to get precision information, it may
be some time before we are able to get precision information.

Initially I thought that we could handle precision by just using empty
elements but that seems not possible.

It seems to me that we have two alternatives:

1.	Use a precision of "0" to indicate that precision is missing. 

2.	Put in metadata without dataTable. 

Perhaps the problem with precision is that different people are interpreting
precision differently. 

The EML documentation states:
<doc:description>The precision element represents the precision
        of the measurement, in the same unit as the measurement. For
        example, for an attribute with unit "meter", a precision of "0.1"
        would be interpreted as precise to the nearest 1/10th of a
        meter, and a precision of "1" would be interpreted as precise
        to the nearest 1 meter.
</doc:description>

This description does not help since it does not define precision, but
rather assumes that you know what precision means. I remember that we
discussed the definition, but I cannot remember what definition we decided
to use.

Some definitions:
b. The number of significant digits to which a value has been reliably
measured.

precision:
1. The degree of mutual agreement among a series of individual
   measurements, values, or results; often, but not necessarily, expressed
   by the standard deviation.
2. With respect to a set of independent devices of the same design, the
   ability of these devices to produce the same value or result, given the
   same input conditions and operating in the same environment.
3. With respect to a single device, put into operation repeatedly without
   adjustments, the ability to produce the same value or result, given the
   same input conditions and operating in the same environment.
   Synonym (for defs. 1, 2, and 3): reproducibility.
4. In computer science, a measure of the ability to distinguish between
   nearly equal values. (188)
5. The degree of discrimination with which a quantity is stated; for
   example, a three-digit numeral to the base 10 discriminates among 1000
   possibilities.

<mathematics> The number of decimal places to which a number
is computed.

What concept are we trying to capture?

Can the precision be simply a statement of the number of decimal places in
the data, e.g. unit = meter
DATA
1.75
10.6
11.765

Can we say that the precision is .001 without knowing anything about the
source of the data?

Or are we making a statement about the number of significant digits, for
example, a data logger can record 4 digits, e.g.

The following can be recorded:

12.75
127.5
1.275
1275

but NOT 127.53

Is the precision here also .001?
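
A short sketch may help frame the question. It computes, for each of the
four-digit logger readings above, the interval implied by the last recorded
digit. The function name is hypothetical, and the readings are kept as text
so that the recorded digits are not altered by floating-point conversion.

```python
from decimal import Decimal

def last_digit_interval(text: str) -> Decimal:
    """Return the place value of the last recorded digit of a reading."""
    exponent = Decimal(text).as_tuple().exponent
    # 10 ** exponent, computed exactly in decimal arithmetic.
    return Decimal(1).scaleb(exponent)

for reading in ["12.75", "127.5", "1.275", "1275"]:
    print(reading, last_digit_interval(reading))
# 12.75 0.01
# 127.5 0.1
# 1.275 0.001
# 1275 1
```

With a fixed number of significant digits the implied interval changes with
the magnitude of each value, so no single interval describes the whole
column.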

If the data are derived, is the precision dependent on the precision of
the original data? For example, an instrument can only discriminate to .1
meter, but the data involve some statistical operation and are reported
with additional decimal places.

unit = meter

Original Data

12.1
11.5
26.4

Reported/Derived DATA
11.75
10.6
21.765

Is the precision 0.1 or 0.001?
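
As one possible reading of the question, the sketch below takes a mean of
the original data as a stand-in for "some statistical operation" (an
assumption; the actual operation is not stated) and compares the raw result
with the result rounded back to the .1 meter resolution of the instrument.

```python
from decimal import Decimal, ROUND_HALF_UP

# Original data, unit = meter, instrument resolves to .1
original = [Decimal("12.1"), Decimal("11.5"), Decimal("26.4")]

# Hypothetical derived value: the arithmetic mean of the originals.
mean = sum(original) / len(original)

# Round the derived value back to the interval of the source data.
rounded = mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

print(mean)     # many trailing digits, none of them measured
print(rounded)  # 16.7
```

Whether the unrounded or the rounded value should be released, and which
precision should be recorded for it, is the open question.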

David

-- 

David Blankman

EML Integration Developer

LTER Network Office

801 University, SE #104

Albuquerque, NM 87106

(505) 272-7346
