[kepler-dev] Bad dataset from workshop. Re: wf

Matt Jones jones at nceas.ucsb.edu
Fri Jan 13 09:10:21 PST 2006


Hi Kevin,

Kevin Ruland wrote:
> I don't understand how there can be a variable number of delimiters 
> between tokens.  In a delimited file, two adjacent delimiters would be 
> interpreted as a "missing value" in that particular column.  

In EML, the 'collapseDelimiters' field is used to indicate what to do 
when multiple adjacent delimiters are encountered.  If 
collapseDelimiters is set, then multiple adjacent delimiters are 
collapsed to a single delimiter.  This is common practice in some types 
of delimited files, particularly space delimited files as used by SAS 
and other analytical programs.  If collapseDelimiters is not set, then 
the default behavior is as you describe -- assume each adjacent 
delimiter represents a column with a missing value.
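
To make the difference concrete, here is a minimal Python sketch of the two 
interpretations.  The function name split_fields and its collapse_delimiters 
argument are made up for illustration; this is not code from Kepler or from 
any EML library, just a picture of what the setting is meant to convey.

```python
import re

def split_fields(line, delimiter=" ", collapse_delimiters=False):
    """Split one record into fields (hypothetical helper, for illustration only)."""
    if collapse_delimiters:
        # A run of adjacent delimiters counts as a single delimiter.
        return re.split(re.escape(delimiter) + "+", line)
    # Each delimiter ends a field, so adjacent delimiters yield empty
    # fields, which a reader would treat as missing values.
    return line.split(delimiter)

record = "a  b c"  # two spaces between 'a' and 'b'
print(split_fields(record, " ", collapse_delimiters=False))  # ['a', '', 'b', 'c']
print(split_fields(record, " ", collapse_delimiters=True))   # ['a', 'b', 'c']
```

With collapsing off, the record has four columns and the second is missing; 
with collapsing on, it has three columns and nothing is missing.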

The relevant sections of EML for describing this stuff are in one of the 
three subtrees:

/eml/dataset/dataTable/physical/dataFormat/textFormat/simpleDelimited
/eml/dataset/dataTable/physical/dataFormat/textFormat/complex/textDelimited
/eml/dataset/dataTable/physical/dataFormat/textFormat/complex/textFixed

Matt

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Matt Jones                                   Ph: 907-789-0496
jones at nceas.ucsb.edu                    SIP #: 1-747-626-7082
National Center for Ecological Analysis and Synthesis (NCEAS)
UC Santa Barbara     http://www.nceas.ucsb.edu/ecoinformatics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

