[kepler-dev] Bad dataset from workshop. Re: wf
Matt Jones
jones at nceas.ucsb.edu
Fri Jan 13 09:10:21 PST 2006
Hi Kevin,
Kevin Ruland wrote:
> I don't understand how there can be a variable number of delimiters
> between tokens. In a delimited file, two adjacent delimiters would be
> interpreted as a "missing value" in that particular column.
In EML, the 'collapseDelimiters' field is used to indicate what to do
when multiple adjacent delimiters are encountered. If
collapseDelimiters is set, then multiple adjacent delimiters are
collapsed to a single delimiter. This is common practice in some types
of delimited files, particularly space delimited files as used by SAS
and other analytical programs. If collapseDelimiters is not set, then
the default behavior is as you describe -- assume each adjacent
delimiter represents a column with a missing value.
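To make the difference concrete, here is a minimal Python sketch of the two
interpretations. It is only an illustration, not Kepler's or any EML
library's parser; the function name split_record and its parameters are
made up for this example.

```python
import re

def split_record(line, delimiter=" ", collapse=False):
    """Split one text record into fields (hypothetical helper).

    collapse=True mimics collapseDelimiters being set: a run of adjacent
    delimiters counts as one. collapse=False is the default behavior:
    each adjacent delimiter marks a column with a missing value.
    """
    if collapse:
        # Treat any run of one or more delimiters as a single separator.
        pattern = "(?:" + re.escape(delimiter) + ")+"
        return re.split(pattern, line)
    # Every delimiter ends a field, so adjacent ones yield empty strings.
    return line.split(delimiter)

row = "12  7.5 A"  # two spaces between the first two tokens
print(split_record(row, collapse=True))   # ['12', '7.5', 'A']
print(split_record(row, collapse=False))  # ['12', '', '7.5', 'A']
```

With collapsing the row has three columns; without it the same row has
four, the second being a missing value.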
The relevant sections of EML for describing this stuff are in one of the
three subtrees:
/eml/dataset/dataTable/physical/dataFormat/textFormat/simpleDelimited
/eml/dataset/dataTable/physical/dataFormat/textFormat/complex/textDelimited
/eml/dataset/dataTable/physical/dataFormat/textFormat/complex/textFixed
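As a rough sketch of how a reader might look the flag up under the first of
those subtrees, the Python below parses a hand-written metadata fragment.
The fragment is a simplified assumption: it omits namespaces and most
required elements, and the "yes" value and the fieldDelimiter element are my
assumptions about how the flag is written. The helper name
collapse_delimiters_set is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free fragment written for this example only.
FRAGMENT = """
<eml>
  <dataset>
    <dataTable>
      <physical>
        <dataFormat>
          <textFormat>
            <simpleDelimited>
              <fieldDelimiter> </fieldDelimiter>
              <collapseDelimiters>yes</collapseDelimiters>
            </simpleDelimited>
          </textFormat>
        </dataFormat>
      </physical>
    </dataTable>
  </dataset>
</eml>
"""

# Path relative to the <eml> root, following the simpleDelimited subtree.
FLAG_PATH = ("dataset/dataTable/physical/dataFormat/textFormat/"
             "simpleDelimited/collapseDelimiters")

def collapse_delimiters_set(eml_root):
    """Return True if the flag is present and reads 'yes' (assumed value)."""
    node = eml_root.find(FLAG_PATH)
    return node is not None and (node.text or "").strip().lower() == "yes"

root = ET.fromstring(FRAGMENT)
print(collapse_delimiters_set(root))
```

If the element is absent, the helper returns False, which matches the
default of treating each adjacent delimiter as a missing value.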
Matt
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Matt Jones Ph: 907-789-0496
jones at nceas.ucsb.edu SIP #: 1-747-626-7082
National Center for Ecological Analysis and Synthesis (NCEAS)
UC Santa Barbara http://www.nceas.ucsb.edu/ecoinformatics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~