[kepler-dev] Bad dataset from workshop. Re: wf
Kevin Ruland
kruland at ku.edu
Fri Jan 13 09:40:10 PST 2006
Matt,
Excellent! We can reliably leverage this. Of course, the metadata for
this dataset is incorrect.
Kevin
Matt Jones wrote:
> Hi Kevin,
>
> Kevin Ruland wrote:
>
>> I don't understand how there can be a variable number of delimiters
>> between tokens. In a delimited file, two adjacent delimiters would
>> be interpreted as a "missing value" in that particular column.
>
>
> In EML, the 'collapseDelimiters' field is used to indicate what to do
> when multiple adjacent delimiters are encountered. If
> collapseDelimiters is set, then multiple adjacent delimiters are
> collapsed to a single delimiter. This is common practice in some
> types of delimited files, particularly space delimited files as used
> by SAS and other analytical programs. If collapseDelimiters is not
> set, then the default behavior is as you describe -- assume each
> adjacent delimiter represents a column with a missing value.
>
> The relevant sections of EML for describing this stuff are in one of
> the three subtrees:
>
> /eml/dataset/dataTable/physical/dataFormat/textFormat/simpleDelimited
> /eml/dataset/dataTable/physical/dataFormat/textFormat/complex/textDelimited
> /eml/dataset/dataTable/physical/dataFormat/textFormat/complex/textFixed
>
> Matt
>
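
To make the two behaviours in the quoted message concrete, here is a minimal Python sketch of splitting one record with and without collapsing adjacent delimiters. The helper name split_record, its parameters, and the sample record are hypothetical and chosen only for illustration; this is not Kepler or EML library code, and a real reader would also have to handle quoting and literal characters.

```python
import re


def split_record(line, delimiter=" ", collapse_delimiters=False):
    """Split one record of a delimited text file into fields.

    Hypothetical helper for illustration only; not part of Kepler or EML.
    """
    if collapse_delimiters:
        # Treat a run of adjacent delimiters as a single delimiter.
        pattern = "(?:" + re.escape(delimiter) + ")+"
        return re.split(pattern, line)
    # Default: every delimiter ends a field, so adjacent delimiters
    # yield empty strings, which stand for missing values.
    return line.split(delimiter)


# Made-up space-delimited record with runs of 2 and 3 spaces.
record = "12  7.5   site_a"
collapsed = split_record(record, " ", collapse_delimiters=True)
strict = split_record(record, " ", collapse_delimiters=False)
print(collapsed)  # ['12', '7.5', 'site_a']
print(strict)     # ['12', '', '7.5', '', '', 'site_a']
```

With collapsing on, the record has three columns; with it off, the same line is read as six columns, three of them missing. That difference is why the metadata has to state which convention the file uses.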