[kepler-dev] Bad dataset from workshop. Re: wf

Fri Jan 13 09:40:10 PST 2006

Matt,

Excellent!  We can make use of this.  Of course, the metadata for 
this dataset is incorrect.

Kevin

Matt Jones wrote:

> Hi Kevin,
>
> Kevin Ruland wrote:
>
>> I don't understand how there can be a variable number of delimiters 
>> between tokens.  In a delimited file, two adjacent delimiters would 
>> be interpreted as a "missing value" in that particular column.  
>
>
> In EML, the 'collapseDelimiters' field is used to indicate what to do 
> when multiple adjacent delimiters are encountered.  If 
> collapseDelimiters is set, then multiple adjacent delimiters are 
> collapsed to a single delimiter.  This is common practice in some 
> types of delimited files, particularly space delimited files as used 
> by SAS and other analytical programs.  If collapseDelimiters is not 
> set, then the default behavior is as you describe: assume each 
> adjacent delimiter represents a column with a missing value.
>
> The relevant sections of EML for describing this stuff are in one of 
> the three subtrees:
>
> /eml/dataset/dataTable/physical/dataFormat/textFormat/simpleDelimited
> /eml/dataset/dataTable/physical/dataFormat/textFormat/complex/textDelimited
> /eml/dataset/dataTable/physical/dataFormat/textFormat/complex/textFixed
>
> Matt
>
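For illustration, here is a minimal Python sketch of the two behaviors Matt describes. It assumes a simpleDelimited fragment containing fieldDelimiter and collapseDelimiters elements with "yes"/"no" values. The fragment, the sample record, and the helper names read_delimiter_settings and split_record are hypothetical; they are not taken from the workshop dataset or from any Kepler code.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical EML fragment (illustrative values only): a space delimiter
# with collapseDelimiters turned on, as in SAS-style space-delimited files.
EML_FRAGMENT = """
<simpleDelimited>
  <fieldDelimiter> </fieldDelimiter>
  <collapseDelimiters>yes</collapseDelimiters>
</simpleDelimited>
"""


def read_delimiter_settings(xml_text):
    """Return (delimiter, collapse) read from a simpleDelimited element."""
    root = ET.fromstring(xml_text.strip())
    # ElementTree keeps whitespace-only text, so a single space survives.
    delimiter = root.findtext("fieldDelimiter")
    # Assumption: an absent collapseDelimiters element means "no".
    flag = root.findtext("collapseDelimiters") or "no"
    return delimiter, flag.strip().lower() == "yes"


def split_record(line, delimiter, collapse):
    """Split one record into fields according to the collapse setting."""
    if collapse:
        # A run of adjacent delimiters counts as a single delimiter.
        return re.split(re.escape(delimiter) + "+", line)
    # Default: every delimiter ends a field, so two adjacent delimiters
    # produce an empty string, i.e. a missing value in that column.
    return line.split(delimiter)


if __name__ == "__main__":
    delimiter, collapse = read_delimiter_settings(EML_FRAGMENT)
    record = "1.5   2.0 3.7"
    print(split_record(record, delimiter, collapse))  # ['1.5', '2.0', '3.7']
    print(split_record(record, delimiter, False))     # ['1.5', '', '', '2.0', '3.7']
```

The same record yields three fields when delimiters are collapsed and five (two of them missing) when they are not, which is why incorrect metadata for this setting changes how the file parses. In this sketch a leading or trailing delimiter still produces an empty edge field even when collapsing.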