[kepler-dev] Bad dataset from workshop. Re: wf

Kevin Ruland kruland at ku.edu
Fri Jan 13 09:40:10 PST 2006


Excellent!  We can reliably leverage this.  Of course, the metadata for 
this dataset is incorrect.


Matt Jones wrote:

> Hi Kevin,
> Kevin Ruland wrote:
>> I don't understand how there can be a variable number of delimiters 
>> between tokens.  In a delimited file, two adjacent delimiters would 
>> be interpreted as a "missing value" in that particular column.  
> In EML, the 'collapseDelimiters' field is used to indicate what to do 
> when multiple adjacent delimiters are encountered.  If 
> collapseDelimiters is set, then multiple adjacent delimiters are 
> collapsed to a single delimiter.  This is common practice in some 
> types of delimited files, particularly space delimited files as used 
> by SAS and other analytical programs.  If collapseDelimiters is not 
> set, then the default behavior is as you describe -- assume each 
> adjacent delimiter represents a column with a missing value.
> The relevant sections of EML for describing this stuff are in one of 
> the three subtrees:
> /eml/dataset/dataTable/physical/dataFormat/textFormat/simpleDelimited
> /eml/dataset/dataTable/physical/dataFormat/textFormat/complex/textDelimited 
> /eml/dataset/dataTable/physical/dataFormat/textFormat/complex/textFixed
> Matt
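
For illustration, the two behaviors Matt describes can be sketched in a few
lines of Python. This is only a minimal sketch, not code from Kepler or from
any EML library: the function name split_fields and its parameters are
hypothetical, and it assumes a single literal delimiter string with no
quoting or escape handling.

```python
import re


def split_fields(line, delimiter=" ", collapse_delimiters=False):
    """Split one line of a delimited text file into fields.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    if collapse_delimiters:
        # Treat any run of adjacent delimiters as a single delimiter.
        pattern = "(?:" + re.escape(delimiter) + ")+"
        return re.split(pattern, line)
    # Default: every delimiter ends a field, so two adjacent
    # delimiters yield an empty field (a missing value).
    return line.split(delimiter)


print(split_fields("a  b c", " ", collapse_delimiters=True))
print(split_fields("a  b c", " ", collapse_delimiters=False))
```

With collapsing enabled, the doubled space in "a  b c" counts as one
separator and the line yields three fields. With it disabled, the same line
yields four fields, the second of which is empty and would be read as a
missing value.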

