[kepler-dev] [Bug 2712] New: - Problem with EML2DataSource with extra cols in csv file

Sun Dec 31 13:27:11 PST 2006

http://bugzilla.ecoinformatics.org/show_bug.cgi?id=2712

           Summary: Problem with EML2DataSource with extra cols in csv file
           Product: Kepler
           Version: 1.0.0beta2
          Platform: Other
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: actors
        AssignedTo: tao at nceas.ucsb.edu
        ReportedBy: higgins at nceas.ucsb.edu
         QAContact: kepler-dev at ecoinformatics.org

This problem can be seen by searching for 'biomass'. The first result is "1999
Sevilleta NPP Quadrat Sampling Data". Drag it onto the canvas and configure Data
Output Format to return 'As Column Vector'. If you make an SDF workflow and try
to send any of the columns to a Display actor, the following message appears:

"Metadata sees data has 12columns but actually data has 13columns. Please make
sure metadata is correct!"

Most of the rows in the table do have only 12 comma-separated columns. However,
there are a few rows that have some additional comma-separated comments AFTER
the 12th column value! This apparently causes a parsing failure.

I suggest that the parser be modified to ignore any additional data beyond
the last column. This would allow additional comments to the right of the actual
data columns. Note that R does this when it parses dataframes. Morpho also will
display the data in this dataset without 'choking' on additional data off to the
right in some rows.  (Dan Higgins)
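
The suggested behavior can be illustrated with a small sketch. This is not
the EML2DataSource parser; it is a standalone Python illustration, and the
function name read_rows_truncated and the sample data are hypothetical. It
assumes the expected column count comes from the metadata, keeps only that
many fields per row, and still reports rows that have too few fields.

```python
import csv
import io


def read_rows_truncated(text, expected_cols):
    """Parse CSV text, keeping only the first expected_cols fields per row.

    Hypothetical helper: extra trailing fields (e.g. free-form comments)
    are dropped; rows with too few fields still raise an error.
    """
    rows = []
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) < expected_cols:
            raise ValueError(
                "line %d has %d columns, expected at least %d"
                % (line_no, len(row), expected_cols)
            )
        rows.append(row[:expected_cols])
    return rows


# Made-up sample: metadata describes 3 columns, one row carries extra comments.
sample = "site,plot,biomass\nA,1,12.5\nB,2,9.1,remeasured,see notes\n"
rows = read_rows_truncated(sample, 3)
```

Truncating rather than rejecting keeps the data columns aligned with the
metadata while tolerating trailing comments. Whether dropped fields should
also be logged as a warning is left open here.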