[kepler-dev] Bad dataset from workshop. Re: wf

Kevin Ruland kruland at ku.edu
Fri Jan 13 07:44:14 PST 2006


Chad, et al.

I looked at this.  It appears the problem is the metadata for this
dataset does not accurately describe its structure.

The metadata states:

            <physical scope="document">
                <objectName>w6-pcp.txt</objectName>
                <characterEncoding>ASCII</characterEncoding>
                <dataFormat>
                    <textFormat>
                        <recordDelimiter>\n\r</recordDelimiter>
                        <attributeOrientation>column</attributeOrientation>
                        <simpleDelimited>
                            <fieldDelimiter>0x20</fieldDelimiter>
                        </simpleDelimited>
                    </textFormat>
                </dataFormat>
                <distribution scope="document">
                    <online>
                        <url
function="download">http://www.hubbardbrook.org/research/data/atmos/pcp_chem/w6-pcp.txt</url>
                    </online>
                </distribution>
            </physical>

Note the following:

The recordDelimiter is \n\r.
The fieldDelimiter is 0x20 (space).
It has no physical/dataFormat/textFormat/numHeaderLines element,
which we interpret as meaning there are no header records.
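A minimal sketch of how a reader might interpret those three settings, using Python's standard xml.etree.ElementTree. The decoding rules are assumptions about the consumer, not a description of Kepler's code: the element text `\n\r` (four literal characters in the XML) is treated as escape sequences, `0x20` as a hexadecimal character code, and a missing numHeaderLines as zero.

```python
import xml.etree.ElementTree as ET

# The <physical> element quoted above, trimmed to its textFormat.
# A raw string keeps \n\r as the literal characters found in the XML.
PHYSICAL = r"""
<physical scope="document">
    <objectName>w6-pcp.txt</objectName>
    <dataFormat>
        <textFormat>
            <recordDelimiter>\n\r</recordDelimiter>
            <attributeOrientation>column</attributeOrientation>
            <simpleDelimited>
                <fieldDelimiter>0x20</fieldDelimiter>
            </simpleDelimited>
        </textFormat>
    </dataFormat>
</physical>
"""

def text_format(physical_xml):
    """Return (record delimiter, field delimiter, header line count)."""
    tf = ET.fromstring(physical_xml.strip()).find("dataFormat/textFormat")
    # The element text is backslash-n backslash-r; decode the escapes (assumption).
    raw = tf.findtext("recordDelimiter")
    record = raw.encode("ascii").decode("unicode_escape")
    # "0x20" is read as a hexadecimal character code, i.e. a space (assumption).
    field = chr(int(tf.findtext("simpleDelimited/fieldDelimiter"), 16))
    # An absent numHeaderLines element is read as zero header lines.
    headers = int(tf.findtext("numHeaderLines", default="0"))
    return record, field, headers

record, field, headers = text_format(PHYSICAL)
print(repr(record), repr(field), headers)  # '\n\r' ' ' 0
```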

Here are the first two records of the dataset:

    ws  year  mo    precip      Ca      Mg       K      Na      Al     NH4      pH     SO4     NO3      Cl     PO4    Sio2
     6  1963   6    67.500   0.300   0.070   0.100   0.070  -3.000  -3.000   -3.00  -3.000  -3.000  -3.000 -3.0000  -3.000

And its hexdump:

0000000: 2020 2020 7773 2020 7965 6172 2020 6d6f      ws  year  mo
0000010: 2020 2020 7072 6563 6970 2020 2020 2020      precip     
0000020: 4361 2020 2020 2020 4d67 2020 2020 2020  Ca      Mg     
0000030: 204b 2020 2020 2020 4e61 2020 2020 2020   K      Na     
0000040: 416c 2020 2020 204e 4834 2020 2020 2020  Al     NH4     
0000050: 7048 2020 2020 2053 4f34 2020 2020 204e  pH     SO4     N
0000060: 4f33 2020 2020 2020 436c 2020 2020 2050  O3      Cl     P
0000070: 4f34 2020 2020 5369 6f32 2020 2020 2020  O4    Sio2     
0000080: 2020 2020 2020 2020 2020 2020 2020 2020                 
0000090: 2020 2020 0909 0920 2020 2009 0a20 2020      ...    ..  
00000a0: 2020 3620 2031 3936 3320 2020 3620 2020    6  1963   6  
00000b0: 2036 372e 3530 3020 2020 302e 3330 3020   67.500   0.300
00000c0: 2020 302e 3037 3020 2020 302e 3130 3020    0.070   0.100
00000d0: 2020 302e 3037 3020 202d 332e 3030 3020    0.070  -3.000
00000e0: 202d 332e 3030 3020 2020 2d33 2e30 3020   -3.000   -3.00
00000f0: 202d 332e 3030 3020 202d 332e 3030 3020   -3.000  -3.000
0000100: 202d 332e 3030 3020 2d33 2e30 3030 3020   -3.000 -3.0000
0000110: 202d 332e 3030 300a 2020 2020 2036 2020   -3.000.     6 

The top record is the header.  The metadata did not state it had a header.

Note there are 4 spaces before 'ws', then 2 spaces before 'year'.  The
record also ends with a run of whitespace padding (spaces, 0x20, and
tabs, 0x09).  Also, the line terminator for this record is 0x0a (nl),
whereas the metadata states the line terminator is \n\r, which would be
0x0a 0x0d.

The second record is the first record of data.  Its structure is: 5
spaces, then '6', then 2 spaces, then '1963', and so on.
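The column layout can be measured the same way. This sketch uses the header and first data record transcribed from the hexdump (the header's trailing padding dropped) and records where each non-blank token ends. In this sample the header names and the values are right-justified to the same end columns, which suggests a fixed-width layout; whether every record in the file follows it is an assumption still to be checked.

```python
import re

# Header and first data record of w6-pcp.txt, transcribed from the hexdump
# above; the header's trailing spaces and tabs are left out.
HEADER = ("    ws  year  mo    precip      Ca      Mg       K      Na      Al"
          "     NH4      pH     SO4     NO3      Cl     PO4    Sio2")
RECORD = ("     6  1963   6    67.500   0.300   0.070   0.100   0.070  -3.000"
          "  -3.000   -3.00  -3.000  -3.000  -3.000 -3.0000  -3.000")

def end_columns(line):
    """0-based, exclusive end column of every run of non-blank characters."""
    return [m.end() for m in re.finditer(r"\S+", line)]

def widths(ends):
    """Turn end columns into field widths, starting from column 0."""
    return [b - a for a, b in zip([0] + ends, ends)]

ends = end_columns(RECORD)
print(ends == end_columns(HEADER))  # True
print(widths(ends))  # [6, 6, 4, 10, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8]
```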

We are using HSQL's Text Table functionality.  This essentially binds a
DDL definition in the database to an external data file (the text
file).  The functionality provided by HSQL is not flexible enough to
handle regular expressions for the column separators (which would be
required in this instance).  It also cannot parse a fixed-format text
file (which this particular file appears to be).
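To illustrate why a single-character separator is not enough here, this sketch (plain Python, not HSQL) splits the first data record on one 0x20, as the metadata describes, and then with a regular expression that treats any run of whitespace as one separator. The record string is transcribed from the hexdump.

```python
import re

# First data record of w6-pcp.txt, transcribed from the hexdump above.
RECORD = ("     6  1963   6    67.500   0.300   0.070   0.100   0.070  -3.000"
          "  -3.000   -3.00  -3.000  -3.000  -3.000 -3.0000  -3.000")

# One 0x20 per field boundary, as <fieldDelimiter>0x20</fieldDelimiter> says:
# every extra padding space yields an empty field.
naive = RECORD.split(" ")

# A run of whitespace counts as one boundary; strip() drops the leading
# padding so the first field is not empty.
fields = re.split(r"\s+", RECORD.strip())

print(len(naive), naive.count(""))
print(len(fields), fields[:3])  # 16 ['6', '1963', '6']
```

The regex split recovers the 16 columns named in the header, while the single-space split produces far more fields, most of them empty.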

Essentially there are two problems with this dataset:  The metadata does
not adequately describe the data format, and the HSQL functionality
does not support this type of text file structure.

I'm not exactly certain what we can do here.  I don't know if EML can
describe a fixed-format file.  And even if it did, we'd have to do some
work to parse the fixed-format file and insert it into the database table.
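A minimal sketch of that "parse the fixed format file and insert into the database table" step. It uses Python's built-in sqlite3 purely as a stand-in for HSQL (an assumption; this is not how Kepler loads data), takes column names and STRING/FLOAT types from the workflow's schemaDef, and uses widths measured on the single data record in the hexdump, which is an assumption about the whole file.

```python
import sqlite3

# Names and types follow the workflow's schemaDef; the widths were measured
# on one data record and are an assumption about the whole file.
COLUMNS = ["ws", "year", "mo", "precip", "Ca", "Mg", "K", "Na", "Al",
           "NH4", "pH", "SO4", "NO3", "Cl", "PO4", "SiO2"]
WIDTHS = [6, 6, 4, 10] + [8] * 12
TEXT_COLUMNS = {"ws", "year", "mo"}  # dataType STRING in schemaDef

def parse_fixed(line):
    """Slice one record into stripped fields by column width."""
    fields, start = [], 0
    for width in WIDTHS:
        fields.append(line[start:start + width].strip())
        start += width
    return fields

def load(text, conn, header_lines=1):
    """Create the table and insert every non-blank record after the header."""
    cols = ", ".join(
        f'"{c}" {"TEXT" if c in TEXT_COLUMNS else "REAL"}' for c in COLUMNS)
    conn.execute(f'CREATE TABLE "w6-pcp" ({cols})')
    marks = ", ".join(["?"] * len(COLUMNS))
    for line in text.split("\n")[header_lines:]:
        if not line.strip():
            continue  # e.g. the empty string after the final newline
        row = [v if c in TEXT_COLUMNS else float(v)
               for c, v in zip(COLUMNS, parse_fixed(line))]
        conn.execute(f'INSERT INTO "w6-pcp" VALUES ({marks})', row)
    conn.commit()

# Header plus first data record, transcribed from the hexdump above.
SAMPLE = (
    "    ws  year  mo    precip      Ca      Mg       K      Na      Al"
    "     NH4      pH     SO4     NO3      Cl     PO4    Sio2\n"
    "     6  1963   6    67.500   0.300   0.070   0.100   0.070  -3.000"
    "  -3.000   -3.00  -3.000  -3.000  -3.000 -3.0000  -3.000\n"
)

conn = sqlite3.connect(":memory:")
load(SAMPLE, conn)
print(conn.execute('SELECT ws, year, mo, precip, pH FROM "w6-pcp"').fetchall())
# [('6', '1963', '6', 67.5, -3.0)]
```

Slicing by width rather than splitting on whitespace keeps working even if a value were ever blank, at the cost of depending on widths that EML metadata would need to state; a record shorter than the assumed widths would make float() fail on an empty field.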

Kevin

Chad Berkley wrote:

><?xml version="1.0" standalone="no"?>
><!DOCTYPE entity PUBLIC "-//UC Berkeley//DTD MoML 1//EN"
>    "http://ptolemy.eecs.berkeley.edu/xml/dtd/MoML_1.dtd">
><entity name="model" class="ptolemy.actor.TypedCompositeActor">
>    <property name="_createdBy" class="ptolemy.kernel.attributes.VersionAttribute" value="5.1-alpha">
>    </property>
>    <property name="SDF Director" class="ptolemy.domains.sdf.kernel.SDFDirector">
>        <property name="_svgIcon" class="ptolemy.kernel.util.ConfigurableAttribute">
>            <configure>../kepler-docs/dev/usability/graphics/svg/director.svg</configure>
>        </property>
>        <property name="_thumbnailRasterIcon" class="ptolemy.kernel.util.ConfigurableAttribute">
>            <configure>/actorthumbs/director-sm.gif</configure>
>        </property>
>        <property name="timeResolution" class="ptolemy.moml.SharedParameter" value="1E-10">
>        </property>
>        <property name="entityId" class="org.kepler.moml.NamedObjId" value="urn:lsid:kepler-project.org:director:1:1">
>        </property>
>        <property name="class" class="ptolemy.kernel.util.StringAttribute" value="ptolemy.domains.sdf.kernel.SDFDirector">
>            <property name="id" class="ptolemy.kernel.util.StringAttribute" value="urn:lsid:kepler-project.org:directorclass:1:1">
>            </property>
>        </property>
>        <property name="semanticType000" class="org.kepler.sms.SemanticType" value="urn:lsid:localhost:onto:1:1#Director">
>        </property>
>        <property name="_location" class="ptolemy.kernel.util.Location" value="{160, 110}">
>        </property>
>    </property>
>    <property name="_windowProperties" class="ptolemy.actor.gui.WindowPropertiesAttribute" value="{bounds={82, 22, 850, 732}, maximized=false}">
>    </property>
>    <property name="_vergilSize" class="ptolemy.actor.gui.SizeAttribute" value="[590, 610]">
>    </property>
>    <property name="_vergilZoomFactor" class="ptolemy.data.expr.ExpertParameter" value="1.0">
>    </property>
>    <property name="_vergilCenter" class="ptolemy.data.expr.ExpertParameter" value="{295.0, 305.0}">
>    </property>
>    <entity name="Display" class="ptolemy.actor.lib.gui.Display">
>        <property name="_svgIcon" class="ptolemy.kernel.util.ConfigurableAttribute">
>            <configure>../kepler-docs/dev/usability/graphics/svg/text_disp.svg</configure>
>        </property>
>        <property name="_thumbnailRasterIcon" class="ptolemy.kernel.util.ConfigurableAttribute">
>            <configure>/actorthumbs/text_disp-sm.gif</configure>
>        </property>
>        <property name="rowsDisplayed" class="ptolemy.data.expr.Parameter" value="10">
>        </property>
>        <property name="columnsDisplayed" class="ptolemy.data.expr.Parameter" value="40">
>        </property>
>        <property name="suppressBlankLines" class="ptolemy.data.expr.Parameter" value="false">
>        </property>
>        <property name="_windowProperties" class="ptolemy.actor.gui.WindowPropertiesAttribute" value="{bounds={270, 279, 484, 209}, maximized=false}">
>        </property>
>        <property name="entityId" class="org.kepler.moml.NamedObjId" value="urn:lsid:kepler-project.org:actor:7:1">
>        </property>
>        <property name="class" class="ptolemy.kernel.util.StringAttribute" value="ptolemy.actor.lib.gui.Display">
>            <property name="id" class="ptolemy.kernel.util.StringAttribute" value="urn:lsid:kepler-project.org:class:883:1">
>            </property>
>        </property>
>        <property name="semanticType000" class="org.kepler.sms.SemanticType" value="urn:lsid:localhost:onto:1:1#TextualOutputActor">
>        </property>
>        <property name="_location" class="ptolemy.kernel.util.Location" value="{310, 345}">
>        </property>
>    </entity>
>    <entity name="Chemistry of Bulk Precipitation at HBEF WS-2" class="org.ecoinformatics.seek.datasource.eml.eml2.Eml200DataSource">
>        <property name="_svgIcon" class="ptolemy.kernel.util.ConfigurableAttribute">
>            <configure>../kepler-docs/dev/usability/graphics/svg/dataFile.svg</configure>
>        </property>
>        <property name="_thumbnailRasterIcon" class="ptolemy.kernel.util.ConfigurableAttribute">
>            <configure>/actorthumbs/dataFile-sm.gif</configure>
>        </property>
>        <property name="schemaDef" class="ptolemy.kernel.util.StringAttribute" value="&lt;schema&gt;&#10;  &lt;table name=&quot;w6-pcp&quot;&gt;&#10;    &lt;field name=&quot;ws&quot; dataType=&quot;STRING&quot;/&gt;&#10;    &lt;field name=&quot;year&quot; dataType=&quot;STRING&quot;/&gt;&#10;    &lt;field name=&quot;mo&quot; dataType=&quot;STRING&quot;/&gt;&#10;    &lt;field name=&quot;precip&quot; dataType=&quot;FLOAT&quot;/&gt;&#10;    &lt;field name=&quot;Ca&quot; dataType=&quot;FLOAT&quot;/&gt;&#10;    &lt;field name=&quot;Mg&quot; dataType=&quot;FLOAT&quot;/&gt;&#10;    &lt;field name=&quot;K&quot; dataType=&quot;FLOAT&quot;/&gt;&#10;    &lt;field name=&quot;Na&quot; dataType=&quot;FLOAT&quot;/&gt;&#10;    &lt;field name=&quot;Al&quot; dataType=&quot;FLOAT&quot;/&gt;&#10;    &lt;field name=&quot;NH4&quot; dataType=&quot;FLOAT&quot;/&gt;&#10;    &lt;field name=&quot;pH&quot; dataType=&quot;FLOAT&quot;/&gt;&#10;    &lt;field name=&quot;SO4&quot; dataType=&quot;FLOAT&quot;/&gt;&#10;    &lt;field name=&quot;NO3&quot; dataType=&quot;FLOAT&quot;/&gt;&#10;    &lt;field name=&quot;Cl&quot; dataType=&quot;FLOAT&quot;/&gt;&#10;    &lt;field name=&quot;PO4&quot; dataType=&quot;FLOAT&quot;/&gt;&#10;    &lt;field name=&quot;SiO2&quot; dataType=&quot;FLOAT&quot;/&gt;&#10;  &lt;/table&gt;&#10;&lt;/schema&gt;&#10;">
>            <property name="schemaDef" class="ptolemy.actor.gui.style.TextStyle">
>                <property name="height" class="ptolemy.data.expr.Parameter" value="10">
>                </property>
>                <property name="width" class="ptolemy.data.expr.Parameter" value="30">
>                </property>
>            </property>
>        </property>
>        <property name="sqlDef" class="ptolemy.kernel.util.StringAttribute">
>            <property name="sqlDef" class="ptolemy.actor.gui.style.TextStyle">
>                <property name="height" class="ptolemy.data.expr.Parameter" value="10">
>                </property>
>                <property name="width" class="ptolemy.data.expr.Parameter" value="30">
>                </property>
>            </property>
>        </property>
>        <property name="Selected Entity" class="ptolemy.data.expr.StringParameter" value="w6-pcp">
>        </property>
>        <property name="outputType" class="ptolemy.data.expr.StringParameter" value="As Field">
>        </property>
>        <property name="_tableauFactory" class="org.kepler.objectmanager.data.db.QBTableauFactory">
>            <property name="sqlName" class="ptolemy.kernel.util.StringAttribute" value="sqlDef">
>            </property>
>            <property name="schemaName" class="ptolemy.kernel.util.StringAttribute" value="schemaDef">
>            </property>
>        </property>
>        <property name="recordid" class="ptolemy.kernel.util.StringAttribute" value="knb-lter-hbr.20.1">
>        </property>
>        <property name="endpoint" class="ptolemy.kernel.util.StringAttribute" value="http://ecogrid.ecoinformatics.org/knb/services/EcoGridQuery">
>        </property>
>        <property name="namespace" class="ptolemy.kernel.util.StringAttribute" value="eml://ecoinformatics.org/eml-2.0.0">
>        </property>
>        <property name="w6-pcp" class="org.ecoinformatics.seek.ecogrid.ResultRecordDetail">
>        </property>
>        <property name="_location" class="ptolemy.kernel.util.Location" value="[140.0, 285.0]">
>        </property>
>        <port name="ws" class="ptolemy.actor.TypedIOPort">
>            <property name="output"/>
>        </port>
>        <port name="year" class="ptolemy.actor.TypedIOPort">
>            <property name="output"/>
>        </port>
>        <port name="mo" class="ptolemy.actor.TypedIOPort">
>            <property name="output"/>
>        </port>
>        <port name="precip" class="ptolemy.actor.TypedIOPort">
>            <property name="output"/>
>        </port>
>        <port name="Ca" class="ptolemy.actor.TypedIOPort">
>            <property name="output"/>
>        </port>
>        <port name="Mg" class="ptolemy.actor.TypedIOPort">
>            <property name="output"/>
>        </port>
>        <port name="K" class="ptolemy.actor.TypedIOPort">
>            <property name="output"/>
>        </port>
>        <port name="Na" class="ptolemy.actor.TypedIOPort">
>            <property name="output"/>
>        </port>
>        <port name="Al" class="ptolemy.actor.TypedIOPort">
>            <property name="output"/>
>        </port>
>        <port name="NH4" class="ptolemy.actor.TypedIOPort">
>            <property name="output"/>
>        </port>
>        <port name="pH" class="ptolemy.actor.TypedIOPort">
>            <property name="output"/>
>        </port>
>        <port name="SO4" class="ptolemy.actor.TypedIOPort">
>            <property name="output"/>
>        </port>
>        <port name="NO3" class="ptolemy.actor.TypedIOPort">
>            <property name="output"/>
>        </port>
>        <port name="Cl" class="ptolemy.actor.TypedIOPort">
>            <property name="output"/>
>        </port>
>        <port name="PO4" class="ptolemy.actor.TypedIOPort">
>            <property name="output"/>
>        </port>
>        <port name="SiO2" class="ptolemy.actor.TypedIOPort">
>            <property name="output"/>
>        </port>
>    </entity>
>    <relation name="relation" class="ptolemy.actor.TypedIORelation">
>        <property name="width" class="ptolemy.data.expr.Parameter" value="1">
>        </property>
>    </relation>
>    <link port="Display.input" relation="relation"/>
>    <link port="Chemistry of Bulk Precipitation at HBEF WS-2.pH" relation="relation"/>
></entity>
>  
>


