[kepler-dev] [Bug 2050] - EMLDataSource output as Ptolemy records
Shawn Bowers
sbowers at ucdavis.edu
Thu Apr 7 16:16:17 PDT 2005
Matt Jones wrote:
> Shawn,
>
> The EML actor supports only flat views of a data source -- if the source
> has multiple tables, then they have to be joined using a SQL query in
> the QueryBuilder to select the attributes to emit on the ports and the
> join condition. Without this it's not clear how you would output
> attributes from different tables. In the case where there is only one
> table in the dataset, the default query exposes all attributes.
It's obvious: have two structured ports ;-)
> I agree the tuple-at-a-time output is desirable in some situations, but
> I also think a vector of all values for a given attribute will also be
> useful at times. Rather than having to plumb these up as additional
> processing steps, we have been building in multiple output options to
> the EML actor, which I think is easier for the user to manipulate,
> although we need to do some work on the UI for switching between output
> modes to make it clear what is happening.
Also, you need to verify that the two choices result in equivalent
semantics in terms of the execution of the workflow. (I'm not convinced
yet, but haven't looked deeply into the directors, etc.)
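
As a rough illustration of why the two output modes may not be interchangeable, the sketch below emits the same table both ways. It is plain Python, not Ptolemy or Kepler code; the function names and sample rows are assumptions made up for this example, and it assumes every row carries the same attributes. The data round-trips between the two forms, but the number of tokens a downstream actor would see differs, which is the kind of thing a director's scheduling could be sensitive to.

```python
from typing import Dict, List

Row = Dict[str, object]


def as_records(rows: List[Row]) -> List[Row]:
    """Tuple-at-a-time mode: one record token per row on a single port."""
    return [dict(row) for row in rows]


def as_columns(rows: List[Row]) -> Dict[str, list]:
    """Vector mode: one array token per attribute, one port per attribute."""
    columns: Dict[str, list] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def columns_to_records(columns: Dict[str, list]) -> List[Row]:
    """Rebuild records by zipping the attribute vectors back together."""
    names = list(columns)
    return [dict(zip(names, values))
            for values in zip(*(columns[name] for name in names))]


# Hypothetical sample data, including a duplicate row.
rows = [
    {"site": "A", "temp": 12.5},
    {"site": "B", "temp": 14.0},
    {"site": "A", "temp": 12.5},
]

records = as_records(rows)
columns = as_columns(rows)

record_tokens = len(records)   # one token per row
column_tokens = len(columns)   # one token per attribute
```

The vectors only recombine into the right tuples because position i in every column belongs to row i; that positional dependency between ports is implicit in the column mode and explicit in the record mode.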
shawn
> Matt
>
> Shawn Bowers wrote:
>
>> Dan Higgins wrote:
>>
>>> Shawn,
>>> I am not sure that it is 'incorrect' to have one port per
>>> attribute, but I do see your point. I think you are just recommending
>>> the second option I suggested (the record/column array)? Options
>>> for either seem useful to me. The column based ports allow the
>>> selection of specific columns from a table when all the columns are
>>> not needed.
>>
>> Sorry, I didn't really understand what was meant by the record/column
>> array option listed in the bug ... It wasn't clear to me what that
>> was; I didn't realize you meant that a tuple is output on a single port.
>>
>> It seems like there are potentially other problems with one port per
>> attribute. For example, what if the EML file contains multiple data
>> sets? I also assume that when one selects just one port p, the
>> operation is 'select p from r' as opposed to 'select distinct p from
>> r' (the latter is probably more desirable). This brings up another
>> question: does the approach limit (or semantically change) the types
>> of queries that can be posed on the data emanating from the actor?
>> (For example, can one still do self-joins, aggregates, group-by's,
>> havings, and so on?)
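
To make the 'select p from r' versus 'select distinct p from r' distinction concrete, here is a minimal sketch using Python's sqlite3 module. The table r, its columns p and q, and the sample rows are made up for illustration and are not taken from the EML actor or any real data set.

```python
import sqlite3

# In-memory database holding a hypothetical relation r(p, q).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE r (p TEXT, q INTEGER)")
conn.executemany(
    "INSERT INTO r VALUES (?, ?)",
    [("a", 1), ("a", 2), ("b", 3)],
)

# Projecting a single attribute keeps one value per source row,
# so duplicates survive.
plain = [row[0] for row in conn.execute("SELECT p FROM r ORDER BY p")]

# DISTINCT collapses the duplicates into a set of values.
distinct = [row[0] for row in
            conn.execute("SELECT DISTINCT p FROM r ORDER BY p")]

conn.close()
```

With one port per attribute, reading only port p corresponds to the first query: the duplicates are kept because each value still stands in for a row of r.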
>>
>> shawn
>>
>>
>>> Shawn Bowers wrote:
>>>
>>>
>>>> bugzilla-daemon at ecoinformatics.org wrote:
>>>>
>>>>> http://bugzilla.ecoinformatics.org/show_bug.cgi?id=2050
>>>>>
>>>>> ------- Additional Comments From sbowers at ucdavis.edu 2005-04-07 09:33 -------
>>>>> I don't think it is correct to have one port per attribute. This
>>>>> approach loses the information that the ports are actually
>>>>> dependent. This assumes a particular domain functionality (i.e.,
>>>>> that the director knows the dependency); but the constraint cannot
>>>>> be captured in Ptolemy's constraint language.
>>>>>
>>>>> From a modeling perspective, it is more appropriate in Ptolemy to
>>>>> use a single port that outputs a tuple (i.e., a record). Of course,
>>>>> one could always connect an array deconstructor after the data set
>>>>> if desired. This approach (of
>>>>>
>>>>
>>>> Sorry: I mean "record disassembler" not array deconstructor :-)
>>>>
>>>>> outputting tuples instead of individual values) follows more
>>>>> closely the standard approach used in database systems and makes
>>>>> collection-oriented dataflows/programming much easier.
>>>>> _______________________________________________
>>>>> Kepler-dev mailing list
>>>>> Kepler-dev at ecoinformatics.org
>>>>> http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/kepler-dev
>>>>>