[kepler-dev] seek/kepler conference call notes
Shawn Bowers
bowers at sdsc.edu
Mon Aug 23 15:29:58 PDT 2004
Edward, thanks for the responses. My comments are interspersed.
Edward A Lee wrote:
> At 11:39 AM 8/23/2004 -0700, Shawn Bowers wrote:
>
>> I guess my biggest thing when it comes to semantically annotating MoML
>> directly is the following.
>>
>> There are two main uses that have been proposed in SEEK for semantic
>> annotations on actors/workflows: (1) to search for and discover actors
>> via their semantic markup and (2) to use the semantic markup to help
>> compose heterogeneous actors (i.e., actors that don't have compatible
>> input/output types).
>>
>> For (1), if we were to store the annotations directly in MoML, then
>> searching based on semantic types would require obtaining, loading,
>> and parsing *every* MoML file that describes/represents an
>> actor/workflow. (Note also that the majority of time parsing MoML in
>> this case would be spent on non-semantic markup.) We could
>> alternatively create some form of semantic-annotation "index." But
>> then it seems the index would end up just being the proposed external
>> semantic annotation.
>
>
> Hmm... First, it seems that if you keep the annotations together with the
> actor/workflows, then _any_ scheme requires some level of parsing
> the file, regardless of what the format is.
I was thinking we would basically have some way to uniquely identify an
actor or workflow, e.g., through LSIDs or some such mechanism. Then, we
would externally (i.e., in some index) associate the id with the
semantic annotation. Thus, the annotation wouldn't be stored with the
actor or workflow, but in some other location.
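To make the external-index idea concrete, here is a minimal sketch in Python. It assumes an annotation is just a set of ontology class names; the `AnnotationIndex` class, the LSID-style strings, and the `SpeciesRichness` concept are invented for illustration and are not part of SEEK, Kepler, or the EcoGrid. The point is only that a search by concept touches the index and never has to load or parse a MoML file.

```python
from collections import defaultdict


class AnnotationIndex:
    """Hypothetical external index: actor/workflow id <-> ontology concepts."""

    def __init__(self):
        self._by_id = {}                     # id -> set of concept names
        self._by_concept = defaultdict(set)  # concept name -> set of ids

    def annotate(self, actor_id, concepts):
        """Associate an id with one or more concepts (kept outside the MoML)."""
        concepts = set(concepts)
        self._by_id.setdefault(actor_id, set()).update(concepts)
        for concept in concepts:
            self._by_concept[concept].add(actor_id)

    def find(self, concept):
        """Return the ids annotated with `concept`, in sorted order."""
        return sorted(self._by_concept.get(concept, ()))


# Made-up identifiers, for illustration only.
index = AnnotationIndex()
index.annotate("urn:lsid:example.org:actor:1",
               ["EcoNicheModel", "LogisticRegressionModel"])
index.annotate("urn:lsid:example.org:actor:2", ["EcoNicheModel"])
index.annotate("urn:lsid:example.org:workflow:7", ["SpeciesRichness"])

print(index.find("EcoNicheModel"))
```

An annotation that is really a query would need a richer value than a set of names, but the id-to-annotation association would stay the same.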
> Second, the parsing could be greatly speeded up if, for example,
> you first searched the MoML file for a full class name that matched
> the class used to specify the semantic markup.
In general, a semantic annotation isn't a class name, and could be as
complex as a query.
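For reference, the pre-filter Edward describes could look roughly like the sketch below: a cheap substring test for the annotation class name, followed by a real XML parse only for files that pass. The class name `org.example.SemanticAnnotation` and the `semanticType` property are hypothetical. Note that this only locates annotated files; the annotation value itself may still be as complex as a query.

```python
import xml.etree.ElementTree as ET

# Hypothetical Attribute subclass used to carry semantic markup.
ANNOTATION_CLASS = "org.example.SemanticAnnotation"


def semantic_values(moml_text):
    """Return the values of all annotation properties in a MoML document."""
    # Cheap pre-filter: skip the XML parse if the class name never appears.
    if ANNOTATION_CLASS not in moml_text:
        return []
    root = ET.fromstring(moml_text)
    return [prop.get("value")
            for prop in root.iter("property")
            if prop.get("class") == ANNOTATION_CLASS]


# Two tiny MoML-like documents, made up for illustration.
with_annotation = """<entity name="model" class="ptolemy.actor.TypedCompositeActor">
  <property name="semanticType" class="org.example.SemanticAnnotation"
            value="EcoNicheModel"/>
</entity>"""
without_annotation = (
    '<entity name="plain" class="ptolemy.actor.TypedCompositeActor"/>'
)

print(semantic_values(with_annotation))
print(semantic_values(without_annotation))
```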
> Third (much more interestingly), if you use a custom class (subclass
> of Attribute) to specify semantic markup, then you could have actors
> that transparently (in the background) update a dynamically maintained
> peer-to-peer index of actors/workflows ... a Napster of actors.
> Yang Zhao has, in fact, prototyped a mechanism like this... This
> index could be updated whenever an instance of this custom attribute
> is instantiated, for example.
I was thinking that the index for semantic annotations might be stored
in a separate location through the EcoGrid, which I suppose acts sort of
like a P2P framework. Actually, using P2P technology for Ptolemy sounds
very sexy, and perhaps the EcoGrid folks too have thought about this.
>
> I think that if you have a separate file that is the semantic markup,
> separately maintained from the actor/workflow source, then keeping
> the two consistent will be very challenging...
>
>
>> A smaller issue is that we would need to change/extend the existing
>> parser for MoML. Whereas using external semantic annotations, we could
>> build our own, and possibly keep this parser at the location of the
>> semantic annotations (e.g., on the "EcoGrid"). It also seems that
>> building our own parser for this is probably just quicker and easier
>> to get going.
>
>
> I don't see why you would need to extend the parser for MoML (?).
> Can you give me an example of what it might look like? I'll then
> show how it can be done in MoML with no changes to the parser
> (hopefully)...
>
I guess I was thinking that you would have to at least extend the parser
to extract the semantic annotation part and forward it to some other
service to start up annotation-handler code.
There are two types of annotations for an actor. The simplest form is to
just say that an actor represents some instance of a particular ontology
class (e.g., EcoNicheModel). This type of annotation could also be made
more specific, e.g., by giving more details on how it instantiates a
class (which by the way, essentially creates a new class "on-the-fly").
For example, we might say it is an instance of an EcoNicheModel that
uses a specific type of LogisticRegressionModel, etc. An actor might
also be an instance of multiple ontological concepts (it's an instance of
an EcoNicheModel and something else ... which doesn't really fit this
example, but anyway).
For input/output types, semantic annotations are similar to the unit
stuff, but generally resemble views (queries), mainly because we need
finer levels of detail. For example, an actor might take as input a list
of records:

  [{lt1=double, lt2=double, ln1=double, ln2=double, s=int, n=int}]

and the input annotation might be (where here record attributes denote
values from the same tuple in the list):

  Community(C), LatLonPt(NW), lat(NW, lt1), lon(NW, ln1), LatLonPt(SE),
  lat(SE, lt2), lon(SE, ln2), BBox(B), nwCorner(B, NW), seCorner(B, SE),
  communityLocation(C, B), communitySpeciesCount(C, s),
  communityPopulation(C, n)

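One way to hold such an annotation in code is as a conjunctive query, i.e., a list of (predicate, arguments) atoms. The sketch below is only an illustration of that encoding in Python; the helper names `unbound_fields` and `query_variables` are invented. It checks that every field of the record type is mentioned by the annotation and lists the remaining arguments, which are the query's own variables.

```python
# Record type of one element of the input list.
record_type = {"lt1": "double", "lt2": "double", "ln1": "double",
               "ln2": "double", "s": "int", "n": "int"}

# The input annotation as a list of (predicate, arguments) atoms.
annotation = [
    ("Community", ("C",)),
    ("LatLonPt", ("NW",)),
    ("lat", ("NW", "lt1")),
    ("lon", ("NW", "ln1")),
    ("LatLonPt", ("SE",)),
    ("lat", ("SE", "lt2")),
    ("lon", ("SE", "ln2")),
    ("BBox", ("B",)),
    ("nwCorner", ("B", "NW")),
    ("seCorner", ("B", "SE")),
    ("communityLocation", ("C", "B")),
    ("communitySpeciesCount", ("C", "s")),
    ("communityPopulation", ("C", "n")),
]


def unbound_fields(record_type, annotation):
    """Record fields that no atom of the annotation mentions."""
    used = {arg for _, args in annotation for arg in args}
    return sorted(field for field in record_type if field not in used)


def query_variables(record_type, annotation):
    """Arguments that are not record fields, i.e. the query's variables."""
    used = {arg for _, args in annotation for arg in args}
    return sorted(used - set(record_type))


print(unbound_fields(record_type, annotation))
print(query_variables(record_type, annotation))
```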
>> Perhaps there are other ways around these issues for MoML that I am
>> unaware of. But generally speaking, it seems like it is reasonable to
>> store "extra" information in MoML so long as it isn't used as the
>> mechanism to query and search for actors.
>
> I'm thinking that the information should be in MoML, but that your
> annotation classes could generate rapidly searchable indexes. This way,
> the "source" is all together, but the search is quick...
At one point I was calling the result of computing what you call the
"annotation class" the "context" of the annotation. In general, it is an
overestimate (generalization) of the annotation. It seems reasonable to
compute such a thing to speed up searching either for MoML-embedded or
external annotations. It sounds reasonable to use it as an index
mechanism as well, which would allow MoML to include the actual
annotation and not bog down search too much.
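A rough sketch of how a "context" could serve as an index, in Python. It assumes, purely for illustration, that the context of an annotation is the set of ontology class names (unary predicates) it mentions, which overestimates the annotation. The actor ids and the `search` helper are made up, and the "exact" second step only compares predicate names; a real system would need a proper query-containment test there.

```python
def context(annotation):
    """Class names (unary predicates) mentioned by an annotation."""
    return frozenset(pred for pred, args in annotation if len(args) == 1)


# Made-up actors and annotations, for illustration only.
annotations = {
    "actor-A": [("Community", ("C",)), ("BBox", ("B",)),
                ("communityLocation", ("C", "B"))],
    "actor-B": [("Community", ("C",)), ("communityPopulation", ("C", "n"))],
    "actor-C": [("LatLonPt", ("P",)), ("lat", ("P", "x"))],
}

# Build the index once: class name -> ids whose context contains it.
context_index = {}
for actor_id, atoms in annotations.items():
    for cls in context(atoms):
        context_index.setdefault(cls, set()).add(actor_id)


def search(required_atoms):
    """Find actors whose annotation uses every predicate in the request."""
    # Step 1: cheap candidate lookup via the context index (may overestimate).
    candidate_sets = [context_index.get(cls, set())
                      for cls in context(required_atoms)]
    if candidate_sets:
        candidates = set.intersection(*candidate_sets)
    else:
        candidates = set(annotations)
    # Step 2: check the full annotation, but only for the candidates.
    needed = {pred for pred, _ in required_atoms}
    return sorted(actor_id for actor_id in candidates
                  if needed <= {pred for pred, _ in annotations[actor_id]})


print(search([("Community", ("X",)), ("communityPopulation", ("X", "n"))]))
```

Here the index narrows the request to the two actors that mention Community, and only those two annotations are examined in full.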
shawn
>
> Edward
>
>
>
> ------------
> Edward A. Lee, Professor
> 518 Cory Hall, UC Berkeley, Berkeley, CA 94720
> phone: 510-642-0455, fax: 510-642-2739
> eal at eecs.Berkeley.EDU, http://ptolemy.eecs.berkeley.edu/~eal