[seek-dev] Re: GRID etc. questions from European LTER

Matt Jones jones at nceas.ucsb.edu
Tue Aug 24 11:10:54 PDT 2004


Hi Katharina,

I had some great discussions with Mandy Lane while I was in Edinburgh, 
and I am looking forward to building collaborations with our colleagues 
in Europe.  I was looking for the ALTER-Net web site, but had difficulty 
finding it -- there's not really any mention of it at the CEH web site. 
Can you point me to the right web site?

I'll give some brief answers to your questions below, but I assume 
you've seen both the SEEK web site (http://seek.ecoinformatics.org) and 
the Kepler site (http://kepler-project.org).  All of our code and tools 
are openly available through those sites.

Schleidt Katharina wrote:
> Hi Matt,
> 
> as already discussed with Mandy Lane (context: UK research-council 
> Funding for sister projects), we of the European ALTER-Net Project are 
> seeking cooperation with the American LTER and SEEK Communities. From 
> what we have been able to discern from the various online publications we 
> have gotten the impression that you are the EcoGRID expert for the SEEK 
> project. When your name also came up in a recent meeting with 
> representatives of the AUSTRIAN GRID Initiative who are going to support 
> us in gaining an overview of available GRID Middleware components (Do 
> you happen to know Mr. Volkert and Mr. Kranzlmüller from Kepler 
> University in Linz, Austria?), we decided it was time to get in touch 
> with you personally!
> 
> As we are the ones in the ALTER-Net project who get to do the grunt work 
> (looking for the technology to be used, finding an ontology based on the 
> consensus of the European scientists, developing methods for semantic 
> mediation), we would like to take the opportunity to ask a few concrete 
> questions on subjects we are working on:
> 
>     * What is the current status of EcoGRID?
EcoGrid is an evolving framework for allowing diverse data systems to 
interoperate.  Right now it is a small set of grid services (OGSA 
compliant) that allow a uniform syntax for expressing metadata queries, 
providing responses, and downloading and uploading data.  Ancillary 
services such as authentication and access control are part of the 
definitions as well.  The data objects themselves are described in one 
of several metadata languages, but we are mostly focusing on Ecological 
Metadata Language (EML) because it is rich enough to allow machine 
processing of heterogeneous data sources. However, we also use Darwin 
Core and plan to support agency standards like the Bio Data Profile.  We 
are currently implementing the interfaces for 'writing' data to EcoGrid 
nodes; once those are done, a large number of the basic data access 
interfaces we intend to develop will be complete.

We are currently prototyping EcoGrid interfaces to work against four 
very diverse data systems: the KNB Metacat system [1], the DiGIR 
protocol [2] for access to collections data, the Storage Resource Broker 
(SRB), and the Xanthoria system.  We have implemented basic EcoGrid 
interfaces for all of these services, and think that they are 
representative of the types of systems widely used to store 
environmental data.
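
As a rough illustration of what "one interface in front of several 
backends" means (this is not the actual EcoGrid code; the class and 
function names are hypothetical, and the in-memory node only stands in 
for a real adapter to Metacat, DiGIR, SRB, or Xanthoria), a minimal 
Python sketch might look like this:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One metadata hit, returned in a backend-neutral shape."""
    identifier: str
    title: str
    source: str


class EcoGridNode(ABC):
    """Hypothetical uniform interface that every backend adapter implements."""

    @abstractmethod
    def query(self, keyword):
        """Return a list of Record objects whose metadata matches keyword."""

    @abstractmethod
    def get(self, identifier):
        """Return the raw bytes of the data object with this identifier."""


class InMemoryNode(EcoGridNode):
    """Stand-in backend; a real adapter would call out to a remote system."""

    def __init__(self, name, holdings):
        self.name = name
        self._holdings = holdings  # identifier -> (title, data bytes)

    def query(self, keyword):
        kw = keyword.lower()
        return [
            Record(identifier, title, self.name)
            for identifier, (title, _data) in sorted(self._holdings.items())
            if kw in title.lower()
        ]

    def get(self, identifier):
        return self._holdings[identifier][1]


def federated_query(nodes, keyword):
    """Send the same query to every node and concatenate the results."""
    results = []
    for node in nodes:
        results.extend(node.query(keyword))
    return results
```

A client that only knows about EcoGridNode can then search every 
registered backend with a single call such as 
federated_query(nodes, "kelp"), without caring which system answers.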

At the current time we do not have mediators that can do standard query 
translation among the various metadata languages, but we are hoping to 
incorporate that in the next year -- if possible, I'm hoping we can 
leverage our semantic mediation system for this, but we'll see.

>     * Is an overview of the architecture of GRID Middleware components
>       available? We are having difficulties determining which components
>       could be relevant for our project requirements.

I'm not exactly sure what you're asking for here. We use the Globus 
Toolkit (GT) as our middleware layer, and there are lots of overviews of 
GT around (see http://globus.org).  So far we have found GT to be 
extremely difficult to use, and we're not using the built-in services 
very much.  Compared to using plain web services, the GT has been 
extremely slow going.

If what you're looking for here is an overview of the EcoGrid, maybe a 
recent presentation I gave would help (see [3]).

>     * What are the further plans for EcoGRID?
We envision EcoGrid to be a tiered set of interface definitions that can 
be implemented by any data provider.  EcoGrid will provide a distributed 
registry service that allows providers to register their existence and 
describe which of the interfaces they support as well as to document 
their coverage.  EcoGrid will also provide aggregation and indexing 
nodes that will collate content from multiple data providers to allow 
for efficient searching of large numbers of nodes.  In addition, EcoGrid 
will define a number of standardized interfaces for registering 
computational services that are available.  We hope to provide a 
scheduling and optimization service that can calculate the best set of 
nodes on which to run analyses and models based on node availability, 
data location, and other characteristics.  Finally, we are working on 
semantic annotations to both data and computational services that will 
allow for more effective discovery and integration of those services. 
These integration services will likely be manifested as part of a 
workflow system called Kepler that we are building with partners, but 
they may be incorporated directly into the EcoGrid as well.
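
To make the registry and scheduling ideas a bit more concrete, here is 
a small Python sketch.  It is only an illustration under simplifying 
assumptions: the names (NodeEntry, Registry, choose_node) are 
hypothetical, the registry is a single in-process object rather than a 
distributed service, and "best node" is reduced to "available node 
that already holds the most of the needed data sets".

```python
from dataclasses import dataclass, field


@dataclass
class NodeEntry:
    """What a provider tells the registry about itself (hypothetical fields)."""
    name: str
    interfaces: set                      # e.g. {"query", "write", "compute"}
    available: bool = True
    datasets: set = field(default_factory=set)


class Registry:
    """Toy, in-process stand-in for a distributed registry service."""

    def __init__(self):
        self._nodes = {}

    def register(self, entry):
        self._nodes[entry.name] = entry

    def find(self, interface):
        """Return all nodes supporting an interface, sorted by name."""
        matches = [n for n in self._nodes.values() if interface in n.interfaces]
        return sorted(matches, key=lambda n: n.name)


def choose_node(registry, needed_datasets, interface="compute"):
    """Pick the available node holding the most of the needed data sets.

    Ties go to the alphabetically first node; returns None if no
    available node supports the interface.
    """
    candidates = [n for n in registry.find(interface) if n.available]
    if not candidates:
        return None
    return max(candidates, key=lambda n: len(n.datasets & needed_datasets))
```

A real scheduler would weigh more than data location (load, network 
cost, and so on); the point here is only that providers describe 
themselves once and clients select among them through the registry.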

>     * Is EcoGRID based on the Globus Toolkit (which version?), and what
>       is the general relationship between the Globus Alliance and SEEK?
Yes.  We've used versions 2.1, 3.0.0, 3.0.1, 3.0.2, and 3.2.  We may 
move to the WSRF implementations as they stabilize, but that has yet to 
be determined.  We've had lots of difficulty using the GT, so we have 
considered dropping it in favor of plain web services for some time. 
The Grid Center project, a member of the Globus Alliance, has devoted 
some support to our project, and we hope that this will allow us to more 
effectively make use of the GT.

>     * Does your GRID implementation currently support "semantic
>       mediation", or what are the plans for the future in regards to this?

SEEK has developed a semantic mediation system called Sparrow that 
includes both a human-readable syntax for logic statements that can 
translate to and from OWL, and a reasoner based on some existing 
reasoning tools.  Our general approach is outlined in some papers, 
notably [4], [5], and [6].  We have prototyped some semantic mediation 
tasks in a case study using Sparrow (see [7]), but the work is not far 
enough along to call the semantic mediation component complete. 
Currently, none of our EcoGrid or Kepler releases incorporate any of 
these semantic mediation tools, but we are hoping to integrate them in 
the next year.  We are hoping that our late fall 2004 release of Kepler 
will have an ontology-driven data and model browsing facility included.

>     * Does EcoGRID support the ontologies (in Edinburgh several
>       statements made us suspect this)
What do you mean by support?  We plan to have ontology-driven searching 
allowing indirect references through the EcoGrid, but it currently does 
not support this.  The current EcoGrid system is a simple abstraction in 
front of metadata catalogs that also supports data access and data 
queries.  We are hoping that this year it will support semantics much 
more fully.  We have been developing ontologies for describing the 
semantics of heterogeneous data sources, and have several fairly 
complete top-level ontologies that we've been testing.  I think our two 
major challenges to making this work will be: 1) developing fully 
specified ontologies that cover much of biology and environmental 
science, and 2) annotating large numbers of data sets with references 
into these ontologies so that they can be used within the mediation system.
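
To illustrate what ontology-driven searching with indirect references 
could look like once data sets are annotated, here is a toy Python 
sketch.  It is not Sparrow and not our mediation system; the ontology 
is reduced to a hypothetical child -> parent dictionary, and the term 
and data set names are made up for the example.

```python
def descendants(parents, term):
    """Return term plus every term that is, transitively, a kind of it.

    parents maps each child term to its single parent term.
    """
    found = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in parents.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def search(annotations, parents, term):
    """Return data sets annotated with term or with any narrower term."""
    wanted = descendants(parents, term)
    return sorted(
        dataset
        for dataset, terms in annotations.items()
        if wanted & set(terms)
    )


# Hypothetical ontology fragment and annotations, for illustration only.
parents = {
    "Biomass": "Measurement",
    "DryWeight": "Biomass",
    "Temperature": "Measurement",
}
annotations = {
    "ds1": ["DryWeight"],
    "ds2": ["Temperature"],
    "ds3": [],
}
```

With these inputs, search(annotations, parents, "Biomass") finds ds1 
even though ds1 is only annotated with the narrower term DryWeight, 
which is the kind of indirect match a plain keyword search would miss.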

> The answers to these questions would be a great help to us, not only in 
> the continuation of our work, but also in our goal of avoiding redundant 
> work.
There are never enough people working on this stuff.  We would be 
excited if you would be interested in collaborating.  We run all of our 
efforts as open collaborations.  The one that has garnered the most 
interest is our Kepler Project (http://kepler-project.org) to build an 
open framework for scientific workflows.  If you were interested in 
contributing to that environment or collaborating on other aspects we 
would welcome it.  Let's start a conversation to see how we might work 
together to make best use of our resources.

Cheers,
Matt

References
------------
[1] http://knb.ecoinformatics.org/ and 
http://knb.ecoinformatics.org/software/metacat
[2] http://digir.sourceforge.net/
[3] 
http://cvs.ecoinformatics.org/cvs/cvsweb.cgi/~checkout~/seek/docs/presentations/jones-ecogrid-lterim-20040728.ppt?rev=1.2&content-type=application/vnd.ms-powerpoint
[4]
http://cvs.ecoinformatics.org/cvs/cvsweb.cgi/~checkout~/seek/projects/kr-sms/docs/ssdbm04_bowers.pdf?rev=1.1&content-type=application/pdf
[5] 
http://cvs.ecoinformatics.org/cvs/cvsweb.cgi/~checkout~/seek/projects/kr-sms/docs/scisw03.pdf?rev=1.2&content-type=application/pdf
[6] 
http://cvs.ecoinformatics.org/cvs/cvsweb.cgi/~checkout~/seek/projects/kr-sms/docs/DILS04/dils04-Bowers-Ludaescher.pdf?rev=1.1&content-type=application/pdf
[7]
http://cvs.ecoinformatics.org/cvs/cvsweb.cgi/~checkout~/seek/projects/kr-sms/docs/swdb04.pdf?rev=1.2&content-type=application/pdf

> Thanks for your help!
> 
> Herbert Schentz & Kathi Schleidt
> 
> kathi schleidt
> IT-Entwicklung / IT-Development
> T: +43-(0)1-313 04/5363
> F: +43-(0)1-313 04/3555
> katharina.schleidt at umweltbundesamt.at
> 
> umweltbundesamt
> Spittelauer Lände 5
> A-1090 Wien
> Österreich/Austria
> http://www.umweltbundesamt.at

-- 
-------------------------------------------------------------------
Matt Jones                                     jones at nceas.ucsb.edu
http://www.nceas.ucsb.edu/    Fax: 425-920-2439    Ph: 907-789-0496
National Center for Ecological Analysis and Synthesis (NCEAS)
University of California Santa Barbara
Interested in ecological informatics? http://www.ecoinformatics.org
-------------------------------------------------------------------


