[kepler-users] Metadata repository for Kepler.

Matt Jones jones at nceas.ucsb.edu
Thu Dec 15 13:33:43 PST 2011


Hi Bina --

On Thu, Dec 15, 2011 at 9:42 AM, Bina Philip <binabhas at umail.iu.edu> wrote:

> Hi Matt,
>
> I have a few questions to follow up with. I am still trying to learn here
> so please pardon possible ignorance about something you have already
> explained.
>
> 1. When you mention "It supports an extensible set of metadata
> standards, including EML, DarwinCore, and others" -- Does this mean that I
> can search for these metadata? Is there like a tool/client which helps me
> to look for the data.
>

Yes, under the "Data" tab is a search box -- if you type in that box, a
query will be generated against the registered EcoGrid servers that
searches the metadata in each of the supported standards.  Which fields in
each of the standards are searched is determined by the programmer who
wrote the adapter to that standard for Kepler -- they can configure it to
search different fields in the metadata.  In addition, we have worked on an
"advanced search" feature in Kepler that lets you search on more criteria
(e.g., spatial bounds, time, etc.), but that work has not matured enough to
be released with Kepler yet.  It very well could be a future feature if we
can find someone to finish off the work.
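
To make the adapter idea concrete, here is a rough sketch in Python (not
Kepler's actual code, which is Java).  It only shows how one search term
might be expanded into one query per configured field for each metadata
standard.  The names SEARCH_FIELDS and build_queries and the field paths
are made up for illustration; the real field lists live in each adapter's
configuration.

```python
# Hypothetical per-standard field lists; the real ones are chosen by
# whoever wrote the Kepler search adapter for that standard.
SEARCH_FIELDS = {
    "EML": ["dataset/title", "dataset/abstract/para", "keyword"],
    "DarwinCore": ["ScientificName", "Genus"],
}


def build_queries(term, fields_by_standard=SEARCH_FIELDS):
    """Expand one search term into (standard, field, term) triples."""
    return [
        (standard, field, term)
        for standard, fields in fields_by_standard.items()
        for field in fields
    ]


if __name__ == "__main__":
    for standard, field, term in build_queries("oak"):
        print(f"{standard}: search {field!r} for {term!r}")
```

Changing which fields get searched for a standard would then just be a
matter of editing that standard's list, which is roughly the kind of
configuration the adapter author controls.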


> 2. "This REST based interface is being promoted by DataONE as a
> cross-repository interoperability layer, enabling client tools to use a
> standard set of web services to interact with many repository software
> systems" -- How will this work? Does this also allow searching of which
> repository I would want to browse through? Please give me some more detail
> here in terms of the concept. You mentioned certain client tools to
> interact with these different repositories, what could the tools be like
> and what could be the interactions involved.

The DataONE system is designed as a network of repositories (that we call
Member Nodes), each of which agrees to provide the same set of services for
clients to use.  A set of coordinating nodes provides a search index across
all of those repositories, and client tools can issue queries to the search
index and find out which repositories hold data of interest.  The client
issuing the query can include a restriction on which repositories they want
to search in the query they submit.
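
As a sketch of what such a query could look like from a client, the Python
below builds a search URL that optionally restricts the search to a list of
repositories.  It assumes a Solr-style query endpoint on a coordinating
node; the base URL, the field names "text" and "datasource", the node
identifier in the example, and the function name are all assumptions for
illustration, so check the published DataONE API before relying on them.
The code only builds the URL and does not contact any server.

```python
from urllib.parse import urlencode

# Assumed coordinating-node search endpoint (placeholder host).
CN_QUERY_URL = "https://cn.example.org/cn/query/solr/"


def build_search_url(text, member_nodes=None, rows=10):
    """Build a search URL, optionally restricted to some repositories."""
    # Assumed full-text field name.
    query = f'text:"{text}"'
    if member_nodes:
        # Assumed field holding the identifier of the repository
        # (Member Node) that contributed each record.
        nodes = " OR ".join(f'"{node}"' for node in member_nodes)
        query += f" AND datasource:({nodes})"
    return CN_QUERY_URL + "?" + urlencode({"q": query, "rows": rows})


if __name__ == "__main__":
    # Search everything, then only one hypothetical Member Node.
    print(build_search_url("kelp forest"))
    print(build_search_url("kelp forest", ["urn:node:EXAMPLE"]))
```

Leaving out member_nodes searches the whole index, and the results would
then tell the client which repositories hold matching data.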

For tools, we will be releasing a web search portal for the whole DataONE
network that provides a nice web UI for searching for and finding data, and
we will be extending other client tools (like Kepler, Morpho, and R) to
also be able to directly search the index to find content. We hope Kepler
support for DataONE will be introduced by the middle of next year, but we
haven't started that work yet, so I don't have a firm date for it.  We also
are developing a virtual filesystem view of the whole network (called
'ONEDrive') that lets users mount the DataONE federation as if it were a
local drive and then be able to browse and open data directly on their
computer -- we hide all of the repositories and metadata searching and data
retrieval behind the proverbial curtain.  This tool will be released
sometime next spring if all goes well.  Anyone who wants to develop their
own search tool can also do so -- the API is open.




> 3. For metadata related to workflow processes is there any way I can run
> one of the already existing workflows (I have kepler set-up on my machine)
> to see what data is being captured and how exactly I can view this
> metadata? Kindly let me know if I can check for this in such a manner.
>
Yes, depending on which version of Kepler you have.  You'd need to install
some additional Kepler modules (namely the workflow run manager module),
which then enable provenance collection, keep track of the runs you do,
and let you save those runs to the server and get them back from the
server.  The provenance that is collected is stored in a database on the
machine running Kepler, so you can inspect that database for each run if
you really want a ton of details.  But for most people, the workflow run
manager and the workflow report that you can generate and save with the
run are probably what they would want to inspect.  This system is being
extended as we speak and will come out with the new version of Kepler
(2.3) very soon.  You can see the current workflow run manager and
provenance module documentation for the upcoming release here:

http://code.kepler-project.org/code/kepler/trunk/modules/workflow-run-manager/docs/workflow-run-manager.pdf

http://code.kepler-project.org/code/kepler/trunk/modules/reporting/docs/reporting.pdf

http://code.kepler-project.org/code/kepler/trunk/modules/provenance/docs/provenance.pdf

and the details of the provenance schema are here:

http://code.kepler-project.org/code/kepler/trunk/modules/provenance/docs/schema.pdf

Matt


> Best
>
>
> On Tue, Dec 6, 2011 at 4:50 PM, Bina Philip <binabhas at umail.iu.edu> wrote:
>
>> Thanks for the very detailed response Matt, really appreciate it. I will
>> dig in further and post again if I encounter questions. Thanks again!!
>>
>>
>> On Tue, Dec 6, 2011 at 4:32 PM, Matt Jones <jones at nceas.ucsb.edu> wrote:
>>
>>> Hi Bina ---
>>>
>>> The answer differs for metadata for data and metadata about processes.
>>>
>>> For data, Kepler can utilize metadata from a wide variety of
>>> repositories via the 'EcoGrid' SOAP web service interfaces, but mostly it
>>> uses the Metacat repository run as part of the KNB network for now.  That
>>> is what is being searched under the 'Data' tab in Kepler.  It supports an
>>> extensible set of metadata standards, including EML, DarwinCore, and
>>> others.  FGDC could be supported as the backend repository supports it, but
>>> we don't have a search adapter for FGDC in Kepler as of now -- this could
>>> be added. Kepler can also write metadata and data to Metacat through the
>>> EcoGridWriter actor.
>>>
>>> We also are working on enabling Kepler to work across a much wider
>>> variety of data repositories by changes that we are introducing to support
>>> the DataONE web service interface.  This REST based interface is being
>>> promoted by DataONE as a cross-repository interoperability layer, enabling
>>> client tools to use a standard set of web services to interact with many
>>> repository software systems (e.g., Metacat, Mercury, DSpace,
>>> Merritt, AKN, and in the future others like iRODS and Fedora).
>>>
>>> For metadata about processes, Kepler records provenance traces
>>> associated with workflow executions, and can serialize and store those
>>> traces and the associated workflows in archive files that can be uploaded
>>> to a repository.  Right now we run one repository for these
>>> workflow/provenance artifacts for Kepler itself, but I've heard that others
>>> run the repository systems themselves to have local repositories that they
>>> can use for their work. The Kepler Workflow Run Manager and Provenance
>>> modules handle these features.  This provenance metadata system also uses
>>> the EcoGrid services to read and write archives on remote repositories, and
>>> I expect this will also be a subject of future DataONE work to enable
>>> cross-repository interoperability.
>>>
>>> In addition, there is a working group focused on coming up with a
>>> cross-workflow metadata specification for provenance that is an extension
>>> of the OPM model. I expect that work will be incorporated in Kepler,
>>> Taverna, and other workflow systems as it matures.  See
>>> https://www.dataone.org/content/scientific-workflows-provenance-working-group
>>> .
>>>
>>> These features are all described in the Kepler documentation (the
>>> provenance system is described in the associated run manager and provenance
>>> module documentation), all of which is available here:
>>>    https://kepler-project.org/users/documentation
>>>
>>> Hope this helps.
>>>
>>> On Tue, Dec 6, 2011 at 12:09 PM, Bina Philip <binabhas at umail.iu.edu> wrote:
>>>
>>>> Hi,
>>>>
>>>> Does Kepler have a metadata repository? I am trying to find out if
>>>> Kepler stores metadata that could emerge out of a particular workflow, if
>>>> it does support metadata capture then what schemas does it support? For
>>>> reference to what exactly I am trying to inquire about please refer to this
>>>> link of FGDC schema http://www.fgdc.gov/metadata. I am trying to see
>>>> if there is a way that kepler captures metadata in such schema. Kindly shed
>>>> some light on this topic.
>>>>
>>>> Best
>>>>
>>>> --
>>>> Regards,
>>>> Bina
>>>> Indiana University Bloomington
>>>> Dept Of Computer Science (Master's).
>>>> Contact:- 812-327-4780
>>>>
>>>>
>>>> _______________________________________________
>>>> Kepler-users mailing list
>>>> Kepler-users at kepler-project.org
>>>> http://lists.nceas.ucsb.edu/kepler/mailman/listinfo/kepler-users
>>>>
>>>>
>>>