[seek-dev] Re: metacatiness

Matt Jones jones at nceas.ucsb.edu
Mon Nov 15 10:58:16 PST 2004


OK, well, this is a rich topic...

The simplest metacat operations can be expressed as a url, like this:

http://metacat.nceas.ucsb.edu/knb/metacat?action=read&qformat=xml&docid=knb-lter-gce.109.6

or for one of the HTML versions:

http://metacat.nceas.ucsb.edu/knb/metacat?action=read&qformat=knb&docid=knb-lter-gce.109.6

Obviously, changing the docid on the end of the URL gets you a different 
object from the server.
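If you're scripting this, the read URL is just a base plus three query 
parameters. Here's a minimal Python sketch that rebuilds the two URLs 
above. The base URL is copied from the examples, and the function name 
read_url is made up for illustration:

```python
from urllib.parse import urlencode

# Base URL copied from the examples above; other Metacat installs will differ.
METACAT_BASE = "http://metacat.nceas.ucsb.edu/knb/metacat"

def read_url(docid: str, qformat: str = "xml", base: str = METACAT_BASE) -> str:
    """Build a Metacat 'read' URL for a docid in the requested output format."""
    params = {"action": "read", "qformat": qformat, "docid": docid}
    # urlencode keeps the dict's order and leaves '.' and '-' unescaped.
    return base + "?" + urlencode(params)

if __name__ == "__main__":
    print(read_url("knb-lter-gce.109.6"))         # raw EML
    print(read_url("knb-lter-gce.109.6", "knb"))  # HTML rendering
```

Swapping the docid argument is the same as editing the end of the URL by 
hand.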

Metacat stores a lot of EML documents, which are metadata describing data 
objects.  Those data objects may also be stored in metacat, or they may 
live somewhere else.  The EML tells you where they are stored.  So, in the 
above example, you can see a "distribution" section that tells you to get 
the data object from a web site:

<distribution scope="document">
<online>
<url function="download">http://gce-lter.marsci.uga.edu/lter/asp/db/send_file.asp?name=metacat-user&email=none&affiliation=LNO&notify=0&accession=INV-GCEM-0305a1&filename=INV-GCEM-0305a1_1_1.TXT</url>
</online>
</distribution>
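Pulling those download URLs out of an EML document takes only a few lines 
with Python's standard ElementTree. This is a sketch that assumes the 
distribution, online and url elements are unqualified (no namespace 
prefix), as in the snippet above. In a well-formed EML file the & 
separators in the URL are escaped as &amp;, and ElementTree hands them 
back unescaped. The function name is hypothetical:

```python
import xml.etree.ElementTree as ET
from typing import List

def download_urls(eml_text: str) -> List[str]:
    """Collect the text of every <url function="download"> found under
    distribution/online, anywhere in the EML document."""
    root = ET.fromstring(eml_text)
    urls = []
    # ElementTree's limited XPath supports simple attribute predicates.
    for url in root.iterfind(".//distribution/online/url[@function='download']"):
        if url.text:
            urls.append(url.text.strip())
    return urls
```

It returns a list rather than a single string because one EML document 
can describe several data objects.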

Or the EML documents may contain references to (deprecated) "ecogrid" 
URLs, which should be interpreted as meaning the referenced object is 
stored locally on the metacat server.  For example, this EML document 
about grasshoppers:

http://metacat.nceas.ucsb.edu/knb/servlet/metacat?action=read&qformat=xml&docid=sev.106.2

contains a reference to this data object:

ecogrid://knb/sev.13705.1

which can be accessed here:

http://metacat.nceas.ucsb.edu/knb/servlet/metacat?action=read&qformat=knb&docid=sev.13705.1
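Translating an ecogrid reference into a fetchable URL just means taking 
the docid off the end of the ecogrid URL and plugging it into a read URL. 
A sketch, assuming that the part after ecogrid:// ("knb") names the server; 
the SERVLETS table and the function name are hypothetical:

```python
from urllib.parse import urlencode, urlparse

# Hypothetical lookup table: which Metacat servlet serves which ecogrid
# authority. Only "knb" appears in the examples above.
SERVLETS = {"knb": "http://metacat.nceas.ucsb.edu/knb/servlet/metacat"}

def ecogrid_to_read_url(ref: str, qformat: str = "knb") -> str:
    """Turn ecogrid://<authority>/<docid> into a Metacat 'read' URL."""
    parsed = urlparse(ref)
    if parsed.scheme != "ecogrid":
        raise ValueError(f"not an ecogrid URL: {ref}")
    base = SERVLETS[parsed.netloc]  # raises KeyError for an unknown authority
    docid = parsed.path.lstrip("/")  # "/sev.13705.1" -> "sev.13705.1"
    return base + "?" + urlencode({"action": "read", "qformat": qformat, "docid": docid})
```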

Note that any given EML document may in fact reference multiple data 
objects (as does the grasshopper example above, which also contains a 
reference to ecogrid://knb/sev.13703.1), so don't assume a 1:1 
metadata->data correspondence when parsing EML.
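One way to respect that when parsing is to treat a document's data 
references as a list and sort them into locally stored objects (ecogrid 
URLs) and external ones. A minimal sketch with a hypothetical function 
name; how you gather the reference strings from the EML is left out:

```python
from typing import List, Tuple

def split_references(refs: List[str]) -> Tuple[List[str], List[str]]:
    """Split one EML document's data references into local Metacat docids
    (from ecogrid:// URLs) and external URLs. Either list may hold many."""
    local, external = [], []
    for ref in refs:
        if ref.startswith("ecogrid://"):
            # "ecogrid://knb/sev.13705.1" -> ['ecogrid:', '', 'knb', 'sev.13705.1']
            local.append(ref.split("/", 3)[3])
        else:
            external.append(ref)
    return local, external
```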

As I mentioned in an earlier email, Metacat IDs should probably be 
mapped to LSIDs like this:

sev.13703.1
urn:lsid:lsid.ecoinformatics.org:sev:13703:1
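As code, that mapping is a split and a reassembly. This sketch assumes, 
as in the examples, that a docid is <scope>.<id>.<revision> with a 
numeric id and revision, and that the authority is always 
lsid.ecoinformatics.org. The function name is hypothetical:

```python
LSID_AUTHORITY = "lsid.ecoinformatics.org"

def docid_to_lsid(docid: str, authority: str = LSID_AUTHORITY) -> str:
    """Map a Metacat docid "<scope>.<id>.<rev>" to an LSID URN."""
    # Split from the right: id and revision are the last two dotted parts,
    # and the scope (e.g. "knb-lter-gce") is whatever precedes them.
    parts = docid.rsplit(".", 2)
    if len(parts) != 3 or not (parts[1].isdigit() and parts[2].isdigit()):
        raise ValueError(f"unexpected docid form: {docid}")
    scope, ident, rev = parts
    return f"urn:lsid:{authority}:{scope}:{ident}:{rev}"
```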

Ok, so that was the short answer.  In addition to this simple URL 
interface, there is 1) a Java client API and 2) a Perl client API that 
allow you to access metacat programmatically.  On top of that there is 
the EcoGrid Grid Service API, which can be used to retrieve all of the 
same objects.  You need to use these APIs when login is required, 
because some of the documents in metacat are access controlled and a 
session must be established to determine access rights.  The metacat 
Java client API is demonstrated in the metacat JUnit test (check out the 
metacat cvs module).  The EcoGrid client API is used in the Kepler code 
and in some sample code in the ecogrid project directory (check out the 
kepler and seek modules).

There's a good developer's overview of Metacat in this set of slides:
http://knb.ecoinformatics.org/knbws/knbws-jones-metacat-20040927.ppt

Finally, I think it would be good to use LSIDs directly in metacat, so 
that EML documents themselves might contain LSID identifiers.  Could you 
comment on what would be needed to ship an LSID server as a standard 
part of a metacat db, so that it resolves identifiers for objects that 
might be stored in that metacat?
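For what it's worth, the identifier side of such a resolver could just 
invert the mapping above and hand back a read URL. This is a sketch of 
that lookup step only, not an LSID server: the LSID resolution protocol 
itself isn't shown. It assumes every LSID carries a revision, and the 
function names and base URL are illustrative:

```python
from urllib.parse import urlencode

# Base URL from the read examples above; a real resolver would be
# configured per Metacat installation.
METACAT_BASE = "http://metacat.nceas.ucsb.edu/knb/metacat"

def lsid_to_docid(lsid: str) -> str:
    """Invert urn:lsid:<authority>:<scope>:<id>:<rev> -> <scope>.<id>.<rev>."""
    parts = lsid.split(":")
    if len(parts) != 6 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"unexpected LSID form: {lsid}")
    _, _, _authority, scope, ident, rev = parts
    return f"{scope}.{ident}.{rev}"

def lsid_to_read_url(lsid: str, qformat: str = "xml", base: str = METACAT_BASE) -> str:
    """Build the Metacat read URL for the object an LSID names."""
    docid = lsid_to_docid(lsid)
    return base + "?" + urlencode({"action": "read", "qformat": qformat, "docid": docid})
```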

Thanks,
Matt

dave thau wrote:
> Hey there!
> 
> I'm going to try to tie a metacat server into the LSID thing.  Is there a
> good server to tap?  Is there a list of good queries anywhere?  Any good
> examples of using the API to make a call and parse results?
> 
> Dave

-- 
-------------------------------------------------------------------
Matt Jones                                     jones at nceas.ucsb.edu
http://www.nceas.ucsb.edu/    Fax: 425-920-2439    Ph: 907-789-0496
National Center for Ecological Analysis and Synthesis (NCEAS)
University of California Santa Barbara
Interested in ecological informatics? http://www.ecoinformatics.org
-------------------------------------------------------------------


