Catalog questions

Matt Jones jones at nceas.ucsb.edu
Mon Apr 15 12:26:32 PDT 2002


Hey Mike,

Feel free to ask away.  I'll answer as soon as possible, but I am 
heading out of town for 10 days so will be in limited email contact for 
a bit.

Glad you're interested in our work.  Although we use EML in our system, 
I want to point out that neither Metacat nor Morpho is tied to a 
specific content standard.  Metacat inherently allows storage and search 
of ANY XML metadata, and Morpho can be configured to support any DTD. 
So, for example, it is perfectly legit to load FGDC metadata into 
Metacat, and all of the features of Metacat will be available to you.
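
As a rough illustration of what schema-independent storage and search 
can look like, here is a minimal Python sketch.  It is not Metacat's 
actual implementation: the function names, the in-memory store, and the 
two tiny sample documents are all made up for the example.  The idea is 
only that any well-formed XML can be flattened into path/text pairs and 
searched the same way, whatever DTD it follows.

```python
import xml.etree.ElementTree as ET


def index_document(store, doc_id, xml_text):
    """Flatten any well-formed XML document into (path, text) pairs."""
    root = ET.fromstring(xml_text)
    nodes = []

    def walk(elem, path):
        here = path + "/" + elem.tag
        text = (elem.text or "").strip()
        if text:
            nodes.append((here, text))
        for child in elem:
            walk(child, here)

    walk(root, "")
    store[doc_id] = nodes


def search(store, term):
    """Return sorted ids of documents with a text node containing term."""
    term = term.lower()
    return sorted(
        doc_id
        for doc_id, nodes in store.items()
        if any(term in text.lower() for _, text in nodes)
    )


# Hypothetical sample records with different element structures.
store = {}
index_document(
    store, "eml-1",
    "<dataset><title>Kelp forest survey</title></dataset>",
)
index_document(
    store, "fgdc-1",
    "<metadata><idinfo><citation><title>Kelp canopy imagery</title>"
    "</citation></idinfo></metadata>",
)

print(search(store, "kelp"))
```

Because the index only records element paths and text, no part of the 
sketch needs to know which content standard a document follows.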

That said, we started work on this stuff in the early days before FGDC 
was revised to accommodate the biologically relevant metadata that EML 
supports.  So part of it was historical.  I was (and am) on the 
committee that created the FGDC/NBII Biological Data Profile of the 
CSDGM.  If you look in detail, you'll find that many of the structures 
in the Bio Profile actually are taken verbatim from earlier versions of 
EML (because I put them there :).  After we released the bio profile, I 
created an XML DTD for it (which you can find at FGDC) so that it could 
be used within our framework, and we will be providing translation tools 
between them.

So why have we continued with EML now that NBII supports bio stuff? 
Short answer, modularity & extensible structures.  Long answer: The 
CSDGM is one huge monolithic standard, and so it is difficult to mix and 
match parts of it with other standards -- mainly because of all of the 
spatial requirements.  So, we built EML as a series of modules that can 
be linked together and can be linked to other metadata standards.  This 
gives us the most flexibility, and given that we can easily translate 
into FGDC compliant documents, there is little cost.  Second, we're 
building advanced data processing tools that can automatically parse 
data sets and analyze them based on the EML metadata descriptions.  Due 
to various shortcomings in the FGDC standard, mostly oriented around its 
tight focus on spatial data, we have found that the CSDGM isn't adequate 
for these needs.  As a research project, we are constantly trying to 
expand the suite of services that metadata enables, and the FGDC spec 
isn't accommodating in that regard (e.g., how can one add machine 
parsable, semantically oriented attribute tags to FGDC?  Answer, you 
can't, because it is monolithic and doesn't permit dynamic ties to other 
metadata specs -- the only extension method is a huge administrative 
task of actually creating a superset of the FGDC -- not very 
maintainable).  In addition, the level of granularity for metadata in 
FGDC is very patchy -- it goes into tremendous detail for spatial 
projections, etc., but is incredibly terse with respect to describing 
methods and nonstandard data formats.  This is appropriate in the 
spatial world where there are so few data formats (< 100, many sensor 
derived streams), but not so good in ecology where there is no 
standardization of data formats (>>>5000, very few sensor derived).
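
To make the idea of metadata-driven parsing concrete, here is a small 
Python sketch.  The `description` dictionary is a hypothetical stand-in 
for what a metadata record might say about a text data file (delimiter, 
header lines, attribute names and types); it is not the EML format 
itself, and `parse_with_metadata` is an invented helper, not part of any 
of our tools.

```python
import csv
import io

# Hypothetical description of a data file's physical and logical layout.
description = {
    "delimiter": ",",
    "header_lines": 1,
    "attributes": [
        {"name": "site", "type": str},
        {"name": "depth_m", "type": float},
        {"name": "count", "type": int},
    ],
}


def parse_with_metadata(text, description):
    """Parse delimited text into typed records using only the description."""
    reader = csv.reader(io.StringIO(text), delimiter=description["delimiter"])
    rows = list(reader)[description["header_lines"]:]
    attrs = description["attributes"]
    return [
        {attr["name"]: attr["type"](value) for attr, value in zip(attrs, row)}
        for row in rows
    ]


data = "site,depth_m,count\nA,3.5,12\nB,10.0,4\n"
records = parse_with_metadata(data, description)
print(records)
```

The parser itself contains nothing specific to this file; a different 
data format only needs a different description.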

Nevertheless, I fully understand your point, and often find myself 
asking the same question -- why yet another metadata standard?  We have 
tried to wholesale adopt those sections of other specs where they make 
sense.  For example, the eml-coverage module is almost identical to the 
NBII bio profile, and eml-party is drawn almost totally from the ISO 
Geospatial standard.   We're certainly willing to adapt EML to be more 
in line with the other standards, unless it compromises our ability to 
deploy advanced services.

How's that for a ramble?  BTW, I wanted to point out that we released a 
new version of Metacat (1.1.0) and Morpho (1.1.0) last week -- a number 
of new features and useful bug fixes that you might want if you're 
considering using the software.

Cheers,
Matt


McCann, Mike wrote:
> Hi Matt,
> 
> I work with John Graybeal at MBARI.  I am surveying various data catalog systems in preparation for building one for use with ocean observatory data.   The work you've done with Metacat (available for perusal at http://www.ecoinformatics.org/) looks very interesting.  I have some specific questions to ask about your system.  Would you mind me asking these questions?  (Please let me know if I should refer to mail list archives or some other reference.)
> 
> My initial question is why did you not use FGDC for describing your data sets?  Why do ecologists need their own metadata language?
> 
> Thanks in advance,
> Mike
> 
> --
> Mike McCann (mccann at mbari.org)
> Monterey Bay Aquarium Research Institute
> 7700 Sandholdt Road
> Moss Landing, CA 95039-9644
> Voice: (831) 775-1769 Fax: (831) 775-1646 http://www.mbari.org/rd/iag.htm


-- 
*******************************************************************
Matt Jones                                    jones at nceas.ucsb.edu
http://www.nceas.ucsb.edu/    Fax: 425-920-2439   Ph: 907-789-0496
National Center for Ecological Analysis and Synthesis (NCEAS)

Interested in ecological informatics? http://www.ecoinformatics.org
*******************************************************************




More information about the Eml-dev mailing list