[eml-dev] Can one EML document describe multiple datasets?
Margaret O'Brien
mob at msi.ucsb.edu
Tue Oct 27 11:59:48 PDT 2009
Hi Steve -
An EML document can have only one dataset child (or, in its place, one of
protocol|software|citation). It does not describe a collection. A
dataset element can have multiple dataTable elements (or combinations
of other entities - see the specs). The functionality that you describe
-- using EML to house collections of resources -- has been discussed in
this forum. In the LTER, we have been exploring ways to use
EML's project module as a wrapper for a collection, e.g., a research
project and all the material associated with it. Since the associated
material might not be EML, we have not used internal EML references for
this.
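
As a rough illustration of the structure described above, here is a minimal
Python sketch that writes one EML document per dataset, each with a single
dataset child holding several dataTable elements. The namespace string (EML
2.1.0 is assumed), the packageId and system values, the table names, and the
helper names build_eml and build_documents are all assumptions made for this
example. The output is deliberately incomplete: a real document would also
need creator, contact, attribute lists and so on before it validates against
the schema.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for EML 2.1.0; adjust to the version you actually target.
EML_NS = "eml://ecoinformatics.org/eml-2.1.0"
ET.register_namespace("eml", EML_NS)


def build_eml(package_id, title, table_names):
    """Return an eml root element holding exactly one dataset child.

    Hypothetical helper: it emits only title and dataTable/entityName,
    so the result is a structural sketch, not a schema-valid document.
    """
    root = ET.Element(
        f"{{{EML_NS}}}eml",
        {"packageId": package_id, "system": "example"},  # placeholder values
    )
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title
    # One dataset may describe several tables.
    for name in table_names:
        table = ET.SubElement(dataset, "dataTable")
        ET.SubElement(table, "entityName").text = name
    return root


def build_documents(datasets):
    """Build one EML document per dataset record, keyed by package id."""
    return {
        package_id: build_eml(package_id, record["title"], record["tables"])
        for package_id, record in datasets.items()
    }


# Made-up records standing in for rows pulled from a database.
datasets = {
    "example.1.1": {"title": "Field season A", "tables": ["sites", "counts"]},
    "example.2.1": {"title": "Field season B", "tables": ["sites"]},
}

docs = build_documents(datasets)
for package_id, root in docs.items():
    print(ET.tostring(root, encoding="unicode"))
```

A script like this could be rerun whenever the database changes, producing a
fresh set of documents rather than one large wrapper document.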
BTW, it sounds like your definition of dataset is very similar to ours
(and others in the LTER), although some would classify our group as a
"lumper", since we update metadata and append data each year (as long as
protocols and sites remain consistent) to facilitate querying across
years. Our datasets sometimes contain multiple dataTables - for example,
when the tables form a cohesive group for some experimental reason.
best,
Margaret
========================
Margaret O'Brien
Information Management
Santa Barbara Coastal LTER
Marine Science Institute
University of California
Santa Barbara, CA 93106-6150
805-893-2071
mob at msi.ucsb.edu
http://sbc.lternet.edu
========================
Steve Rentmeester wrote:
> All,
>
> Thanks for the feedback.
>
> Typically, our data management team thinks of a dataset in terms of
> the data collection effort and defines a dataset as all data collected
> by a given project or field crew under a given protocol during a
> specified time frame. Typically the time frame is one field season or
> one year. Datasets typically include multiple observation tables.
>
> Reading the EML spec, I can see how to document a single dataset using
> eml. The dataset would have a title, contact, and creator. We would
> include a protocol, multiple method steps, equipment for methods, and
> dataTable elements to describe multiple entities and their attributes.
>
> So, the way we think about datasets works just fine in eml.
>
> The next question is: Can a single eml document include multiple datasets?
> Can the top-level eml-module include multiple datasets and then each
> dataset be referenced later in the document when describing individual
> protocols, method steps, and dataTables?
>
> My ultimate goal is to make the data in my database available as a
> data source in Kepler. Assuming that my database stores multiple
> "datasets", should we produce multiple eml documents, one for each
> "dataset"? Or should we produce a single eml document and use the
> top-level eml-module as a wrapper to describe the collection of all
> datasets in our database and then describe each dataset individually
> using the lower-level module eml-dataset?
>
> It should be noted that we are writing scripts that will dynamically
> generate the eml document. So, the document can be updated whenever
> the db is updated or a subset of the database is distributed.
>
> Thanks again for any advice or feedback,
>
> steve
>
>
> On Tue, Oct 27, 2009 at 7:24 AM, inigo san gil <isangil at lternet.edu> wrote:
>
>> Steve,
>>
>> EML is flexible enough -- it accepts any implementation of your vision of
>> what a dataset is. Use this flexibility to your advantage -- see how you
>> manage sets locally, and try to reflect it in the EML packages (or
>> documents). Would it be nice to standardize what a 'dataset' is? I think it
>> would have helped, but we would not reach consensus.
>>
>> There are all sorts of interpretations out there - I'd point to a paper
>> that we submitted a year ago, but the paper is still held captive by the
>> reviewers. At LTER, most of the sites (26) use the element "dataTable" to
>> describe either spreadsheet types of data, or views from a database. Some
>> use "spatialVector" or "spatialRaster" to detail GIS types of data, but
>> most defer that documentation to ESRI-based metadata, which ties in better
>> with the GIS data management systems.
>>
>> Most LTER EML documents ("datasets") contain more than one table
>> ("dataTable" - think of a meteo station describing several measurements, in
>> different spreadsheets). A few sites describe *a lot* of data within one
>> data set (lump data), and a few others split data to a level close to the
>> most atomic of parts - perhaps an example of a quintessential atomized EML
>> would be certain "EcoTrends" project generated records, where an EML
>> "dataset" may just describe one time series (two variables - time and
>> something else).
>>
>> What I would take home about EML is that it is a vehicle to transport
>> information in a common specification - you should define and manage your
>> datasets according to your group's understanding (see your database's
>> collection events as a possible working understanding, or split it a bit
>> from there if those become too massive).
>>
>> cheers, inigo
>>
>> Steve Rentmeester wrote:
>>
>>> Hello,
>>>
>>> My programmer and I are working to export EML documents from a
>>> relational database that stores data and metadata from many data
>>> collection events, protocols, sites, and projects. We are attempting
>>> to gain a better understanding of how a dataset is defined within EML.
>>>
>>> Can one EML document describe multiple datasets?
>>>
>>> How is a dataset defined?
>>>
>>> Currently, I'm assuming datasets are defined based on data collection
>>> characteristics (agency, project, protocol, temporal range) and not
>>> defined based on data analysis or synthesis requirements (all data
>>> used to evaluate question x).
>>>
>>> thank you for any advice,
>>>
>>> steve
>>>
>>> Steve Rentmeester
>>> Environmental Data Services
>>> Contractor to Bonneville Power Administration
>>> Portland, OR 97203
>>> office: 503-247-8431
>>> cell: 503-348-5839
>>> _______________________________________________
>>> Eml-dev mailing list
>>> Eml-dev at ecoinformatics.org
>>> http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/eml-dev
>>>
>>>
>>
>
>
>
>