[eml-dev] Can one EML document describe multiple datasets?
Cushing, Judy
judyc at evergreen.edu
Wed Oct 28 14:20:23 PDT 2009
Hi all,
I have been following this conversation with some interest, since we have a similar issue coming up: whether or not to provide EML with an aggregate dataset, one that combines several other datasets.
I'll keep in touch with Steve and see what he does!
judy
-----Original Message-----
From: eml-dev-bounces at ecoinformatics.org [mailto:eml-dev-bounces at ecoinformatics.org] On Behalf Of Steve Rentmeester
Sent: Tuesday, October 27, 2009 10:58 AM
To: eml-dev at ecoinformatics.org
Subject: Re: [eml-dev] Can one EML document describe multiple datasets?
All,
Thanks for the feedback.
Typically, our data management team thinks of a dataset in terms of
the data collection effort and defines a dataset as all data collected
by a given project or field crew under a given protocol during a
specified time frame. Typically the time frame is one field season or
one year. Datasets typically include multiple observation tables.
Reading the EML spec, I can see how to document a single dataset using
EML. The dataset would have a title, contact, and creator. We would
include a protocol, multiple method steps, equipment for methods, and
dataTable elements to describe multiple entities and their attributes.
So, the way we think about datasets works just fine in eml.
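To make the single-dataset layout above concrete, here is a minimal sketch in Python of one dataset holding a title, creator, contact, method steps, and several dataTable entities. The element names follow my reading of the EML spec, but the output is not validated against the schema, many required children are left out, and the function name, people, and table names are all hypothetical.

```python
import xml.etree.ElementTree as ET


def build_dataset(title, creator, contact, method_steps, tables):
    """Build a <dataset> element holding several dataTable entities.

    `tables` maps an entity name to a list of attribute (column) names.
    Illustrative only: not checked against the EML schema.
    """
    dataset = ET.Element("dataset")
    ET.SubElement(dataset, "title").text = title

    # creator and contact are both described by a person's surname here
    for tag, surname in (("creator", creator), ("contact", contact)):
        party = ET.SubElement(dataset, tag)
        name = ET.SubElement(party, "individualName")
        ET.SubElement(name, "surName").text = surname

    # one methodStep per protocol step
    methods = ET.SubElement(dataset, "methods")
    for step in method_steps:
        method_step = ET.SubElement(methods, "methodStep")
        description = ET.SubElement(method_step, "description")
        ET.SubElement(description, "para").text = step

    # one dataTable per observation table, listing its columns
    for entity_name, columns in tables.items():
        table = ET.SubElement(dataset, "dataTable")
        ET.SubElement(table, "entityName").text = entity_name
        attribute_list = ET.SubElement(table, "attributeList")
        for column in columns:
            attribute = ET.SubElement(attribute_list, "attribute")
            ET.SubElement(attribute, "attributeName").text = column
    return dataset


# Hypothetical values, invented for illustration
example = build_dataset(
    title="Hypothetical 2009 field season",
    creator="Doe",
    contact="Roe",
    method_steps=["Visit each site.", "Count fish in each reach."],
    tables={
        "site_visits": ["site_id", "visit_date"],
        "fish_counts": ["site_id", "species", "count"],
    },
)
print(ET.tostring(example, encoding="unicode"))
```

The point is only that one dataset element can carry as many dataTable entities as the field season produced.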
The next question is: Can a single eml document include multiple datasets?
Can the top-level eml-module include multiple datasets and then each
dataset be referenced later in the document when describing individual
protocols, method steps, and dataTables?
My ultimate goal is to make the data in my database available as a
data source in Kepler. Assuming that my database stores multiple
"datasets", should we produce multiple eml documents, one for each
"dataset"? Or should we produce a single eml document and use the
top-level eml-module as a wrapper to describe the collection of all
datasets in our database and then describe each dataset individually
using the lower-level module eml-dataset?
It should be noted that we are writing scripts that will dynamically
generate the eml document. So, the document can be updated whenever
the db is updated or a subset of the database is distributed.
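As a rough sketch of what such a generation script might look like, the Python below takes the first option from the question above (one EML document per dataset) and groups records by project, protocol, and year, following the dataset definition given earlier. It does not settle which option is better. The row layout, the "hypothetical-scope" identifier, the scope.number.revision packageId pattern, and the EML 2.1.0 namespace are assumptions for illustration, and the output is not validated against the schema.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed schema version; adjust to whatever EML release is targeted
EML_NS = "eml://ecoinformatics.org/eml-2.1.0"
ET.register_namespace("eml", EML_NS)


def group_into_datasets(rows):
    """Group table names by (project, protocol, year)."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["project"], row["protocol"], row["year"])
        groups[key].append(row["table"])
    return groups


def build_documents(rows, scope="hypothetical-scope"):
    """Return {package_id: xml_string}, one document per dataset group."""
    documents = {}
    groups = sorted(group_into_datasets(rows).items())
    for number, (key, tables) in enumerate(groups, start=1):
        project, protocol, year = key
        package_id = f"{scope}.{number}.1"  # assumed id pattern
        root = ET.Element(
            f"{{{EML_NS}}}eml",
            {"packageId": package_id, "system": scope},
        )
        dataset = ET.SubElement(root, "dataset")
        ET.SubElement(dataset, "title").text = f"{project}, {protocol}, {year}"
        for table_name in tables:
            table = ET.SubElement(dataset, "dataTable")
            ET.SubElement(table, "entityName").text = table_name
        documents[package_id] = ET.tostring(root, encoding="unicode")
    return documents


# Hypothetical rows standing in for a database query result
rows = [
    {"project": "Project A", "protocol": "Protocol 1", "year": 2008, "table": "site_visits"},
    {"project": "Project A", "protocol": "Protocol 1", "year": 2008, "table": "fish_counts"},
    {"project": "Project A", "protocol": "Protocol 1", "year": 2009, "table": "site_visits"},
]
documents = build_documents(rows)
for package_id in documents:
    print(package_id)
```

Because the grouping key is computed from the rows, rerunning the script after a database update, or on a subset of the database, regenerates the matching set of documents.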
Thanks again for any advice or feedback,
steve
On Tue, Oct 27, 2009 at 7:24 AM, inigo san gil <isangil at lternet.edu> wrote:
>
> Steve,
>
> EML is flexible enough -- it accepts any implementation of your vision of
> what a dataset is. Use this flexibility to your advantage -- see how you
> manage sets locally, and try to reflect it in the EML packages (or
> documents). Would it be nice to standardize what a 'dataset' is? I think it
> would have helped, but we would not reach consensus.
>
> There are all sorts of interpretations out there - I'd point to a paper
> that we submitted a year ago, but the paper is still held captive by the
> reviewers. At LTER, most of the sites (26) use the element "dataTable" to
> describe either spreadsheet types of data, or views from a database. Some
> use "spatialVector" or "spatialRaster" to detail GIS types of data, but
> most defer that documentation to ESRI-based metadata, which ties in better
> with the GIS data management systems.
>
> Most LTER EML documents ("datasets") contain more than one table
> ("dataTable" - think of a meteo station describing several measurements in
> different spreadsheets). A few sites describe *a lot* of data within one
> dataset (lumped data), and a few others split data to a level close to the
> most atomic of parts. Perhaps the quintessential atomized EML would be
> certain records generated by the "EcoTrends" project, where an EML
> "dataset" may describe just one time series (two variables - time and
> something else).
>
> What I would take home about EML is that it is a vehicle to transport
> information in a common specification - you should define and manage your
> datasets according to your group's understanding (see your database's
> collection events as a possible working understanding, or split it a bit
> from there if those become too massive).
>
> cheers, inigo
>
> Steve Rentmeester wrote:
>>
>> Hello,
>>
>> My programmer and I are working to export EML documents from a
>> relational database that stores data and metadata from many data
>> collection events, protocols, sites, and projects. We are attempting
>> to gain a better understanding of how a dataset is defined within EML.
>>
>> Can one EML document describe multiple datasets?
>>
>> How is a dataset defined?
>>
>> Currently, I'm assuming datasets are defined based on data collection
>> characteristics (agency, project, protocol, temporal range) and not
>> defined based on data analysis or synthesis requirements (all data
>> used to evaluate question x).
>>
>> thank you for any advice,
>>
>> steve
>>
>> Steve Rentmeester
>> Environmental Data Services
>> Contractor to Bonneville Power Administration
>> Portland, OR 97203
>> office: 503-247-8431
>> cell: 503-348-5839
>> _______________________________________________
>> Eml-dev mailing list
>> Eml-dev at ecoinformatics.org
>> http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/eml-dev
>>
>
>
--
Steve Rentmeester
Environmental Data Services
Contractor to Bonneville Power Administration
Portland, OR 97203
office: 503-247-8431
cell: 503-348-5839