[Fwd: protocol, methods, and project]

Peter McCartney peter.mccartney at asu.edu
Thu Aug 29 10:04:42 PDT 2002


See comments by Matt and me in the last few emails... maybe protocol IS a
literature citation.

Peter McCartney (peter.mccartney at asu.edu)
Center for Environmental Studies
Arizona State University
480-965-6791 

> -----Original Message-----
> From: David Blankman [mailto:dblankman1 at comcast.net]
> Sent: Thursday, August 29, 2002 9:13 AM
> To: Eml-Dev (E-mail)
> Cc: Scott Chapal; Tim Bergsma
> Subject: Re: [Fwd: protocol, methods, and project]
> 
> 
> James,
> 
> Protocol has been made into a stand alone "resource" so that protocol 
> libraries such as the one that Bill Mitchener is developing can be 
> published and referenced much as a literature citation.
> 
> David
> 
> James W Brunt wrote:
> 
> > I agree with all of the comments and suggestions in this thread which
> > only further demonstrates the ambiguities. There are two conceptual
> > bubbles at work here - we started out with EML to describe a dataset -
> > this was and still is the intent in some of our minds. Meanwhile, we
> > all see the usefulness of EML as a general tool for the
> > interoperability of ecological information systems. Nonetheless both
> > forces are at work here and it has resulted in the ambiguities that we
> > are experiencing. If we think of EML as providing the information to
> > fully understand "a" dataset and not providing the information needed
> > to fully understand a research architecture and hierarchy then I think
> > we will be better off because that's where the roots of EML reside.
> >
> > In 15 years or so of doing this stuff it is invariable that people
> > will interpret dataset (data set) differently and thus essential not
> > that it be defined but that it can be defined within EML. I think
> > we've given adequate latitude for "dataset" here.  It's datasets
> > that we want to preserve with metadata and be able to deconstruct and
> > understand. If your dataset definition is a normalized set of data
> > collections under the same or different experimental design then you
> > may have need for multiple projects and protocols.
> >
> > In another context there is a need for multiple project descriptions -
> > those metadata germane to the dataset and those abstract to the
> > dataset. An LTER project may fit into either category depending on
> > whether experimental design is defined as the project or the project
> > is driven by generalized hypotheses that are then addressed by
> > multiple experimental designs. We discussed this a lot at Sevilleta
> > EML 2002 and I thought we had it covered. So the point is I think it's
> > a mistake to take sampling out of project.
> >
> > I also think it's a mistake to create method - sorry Peter. Somebody
> > please remind me why protocol is standalone and imports dataset in the
> > first place?
> >
> > Thread tangled,
> >
> > James
> >
> > Scott Chapal wrote:
> >
> >> David,
> >>
> >> Good analysis of the ambiguities in our thinking about EML.
> >>
> >> David Blankman <dblankman at lternet.edu> writes:
> >>
> >>
> >>> Tim et al,
> >>>
> >>> I understand your issues about where do I put what (in dataset?, in
> >>> project?, in dataTable?)
> >>>
> >>>
> >>> The following is not meant to imply a need to change the EML model,
> >>> but to comment on some of the issues that Matt and Tim raise. The
> >>> Project module has been a source of confusion from its inception (if
> >>> I remember correctly Project started out at a level equal to that of
> >>> dataset). The confusion stems in part from the fuzziness of the
> >>> concept "Project" and to a certain extent "Dataset":
> >>
> >>
> >>
> >> For the sake of the clarity of the discussion, and for the eventual
> >> clarity in EML itself, I think all terms such as these need to be
> >> formally defined.  This 'vocabulary' should be part of the
> >> 'specification' that we have agreed is important for EML 2 to have.
> >>
> >> Let's agree to define all ambiguous terms and put them in the
> >> specification, immediately.
> >>
> >> The definition of these terms, and the relationships they inhabit in
> >> EML 2 need to be architected from a 'Generic' perspective.  That is
> >> from an ecological research "requirements" view, rather than from any
> >> particular example(s) at existing sites.  There will be no consensus
> >> achievable if we start from existing site architectures.  The
> >> challenge is to create EML in such a way that it is mappable to the
> >> broad range of information management techniques used at all the
> >> different LTER sites, and all the other sites doing ecological
> >> research -- and to do that without creating confusion for those who
> >> will be using EML.
> >>
> >> Also EML needs to be architected for evolution.  At my site, we have
> >> pretty clear notions of what a project and dataset are.  But we are
> >> now challenged to map our preconceptions to the EML model and
> >> vocabulary.  Our projects can change configuration over time; or a
> >> single dataset can become a project, eventually.  But the details
> >> employed at my site, or any site, should not be design criteria for
> >> the goals of EML, in my opinion.
> >>
> >>
> >>> 1. The scope of "project" is variable and indeterminate.
> >>
> >>
> >>
> >> But it wouldn't be if everyone understood and agreed on the definition
> >> of the term as used in EML.
> >>
> >>
> >>> 2. The fact that project is contained within dataset adds to the
> >>> confusion, since normally we would view a dataset as part of a
> >>> project. ["Contained within" may not be technically correct in
> >>> XML Schema terms but it is the way that most of us would describe
> >>> it].
> >>
> >>
> >>
> >> The project element contained within dataset points to a
> >> proj:ResearchProjectType (complexType) representation of the
> >> researchProject.  Through the ID mechanism it can be normalized in the
> >> instance document.  So it's not really contained within, it's "pointed
> >> to" although it could be redundantly repeated in the instance if you
> >> want.
> >>
> >>
> >>> 3. The change in the packaging concept from triples to containment
> >>> makes some things clearer but makes the question of "what goes
> >>> where" more important.
> >>
> >>
> >>
> >>> 4. For some, perhaps many, of the LTER information managers, the
> >>> term "dataset" is used to describe what is represented in EML
> >>> as "dataTable", while the term "project" is used to describe what
> >>> is represented in EML as "dataset".
> >>
> >>
> >>
> >> This apparent ambiguity contrasts with the EML documentation for
> >> eml-dataset:
> >>
> >> "The eml-dataset module contains general information that describes
> >> dataset resources. It is intended to provide overview information
> >> about the dataset, including title, abstract, keywords, contacts, and
> >> the links to associated metadata for the given resource. It also
> >> describes the temporal, geographic, and taxonomic coverage of the
> >> overall dataset. A dataset can be (and often is) composed of a series
> >> of data entities (tables) that are linked together by particular
> >> integrity constraints."
> >>
> >> This is not using dataset in the way that David is thinking about
> >> 'dataset'.  This might be because a lot of data in ecological research
> >> are not normalized, it's just one big honkin' table-o-stuff.  If there
> >> is disagreement about the use of 'project', 'dataset' and 'datatable'
> >> as currently defined in EML, then it needs to be addressed: either
> >> change the terms or make the definitions clearer.  If this issue is
> >> not addressed, the success of EML will be affected.  If the
> >> information managers themselves are not able to conceptualize and
> >> agree on these terms and relationships, then how can we expect others
> >> to?  Terms in EML need to be precise, and if that forces us to be
> >> clear, that is a good thing IMHO.
> >>
> >>
> >>> I would suggest that in Tim's example,
> >>>
> >>> "For instance, at KBS, our mainsite layout is a randomized
> >>> complete block agricultural experiment, installed in 1986.
> >>> That's a project, with a method, and created no data per se.  All
> >>> of our main datasets, however, each with their own sampling
> >>> techniques (methods/protocols) implicitly rely on the project
> >>> method."
> >>>
> >>> This information would logically go in
> >>> Project/designDescription/paragraph (or whatever paragraph
> >>> becomes). Perhaps the designDescription module should also be able
> >>> to reference a resource-level protocol, although that may be
> >>> possible already through the use of references.
> >>
> >>
> >>
> >> I don't think so in its current form.  This is an example of what I
> >> was getting at with 'Granularity of Repeatable Content'
> >>
> >>
> >>> If I understand the distinction Peter makes between protocol and
> >>> method, then anything at the project level would be a protocol.
> >>
> >>
> >>
> >>> Specific methods then belong at the dataTable level. Since "project"
> >>> is optional, dataTable then needs to be able to include protocols as
> >>> well as methods.
> >>
> >>
> >>
> >>> While I agree with Matt on trying to minimize the places where
> >>> elements reside, the needs of site-based programs may be different
> >>> from those of individual ecologists. An LTER site may need/want to
> >>> have a richer set of items at the project level than an individual
> >>> ecologist might need. Over time, I can foresee variations in Morpho
> >>> configurations (or other EML tools) that might give users different
> >>> recommended EML subsets in the same way that Quickbooks does for
> >>> helping businesses choose account configurations that are relevant
> >>> to their business, e.g. service vs retail vs manufacturer.
> >>
> >>
> >>
> >> EML should be able to handle the complexity of a huge project, but
> >> should employ graceful degradation, so that it is useful for
> >> documenting a simple dataset.  Or was that datatable?  :)