[Fwd: protocol, methods, and project]

Thu Aug 29 08:39:32 PDT 2002

I agree with all of the comments and suggestions in this thread, which 
only further demonstrate the ambiguities. There are two conceptual 
bubbles at work here - we started out with EML to describe a dataset - 
this was and still is the intent in some of our minds. Meanwhile, we all 
see the usefulness of EML as a general tool for the interoperability of 
ecological information systems. Both forces are at work here, and the 
result is the ambiguities that we are experiencing. If we think of EML 
as providing the information to fully understand "a" dataset, and not 
the information needed to fully understand a research architecture and 
hierarchy, then I think we will be better off because that's where the 
roots of EML reside.

In 15 years or so of doing this stuff I have found that people 
invariably interpret dataset (data set) differently. So what is 
essential is not that it be defined but that it can be defined within 
EML. I think we've given adequate latitude for "dataset" here.  It's 
datasets that we want to preserve with metadata and be able to 
deconstruct and understand. If your dataset definition is a normalized 
set of data collections under the same or different experimental 
designs then you may have need for multiple projects and protocols.

In another context there is a need for multiple project descriptions - 
those metadata germane to the dataset and those abstract to the dataset. 
An LTER project may fit into either category depending on whether 
experimental design is defined as the project or the project is driven 
by generalized hypotheses that are then addressed by multiple 
experimental designs. We discussed this a lot at Sevilleta EML 2002 and 
I thought we had it covered. So the point is I think it's a mistake to 
take sampling out of project.

I also think it's a mistake to create method - sorry Peter. Somebody 
please remind me why protocol is standalone and imports dataset in the 
first place?

Thread tangled,

James

Scott Chapal wrote:
> David,
> 
> Good analysis of the ambiguities in our thinking about EML.
> 
> David Blankman <dblankman at lternet.edu> writes:
> 
> 
>>Tim et al,
>>
>>I understand your issues about "where do I put what" (in dataset? in
>>project? in dataTable?)
>>
>>
>>The following is not meant to imply a need to change the EML model,
>>but to comment on some of the issues that Matt and Tim raise. The
>>Project module has been a source of confusion from its inception (if
>>I remember correctly Project started out at a level equal to that of
>>dataset). The confusion stems in part from the fuzziness of the
>>concept "Project" and to a certain extent "Dataset":
> 
> 
> For the sake of the clarity of the discussion, and for the eventual
> clarity in EML itself, I think all terms such as these need to be
> formally defined.  This 'vocabulary' should be part of the
> 'specification' that we have agreed is important for EML 2 to have.
> 
> Let's agree to define all ambiguous terms and put them in the
> specification, immediately.
> 
> The definition of these terms, and the relationships they inhabit in
> EML 2 need to be architected from a 'Generic' perspective.  That is
> from an ecological research "requirements" view, rather than from any
> particular example(s) at existing sites.  There will be no consensus
> achievable if we start from existing site architectures.  The
> challenge is to create EML in such a way that it is mappable to the
> broad range of information management techniques used at all the
> different LTER sites, and all the other sites doing ecological
> research -- and to do that without creating confusion for those who
> will be using EML.
> 
> Also EML needs to be architected for evolution.  At my site, we have
> pretty clear notions of what a project and dataset are.  But we are
> now challenged to map our preconceptions to the EML model and
> vocabulary.  Our projects can change configuration over time; or a
> single dataset can become a project, eventually.  But the details
> employed at my site, or any site, should not be design criteria for
> the goals of EML, in my opinion.
> 
> 
>> 1. The scope of "project" is variable and indeterminate.
> 
> 
> But it wouldn't be if everyone understood and agreed on the definition
> of the term as used in EML.
> 
> 
>> 2. The fact that project is contained within dataset adds to the
>> confusion, since normally we would view a dataset as part of a
>> project. ["Contained within" may not be technically correct in
>> XML Schema terms but it is the way that most of us would describe
>> it].
> 
> 
> The project element contained within dataset points to a
> proj:ResearchProjectType (complexType) representation of the
> researchProject.  Through the ID mechanism it can be normalized in the
> instance document.  So it's not really contained within, it's "pointed
> to" although it could be redundantly repeated in the instance if you
> want.
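
A minimal sketch of the pointing idea Scott describes, in Python using
only the standard library. The element names, the ids, and the
resolve_project helper are illustrative assumptions, not schema-valid
EML: two datasets refer to one project description by id rather than
repeating it.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical instance document (not schema-valid EML).
# The project is described once and the datasets point to it by id.
DOC = """
<package>
  <project id="proj.1">
    <title>Main site experiment</title>
  </project>
  <dataset id="ds.1">
    <title>Soil moisture</title>
    <project><references>proj.1</references></project>
  </dataset>
  <dataset id="ds.2">
    <title>Crop yield</title>
    <project><references>proj.1</references></project>
  </dataset>
</package>
"""


def resolve_project(root, dataset):
    """Return the project element a dataset uses, following a reference if present."""
    project = dataset.find("project")
    if project is None:
        return None  # project is optional
    ref = project.find("references")
    if ref is None:
        return project  # described inline (the redundant form)
    # Index every element that carries an id, then look up the target.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    return by_id.get(ref.text.strip())


root = ET.fromstring(DOC)
titles = {
    ds.get("id"): resolve_project(root, ds).findtext("title")
    for ds in root.findall("dataset")
}
print(titles)
```

If a dataset carried the project description inline instead, the same
helper would return that inline element, which is the redundant form
mentioned above.
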
> 
> 
>> 3. The change in the packaging concept from triples to containment
>> makes some things clearer but makes the question of "what goes
>> where" more important.
> 
> 
>> 4. For some, perhaps many, of the LTER information managers, the
>> term "dataset" is used to describe what is represented in EML
>> as "dataTable", while the term "project" is used to describe what
>> is represented in EML as "dataset".
> 
> 
> This apparent ambiguity contrasts with the EML documentation for
> eml-dataset:
> 
> "The eml-dataset module contains general information that describes
> dataset resources. It is intended to provide overview information
> about the dataset, including title, abstract, keywords, contacts, and
> the links to associated metadata for the given resource. It also
> describes the temporal, geographic, and taxonomic coverage of the
> overall dataset. A dataset can be (and often is) composed of a series
> of data entities (tables) that are linked together by particular
> integrity constraints."
> 
> This is not using dataset in the way that David is thinking about
> 'dataset'.  This might be because a lot of data in ecological research
> are not normalized, it's just one big honkin' table-o-stuff.  If there
> is disagreement about the use of 'project', 'dataset' and 'datatable'
> as currently defined in EML, then it needs to be addressed: either
> change the terms or make the definitions clearer.  If this issue is
> not addressed, the success of EML will be affected.  If the
> information managers themselves are not able to conceptualize and
> agree on these terms and relationships, then how can we expect others
> to?  Terms in EML need to be precise, and if that forces us to be
> clear, that is a good thing IMHO.
> 
> 
>>I would suggest that in Tim's example,
>>
>> "For instance, at KBS, our mainsite layout is a randomized
>> complete block agricultural experiment, installed in 1986.
>> That's a project, with a method, and created no data per se.  All
>> of our main datasets, however, each with their own sampling
>> techniques (methods/protocols) implicitly rely on the project
>> method."
>>
>>This information would logically go in
>>Project/designDescription/paragraph (or whatever paragraph
>>becomes). Perhaps the designDescription module should also be able
>>to reference a resource-level protocol, although that may be
>>possible already through the use of references.
> 
> 
> I don't think so in its current form.  This is an example of what I
> was getting at with 'Granularity of Repeatable Content'
> 
> 
>>If I understand the distinction Peter makes between protocol and
>>method, then anything at the project level would be a protocol.
> 
> 
>>Specific methods then belong at the dataTable level. Since "project"
>>is optional, dataTable then needs to be able to include protocols as
>>well as methods.
> 
> 
>>While I agree with Matt on trying to minimize the places where
>>elements reside, the needs of site-based programs may be different
>>from those of individual ecologists. An LTER site may need/want to
>>have a richer set of items at the project level than an individual
>>ecologist might need. Over time, I can foresee variations in Morpho
>>configurations (or other EML tools) that might give users different
>>recommended eml subsets in the same way that Quickbooks does for
>>helping businesses choose account configurations that are relevant
>>to their business, e.g. service vs retail vs manufacturer.
> 
> 
> EML should be able to handle the complexity of a huge project, but
> should employ graceful degradation, so that it is useful for
> documenting a simple dataset.  Or was that datatable?  :)

-- 
James W. Brunt
Associate Director for Information Management
Long Term Ecological Research Network Office
Department of Biology
University of New Mexico
Albuquerque, NM 87131-1091
505 272 7085
jbrunt at lternet.edu