[Fwd: protocol, methods, and project]

Thu Aug 29 09:13:17 PDT 2002

James,

Protocol has been made into a standalone "resource" so that protocol
libraries, such as the one Bill Mitchener is developing, can be
published and referenced much like a literature citation.

David

James W Brunt wrote:

> I agree with all of the comments and suggestions in this thread, which
> only further demonstrates the ambiguities. There are two conceptual
> bubbles at work here: we started out with EML to describe a dataset,
> and this was and still is the intent in some of our minds. Meanwhile,
> we all see the usefulness of EML as a general tool for the
> interoperability of ecological information systems. Both forces are
> at work here, and they have resulted in the ambiguities that we are
> experiencing. If we think of EML as providing the information to
> fully understand "a" dataset, and not the information needed to
> fully understand a research architecture and hierarchy, then I think
> we will be better off, because that's where the roots of EML reside.
>
> In 15 years or so of doing this stuff, I have found that people
> invariably interpret "dataset" (data set) differently, so it is
> essential not that the term be defined, but that it can be defined,
> within EML. I think we've given adequate latitude for "dataset" here.
> It's datasets that we want to preserve with metadata and be able to
> deconstruct and understand. If your dataset definition is a
> normalized set of data collections under the same or different
> experimental designs, then you may need multiple projects and
> protocols.
>
> In another context there is a need for multiple project descriptions:
> those metadata germane to the dataset and those abstract to the
> dataset. An LTER project may fit into either category, depending on
> whether the experimental design is defined as the project or the
> project is driven by generalized hypotheses that are then addressed by
> multiple experimental designs. We discussed this a lot at Sevilleta
> EML 2002, and I thought we had it covered. So my point is that I think
> it's a mistake to take sampling out of project.
>
> I also think it's a mistake to create method - sorry Peter. Somebody 
> please remind me why protocol is standalone and imports dataset in the 
> first place?
>
> Thread tangled,
>
> James
>
> Scott Chapal wrote:
>
>> David,
>>
>> Good analysis of the ambiguities in our thinking about EML.
>>
>> David Blankman <dblankman at lternet.edu> writes:
>>
>>> Tim et al,
>>>
>>> I understand your concerns about where to put what (in dataset? in
>>> project? in dataTable?).
>>>
>>> The following is not meant to imply a need to change the EML model,
>>> but to comment on some of the issues that Matt and Tim raise. The
>>> Project module has been a source of confusion from its inception (if
>>> I remember correctly, Project started out at a level equal to that of
>>> dataset). The confusion stems in part from the fuzziness of the
>>> concept "Project" and, to a certain extent, "Dataset":
>>
>> For the sake of the clarity of the discussion, and for the eventual
>> clarity in EML itself, I think all terms such as these need to be
>> formally defined.  This 'vocabulary' should be part of the
>> 'specification' that we have agreed is important for EML 2 to have.
>>
>> Let's agree to define all ambiguous terms and put them in the
>> specification, immediately.
>>
>> The definition of these terms, and the relationships they inhabit in
>> EML 2, need to be architected from a 'Generic' perspective, that is,
>> from an ecological research "requirements" view rather than from any
>> particular example(s) at existing sites.  No consensus will be
>> achievable if we start from existing site architectures.  The
>> challenge is to create EML in such a way that it is mappable to the
>> broad range of information management techniques used at all the
>> different LTER sites, and all the other sites doing ecological
>> research, and to do that without creating confusion for those who
>> will be using EML.
>>
>> Also, EML needs to be architected for evolution.  At my site, we have
>> pretty clear notions of what a project and a dataset are.  But we are
>> now challenged to map our preconceptions to the EML model and
>> vocabulary.  Our projects can change configuration over time, and a
>> single dataset can eventually become a project.  But the details
>> employed at my site, or any site, should not be design criteria for
>> the goals of EML, in my opinion.
>>
>>> 1. The scope of "project" is variable and indeterminate.
>>
>> But it wouldn't be if everyone understood and agreed on the definition
>> of the term as used in EML.
>>
>>> 2. The fact that project is contained within dataset adds to the
>>> confusion, since normally we would view a dataset as part of a
>>> project. ["Contained within" may not be technically correct in
>>> XML Schema terms, but it is the way that most of us would describe
>>> it.]
>>
>> The project element contained within dataset points to a
>> proj:ResearchProjectType (complexType) representation of the
>> researchProject.  Through the ID mechanism it can be normalized in the
>> instance document.  So it's not really "contained within"; it's
>> "pointed to", although it could be redundantly repeated in the
>> instance if you want.
>>
>>
>>> 3. The change in the packaging concept from triples to containment
>>> makes some things clearer, but makes the question of "what goes
>>> where" more important.
>>
>>> 4. For some, perhaps many, of the LTER information managers, the
>>> term "dataset" is used to describe what is represented in EML as
>>> "dataTable", while the term "project" is used to describe what is
>>> represented in EML as "dataset".
>>
>> This apparent ambiguity contrasts with the EML documentation for
>> eml-dataset:
>>
>> "The eml-dataset module contains general information that describes
>> dataset resources. It is intended to provide overview information
>> about the dataset, including title, abstract, keywords, contacts, and
>> the links to associated metadata for the given resource. It also
>> describes the temporal, geographic, and taxonomic coverage of the
>> overall dataset. A dataset can be (and often is) composed of a series
>> of data entities (tables) that are linked together by particular
>> integrity constraints."
>>
>> This is not the sense in which David is using 'dataset'.  This might
>> be because a lot of data in ecological research are not normalized;
>> it's just one big honkin' table-o-stuff.  If there
>> is disagreement about the use of 'project', 'dataset' and 'datatable'
>> as currently defined in EML, then it needs to be addressed: either
>> change the terms or make the definitions clearer.  If this issue is
>> not addressed, the success of EML will be affected.  If the
>> information managers themselves are not able to conceptualize and
>> agree on these terms and relationships, then how can we expect others
>> to?  Terms in EML need to be precise, and if that forces us to be
>> clear, that is a good thing IMHO.
>>
>>> Regarding Tim's example:
>>>
>>> "For instance, at KBS, our mainsite layout is a randomized
>>> complete block agricultural experiment, installed in 1986.
>>> That's a project, with a method, and created no data per se.  All
>>> of our main datasets, however, each with their own sampling
>>> techniques (methods/protocols), implicitly rely on the project
>>> method."
>>>
>>> I would suggest that this information logically goes in
>>> Project/designDescription/paragraph (or whatever paragraph
>>> becomes). Perhaps the designDescription module should also be able
>>> to reference a resource-level protocol, although that may already be
>>> possible through the use of references.
>>
>> I don't think that's possible in its current form.  This is an example
>> of what I was getting at with 'Granularity of Repeatable Content'.
>>
>>
>>> If I understand the distinction Peter makes between protocol and
>>> method, then anything at the project level would be a protocol.
>>
>>> Specific methods then belong at the dataTable level. Since "project"
>>> is optional, dataTable then needs to be able to include protocols as
>>> well as methods.
>>
>>> While I agree with Matt on trying to minimize the places where
>>> elements reside, the needs of site-based programs may be different
>>> from those of individual ecologists. An LTER site may need/want to
>>> have a richer set of items at the project level than an individual
>>> ecologist might need. Over time, I can foresee variations in Morpho
>>> configurations (or other EML tools) that might give users different
>>> recommended EML subsets, in the same way that QuickBooks helps
>>> businesses choose account configurations that are relevant to
>>> their business, e.g., service vs. retail vs. manufacturing.
>>
>> EML should be able to handle the complexity of a huge project, but
>> should employ graceful degradation, so that it is useful for
>> documenting a simple dataset.  Or was that datatable?  :)