[Fwd: protocol, methods, and project]
David Blankman
dblankman1 at comcast.net
Thu Aug 29 09:13:17 PDT 2002
James,
Protocol has been made into a standalone "resource" so that protocol
libraries such as the one that Bill Mitchener is developing can be
published and referenced much as a literature citation.
David
James W Brunt wrote:
> I agree with all of the comments and suggestions in this thread which
> only further demonstrates the ambiguities. There are two conceptual
> bubbles at work here - we started out with EML to describe a dataset -
> this was and still is the intent in some of our minds. Meanwhile, we
> all see the usefulness of EML as a general tool for the
> interoperability of ecological information systems. Nonetheless both
> forces are at work here and it has resulted in the ambiguities that we
> are experiencing. If we think of EML as providing the information to
> fully understand "a" dataset and not providing the information needed
> to fully understand a research architecture and hierarchy then I think
> we will be better off because that's where the roots of EML reside.
>
> In 15 years or so of doing this stuff I have found that people
> invariably interpret dataset (data set) differently, and thus it is
> essential not that it be defined but that it can be defined within
> EML. I think we've given adequate latitude for "dataset" here. It's datasets
> that we want to preserve with metadata and be able to deconstruct and
> understand. If your dataset definition is a normalized set of data
> collections under the same or different experimental design then you
> may have need for multiple projects and protocols.
>
> In another context there is a need for multiple project descriptions -
> those metadata germane to the dataset and those abstract to the
> dataset. An LTER project may fit into either category depending on
> whether experimental design is defined as the project or the project
> is driven by generalized hypotheses that are then addressed by
> multiple experimental designs. We discussed this a lot at Sevilleta
> EML 2002 and I thought we had it covered. So the point is I think it's
> a mistake to take sampling out of project.
>
> I also think it's a mistake to create method - sorry Peter. Somebody
> please remind me why protocol is standalone and imports dataset in the
> first place?
>
> Thread tangled,
>
> James
>
> Scott Chapal wrote:
>
>> David,
>>
>> Good analysis of the ambiguities in our thinking about EML.
>>
>> David Blankman <dblankman at lternet.edu> writes:
>>
>>
>>> Tim et al,
>>>
>>> I understand your issues about where do I put what (in dataset? in
>>> project? in dataTable?)
>>>
>>>
>>> The following is not meant to imply a need to change the EML model,
>>> but to comment on some of the issues that Matt and Tim raise. The
>>> Project module has been a source of confusion from its inception (if
>>> I remember correctly Project started out at a level equal to that of
>>> dataset). The confusion stems in part from the fuzziness of the
>>> concept "Project" and to a certain extent "Dataset":
>>
>>
>>
>> For the sake of the clarity of the discussion, and for the eventual
>> clarity in EML itself, I think all terms such as these need to be
>> formally defined. This 'vocabulary' should be part of the
>> 'specification' that we have agreed is important for EML 2 to have.
>>
>> Let's agree to define all ambiguous terms and put them in the
>> specification, immediately.
>>
>> The definition of these terms, and the relationships they inhabit in
>> EML 2 need to be architected from a 'Generic' perspective. That is,
>> from an ecological research "requirements" view, rather than from any
>> particular example(s) at existing sites. There will be no consensus
>> achievable if we start from existing site architectures. The
>> challenge is to create EML in such a way that it is mappable to the
>> broad range of information management techniques used at all the
>> different LTER sites, and all the other sites doing ecological
>> research -- and to do that without creating confusion for those who
>> will be using EML.
>>
>> Also EML needs to be architected for evolution. At my site, we have
>> pretty clear notions of what a project and dataset are. But we are
>> now challenged to map our preconceptions to the EML model and
>> vocabulary. Our projects can change configuration over time; or a
>> single dataset can become a project, eventually. But the details
>> employed at my site, or any site, should not be design criteria for
>> the goals of EML, in my opinion.
>>
>>
>>> 1. The scope of "project" is variable and indeterminate.
>>
>>
>>
>> But it wouldn't be if everyone understood and agreed on the definition
>> of the term as used in EML.
>>
>>
>>> 2. The fact that project is contained within dataset adds to the
>>> confusion, since normally we would view a dataset as part of a
>>> project. ["Contained within" may not be technically correct in
>>> XML Schema terms but it is the way that most of us would describe
>>> it].
>>
>>
>>
>> The project element contained within dataset points to a
>> proj:ResearchProjectType (complexType) representation of the
>> researchProject. Through the ID mechanism it can be normalized in the
>> instance document. So it's not really contained within, it's "pointed
>> to" although it could be redundantly repeated in the instance if you
>> want.
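
The "pointed to" idea above can be pictured with a minimal Python sketch. It assumes a simplified instance document in which each dataset's project element carries a reference to the id of one shared project description. The element names (package, researchProject, references), the id values, and the helper resolve_projects are illustrative assumptions, not copied from the EML 2 schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance fragment. Element names and ids are assumptions
# made for illustration; they are not taken from the EML 2 schema.
DOC = """
<package>
  <researchProject id="proj.1">
    <title>Main site experiment</title>
  </researchProject>
  <dataset id="ds.1">
    <title>Soil moisture</title>
    <project><references>proj.1</references></project>
  </dataset>
  <dataset id="ds.2">
    <title>Plant biomass</title>
    <project><references>proj.1</references></project>
  </dataset>
</package>
"""


def resolve_projects(xml_text):
    """Map each dataset title to the title of the project it points to."""
    root = ET.fromstring(xml_text)
    # Index every element that carries an id attribute.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    resolved = {}
    for dataset in root.iter("dataset"):
        ref = dataset.findtext("project/references")
        target = by_id.get(ref.strip()) if ref else None
        resolved[dataset.findtext("title")] = (
            target.findtext("title") if target is not None else None
        )
    return resolved


if __name__ == "__main__":
    for dataset_title, project_title in resolve_projects(DOC).items():
        print(dataset_title, "->", project_title)
```

In this sketch both datasets resolve to the same single project description, which is the normalized case; repeating the full project block inside each dataset would be the redundant alternative mentioned above.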
>>
>>
>>> 3. The change in the packaging concept from triples to containment
>>> makes some things clearer but makes the question of "what goes
>>> where" more important.
>>
>>
>>
>>> 4. For some, perhaps many, of the LTER information managers, the
>>> term "dataset" is used to describe what is represented in EML
>>> as "dataTable", while the term "project" is used to describe what
>>> is represented in EML as "dataset".
>>
>>
>>
>> This apparent ambiguity contrasts with the EML documentation for
>> eml-dataset:
>>
>> "The eml-dataset module contains general information that describes
>> dataset resources. It is intended to provide overview information
>> about the dataset, including title, abstract, keywords, contacts, and
>> the links to associated metadata for the given resource. It also
>> describes the temporal, geographic, and taxonomic coverage of the
>> overall dataset. A dataset can be (and often is) composed of a series
>> of data entities (tables) that are linked together by particular
>> integrity constraints."
>>
>> This is not using dataset in the way that David is thinking about
>> 'dataset'. This might be because a lot of data in ecological research
>> are not normalized, it's just one big honkin' table-o-stuff. If there
>> is disagreement about the use of 'project', 'dataset' and 'datatable'
>> as currently defined in EML, then it needs to be addressed: either
>> change the terms or make the definitions clearer. If this issue is
>> not addressed, the success of EML will be affected. If the
>> information managers themselves are not able to conceptualize and
>> agree on these terms and relationships, then how can we expect others
>> to? Terms in EML need to be precise, and if that forces us to be
>> clear, that is a good thing IMHO.
>>
>>
>>> I would suggest that in Tim's example,
>>>
>>> "For instance, at KBS, our mainsite layout is a randomized
>>> complete block agricultural experiment, installed in 1986.
>>> That's a project, with a method, and created no data per se. All
>>> of our main datasets, however, each with their own sampling
>>> techniques (methods/protocols) implicitly rely on the project
>>> method."
>>>
>>> This information would logically go in
>>> Project/designDescription/paragraph (or whatever paragraph
>>> becomes). Perhaps the designDescription module should also be able
>>> to reference a resource-level protocol, although that may be
>>> possible already through the use of references.
>>
>>
>>
>> I don't think so in its current form. This is an example of what I
>> was getting at with 'Granularity of Repeatable Content'
>>
>>
>>> If I understand the distinction Peter makes between protocol and
>>> method, then anything at the project level would be a protocol.
>>
>>
>>
>>> Specific methods then belong at the dataTable level. Since "project"
>>> is optional, dataTable then needs to be able to include protocols as
>>> well as methods.
>>
>>
>>
>>> While I agree with Matt on trying to minimize the places where
>>> elements reside, the needs of site-based programs may be different
>>> from those of individual ecologists. An LTER site may need/want to
>>> have a richer set of items at the project level than an individual
>>> ecologist might need. Over time, I can foresee variations in Morpho
>>> configurations (or other EML tools) that might give users different
>>> recommended EML subsets in the same way that QuickBooks does for
>>> helping businesses choose account configurations that are relevant
>>> to their business, e.g. service vs retail vs manufacturer.
>>
>>
>>
>> EML should be able to handle the complexity of a huge project, but
>> should employ graceful degradation, so that it is useful for
>> documenting a simple dataset. Or was that datatable? :)
>
>
>