[Fwd: protocol, methods, and project]

Thu Aug 29 07:11:25 PDT 2002

David,

Good analysis of the ambiguities in our thinking about EML.

David Blankman <dblankman at lternet.edu> writes:

> Tim et al,
> 
> I understand your issues about where to put what (in dataset? in
> project? in dataTable?)
> 
> 
> The following is not meant to imply a need to change the EML model,
> but to comment on some of the issues that Matt and Tim raise. The
> Project module has been a source of confusion from its inception (if
> I remember correctly Project started out at a level equal to that of
> dataset). The confusion stems in part from the fuzziness of the
> concept "Project" and to a certain extent "Dataset":

For the sake of the clarity of the discussion, and for the eventual
clarity in EML itself, I think all terms such as these need to be
formally defined.  This 'vocabulary' should be part of the
'specification' that we have agreed is important for EML 2 to have.

Let's agree to define all ambiguous terms and put them in the
specification, immediately.

The definitions of these terms, and the relationships among them in
EML 2, need to be architected from a 'generic' perspective: that is,
from an ecological research "requirements" view, rather than from any
particular example(s) at existing sites.  No consensus will be
achievable if we start from existing site architectures.  The
challenge is to create EML in such a way that it is mappable to the
broad range of information management techniques used at all the
different LTER sites, and all the other sites doing ecological
research -- and to do that without creating confusion for those who
will be using EML.

Also, EML needs to be architected for evolution.  At my site, we have
pretty clear notions of what a project and dataset are.  But we are
now challenged to map our preconceptions to the EML model and
vocabulary.  Our projects can change configuration over time; or a
single dataset can become a project, eventually.  But the details
employed at my site, or any site, should not be design criteria for
the goals of EML, in my opinion.

>  1. The scope of "project" is variable and indeterminate.

But it wouldn't be if everyone understood and agreed on the definition
of the term as used in EML.

>  2. The fact that project is contained within dataset adds to the
>  confusion, since normally we would view a dataset as part of a
>  project. ["Contained within" may not be technically correct in
>  XML Schema terms but it is the way that most of us would describe
>  it].

The project element contained within dataset points to a
proj:ResearchProjectType (complexType) representation of the
researchProject.  Through the ID mechanism it can be normalized in the
instance document.  So it's not really contained within; it's "pointed
to", although it could be redundantly repeated in the instance if you
want.
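
To illustrate the "pointed to" idea, here is a minimal Python sketch.
The element names (package, researchProject, references) and the ids
are made up for illustration and are not claimed to match the EML 2
schema; the only point is that one project description, defined once
with an id, can be shared by several datasets.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified instance document. Element names and ids are
# illustrative assumptions, not taken from the EML 2 schema.
INSTANCE = """
<package>
  <researchProject id="proj.1">
    <title>Main site experiment</title>
  </researchProject>
  <dataset id="ds.1">
    <title>Soil moisture</title>
    <project><references>proj.1</references></project>
  </dataset>
  <dataset id="ds.2">
    <title>Crop yield</title>
    <project><references>proj.1</references></project>
  </dataset>
</package>
"""


def resolve_projects(xml_text):
    """Map each dataset id to the title of the project it points to."""
    root = ET.fromstring(xml_text)
    # Index every element that carries an id attribute.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    resolved = {}
    for ds in root.findall("dataset"):
        ref = ds.findtext("project/references")
        target = by_id.get(ref.strip()) if ref else None
        # None means the dataset has no project or the reference is dangling.
        resolved[ds.get("id")] = (
            target.findtext("title") if target is not None else None
        )
    return resolved


if __name__ == "__main__":
    print(resolve_projects(INSTANCE))
```

Both datasets resolve to the same project element, so the project is
described once rather than repeated inside each dataset.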

>  3. The change in the packaging concept from triples to containment
>  makes some things clearer but makes the question of "what goes
>  where" more important.

>  4. For some, perhaps many, of the LTER information managers, the
>  term "dataset" is used to describe what is represented in EML
>  as "dataTable", while the term "project" is used to describe what
>  is represented in EML as "dataset".

This apparent ambiguity contrasts with the EML documentation for
eml-dataset:

"The eml-dataset module contains general information that describes
dataset resources. It is intended to provide overview information
about the dataset, including title, abstract, keywords, contacts, and
the links to associated metadata for the given resource. It also
describes the temporal, geographic, and taxonomic coverage of the
overall dataset. A dataset can be (and often is) composed of a series
of data entities (tables) that are linked together by particular
integrity constraints."

This is not using dataset in the way that David is thinking about
'dataset'.  This might be because a lot of data in ecological research
are not normalized; they are just one big honkin' table-o-stuff.  If
there is disagreement about the use of 'project', 'dataset' and
'dataTable' as currently defined in EML, then it needs to be addressed:
either change the terms or make the definitions clearer.  If this issue
is not addressed, the success of EML will suffer.  If the
information managers themselves are not able to conceptualize and
agree on these terms and relationships, then how can we expect others
to?  Terms in EML need to be precise, and if that forces us to be
clear, that is a good thing IMHO.

> I would suggest that in Tim's example,
> 
>  "For instance, at KBS, our mainsite layout is a randomized
>  complete block agricultural experiment, installed in 1986.
>  That's a project, with a method, and created no data per se.  All
>  of our main datasets, however, each with their own sampling
>  techniques (methods/protocols) implicitly rely on the project
>  method."
> 
> This information would logically go in
> Project/designDescription/paragraph (or whatever paragraph
> becomes). Perhaps the designDescription module should also be able
> to reference a resource-level protocol, although that may be
> possible already through the use of references.

I don't think that is possible in its current form.  This is an
example of what I was getting at with 'Granularity of Repeatable
Content'.

> If I understand the distinction Peter makes between protocol and
> method, then anything at the project level would be a protocol.

> Specific methods then belong at the dataTable level. Since "project"
> is optional, dataTable then needs to be able to include protocols as
> well as methods.

> While I agree with Matt on trying to minimize the places where
> elements reside, the needs of site-based programs may be different
> from those of individual ecologists. An LTER site may need/want to
> have a richer set of items at the project level than an individual
> ecologist might need. Over time, I can foresee variations in Morpho
> configurations (or other EML tools) that might give users different
> recommended eml subsets in the same way that Quickbooks does for
> helping businesses choose account configurations that are relevant
> to their business, e.g. service vs retail vs manufacturer.

EML should be able to handle the complexity of a huge project, but
should employ graceful degradation, so that it is useful for
documenting a simple dataset.  Or was that datatable?  :)
-- 
Scott E. Chapal_________________________________________________
Database & Network Manager             scott.chapal at jonesctr.org
J.W. Jones Ecological Research Center          229.734.4706 x227
Rt. 2. Box. 2324. Newton, GA 31770-9651        229.734.6650 :FAX