[Fwd: [Fwd: Re: FW: Report from Metacat Harvester: Wed Aug 25 11:00:36 MDT 2004]]
James W Brunt
jbrunt at lternet.edu
Mon Aug 30 14:57:15 PDT 2004
Peter, et al.,
Mark's email to me (below) has reinforced my own conclusion about the
id, system, references question. There are at least two, possibly three,
issues (bugs, if you will) here to be dealt with:
1. The EML normative documentation needs to reflect the real intent and
use of the system attribute (read: can of worms). Options as I see them:
a. deprecate the system attribute until it can be better defined, and
ignore 2 and 3 below (Mark goes even further on this one below);
b. clearly define the system attribute and make the changes in 2 and 3
below.
2. The <references> tag needs to be made system/scope aware.
3. EMLparser needs to enforce the final outcome of 1 and 2.
Currently, the documentation introduces system, but its definition does
not supersede the unique-ID requirement within a document; references
is not system aware; and EMLparser is enforcing exactly what the
documentation says.
Turning off the ID checking as Peter has suggested (different thread)
would result in uninterpretable EML documents if the references tag
were used (although in all but one case in the example below there
were no references to the IDs). I don't see this as an intermediate
solution.
The intent, as I remember it from that long-ago discussion, was to
create a way to avoid completely duplicating content in a document,
thus creating a more compact document, and one that would be more
easily maintained by someone not generating the documents from a
database. I'm sure others who were present can clarify this somewhat.
I realize the difficulty of tracking a document ID map for every
document you generate automatically; however, I really don't
understand why you wouldn't simply duplicate the content completely.
That said, including a second qualifying attribute that has to be
checked for every id tag is doable, but before we begin something like
this it must be clearly spelled out and agreeable to the group(s). We'd
like to hear from eml-dev, eml-bestpractices, and im, as well as from
individual stakeholders.
Thanks,
James
--
James W. Brunt
Associate Director for Information Management
Long Term Ecological Research Network Office
Department of Biology
University of New Mexico
Albuquerque, NM 87131-1091
505 272 7085
jbrunt at lternet.edu
-------- Original Message --------
From: Mark Servilla <servilla at lternet.edu>
To: James Brunt <jbrunt at lternet.edu>
Subject: [Fwd: Re: FW: Report from Metacat Harvester: Wed Aug 25 11:00:36 MDT 2004]
James,
After reviewing the EML specification documents, it appears to me that
duplicate IDs within a single instance document are not valid EML, and
therefore (IMHO) the EML Parser is behaving correctly. I cannot see
how setting either the SYSTEM or SCOPE attribute can be used by the
REFERENCES element to distinguish duplicate IDs within a single document
(perhaps someone in eml-dev can help answer how SYSTEM/SCOPE are used in
this context).
Some possible solutions are:
(1) Deprecate SYSTEM/SCOPE attributes in this context, update the
specification to reflect such change, and do not allow duplicate IDs.
(2) Modify the specification to allow SYSTEM/SCOPE to narrow the ID
scope, thereby allowing duplicate IDs when qualified by either
SYSTEM/SCOPE -- and, modify the specification for REFERENCES to make use
of such change.
(3) Deprecate REFERENCES completely and force repeated content.
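To make option (2) concrete, here is a minimal Python sketch of what system-qualified IDs could look like. Everything in it is an assumption for illustration, not current EML 2.0.1 or EMLparser behavior: uniqueness is checked on the (system, id) pair rather than on the id alone, and <references> is given a hypothetical system attribute so that a reference can say which of two identical ids it means. The function names index_ids and resolve are likewise hypothetical, and XML namespaces are omitted.

```python
# Hypothetical sketch of option (2): ids qualified by the system attribute.
# NOT current EML 2.0.1 behavior. Assumptions: uniqueness is on (system, id),
# and <references> carries an invented optional "system" attribute.
import xml.etree.ElementTree as ET


def index_ids(root):
    """Map (system, id) -> element; raise ValueError on a true duplicate."""
    index = {}
    for el in root.iter():
        ident = el.get("id")
        if ident is None:
            continue
        key = (el.get("system"), ident)  # system is None when absent
        if key in index:
            raise ValueError(f"duplicate id {ident!r} in system {key[0]!r}")
        index[key] = el
    return index


def resolve(ref, index):
    """Find the element a <references> points to, using its system qualifier."""
    return index[(ref.get("system"), ref.text.strip())]


doc = ET.fromstring(
    """<eml>
         <dataset id="30" system="ces_dataset">
           <creator id="30" system="ces_party"><individualName/></creator>
           <contact><references system="ces_party">30</references></contact>
         </dataset>
       </eml>"""
)

index = index_ids(doc)  # no error: the two "30"s differ by system
ref = doc.find(".//references")
print(resolve(ref, index).tag)  # creator
```

A <references> element without the system qualifier would look up (None, '30') and fail with a KeyError, which is why option (2) needs the REFERENCES change as well as the relaxed ID rule.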
Just my thoughts - thanks!
Mark
-------- Original Message --------
Subject: Re: FW: Report from Metacat Harvester: Wed Aug 25 11:00:36 MDT 2004
Date: Mon, 30 Aug 2004 09:26:13 -0600
From: Mark Servilla <servilla at lternet.edu>
To: 'Corinna Gries' <corinna at asu.edu>
CC: James Brunt <jbrunt at lternet.edu>, Duane Costa <dcosta at lternet.edu>
References: <E1C0TNQ-00066I-00 at lternet.lternet.edu>
Hi Corinna,
I have been discussing this issue of ID attributes with James and Duane
here at LNO. Please correct me if I am wrong, but the section on
Reusable Content (below or
http://knb.ecoinformatics.org/software/eml/eml-2.0.1/index.html#reusableContent)
states that "two identical ids cannot exist in a single document". It
appears that the "SYSTEM" attribute only allows identical ids in
multiple documents within the system (that is, only if the repeated ids
reference the exact same object) - something like globalizing the id'ed
object to the system for repeated reference in one or more documents,
but not necessarily allowing identical ids within a single document by
changing the SYSTEM attribute value. I am not really sure how one would
take advantage of the SYSTEM attribute for reusable content. And, I
don't know the provenance of this particular issue (the documentation
could certainly be clearer), but if we were to follow the
documentation as we interpret it, would this still be a bug in the
Harvester/Metacat software?
Sincerely,
Mark
3.3. Reusable Content
EML allows the reuse of previously defined structured content (DOM
sub-trees) through the use of key/keyRef type references. In order for
an EML package to remain cohesive and to allow for the cross-platform
compatibility of packages, the following rules with respect to packaging
must be followed.
1. An ID is required on the eml root element.
2. IDs are optional on all other elements.
3. If an ID is not provided, that content must be interpreted as
representing a distinct object.
4. If an ID is provided for content then that content is distinct from
all other content except for that content that references its ID.
5. If a user wants to reuse content to indicate the repetition of an
object, a reference must be used. Two identical ids cannot exist in a
single document.
6. "Document" scope is defined as identifiers unique only to a single
instance document (if a document does not have a system attribute or if
scope is set to 'document' then all IDs are defined as distinct content).
7. "System" scope is defined as identifiers unique to an entire data
management system (if two documents share a system string, then any IDs
in those two documents that are identical refer to the same object).
8. If an element references another element, it must not have an ID itself.
9. All EML packages must have the 'eml' module as the root.
10. The system and scope attributes are always optional except at the
'eml' module, where the scope attribute is fixed as 'system'. The scope
attribute defaults to 'document' for all other modules.
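As a rough illustration of rules 5 and 8 as written, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is not the EMLparser or Metacat code: it checks every element carrying an id attribute (Metacat's check, per the error message further down, walks a specific list of element types), and it omits the EML namespace and required root attributes for brevity. The function names are hypothetical. Run on Corinna's example, the document-scope rule flags id 30 even though the two elements carry different system values.

```python
# Illustrative sketch only: not the EMLparser/Metacat implementation.
# Simplifications (assumptions): no EML namespace, no required root attributes,
# and every element with an "id" attribute is checked.
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_ids(root):
    """Rule 5: return the id values that occur more than once in one document."""
    counts = Counter(el.get("id") for el in root.iter() if el.get("id") is not None)
    return sorted(ident for ident, n in counts.items() if n > 1)


def referencing_elements_with_ids(root):
    """Rule 8: return tags of elements that have an id AND a <references> child."""
    return [
        el.tag
        for el in root.iter()
        if el.get("id") is not None and el.find("references") is not None
    ]


# Corinna's case (different system values, same id "30"), plus a <contact>
# that deliberately breaks rule 8 by having both an id and a reference.
doc = ET.fromstring(
    """<eml>
         <dataset id="30" system="ces_dataset">
           <creator id="30" system="ces_party"><individualName/></creator>
           <contact id="c1"><references>30</references></contact>
         </dataset>
       </eml>"""
)

print(duplicate_ids(doc))                  # ['30']
print(referencing_elements_with_ids(doc))  # ['contact']
```

Because rule 5 is stated without any mention of system, the check ignores the system attribute entirely, which matches how the thread reads the current specification. Note also that the reference "30" in this example is ambiguous, since it could point at either element.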
Duane Costa wrote:
> Could anyone comment as to whether the EML error reported by Metacat below
> is a genuine EML error versus a bug in Metacat or the EML validator program?
> The issue is whether the id value for <dataset> must be unique from the id
> value for <creator>.
>
> Thanks,
> Duane
>
> -----Original Message-----
> From: Corinna Gries [mailto:corinna at asu.edu]
> Sent: Thursday, August 26, 2004 3:48 PM
> To: dcosta at lternet.edu
> Subject: RE: Report from Metacat Harvester: Wed Aug 25 11:00:36 MDT 2004
>
> Hi Duane,
>
> I am trying to fix these problems with our eml files. Some are easy
> because they are actual errors in our files, but there is one where I
> wonder if the ID checking is right. I understood IDs should be unique
> within the system; for example:
>
> <dataset id="30" system="ces_dataset"> ... is different from
> <creator id="30" system="ces_party"> ....
>
> However, your harvester complains that they are the same:
>
> *****************************************************************************
> *
> * METACAT HARVESTER REPORT: Wed Aug 25 11:00:36 MDT 2004
> *
> * A TOTAL OF 22 ERRORS WERE DETECTED.
> * Please see the log entries below for additonal details.
> *
> *****************************************************************************
> *****************************************************************************
> *
> * harvestLogID: 5549
> * harvestDate: Wed Aug 25 11:00:36 MDT 2004
> * status: 1
> * message:
> * harvestOperationCode: InsertDocError
> * description: Error inserting EML document to Metacat
> * detailLogID: 383
> * errorMessage: MetacatException: <?xml version="1.0"?>
> <error>
> Error running xpath expression:
> //dateTimeDomain|//nonNumericDomain|//numericDomain|//access|//attributeList|//constraint|//coverage|//temporalCoverage|//geographicCoverage|//taxonomicCoverage|/dataset|/eml/dataset|//dataSource|//dataTable|//otherEntity|//citation|//address|//conferenceLocation|//party|//originator|//creator|//contact|//publisher|//editor|//recipient|//performer|//institution|//metadataProvider|//associatedParty|//personnel|//physical|//connectionDefinition|//distribution|//researchProject|//project|//relatedProject|//software|//spatialRaster|//spatialReference|//spatialVector|//storedProcedure|//view|//protocol|//additionalMetadata
> : Error in xml document. This EML document is not valid because the id 30
> occurs more than once. IDs must be unique. </error>
>
> * scope: ces_dataset
> * identifier: 30
> * revision: 1
> * documentType: eml://ecoinformatics.org/eml-2.0.0
> * documentURL:
> http://seinet.asu.edu/DataCatalog/getXanthoriaRecord.jsp?source=ces_dataset_mohave&id=30
> *
> *****************************************************************************
>
> What do you think?
>
> Corinna
>
> _______________________________________________
> eml-dev mailing list
> eml-dev at ecoinformatics.org
> http://www.ecoinformatics.org/mailman/listinfo/eml-dev
--
Mark Servilla, Ph.D.
LTER Network Office
Department of Biology
MSC 03 2020
1 University of New Mexico
Albuquerque, NM 87131-0001
servilla at lternet.edu
Office (505) 277-2619
Cell (505) 453-8593