[LTER-im] [Fwd: [Fwd: Re: FW: Report from Metacat Harvester: Wed Aug 25 11:00:36 MDT 2004]]

Thu Sep 2 08:25:48 PDT 2004

That probably depends on whether a 2.0.2 should address only this issue
(in which case I think a month could handle it) or more. I did not pay a
great deal of attention to the 2.0.1 process, so I don't know the
procedural details - did a branch or tag get created for 2.0.1 in cvs?
Bugzilla does not seem to have a version tag for 2.0.1, so how were bugs
related to that kept separate from other bugs? The few discussions I've
participated in would indicate that a possible roadmap out there goes
something like this:

2.0.2 - support for scoping ids to system
    ? Support for multiple authentication systems within eml-access.
      I've talked with Matt about this, but I don't think there is a
      bug entered yet.

2.1? - support for updatable, online dictionaries for enumerated content
    (file format, connection schemas, units, projections, etc.),
    similar to virus definition files.
    ? New modules for resource types - we are working on an eml-model
      candidate under our ITR grant and are about to send out
      invitations for a meeting on that this fall.

3.0? - probably major restructuring to better support semantic
    extensions...
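
A minimal sketch of what "scoping ids to system" could mean for a checker, written in Python with the standard library rather than in the Java EML parser discussed in this thread. It assumes (this is not taken from the EML spec) that an element's key is the pair (system, id), that a missing system attribute falls into one default scope, and that <references> may carry the optional system attribute proposed below. The sample document and the function names index_ids, duplicate_keys and resolve are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample modeled on the dataset/creator case from this thread.
DOC = """
<eml>
  <dataset id="30" system="ces_dataset">
    <creator id="30" system="ces_party"><surName>Smith</surName></creator>
    <contact><references system="ces_party">30</references></contact>
  </dataset>
</eml>
"""

def index_ids(root, scoped=True):
    """Group elements by key: (system, id) if scoped, else ("", id)."""
    index = defaultdict(list)
    for elem in root.iter():
        if "id" in elem.attrib:
            system = elem.get("system", "") if scoped else ""
            index[(system, elem.get("id"))].append(elem)
    return index

def duplicate_keys(index):
    """Return the keys that are carried by more than one element."""
    return sorted(key for key, elems in index.items() if len(elems) > 1)

def resolve(index, ref):
    """Resolve a <references> element to exactly one target, or raise."""
    key = (ref.get("system", ""), (ref.text or "").strip())
    matches = index.get(key, [])
    if len(matches) != 1:
        raise ValueError(f"{key} resolves to {len(matches)} elements")
    return matches[0]

root = ET.fromstring(DOC)
scoped_index = index_ids(root)
target = resolve(scoped_index, root.find(".//references"))
```

With scoped=False the same document reports id 30 as a duplicate, which is the id-only rule the harvester report further down complains about. With the (system, id) key there is no duplicate and the reference resolves to the creator element.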

Peter McCartney (peter.mccartney at asu.edu)
Center for Environmental-Studies
Arizona State University

> -----Original Message-----
> From: Mark Servilla [mailto:servilla at lternet.edu] 
> Sent: Wednesday, September 01, 2004 4:00 PM
> To: Peter McCartney
> Cc: Matt Jones; jbrunt at LTERnet.edu; 
> eml-dev at ecoinformatics.org; emlbestpractices at lternet.edu; 
> im at lternet.edu
> Subject: Re: [LTER-im] [Fwd: [Fwd: Re: FW: Report from 
> Metacat Harvester: Wed Aug 25 11:00:36 MDT 2004]]
> 
> 
> Matt/Peter,
> 
> Duane and I will evaluate the level of effort (LOE) necessary for the
> changes to the EML-parser based on Peter's schema mods.  I hope to have
> an LOE defined by next week.  Assuming it is not too great (and with
> agreement from our management), we will then enter the task into our
> schedule.  In addition, we would be glad to take a crack at
> reviewing/updating the documentation.
> 
> What is (in your opinions) the overall urgency of this task (i.e., what
> would be a reasonable target date for EML-2.0.2)?
> --------------
> 
> Matt,
> 
> Would you please add both Duane and myself to the eml-cvs list service.
> 
> Is the EML-parser within the Metacat cvs or a separate cvs?  If
> separate, Duane will need update permission.
> 
> Thanks!
> 
> Sincerely,
> Mark
> 
> Peter McCartney wrote:
> 
> > I will.
> > On Tue, 2004-08-31 at 14:42, Matt Jones wrote:
> > 
> >>Yeah, I think there might be essentially full agreement on the right
> >>approach here -- minor differences maybe in what we emphasize.  In the
> >>interest of moving forward, is anyone willing to take the lead on
> >>developing the schema changes and other changes needed for a 2.0.2
> >>release that would deal with Mark's #2 proposal?  They should be pretty
> >>minor, but I'm feeling kind of swamped, and the 2.0.1 release was enough
> >>of a burden that I'm not real excited to start right back up on it given
> >>other priorities.
> >>
> >>Matt
> >>
> >>Peter McCartney wrote:
> >>
> >>>Careful. I never said that to solve the example James just described
> >>>one should take the first node. I said that in the case where you have
> >>>duplicated content and have given both copies the same id and system,
> >>>you can take the first, or any, node and it doesn't matter. In the case
> >>>of James's example, Mark's fix #2 applies - I think we are all in
> >>>agreement on that.
> >>>
> >>>The suggestion that we just don't include ids for things we know are
> >>>duplicated will of course solve the problem, and that is probably what
> >>>we will do for now. However, it has the unfortunate side effect that it
> >>>takes away our ability to maintain a relationship within EML back to
> >>>the original source content (because all of the content in our EML
> >>>files is just a copy of the original record in our database anyway).
> >>>This is very useful when loading EML files into a relational database
> >>>through the xanthoria put method. But that's our problem...
> >>>
> >>>
> >>>On Tue, 2004-08-31 at 13:43, Matt Jones wrote:
> >>>
> >>>
> >>>>Hi James,
> >>>>
> >>>>Yes, that's exactly the problem.  Peter is proposing to solve it by
> >>>>taking the *first* of the redundant trees. But which is first depends
> >>>>on whether you traverse the document in breadth-first order or
> >>>>depth-first order.  That, to me, is just asking for trouble -- we'd be
> >>>>asking people to remember to put the subtree they want referenced in
> >>>>the "depth-first" first node, which can change as the structure of the
> >>>>tree changes.  Hard to do and harder to maintain.
> >>>>
> >>>>Also, if we do it this way, we should probably check to be sure that
> >>>>two subtrees that have identical ids also have identical content,
> >>>>which is not a trivial programming task (assuming they are identical
> >>>>could easily lead to conflicting information).
> >>>>
> >>>>I would far prefer to keep the links unambiguous (i.e., references can
> >>>>always be resolved to one and only one id).  If someone doesn't want to
> >>>>deal with that stuff, they can always omit the ids and just duplicate
> >>>>the content, which is why we made the ids optional originally.
> >>>>
> >>>>Matt
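
The point that "first" depends on traversal order can be shown with a small sketch. This is an illustration in Python, not code from the EML parser; the sample document and the function names first_depth_first and first_breadth_first are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import deque

# Two elements share id="1"; one is nested deeper but appears earlier.
DOC = """
<root>
  <x><a id="1">deep</a></x>
  <a id="1">shallow</a>
</root>
"""

def first_depth_first(root, wanted_id):
    """First match in document order (depth-first, pre-order)."""
    for elem in root.iter():
        if elem.get("id") == wanted_id:
            return elem
    return None

def first_breadth_first(root, wanted_id):
    """First match when visiting the tree level by level."""
    queue = deque([root])
    while queue:
        elem = queue.popleft()
        if elem.get("id") == wanted_id:
            return elem
        queue.extend(elem)  # iterating an Element yields its children in order
    return None

root = ET.fromstring(DOC)
depth_hit = first_depth_first(root, "1")
breadth_hit = first_breadth_first(root, "1")
```

Here the depth-first walk returns the nested element and the breadth-first walk returns the shallower one, so a "take the first node" rule gives different answers for the same document unless the traversal order is also fixed by the spec.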
> >>>>
> >>>>James W Brunt wrote:
> >>>>
> >>>>
> >>>>>Just a clarification... The specific error example we have been
> >>>>>discussing concerns two identical ids with different content:
> >>>>>
> >>>>><dataset id="30" system="ces_dataset"> ... is different from
> >>>>><creator id="30" system="ces_party"> ...
> >>>>>
> >>>>>Admittedly, were the content the same we would still get the error
> >>>>>(if the parser is written to the spec). However, if there were a
> >>>>>
> >>>>><references>30</references>
> >>>>>
> >>>>>(in this case there wasn't), it would be ambiguous. Correct?
> >>>>>
> >>>>>James
> >>>>>
> >>>>>Peter McCartney wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>>On Tue, 2004-08-31 at 11:35, Matt Jones wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>>>It would really help me justify the extra work involved in
> >>>>>>>>managing ids and references if someone could give me a concrete
> >>>>>>>>example of why it would be bad to have a document contain two
> >>>>>>>>elements with identical ids and identical content.
> >>>>>>>
> >>>>>>>
> >>>>>>>Like in other relational systems, the key (id) acts as a surrogate
> >>>>>>>for the content.  So, references should resolve to one (and only
> >>>>>>>one) id. It is far harder to validate that the content is the same
> >>>>>>>between two nodes with identical keys than it is to validate that
> >>>>>>>no key is duplicated.  I think they got this right in the
> >>>>>>>relational model, and we should follow that lead.  If you allow
> >>>>>>>duplicate ids, then I am sure this situation will arise:
> >>>>>>>
> >>>>>>><a id="1">foo</a>
> >>>>>>><a id="1">bar</a>
> >>>>>>><b><references>1</references></b>
> >>>>>>>
> >>>>>>>What is the value of <b>?  foo, or bar?  It is indeterminate.  And
> >>>>>>>this is precisely why this is a problem.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>I agree this would be bad, but this is not what is happening. The
> >>>>>>documents that are being rejected have:
> >>>>>>
> >>>>>><a id="1">foo</a>
> >>>>>><a id="1">foo</a>
> >>>>>>
> >>>>>>Typically, when this happens, the code is obviously not bothering with
> >>>>>>references tags, so we aren't likely to create broken or ambiguous
> >>>>>>reference tags. Even if we did throw in a
> >>>>>><b><references>1</references></b>, it really wouldn't be a problem. In
> >>>>>>some of our files where attributes are repeated in view entities, we are
> >>>>>>also getting this:
> >>>>>>
> >>>>>><a id="1">foo</a>
> >>>>>><a id="2">foo</a>
> >>>>>>
> >>>>>>but your parser hasn't spotted that one yet :) and again, even though
> >>>>>>it violates the spec, I would contend that this causes no problem.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>>>If my xpath returns one or several nodes and they are all
> >>>>>>>>identical, why is it so bad to just assume that the rule is:
> >>>>>>>>"identical id (and system) means identical content" and just use
> >>>>>>>>the first one in the list?
> >>>>>>>
> >>>>>>>
> >>>>>>>Because relational models have shown that this never works.  I think
> >>>>>>>that such an assumption will result in lots of broken docs.
> >>>>>>>
> >>>>>>>>I think it is no more work to write parsers to check for
> >>>>>>>>differences between nodes with similar ids than it is to check for
> >>>>>>>>duplicate ids in the first place, but it makes generating valid eml
> >>>>>>>>a LOT simpler.
> >>>>>>>
> >>>>>>>
> >>>>>>>Generating valid eml with only one copy of a subtree is easy -- just
> >>>>>>>track whether you've already inserted it, and reference it
> >>>>>>>thereafter. I don't understand at all why this is hard.  However, I
> >>>>>>>do understand the problem with system not being included in the
> >>>>>>>assessment of the uniqueness of the ID.  So I like the idea of
> >>>>>>>pursuing Mark's suggestion (2).
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>I also like pursuing 2 regardless of how we debate over 3 and would
> >>>>>>support a hasty 2.0.2 to revise the spec documentation and add an
> >>>>>>optional system attribute as such:
> >>>>>>
> >>>>>><references system="ces_dataset">201</references>
> >>>>>>
> >>>>>>Keep in mind that when people hear you (or me, or anyone...) say
> >>>>>>"it's not that hard" they are thinking "sure, if you have a team of
> >>>>>>Java programmers!" So perhaps it would help to provide some code
> >>>>>>samples that can be adapted to the kind of approaches people are
> >>>>>>taking with more off-the-shelf tools, so that people don't feel like
> >>>>>>the only way to work with valid eml is to use one set of tools from
> >>>>>>one shop.
> >>>>>>
> >>>>>>For example, the approach we take in Xanthoria for converting from
> >>>>>>RDBMS to xml is actually a fairly common one that appears in Cocoon,
> >>>>>>XML Spy's RDBMS mapping tool, and many other vendor-specific DB->xml
> >>>>>>modules. Specifically, the rdbms content is exported to a generic,
> >>>>>>denormalized xml and then transformed with xsl to map to the desired
> >>>>>>schema. So for most cases, the place where this tracking needs to be
> >>>>>>done is likely to be in XSL. While we have found it relatively easy
> >>>>>>when parsing EML in XSL to follow references to find the content, we
> >>>>>>have also found tracking things within xsls when writing out eml to
> >>>>>>be a cumbersome process, let alone making sure that each time we do
> >>>>>>it it is going to come out consistent. So if there is some xsl sample
> >>>>>>that we can easily add to xanthoria style sheets to solve this
> >>>>>>problem, then that's cool.
> >>>>>>
> >>>>>>Otherwise, I really think it would be folly to hang too long on this
> >>>>>>when we (LTER that is) have bigger fish to fry, namely building a
> >>>>>>better search interface for searching LTER data via eml. The query
> >>>>>>interface is what the CC spent hours talking about in Fairbanks, so
> >>>>>>if we come back in Miami with the ID problem solved but no improved
> >>>>>>query system, I'd prefer not to be the one to give that powerpoint.
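
The "track whether you've already inserted it, and reference it thereafter" approach can be sketched as follows. This is Python, not the XSL asked for above, and it is only an illustration of the bookkeeping: the helper names emit_once and person, the id value party.7, and the element names in the demo are hypothetical, not Xanthoria or Metacat code.

```python
import xml.etree.ElementTree as ET

def emit_once(parent, tag, record_id, build_content, seen):
    """Append <tag> under parent.

    The first time record_id is seen, the element carries the id and the
    full content. Later occurrences carry only a <references> child and
    no id of their own.
    """
    elem = ET.SubElement(parent, tag)
    if record_id in seen:
        ET.SubElement(elem, "references").text = record_id
    else:
        seen.add(record_id)
        elem.set("id", record_id)
        build_content(elem)
    return elem

def person(sur_name):
    """Return a builder that fills an element with one person's content."""
    def build(elem):
        ET.SubElement(elem, "surName").text = sur_name
    return build

dataset = ET.Element("dataset")
seen = set()  # ids already written into this document
creator = emit_once(dataset, "creator", "party.7", person("Smith"), seen)
contact = emit_once(dataset, "contact", "party.7", person("Smith"), seen)
xml_text = ET.tostring(dataset, encoding="unicode")
```

The only state needed is the set of ids already written, kept per output document. Whether the same bookkeeping is convenient inside an XSL transform is a separate question that this sketch does not answer.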
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>>Matt
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>Peter McCartney (peter.mccartney at asu.edu)
> >>>>>>>>Center for Environmental-Studies
> >>>>>>>>Arizona State University
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>-----Original Message-----
> >>>>>>>>>From: owner-im at lternet.edu [mailto:owner-im at lternet.edu] On 
> >>>>>>>>>Behalf Of James W Brunt
> >>>>>>>>>Sent: Monday, August 30, 2004 2:57 PM
> >>>>>>>>>To: eml-dev at ecoinformatics.org; emlbestpractices at lternet.edu;
> >>>>>>>>>im at lternet.edu
> >>>>>>>>>Subject: [LTER-im] [Fwd: [Fwd: Re: FW: Report from Metacat 
> >>>>>>>>>Harvester: Wed Aug 25 11:00:36 MDT 2004]]
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>Peter, et al.,
> >>>>>>>>>
> >>>>>>>>>Mark's email to me (below) has reinforced my own conclusion
> >>>>>>>>>about the id, system, references question. There are at least 2,
> >>>>>>>>>possibly 3, issues (bugs if you will) here to be dealt with:
> >>>>>>>>>
> >>>>>>>>>1. The eml normative documentation needs to reflect the real 
> >>>>>>>>>intent and use of the system attribute. Read (Can O Worms). 
> >>>>>>>>>Options as I see them:
> >>>>>>>>>   a. deprecate the system attribute until it can be better
> >>>>>>>>>      defined - ignore 2 and 3 below (Mark goes even further on
> >>>>>>>>>      this one below).
> >>>>>>>>>   b. clearly define the system attribute and make the changes
> >>>>>>>>>      in 2 and 3 below.
> >>>>>>>>>
> >>>>>>>>>2. <references> tag needs to be made system/scope aware
> >>>>>>>>>
> >>>>>>>>>3. EMLparser needs to enforce the final outcome of 1 and 2.
> >>>>>>>>>
> >>>>>>>>>Currently, the documentation introduces system but its definition
> >>>>>>>>>does not supersede the unique ID requirement within a document,
> >>>>>>>>>references is not system aware, and EMLparser is enforcing exactly
> >>>>>>>>>what the documentation says.
> >>>>>>>>>
> >>>>>>>>>Turning off the ID checking as Peter has suggested (different
> >>>>>>>>>thread) would result in uninterpretable EML documents were the
> >>>>>>>>>references tag to be used (although, in all but one case in the
> >>>>>>>>>example below, there were no references to the IDs). I don't see
> >>>>>>>>>this as an intermediate solution.
> >>>>>>>>>
> >>>>>>>>>The intent, as I remember all that long discussion ago, was to
> >>>>>>>>>create a way to get around having to completely duplicate content
> >>>>>>>>>in a document, thus creating a more compact document and one that
> >>>>>>>>>would be more easily maintained for someone not generating the
> >>>>>>>>>documents from a database. I'm sure I can be clarified some here
> >>>>>>>>>by others that were present. I realize the difficulty in tracking
> >>>>>>>>>a document ID map for every document you automatically generate;
> >>>>>>>>>however, I really don't understand why you wouldn't completely
> >>>>>>>>>duplicate the content. The inclusion of a second qualifying
> >>>>>>>>>attribute that has to be checked for every id tag is doable, but
> >>>>>>>>>before we begin something like this it must be clearly spelled
> >>>>>>>>>out and agreeable to the group(s). We'd like to hear from eml-dev,
> >>>>>>>>>eml-bestpractices, and im as well as individual stakeholders.
> >>>>>>>>>
> >>>>>>>>>Thanks,
> >>>>>>>>>
> >>>>>>>>>James
> >>>>>>>>>
> >>>>>>>>>--
> >>>>>>>>>James W. Brunt
> >>>>>>>>>Associate Director for Information Management
> >>>>>>>>>Long Term Ecological Research Network Office
> >>>>>>>>>Department of Biology
> >>>>>>>>>University of New Mexico
> >>>>>>>>>Albuquerque, NM 87131-1091
> >>>>>>>>>505 272 7085
> >>>>>>>>>jbrunt at lternet.edu
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>-------- Original Message --------
> >>>>>>>>>From: Mark Servilla <servilla at lternet.edu>
> >>>>>>>>>To: James Brunt <jbrunt at lternet.edu>
> >>>>>>>>>Subject: [Fwd: Re: FW: Report from Metacat Harvester: Wed Aug 25
> >>>>>>>>>11:00:36 MDT 2004]
> >>>>>>>>>
> >>>>>>>>>James,
> >>>>>>>>>
> >>>>>>>>>After reviewing the EML specification documents, it appears to me
> >>>>>>>>>that duplicate IDs within a single instance document are not valid
> >>>>>>>>>EML, and therefore (IMHO), the EML Parser is behaving correctly.  I
> >>>>>>>>>cannot see how setting either the SYSTEM or SCOPE attribute can be
> >>>>>>>>>used by the REFERENCES element to distinguish duplicate IDs within
> >>>>>>>>>a single document (perhaps someone in eml-dev can help answer how
> >>>>>>>>>SYSTEM/SCOPE are used in this context).
> >>>>>>>>>
> >>>>>>>>>Some possible solutions are:
> >>>>>>>>>(1) Deprecate SYSTEM/SCOPE attributes in this context, update the
> >>>>>>>>>specification to reflect such change, and do not allow duplicate
> >>>>>>>>>IDs.
> >>>>>>>>>(2) Modify the specification to allow SYSTEM/SCOPE to narrow the ID
> >>>>>>>>>scope, thereby allowing duplicate IDs when qualified by either
> >>>>>>>>>SYSTEM/SCOPE -- and modify the specification for REFERENCES to make
> >>>>>>>>>use of such change.
> >>>>>>>>>(3) Deprecate REFERENCES completely and force repeated content.
> >>>>>>>>>
> >>>>>>>>>Just my thoughts - thanks!
> >>>>>>>>>
> >>>>>>>>>Mark
> >>>>>>>>>
> >>>>>>>>>-------- Original Message --------
> >>>>>>>>>Subject: Re: FW: Report from Metacat Harvester: Wed Aug 25 
> >>>>>>>>>11:00:36 MDT 2004
> >>>>>>>>>Date: Mon, 30 Aug 2004 09:26:13 -0600
> >>>>>>>>>From: Mark Servilla <servilla at lternet.edu>
> >>>>>>>>>To: 'Corinna Gries' <corinna at asu.edu>
> >>>>>>>>>CC: James Brunt <jbrunt at lternet.edu>, Duane Costa 
> >>>>>>>>><dcosta at lternet.edu>
> >>>>>>>>>References: <E1C0TNQ-00066I-00 at lternet.lternet.edu>
> >>>>>>>>>
> >>>>>>>>>Hi Corinna,
> >>>>>>>>>
> >>>>>>>>>I have been discussing this issue of ID attributes with James and
> >>>>>>>>>Duane here at LNO.  Please correct me if I am wrong, but the section
> >>>>>>>>>on Reusable Content (below or
> >>>>>>>>>http://knb.ecoinformatics.org/software/eml/eml-2.0.1/index.html#reusableContent)
> >>>>>>>>>states that "two identical ids cannot exist in a single document".
> >>>>>>>>>It appears that the "SYSTEM" attribute only allows identical ids in
> >>>>>>>>>multiple documents within the system (that is, only if the repeated
> >>>>>>>>>ids reference the exact same object) - something like globalizing
> >>>>>>>>>the id'ed object to the system for repeated reference in one or
> >>>>>>>>>more documents, but not necessarily allowing identical ids within a
> >>>>>>>>>single document by changing the SYSTEM attribute value.  I am not
> >>>>>>>>>really sure how one would take advantage of the SYSTEM attribute
> >>>>>>>>>for reusable content.  And, I don't know the provenance of this
> >>>>>>>>>particular issue (the documentation could certainly be more clear),
> >>>>>>>>>but if we were to follow the documentation as we interpret it, would
> >>>>>>>>>this still be a bug in the Harvester/Metacat software?
> >>>>>>>>>
> >>>>>>>>>Sincerely,
> >>>>>>>>>Mark
> >>>>>>>>>
> >>>>>>>>>3.3. Reusable Content
> >>>>>>>>>EML allows the reuse of previously defined structured content (DOM
> >>>>>>>>>sub-trees) through the use of key/keyRef type references. In order
> >>>>>>>>>for an EML package to remain cohesive and to allow for the cross
> >>>>>>>>>platform compatibility of packages, the following rules with
> >>>>>>>>>respect to packaging must be followed.
> >>>>>>>>>1. An ID is required on the eml root element.
> >>>>>>>>>2. IDs are optional on all other elements.
> >>>>>>>>>3. If an ID is not provided, that content must be interpreted as
> >>>>>>>>>   representing a distinct object.
> >>>>>>>>>4. If an ID is provided for content then that content is distinct
> >>>>>>>>>   from all other content except for that content that references
> >>>>>>>>>   its ID.
> >>>>>>>>>5. If a user wants to reuse content to indicate the repetition of
> >>>>>>>>>   an object, a reference must be used. Two identical ids cannot
> >>>>>>>>>   exist in a single document.
> >>>>>>>>>6. "Document" scope is defined as identifiers unique only to a
> >>>>>>>>>   single instance document (if a document does not have a system
> >>>>>>>>>   attribute or if scope is set to 'document' then all IDs are
> >>>>>>>>>   defined as distinct content).
> >>>>>>>>>7. "System" scope is defined as identifiers unique to an entire
> >>>>>>>>>   data management system (if two documents share a system string,
> >>>>>>>>>   then any IDs in those two documents that are identical refer to
> >>>>>>>>>   the same object).
> >>>>>>>>>8. If an element references another element, it must not have an
> >>>>>>>>>   ID itself.
> >>>>>>>>>9. All EML packages must have the 'eml' module as the root.
> >>>>>>>>>10. The system and scope attribute are always optional except for
> >>>>>>>>>   at the 'eml' module where the scope attribute is fixed as
> >>>>>>>>>   'system'. The scope attribute defaults to 'document' for all
> >>>>>>>>>   other modules.
> >>>>>>>>>
> >>>>>>>>>Duane Costa wrote:
> >>>>>>>>>
> >>>>>>>>>>Could anyone comment as to whether the EML error reported by
> >>>>>>>>>>Metacat below is a genuine EML error versus a bug in Metacat or
> >>>>>>>>>>the EML validator program? The issue is whether the id value for
> >>>>>>>>>><dataset> must be unique from the id value for <creator>.
> >>>>>>>>>>
> >>>>>>>>>>Thanks,
> >>>>>>>>>>Duane
> >>>>>>>>>>
> >>>>>>>>>>-----Original Message-----
> >>>>>>>>>>From: Corinna Gries [mailto:corinna at asu.edu]
> >>>>>>>>>>Sent: Thursday, August 26, 2004 3:48 PM
> >>>>>>>>>>To: dcosta at lternet.edu
> >>>>>>>>>>Subject: RE: Report from Metacat Harvester: Wed Aug 25 11:00:36
> >>>>>>>>>>MDT 2004
> >>>>>>>>>>
> >>>>>>>>>>Hi Duane,
> >>>>>>>>>>
> >>>>>>>>>>I am trying to fix these problems with our eml files. Some are
> >>>>>>>>>>easy because they are actual errors in our files, but there is
> >>>>>>>>>>one where I wonder if the ID checking is right. I understood IDs
> >>>>>>>>>>should be unique within the system, that is for example:
> >>>>>>>>>>
> >>>>>>>>>><dataset id="30" system="ces_dataset"> ... is different from
> >>>>>>>>>><creator id="30" system="ces_party"> ...
> >>>>>>>>>>
> >>>>>>>>>>However, your harvester complains that they are the same:
> >>>>>>>>>>
> >>>>>>>>>>*****************************************************************************
> >>>>>>>>>>*
> >>>>>>>>>>* METACAT HARVESTER REPORT: Wed Aug 25 11:00:36 MDT 2004
> >>>>>>>>>>*
> >>>>>>>>>>* A TOTAL OF 22 ERRORS WERE DETECTED.
> >>>>>>>>>>* Please see the log entries below for additonal details.
> >>>>>>>>>>*
> >>>>>>>>>>*****************************************************************************
> >>>>>>>>>>*****************************************************************************
> >>>>>>>>>>*
> >>>>>>>>>>* harvestLogID:         5549
> >>>>>>>>>>* harvestDate:          Wed Aug 25 11:00:36 MDT 2004
> >>>>>>>>>>* status:               1
> >>>>>>>>>>* message:
> >>>>>>>>>>* harvestOperationCode: InsertDocError
> >>>>>>>>>>* description:          Error inserting EML document to Metacat
> >>>>>>>>>>* detailLogID:          383
> >>>>>>>>>>* errorMessage:         MetacatException: <?xml version="1.0"?>
> >>>>>>>>>><error>
> >>>>>>>>>>Error running xpath expression:
> >>>>>>>>>>//dateTimeDomain|//nonNumericDomain|//numericDomain|//access|
> >>>>>>>>>>//attributeList|//constraint|//coverage|//temporalCoverage|
> >>>>>>>>>>//geographicCoverage|//taxonomicCoverage|/dataset|/eml/dataset|
> >>>>>>>>>>//dataSource|//dataTable|//otherEntity|//citation|//address|
> >>>>>>>>>>//conferenceLocation|//party|//originator|//creator|//contact|
> >>>>>>>>>>//publisher|//editor|//recipient|//performer|//institution|
> >>>>>>>>>>//metadataProvider|//associatedParty|//personnel|//physical|
> >>>>>>>>>>//connectionDefinition|//distribution|//researchProject|//project|
> >>>>>>>>>>//relatedProject|//software|//spatialRaster|//spatialReference|
> >>>>>>>>>>//spatialVector|//storedProcedure|//view|//protocol|
> >>>>>>>>>>//additionalMetadata : Error in xml
> >>>>>>>>>>document.  This EML document is not valid because the id 30
> >>>>>>>>>>occurs more than once.  IDs must be unique. </error>
> >>>>>>>>>>
> >>>>>>>>>>* scope:                ces_dataset
> >>>>>>>>>>* identifier:           30
> >>>>>>>>>>* revision:             1
> >>>>>>>>>>* documentType:         eml://ecoinformatics.org/eml-2.0.0
> >>>>>>>>>>* documentURL:
> >>>>>>>>>>http://seinet.asu.edu/DataCatalog/getXanthoriaRecord.jsp?source=ces_dataset_mohave&id=30
> >>>>>>>>>>*
> >>>>>>>>>>*****************************************************************************
> >>>>>>>>>>
> >>>>>>>>>>What do you think?
> >>>>>>>>>>
> >>>>>>>>>>Corinna
> >>>>>>>>>>
> >>>>>>>>>>_______________________________________________
> >>>>>>>>>>eml-dev mailing list
> >>>>>>>>>>eml-dev at ecoinformatics.org
> >>>>>>>>>>http://www.ecoinformatics.org/mailman/listinfo/eml-dev
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>--
> >>>>>>>>>Mark Servilla, Ph.D.
> >>>>>>>>>
> >>>>>>>>>LTER Network Office
> >>>>>>>>>Department of Biology
> >>>>>>>>>MSC 03 2020
> >>>>>>>>>1 University of New Mexico
> >>>>>>>>>Albuquerque, NM 87131-0001
> >>>>>>>>>
> >>>>>>>>>servilla at lternet.edu
> >>>>>>>>>Office (505) 277-2619
> >>>>>>>>>Cell   (505) 453-8593
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>-------------------------------------------------
> >>>>>>>>>Long-Term Ecological Research Network Mailing List
> >>>>>>>>>im at LTERnet.edu
> >>>>>>>>>http://sql.lternet.edu/cgi/mailgroups_view.pl?im
> >>>>>>>>>
> 
> -- 
> Mark Servilla, Ph.D.
> 
> LTER Network Office
> Department of Biology
> MSC 03 2020
> 1 University of New Mexico
> Albuquerque, NM 87131-0001
> 
> servilla at lternet.edu
> Office (505) 277-2619
> Cell   (505) 453-8593
>