[LTER-im] [Fwd: [Fwd: Re: FW: Report from Metacat Harvester: Wed Aug 25 11:00:36 MDT 2004]]

Thu Sep 2 09:49:30 PDT 2004

Hi Peter,

Peter McCartney wrote:
> That probably depends on whether a 2.02 should only address this issue
> (in which case I think a month could handle it) or more. I did not pay a
> great deal of attention to the 2.01 process, so I don't know the
> procedural details - did a branch or tag get created for 2.01 in cvs?

New development is done on the HEAD of cvs.  Then a tag is created to 
mark the files as released for a particular version (e.g., the tag 
RELEASE_EML_2_0_1 was the most recent).  If fixes need to happen to a 
release, then the tagged files are forked into a branch and patched -- 
so far we haven't needed to do that.

> Bugzilla does not seem to have a version tag for 2.01 so how were bugs
> related to that kept separate from other bugs? 

We create a target milestone for each release, so bugs are targeted at 
that.  If you 'Change columns' in your bugzilla display and add 'Target 
Milestone' and sort by that field then things make more sense.  Official 
'versions' get created when each release is made so that bugs can be 
filed against that version.  I usually create a tracker bug as the 
release nears that lists all of the odds and ends that need to be done 
to get the release out the door (e.g., see bug 1195 
http://bugzilla.ecoinformatics.org/show_bug.cgi?id=1195)

When bugs are initially filed, they default to the milestone 
'Unspecified'.  Someone then needs to choose a target milestone for that 
bug, at which point it enters the list of TODOs to finish before the 
release goes out.  Obviously, due to various time constraints, some bugs 
are retargeted to a later milestone so that we can release in a timely manner.

> The few discussions I've
> participated in would indicate that a possible roadmap out there goes
> something like this:
> 
> 2.02 - support for scoping id's to system
>      ? Support for multiple authentication systems within eml-access.
>        I've talked with Matt about this, but I don't think there is a bug
>        entered yet.
> 
> 2.1? - support for updatable, online dictionaries for enumerated content
>        (file format, connection schemas, units, projections, etc) -
>        similar to virus definition files.
>      ? New modules for resource types - we are working on an
>        eml-model candidate under our ITR grant and are about to send
>        out invitations for a meeting on that this fall.
> 
> 3.0? - probably major restructuring to better support semantic
>        extensions...

Sounds good to me, and is in line with the issues that I've seen 
discussed.  We have target milestones for each of those versions, but 
don't have bug descriptions for all of those tasks.  I'm not sure about 
what 'multiple authentication systems' would involve, but it's certainly 
worth creating a bug and discussing what to do about it.
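
For the "scoping id's to system" item, here is a minimal sketch of what the 
check might look like. It is only an illustration, not the EML parser's 
code. It assumes that uniqueness is judged on the (system, id) pair, that an 
element without a system attribute is keyed with an empty system, and that 
<references> carries the optional system attribute proposed in this thread 
(which is not in the 2.0.1 schema). The sample document and the function 
names are made up for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample: the same id value under two different systems,
# plus the proposed (not yet in the schema) system attribute on <references>.
SAMPLE = """
<eml id="pkg.1" system="knb">
  <dataset id="30" system="ces_dataset">
    <creator id="30" system="ces_party">Some Party</creator>
    <contact><references system="ces_party">30</references></contact>
  </dataset>
</eml>
"""


def index_ids(root):
    """Map (system, id) -> list of elements carrying that key.

    Assumption: an element with no system attribute is keyed as ("", id).
    """
    index = defaultdict(list)
    for elem in root.iter():
        elem_id = elem.get("id")
        if elem_id is not None:
            index[(elem.get("system", ""), elem_id)].append(elem)
    return index


def duplicate_keys(index):
    """Return every (system, id) key that occurs more than once."""
    return sorted(key for key, elems in index.items() if len(elems) > 1)


def resolve(index, ref):
    """Resolve a <references> element to exactly one element, or fail."""
    key = (ref.get("system", ""), (ref.text or "").strip())
    matches = index.get(key, [])
    if len(matches) != 1:
        raise ValueError(f"reference {key} matches {len(matches)} elements")
    return matches[0]


root = ET.fromstring(SAMPLE)
index = index_ids(root)
target = resolve(index, root.find(".//references"))
```

With this sample the id value 30 appears twice, but no (system, id) pair 
repeats, so the reference resolves to the creator element. A document with 
two elements sharing both id and system would be reported by 
duplicate_keys, and any reference to that pair would be rejected rather 
than resolved to whichever node happens to come first.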

Cheers,
Matt

> 
> Peter McCartney (peter.mccartney at asu.edu)
> Center for Environmental-Studies
> Arizona State University
>  
> 
> 
> 
>>-----Original Message-----
>>From: Mark Servilla [mailto:servilla at lternet.edu] 
>>Sent: Wednesday, September 01, 2004 4:00 PM
>>To: Peter McCartney
>>Cc: Matt Jones; jbrunt at LTERnet.edu; 
>>eml-dev at ecoinformatics.org; emlbestpractices at lternet.edu; 
>>im at lternet.edu
>>Subject: Re: [LTER-im] [Fwd: [Fwd: Re: FW: Report from 
>>Metacat Harvester: Wed Aug 25 11:00:36 MDT 2004]]
>>
>>
>>Matt/Peter,
>>
>>Duane and I will evaluate the level of effort necessary for the changes 
>>to the EML-parser based on Peter's schema mods.  I hope to have a LOE 
>>defined by next week.  Assuming it is not too great (and with agreement 
>>from our management), we will then enter the task into our schedule.  In 
>>addition, we would be glad to take a crack at reviewing/updating the 
>>documentation.
>>
>>What is (in your opinions) the overall urgency of this task (i.e., what 
>>would be a reasonable target date for EML-2.0.2)?
>>--------------
>>
>>Matt,
>>
>>Would you please add both Duane and myself to the eml-cvs list service.
>>
>>Is the EML-parser within the Metacat cvs or a separate cvs?  If 
>>separate, Duane will need update permission.
>>
>>Thanks!
>>
>>Sincerely,
>>Mark
>>
>>Peter McCartney wrote:
>>
>>
>>>I will.
>>>On Tue, 2004-08-31 at 14:42, Matt Jones wrote:
>>>
>>>
>>>>Yeah, I think there might be essentially full agreement on the right
>>>>approach here -- minor differences maybe in what we emphasize.  In the 
>>>>interest of moving forward, is anyone willing to take the lead on 
>>>>developing the schema changes and other changes needed for a 2.0.2 
>>>>release that would deal with Mark's #2 proposal?  They should be pretty 
>>>>minor, but I'm feeling kind of swamped, and the 2.0.1 release was enough 
>>>>of a burden that I'm not real excited to start right back up on it given 
>>>>other priorities.
>>>>
>>>>Matt
>>>>
>>>>Peter McCartney wrote:
>>>>
>>>>
>>>>>Careful. I never said that to solve the example James just described 
>>>>>one should take the first node. I said that in the case where 
>>>>>you have duplicated content and have given both of them the same id 
>>>>>and system, you can take the first, or any, node and it doesn't 
>>>>>matter. In the case of James's example, Mark's fix #2 applies - I 
>>>>>think we are all in agreement on that.
>>>>>
>>>>>The suggestion that we just don't include ids for things we know are 
>>>>>duplicating will of course solve the problem and that is probably 
>>>>>what we will do for now. However, it has the unfortunate side effect 
>>>>>that it takes away our ability to maintain a relationship within EML 
>>>>>back to the original source content (because all of the content in 
>>>>>our EML files is just a copy of the original record in our database 
>>>>>anyway). This is very useful when loading EML files into a relational 
>>>>>database through the xanthoria put method. But that's our problem...
>>>>>
>>>>>
>>>>>On Tue, 2004-08-31 at 13:43, Matt Jones wrote:
>>>>>
>>>>>
>>>>>
>>>>>>Hi James,
>>>>>>
>>>>>>Yes, that's exactly the problem.  Peter is proposing to solve it by
>>>>>>taking the *first* of the redundant trees. But, which is first depends 
>>>>>>on whether you traverse the document in breadth-first order or 
>>>>>>depth-first order.  That, to me, is just asking for trouble -- we'd be 
>>>>>>asking people to remember to put the subtree they want referenced in the 
>>>>>>"depth-first" first node, which can change as the structure of the tree 
>>>>>>changes.  Hard to do and harder to maintain.
>>>>>>
>>>>>>Also, if we do it this way, we should probably check to be sure that 
>>>>>>two subtrees that have identical id's also have identical content, which is 
>>>>>>not a trivial programming task (assuming they are identical could easily 
>>>>>>lead to conflicting information).
>>>>>>
>>>>>>I would far prefer to keep the links unambiguous (ie, references 
>>>>>>always can be resolved to one and only one id).  If someone doesn't want to 
>>>>>>deal with that stuff, they can always omit the ids and just duplicate 
>>>>>>the content, which is why we made the ids optional originally.
>>>>>>
>>>>>>Matt
>>>>>>
>>>>>>James W Brunt wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>>Just a clarification...The specific error example we have been
>>>>>>>discussing is concerning two identical ids with different content...
>>>>>>>
>>>>>>><dataset id="30" system="ces_dataset"> ... Is different from 
>>>>>>><creator id="30" system="ces_party"> ....
>>>>>>>
>>>>>>>Admittedly, were the content the same we would still get the error 
>>>>>>>(if the parser is written to the spec). However, if there were (in this 
>>>>>>>case there wasn't) a
>>>>>>>
>>>>>>><references>30</references>
>>>>>>>
>>>>>>>it would be ambiguous. Correct?
>>>>>>>
>>>>>>>James
>>>>>>>
>>>>>>>Peter McCartney wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>On Tue, 2004-08-31 at 11:35, Matt Jones wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>>It would really help me justify the extra work involved in 
>>>>>>>>>>managing ids and references if someone could give me a concrete 
>>>>>>>>>>example of why it would be bad to have a document contain two 
>>>>>>>>>>elements with identical ids and identical content.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>Like in other relational systems, the key (id) acts as a 
>>>>>>>>>surrogate for the content.  So, references should resolve to one (and 
>>>>>>>>>only one) id. It is far harder to validate that the content is the same 
>>>>>>>>>between two nodes with identical keys than it is to validate that no key 
>>>>>>>>>is duplicated.  I think they got this right in the relational model, and 
>>>>>>>>>we should follow that lead.  If you allow duplicate ids, then I am 
>>>>>>>>>sure this situation will arise:
>>>>>>>>>
>>>>>>>>><a id="1">foo</a>
>>>>>>>>><a id="1">bar</a>
>>>>>>>>><b><references>1</references></b>
>>>>>>>>>
>>>>>>>>>What is the value of <b>?  foo, or bar?  It is indeterminate.  
>>>>>>>>>And this is precisely why this is a problem.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>I agree this would be bad, but this is not what is happening. The 
>>>>>>>>documents that are being rejected have:
>>>>>>>>
>>>>>>>><a id="1">foo</a>
>>>>>>>><a id="1">foo</a>
>>>>>>>>
>>>>>>>>Typically, when this happens, the code is obviously not bothering with
>>>>>>>>references tags, so we aren't likely to create broken or ambiguous
>>>>>>>>reference tags. Even if we did throw in a
>>>>>>>><b><references>1</references></b>, it really wouldn't be a problem. In
>>>>>>>>some of our files where attributes are repeated in view entities, we are
>>>>>>>>also getting this:
>>>>>>>>
>>>>>>>><a id="1">foo</a>
>>>>>>>><a id="2">foo</a>
>>>>>>>>
>>>>>>>>but your parser hasn't spotted that one yet :) and again, even 
>>>>>>>>though it violates the spec, I would contend that this causes no 
>>>>>>>>problem.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>>If my xpath returns one or several nodes and they
>>>>>>>>>>are all identical, why is it so bad to just assume that the rule 
>>>>>>>>>>is: "identical id (and system) means identical content" and just 
>>>>>>>>>>use the first one in the list?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>Because relational models have shown that this never works.  I 
>>>>>>>>>think that such an assumption will result in lots of broken docs.
>>>>>>>>>
>>>>>>>>>>I think it is no more work to write parsers to
>>>>>>>>>>check for differences between nodes with similar ids than it is 
>>>>>>>>>>to check for duplicate ids in the first place, but it makes 
>>>>>>>>>>generating valid eml a LOT simpler.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>Generating valid eml with only one copy of a subtree is easy -- 
>>>>>>>>>just track whether you've already inserted it, and reference it 
>>>>>>>>>thereafter. I don't understand at all why this is hard.  However, I 
>>>>>>>>>do understand the problem with system not being included in the 
>>>>>>>>>assessment of the uniqueness of the ID.  So I like the idea of 
>>>>>>>>>pursuing Mark's suggestion (2).
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>I also like pursuing 2 regardless of how we debate over 3 and 
>>>>>>>>would support a hasty 2.02 to revise the spec documentation and 
>>>>>>>>add an optional system attribute as such:
>>>>>>>>
>>>>>>>><references system="ces_dataset">201</references>.
>>>>>>>>
>>>>>>>>Keep in mind that when people hear you (or me, or anyone...) say 
>>>>>>>>"it's not that hard" they are thinking "sure, if you have a team of 
>>>>>>>>Java programmers!". So perhaps it would help to provide some code 
>>>>>>>>samples that can be adapted to the kind of approaches people are 
>>>>>>>>taking with more off-the-shelf tools so that people don't feel 
>>>>>>>>like the only way to work with valid eml is to use one set of 
>>>>>>>>tools from one shop. For example, the approach we take in 
>>>>>>>>Xanthoria for converting from RDBMS to xml is actually a fairly 
>>>>>>>>common one that appears in Cocoon, XML spy's RDBMS mapping tool, 
>>>>>>>>and many other vendor-specific DB->xml modules. Specifically, the 
>>>>>>>>rdbms content is exported to a generic, denormalized xml and then 
>>>>>>>>transformed with xsl to map to the desired schema. So for most 
>>>>>>>>cases, the place where this tracking needs to be done is likely to 
>>>>>>>>be in XSL. While we have found it relatively easy when parsing EML 
>>>>>>>>in XSL to follow references to find the content, we have also 
>>>>>>>>found that tracking things within xsls when writing out eml is 
>>>>>>>>a cumbersome process, let alone making sure that each time we do 
>>>>>>>>it it is going to come out consistent. So if there is some xsl 
>>>>>>>>sample that we can easily add to xanthoria style sheets to solve 
>>>>>>>>this problem, then that's cool. Otherwise, I really think it would 
>>>>>>>>be folly to hang too long on this when we (LTER that is) have 
>>>>>>>>bigger fish to fry. Namely, building a better search interface for 
>>>>>>>>searching LTER data via eml. The query interface is what the CC 
>>>>>>>>spent hours talking about in Fairbanks, so if we come back in 
>>>>>>>>Miami with the ID problem solved but no improved query system, I'd 
>>>>>>>>prefer not be the one to give that powerpoint.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>Matt
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>Peter McCartney (peter.mccartney at asu.edu)
>>>>>>>>>>Center for Environmental-Studies
>>>>>>>>>>Arizona State University
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>-----Original Message-----
>>>>>>>>>>>From: owner-im at lternet.edu [mailto:owner-im at lternet.edu] On 
>>>>>>>>>>>Behalf Of James W Brunt
>>>>>>>>>>>Sent: Monday, August 30, 2004 2:57 PM
>>>>>>>>>>>To: eml-dev at ecoinformatics.org; emlbestpractices at lternet.edu;
>>>>>>>>>>>im at lternet.edu
>>>>>>>>>>>Subject: [LTER-im] [Fwd: [Fwd: Re: FW: Report from Metacat 
>>>>>>>>>>>Harvester: Wed Aug 25 11:00:36 MDT 2004]]
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>Peter,  et. al,
>>>>>>>>>>>
>>>>>>>>>>>Mark's email to me (below) has reinforced my own conclusion 
>>>>>>>>>>>about the id, system, references question. There are at least 2, 
>>>>>>>>>>>possibly 3, issues (bugs if you will) here to be dealt with:
>>>>>>>>>>>
>>>>>>>>>>>1. The eml normative documentation needs to reflect the real 
>>>>>>>>>>>intent and use of the system attribute. Read (Can O Worms). 
>>>>>>>>>>>Options as I see them:
>>>>>>>>>>>  a. deprecate the system attribute until it can be better
>>>>>>>>>>>defined - ignore 2 and 3 below (Mark goes even further on this one 
>>>>>>>>>>>below).
>>>>>>>>>>>  b. clearly define the system attribute and make the changes in 
>>>>>>>>>>>2 and 3 below.
>>>>>>>>>>>
>>>>>>>>>>>2. <references> tag needs to be made system/scope aware
>>>>>>>>>>>
>>>>>>>>>>>3. EMLparser needs to enforce the final outcome of 1 and 2.
>>>>>>>>>>>
>>>>>>>>>>>Currently, the documentation introduces system but its 
>>>>>>>>>>>definition does not supersede the unique ID requirement within 
>>>>>>>>>>>a document, references is not system aware, and EMLparser is 
>>>>>>>>>>>enforcing exactly what the documentation says.
>>>>>>>>>>>
>>>>>>>>>>>Turning off the ID checking as Peter has suggested (different 
>>>>>>>>>>>thread) would result in uninterpretable EML documents were the 
>>>>>>>>>>>references tag to be used (although, in all but one case in the 
>>>>>>>>>>>example below there were no references to the IDs). I don't see 
>>>>>>>>>>>this as an intermediate solution.
>>>>>>>>>>>
>>>>>>>>>>>The intent as I remember all that long discussion ago was to 
>>>>>>>>>>>create a way to get around having to completely duplicate 
>>>>>>>>>>>content in a document, thus creating a more compact document 
>>>>>>>>>>>and one that would be more easily maintained for someone not 
>>>>>>>>>>>generating the documents from a database. I'm sure I can be 
>>>>>>>>>>>clarified some here by others that were present. I realize the 
>>>>>>>>>>>difficulty in tracking a document ID map for every document you 
>>>>>>>>>>>automatically generate; however, I really don't understand why 
>>>>>>>>>>>you wouldn't completely duplicate the content. However, the 
>>>>>>>>>>>inclusion of a second qualifying attribute that has to be checked 
>>>>>>>>>>>for every id tag is doable, but before we begin something like 
>>>>>>>>>>>this it must be clearly spelled out and agreeable to the group(s). 
>>>>>>>>>>>We'd like to hear from eml-dev, eml-bestpractices, and im as well 
>>>>>>>>>>>as individual stakeholders.
>>>>>>>>>>>
>>>>>>>>>>>Thanks,
>>>>>>>>>>>
>>>>>>>>>>>James
>>>>>>>>>>>
>>>>>>>>>>>--
>>>>>>>>>>>James W. Brunt
>>>>>>>>>>>Associate Director for Information Management
>>>>>>>>>>>Long Term Ecological Research Network Office
>>>>>>>>>>>Department of Biology
>>>>>>>>>>>University of New Mexico
>>>>>>>>>>>Albuquerque, NM 87131-1091
>>>>>>>>>>>505 272 7085
>>>>>>>>>>>jbrunt at lternet.edu
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>-------- Original Message --------
>>>>>>>>>>>From: Mark Servilla <servilla at lternet.edu>
>>>>>>>>>>>To: James Brunt <jbrunt at lternet.edu>
>>>>>>>>>>>Subject: [Fwd: Re: FW: Report from Metacat Harvester: Wed Aug 
>>>>>>>>>>>25 11:00:36 MDT 2004]
>>>>>>>>>>>
>>>>>>>>>>>James,
>>>>>>>>>>>
>>>>>>>>>>>After reviewing the EML specification documents, it appears to 
>>>>>>>>>>>me that duplicate IDs within a single instance document are not 
>>>>>>>>>>>valid EML, and therefore (IMHO), the EML Parser is behaving 
>>>>>>>>>>>correctly.  I cannot see how setting either the SYSTEM or SCOPE 
>>>>>>>>>>>attribute can be used by the REFERENCES element to distinguish 
>>>>>>>>>>>duplicate IDs within a single document (perhaps someone in 
>>>>>>>>>>>eml-dev can help answer how SYSTEM/SCOPE are used in this 
>>>>>>>>>>>context).
>>>>>>>>>>>
>>>>>>>>>>>Some possible solutions are:
>>>>>>>>>>>(1) Deprecate SYSTEM/SCOPE attributes in this context, update 
>>>>>>>>>>>the specification to reflect such change, and do not allow 
>>>>>>>>>>>duplicate IDs.
>>>>>>>>>>>(2) Modify the specification to allow SYSTEM/SCOPE to narrow 
>>>>>>>>>>>the ID scope, thereby allowing duplicate IDs when qualified by 
>>>>>>>>>>>either SYSTEM/SCOPE -- and, modify the specification for REFERENCES 
>>>>>>>>>>>to make use of such change.
>>>>>>>>>>>(3) Deprecate REFERENCES completely and force repeated content.
>>>>>>>>>>>
>>>>>>>>>>>Just my thoughts - thanks!
>>>>>>>>>>>
>>>>>>>>>>>Mark
>>>>>>>>>>>
>>>>>>>>>>>-------- Original Message --------
>>>>>>>>>>>Subject: Re: FW: Report from Metacat Harvester: Wed Aug 25 
>>>>>>>>>>>11:00:36 MDT 2004
>>>>>>>>>>>Date: Mon, 30 Aug 2004 09:26:13 -0600
>>>>>>>>>>>From: Mark Servilla <servilla at lternet.edu>
>>>>>>>>>>>To: 'Corinna Gries' <corinna at asu.edu>
>>>>>>>>>>>CC: James Brunt <jbrunt at lternet.edu>, Duane Costa 
>>>>>>>>>>><dcosta at lternet.edu>
>>>>>>>>>>>References: <E1C0TNQ-00066I-00 at lternet.lternet.edu>
>>>>>>>>>>>
>>>>>>>>>>>Hi Corinna,
>>>>>>>>>>>
>>>>>>>>>>>I have been discussing this issue of ID attributes with James 
>>>>>>>>>>>and Duane here at LNO.  Please correct me if I am wrong, but 
>>>>>>>>>>>the section on Reusable Content (below or 
>>>>>>>>>>>http://knb.ecoinformatics.org/software/eml/eml-2.0.1/index.html#reusableContent)
>>>>>>>>>>>states that "two identical ids cannot exist in a single 
>>>>>>>>>>>document".
>>>>>>>>>>>It appears that the "SYSTEM" attribute only allows identical ids in 
>>>>>>>>>>>multiple documents within the system (that is, only if the repeated 
>>>>>>>>>>>ids reference the exact same object) - something like globalizing 
>>>>>>>>>>>the id'ed object to the system for repeated reference in one or 
>>>>>>>>>>>more documents, but not necessarily allowing identical ids within a 
>>>>>>>>>>>single document by changing the SYSTEM attribute value.  I am not 
>>>>>>>>>>>really sure how one would take advantage of the SYSTEM attribute 
>>>>>>>>>>>for reusable content.  And, I don't know the provenance of this 
>>>>>>>>>>>particular issue (the documentation could certainly be more clear), 
>>>>>>>>>>>but if we were to follow the documentation as we interpret it, would 
>>>>>>>>>>>this still be a bug in the Harvester/Metacat software?
>>>>>>>>>>>
>>>>>>>>>>>Sincerely,
>>>>>>>>>>>Mark
>>>>>>>>>>>
>>>>>>>>>>>3.3. Reusable Content
>>>>>>>>>>>EML allows the reuse of previously defined structured content 
>>>>>>>>>>>(DOM sub-trees) through the use of key/keyRef type references. In
>>>>>>>>>>>order for an EML package to remain cohesive and to allow for the 
>>>>>>>>>>>cross platform compatibility of packages, the following rules with 
>>>>>>>>>>>respect to packaging must be followed.
>>>>>>>>>>>1. An ID is required on the eml root element.
>>>>>>>>>>>2. IDs are optional on all other elements.
>>>>>>>>>>>3. If an ID is not provided, that content must be interpreted as 
>>>>>>>>>>>representing a distinct object.
>>>>>>>>>>>4. If an ID is provided for content then that content is distinct 
>>>>>>>>>>>from all other content except for that content that references its ID.
>>>>>>>>>>>5. If a user wants to reuse content to indicate the repetition of an 
>>>>>>>>>>>object, a reference must be used. Two identical ids cannot exist in a 
>>>>>>>>>>>single document.
>>>>>>>>>>>6. "Document" scope is defined as identifiers unique only to a single 
>>>>>>>>>>>instance document (if a document does not have a system attribute or if 
>>>>>>>>>>>scope is set to 'document' then all IDs are defined as distinct 
>>>>>>>>>>>content).
>>>>>>>>>>>7. "System" scope is defined as identifiers unique to an 
>>>>>>>>>>>entire data management system (if two documents share a system 
>>>>>>>>>>>string, then any IDs in those two documents that are identical 
>>>>>>>>>>>refer to the same object).
>>>>>>>>>>>8. If an element references another element, it must not have an ID 
>>>>>>>>>>>itself.
>>>>>>>>>>>9. All EML packages must have the 'eml' module as the root.
>>>>>>>>>>>10. The system and scope attribute are always optional except for at 
>>>>>>>>>>>the 'eml' module where the scope attribute is fixed as 'system'. The 
>>>>>>>>>>>scope attribute defaults to 'document' for all other modules.
>>>>>>>>>>>
>>>>>>>>>>>Duane Costa wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>Could anyone comment as to whether the EML error reported by Metacat
>>>>>>>>>>>>below is a genuine EML error versus a bug in Metacat or the 
>>>>>>>>>>>>EML validator program? The issue is whether the id value for 
>>>>>>>>>>>><dataset> must be unique from the id value for <creator>.
>>>>>>>>>>>>
>>>>>>>>>>>>Thanks,
>>>>>>>>>>>>Duane
>>>>>>>>>>>>
>>>>>>>>>>>>-----Original Message-----
>>>>>>>>>>>>From: Corinna Gries [mailto:corinna at asu.edu]
>>>>>>>>>>>>Sent: Thursday, August 26, 2004 3:48 PM
>>>>>>>>>>>>To: dcosta at lternet.edu
>>>>>>>>>>>>Subject: RE: Report from Metacat Harvester: Wed Aug 25 11:00:36 MDT 2004
>>>>>>>>>>>>
>>>>>>>>>>>>Hi Duane,
>>>>>>>>>>>>
>>>>>>>>>>>>I am trying to fix these problems with our eml files. Some are 
>>>>>>>>>>>>easy because they are actual errors in our files, but there is
>>>>>>>>>>>>one where I wonder if the ID checking is right. I understood IDs should
>>>>>>>>>>>>be unique within the system, that is for example:
>>>>>>>>>>>>
>>>>>>>>>>>><dataset id="30" system="ces_dataset"> ... Is different from <creator
>>>>>>>>>>>>id="30" system="ces_party"> ....
>>>>>>>>>>>>
>>>>>>>>>>>>However, your harvester complains that they are the same:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>*************************************************************************
>>>>>>>>>>>>*
>>>>>>>>>>>>* METACAT HARVESTER REPORT: Wed Aug 25 11:00:36 MDT 2004
>>>>>>>>>>>>*
>>>>>>>>>>>>* A TOTAL OF 22 ERRORS WERE DETECTED.
>>>>>>>>>>>>* Please see the log entries below for additonal details.
>>>>>>>>>>>>*
>>>>>>>>>>>>*************************************************************************
>>>>>>>>>>>>
>>>>>>>>>>>>*************************************************************************
>>>>>>>>>>>>*
>>>>>>>>>>>>* harvestLogID:         5549
>>>>>>>>>>>>* harvestDate:          Wed Aug 25 11:00:36 MDT 2004
>>>>>>>>>>>>* status:               1
>>>>>>>>>>>>* message:
>>>>>>>>>>>>* harvestOperationCode: InsertDocError
>>>>>>>>>>>>* description:          Error inserting EML document to Metacat
>>>>>>>>>>>>* detailLogID:          383
>>>>>>>>>>>>* errorMessage:         MetacatException: <?xml version="1.0"?>
>>>>>>>>>>>><error>
>>>>>>>>>>>>Error running xpath expression:
>>>>>>>>>>>>//dateTimeDomain|//nonNumericDomain|//numericDomain|//access|
>>>>>>>>>>>>//attributeList|//constraint|//coverage|//temporalCoverage|
>>>>>>>>>>>>//geographicCoverage|//taxonomicCoverage|/dataset|/eml/dataset|
>>>>>>>>>>>>//dataSource|//dataTable|//otherEntity|//citation|//address|
>>>>>>>>>>>>//conferenceLocation|//party|//originator|//creator|//contact|
>>>>>>>>>>>>//publisher|//editor|//recipient|//performer|//institution|
>>>>>>>>>>>>//metadataProvider|//associatedParty|//personnel|//physical|
>>>>>>>>>>>>//connectionDefinition|//distribution|//researchProject|//project|
>>>>>>>>>>>>//relatedProject|//software|//spatialRaster|//spatialReference|
>>>>>>>>>>>>//spatialVector|//storedProcedure|//view|//protocol|
>>>>>>>>>>>>//additionalMetadata : Error in xml
>>>>>>>>>>>>document.  This EML document is not valid because the id 30 
>>>>>>>>>>>>occurs more than once.  IDs must be unique. </error>
>>>>>>>>>>>>
>>>>>>>>>>>>* scope:                ces_dataset
>>>>>>>>>>>>* identifier:           30
>>>>>>>>>>>>* revision:             1
>>>>>>>>>>>>* documentType:         eml://ecoinformatics.org/eml-2.0.0
>>>>>>>>>>>>* documentURL:
>>>>>>>>>>>>http://seinet.asu.edu/DataCatalog/getXanthoriaRecord.jsp?source=ces_dataset_mohave&id=30
>>>>>>>>>>>>*
>>>>>>>>>>>>*************************************************************************
>>>>>>>>>>>>
>>>>>>>>>>>>What do you think?
>>>>>>>>>>>>
>>>>>>>>>>>>Corinna
>>>>>>>>>>>>
>>>>>>>>>>>>_______________________________________________
>>>>>>>>>>>>eml-dev mailing list
>>>>>>>>>>>>eml-dev at ecoinformatics.org
>>>>>>>>>>>>http://www.ecoinformatics.org/mailman/listinfo/eml-dev
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>--
>>>>>>>>>>>Mark Servilla, Ph.D.
>>>>>>>>>>>
>>>>>>>>>>>LTER Network Office
>>>>>>>>>>>Department of Biology
>>>>>>>>>>>MSC 03 2020
>>>>>>>>>>>1 University of New Mexico
>>>>>>>>>>>Albuquerque, NM 87131-0001
>>>>>>>>>>>
>>>>>>>>>>>servilla at lternet.edu
>>>>>>>>>>>Office (505) 277-2619
>>>>>>>>>>>Cell   (505) 453-8593
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>-------------------------------------------------
>>>>>>>>>>>Long-Term Ecological Research Network Mailing List 
>>>>>>>>>>>im at LTERnet.edu 
>>>>>>>>>>>http://sql.lternet.edu/cgi/mailgroups_view.pl?im
>>
>>-- 
>>Mark Servilla, Ph.D.
>>
>>LTER Network Office
>>Department of Biology
>>MSC 03 2020
>>1 University of New Mexico
>>Albuquerque, NM 87131-0001
>>
>>servilla at lternet.edu
>>Office (505) 277-2619
>>Cell   (505) 453-8593
> 
> 

-- 
-------------------------------------------------------------------
Matt Jones                                     jones at nceas.ucsb.edu
http://www.nceas.ucsb.edu/    Fax: 425-920-2439    Ph: 907-789-0496
National Center for Ecological Analysis and Synthesis (NCEAS)
University of California Santa Barbara
Interested in ecological informatics? http://www.ecoinformatics.org
-------------------------------------------------------------------