[LTER-im] [Fwd: [Fwd: Re: FW: Report from Metacat Harvester: Wed Aug 25 11:00:36 MDT 2004]]

Thu Sep 2 10:24:25 PDT 2004

Ok, thanks Matt. I see you already created a 2.02 milestone. So I think
we need some feedback on 1662 and we can probably move forward.

Peter McCartney (peter.mccartney at asu.edu)
Center for Environmental-Studies
Arizona State University

> -----Original Message-----
> From: Matt Jones [mailto:jones at nceas.ucsb.edu] 
> Sent: Thursday, September 02, 2004 9:50 AM
> To: Peter McCartney
> Cc: eml-dev at ecoinformatics.org; emlbestpractices at lternet.edu; 
> im at lternet.edu
> Subject: Re: [LTER-im] [Fwd: [Fwd: Re: FW: Report from 
> Metacat Harvester: Wed Aug 25 11:00:36 MDT 2004]]
> 
> 
> Hi Peter,
> 
> Peter McCartney wrote:
> > That probably depends on whether a 2.02 should only address this
> > issue (in which case I think a month could handle it) or more. I did
> > not pay a great deal of attention to the 2.01 process, so I don't know
> > the procedural details - did a branch or tag get created for 2.01 in
> > cvs?
>
> New development is done on the HEAD of cvs.  Then a tag is created to
> mark the files as released for a particular version (e.g., the tag
> RELEASE_EML_2_0_1 was the most recent).  If fixes need to happen to a
> release, then the tagged files are forked into a branch and patched --
> so far we haven't needed to do that.
> 
> > Bugzilla does not seem to have a version tag for 2.01 so how were bugs
> > related to that kept separate from other bugs?
> 
> We create a target milestone for each release, so bugs are targeted at
> that.  If you 'Change columns' in your bugzilla display and add 'Target
> Milestone' and sort by that field then things make more sense.  Official
> 'versions' get created when each release is made so that bugs can be
> filed against that version.  I usually create a tracker bug as the
> release nears that lists all of the odds and ends that need to be done
> to get the release out the door (e.g., see bug 1195
> http://bugzilla.ecoinformatics.org/show_bug.cgi?id=1195).
> 
> When bugs are initially filed, they default to the milestone
> 'Unspecified'.  Someone then needs to choose a target milestone for that
> bug, at which point it enters the list of TODOs before the release gets
> released.  Obviously, due to various time constraints some bug targets
> are changed to a later target in order to release in a timely manner.
> 
> > The few discussions I've participated in would indicate that a
> > possible roadmap out there goes something like this:
> >
> > 2.02 - support for scoping ids to system
> >      ? Support for multiple authentication systems within
> >        eml-access. I've talked with Matt about this, but I don't
> >        think there is a bug entered yet.
> >
> > 2.1? - support for updatable, online dictionaries for enumerated
> >        content (file format, connection schemas, units,
> >        projections, etc.) - similar to virus definition files.
> >      ? New modules for resource types - we are working on an
> >        eml-model candidate under our ITR grant and are about to
> >        send out invitations for a meeting on that this fall.
> >
> > 3.0? - probably major restructuring to better support semantic
> >        extensions...
>
> Sounds good to me, and is in line with the issues that I've seen
> discussed.  We have target milestones for each of those versions, but
> don't have bug descriptions for all of those tasks.  I'm not sure about
> what 'multiple authentication systems' would involve, but it's certainly
> worth creating a bug and discussing what to do about it.
> 
> Cheers,
> Matt
> 
> > 
> > Peter McCartney (peter.mccartney at asu.edu)
> > Center for Environmental-Studies
> > Arizona State University
> >  
> > 
> > 
> > 
> >>-----Original Message-----
> >>From: Mark Servilla [mailto:servilla at lternet.edu]
> >>Sent: Wednesday, September 01, 2004 4:00 PM
> >>To: Peter McCartney
> >>Cc: Matt Jones; jbrunt at LTERnet.edu; 
> >>eml-dev at ecoinformatics.org; emlbestpractices at lternet.edu; 
> >>im at lternet.edu
> >>Subject: Re: [LTER-im] [Fwd: [Fwd: Re: FW: Report from 
> >>Metacat Harvester: Wed Aug 25 11:00:36 MDT 2004]]
> >>
> >>
> >>Matt/Peter,
> >>
> >>Duane and I will evaluate the level of effort necessary for the
> >>changes to the EML-parser based on Peter's schema mods.  I hope to
> >>have a LOE defined by next week.  Assuming it is not too great (and
> >>with agreement from our management), we will then enter the task into
> >>our schedule.  In addition, we would be glad to take a crack at
> >>reviewing/updating the documentation.
> >>
> >>What is (in your opinions) the overall urgency of this task (i.e.,
> >>what would be a reasonable target date for EML-2.0.2)?
> >>--------------
> >>
> >>Matt,
> >>
> >>Would you please add both Duane and myself to the eml-cvs
> >>list service.
> >>
> >>Is the EML-parser within the Metacat cvs or a separate cvs?  If
> >>separate, Duane will need update permission.
> >>
> >>Thanks!
> >>
> >>Sincerely,
> >>Mark
> >>
> >>Peter McCartney wrote:
> >>
> >>
> >>>I will.
> >>>On Tue, 2004-08-31 at 14:42, Matt Jones wrote:
> >>>
> >>>
> >>>>Yeah, I think there might be essentially full agreement on the right
> >>>>approach here -- minor differences maybe in what we emphasize.  In the
> >>>>interest of moving forward, is anyone willing to take the lead on
> >>>>developing the schema changes and other changes needed for a 2.0.2
> >>>>release that would deal with Mark's #2 proposal?  They should be pretty
> >>>>minor, but I'm feeling kind of swamped, and the 2.0.1 release was enough
> >>>>of a burden that I'm not real excited to start right back up on it given
> >>>>other priorities.
> >>>>
> >>>>Matt
> >>>>
> >>>>Peter McCartney wrote:
> >>>>
> >>>>
> >>>>>Careful. I never said that to solve the example James just described
> >>>>>one should take the first node. I said that in the case where
> >>>>>you have duplicated content and have given both of them the same id
> >>>>>and system, you can take the first, or any, node and it doesn't
> >>>>>matter. In the case of James's example, Mark's fix #2 applies - I
> >>>>>think we are all in agreement on that.
> >>>>>
> >>>>>The suggestion that we just don't include ids for things we know are
> >>>>>duplicating will of course solve the problem and that is probably
> >>>>>what we will do for now. However, it has the unfortunate side effect
> >>>>>that it takes away our ability to maintain a relationship within EML
> >>>>>back to the original source content (because all of the content in
> >>>>>our EML files is just a copy of the original record in our database
> >>>>>anyway). This is very useful when loading EML files into a relational
> >>>>>database through the xanthoria put method. But that's our problem...
> >>>>>
> >>>>>
> >>>>>On Tue, 2004-08-31 at 13:43, Matt Jones wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>>Hi James,
> >>>>>>
> >>>>>>Yes, that's exactly the problem.  Peter is proposing to solve it by
> >>>>>>taking the *first* of the redundant trees. But, which is first depends
> >>>>>>on whether you traverse the document in breadth-first order or
> >>>>>>depth-first order.  That, to me, is just asking for trouble -- we'd be
> >>>>>>asking people to remember to put the subtree they want referenced in the
> >>>>>>"depth-first" first node, which can change as the structure of the tree
> >>>>>>changes.  Hard to do and harder to maintain.
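
The traversal-order point can be illustrated with a minimal Python sketch (not part of the original thread). The element names, ids, and the two helper functions are hypothetical, chosen only to show that "the first node with this id" depends on how the tree is walked; only the standard library's xml.etree.ElementTree is assumed.

```python
import xml.etree.ElementTree as ET
from collections import deque

# Hypothetical document: two elements share id="1" at different depths.
DOC = """
<eml>
  <dataset>
    <creator id="1">deep</creator>
  </dataset>
  <contact id="1">shallow</contact>
</eml>
"""


def first_depth_first(root, wanted_id):
    """Return the first element with the given id in document (depth-first) order."""
    for elem in root.iter():
        if elem.get("id") == wanted_id:
            return elem
    return None


def first_breadth_first(root, wanted_id):
    """Return the first element with the given id in level (breadth-first) order."""
    queue = deque([root])
    while queue:
        elem = queue.popleft()
        if elem.get("id") == wanted_id:
            return elem
        queue.extend(elem)  # iterating an Element yields its direct children
    return None


root = ET.fromstring(DOC)
dfs_hit = first_depth_first(root, "1")
bfs_hit = first_breadth_first(root, "1")
print(dfs_hit.tag, dfs_hit.text)  # creator deep
print(bfs_hit.tag, bfs_hit.text)  # contact shallow
```

The same id resolves to different elements under the two traversals, so a "take the first one" rule would also have to fix the traversal order.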
> >>>>>>
> >>>>>>Also, if we do it this way, we should probably check to be sure that
> >>>>>>two subtrees that have identical ids also have identical content, which
> >>>>>>is not a trivial programming task (assuming they are identical could
> >>>>>>easily lead to conflicting information).
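
As a rough illustration of what such a content check involves, here is a simplified Python sketch (not from the thread). The functions same_content and conflicting_ids and the sample element names are hypothetical; the comparison ignores surrounding whitespace, comments, and mixed-content tails, which is part of why a full check is harder than it looks.

```python
import xml.etree.ElementTree as ET


def same_content(a, b):
    """Recursively compare tag, attributes, stripped text, and children in order."""
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    if (a.text or "").strip() != (b.text or "").strip():
        return False
    if len(a) != len(b):
        return False
    return all(same_content(x, y) for x, y in zip(a, b))


def conflicting_ids(root):
    """Return the ids that are reused on subtrees whose content differs."""
    first_seen = {}
    conflicts = set()
    for elem in root.iter():
        elem_id = elem.get("id")
        if elem_id is None:
            continue
        if elem_id not in first_seen:
            first_seen[elem_id] = elem
        elif not same_content(first_seen[elem_id], elem):
            conflicts.add(elem_id)
    return conflicts


# Hypothetical sample: id 7 is reused with different content, id 8 with identical content.
doc = ET.fromstring(
    "<eml>"
    '<creator id="7"><surName>Smith</surName></creator>'
    '<creator id="7"><surName>Smyth</surName></creator>'
    '<creator id="8"><surName>Jones</surName></creator>'
    '<creator id="8"><surName>Jones</surName></creator>'
    "</eml>"
)
print(conflicting_ids(doc))  # {'7'}
```

By contrast, rejecting any repeated id needs only a set of ids already seen, with no content comparison at all.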
> >>>>>>
> >>>>>>I would far prefer to keep the links unambiguous (i.e., references
> >>>>>>can always be resolved to one and only one id).  If someone doesn't want
> >>>>>>to deal with that stuff, they can always omit the ids and just duplicate
> >>>>>>the content, which is why we made the ids optional originally.
> >>>>>>
> >>>>>>Matt
> >>>>>>
> >>>>>>James W Brunt wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>>Just a clarification... The specific error example we have been
> >>>>>>>discussing concerns two identical ids with different content...
> >>>>>>>
> >>>>>>><dataset id="30" system="ces_dataset"> ... is different from
> >>>>>>><creator id="30" system="ces_party"> ....
> >>>>>>>
> >>>>>>>Admittedly, were the content the same we would still get the error
> >>>>>>>(if the parser is written to the spec). However, if there were (in
> >>>>>>>this case there wasn't) a
> >>>>>>>
> >>>>>>><references>30</references>
> >>>>>>>
> >>>>>>>it would be ambiguous. Correct?
> >>>>>>>
> >>>>>>>James
> >>>>>>>
> >>>>>>>Peter McCartney wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>On Tue, 2004-08-31 at 11:35, Matt Jones wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>It would really help me justify the extra work involved in
> >>>>>>>>>>managing ids and references if someone could give me a concrete
> >>>>>>>>>>example of why it would be bad to have a document contain two
> >>>>>>>>>>elements with identical ids and identical content.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>Like in other relational systems, the key (id) acts as a surrogate
> >>>>>>>>>for the content.  So, references should resolve to one (and only one)
> >>>>>>>>>id. It is far harder to validate that the content is the same between
> >>>>>>>>>two nodes with identical keys than it is to validate that no key is
> >>>>>>>>>duplicated.  I think they got this right in the relational model, and
> >>>>>>>>>we should follow that lead.  If you allow duplicate ids, then I am
> >>>>>>>>>sure this situation will arise:
> >>>>>>>>>
> >>>>>>>>><a id="1">foo</a>
> >>>>>>>>><a id="1">bar</a>
> >>>>>>>>><b><references>1</references></b>
> >>>>>>>>>
> >>>>>>>>>What is the value of <b>?  foo, or bar?  It is indeterminate.  And
> >>>>>>>>>this is precisely why this is a problem.
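
The foo/bar example above can be run through a small Python sketch (not part of the original thread). The wrapping <doc> element and the helpers index_by_id and resolve are assumptions added so the fragment parses and the ambiguity becomes an explicit error; this is not how EMLparser itself is implemented.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The three elements from the example, wrapped in a hypothetical <doc> root.
DOC = """
<doc>
  <a id="1">foo</a>
  <a id="1">bar</a>
  <b><references>1</references></b>
</doc>
"""


def index_by_id(root):
    """Map each id value to the list of elements that carry it."""
    index = defaultdict(list)
    for elem in root.iter():
        if "id" in elem.attrib:
            index[elem.attrib["id"]].append(elem)
    return index


def resolve(root, ref_text):
    """Resolve a reference to exactly one element, or raise ValueError."""
    matches = index_by_id(root).get(ref_text, [])
    if len(matches) != 1:
        raise ValueError(f"reference {ref_text!r} matches {len(matches)} elements")
    return matches[0]


root = ET.fromstring(DOC)
ref = root.find("./b/references").text
message = None
try:
    resolve(root, ref)
except ValueError as err:
    message = str(err)
    print(message)  # reference '1' matches 2 elements
```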
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>I agree this would be bad, but this is not what is happening. The
> >>>>>>>>documents that are being rejected have:
> >>>>>>>>
> >>>>>>>><a id="1">foo</a>
> >>>>>>>><a id="1">foo</a>
> >>>>>>>>
> >>>>>>>>Typically, when this happens, the code is obviously not bothering with
> >>>>>>>>references tags, so we aren't likely to create broken or ambiguous
> >>>>>>>>reference tags. Even if we did throw in a
> >>>>>>>><b><references>1</references></b>, it really wouldn't be a problem. In
> >>>>>>>>some of our files where attributes are repeated in view entities, we are
> >>>>>>>>also getting this:
> >>>>>>>>
> >>>>>>>><a id="1">foo</a>
> >>>>>>>><a id="2">foo</a>
> >>>>>>>>
> >>>>>>>>but your parser hasn't spotted that one yet :) and again, even
> >>>>>>>>though it violates the spec, I would contend that this causes no
> >>>>>>>>problem.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>If my xpath returns one or several nodes and they are all
> >>>>>>>>>>identical, why is it so bad to just assume that the rule
> >>>>>>>>>>is: "identical id (and system) means identical content" and just
> >>>>>>>>>>use the first one in the list?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>Because relational models have shown that this never works.  I think
> >>>>>>>>>that such an assumption will result in lots of broken docs.
> >>>>>>>>>
> >>>>>>>>>>I think it is no more work to write parsers to check for
> >>>>>>>>>>differences between nodes with similar ids than it is
> >>>>>>>>>>to check for duplicate ids in the first place, but it makes
> >>>>>>>>>>generating valid eml a LOT simpler.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>Generating valid eml with only one copy of a subtree is easy -- just
> >>>>>>>>>track whether you've already inserted it, and reference it
> >>>>>>>>>thereafter. I don't understand at all why this is hard.  However, I
> >>>>>>>>>do understand the problem with system not being included in the
> >>>>>>>>>assessment of the uniqueness of the ID.  So I like the idea of
> >>>>>>>>>pursuing Mark's suggestion (2).
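
The "track whether you've already inserted it" approach might look roughly like the following Python sketch (not from the thread). The add_party helper, the id "party-30", and the simplified creator/contact content are hypothetical; real EML party elements carry more structure, and a generator written in XSL would need a different mechanism.

```python
import xml.etree.ElementTree as ET


def add_party(parent, tag, party_id, sur_name, emitted):
    """Append a party element: full content the first time, a <references> after."""
    elem = ET.SubElement(parent, tag)
    if party_id in emitted:
        # Already written once: point back at it instead of repeating content.
        # The referencing element carries no id of its own.
        ET.SubElement(elem, "references").text = party_id
    else:
        elem.set("id", party_id)
        ET.SubElement(elem, "surName").text = sur_name
        emitted.add(party_id)
    return elem


dataset = ET.Element("dataset")
emitted = set()  # ids whose full content is already in the document
add_party(dataset, "creator", "party-30", "Smith", emitted)
add_party(dataset, "contact", "party-30", "Smith", emitted)

xml_text = ET.tostring(dataset, encoding="unicode")
print(xml_text)
```

The only state carried between calls is the set of ids already emitted, which is the tracking step referred to above.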
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>I also like pursuing 2 regardless of how we debate over 3 and
> >>>>>>>>would support a hasty 2.02 to revise the spec documentation and
> >>>>>>>>add an optional system attribute as such:
> >>>>>>>>
> >>>>>>>><references system="ces_dataset">201</references>.
> >>>>>>>>
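
A system-qualified reference of this kind could be resolved along the lines of the Python sketch below (not from the thread). It illustrates the proposed behaviour only, not EML 2.0.1 as specified; the sample document and the helpers build_index and resolve are hypothetical, and uniqueness is assumed to be enforced per (system, id) pair.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the same id "30" appears under two different systems.
DOC = """
<eml>
  <dataset id="30" system="ces_dataset">
    <creator id="30" system="ces_party">someone</creator>
    <contact><references system="ces_party">30</references></contact>
  </dataset>
</eml>
"""


def build_index(root):
    """Index elements by (system, id); reject duplicates within one system."""
    index = {}
    for elem in root.iter():
        if "id" not in elem.attrib:
            continue
        key = (elem.get("system"), elem.get("id"))
        if key in index:
            raise ValueError(f"duplicate id {key[1]!r} in system {key[0]!r}")
        index[key] = elem
    return index


def resolve(index, ref_elem):
    """Look up the element named by a system-qualified <references> element."""
    return index[(ref_elem.get("system"), ref_elem.text.strip())]


root = ET.fromstring(DOC)
index = build_index(root)
target = resolve(index, root.find(".//references"))
print(target.tag)  # creator
```

With the pair as the key, the dataset and the creator no longer collide, and the reference resolves to exactly one element.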
> >>>>>>>>Keep in mind that when people hear you (or me, or anyone...) say
> >>>>>>>>"it's not that hard" they are thinking "sure, if you have a team of
> >>>>>>>>Java programmers!". So perhaps it would help to provide some code
> >>>>>>>>samples that can be adapted to the kind of approaches people are
> >>>>>>>>taking with more off-the-shelf tools so that people don't feel
> >>>>>>>>like the only way to work with valid eml is to use one set of
> >>>>>>>>tools from one shop.
> >>>>>>>>
> >>>>>>>>For example, the approach we take in Xanthoria for converting from
> >>>>>>>>RDBMS to xml is actually a fairly common one that appears in Cocoon,
> >>>>>>>>XML Spy's RDBMS mapping tool, and many other vendor-specific DB->xml
> >>>>>>>>modules. Specifically, the rdbms content is exported to a generic,
> >>>>>>>>denormalized xml and then transformed with xsl to map to the desired
> >>>>>>>>schema. So for most cases, the place where this tracking needs to be
> >>>>>>>>done is likely to be in XSL. While we have found it relatively easy
> >>>>>>>>when parsing EML in XSL to follow references to find the content, we
> >>>>>>>>have also found tracking things within xsls when writing out eml to
> >>>>>>>>be a cumbersome process, let alone making sure that each time we do
> >>>>>>>>it it is going to come out consistent.
> >>>>>>>>
> >>>>>>>>So if there is some xsl sample that we can easily add to xanthoria
> >>>>>>>>style sheets to solve this problem, then that's cool. Otherwise, I
> >>>>>>>>really think it would be folly to hang too long on this when we (LTER
> >>>>>>>>that is) have bigger fish to fry. Namely, building a better search
> >>>>>>>>interface for searching LTER data via eml. The query interface is
> >>>>>>>>what the CC spent hours talking about in Fairbanks, so if we come
> >>>>>>>>back in Miami with the ID problem solved but no improved query
> >>>>>>>>system, I'd prefer not to be the one to give that powerpoint.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>Matt
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>Peter McCartney (peter.mccartney at asu.edu)
> >>>>>>>>>>Center for Environmental-Studies
> >>>>>>>>>>Arizona State University
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>-----Original Message-----
> >>>>>>>>>>>From: owner-im at lternet.edu [mailto:owner-im at lternet.edu] On
> >>>>>>>>>>>Behalf Of James W Brunt
> >>>>>>>>>>>Sent: Monday, August 30, 2004 2:57 PM
> >>>>>>>>>>>To: eml-dev at ecoinformatics.org; 
> emlbestpractices at lternet.edu;
> >>>>>>>>>>>im at lternet.edu
> >>>>>>>>>>>Subject: [LTER-im] [Fwd: [Fwd: Re: FW: Report from Metacat 
> >>>>>>>>>>>Harvester: Wed Aug 25 11:00:36 MDT 2004]]
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>Peter,  et. al,
> >>>>>>>>>>>
> >>>>>>>>>>>Mark's email to me (below) has reinforced my own conclusion
> >>>>>>>>>>>about the id, system, references question. There are at least 2,
> >>>>>>>>>>>possibly 3, issues (bugs if you will) here to be dealt with:
> >>>>>>>>>>>
> >>>>>>>>>>>1. The eml normative documentation needs to reflect the real
> >>>>>>>>>>>intent and use of the system attribute. Read (Can O Worms).
> >>>>>>>>>>>Options as I see them:
> >>>>>>>>>>>  a. deprecate the system attribute until it can be better
> >>>>>>>>>>>defined - ignore 2 and 3 below (Mark goes even further on this one
> >>>>>>>>>>>below).
> >>>>>>>>>>>  b. clearly define the system attribute and make the changes in
> >>>>>>>>>>>2 and 3 below.
> >>>>>>>>>>>
> >>>>>>>>>>>2. <references> tag needs to be made system/scope aware
> >>>>>>>>>>>
> >>>>>>>>>>>3. EMLparser needs to enforce the final outcome of 1 and 2.
> >>>>>>>>>>>
> >>>>>>>>>>>Currently, the documentation introduces system but its
> >>>>>>>>>>>definition does not supersede the unique ID requirement within
> >>>>>>>>>>>a document, references is not system aware, and EMLparser is
> >>>>>>>>>>>enforcing exactly what the documentation says.
> >>>>>>>>>>>
> >>>>>>>>>>>Turning off the ID checking as Peter has suggested (different
> >>>>>>>>>>>thread) would result in uninterpretable EML documents were the
> >>>>>>>>>>>references tag to be used (although, in all but one case in the
> >>>>>>>>>>>example below there were no references to the IDs).  I don't see
> >>>>>>>>>>>this as an intermediate solution.
> >>>>>>>>>>>
> >>>>>>>>>>>The intent, as I remember all that long discussion ago, was to
> >>>>>>>>>>>create a way to get around having to completely duplicate
> >>>>>>>>>>>content in a document, thus creating a more compact document
> >>>>>>>>>>>and one that would be more easily maintained for someone not
> >>>>>>>>>>>generating the documents from a database. I'm sure others who
> >>>>>>>>>>>were present can clarify some of this. I realize the difficulty
> >>>>>>>>>>>in tracking a document ID map for every document you
> >>>>>>>>>>>automatically generate; however, I really don't understand why you
> >>>>>>>>>>>wouldn't completely duplicate the content. However, the inclusion
> >>>>>>>>>>>of a second qualifying attribute that has to be checked for every
> >>>>>>>>>>>id tag is doable, but before we begin something like this it must
> >>>>>>>>>>>be clearly spelled out and agreeable to the group(s). We'd like to
> >>>>>>>>>>>hear from eml-dev, eml-bestpractices, and im as well as individual
> >>>>>>>>>>>stakeholders.
> >>>>>>>>>>>
> >>>>>>>>>>>Thanks,
> >>>>>>>>>>>
> >>>>>>>>>>>James
> >>>>>>>>>>>
> >>>>>>>>>>>--
> >>>>>>>>>>>James W. Brunt
> >>>>>>>>>>>Associate Director for Information Management
> >>>>>>>>>>>Long Term Ecological Research Network Office
> >>>>>>>>>>>Department of Biology
> >>>>>>>>>>>University of New Mexico
> >>>>>>>>>>>Albuquerque, NM 87131-1091
> >>>>>>>>>>>505 272 7085
> >>>>>>>>>>>jbrunt at lternet.edu
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>-------- Original Message --------
> >>>>>>>>>>>From: Mark Servilla <servilla at lternet.edu>
> >>>>>>>>>>>To: James Brunt <jbrunt at lternet.edu>
> >>>>>>>>>>>Subject: [Fwd: Re: FW: Report from Metacat Harvester: Wed Aug
> >>>>>>>>>>>25 11:00:36 MDT 2004]
> >>>>>>>>>>>
> >>>>>>>>>>>James,
> >>>>>>>>>>>
> >>>>>>>>>>>After reviewing the EML specification documents, it appears to
> >>>>>>>>>>>me that duplicate IDs within a single instance document are not
> >>>>>>>>>>>valid EML, and therefore (IMHO), the EML Parser is behaving
> >>>>>>>>>>>correctly.  I cannot see how setting either the SYSTEM or SCOPE
> >>>>>>>>>>>attribute can be used by the REFERENCES element to distinguish
> >>>>>>>>>>>duplicate IDs within a single document (perhaps someone in
> >>>>>>>>>>>eml-dev can help answer how SYSTEM/SCOPE are used in this
> >>>>>>>>>>>context).
> >>>>>>>>>>>
> >>>>>>>>>>>Some possible solutions are:
> >>>>>>>>>>>(1) Deprecate SYSTEM/SCOPE attributes in this context, update
> >>>>>>>>>>>the specification to reflect such change, and do not allow
> >>>>>>>>>>>duplicate IDs.
> >>>>>>>>>>>(2) Modify the specification to allow SYSTEM/SCOPE to narrow
> >>>>>>>>>>>the ID scope, thereby allowing duplicate IDs when qualified by
> >>>>>>>>>>>either SYSTEM/SCOPE -- and, modify the specification for
> >>>>>>>>>>>REFERENCES to make use of such change.
> >>>>>>>>>>>(3) Deprecate REFERENCES completely and force repeated content.
> >>>>>>>>>>>
> >>>>>>>>>>>Just my thoughts - thanks!
> >>>>>>>>>>>
> >>>>>>>>>>>Mark
> >>>>>>>>>>>
> >>>>>>>>>>>-------- Original Message --------
> >>>>>>>>>>>Subject: Re: FW: Report from Metacat Harvester: Wed Aug 25
> >>>>>>>>>>>11:00:36 MDT 2004
> >>>>>>>>>>>Date: Mon, 30 Aug 2004 09:26:13 -0600
> >>>>>>>>>>>From: Mark Servilla <servilla at lternet.edu>
> >>>>>>>>>>>To: 'Corinna Gries' <corinna at asu.edu>
> >>>>>>>>>>>CC: James Brunt <jbrunt at lternet.edu>, Duane Costa 
> >>>>>>>>>>><dcosta at lternet.edu>
> >>>>>>>>>>>References: <E1C0TNQ-00066I-00 at lternet.lternet.edu>
> >>>>>>>>>>>
> >>>>>>>>>>>Hi Corinna,
> >>>>>>>>>>>
> >>>>>>>>>>>I have been discussing this issue of ID attributes with James
> >>>>>>>>>>>and Duane here at LNO.  Please correct me if I am wrong, but
> >>>>>>>>>>>the section on Reusable Content (below or
> >>>>>>>>>>>http://knb.ecoinformatics.org/software/eml/eml-2.0.1/index.html#reusableContent)
> >>>>>>>>>>>states that "two identical ids cannot exist in a single
> >>>>>>>>>>>document".  It appears that the "SYSTEM" attribute only allows
> >>>>>>>>>>>identical ids in multiple documents within the system (that is,
> >>>>>>>>>>>only if the repeated ids reference the exact same object) -
> >>>>>>>>>>>something like globalizing the id'ed object to the system for
> >>>>>>>>>>>repeated reference in one or more documents, but not necessarily
> >>>>>>>>>>>allowing identical ids within a single document by changing the
> >>>>>>>>>>>SYSTEM attribute value.  I am not really sure how one would take
> >>>>>>>>>>>advantage of the SYSTEM attribute for reusable content.  And, I
> >>>>>>>>>>>don't know the provenance of this particular issue (the
> >>>>>>>>>>>documentation could certainly be more clear), but if we were to
> >>>>>>>>>>>follow the documentation as we interpret it, would this still be
> >>>>>>>>>>>a bug in the Harvester/Metacat software?
> >>>>>>>>>>>
> >>>>>>>>>>>Sincerely,
> >>>>>>>>>>>Mark
> >>>>>>>>>>>
> >>>>>>>>>>>3.3. Reusable Content
> >>>>>>>>>>>EML allows the reuse of previously defined structured content
> >>>>>>>>>>>(DOM sub-trees) through the use of key/keyRef type references. In
> >>>>>>>>>>>order for an EML package to remain cohesive and to allow for the
> >>>>>>>>>>>cross platform compatibility of packages, the following rules with
> >>>>>>>>>>>respect to packaging must be followed.
> >>>>>>>>>>> 1. An ID is required on the eml root element.
> >>>>>>>>>>> 2. IDs are optional on all other elements.
> >>>>>>>>>>> 3. If an ID is not provided, that content must be interpreted as
> >>>>>>>>>>>    representing a distinct object.
> >>>>>>>>>>> 4. If an ID is provided for content then that content is distinct
> >>>>>>>>>>>    from all other content except for that content that references
> >>>>>>>>>>>    its ID.
> >>>>>>>>>>> 5. If a user wants to reuse content to indicate the repetition of
> >>>>>>>>>>>    an object, a reference must be used. Two identical ids cannot
> >>>>>>>>>>>    exist in a single document.
> >>>>>>>>>>> 6. "Document" scope is defined as identifiers unique only to a
> >>>>>>>>>>>    single instance document (if a document does not have a system
> >>>>>>>>>>>    attribute or if scope is set to 'document' then all IDs are
> >>>>>>>>>>>    defined as distinct content).
> >>>>>>>>>>> 7. "System" scope is defined as identifiers unique to an entire
> >>>>>>>>>>>    data management system (if two documents share a system string,
> >>>>>>>>>>>    then any IDs in those two documents that are identical refer to
> >>>>>>>>>>>    the same object).
> >>>>>>>>>>> 8. If an element references another element, it must not have an
> >>>>>>>>>>>    ID itself.
> >>>>>>>>>>> 9. All EML packages must have the 'eml' module as the root.
> >>>>>>>>>>>10. The system and scope attribute are always optional except for at
> >>>>>>>>>>>    the 'eml' module where the scope attribute is fixed as 'system'.
> >>>>>>>>>>>    The scope attribute defaults to 'document' for all other modules.
> >>>>>>>>>>>
> >>>>>>>>>>>Duane Costa wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>Could anyone comment as to whether the EML error reported by
> >>>>>>>>>>>>Metacat below is a genuine EML error versus a bug in Metacat or
> >>>>>>>>>>>>the EML validator program? The issue is whether the id value for
> >>>>>>>>>>>><dataset> must be unique from the id value for <creator>.
> >>>>>>>>>>>>
> >>>>>>>>>>>>Thanks,
> >>>>>>>>>>>>Duane
> >>>>>>>>>>>>
> >>>>>>>>>>>>-----Original Message-----
> >>>>>>>>>>>>From: Corinna Gries [mailto:corinna at asu.edu]
> >>>>>>>>>>>>Sent: Thursday, August 26, 2004 3:48 PM
> >>>>>>>>>>>>To: dcosta at lternet.edu
> >>>>>>>>>>>>Subject: RE: Report from Metacat Harvester: Wed Aug 25
> >>>>>>>>>>>>11:00:36 MDT 2004
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>Hi Duane,
> >>>>>>>>>>>>
> >>>>>>>>>>>>I am trying to fix these problems with our eml files. Some are
> >>>>>>>>>>>>easy because they are actual errors in our files, but there is
> >>>>>>>>>>>>one where I wonder if the ID checking is right. I understood IDs
> >>>>>>>>>>>>should be unique within the system, that is for example:
> >>>>>>>>>>>>
> >>>>>>>>>>>><dataset id="30" system="ces_dataset"> ... is different from
> >>>>>>>>>>>><creator id="30" system="ces_party"> ....
> >>>>>>>>>>>>
> >>>>>>>>>>>>However, your harvester complains that they are the same:
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>*******************************************************************
> >>>>>>>>>>>>*
> >>>>>>>>>>>>* METACAT HARVESTER REPORT: Wed Aug 25 11:00:36 MDT 2004
> >>>>>>>>>>>>*
> >>>>>>>>>>>>* A TOTAL OF 22 ERRORS WERE DETECTED.
> >>>>>>>>>>>>* Please see the log entries below for additonal details.
> >>>>>>>>>>>>*
> >>>>>>>>>>>>*******************************************************************
> >>>>>>>>>>>>
> >>>>>>>>>>>>*******************************************************************
> >>>>>>>>>>>>*
> >>>>>>>>>>>>* harvestLogID:         5549
> >>>>>>>>>>>>* harvestDate:          Wed Aug 25 11:00:36 MDT 2004
> >>>>>>>>>>>>* status:               1
> >>>>>>>>>>>>* message:
> >>>>>>>>>>>>* harvestOperationCode: InsertDocError
> >>>>>>>>>>>>* description:          Error inserting EML document to Metacat
> >>>>>>>>>>>>* detailLogID:          383
> >>>>>>>>>>>>* errorMessage:         MetacatException: <?xml version="1.0"?>
> >>>>>>>>>>>><error>
> >>>>>>>>>>>>Error running xpath expression:
> >>>>>>>>>>>>//dateTimeDomain|//nonNumericDomain|//numericDomain|//access|
> >>>>>>>>>>>>//attributeList|//constraint|//coverage|//temporalCoverage|
> >>>>>>>>>>>>//geographicCoverage|//taxonomicCoverage|/dataset|/eml/dataset|
> >>>>>>>>>>>>//dataSource|//dataTable|//otherEntity|//citation|//address|
> >>>>>>>>>>>>//conferenceLocation|//party|//originator|//creator|//contact|
> >>>>>>>>>>>>//publisher|//editor|//recipient|//performer|//institution|
> >>>>>>>>>>>>//metadataProvider|//associatedParty|//personnel|//physical|
> >>>>>>>>>>>>//connectionDefinition|//distribution|//researchProject|//project|
> >>>>>>>>>>>>//relatedProject|//software|//spatialRaster|//spatialReference|
> >>>>>>>>>>>>//spatialVector|//storedProcedure|//view|//protocol|
> >>>>>>>>>>>>//additionalMetadata : Error in xml document.  This EML document
> >>>>>>>>>>>>is not valid because the id 30 occurs more than once.  IDs must
> >>>>>>>>>>>>be unique. </error>
> >>>>>>>>>>>>
> >>>>>>>>>>>>* scope:                ces_dataset
> >>>>>>>>>>>>* identifier:           30
> >>>>>>>>>>>>* revision:             1
> >>>>>>>>>>>>* documentType:         eml://ecoinformatics.org/eml-2.0.0
> >>>>>>>>>>>>* documentURL:
> >>>>>>>>>>>>http://seinet.asu.edu/DataCatalog/getXanthoriaRecord.jsp?source=ces_dataset_mohave&id=30
> >>>>>>>>>>>>*
> >>>>>>>>>>>>*******************************************************************
> >>>>>>>>>>>>
> >>>>>>>>>>>>What do you think?
> >>>>>>>>>>>>
> >>>>>>>>>>>>Corinna
> >>>>>>>>>>>>
> >>>>>>>>>>>>_______________________________________________
> >>>>>>>>>>>>eml-dev mailing list
> >>>>>>>>>>>>eml-dev at ecoinformatics.org
> >>>>>>>>>>>>http://www.ecoinformatics.org/mailman/listinfo/eml-dev
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>--
> >>>>>>>>>>>Mark Servilla, Ph.D.
> >>>>>>>>>>>
> >>>>>>>>>>>LTER Network Office
> >>>>>>>>>>>Department of Biology
> >>>>>>>>>>>MSC 03 2020
> >>>>>>>>>>>1 University of New Mexico
> >>>>>>>>>>>Albuquerque, NM 87131-0001
> >>>>>>>>>>>
> >>>>>>>>>>>servilla at lternet.edu
> >>>>>>>>>>>Office (505) 277-2619
> >>>>>>>>>>>Cell   (505) 453-8593
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>--
> >>>>>>>>>>>James W. Brunt
> >>>>>>>>>>>Associate Director for Information Management
> >>>>>>>>>>>Long Term Ecological Research Network Office Department of 
> >>>>>>>>>>>Biology University of New Mexico
> >>>>>>>>>>>Albuquerque, NM 87131-1091
> >>>>>>>>>>>505 272 7085
> >>>>>>>>>>>jbrunt at lternet.edu
> >>>>>>>>>>>
> >>>>>>>>>>>-------------------------------------------------
> >>>>>>>>>>>Long-Term Ecological Research Network Mailing List
> >>>>>>>>>>>im at LTERnet.edu 
> >>
> >>http://sql.lternet.edu/cgi/mailgroups_view.pl?> im
> >>
> >>>>>>>>>>_______________________________________________
> >>>>>>>>>>eml-dev mailing list
> >>>>>>>>>>eml-dev at ecoinformatics.org
> >>>>>>>>>>http://www.ecoinformatics.org/mailman/listinfo/eml-dev
> >>
> >>--
> >>Mark Servilla, Ph.D.
> >>
> >>LTER Network Office
> >>Department of Biology
> >>MSC 03 2020
> >>1 University of New Mexico
> >>Albuquerque, NM 87131-0001
> >>
> >>servilla at lternet.edu
> >>Office (505) 277-2619
> >>Cell   (505) 453-8593
> > 
> > 
> 
> -- 
> -------------------------------------------------------------------
> Matt Jones                                     jones at nceas.ucsb.edu
> http://www.nceas.ucsb.edu/    Fax: 425-920-2439    Ph: 907-789-0496
> National Center for Ecological Analysis and Synthesis (NCEAS)
> University of California Santa Barbara
> Interested in ecological informatics? http://www.ecoinformatics.org
> -------------------------------------------------------------------
>