[eml-dev] Proposed revision to eml-literature module...

Matt Jones jones at nceas.ucsb.edu
Tue Aug 16 08:59:54 PDT 2005


Well, I'm a usual suspect, but I'll still chime in.  When we originally
discussed the top level container, we did agree that it should contain
children that represent "one" resource for consistency.  But we never
said that the child couldn't contain repeating children itself.  As Mark
mentions, dataset is a good example (it contains multiple entities), and
is exactly analogous to having a "bibliography" that contains multiple
citations.  So in this case the eml document can contain "one
bibliography", which I think would be a valuable change.  So I think we
should go ahead and make this change.

In terms of backwards compatibility, it will break the current schemas,
but is probably not particularly onerous because I have not seen anybody
use top level elements other than the dataset module.  So there are
probably only one or two sites, if any, that would be affected.  The fix
would also be simple (change "eml/citation" to
"eml/bibliography/citation") so retrofitting shouldn't be difficult for
people either.
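
As a rough illustration of how small that retrofit could be, here is a
minimal Python sketch. It assumes the top-level child is an unqualified
"citation" element and ignores namespaces, schema locations, and the
proposed bibliography-level "contact". The function name wrap_citation
is hypothetical and not part of any EML tooling.

```python
import xml.etree.ElementTree as ET


def wrap_citation(eml_text: str) -> str:
    """Move a top-level <citation> under a new <bibliography> element.

    Hypothetical helper for illustration only; real EML documents carry
    namespaces and schema locations that this sketch does not handle.
    """
    root = ET.fromstring(eml_text)
    # Iterate over a copy of the child list because root is modified in the loop.
    for index, child in enumerate(list(root)):
        if child.tag == "citation":
            bibliography = ET.Element("bibliography")
            root.remove(child)
            bibliography.append(child)
            # Put the wrapper where the citation used to be.
            root.insert(index, bibliography)
    return ET.tostring(root, encoding="unicode")


# Made-up sample document, not taken from any real site.
old_doc = "<eml><citation><title>Example title</title></citation></eml>"
new_doc = wrap_citation(old_doc)
print(new_doc)
```

Anything that previously looked up "eml/citation" would then look up
"eml/bibliography/citation" instead.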

So, I think we should move ahead with this one as proposed.

Matt

Mark Servilla wrote:
> It is very likely that I am missing the historical perspective and wisdom
> of the original creators of eml, however, I would argue that there
> exists an analogy between a new "bibliography" top-level module that
> permits multiple "citation" elements and the other top-level modules,
> namely dataset, software, and protocol.  For example, dataset allows
> multiple dataTable, spatialRaster, spatialVector, storedProcedure, etc.
> objects; software allows multiple implementation objects; and, protocol
> allows multiple proceduralStep objects.  Are these models that different
> from what has been proposed for multiple citation objects within a
> bibliography module?
> 
> Or, is the real issue the argument against change and backward
> compatibility?  In essence, the proposed change does not restructure the
> original "citation" object (at least not too much, there is the notion
> of a "contact" associated with each citation).  It somewhat follows what
> Peter suggested - create new schema "Y" (the bibliography module) to
> incorporate multiple instances of old schema "X" (the citation module).
>  EML should evolve (as does all software) to accommodate needed
> enhancements; albeit, it should evolve in an organized fashion.  Some of
> the other potential changes to eml-2.0.1 that have been informally
> suggested would also break backward compatibility (e.g., external
> references).  Also, I believe Metacat supports multiple versions of eml,
> so the pain of change may be spread over time.
> 
> One final comment: as James mentioned, it would really be nice to hear
> from others in the eml community other than the usual suspects.
> 
> Sincerely,
> Mark
> 
> James W Brunt wrote:
> 
>> OK, I concede the second point - the other parallel elements do refer
>> to 1 something and not a collection - it did seem more intuitive to me
>> to have 1 bibliography than 1 citation in the way that "eml" documents
>> currently work - in the "old" way of relating objects it made more
>> sense for literature to be 1 citation. So, do we have a category of 
>> schemas, deemed "useful", that we make public for the purpose of
>> communication and validation? I'm not dead set on changing eml but I
>> am intent on creating a useful container that we can write
>> applications to. Be nice to see some discussion on this from
>> others.......
>>
>> James
>>
>> Peter McCartney wrote:
>>
>>> I hope my email didn't sound like I think a multi-record xml format
>>> isn't useful - I just don't think the approach suggested is very
>>> extensible. If we define schema x to describe a single item and then
>>> decide we want to include multiple x's in a single file, it seems far
>>> more extensible to create a new schema y that imports schema x rather
>>> than redefining x. It's only a matter of time before someone says - "oh,
>>> I'd like to also make a bibliography of datasets" and then we have to make
>>> more changes to schema files.
>>> When I said container for multiple documents I did mean multiple
>>> instances of the citation element- I just examined the schemas in your
>>> cvs and there was no confusion over what you proposed.  I'm not sure I
>>> understood the comment about dataset, protocol, software allowing
>>> multiple entries in their protocols because they don't. An eml document
>>> contains one element of either dataset, citation, software, protocol
>>> etc. Now, if you wanted to change each of those to be unbounded inside
>>> <eml>, then you accomplish the same thing you've requested without
>>> breaking any existing eml documents. However, I think this introduces
>>> confusion over what an "eml" document is - does it describe one resource
>>> or collections of resources? I'd like to second something that Mark
>>> did say though. I do think it would be useful to add reference to
>>> external content from within eml.
>>>
>>>
>>> Peter McCartney(peter.mccartney at asu.edu)
>>> International Institute for Sustainability
>>> Arizona State University
>>> 480-965-6791
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: eml-dev-bounces at ecoinformatics.org
>>>> [mailto:eml-dev-bounces at ecoinformatics.org] On Behalf Of Matt Jones
>>>> Sent: Tuesday, August 09, 2005 4:20 PM
>>>> To: Mark Servilla
>>>> Cc: eml-dev at ecoinformatics.org
>>>> Subject: Re: [eml-dev] Proposed revision to eml-literature module...
>>>>
>>>>
>>>> Mark,
>>>>
>>>> I think I agree with you -- the proposed bibliography element is a
>>>> container that allows lists of citations that is very useful and
>>>> should be a direct part of EML.  I don't think the long lists of
>>>> elements are a real problem, as it's just an XML document and
>>>> judicious use of an event parser like SAX allows one to handle even
>>>> the largest XML documents (use of DOM or JDOM can definitely have a
>>>> negative impact on performance in a situation like this).
>>>>
>>>> I haven't had a chance to review the proposal fully yet (I will do
>>>> so when I return), but at first glance it seemed like a beneficial
>>>> change.
>>>>
>>>> Matt
>>>>
>>>> Mark Servilla wrote:
>>>>
>>>>> Hi Peter,
>>>>>
>>>>> Thank you for your thoughts.  I've added some additional comments
>>>>> below.
>>>>>
>>>>> Sincerely,
>>>>> Mark
>>>>>
>>>>> Peter McCartney wrote:
>>>>>
>>>>>
>>>>>> I've thought about this since it was presented last week and I
>>>>>> have to say I don't believe it's necessary. The purpose of EML is
>>>>>> to provide a standard for describing an information resource. We
>>>>>> discussed the issue of using it as a container for many documents
>>>>>> early on and decided this was not appropriate. Early experiments
>>>>>> using this type of schema with
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> For clarity, what we have proposed is not a container for multiple
>>>>> documents, but only for multiple document citations - similar to
>>>>> how the dataset, software, and protocol modules allow for multiple
>>>>> entries within each of their respective modules.  I realize that
>>>>> there is a concern for the volume that could be generated within a
>>>>> "bibliography" module, but similar constraints are not enforced
>>>>> within the other modules and volume with in-line data could
>>>>> certainly far out-weigh multiple citation entries (especially, any
>>>>> remote sensing imagery).  In such cases, asynchronous communication
>>>>> issues should be addressed at a different level of the application.
>>>>>
>>>>>
>>>>>
>>>>>> Xanthoria revealed that the file could potentially grow very
>>>>>> large with no warning, resulting in timeouts and hangs.
>>>>>>
>>>>>> I think an equivalent solution that does not introduce any
>>>>>> backward compatibility problems is to define a new schema called
>>>>>> "bibliography" and import the eml-literature.xsd using the
>>>>>> citation element as a repeatable element within that schema. We
>>>>>> have done this lots in our xylopia project where we wanted to
>>>>>> define a schema for one purpose or another that contained within
>>>>>> it some eml document. Any application that reads such a document
>>>>>> can take each individual citation element and write it out as a
>>>>>> valid EML document on the receiving end simply by generating a new
>>>>>> <eml> tag and inserting the entire <citation> or <dataset> tag
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> But isn't this really a workaround for shortcomings in eml?
>>>>> Wouldn't correcting eml be a more appealing fix, thus not requiring
>>>>> each domain to develop an eml workaround - and, making the
>>>>> correction part of the standard?
>>>>>
>>>>>
>>>>>
>>>>>> inside that. An even better solution is to simply use the harvest
>>>>>> document format used for metacat uploads that contains only
>>>>>> pointers to the individual documents so they can be retrieved at a
>>>>>> pace that the ingesting service can determine. SEINet uses
>>>>>> bibliography files that look like this for managing users'
>>>>>> bibliographies. I've attached a sample.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Agreed, if I understand what you are saying.  The proposed change
>>>>> only contains references to the citation (not the actual document).
>>>>> If changes to eml include an external referencing mechanism (wasn't
>>>>> this once implemented?), then this should be a no brainer.
>>>>>
>>>>>
>>>>>
>>>>>> Peter McCartney(peter.mccartney at asu.edu)
>>>>>> International Institute for Sustainability
>>>>>> Arizona State University
>>>>>> 480-965-6791
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: eml-dev-bounces at ecoinformatics.org
>>>>>>> [mailto:eml-dev-bounces at ecoinformatics.org] On Behalf Of Mark Servilla
>>>>>>> Sent: Tuesday, August 09, 2005 11:34 AM
>>>>>>> To: eml-dev at ecoinformatics.org
>>>>>>> Cc: Margaret O'Brien
>>>>>>> Subject: [eml-dev] Proposed revision to eml-literature module...
>>>>>>>
>>>>>>>
>>>>>>> Hello EML Community,
>>>>>>>
>>>>>>> The LTER Network Office and Santa Barbara Coastal LTER site
>>>>>>> would like to propose a change to the eml-literature module.  The
>>>>>>> proposed change is to move the "citation" element subtree
>>>>>>> currently at the top module level (where the cardinality is 1) to
>>>>>>> an inner and new top level module, "bibliography", where the
>>>>>>> cardinality of citation would be 1 to infinity.  The goal of this
>>>>>>> change is to better reflect management of publication style
>>>>>>> citation lists as opposed to a single citation for each eml
>>>>>>> document instance.  Note that a single citation is still very
>>>>>>> possible.
>>>>>>>
>>>>>>> We have also added the "contact" subtree within the
>>>>>>> "bibliography" at the same level as "citation", in addition to
>>>>>>> adding "contact" within the actual "citation" subtree.  The first
>>>>>>> "bibliography/contact" would be used to denote the manager of the
>>>>>>> bibliography, whereas the "citation/contact" would reference the
>>>>>>> manager of the actual citation. The following link is to the
>>>>>>> revised schema within our public CVS
>>>>>>> (http://cvs.lternet.edu/cgi-bin/viewcvs.cgi/NIS/projects/bibliography/eml-2.0.1bib/).
>>>>>> I have also attached a simple "png" view of the proposed change in
>>>>>> XMLSpy graphical notation as a quick reference.
>>>>>>
>>>>>> We were also discussing the merit of having the "title" element in
>>>>>> the eml-resource module change from a simple element to a complex
>>>>>> element, and include within the title subtree similar structure to
>>>>>> the "section" and "para" elements (found within "abstract") for
>>>>>> those more complicated titles that include text-based style and
>>>>>> formatting.  We did not, however, modify the test schema to
>>>>>> include this change (at least at this point).
>>>>>>
>>>>>> We realize that any such change to the current EML-2.0.1 standard
>>>>>> would certainly break backward compatibility.  However, it may be
>>>>>> acceptable if/when the next major eml release would potentially
>>>>>> have the same effect.
>>>>>> Your thoughts are most welcome on this proposed change.
>>>>>>
>>>>>> Sincerely,
>>>>>> Mark
>>>>>>
>>>>>>
>>>>>> ----------------------------------------------------------------------
>>>>>>
>>>>>> <bibliography creationDate="Mar 8, 2004" id="1078769397263">
>>>>>>   <name>peter</name>
>>>>>>   <item id="101 " schema="EML Dataset" src="ces_dataset"/>
>>>>>>   <item id="102 " schema="EML Dataset" src="ces_dataset"/>
>>>>>>   <item id="801" schema="EML Literature" src="ces_literature"/>
>>>>>>   <item id="805" schema="EML Literature" src="ces_literature"/>
>>>>>> </bibliography>
>>>>>
>>>>>
>>>>>
>>>>>
>>>> -- 
>>>> -------------------------------------------------------------------
>>>> Matt Jones                                     jones at nceas.ucsb.edu
>>>> http://www.nceas.ucsb.edu/    Fax: 425-920-2439    Ph: 907-789-0496
>>>> National Center for Ecological Analysis and Synthesis (NCEAS)
>>>> University of California Santa Barbara
>>>> Interested in ecological informatics? http://www.ecoinformatics.org
>>>> -------------------------------------------------------------------
>>>> _______________________________________________
>>>> Eml-dev mailing list
>>>> Eml-dev at ecoinformatics.org
>>>> http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/eml-dev
>>>
>>
>>
>>
> 

-- 
-------------------------------------------------------------------
Matt Jones                                     jones at nceas.ucsb.edu
http://www.nceas.ucsb.edu/    Fax: 425-920-2439    Ph: 907-789-0496
National Center for Ecological Analysis and Synthesis (NCEAS)
University of California Santa Barbara
Interested in ecological informatics? http://www.ecoinformatics.org
-------------------------------------------------------------------