[eml-dev] quick question on adding citation information

Margaret O'Brien margaret.obrien at ucsb.edu
Fri Dec 6 08:42:01 PST 2013


Hi All -
I agree with all of the comments below, with one major exception: I'm 95% 
sure that in EML, the citation element always appears as a child of the 
methods/protocol tree, so in my opinion, it's for a published protocol, 
not a paper related to the data.

In practice, in our own system (SBC LTER), if there is a 1:1 
relationship between the data package and the paper, we put the paper 
citation in the data package abstract using the ulink element. An 
example is:
http://sbc.lternet.edu/cgi-bin/showDataset.cgi?docid=knb-lter-sbc.54
Since the paper is most often published first, its URL can be 
constructed from its DOI.
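
As a rough illustration of the approach described above, here is a
minimal Python sketch that builds an abstract whose paragraph carries a
ulink to the paper. The resolver prefix, the function names, the DOI, and
the titles are all assumptions made up for the example; the ulink/citetitle
layout follows my reading of the EML text module and should be checked
against the schema before use.

```python
import xml.etree.ElementTree as ET

# Assumption: any DOI resolver prefix would work here.
DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Build a resolvable URL from a bare DOI string."""
    return DOI_RESOLVER + doi.strip()


def abstract_with_paper_link(summary: str, paper_title: str, doi: str) -> ET.Element:
    """Return an <abstract> whose single <para> ends with a <ulink> to the paper."""
    abstract = ET.Element("abstract")
    para = ET.SubElement(abstract, "para")
    para.text = summary + " These data were published in: "
    ulink = ET.SubElement(para, "ulink", {"url": doi_to_url(doi)})
    ET.SubElement(ulink, "citetitle").text = paper_title
    return abstract


# Hypothetical values, for illustration only.
example = abstract_with_paper_link(
    "Summary of the data package.",
    "An example paper title",
    "10.1234/example.5678",
)
print(ET.tostring(example, encoding="unicode"))
```

The resulting element would still need to be placed inside a complete,
schema-valid dataset record.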

If the data are a time series, and the EML metadata record is later 
updated (to reflect the new data), I also update the abstract to include 
a phrase like "an earlier version of this dataset was used by ...", as in:
http://sbc.lternet.edu/cgi-bin/showDataset.cgi?docid=knb-lter-sbc.21

Both of these EML examples can be downloaded from the link at the bottom 
of the view.

To make the linkage between the paper and the data (including a revision 
for an ongoing time-series), I rely on systems external to both the data 
catalog and the publications catalog. Generally, this is a many:many 
relationship, and it makes more sense to update the linkage without 
affecting the metadata for either resource (paper or data package).
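
To make the many:many idea concrete, here is a small Python sketch of a
link store that lives outside both catalogs. It is not the system we
actually run; the class name, method names, and identifiers are
hypothetical. The point it shows is that adding or changing a link
touches neither the paper record nor the data package metadata.

```python
from collections import defaultdict


class LinkRegistry:
    """Many-to-many links between data package revisions and papers."""

    def __init__(self):
        self._papers_by_package = defaultdict(set)
        self._packages_by_paper = defaultdict(set)

    def link(self, package_id, paper_doi):
        """Record the link in both directions so either side can be queried."""
        self._papers_by_package[package_id].add(paper_doi)
        self._packages_by_paper[paper_doi].add(package_id)

    def papers_for(self, package_id):
        """Return the sorted DOIs linked to one data package revision."""
        return sorted(self._papers_by_package.get(package_id, ()))

    def packages_for(self, paper_doi):
        """Return the sorted package revisions linked to one paper."""
        return sorted(self._packages_by_paper.get(paper_doi, ()))


registry = LinkRegistry()
# Made-up identifiers; the last number stands for a package revision.
registry.link("example.1.1", "10.1234/paper.a")
registry.link("example.1.2", "10.1234/paper.a")
registry.link("example.1.2", "10.1234/paper.b")

print(registry.papers_for("example.1.2"))
print(registry.packages_for("10.1234/paper.a"))
```

Keying on the full revision identifier keeps links to an earlier
version of a time series distinct from links to the current one.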

I would like to comment further per Matt's request, but will save that 
for another message. LTER actually has some overdue contributions in 
this area, and others are also examining ways to create these 
connections. For now, some related links are below.

Issues already recorded for EML that are related to this problem:
https://projects.ecoinformatics.org/ecoinfo/issues/3503
https://projects.ecoinformatics.org/ecoinfo/issues/3163
https://projects.ecoinformatics.org/ecoinfo/issues/2076

Here is a summary of Wade's use (at GCE LTER) of the lter-project 
extension of EML. This extension is capable of linking papers to 
datasets. This newsletter article has a good set of references for 
lter-project.xsd:
http://databits.lternet.edu/fall-2010/implementing-projectdb-georgia-coastal-ecosystems-lter

This is not an "elementary" question at all!

Margaret

-----------
Margaret O'Brien
Information Management
Santa Barbara Coastal LTER
Marine Science Institute, UCSB
Santa Barbara, CA 93106
805-893-2071 (voice)
http://sbc.lternet.edu

On 12/6/13 7:51 AM, Matt Jones wrote:
> +1 on Chris' and Wade's comments so far.  I think it's a shortcoming of 
> EML that there is no explicitly labeled field for this, and I've 
> gotten this question from many EML users.  I also agree with Wade that 
> these linkages quickly become stale.  Nevertheless, it would be good 
> to have a way to link to citations that use the data.  Chris' solution 
> to add it to additionalMetadata is certainly a workaround, but not 
> very satisfying because there won't be consistency across users as to 
> how it's embedded.  We could consider adding an optional top-level 
> field to eml-dataset to provide this, possibly something like:
>
> /eml/dataset/dataUsageCitation which would be of type CitationType
>
> I added a feature request ticket in Redmine to track this issue:
> https://projects.ecoinformatics.org/ecoinfo/issues/6283
>
> Thoughts?  Is such a change worth a revision of the EML schemas?  This 
> issue has been on my radar for a long time, but has never reached the 
> critical point of triggering a version change, which has widespread 
> impact.  There are several other outstanding schema changes that might 
> raise the need for a new release, including fixing 
> internationalization issues 
> <https://projects.ecoinformatics.org/ecoinfo/issues/5728>, 
> compatibility with ISO issues 
> <https://projects.ecoinformatics.org/ecoinfo/issues/5998> raised by 
> GBIF, key/keyref parser checking 
> <https://projects.ecoinformatics.org/ecoinfo/issues/5731>, and other 
> items.  So maybe now's the right time?  I think these could be done 
> with backwards-compatible changes (all EML 2.1.1 documents would be 
> valid under the new schema with only a namespace change).
>
> Matt
>
>
>
>
>
> On Fri, Dec 6, 2013 at 5:55 AM, Wade Sheldon <sheldon at uga.edu> wrote:
>
>     Chris and Carl,
>
>     I agree with Chris' interpretation of the specification, and would
>     not recommend putting a general literature citation under
>     methods/methodStep/citation or under software. Those elements are
>     best used to link to protocols and other documents specific to the
>     parent elements, and viewers would not think to look for citations
>     referencing the entire data set there.
>
>     And no, we did not specifically address this issue in the first
>     LTER EML Best Practices document in 2004, and I don't recall that
>     issue being addressed in version 2 in 2011 either (but Margaret can
>     correct me if I'm wrong).
>
>     In my opinion, it makes less sense to embed citations to
>     publications in data than citations to data in publications, so we
>     do not attempt to shoe-horn citations into EML documents using
>     additionalMetadata or other approaches. The data used in a
>     publication is fixed at the time of publication, whereas a
>     published data set will ideally be used and cited many times (hey
>     - I'm an optimist), so literature citations in data sets would
>     quickly go stale and require ongoing document maintenance to keep
>     current.
>
>     I think the best approach to this issue, long-term, is to rely on
>     data registries and link-outs associated with journals to provide
>     this type of association.
>
>     Regards,
>
>     Wade Sheldon
>     GCE-LTER Information Manager
>
>
>
>     On 12/6/2013 8:18 AM, Christopher Jones wrote:
>
>         Hi Carl,
>
>         I'd say what you're trying to do is pretty fundamental, and
>         should be straightforward.  However, I think you've
>         highlighted an issue that is a consequence of a design
>         transition that happened in the EML schemas quite a while ago.
>
>         In the first designs of EML, the last of which was EML 2 Beta
>         6
>         <https://code.ecoinformatics.org/code/eml/tags/RELEASE_EML_2_0_0_BETA_6/>,
>         each of the EML modules was linkable to the others in an
>         RDF-like syntax (see the <triple> tag in eml-resource
>         <https://code.ecoinformatics.org/code/eml/tags/RELEASE_EML_2_0_0_BETA_6/eml-resource.png>).
>          And so what you're describing would entail creating an EML
>         Dataset document, then an EML Citation document, and then
>         linking the two with a relationship ("citation.1.1" "is
>         citation for" "dataset.1.1").
>
>
>         This obviously provided plenty of flexibility, but also a
>         degree of complexity, and so the community decided to move
>         toward a hierarchical structure with a top-level eml.xsd
>         schema.  In doing so, some of the module relationships were
>         hard-coded into the EML schema hierarchy, and at the top
>         level, datasets and citations were encoded as top-level
>         choices, and as you point out, are mutually exclusive.
>
>         As I do a quick scan of the schemas for references to
>         eml-literature CitationType, the module is used in the
>         following other modules:
>
>         eml.xsd
>         eml-attribute.xsd
>         eml-coverage.xsd
>         eml-methods.xsd
>         eml-physical.xsd
>         eml-project.xsd
>
>         Given the history above, I think that the intention is to
>         document your paper at the /eml/citation level.  Hard-coded
>         links to citations in eml-attribute, eml-coverage,
>         eml-methods, eml-physical, and eml-project all describe links
>         to citations that are very specific in scope (e.g., in
>         eml-methods, the citation is intended to document a specific
>         procedure used).
>
>         So, to me, the use of a citation in these sub-modules to
>         describe a dataset-level citation doesn't quite fit.  I'd love
>         to hear what others are doing to satisfy this need.  In
>         particular, did the LTER's EML Best Practices group touch on
>         this subject while writing that document?
>
>         As one alternative suggestion, I should point out the
>         /eml/dataset/additionalMetadata
>         <https://code.ecoinformatics.org/code/eml/tags/RELEASE_EML_2_1_1/eml.png>
>         field.  This structure was added after EML 2 Beta 6 in order
>         to retain some of the flexibility that the triple structure
>         used to provide.  This element has a <metadata> child, which
>         in theory could contain a full /eml/citation document, and the
>         sibling <describes> element could point to the id attribute
>         value of your /eml/dataset element.
>
>
>         I'd like to hear what others think about this solution.
>          Obviously, the <describes> link is a more semantically vague
>         link from the citation documentation to the dataset
>         documentation than a predicate like "isCitationFor", but it at
>         least provides a link.
>
>         If the community hasn't already come to a consensus on this
>         issue, I think this thread might help in getting there.
>          Thanks for bringing it up, Carl.
>
>         Cheers,
>         Chris
>
>         On Dec 5, 2013, at 4:43 PM, Carl Boettiger wrote:
>
>             Sorry if this is a bit elementary.
>
>             Let's say I have an EML file describing data that is
>             published as part of the supplemental materials of a
>             paper.  It seems reasonable to add a citation to that
>             paper in the metadata.  Is the best place for such a citation:
>
>                 eml/dataset/methods/methodStep/citation
>
>             preceded by a
>
>                 eml/dataset/methods/methodStep/description
>
>             explaining that the citation refers to the paper in which
>             the data was first published, etc?
>
>
>             In a related question, if the EML was documenting software
>             instead, e.g. eml/software, where would such a citation
>             go?  I don't see
>             <http://knb.ecoinformatics.org/software/eml/eml-2.1.1/eml-software.png>
>             `citation` as child of anything under `eml/software`.
>
>
>
>             Or am I missing the boat entirely here and the natural
>             thing to do is have a separate EML file, eml/citation, and
>             somehow reference that file?  I don't really understand
>             the motivation for having an eml file that consists only
>             of eml/citation, but as I understand, eml/citation and
>             eml/dataset etc are exclusive, right?
>
>             Thanks for the help!
>
>             - Carl
>
>             -- 
>             Carl Boettiger
>             UC Santa Cruz
>             http://carlboettiger.info/
>             _______________________________________________
>             Eml-dev mailing list
>             Eml-dev at ecoinformatics.org
>             http://lists.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/eml-dev
>
>
>
>
>
>
>     -- 
>     ____________________________________
>
>      Wade M. Sheldon
>      GCE-LTER Information Manager
>      School of Marine Programs
>      University of Georgia
>      Athens, GA 30602-3636
>      Email: sheldon at uga.edu
>      WWW: http://gce-lter.marsci.uga.edu/bios/wsheldon
>
>
>
>
>
>


