[eml-dev] quick question on adding citation information

Matt Jones jones at nceas.ucsb.edu
Fri Dec 6 09:51:55 PST 2013


Regarding your question about describing multiple data entity types in one
EML file, you can indeed.  eml-dataset contains a repeatable choice of data
entities, including data tables, rasters, vectors, etc.
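For illustration, here is a minimal Python sketch (standard library only) of that repeatable choice: one `dataset` holding both a `dataTable` and a `spatialRaster`. It is a skeleton, not a schema-valid document. Required elements such as `creator`, `contact`, `attributeList` and the raster's spatial reference are deliberately omitted, and the entity names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Data entity element names in the eml-dataset choice group (EML 2.1.1).
ENTITY_TAGS = {"dataTable", "spatialRaster", "spatialVector",
               "storedProcedure", "view", "otherEntity"}


def build_dataset(entities):
    """Build a skeletal <dataset>; `entities` is a list of (tag, name) pairs.

    Only <entityName> is filled in, so the result is NOT schema-valid;
    it only shows that different entity types can sit side by side.
    """
    dataset = ET.Element("dataset")
    ET.SubElement(dataset, "title").text = "Hypothetical combined dataset"
    for tag, name in entities:
        if tag not in ENTITY_TAGS:
            raise ValueError(f"not an EML data entity: {tag}")
        entity = ET.SubElement(dataset, tag)
        ET.SubElement(entity, "entityName").text = name
    return dataset


def count_entities(dataset):
    """Count data entities of each type directly under <dataset>."""
    return Counter(child.tag for child in dataset if child.tag in ENTITY_TAGS)


ds = build_dataset([("dataTable", "plots.csv"),
                    ("spatialRaster", "elevation.tif"),
                    ("dataTable", "species.csv")])
print(count_entities(ds))
```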

Matt


On Fri, Dec 6, 2013 at 7:50 AM, Carl Boettiger <cboettig at gmail.com> wrote:

> Thanks all for the excellent explanations.  My own more naive thoughts as
> I process these suggestions:
>
> It does sound like under the current schema that additionalMetadata is a
> reasonable home.  If I understand correctly, this section is rather
> flexible and I could put an EML citation node under
> additionalMetadata/metadata.  Given this flexibility, I might be tempted to
> write the citation data out in RDFa with something more widely used such as
> the PRISM vocabulary (indeed I could copy that from the html headers of
> most publishers), or crossref's XML -- which perhaps only proves Matt's
> point about why additionalMetadata isn't ideal.
>
> On extending EML, it's not clear to me whether `dataUsageCitation` (or
> whatever term is chosen) should be an element of `dataset` rather than of
> `eml`.  Personally I would have put it at `eml`, as a sibling node to the
> `dataset`, `protocol` or `software` that the document might be describing.
> If I wanted to publish an article and an EML metadata file describing a
> piece of software, I'd use eml/software, I think, but then I'd need
> somewhere other than eml/dataset/dataUsageCitation to link my citation.
>
> I appreciate seeing Margaret's examples, though I think it is important
> for a machine using the EML file to be able to extract the bibliographic
> information (at least the DOI, if available) for the work that should be
> cited when that data is used; having the citation data only in the
> abstract, or only at a 'package' level above the metadata itself, seems
> non-ideal.
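A rough Python sketch of the kind of machine extraction described here, using only the standard library. It assumes, as a convention rather than anything EML mandates, that a DOI is recorded in a citation's `alternateIdentifier` with a `system` attribute mentioning "doi". The sample document, its identifiers and the DOI value are invented, and namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<eml>
  <dataset id="ds.1"><title>Example data</title></dataset>
  <additionalMetadata>
    <describes>ds.1</describes>
    <metadata>
      <citation>
        <alternateIdentifier system="doi">10.0000/example.123</alternateIdentifier>
        <title>Hypothetical paper using the data</title>
      </citation>
    </metadata>
  </additionalMetadata>
</eml>
"""


def extract_dois(xml_text):
    """Return DOIs found in any <citation> element in the document.

    Assumes DOIs live in <alternateIdentifier system="...doi...">
    (a hypothetical convention); namespaces are ignored for brevity.
    """
    root = ET.fromstring(xml_text)
    dois = []
    for citation in root.iter("citation"):
        for alt in citation.findall("alternateIdentifier"):
            if "doi" in alt.get("system", "").lower():
                dois.append((alt.text or "").strip())
    return dois


print(extract_dois(SAMPLE))
```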
>
> Thanks for the background on the shift from multiple RDF-linked EML files
> to a single hierarchical file.  On this topic, I wonder whether there are
> other places where exclusive elements limit expression in a single
> file: for instance, it is not obvious to me that I can describe both a
> dataTable and a spatialRaster in the same file (imagining I use both in the
> same analysis).  Of course I can serialize these into different EML files,
> but I'm not sure what the best practice would be.
>
>
>
>
>
> On Fri, Dec 6, 2013 at 7:51 AM, Matt Jones <jones at nceas.ucsb.edu> wrote:
>
>> +1 on Chris' and Wade's comments so far.  I think it's a shortcoming of
>> EML that there is no explicitly labeled field for this, and I've gotten
>> this question from many EML users.  I also agree with Wade that these
>> linkages quickly become stale.  Nevertheless, it would be good to have a
>> way to link to citations that use the data.  Chris' solution to add it to
>> additionalMetadata is certainly a workaround, but not very satisfying
>> because there won't be consistency across users as to how it's embedded.  We
>> could consider adding an optional top level field to eml-dataset to provide
>> this, possibly something like:
>>
>> /eml/dataset/dataUsageCitation which would be of type CitationType
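To make the proposal concrete, a Python sketch of where such a field might sit and how a client could read it. `dataUsageCitation` is only the name floated in this message; it does not exist in EML 2.1.1, so this fragment would not validate, and its position inside `dataset` is a guess. The citation is pared down to `title` and `pubDate`, and the titles are invented.

```python
import xml.etree.ElementTree as ET


def add_usage_citation(dataset, title, year):
    """Append a hypothetical <dataUsageCitation> (proposed, not in EML 2.1.1).

    Children follow a pared-down CitationType: title and pubDate only.
    """
    cite = ET.SubElement(dataset, "dataUsageCitation")
    ET.SubElement(cite, "title").text = title
    ET.SubElement(cite, "pubDate").text = str(year)
    return cite


def usage_titles(dataset):
    """With a dedicated element, a client needs one fixed path, no guessing."""
    return [c.findtext("title") for c in dataset.findall("dataUsageCitation")]


dataset = ET.Element("dataset", id="ds.1")
add_usage_citation(dataset, "First hypothetical paper", 2012)
add_usage_citation(dataset, "Second hypothetical paper", 2013)
print(usage_titles(dataset))
```

The point of a dedicated, typed element is visible in `usage_titles`: one fixed path replaces the per-user conventions that `additionalMetadata` allows.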
>>
>> I added a feature request ticket in Redmine to track this issue:
>>     https://projects.ecoinformatics.org/ecoinfo/issues/6283
>>
>> Thoughts?  Is such change worth a revision of the EML schemas?  This
>> issue has been on my radar for a long time, but has never reached the
>> critical point of triggering a version change, which has widespread impact.
>> There are several other outstanding schema changes that might raise the
>> need for a new release, including fixing internationalization issues
>> <https://projects.ecoinformatics.org/ecoinfo/issues/5728>, compatibility
>> with ISO issues raised by GBIF
>> <https://projects.ecoinformatics.org/ecoinfo/issues/5998>, key/keyref
>> parser checking <https://projects.ecoinformatics.org/ecoinfo/issues/5731>,
>> and other items.  So maybe now's the right time?  I think these could be
>> done with backwards-compatible changes (all EML 2.1.1 documents would be
>> valid under the new schema with only a namespace change).
>>
>> Matt
>>
>>
>>
>>
>>
>> On Fri, Dec 6, 2013 at 5:55 AM, Wade Sheldon <sheldon at uga.edu> wrote:
>>
>>> Chris and Carl,
>>>
>>> I agree with Chris' interpretation of the specification, and would not
>>> recommend putting a general literature citation under
>>> methods/methodStep/citation or under software. Those elements are best used
>>> to link to protocols and other documents specific to the parent elements,
>>> and viewers would not think to look for citations referencing the entire
>>> data set there.
>>>
>>> And no, we did not specifically address this issue in the first LTER EML
>>> Best Practices document in 2004, and I don't recall that issue being
>>> addressed in version 2 in 2011 either (but Margaret can correct me if I'm
>>> wrong).
>>>
>>> In my opinion, it makes less sense to embed citations to publications in
>>> data than citations to data in publications, so we do not attempt to
>>> shoe-horn citations into EML documents using additionalMetadata or other
>>> approaches. The data used in a publication is fixed at the time of
>>> publication, whereas a published data set will ideally be used and cited
>>> many times (hey - I'm an optimist), so literature citations in data sets
>>> would quickly go stale and require ongoing document maintenance to keep
>>> current.
>>>
>>> I think the best approach to this issue, long-term, is to rely on data
>>> registries and link-outs associated with journals to provide this type of
>>> association.
>>>
>>> Regards,
>>>
>>> Wade Sheldon
>>> GCE-LTER Information Manager
>>>
>>>
>>>
>>> On 12/6/2013 8:18 AM, Christopher Jones wrote:
>>>
>>>> Hi Carl,
>>>>
>>>> I'd say what you're trying to do is pretty fundamental, and should be
>>>> straightforward.  However, I think you've highlighted an issue that is a
>>>> consequence of a design transition that happened in the EML schemas quite a
>>>> while ago.
>>>>
>>>> In the first designs of EML, the last of which was EML 2 Beta 6
>>>> <https://code.ecoinformatics.org/code/eml/tags/RELEASE_EML_2_0_0_BETA_6/>,
>>>> each of the EML modules was linkable to the others in an RDF-like syntax
>>>> (see the <triple> tag in eml-resource
>>>> <https://code.ecoinformatics.org/code/eml/tags/RELEASE_EML_2_0_0_BETA_6/eml-resource.png>).
>>>> And so what you're describing would entail creating an EML Dataset
>>>> document, then an EML Citation document, and then linking the two with a
>>>> relationship ("citation.1.1" "is citation for" "dataset.1.1").
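Conceptually, the Beta 6 approach stored such links as subject/relationship/object statements. The Python sketch below illustrates the idea only: the tuples stand in for `<triple>` elements and are not the Beta 6 syntax itself. The first triple reuses the identifiers from the example above; the second is hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    relationship: str
    obj: str


TRIPLES = [
    Triple("citation.1.1", "is citation for", "dataset.1.1"),
    Triple("protocol.1.1", "is protocol for", "dataset.1.1"),  # hypothetical
]


def subjects_related_to(triples, obj, relationship):
    """Return the subjects linked to `obj` by exactly `relationship`."""
    return [t.subject for t in triples
            if t.obj == obj and t.relationship == relationship]


print(subjects_related_to(TRIPLES, "dataset.1.1", "is citation for"))
```

The typed predicate is what the later hierarchical design gave up: a query can ask specifically for "is citation for" rather than any link at all.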
>>>>
>>>>
>>>> This obviously provided plenty of flexibility, but also a degree of
>>>> complexity, and so the community decided to move toward a hierarchical
>>>> structure with a top-level eml.xsd schema.  In doing so, some of the module
>>>> relationships were hard-coded into the EML schema hierarchy, and at the top
>>>> level, datasets and citations were encoded as top-level choices, and as you
>>>> point out, are mutually exclusive.
>>>>
>>>> As I do a quick scan of the schemas for references to eml-literature
>>>> CitationType, the module is used in the following other modules:
>>>>
>>>> eml.xsd
>>>> eml-attribute.xsd
>>>> eml-coverage.xsd
>>>> eml-methods.xsd
>>>> eml-physical.xsd
>>>> eml-project.xsd
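A scan like this is easy to reproduce. The Python sketch below assumes a local copy of the EML `.xsd` files in a single directory; the directory name is hypothetical. It matches only the string `CitationType`, so it reports any schema that mentions the type, including in documentation, and the defining module (eml-literature.xsd) would appear in the results too.

```python
import pathlib
import re

# Matches the type name with or without a namespace prefix, e.g. "cit:CitationType".
PATTERN = re.compile(r"\bCitationType\b")


def schemas_mentioning_citation_type(schema_dir):
    """Return sorted names of .xsd files in schema_dir that mention CitationType."""
    hits = []
    for path in sorted(pathlib.Path(schema_dir).glob("*.xsd")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if PATTERN.search(text):
            hits.append(path.name)
    return hits


if __name__ == "__main__":
    # Hypothetical location of a local EML 2.1.1 schema checkout.
    for name in schemas_mentioning_citation_type("eml-2.1.1"):
        print(name)
```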
>>>>
>>>> Given the history above, I think that the intention is to document your
>>>> paper at the /eml/citation level.  Hard-coded links to citations in
>>>> eml-attribute, eml-coverage, eml-methods, eml-physical, and eml-project all
>>>> describe links to citations that are very specific in scope (e.g., in
>>>> eml-methods, the citation is intended to document a specific procedure
>>>> used).
>>>>
>>>> So, to me, the use of a citation in these sub-modules to describe a
>>>> dataset-level citation doesn't quite fit.  I'd love to hear what others are
>>>> doing to satisfy this need.  In particular, did the LTER's EML Best
>>>> Practices group touch on this subject while writing that document?
>>>>
>>>> As one alternative suggestion, I should point out the /eml/additionalMetadata
>>>> field <https://code.ecoinformatics.org/code/eml/tags/RELEASE_EML_2_1_1/eml.png>.
>>>> This structure was added after EML 2 Beta 6 in order to retain some of the
>>>> flexibility that the triple structure used to provide.  This element has a
>>>> <metadata> child, which in theory could contain a full /eml/citation
>>>> document, and the sibling <describes> element could point to the id
>>>> attribute value of your /eml/dataset element.
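A minimal Python sketch of this workaround, built with the standard library. Two assumptions: `additionalMetadata` is placed directly under the `eml` root, as a sibling of `dataset`, which is where the EML 2.1.1 schema declares it; and the embedded citation is pared down to a title. The package identifier and title are invented. Namespaces and other required elements are omitted, so the output is illustrative rather than valid.

```python
import xml.etree.ElementTree as ET


def attach_citation(eml_root, dataset_id, title):
    """Add an <additionalMetadata> block whose <describes> points at the dataset.

    The link is only "describes", not a typed "isCitationFor" relation.
    """
    extra = ET.SubElement(eml_root, "additionalMetadata")
    ET.SubElement(extra, "describes").text = dataset_id
    metadata = ET.SubElement(extra, "metadata")
    citation = ET.SubElement(metadata, "citation")
    ET.SubElement(citation, "title").text = title
    return extra


def citations_for(eml_root, dataset_id):
    """Collect citation titles from additionalMetadata blocks describing dataset_id."""
    titles = []
    for extra in eml_root.findall("additionalMetadata"):
        targets = [d.text for d in extra.findall("describes")]
        if dataset_id in targets:
            titles.extend(c.findtext("title")
                          for c in extra.iterfind("metadata/citation"))
    return titles


eml = ET.Element("eml", packageId="example.1.1")
ET.SubElement(eml, "dataset", id="dataset.1.1")
attach_citation(eml, "dataset.1.1", "Hypothetical paper describing the data")
print(ET.tostring(eml, encoding="unicode"))
print(citations_for(eml, "dataset.1.1"))
```

As noted below, a reader of such a document has to infer from context that "describes" means "is a citation for"; nothing in the structure says so.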
>>>>
>>>>
>>>> I'd like to hear what others think about this solution.  Obviously, the
>>>> <describes> link is a more semantically vague link from the citation
>>>> documentation to the dataset documentation than a predicate like
>>>> "isCitationFor", but it at least provides a link.
>>>>
>>>> If the community hasn't already come to a consensus on this issue, I
>>>> think this thread might help in getting there.  Thanks for bringing it up,
>>>> Carl.
>>>>
>>>> Cheers,
>>>> Chris
>>>>
>>>> On Dec 5, 2013, at 4:43 PM, Carl Boettiger wrote:
>>>>
>>>>> Sorry if this is a bit elementary.
>>>>>
>>>>> Let's say I have an EML file describing data that is published as part
>>>>> of the supplemental materials of a paper.  It seems reasonable to add a
>>>>> citation to that paper in the metadata.  Is the best place for such a
>>>>> citation:
>>>>>
>>>>>     eml/dataset/methods/methodStep/citation
>>>>>
>>>>> preceded by a
>>>>>
>>>>>     eml/dataset/methods/methodStep/description
>>>>>
>>>>> explaining that the citation refers to the paper in which the data was
>>>>> first published, etc?
>>>>>
>>>>>
>>>>> In a related question, if the EML was documenting software instead,
>>>>> e.g. eml/software, where would such a citation go?  I don't see
>>>>> `citation` as a child of anything under `eml/software` (see
>>>>> <http://knb.ecoinformatics.org/software/eml/eml-2.1.1/eml-software.png>).
>>>>>
>>>>>
>>>>>
>>>>> Or am I missing the boat entirely here and the natural thing to do is
>>>>> have a separate EML file, eml/citation, and somehow reference that file?  I
>>>>> don't really understand the motivation for having an eml file that consists
>>>>> only of eml/citation, but as I understand, eml/citation and eml/dataset etc
>>>>> are exclusive, right?
>>>>>
>>>>> Thanks for the help!
>>>>>
>>>>> - Carl
>>>>>
>>>>> --
>>>>> Carl Boettiger
>>>>> UC Santa Cruz
>>>>> http://carlboettiger.info/
>>>>> _______________________________________________
>>>>> Eml-dev mailing list
>>>>> Eml-dev at ecoinformatics.org <mailto:Eml-dev at ecoinformatics.org>
>>>>> http://lists.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/eml-dev
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>> --
>>> ____________________________________
>>>
>>>  Wade M. Sheldon
>>>  GCE-LTER Information Manager
>>>  School of Marine Programs
>>>  University of Georgia
>>>  Athens, GA 30602-3636
>>>  Email: sheldon at uga.edu
>>>  WWW: http://gce-lter.marsci.uga.edu/bios/wsheldon
>>>
>>>
>>>
>>
>>
>>
>>
>
>
> --
> Carl Boettiger
> UC Santa Cruz
> http://carlboettiger.info/
>

