[eml-dev] quick question on adding citation information

Carl Boettiger cboettig at gmail.com
Fri Dec 6 08:50:08 PST 2013


Thanks all for the excellent explanations.  My own more naive thoughts as I
process these suggestions:

It does sound like, under the current schema, additionalMetadata is a
reasonable home.  If I understand correctly, this section is rather
flexible and I could put an EML citation node under
additionalMetadata/metadata.  Given this flexibility, I might be tempted to
write the citation data out in RDFa using something more widely used, such as
the PRISM vocabulary (indeed, I could copy that from the HTML headers of
most publishers' pages), or CrossRef's XML -- which perhaps only proves Matt's
point about why additionalMetadata isn't ideal.
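To make this concrete, here is a rough Python sketch (standard library only) of the additionalMetadata route. It assumes additionalMetadata sits directly under the eml root, as in the EML 2.1.1 eml.xsd, with <describes> pointing at the dataset's id. Storing the DOI as an alternateIdentifier with system="doi" is my own assumed convention, not something the schema prescribes. The function name, package id, titles, and DOI are placeholders, and several elements the schema requires (for example creator) are omitted, so the output will not validate as-is.

```python
import xml.etree.ElementTree as ET

# EML 2.1.1 namespace; only the root element is namespace-qualified.
EML_NS = "eml://ecoinformatics.org/eml-2.1.1"
ET.register_namespace("eml", EML_NS)


def build_eml_with_citation(dataset_id, dataset_title, paper_title, doi):
    """Build a skeleton EML document whose additionalMetadata block carries a
    citation for the paper that first published the dataset.

    Hypothetical helper: the DOI-as-alternateIdentifier convention is an
    assumption, and required elements such as creator are left out.
    """
    root = ET.Element(f"{{{EML_NS}}}eml",
                      {"packageId": "example.1.1", "system": "example"})

    dataset = ET.SubElement(root, "dataset", {"id": dataset_id})
    ET.SubElement(dataset, "title").text = dataset_title

    # additionalMetadata is a child of the eml root, after the dataset;
    # <describes> links it back to the dataset's id attribute.
    extra = ET.SubElement(root, "additionalMetadata")
    ET.SubElement(extra, "describes").text = dataset_id
    citation = ET.SubElement(ET.SubElement(extra, "metadata"), "citation")
    # alternateIdentifier precedes title in EML's ResourceGroup ordering.
    ET.SubElement(citation, "alternateIdentifier", {"system": "doi"}).text = doi
    ET.SubElement(citation, "title").text = paper_title
    return root


doc = build_eml_with_citation("ds.1", "Example data",
                              "Paper that first published the data",
                              "10.0000/placeholder")  # placeholder DOI
print(ET.tostring(doc, encoding="unicode"))
```

The catch, as Matt says, is that nothing obliges anyone else to use the same layout inside <metadata>, so a reader of the file has to know this convention in advance.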

On extending EML, I'm not sure that `dataUsageCitation` (or whatever term is
chosen) should be an element of `dataset` rather than of `eml`.  Personally
I would have put it at `eml`, as a sister node to the `dataset`, `protocol`
or `software` element that the document might be describing.  If I wanted to
publish an article & an EML metadata file describing a piece of software,
I think I'd use eml/software, but then I'd need somewhere other than
eml/dataset/dataUsageCitation to link my citation.
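As a rough illustration of the placement I mean (Python, standard library only): `dataUsageCitation` here is purely hypothetical. It does not exist in EML 2.1.1, so a document built this way would not validate against the current schema. The hypothetical helper `add_usage_citation` slots the element in as a sibling right after whichever top-level resource (dataset, citation, software or protocol) the document describes, so the same code works for software as well as datasets.

```python
import xml.etree.ElementTree as ET

EML_NS = "eml://ecoinformatics.org/eml-2.1.1"
# The four mutually exclusive top-level resources in EML 2.1.1.
RESOURCE_TAGS = ("dataset", "citation", "software", "protocol")


def add_usage_citation(root, paper_title):
    """Insert a HYPOTHETICAL <dataUsageCitation> (not part of EML 2.1.1)
    directly under the eml root, right after the top-level resource."""
    for index, child in enumerate(root):
        if child.tag in RESOURCE_TAGS:
            usage = ET.Element("dataUsageCitation")
            ET.SubElement(usage, "title").text = paper_title
            root.insert(index + 1, usage)
            return usage  # stop iterating right after the insert
    raise ValueError("no dataset, citation, software, or protocol element found")


root = ET.Element(f"{{{EML_NS}}}eml",
                  {"packageId": "example.1.1", "system": "example"})
software = ET.SubElement(root, "software", {"id": "sw.1"})
ET.SubElement(software, "title").text = "Example analysis package"
add_usage_citation(root, "Paper describing the software")
print([child.tag for child in root])  # ['software', 'dataUsageCitation']
```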

I appreciate seeing Margaret's examples, though I think it is important for
a machine using the EML file to be able to extract the bibliographic
information (at least the DOI, if available) for the work that should be
cited when the data are used.  Having the citation data only in the abstract,
or only at a 'package' level above the metadata itself, seems less than ideal.
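A minimal sketch of the kind of machine extraction I mean, again assuming the DOI is recorded as an alternateIdentifier with system="doi" inside a citation under /eml/additionalMetadata/metadata (my assumed convention, not an EML rule). The function name `find_citation_dois` and the sample document are illustrative only.

```python
import xml.etree.ElementTree as ET


def find_citation_dois(xml_text):
    """Return (described_id, doi) pairs for every citation stored under
    /eml/additionalMetadata/metadata. described_id is None when the block
    has no <describes> element."""
    root = ET.fromstring(xml_text)
    pairs = []
    # Only the eml root is namespaced, so child paths need no prefix.
    for block in root.findall("additionalMetadata"):
        targets = [d.text.strip() for d in block.findall("describes") if d.text]
        path = "metadata/citation/alternateIdentifier[@system='doi']"
        for ident in block.findall(path):
            if ident.text:
                for target in targets or [None]:
                    pairs.append((target, ident.text.strip()))
    return pairs


SAMPLE = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1" packageId="example.1.1" system="example">
  <dataset id="ds.1"><title>Example data</title></dataset>
  <additionalMetadata>
    <describes>ds.1</describes>
    <metadata>
      <citation>
        <alternateIdentifier system="doi">10.0000/placeholder</alternateIdentifier>
        <title>Paper that first published the data</title>
      </citation>
    </metadata>
  </additionalMetadata>
</eml:eml>"""

print(find_citation_dois(SAMPLE))  # [('ds.1', '10.0000/placeholder')]
```

If the citation lived only in free-text abstract prose, there would be no structured path like this to query.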

Thanks for the background on the shift from multiple RDF-linked EML files
to a single hierarchical file.  On this topic, I wonder whether there are
other places where mutually exclusive elements limit what can be expressed
in a single file.  For instance, it is not obvious to me that I can describe
both a dataTable and a spatialRaster in the same file (say, if I use both in
the same analysis).  Of course I can serialize these into different EML
files, but I'm not sure what the best practice would be.
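For concreteness, this sketch shows what such a combined file would look like if the schema does allow mixing entity types under one dataset, which is exactly what I'm unsure of; checking it against eml-dataset.xsd would settle the question. It is a skeleton only: required elements (creator, contact, and the entity-specific children each entity type needs) are left out, and the helper name and entity names are made up.

```python
import xml.etree.ElementTree as ET

EML_NS = "eml://ecoinformatics.org/eml-2.1.1"
ET.register_namespace("eml", EML_NS)


def build_mixed_dataset(entities):
    """Hypothetical helper. entities is a list of (entity_type, entity_name)
    pairs, e.g. ("dataTable", "counts.csv"); each becomes a child of the
    single <dataset> element, in the order given."""
    root = ET.Element(f"{{{EML_NS}}}eml",
                      {"packageId": "example.1.1", "system": "example"})
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = "Combined analysis inputs"
    for entity_type, entity_name in entities:
        entity = ET.SubElement(dataset, entity_type)
        ET.SubElement(entity, "entityName").text = entity_name
    return root


root = build_mixed_dataset([("dataTable", "counts.csv"),
                            ("spatialRaster", "elevation.tif")])
print([child.tag for child in root.find("dataset")])
# ['title', 'dataTable', 'spatialRaster']
```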





On Fri, Dec 6, 2013 at 7:51 AM, Matt Jones <jones at nceas.ucsb.edu> wrote:

> +1 on Chris' and Wade's comments so far.  I think it's a shortcoming of EML
> that there is no explicitly labeled field for this, and I've gotten this
> question from many EML users.  I also agree with Wade that these linkages
> quickly become stale.  Nevertheless, it would be good to have a way to link
> to citations that use the data.  Chris' solution to add it to
> additionalMetadata is certainly a workaround, but not very satisfying
> because there won't be consistency across users as to how it's embedded.  We
> could consider adding an optional top level field to eml-dataset to provide
> this, possibly something like:
>
> /eml/dataset/dataUsageCitation which would be of type CitationType
>
> I added a feature request ticket in Redmine to track this issue:
>     https://projects.ecoinformatics.org/ecoinfo/issues/6283
>
> Thoughts?  Is such a change worth a revision of the EML schemas?  This issue
> has been on my radar for a long time, but has never reached the critical
> point of triggering a version change, which has widespread impact.  There
> are several other outstanding schema changes that might raise the need for
> a new release, including fixing internationalization issues
> <https://projects.ecoinformatics.org/ecoinfo/issues/5728>, compatibility
> with ISO issues raised by GBIF
> <https://projects.ecoinformatics.org/ecoinfo/issues/5998>, key/keyref
> parser checking <https://projects.ecoinformatics.org/ecoinfo/issues/5731>,
> and other items.  So maybe now's the right time?  I think these could be
> done with backwards-compatible changes (all EML 2.1.1 documents would be
> valid under the new schema with only a namespace change).
>
> Matt
>
>
>
>
>
> On Fri, Dec 6, 2013 at 5:55 AM, Wade Sheldon <sheldon at uga.edu> wrote:
>
>> Chris and Carl,
>>
>> I agree with Chris' interpretation of the specification, and would not
>> recommend putting a general literature citation under
>> methods/methodStep/citation or under software. Those elements are best used
>> to link to protocols and other documents specific to the parent elements,
>> and viewers would not think to look for citations referencing the entire
>> data set there.
>>
>> And no, we did not specifically address this issue in the first LTER EML
>> Best Practices document in 2004, and I don't recall that issue being
>> addressed in version 2 in 2011 either (but Margaret can correct me if I'm
>> wrong).
>>
>> In my opinion, it makes less sense to embed citations to publications in
>> data than citations to data in publications, so we do not attempt to
>> shoe-horn citations into EML documents using additionalMetadata or other
>> approaches. The data used in a publication is fixed at the time of
>> publication, whereas a published data set will ideally be used and cited
>> many times (hey - I'm an optimist), so literature citations in data sets
>> would quickly become stale and require ongoing document maintenance to keep
>> current.
>>
>> I think the best approach to this issue, long-term, is to rely on data
>> registries and link-outs associated with journals to provide this type of
>> association.
>>
>> Regards,
>>
>> Wade Sheldon
>> GCE-LTER Information Manager
>>
>>
>>
>> On 12/6/2013 8:18 AM, Christopher Jones wrote:
>>
>>> Hi Carl,
>>>
>>> I'd say what you're trying to do is pretty fundamental, and should be
>>> straightforward.  However, I think you've highlighted an issue that is a
>>> consequence of a design transition that happened in the EML schemas quite a
>>> while ago.
>>>
>>> In the first designs of EML, the last of which was EML 2 Beta 6
>>> <https://code.ecoinformatics.org/code/eml/tags/RELEASE_EML_2_0_0_BETA_6/>,
>>> each of the EML modules was linkable to the others in an RDF-like syntax
>>> (see the <triple> tag in eml-resource
>>> <https://code.ecoinformatics.org/code/eml/tags/RELEASE_EML_2_0_0_BETA_6/eml-resource.png>).  And so
>>> what you're describing would entail creating an EML Dataset document, then
>>> an EML Citation document, and then linking the two with a relationship
>>> ("citation.1.1" "is citation for" "dataset.1.1").
>>>
>>>
>>> This obviously provided plenty of flexibility, but also a degree of
>>> complexity, and so the community decided to move toward a hierarchical
>>> structure with a top-level eml.xsd schema.  In doing so, some of the module
>>> relationships were hard-coded into the EML schema hierarchy, and at the top
>>> level, datasets and citations were encoded as top-level choices, and as you
>>> point out, are mutually exclusive.
>>>
>>> A quick scan of the schemas for references to eml-literature's
>>> CitationType shows that the module is used in the following other modules:
>>>
>>> eml.xsd
>>> eml-attribute.xsd
>>> eml-coverage.xsd
>>> eml-methods.xsd
>>> eml-physical.xsd
>>> eml-project.xsd
>>>
>>> Given the history above, I think that the intention is to document your
>>> paper at the /eml/citation level.  Hard-coded links to citations in
>>> eml-attribute, eml-coverage, eml-methods, eml-physical, and eml-project all
>>> describe links to citations that are very specific in scope (e.g., in
>>> eml-methods, the citation is intended to document a specific procedure
>>> used).
>>>
>>> So, to me, the use of a citation in these sub-modules to describe a
>>> dataset-level citation doesn't quite fit.  I'd love to hear what others are
>>> doing to satisfy this need.  In particular, did the LTER's EML Best
>>> Practices group touch on this subject while writing that document?
>>>
>>> As one alternative suggestion, I should point out the /eml/additionalMetadata
>>> <https://code.ecoinformatics.org/code/eml/tags/RELEASE_EML_2_1_1/eml.png>
>>> field.  This structure was added after EML 2 Beta 6 in order to retain some
>>> of the flexibility that the triple structure used to provide.  This element
>>> has a <metadata> child, which in theory could contain a full /eml/citation
>>> document, and the sibling <describes> element could point to the id
>>> attribute value of your /eml/dataset element.
>>>
>>>
>>> I'd like to hear what others think about this solution.  Obviously, the
>>> <describes> link is a more semantically vague link from the citation
>>> documentation to the dataset documentation than a predicate like
>>> "isCitationFor", but it at least provides a link.
>>>
>>> If the community hasn't already come to a consensus on this issue, I
>>> think this thread might help in getting there.  Thanks for bringing it up,
>>> Carl.
>>>
>>> Cheers,
>>> Chris
>>>
>>> On Dec 5, 2013, at 4:43 PM, Carl Boettiger wrote:
>>>
>>>> Sorry if this is a bit elementary.
>>>>
>>>> Let's say I have an EML file describing data that is published as part
>>>> of the supplemental materials of a paper.  It seems reasonable to add a
>>>> citation to that paper in the metadata.  Is the best place for such a
>>>> citation:
>>>>
>>>>     eml/dataset/methods/methodStep/citation
>>>>
>>>> preceded by a
>>>>
>>>>     eml/dataset/methods/methodStep/description
>>>>
>>>> explaining that the citation refers to the paper in which the data was
>>>> first published, etc?
>>>>
>>>>
>>>> In a related question, if the EML were documenting software instead,
>>>> e.g. eml/software, where would such a citation go?  I don't see
>>>> `citation` as a child of anything under `eml/software`
>>>> <http://knb.ecoinformatics.org/software/eml/eml-2.1.1/eml-software.png>.
>>>>
>>>>
>>>>
>>>> Or am I missing the boat entirely here and the natural thing to do is
>>>> have a separate EML file, eml/citation, and somehow reference that file?  I
>>>> don't really understand the motivation for having an eml file that consists
>>>> only of eml/citation, but as I understand it, eml/citation, eml/dataset,
>>>> etc. are mutually exclusive, right?
>>>>
>>>> Thanks for the help!
>>>>
>>>> - Carl
>>>>
>>>> --
>>>> Carl Boettiger
>>>> UC Santa Cruz
>>>> http://carlboettiger.info/
>>>> _______________________________________________
>>>> Eml-dev mailing list
>>>> Eml-dev at ecoinformatics.org
>>>> http://lists.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/eml-dev
>>>>
>>>
>>>
>>>
>>>
>>
>> --
>> ____________________________________
>>
>>  Wade M. Sheldon
>>  GCE-LTER Information Manager
>>  School of Marine Programs
>>  University of Georgia
>>  Athens, GA 30602-3636
>>  Email: sheldon at uga.edu
>>  WWW: http://gce-lter.marsci.uga.edu/bios/wsheldon
>>
>>
>>
>
>
>
>


-- 
Carl Boettiger
UC Santa Cruz
http://carlboettiger.info/

