[eml-dev] quick question on adding citation information

Carl Boettiger cboettig at gmail.com
Wed Dec 11 11:09:46 PST 2013


This looks very promising to me (mostly from the user's perspective,
experts on this list can probably say more).   A few questions that might
be silly:

1) I am a bit confused by the documentation "A citation to articles or
products in which the dataset is used or referenced."

This sounds like it will list all articles/products that have used the
dataset since its original publication, which presumably it won't without
some automated mechanism to update this field.  Besides, I believe
providing a complete and up-to-date "citedBy" list is outside the use-case
of this element anyway.

In my mind this element is used only for the "canonical" citation that
other authors re-using the dataset should cite to provide appropriate
acknowledgement.  For instance, Dryad makes the canonical citation very
clear (example <http://datadryad.org/resource/doi:10.5061/dryad.2k462/1>).
("Canonical" might not be the right word).  I realize that the concept of
having a single paper that should be cited whenever (some part of) the data
is used may not be applicable for all datasets.

2) Could a similar element be made available for /eml/software/?

3) Very minor: why have the element name as "citation" instead of an
element named "dataUsageCitation" of class "citationType"?  (as objects of
other names have the class "citationType" already)  I have no idea which is
preferable.
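
For concreteness, here is a minimal sketch (Python, standard library only) of
the "canonical citation" reading from question 1: a dataset record carrying a
single citation for the paper that re-users should cite.  The child elements
under `citation` (title, creator, pubDate) are an assumption about what a
citationType carries, the function name, titles and author are hypothetical
placeholders, and the fragment omits namespaces and other required dataset
fields, so it is not schema-valid EML:

```python
import xml.etree.ElementTree as ET


def dataset_with_citation(dataset_title, paper_title, author_surname, year):
    """Build a bare-bones <dataset> element with one <citation> child.

    Illustrative only: the element layout under <citation> is assumed,
    and the result is not a complete or valid EML document.
    """
    dataset = ET.Element("dataset", {"id": "dataset.1"})
    ET.SubElement(dataset, "title").text = dataset_title

    # Proposed /eml/dataset/citation element (children below are assumed)
    citation = ET.SubElement(dataset, "citation")
    ET.SubElement(citation, "title").text = paper_title
    creator = ET.SubElement(citation, "creator")
    name = ET.SubElement(creator, "individualName")
    ET.SubElement(name, "surName").text = author_surname
    ET.SubElement(citation, "pubDate").text = str(year)
    return dataset


# Placeholder values, not a real dataset or paper
dataset = dataset_with_citation("Example data", "Example paper", "Doe", 2013)
print(ET.tostring(dataset, encoding="unicode"))
```

With exactly one `citation` child, a consumer can treat it as the citation to
use for acknowledgement; a running "citedBy" list would need a repeatable
element and some update mechanism, which is the concern raised in question 1.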


Thanks for clarifying!

- Carl

On Wed, Dec 11, 2013 at 10:48 AM, Matt Jones <jones at nceas.ucsb.edu> wrote:

> I have made a proposal for a new /eml/dataset/citation field to satisfy
> the needs described in this conversation.  The new field is described in EML
> Ticket #6283 <https://projects.ecoinformatics.org/ecoinfo/issues/6283>,
> and I have checked it into the trunk of SVN (r2344).  Please review and
> comment on whether:
>
> 1) You think this will solve the needs described in this thread, and
> 2) You would like to see any changes in field name, structure, or
> documentation.
>
> Thanks,
>
> Matt
>
>
>
> On Sat, Dec 7, 2013 at 11:18 PM, David Blankman <dblankman1 at gmail.com> wrote:
>
>> I would certainly be in favor of improving compatibility with ISO and
>> addressing other internationalization issues, since I am now working
>> primarily in a European context.
>>
>> David
>>
>>
>>
>> *David Blankman*
>> Chair, ILTER Information Management Committee
>> Director, Information Management, Israel LTER
>>
>> 972-77-442-1951
>> 972-54-685-9345 (mobile)
>> 1-505-349-5680 (Skype)
>> dblankman (Skype)
>>
>>
>> On Fri, Dec 6, 2013 at 5:51 PM, Matt Jones <jones at nceas.ucsb.edu> wrote:
>>
>>> +1 on Chris' and Wade's comments so far.  I think it's a shortcoming of
>>> EML that there is no explicitly labeled field for this, and I've gotten
>>> this question from many EML users.  I also agree with Wade that these
>>> linkages quickly become stale.  Nevertheless, it would be good to have a
>>> way to link to citations that use the data.  Chris' solution to add it to
>>> additionalMetadata is certainly a workaround, but not very satisfying
>>> because there won't be consistency across users as to how it's embedded.  We
>>> could consider adding an optional top level field to eml-dataset to provide
>>> this, possibly something like:
>>>
>>> /eml/dataset/dataUsageCitation which would be of type CitationType
>>>
>>> I added a feature request ticket in Redmine to track this issue:
>>>     https://projects.ecoinformatics.org/ecoinfo/issues/6283
>>>
>>> Thoughts?  Is such a change worth a revision of the EML schemas?  This
>>> issue has been on my radar for a long time, but has never reached the
>>> critical point of triggering a version change, which has widespread impact.
>>> There are several other outstanding schema changes that might raise the
>>> need for a new release, including fixing internationalization issues
>>> <https://projects.ecoinformatics.org/ecoinfo/issues/5728>,
>>> compatibility with ISO issues
>>> <https://projects.ecoinformatics.org/ecoinfo/issues/5998> raised by GBIF,
>>> key/keyref parser checking
>>> <https://projects.ecoinformatics.org/ecoinfo/issues/5731>,
>>> and other items.  So maybe now's the right time?  I think these could be
>>> done with backwards-compatible changes (all EML 2.1.1 documents would be
>>> valid under the new schema with only a namespace change).
>>>
>>> Matt
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Dec 6, 2013 at 5:55 AM, Wade Sheldon <sheldon at uga.edu> wrote:
>>>
>>>> Chris and Carl,
>>>>
>>>> I agree with Chris' interpretation of the specification, and would not
>>>> recommend putting a general literature citation under
>>>> methods/methodStep/citation or under software. Those elements are best used
>>>> to link to protocols and other documents specific to the parent elements,
>>>> and viewers would not think to look for citations referencing the entire
>>>> data set there.
>>>>
>>>> And no, we did not specifically address this issue in the first LTER
>>>> EML Best Practices document in 2004, and I don't recall that issue being
>>>> addressed in version 2 in 2011 either (but Margaret can correct me if I'm
>>>> wrong).
>>>>
>>>> In my opinion, it makes less sense to embed citations to publications
>>>> in data than citations to data in publications, so we do not attempt to
>>>> shoe-horn citations into EML documents using additionalMetadata or other
>>>> approaches. The data used in a publication is fixed at the time of
>>>> publication, whereas a published data set will ideally be used and cited
>>>> many times (hey - I'm an optimist), so literature citations in data sets
>>>> would quickly go stale and require ongoing document maintenance to keep
>>>> current.
>>>>
>>>> I think the best approach to this issue, long-term, is to rely on data
>>>> registries and link-outs associated with journals to provide this type of
>>>> association.
>>>>
>>>> Regards,
>>>>
>>>> Wade Sheldon
>>>> GCE-LTER Information Manager
>>>>
>>>>
>>>>
>>>> On 12/6/2013 8:18 AM, Christopher Jones wrote:
>>>>
>>>>> Hi Carl,
>>>>>
>>>>> I'd say what you're trying to do is pretty fundamental, and should be
>>>>> straightforward.  However, I think you've highlighted an issue that is a
>>>>> consequence of a design transition that happened in the EML schemas quite a
>>>>> while ago.
>>>>>
>>>>> In the first designs of EML, the last of which was EML 2 Beta 6
>>>>> <https://code.ecoinformatics.org/code/eml/tags/RELEASE_EML_2_0_0_BETA_6/>,
>>>>> each of the EML modules was linkable to the others in an RDF-like syntax
>>>>> (see the <triple> tag in eml-resource
>>>>> <https://code.ecoinformatics.org/code/eml/tags/RELEASE_EML_2_0_0_BETA_6/eml-resource.png>).
>>>>> And so what you're describing would entail creating an EML Dataset
>>>>> document, then an EML Citation document, and then linking the two with a
>>>>> relationship ("citation.1.1" "is citation for" "dataset.1.1").
>>>>>
>>>>>
>>>>> This obviously provided plenty of flexibility, but also a degree of
>>>>> complexity, and so the community decided to move toward a hierarchical
>>>>> structure with a top-level eml.xsd schema.  In doing so, some of the module
>>>>> relationships were hard-coded into the EML schema hierarchy, and at the top
>>>>> level, datasets and citations were encoded as top-level choices, and as you
>>>>> point out, are mutually exclusive.
>>>>>
>>>>> As I do a quick scan of the schemas for references to eml-literature
>>>>> CitationType, the module is used in the following other modules:
>>>>>
>>>>> eml.xsd
>>>>> eml-attribute.xsd
>>>>> eml-coverage.xsd
>>>>> eml-methods.xsd
>>>>> eml-physical.xsd
>>>>> eml-project.xsd
>>>>>
>>>>> Given the history above, I think that the intention is to document
>>>>> your paper at the /eml/citation level.  Hard-coded links to citations in
>>>>> eml-attribute, eml-coverage, eml-methods, eml-physical, and eml-project all
>>>>> describe links to citations that are very specific in scope (e.g., in
>>>>> eml-methods, the citation is intended to document a specific procedure
>>>>> used).
>>>>>
>>>>> So, to me, the use of a citation in these sub-modules to describe a
>>>>> dataset-level citation doesn't quite fit.  I'd love to hear what others are
>>>>> doing to satisfy this need.  In particular, did the LTER's EML Best
>>>>> Practices group touch on this subject while writing that document?
>>>>>
>>>>> As one alternative suggestion, I should point out the
>>>>> /eml/additionalMetadata
>>>>> <https://code.ecoinformatics.org/code/eml/tags/RELEASE_EML_2_1_1/eml.png>
>>>>> field.  This structure was added after EML 2 Beta 6 in
>>>>> order to retain some of the flexibility that the triple structure used to
>>>>> provide.  This element has a <metadata> child, which in theory could
>>>>> contain a full /eml/citation document, and the sibling <describes> element
>>>>> could point to the id attribute value of your /eml/dataset element.
>>>>>
>>>>>
>>>>> I'd like to hear what others think about this solution.  Obviously,
>>>>> the <describes> link is a more semantically vague link from the citation
>>>>> documentation to the dataset documentation than a predicate like
>>>>> "isCitationFor", but it at least provides a link.
>>>>>
>>>>> If the community hasn't already come to a consensus on this issue, I
>>>>> think this thread might help in getting there.  Thanks for bringing it up,
>>>>> Carl.
>>>>>
>>>>> Cheers,
>>>>> Chris
>>>>>
>>>>> On Dec 5, 2013, at 4:43 PM, Carl Boettiger wrote:
>>>>>
>>>>>> Sorry if this is a bit elementary.
>>>>>>
>>>>>> Let's say I have an EML file describing data that is published as
>>>>>> part of the supplemental materials of a paper.  It seems reasonable to add
>>>>>> a citation to that paper in the metadata.  Is the best place for such a
>>>>>> citation:
>>>>>>
>>>>>>     eml/dataset/methods/methodStep/citation
>>>>>>
>>>>>> preceded by a
>>>>>>
>>>>>>     eml/dataset/methods/methodStep/description
>>>>>>
>>>>>> explaining that the citation refers to the paper in which the data
>>>>>> was first published, etc?
>>>>>>
>>>>>>
>>>>>> In a related question, if the EML was documenting software instead,
>>>>>> e.g. eml/software, where would such a citation go?  I don't see
>>>>>> `citation` as a child of anything under `eml/software`
>>>>>> <http://knb.ecoinformatics.org/software/eml/eml-2.1.1/eml-software.png>.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Or am I missing the boat entirely here and the natural thing to do is
>>>>>> have a separate EML file, eml/citation, and somehow reference that file?  I
>>>>>> don't really understand the motivation for having an eml file that consists
>>>>>> only of eml/citation, but as I understand, eml/citation and eml/dataset etc
>>>>>> are exclusive, right?
>>>>>>
>>>>>> Thanks for the help!
>>>>>>
>>>>>> - Carl
>>>>>>
>>>>>> --
>>>>>> Carl Boettiger
>>>>>> UC Santa Cruz
>>>>>> http://carlboettiger.info/
>>>>>> _______________________________________________
>>>>>> Eml-dev mailing list
>>>>>> Eml-dev at ecoinformatics.org <mailto:Eml-dev at ecoinformatics.org>
>>>>>> http://lists.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/eml-dev
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> ____________________________________
>>>>
>>>>  Wade M. Sheldon
>>>>  GCE-LTER Information Manager
>>>>  School of Marine Programs
>>>>  University of Georgia
>>>>  Athens, GA 30602-3636
>>>>  Email: sheldon at uga.edu
>>>>  WWW: http://gce-lter.marsci.uga.edu/bios/wsheldon
>>>>


-- 
Carl Boettiger
UC Santa Cruz
http://carlboettiger.info/