[ecoinfo] Citation norms for datasets

Reza Chalabianlou reza.chalabianlou at gmail.com
Sat Jul 16 04:45:21 PDT 2011


Hi Kyle,

Other colleagues and friends have already provided a rich set of answers to
your question. My response may seem a bit naive at first glance, and my
questions may look silly, but perhaps they still deserve some thought.

Frankly speaking, I am confused by the question you posed in your initial
e-mail.

As far as I understand, you intend to ask your students to include in their
CVs the results of the field station work they have completed. I can read
this in two ways. The first is that the students will archive the results of
their work as published or unpublished (let's say formal or informal) papers.
The second is that the students will insert their results, fully or partially
(e.g. only the metadata), directly into their CVs. These are two different
notions, and we may expect different outputs from each. If these are the two
processes you have in mind for your CV citation work, then one may argue that
neither is surprisingly new: both have already been practiced by many
scientific and even business communities for data archival and sharing.


The second point that confuses me is whether your students will put their
datasets in their CVs so that other members of the scientific community can
access and/or cite the datasets archived there. Or, put another way, are the
students, by including their datasets in their CVs, implicitly asking fellow
researchers to cite those datasets in their articles? And if so, HOW?

I am confused by these issues because your question does not clearly state
what exactly the problem is with this type of dataset archiving. Dataset
archiving and referencing have been accepted by the majority of research
communities and have been in use for at least a decade or so. That said, I
could not find a unified and universally accepted guideline, standard and/or
white paper that governs this puzzle. Instead, each emerging dataset archival
and sharing community seems to have adopted its own guideline for archiving
and storing datasets and information. This is the unfortunate situation that
the new database management community is trying to resolve, although one may
consider that the process has already begun and is progressing rapidly.

In my understanding, the recent advances and practices pursued over at least
the past half decade have aimed at building a foundational framework, using
standard protocols and Grid-based technology, that creates a context for
better archival, sharing, retrieval, access and use of scattered datasets by
different research groups, and that improves the interoperability of this
kind of data archival, sharing and usage. The LTER, Ecoinformatics, NCEAS,
ACEAS, SWEET ontology, BioPAX and similar projects pursue the similar idea of
making implicit knowledge, and poorly designed and shared data and
information, more tractable, explicit, formally shaped and accessible for a
diversity of uses through an interoperable system of data and knowledge
sharing.

I have collected a set of URLs for the diverse communities that have been,
and still are, working to bring distributed data and metadata together so
that they are at least easily accessible and usable for the global research
and scientific community.

Examples are:
http://www.ecoinformatics.org/
http://www.nceas.ucsb.edu/
http://www.opencyc.org/
http://www.geongrid.org/
http://ontology.buffalo.edu/
http://www-ksl.stanford.edu/sns.shtml
http://www.ontoknowledge.org/oil/
http://www.ontologyportal.org/
http://www.environmentontology.org/
http://www.ifomis.org/bfo
http://www.onto-med.de/ontologies/gfo/
http://www.openclinical.org/ontologies.html
http://www.geneontology.org/
http://www.obofoundry.org/
http://www.plantontology.org/
http://sweet.jpl.nasa.gov/
And many others that are growing rapidly.

Therefore, instead of asking whether dataset archival has been or currently
is a NORM, we would do better to ask "HOW TO DO THE JOB PROPERLY?" That is,
the job has already started but lacks a unified protocol. The question then
becomes what standards, guidelines or protocols should be followed so that
all dataset archiving and retrieval processes are harmonized, explicitly
understandable and reusable by users from different corners of the world
(i.e. plugged into the grid).

Recent advances in XML and semantic web technology have made it much easier
and more user-friendly for professionals from different scientific and
empirical domains to apply the concepts and tools that have been made
available to the target community.

As such, the issue that remains to be addressed and discussed, in order to
gain a deeper understanding of the consequences of these types of dataset
archival, may rather be (in my view) to reach an overall consensus on a set
of guiding rules and protocols that facilitate and harmonize the process and
outcomes of any attempt at data archival, retrieval, processing, sharing and
improved interoperability.

I have collected a number of articles from the net (all were openly
accessible!), which I attach here for your attention and possible use.
I do hope that what I have tried to explain above will be of help.

Good luck,
Reza Chalabianlou






On Thu, Jul 14, 2011 at 11:49 PM, Carl Boettiger <cboettig at gmail.com> wrote:

> Kyle,
>
> Thanks for the reply.  I would be interested to know what option you settle
> on to get a persistent identifier when you get a chance.  I'm sure others on
> the list could offer some input on the strengths and weaknesses of some
> common ones as well.
>
> Cheers,
> Carl
>
>
> On Thu, Jul 14, 2011 at 12:15 PM, Kyle Kwaiser <kkwaiser at umich.edu> wrote:
>
>> Hi Carl,
>>
>> The repository we use is one I have built on Drupal for our field station:
>>
>> http://umbs.lsa.umich.edu/
>>
>> Thanks to work done by the LTER, I am able to provide metadata in an EML
>> compliant format and, at some point in the future, I will leverage this to
>> facilitate data contribution to a third party.  I am aware of several such
>> options but have not begun the process of evaluating them.
>>
>> This means that I cannot offer a formal persistent identifier which is
>> hardly ideal and one of the reasons I hesitate to tell students to place
>> citations on their CV's.
>>
>> Best,
>>
>> Kyle
>>
>>
>>
>>
>> Quoting Carl Boettiger <cboettig at gmail.com>:
>>
>> Kyle,
>>>
>>> Are your students archiving these in repositories that will issue a DOI for
>>> the citation information?  (Merritt, Dryad if they correspond to already
>>> published work, etc)?
>>>
>>>
>>> Here's a few more refs that have argued for this, some quite extensively.
>>>
>>> This whole piece is essentially an argument for data citation:
>>> Mons, B., Haagen, H. van, Chichester, C., Hoen, P.-B. 't, Dunnen, J. T.
>>> den, Ommen, G. van, et al. (2011). The value of data. Nature genetics,
>>> 43(4), 281-3. Nature Publishing Group. doi: 10.1038/ng0411-281.
>>>
>>>
>>> Birney, E., Hudson, T. J., Green, E. D., Gunter, C., Eddy, S., Rogers, J.,
>>> et al. (2009). Prepublication data sharing. Nature, 461(7261), 168-70.
>>> doi: 10.1038/461168a.
>>> "another would be to track the usage and citation of data sets using
>>> electronic systems similar to those used for traditional publications"
>>> ..
>>> who cite this in support:
>>> Sharing Data from Large-scale Biological Research Projects: A System of
>>> Tripartite Responsibility (Wellcome Trust, 2003); available at
>>> www.wellcome.ac.uk/stellent/groups/corporatesite/@policy_communications/documents/web_document/wtd003207.pdf
>>>
>>>
>>> Tenopir, C., Allard, S., Douglass, K., Aydinoglu, A. U., Wu, L., Read, E.,
>>> et al. (2011). Data Sharing by Scientists: Practices and Perceptions. (C.
>>> Neylon, Ed.) PLoS ONE, 6(6), e21101. doi: 10.1371/journal.pone.0021101.
>>>     "Providing a secure but flexible cyberinfrastructure while promulgating
>>> best practices such as data citation and metadata reuse, will help build
>>> confidence in data sharing"
>>>
>>>
>>> Rod discusses data citation quite a bit here:
>>> Page, R. D. M. (2010). Enhanced display of scientific articles using
>>> extended metadata. Web Semantics: Science, Services and Agents on the
>>> World Wide Web, 8(2-3), 190-195. doi: 10.1016/j.websem.2010.03.004.
>>>
>>>
>>> Constable, H., Guralnick, R., Wieczorek, J., Spencer, C., & Peterson, A. T.
>>> (2010). VertNet: a new model for biodiversity data sharing. PLoS biology,
>>> 8(2), e1000309. doi: 10.1371/journal.pbio.1000309.
>>> "By ensuring that data remain curated at the source, and by showing the
>>> importance of data sharing to promote data citation and usage, we have
>>> grown past our original technology implementation and are ready to move
>>> into a long-term production environment that departs from the original
>>> model."
>>>
>>>
>>> These three make mention of data citation, mostly in reference to
>>> increased citation rates of papers.
>>> Moore, A. J., McPeek, M. A., Rausher, M. D., Rieseberg, L., & Whitlock,
>>> M. C. (2010). The need for archiving data in evolutionary biology. Journal
>>> of evolutionary biology, 23(4), 659-60.
>>> doi: 10.1111/j.1420-9101.2010.01937.x.
>>>
>>> Whitlock, M. C., McPeek, M. A., Rausher, M. D., Rieseberg, L., & Moore, A.
>>> J. (2010). Data archiving. The American naturalist, 175(2), 145-6.
>>> doi: 10.1086/650340.
>>>
>>> Whitlock, M. C. (2010). Data archiving in ecology and evolution: best
>>> practices. Trends in Ecology & Evolution, 1-5. Elsevier Ltd. doi:
>>> 10.1016/j.tree.2010.11.006.
>>>
>>> Mark Parson's talk: http://ands.org.au/guides/data-citation-awareness.html
>>>
>>> -Carl
>>>
>>>
>>>
>>>
>>> On Thu, Jul 14, 2011 at 8:22 AM, Cook, Robert B. <cookrb at ornl.gov> wrote:
>>>
>>> Kyle,
>>>>
>>>> At the ORNL DAAC we have been providing recommended citations for our
>>>> published data sets since the early 2000s.  These citations are appearing
>>>> in papers that use the data publication.  Citing data products gives the
>>>> authors credit for the intellectual effort in generating the data set.
>>>>
>>>> Please refer to the attached note for additional information.
>>>>
>>>> When we publish these data products, I send a note to each author
>>>> congratulating them on their publication and encouraging them to place
>>>> the data publication citation on their CV.
>>>>
>>>> Many journals will allow data product citations to appear in the
>>>> references section of papers.
>>>>
>>>> We are working with the Web of Knowledge to place these data pubs into
>>>> their indexing service, so that authors can view both their publications
>>>> and their data products.  Plus they can readily see who has used their
>>>> data in subsequent publications.
>>>>
>>>> Good luck!
>>>> Bob
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: ecoinfo-bounces at ecoinformatics.org
>>>> [mailto:ecoinfo-bounces at ecoinformatics.org] On Behalf Of Kyle Kwaiser
>>>> Sent: Thursday, July 14, 2011 10:16 AM
>>>> To: ecoinfo at ecoinformatics.org
>>>> Subject: [ecoinfo] Citation norms for datasets
>>>>
>>>> Hello Colleagues,
>>>>
>>>> I am working with graduate students this summer to archive their work
>>>> at our field station.  I want to tell them to cite their datasets on
>>>> their CV's but I know this is not yet the norm.
>>>>
>>>> Any general thoughts on how close we are to including datasets on
>>>> CV's?  Can you suggest recent papers that argue (decisively) for this
>>>> practice?  Here are two relevant but slightly tangential examples:
>>>>
>>>> Reichman, O. J., M. B. Jones, and M. P. Schildhauer. 2011. "Challenges
>>>> and Opportunities of Open Data in Ecology." Science 331 (6018)
>>>> (February): 703-705. doi:10.1126/science.1197962.
>>>>
>>>> Vision, Todd J. 2010. "Open Data and the Social Contract of Scientific
>>>> Publishing." BioScience 60 (5) (May): 330-331.
>>>> doi:10.1525/bio.2010.60.5.2.
>>>>
>>>> Best,
>>>>
>>>> Kyle
>>>>
>>>>
>>>> -----------------------------------------
>>>> Kyle Kwaiser, Information Manager
>>>> University of Michigan Biological Station
>>>> 9133 Biological Rd.
>>>> Pellston, Michigan 49769-9149 USA
>>>> Ph: 231-539-8789
>>>> _______________________________________________
>>>> Ecoinfo mailing list
>>>> Ecoinfo at ecoinformatics.org
>>>> http://lists.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/ecoinfo
>>>>
>>>>
>>>
>>> --
>>> Carl Boettiger
>>> UC Davis
>>> http://www.carlboettiger.info/
>>>
>>>
>>
>>
>
>
>


-- 
To preserve the Earth's life-supporting functions, please don't print this
e-mail and its attachments unless you really need to. Thank you.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: developing_using_standards.pdf
Type: application/pdf
Size: 71359 bytes
Desc: not available
URL: <http://lists.nceas.ucsb.edu/ecoinformatics/pipermail/ecoinfo/attachments/20110716/996e70aa/attachment-0004.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Reference Model for an Open Archival Information System (OAIS).PDF
Type: application/pdf
Size: 654750 bytes
Desc: not available
URL: <http://lists.nceas.ucsb.edu/ecoinformatics/pipermail/ecoinfo/attachments/20110716/996e70aa/attachment-0005.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: A Proposed Standard for the Scholarly Citation of Quantitative Data.pdf
Type: application/pdf
Size: 80872 bytes
Desc: not available
URL: <http://lists.nceas.ucsb.edu/ecoinformatics/pipermail/ecoinfo/attachments/20110716/996e70aa/attachment-0006.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PUBLISHING-STANDARDS-DATA-2009.PDF
Type: application/pdf
Size: 2197422 bytes
Desc: not available
URL: <http://lists.nceas.ucsb.edu/ecoinformatics/pipermail/ecoinfo/attachments/20110716/996e70aa/attachment-0007.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Report_ISO_final.doc
Type: application/msword
Size: 9100800 bytes
Desc: not available
URL: <http://lists.nceas.ucsb.edu/ecoinformatics/pipermail/ecoinfo/attachments/20110716/996e70aa/attachment-0001.doc>
