[SEEK-Taxon] [Fwd: Request for review: taXMLit]
Dave Vieglais
vieglais at KU.EDU
Mon Jul 12 20:13:03 PDT 2004
Thought you all might be interested in this. It's probably important to
ensure that the taxonomic xml development streams (this one applies
mainly to literature citations) remain in touch with each other,
although my initial impression of this work coming out of the Smithsonian
is that they should take a close look at other development activities,
such as those under way in the SEEK project.
regards,
Dave V.
-------- Original Message --------
Subject: Request for review: taXMLit
Date: Mon, 12 Jul 2004 15:27:40 -0400
From: Anna Weitzman <Weitzman.Anna at NMNH.SI.EDU>
To: <baillarg at agr.gc.ca>, <ghw at anbg.gov.au>, <jrc at anbg.gov.au>,
<G.Hagedorn at BBA.DE>, <w.berendsohn at bgbm.org>,
<deepreef at BISHOPMUSEUM.ORG>, <sblum at CalAcademy.Org>,
<vcanhos at cria.org.br>, <ram at cs.umb.edu>, <steves at ento.csiro.au>,
<lspeers at gbif.org>, <GBIF-DADI at ig.circa.gbif.net>,
<GBIF-ECAT at ig.circa.gbif.net>, <beach at ku.edu>,
<cooperj at LandcareResearch.co.nz>, <rlg-naturalhistory at lists2.rlg.org>,
<dremsen at MBL.EDU>, <Andrew.Brown at mel.kesoftware.com>,
<jdoolan at mel.kesoftware.com>, <Chris.Freeland at mobot.org>,
<Chuck.Miller at mobot.org>, <J.Kennedy at napier.ac.uk>, "Andrew
Brown" <Brown.Andrew at NMNH.SI.EDU>, <gguala at nsf.gov>,
<joe at nt.ars-grin.gov>, <m.pullen at rbge.org.uk>, <m.watson at rbge.org.uk>,
<e.lughadha at rbgkew.org.uk>, <S.Hinchcliffe at rbgkew.org.uk>,
<pheidorn at uiuc.edu>, <vieglais at ukans.edu>, <peberry at wisc.edu>
CC: <jfeld at dclab.com>, "Tom Garnett" <GarnettT at MNHGWIA.si.edu>,
"Martin Kalfatovic" <KalfatovicM at MNHGWIA.si.edu>, <c.lyal at nhm.ac.uk>
Dear all,
We are sending you this email to invite your comments and input on some
work that we have been doing on an XML schema for taxonomic literature.
Some of you already know much of what follows, and our apologies for
repeating ourselves to those of you who do (and to any of you who get
this more than once via a GBIF list).
The XML schema development is part of a multi-phase project aimed at
creating a model for facilitating access to biodiversity data on a
global scale. The flora and fauna of the Mesoamerican region are being
used as a basis, starting with an important and out-of-print scientific
work, the Biologia Centrali-Americana (BCA). The project will create
resources and knowledge tools in electronic form for biodiversity
studies centered on Mexico and Central America. The methodologies
developed in this project will be applicable to other digitization efforts.
The first phase of the project, the digital edition of the BCA, is now
online (http://www.sil.si.edu/digitalcollections/bca). This website
contains jpeg images of nearly all of the pages of the 58 biological
volumes of the BCA. It will soon also have a complete set of jpeg and
pdf files for download and printing as well as the ability to zoom in on
the images in great detail. The website also has links to a variety of
documents giving background, project status, etc.
The next phase is the creation of a searchable resource based on the
text of the BCA. The text will be rekeyed and coded in XML. This will
not only make the text searchable but, more importantly, will structure
it in such a way that it is interoperable with other datasets, such as
name authority files and specimen databases that are web-enabled using
ABCD. The mark-up will also make it possible to
emplace direct links to ancillary information such as glossaries,
gazetteers, bibliographies and other resources to enable users to work
with the BCA more effectively. In particular, links will be made
from the BCA text to databases of BCA specimens and species held in
different collections, including those of the Smithsonian's National Museum
of Natural History, Harvard University, The Natural History Museum
(London), The Missouri Botanical Garden, and the Royal Botanic Gardens,
Kew.
The BCA, while certainly one of the most comprehensive and
broad-ranging, is only one of a large number of similar works that would
benefit the research community through being more accessible. By
providing a model of how a work like this can be successfully translated
to a digital form, the stage will be set for others to engage in similar
interacting projects. The digitized BCA will provide an experimental
testbed for determining the best ways of linking different, yet
complementary, sets of data. An important part of the testbed will
consist of defining the practices and standards needed to ensure
effective crosswalks between, and links among, relevant biological data
systems, including appropriate specimen images, especially of types.
In order to complete the next steps, an XML-based standard for coding
taxonomic literature was needed. The first draft is now complete and we
would be very grateful if you could help review it for us. It includes
the components common to taxonomic literature - the names (including
synonyms etc), citations, specimen lists, keys, hierarchical statements
etc that are found in pretty well any paper. Basically, it covers all
of the components of taxonomic publications and the taxon treatments
contained within them other than the actual characters, which are dealt
with by other projects. Although taxonomic literature is highly
structured, there can be an amazing amount of variability; we hope we
have managed to cover this (and we are counting on you and others to
point out where we have missed things). The schema has been written
with a focus on both botanical and zoological taxonomic literature, and
should also accept fungal and paleontological publications, but this has
not been tested. It does not take into account the kinds of data needed
for viral or bacterial publications, and if anyone is interested in
pursuing this in the future we are very open to collaboration.
We attach three documents: the schema itself (taXMLit-v1-3.xsd), a pdf
file (taXMLitv1-3Intro.pdf) that explains how and why we have structured
the schema as we have, and an xml file that contains some of the data
from one volume of the BCA marked up according to the schema as a means
of testing its viability (coleop-v4p3-assign-taxmlit-v1-3.xml).
We hope that some of you may find some time to review the schema and
assist us with improvements to it. We would be most grateful for any
comments from you.
Many thanks,
Anna & Chris
Anna L. Weitzman
Informatics Office & Department of Botany
National Museum of Natural History
Smithsonian Institution
Washington, DC
Christopher H.C. Lyal
Department of Entomology
The Natural History Museum
London
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: taXMLit-v1-3.xsd
Url: http://mercury.nceas.ucsb.edu/ecoinformatics/pipermail/seek-taxon/attachments/20040712/00b8c905/taXMLit-v1-3.bat
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: coleop-v4p3-assign-taxmlit-v1-3.xml
Url: http://mercury.nceas.ucsb.edu/ecoinformatics/pipermail/seek-taxon/attachments/20040712/00b8c905/coleop-v4p3-assign-taxmlit-v1-3.bat
-------------- next part --------------
A non-text attachment was scrubbed...
Name: taXMLitv1-3Intro.pdf
Type: application/pdf
Size: 1135313 bytes
Desc: not available
Url : http://mercury.nceas.ucsb.edu/ecoinformatics/pipermail/seek-taxon/attachments/20040712/00b8c905/taXMLitv1-3Intro.pdf