[SEEK-Taxon] [Fwd: Request for review: taXMLit]

Mon Jul 12 20:13:03 PDT 2004

Thought you all might be interested in this.  It's probably important to 
ensure that the taxonomic XML development streams (this one applies 
mainly to literature citations) remain in touch with each other, 
although initial impressions of this work coming out of the Smithsonian 
are that its developers should take a close look at other development 
activities, such as those under way in the SEEK project.

regards,
   Dave V.

-------- Original Message --------
Subject: Request for review:  taXMLit
Date: Mon, 12 Jul 2004 15:27:40 -0400
From: Anna Weitzman <Weitzman.Anna at NMNH.SI.EDU>
To: <baillarg at agr.gc.ca>, <ghw at anbg.gov.au>, <jrc at anbg.gov.au>, 
<G.Hagedorn at BBA.DE>, <w.berendsohn at bgbm.org>, 
<deepreef at BISHOPMUSEUM.ORG>, <sblum at CalAcademy.Org>, 
<vcanhos at cria.org.br>, <ram at cs.umb.edu>, <steves at ento.csiro.au>, 
<lspeers at gbif.org>, <GBIF-DADI at ig.circa.gbif.net>, 
<GBIF-ECAT at ig.circa.gbif.net>, <beach at ku.edu>, 
<cooperj at LandcareResearch.co.nz>, <rlg-naturalhistory at lists2.rlg.org>, 
<dremsen at MBL.EDU>, <Andrew.Brown at mel.kesoftware.com>, 
<jdoolan at mel.kesoftware.com>, <Chris.Freeland at mobot.org>, 
<Chuck.Miller at mobot.org>, <J.Kennedy at napier.ac.uk>, 
"Andrew Brown" <Brown.Andrew at NMNH.SI.EDU>, <gguala at nsf.gov>, 
<joe at nt.ars-grin.gov>, <m.pullen at rbge.org.uk>, <m.watson at rbge.org.uk>, 
<e.lughadha at rbgkew.org.uk>, <S.Hinchcliffe at rbgkew.org.uk>, 
<pheidorn at uiuc.edu>, <vieglais at ukans.edu>, <peberry at wisc.edu>
CC: <jfeld at dclab.com>, "Tom Garnett" <GarnettT at MNHGWIA.si.edu>, 
"Martin Kalfatovic" <KalfatovicM at MNHGWIA.si.edu>, <c.lyal at nhm.ac.uk>

Dear all,

We are sending you this email to invite your comments and input on some 
work that we have been doing on an XML schema for taxonomic literature. 
Some of you already know much of what follows, and our apologies for 
repeating ourselves to those of you who do (and to any of you who get 
this more than once via a GBIF list).

The XML schema development is part of a multi-phase project aimed at 
creating a model for facilitating access to biodiversity data on a 
global scale.  The flora and fauna of the Mesoamerican region are being 
used as a basis, starting with an important and out-of-print scientific 
work, the Biologia Centrali-Americana (BCA).  The project will create 
resources and knowledge tools in electronic form for biodiversity 
studies centered on Mexico and Central America.  The methodologies 
developed in this project will be applicable to other digitization efforts.

The first phase of the project, the digital edition of the BCA, is now 
online (http://www.sil.si.edu/digitalcollections/bca).  This website 
contains jpeg images of nearly all of the pages of the 58 biological 
volumes of the BCA.  It will soon also have a complete set of jpeg and 
pdf files for download and printing as well as the ability to zoom in on 
the images in great detail.  The website also has links to a variety of 
documents giving background, project status, etc.

The next phase is the creation of a searchable resource based on the 
text of the BCA.  The text will be rekeyed and coded in XML.  This will 
not only make the text searchable but, more importantly, will structure 
it so that it is interoperable with other datasets, such as specimen 
databases that are web-enabled using ABCD, and name authority files. 
The mark-up will also make it possible to insert direct links to 
ancillary information such as glossaries, gazetteers, bibliographies and 
other resources, enabling users to work with the BCA more effectively. 
In particular, links will be made from the BCA text to databases of BCA 
specimens and species held in different collections, including those of 
the Smithsonian's National Museum of Natural History, Harvard University, 
The Natural History Museum (London), the Missouri Botanical Garden, and 
the Royal Botanic Gardens, Kew.

The BCA, while certainly one of the most comprehensive and 
broad-ranging, is only one of a large number of similar works that would 
benefit the research community by being more accessible.  By 
providing a model of how a work like this can be successfully translated 
into digital form, the project will set the stage for others to engage in 
similar, interacting projects.  The digitized BCA will provide an experimental 
testbed for determining the best ways of linking different, yet 
complementary, sets of data.  An important part of the testbed will 
consist of defining the practices and standards needed to ensure 
effective crosswalks between, and links among, relevant biological data 
systems, including appropriate specimen images, especially of types.

In order to complete the next steps, an XML-based standard for coding 
taxonomic literature was needed.  The first draft is now complete and we 
would be very grateful if you could help review it for us.  It includes 
the components common to taxonomic literature: the names (including 
synonyms, etc.), citations, specimen lists, keys, hierarchical statements, 
etc. that are found in pretty well any paper.  Basically, it covers all 
of the components of taxonomic publications and the taxon treatments 
contained within them other than the actual characters, which are dealt 
with by other projects.  Although taxonomic literature is highly 
structured, there can be an amazing amount of variability; we hope we 
have managed to cover this (and we are counting on you and others to 
point out where we have missed things).  The schema has been written 
with a focus on both botanical and zoological taxonomic literature, and 
should also accept fungal and paleontological publications, but this has 
not been tested.  It does not take into account the kinds of data needed 
for viral or bacterial publications, and if anyone is interested in 
pursuing this in the future we are very open to collaboration.

We attach three documents: the schema itself (taXMLit-v1-3.xsd); a PDF 
file (taXMLitv1-3Intro.pdf) that explains how and why we have structured 
the schema as we have; and an XML file (coleop-v4p3-assign-taxmlit-v1-3.xml) 
that contains some of the data from one volume of the BCA, marked up 
according to the schema as a means of testing its viability.

We hope that some of you may find some time to review the schema and 
assist us with improvements to it.  We would be most grateful for any 
comments from you.

Many thanks,

Anna & Chris

Anna L. Weitzman
Informatics Office & Department of Botany
National Museum of Natural History
Smithsonian Institution
Washington, DC

Christopher H.C. Lyal
Department of Entomology
The Natural History Museum
London

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: taXMLit-v1-3.xsd
Url: http://mercury.nceas.ucsb.edu/ecoinformatics/pipermail/seek-taxon/attachments/20040712/00b8c905/taXMLit-v1-3.bat
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: coleop-v4p3-assign-taxmlit-v1-3.xml
Url: http://mercury.nceas.ucsb.edu/ecoinformatics/pipermail/seek-taxon/attachments/20040712/00b8c905/coleop-v4p3-assign-taxmlit-v1-3.bat
-------------- next part --------------
A non-text attachment was scrubbed...
Name: taXMLitv1-3Intro.pdf
Type: application/pdf
Size: 1135313 bytes
Desc: not available
Url : http://mercury.nceas.ucsb.edu/ecoinformatics/pipermail/seek-taxon/attachments/20040712/00b8c905/taXMLitv1-3Intro.pdf