[SEEK-Taxon] Mammals of the world as TCS

Paterson, Trevor T.Paterson at napier.ac.uk
Thu Feb 3 08:33:42 PST 2005


Hi all (especially Robert and Aimee at KU)
 
I have managed to convert the Mammal Species of the World (MSW) database into TCS format - starting with an Access file that I got from Dennis Hasch, but getting the hierarchy and synonym relations by scraping their web site.
 
I have put the original Access file from him, zipped, on the CVS at http://cvs.ecoinformatics.org/cvs/cvsweb.cgi/seek/projects/taxon/data/Mammals_MSW/
 
and I have zipped my XML files of the Concepts and put them on CVS as well.
 
Everything is together in one big 30MB XML file, which validates against the current version of the schema, v088b.xsd (I have also broken it down into some smaller files for convenience). Note that although the file is valid, I can't be one hundred percent sure that all the references between concepts, vouchers and publications are OK - they should pretty much be, but I wouldn't be surprised if the occasional publication or voucher got lost in the mangling....
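
Since schema validity does not by itself guarantee that every cross-reference resolves, a rough check for dangling references could look like the sketch below. It is only an illustration under stated assumptions: it assumes that identifiers are carried in attributes named `id` and pointers in attributes named `ref`, and it treats all ids as one pool (so a reference to a publication that happens to match a concept id would pass). The class name `DanglingRefs` is hypothetical, and StAX needs a newer Java than Java 5.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class DanglingRefs {

    /** Returns every "ref" value that has no matching "id" value in the document. */
    public static Set<String> find(InputStream xml) throws XMLStreamException {
        Set<String> ids = new HashSet<>();
        Set<String> refs = new HashSet<>();
        // Stream the document so that a 30MB file never has to be held as a DOM tree.
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                for (int i = 0; i < reader.getAttributeCount(); i++) {
                    String name = reader.getAttributeLocalName(i);
                    if ("id".equals(name)) {          // assumed identifier attribute
                        ids.add(reader.getAttributeValue(i));
                    } else if ("ref".equals(name)) {  // assumed pointer attribute
                        refs.add(reader.getAttributeValue(i));
                    }
                }
            }
        }
        reader.close();
        // Whatever is referenced but never declared is dangling.
        Set<String> dangling = new TreeSet<>(refs);
        dangling.removeAll(ids);
        return dangling;
    }

    public static void main(String[] args) throws IOException, XMLStreamException {
        try (InputStream in = new FileInputStream(args[0])) {
            for (String ref : find(in)) {
                System.out.println("unresolved ref: " + ref);
            }
        }
    }
}
```

If the real attribute names differ, only the two string constants need changing.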
 
Note that version v088b.xsd has a new element that GBIF/Donald was very keen on: <Provider Link> - I have used it to hold the URL of each record on the MSW website.
 
In summary there are:
 
Vouchers (4629), i.e. for specimens - the IDs in these are virtually useless!

Publications (6052), i.e. where names were originally published

Primary (revision) concepts (6053)

Original concepts: species (7874) and monomials (1792), where this was available - i.e. name plus authors plus citation

Concepts for common names (5162)

Concepts for synonyms without authors: species (12427), monomials (1158)

Concepts for synonyms with authors: monomials without parentheses (377), species with parentheses (843), species without parentheses (5574)
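
Totals like these can be roughly cross-checked by tallying elements by name while streaming through the file. The sketch below makes no assumption about TCS element names; it simply counts every element's local name, so it gives coarse per-element totals rather than the finer categories listed above (for example, it cannot tell synonyms with authors from those without). The class name `ElementTally` is hypothetical.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ElementTally {

    /** Counts how often each element local name occurs in the document. */
    public static Map<String, Integer> count(InputStream xml) throws XMLStreamException {
        Map<String, Integer> counts = new TreeMap<>();
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                // Add one to the running total for this element name.
                counts.merge(reader.getLocalName(), 1, Integer::sum);
            }
        }
        reader.close();
        return counts;
    }

    public static void main(String[] args) throws IOException, XMLStreamException {
        try (InputStream in = new FileInputStream(args[0])) {
            for (Map.Entry<String, Integer> e : count(in).entrySet()) {
                System.out.println(e.getKey() + " (" + e.getValue() + ")");
            }
        }
    }
}
```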
 
the relationships used are 'is child of', 'has replaced synonym', 'is replaced synonym for' and 'has vernacular'
 
so there should be enough there to test the system - and hopefully these will be useful for looking at the real test cases
 
As I mentioned on the phone, the updated MSW is set to be released in a few months. DeeAnn Reeder kindly gave some data for pocket gophers and carnivores in spreadsheet format, which I have also put on CVS. Bats may be available soon if that is one of the groups of interest, so you could contact her again. I have put copies of my emails with her and Dennis on the CVS too.
 
Lastly, I have put on the same page of CVS a tiny command-line Java 5 program that validates an XML file against a schema - it's useful for these very large XML files, and sometimes gives useful error messages ;-)
 
C:\>java -jar SchemaTest.jar

TO SPECIFY A SCHEMA:  USAGE: SchemaTest Path/file.XSD Path/file.XML
 **OR**
TO USE INTERNAL NAMESPACE SPECIFICATIONS:  USAGE: SchemaTest Path/file.XML
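
For anyone without the jar to hand, the two-argument form can be approximated with the standard `javax.xml.validation` API (available since Java 5). This is a minimal sketch, not the source of SchemaTest.jar: the class name `SchemaCheck` and its output format are hypothetical, and the one-argument form that relies on schema locations declared inside the XML file is not covered.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class SchemaCheck {

    /** Returns null if the XML file is valid, otherwise a message describing the first error. */
    public static String validate(File xsd, File xml) throws IOException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            Schema schema = factory.newSchema(xsd);
            Validator validator = schema.newValidator();
            // The XML is read as a stream, so large files are not loaded into memory as a tree.
            validator.validate(new StreamSource(xml));
            return null;
        } catch (SAXParseException e) {
            // Parse and validation errors carry the position of the problem.
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("USAGE: SchemaCheck Path/file.XSD Path/file.XML");
            return;
        }
        String error = validate(new File(args[0]), new File(args[1]));
        System.out.println(error == null ? "valid" : "INVALID: " + error);
    }
}
```

With no error handler installed, the validator stops at the first error, which is usually enough to locate a problem in a very large file.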
 
 
I am finishing up here at Napier next week - and still working on various papers with Jessie, Bob and Nico etc
 
but I guess I won't see you guys again - so all the best......
 
Trevor
 

Trevor Paterson PhD 
t.paterson at napier.ac.uk 

School of Computing 
Napier University
Merchiston Campus
EDINBURGH
EH10 5DT
Scotland UK

tel:          +44 (0)131 455-2752 

www.dcs.napier.ac.uk/~cs175
www.prometheusdb.org

 