[tcs-lc] Misspelling db response

Roger Hyam roger at hyam.net
Tue May 3 03:30:25 PDT 2005


Charles and all,

I think you have very interesting points here.

The most important one for me, though, is your last.

Yes, these things should inform the debate, but they are indeed separate 
from the transport schema.

The point I was trying to make in starting this thread is that what a 
NameObject represents (the one true rendition of the name, a possible 
misspelling of a name, an intentional misspelling of a name, or 
whatever) is not a matter for the schema but for the databases that are 
publishing the data. (I don't believe it is a problem anyhow, but that is 
another matter...)

If we can establish that the schema can support whatever outcome the 
debate is likely to come to, we can stop having the debate HERE and move 
on to the nuts and bolts of finishing the transport schema. We then have 
a tool to use to express whatever we decide on. These are my thoughts 
anyhow.
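
As a rough illustration of why this can sit outside the schema, the "query expansion tool" Charles describes below could be as simple as a lookup over groups of name strings that someone has judged equivalent. The sketch below is an assumption for discussion only: the class `NameEquivalences` and its methods are hypothetical and are not part of TCS, the NBN Gateway, or any existing name server. The two name strings are taken from the NBN search example quoted below.

```python
class NameEquivalences:
    """Hypothetical in-memory store of name strings mapped as equivalent."""

    def __init__(self):
        self._group_of = {}   # normalized name string -> group index
        self._groups = []     # group index -> set of name strings as published

    @staticmethod
    def _normalize(name):
        # Case-insensitive, whitespace-insensitive matching only.
        # Real matching (authorities, misspellings) needs expert mapping.
        return " ".join(name.lower().split())

    def add_group(self, names):
        """Record that all the given name strings refer to the same name."""
        index = len(self._groups)
        self._groups.append(set(names))
        for name in names:
            self._group_of[self._normalize(name)] = index

    def expand(self, query):
        """Return every mapped variant of the query, or just the query itself."""
        index = self._group_of.get(self._normalize(query))
        if index is None:
            return [query]
        return sorted(self._groups[index])


equivalences = NameEquivalences()
equivalences.add_group(["Picea abies", "Abies abies"])
print(equivalences.expand("abies abies"))
print(equivalences.expand("Myotis"))
```

Whoever maintains the mapping (provider, collator, name server, or users) only needs the schema to carry the name strings; the equivalences themselves stay with the databases.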

All the best,

Roger

Charles Hussey wrote:

>This thread, for me, has homed in on a very real issue that will affect the
>usefulness of any project that attempts to integrate different databases
>(either through a portal or a datawarehouse approach).
>
>The key is to provide a "query expansion tool" - otherwise relevant data is
>likely not to be picked up through a simple query.
>
>The problem in serving up data is in different renditions of a name +
>authority string (let alone distinguishing different taxonomic concepts) and
>the effort that needs to be put into mapping equivalences. Some mapping will
>be obvious and trivial; others will need expert scrutiny. When one moves
>from published records to online access to collections and observations
>records, this problem is going to increase.
>
>Who should put in this effort? -
>
>1) Data provider (list compiler)
>2) Data collator (manager of database incorporating several lists, e.g.
>Fauna Europaea, ERMS)
>3) Third party body (a nameserver e.g. GBIF ECAT)
>4) Individual user (up to user to research possible alternatives that they
>may need to use as search terms)
>5) Users collectively (through online editing tool - e.g. IPNI project)
>
>In the UK, the National Biodiversity Network has already had to tackle this
>problem in its Gateway project which gives access to over 18 million species
>observation records. I run the Species Dictionary project which manages
>nomenclature for the NBN and we have started to map equivalences for
>priority groups. Having all the observation records in one datawarehouse
>(the Gateway) and all the name checklists in another warehouse (the
>Dictionary) helps in the capture of all actually occurring name + author
>strings and in the mapping of equivalences.
>
>Here is how a search result for "Picea abies" is currently presented:
>http://www.searchnbn.net/speciesInfo/taxonomy.jsp?searchTerm=Abies%20abies&spKey=NHMSYS0000461247
>
>another example:
>http://www.searchnbn.net/speciesInfo/taxonomy.jsp?searchTerm=Myotis&spKey=NHMSYS0000528026
>
>There is a whole set of dangers associated with aggregating data that need
>to be spelled out to users, and ours is a very simplistic approach
>(pragmatism over purism).
>
>My concern is that unless all name variants actually present in data sources
>are captured and mapped, the user is going to only get a partial return and
>will not even know that they are getting a partial return.
>
>This is indeed a separate issue from constructing a data exchange schema but
>should influence this debate.
>
>Cheers,
>
>Charles Hussey,
>
>Science Data Co-ordinator,
>Data and Digital Systems Team,
>Library and Information Services,
>Natural History Museum,
>Cromwell Road,
>London SW7 5BD
>United Kingdom
>
>Tel. +44 (0)207 942 5213
>Fax. +44 (0)207 942 5559
>e-mail c.hussey at nhm.ac.uk
>Species Dictionary project: www.nhm.ac.uk/nbn/
>Nature Navigator: www.nhm.ac.uk/naturenavigator/
>

-- 

==============================================
 Roger Hyam
----------------------------------------------
 Biodiversity Informatics
 Independent Web Development 
----------------------------------------------
 http://www.hyam.net  roger at hyam.net
----------------------------------------------
 2 Janefield Rise, Lauder, TD2 6SP, UK.
 T: +44 (0)1578 722782 M: +44 (0)7890 341847
==============================================

