[kepler-dev] accessing occurrence data (such as DiGIR) through GBIF REST services

Dave Vieglais vieglais at ku.edu
Mon May 14 15:21:24 PDT 2007


Problems accessing DiGIR data from Kepler have been due to many
factors, not the least of which is the inefficiency of performing a
distributed query against data sources of unreliable performance.  We
started work on constructing a caching mechanism for these data, but
were preempted by the provision of the GBIF REST Services (GRS).
Since the GBIF cache acquires data from TAPIR as well as DiGIR, it
may be more appropriate to use these services from within Kepler as a
mechanism for accessing occurrence data, such as for the mammals use
case in SEEK.

A quick evaluation of the GRS indicates the availability of some 1.1
million georeferenced records from the class Mammalia, e.g.:

http://newportal.gbif.org/ws/rest/occurrence/count?minlongitude=-180&maxlongitude=360&taxonconceptkey=9887127
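As a rough illustration, the count query above could be assembled
programmatically along these lines.  This is only a sketch: the helper
name occurrence_count_url is made up for this example, and the base URL
and parameter names are simply copied from the URL above, with no
guarantee that the service still answers at that address.

```python
from urllib.parse import urlencode

# Base URL copied from the example URL above (assumption: still valid).
GRS_BASE = "http://newportal.gbif.org/ws/rest"


def occurrence_count_url(taxon_concept_key, min_longitude=-180, max_longitude=360):
    """Build a GRS occurrence count URL (hypothetical helper, not part of GRS)."""
    # A list of pairs keeps the parameters in the same order as the example URL.
    params = [
        ("minlongitude", min_longitude),
        ("maxlongitude", max_longitude),
        ("taxonconceptkey", taxon_concept_key),
    ]
    return f"{GRS_BASE}/occurrence/count?{urlencode(params)}"


# 9887127 is the taxon concept key used for the class Mammalia in this message.
print(occurrence_count_url(9887127))
```

The function only builds the URL; it does not contact the service.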

The value of the property "taxonconceptkey" is the identifier for the
class Mammalia.  This should be discoverable by searching for Mammalia
in the GRS taxon service, e.g.:

http://newportal.gbif.org/ws/rest/taxon/list?rank=class&scientificname=mammalia

But it didn't work for me (Donald thinks it's a bug that should be
easily fixed, so it might work for you by the time you read this).
Instead, searching with the UI resulted in this:

http://newportal.gbif.org/taxonomy/9887127

I assumed that 9887127 is the id for the class Mammalia.  This was
verified with the GRS by:

http://newportal.gbif.org/ws/rest/taxon/get/9887127

Looking at the general distribution of Mammalia data available in GRS
shows a strong bias towards the US:

http://newportal.gbif.org/taxonomy/9887127

I'm not sure if that will adversely affect the analyses for the SEEK
project use case.

Examining the data a bit more closely reveals some issues, though.
For example:

http://newportal.gbif.org/ws/rest/occurrence/list?taxonconceptkey=9887127&minlongitude=-180&maxlongitude=360&stylesheet=

This shows that the occurrences are not necessarily identified to
genus + species.  So to actually get useful data it will first be
necessary to troll the taxon GRS to discover appropriate taxon ids or
binomials, then query GRS for those occurrence data, for example
Zapus princeps* (the wildcard also returns subspecies):

http://newportal.gbif.org/ws/rest/occurrence/list?scientificname=zapus%20princeps*&minlongitude=-180&maxlongitude=360&stylesheet=
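The name-based query above might be generated from a binomial roughly
as follows.  Again this is only a sketch: occurrence_list_url is a
made-up helper name, the parameter names are copied from the URL above,
and the empty "stylesheet" parameter is kept only because it appears in
that URL.

```python
from urllib.parse import quote, urlencode

# Base URL copied from the example URL above (assumption: still valid).
GRS_BASE = "http://newportal.gbif.org/ws/rest"


def occurrence_list_url(scientific_name, min_longitude=-180, max_longitude=360):
    """Build a GRS occurrence list URL for a name (hypothetical helper)."""
    params = [
        ("scientificname", scientific_name),
        ("minlongitude", min_longitude),
        ("maxlongitude", max_longitude),
        ("stylesheet", ""),  # left empty, as in the example URL
    ]
    # quote_via=quote encodes the space as %20 rather than "+", and
    # safe="*" leaves the trailing wildcard unescaped, matching the example.
    query = urlencode(params, quote_via=quote, safe="*")
    return f"{GRS_BASE}/occurrence/list?{query}"


# The trailing "*" is the wildcard that also matches subspecies.
print(occurrence_list_url("zapus princeps*"))
```

Looping this over a list of binomials gathered from the taxon service
would give one occurrence query per species.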

I'm not sure if the GRS would be helpful in discovering synonyms or  
alternative classifications.

In any case, it appears that a Kepler actor able to retrieve data  
from the GRS occurrence service would be a valuable addition to the  
Kepler library and could be used to replace the existing DiGIR actor.
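To give a feel for the retrieval step such an actor would perform,
here is a minimal sketch in Python (not Kepler actor code).  The helper
name fetch_grs, the 30 second timeout, and the UTF-8 decoding are all
assumptions of mine; the GRS response format is not described in this
message, so the body is returned unparsed.  The opener is passed in so
the sketch can be run with a stub instead of the live service.

```python
import io
from urllib.request import urlopen


def fetch_grs(url, opener=urlopen, timeout=30):
    """Return the raw response body for a GRS URL as text (hypothetical helper).

    The timeout guards against slow responses; UTF-8 is an assumed encoding.
    """
    with opener(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def stub_opener(url, timeout):
    """Stand-in for urlopen so the example runs without network access."""
    return io.BytesIO(b"<response/>")  # placeholder body, not real GRS output


body = fetch_grs(
    "http://newportal.gbif.org/ws/rest/taxon/get/9887127",
    opener=stub_opener,
)
print(body)
```

Dropping the opener argument makes the same call go to the real
service through urllib.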

regards,
   Dave V.








