[kepler-dev] accessing occurrence data (such as DiGIR) through GBIF REST services
Dave Vieglais
vieglais at ku.edu
Mon May 14 15:21:24 PDT 2007
Problems accessing DiGIR data from Kepler have been due to many
factors, not the least of which is the inefficiency of performing a
distributed query against data sources of unreliable performance. We
started work on constructing a caching mechanism for these data, but
were preempted by the provision of the GBIF REST Services (GRS).
Since the GBIF cache acquires data from TAPIR as well as DiGIR, it
may be more appropriate to use these services from within Kepler as
a mechanism for accessing occurrence data, such as for the mammals
use case in SEEK.
A quick evaluation of the GRS indicates the availability of some 1.1
million georeferenced records from the class Mammalia, e.g.:
http://newportal.gbif.org/ws/rest/occurrence/count?minlongitude=-180&maxlongitude=360&taxonconceptkey=9887127
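
A minimal Python sketch of how a client might assemble the count request above. The names GRS_BASE and build_count_url are made up for illustration; only the endpoint path and the parameter names are taken from the URL in this message. The code builds the URL and does not contact the service.

```python
from urllib.parse import urlencode

# Base of the GBIF REST Services, as it appears in the URLs in this message.
GRS_BASE = "http://newportal.gbif.org/ws/rest"


def build_count_url(taxon_concept_key, min_longitude=-180, max_longitude=360):
    """Return the occurrence/count URL for one taxon concept (hypothetical helper).

    The default longitude bounds are copied from the example URL, not from
    any service documentation.
    """
    params = {
        "minlongitude": min_longitude,
        "maxlongitude": max_longitude,
        "taxonconceptkey": taxon_concept_key,
    }
    return GRS_BASE + "/occurrence/count?" + urlencode(params)


if __name__ == "__main__":
    print(build_count_url(9887127))
```
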
The value of the "taxonconceptkey" parameter, 9887127, is the
identifier for the class Mammalia. This should be discoverable by
searching for "mammalia" in the GRS taxon service, e.g.:
http://newportal.gbif.org/ws/rest/taxon/list?rank=class&scientificname=mammalia
But it didn't work for me (Donald thinks it's a bug that should be
easily fixed, so it might work for you by the time you read this).
Instead, searching with the UI resulted in this:
http://newportal.gbif.org/taxonomy/9887127
from which I assumed 9887127 is the id for the class Mammalia. This
was verified with the GRS by:
http://newportal.gbif.org/ws/rest/taxon/get/9887127
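
The two taxon requests above (search by rank and name, then fetch by id) could be generated along these lines. This is only a sketch: build_taxon_list_url and build_taxon_get_url are hypothetical helper names, and the paths and parameters are copied from the URLs in this message rather than from any service documentation. Nothing is fetched.

```python
from urllib.parse import urlencode

# Base of the GBIF REST Services, as it appears in the URLs in this message.
GRS_BASE = "http://newportal.gbif.org/ws/rest"


def build_taxon_list_url(rank, scientific_name):
    """Return the taxon/list URL that searches by rank and scientific name."""
    params = {"rank": rank, "scientificname": scientific_name}
    return GRS_BASE + "/taxon/list?" + urlencode(params)


def build_taxon_get_url(taxon_concept_key):
    """Return the taxon/get URL used to verify a single taxon concept id."""
    # int() rejects anything that is not a plain numeric id.
    return "{}/taxon/get/{}".format(GRS_BASE, int(taxon_concept_key))


if __name__ == "__main__":
    print(build_taxon_list_url("class", "mammalia"))
    print(build_taxon_get_url(9887127))
```
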
Looking at the general distribution of Mammalia data available in GRS
shows a strong bias towards the US:
http://newportal.gbif.org/taxonomy/9887127
I'm not sure whether that will adversely affect the analyses for the
SEEK project use case.
Examining the data a bit more closely reveals some issues, though.
For example:
http://newportal.gbif.org/ws/rest/occurrence/list?taxonconceptkey=9887127&minlongitude=-180&maxlongitude=360&stylesheet=
shows that the occurrences are not necessarily identified to genus +
species. So to actually get useful data it will first be necessary
to trawl the GRS taxon service to discover appropriate taxon ids or
binomials, then query the GRS for those occurrence data, for example
Zapus princeps* (the wildcard also returns subspecies):
http://newportal.gbif.org/ws/rest/occurrence/list?scientificname=zapus%20princeps*&minlongitude=-180&maxlongitude=360&stylesheet=
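
The second step of that procedure, querying occurrences for each binomial found in the taxon service, might be sketched as follows. The helper name build_occurrence_list_url and its include_subspecies flag are assumptions made for this example. The parameter names, the trailing "*" wildcard and the empty "stylesheet=" parameter are copied from the URL above. The sketch only constructs URLs; it does not call the service.

```python
from urllib.parse import quote, urlencode

# Base of the GBIF REST Services, as it appears in the URLs in this message.
GRS_BASE = "http://newportal.gbif.org/ws/rest"


def build_occurrence_list_url(scientific_name, include_subspecies=True,
                              min_longitude=-180, max_longitude=360):
    """Return the occurrence/list URL for one binomial (hypothetical helper)."""
    name = scientific_name.strip()
    if include_subspecies and not name.endswith("*"):
        # Trailing wildcard, as in the "zapus princeps*" example.
        name += "*"
    params = {
        "scientificname": name,
        "minlongitude": min_longitude,
        "maxlongitude": max_longitude,
        # Empty value copied from the example URL; its effect is not
        # described in this message.
        "stylesheet": "",
    }
    # quote_via=quote encodes spaces as %20 (not "+"); safe="*" keeps the
    # wildcard literal, matching the URL in the message.
    return GRS_BASE + "/occurrence/list?" + urlencode(
        params, quote_via=quote, safe="*")


if __name__ == "__main__":
    for binomial in ["zapus princeps"]:
        print(build_occurrence_list_url(binomial))
```
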
I'm not sure if the GRS would be helpful in discovering synonyms or
alternative classifications.
In any case, it appears that a Kepler actor able to retrieve data
from the GRS occurrence service would be a valuable addition to the
Kepler library and could be used to replace the existing DiGIR actor.
regards,
Dave V.