[obs] flexible database schemas

Law, Jason Jason.Law at portlandoregon.gov
Thu May 5 13:11:00 PDT 2011


Hello everyone,

I apologize in advance for the long e-mail. Thanks to anyone who has the patience to read it and offer an opinion.

I'm in the midst of modeling a database schema for environmental and ecological observation data.  As an organization, the city I work for has run up against the common problem of an inflexible data management system specifically designed for one type of observational data.  New programs and the new methods and data that they generate have fallen outside this enterprise system and end up being managed by individuals and small groups all over our agency.  We collect a wide variety of environmental information: hydrology, weather (mostly rain), geological (boreholes, soil cores), avian point counts, stream macroinvertebrate data, habitat (from simple surveys to EPA EMAP stream habitat protocols), and a large amount of analytical chemistry data (river sediment, soil samples, water quality).  As someone who is trying to integrate data from many sources, I'm necessarily trying to come up with a better solution.

I've tried to do as much research as possible into how others have solved the problem.  I've looked at other government agencies with similar data (USGS and US EPA), commercial systems, and things like O&M, OBOE, etc.  I think combining ideas from O&M with concrete ideas from actual database schemas like ODM version 2 (AKA EnviroDB) might be a good route.  For example, the translation layer and collections layer from EnviroDB seem like great ways to approach translating data from multiple sources.  However, the concrete database systems I've looked at always seem to fall short of encompassing the wide range of data sources that we have.  For example, putting biological data like toxicity data into EnviroDB seems like a stretch unless you define variable names like "P. promelas Survival LC50", which pack the organism, the observed property, and the statistic into a single string.  Doing things like that is basically where we are now.

I've tried to combine what I see as good ideas from a bunch of places and have attached an ERD of my ideas.  The fields are dummied in and are by no means complete.  Two submodels (Activity and Controlled Vocabulary) don't show any relationships because they would be connected in a bunch of other places.  The core features are to use the O&M sampling feature model to represent observed real-world entities as subclasses of a general sampling feature entity, and to allow arbitrary relationships between these entities.  This would let us, for example, separate the 'P. promelas' entity whose survival we observed from the physical sample that was used for the toxicity test.  My 'Activity' is essentially a combination of SF_Process and OM_Process from O&M and is supposed to represent activities done by people (making measurements, collecting samples, performing a point count, etc).
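
To make the idea above concrete, here is a minimal sketch using Python's built-in sqlite3 module.  All table names, column names, the 'exposed_to' relation, and the sample values are hypothetical placeholders chosen for illustration; they are not taken from the attached ERD, from O&M, or from ODM/EnviroDB.  It only shows a general sampling feature table with one subclass table, a table of arbitrary feature-to-feature relationships, and observations tied to an activity.

```python
import sqlite3

# Hypothetical schema: every name below is an illustrative assumption.
SCHEMA = """
CREATE TABLE sampling_feature (          -- general supertype
    feature_id   INTEGER PRIMARY KEY,
    feature_type TEXT NOT NULL,          -- e.g. specimen, organism, site
    name         TEXT NOT NULL
);
CREATE TABLE specimen (                  -- one subclass, sharing the key
    feature_id INTEGER PRIMARY KEY REFERENCES sampling_feature(feature_id),
    medium     TEXT
);
CREATE TABLE feature_relationship (      -- arbitrary feature-to-feature links
    subject_id INTEGER NOT NULL REFERENCES sampling_feature(feature_id),
    relation   TEXT NOT NULL,
    object_id  INTEGER NOT NULL REFERENCES sampling_feature(feature_id),
    PRIMARY KEY (subject_id, relation, object_id)
);
CREATE TABLE activity (                  -- something done by people
    activity_id   INTEGER PRIMARY KEY,
    activity_type TEXT NOT NULL
);
CREATE TABLE observation (
    observation_id INTEGER PRIMARY KEY,
    feature_id     INTEGER NOT NULL REFERENCES sampling_feature(feature_id),
    activity_id    INTEGER NOT NULL REFERENCES activity(activity_id),
    property       TEXT NOT NULL,        -- what was observed
    statistic      TEXT,                 -- how it was summarized
    value          REAL,
    unit           TEXT
);
"""


def build_example():
    """Create an in-memory database holding one made-up toxicity test."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO sampling_feature VALUES (?, ?, ?)",
        [(1, "specimen", "Sediment sample A"), (2, "organism", "P. promelas")],
    )
    conn.execute("INSERT INTO specimen VALUES (1, 'sediment')")
    # The test organisms are linked to the physical sample, not merged with it.
    conn.execute("INSERT INTO feature_relationship VALUES (2, 'exposed_to', 1)")
    conn.execute("INSERT INTO activity VALUES (1, 'toxicity test')")
    # The value 50.0 is a placeholder, not real data.
    conn.execute(
        "INSERT INTO observation VALUES (1, 2, 1, 'survival', 'LC50', 50.0, 'percent')"
    )
    return conn


def observations_with_source_sample(conn):
    """List organism observations together with the sample they were exposed to."""
    query = """
        SELECT org.name, o.property, o.statistic, smp.name
        FROM observation AS o
        JOIN sampling_feature AS org ON org.feature_id = o.feature_id
        JOIN feature_relationship AS r
             ON r.subject_id = org.feature_id AND r.relation = 'exposed_to'
        JOIN sampling_feature AS smp ON smp.feature_id = r.object_id
        ORDER BY o.observation_id
    """
    return conn.execute(query).fetchall()
```

Calling observations_with_source_sample(build_example()) returns the organism, the property, the statistic, and the source sample as separate fields, so a query by species or by statistic does not need to parse a variable name such as "P. promelas Survival LC50".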

My questions for folks are:

In general, do I seem to be on the right track?

Is there an existing example of the type of system I'm envisioning?

Is there a better way to integrate biological data into an O&M-like schema?

I'm trying to cram a wide variety of data into a single system because a lot of our projects are pretty multidisciplinary and our limited resources mean that we can probably only get enough IT support to create a single new system.  I'm mostly trying to come up with a plan so that I can guide our IT folks toward the right solution (getting the best people to do the work, writing the specifications, etc).  Any thoughts, pointers to other resources, or opportunities for collaboration with other cash-strapped organizations are greatly appreciated.

Thanks again,

Jason Law
Statistician
City of Portland, Bureau of Environmental Services
Water Pollution Control Laboratory
6543 N Burlington Avenue
Portland, OR 97203-5452
503-823-1038
jason.law at portlandoregon.gov
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Overview.png
Type: image/png
Size: 66735 bytes
Desc: Overview.png
URL: <http://lists.nceas.ucsb.edu/ecoinformatics/pipermail/obs/attachments/20110505/eae2c8f5/attachment-0001.png>

