[obs] flexible database schemas

Jeff Horsburgh jeff.horsburgh at usu.edu
Thu May 5 21:33:42 PDT 2011


Jason,

This is probably a longer conversation than I can do justice to in an email, but I did want to respond.  I assume that you are referring to the Observations Data Model (ODM) from the CUAHSI Hydrologic Information System (http://his.cuahsi.org/odmdatabases.html) and the EnviroDB schema described at http://www.sensordatabus.org/Pages/AboutEnviroDB.aspx and http://research.microsoft.com/apps/pubs/default.aspx?id=70602.

In our work on the Observations Data Model (ODM), we have come up against the issue that you describe many times.  Almost immediately after publishing the ODM paper in the Water Resources Research journal, we got a lot of comments from people saying that 80% of their data fit into ODM but that they had a few datasets that did not.  Mostly, it was data that we didn't anticipate in the original ODM design or that we figured people wouldn't try to put in there - things like moving sensors, geospatial fields/grids, biological data, and some others.  Also, a lot of people wanted to add new attributes to tables that were not part of the ODM schema - for example, adding a "SiteType" attribute to the "Sites" table.

The EnviroDB model does introduce some very useful concepts, but I will warn you - the literature that they put together contains several statements about ODM that just aren't correct, and not all of what they claim are advancements over ODM are really advances.  Also, EnviroDB is not ODM Version 2.  

In fact, we are right now working on ODM Version 2 - and specifically for the reasons you articulate.  Like you, we are considering ideas from O&M as well as EnviroDB/EDM.  I would be more than happy to share the ideas that we have been working on.

--------------------------------------------------------------
Jeffery S. Horsburgh
Utah Water Research Laboratory
Utah State University
8200 Old Main Hill
Logan, UT 84322-8200
Phone: (435) 797-2946  Fax: (435) 797-3663
jeff.horsburgh at usu.edu   http://jeffh.usu.edu
---------------------------------------------------------------

-----Original Message-----
From: obs-bounces at ecoinformatics.org [mailto:obs-bounces at ecoinformatics.org] On Behalf Of Law, Jason
Sent: Thursday, May 05, 2011 2:11 PM
To: 'obs at ecoinformatics.org'
Subject: [obs] flexible database schemas

Hello everyone,

I'm going to apologize immediately for the long e-mail. Thanks to anyone who has the patience to read and offer an opinion.

I'm in the midst of modeling a database schema for environmental and ecological observation data.  As an organization, the city I work for has run up against the common problem of an inflexible data management system specifically designed for one type of observational data.  New programs and the new methods and data that they generate have fallen outside this enterprise system and end up being managed by individuals and small groups all over our agency.  We collect a wide variety of environmental information: hydrology, weather (mostly rain), geological (boreholes, soil cores), avian point counts, stream macroinvertebrate data, habitat (from simple surveys to EPA EMAP stream habitat protocols), and a large amount of analytical chemistry data (river sediment, soil samples, water quality).  As someone who is trying to integrate data from many sources, I'm necessarily trying to come up with a better solution.

I've tried to do as much research as possible into how others have solved the problem.  I've looked at other government agencies with similar data (USGS and US EPA), commercial systems, and things like O&M, OBOE, etc.  I think combining ideas from O&M with concrete ideas from actual database schemas like ODM version 2 (AKA EnviroDB) might be a good route.  For example, the translation layer and collections layer from EnviroDB seem like great ways to approach translating data from multiple sources.  However, the concrete database systems I've looked at always seem to fall short of encompassing the wide range of data sources that we have.  For example, putting biological data like toxicity data into EnviroDB seems like a stretch unless you define variable names like "P. promelas Survival LC50".  Doing things like that is basically where we are.

I've tried to combine what I see as good ideas from a bunch of places and have attached an ERD of my ideas.  The fields are dummied in and are by no means complete.  Two submodels don't show any relationships (Activity and Controlled Vocabulary) because they would be connected in a bunch of other places.  The core features are to use the O&M sampling feature model to model observed real-world entities as subclasses of a general sampling feature entity and to allow arbitrary relationships between these entities.  This would allow us to do things like separate the 'P. promelas' entity that we've observed the survival of from the physical sample that was used for the toxicity test.  My 'Activity' is essentially a combination of SF_Process and OM_Process from O&M and is supposed to represent activities done by people (making measurements, collecting samples, performing a point count, etc).
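
The separation described above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the class names (SamplingFeature, Specimen, Taxon, Relationship, Activity, Observation), the field names, the "tested_with" predicate, and all sample values are hypothetical.  They are not taken from O&M, ODM, EnviroDB, or the attached ERD.

```python
from dataclasses import dataclass


@dataclass
class SamplingFeature:
    """General sampling feature; subclasses model specific observed entities."""
    feature_id: str
    name: str


@dataclass
class Specimen(SamplingFeature):
    """A physical sample, e.g. the material used in a toxicity test."""
    material: str = ""


@dataclass
class Taxon(SamplingFeature):
    """A biological entity whose property (e.g. survival) is observed."""
    scientific_name: str = ""


@dataclass
class Relationship:
    """An arbitrary, named link between two sampling features."""
    subject_id: str
    predicate: str
    object_id: str


@dataclass
class Activity:
    """Something done by people: a measurement, a sample collection, a test."""
    activity_id: str
    kind: str


@dataclass
class Observation:
    """A result about one feature, produced by one activity."""
    feature_id: str
    observed_property: str
    value: float
    unit: str
    activity_id: str


def related(relationships, feature_id):
    """Return (predicate, object_id) pairs for links starting at feature_id."""
    return [(r.predicate, r.object_id)
            for r in relationships if r.subject_id == feature_id]


# Hypothetical data: the values below are made up for illustration only.
sample = Specimen("sf-1", "Toxicity test sample", material="water")
organism = Taxon("sf-2", "Fathead minnow", scientific_name="Pimephales promelas")
links = [Relationship(organism.feature_id, "tested_with", sample.feature_id)]
test_run = Activity("act-1", "toxicity test")
result = Observation(organism.feature_id, "survival LC50", 12.5, "percent",
                     test_run.activity_id)
```

In this sketch the observed property stays a plain "survival LC50" because the organism is a feature in its own right, linked to the physical sample, rather than being folded into a variable name such as "P. promelas Survival LC50".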

My questions for folks are:

In general, do I seem to be on the right track?

Is there an existing example of the type of system I'm envisioning?

Is there a better way to integrate biological data into an O&M-like schema?

I'm trying to cram a wide variety of data into a single system because a lot of our projects are pretty multidisciplinary and our limited resources mean that we can probably only get enough IT support to create a single new system.  I'm mostly trying to come up with a plan so that I can guide our IT folks into coming up with the right solution (getting the best people to do the work, making the specifications, etc).  Any thoughts, pointers to other resources, or opportunities for collaboration with other cash-strapped organizations are greatly appreciated.

Thanks again,

Jason Law
Statistician
City of Portland, Bureau of Environmental Services Water Pollution Control Laboratory
6543 N Burlington Avenue
Portland, OR 97203-5452
503-823-1038
jason.law at portlandoregon.gov

