[obs] flexible database schemas
Simon.Cox at csiro.au
Thu May 5 21:49:49 PDT 2011
Jason -
This is interesting work. Good to see.
Looking through my O&M spectacles, I have to ask:
* where is the domain feature? (i.e. the sampled-feature)
* is your 'sample' intended to be equivalent to the O&M Specimen class?
* it looks like you have focussed on what O&M calls the 'result' instead of the observation event. But as a consequence, I can't see where the key temporal properties of the observation act are found - result-time, phenomenon-time. Are they unimportant in your applications? Note that separation of these times is the key to having a system that can report on forecasts as well as estimates of phenomena from the deep past.
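To illustrate the last point, here is a minimal sketch of an observation record that keeps the two times separate. The class, field names, labels, and sample values are hypothetical and chosen only for illustration; they are not taken from the O&M encoding or from Jason's ERD.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Observation:
    """Hypothetical observation record; names are illustrative only."""
    observed_property: str
    result: float
    phenomenon_time: datetime  # when the value applies in the real world
    result_time: datetime      # when the result became available

    def kind(self) -> str:
        # A result produced before the time it describes is a forecast;
        # otherwise it describes the present or the past.
        if self.phenomenon_time > self.result_time:
            return "forecast"
        return "retrospective"


# Rainfall predicted on 5 May for 7 May (made-up values).
forecast = Observation("rainfall_mm", 12.0,
                       datetime(2011, 5, 7), datetime(2011, 5, 5))

# Lab result reported in 2011 for a sample collected in 2009 (made-up values).
lab = Observation("lead_mg_per_kg", 3.4,
                  datetime(2009, 8, 1), datetime(2011, 5, 5))

print(forecast.kind(), lab.kind())
```

With only a single timestamp per record, these two cases could not be told apart.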
Simon Cox
-----Original Message-----
From: obs-bounces at ecoinformatics.org [mailto:obs-bounces at ecoinformatics.org] On Behalf Of Law, Jason
Sent: Friday, 6 May 2011 4:11 AM
To: 'obs at ecoinformatics.org'
Subject: [obs] flexible database schemas
Hello everyone,
I'm going to apologize immediately for the long e-mail. Thanks to anyone who has the patience to read and offer an opinion.
I'm in the midst of modeling a database schema for environmental and ecological observation data. As an organization, the city I work for has run up against the common problem of an inflexible data management system specifically designed for one type of observational data. New programs, and the new methods and data that they generate, have fallen outside this enterprise system and end up being managed by individuals and small groups all over our agency. We collect a wide variety of environmental information: hydrology, weather (mostly rain), geology (boreholes, soil cores), avian point counts, stream macroinvertebrate data, habitat (from simple surveys to EPA EMAP stream habitat protocols), and a large amount of analytical chemistry data (river sediment, soil samples, water quality). As someone who is trying to integrate data from many sources, I'm necessarily trying to come up with a better solution.
I've tried to do as much research as possible into how others have solved the problem. I've looked at other government agencies with similar data (USGS and US EPA), commercial systems, and things like O&M, OBOE, etc. I think combining ideas from O&M with concrete ideas from actual database schemas like ODM version 2 (AKA EnviroDB) might be a good route. For example, the translation layer and collections layer from EnviroDB seem like great ways to approach translating data from multiple sources. However, the concrete database systems I've looked at always seem to fall short of encompassing the wide range of data sources that we have. For example, putting biological data like toxicity data into EnviroDB seems like a stretch unless you define variable names like "P. promelas Survival LC50". Doing things like that is basically where we are.
I've tried to combine what I see as good ideas from a bunch of places and have attached an ERD of my ideas. The fields are dummied in and are by no means complete. Two submodels don't show any relationships (Activity and Controlled Vocabulary) because they would be connected in a bunch of other places. The core features are to use the O&M sampling feature model to model observed real-world entities as subclasses of a general sampling feature entity and to allow arbitrary relationships between these entities. This would allow us to do things like separate the 'P. promelas' entity that we've observed the survival of from the physical sample that was used for the toxicity test. My 'Activity' is essentially a combination of SF_Process and OM_Process from O&M and is supposed to represent activities done by people (making measurements, collecting samples, performing a point count, etc.).
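As a rough illustration of the core idea described above (a general sampling feature entity plus arbitrary relationships between features), here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, feature types, and the "exposed_to" relation are all assumptions made up for this example; they are not taken from the attached ERD, from O&M, or from EnviroDB.

```python
import sqlite3

# In-memory database, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sampling_feature (
    id           INTEGER PRIMARY KEY,
    feature_type TEXT NOT NULL,  -- hypothetical subclass tag, e.g. 'specimen'
    name         TEXT NOT NULL
);
CREATE TABLE feature_relationship (
    subject_id INTEGER NOT NULL REFERENCES sampling_feature(id),
    relation   TEXT    NOT NULL, -- would come from a controlled vocabulary
    object_id  INTEGER NOT NULL REFERENCES sampling_feature(id)
);
""")

# The physical sample and the test organism are separate features.
conn.executemany(
    "INSERT INTO sampling_feature VALUES (?, ?, ?)",
    [
        (1, "specimen", "River sediment sample"),
        (2, "test_organism", "P. promelas"),
    ],
)

# An arbitrary, named relationship links the two.
conn.execute(
    "INSERT INTO feature_relationship VALUES (?, ?, ?)",
    (2, "exposed_to", 1),
)

rows = conn.execute("""
    SELECT s.name, r.relation, o.name
    FROM feature_relationship AS r
    JOIN sampling_feature AS s ON s.id = r.subject_id
    JOIN sampling_feature AS o ON o.id = r.object_id
""").fetchall()
print(rows)
```

In a layout like this, a survival observation could reference the organism feature directly, so the variable name would not need to encode the species.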
My questions for folks are:
In general, do I seem to be on the right track?
Is there an existing example of the type of system I'm envisioning?
Is there a better way to integrate biological data into an O&M-like schema?
I'm trying to cram a wide variety of data into a single system because a lot of our projects are pretty multidisciplinary and our limited resources mean that we can probably only get enough IT support to create a single new system. I'm mostly trying to come up with a plan so that I can guide our IT folks into coming up with the right solution (getting the best people to do the work, making the specifications, etc.). Any thoughts, pointers to other resources, or opportunities for collaboration with other cash-strapped organizations are greatly appreciated.
Thanks again,
Jason Law
Statistician
City of Portland, Bureau of Environmental Services Water Pollution Control Laboratory
6543 N Burlington Avenue
Portland, OR 97203-5452
503-823-1038
jason.law at portlandoregon.gov