status of EML 2.0

Mon Aug 19 07:57:16 PDT 2002

Hello,

I would like to make an agenda for the Wednesday conference call (I'm
still assuming it's Wednesday at 9:00 PDT because I have not heard back
from anyone saying that is a bad time).  I will add the issues in
Peter's original email to the agenda and any others submitted in reply
since then.  Please email eml-dev if you want any other issues put on
the agenda.  Please reply no later than noon on Tuesday (tomorrow) so I
have time to figure out the timing.  Also, if you want to be in on the
conference call, please email me so I can set up a Verizon conference
session.  I already have confirmation from Tim, Peter and Ken.

thanks,
chad

On Fri, 2002-08-16 at 16:59, Ken Ramsey wrote:
> Peter,
> 
> I wanted to let you know that I will be replying to your message but I haven't had time to do it today. I will be out of the office on Monday, so I will work on my reply this weekend (no connectivity in the mountains) and will send it to you on Tuesday. I would be interested in participating in the conference call if you think it would be helpful. Other than Monday, I can be available any time next week.
> 
> Ken
> 
> ----------------------------------------
> Ken Ramsey
> Data Manager
> Jornada Basin LTER Project
> New Mexico State University
> Box 30001, MSC 3AF
> Las Cruces, NM 88003
> (505)646-7918 (office)
> (505)646-5665 (fax)
> keramsey at nmsu.edu
> 
> 
> >>> Peter McCartney<peter.mccartney at asu.edu> 08/15/02 05:32PM >>>
> Ok. Thanks for comments, Chad. Rather than respond to responses, I'll let it
> digest and we can try to sort some of these out in a call as you suggest.
> I've penciled in Wednesday morning - hopefully, we'll see some others weigh
> in so we can set a plan of action before the next wave of travel in October
> hits.
> 
> Peter McCartney (peter.mccartney at asu.edu)
> Center for Environmental Studies
> Arizona State University
> 480-965-6791 
> 
> -----Original Message-----
> From: Chad Berkley [mailto:berkley at nceas.ucsb.edu] 
> Sent: Thursday, August 15, 2002 3:03 PM
> To: Peter McCartney
> Cc: Eml-Dev (E-mail); 'im at lternet.edu'; 'Scott Chapal'
> Subject: Re: status of EML 2.0
> 
> 
> Hello Peter et al.,
> 
> I agree that we need to get EML moving again.  We had originally said
> that we would have EML 2.0 out by the beginning of this Summer.  We
> have, however, been able to collect some very good feedback from Scott,
> Tim and others as to the usability of EML2b9 that can only improve our
> final product.  It is only through people trying to use the beta
> releases that we will actually find the modeling errors and other bugs
> that have popped up in the last couple months.  I would encourage
> everyone who has an interest in EML's success to try to mark up at least
> one of their datasets in EML and not just wait for the final release.  Your
> time is not wasted in this process!  That having been said, please see
> my comments inline below.
> 
> On Thu, 2002-08-15 at 12:52, Peter McCartney wrote:
> > Hi everyone. 
> > 
> > It's been a busy summer what with travel, vacation, and major scores on the
> > funding front. While we all probably needed a break, we do need to resolve
> > where we are with EML 2.0. I've noticed a trickle of traffic in bugzilla on
> > minor points, but it's so small that I suspect I'm not the only one that is
> > muddling over the path we should be taking with respect to some of the
> > feedback we've been getting. Here's my take on the issues, drawn both from
> > our experience working with beta 9 this summer and from the workshop. I
> > don't find bugzilla well suited to this level of comment, so I will make
> > them here first. I'm willing to put in the effort to turn some of these
> > comments into bugs once we have some general sense of how to respond to
> > them, or at least agreement that they are bugs. I've cc'd this to the LTER
> > IM list so that they can confirm whether or not my interpretations of the
> > workshop response are fair.
> > 
> > 1) There were a number of issues that Chris and I both felt were simple
> > errors in beta 9 when we did our walk-through. The following are the most
> > glaring that I recall:
> > 	a) there is no recursive link within project to a related project
> > 	   description
> > 	b) the dataSourceUsed element, which links protocol methodSteps to
> > 	   existing eml-datasets from which this dataset was derived, is missing
> > 	c) there is no recursive link within protocol to reference an
> > 	   existing protocol (see separate comments on protocol below)
> > 	d) the ascii fixed section of physical doesn't work, nor does it
> > 	   support records with multiple physical lines. We've already defined
> > 	   a structure that does this.
> > 
> I think these are just bugs that are easily fixed. I could not find any
> of these bugs in bugzilla.  Chris or Peter, do you want to enter them
> since you found them and are familiar with them? 
> 
> > 2) There are some technical problems with the identifier and keyref
> > statements which prevent any instance file from validating. I don't
> > understand this aspect of XML very well, so I can't really suggest how to
> > fix it or where the problem lies, but I assume it is just a technical
> > matter and not a fundamental problem with what we are trying to do with
> > references.
> >  
> I have not tried to validate an instance document so I have not seen
> this problem.  Will the person who encountered this error please write a
> detailed description and either put it in bugzilla or email it to
> eml-dev?
> 
> > 3) Literature needs fixing - it doesn't work intuitively with the way most
> > of us cite bibliographic information, even after we get some robust name
> > parsing tools written. I've already enumerated the problems in bugzilla, so
> > I won't belabor it here. I've had a student writing XSLs for various
> > journal formats as well as EndNote conversion, but they are held up
> > waiting for a final version. The fact that the network office is investing
> > so much effort into the EndNote export format as a means for harvesting
> > bibliographic information is in my opinion not a good letter of
> > recommendation for eml-literature, so we should fix it or drop it in favor
> > of something simpler.
> I would like to see a model of the proposed structure of a new
> literature module.  I think that some of the fields there are not needed
> and there are others that might be needed.  I would propose that someone
> who is actively working on marking up citations using literature propose
> a model for us to look at. The EndNote export format may be a good
> starting point.  The important part is that we have input from people
> who are actually trying to use eml-literature.
> 
> > 4) The decision to record online distribution only using URLs and only as
> > stateless pointers to a single opaque object will, I fear, force us to
> > seriously limit the role of EML in the future development of a web service
> > based network. The fact that URLs are at best awkward and at worst not
> > usable for expressing some types of connections is one thing, but it is
> > the lack of support for describing a stateful connection that bothers me
> > most. Many LTER sites, not just CAP, are attempting to build internet
> > applications that are metadata driven and provide an interface (either
> > direct or web service based) to data stored in many different systems
> > including SDE, SQL, ascii files and various GIS and hyperspectral formats.
> > While few of us intend to give out the stateful connection information to
> > end users directly, many of us would like to see the development of
> > server-side tools follow some standards so that we might all better share
> > software components. Without a standard in EML for describing connection
> > information in a usable format, the result is that administrators are
> > still forced to develop local solutions and then figure out how to relate
> > them to EML. I'd hate to see EML perceived as useful for enabling outside
> > institutions to build applications around site data but not very useful
> > for sites in building their own applications.
> I see many of your points, however, in my mind there are 4 requirements
> for the connection model.  1) It must be machine parsable and/or
> directly machine usable.  2) It must not require that we add to or
> change the standard every time a new connection protocol is introduced
> or an existing one is revised. 3) It must not be based on proprietary
> connection protocols that limit the scope of other types of
> connections.  4)  It must be comprehensive, allowing the description of
> any type of connection that one may want to list.
> 
> The current method (using URLs to define connection points) follows
> 1,2,3 and mostly 4.  I'm sure you can find some connection somewhere
> that doesn't have a standard URL, but they are few and far between.  The
> only other method that I can see working is to develop a name/value pair
> connection parameter model, where a connection is defined by the set of
> name/value pairs of needed connection parameters.  The problem with this
> is that to enable cross connections, we may have to have some sort of
> dictionary or map that shows what types of connections need what kinds
> of parameters.  Maybe we need a hybrid of the two.  What do you propose
> should take the place of the URL?
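
The name/value pair idea and the "dictionary or map" it would need can be
sketched roughly as follows. This is only an illustration under stated
assumptions: the connection types, the parameter names and the helper
functions (`REQUIRED_PARAMS`, `missing_params`, `url_to_params`) are all
hypothetical and are not part of EML or of any proposal in this thread.

```python
from urllib.parse import urlsplit

# Hypothetical dictionary: which parameters each connection type needs.
# Neither the type names nor the parameter names come from EML.
REQUIRED_PARAMS = {
    "http": {"host", "path"},
    "jdbc": {"host", "port", "database", "table"},
}


def missing_params(conn_type, params):
    """Return the required parameter names that are absent, sorted."""
    required = REQUIRED_PARAMS.get(conn_type)
    if required is None:
        raise KeyError(f"unknown connection type: {conn_type}")
    return sorted(required - set(params))


def url_to_params(url):
    """Hybrid form: derive name/value pairs from a plain URL."""
    parts = urlsplit(url)
    return parts.scheme, {"host": parts.hostname, "path": parts.path}
```

A plain URL can be turned into pairs with `url_to_params`, while a
connection that has no standard URL could be listed directly as pairs and
checked with `missing_params`. New connection types would then mean a new
dictionary entry rather than a change to the standard itself.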
> 
> > 
> > 5) the recent traffic on reusable content partly underscores, I think, our
> > failure to adequately separate storage and management of metadata from its
> > presentation during this design process. 
> I would say, that as EML is an XML metadata standard, presentation has
> no place in EML.  We should be focusing on creating EML as a metadata
> storage container.  Presentation can be done later with stylesheets if
> the structure of the metadata storage mechanism is accurate enough to
> hold all of the facets of the data which it is modeling.  I don't think
> that EML has been built for presentation at all.  We have attempted to
> organize certain sub-categories of information, but that is not
> presentation, that is organization.  The problem with this is that
> everyone tends to think of the data model for EML slightly differently,
> so there have been some disagreements as to what types of information
> need to be repeatable (normalized), and some of the organization is
> sometimes an issue.
> 
> > The former benefits from a high
> > degree of granularity and normalization, the latter benefits from just the
> > opposite (assuming size of the EML document is not an issue). The
> > references element is a device to introduce some normalization capability
> > within EML to better serve management of information at the expense of
> > some convenience in reading it. It's not likely to satisfy everyone since
> > it doesn't allow addressing between documents and this, as well as
> > granularity, will be a perennial problem when trying to use EML to serve
> > as both a metadata management format and a metadata presentation
> > format. For those of us that are dynamically building an EML document from
> > a normalized source such as a relational database or collections of
> > independent XML fragments, this is far less an issue: we can choose our
> > own level of granularity within our storage systems and frankly find it
> > easier to write the same information out twice rather than going through
> > the hassle of creating identifiers and remembering what they are during
> > the entire output process.
> The problem with this approach is that there is no way to know whether
> two sub-trees that have the same content, are, in fact, the same
> object.  For instance, if you have entity alpha(A,B,C,D) and entity
> beta(A,B,C,D) are they the same entity?  If you use references, you know
> that alpha(A,B,C,D) with id=1 is the same object as beta(refid=1).  This
> is very important for machine processing of this metadata.  I would say
> that one should view EML as a metadata propagation unit that no one will
> ever look at.  It is, in essence, a machine language.  You would never
> look at the binary format of an Excel file and try to follow the
> pointers around, would you?  I don't think a human will try to do that
> with EML.  The presentation should be completely separate from the
> storage and we must follow the concrete rules that govern when to use a
> relationship and when not to.  See
> http://knb.ecoinformatics.org/software/eml/eml20docs/eml-docbook.html#reusableContent
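
The alpha/beta argument above can be made concrete with a small sketch.
It assumes a toy document in which an `entity` element either carries an
`id` attribute or points at one through a `refid` attribute, following the
id=1 / refid=1 notation used above. These element and attribute names are
illustrative only and are not the actual EML 2.0 schema.

```python
import xml.etree.ElementTree as ET

# Toy document (hypothetical markup): the second entity refers to the
# first by id, the third merely repeats the first one's content.
DOC = """
<dataset>
  <entity id="1"><name>alpha</name></entity>
  <entity refid="1"/>
  <entity><name>alpha</name></entity>
</dataset>
"""


def resolve(root):
    """Return, for each entity in document order, the element it denotes."""
    by_id = {e.get("id"): e for e in root.iter("entity") if e.get("id")}
    return [by_id[e.get("refid")] if e.get("refid") else e
            for e in root.iter("entity")]


root = ET.fromstring(DOC)
first, second, third = resolve(root)

same_object = first is second            # reference resolves to one object
same_content = first.findtext("name") == third.findtext("name")
copy_is_same_object = first is third     # a copy is a distinct object
```

Here `same_object` is true because the reference resolves to the element
with the matching id. For the duplicated subtree, `same_content` is true but
`copy_is_same_object` is false, so a program cannot tell from the content
alone whether the two entities were meant to be one.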
> 
> When you talk about the hassle of creating identifiers, I'm not sure
> what you are referencing.  It is really no trouble at all to add
> identifiers programmatically.
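
As a rough illustration of adding identifiers programmatically, the
following sketch walks a parsed document and gives an `id` attribute to
every element of the chosen types that lacks one. The function name
`add_ids`, the `id` attribute and the numbering scheme are assumptions made
for this example, not anything EML prescribes.

```python
import itertools
import xml.etree.ElementTree as ET


def add_ids(root, tags, prefix="id"):
    """Give each element whose tag is in `tags` an id if it has none."""
    counter = itertools.count(1)
    # Ids already present in the document must not be handed out again.
    used = {e.get("id") for e in root.iter() if e.get("id")}
    for elem in root.iter():
        if elem.tag in tags and elem.get("id") is None:
            new_id = f"{prefix}{next(counter)}"
            while new_id in used:
                new_id = f"{prefix}{next(counter)}"
            elem.set("id", new_id)
            used.add(new_id)
    return root
```

A writer that builds EML from a database could call something like this
once on the finished tree, so identifiers need not be tracked during the
output process.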
> 
> > I'd hate to see
> > the issue of references and granularity hold up the design process given
> > that (in my opinion) they aren't really necessary at all in order to
> > define EML content (with the one exception of key definitions in
> > eml-constraint, which I don't like).
> I don't see references as holding up the design process.   They are part
> of the design and hence need to be part of the design process.  Like I
> said above, I think they are very necessary for advanced machine
> processing with EML.
> 
> > 
> > 6) Finally, and most significantly, the response from the workshop
> > indicated that how we have organized project and protocol is at odds with
> > most participants. The problem seems to stem from the fact that most sites
> > view projects as something that exists at a different level from a
> > dataset. While most agreed philosophically that there has to be a discrete
> > intellectual activity to produce a dataset, few make any formal
> > recognition of this activity. Instead, most see certain components (data
> > collection methods, sampling, QAQC, etc.) as direct properties of the
> > dataset. What is recognized as a project seems to be defined more by
> > administrative or research criteria that often are on a higher plane than
> > an individual dataset. A few acknowledged that they could live with using
> > the immediate project element to record these more dataset-specific items
> > and include a link to a higher-order project description, but this was
> > hard to visualize at the time because that link was missing in beta 9.
> I don't necessarily think that project needs to be at the root level of
> EML.  In fact, I think it used to be buried farther down.  If people think
> of it as being farther down in the tree, I have no problem with that
> change.  Where should it go though?  Any ideas?
> 
> > 
> >  There were also similar problems with protocol. As Tim Bergsma put it (in
> > better words than I did), we are trying to use one module to carry both
> > prescriptive and descriptive information. In deciding to make protocol a
> > resource-level element, we have really made the choice to use it as a
> > prescriptive information tool - that is, a way of describing standardized
> > protocols independent of any particular data collection instance. I even
> > recall at least one person saying that their personal interpretation of
> > "protocol" was  As such, the information is only peripherally useful for
> > describing the actual methods used to produce a specific dataset. 
> > 
> > There was also some dissatisfaction with the organization of protocol.
> > Many objected to the idea of binding QAQC descriptions to specific
> > methodStep descriptions. Again, there was no philosophical argument that
> > quality control measures by definition impose control over actions;
> > nevertheless, it does not agree with how most organize this information.
> > Instead, QAQC descriptions are typically stored independent of the
> > descriptions of the methodology and cannot be easily linked in this way.
> > Finally, as we've seen in recent email traffic, there are frustrations
> > with the perennial gray area of blending pure content markup (XML) with
> > formatting markup (for predominantly textual content).
> I would agree with this.  I'm not sure how to handle it though. I'll
> need to think more about it.
> 
> > 
> > If I were to suggest changes to Beta9 to best address these responses,
> > they might go something like this. I would change eml-project to be
> > predominantly a research project description including staffing, funding,
> > publications, and links to higher level projects. I would also leave
> > eml-protocol as a resource module, but make it predominately text based
> > and prescriptive, used only when a procedure has been formally worked out
> > and used by many datasets. 
> Sounds logical.
> 
> > I would make a new module called methods, which I would use in
> > every place that we now use protocol. methods would contain a repeatable
> > methodStep element, which in turn would include references to source
> > datasets (type eml-dataset), software (type eml-software),
> > instrumentation, and any QAQC procedures that can be logically related to
> > those steps. Methods would also include optional links to eml-literature
> > and eml-protocol as references to formally published or cataloged
> > procedures.  I would create a new module researchContext in which I would
> > include the methodological descriptors that directly qualify this dataset
> > like site description, sampling, and the above methods module. Finally,
> > for QAQC information that isn't described under methodology but is
> > directly related to specific attributes in the data, I would suggest that
> > the data-quality module (in its current incarnations as
> > attributeAccuracy, horizonalAccuracy and verticalAccuracy) be used as the
> > mechanism for describing both data quality and the various
> > control/assurance procedures used to arrive at that quality. 
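
One way to picture the proposed nesting is as a set of plain data
structures. This is only a reading of the paragraph above, not a schema:
the class and field names (`MethodStep`, `Methods`, `ResearchContext` and
their attributes) are hypothetical, and references to other modules are
modeled as simple identifier strings.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MethodStep:
    """One repeatable step, with references attached to it."""
    description: str
    source_datasets: List[str] = field(default_factory=list)  # eml-dataset ids
    software: List[str] = field(default_factory=list)         # eml-software ids
    instrumentation: List[str] = field(default_factory=list)
    qaqc: List[str] = field(default_factory=list)  # QAQC tied to this step


@dataclass
class Methods:
    """Descriptive methods for one dataset."""
    steps: List[MethodStep]
    literature_refs: List[str] = field(default_factory=list)  # eml-literature
    protocol_refs: List[str] = field(default_factory=list)    # eml-protocol


@dataclass
class ResearchContext:
    """Descriptors that directly qualify a dataset."""
    site_description: Optional[str] = None
    sampling: Optional[str] = None
    methods: Optional[Methods] = None
```

In this reading, a prescriptive eml-protocol document stays a separate
resource and is only pointed to from `protocol_refs`, while the
dataset-specific description lives in `ResearchContext`.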
> > 
> > These of course are pretty extreme changes for a beta 9. I think that they
> > probably describe something much more in line with metadata that the
> > current LTER network is producing, but we have to weigh that against
> > time-honored software development procedures which are designed to prevent
> > knee-jerk changes like this so late in the game! With both momentum from
> > the LTER network workshops and a desire to get going on the new ITR(s), we
> > need EML 2.0 out the door soon or we will begin to lose our focus, but we
> > also need it to work. 
> It seems like this would work fine.  I need to think it over a bit more
> though.
> 
> > 
> > So what's the best course of action? An IRC meeting? A conference call?
> > Wait for a few days to see who responds to this email? 
> I think we should give people a few days to digest this then have a
> conference call with all interested parties some time next week.  I
> propose Wednesday at 9:00 PDT for the call. Who would want to be in on
> the call?  I think it would be good to have Scott and/or Tim and/or
> other interested outside parties in on it to get an outside perspective,
> lest we fall back into our original arguments that we've been having for
> 2 years.  Anyone interested in the call, please respond to eml-dev.  I can
> set up a Verizon conference call if we don't have a situation where we can
> chain enough 2-line phones together.  If you have a problem with the
> date/time of the call, propose a different one.  I'm available all next
> week.
> 
> chad
> 
>  
> -- 
> -----------------------
> Chad Berkley
> National Center for 
> Ecological Analysis 
> and Synthesis (NCEAS)
> berkley at nceas.ucsb.edu 
> -----------------------