[Bug 484] - eml-attribute changes needed

Peter McCartney peter.mccartney at asu.edu
Fri May 31 10:23:18 PDT 2002


See inline

Peter McCartney (peter.mccartney at asu.edu)
Center for Environmental Studies
Arizona State University
480-965-6791 

-----Original Message-----
From: Chad Berkley [mailto:berkley at nceas.ucsb.edu]
Sent: Friday, May 31, 2002 9:36 AM
To: Peter McCartney
Cc: Eml-Dev (E-mail)
Subject: RE: [Bug 484] - eml-attribute changes needed


work on the Monarch analytical engine, I'm pretty convinced that a
simple enumeration of units will not suffice for more advanced
automation systems. I've kind of looked into XLink as a method for
linking into the dictionary, but I thought the URI method was more
succinct.

I agree that we need an ontology here. I'm just thinking of the
usability/acceptance, as well as migrating legacy data. We're still in beta,
so let's just try it since no one else is objecting. At the very worst, we
may want to call it stmmlUnit and pair it up with a text qualifier so that
if people go with the default, dimensionless, they can type something in and a
data manager could recode it later.



On the broader "how should attribute be structured" note, take a look at
my most recent attribute changes and see if you still prefer to have two
complex types or not.


I'll put comments on this in the other email you wrote today re: attribute
structure.


Also, I looked at your changes with respect to externalCodeSet.  You
had:
<xs:element name="externalCodeSet">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="codesetName">
        <xs:complexType/>
      </xs:element>
      <xs:choice maxOccurs="unbounded">
        <xs:element name="citation" type="lit:Citation"/>
        <xs:element name="codesetURL" type="xs:anyURI"/>
      </xs:choice>
    </xs:sequence>
  </xs:complexType>
</xs:element>

A couple questions...first, why is there <xs:complexType/> in
codesetName?  Did you mean <xs:element name="codeSetName"
type="xs:string"/> ?  Second, should codesetURL use the

Yes, I had forgotten to set the datatype for codeSetName to xs:string.

/eml-distribution/online/ content model? Perhaps we should make that a
complexType and you could import it here.

Yes, that makes sense; the nature of these URLs will probably vary. They
could be just a text page or something more structured, like a thesaurus or
ontology.

Regarding the keyConstraint enumeration possibility, it occurred to me that
we probably don't need to include the table reference, since we can always
figure out which entity we mean by accessing the parent of the referenced
attributes. In practice, the user would probably navigate to these by the
entity, but it's not necessary to put it in the metadata.
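The parent lookup described above can be sketched with Python's standard xml.etree.ElementTree module. The `<dataset>`, `<entity>`, and `<attribute>` layout and the ids below are hypothetical simplifications, not the eml-attribute schema; the point is only that an attribute id is enough to recover the entity that contains it.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical, simplified fragment; element names and ids are illustrative
# assumptions, not the actual eml-attribute or eml-entity schema.
DOC = """
<dataset>
  <entity id="sites">
    <attribute id="site_code"/>
    <attribute id="site_name"/>
  </entity>
  <entity id="samples">
    <attribute id="sample_id"/>
    <attribute id="sample_site"/>
  </entity>
</dataset>
"""


def entity_for_attribute(root: ET.Element, attribute_id: str) -> Optional[str]:
    """Return the id of the entity element that contains the attribute."""
    # ElementTree elements keep no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for attr in root.iter("attribute"):
        if attr.get("id") == attribute_id:
            parent = parent_of.get(attr)
            return parent.get("id") if parent is not None else None
    return None


root = ET.fromstring(DOC.strip())
# A key constraint that lists only attribute ids still identifies its table.
print(entity_for_attribute(root, "sample_site"))  # samples
```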

chad

On Thu, 2002-05-30 at 18:22, Peter McCartney wrote:
> I'm still not clear why we need the URI in the field. If the applications are
> providing a list of names ("meters", "cubic feet per second") that get
> resolved to URIs, why can't the list of names be an enumeration that is
> part of EML, so that we only need one URL to the dictionary where the names are
> resolved? Wouldn't that save application developers from all having to
> invent their own lists that all do the same thing? Maybe I'm totally missing
> something here; I'll read more on stmml and sit back to see if anyone else
> weighs in.
> 
> 
> Regarding enumeration, I'm not sure why it didn't get into the notes (my own
> included), but it's really quite essential that enumeratedDomain consist of a
> choice between the list of codes that you have in it now, a reference to an
> externally available domain (like international country codes), or a reference
> to an entity within the dataset that provides the domain list (such as a table
> of sample site locations).
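As a rough sketch of the three-way choice described above, the alternatives can be modeled as Python dataclasses joined by a Union, which stands in for an xs:choice. The field names codeSetName, codeSetURI, entity, codeAttribute, and codeDefinitionAttribute are taken from later in this thread; the class names and the describe() helper are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Code:
    code: str
    definition: str


@dataclass
class CodeList:
    """The inline list of codes that enumeratedDomain holds now."""
    codes: List[Code]


@dataclass
class ExternalCodeSet:
    """A reference to an externally maintained domain, e.g. country codes."""
    codeSetName: str
    codeSetURI: Optional[str] = None


@dataclass
class EntityCodeDefinition:
    """A table in the dataset whose rows define the domain."""
    entity: str
    codeAttribute: str
    codeDefinitionAttribute: str


# Exactly one of the three alternatives applies, mirroring an xs:choice.
EnumeratedDomain = Union[CodeList, ExternalCodeSet, EntityCodeDefinition]


def describe(domain: EnumeratedDomain) -> str:
    if isinstance(domain, CodeList):
        return f"inline list of {len(domain.codes)} codes"
    if isinstance(domain, ExternalCodeSet):
        return f"external code set {domain.codeSetName}"
    if isinstance(domain, EntityCodeDefinition):
        return f"codes from {domain.entity}.{domain.codeAttribute}"
    raise TypeError(f"not an enumerated domain: {domain!r}")
```

In the schema itself this would be a choice inside enumeratedDomain; the Python types only illustrate that exactly one branch is filled in.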
> 
> Chad (or anyone else), take a look at the attached version. I've put in what
> I'm talking about regarding enumeration (I'll add the docs if you agree with
> it). I've also tried a simpler variant on the packaging that I think rings a
> bit truer to the other files (I used your edits on storedProcedure as the
> model). In this model, the ids are attached to the content elements and the
> importer can choose to use either attributeListType or attributeType,
> preventing the end user from being able to choose when to do a list or a
> single. I'm heading out now, but I'll check my mail later to see if you have a
> chance to look at it.
> 
> Peter McCartney (peter.mccartney at asu.edu)
> Center for Environmental Studies
> Arizona State University
> 480-965-6791 
> 
> -----Original Message-----
> From: Chad Berkley [mailto:berkley at nceas.ucsb.edu]
> Sent: Thursday, May 30, 2002 2:21 PM
> To: Eml-Dev (E-mail); Peter McCartney
> Subject: Re: [Bug 484] - eml-attribute changes needed
> 
> 
> See my comments inline below:
> 
> On Thu, 2002-05-30 at 13:09, bugzilla-daemon at ecoinformatics.org wrote:
> > http://bugzilla.ecoinformatics.org/show_bug.cgi?id=484
> 
> > ------- Additional Comments From peter.mccartney at asu.edu  2002-05-30 13:09 -------
> > Here are some comments.
> > 
> > 1) Under storageType, the prefix xs: should not be expected, since prefixes
> > for content models are defined by the individual schemas that import them,
> > and this is not the context in which these will be used. So I would expect
> > people to type in "string" and not "xs:string" or "xsd:string".
> > 
> 
> agreed, as long as the exact word after the namespace is used, including
> case.  I think this is an application issue.
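If this is handled in the application, a minimal sketch is to strip whatever prefix the user typed and compare the local name, case-sensitively, against the XML Schema built-in type names. The set below is only a partial list and the function name is hypothetical.

```python
# A subset of XML Schema built-in datatype names; see XML Schema Part 2
# for the full list.
XSD_BUILTINS = {
    "string", "boolean", "decimal", "float", "double", "integer",
    "int", "long", "short", "byte", "date", "dateTime", "anyURI",
}


def normalize_storage_type(value: str) -> str:
    """Drop any namespace prefix ("xs:", "xsd:") and validate the local name.

    The comparison is case-sensitive, so "String" is rejected.
    """
    local = value.strip().rpartition(":")[2]
    if local not in XSD_BUILTINS:
        raise ValueError(f"unknown storage type: {value!r}")
    return local
```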
> 
> > 2) I like the spirit behind your proposed unit field, but I feel the same
> > about it as I do about connection URLs: I don't believe people will
> > understand it well enough to use it. Typing in
> > "http://ecoinformatics.org/unitDictionary?" for every entry seems a bit
> > awkward and unnecessary. Just like with connection URLs, users will require
> > both a wizard processor to help them construct it and processing code to
> > interpret it, and if the dictionary isn't shipped with EML, then you've now
> > created a dependency on a web address that we can't guarantee will always be
> > there. What I thought was going to come out of the Sevilleta discussion was
> > an element that was either free choice or an enumeration based on a list of
> > stmml type names that are taken from this directory.
> do you really think that someone who doesn't know EML well is going to
> sit down with a text editor and fill out these fields by hand?  I
> seriously don't.  I wouldn't even do that.  Like I said in the previous
> note on this bug, this would be filled out by some application with a
> hash of normal units to URIs.  the user should never need to know that
> the URI exists.  When you sit down with XMLSpy and create a schema, it
> creates URIs just like this.  You never need to know they are there but
> the XMLSchema namespace depends on them.
> 
> I think the first part of the URI is necessary so that other unit
> dictionaries can be used if need be.  
> 
> I would argue that the dictionary should be shipped with EML so that it
> can be parsed and used by any application that uses EML. Also note
> that the URI is not a true active URI. You are not actually linking to
> a live web page where the dictionary exists; it is merely a way to
> create a unique identifier for the dialect that you are using.
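A minimal sketch of the application-side hash of units to URIs, in Python. The base http://ecoinformatics.org/unitDictionary? comes from this thread; appending the unit name after the "?" and the dictionary names such as meter are assumptions for illustration. Because the URI is only an identifier, the sketch parses it as a string and never fetches it, and a different base would select a different dictionary.

```python
from typing import Tuple

# Base URI quoted in the thread; appending a unit name after "?" is an
# assumption about how entries were meant to be formed.
EML_UNIT_DICTIONARY = "http://ecoinformatics.org/unitDictionary?"

# Hypothetical "hash of normal units to URIs" kept by an entry tool.
# The dictionary names on the right are illustrative, not real stmml ids.
UNIT_URIS = {
    "meters": EML_UNIT_DICTIONARY + "meter",
    "cubic feet per second": EML_UNIT_DICTIONARY + "cubicFeetPerSecond",
}


def unit_uri(label: str) -> str:
    """Map the label a user picks to the identifier written into EML."""
    return UNIT_URIS[label]


def split_unit_uri(uri: str) -> Tuple[str, str]:
    """Split an identifier into (dictionary, unit name).

    The URI is only parsed as a string; nothing is fetched from the network.
    """
    dictionary, sep, name = uri.partition("?")
    if not sep or not name:
        raise ValueError(f"not a unit identifier: {uri!r}")
    return dictionary + sep, name
```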
> 
> The problem with having this filled in free form is that you can't keep
> people from filling in all sorts of different things for the units,
> like meters/s2 or m/s2 or met/sec^2. All of those mean the same unit, but
> how would an automated system know that? If we want to do any automated
> processing using this metadata, we must have a standard vocabulary for
> describing units. This is fundamental to most if not all automation
> engines. There is also no way to successfully integrate two datasets if
> the units cannot be compared.
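To illustrate the comparison problem, here is a hypothetical alias table that maps the three spellings above to one controlled term. The canonical name meterPerSecondSquared is invented for the example; any spelling missing from the table simply cannot be matched, which is the argument for a standard vocabulary.

```python
from typing import Optional

# Hypothetical alias table; "meterPerSecondSquared" is an illustrative
# canonical name, not a term taken from a published unit dictionary.
UNIT_ALIASES = {
    "meters/s2": "meterPerSecondSquared",
    "m/s2": "meterPerSecondSquared",
    "met/sec^2": "meterPerSecondSquared",
}


def canonical_unit(text: str) -> Optional[str]:
    """Return the controlled-vocabulary term for a free-form unit, if known."""
    return UNIT_ALIASES.get(text.strip())


def comparable(a: str, b: str) -> bool:
    """Two unit strings are comparable only if both map to the same term."""
    ca, cb = canonical_unit(a), canonical_unit(b)
    return ca is not None and ca == cb
```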
> 
> > 
> > 
> > The problem seems similar to me to the spatial reference module, in which
> > projections that people frequently refer to by a name ("UTM zone 12") actually
> > require a fairly complex set of terms and references to standard algorithms. In
> > eml-spatialReference, these are encoded as complex types that define which
> > parameters need to be filled in. I understand that part of what you want to do
> > is allow a syntax for people to build their own data types using the
> > established ontology, but I think they will not respond well to the URI model
> > for doing that. I'm afraid they will simply not fill it in, whereas they would
> > if they could either type in the name they use or pick it from a controlled list.
> > 
> > Related to this section, I have a question regarding some fields from ISO
> > that I am trying to eliminate on the grounds that we cover them elsewhere. For
> > raster cells, ISO defines cellattributedescription, cellvalueunits,
> > tonegradation, scalefactor and offset. The first, and perhaps all of these, are
> > covered in eml-attribute.xsd. cellvalueunits has a list of codes that I think
> > could be identified as an externalcodeset domain if that is brought back (see
> > below). Tone gradation is the number of colors (64 colors, 256 colors, etc.). I
> > think this could be gotten from storageType, but maybe we need something in
> > enumeratedDomain for numberOfUniqueValues?
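One way to see why storageType alone may not be enough for tone gradation: the bit width of an integer type gives only an upper bound on the number of distinct cell values. The sketch below uses the XML Schema integer type widths; the function name is hypothetical.

```python
# Bit widths of XML Schema integer types, as fixed by XML Schema Part 2.
BIT_WIDTH = {
    "byte": 8, "unsignedByte": 8,
    "short": 16, "unsignedShort": 16,
    "int": 32, "unsignedInt": 32,
}


def max_unique_values(storage_type: str) -> int:
    """Upper bound on distinct values a cell of this storage type can hold."""
    return 2 ** BIT_WIDTH[storage_type]
```

A 256-color image stored as unsignedByte matches the bound, but a 64-color image in the same type also reports 256, so a separate count such as the proposed numberOfUniqueValues would be needed to record it.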
> 
> I'm not sure that I know what you mean.  I think you want to map these
> ISO fields to eml fields, right?  If you do, I don't know if this
> information belongs in attribute.  Is
> rasterImage/cellattributedescription semantically equivalent to
> attribute/attributeDescription?  What do you propose as a map of ISO
> fields -> eml-attribute fields?
> 
> > Scale factor and offset are for any scale multipliers or delta constants
> > that have been applied to the values. I think they mean transformations done
> > to allow expression of values that are either larger or have a broader range
> > than can be accommodated by the storage type used (i.e., using a byte data
> > type for annual accumulation in thousands of inches). Does stmml have a way
> > of dealing with this, or do we need to leave these in?
> > 
> I don't believe that stmml handles this.  See if any of the examples in
> http://www.xml-cml.org/stmml look right.
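For reference, scale factor and offset are commonly applied as physical = stored * scale_factor + offset; that convention is assumed here rather than taken from the ISO text. The sketch packs a value into an unsigned byte and recovers it, using a scale factor of 1000 in the spirit of the "thousands of inches" example above. Function names are hypothetical.

```python
def decode(stored: int, scale_factor: float = 1.0, offset: float = 0.0) -> float:
    """Recover the physical value: stored * scale_factor + offset (assumed convention)."""
    return stored * scale_factor + offset


def encode_byte(value: float, scale_factor: float, offset: float = 0.0) -> int:
    """Pack a physical value into an unsigned byte (0..255) by inverting decode()."""
    stored = round((value - offset) / scale_factor)
    if not 0 <= stored <= 255:
        raise ValueError(f"{value} does not fit in a byte with this scaling")
    return stored


# 12000 inches stored as 12 with a scale factor of 1000.
print(encode_byte(12000.0, 1000.0))
```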
> > 
> > 3) The sequence portion of enumeratedDomain needs to repeat so that multiple
> > codes can be entered. I don't think each code needs a separate source, but
> > that's a minor point. My recollection from Sevilleta was that we were going to
> > let enumeratedDomain include a choice between providing a value list,
> > providing a reference to an external codeset (codeSetName,
> > codeSetURI?, codeSetCitation?), or a reference to an entity within the dataset
> > whose data define the domain (entity, codeAttribute, codeDefinitionAttribute,
> 
> agreed.  the original note from the sev meeting was:
> 8) move "textDomain" and "enumeratedDomain" up so that they are siblings
> of numeric domain, remove the choice
> 
> it looks like the only thing that didn't happen was to remove the
> choice.
> 
> chad
> 
> 
-- 
-------------------------------
Chad Berkley
National Center for Ecological
Analysis and Synthesis (NCEAS)
735 State St. Ste. 204
Santa Barbara, CA 93101
805-892-2530
berkley at nceas.ucsb.edu
-------------------------------

_______________________________________________
eml-dev mailing list
eml-dev at ecoinformatics.org
http://www.ecoinformatics.org/mailman/listinfo/eml-dev

