Granularity of Reusable Content
Scott Chapal
scott.chapal at jonesctr.org
Tue Aug 6 14:11:51 PDT 2002
Matt,
I've only just started thinking about this, but since you invited
comments, here goes.
Matt Jones <jones at nceas.ucsb.edu> writes:
> What we decided upon was to carefully evaluate the eml modules and
> decide which components should be referencable. The guideline was:
> only make it referenceable if there is some identity we wanted to
> maintain (e.g., a person/party), or if there is some substantial
> savings in terms of space/effort by not replicating an entire subtree.
Space is hard to optimize in XML, and optimizing it is somewhat
unnecessary. Effort is a different, more expensive consideration.
> Single elements (e.g., organizationName) that contain only a simple
> string generally did not fit either of these criteria.
Considering effort, how much effort is required to update metadata if
a person moves? If his phone number changes? How about if the institution
changes its name? If someone's personal URL goes away and needs to be
replaced by a departmental or institutional URL? etc. etc. etc.
I can envision nightmarish maintenance headaches.
OTOH, maybe these could be handled by some automated filter [XSLT] and
their expression in the EML instance document would remain harmlessly
redundant?
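
A rough sketch of that automated-filter idea, written in Python with the
standard xml.etree.ElementTree module rather than XSLT. The sample document,
the function name replace_text, and the values in it are made up for
illustration; only the element names organizationName and phone come from
this thread. It rewrites every redundant copy of a value in one pass and
leaves the rest of the instance alone.

```python
import xml.etree.ElementTree as ET


def replace_text(xml_text, tag, old, new):
    """Replace the text of every <tag> element whose text equals `old`.

    Returns the updated XML as a string and the number of elements changed.
    """
    root = ET.fromstring(xml_text)
    changed = 0
    for elem in root.iter(tag):
        if elem.text is not None and elem.text.strip() == old:
            elem.text = new
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


# Hypothetical instance fragment with the same values repeated.
SAMPLE = """<dataset>
  <creator><organizationName>Old Institute</organizationName><phone>555-0100</phone></creator>
  <contact><organizationName>Old Institute</organizationName><phone>555-0100</phone></contact>
</dataset>"""

updated, count = replace_text(
    SAMPLE, "organizationName", "Old Institute", "New Institute"
)
print(count)
print(updated)
```

The weak point is that the filter matches on the string itself, so it cannot
tell two different organizations that happen to share a name apart. That is
the identity question the referencing mechanism is meant to answer.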
Which would be better/easier in Morpho? Should automated updates to
EML and interactive editing via Morpho be able to coexist peacefully?
> So, given that using a references element won't save space for
> single strings, and that it makes it much more complex to validate,
> parse, and interpret the xml document because you have to expand the
> references, do you still think it is worthwhile to make additional,
> finer-grained elements referencable such as <phone>, <onlineUrl>,
> and <organizationName>?
I'm not sure how to determine optimal granularity.
But if I think of it in data-like terms, I'd rather have high
resolution, high frequency data. I can always summarize, subset or
otherwise dumb-down the information for various purposes.
But I can't recreate a dataset from a statistical summary.
Might not this be analogous? In other words, if you put the
complexity in the Schema, then you can let the EML instance take
advantage of it (e.g. for automation of updates) if that is desirable.
If not, then those verbose, repetitive elements and sub-trees remain
legal in the instance.
A matter of degree, I'm sure.
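
To make the cost Matt mentions concrete, here is a minimal sketch of the
reference-expansion step a consumer would need before it could treat a
document as fully spelled out. It assumes a convention in which an element
holding a single <references>ID</references> child stands for a copy of the
element whose id attribute is ID; the sample document, the id value, and the
function name expand_references are hypothetical, and this is not a
description of how Morpho or any EML parser does it.

```python
import copy
import xml.etree.ElementTree as ET


def expand_references(root):
    """Expand <references>ID</references> children in place.

    Each such child is removed and replaced by deep copies of the children
    of the element whose id attribute equals ID (assumed convention).
    Raises KeyError if an ID does not resolve.
    """
    by_id = {e.get("id"): e for e in root.iter() if e.get("id") is not None}
    # Snapshot the element list first, since the tree is modified below.
    for elem in list(root.iter()):
        ref = elem.find("references")
        if ref is None:
            continue
        target = by_id[ref.text.strip()]
        elem.remove(ref)
        for child in target:
            elem.append(copy.deepcopy(child))
    return root


# Hypothetical instance: the contact reuses the creator by reference.
SAMPLE = """<dataset>
  <creator id="p1"><organizationName>Some Institute</organizationName><phone>555-0100</phone></creator>
  <contact><references>p1</references></contact>
</dataset>"""

root = expand_references(ET.fromstring(SAMPLE))
print(ET.tostring(root, encoding="unicode"))
```

With references, a phone change is a single edit at the element carrying the
id. The price is that every tool has to run something like this first, and a
real version would also have to deal with chained or dangling references,
which this sketch does not.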
> I can't see it at all for phone or onlineUrl, although I can see a
> bit of an argument for maintaining identity for "organizationName",
> but at this point it doesn't seem worth the additional complexity.
Isn't there a trade-off/balance between the complexity (thoroughness?)
of the schema and the 'effort' necessary to create and maintain the
metadata?
As you can see, I'm better at questions than I am at answers.
-Scott
--
Scott E. Chapal_________________________________________________
Database & Network Manager scott.chapal at jonesctr.org
J.W. Jones Ecological Research Center 229.734.4706 x227
Rt. 2. Box. 2324. Newton, GA 31770-9651 229.734.6650 :FAX