[eml-dev] xml:lang attribute for title in EML 2.1.0

Thu Sep 16 13:46:16 PDT 2010

David, Congrats on the ILTER election - you bring quality experience there.

As for objections, I just wouldn't know what to object to :) .. I don't
see specifics.  The quoted text below is the language I see closest
to a specific solution to this issue.

[...]
I proposed a solution for allowing mixed element content for fields that should be internationalized.

[...]
-EML 2.1.1 would be a more relaxed schema than the current EML 2.1.0
	-we could augment existing EML-specific parsers to perform additional checks on the mixed content after schema-based validation was performed.
		-Metacat already includes [...]
		-The EML project has a utility parser that [...]

It sounds good to me, more details welcomed.
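
For what it is worth, here is a minimal sketch (Python, standard library only) of the kind of additional check on mixed content that could run after schema-based validation. It is not the Metacat or EML utility parser code. The `value` child element used for translations, the `check_mixed_content` name, and the rule that the default-language text must be non-empty are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is always bound to this namespace, so ElementTree
# exposes xml:lang under this qualified attribute name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def check_mixed_content(xml_text, field="title", translation_tag="value"):
    """Return a list of problems found in mixed-content fields.

    Assumption: translations are child elements (hypothetically named
    `value`) that carry xml:lang, and the text directly inside the field
    is the default-language string, which must not be empty.
    """
    problems = []
    root = ET.fromstring(xml_text)
    for elem in root.iter(field):
        # Direct text of a mixed-content element is its own text plus
        # the tail text that follows each child element.
        own_text = (elem.text or "") + "".join(c.tail or "" for c in elem)
        if not own_text.strip():
            problems.append(f"<{field}> has no default-language text")
        for child in elem.findall(translation_tag):
            if not child.get(XML_LANG):
                problems.append(f"<{translation_tag}> is missing xml:lang")
            if not (child.text or "").strip():
                problems.append(f"<{translation_tag}> is empty")
    return problems
```

Run against a document that has already passed schema validation, an empty list means nothing was flagged. This would restore, outside the schema, the non-empty rule that mixed content gives up.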

Inigo

On 9/16/2010 2:19 PM, David Blankman wrote:
> Matt and EML-Dev,
>
> I think that Ben's solution should be pushed forward. At the ILTER 
> level we are starting to push strongly for EML documents from ILTER 
> member networks. Having a solution that allows for multiple languages 
> in a single document is certainly preferable to two EML documents for 
> the same dataset.
>
> By the way, at the ILTER meeting earlier this month, I got elected to 
> be the new Chair of the ILTER IM Committee. Kristin has been made 
> co-chair of the US-ILTER committee. One of her tasks will be to 
> encourage US LTER researchers to start pursuing global synthetic 
> research. Having a clear approach to handling EML will help to move 
> this forward.
>
> David
> ———————————————————
> Everything is possible with a chocolate cookie!
>   - Rabbi Herbie of Jerusalem
>
> If I am not for myself, then who will be for me? If I am for myself 
> alone, then who am I? If not now, when?
>  - Rabbi Hillel
>
>
> On Thu, Sep 16, 2010 at 9:56 PM, Matt Jones <jones at nceas.ucsb.edu> wrote:
>
>     The solution that Ben proposed is meant to address the
>     requirements that arose from the ILTER Lake Taihu meeting for
>     providing core metadata in multiple languages.  These
>     recommendations then were also at the core of the recommendations
>     made to GBIF about which fields should contain English
>     translations, but the set of fields differs slightly in the two
>     recommendations.  Because many of these fields are not currently
>     repeatable according to the EML 2.1 schema, we would need to, at a
>     minimum, change cardinality rules to allow for each field to be
>     included multiple times if the xml:lang attribute were used to
>     differentiate them (or for the approach Inigo points to).  As Ben
>     points out, it would still be ambiguous as to whether the
>     repeating fields represent different information, or the same
>     information translated.  So his proposal is meant to explicitly
>     flag translations as such within mixed content string fields, with
>     the goal of doing so without breaking existing EML 2.1
>     compatibility and without having to change existing cardinality
>     rules.
>
>     Ben's prior discussion on this highlighted the conflict with the
>     NonEmptyString type that was introduced in EML 2.1, in that mixed
>     content elements would not be validated and so the rules for
>     NonEmptyString would not be enforced.  I think this would only be
>     a small issue, and that the advantages in compatibility provided
>     by using a mixed content model for language translations outweigh
>     the loss of validation within our string types.  Either way, we
>     would need to add the xml:lang attribute so that it can be used
>     throughout EML, including in the translation elements that Ben
>     proposed.
>
>     Are there any objections to moving forward with the schema changes
>     to use a mixed content model for translations that Ben proposed
>     in his earlier emails?
>
>     Matt
>
>     On Thu, Sep 16, 2010 at 11:35 AM, Inigo San Gil
>     <isangil at canyon.lternet.edu> wrote:
>
>
>         We'll keep our eyes on the ball, then.
>
>         Meanwhile others have adopted their own solution.
>         Here are two examples:
>         1) a site from Spain reports this implementation
>
>         <title>[Language:En]Snow cover data provided by MODIS
>         satellite imagery</title>
>         <title>[Language:Sp]Datos de innivaci&#243;n seg&#250;n
>         im&#225;genes MODIS</title>
>
>         We thought that the use of the XML attribute "lang=en | sp"
>         was interesting, but, among other problems, we would have
>         gotten screwed by eml-dev's eventual internationalization
>         implementation.  Call it luck, but you can bet the "eventual
>         eml-dev decision" would force us to re-code the EML
>         generation.
>
>         2) From Taiwan, it is also a mix and match.  I had the
>         internationalization conversation years ago, when we set up
>         harvesting into the NBII clearinghouse.  At the TFRI, we
>         found EML documents that have a hybrid of English and
>         Chinese, with no sign whatsoever of the language used.
>         We had to devise a mechanism to detect language.  We
>         simply did not harvest those docs whose critical content
>         was not translated into English.
>
>         ILTER discussed (two years ago?) some guidelines on
>         how the different countries were going to deal with the
>         Tower of Babel problem.  Maybe you can look into those
>         if you feel curious, but if I recall correctly, it went along
>         the lines of encoding the metadata in the native language
>         and producing some discovery-level EML in English. This
>         strategy would create two EML documents per dataset.
>
>         Sparks or not, I still have to recommend that EML users
>         implement some solution. I'm inclined to suggest that
>         such a solution 1) does not break the current EML rules and
>         2) allows for easy language detection.
>         Spain's case fits here, for example.
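
To illustrate the "easy language detection" point, here is a minimal sketch (Python) of reading the bracket prefix used in the example from Spain. The `split_language_prefix` name and the regular expression are hypothetical; only the `[Language:En]` / `[Language:Sp]` prefix form comes from the example above.

```python
import re

# Matches a leading "[Language:XX]" marker and captures the rest of the
# title. DOTALL lets the title text span more than one line.
PREFIX = re.compile(r"^\[Language:(?P<lang>[A-Za-z]+)\]\s*(?P<text>.*)$", re.DOTALL)


def split_language_prefix(title):
    """Split a title into (language tag, text).

    Returns (None, title) when no bracket prefix is present, so callers
    can tell untagged titles apart from tagged ones.
    """
    stripped = title.strip()
    match = PREFIX.match(stripped)
    if match is None:
        return None, stripped
    return match.group("lang").lower(), match.group("text")
```

A harvester could use the returned tag to decide which title to index. Titles without the prefix come back with `None`, which is the case that required guessing the language.
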
>
>         Cheers, inigo
>
>
>
>
>
>         On 9/16/2010 10:56 AM, ben leinfelder wrote:
>
>             Hi Markus,
>             I'm afraid your findings are accurate with respect to the
>             xml:lang attribute in the <title> element (or any
>             "NonEmptyStringType" element).
>             In the course of my experimentation with allowing
>             backwards-compatible internationalization with a new EML
>             version (2.1.1) I did have to include the
>             "http://www.w3.org/XML/1998/namespace" namespace just as
>             you did and also declare the xml:lang attribute in
>             elements where I wanted to employ it.
>             While certain EML elements are repeatable, it's not always
>             clear what the presence of multiple elements represents
>             (are they translations in different languages or are they
>             alternate titles?). In order to clarify this confusion and
>             also allow multiple translations for non-repeatable
>             elements I proposed a solution for allowing mixed element
>             content for fields that should be internationalized.
>             There's a fairly comprehensive discussion of this approach
>             in our eml-dev archives:
>             http://mercury.nceas.ucsb.edu/ecoinformatics/pipermail/eml-dev/2010-July/001828.html
>             I didn't get a lot of decisive feedback and so have not
>             moved forward with releasing an updated EML version.
>             Hopefully this thread will again set the ball rolling.
>             -ben
>
>
>         _______________________________________________
>         Eml-dev mailing list
>         Eml-dev at ecoinformatics.org
>         http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/eml-dev
