[eml-dev] xml:lang attribute for title in EML 2.1.0

Thu Sep 16 12:56:43 PDT 2010

The solution that Ben proposed is meant to address the requirements that
arose from the iLTER Lake Taihu meeting for providing core metadata in
multiple languages.  Those requirements were also at the core of the
recommendations made to GBIF about which fields should contain English
translations, although the set of fields differs slightly between the two.
Because many of these fields are not currently repeatable according to the
EML 2.1 schema, we would need to, at a minimum, change the cardinality
rules to allow each field to be included multiple times if the xml:lang
attribute were used to differentiate them (the same holds for the approach
Inigo points to).  As Ben points out, it would still be ambiguous as to whether
the repeating fields represent different information, or the same
information translated.  So his proposal is meant to explicitly flag
translations as such within mixed content string fields, with the goal of
doing so without breaking existing EML 2.1 compatibility and without having
to change existing cardinality rules.
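
For illustration only, here is a rough sketch of what a translated title
might look like under a mixed content model.  The <value> child element name
is an assumption made for this sketch (Ben's earlier emails have the actual
proposal), and the title text is borrowed from the Spanish example quoted
below.  Note that xml:lang takes a standard language code, so Spanish is
"es" rather than "sp".

```xml
<!-- Sketch only: the <value> element name is assumed, not taken from this thread. -->
<title>
  Snow cover data provided by MODIS satellite imagery
  <value xml:lang="es">Datos de innivaci&#243;n seg&#250;n im&#225;genes MODIS</value>
</title>
```

The element still occurs only once, so no cardinality rule changes, and a
title that contains nothing but plain text remains valid because the child
elements are optional.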

Ben's prior discussion on this highlighted the conflict with the
NonEmptyString type that was introduced in EML 2.1, in that mixed content
elements would not be validated and so the rules for NonEmptyString would
not be enforced.  I think this would only be a small issue, and that the
advantages in compatibility provided by using a mixed content model for
language translations outweigh the loss of validation within our string
types.  Either way, we would need to add the xml:lang attribute so that it
can be used throughout EML, including in the translation elements that Ben
proposed.
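
To make that concrete, a schema fragment along the following lines could
declare xml:lang on a mixed content type.  This is a sketch under stated
assumptions, not the actual EML change: the type name TranslatableStringType
and the <value> element are made up for illustration, and the schemaLocation
points at the xml.xsd published by the W3C.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Import the XML namespace so that xml:lang can be referenced. -->
  <xs:import namespace="http://www.w3.org/XML/1998/namespace"
             schemaLocation="http://www.w3.org/2001/xml.xsd"/>

  <!-- Hypothetical type: free text plus optional translation children. -->
  <xs:complexType name="TranslatableStringType" mixed="true">
    <xs:sequence>
      <!-- Hypothetical element holding one translation of the parent text. -->
      <xs:element name="value" minOccurs="0" maxOccurs="unbounded">
        <xs:complexType>
          <xs:simpleContent>
            <xs:extension base="xs:string">
              <!-- Each translation must state its language. -->
              <xs:attribute ref="xml:lang" use="required"/>
            </xs:extension>
          </xs:simpleContent>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
    <!-- Optional language of the untranslated text itself. -->
    <xs:attribute ref="xml:lang"/>
  </xs:complexType>

</xs:schema>
```

The mixed="true" setting is where the NonEmptyString check is lost: the text
around the child elements is not governed by a simple type, so a pattern or
minLength facet cannot be applied to it.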

Are there any objections to moving forward with the schema changes to use the
mixed content model for translations that Ben proposed in his earlier
emails?

Matt

On Thu, Sep 16, 2010 at 11:35 AM, Inigo San Gil
<isangil at canyon.lternet.edu> wrote:

>
> We'll keep our eyes on the ball, then.
>
> Meanwhile others have adopted their own solution.
> Here are two examples:
> 1) a site from Spain reports this implementation
>
> <title>[Language:En]Snow cover data provided by MODIS satellite
> imagery</title>
> <title>[Language:Sp]Datos de innivaci&#243;n seg&#250;n im&#225;genes
> MODIS</title>
>
> We thought that using the XML attribute "lang=en | sp"
> was interesting, but, among other problems, we would have
> gotten screwed by eml-dev's eventual internationalization
> implementation.  Call it luck, but you can bet the "eventual
> eml-dev decision" would force us to re-code the EML
> generation.
>
> 2) From Taiwan, it is also a mix and match.  I had the
> internationalization conversation years ago, when we set up
> harvesting into the NBII clearinghouse.  At the TFRI, we
> found EML documents that are a hybrid of English and
> Chinese, with no indication whatsoever of the language used.
> We had to devise a mechanism to detect the language.  We
> simply did not harvest those docs whose critical content
> was not translated into English.
>
> ILTER discussed (two years ago?) some guidelines on
> how the different countries were going to deal with the
> Tower of Babel problem.  Maybe you can look into those
> if you feel curious, but if I recall correctly, it went along
> the lines of encoding the metadata in the native language
> and producing some discovery-level EML in English.  This
> strategy would create two EMLs per EML.
>
> Sparks or not, I still have to recommend that EML users
> implement some solution.  I'm inclined to suggest that
> such a solution 1) should not break the current EML rules,
> and 2) should allow for easy language detection.
> Spain's case fits here, for example.
>
> Cheers, inigo
>
> On 9/16/2010 10:56 AM, ben leinfelder wrote:
>
>> Hi Markus,
>> I'm afraid your findings are accurate with respect to the xml:lang
>> attribute in the <title> element (or any "NonEmptyStringType" element).
>> In the course of my experimentation with allowing backwards-compatible
>> internationalization with a new EML version (2.1.1) I did have to include
>> the "http://www.w3.org/XML/1998/namespace" namespace just as you did and
>> also declare the xml:lang attribute in elements where I wanted to employ it.
>> While certain EML elements are repeatable, it's not always clear what the
>> presence of multiple elements represents (are they translations in different
>> languages or are they alternate titles?). In order to clarify this confusion
>> and also allow multiple translations for non-repeatable elements I proposed
>> a solution for allowing mixed element content for fields that should be
>> internationalized. There's a fairly comprehensive discussion of this
>> approach in our eml-dev archives:
>> http://mercury.nceas.ucsb.edu/ecoinformatics/pipermail/eml-dev/2010-July/001828.html
>> I didn't get a lot of decisive feedback and so have not moved forward with
>> releasing an updated EML version. Hopefully this thread will again set the
>> ball rolling.
>> -ben
>>
>
> _______________________________________________
> Eml-dev mailing list
> Eml-dev at ecoinformatics.org
> http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/eml-dev
>