[eml-dev] xml:lang attribute for title in EML 2.1.0

Thu Sep 16 13:19:23 PDT 2010

Matt and EML-Dev,

I think that Ben's solution should be pushed forward. At the ILTER level we
are starting to push strongly for EML documents from ILTER member networks.
Having a solution that allows for multiple languages in a single document is
certainly preferable to two EML documents for the same dataset.

By the way, at the ILTER meeting earlier this month, I got elected to be the
new Chair of the ILTER IM Committee. Kristin has been made co-chair of the
US-ILTER committee. One of her tasks will be to encourage US LTER
researchers to start pursuing global synthetic research. Having a clear
approach to handling EML will help to move this forward.

David
———————————————————
Everything is possible with a chocolate cookie!
  - Rabbi Herbie of Jerusalem

If I am not for myself, then who will be for me? If I am for myself alone,
then who am I? If not now, when?
 - Rabbi Hillel

On Thu, Sep 16, 2010 at 9:56 PM, Matt Jones <jones at nceas.ucsb.edu> wrote:

> The solution that Ben proposed is meant to address the requirements that
> arose from the iLTER Lake Taihu meeting for providing core metadata in
> multiple languages.  These recommendations then were also at the core of the
> recommendations made to GBIF about which fields should contain English
> translations, but the set of fields differs slightly in the two
> recommendations.  Because many of these fields are not currently repeatable
> according to the EML 2.1 schema, we would need to, at a minimum, change
> cardinality rules to allow for each field to be included multiple times if
> the xml:lang attribute were used to differentiate them (or for the approach Inigo
> points to).  As Ben points out, it would still be ambiguous as to whether
> the repeating fields represent different information, or the same
> information translated.  So his proposal is meant to explicitly flag
> translations as such within mixed content string fields, with the goal of
> doing so without breaking existing EML 2.1 compatibility and without having
> to change existing cardinality rules.
>
> Ben's prior discussion on this highlighted the conflict with the
> NonEmptyString type that was introduced in EML 2.1, in that mixed content
> elements would not be validated and so the rules for NonEmptyString would
> not be enforced.  I think this would only be a small issue, and that the
> advantages in compatibility provided by using a mixed content model for
> language translations outweigh the loss of validation within our string
> types.  Either way, we would need to add the xml:lang attribute so that it
> can be used throughout EML, including in the translation elements that Ben
> proposed.
>
> Are there any objections to moving forward with the schema changes to use a
> mixed content model for translations that Ben proposed in his earlier
> emails?
>
> Matt
>
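
As a rough illustration of the mixed-content idea Matt describes above, the
Python sketch below reads a title whose base text carries inline
translations. The child element name `value`, the default language `en`, and
the helper name `title_translations` are assumptions made for illustration;
the thread does not specify them. The sample titles are borrowed from the
Spanish example quoted further down.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predeclared "xml:" prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def title_translations(title_xml, default_lang="en"):
    """Return {language: text} for a mixed-content title element.

    The element's own leading text is treated as the base title; each child
    that carries an xml:lang attribute is treated as a translation of it.
    """
    title = ET.fromstring(title_xml)
    result = {}
    base = (title.text or "").strip()
    if base:
        # Assumption: untagged base text is in default_lang.
        result[title.get(XML_LANG, default_lang)] = base
    for child in title:
        lang = child.get(XML_LANG)
        text = (child.text or "").strip()
        if lang and text:
            result[lang] = text
    return result


# Hypothetical document fragment; "value" is an illustrative element name.
SAMPLE = (
    "<title>Snow cover data provided by MODIS satellite imagery"
    '<value xml:lang="es">Datos de innivaci&#243;n seg&#250;n '
    "im&#225;genes MODIS</value>"
    "</title>"
)

if __name__ == "__main__":
    for lang, text in title_translations(SAMPLE).items():
        print(lang, text)
```

Because the translation sits inside the single existing title element, a
reader that ignores the child elements still sees one title, which is the
compatibility property the proposal is after.
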
> On Thu, Sep 16, 2010 at 11:35 AM, Inigo San Gil <
> isangil at canyon.lternet.edu> wrote:
>
>>
>> We'll keep our eyes on the ball, then.
>>
>> Meanwhile others have adopted their own solution.
>> Here are two examples:
>> 1) a site from Spain reports this implementation
>>
>> <title>[Language:En]Snow cover data provided by MODIS satellite
>> imagery</title>
>> <title>[Language:Sp]Datos de innivaci&#243;n seg&#250;n im&#225;genes
>> MODIS</title>
>>
>> We thought that the use of the XML attribute "lang=en | sp"
>> was interesting, but, among other problems, we would have
>> gotten screwed by eml-dev's eventual internationalization
>> implementation.  Call it luck, but you can bet the "eventual
>> eml-dev decision" would force us to re-code the EML
>> generation.
>>
>> 2) From Taiwan, it is also a mix and match.  I had the
>> internationalization conversation years ago, when we set up
>> harvesting into the NBII clearinghouse.  At the TFRI, we
>> found EML documents that have a hybrid of English and
>> Chinese, with no indication whatsoever of the language used.
>> We had to devise a mechanism to detect language.  We
>> simply did not harvest those docs whose critical content
>> was not translated into English.
>>
>> ILTER discussed (two years ago?) some guidelines on
>> how the different countries were going to deal with the
>> Tower of Babel problem.  Maybe you can look into those
>> if you are curious, but if I recall correctly, it went along
>> the lines of encoding the metadata in the native language
>> and producing some discovery-level EML in English.  This
>> strategy would create two EML documents per dataset.
>>
>> Sparks or not, I still have to recommend that EML users
>> implement some solution.  I'm inclined to suggest that
>> such a solution 1) should not break the current EML rules, and
>> 2) should allow for easy language detection.
>> Spain's case fits here, for example.
>>
>> Cheers, inigo
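
The bracketed-prefix convention in the Spanish example above makes language
detection a simple string match. A minimal Python sketch, assuming the prefix
always has the form `[Language:xx]` at the very start of the title; the
function name `split_language_prefix` and the regular expression are
illustrative only, not part of any EML tooling:

```python
import re

# Matches a leading "[Language:xx]" marker and captures the tag and the rest.
LANG_PREFIX = re.compile(r"^\s*\[Language:([A-Za-z]{2,3})\]\s*(.*)$", re.DOTALL)


def split_language_prefix(text):
    """Split a title into (language tag, title text).

    Returns (None, text) when no "[Language:xx]" marker is present.
    Internal line breaks and runs of whitespace are collapsed to one space.
    """
    match = LANG_PREFIX.match(text)
    if match is None:
        return None, " ".join(text.split())
    return match.group(1).lower(), " ".join(match.group(2).split())


if __name__ == "__main__":
    title = "[Language:En]Snow cover data provided by MODIS satellite\nimagery"
    print(split_language_prefix(title))
```

The tag is returned as written (lowercased), so "Sp" comes back as "sp"
rather than being mapped to an ISO 639 code such as "es".
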
>>
>> On 9/16/2010 10:56 AM, ben leinfelder wrote:
>>
>>> Hi Markus,
>>> I'm afraid your findings are accurate with respect to the xml:lang
>>> attribute in the <title> element (or any "NonEmptyStringType" element).
>>> In the course of my experimentation with allowing backwards-compatible
>>> internationalization with a new EML version (2.1.1) I did have to include
>>> the "http://www.w3.org/XML/1998/namespace" namespace just as you did and
>>> also declare the xml:lang attribute in elements where I wanted to employ it.
>>> While certain EML elements are repeatable, it's not always clear what the
>>> presence of multiple elements represents (are they translations in different
>>> languages or are they alternate titles?). In order to clarify this confusion
>>> and also allow multiple translations for non-repeatable elements I proposed
>>> a solution for allowing mixed element content for fields that should be
>>> internationalized. There's a fairly comprehensive discussion of this
>>> approach in our eml-dev archives:
>>> http://mercury.nceas.ucsb.edu/ecoinformatics/pipermail/eml-dev/2010-July/001828.html
>>> I didn't get a lot of decisive feedback and so have not moved forward
>>> with releasing an updated EML version. Hopefully this thread will again set
>>> the ball rolling.
>>> -ben
>>>
>>
>> _______________________________________________
>> Eml-dev mailing list
>> Eml-dev at ecoinformatics.org
>> http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/eml-dev
>>
>
>
>