[eml-dev] eml globalization

Fri Jun 25 12:45:26 PDT 2010

Matt,

Thank you for clarifying the issue.

From an ILTER perspective, I am excited to hear that you are willing to
commit resources to modifying Morpho to create multi-lingual documents.

As part of the ILTER information management committee, I am willing to
coordinate efforts to get native-speaker translations. My Java programming
skills are not good enough to make a substantive contribution to Morpho.

As I read your comments, it looks like EML will require modification in
order to adequately accommodate internationalization, and the changes are
not trivial. Because this is so important to ILTER, I am willing to take on a
significant involvement. On the other hand, I know that I am not currently
up to speed on the parser issues.

There will be an ILTER meeting in Israel in August. Chin Chau-Lin from
Taiwan will be there. Chin had also proposed a follow-up meeting to the Lake
Taihu meeting which he said could be hosted in Taiwan.

I would appreciate your suggestions on the process for moving
forward.

David
———————————————————
Everything is possible with a chocolate cookie!
 - Rabbi Herbie of Jerusalem

If I am not for myself, then who will be for me? If I am for myself alone,
then who am I? If not now, when?
- Rabbi Hillel

On Thu, Jun 24, 2010 at 11:28 PM, Matt Jones <jones at nceas.ucsb.edu> wrote:

> Hi --
>
> This is an important issue, and one that I think we should tackle very soon
> for EML as we have a lot of new international groups producing EML in many
> languages.  I was in Brazil 2 weeks ago setting up a Metacat for PELD and
> the issue of supporting multiple languages came up immediately.  We've
> discussed this in the past, and the approach I was thinking of is summarized
> here:
>
> http://bugzilla.ecoinformatics.org/show_bug.cgi?id=585#c4
>
> The alternate solution of producing multiple metadata documents each in a
> different language has the problem of not knowing how to locate a particular
> translation -- I guess it would be done by file naming convention, but this
> is problematic as it is difficult to standardize without a specification.
>
> The three ways I can see doing this are:
> 1) At the element level, allow repeating content in multiple languages
>    -- matches how ISO19115 does it
>    -- this is the proposal in bug
> http://bugzilla.ecoinformatics.org/show_bug.cgi?id=585#c4
> 2) At the document level, allow two or more sections, each in their own
> language
> 3) Multiple documents
>
> Personally, I think the 1st is the most approachable, and allows groups to
> add translated content for a few fields easily.  Marking these with the
> appropriate locale using xml:lang and related attributes would be
> straightforward.  The hard part would be changing the content model of EML
> to allow the repeating fields -- it would be best if we could do this in a
> way that does not invalidate existing EML 2.1 documents, but I'm not sure if
> that is possible.  Also, as attributes in XML cannot repeat, we'd need to
> determine how to best provide translations for attribute content -- we use
> few attributes in EML (mostly for things like packageId), so maybe they
> don't need to be translated at all.
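
A minimal sketch of option 1 from the reading side, assuming <title> could repeat and carry xml:lang. EML 2.1 does not allow this today, so the fragment, the pick_text helper, and the fallback order are all hypothetical. It uses only Python's standard xml.etree.ElementTree. Note that xml:lang values follow BCP 47, so Spanish would be "es" rather than "sp".

```python
import xml.etree.ElementTree as ET

# xml:lang lives in the predefined XML namespace; ElementTree exposes the
# attribute under this expanded name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical fragment: EML 2.1 does not define xml:lang on <title>.
SAMPLE = """<dataset>
  <title xml:lang="en">Snow cover data provided by MODIS satellite imagery</title>
  <title xml:lang="es">Datos de innivación según imágenes MODIS</title>
</dataset>"""


def pick_text(parent, tag, lang, fallback="en"):
    """Return the text of the first <tag> child marked with `lang`,
    then `fallback`, then the first child with no xml:lang at all."""
    by_lang = {}
    for el in parent.findall(tag):
        # setdefault keeps the first element seen for each language.
        by_lang.setdefault(el.get(XML_LANG), (el.text or "").strip())
    for key in (lang, fallback, None):
        if key in by_lang:
            return by_lang[key]
    return None


dataset = ET.fromstring(SAMPLE)
print(pick_text(dataset, "title", "es"))
# No Portuguese title exists, so this falls back to the English one.
print(pick_text(dataset, "title", "pt"))
```

Elements without xml:lang land under the None key, so an existing single-language document would still resolve to its only title.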
>
> Thanks to contributions from our collaborators in Taiwan, we now have a
> localizable version of Morpho, with the UI translated into Chinese,
> Japanese, Spanish, French, and Portuguese.  So the next version of Morpho
> would support the UI in multiple languages when we release it -- we need to
> get native speakers from those languages to help validate and fix the
> translations.  It would be great if we could also add in multi-language
> support for metadata content in that same release. If you're interested in
> seeing this development version of Morpho, contact Ben Leinfelder and he can
> point you in the right direction.
>
> We'd be willing to put some time into i18n for Morpho and EML over the next
> 6 months if others want to help out too.  New releases need not take a long
> time, assuming that people are willing to contribute to making sure the
> changes are broadly acceptable and won't break a lot for existing EML users.
>
> Matt
>
>
> On Thu, Jun 24, 2010 at 12:03 PM, David Blankman <dblankman1 at gmail.com> wrote:
>
>> Inigo,
>>
>> We talked about the possibility of using one document with repeating
>> elements with a language tag, but I think that it creates a document that
>> is confusing. EML is sufficiently complex even in one language. Personally, I
>> do not think that mixing languages is a good idea.
>>
>> I am copying Matt and Eamonn O Tuama (GBIF) on this since both were a part
>> of the meeting in China. They may have different ideas. GBIF, I know, deals
>> with multiple languages on a regular basis.
>>
>> It seems to me that mixing languages creates two problems. For the human
>> reader, it makes the document harder to read. You have more experience with
>> the machine parsing approach than I do, but intuitively it seems to me that
>> it is easier to parse two single language documents than one mixed document,
>> although clearly one can use the language tag to separate the two languages.
>> ILTER is a resource poor organization relying on volunteers. ILTER doesn't
>> have the resources to develop the parsing of a mixed document.
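
For what it is worth, separating the languages of a mixed document need not be a large parsing effort. A rough sketch, assuming translations are marked with xml:lang directly on the repeated elements (not something EML 2.1 permits today), that derives a single-language copy using Python's standard library. The single_language_view helper and the placeholder <creator> text are made up for illustration.

```python
import copy
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def single_language_view(root, lang):
    """Return a deep copy of `root` without the elements whose own xml:lang
    differs from `lang`. Elements with no xml:lang are kept."""
    view = copy.deepcopy(root)
    # Snapshot the elements first so removals do not disturb the traversal.
    for parent in list(view.iter()):
        for child in list(parent):
            tagged = child.get(XML_LANG)
            if tagged is not None and tagged != lang:
                parent.remove(child)
    return view


# Hypothetical mixed-language fragment.
MIXED = """<dataset>
  <title xml:lang="en">Snow cover data provided by MODIS satellite imagery</title>
  <title xml:lang="es">Datos de innivación según imágenes MODIS</title>
  <creator>Example Creator</creator>
</dataset>"""

english = single_language_view(ET.fromstring(MIXED), "en")
print(ET.tostring(english, encoding="unicode"))
```

The sketch only looks at xml:lang set on the element itself; a fuller version would also honor values inherited from ancestors.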
>>
>> Most ILTER users have minimal information management people. There are
>> exceptions: China and Taiwan are the most obvious. But their technical
>> expertise cannot be counted upon by ILTER in general.
>>
>> It also seems to me that generating a mixed document is more difficult.
>> Morpho can be used easily to create two documents. Creating a mixed
>> document, as far as I know, requires either hand editing or the development
>> of a tool specifically for this purpose. Since ILTER does not have the
>> resources to create such a tool, I think the recommendation has to be two
>> separate documents.
>>
>> Kristin, Matt or Eamonn, feel free to comment.
>>
>> David
>>
>>
>>
>>
>> 2010/6/24 Inigo San Gil <isangil at canyon.lternet.edu>
>>
>>>
>>> Thanks David,
>>>
>>> Cool.. as for the actual implementation:
>>>
>>> For example, the title tag can be duplicated, so i can see having this
>>> sort of logic
>>>
>>> <title>[Language:En]Snow cover data provided by MODIS satellite imagery
>>> </title>
>>> <title>[Language:Sp]Datos de innivación según imágenes
>>> MODIS</title>
>>> <creator>(this translation would only apply for non latin
>>> codesets)</creator>
>>> <abstract>
>>>   <para>[Language:En] These data show all the information obtained
>>> through the MODIS satellite imagery about the Snow cover at Sierra
>>> Nevada</para>
>>>   <para>[Language:Sp]Incluye toda la información obtenida de
>>> las imágenes de satélite de MODIS sobre nieve en Sierra
>>> Nevada</para>
>>>   </abstract>
>>> etc...
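
The in-band [Language:xx] marker above works with unmodified EML 2.1, but it pushes the parsing onto every consumer. A small sketch of what that parsing might look like, assuming the marker always leads the element text. The split_marker helper is hypothetical, and the marker syntax is only the one used in this message, not any agreed convention.

```python
import re

# Matches a leading "[Language:xx]" marker and captures the rest of the text.
MARKER = re.compile(r"^\s*\[Language:(\w+)\]\s*(.*)$", re.DOTALL)


def split_marker(text):
    """Return (language, content). Language is None when no marker is found.
    Runs of whitespace, including line breaks, collapse to single spaces."""
    match = MARKER.match(text)
    if match is None:
        return None, " ".join(text.split())
    return match.group(1).lower(), " ".join(match.group(2).split())


print(split_marker("[Language:En]Snow cover data provided by MODIS satellite imagery\n"))
print(split_marker("A title with no marker"))
```

A consumer that does not know the convention would display the marker as part of the title, which is one argument for carrying the language in an attribute instead.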
>>>
>>> An alternative would be to tweak EML to allow for an attribute "lang"
>>> within the EML tags
>>> (this could be painful as it would need to be sanctioned by eml-dev -- a
>>> 2 year wait or more)
>>>
>>> <title lang='en'>Snow cover data provided by MODIS satellite imagery
>>> </title>
>>> <title lang='sp'>Datos de innivación según imágenes
>>> MODIS</title>
>>>
>>> But if I understand it correctly, ILTER suggests two documents
>>> (optionally).
>>> One must be at least "discovery level" in English, and the other a "full
>>> document" in the native tongue.  Is this what we should do?
>>> like "snowcover.xml" (packageId='knb-spainlster-snv-en.0100.1230493704')
>>> and "innivacion.xml" (packageId='knb-spainlster-snv-sp.0100.1230493704')
>>>
>>> (note the different scope in the packageId)
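
A sketch of how the two-document approach could locate a translation from the packageId alone, assuming the language is always the last hyphen-separated part of the scope, as in the two identifiers above. That convention is an assumption drawn from this example, not anything EML specifies, and both helper names are hypothetical.

```python
from collections import defaultdict


def split_package_id(package_id):
    """Split 'scope.identifier.revision' and peel a trailing '-xx' language
    suffix off the scope (an assumed convention, not part of EML)."""
    scope, identifier, revision = package_id.rsplit(".", 2)
    base, _, lang = scope.rpartition("-")
    return base, lang, identifier, revision


def group_translations(package_ids):
    """Map (base scope, identifier) to {language: packageId}."""
    groups = defaultdict(dict)
    for pid in package_ids:
        base, lang, identifier, _ = split_package_id(pid)
        groups[(base, identifier)][lang] = pid
    return dict(groups)


ids = [
    "knb-spainlster-snv-en.0100.1230493704",
    "knb-spainlster-snv-sp.0100.1230493704",
]
translations = group_translations(ids)
print(translations[("knb-spainlster-snv", "0100")]["sp"])
```

Any scope that does not end in a language suffix would have its last segment misread as a language, so the approach only holds if every site follows the same naming rule.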
>>>
>>> i don't know of any specific implementations, all i encountered in dealing
>>> with this monster issue is the Taiwan EML, which does not follow a single
>>> strategy.  maybe i should take a look at the Brazilian or Chilean EML and
>>> such (if they have any..)
>>>
>>> cheers,
>>> Inigo
>>>
>>>
>>>
>>> David Blankman wrote:
>>>
>>>> Hi Inigo,
>>>>
>>>> We discussed this issue in an ILTER workshop in China. This workshop
>>>> produced a recommendation, which the ILTER coordinating committee agreed
>>>> to at the ILTER meeting in Slovakia in 2008. The strategy is to provide,
>>>> at minimum, a basic discovery-level document in English that includes
>>>> title, creator, contact, abstract, and keywords.  A site could then
>>>> produce a full document in the native language. In both cases the
>>>> language tag should probably be used.
>>>>
>>>> Let me know if you need more information.
>>>>
>>>> David
>>>>
>>>>
>>>> On Thu, Jun 24, 2010 at 8:59 PM, Inigo San Gil
>>>> <isangil at canyon.lternet.edu> wrote:
>>>>
>>>>
>>>>
>>>>> remind me, David
>>>>>
>>>>> how are we tackling the Babelian problem in EML? are we duplicating
>>>>> titles, and descriptive tags in the natural language and english? do we
>>>>> use some sort of XML attribute to denote the language? separate EML
>>>>> docs? what was the strategy outlined at ISEI6 (cancun)?
>>>>>
>>>>> it is urgent cause the spaniards are producing EML, and we are
>>>>> wondering what would be the best way.  I know the Taiwan TFRI have a
>>>>> mix-and-match of instances (all chinese, a mix of chinese and english,
>>>>> all english).
>>>>> cheers, inigo
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>