[eml-dev] xml:lang attribute for title in EML 2.1.0
Inigo San Gil
isangil at canyon.lternet.edu
Thu Sep 16 13:46:16 PDT 2010
David, Congrats on the ILTER election - you bring quality experience there.
As for objections, I just wouldn't know what to object to :) I don't
see specifics. The quoted text below is the language I see closest
to a specific solution to this issue.
[...]
I proposed a solution for allowing mixed element content for fields that should be internationalized.
[...]
-EML 2.1.1 would be a more relaxed schema than the current EML 2.1.0
-we could augment existing EML-specific parsers to perform additional checks on the mixed content after schema-based validation was performed.
-Metacat already includes [...]
-The EML project has a utility parser that [...]
It sounds good to me; more details are welcome.
Inigo
On 9/16/2010 2:19 PM, David Blankman wrote:
> Matt and EML-Dev,
>
> I think that Ben's solution should be pushed forward. At the ILTER
> level we are starting to push strongly for EML documents from ILTER
> member networks. Having a solution that allows for multiple languages
> in a single document is certainly preferable to two EML documents for
> the same dataset.
>
> By the way, at the ILTER meeting earlier this month, I was elected
> the new Chair of the ILTER IM Committee. Kristin has been made
> co-chair of the US-ILTER committee. One of her tasks will be to
> encourage US LTER researchers to start pursuing global synthetic
> research. Having a clear approach to handling EML will help to move
> this forward.
>
> David
> ———————————————————
> Everything is possible with a chocolate cookie!
> - Rabbi Herbie of Jerusalem
>
> If I am not for myself, then who will be for me? If I am for myself
> alone, then who am I? If not now, when?
> - Rabbi Hillel
>
>
> On Thu, Sep 16, 2010 at 9:56 PM, Matt Jones <jones at nceas.ucsb.edu>
> wrote:
>
> The solution that Ben proposed is meant to address the
> requirements that arose from the iLTER Lake Taihu meeting for
> providing core metadata in multiple languages. These
> recommendations then were also at the core of the recommendations
> made to GBIF about which fields should contain English
> translations, but the set of fields differs slightly in the two
> recommendations. Because many of these fields are not currently
> repeatable according to the EML 2.1 schema, we would need to, at a
> minimum, change cardinality rules to allow for each field to be
> included multiple times if the xml:lang attribute were used to
> differentiate them (or for the approach Inigo points to). As Ben
> points out, it would still be ambiguous as to whether the
> repeating fields represent different information, or the same
> information translated. So his proposal is meant to explicitly
> flag translations as such within mixed content string fields, with
> the goal of doing so without breaking existing EML 2.1
> compatibility and without having to change existing cardinality
> rules.
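To make the ambiguity described above concrete, here is a minimal Python sketch (standard library only) that groups repeated `<title>` elements by their `xml:lang` value and reports when the repetition cannot be read as one title per language. It assumes the document is parsed without schema validation, since EML 2.1.0 does not declare `xml:lang` on `<title>`. The function names are hypothetical, and Spanish is tagged `es` because `xml:lang` takes standard language tags.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# ElementTree expands the reserved "xml:" prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def titles_by_lang(xml_text):
    """Group the text of every <title> element by its xml:lang value (None if absent)."""
    root = ET.fromstring(xml_text)
    groups = defaultdict(list)
    for title in root.iter("title"):
        groups[title.get(XML_LANG)].append((title.text or "").strip())
    return dict(groups)


def is_ambiguous(groups):
    """True when repeated titles cannot be read as one title per language."""
    total = sum(len(texts) for texts in groups.values())
    if total <= 1:
        return False  # a single title is never ambiguous
    # An untagged title, or two titles in the same language, might be an
    # alternate title rather than a translation.
    return None in groups or any(len(texts) > 1 for texts in groups.values())


if __name__ == "__main__":
    doc = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.0">
      <dataset>
        <title xml:lang="en">Snow cover data provided by MODIS satellite imagery</title>
        <title xml:lang="es">Datos de innivación según imágenes MODIS</title>
      </dataset>
    </eml:eml>"""
    print(is_ambiguous(titles_by_lang(doc)))  # False
```

Note that even an unambiguous result only says each language appears once; it cannot prove the two titles say the same thing, which is why an explicit translation marker is being discussed.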
>
> Ben's prior discussion on this highlighted the conflict with the
> NonEmptyString type that was introduced in EML 2.1, in that mixed
> content elements would not be validated and so the rules for
> NonEmptyString would not be enforced. I think this would only be
> a small issue, and that the advantages in compatibility provided
> by using a mixed content model for language translations outweigh
> the loss of validation within our string types. Either way, we
> would need to add the xml:lang attribute so that it can be used
> throughout EML, including in the translation elements that Ben
> proposed.
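Since mixed content elements would no longer enforce the NonEmptyString rules, an EML-specific parser could re-apply them after schema validation, as the thread suggests. The Python sketch below does this by hand for one field. It assumes the rule means "at least one non-whitespace character", and it uses `value` as a hypothetical name for the proposed translation element, whose actual name is not given in this thread.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
TRANSLATION_TAG = "value"  # hypothetical name for the proposed translation element


def own_text(elem):
    """Concatenate an element's direct character data: its text plus each child's tail."""
    return (elem.text or "") + "".join(child.tail or "" for child in elem)


def check_mixed_field(elem):
    """Return a list of problems found in one mixed-content field."""
    problems = []
    # Assumed NonEmptyString-style rule: at least one non-whitespace character.
    if not own_text(elem).strip():
        problems.append(f"<{elem.tag}> has no text of its own")
    for child in elem.findall(TRANSLATION_TAG):
        lang = child.get(XML_LANG)
        if not lang:
            problems.append(f"<{TRANSLATION_TAG}> in <{elem.tag}> lacks xml:lang")
        if not (child.text or "").strip():
            problems.append(f"<{TRANSLATION_TAG}> ({lang}) in <{elem.tag}> is empty")
    return problems


if __name__ == "__main__":
    good = ET.fromstring(
        "<title>Snow cover data provided by MODIS satellite imagery"
        '<value xml:lang="es">Datos de innivación según imágenes MODIS</value></title>'
    )
    bad = ET.fromstring('<title>  <value xml:lang="es"> </value></title>')
    print(check_mixed_field(good))  # []
    print(check_mixed_field(bad))
```

This keeps the schema relaxed while moving the string rules into code, which is the trade-off weighed above.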
>
> Are there any objections to moving forward with the schema changes
> to use a mixed content model for translations that Ben proposed
> in his earlier emails?
>
> Matt
>
> On Thu, Sep 16, 2010 at 11:35 AM, Inigo San Gil
> <isangil at canyon.lternet.edu> wrote:
>
>
> We'll keep our eyes on the ball, then.
>
> Meanwhile others have adopted their own solution.
> Here are two examples:
> 1) A site in Spain reports this implementation:
>
> <title>[Language:En]Snow cover data provided by MODIS
> satellite imagery</title>
> <title>[Language:Sp]Datos de innivación según
> imágenes MODIS</title>
>
> We thought that the use of the XML attribute "lang=en | sp"
> was interesting, but, among other problems, we would have
> gotten screwed by eml-dev's eventual internationalization
> implementation. Call it luck, but you can bet the "eventual
> eml-dev decision" would force us to re-code the EML
> generation.
>
> 2) From Taiwan, it is also a mix and match. I had the
> internationalization conversation years ago, when we set up
> harvesting into the NBII clearinghouse. At TFRI, we
> found EML documents that mixed English and
> Chinese, with no indication whatsoever of the language used.
> We had to devise a mechanism to detect the language. We
> simply did not harvest those docs whose critical content
> was not translated into English.
>
> ILTER discussed (two years ago?) some guidelines on
> how the different countries were going to deal with the
> Tower of Babel problem. Maybe you can look into those
> if you are curious, but if I recall correctly, they went along
> the lines of encoding the metadata in the native language
> and producing some discovery-level EML in English. This
> strategy would create two EML documents per dataset.
>
> Sparks or not, I still have to recommend some solution to
> EML users. I'm inclined to suggest that such a solution
> 1) not break the current EML rules, and 2) allow for easy
> language detection. Spain's case fits here, for example.
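The Spanish `[Language:Xx]` convention is easy to detect mechanically, which is the second property asked for here. Below is a minimal Python sketch that splits such a prefix off a field's text. The mapping of the site's `Sp` code to the standard tag `es` is an assumption added for illustration, as a step toward an eventual `xml:lang` value; the function name is hypothetical.

```python
import re

# The site-specific prefix from the thread, e.g. "[Language:En]Snow cover ..."
PREFIX = re.compile(r"^\s*\[Language:([A-Za-z]+)\]\s*")

# Assumed normalization of site codes to standard language tags.
SITE_CODES = {"en": "en", "sp": "es"}


def split_language_prefix(text):
    """Return (language tag or None, text without the prefix)."""
    match = PREFIX.match(text)
    if match is None:
        return None, text.strip()
    code = match.group(1).lower()
    return SITE_CODES.get(code, code), text[match.end():].strip()


if __name__ == "__main__":
    for title in [
        "[Language:En]Snow cover data provided by MODIS satellite imagery",
        "[Language:Sp]Datos de innivación según imágenes MODIS",
        "Untagged title",
    ]:
        print(split_language_prefix(title))
```

Because the prefix lives inside the element text, a document using it presumably still validates against EML 2.1.0 unchanged, which is why it also meets the first criterion.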
>
> Cheers, inigo
>
>
>
>
>
> On 9/16/2010 10:56 AM, ben leinfelder wrote:
>
> Hi Markus,
> I'm afraid your findings are accurate with respect to the
> xml:lang attribute in the <title> element (or any
> "NonEmptyStringType" element).
> In the course of my experimentation with allowing
> backwards-compatible internationalization with a new EML
> version (2.1.1) I did have to include the
> "http://www.w3.org/XML/1998/namespace" namespace just as
> you did and also declare the xml:lang attribute in
> elements where I wanted to employ it.
> While certain EML elements are repeatable, it's not always
> clear what the presence of multiple elements represents
> (are they translations in different languages or are they
> alternate titles?). In order to clarify this confusion and
> also allow multiple translations for non-repeatable
> elements I proposed a solution for allowing mixed element
> content for fields that should be internationalized.
> There's a fairly comprehensive discussion of this approach
> in our eml-dev archives:
> http://mercury.nceas.ucsb.edu/ecoinformatics/pipermail/eml-dev/2010-July/001828.html
> I didn't get a lot of decisive feedback and so have not
> moved forward with releasing an updated EML version.
> Hopefully this thread will again set the ball rolling.
> -ben
>
>
> _______________________________________________
> Eml-dev mailing list
> Eml-dev at ecoinformatics.org <mailto:Eml-dev at ecoinformatics.org>
> http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/eml-dev