[eml-dev] xml:lang attribute for title in EML 2.1.0

Matt Jones jones at nceas.ucsb.edu
Thu Sep 16 20:55:03 PDT 2010


Ben,

I agree with you on the identifier and creator, contact, etc. fields as not
strictly needing translation.  However, even some of those fields may
benefit, such as the use of a name in Mandarin and its Romanized
translation.  Also, I think all of the fields that might allow for general
text should be included, such as methods, etc.  Certainly anything that
accepts TextType might need to be translated in addition to the fields you
listed.  Does that sound reasonable?
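
To make the mixed content idea concrete, here is a rough Python sketch of how
a consumer might read a translated title.  Only the xml:lang attribute and its
namespace URI are standard; the <value> child element, the sample title, and
the titles_by_language helper are assumptions for illustration and not
necessarily what Ben's schema changes use.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical mixed-content title: the default-language text sits directly
# in <title>, and each translation is wrapped in an assumed <value> child.
SAMPLE = (
    '<title xml:lang="en">Snow cover data provided by MODIS satellite imagery'
    '<value xml:lang="es">Datos de innivación según imágenes MODIS</value>'
    '</title>'
)


def titles_by_language(title_xml, default_lang="en"):
    """Return {language code: text} for a mixed-content title element."""
    title = ET.fromstring(title_xml)
    result = {}
    # Text placed directly inside <title> is treated as the primary title.
    base = (title.text or "").strip()
    if base:
        result[title.get(XML_LANG, default_lang)] = base
    # Each child carrying xml:lang is treated as a translation.
    for child in title:
        text = (child.text or "").strip()
        lang = child.get(XML_LANG)
        if lang and text:
            result[lang] = text
    return result


if __name__ == "__main__":
    print(titles_by_language(SAMPLE))
```

A reader that ignores the child elements still sees the primary title text,
which is the backwards-compatibility property we are after.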

Matt

On Thu, Sep 16, 2010 at 6:18 PM, Mark Servilla <servilla at lternet.edu> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> I agree with the changes proposed by Ben in his July 30, 2010 email, that
> is, the "mixed content" approach.  It is not entirely clear what the
> long-term implications of the changes will be for parsing and validation,
> but the short-term approach seems manageable (albeit with effort from some
> warm body).  Internationalization of EML is a necessary step as we move
> research and its associated data/metadata to a global level.
>
> I personally do not see the alternatives as being viable long-term
> solutions.  Multiple documents of differing languages will ultimately be
> too cumbersome and, likely, not kept synchronized.  Introducing another
> inline content attribute (i.e., <title>[Language:En]Snow cover...) only
> adds yet more syntactical parsing issues.  The use of the xml:lang
> attribute is, at least, a recognized and standard approach in many systems.
>
> I appreciate and thank Ben and others for their efforts in this matter.
>
> Sincerely,
> Mark
>
> On Sep 16, 2010 1:56 PM, Matt Jones wrote:
> > The solution that Ben proposed is meant to address the requirements that
> > arose from the iLTER Lake Taihu meeting for providing core metadata in
> > multiple languages.  These recommendations then were also at the core of
> > the recommendations made to GBIF about which fields should contain
> > English translations, but the set of fields differs slightly in the two
> > recommendations.  Because many of these fields are not currently
> > repeatable according to the EML 2.1 schema, we would need to, at a
> > minimum, change cardinality rules to allow for each field to be included
> > multiple times if the xml:lang attribute were used to differentiate them
> > (or for the approach Inigo points to).  As Ben points out, it would still
> > be ambiguous as to whether the repeating fields represent different
> > information, or the same information translated.  So his proposal is
> > meant to explicitly flag translations as such within mixed content string
> > fields, with the goal of doing so without breaking existing EML 2.1
> > compatibility and without having to change existing cardinality rules.
> >
> > Ben's prior discussion on this highlighted the conflict with the
> > NonEmptyString type that was introduced in EML 2.1, in that mixed content
> > elements would not be validated and so the rules for NonEmptyString would
> > not be enforced.  I think this would only be a small issue, and that the
> > advantages in compatibility provided by using a mixed content model for
> > language translations outweigh the loss of validation within our string
> > types.  Either way, we would need to add the xml:lang attribute so that
> > it can be used throughout EML, including in the translation elements that
> > Ben proposed.
> >
> > Are there any objections to moving forward with the schema changes to use
> > a mixed content model for translations that Ben proposed in his earlier
> > emails?
> >
> > Matt
> >
> > On Thu, Sep 16, 2010 at 11:35 AM, Inigo San Gil
> > <isangil at canyon.lternet.edu> wrote:
> >
> >
> >     We'll keep our eyes on the ball, then.
> >
> >     Meanwhile others have adopted their own solution.
> >     Here are two examples:
> >     1) a site from Spain reports this implementation
> >
> >     <title>[Language:En]Snow cover data provided by MODIS satellite imagery</title>
> >     <title>[Language:Sp]Datos de innivaci&#243;n seg&#250;n im&#225;genes MODIS</title>
> >
> >     We thought that the use of the XML attribute "lang=en | sp"
> >     was interesting, but, among other problems, we would have
> >     gotten screwed by eml-dev's eventual internationalization
> >     implementation.  Call it luck, but you can bet the "eventual
> >     eml-dev decision" would force us to re-code the EML
> >     generation.
> >
> >     2) From Taiwan, it is also a mix and match.  I had the
> >     internationalization conversation years ago, when we set up
> >     harvesting into the NBII clearinghouse.  At the TFRI, we
> >     found EML documents that have a hybrid of English and
> >     Chinese, with no sign whatsoever of the language used.
> >     We had to devise a mechanism to detect language.  We
> >     simply did not harvest those docs whose critical content
> >     was not translated into English.
> >
> >     ILTER discussed (two years ago?) some guidelines on
> >     how the different countries were going to deal with the
> >     Tower of Babel problem.  Maybe you can look into those
> >     if you feel curious, but if I recall correctly, it went along
> >     the lines of encoding the metadata in the native language,
> >     and producing some discovery-level EML in English.  This
> >     strategy would create two EML documents per dataset.
> >
> >     Sparks or not, I still have to recommend that EML users
> >     implement some solution.  I'm inclined to suggest that
> >     such a solution should 1) not break the current EML rules
> >     and 2) allow for easy language detection.
> >     Spain's case fits here, for example.
> >
> >     Cheers, inigo
> >
> >
> >
> >
> >
> >     On 9/16/2010 10:56 AM, ben leinfelder wrote:
> >
> >         Hi Markus,
> >         I'm afraid your findings are accurate with respect to the
> >         xml:lang attribute in the <title> element (or any
> >         "NonEmptyStringType" element).
> >         In the course of my experimentation with allowing
> >         backwards-compatible internationalization with a new EML
> >         version (2.1.1) I did have to include the
> >         "http://www.w3.org/XML/1998/namespace" namespace just as you
> >         did and also declare the xml:lang attribute in elements where
> >         I wanted to employ it.
> >         While certain EML elements are repeatable, it's not always
> >         clear what the presence of multiple elements represents (are
> >         they translations in different languages or are they alternate
> >         titles?).  In order to resolve this ambiguity and also allow
> >         multiple translations for non-repeatable elements I proposed a
> >         solution for allowing mixed element content for fields that
> >         should be internationalized.  There's a fairly comprehensive
> >         discussion of this approach in our eml-dev archives:
> >         http://mercury.nceas.ucsb.edu/ecoinformatics/pipermail/eml-dev/2010-July/001828.html
> >         I didn't get a lot of decisive feedback and so have not moved
> >         forward with releasing an updated EML version.  Hopefully this
> >         thread will again set the ball rolling.
> >         -ben
> >
> >
> >     _______________________________________________
> >     Eml-dev mailing list
> >     Eml-dev at ecoinformatics.org <mailto:Eml-dev at ecoinformatics.org>
> >
> http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/eml-dev
> >
> >
> >
> >
>
> - --
> Mark Servilla, Ph.D.
>
> LTER Network Office
> Department of Biology
> MSC 03 2020
> 1 University of New Mexico
> Albuquerque, NM 87131-0001
>
> servilla at LTERnet.edu
> Office (505) 277-2619
> Cell   (505) 453-8593
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.8 (Darwin)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAkySz/4ACgkQqFW3+12RyXOEggCeLtSSf8r3pJty+lv06lk9uSVH
> z0YAn1HQNykMFDCt8zIm02bwMv5iecng
> =z21i
> -----END PGP SIGNATURE-----
>