[eml-dev] [Bug 585] internationalization needed in EML
bugzilla-daemon at ecoinformatics.org
Wed Jul 21 16:41:43 PDT 2010
http://bugzilla.ecoinformatics.org/show_bug.cgi?id=585
ben leinfelder <leinfelder at nceas.ucsb.edu> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |leinfelder at nceas.ucsb.edu
--- Comment #5 from ben leinfelder <leinfelder at nceas.ucsb.edu> 2010-07-21 16:41:42 PDT ---
Looking into this more concretely, it doesn't appear to be overly daunting.
For the dataset/title element I've extended its current type (the non-empty
string type in EML 2.1.0) to allow "mixed" content so that the original title
can remain unchanged and additional translations can be added. This means
fields that gain multi-language support would not need to be modified en masse
when upgrading documents from EML 2.1.0 to EML 2.1.1 (or whatever EML version
number we choose).
Fragment supporting old and new:
----------
<title>
Original title
<!-- language translations -->
<value xml:lang="en">Title in English</value>
<value xml:lang="es">Titulo en Español</value>
</title>
----------
The relevant XSD change is below:
----------
<xs:complexType name="i18nNonEmptyStringType" mixed="true">
  <xs:sequence>
    <xs:element name="value" minOccurs="0" maxOccurs="unbounded">
      <xs:complexType>
        <xs:simpleContent>
          <xs:extension base="NonEmptyStringType">
            <xs:attribute ref="xml:lang" />
          </xs:extension>
        </xs:simpleContent>
      </xs:complexType>
    </xs:element>
  </xs:sequence>
</xs:complexType>
----------
While this is valid XML that conforms to a valid schema, I'm not sure the
"mixed" type will be the easiest to work with. Presumably XML parsers can
deftly handle mixed elements, but I imagine there will be a few unanticipated
gotchas. Certainly for XPath-like queries, we'd need to be explicit about
searching both the original element and any possible localized versions
(sub-elements) of it, but this is an inescapable hurdle for any
internationalization solution.
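
To make the "search both the original element and its localized
sub-elements" point concrete, here is a minimal sketch using Python's
standard-library ElementTree. It is only an illustration, not code from EML or
Metacat; the helper names title_texts and localized_title are hypothetical.
It reads the original text from the mixed content, collects the value
children by xml:lang, and falls back to the original when no translation
exists.

```python
import xml.etree.ElementTree as ET

# xml:lang belongs to the predefined XML namespace; ElementTree reports
# namespaced attributes in Clark notation ({namespace-uri}local-name).
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def title_texts(title):
    """Return (original, translations) for a possibly mixed-content element.

    Hypothetical helper: 'original' is the element's own character data,
    'translations' maps each xml:lang code to the text of a <value> child.
    """
    # In mixed content the parent's text is split between its leading text
    # and the tail that follows each child node.
    parts = [title.text or ""]
    translations = {}
    for child in title:
        if child.tag == "value":
            translations[child.get(XML_LANG)] = (child.text or "").strip()
        parts.append(child.tail or "")
    # Collapse the whitespace left behind by line breaks and indentation.
    original = " ".join("".join(parts).split())
    return original, translations


def localized_title(title, lang):
    """Return the translation for lang, or the original title if absent."""
    original, translations = title_texts(title)
    return translations.get(lang, original)


# An EML 2.1.0 style title and the proposed mixed-content form.
OLD = "<title>Original title</title>"
NEW = """<title>
Original title
<!-- language translations -->
<value xml:lang="en">Title in English</value>
<value xml:lang="es">Titulo en Español</value>
</title>"""

old_title = ET.fromstring(OLD)
new_title = ET.fromstring(NEW)
print(title_texts(old_title))
print(localized_title(new_title, "es"))
print(localized_title(new_title, "fr"))
```

The same function handles both the old plain-string form and the new mixed
form, which is the backward-compatibility property described above. The cost
is that every consumer has to reassemble the original text from text and tail
pieces rather than reading a single string.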
The current 'DocBook' style structure for some text fields (the "abstract"
comes to mind) already uses "mixed" elements where text and structure can be
interleaved. I believe multi-lingual extensions to those elements would be
as straightforward as what I've described above.
--
Configure bugmail: http://bugzilla.ecoinformatics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug.