[kepler-users] Workflow XML validator using JDOM

Matt Jones jones at nceas.ucsb.edu
Tue Jan 12 23:52:54 PST 2010


Yeah, under DTDs there were only limited ways to relax validation for just
an element.  You actually can use CDATA, because you can then include any
characters that you want, including markup characters (everything except
the closing sequence ']]>') -- none of it is parsed, so escaping is not
needed.  However, because it is not parsed, CDATA sections don't emit the
same events during SAX parsing -- their content is reported as plain
character data, so the SAX handler would never see the element start and
end events that ptolemy probably uses now.
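
To make the CDATA point concrete, here is a minimal sketch using Python's
standard xml.sax module (not the Java SAX parser that ptolemy uses, so treat
it as an illustration only).  The EventRecorder class and the sax_events
helper are names made up for this example.  It parses the same svg fragment
twice, once inline and once wrapped in a CDATA section, and records which
element start events the handler receives:

```python
import xml.sax
from xml.sax.handler import ContentHandler


class EventRecorder(ContentHandler):
    """Hypothetical handler that records element names and character data."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self.text = []

    def startElement(self, name, attrs):
        # Called once for every element start tag the parser recognizes.
        self.elements.append(name)

    def characters(self, content):
        # Called for ordinary text and for the content of CDATA sections.
        self.text.append(content)


def sax_events(document):
    """Parse a document string; return (element names, concatenated text)."""
    recorder = EventRecorder()
    xml.sax.parseString(document.encode("utf-8"), recorder)
    return recorder.elements, "".join(recorder.text).strip()


inline = "<configure><svg><text>-P-</text></svg></configure>"
wrapped = "<configure><![CDATA[<svg><text>-P-</text></svg>]]></configure>"

inline_elements, inline_text = sax_events(inline)
cdata_elements, cdata_text = sax_events(wrapped)

print(inline_elements, repr(inline_text))
print(cdata_elements, repr(cdata_text))
```

In the inline case the handler sees start events for configure, svg, and
text.  In the CDATA case it sees only configure, and the svg markup arrives
as one string of characters that the application would have to parse again
itself.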

An alternative using DTDs is to set the element to ANY content, which allows
arbitrary mixtures of element and PCDATA text, and any well-formed XML
elements will satisfy the validator.

Under XML Schema, there is a richer option to use a similar 'xs:any' element
to define a content model that allows arbitrary content.  It also allows you
to set the validation policy for xs:any elements to one of strict (must have
an associated schema for the element's namespace, and the content must
validate), lax (a schema for the namespace is optional, but if provided the
element must validate), or skip (any provided schemas are ignored for the
namespace, and the validation stage is totally skipped when handling element
content).
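
The three policies can be summarized as a small decision table.  The sketch
below is a toy model in Python, not a validator: wildcard_action is a
hypothetical function, and the boolean declaration_found stands in for
"the processor could locate a schema declaration for this element".

```python
def wildcard_action(process_contents, declaration_found):
    """Toy model of what a schema processor does with content matched by xs:any.

    process_contents:  "strict", "lax", or "skip"
    declaration_found: True if a schema declaration for the element is available
    """
    if process_contents not in ("strict", "lax", "skip"):
        raise ValueError("unknown processContents value: %r" % process_contents)

    if process_contents == "skip":
        # Content only has to be well-formed; any available schema is ignored.
        return "accept unvalidated"

    if declaration_found:
        # Both strict and lax validate when a declaration is available.
        return "validate"

    # No declaration: strict treats this as an error, lax lets it through.
    return "reject" if process_contents == "strict" else "accept unvalidated"


for policy in ("strict", "lax", "skip"):
    for found in (True, False):
        print(policy, found, "->", wildcard_action(policy, found))
```

The only difference between strict and lax is the case where no declaration
can be found, which is exactly the case that matters for foreign content
such as svg inside configure.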

For MoML, I don't see a major advantage of validating because the DTD does
not encode a lot of the semantics of the language that is used by the
ptolemy engine. For some other XML schemas we use, the typing is much more
explicit and so validating is a fast way to be sure providers are producing
reasonable content that matches the semantics expected by the application.
But this wouldn't work so well with the ANY model in DTDs.  The nice thing
about processContents='lax' under XML schema is that you can have an xs:any
element that allows arbitrary content, and that content can be validated
when a schema for the namespace is provided, but it can also be skipped
when a schema is unavailable.  We've used this in some of our metadata
languages.  For example, a reasonable definition for the configure element
using XML Schema might be:

<xs:element name="configure">
  <xs:complexType>
    <xs:sequence>
      <xs:any processContents='lax' />
    </xs:sequence>
  </xs:complexType>
</xs:element>

This would allow you to provide schemas for some content (e.g., for PlotML),
or other times omit the schema and skip validation altogether.  This could
be useful in validating that models are legitimate and sensible.  One major
disadvantage is that validation is slow (especially compared to the
lightweight, non-validating SAX parser ptolemy is using now), and so adding
in validation to the MoML parsing would be bound to slow down the model
processing.  So I personally don't think it would be worth the added
overhead.

Matt

On Tue, Jan 12, 2010 at 7:13 PM, Edward A. Lee <eal at eecs.berkeley.edu> wrote:

>
> The intent of the <configure> tag in the MoML DTD is to be able
> to include _foreign_ XML (with its own DTD) inside a MoML file.
> The plotter uses this (the DTD is PlotML), as do a number of other
> actors.
>
> At the time, there was no clean way to do this in XML. In my view,
> this was a major oversight/failing in XML. It might be worth
> revisiting the issue to see whether there is a way to do it now
> that is "approved" XML.
>
> Note that using CDATA won't work, because (at least then) CDATA could
> not include _any_ XML.  I suppose it might be possible if everything
> were escaped, but with nesting this could get pretty messy.
>
> So the first question I would ask is:
>
> Why do we care about the validator?
>
> If there is a really good reason, then this is worth revisiting.
> Otherwise, I would simply declare that we've improved XML by supporting
> heterogeneous schema (just as we've improved models of computation
> by supporting heterogeneous MoCs). In this case, the validator
> needs to be fixed :-)
>
> Edward
>
>
>
> On 1/12/10 12:35 PM, Matt Jones wrote:
>
>> Hi Christopher,
>>
>> Actually, #PCDATA stands for *parsed* character data, and is explicitly
>> the type of element content that is parsed text (for additional markup
>> and entity substitutions, for example).  If you want configure to be
>> plain text and to not be parsed for markup, then it should be typed as
>> CDATA content, which is still character data, but is not parsed.  I
>> think this has not been an issue because in general you don't validate
>> the MoML instance documents against the DTD, but I suspect you will find
>> a lot of validity errors if you did.
>>
>> Matt
>>
>> 2010/1/12 Christopher Brooks <cxh at eecs.berkeley.edu
>> <mailto:cxh at eecs.berkeley.edu>>
>>
>>
>>    Hi Josep,
>>    I'm not an XML expert, but my understanding is that things
>>    inside a <configure> tag are #PCDATA, which should not be interpreted
>>    against the DTD?
>>
>>    http://ptolemy.eecs.berkeley.edu/xml/dtd/MoML_1.dtd
>>    has:
>>    <!ELEMENT configure (#PCDATA)>
>>
>>    For information about the configure tag, see page 220 of Volume 1 of
>>    the Ptolemy II Design Doc at
>>    http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-28.html:
>>    --start--
>>    A second, much more flexible mechanism is provided for
>>    parameterizing entities.  A configure
>>    element can be used to specify a relative or absolute URL pointing
>>    to a file that configures the entity,
>>    or it can be used to include the configuration information in line.
>>    That information need not be MoML
>>    information. It need not even be XML, and can even be binary encoded
>>    data (although binary data cannot
>>    be in line; it must be in an external file). For example,
>>    <entity name="sink" class="ptolemy.actor.lib.SequencePlotter">
>>    <configure source="url"/>
>>    </entity>
>>    Here, url can give the name of a file containing data, or a URL for
>>    a remote file. (For the Sequence-
>>    Plotter actor, that external data will have PlotML syntax; PlotML is
>>    another XML schema for configuring
>>    plotters.) Configure information can also be given in the body of
>>    the MoML file as follows:
>>
>>    <entity name="sink" class="ptolemy.actor.lib.SequencePlotter">
>>    <configure>
>>    configure information
>>    </configure>
>>    </entity>
>>    With the above syntax, the configure information must be textual
>>    data. It can contain XML markup with only one restriction: if the
>>    tag “</configure>” appears in the textual data, then it must be
>>    preceded by a matching “<configure>”. That is, any configure
>>    elements in the markup must have balanced start and end tags.
>>
>>    You can give both a source attribute and in-line configuration
>>    information, as in the following:
>>
>>    <entity name="sink" class="ptolemy.actor.lib.SequencePlotter">
>>    <configure source="url">
>>    configure information
>>    </configure>
>>    </entity>
>>    In this case, the file data will be passed to the application first,
>>    followed by the in-line configuration
>>    data.
>>    In Ptolemy II, the configure element is supported by any class that
>>    implements the Configurable
>>    interface. That interface defines a configure() method that accepts
>>    an input stream. Both external file
>>    data and in-line data are provided to the class as a character
>>    stream by calling this method.
>>    There is a subtle limitation with using markup within the configure
>>    element. If any of the elements
>>    within the configure element match MoML elements, then the MoML DTD
>>    will be applied to assign
>>    default values, if any, to their attributes. Thus, this mechanism
>>    works best if the markup within the
>>    configure element is not using an XML schema that happens to have
>>    element names that match those
>>    in MoML. Alternatively, if it does use MoML element names, then
>>    those elements are used with their
>>    MoML meaning. This limitation can be fixed using XML namespaces,
>>    something we will eventually
>>    implement.
>>    --end--
>>
>>    My guess is that JDOM needs to be adjusted so that it does not
>>    try to interpret the #PCDATA?  A quick google search did not bring
>>    up anything.
>>
>>    You could also try running JDOM on a model that has a Ptolemy PlotML
>>    xml in a configure tag.
>>    For example,
>>
>> http://ptolemy.eecs.berkeley.edu/ptolemyII/ptIIlatest/ptII/ptolemy/domains/sdf/demo/Butterfly/Butterfly.xml
>>    has
>>
>>    <entity name="XY Plotter" class="ptolemy.actor.lib.gui.XYPlotter">
>>    <property name="fillOnWrapup" class="ptolemy.data.expr.Parameter"
>>    value="true">
>>    </property>
>>    <property name="_windowProperties"
>>    class="ptolemy.actor.gui.WindowPropertiesAttribute"
>>    value="{bounds={258, 206, 500, 344}, maximized=false}">
>>    </property>
>>    <property name="_plotSize" class="ptolemy.actor.gui.SizeAttribute"
>>    value="[500, 300]">
>>    </property>
>>    <property name="startingDataset" class="ptolemy.data.expr.Parameter"
>>    value="0">
>>    </property>
>>    <property name="_location" class="ptolemy.kernel.util.Location"
>>    value="[488.5, 265.5]">
>>    </property>
>>    <configure>
>>    <?plotml <!DOCTYPE plot PUBLIC "-//UC Berkeley//DTD PlotML 1//EN"
>>    "http://ptolemy.eecs.berkeley.edu/xml/dtd/PlotML_1.dtd">
>>    <plot>
>>    <title></title>
>>    <xRange min="-2.4845557281711192" max="3.9293144658408363"/>
>>    <yRange min="-3.6505369457601655" max="3.6505369457601127"/>
>>    <noGrid/>
>>    </plot>?>
>>    </configure>
>>    </entity>
>>
>>    It could be that it would be better if the dtd was declared in
>>    <svg>...</svg> example,
>>    but I'm no expert in this area.  It could be that by definition, a
>>    validating parser
>>    tries to validate #PCDATA.
>>
>>    _Christopher
>>
>>
>>
>>
>>
>>    On 1/12/10 11:25 AM, Chad Berkley wrote:
>>
>>        Hi Josep,
>>
>>        I'll let Christopher Brooks explain why it's not in the DTD (the
>>        ptolemy project controls the MoML DTD). I can tell you that the
>>        <svg> element is used to set SVG icons in ptolemy. This attribute
>>        is not used (to my knowledge) in kepler since we have our own
>>        icon set.
>>
>>        chad
>>
>>
>>        Josep Morer Muñoz wrote:
>>
>>            Hi all,
>>
>>            I have built an XML validator using JDOM. I am trying to
>>            validate Kepler workflows against the DTD file
>>            (http://ptolemy.eecs.berkeley.edu/xml/dtd/MoML_1.dtd), but in
>>            most cases the validator fails when the workflow has
>>            parameters, for example 02-LoktaVolterraPrey.xml in the
>>            getting-started directory. The validator fails because it
>>            finds an unknown tag named *svg* in the XML file.
>>
>>            For example:
>>
>>            <entity name="02-LotkaVolterraPredatorPrey"
>>            class="ptolemy.actor.TypedCompositeActor">
>>            <property name="_createdBy"
>>            class="ptolemy.kernel.attributes.VersionAttribute"
>>            value="7.0.2">
>>            </property>
>>            <property name="r" class="ptolemy.data.expr.Parameter"
>>            value="2">
>>            <property name="_hideName"
>>            class="ptolemy.kernel.util.SingletonAttribute">
>>            </property>
>>            <property name="_icon" class="ptolemy.vergil.icon.ValueIcon">
>>            </property>
>>            <property name="_smallIconDescription"
>>            class="ptolemy.kernel.util.SingletonConfigurableAttribute">
>>            <configure>
>>            <svg>
>>            <text x="20" style="font-size:14; font-family:SansSerif;
>>            fill:blue"
>>            y="20">-P-</text>
>>            </svg>
>>            </configure>
>>            </property>
>>            <property name="_editorFactory"
>>            class="ptolemy.vergil.toolbox.VisibleParameterEditorFactory">
>>            </property>
>>            <property name="_location" class="ptolemy.kernel.util.Location"
>>            value="410.0, 50.0">
>>            </property>
>>            </property>
>>            <property name="a" class="ptolemy.data.expr.Parameter"
>>            value="0.1">
>>            <property name="_icon" class="ptolemy.vergil.icon.ValueIcon">
>>            </property>
>>            <property name="_smallIconDescription"
>>            class="ptolemy.kernel.util.SingletonConfigurableAttribute">
>>            <configure>
>>            <svg>
>>            <text x="20" style="font-size:14; font-family:SansSerif;
>>            fill:blue"
>>            y="20">-P-</text>
>>            </svg>
>>            ...
>>
>>            This element (red highlighted) is not defined in the DTD file.
>>            Can it be avoided? What does this tag mean?
>>
>>            Thanks for your help.
>>            --
>>            Josep
>>
>>
>>
>>  ------------------------------------------------------------------------
>>
>>            _______________________________________________
>>            Kepler-users mailing list
>>            Kepler-users at kepler-project.org
>>            <mailto:Kepler-users at kepler-project.org>
>>
>>
>> http://mercury.nceas.ucsb.edu/kepler/mailman/listinfo/kepler-users
>>
>>
>>    --
>>    Christopher Brooks, PMP                       University of California
>>    CHESS Executive Director                      US Mail: 337 Cory Hall
>>    Programmer/Analyst CHESS/Ptolemy/Trust        Berkeley, CA 94720-1774
>>    ph: 510.643.9841 fax:510.642.2718             (Office: 545Q Cory)
>>    home: (F-Tu) 707.665.0131 cell: 707.332.0670
>>
>>
>