[kepler-users] Workflow XML validator using JDOM
Edward A. Lee
eal at eecs.berkeley.edu
Wed Jan 13 08:54:20 PST 2010
On 1/12/10 11:52 PM, Matt Jones wrote:
> Yeah, under DTDs there were only limited ways to relax validation for
> just an element. You actually can use CDATA, because you can then
> include any characters that you want, including markup characters --
> none of it is parsed, so escaping is not needed. However, because it is
> not parsed, CDATA sections don't emit the same events during SAX parsing
> -- they are emitted as CDATA events, so the SAX parser would be unaware
> of the element start and end events that ptolemy probably uses now.
It can't be quite right that CDATA is not parsed because then the
end element would not be recognized. My recollection is that all markup
in CDATA must be escaped, but it was probably 8 years ago that I looked
at this...
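
To check the CDATA question without relying on memory, here is a minimal sketch that records the SAX events for a tiny document, using only the JAXP parser bundled with the JDK. The class name CdataEvents and the sample strings are made up for illustration; this is not code from the MoML parser. As far as I can tell, markup inside a CDATA section is delivered as plain character data bracketed by startCDATA/endCDATA, the parser looks only for the closing "]]>", and the enclosing end tag is still recognized afterwards.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

/** Illustrative only: records the SAX events produced for a small document. */
final class CdataEvents {

    /** Parses xml and returns one string per SAX event, in document order. */
    static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        // DefaultHandler2 also implements LexicalHandler (startCDATA/endCDATA).
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override public void startElement(String uri, String local, String qName, Attributes atts) {
                events.add("start:" + qName);
            }
            @Override public void endElement(String uri, String local, String qName) {
                events.add("end:" + qName);
            }
            @Override public void characters(char[] ch, int start, int length) {
                // A parser may split text into several chunks; each chunk is one event here.
                events.add("chars:" + new String(ch, start, length));
            }
            @Override public void startCDATA() { events.add("startCDATA"); }
            @Override public void endCDATA() { events.add("endCDATA"); }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // Lexical events are only reported if a LexicalHandler is registered.
            parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(trace("<configure><![CDATA[<svg>-P-</svg>]]></configure>"));
        System.out.println(trace("<configure><svg>-P-</svg></configure>"));
    }
}
```

In the first trace there is no start event for svg; in the second there is. So a handler that relies on element events inside <configure> would see nothing but text if the content were wrapped in CDATA.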
The data inside <configure> ... </configure> is not used at all by
the MoML parser. It is simply collected and handed to the configure()
method of the enclosing object, which implements Configurable.
That object is expected to have its own parser, which can use its
own DTD.
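
For readers who have not looked at the parser, the "collect and hand off" idea can be sketched roughly as follows. This is not the MoML parser's code: ConfigureCollector and the nested TextSink interface are hypothetical names, and the sink takes a String for brevity where the real Configurable is documented as receiving a stream. The depth counter also shows why nested configure elements only need to be balanced.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Illustrative only: collects the body of each outermost configure element as text. */
final class ConfigureCollector extends DefaultHandler {

    /** Hypothetical stand-in for an object that accepts configure text. */
    interface TextSink {
        void configure(String text);
    }

    private final TextSink sink;
    private final StringBuilder body = new StringBuilder();
    private int depth = 0; // greater than 0 while inside a configure element

    ConfigureCollector(TextSink sink) { this.sink = sink; }

    @Override public void startElement(String uri, String local, String qName, Attributes atts) {
        if (depth > 0) { // re-serialize foreign markup instead of interpreting it
            body.append('<').append(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                body.append(' ').append(atts.getQName(i)).append("=\"")
                    .append(escape(atts.getValue(i))).append('"');
            }
            body.append('>');
        }
        if (depth > 0 || qName.equals("configure")) depth++;
    }

    @Override public void characters(char[] ch, int start, int length) {
        if (depth > 0) body.append(escape(new String(ch, start, length)));
    }

    @Override public void endElement(String uri, String local, String qName) {
        if (depth == 0) return;
        depth--;
        if (depth == 0) { // closing the outermost configure: hand the text off
            sink.configure(body.toString());
            body.setLength(0);
        } else {
            body.append("</").append(qName).append('>');
        }
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    /** Convenience wrapper: returns the collected configure bodies of a document. */
    static List<String> collect(String xml) {
        List<String> bodies = new ArrayList<>();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new ConfigureCollector(bodies::add));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return bodies;
    }

    public static void main(String[] args) {
        System.out.println(collect(
            "<entity name=\"sink\"><configure><svg><text x=\"20\">-P-</text></svg></configure></entity>"));
    }
}
```

The receiving object would then run its own parser (PlotML, SVG, ...) over the collected text.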
>
> An alternative using DTDs is to set the element to ANY content, which
> allows arbitrary mixtures of element and PCDATA text, and any
> well-formed XML elements will satisfy the validator.
This seems worth a try. I don't recall this existing 8 years ago.
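
Before trying it, one caveat seems worth checking. As I read the XML 1.0 validity rules, ANY admits character data and child elements whose types have been declared somewhere in the DTD, so an undeclared <svg> would still be reported. The sketch below uses the JDK's validating SAX parser with a cut-down inline DTD; the class name DtdAnyDemo and the three DTD strings are invented for the experiment and are not the real MoML DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Illustrative only: counts DTD validity errors for three variants of a toy DTD. */
final class DtdAnyDemo {

    static final String BODY =
        "<entity><configure><svg><text x=\"20\">-P-</text></svg></configure></entity>";

    // configure declared as text only, as in MoML_1.dtd
    static final String PCDATA_DTD =
        "<!ELEMENT entity (configure)> <!ELEMENT configure (#PCDATA)>";

    // configure relaxed to ANY, but the SVG element types are not declared
    static final String ANY_DTD =
        "<!ELEMENT entity (configure)> <!ELEMENT configure ANY>";

    // configure relaxed to ANY, and the foreign element types are declared too
    static final String ANY_DTD_WITH_SVG = ANY_DTD
        + " <!ELEMENT svg ANY> <!ELEMENT text (#PCDATA)> <!ATTLIST text x CDATA #IMPLIED>";

    /** Validates body against an internal DTD subset and returns the number of validity errors. */
    static int validityErrors(String dtd, String body) {
        String xml = "<!DOCTYPE entity [" + dtd + "]>" + body;
        int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override public void error(SAXParseException e) {
                count[0]++; // validity errors are recoverable, so parsing continues
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println("PCDATA:             " + validityErrors(PCDATA_DTD, BODY));
        System.out.println("ANY, svg undeclared: " + validityErrors(ANY_DTD, BODY));
        System.out.println("ANY, svg declared:   " + validityErrors(ANY_DTD_WITH_SVG, BODY));
    }
}
```

If that reading is right, switching configure to ANY helps only when the DTD (or something it includes) also declares the foreign elements, which is awkward for arbitrary embedded XML.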
>
> Under XML Schema, there is a richer option to use a similar 'xs:any'
> element to define a content model that allows arbitrary content. It
> also allows you to set the validation policy for xs:any elements to one
> of strict (must have an associated schema for the element's namespace,
> and the content must validate), lax (a schema for the namespace is
> optional, but if provided the element must validate), or skip (any
> provided schemas are ignored for the namespace, and the validation stage
> is totally skipped when handling element content).
Right. Namespaces didn't exist 8 years ago.
Perhaps it's time to consider designing MoML 2.0 ?
However, it's hard for me to see a reason to give this high priority...
There are lots of other things to work on.
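
For anyone who wants to experiment with the xs:any idea anyway, here is a small sketch using javax.xml.validation from the JDK. It wraps the configure declaration Matt proposes further down in a minimal schema and validates a configure element holding SVG for which no schema is supplied. The class name LaxWildcardDemo, the schemaFor helper and the sample document are assumptions made for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

/** Illustrative only: compares processContents settings on an xs:any wildcard. */
final class LaxWildcardDemo {

    static final String DOC = "<configure><svg><text x='20'>-P-</text></svg></configure>";

    /** Builds a one-element schema whose configure content is a single wildcard. */
    static String schemaFor(String processContents) {
        return "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='configure'><xs:complexType><xs:sequence>"
            + "<xs:any processContents='" + processContents + "'/>"
            + "</xs:sequence></xs:complexType></xs:element></xs:schema>";
    }

    /** Returns true if xml validates against the schema built with the given policy. */
    static boolean isValid(String processContents, String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(
                new StreamSource(new StringReader(schemaFor(processContents))));
            // With no ErrorHandler set, the first validation error is thrown.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String policy : new String[] {"strict", "lax", "skip"}) {
            System.out.println(policy + ": " + isValid(policy, DOC));
        }
    }
}
```

Two details of the fragment as written: xs:any with default occurrence constraints admits exactly one child element, and text directly inside configure would need mixed="true" on the complexType. Both would probably have to be loosened for real MoML files.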
Edward
>
> For MoML, I don't see a major advantage of validating because the DTD
> does not encode a lot of the semantics of the language that is used by
> the ptolemy engine. For some other XML schemas we use, the typing is
> much more explicit and so validating is a fast way to be sure providers
> are producing reasonable content that matches the semantics expected by
> the application. But this wouldn't work so well with the ANY model in
> DTDs. The nice thing about processContents='lax' under XML schema is
> that you can have an xs:any element that allows arbitrary content, and
> that content can be validated when a schema for the namespace is
> provided, but it can also be skipped when a schema is unavailable.
> We've used this in some of our metadata languages. For example, a
> reasonable definition for the configure element using XML Schema might be:
>
> <xs:element name="configure">
>   <xs:complexType>
>     <xs:sequence>
>       <xs:any processContents='lax' />
>     </xs:sequence>
>   </xs:complexType>
> </xs:element>
> This would allow you to provide schemas for some content (e.g., for
> PlotML), or other times omit the schema and skip validation altogether.
> This could be useful in validating that models are legitimate and
> sensible. One major disadvantage is that validation is slow (especially
> compared to the lightweight, non-validating SAX parser ptolemy is using
> now), and so adding in validation to the MoML parsing would be bound to
> slow down the model processing. So I personally don't think it would be
> worth the added overhead.
>
> Matt
>
> On Tue, Jan 12, 2010 at 7:13 PM, Edward A. Lee <eal at eecs.berkeley.edu> wrote:
>
>
> The intent of the <configure> tag in the MoML DTD is to be able
> to include _foreign_ XML (with its own DTD) inside a MoML file.
> The plotter uses this (the DTD is PlotML), as do a number of other
> actors.
>
> At the time, there was no clean way to do this in XML. In my view,
> this was a major oversight/failing in XML. It might be worth
> revisiting the issue to see whether there is a way to do it now
> that is "approved" XML.
>
> Note that using CDATA won't work, because (at least then) CDATA could
> not include _any_ XML. I suppose it might be possible if everything
> were escaped, but with nesting this could get pretty messy.
>
> So the first question I would ask is:
>
> Why do we care about the validator?
>
> If there is a really good reason, then this is worth revisiting.
> Otherwise, I would simply declare that we've improved XML by supporting
> heterogeneous schema (just as we've improved models of computation
> by supporting heterogeneous MoCs). In this case, the validator
> needs to be fixed :-)
>
> Edward
>
>
>
> On 1/12/10 12:35 PM, Matt Jones wrote:
>
> Hi Christopher,
>
> Actually, #PCDATA stands for *parsed* character data, and is explicitly
> the type of element content that is parsed text (for additional markup
> and entity substitutions, for example). If you want configure to be
> plain text and to not be parsed for markup, then it should be typed as
> CDATA content, which is still character data, but is not parsed. I
> think this has not been an issue because in general you don't validate
> the MoML instance documents against the DTD, but I suspect you would
> find a lot of validity errors if you did.
>
> Matt
>
> 2010/1/12 Christopher Brooks <cxh at eecs.berkeley.edu>
>
>
> Hi Josep,
> I'm not an XML expert, but my understanding is that things
> inside a <configure> tag are #PCDATA, which should not be
> interpreted against the DTD?
>
> http://ptolemy.eecs.berkeley.edu/xml/dtd/MoML_1.dtd
> has:
> <!ELEMENT configure (#PCDATA)>
>
> For information about the configure tag, see page 220 of Volume 1 of
> the Ptolemy II Design Doc at
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-28.html:
> --start
> A second, much more flexible mechanism is provided for parameterizing
> entities. A configure element can be used to specify a relative or
> absolute URL pointing to a file that configures the entity, or it can
> be used to include the configuration information in line. That
> information need not be MoML information. It need not even be XML, and
> can even be binary encoded data (although binary data cannot be in
> line; it must be in an external file). For example,
> <entity name="sink" class="ptolemy.actor.lib.SequencePlotter">
> <configure source="url"/>
> </entity>
> Here, url can give the name of a file containing data, or a URL for
> a remote file. (For the SequencePlotter actor, that external data will
> have PlotML syntax; PlotML is another XML schema for configuring
> plotters.) Configure information can also be given in the body of
> the MoML file as follows:
>
>
> <entity name="sink" class="ptolemy.actor.lib.SequencePlotter">
> <configure>
> configure information
> </configure>
> </entity>
> With the above syntax, the configure information must be textual
> data. It can contain XML markup with only one restriction: if the tag
> “</configure>” appears in the textual data, then it must be preceded
> by a matching “<configure>”. That is, any configure elements in the
> markup must have balanced start and end tags.
>
> You can give both a source attribute and in-line configuration
> information, as in the following:
>
> <entity name="sink" class="ptolemy.actor.lib.SequencePlotter">
> <configure source="url">
> configure information
> </configure>
> </entity>
> In this case, the file data will be passed to the application first,
> followed by the in-line configuration data.
> In Ptolemy II, the configure element is supported by any class that
> implements the Configurable interface. That interface defines a
> configure() method that accepts an input stream. Both external file
> data and in-line data are provided to the class as a character
> stream by calling this method.
> There is a subtle limitation with using markup within the configure
> element. If any of the elements within the configure element match
> MoML elements, then the MoML DTD will be applied to assign default
> values, if any, to their attributes. Thus, this mechanism works best
> if the markup within the configure element is not using an XML schema
> that happens to have element names that match those in MoML.
> Alternatively, if it does use MoML element names, then those elements
> are used with their MoML meaning. This limitation can be fixed using
> XML namespaces, something we will eventually implement.
> --end--
>
> My guess is that JDOM needs to be adjusted so that it does not
> try to interpret the #PCDATA? A quick google search did not bring
> up anything.
>
> You could also try running JDOM on a model that has Ptolemy PlotML
> XML in a configure tag.
> For example,
> http://ptolemy.eecs.berkeley.edu/ptolemyII/ptIIlatest/ptII/ptolemy/domains/sdf/demo/Butterfly/Butterfly.xml
> has
>
> <entity name="XY Plotter" class="ptolemy.actor.lib.gui.XYPlotter">
>   <property name="fillOnWrapup" class="ptolemy.data.expr.Parameter"
>             value="true">
>   </property>
>   <property name="_windowProperties"
>             class="ptolemy.actor.gui.WindowPropertiesAttribute"
>             value="{bounds={258, 206, 500, 344}, maximized=false}">
>   </property>
>   <property name="_plotSize" class="ptolemy.actor.gui.SizeAttribute"
>             value="[500, 300]">
>   </property>
>   <property name="startingDataset" class="ptolemy.data.expr.Parameter"
>             value="0">
>   </property>
>   <property name="_location" class="ptolemy.kernel.util.Location"
>             value="[488.5, 265.5]">
>   </property>
>   <configure>
>     <?plotml <!DOCTYPE plot PUBLIC "-//UC Berkeley//DTD PlotML 1//EN"
>     "http://ptolemy.eecs.berkeley.edu/xml/dtd/PlotML_1.dtd">
>     <plot>
>       <title></title>
>       <xRange min="-2.4845557281711192" max="3.9293144658408363"/>
>       <yRange min="-3.6505369457601655" max="3.6505369457601127"/>
>       <noGrid/>
>     </plot>?>
>   </configure>
> </entity>
>
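
A note on the Butterfly example above: the PlotML is wrapped in a <?plotml ... ?> processing instruction, so an XML parser hands it over as one opaque string rather than as elements. Processing instructions are also permitted inside (#PCDATA) content, so this form should not by itself upset a DTD validator. The sketch below is illustrative only (PlotmlPiDemo and the shortened sample are made-up names and data) and shows what a SAX handler receives.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Illustrative only: shows how a processing instruction is reported by SAX. */
final class PlotmlPiDemo {

    /** Returns the element-start and processing-instruction events for xml. */
    static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startElement(String uri, String local, String qName, Attributes atts) {
                events.add("start:" + qName);
            }
            @Override public void processingInstruction(String target, String data) {
                // data is everything after the target, up to the closing "?>"
                events.add("pi:" + target + ":" + data);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(trace("<configure><?plotml <plot><title/></plot>?></configure>"));
    }
}
```

The only element the parser reports is configure; the plot markup arrives as the data of the plotml instruction, ready to be passed to a separate PlotML parser.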
> It could be that it would be better if the DTD were declared in the
> <svg>...</svg> example, but I'm no expert in this area. It could be
> that, by definition, a validating parser tries to validate #PCDATA.
>
> _Christopher
>
>
>
>
>
> On 1/12/10 11:25 AM, Chad Berkley wrote:
>
> Hi Josep,
>
> I'll let Christopher Brooks explain why it's not in the DTD (the
> ptolemy project controls the MoML DTD). I can tell you that the <svg>
> element is used to set SVG icons in ptolemy. This attribute is not
> used (to my knowledge) in kepler since we have our own icon set.
>
> chad
>
>
> Josep Morer Muñoz wrote:
>
> Hi all,
>
> I have built an XML validator using JDOM. I am trying to validate
> Kepler workflows using the DTD file
> (http://ptolemy.eecs.berkeley.edu/xml/dtd/MoML_1.dtd), but in most
> cases the validator fails when the workflow has parameters, for
> example 02-LoktaVolterraPrey.xml in the getting-started directory. The
> validator fails because it finds an unknown tag named *svg* in the
> XML file.
>
> For example:
>
> <entity name="02-LotkaVolterraPredatorPrey"
>         class="ptolemy.actor.TypedCompositeActor">
>   <property name="_createdBy"
>             class="ptolemy.kernel.attributes.VersionAttribute"
>             value="7.0.2">
>   </property>
>   <property name="r" class="ptolemy.data.expr.Parameter" value="2">
>     <property name="_hideName"
>               class="ptolemy.kernel.util.SingletonAttribute">
>     </property>
>     <property name="_icon" class="ptolemy.vergil.icon.ValueIcon">
>     </property>
>     <property name="_smallIconDescription"
>               class="ptolemy.kernel.util.SingletonConfigurableAttribute">
>       <configure>
>         <svg>
>           <text x="20"
>                 style="font-size:14; font-family:SansSerif; fill:blue"
>                 y="20">-P-</text>
>         </svg>
>       </configure>
>     </property>
>     <property name="_editorFactory"
>               class="ptolemy.vergil.toolbox.VisibleParameterEditorFactory">
>     </property>
>     <property name="_location" class="ptolemy.kernel.util.Location"
>               value="410.0, 50.0">
>     </property>
>   </property>
>   <property name="a" class="ptolemy.data.expr.Parameter" value="0.1">
>     <property name="_icon" class="ptolemy.vergil.icon.ValueIcon">
>     </property>
>     <property name="_smallIconDescription"
>               class="ptolemy.kernel.util.SingletonConfigurableAttribute">
>       <configure>
>         <svg>
>           <text x="20"
>                 style="font-size:14; font-family:SansSerif; fill:blue"
>                 y="20">-P-</text>
>         </svg>
>     ...
>
> This element (red highlighted) is not defined in the DTD file.
> Can it be avoided? What does this tag mean?
>
> Thanks for your help.
> --
> Josep
>
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Kepler-users mailing list
> Kepler-users at kepler-project.org
>
> http://mercury.nceas.ucsb.edu/kepler/mailman/listinfo/kepler-users
>
>
> --
> Christopher Brooks, PMP                  University of California
> CHESS Executive Director                 US Mail: 337 Cory Hall
> Programmer/Analyst CHESS/Ptolemy/Trust   Berkeley, CA 94720-1774
> ph: 510.643.9841 fax:510.642.2718        (Office: 545Q Cory)
> home: (F-Tu) 707.665.0131 cell: 707.332.0670
>