[eml-dev] [Fwd: [Bug 2512] New: - require text content in elements to be non-empty]
inigo san gil
isangil at lternet.edu
Wed Aug 16 08:54:46 PDT 2006
Certainly, Kepler and EML parsers of the like will break when they expect
real content and instead find a space filler such as "unknown". But then,
discarding that good metadata has the unwanted effect that it may
be lost forever. And then there is a chance that Kepler, or a human
reader, will deem the entire data set incomplete. So it is a mixed picture.
I do not really know what is best. Enforcing this easy-to-defeat rule
may help just a bit. Does it hurt to put this new rule in place, then? I am
not sure. I guess the answer will depend on the potential metadata loss. I
vote for the Metadata moderator instead!
Addressing the general question of "EML metadata rules workarounds,
schema relaxations and QA/QC" seems a good topic for the IM meetings.
Don't forget your boxing gloves :)
Inigo
>
> Mark Servilla wrote:
>> But the bogus content can potentially cause underlying problems with
>> systems like Kepler and perhaps even Trends. I am in favor of not
>> allowing bogus content, but relaxing some of the required EML
>> content. At least the latter can be checked easily by a parser or
>> exploiting application.
>>
>> inigo san gil wrote:
>>>
>>> Yes, that's what I thought, that bogus content will make up for the
>>> new constraint.
>>>
>>> BUT often for a reason! Sometimes you may want to add a little
>>> bogus/empty content to be able to provide quite a bit of legit,
>>> usable content. I think I'm guilty as charged of using "unknown"
>>> content as a means to provide good metadata that otherwise had to be
>>> ignored. I actually prefer to provide the content in the right
>>> places rather than in <additionalMetadata> because it is easier to
>>> parse, both humanly and programmatically.
>>>
>>> I'd go on a case-by-case basis, using the Metacat moderator approach
>>> --documents have to pass by the metadata cop before they are
>>> harvested-- the application is in beta, but is working, and looks
>>> pretty neat. I saw a demo while in SB, while you were at the SEV
>>> discussing TRENDS.
>>>
>>> Inigo
>>>
>>> Mark Servilla wrote:
>>>> See below...
>>>>
>>>> You are right, it is somewhat of a hack. And the content can simply
>>>> be faked for the same effect. I'm not sure if this new rule
>>>> prevents some other regular errors when sites have automated EML
>>>> generators - does it permit catching some type of lazy error?
>>>>
>>>> Mark
>>>>
>>>> James W Brunt wrote:
>>>>> What's to stop those people from simply putting in placeholders *-
>>>>> INIGO* ? The constraint seems a bit of a hack. :-|
>>>>>
>>>>> James
>>>>>
>>>>> Mark Servilla wrote:
>>>>>> James,
>>>>>>
>>>>>> I agree with this change. Although it is perfectly legal in XML
>>>>>> to have pure whitespace between opening and closing tags, EML
>>>>>> should not accept such content - it is rather meaningless, and can
>>>>>> throw a curve ball to some parsers. I don't know how many
>>>>>> sites use this technique to get past required elements; and, I
>>>>>> didn't look closely to see if this is the specific problem that
>>>>>> plagued AND.
>>>>>>
>>>>>> Mark
>>>>>>
>>>>>> James W Brunt wrote:
>>>>>>> Comments?
>>>>>>>
>>>>>>> -------- Original Message --------
>>>>>>> Subject: [eml-dev] [Bug 2512] New: - require text content in
>>>>>>> elements to be non-empty
>>>>>>> Date: Tue, 15 Aug 2006 10:44:48 -0700 (PDT)
>>>>>>> From: bugzilla-daemon at ecoinformatics.org
>>>>>>> To: eml-dev at ecoinformatics.org
>>>>>>>
>>>>>>> http://bugzilla.ecoinformatics.org/show_bug.cgi?id=2512
>>>>>>>
>>>>>>> Summary: require text content in elements to be
>>>>>>> non-empty
>>>>>>> Product: EML
>>>>>>> Version: 2.0.1
>>>>>>> Platform: Other
>>>>>>> OS/Version: All
>>>>>>> Status: NEW
>>>>>>> Severity: enhancement
>>>>>>> Priority: P1
>>>>>>> Component: eml - general bugs
>>>>>>> AssignedTo: jones at nceas.ucsb.edu
>>>>>>> ReportedBy: jones at nceas.ucsb.edu
>>>>>>> QAContact: eml-dev at ecoinformatics.org
>>>>>>>
>>>>>>>
>>>>>>> Current EML schemas allow text content to be empty, which defeats
>>>>>>> validation rules by allowing users to provide content such as:
>>>>>>> <attributeName> </attributeName>
>>>>>>> I propose that these uses of empty strings should not be valid. We
>>>>>>> can achieve this by redefining the datatype we use for strings to
>>>>>>> have a minimum length of 1 and a pattern that requires some
>>>>>>> non-whitespace characters.
>>>>>>>
>>>>>>> In XML Schema, we can declare the element to be of type
>>>>>>> eml:nonemptystring, where eml:nonemptystring is a simple type
>>>>>>> derived from xs:string like this:
>>>>>>>   <simpleType name='nonemptystring'>
>>>>>>>     <restriction base='string'>
>>>>>>>       <minLength value='1'/>
>>>>>>>       <pattern value='\s*[^\s]+.*'/>
>>>>>>>     </restriction>
>>>>>>>   </simpleType>
>>>>>>>
>>>>>>> I'm not sure if that regular expression quite gets what we want,
>>>>>>> but it is close and would need some testing. It is intended to
>>>>>>> select (zero or more whitespace characters) followed by (one or
>>>>>>> more non-whitespace characters) followed by (any additional
>>>>>>> characters). We probably could remove the plus symbol as it's
>>>>>>> redundant with the subsequent .*
>>>>>>> _______________________________________________
>>>>>>> Eml-dev mailing list
>>>>>>> Eml-dev at ecoinformatics.org
>>>>>>> http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/eml-dev
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Mark Servilla, Ph.D.
>>>>
>>>> LTER Network Office
>>>> Department of Biology
>>>> MSC 03 2020
>>>> 1 University of New Mexico
>>>> Albuquerque, NM 87131-0001
>>>>
>>>> servilla at lternet.edu
>>>> Office (505) 277-2619
>>>> Cell (505) 453-8593
>>>
>>
>
>
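
The rule described in the quoted bug report (optional leading whitespace,
at least one non-whitespace character, then anything) can be tried out
with a small sketch. This is only an approximation in Python, not the
schema itself: Python's re flavor differs from XML Schema regular
expressions, re.fullmatch is used to imitate the way a schema pattern
must match the whole value, and the name is_nonempty and the [\s\S]*
tail (so that multi-line text is accepted) are assumptions made for
illustration.

```python
import re

# Intended rule from the bug report: zero or more whitespace characters,
# then at least one non-whitespace character, then any further characters.
# [\s\S]* is an assumption so the tail may contain line breaks.
NONEMPTY = re.compile(r"\s*\S[\s\S]*")


def is_nonempty(text: str) -> bool:
    """Return True if text has length >= 1 and contains a non-whitespace character."""
    # fullmatch imitates XML Schema, where a pattern must cover the whole value.
    return len(text) >= 1 and NONEMPTY.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["", " ", "\n\t", "unknown", "  soil temp\n(deg C) "]:
        print(repr(sample), is_nonempty(sample))
```

A placeholder such as "unknown" passes this check, which is the
limitation discussed in the thread: the rule catches empty and
whitespace-only values, not filler text.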