[eml-dev] [Fwd: [Bug 2512] New: - require text content in elements to be non-empty]

Wed Aug 16 08:54:46 PDT 2006

Certainly, Kepler and similar EML parsers will break when they find a
space filler such as "unknown" where other content is expected. But then,
discarding that good metadata will have the unwanted effect that it may
be lost forever. And then there is a chance that Kepler, or a human,
will deem the entire data set incomplete. So it is a mixed picture.

I do not really know what is best. Enforcing this easy-to-defeat rule
may help just a bit. Does it hurt to put this new rule in place, then? I
am not sure. I guess the answer will depend on the potential metadata
loss. I vote for the Metadata moderator instead!

Addressing the general question of "EML metadata rule workarounds,
schema relaxations and QA/QC" seems a good topic for the IM meetings.
Don't forget your boxing gloves :)

Inigo
>
> Mark Servilla wrote:
>> But, the bogus content can potentially cause underlying problems with 
>> systems like kepler and perhaps even trends.  I am in favor of not 
>> allowing bogus content, but relaxing some of the required EML 
>> content.  At least the latter can be checked easily by a parser or 
>> exploiting application.
>>
>> inigo san gil wrote:
>>>
>>> Yes, that's what I thought, that bogus content will make up for the 
>>> new constraint.
>>>
>>> BUT often for a good reason! Sometimes you may want to add a little 
>>> bogus/empty content to be able to provide quite a bit of legit, 
>>> usable content. I think I'm guilty as charged of using "unknown" 
>>> content as a means to provide good metadata that otherwise would 
>>> have had to be ignored. I actually prefer to provide the content in 
>>> the right places rather than in <additionalMetadata> because it is 
>>> easier to parse, both humanly and programmatically.
>>>
>>> I'd go on a case-by-case basis, using the Metacat moderator approach 
>>> (documents have to pass by the metadata cop before they are 
>>> harvested). The application is in beta, but is working, and looks 
>>> pretty neat. I saw a demo while in SB, while you were at the SEV 
>>> discussing TRENDS.
>>>
>>> Inigo
>>>
>>> Mark Servilla wrote:
>>>> See below...
>>>>
>>>> You are right, it is somewhat of a hack.  And the content can simply 
>>>> be fake for the same effect.  I'm not sure whether this new rule 
>>>> would prevent some other routine errors when sites have automated 
>>>> EML generators - perhaps it permits catching some type of lazy error?
>>>>
>>>> Mark
>>>>
>>>> James W Brunt wrote:
>>>>> What's to stop those people from simply putting in placeholders *- 
>>>>> INIGO* ? The constraint seems a bit of a hack. :-|
>>>>>
>>>>> James
>>>>>
>>>>> Mark Servilla wrote:
>>>>>> James,
>>>>>>
>>>>>> I agree with this change.  Although it is perfectly legal in XML 
>>>>>> to have pure whitespace between opening and closing tags, EML 
>>>>>> should not accept such content - it is rather meaningless, and can 
>>>>>> throw a curve ball to some parsers.  I don't know how many 
>>>>>> sites use this technique to get past required elements; and, I 
>>>>>> didn't look closely to see if this is the specific problem that 
>>>>>> plagued AND.
>>>>>>
>>>>>> Mark
>>>>>>
>>>>>> James W Brunt wrote:
>>>>>>> Comments?
>>>>>>>
>>>>>>> -------- Original Message --------
>>>>>>> Subject: [eml-dev] [Bug 2512] New: - require text content in 
>>>>>>> elements to be non-empty
>>>>>>> Date: Tue, 15 Aug 2006 10:44:48 -0700 (PDT)
>>>>>>> From: bugzilla-daemon at ecoinformatics.org
>>>>>>> To: eml-dev at ecoinformatics.org
>>>>>>>
>>>>>>> http://bugzilla.ecoinformatics.org/show_bug.cgi?id=2512
>>>>>>>
>>>>>>>            Summary: require text content in elements to be 
>>>>>>> non-empty
>>>>>>>            Product: EML
>>>>>>>            Version: 2.0.1
>>>>>>>           Platform: Other
>>>>>>>         OS/Version: All
>>>>>>>             Status: NEW
>>>>>>>           Severity: enhancement
>>>>>>>           Priority: P1
>>>>>>>          Component: eml - general bugs
>>>>>>>         AssignedTo: jones at nceas.ucsb.edu
>>>>>>>         ReportedBy: jones at nceas.ucsb.edu
>>>>>>>          QAContact: eml-dev at ecoinformatics.org
>>>>>>>
>>>>>>>
>>>>>>> Current EML schemas allow text content to be empty, which 
>>>>>>> defeats validation
>>>>>>> rules by allowing users to provide content such as:
>>>>>>>   <attributeName> </attributeName>
>>>>>>> I propose that these uses of empty strings should not be valid.  
>>>>>>> We can achieve
>>>>>>> this by redefining the datatype we use for strings to have a 
>>>>>>> minimum length of
>>>>>>> 1 and a pattern that requires some non-whitespace characters.
>>>>>>>
>>>>>>> In XML Schema, we can declare the element to be of type 
>>>>>>> eml:nonemptystring
>>>>>>> where eml:nonemptystring is a simple type derived from xs:string 
>>>>>>> like this:
>>>>>>> <simpleType name='nonemptystring'>
>>>>>>>   <restriction base='string'>
>>>>>>>     <minLength value='1'/>
>>>>>>>     <pattern value='\s*[^\s]+.*'/>
>>>>>>>   </restriction>
>>>>>>> </simpleType>
>>>>>>>
>>>>>>> I'm not sure if that regular expression quite gets what we want, 
>>>>>>> but it is
>>>>>>> close and would need some testing.  It is intended to select 
>>>>>>> (zero or more
>>>>>>> whitespace characters) followed by (one or more non-whitespace 
>>>>>>> characters)
>>>>>>> followed by (any additional characters).  We probably could 
>>>>>>> remove the plus
>>>>>>> symbol as it's redundant with the subsequent .*
>>>>>>> _______________________________________________
>>>>>>> Eml-dev mailing list
>>>>>>> Eml-dev at ecoinformatics.org
>>>>>>> http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/eml-dev 
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>> -- 
>>>> Mark Servilla, Ph.D.
>>>>
>>>> LTER Network Office
>>>> Department of Biology
>>>> MSC 03 2020
>>>> 1 University of New Mexico
>>>> Albuquerque, NM 87131-0001
>>>>
>>>> servilla at lternet.edu
>>>> Office (505) 277-2619
>>>> Cell   (505) 453-8593
>>>
>>
>
>
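
The intent described in the quoted bug report (zero or more whitespace
characters, then one or more non-whitespace characters, then any
additional characters) can be tried out with a short Python sketch. This
is only an approximation under stated assumptions, not the EML
implementation: Python's re module stands in for an XML Schema
validator, fullmatch is used because XML Schema patterns always match
the whole value, and the names INTENDED, MULTILINE_OK and is_nonempty
are made up for illustration. Note that in XML Schema patterns "^"
negates only inside a character class such as [^\s], so the sketch uses
the equivalent \S.

```python
import re

# Pattern as described in the bug report: optional leading whitespace,
# one or more non-whitespace characters, then any remaining characters.
# Python's \s covers more Unicode whitespace than XML Schema's \s, so
# this is an approximation of validator behaviour.
INTENDED = re.compile(r"\s*\S+.*")

# Hypothetical, more permissive variant (not from the thread): it also
# accepts values that continue after a line break, which "." alone
# does not match.
MULTILINE_OK = re.compile(r"[\s\S]*\S[\s\S]*")


def is_nonempty(text, pattern=INTENDED):
    """Return True if text has length >= 1 and fully matches pattern."""
    return len(text) >= 1 and pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["", " ", "unknown", "  temp  ", "a\nb"]:
        print(repr(sample), is_nonempty(sample),
              is_nonempty(sample, MULTILINE_OK))
```

The sketch also illustrates the concern raised in the thread: a
placeholder such as "unknown" passes the check just as real content
does, so the rule only catches empty or whitespace-only values.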