[eml-dev] [Fwd: [Bug 2512] New: - require text content in elements to be non-empty]

Wed Aug 16 09:34:40 PDT 2006

Thanks, Inigo.

I agree we won't prevent all of these issues.  But at least we can make 
it explicit that when we make an element required it is because we want 
real content there.

I also agree that we should discuss common situations where people have 
used this trick to avoid providing content and consider revising the 
requirements.  The 'precision' field was initially required but later 
made optional in EML 2.0.1 for this very reason.  Other fields may be in 
the same category.  However, I would note that we already struggle with 
automating data import with the existing set of required information, 
and so loosening up the requirements will negatively impact what we can 
do with the metadata.

Do you have any suggested fields to make optional that wouldn't 
adversely influence the effectiveness of the metadata?

Regards,
Matt

inigo san gil wrote:
> Hi,
> 
> It seems to me like a good attempt to curb the bad practice of leaving 
> empty content. However, I can see the workaround coming:
> 
> <mandatoryElement>
>     <mandatoryChild1>Unknown</mandatoryChild1>
>     <optionalChild2>Legit content</optionalChild2>
> </mandatoryElement>
> 
> But you have to try, and this stricter rule may encourage us to make an 
> extra effort to provide better content in the EML documents.
> 
> There are times when we have 80% of the content for a certain element 
> (branch, section) and have to create a bogus entry just to provide that 
> good 80%. I am also inclined to relax certain rules in order not to lose 
> this valuable content. Perhaps we should study the few cases that occur 
> most often, and evaluate the changes to the EML schema needed to 
> accommodate the existing metadata.
> 
> I like the NCEAS-ESA initiative of the "Metacat moderator", which 
> reviews EML documents for content approval before they are harvested 
> (accepted into the repository). It will enable us to monitor bogus 
> content, empty tags, nonsense and the like. That may pave the way to 
> enriching metadata, as well as serve as a de facto mechanism for 
> quality control and quality assurance.
> 
> Inigo
> 
> 
> 
> 
> Mark Servilla wrote:
> 
>>James,
>>
>>I agree with this change.  Although it is perfectly legal in XML to 
>>have pure whitespace between opening and closing tags, EML should not 
>>accept such content: it is rather meaningless and can throw a curveball 
>>at some parsers.  I don't know how many sites use this technique 
>>to get past required elements, and I didn't look closely to see if 
>>this is the specific problem that plagued AND.
>>
>>Mark
>>
>>James W Brunt wrote:
>>
>>>Comments?
>>>
>>>-------- Original Message --------
>>>Subject: [eml-dev] [Bug 2512] New: - require text content in elements to be non-empty
>>>Date: Tue, 15 Aug 2006 10:44:48 -0700 (PDT)
>>>From: bugzilla-daemon at ecoinformatics.org
>>>To: eml-dev at ecoinformatics.org
>>>
>>>http://bugzilla.ecoinformatics.org/show_bug.cgi?id=2512
>>>
>>>           Summary: require text content in elements to be non-empty
>>>           Product: EML
>>>           Version: 2.0.1
>>>          Platform: Other
>>>        OS/Version: All
>>>            Status: NEW
>>>          Severity: enhancement
>>>          Priority: P1
>>>         Component: eml - general bugs
>>>        AssignedTo: jones at nceas.ucsb.edu
>>>        ReportedBy: jones at nceas.ucsb.edu
>>>         QAContact: eml-dev at ecoinformatics.org
>>>
>>>
>>>Current EML schemas allow text content to be empty, which defeats
>>>validation rules by allowing users to provide content such as:
>>>  <attributeName> </attributeName>
>>>I propose that these uses of empty strings should not be valid.  We
>>>can achieve this by redefining the datatype we use for strings to have
>>>a minimum length of 1 and a pattern that requires some non-whitespace
>>>characters.
>>>
>>>In XML Schema, we can declare the element to be of type
>>>eml:nonemptystring, where eml:nonemptystring is a simple type derived
>>>from xs:string like this:
>>><simpleType name='nonemptystring'>
>>>  <restriction base='string'>
>>>    <minLength value='1'/>
>>>    <pattern value='\s*[^\s]+.*'/>
>>>  </restriction>
>>></simpleType>
>>>
>>>I'm not sure if that regular expression quite gets what we want, but
>>>it is close and would need some testing.  It is intended to select
>>>(zero or more whitespace characters) followed by (one or more
>>>non-whitespace characters) followed by (any additional characters).
>>>We could probably remove the plus symbol, as it's redundant with the
>>>subsequent .*
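
One way to try candidate patterns before touching the schema is a small
script. The sketch below is only an approximation: it uses Python's re
module rather than an XML Schema validator, so \s and . are close to, but
not identical with, the XML Schema definitions. INTENDED is one reading of
the prose description above (whitespace, then non-whitespace, then
anything); MULTILINE_SAFE and the helper accepts() are hypothetical names
introduced purely for illustration.

```python
import re

# Candidate patterns, written as they might appear in <pattern value='...'/>.
# XML Schema patterns are implicitly anchored at both ends, so re.fullmatch
# is used to mimic that behaviour.
INTENDED = r"\s*[^\s]+.*"         # reading of the description in the bug report
MULTILINE_SAFE = r"\s*\S[\s\S]*"  # hypothetical variant that tolerates line breaks


def accepts(pattern: str, text: str) -> bool:
    """Return True if text has length >= 1 and matches the pattern in full."""
    return len(text) >= 1 and re.fullmatch(pattern, text) is not None


# Sample element contents: empty, whitespace-only, ordinary, and multi-line.
SAMPLES = ["", " ", "\n\t ", "temp", "  air temperature ", "line one\nline two"]

if __name__ == "__main__":
    for sample in SAMPLES:
        print(repr(sample), accepts(INTENDED, sample), accepts(MULTILINE_SAFE, sample))
```

Under these assumptions, a value containing a line break is rejected by
the first pattern because . does not match a newline, which may matter
for multi-line text fields. A placeholder such as "Unknown" passes both
patterns, so a check like this only catches empty and whitespace-only
content.
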
>>>_______________________________________________
>>>Eml-dev mailing list
>>>Eml-dev at ecoinformatics.org
>>>http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/eml-dev
>>>
>>
> 

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Matt Jones                                   Ph: 907-789-0496
jones at nceas.ucsb.edu                    SIP #: 1-747-626-7082
National Center for Ecological Analysis and Synthesis (NCEAS)
UC Santa Barbara     http://www.nceas.ucsb.edu/ecoinformatics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~