[eml-dev] [Fwd: [Bug 2512] New: - require text content in elements to be non-empty]
inigo san gil
isangil at lternet.edu
Wed Aug 16 10:10:18 PDT 2006
Matt Jones wrote:
> Thanks, Inigo.
>
> I agree we won't prevent all of these issues. But at least we can
> make it explicit that when we make an element required it is because
> we want real content there.
>
> I also agree that we should discuss common situations where people
> have used this trick to avoid providing content and consider revising
> the requirements. The 'precision' field was initially required but
> later made optional in EML 2.0.1 for this very reason. Other fields
> may be in the same category. However, I would note that we already
> struggle with automating data import with the existing set of required
> information, and so loosening up the requirements will negatively
> impact what we can do with the metadata.
>
> Do you have any suggested fields to make optional that wouldn't
> adversely influence the effectiveness of the metadata?
>
I have some issues in the spatialRaster branch (mostly about creating a
better crosswalk with the other big standard, FGDC, and related
profiles), and I'll be happy to share them with you all. I talked to
Mark Servilla about some of these, and we arrived at a compromise to
salvage a lot of metadata by making one or two "unknown" entries.
... but here is one that comes up very often:
eml/dataset/dataTable/attributeList/attribute/missingValueCode/codeExplanation
--
I frequently use the content "none given" (which means that no fewer
than 5 LTER sites, and all their EML docs, use it), as many times there
is no way to understand why the value is missing, and for sure it is not
documented, or the documentation for the common "code" for "missing
values" does not apply to ALL missing values. Frequently there are notes
that say "a bear ate the instrumentation" or "the probe was most likely
stolen" or, very important, "the PI/creator/data owner deemed the data
recordings as not good" (this last one is perhaps troublesome, as these
decisions are often not documented, or the criteria behind them are
subjective).
Making <codeExplanation> optional or leaving it as is makes little
difference to me, and I doubt that the content is used programmatically
for anything crucial.
cheers, inigo
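
A rough way to see how often placeholder explanations like the ones
mentioned above occur is to scan a document for them. The Python sketch
below is only an illustration and not part of the original message: the
PLACEHOLDERS set, the find_placeholders helper and the SAMPLE fragment
are all hypothetical, and the fragment is a simplified stand-in for a
real EML attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder list; "none given" and "unknown" are the
# values mentioned in this thread.
PLACEHOLDERS = {"none given", "unknown"}


def find_placeholders(xml_text, tag="codeExplanation"):
    """Return the text of every <tag> element that is blank or a placeholder."""
    root = ET.fromstring(xml_text)
    hits = []
    for elem in root.iter(tag):
        text = (elem.text or "").strip()
        if not text or text.lower() in PLACEHOLDERS:
            hits.append(text)
    return hits


# Simplified, made-up fragment (not a complete or valid EML document).
SAMPLE = """
<attribute>
  <missingValueCode>
    <code>-9999</code>
    <codeExplanation>none given</codeExplanation>
  </missingValueCode>
  <missingValueCode>
    <code>NA</code>
    <codeExplanation>probe was most likely stolen</codeExplanation>
  </missingValueCode>
</attribute>
"""

print(find_placeholders(SAMPLE))
```

Only the first explanation in the sample is reported; the second one
carries real information and is left alone.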
> Regards,
> Matt
>
> inigo san gil wrote:
>> Hi,
>>
>> It seems to me like a good attempt to curb the bad practice of
>> leaving empty content. However, I can see the workaround coming:
>>
>> <mandatoryElement>
>> <mandatoryChild1>Unknown</mandatoryChild1>
>> <optionalChild2>Legit content</optionalChild2>
>> </mandatoryElement>
>>
>> But you have to try, and this stricter rule may encourage us to make
>> an extra effort to provide better content in the EML documents.
>>
>> There are times when we have 80% of the content of a certain element
>> (branch, section) and we just have to create a bogus entry to provide
>> the good 80%. I am also inclined to relax certain rules in order not
>> to lose this valuable content. Perhaps we should study the few cases
>> that happen more often, and evaluate the necessary changes to the EML
>> schema to accommodate the existing metadata.
>>
>> I like the NCEAS - ESA initiative of the "Metacat moderator", which
>> reviews EML documents for content approval before they are harvested
>> (accepted in the repository). It will enable us to monitor bogus
>> content, empty tags, nonsense, and the like. That may pave the way
>> to enriching metadata, as well as serve as a de facto mechanism for
>> quality control and quality assurance.
>>
>> Inigo
>>
>>
>>
>>
>> Mark Servilla wrote:
>>
>>> James,
>>>
>>> I agree with this change. Although it is perfectly legal in XML to
>>> have pure whitespace between opening and closing tags, EML should
>>> not accept such content - it is rather meaningless, and can throw a
>>> curve ball to some parsers. I don't know how many sites use this
>>> technique to get past required elements; and, I didn't look closely
>>> to see if this is the specific problem that plagued AND.
>>>
>>> Mark
>>>
>>> James W Brunt wrote:
>>>
>>>> Comments?
>>>>
>>>> -------- Original Message --------
>>>> Subject: [eml-dev] [Bug 2512] New: - require text content in
>>>> elements to be non-empty
>>>> Date: Tue, 15 Aug 2006 10:44:48 -0700 (PDT)
>>>> From: bugzilla-daemon at ecoinformatics.org
>>>> To: eml-dev at ecoinformatics.org
>>>>
>>>> http://bugzilla.ecoinformatics.org/show_bug.cgi?id=2512
>>>>
>>>> Summary: require text content in elements to be non-empty
>>>> Product: EML
>>>> Version: 2.0.1
>>>> Platform: Other
>>>> OS/Version: All
>>>> Status: NEW
>>>> Severity: enhancement
>>>> Priority: P1
>>>> Component: eml - general bugs
>>>> AssignedTo: jones at nceas.ucsb.edu
>>>> ReportedBy: jones at nceas.ucsb.edu
>>>> QAContact: eml-dev at ecoinformatics.org
>>>>
>>>>
>>>> Current EML schemas allow text content to be empty, which defeats
>>>> validation rules by allowing users to provide content such as:
>>>>     <attributeName> </attributeName>
>>>> I propose that these uses of empty strings should not be valid. We
>>>> can achieve this by redefining the datatype we use for strings to
>>>> have a minimum length of 1 and a pattern that requires some
>>>> non-whitespace characters.
>>>>
>>>> In XML Schema, we can declare the element to be of type
>>>> eml:nonemptystring, where eml:nonemptystring is a simple type
>>>> derived from xs:string like this:
>>>> <simpleType name='nonemptystring'>
>>>>   <restriction base='string'>
>>>>     <minLength value='1'/>
>>>>     <pattern value='\s*[^\s]+.*'/>
>>>>   </restriction>
>>>> </simpleType>
>>>>
>>>> I'm not sure if that regular expression quite gets what we want,
>>>> but it is close and would need some testing. It is intended to
>>>> select (zero or more whitespace characters) followed by (one or
>>>> more non-whitespace characters) followed by (any additional
>>>> characters). We probably could remove the plus symbol as it's
>>>> redundant with the subsequent .*
>>>> _______________________________________________
>>>> Eml-dev mailing list
>>>> Eml-dev at ecoinformatics.org
>>>> http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/eml-dev
>>>>
>>>
>>
>
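
The rule described in the Bug 2512 report quoted above (zero or more
whitespace characters, then at least one non-whitespace character, then
anything) can be tried out before touching the schema. The Python sketch
below is only an illustration and not part of the original thread: the
names XSD_WS, NONEMPTY and is_nonempty are made up, and re.fullmatch is
used on the assumption that an XML Schema pattern has to match the whole
value. Whitespace is spelled out as space, tab, newline and carriage
return, which is what \s covers in XML Schema regular expressions, and
[\s\S] is used for the tail so that multi-line content also passes.

```python
import re

# Whitespace as XML Schema regular expressions define \s:
# space, tab, newline, carriage return (assumption stated in the lead-in).
XSD_WS = r"[ \t\n\r]"

# Hypothetical candidate: optional leading whitespace, one non-whitespace
# character, then any remaining characters (including newlines).
NONEMPTY = re.compile(rf"{XSD_WS}*[^ \t\n\r][\s\S]*")


def is_nonempty(text: str) -> bool:
    """True if text contains at least one non-whitespace character."""
    # fullmatch mimics the whole-value matching of an XSD pattern facet.
    return NONEMPTY.fullmatch(text) is not None


for value in ["", " ", "\n\t ", "  airTemp ", "line one\nline two"]:
    print(repr(value), is_nonempty(value))
```

A check like this only rejects empty and whitespace-only values; content
such as "Unknown" still passes, as noted earlier in the thread.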