[eml-dev] [Bug 3480] New: - duplicate distribution types (in resource.xsd, and physical.xsd)
Matt Jones
jones at nceas.ucsb.edu
Fri Aug 29 12:43:55 PDT 2008
I agree that there should only be one type. It's really not clear to me why
there are two, especially if they are indeed identical type definitions. If
that is the case, I would argue for eliminating one of them. It was
probably an oversight.
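
Whether the two types really are identical can be checked mechanically. The
following is a minimal sketch, assuming two toy schema fragments that stand in
for eml-resource.xsd and eml-physical.xsd (they are not the real EML
definitions). The helper names `shape` and `find_type` are hypothetical. The
comparison ignores the type's own name and any xs:annotation children, which
is a choice made here for illustration.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Toy stand-ins for the two schema files; NOT the actual EML definitions.
RESOURCE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="DistributionType">
    <xs:choice>
      <xs:element name="online" type="xs:string"/>
      <xs:element name="offline" type="xs:string"/>
      <xs:element name="inline" type="xs:string"/>
    </xs:choice>
  </xs:complexType>
</xs:schema>"""

PHYSICAL_XSD = RESOURCE_XSD.replace("DistributionType", "PhysicalDistributionType")


def find_type(schema_text, type_name):
    """Return the top-level xs:complexType element with the given name."""
    root = ET.fromstring(schema_text)
    for complex_type in root.findall(XS + "complexType"):
        if complex_type.get("name") == type_name:
            return complex_type
    raise KeyError(type_name)


def shape(elem, is_root=True):
    """Reduce an element to a comparable tuple of tag, attributes, children.

    The root's own "name" attribute and all xs:annotation children are
    skipped, so only the structural definition is compared.
    """
    attrs = {k: v for k, v in elem.attrib.items()
             if not (is_root and k == "name")}
    children = tuple(shape(child, False) for child in elem
                     if child.tag != XS + "annotation")
    return (elem.tag, tuple(sorted(attrs.items())), children)


same = (shape(find_type(RESOURCE_XSD, "DistributionType"))
        == shape(find_type(PHYSICAL_XSD, "PhysicalDistributionType")))
print(same)
```

Note that this compares attribute values as plain strings, so two references
to the same type written with different namespace prefixes would be reported
as different.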
A separate issue is why there are two distribution elements, one as a child
of the resource classes, and one as a child of the entity classes. This was
done on purpose, and after significant discussion. Originally, the only
place that we had distribution was at the entity level so that both people
and software agents could download the exact data files that are associated
with an entity description. Because many people wanted to omit the entity
description altogether, there was a lot of support for adding a "top-level"
distribution element that could direct someone where to get more information
about a resource, and possibly to download it. However, it was recognized
that software agents couldn't really use this effectively because it would
never be clear exactly what one would find at that URI (one of the data
entities, all of them in some sort of archive, metadata about the
resource...it just wasn't clear because there is no physical description at
the resource level). So, we decided on the following guidelines.
The top-level distribution element is used to provide distribution
information about the overall resource (e.g., a dataset containing one or
more entities), but it is undefined in terms of exactly what will be
returned from a URI in that distribution section. For some people it might
be a web page describing the resource, for others it might be a login page
to authenticate people or collect usage information before download, and for
others it might be a zip of all of the data files, etc. Thus, it was
recommended that people set the "function" attribute of their url element to
'information' to indicate that this is a URI that provides, generically,
more information about the resource, but in no particular physical format.
At the entity level, however, the intent was to provide URIs that allow
download of the data stream for each individual entity, such that the byte
stream received from that URI exactly matches the description in
eml-physical. So, if eml-physical says the object is 67KB, ascii-delimited
text with \n line endings, that's exactly what should be returned from the
URI, and the url "function" attribute should be set to 'download'. If for
some reason the actual bytes returned don't match what is described in
eml-physical, then the URL is really an informational URL, and software
agents shouldn't use it for automated processing. It was intended that, for
the most part, people would use 'information' URLs at the resource
distribution level and 'download' URLs at the entity distribution level, and
we expected that software agents would generally only be able to deal with
the distribution elements at the entity level because they need the physical
description of the entity to do anything useful.
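
A minimal sketch of how a software agent might apply these guidelines,
assuming a simplified, namespace-free EML fragment with made-up example.org
URLs. The function names `entity_download_urls` and `resource_urls` are
hypothetical. The agent only harvests entity-level URLs that are explicitly
marked function="download" (a deliberately conservative choice for this
sketch) and treats resource-level URLs as pointers for people.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free fragment; the URLs are made up for illustration.
SAMPLE = """
<dataset>
  <distribution>
    <online><url function="information">http://example.org/catalog/ds1</url></online>
  </distribution>
  <dataTable>
    <entityName>plots</entityName>
    <physical>
      <objectName>plots.csv</objectName>
      <distribution>
        <online><url function="download">http://example.org/data/plots.csv</url></online>
      </distribution>
    </physical>
  </dataTable>
</dataset>
"""

ENTITY_TAGS = ("dataTable", "spatialRaster", "spatialVector", "otherEntity")


def entity_download_urls(dataset):
    """Collect entity-level URLs explicitly marked function="download"."""
    urls = []
    for tag in ENTITY_TAGS:
        for entity in dataset.findall(tag):
            for url in entity.findall("physical/distribution/online/url"):
                if url.get("function") == "download":
                    urls.append((url.text or "").strip())
    return urls


def resource_urls(dataset):
    """Collect resource-level URLs with their function attribute (may be None)."""
    return [((url.text or "").strip(), url.get("function"))
            for url in dataset.findall("distribution/online/url")]


dataset = ET.fromstring(SAMPLE)
print(entity_download_urls(dataset))  # candidates for automated processing
print(resource_urls(dataset))         # pointers for human readers
```

An agent would still need to confirm that the bytes it receives match the
eml-physical description before processing them.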
Hope this clarifies. The eml-dev archives contain a wealth of discussions
on this topic if you're interested.
Matt
On Fri, Aug 29, 2008 at 11:21 AM, inigo <isangil at lternet.edu> wrote:
>
> As for why there are two "types", that escapes me. One type
> should have been enough,
> either "PhysicalDistributionType" or "DistributionType"; no need for both,
> as they are the same and
> provide the same (but contextual, as explained by David) functionality. In
> that regard, it could be regarded
> as a bug: one type would have done the trick.
>
> I am with David about the intention of the designers/developers; that view
> has actually materialized
> for many sites at LTER that have provided their metadata to the larger
> community. Some LTER
> sites use the distribution element at the resource (general) level.
>
> Some of these cited (but unnamed) sites that have level 5 eml, may also put
> a reference to the
> <distribution> in the //entity/physical/distribution, or duplicate the
> group altogether.
> Some sites do not have level 5 (entity level) so they make use of the
> general resource <distribution>.
> Another question is where that URL points: a data catalog? A broken link? A
> screen to capture personal data?
> In nearly all cases for those documents, this distribution tag contains a
> URL; no "DataBase connection" is used,
> which makes me wonder about the practical use of the DB metadata
> placeholders within that group.
> Perhaps a <distribution> tag customized to the GIS entities (multiple shape
> files, etc.) is in more
> demand today than the cited database-oriented connectivity & query guide
> tags.
>
> The use of <distribution> at the resource level tends to happen more in the
> case when the site
> uses EML as a "splitter" -- one dataset-one entity (entity is one of
> dataTable, raster, vector or otherEntity)
>
> For those of you lumping entities in one EML, the distribution within the
> entity element is used
> (or shoulda been used), and perhaps the use of the resource <distribution>
> element is irrelevant.
> However, some people have come up with creative ways to use these two
> placeholders -like
> the general to a generic URL, and the entity distribution pointing to the
> actual dataset.
>
> cheers, inigo
>
> David Blankman wrote:
>
>> Margaret,
>>
>> I am trying to remember the rationale for having distribution in two places.
>> I think that we did that expecting that lots of people would not use
>> eml-Physical, so it made sense to have it in resource. The distribution
>> information in Resource was intended, I think, for human consumption while
>> the element in eml-physical was meant primarily for software agents.
>>
>> I am not making a case for two versions, but rather trying to remember the
>> rationale. I don't trust my memory so I could be wrong, but my explanation,
>> at least, seems logical.
>>
>> David
>>
>> David Blankman
>> Director of Information Management, Israel LTER/Ma'arag
>> Mitrani Department of Desert Ecology
>> Jacob Blaustein Desert Research Institute
>> Ben Gurion University
>> Midreshet Ben Gurion, 84990 Israel
>> 972-54-685-9345 (cell)
>> 1-505-349-5680 (Skype)
>>
>>
>>
>> On Fri, Aug 29, 2008 at 9:08 PM, <bugzilla-daemon at ecoinformatics.org>
>> wrote:
>>
>>
>>
>>> http://bugzilla.ecoinformatics.org/show_bug.cgi?id=3480
>>>
>>> Summary: duplicate distribution types (in resource.xsd, and
>>> physical.xsd)
>>> Product: EML
>>> Version: 2.0.1
>>> Platform: Other
>>> OS/Version: All
>>> Status: NEW
>>> Severity: normal
>>> Priority: P2
>>> Component: eml - general bugs
>>> AssignedTo: jones at nceas.ucsb.edu
>>> ReportedBy: mob at icess.ucsb.edu
>>> QAContact: eml-dev at ecoinformatics.org
>>>
>>>
>>> In examining <distribution>: there are currently 2 Distribution Types,
>>> which
>>> are identical:
>>>
>>> 1. eml-physical.xsd: <xs:complexType name="PhysicalDistributionType">
>>>
>>> used by:
>>> eml-physical.xsd:1169: <xs:element name="distribution"
>>> type="PhysicalDistributionType"
>>>
>>> and
>>> 2. eml-resource.xsd: <xs:complexType name="DistributionType">
>>>
>>> used in these schemas:
>>> eml-resource.xsd:364: <xs:element name="distribution"
>>> type="DistributionType" minOccurs="0" maxOccurs="unbounded">
>>> eml-software.xsd:129: <xs:element name="distribution"
>>> type="res:DistributionType" maxOccurs="unbounded"> (this one is a child
>>> of
>>> <implementation>)
>>>
>>> I can't tell why this is the case. What was the rationale for having 2?
>>>
>>> Also note that bug #1154 only mentions one of these,
>>> PhysicalDistributionType
>>> _______________________________________________
>>> Eml-dev mailing list
>>> Eml-dev at ecoinformatics.org
>>> http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/eml-dev
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Matthew B. Jones
Director of Informatics Research and Development
National Center for Ecological Analysis and Synthesis (NCEAS)
UC Santa Barbara
jones at nceas.ucsb.edu Ph: 1-907-523-1960
http://www.nceas.ucsb.edu/ecoinfo
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~