[kepler-dev] [Bug 4104] - Need resource manager to handle objects in the resources directories

bugzilla-daemon at ecoinformatics.org bugzilla-daemon at ecoinformatics.org
Tue May 26 12:53:09 PDT 2009


http://bugzilla.ecoinformatics.org/show_bug.cgi?id=4104





------- Comment #2 from welker4kepler at gmail.com  2009-05-26 12:53 -------
Interesting ideas!

I want to point out a couple of things. First of all, we already have most of
this functionality. If you have a file at common/resources/configs/foo.xml
then you can reference it from the classpath using the relative path
"configs/foo.xml". I think that requiring the relative part of the path is a
feature and not a bug, since otherwise it would be extremely easy for tricky
bugs to arise where something is accidentally overridden when it is not
intended to be. Such a bug could be extremely challenging to find. I think
requiring a relative path (keeping in mind that the relative path could be
empty -- that is, common/resources/foo.xml could be referenced just by
"foo.xml") actually does not impose very much cost at all, but adds quite a
bit of safety by making the code much more robust against accidental
changes.
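
To make the existing behaviour concrete, here is a minimal sketch of a
classpath lookup by relative path. The class name ClasspathLookup and the
temporary directories standing in for two modules' resources directories are
assumptions made for illustration, not Kepler code; only the standard
ClassLoader.getResource call is real API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration; not part of Kepler.
final class ClasspathLookup {

    // True if some classpath root contains a resource at exactly this relative path.
    static boolean exists(ClassLoader loader, String relativePath) {
        return loader.getResource(relativePath) != null;
    }

    // Builds two temporary "resources" roots and checks two lookups against them.
    static String demo() {
        try {
            Path high = Files.createTempDirectory("foo-module-resources");
            Path low = Files.createTempDirectory("common-resources");
            Files.createDirectories(low.resolve("configs"));
            Files.write(low.resolve("configs/foo.xml"),
                    "<config/>".getBytes(StandardCharsets.UTF_8));
            Files.createDirectories(high.resolve("data"));
            Files.write(high.resolve("data/foo.xml"),
                    "<data/>".getBytes(StandardCharsets.UTF_8));

            // Higher priority root first, as on a classpath.
            URL[] classpath = { high.toUri().toURL(), low.toUri().toURL() };
            try (URLClassLoader loader = new URLClassLoader(classpath, null)) {
                return "configs/foo.xml=" + exists(loader, "configs/foo.xml")
                        + ", foo.xml=" + exists(loader, "foo.xml");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The bare name "foo.xml" does not resolve here because neither root holds a
file directly under it, so the unrelated data/foo.xml cannot be picked up by
accident.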

If you aren't convinced of the potential for such bugs, imagine this scenario.
Someone drops "foo.xml" in one of the many directories that we specify as being
a "root" directory to read from. Perhaps they do not even realize this is one
of those canonical directories. The call ResourceManager.getResource("foo.xml")
now unintentionally reads in the wrong resource (in the original module,
"foo.xml" was stored in and read from "resources/configs", but in the higher
priority module there is a different "foo.xml" stored in "resources/data"). Most
people are probably not going to expect the addition of "foo.xml" to
"resources/data" to break Kepler, especially as there is no
"resources/data/foo.xml" being overridden in a lower priority module. So,
chances are, this person is going to be looking for problems in the code
associated with their own module for quite some time, when the real culprit
lies in the way that code from a lower priority module is interacting with an
unexpected "foo.xml." And the irony is that the more "readable" foo.xml is to
the lower priority module, the harder the bug is to find. The more that the new
foo.xml is just nonsense in the context of the lower priority module, the
easier the bug is going to be to find. But, one can imagine some very subtle
bugs arising from reading in the wrong resources, especially when those
resources share many common features like XML files tend to do.
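
The scenario above can be sketched as follows. This is only my guess at how a
bare-name lookup over several "root" directories might behave; the class
BareNameLookup, its find method and the directory layout are hypothetical and
exist only to show the failure mode.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the failure mode; not a proposed API.
final class BareNameLookup {

    // Searches every "root" subdirectory of every module, highest priority
    // module first, and returns the first file with a matching name.
    static Path find(List<Path> modulesByPriority, List<String> rootDirs, String fileName) {
        for (Path module : modulesByPriority) {
            for (String root : rootDirs) {
                Path candidate = module.resolve("resources").resolve(root).resolve(fileName);
                if (Files.isRegularFile(candidate)) {
                    return candidate;
                }
            }
        }
        return null;
    }

    static String demo() {
        try {
            Path base = Files.createTempDirectory("modules");
            Path lowFile = base.resolve("common/resources/configs/foo.xml");
            Path highFile = base.resolve("foo-module/resources/data/foo.xml");
            Files.createDirectories(lowFile.getParent());
            Files.createDirectories(highFile.getParent());
            Files.write(lowFile, "<config/>".getBytes(StandardCharsets.UTF_8));
            Files.write(highFile, "<data/>".getBytes(StandardCharsets.UTF_8));

            List<Path> modules = Arrays.asList(base.resolve("foo-module"), base.resolve("common"));
            // Code in common asks for "foo.xml" and expects its own configs file.
            Path found = find(modules, Arrays.asList("configs", "data"), "foo.xml");
            return base.relativize(found).toString().replace('\\', '/');
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The lookup returns the file under foo-module/resources/data even though the
caller in common meant the one under resources/configs, and no file at the
same relative path was ever overridden.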

Basically, the principle I want to advocate here is that resource overrides
should be obvious and automatically detectable by the build system. But, if
there are multiple "roots" and multiple ways of reading in resources, then it
will be impossible to detect whether two instances of "foo.xml" that exist in
different relative subdirectories of "resources" are overrides or not.
Alternatively, if resources are read in from their relative paths, all resource
overrides can be detected by the build system and reported to the user and
otherwise managed. This alone will save many developer hours of debugging time.

A second issue. Any references to modules by name in the code will render our
code much more fragile. I think it should be avoided. I would even go so far as
to say that I would greatly prefer that module names were NEVER referenced in
either code or in properties.

Let us say you have a reference to ResourceManager.getResource("common",
"images/kepler-about.png"). This code will break as soon as common is published
so that common is now named common-2.0, for example. Furthermore, if a
developer wants to change the behavior of the system so that instead of
"common/resources/images/kepler-about.png" being read,
"foo-module/resources/images/kepler-about.png" is read instead, they will not
be able to override the resource. Instead, they will have to override the
entire base Kepler class that references the resource, all so that the call to
ResourceManager.getResource("common", "images/kepler-about.png") can be either
changed to ResourceManager.getResource("images/kepler-about.png") or less
robustly ResourceManager.getResource("foo-module", "images/kepler-about.png").
Of course, this will lead to the risk of code drift. And it is completely
unnecessary, as this override was introduced for a trivial rather than
fundamental reason.
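
A small sketch of why the module-name form is fragile. Both lookup methods
below are hypothetical stand-ins that read straight from directories; in
practice the module list would come from the build system rather than being
passed in by hand.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration; not the proposed ResourceManager.
final class ModuleNameFragility {

    // Lookup that hard-codes a module directory name at the call site.
    static Path byModule(Path modulesDir, String moduleName, String relativePath) {
        Path candidate = modulesDir.resolve(moduleName).resolve("resources").resolve(relativePath);
        return Files.isRegularFile(candidate) ? candidate : null;
    }

    // Path-only lookup: the caller never names a module; priority order decides.
    static Path byPath(List<Path> modulesByPriority, String relativePath) {
        for (Path module : modulesByPriority) {
            Path candidate = module.resolve("resources").resolve(relativePath);
            if (Files.isRegularFile(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    static String demo() {
        try {
            Path modulesDir = Files.createTempDirectory("modules");
            // The module has been published, so its directory is now common-2.0.
            Path published = modulesDir.resolve("common-2.0");
            Path image = published.resolve("resources/images/kepler-about.png");
            Files.createDirectories(image.getParent());
            Files.write(image, new byte[0]);

            boolean named = byModule(modulesDir, "common", "images/kepler-about.png") != null;
            boolean plain = byPath(Collections.singletonList(published),
                    "images/kepler-about.png") != null;
            return "byModule=" + named + ", byPath=" + plain;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

After the rename, the call that names "common" finds nothing, while the
path-only call still resolves without any change to the calling code.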

What if in the future, we want to refactor the common module? Perhaps we find
that there are common patterns in our distributions, such that common would
usefully be broken up for different contexts? Well, such refactoring is going
to be much more difficult, because now we have to update all the references to
the common module in our code. Not only that, since we would be implicitly
encouraging the use of references to the "common" module, even if we fixed all
of our code so that it could be refactored, we are likely to break a whole lot
of code in other modules that has come to depend on the common module.

We have discussed and debated the fact that a module can make references to
code in lower priority modules, but not in higher priority modules. This is a
feature, in that just by looking at modules.txt, you already know a lot about
the dependencies between the modules. This is a bug, in that if we allowed
cyclic dependencies it might (or might not) make the task of breaking util into
more modules somewhat easier. Well, if you are going to reference resources in
specific modules from code, you can just forget about that feature. There is
nothing stopping a developer of a lower priority module from referencing a
higher priority module and thus creating an implicit and hidden dependency
between them. Whatever information can be gleaned about code dependencies just
from looking at modules.txt would be rendered largely uncertain.

So, the second principle I would like to advocate is this. All resources in the
core modules should be read off the classpath or through the build system as an
intermediary and never directly off the file system if such a reference
involves coding a reference to a specific module in your code. In general,
module names should never be referenced by the code, by any resources, or by
system properties. The only exception is the build system, which is
specifically designed to handle such references. In this way, our code will
never become dependent on the particular module names we have chosen, and we
will always have maximum flexibility to refactor modules as we see fit.

There are already cases where this second principle is violated. By me. For
example, in the ppod suite (which includes ppod, ppod-actors, ppod-gui, and
provenance-apps) system properties reference common and other modules. This
causes all sorts of problems when publishing, because the reference to, for
example, common becomes out of date when common is renamed to common-2.0a1
on publication. Referencing either the file system or module names directly is
not a good idea, especially since files are just as easy to read off the
classpath.

Anyway, I am not against the idea of making a resource manager. It could make
reading files off the classpath even easier than it already is by, for
example, producing a BufferedReader or PrintWriter for any reference to a
resource so that developers do not even have to think about how one goes about
reading and writing files that are found on the classpath. (Although this is
probably a skill that any Java developer should learn.) But, such a resource
manager should not allow the users of that manager to reference specific
modules.
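
For what it is worth, here is a minimal sketch of what I have in mind for the
reading side. The class name ClasspathResourceReader and its openReader
method are invented for this example; the point is only that the caller
passes a relative path and never a module name. Writing is left out, since a
classpath resource is not always a writable file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration; not an existing Kepler class.
final class ClasspathResourceReader {

    private final ClassLoader loader;

    ClasspathResourceReader(ClassLoader loader) {
        this.loader = loader;
    }

    // Opens a UTF-8 reader for the resource at relativePath.
    // There is deliberately no module-name parameter.
    BufferedReader openReader(String relativePath) {
        InputStream in = loader.getResourceAsStream(relativePath);
        if (in == null) {
            throw new IllegalArgumentException("Resource not found on classpath: " + relativePath);
        }
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    // Puts one temporary resources root on a classpath and reads a file from it.
    static String demo() {
        try {
            Path root = Files.createTempDirectory("resources");
            Files.createDirectories(root.resolve("configs"));
            Files.write(root.resolve("configs/foo.xml"),
                    "<config/>".getBytes(StandardCharsets.UTF_8));

            URL[] classpath = { root.toUri().toURL() };
            try (URLClassLoader loader = new URLClassLoader(classpath, null);
                 BufferedReader reader =
                         new ClasspathResourceReader(loader).openReader("configs/foo.xml")) {
                return reader.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```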

