[kepler-dev] A very thorny question..

Wed Mar 28 15:18:06 PDT 2007

I imagine this will not be an easy question to answer. And, I may get  
different opinions from different people on how to approach the  
problem. But, before I ask it, let me first take a few minutes to  
frame the discussion. These questions are technical and are related to  
infrastructure. I used the nightly build kepler20070325 in this  
discussion. However, I saw the same problems on Beta3.

As a new user of Kepler, I've been going through the "Getting Started
Guide." I am trying to build an environment for the Biologists I work
with (and also as a project that I will need to write a classroom paper
on) that will be so easy to use that it can readily replace BioPerl.
After spending a semester evaluating Wildfire, Infosense, the Apple
Automator, and even the Lego Mindstorms programmable block
environment, I found Kepler. It's *exactly* the framework I needed.
My purpose: to build an environment that is clean to use, has few
confusing messages, and would not intimidate a Biologist with little
programming experience. From the papers I read here, I know many
Biologists use Kepler as is. The Biologists I work with, however, feel
intimidated by confusing messages and non-intuitive interfaces. They
want to get on with the Biology and not get bogged down by the tools.

While going through the "Getting Started Guide," I found it to be well  
written and easy for me to use. I thought, "Gosh, this is almost not a  
draft." I have made many notes about small things like parallelism,  
missed words, etc. But, then, I discovered that the basic problems I  
had from the beginning were all related to the same core situation.

The only "real" problem with the Getting Started Guide was that many  
of the examples didn't work. I thought that would be fixed once Kepler  
was out of Beta. I no longer believe this to be the case. I  
believe the core problem is related to the interoperability between  
systems and would like to know if several of the examples in the  
Getting Started Guide could be rewritten to avoid the situation I will  
explain.

The first of two examples that show this is the third demo, the "Image
Display" example. On my first pass, I was naive and just thought that
"nothing happened." On a more recent pass through the guide, I dug into
the technical details and saw that when I replaced the ImageJ actor
with the Browser Display actor, the following was written to standard
out:

> Reading from the browser - val = false
> Error invoking browser, cmd=netscape -remote  
> openURL(file:///...distribution.PNG)

Now, this makes perfect sense since I don't have Netscape installed (I
use Firefox). But, the more fundamental question is - Do we want to
use an example that depends upon a browser that will vary from system
to system? This will inevitably fail on some systems no matter how
well the Getting Started Guide is written. Ultimately, it's a great
demonstration. But, should it be in the first document that is seen
by new users?
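
One way to make such an example less fragile is to look for whichever
browser is actually installed rather than assuming one. What follows is
only a minimal Python sketch of that idea, not how the Browser Display
actor works (Kepler itself is written in Java). The function name
first_available and the candidate list are hypothetical, chosen just
for illustration.

```python
import shutil

# Hypothetical candidate list; a real one would depend on the target platforms.
CANDIDATE_BROWSERS = ["firefox", "netscape", "mozilla", "opera"]


def first_available(candidates, which=shutil.which):
    """Return the first command in `candidates` found on PATH, or None."""
    for name in candidates:
        if which(name) is not None:
            return name
    return None


if __name__ == "__main__":
    browser = first_available(CANDIDATE_BROWSERS)
    if browser is None:
        # Fail with a plain message instead of a failed "netscape" command.
        print("No known browser found; skipping the display step.")
    else:
        print("Would launch:", browser)
```

The lookup function is passed in as a parameter so the check can be
exercised without depending on what happens to be installed.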

The second of my two examples revolves around section 7.1 Sample  
Workflow 1 - Simple Statistics.

Upon the first run, once again nothing appeared to happen. On a second
pass with a more technical mindset, I did some troubleshooting and saw
the following displayed on standard out:

> Problem with creating process in RExpression!
> Error in _exec()
> 54 ms. Memory: 142636K Free: 54674K (38%)

The process couldn't be created because R was not installed on the
system. After installing R with default settings, the workflow now
works. However, there is an additional message, "File Error: Could not
open the file." It doesn't stop the demonstration from working, but it
adds confusion to the situation. I'm sure I could resolve this as
well. But, the same question comes to mind. In an introduction to the
software, do we want to use something that involves other programs
outside of our control? Would we, in the future, include R as
part of the install and therefore avoid this issue? How important is
it for us to use an R example? Can we give just one example (instead
of many examples) that uses R and that stresses boldly how it may fail
if R is not installed? If a new user doesn't know what R is, or doesn't
care to use it, many of the examples will fail.
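
Short of bundling R, an example could check for the programs it needs
before it runs and say plainly what is missing. The following is a
rough Python sketch of such a check, offered only as an illustration;
it is not Kepler code, and the function names missing_tools and
preflight_message are made up for this example. It also assumes the R
command is named "R" and is on the PATH, which may not hold on every
system.

```python
import shutil


def missing_tools(required, which=shutil.which):
    """Return the names in `required` that cannot be found on PATH."""
    return [name for name in required if which(name) is None]


def preflight_message(required, which=shutil.which):
    """Build a plain-language message, or return None if nothing is missing."""
    missing = missing_tools(required, which)
    if not missing:
        return None
    return ("This example needs the following programs, which were not "
            "found on this system: " + ", ".join(missing))


if __name__ == "__main__":
    # Assumption: the R executable is called "R" and lives on PATH.
    message = preflight_message(["R"])
    print(message or "All required programs were found.")
```

A message like this, shown before the workflow starts, would tell a new
user what to install instead of leaving them with "Problem with
creating process in RExpression!"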

In summary, these are the impressions from a new set of eyes. Kepler
is impressive as all heck and is the framework I want to use for a
project that will probably take the next few years of my life. If I
learned nothing else in my second year of studying Bioinformatics, it
is that my Biologists shy away from software that looks too confusing,
no matter how good it is. I'd like to see the software work so well
that it becomes the de facto standard, as BioPerl is.

Warmest Regards,

Glen
P.S. Kirsten, I still have about a zillion notes I made on reading the  
guide (like parallelism, some omitted words, etc.). But, they seem so  
insignificant compared to the big issues seen in this email.

-- 
913-486-8775
glen at glenjarvis.com
http://www.glenjarvis.com

"You must be the change you wish to see in the world." -M. Gandhi