[SDM-SPA] Re: [kepler-dev] moving edu.ncsu to org.sdm.spa

Wed Apr 7 12:25:54 PDT 2004

Hi Terence:

Thanks for chiming in. I'll try to answer briefly (below) before
hopping into a meeting... Note that I'm not aware of all the details
that the developers are fighting with as I'm not working on the actual
code (unless of course, we'll have a Prolog actor, which might be
coming soon ;-)

Bertram

>>>>> "TC" == Terence Critchlow <critchlow1 at llnl.gov> writes:
TC> 
TC> Hi Bertram,
TC> I am very confused about your comments and don't think I understand your 
TC> concerns.  In no particular order, the items confusing me include:
TC> 
TC> - My understanding is that the current directory structure is organized 
TC> around projects, not capabilities (ie org/* ). Part of Xiaowen's goal is to 
TC> minimize changes to the current directory structure so she is keeping that 
TC> organization. If I am understanding it correctly, your proposal would 
TC> completely restructure the entire repository - which is something we have 
TC> been trying to avoid.

I think it's ok to organize by project, if we can make it work. As I
understand, the organization by project aims at identifying what each
project contributed and thus some sort of "ownership". The problem is
that there are examples at the workflow level but also at the "utils"
level and even at the actors level where code is co-developed among
multiple projects. I don't know how to best keep that in order. Maybe
we just need some shared folder for those (because creating folders of
the sort SPA-GEON, GEON-SEEK, etc. won't be practical, I think).
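
To illustrate why per-combination folders don't scale, here is a rough
Python sketch. It is a back-of-the-envelope illustration only: the project
names are the ones mentioned in this thread, and the "A-B" folder naming
scheme is an assumption, not anything that exists in the repository.

```python
from itertools import combinations

# Project names taken from this thread; the folder naming is hypothetical.
projects = ["SPA", "GEON", "SEEK", "Ptolemy"]

# One shared folder per combination of two or more projects.
shared_folders = [
    "-".join(combo)
    for size in range(2, len(projects) + 1)
    for combo in combinations(projects, size)
]

# For n projects this yields 2**n - n - 1 shared folders.
print(len(shared_folders))
print(shared_folders[:3])
```

The count grows exponentially with the number of projects, which is the
practical objection to folders like SPA-GEON.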

TC> - From the perspective of where certain shared file go - this seems to be a 
TC> problem that the developers have already worked out, since we don't have 
TC> the shared directory at this time. Given they don't seem to have a problem, 
TC> why is this a concern from your perspective?

I don't think it has been really solved. In Kepler, as I understand,
there was not a lot of attention paid to clean up the directory
structure. Hence things were (and probably still are) somewhat messy
and need improvement. Thus it's really good that we spend some time on 
the cleaning up. But as far as I understand there wasn't a project
oriented separation before (on the source level). I think... at least
it was not completely. So there are still things to sort out.

One solution that I would have preferred (and I think it's the one
that Edward Lee and others also advocated) was to customize (esp. by
project) what users actually see. The workflow and actors libraries as 
visible to the user (and any reviewers should the time come ;-) can be 
easily configured and thus organized by project. In the future
(Kepler/SEEK has such plans) this may become even more dynamic and
easier than it is already in the Ptolemy/Vergil design.

TC> - From the perspective of keeping the SPA and Kepler repositories sync'ed, 
TC> it is a lot easier if the copy is based on the directory structure instead 
TC> of identifying each file individually, but it is doable either way.

I agree. We should keep that in mind as well. But I guess it is one of
several issues to be considered. 

TC> - I don't see why you would not have the same code location concerns under 
TC> a domain specific approach - either you need to figure out that the 
TC> browserUI interface was first applied to a blah workflow and thus is 
TC> located in the blah directory, or you create a blah blah directory and put 
TC> the code shared by those domains in that directory,  or you put everything 
TC> that may possibly be shared into a single directory creating a huge mess. 
TC> Given the number of application domains is larger than the number of 
TC> projects, I could see this actually being worse overall.

Yes, things can get messy. That's why I had earlier suggested that we
have a dynamic classification of actors (and workflows which can be
seen as composite actors) based on a list of keywords (or "concepts").

Thus all actors would go into one or more actor repositories (there are
plans to include web-accessible remote actor libraries into a user's
environment, on demand), possibly physically structured by function
(or domain or whatever else makes sense, including by project if it
works out), but *logically* structured by the properties we assign to
the actors. For example say that we have two actors with property
lists as follows

A1 has_prop [p1,p2,p3]
A2 has_prop [p1,p4,p5] 

Wrt. property p1 both A1 and A2 would end up in the same bucket. If
for example, p2 isa p4 and p5 isa p3, then organization by properties
p2/4 and/or p3/5 gets more interesting. 

Here we can apply classification and reasoning techniques to come up
with a dynamically created folder structure (or concept lattice) which 
will allow the user to browse and search large actor and workflow
libraries by concepts and logical structure instead of physical
structure.
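
A minimal Python sketch of the bucketing idea, using the A1/A2 example
above. The function names (with_ancestors, buckets) and the dictionary
encoding of "isa" are assumptions made for illustration; this is not
Kepler code and not a real classifier, just the simplest thing that
shows the effect of the isa relation.

```python
from collections import defaultdict

# Property lists mirroring the A1/A2 example above.
actors = {
    "A1": ["p1", "p2", "p3"],
    "A2": ["p1", "p4", "p5"],
}

# Hypothetical encoding of "isa" edges as child -> parent
# (p2 isa p4, p5 isa p3).
isa = {"p2": "p4", "p5": "p3"}

def with_ancestors(prop):
    """Return prop followed by everything reachable via isa edges."""
    seen = []
    while prop is not None and prop not in seen:
        seen.append(prop)
        prop = isa.get(prop)
    return seen

def buckets(actors):
    """Map each property to the sorted list of actors that fall under it."""
    result = defaultdict(set)
    for name, props in actors.items():
        for prop in props:
            for p in with_ancestors(prop):
                result[p].add(name)
    return {p: sorted(names) for p, names in result.items()}

folders = buckets(actors)
print(folders["p1"])  # both actors carry p1 directly
print(folders["p4"])  # A1 also lands here, because p2 isa p4
print(folders["p2"])  # only A1: the isa edge is not followed downwards
```

Each key of folders can be read as one dynamically created "folder";
an actor may appear in several of them without being copied anywhere.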

Obviously one property would be the project affiliation of the
contributing authors. Hence one will have a "project view". Another
one can be a "function view" (the way Ptolemy/Vergil organizes things
by default), yet another one would be by "domain" etc.

TC> - You have a good point about the problem of identifying what actors 
TC> currently exist and where they are. This could be resolved if each project 
TC> kept up a web page that described the actors that are currently available 
TC> in their directories (differentiating the ones that are still works in 
TC> progress from the ones that are ready to go). A main Kepler page could then 
TC> either combine these pages or search across them.

This is a good idea. Specifically the actor repositories will be
similar to web service repositories and have these capabilities (but I
don't think anyone in Kepler got around to pushing the envelope
there. But it's on the 2-do list ... maybe even in the Kepler bugzilla 
list...)

TC> - I can see arguments in favor of both a top-level utils directory and a 
TC> project based one. Could we have both, with the understanding that the top 
TC> level directory is for very general shared utility code (such as an XSLT 
TC> parser) and the project level one being for general code where it is 
TC> unclear whether or not other projects would really be interested in it.

Maybe so. But I find it just hard to think about these things: for
some people the XSLT actor may be highly useless and thus should not get
favorable treatment over a "point in polygon" actor that might be seen
as quite specific to GEON initially, but then used, e.g., by SEEK.

TC> - I am surprised that you expect there to be a lot of joint development at 
TC> the actor level. I would have expected the collaborative work to occur more 
TC> on the workflow level, with creating an actor being the responsibility of a 
TC> specific developer. It seems like actors are developed based on a need 
TC> (real or perceived) for a given workflow, and may be generalized (when 
TC> created or later on) for other workflows. Is it the ongoing refinement of 
TC> the actor that you are expecting, or do you really see multiple developers 
TC> across projects working on a single actor at the same time?

Indeed there is probably more joint development for workflows (or
reuse of actors for workflows) than there is for individual
actors. But it has happened in the past and it may happen in the
future, so we need to take it into account.

For example, it may make sense to develop a "grid access" actor
jointly between SPA, SEEK, and GEON since all these projects are
interested (more or less) in a form of remote data access and
execution. 

Or we may want to develop jointly a "Grid director" that allows
transparent remote access and execution instead of explicit "grid
actors". Here, we probably could use some help from the Ptolemy team
(if we can get them interested in this activity).

So I do see a lot of potential for such cross-project leverage at a
very operational day-to-day level. It would be great if we can nourish
such efforts as they will allow us to make even faster progress because
we will have a "critical mass" (and we need it: "die Konkurrenz
schlaeft nicht" -- "the competition doesn't sleep" -- German saying ;-)

Bertram

TC> 
TC> Terence
TC> 
TC> At 09:44 AM 4/7/2004 -0700, Bertram Ludaescher wrote:
>> >>>>> "XX" == Xiaowen Xin <xin2 at llnl.gov> writes:
>> ..
>> >> Also for n projects will we have 2^n subdirectories for all possible
>> >> combinations? Where is the browserUI actor now? I think it's at least
>> >> SPA&GEON. Maybe a subsequent version will be SPA&GEON&SEEK or
>> >> SPA&GEON&Ptolemy, or of course SPA&GEON&PTOLEMY&SEEK.
XX> 
XX> No, there would only be one copy of each file.  If SPA needs to use
XX> something created by GEON, we would need to replicate it in the NCSU
XX> repository, and import org.geon.* for example.
XX> 
XX> The reason for this is that since we're already dividing up most of the
XX> source files by organization, we should just divide it all up by
XX> organization.
>> >> What do we do with stuff that is authored by multiple authors (like
>> >> browserUI)? I really want to understand how that works!
XX> 
XX> I think we should put that file in one of the projects.  Just pick one
XX> that contributed most to it or the one that originated the idea.  All
XX> authors' names will get listed in the file, and the other projects can
XX> import that code.
>> 
>> Xiaowen:
>> I don't feel comfortable with moving, say browserUI either under GEON
>> or under SPA. In either case the organization seems to imply that GEON
>> or SPA own this actor which is untrue.
>> 
>> The actor has been built in collaboration. It should be somewhere
>> under Kepler, yes, but neither under GEON nor SPA alone.
>> 
>> I guess we need a place for "truly collaborative" code and that would
>> be somewhere "neutral", i.e., under Kepler not under SPA or
>> GEON. That's where the code should reside IMHO.
>> 
XX> I think the reason people wanted a util/ directory is so
XX> that it would be a place to put code that would be useful for all
XX> projects.
XX> However, this criterion is hard to determine because most of
XX> the code from one project could potentially be useful in the other
XX> projects.  So I think having a util/ directory as a subdirectory of the
XX> projects makes more sense.
>> >>
>> >> I don't understand how this solves the problem. How would Efrat (GEON)
>> >> know whether actors X,Y,Z she is developing are "util" (and thus
>> >> potentially useful for others) or not?
>> >> Should she spend cycles on figuring out what might be useful? I guess
>> >> the database access actor will be, but what about the point in
>> >> polygon?
>> >> I claim that fundamentally one CANNOT know what will be useful to
>> >> others or not. By default everything could be useful, right?
XX> 
XX> Yes, that was exactly my point that we can't figure out what's useful to
XX> others.  So a top-level util/ directory doesn't make sense.
>> 
>> Hmm.. I think just the opposite. Why would a Kepler member dig through
>> subdirectories of project oriented folders to find what's there instead
>> of looking at a shared Kepler/util directory?
>> 
>> (some might call this "plain confusing" ;-)
>> 
XX> I think actors should go under org.sdm.spa.* and utility files
XX> (non-actors, but methods that are required by several actors) should go
XX> under org.sdm.spa.util.*
XX> 
XX> If SPA needed to use something from GEON, we
XX> could import org.geon.*; or org.geon.util.* as an example.  The
>> >>
>> >> yes, that makes sense.
>> >>
XX> distinction here between org.sdm.spa.* and org.sdm.spa.util.* is that
XX> util.* contains non-actor utility code, while all the actors go into
XX> org.sdm.spa.*.
XX> 
XX> I hope that made sense :)
>> >>
>> >> yes, if you just want to split between non-actor code and actor code, 
>> then the
>> >> distinction between util and non-util seems reasonable.
>> >>
>> >> So do I understand the proposal right that each project would have two
>> >> subdirectories say
>> >>
>> >> .../geon/util and
>> >> .../geon/actors  ?
>> >>
>> >> (only that the actors you didn't have as a separate subdir so far)
>> >>
>> >> Still a major problem with this organization by project remains:
>> >> How do you deal with joint development, even at the file level
>> >> (actors, directors etc)? This is precisely what we want to encourage
>> >> in Kepler. Where do these things go?
>> >>
>> >> Bertram
XX> 
XX> I would say joint files go under one project, with authors clearly
XX> listed in the file.
>> 
>> I think it's a non-starter.
>> 
>> If I understand correctly, the whole point of the project-oriented
>> organization was to make clear what was produced by which project.
>> Nice as it may sound at first, it doesn't seem to work well for real
>> collaborations where developers from multiple projects contribute,
>> sometimes even to the same workflow or the same actor etc.
>> 
>> What does the project-oriented structure reflect if not "ownership"?
>> 
>> (a) If it does, then we may need to live with 2^n directories for n
>> projects. Not very practical.
>> 
>> (b) If it doesn't, why not organize the directories by a more functional
>> criterion and pull out "ownership" by selecting on the "author" (or
>> "project") field of files?
>> 
>> 
XX> If we put joint files in a common directory like util/, then that's also
XX> confusing.  Because you could have a scenario where SPA creates a file,
XX> but GEON subsequently helps to modify it significantly.  This would now
XX> qualify as a joint file, but should we move it from org.sdm.spa to util?
>> 
>> I think you just delivered another argument for a functional
>> organization instead of a project oriented one.
>> 
>> Bertram
>> 
XX> 
XX> What do you think?
XX> 
XX> Xiaowen