[kepler-dev] [Bug 3574] New: - Support for importing directory contents using CollectionSource

bugzilla-daemon at ecoinformatics.org bugzilla-daemon at ecoinformatics.org
Mon Oct 27 11:49:37 PDT 2008


http://bugzilla.ecoinformatics.org/show_bug.cgi?id=3574

           Summary: Support for importing directory contents using
                    CollectionSource
           Product: Kepler
           Version: 1.0.0
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: general
        AssignedTo: mcphillips at ecoinformatics.org
        ReportedBy: mcphillips at ecoinformatics.org
         QAContact: kepler-dev at ecoinformatics.org
 BugsThisDependsOn: 3573


A common workflow pattern is to take as input all of the files (or those of a
particular type) in a directory on a researcher's computer system.  For
example, there are COMAD workflows that process all the FASTA files in a
directory, creating a collection for each FASTA file and storing the contained
DNA or protein sequences in the corresponding input collections.  

Once the CollectionSource actor is able to automatically import the contents of
files (see bug 3573), it would be extremely useful to refer to directories in
the XML input to CollectionReader or CollectionComposer and have the actor
import all of the files it finds there.  Another useful feature would be an
option to have CollectionSource descend into sub-directories, creating a
nested collection for each one and importing the files it contains into the
corresponding subcollection.  Whole directories of scientific data files could
then easily serve as input to COMAD workflows.
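
To make the requested behaviour concrete, here is a minimal sketch of the
directory traversal described above.  It is written in Python purely for
illustration and does not use any Kepler or COMAD API: the Collection class,
the import_directory function, and the extension and recurse parameters are
all hypothetical names invented for this example.

```python
import os
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Collection:
    """Hypothetical stand-in for a COMAD collection (not a Kepler class)."""
    name: str
    files: List[str] = field(default_factory=list)
    subcollections: List["Collection"] = field(default_factory=list)


def import_directory(path: str,
                     extension: Optional[str] = None,
                     recurse: bool = False) -> Collection:
    """Build a collection from the files found in a directory.

    extension: if given, only files whose names end with it are imported.
    recurse:   if True, each sub-directory becomes a nested collection;
               if False, sub-directories are skipped.
    """
    collection = Collection(name=os.path.basename(os.path.normpath(path)))
    # Sort entries so the resulting order is deterministic.
    for entry in sorted(os.listdir(path)):
        full_path = os.path.join(path, entry)
        if os.path.isdir(full_path):
            if recurse:
                collection.subcollections.append(
                    import_directory(full_path, extension, recurse))
        elif extension is None or entry.endswith(extension):
            collection.files.append(full_path)
    return collection
```

With this sketch, import_directory("/data/run1", extension=".fasta",
recurse=True) would return one collection for the (hypothetical) /data/run1
directory, holding its FASTA files plus one nested collection per
sub-directory.  Importing the contents of each file, as requested in
bug 3573, is deliberately left out.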

These features could eventually make it much easier to stage data for input to
a workflow run without requiring modification of the workflow specification
itself.

