[kepler-dev] [Bug 3574] New: - Support for importing directory contents using CollectionSource
bugzilla-daemon at ecoinformatics.org
Mon Oct 27 11:49:37 PDT 2008
http://bugzilla.ecoinformatics.org/show_bug.cgi?id=3574
Summary: Support for importing directory contents using CollectionSource
Product: Kepler
Version: 1.0.0
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: general
AssignedTo: mcphillips at ecoinformatics.org
ReportedBy: mcphillips at ecoinformatics.org
QAContact: kepler-dev at ecoinformatics.org
BugsThisDependsOn: 3573

A common workflow pattern is to take as input all of the files (or those of a
particular type) in a directory on a researcher's computer system. For
example, there are COMAD workflows that process all the FASTA files in a
directory, creating a collection for each FASTA file and storing the contained
DNA or protein sequences in the corresponding input collections.

Once the CollectionSource actor is able to automatically import the contents of
files (see bug 3573), it will be extremely useful to refer to directories in
the XML input to CollectionReader or CollectionComposer and have the actor
import all of the files it finds there. Another useful feature would be the
option of having CollectionSource descend into sub-directories, creating a
nested collection for each and importing contained files into the corresponding
subcollections. Whole directories of scientific data files could then easily
serve as input to COMAD workflows.

These features could eventually make it much easier to stage data for input to
a workflow run without requiring modification of the workflow specification
itself.
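
The requested behavior (import every matching file in a directory, and
optionally descend into sub-directories to build nested collections) can be
sketched as follows. This is only an illustration in Python, not Kepler or
COMAD code: the Collection class, the import_directory function, and the
suffix and recurse parameters are all hypothetical names. The sketch records
file names only; importing the file contents is the subject of bug 3573.

```python
import os
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Collection:
    """Hypothetical stand-in for a COMAD collection (not Kepler's class)."""
    name: str
    files: List[str] = field(default_factory=list)
    subcollections: List["Collection"] = field(default_factory=list)


def import_directory(path: str,
                     suffix: Optional[str] = None,
                     recurse: bool = False) -> Collection:
    """Build a Collection from the files found in `path`.

    suffix:  if given, only file names ending with it are imported.
    recurse: if True, each sub-directory becomes a nested Collection.
    """
    coll = Collection(name=os.path.basename(os.path.normpath(path)))
    # Sort entries so the resulting collection order is deterministic.
    for entry in sorted(os.listdir(path)):
        full = os.path.join(path, entry)
        if os.path.isfile(full):
            if suffix is None or entry.endswith(suffix):
                coll.files.append(entry)
        elif recurse and os.path.isdir(full):
            coll.subcollections.append(
                import_directory(full, suffix, recurse))
    return coll
```

With recurse left at its default of False, sub-directories are ignored and
only the files directly inside the named directory are imported, which
corresponds to the simpler of the two features described above.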