[seek-dev] Re: Announcing the SDSC Matrix 2.0

Bertram Ludaescher ludaesch at sdsc.edu
Sat Sep 20 10:41:05 PDT 2003


This is very interesting! As you know, the topic of "scientific
workflows" / analysis pipelines is an active area of research and
development in several of our projects, in particular SciDAC/SDM and
SEEK, but also BIRN and GEON.

Across several of these projects (SEEK, SciDAC, GEON) we have adopted
the open source system Ptolemy-II as a front end. There are a number
of changes we need to make "under the hood" to "grid enable" it, etc.

Some preliminary info can be found here:
(warning: this is work in progress...)
Different extensions are in the making.

We should definitely compare notes!

Can you give a Matrix demo to the interested crowd at SDSC?
Conversely, we'd be happy to give you a demo and have a larger "where
do we go from here" discussion. In fact, if my memory serves me right,
we'll have a demo of the SciDAC prototype this Monday in 462...

Let me double-check.



>>>>> "AJ" == Arun Jagatheesan <arun at sdsc.edu> writes:
AJ> Hello All,
AJ> The Matrix team at SDSC would like to let you know about the latest release
AJ> of SDSC Matrix software.
AJ> Matrix provides the protocols and software infrastructure needed by
AJ> Inter-organizational data management services to create, access and manage
AJ> process-flow pipelines. Matrix uses the Data Grid Language, which can be
AJ> used to describe, query and control process-flow pipelines in data-intensive
AJ> environments. More information about this release can be found in our
AJ> "ReleaseNotes.txt" (also attached at the end of this mail). To get started
AJ> read the "RunningTheMatrix.html" file that comes along with our release.
AJ> We would like to acknowledge and thank the support and encouragement
AJ> provided by multiple projects and our "well-wishers" (copied on this mail).
AJ> We hope to provide more features in our next release.
AJ> Regards,
AJ> Arun.
AJ> ~~~~~~~~~
AJ> Dream; because, dreams lead to thoughts; thoughts lead to action, and action
AJ> leads to achievement.
AJ> Arun swaran Jagatheesan
AJ> http://www.sdsc.edu/~arun/
AJ> San Diego Supercomputer Center.
AJ> (858)822.5452
AJ> SDSC Matrix - The dataflow-process management system (services)
AJ> <<<READ the "RunningTheMatrix.html" to get started>>>
AJ> Matrix New Features/Additions: (Version 2.0 - Release September 2003)
AJ> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
AJ> 1) Data Grid Queries using W3C XQuery
AJ> W3C XQuery is supposed to be the successor of SQL. A new operation for
AJ> querying metadata has been developed. The query is based on W3C XQuery.
AJ> Both system-defined and user-defined metadata can be queried. The query
AJ> results are returned in a format defined by the user in the Query Request
AJ> (as part of the XQuery grammar). Further information on the query
AJ> functionality can be found in the document "Metadata Query Implementation
AJ> for Matrix.doc" in the "docs" directory.
AJ> 2) Granular Pipe-line Status Queries
AJ> Matrix uses the "Data Grid Language" (DGL) to support execution of
AJ> operations on a data grid. This can be considered analogous to running
AJ> long-running transactions with the "Structured Query Language" (SQL) on a
AJ> database.
AJ> SQL : database :: DGL : data grid
AJ> In DGL, a Data Grid Request consists of multiple flows (like a process
AJ> pipeline). Each flow can be either parallel or sequential and can have
AJ> multiple steps (operations). Each step, each flow, and the whole
AJ> transaction (request) is associated with a unique identifier, which can
AJ> be used to query its status. Once the request is made, the status or
AJ> result of a transaction, flow, or step can be obtained asynchronously
AJ> through status requests that use the respective unique identifier.
AJ> The status of a specific transaction, flow or step can be requested at
AJ> any level of granularity. The Status Request can be made by the process
AJ> (or user) that originally made the Data Grid Request or by any third
AJ> party.
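The request/flow/step hierarchy and the per-identifier status lookup described above can be illustrated with a small Java sketch. This is not the Matrix client API: the class, the method names, the three-value `State` enum and the rule for aggregating child states into a parent state are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; none of these names come from the Matrix code base.
class StatusRegistrySketch {
    enum State { PENDING, RUNNING, DONE }

    // One node of the request tree: the request itself, a flow, or a step.
    static final class Node {
        final String id;
        final List<Node> children = new ArrayList<>();
        volatile State state = State.PENDING;
        Node(String id) { this.id = id; }
    }

    // Every node is registered under its own unique identifier, so a status
    // request can address the request, a flow, or a single step directly.
    private final Map<String, Node> byId = new ConcurrentHashMap<>();

    // The tree is assumed to be fully built before any step starts running.
    Node register(String id, Node parent) {
        Node node = new Node(id);
        byId.put(id, node);
        if (parent != null) parent.children.add(node);
        return node;
    }

    void markDone(String id) { byId.get(id).state = State.DONE; }

    // Any caller that knows the identifier may ask for the status.
    State status(String id) {
        Node node = byId.get(id);
        if (node == null) throw new IllegalArgumentException("unknown id: " + id);
        return aggregate(node);
    }

    // Assumed rule: a parent is DONE when all children are DONE, RUNNING
    // when at least one child has left PENDING, and PENDING otherwise.
    private static State aggregate(Node node) {
        if (node.children.isEmpty()) return node.state;
        boolean allDone = true;
        boolean anyStarted = false;
        for (Node child : node.children) {
            State s = aggregate(child);
            if (s != State.DONE) allDone = false;
            if (s != State.PENDING) anyStarted = true;
        }
        if (allDone) return State.DONE;
        return anyStarted ? State.RUNNING : State.PENDING;
    }

    // One request with one flow of two steps; the first step has finished.
    static StatusRegistrySketch example() {
        StatusRegistrySketch reg = new StatusRegistrySketch();
        Node request = reg.register("req-1", null);
        Node flow = reg.register("flow-1", request);
        reg.register("step-1", flow);
        reg.register("step-2", flow);
        reg.markDone("step-1");
        return reg;
    }

    public static void main(String[] args) {
        StatusRegistrySketch reg = example();
        for (String id : new String[] {"req-1", "flow-1", "step-1", "step-2"}) {
            System.out.println(id + ": " + reg.status(id));
        }
    }
}
```

Keeping a flat identifier-to-node map next to the tree is what makes the lookup "granular" in this sketch: the caller does not need to know where in the request a flow or step sits, only its identifier.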
AJ> 3) Metadata Ingestion
AJ> Ingestion of system-defined and user-defined metadata has been
AJ> introduced in this release.
AJ> 4) Matrix Server Configuration
AJ> Configuration properties for the Matrix server can be loaded dynamically
AJ> from the file %MATRIX_HOME%/conf/matrix.properties. The properties for
AJ> file cache, log level, deployment level, log file location, etc. can be
AJ> specified in this file.
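As a rough illustration of this kind of configuration file, the sketch below resolves the path under a Matrix home directory and reads settings with the standard `java.util.Properties` class. The property keys (`log.level`, `log.file`, `file.cache`) and the sample values are invented for the example; the actual key names used by Matrix are not given in the release notes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical sketch; the keys below are assumptions, not Matrix's real keys.
class MatrixConfigSketch {

    // Builds <matrixHome>/conf/matrix.properties for a given home directory.
    static Path configPath(String matrixHome) {
        return Paths.get(matrixHome, "conf", "matrix.properties");
    }

    // Parses properties text; a server would read the file at configPath(...)
    // instead of an in-memory string.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "# assumed keys, for illustration only",
                "log.level=DEBUG",
                "log.file=/tmp/matrix.log",
                "file.cache=/tmp/matrix-cache");
        Properties props = parse(sample);
        System.out.println("config file: " + configPath("/opt/matrix"));
        // The second argument is the default used when a key is absent.
        System.out.println("log level:   " + props.getProperty("log.level", "INFO"));
        System.out.println("deployment:  " + props.getProperty("deployment.level", "unset"));
    }
}
```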
AJ> 5) Logging Functionality
AJ> Logs can now optionally be written to a file whose location is specified
AJ> in the configuration properties file (mentioned in #4 above). The logs
AJ> can be written to the console or to a file, depending on the properties
AJ> set. There are currently 5 logging levels - DEBUG, INFO,
AJ> 6) Ingest URLs (HTTP, FTP)
AJ> The Matrix operation to ingest data now supports external URLs as the
AJ> data to be ingested. The content of an external URL can be inserted as
AJ> data (a snapshot) into the data grid. These are also asynchronous Matrix
AJ> operations. Ingestion from HTTP and FTP URLs is currently supported.
AJ> Future implementations should support more elaborate authentication
AJ> schemes.
AJ> 7) Client API Additions
AJ> The Client API has been significantly updated with many useful methods
AJ> and better documentation, and we are still improving the documentation.
AJ> To learn more, read the client README and run the ant target
AJ> 'ant client-docs'. This builds the javadoc for the client API, which
AJ> should be the primary source of information. The client API has also
AJ> been updated to reflect changes in the schema (specifically, the removal
AJ> of stepname from a step and the newly added metadata components).
AJ> 8) API Programming Examples
AJ> New examples for using the Client API have been added. See
AJ> RunningTheMatrix.html for instructions on running them. The examples are
AJ> a good place to start.
AJ> 9) License for free academic and research use
AJ> We can't help it - we need to add a license to our work as part of
AJ> University rules, to prevent anyone else from taking commercial
AJ> advantage of it. Look at our license - you are free to use this in your
AJ> code and development any time (as long as you don't take commercial
AJ> advantage of it).
AJ> 10) Open source View
AJ> If you would like to look at our code, you can browse it at:
AJ> http://www.npaci.edu/DICE/SRB/matrix/cvs.cgi/
AJ> Warning #1: The code is not clean and has little documentation.
AJ> Warning #2: Look at our license before you look into our code.
AJ> Matrix Bugs Cleared:
AJ> ~~~~~~~~~~~~~~~~~~~
AJ> 1) Mandatory Step Name removed
AJ> Our previous version required the user to specify the name of the step
AJ> or operation to be performed. This is no longer needed.
AJ> 2) Tomcat and temporary file cache
AJ> On Microsoft Windows, if Apache Tomcat was started from the shortcut
AJ> (instead of the regular command line prompt), there were "file not
AJ> found" problems during ingest or download. This happened because the
AJ> temporary file written to disk could not be read back. The bug was fixed
AJ> by ensuring that all temporary files are written to a pre-defined file
AJ> cache specified in the properties file.
AJ> 3) Accessing multiple Attachments in a single request
AJ> When a flow was executed in parallel and multiple steps tried to
AJ> retrieve attachments from the attachment iterator concurrently, the
AJ> retrieval was not thread-safe. This defect was fixed by storing all the
AJ> attachments in a thread-safe hashtable, so the steps can still run
AJ> concurrently or in parallel.
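The general shape of such a fix can be sketched as follows: drain the single-use iterator once on one thread into a synchronized `java.util.Hashtable`, then let the parallel steps look attachments up by key. This is only an illustration of the technique under stated assumptions; the class name, the string keys, the `byte[]` payloads and the thread-pool size are invented here and do not reflect the actual Matrix implementation.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; names and data types are assumptions for illustration.
class AttachmentStoreSketch {

    // Consume the iterator exactly once, on the calling thread, so that no
    // two steps ever share the iterator's internal cursor.
    static Map<String, byte[]> drain(Iterator<Map.Entry<String, byte[]>> attachments) {
        Map<String, byte[]> store = new Hashtable<>(); // synchronized map
        while (attachments.hasNext()) {
            Map.Entry<String, byte[]> entry = attachments.next();
            store.put(entry.getKey(), entry.getValue());
        }
        return store;
    }

    // Stand-in for parallel steps: each task reads one attachment by key.
    static int totalBytesInParallel(Map<String, byte[]> store) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> sizes = new ArrayList<>();
            for (String id : store.keySet()) {
                sizes.add(pool.submit(() -> store.get(id).length));
            }
            int total = 0;
            for (Future<Integer> size : sizes) total += size.get();
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    // Two made-up attachments of 3 and 5 bytes.
    static Map<String, byte[]> sample() {
        List<Map.Entry<String, byte[]>> incoming = new ArrayList<>();
        incoming.add(new AbstractMap.SimpleEntry<>("a.dat", new byte[3]));
        incoming.add(new AbstractMap.SimpleEntry<>("b.dat", new byte[5]));
        return drain(incoming.iterator());
    }

    public static void main(String[] args) {
        System.out.println("total bytes: " + totalBytesInParallel(sample()));
    }
}
```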
AJ> Known issues (a.k.a Bugs) in this release:
AJ> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
AJ> 1) Selection of multiple user-defined meta data fields
AJ> There is a problem in this release when we use XQuery to select more
AJ> than one user-defined metadata field. This is a bug in the underlying
AJ> Jargon code that was fixed on the release date (we have not included the
AJ> fix in this release as we have not tested it).
AJ> 2) The Data Grid Language schema for ingestion of a dataset (file) has
AJ> 'logical identifier' as an optional element, but our Matrix
AJ> implementation currently requires it to be present.
AJ> 3) License: We don't know how to write this currently, so we just did a
AJ> cut and paste of the Apache license. Technically, it needs to say no to
AJ> commercial exploitation (by redistribution, re-use, design/idea
AJ> stealing(?), and what not). Translated to plain English, we want the
AJ> license to say that the university folks need a share of the pie before
AJ> you can make money ;).
AJ> 4) Metadata insert has not been fully tested for performance and
AJ> functionality.
AJ> 5) We still don't have persistence. We want to design conditional
AJ> process pipelines in the grid first, since this will change our existing
AJ> store (or persistence). We will connect this to an RDBMS once we have
AJ> finished adding "conditional process pipelines".
AJ> Matrix Future releases:
AJ> ~~~~~~~~~~~~~~~~~~~~~~
AJ> We are growing fast and trying to establish ourselves as first in
AJ> "datagrid services". We are also helping other institutions with our
AJ> ideas to develop similar infrastructure. We are open to joining forces
AJ> with other projects or institutions that might be interested. New
AJ> features we have planned (released based on demand by users):
AJ> - Extend Metadata Query Functionality
AJ> - Upgrade Matrix code to work with latest Apache jar files
AJ> - Persistence layer using RDBMS
AJ> - Matrix agents to invoke any WSDL web service (during data-flow process)
AJ> - Matrix agents to invoke any GGF OGSA service (during data-flow process)
AJ> - An overview document of the architecture.
AJ> - OGSA Grid File System, OGSA Data Services interfaces
AJ> - Look into
AJ> http://users.sdsc.edu/~arun/Research/dataGrid/Data%20Grid%20Services%20and%20Pipelines.ppt
AJ> for a short course on SDSC Matrix.
AJ> - Iterative and conditional invocation of services
AJ> ~~~~~~~~~~~~~~~
AJ> SDSC Matrix Project is partly funded by:
AJ> - NSF GriPhyN Project 	   (Research and Architecture)
AJ> - NSF SCEC project 	   (Matrix Core Development)
AJ> - NIH BIRN Project 	   (Jargon)
AJ> - DOE Web Services Project (Jargon)
AJ> - NPACI REU 		   (Client API, Matrix Applications and Testing)
AJ> Matrix Team (who took part in this release-version):
AJ> - Allen Ding (client development)
AJ> - Arun Jagatheesan (All talk no work a.k.a Research ;)
AJ> - Reena Mathew (server development)
AJ> - (Lucas Gilbert for Jargon development)
