[kepler-dev] [Bug 4362] - component search performance

bugzilla-daemon at ecoinformatics.org bugzilla-daemon at ecoinformatics.org
Tue Oct 6 00:21:27 PDT 2009


http://bugzilla.ecoinformatics.org/show_bug.cgi?id=4362





------- Comment #2 from aschultz at nceas.ucsb.edu  2009-10-06 00:21 -------
Today Sean and I discussed two options for implementing fast component
search.

One was to use an SQL table that indexes the entire component library using
the Preorder Tree Traversal technique for storing hierarchical data in a
relational database.  I have a lot of experience with this technique and know
that building the machinery for such a solution would be quite time consuming.
However, it would make our searches extremely fast and would likely be useful
for many other tasks.
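
For readers unfamiliar with the technique, here is a minimal sketch of
preorder tree traversal indexing.  It is not Kepler code: the tree contents,
the component_tree table, and the lft/rgt column names are all made up for
illustration, and Python's built-in sqlite3 stands in for whatever database
would really be used.  Each node gets a left and right number during a
preorder walk, so that all descendants of a node can be fetched with a single
range query instead of a recursive walk.

```python
import sqlite3

# Hypothetical component tree as (name, children) tuples; not real Kepler data.
TREE = ("Components", [
    ("Math", [("Add", []), ("Multiply", [])]),
    ("IO", [("FileReader", [])]),
])


def number_nodes(node, counter=1, rows=None):
    """Assign left/right bounds with a preorder walk.

    Returns (rows, next_counter), where rows holds (name, lft, rgt) tuples.
    """
    if rows is None:
        rows = []
    name, children = node
    left = counter
    counter += 1
    for child in children:
        _, counter = number_nodes(child, counter, rows)
    rows.append((name, left, counter))
    return rows, counter + 1


def build_index(tree):
    """Store the numbered tree in an in-memory table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE component_tree (name TEXT, lft INTEGER, rgt INTEGER)"
    )
    rows, _ = number_nodes(tree)
    conn.executemany("INSERT INTO component_tree VALUES (?, ?, ?)", rows)
    return conn


def descendants(conn, name):
    """Return every node below `name` using one range query, no recursion."""
    cur = conn.execute(
        "SELECT child.name FROM component_tree AS parent "
        "JOIN component_tree AS child "
        "ON child.lft > parent.lft AND child.rgt < parent.rgt "
        "WHERE parent.name = ? ORDER BY child.lft",
        (name,),
    )
    return [row[0] for row in cur]
```

With this layout, descendants(build_index(TREE), "Math") needs only the one
query.  The cost is on the write side: inserting or moving a node means
renumbering part of the table, which is a large part of the machinery
mentioned above.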

The other option was to index the search terms (name, classname, ontologies,
classes, etc.) for the components by their KeplerLSIDs in an SQL table.  An SQL
query could then match the provided string against the indexed search terms
and return a list of KeplerLSIDs.  A quick walk through the tree to match the
LSIDs would then finish the job.  We're fairly confident the SQL query will be
fast even though hsql does not support a fulltext index the way MySQL and
PostgreSQL do.  Since we expect only a few thousand rows at the current size
of the component library, the lack of fulltext indexing should not be a
problem.  The other potential slowdown is the KeplerLSID matching in the
component library after the results have been retrieved from the database.  To
demonstrate that this is likely a very quick procedure, I have implemented
KeplerLSID matching in the Component Library (see bug 4303).  You can try it
yourself by right-clicking on a component, viewing its LSID, copying it, and
pasting it into the search field.
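
A rough sketch of the term-index idea follows.  Everything named here is an
assumption for illustration: the search_terms table, the example LSIDs, and
the helper functions are hypothetical, and Python's sqlite3 is used only as a
convenient stand-in for hsql.  It shows the two steps described above: a plain
LIKE query (no fulltext index) that returns matching LSIDs, followed by a
single pass over the library to pick out the components with those LSIDs.

```python
import sqlite3

# Hypothetical components: LSID -> search terms (name, class name, ontology
# class).  These LSIDs are invented and do not refer to real Kepler actors.
COMPONENTS = {
    "urn:lsid:example.org:actor:1:1": ["Add", "org.example.AddActor", "Arithmetic"],
    "urn:lsid:example.org:actor:2:1": ["Multiply", "org.example.MultiplyActor", "Arithmetic"],
    "urn:lsid:example.org:actor:3:1": ["FileReader", "org.example.FileReader", "IO"],
}


def build_term_index(components):
    """Create one row per (LSID, search term) pair in an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE search_terms (lsid TEXT, term TEXT)")
    conn.executemany(
        "INSERT INTO search_terms VALUES (?, ?)",
        [
            (lsid, term.lower())
            for lsid, terms in components.items()
            for term in terms
        ],
    )
    return conn


def search_lsids(conn, text):
    """Substring match over the indexed terms; returns distinct LSIDs."""
    cur = conn.execute(
        "SELECT DISTINCT lsid FROM search_terms WHERE term LIKE ? ORDER BY lsid",
        ("%" + text.lower() + "%",),
    )
    return [row[0] for row in cur]


def match_in_library(library_lsids, hits):
    """Walk the library once, keeping entries whose LSID the query returned."""
    wanted = set(hits)
    return [lsid for lsid in library_lsids if lsid in wanted]
```

For example, search_lsids(build_term_index(COMPONENTS), "arith") returns the
LSIDs of both arithmetic components, and match_in_library then filters the
library in one pass.  With only a few thousand rows, a LIKE scan of this kind
should be cheap, which is the reasoning behind not worrying about fulltext
support.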

