[seek-taxon] Update

aravindc aravindc at mail.ku.edu
Thu Dec 8 14:05:15 PST 2005


Hi all,

  I just want to update the group on what I have been up to over the last month.
Hierarchical classification is a key technology needed in
identifying the topics (species) discussed in a particular taxonomic
paper. Over the summer, we submitted a paper draft discussing research on
hierarchical classification. The reviews were positive, but the
reviewers wanted a comparison with other existing classification
techniques. The past month has been spent installing and testing
freeware classification packages such as libsvm for support vector machines (SVM)
and rainbow for kNN, naive Bayes, and Rocchio.  We also used the KeyConcept
Rocchio classifier for our experiments.
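
For anyone unfamiliar with Rocchio, here is a minimal sketch of the idea in
Python: each class is represented by the centroid of its training documents,
and a new document goes to the class whose centroid is most similar.  This is
only an illustration, not the rainbow or KeyConcept implementation.  It
assumes plain term-frequency weights, cosine similarity, and positive
centroids only; the function names and the toy documents are made up.

```python
import math
from collections import Counter, defaultdict


def tf_vector(text):
    """Bag-of-words term-frequency vector, scaled to unit length."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def train_rocchio(labelled_docs):
    """Build one centroid per class: the mean of its document vectors."""
    sums = defaultdict(lambda: defaultdict(float))
    sizes = Counter()
    for text, label in labelled_docs:
        sizes[label] += 1
        for term, weight in tf_vector(text).items():
            sums[label][term] += weight
    return {label: {t: w / sizes[label] for t, w in vec.items()}
            for label, vec in sums.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(centroids, text):
    """Return the label of the centroid most similar to the document."""
    vec = tf_vector(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy, made-up training data: (document text, species label).
docs = [
    ("honey bee hive pollen worker", "Apis mellifera"),
    ("bee colony queen hive", "Apis mellifera"),
    ("oak leaf acorn bark", "Quercus alba"),
    ("oak tree acorn wood", "Quercus alba"),
]
centroids = train_rocchio(docs)
print(classify(centroids, "acorn bark of the oak"))
```

Training is a single pass over the documents and classification is one
similarity computation per class, which is one reason centroid-based
classifiers tend to be fast.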

     Using these packages, we have been running experiments comparing their
running times and accuracy for text classification.  For each experiment, I
have a training set and a testing set.  I perform cross-validation on the
training examples to determine the best parameters, then apply those
parameters to the testing set.  Although SVM is widely reported to work well,
we get very low accuracy with it, possibly because we use all the features
for training and testing.  kNN and naive Bayes perform better than SVM, with
accuracies of around 25% and 28% respectively.  Rainbow's Rocchio performs
the best, with an accuracy of 54.92%.  However, the running time for all the
classifiers in the rainbow package is very high (on the order of days to
weeks).  KeyConcept's Rocchio classifier, on the other hand, gives an
accuracy of 34.66% with a running time of around 30 minutes.  I am currently
looking at improving the accuracy of KeyConcept's Rocchio without reducing
its efficiency.
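
In case the parameter-selection protocol above is unclear, here is a rough
sketch of it in Python.  It is not the scripts I actually ran: it assumes a
toy 1-D kNN classifier with k as the only parameter, and all function names
and data points are invented for illustration.  The point is only that the
parameter is chosen by cross-validation on the training set, and the testing
set is used once at the end.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Majority label among the k training points nearest to x (1-D)."""
    nearest = sorted(train, key=lambda item: abs(item[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, held_out, k):
    """Fraction of held-out examples that kNN labels correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in held_out)
    return hits / len(held_out)


def cross_validate(train, k, folds=3):
    """Mean accuracy over `folds` splits of the training set only."""
    scores = []
    for i in range(folds):
        held_out = train[i::folds]  # every folds-th example, offset i
        rest = [ex for j, ex in enumerate(train) if j % folds != i]
        scores.append(accuracy(rest, held_out, k))
    return sum(scores) / folds


def select_k(train, candidates, folds=3):
    """Pick the candidate k with the best cross-validated accuracy."""
    return max(candidates, key=lambda k: cross_validate(train, k, folds))


# Made-up data: (feature value, class label).
train = [(i / 10, "a") for i in range(6)] + [(1 + i / 10, "b") for i in range(6)]
test = [(0.25, "a"), (1.25, "b")]

best_k = select_k(train, candidates=[1, 3, 5])
# The testing set is touched only once, after the parameter is fixed.
print(best_k, accuracy(train, test, best_k))
```

Keeping the testing set out of the selection loop is what makes the reported
accuracy an honest estimate for the chosen parameters.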

Cheers,
Aravind.



