Question: Machine-learning-based classification techniques

I'm a novice in machine-learning-based classification techniques. Please help me with the following questions.

1) What is the difference between SMO (Weka) and the LibSVM algorithms? Which one is better? I ask because the parameter requirements of the two are very different.

2) Feature reduction (e.g. PCA) and feature selection (e.g. InfoGain) are two different techniques for reducing the number of features. Which one should I rely on, and under which conditions should each be used?

3) In InfoGain evaluation, the ranking algorithm ranks the features, and the threshold parameter can remove unwanted features with respect to the entropy measure. Can we optimize both? Or should we optimize only one of them instead? What should I be looking for: accuracy?

4) Is accuracy the only thing I should be looking for? Of course there is the risk of overfitting, but can I quantify the predictive power of the model with something other than CV accuracy? Is there some other measure or technique?

Posted 4.2 years ago by tigeradab; updated 3.7 years ago by chen.

This looks like a class assignment. Regarding 1), the question doesn't make sense: SMO (Sequential Minimal Optimization) is an algorithm for solving the optimization problem behind SVM training, while LibSVM is a software library that implements the SMO algorithm.

Reply by Jean-Karim Heriche, 4.2 years ago

No, this is not a class assignment. The SMO used by Weka is a different algorithm from the solvers used by LibSVM; hence, their parameters are quite different. But which one is better? And, principally, what are the differences?

Reply by tigeradab, 4.2 years ago

This isn't a bioinformatics question, though; it's a machine learning question, and this isn't really the forum for it. I'd try posting your question under the machine-learning tag of the statistics Stack Exchange (Cross Validated).

Reply by andrew.j.skelton73, 4.2 years ago

Both Weka and LibSVM use the SMO algorithm. They have different implementations (and reference different papers), so your question is really: is Weka's SMO implementation better than LibSVM's, and by what criteria?
Edit: LibSVM actually uses an SMO-like algorithm, hence the different papers that Weka and LibSVM reference.

Reply by Jean-Karim Heriche, 4.2 years ago

Also, given that there are four different questions, it would be better to post them separately so that answers can address each one on its own. This would make the posts clearer.

Reply by Jean-Karim Heriche, 4.2 years ago
Answer:

1) What is the difference between SMO (Weka) and the LibSVM algorithms? Which one is better? I ask because the parameter requirements of the two are very different.

SVM is a classification algorithm whose training is an optimization problem; SMO is one of the common algorithms for solving that problem; and LibSVM is a library that implements SMO.
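To illustrate that SMO is an optimization algorithm rather than a library, here is a minimal pure-Python sketch of a *simplified* SMO for a linear-kernel SVM. It is not the code used by Weka or LibSVM: it picks the second multiplier at random instead of using Platt's selection heuristics, and the helper names (`simplified_smo`, `predict`, `dot`) and the toy data are hypothetical. `C` (the soft-margin cost) and `tol` (the KKT tolerance) are typical solver parameters, but the names and defaults in Weka and LibSVM may differ.

```python
import random

def dot(x, z):
    """Linear kernel: plain dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, z))

def simplified_smo(X, y, C=1.0, tol=1e-3, max_passes=10, max_iters=10_000, seed=0):
    """Fit a soft-margin linear SVM (labels must be +1 or -1).

    Each step picks a pair of Lagrange multipliers (alpha_i, alpha_j),
    solves that two-variable subproblem analytically and keeps all
    other multipliers fixed.
    """
    rng = random.Random(seed)
    n = len(X)
    K = [[dot(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    b = 0.0

    def error(i):
        # decision value f(x_i) minus the true label y_i
        return sum(alpha[k] * y[k] * K[k][i] for k in range(n)) + b - y[i]

    passes = iters = 0
    while passes < max_passes and iters < max_iters:
        iters += 1
        changed = 0
        for i in range(n):
            E_i = error(i)
            # only work on alpha_i if it violates the KKT conditions (within tol)
            if not ((y[i] * E_i < -tol and alpha[i] < C) or (y[i] * E_i > tol and alpha[i] > 0)):
                continue
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1  # random j != i (simplification: no selection heuristic)
            E_j = error(j)
            ai_old, aj_old = alpha[i], alpha[j]
            # bounds on alpha_j that keep 0 <= alpha <= C and sum(alpha * y) fixed
            if y[i] != y[j]:
                lo, hi = max(0.0, aj_old - ai_old), min(C, C + aj_old - ai_old)
            else:
                lo, hi = max(0.0, ai_old + aj_old - C), min(C, ai_old + aj_old)
            eta = 2 * K[i][j] - K[i][i] - K[j][j]
            if lo == hi or eta >= 0:
                continue
            # analytic optimum for alpha_j, clipped to [lo, hi]
            alpha[j] = min(hi, max(lo, aj_old - y[j] * (E_i - E_j) / eta))
            if abs(alpha[j] - aj_old) < 1e-5:
                alpha[j] = aj_old  # negligible step: undo it
                continue
            alpha[i] = ai_old + y[i] * y[j] * (aj_old - alpha[j])
            # recompute the bias from a multiplier strictly inside (0, C) if possible
            b1 = b - E_i - y[i] * (alpha[i] - ai_old) * K[i][i] - y[j] * (alpha[j] - aj_old) * K[i][j]
            b2 = b - E_j - y[i] * (alpha[i] - ai_old) * K[i][j] - y[j] * (alpha[j] - aj_old) * K[j][j]
            if 0 < alpha[i] < C:
                b = b1
            elif 0 < alpha[j] < C:
                b = b2
            else:
                b = (b1 + b2) / 2
            changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b

def predict(X, y, alpha, b, x):
    """Sign of the decision function f(x) = sum_k alpha_k * y_k * <x_k, x> + b."""
    score = sum(a * yk * dot(xk, x) for a, yk, xk in zip(alpha, y, X)) + b
    return 1 if score >= 0 else -1

# Toy, made-up data: two linearly separable clusters.
X = [[2.0, 2.0], [3.0, 3.0], [2.5, 3.5], [-2.0, -2.0], [-3.0, -1.0], [-2.5, -3.0]]
y = [1, 1, 1, -1, -1, -1]
alpha, b = simplified_smo(X, y)
print([predict(X, y, alpha, b, x) for x in X])
```

The pairwise update is what makes SMO "minimal": the constraint sum(alpha_i * y_i) = 0 forces at least two multipliers to move together, and a two-variable subproblem can be solved in closed form without a general QP solver.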

2) Feature reduction (e.g. PCA) and feature selection (e.g. InfoGain) are two different techniques for reducing the number of features. Which one should I rely on, and under which conditions should each be used?

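Feature reduction transforms the features (PCA, for example, builds new features as linear combinations of the original ones), while feature selection doesn't: it keeps a subset of the original features unchanged.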
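A minimal pure-Python sketch of this distinction, on made-up toy data: selection returns some of the original columns untouched, whereas PCA returns a new feature that mixes all original columns. To keep the sketch short, only PCA's first component is computed, by power iteration on the covariance matrix rather than the full eigendecomposition a library would use. The helper names are hypothetical.

```python
import math

def select_features(X, keep):
    """Feature selection: return only the chosen original columns, unchanged."""
    return [[row[j] for j in keep] for row in X]

def first_principal_component(X, iters=200):
    """Feature transformation: direction of maximum variance (PCA's first
    component), found by power iteration on the sample covariance matrix."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in X]
    cov = [[sum(r[a] * r[c] for r in centred) / (n - 1) for c in range(d)]
           for a in range(d)]
    v = [1.0] * d  # start vector; must not be orthogonal to the top eigenvector
    for _ in range(iters):
        w = [sum(cov[a][c] * v[c] for c in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def pca_transform(X, means, v):
    """Each sample becomes ONE new feature: a weighted mix of all original ones."""
    return [sum((row[j] - means[j]) * v[j] for j in range(len(v))) for row in X]

# Toy, made-up data: column 1 is roughly 2 * column 0; column 2 is near-constant.
X = [[1.0, 2.0, 0.5], [2.0, 4.1, 0.4], [3.0, 6.2, 0.6], [4.0, 7.9, 0.5]]
print(select_features(X, [0, 2]))   # original values of columns 0 and 2
means, v = first_principal_component(X)
print(v)                            # weights mixing all three columns
print(pca_transform(X, means, v))   # new, derived feature values
```

One practical consequence: selected features keep their original meaning, which helps interpretation, whereas PCA components are combinations that can be harder to interpret. Note also that PCA ignores the class labels, while an information-gain ranking uses them.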

3) In InfoGain evaluation, the ranking algorithm ranks the features, and the threshold parameter can remove unwanted features with respect to the entropy measure. Can we optimize both? Or should we optimize only one of them instead? What should I be looking for: accuracy?

This question is not clear: what does "both" refer to? Entropy and what? Optimization methods are very flexible; you can change the objective function if you want.
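To make the interaction between the ranking and the threshold concrete, here is a minimal pure-Python sketch of information-gain ranking for discrete features, followed by a threshold cut-off. It illustrates the idea only and is not Weka's InfoGain evaluator. The rule "keep features whose gain exceeds the threshold", the helper names, and the toy data are all assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(column, labels):
    """H(class) - H(class | feature) for one discrete feature."""
    n = len(labels)
    conditional = 0.0
    for value in set(column):
        subset = [lab for v, lab in zip(column, labels) if v == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

def rank_and_threshold(X, y, threshold):
    """Rank every feature by information gain, then keep those above threshold."""
    gains = [(j, info_gain([row[j] for row in X], y)) for j in range(len(X[0]))]
    ranked = sorted(gains, key=lambda jg: jg[1], reverse=True)
    kept = [j for j, g in ranked if g > threshold]
    return ranked, kept

# Toy, made-up data: feature 0 determines the class, feature 1 is unrelated,
# feature 2 is partly informative.
X = [["x", "p", "u"], ["x", "q", "u"], ["x", "p", "u"], ["x", "q", "v"],
     ["y", "p", "v"], ["y", "q", "v"], ["y", "p", "v"], ["y", "q", "v"]]
y = ["a", "a", "a", "a", "b", "b", "b", "b"]
ranked, kept = rank_and_threshold(X, y, threshold=0.1)
print(ranked)  # gains of about 1.0, 0.549 and 0.0 for features 0, 2 and 1
print(kept)    # [0, 2]
```

In this sketch the ranking itself has no tunable parameter; only the threshold (or, equivalently, the number of top-ranked features kept) does. One option is therefore to treat that single value as a hyperparameter and choose it by cross-validated performance.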

4) Is accuracy the only thing I should be looking for? Of course there is the risk of overfitting, but can I quantify the predictive power of the model with something other than CV accuracy? Is there some other measure or technique?

Accuracy is not the only concern. You should validate carefully (using techniques such as k-fold cross-validation) to test how well your model generalizes. Also note that, for the same accuracy, the fewer features your model uses, the better it is.
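Here is a minimal pure-Python sketch of k-fold cross-validation. A nearest-centroid classifier stands in for the SVM only to keep the example self-contained (it is a hypothetical substitute, not something discussed in the thread), and the data are made up. Reporting the per-fold scores and their spread, not just the mean, gives some sense of how stable the estimate is.

```python
import random
from statistics import mean, stdev

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

def fit_centroids(X, y):
    """Hypothetical stand-in classifier: one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict_centroid(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    return min(centroids,
               key=lambda lab: sum((a - c) ** 2 for a, c in zip(x, centroids[lab])))

def cross_validate(X, y, k=5, seed=0):
    """Return one accuracy per fold: train on k-1 folds, test on the held-out fold."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict_centroid(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores

# Toy, made-up data: two well-separated clusters.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.0, 0.4],
     [5.0, 5.1], [5.2, 4.9], [4.8, 5.3], [5.1, 5.0], [4.9, 4.7]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
scores = cross_validate(X, y, k=5)
print(scores, mean(scores), stdev(scores))
```

If feature selection is part of the pipeline, running it inside each training fold rather than once on the full data set keeps the held-out fold unseen and avoids an optimistically biased estimate.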

Answer by chen, 3.7 years ago
