
Using A Support Vector Classifier With Polynomial Kernel In Scikit-learn

I'm experimenting with different classifiers implemented in the scikit-learn package to do an NLP task. The code I use to perform the classification is the following: def train_c

Solution 1:

In both cases you should tune the value of the regularization parameter C using grid search. You cannot compare the results otherwise, as a good value of C for one model might yield poor results for the other.
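A minimal sketch of such a grid search, using the current sklearn.model_selection API (older scikit-learn versions exposed GridSearchCV under sklearn.grid_search) and synthetic data as a stand-in for your own NLP features:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy data standing in for your feature matrix X and labels y.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Search C on a log scale: the best value often differs by orders of
# magnitude from one kernel to another, so tune it for each model.
grid = GridSearchCV(SVC(kernel="linear"), {"C": [0.01, 0.1, 1, 10, 100]}, cv=3)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```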

For the polynomial kernel you can also grid search the optimal value for the degree (e.g. 2, 3, or more); in that case you should grid search both C and degree at the same time.
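Extending the same sketch to the polynomial kernel: with two parameter lists in the grid, GridSearchCV evaluates every (C, degree) combination by cross-validation and keeps the best pair.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy data again; substitute your own X, y.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# The grid is the cross product of both lists, so C and degree
# are tuned jointly rather than one after the other.
param_grid = {"C": [0.01, 0.1, 1, 10, 100], "degree": [2, 3, 4]}
grid = GridSearchCV(SVC(kernel="poly"), param_grid, cv=3)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```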

Edit:

This has something to do with the uneven distribution of class instances in my training data, right? Or am I calling the procedure incorrectly?

Check that you have at least 3 samples per class to be able to do StratifiedKFold cross-validation with k == 3 (I think this is the default CV used by GridSearchCV for classification). If you have fewer, don't expect the model to be able to predict anything useful. I would recommend at least 100 samples per class, as a somewhat arbitrary rule-of-thumb minimum, unless you work on toy problems with fewer than 10 features and a lot of regularity in the decision boundaries between classes.
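A quick sanity check you can run before the grid search; the y array below is a stand-in for your own label vector:

```python
import numpy as np

# Stand-in labels; replace with the y from your own pipeline.
y = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2])

# Stratified k-fold CV needs at least k samples per class,
# so k == 3 requires at least 3 instances of each label.
classes, counts = np.unique(y, return_counts=True)
for label, n in zip(classes, counts):
    print(f"class {label}: {n} samples")
if counts.min() < 3:
    print("At least one class has fewer than 3 samples; "
          "3-fold stratified CV will fail or be meaningless.")
```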

BTW, please always paste the complete traceback in questions / bug reports. Otherwise one might not have the necessary info to diagnose the actual cause.
