scikit-learn

Linear Models

  • fast and simpler than alternatives such as decision trees and kernel machines
  • applicable to regression (linear regression) and classification (logistic regression, including multiclass classification)
  • linear models give rise to straight lines (or, in higher dimensions, planes and hyperplanes)
  • logistic regression gives straight-line decision boundaries that separate two or more groups
  • BP (best practices)
    • best when data is linearly separable, that is, when different groups of data can be separated by straight lines (or their higher-dimensional equivalents)
    • even when linear models look unsuitable, feature engineering can make some data sets linearly separable by introducing new (derived) features
    • can underfit when n_features << n_samples (e.g. fewer than ~10 features)
    • can overfit when n_samples << n_features
    • hard to beat when n_features is large
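The points above can be sketched with synthetic data: a linear fit for regression, a linearly separable two-group classification, and a case where a derived feature (the squared radius, an assumption chosen here for illustration) turns a non-separable problem into a separable one.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)

# Regression: a linear model recovers a straight-line relationship y = 2x + 1.
X = rng.uniform(-3, 3, size=(200, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(scale=0.1, size=200)
reg = LinearRegression().fit(X, y)
print(reg.coef_, reg.intercept_)  # close to [2.0] and 1.0

# Classification: two groups separated by a straight line are easy for
# logistic regression.
Xc = rng.normal(size=(200, 2))
yc = (Xc[:, 0] + Xc[:, 1] > 0).astype(int)
clf = LogisticRegression().fit(Xc, yc)
print(clf.score(Xc, yc))  # near 1.0: the data is linearly separable

# Feature engineering: points inside vs. outside a circle are NOT linearly
# separable in (x1, x2), but become separable after adding the derived
# feature r^2 = x1^2 + x2^2.
Xr = rng.normal(size=(400, 2))
yr = (Xr[:, 0] ** 2 + Xr[:, 1] ** 2 > 1.0).astype(int)
raw_score = LogisticRegression().fit(Xr, yr).score(Xr, yr)
Xr_eng = np.column_stack([Xr, (Xr ** 2).sum(axis=1)])
eng_score = LogisticRegression().fit(Xr_eng, yr).score(Xr_eng, yr)
print(raw_score, eng_score)  # the engineered features score far higher
```

The circle example is the classic illustration of the feature-engineering bullet: the decision boundary is still a "straight line", but in the new three-dimensional feature space.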

Regularization

  • L2-regularized linear regression is known as Ridge regression in scikit-learn. It is similar to linear regression, except that it takes an additional parameter, alpha
  • alpha pulls coefficients toward zero unless a large coefficient value reduces the training error substantially
  • alpha = 0 reduces to plain linear regression. BP: prefer Ridge regression over linear regression, with a carefully tuned alpha (it has no good default value)
  • use RidgeCV (rather than GridSearchCV) to try different alpha values via cross-validation
  • scikit-learn uses regularization by default in LogisticRegression(C=1.0). C is the inverse of alpha (high values of C lead to weaker regularization)
  • BP: use regularization when n_samples << n_features
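A minimal sketch of the points above, using synthetic wide data (many more features than samples, where plain least squares overfits). The grid of alpha values is an arbitrary choice for illustration:

```python
import numpy as np
from sklearn.linear_model import Ridge, RidgeCV, LogisticRegression

rng = np.random.default_rng(0)
n_samples, n_features = 40, 200           # n_samples << n_features
X = rng.normal(size=(n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]  # only 5 informative features
y = X @ true_coef + rng.normal(scale=0.5, size=n_samples)

# Ridge: alpha=0 would be ordinary linear regression; larger alpha shrinks
# coefficients toward zero unless they clearly reduce training error.
ridge = Ridge(alpha=1.0).fit(X, y)

# RidgeCV selects alpha by cross-validation over a grid of candidates.
ridge_cv = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X, y)
print("chosen alpha:", ridge_cv.alpha_)

# LogisticRegression regularizes by default (C=1.0). C is the inverse of
# alpha, so large C means weak regularization and larger coefficients.
yb = (y > 0).astype(int)
weak = LogisticRegression(C=100.0).fit(X, yb)
strong = LogisticRegression(C=0.01).fit(X, yb)
print(np.abs(weak.coef_).sum() > np.abs(strong.coef_).sum())  # True
```

Comparing the coefficient magnitudes of the two logistic models makes the C-versus-alpha relationship concrete: the smaller C (stronger penalty) produces the smaller coefficients.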