scikit-learn

Linear Models

  • fast and simpler than alternatives such as decision trees and kernel machines
  • applicable to regression (linear regression) and classification (logistic regression, including multiclass classification)
  • linear models give rise to straight lines (or, in higher dimensions, planes and hyperplanes)
  • logistic regression gives straight-line decision boundaries that separate two or more groups
  • BP (best practices)
    • best when data is linearly separable, that is, when different groups of data can be separated by straight lines (or their higher-dimensional equivalents)
    • even when linear models look unsuitable, feature engineering can make some data sets linearly separable by introducing new (derived) features
    • can underfit when n_features << n_samples (e.g. fewer than ~10 features)
    • can overfit when n_samples << n_features
    • hard to beat when n_features is large
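The points above can be sketched with synthetic data: a linear fit for regression, a linearly separable two-group classification, and a case where a derived feature (the squared radius, an assumption chosen here for illustration) turns a non-separable problem into a separable one.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)

# Regression: a linear model recovers a straight-line relationship y = 2x + 1.
X = rng.uniform(-3, 3, size=(200, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(scale=0.1, size=200)
reg = LinearRegression().fit(X, y)
print(reg.coef_, reg.intercept_)  # close to [2.0] and 1.0

# Classification: two groups separated by a straight line are easy for
# logistic regression.
Xc = rng.normal(size=(200, 2))
yc = (Xc[:, 0] + Xc[:, 1] > 0).astype(int)
clf = LogisticRegression().fit(Xc, yc)
print(clf.score(Xc, yc))  # near 1.0: the data is linearly separable

# Feature engineering: points inside vs. outside a circle are NOT linearly
# separable in (x1, x2), but become separable after adding the derived
# feature r^2 = x1^2 + x2^2.
Xr = rng.normal(size=(400, 2))
yr = (Xr[:, 0] ** 2 + Xr[:, 1] ** 2 > 1.0).astype(int)
raw_score = LogisticRegression().fit(Xr, yr).score(Xr, yr)
Xr_eng = np.column_stack([Xr, (Xr ** 2).sum(axis=1)])
eng_score = LogisticRegression().fit(Xr_eng, yr).score(Xr_eng, yr)
print(raw_score, eng_score)  # the engineered features score far higher
```

The circle example is the classic illustration of the feature-engineering bullet: the decision boundary is still a "straight line", but in the new three-dimensional feature space.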

Regularization

  • L2-regularized linear regression is known as Ridge regression in scikit-learn. It is similar to linear regression, except that it takes an additional parameter, alpha
  • alpha pulls coefficients toward zero unless a large coefficient value reduces the training error substantially
  • alpha = 0 reduces to plain linear regression. BP: prefer Ridge regression over linear regression, with a carefully tuned alpha (it has no good default value)
  • use RidgeCV (rather than GridSearchCV) to try different alpha values via cross-validation
  • scikit-learn uses regularization by default in LogisticRegression(C=1.0). C is the inverse of alpha (high values of C lead to weaker regularization)
  • BP: use regularization when n_samples << n_features
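A minimal sketch of the points above, using synthetic wide data (many more features than samples, where plain least squares overfits). The grid of alpha values is an arbitrary choice for illustration:

```python
import numpy as np
from sklearn.linear_model import Ridge, RidgeCV, LogisticRegression

rng = np.random.default_rng(0)
n_samples, n_features = 40, 200           # n_samples << n_features
X = rng.normal(size=(n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]  # only 5 informative features
y = X @ true_coef + rng.normal(scale=0.5, size=n_samples)

# Ridge: alpha=0 would be ordinary linear regression; larger alpha shrinks
# coefficients toward zero unless they clearly reduce training error.
ridge = Ridge(alpha=1.0).fit(X, y)

# RidgeCV selects alpha by cross-validation over a grid of candidates.
ridge_cv = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X, y)
print("chosen alpha:", ridge_cv.alpha_)

# LogisticRegression regularizes by default (C=1.0). C is the inverse of
# alpha, so large C means weak regularization and larger coefficients.
yb = (y > 0).astype(int)
weak = LogisticRegression(C=100.0).fit(X, yb)
strong = LogisticRegression(C=0.01).fit(X, yb)
print(np.abs(weak.coef_).sum() > np.abs(strong.coef_).sum())  # True
```

Comparing the coefficient magnitudes of the two logistic models makes the C-versus-alpha relationship concrete: the smaller C (stronger penalty) produces the smaller coefficients.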