Decision Trees

  • Each attribute (or independent variable) forms a node; evaluating it determines the branch to the next node in the tree
  • The root node (and each subsequent split) is chosen using one of:
    • GINI index: \(1 - (P_{yes})^2 - (P_{no})^2\) where \(P_{yes}\) and \(P_{no}\) are the proportions of positive and negative outcomes at the node; lower means a purer split
    • Information Gain: the reduction in entropy \(-P_{yes}\log_2 P_{yes} - P_{no}\log_2 P_{no}\) achieved by the split
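A minimal pure-Python sketch of the Gini index for a binary node (the function name and example labels are illustrative, not from any particular library):

```python
def gini(labels):
    """Gini index: 1 - P(yes)^2 - P(no)^2 for a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p_yes = sum(labels) / n  # proportion of positive (1) outcomes
    p_no = 1 - p_yes
    return 1 - p_yes ** 2 - p_no ** 2

print(gini([1, 1, 0, 0]))  # maximally impure node -> 0.5
print(gini([1, 1, 1, 1]))  # pure node -> 0.0
```

A split is chosen by comparing the weighted Gini of the child nodes against the parent: the attribute giving the largest impurity drop wins.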

Evaluation metrics

  • Metrics to evaluate a model's performance; the popular ones are grouped by model type below

Logistic Regression

  • Accuracy: fraction of all predictions that were correct, \(\frac{TN+TP}{TN+TP+FP+FN}\)
  • AUC - ROC: Area Under the Curve of the Receiver Operating Characteristic; measures how well the model separates the two classes across all classification thresholds
  • Confusion Matrix {True,False}{Positive,Negative} combinations
    • Recall/Sensitivity (SN): % of all actual positives that the model recognized, \(\frac{TP}{TP+FN}\)
    • Precision: % of predicted positives that were true positives, \(\frac{TP}{TP+FP}\)
    • Specificity: % of all actual negatives that the model recognized, \(\frac{TN}{TN+FP}\)
  • F1 Score: harmonic mean of precision and recall, \(\frac{2 \cdot Precision \cdot Recall}{Precision + Recall}\)
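The confusion-matrix metrics above can be sketched in a few lines of pure Python (the counts in the usage example are made up for illustration):

```python
def metrics(tp, tn, fp, fn):
    """Compute standard classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)  # all correct / all predictions
    recall = tp / (tp + fn)                     # sensitivity: found / actual positives
    precision = tp / (tp + fp)                  # true positives / predicted positives
    specificity = tn / (tn + fp)                # found / actual negatives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, recall, precision, specificity, f1

acc, rec, prec, spec, f1 = metrics(tp=40, tn=45, fp=5, fn=10)
print(acc, rec, spec)  # 0.85 0.8 0.9
```

Note the precision/recall trade-off: lowering the decision threshold raises recall but usually costs precision, which is why F1 summarizes both.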

Linear Regression

  • Accuracy: \(R^2 = 1 - \frac{RSS}{TSS}\)
    • Residual Sum of Squares: \(RSS = \sum(Y_i - Y_{fitted})^2\)
    • Total Sum of Squares: \(TSS = \sum(Y_i - Y_{mean})^2\)
    • a relative metric; between 0 and 1 for ordinary least squares with an intercept (it can go negative for a model worse than predicting the mean), closer to 1 => better model
  • RMSE: Root-Mean-Squared Error; if \(x_i\) is the actual and \(\hat{x}_i\) the predicted value over \(N\) observations: \(\sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \hat{x}_i)^2}\)
    • an absolute number, lower value => better model
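Both regression metrics follow directly from the formulas above; a minimal pure-Python sketch (the sample `y`/`y_hat` values are invented for illustration):

```python
import math

def r_squared(actual, fitted):
    """R^2 = 1 - RSS/TSS."""
    mean = sum(actual) / len(actual)
    rss = sum((y - f) ** 2 for y, f in zip(actual, fitted))  # residual sum of squares
    tss = sum((y - mean) ** 2 for y in actual)               # total sum of squares
    return 1 - rss / tss

def rmse(actual, predicted):
    """Root-mean-squared error over N observations."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]
print(r_squared(y, y_hat))  # ~0.98 (relative: close to 1 is good)
print(rmse(y, y_hat))       # ~0.158 (absolute: in the units of y)
```

This also illustrates the note above: \(R^2\) is unitless and relative, while RMSE is in the same units as the target, so only RMSE answers "how far off are predictions, on average?".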