Decision Trees

  • Each attribute (or independent variable) forms a node; evaluating it determines the branch to the next node in the tree
  • The root node (and each subsequent split) is chosen using one of:
    • GINI index: \(1 - (P_{yes})^2 - (P_{no})^2\) where \(P_{yes}\) and \(P_{no}\) are the proportions of positive and negative outcomes at the node; lower means a purer split
    • Information Gain: the reduction in entropy \(-P_{yes}\log_2 P_{yes} - P_{no}\log_2 P_{no}\) achieved by the split
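A minimal pure-Python sketch of the Gini index for a binary node (the function name and example labels are illustrative, not from any particular library):

```python
def gini(labels):
    """Gini index: 1 - P(yes)^2 - P(no)^2 for a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p_yes = sum(labels) / n  # proportion of positive (1) outcomes
    p_no = 1 - p_yes
    return 1 - p_yes ** 2 - p_no ** 2

print(gini([1, 1, 0, 0]))  # maximally impure node -> 0.5
print(gini([1, 1, 1, 1]))  # pure node -> 0.0
```

A split is chosen by comparing the weighted Gini of the child nodes against the parent: the attribute giving the largest impurity drop wins.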

Evaluation metrics

  • Metrics to evaluate a model's performance; the popular ones are grouped by model type below

Logistic Regression

  • Accuracy: fraction of all predictions that were correct, \(\frac{TN+TP}{TN+TP+FP+FN}\)
  • AUC - ROC: Area Under the Curve of the Receiver Operating Characteristic; measures how well the model separates the two classes across all classification thresholds
  • Confusion Matrix {True,False}{Positive,Negative} combinations
    • Recall/Sensitivity (SN): % of all actual positives that the model recognized, \(\frac{TP}{TP+FN}\)
    • Precision: % of predicted positives that were true positives, \(\frac{TP}{TP+FP}\)
    • Specificity: % of all actual negatives that the model recognized, \(\frac{TN}{TN+FP}\)
  • F1 Score: harmonic mean of precision and recall, \(\frac{2 \cdot Precision \cdot Recall}{Precision + Recall}\)
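The confusion-matrix metrics above can be sketched in a few lines of pure Python (the counts in the usage example are made up for illustration):

```python
def metrics(tp, tn, fp, fn):
    """Compute standard classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)  # all correct / all predictions
    recall = tp / (tp + fn)                     # sensitivity: found / actual positives
    precision = tp / (tp + fp)                  # true positives / predicted positives
    specificity = tn / (tn + fp)                # found / actual negatives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, recall, precision, specificity, f1

acc, rec, prec, spec, f1 = metrics(tp=40, tn=45, fp=5, fn=10)
print(acc, rec, spec)  # 0.85 0.8 0.9
```

Note the precision/recall trade-off: lowering the decision threshold raises recall but usually costs precision, which is why F1 summarizes both.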

Linear Regression

  • Accuracy: \(R^2 = 1 - \frac{RSS}{TSS}\)
    • Residual Sum of Squares: \(RSS = \sum(Y_i - Y_{fitted})^2\)
    • Total Sum of Squares: \(TSS = \sum(Y_i - Y_{mean})^2\)
    • a relative metric; between 0 and 1 for ordinary least squares with an intercept (it can go negative for a model worse than predicting the mean), closer to 1 => better model
  • RMSE: Root-Mean-Squared Error; if \(x_i\) is the actual and \(\hat{x}_i\) the predicted value over \(N\) observations: \(\sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \hat{x}_i)^2}\)
    • an absolute number, lower value => better model
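Both regression metrics follow directly from the formulas above; a minimal pure-Python sketch (the sample `y`/`y_hat` values are invented for illustration):

```python
import math

def r_squared(actual, fitted):
    """R^2 = 1 - RSS/TSS."""
    mean = sum(actual) / len(actual)
    rss = sum((y - f) ** 2 for y, f in zip(actual, fitted))  # residual sum of squares
    tss = sum((y - mean) ** 2 for y in actual)               # total sum of squares
    return 1 - rss / tss

def rmse(actual, predicted):
    """Root-mean-squared error over N observations."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]
print(r_squared(y, y_hat))  # ~0.98 (relative: close to 1 is good)
print(rmse(y, y_hat))       # ~0.158 (absolute: in the units of y)
```

This also illustrates the note above: \(R^2\) is unitless and relative, while RMSE is in the same units as the target, so only RMSE answers "how far off are predictions, on average?".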