
Glossary

Google glossary

  • noise: An unknown value, other than the known features (X), that contributes to the target (y). For example, in school grading, the teacher's mood may affect a student's grade, but it is not usually captured as a feature.
  • polynomial model: a polynomial equation of some degree. A first-degree polynomial is a straight line; higher-degree polynomials can represent more complex shapes.
  • decision-tree models: a decision tree predicts a constant value for a range of \(X\) values; each split introduces another range that predicts a different \(y\) value. A decision tree with many splits is a complex model that can fit complex data.
  • regularization: favoring simpler models over complex ones, i.e. smoother, simpler functions, to reduce overfitting to noise.
  • inductive bias: the built-in assumptions of a family of models that make it favor certain kinds of solutions, for example decision trees vs. polynomials.
  • Bayes error rate: the irreducible error rate that cannot be reduced to zero because of noise that is not captured by the features.
  • Model complexity: the degree of a polynomial model, or the number of splits in a decision-tree model. Models with higher complexity can fit training data better than simpler models, but if the training set is small, complex models tend to overfit.
  • Overfitting: when a (usually complex) model fits the training data too well, including its noise, and thus fails to capture the true structure of the data. Overfitting occurs with complex models and small amounts of training data; the testing error is much higher than the training error.
  • underfitting: when a model is too simple, such as a first-degree polynomial (a straight line with a slope), and fails to capture a complex trend in the data. Both training and testing errors are large; however, it is hard to tell whether a large error is due to underfitting or to noise. With different training datasets, the resulting models tend to be very similar.
  • high variance v/s high bias: a model that overfits will vary greatly depending on the training data, hence the statistical name high variance. Underfitting models, in contrast, make the same kind of error across training sets (for example, a first-degree polynomial line will have only a slightly different slope) but never capture the true shape of the data, hence high bias.
  • cost function: the function minimized during training, e.g. the mean-squared-error function.
  • mean squared error: the average of the squared differences between actual and predicted values: \(\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2\)
  • learning rate, alpha: the step size that controls how much the model's parameters change on each update during training; too small and training converges slowly, too large and it may fail to converge.
  • regularization, lambda: a multiplier on the penalty term added to the cost function; a higher lambda penalizes large weights more strongly, favoring simpler models.
  • hyperparameter: a parameter that controls the learning process (e.g. alpha), as opposed to the parameters learned by training the model itself.
  • Loss function: measures how far off the predicted value is from the actual value for a single example, e.g. \((y - f(x))^2\); the cost is the aggregate of these losses over all data points.
  • gradient boosting: gradually builds a better model by combining several weak learners, each trained to correct the errors of the ones before it.
  • cross-validation: the practice of keeping training and evaluation data separate, i.e. never evaluating the model on the same data it was trained on; the held-out portion can be rotated across several splits of the data.
  • Sigmoid Function: used in logistic regression to map any real value into the range (0, 1): \(\frac{1}{1 + e^{-x}}\)
  • Manhattan Distance: In contrast to Euclidean distance, it is computed as \(|X_2 - X_1| + |Y_2 - Y_1|\)
  • Back propagation: using the results of the loss function to adjust the weights so as to reduce the loss in the next cycle, propagating the adjustment backward through the network.
  • one-hot encoding: converting a categorical feature into a vector of 1s and 0s, with a single 1 marking the category, so most modeling algorithms can work with it.
  • Reinforcement learning: algorithms that learn an optimal policy to maximize return while interacting with an environment.
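The mean-squared-error formula above can be sketched in a few lines of Python (the helper name `mse` is my own):

```python
def mse(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    n = len(y_true)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / n

# One prediction is off by 1; the squared error 1 is averaged over 3 points.
mse([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```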
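To show how the learning rate (alpha) and the regularization multiplier (lambda) enter the picture, here is a minimal, hypothetical gradient-descent step for a one-feature linear model \(y \approx wx + b\) under mean squared error, with an optional L2 penalty on the weight:

```python
def gd_step(w, b, xs, ys, alpha=0.01, lam=0.0):
    """One gradient-descent update for y ~ w*x + b under mean squared error.

    alpha is the learning rate (step size); lam scales an L2 penalty on w,
    nudging the model toward smaller weights, i.e. simpler functions.
    """
    n = len(xs)
    # Gradients of the cost with respect to w and b (plus the L2 term on w).
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n + 2 * lam * w
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    # Step against the gradient, scaled by the learning rate.
    return w - alpha * dw, b - alpha * db
```

Repeating this step shrinks the cost: on data drawn from y = 2x, each update moves w and b toward that line.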
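The sigmoid and Manhattan-distance formulas above translate directly into Python (function names are my own):

```python
import math

def sigmoid(x):
    """Maps any real number into (0, 1): 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def manhattan(p, q):
    """|X2 - X1| + |Y2 - Y1| for two 2-D points, in contrast to Euclidean distance."""
    return abs(q[0] - p[0]) + abs(q[1] - p[1])
```

For example, sigmoid(0) is exactly 0.5, and the Manhattan distance between (1, 1) and (4, 5) is 7 (versus a Euclidean distance of 5).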
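One-hot encoding as defined above can be sketched as follows (a minimal illustration; libraries such as scikit-learn or pandas provide production versions):

```python
def one_hot(categories, value):
    """Return a binary vector with a single 1 at the position of `value`."""
    return [1 if c == value else 0 for c in categories]

one_hot(["red", "green", "blue"], "green")  # [0, 1, 0]
```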