[Machine Learning] Boosted Trees

Gradient Boosted Decision Trees (GBDT)

  • Training builds a series of small decision trees.
  • Each tree attempts to correct the errors of the previous stages.
  • Each new tree is chosen to optimize the learning objective, i.e. to minimize the training loss plus the regularization term.

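Objective function = training loss + regularization term.
Learning rate: how strongly each new tree corrects the errors of the previous trees. Each tree's output is scaled by the learning rate before it is added, so a smaller rate makes more cautious corrections and usually needs more trees.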

Regularization lets us penalize complex models. The regularization term depends on the number of leaves and the weight (score) of each leaf.
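The notes do not give the exact form of the regularization term. A common choice (the one described in the XGBoost tutorial) is $\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \sum_j w_j^2$, where $T$ is the number of leaves and $w_j$ are the leaf weights. The sketch below assumes that form and a squared-error training loss; the function names `regularization` and `objective` and the default values of `gamma` and `lam` are illustrative, not from the source.

```python
def regularization(leaf_weights, gamma=1.0, lam=1.0):
    """Complexity penalty of one tree (assumed XGBoost-style form):
    gamma * (number of leaves) + 0.5 * lam * (sum of squared leaf weights)."""
    num_leaves = len(leaf_weights)
    return gamma * num_leaves + 0.5 * lam * sum(w * w for w in leaf_weights)


def objective(y_true, y_pred, trees_leaf_weights, gamma=1.0, lam=1.0):
    """Objective = training loss (squared error, an assumption here)
    + regularization summed over every tree in the ensemble."""
    loss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    penalty = sum(regularization(w, gamma, lam) for w in trees_leaf_weights)
    return loss + penalty


# A tree with 3 leaves whose weights are 2.0, -1.0 and 0.5
print(regularization([2.0, -1.0, 0.5]))  # 3 + 0.5 * (4 + 1 + 0.25) = 5.625
```

More leaves (larger $T$) or more extreme leaf weights both raise the penalty, which is how the objective trades fit against model complexity.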

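The prediction for example $i$ after two rounds is the sum of the first two trees' outputs: $\hat{y}_i^{(2)} = f_1(x_i) + f_2(x_i) = \hat{y}_i^{(1)} + f_2(x_i)$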
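A minimal sketch of the boosting loop, assuming squared-error loss, a single numeric feature, and one-split trees ("stumps") as the small decision trees. Unlike the two-tree formula, it starts from the mean of y and scales each tree by the learning rate, so after round t the prediction is $\hat{y}^{(t)} = \hat{y}^{(t-1)} + \eta f_t(x)$. The helpers `fit_stump` and `gradient_boost` are hypothetical names for illustration.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree (a "stump") to the residuals.

    Tries every threshold between distinct x values and keeps the split with
    the lowest squared error. Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_val = sum(left) / len(left)    # leaf value = mean residual
        right_val = sum(right) / len(right)
        err = (sum((r - left_val) ** 2 for r in left)
               + sum((r - right_val) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, threshold, left_val, right_val)
    _, threshold, left_val, right_val = best
    return lambda x: left_val if x <= threshold else right_val


def gradient_boost(xs, ys, n_trees=50, learning_rate=0.1):
    """Gradient boosting with squared-error loss on a single feature.

    Each new stump is fit to the residuals y - y_hat (the negative gradient of
    the squared loss, up to a factor of 2) and added with weight learning_rate.
    """
    base = sum(ys) / len(ys)  # stage 0: predict the mean of y
    trees = []

    def predict(x):
        return base + sum(learning_rate * tree(x) for tree in trees)

    for _ in range(n_trees):
        # Each tree corrects what the current ensemble still gets wrong
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        trees.append(fit_stump(xs, residuals))
    return predict


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9]
model = gradient_boost(xs, ys)
print([round(model(x), 2) for x in xs])
```

Lowering `learning_rate` makes each correction smaller, so more trees are needed to reach the same training error.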

Data Leakage

Data leakage happens when the data you’re training on contains information about what you’re trying to predict that would not be available when the model is used for real predictions. Examples:

  • including the label to be predicted as a feature
  • including test data with training data
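A quick check for the second example is to look for test rows that also appear in the training data. This sketch assumes rows are plain lists of values and only catches exact duplicates; `overlapping_rows` is a hypothetical helper and the data is made up.

```python
def overlapping_rows(train_rows, test_rows):
    """Return the test rows that also appear verbatim in the training data.

    Rows are compared as tuples, so only exact duplicates are found.
    """
    train_set = {tuple(row) for row in train_rows}
    return [row for row in test_rows if tuple(row) in train_set]


train = [[5.1, 3.5, 0], [4.9, 3.0, 0], [6.2, 2.9, 1]]
test = [[6.2, 2.9, 1], [5.8, 2.7, 1]]
print(overlapping_rows(train, test))  # [[6.2, 2.9, 1]]
```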

Detecting Data Leakage

  • Before building the model: exploratory data analysis (EDA) to find surprises in the data, e.g. features that are suspiciously highly correlated with the target
  • After building the model: test on new data
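One simple EDA check is to flag features whose correlation with the target is close to 1, which is what happens when the label sneaks in as a feature. This sketch assumes numeric columns stored in a dict and uses Pearson correlation, which only detects linear relationships; the 0.95 threshold and the names `pearson`, `suspicious_features` and `label_copy` are illustrative assumptions.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences.

    Returns 0.0 if either sequence is constant (correlation undefined).
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def suspicious_features(columns, target, threshold=0.95):
    """Names of features whose absolute correlation with the target is at least threshold."""
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= threshold]


target = [0, 0, 1, 1, 1]
columns = {
    "age": [45, 32, 27, 51, 38],
    "label_copy": [0, 0, 1, 1, 1],  # the label accidentally kept as a feature
}
print(suspicious_features(columns, target))  # ['label_copy']
```

A flagged feature is not proof of leakage; it is a prompt to check whether that value would really be known at prediction time.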

Minimizing Data Leakage

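  • Perform data preparation (scaling, imputation, feature selection, ...) within each cross-validation fold separately, fitting it on that fold's training part only
  • With time series data, use a timestamp cutoff: train only on data before the cutoff and evaluate on data after it
  • Before any work with a new dataset, split off a final held-out test set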
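The first two rules can be sketched as follows, assuming a single numeric feature, contiguous (unshuffled) folds, and rows whose first field is a timestamp. `standardize_within_folds` and `time_split` are hypothetical helpers; the point is that the mean and standard deviation come from each fold's training part only, and that no training row is later than any test row.

```python
import statistics


def standardize_within_folds(values, k=3):
    """Yield (train_scaled, val_scaled) for each of k contiguous folds.

    The mean and standard deviation are computed from the training part of
    the fold only, so no statistic of the validation part leaks into training.
    """
    n = len(values)
    fold_size = n // k
    for i in range(k):
        start = i * fold_size
        stop = n if i == k - 1 else start + fold_size
        train = values[:start] + values[stop:]
        val = values[start:stop]
        mu = statistics.mean(train)
        sigma = statistics.pstdev(train) or 1.0  # guard against a constant column
        yield ([(v - mu) / sigma for v in train],
               [(v - mu) / sigma for v in val])


def time_split(rows, cutoff):
    """Split (timestamp, ...) rows: train on rows before the cutoff,
    evaluate on rows at or after it, so the model never sees the future."""
    train = [row for row in rows if row[0] < cutoff]
    test = [row for row in rows if row[0] >= cutoff]
    return train, test


for train_scaled, val_scaled in standardize_within_folds([1, 2, 3, 4, 5, 6]):
    print([round(v, 2) for v in val_scaled])

rows = [(1, "a", 0), (2, "b", 1), (3, "c", 0), (4, "d", 1)]
train_rows, test_rows = time_split(rows, cutoff=3)
print(train_rows, test_rows)  # [(1, 'a', 0), (2, 'b', 1)] [(3, 'c', 0), (4, 'd', 1)]
```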

Bayesian Classifier

True State: The true state of the world, which you would like to know.
Prior: P(true state = x)
Evidence: A symptom, or anything else you can observe
Conditional: P(evidence | true state = x), the probability of seeing the evidence if you knew the true state
Posterior: P(true state = x | evidence)
Bayes' rule: Posterior = Prior × Conditional / P(evidence)
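For a world with only two states, the posterior follows from the prior and the two conditionals, with P(evidence) obtained by summing over both states. The numbers below are hypothetical (a rare disease and an imperfect test), and `posterior` is an illustrative helper name.

```python
def posterior(prior, p_evidence_given_true, p_evidence_given_false):
    """P(true state = x | evidence) for a two-state world via Bayes' rule.

    prior * conditional, divided by P(evidence), where P(evidence) is the sum
    of the joint probabilities over both states.
    """
    joint_true = prior * p_evidence_given_true
    joint_false = (1 - prior) * p_evidence_given_false
    return joint_true / (joint_true + joint_false)


# Hypothetical numbers: 1% of patients have the disease, the test flags
# 90% of sick patients and 5% of healthy ones.
print(round(posterior(0.01, 0.90, 0.05), 3))  # 0.154
```

Even with a fairly accurate test, the low prior keeps the posterior around 15%.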

Many Pieces of Evidence

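Naive Bayes makes the strong assumption that the pieces of evidence are independent of each other given the true state, so the probability of seeing all the evidence given a state is the product of the individual conditionals.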
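Under that independence assumption, the posterior of each state is proportional to its prior times the product of the conditionals of the observed evidence, normalized so the posteriors sum to 1. This sketch assumes a small dict-based model and uses only the evidence that was observed (it ignores absent symptoms); the states, probabilities and the name `naive_bayes_posterior` are hypothetical.

```python
def naive_bayes_posterior(priors, conditionals, evidence):
    """Posterior over states assuming evidence is independent given the state:
    P(state | e1..en) is proportional to P(state) * product of P(ei | state).

    priors:       {state: P(state)}
    conditionals: {state: {evidence_name: P(evidence observed | state)}}
    evidence:     list of observed evidence names
    """
    scores = {}
    for state, prior in priors.items():
        score = prior
        for e in evidence:
            score *= conditionals[state][e]  # independence lets us multiply
        scores[state] = score
    total = sum(scores.values())  # P(all evidence), summed over states
    return {state: s / total for state, s in scores.items()}


priors = {"flu": 0.1, "cold": 0.9}
conditionals = {
    "flu": {"fever": 0.9, "cough": 0.8},
    "cold": {"fever": 0.2, "cough": 0.6},
}
result = naive_bayes_posterior(priors, conditionals, ["fever", "cough"])
print({state: round(p, 2) for state, p in result.items()})  # {'flu': 0.4, 'cold': 0.6}
```

With many pieces of evidence the products get very small; real implementations usually add log-probabilities instead of multiplying.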

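Generative Classifier: models P(evidence | true state) and P(true state), then classifies with Bayes' rule (e.g. Naive Bayes).
Discriminative Classifier: models P(true state | evidence) directly (e.g. logistic regression).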