Gradient Boosted Decision Trees (GBDT)
- Training builds a series of small decision trees.
- Each tree attempts to correct the errors from the previous stage.
- At each stage, we learn the tree that best optimizes the learning objective (training loss plus regularization).
 
Objective function = training loss + regularization term.
Learning rate: controls how strongly each new tree corrects the previous model's errors.
Regularization allows us to penalize complex models. The regularization term depends on the number of leaves and the weight of each leaf.
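One concrete form of this term (XGBoost's formulation, added here as an illustration; γ and λ are tuning constants) combines both pieces:

```latex
% Regularization of a tree f with T leaves and leaf weights w_j:
% gamma penalizes the number of leaves, lambda penalizes large leaf weights.
\Omega(f) = \gamma T + \tfrac{1}{2}\,\lambda \sum_{j=1}^{T} w_j^2
```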
ŷ_i^(2) = f_1(x_i) + f_2(x_i)
…
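The stage-by-stage behavior above can be sketched with scikit-learn's gradient boosting (an illustration of the idea, not from the notes; the dataset and hyperparameters are arbitrary):

```python
# Each boosting stage adds a small tree, shrunk by the learning rate,
# so training error should fall as more stages are added.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

model = GradientBoostingRegressor(
    n_estimators=50,     # number of small trees added in series
    learning_rate=0.1,   # how hard each tree tries to fix the previous model
    max_depth=2,         # keep the individual trees small
    random_state=0,
)
model.fit(X, y)

# staged_predict gives ŷ^(t) = f_1(x) + ... + f_t(x) after each stage t.
errors = [mean_squared_error(y, p) for p in model.staged_predict(X)]
print(errors[0] > errors[-1])  # later stages fit the training data better
```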
Data Leakage
Data leakage happens when the data you train on contains information about the target that would not be available at prediction time. Examples:
- including the label to be predicted as a feature
- including test data in the training data
 
To Detect Data Leakage
- Before building the model: EDA to find surprises in the data
- After building the model: test it on genuinely new data
 
Minimizing Data Leakage
- Perform data preparation within each cross-validation fold separately
- With time series data, use a timestamp cutoff
- Before any work with a new dataset, split off a final test/validation dataset
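The first point above can be sketched with a scikit-learn Pipeline (an illustration I've added; the dataset and models are arbitrary). Putting the scaler inside the pipeline means it is refit on each fold's training portion only, so test-fold statistics cannot leak into preprocessing:

```python
# Leakage-safe cross-validation: preprocessing is fit inside each fold.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, random_state=0)

# The scaler is refit within every CV fold, instead of being fit once on the
# full dataset up front (which would leak test-fold means and variances).
model = make_pipeline(StandardScaler(), LogisticRegression())
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())
```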
 
Bayesian Classifier
- True state: the true state of the world, which you would like to know
- Prior: Prob(true state = x)
- Evidence: some symptom, or other thing you can observe
- Conditional: probability of seeing the evidence if you did know the true state
- Posterior: Prob(true state = x | some evidence)
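The pieces above combine via Bayes' rule: posterior = conditional × prior / evidence. A toy numeric example (the disease/test numbers are my own, for illustration):

```python
# Bayes' rule for a disease-test example.
prior = 0.01          # Prob(true state = disease)
sensitivity = 0.95    # conditional: Prob(positive test | disease)
false_pos = 0.05      # Prob(positive test | no disease)

# Total probability of observing the evidence (a positive test),
# by summing over both possible true states.
evidence = sensitivity * prior + false_pos * (1 - prior)

# Posterior: Prob(disease | positive test)
posterior = sensitivity * prior / evidence
print(round(posterior, 3))  # → 0.161
```

Even with a fairly accurate test, the small prior keeps the posterior low.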
Many Pieces of Evidence
Naive Bayes makes the strong assumption that the pieces of evidence (features) are conditionally independent of each other given the class.
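As a quick illustration (my own example, not from the notes), scikit-learn's GaussianNB applies exactly this assumption, modelling each feature independently per class:

```python
# Naive Bayes on the iris dataset: each feature is modelled with its own
# per-class Gaussian, assuming conditional independence between features.
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
model = GaussianNB().fit(X, y)
print(model.score(X, y) > 0.9)  # fits iris well despite the naive assumption
```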
Generative Classifier
Discriminative Classifier