Gradient Boosted Decision Trees (GBDT)
- Training builds a series of small decision trees.
- Each tree attempts to correct the errors of the ensemble built so far.
- At each stage, we learn the tree that best optimizes the learning objective.
objective function = training loss + regularization term.
learning rate: controls how hard each new tree tries to fix the previous model's errors.
Regularization lets us penalize complex models. The regularization term accounts for the weight of each leaf and the number of leaves.
ŷ_i^(2) = f_1(x_i) + f_2(x_i)
…
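The additive training above can be sketched with scikit-learn's GradientBoostingClassifier (one possible library choice; the dataset here is synthetic for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the n_estimators small trees (depth <= max_depth) fits the
# residual errors of the ensemble so far; learning_rate shrinks each
# tree's contribution (how hard each stage tries to fix the previous one).
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 max_depth=3, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```

Lowering learning_rate usually needs more trees (n_estimators) to reach the same training loss, but tends to generalize better.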
Data Leakage
Data leakage happens when the data you’re training on contains information about what you’re trying to predict. Examples:
- including the label to be predicted as a feature
- including test data with training data
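A small made-up demonstration of the first kind of leakage: if the label sneaks in as a feature, cross-validated accuracy becomes suspiciously perfect:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
X_leaky = np.column_stack([X, y])  # the label itself included as a feature

# The tree simply splits on the leaked column, so every fold scores ~1.0 —
# "too good to be true" results like this are a red flag for leakage.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X_leaky, y, cv=5)
print(scores.mean())
```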
To Detect Data Leakage
- Before building the model: perform EDA to find surprises in the data
- After building the model:
- Test on new data
Minimizing Data Leakage
- Perform data preparation within each cross-validation fold separately
- With time series data, use a timestamp cutoff
- Before any work with a new dataset, split off a final test dataset for validation
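The first point above can be sketched with a scikit-learn Pipeline: wrapping the scaler and the model together means the scaler is re-fit on each fold's training portion only, so no statistics leak from the held-out fold (synthetic data for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, random_state=0)

# Fitting StandardScaler on the full dataset before CV would leak the
# test folds' mean/variance into training; the Pipeline avoids this by
# fitting all steps inside each fold separately.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```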
Bayesian Classifier
True State: The true state of the world, which you would like to know.
Prior: Prob(true state = x)
Evidence: Some symptom, or other thing you can observe
Conditional: Probability of seeing the evidence if you did know the true state
Posterior: Prob(true state = x | some evidence)
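These definitions combine via Bayes' rule; a worked example with made-up numbers (a disease-testing scenario, purely for illustration):

```python
prior = 0.01        # Prob(true state = sick)
conditional = 0.9   # Prob(positive test | sick)
false_pos = 0.05    # Prob(positive test | healthy)

# Total probability of seeing the evidence (a positive test):
evidence = conditional * prior + false_pos * (1 - prior)

# Posterior = Prob(sick | positive test), by Bayes' rule:
posterior = conditional * prior / evidence
print(round(posterior, 3))  # → 0.154
```

Note how the small prior keeps the posterior low even though the test is fairly accurate.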
Many Pieces of Evidence
Naive Bayes makes the strong assumption that each piece of evidence is independent of the others, conditioned on the true state.
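Under that assumption, the joint likelihood of several pieces of evidence factorizes into a product of per-evidence conditionals. A sketch with made-up numbers:

```python
prior = {"sick": 0.01, "healthy": 0.99}
# P(each symptom | state); the naive assumption says these are
# independent given the state, so they can simply be multiplied.
cond = {
    "sick":    {"fever": 0.8, "cough": 0.7},
    "healthy": {"fever": 0.1, "cough": 0.2},
}

observed = ["fever", "cough"]
unnorm = {}
for state in prior:
    p = prior[state]
    for ev in observed:
        p *= cond[state][ev]  # naive: multiply per-evidence conditionals
    unnorm[state] = p

# Normalize to get the posterior over states
total = sum(unnorm.values())
posterior = {s: p / total for s, p in unnorm.items()}
print(round(posterior["sick"], 3))
```

Each extra piece of evidence just multiplies in one more conditional, which is what makes Naive Bayes cheap even with many features.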
Generative Classifier: models the joint distribution P(x, y) (e.g. Naive Bayes).
Discriminative Classifier: models P(y | x) directly (e.g. logistic regression).