Gradient Boosted Decision Trees (GBDT)
- Training builds a series of small decision trees.
- Each tree attempts to correct the errors from the previous stage.
- At each stage, we learn the tree that best optimizes the learning objective (training loss plus regularization).
 
Objective function = training loss + regularization term.
Learning rate: controls how strongly each new tree corrects the previous model's errors.
Regularization allows us to penalize complex models. The regularization term depends on the number of leaves and the weight of each leaf.
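One concrete form of this term (XGBoost's formulation, added here as an illustration; γ and λ are tuning constants) combines both pieces:

```latex
% Regularization of a tree f with T leaves and leaf weights w_j:
% gamma penalizes the number of leaves, lambda penalizes large leaf weights.
\Omega(f) = \gamma T + \tfrac{1}{2}\,\lambda \sum_{j=1}^{T} w_j^2
```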
ŷ_i^(2) = f_1(x_i) + f_2(x_i)
…
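The stage-by-stage behavior above can be sketched with scikit-learn's gradient boosting (an illustration of the idea, not from the notes; the dataset and hyperparameters are arbitrary):

```python
# Each boosting stage adds a small tree, shrunk by the learning rate,
# so training error should fall as more stages are added.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

model = GradientBoostingRegressor(
    n_estimators=50,     # number of small trees added in series
    learning_rate=0.1,   # how hard each tree tries to fix the previous model
    max_depth=2,         # keep the individual trees small
    random_state=0,
)
model.fit(X, y)

# staged_predict gives ŷ^(t) = f_1(x) + ... + f_t(x) after each stage t.
errors = [mean_squared_error(y, p) for p in model.staged_predict(X)]
print(errors[0] > errors[-1])  # later stages fit the training data better
```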
Data Leakage
Data leakage happens when the data you train on contains information about the target that would not be available at prediction time. Examples:
- including the label to be predicted as a feature
- including test data in the training data
 
To Detect Data Leakage
- Before building the model: EDA to find surprises in the data
- After building the model: test it on genuinely new data
 
Minimizing Data Leakage
- Perform data preparation within each cross-validation fold separately
- With time series data, use a timestamp cutoff
- Before any work with a new dataset, split off a final test/validation dataset
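The first point above can be sketched with a scikit-learn Pipeline (an illustration I've added; the dataset and models are arbitrary). Putting the scaler inside the pipeline means it is refit on each fold's training portion only, so test-fold statistics cannot leak into preprocessing:

```python
# Leakage-safe cross-validation: preprocessing is fit inside each fold.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, random_state=0)

# The scaler is refit within every CV fold, instead of being fit once on the
# full dataset up front (which would leak test-fold means and variances).
model = make_pipeline(StandardScaler(), LogisticRegression())
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())
```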
 
Bayesian Classifier
- True state: the true state of the world, which you would like to know
- Prior: Prob(true state = x)
- Evidence: some symptom, or other thing you can observe
- Conditional: probability of seeing the evidence if you did know the true state
- Posterior: Prob(true state = x | some evidence)
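The pieces above combine via Bayes' rule: posterior = conditional × prior / evidence. A toy numeric example (the disease/test numbers are my own, for illustration):

```python
# Bayes' rule for a disease-test example.
prior = 0.01          # Prob(true state = disease)
sensitivity = 0.95    # conditional: Prob(positive test | disease)
false_pos = 0.05      # Prob(positive test | no disease)

# Total probability of observing the evidence (a positive test),
# by summing over both possible true states.
evidence = sensitivity * prior + false_pos * (1 - prior)

# Posterior: Prob(disease | positive test)
posterior = sensitivity * prior / evidence
print(round(posterior, 3))  # → 0.161
```

Even with a fairly accurate test, the small prior keeps the posterior low.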
Many Pieces of Evidence
Naive Bayes makes the strong assumption that the pieces of evidence (features) are conditionally independent of each other given the class.
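As a quick illustration (my own example, not from the notes), scikit-learn's GaussianNB applies exactly this assumption, modelling each feature independently per class:

```python
# Naive Bayes on the iris dataset: each feature is modelled with its own
# per-class Gaussian, assuming conditional independence between features.
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
model = GaussianNB().fit(X, y)
print(model.score(X, y) > 0.9)  # fits iris well despite the naive assumption
```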
Generative Classifier
Discriminative Classifier