XGBoost on unbalanced data


Dear all,
I am new to machine learning and am trying to apply XGBoost to my data to find the feature importance, plot the ROC curve and compute the AUC. However, the classes are unbalanced: there are 24 control samples and 153 diseased samples.
I tried downsampling the diseased samples, but I don't know whether to downsample the whole dataset before splitting it into training and testing sets, or after the split.
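A common recommendation is to split first and only then downsample the training portion, so the test set never influences which samples are kept. The sketch below shows that order using only the standard library; the helper names `stratified_split` and `downsample_majority` are hypothetical, and the labels assume 0 = control and 1 = diseased. In practice, scikit-learn's `train_test_split(..., stratify=y)` performs the stratified split, and the balanced training indices would then be used to fit the XGBoost model.

```python
import random
from collections import Counter

def stratified_split(labels, test_fraction=0.25, seed=0):
    """Split indices per class so train and test keep the original class ratio."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return train_idx, test_idx

def downsample_majority(indices, labels, seed=0):
    """Randomly keep, from every class, as many indices as the smallest class has."""
    rng = random.Random(seed)
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    n_min = min(len(v) for v in by_class.values())
    kept = []
    for cls in sorted(by_class):
        kept.extend(rng.sample(by_class[cls], n_min))
    return kept

labels = [0] * 24 + [1] * 153  # 24 controls, 153 diseased
train_idx, test_idx = stratified_split(labels)
balanced_train = downsample_majority(train_idx, labels)  # only the training part is touched
print(sorted(Counter(labels[i] for i in balanced_train).items()))  # [(0, 18), (1, 18)]
print(sorted(Counter(labels[i] for i in test_idx).items()))        # [(0, 6), (1, 38)]
```

With only 24 controls, this leaves a very small training set; reweighting the classes instead (for example with XGBoost's `scale_pos_weight` parameter) is a common alternative that keeps all samples.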
If after, should I downsample the testing data or the training data, and why?
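One reason the test set is usually left untouched: ROC AUC compares every positive against every negative, so it does not depend on the class ratio (only on how the scores of each class are distributed). Rebalancing the test set would therefore just discard evaluation samples. The sketch below computes AUC as the probability that a random positive outscores a random negative, treating diseased as the positive class; the scores are made-up illustration values, and `roc_auc` is a hypothetical helper (scikit-learn's `roc_auc_score` computes the same quantity).

```python
def roc_auc(scores, labels):
    """AUC = P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1, 1, 1]
scores = [0.2, 0.6, 0.4, 0.7, 0.8, 0.9]
print(roc_auc(scores, labels))  # 0.875
# Same score distributions, but three times as many positives: AUC is unchanged.
print(roc_auc(scores[:2] + scores[2:] * 3, labels[:2] + labels[2:] * 3))  # 0.875
```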

I hope someone can explain this to me and point me to some informative tutorials.
Regards,

