machine learning - When doing cross validation, what changes if you ensure that the class distribution in the training and test set is equal to the whole set?


Let's take a binary classification problem.

When doing k-fold cross validation and splitting the randomly shuffled dataset into k chunks, how likely are the chunks to have the same label distribution, as a function of k?

If the class distribution is uneven, say 95% of the dataset is negatives and 5% positives, it seems pretty likely that for low values of k the label distribution across chunks will be uneven. Of course this is also true for extreme values of k such as k = (size of dataset), but it already shows up at low values of k, such as 5. A quick simulation below illustrates this.
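To make the worry concrete, here is a minimal simulation sketch (assuming Python with NumPy; the post itself names no language) that counts how many positives land in each chunk under plain random k-fold splitting:

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 1000, 5
    y = np.zeros(n, dtype=int)
    y[:50] = 1                        # 5% positives, as in the question

    counts = []
    for _ in range(1000):             # repeat random (unstratified) k-fold splits
        folds = np.array_split(rng.permutation(y), k)
        counts.append([fold.sum() for fold in folds])

    counts = np.array(counts)
    # each fold of 200 samples should hold about 10 positives on average
    print("positives per fold: expected 10, observed min",
          counts.min(), "max", counts.max())

Across repeated shuffles, some folds come up noticeably short of the expected 10 positives, which is exactly the scenario the question worries about.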

My main fear is that I may not have enough positive examples in the training set during some phase of the cross validation. On the other hand, if I go ahead and ensure an equal label distribution in the chunks, it seems to me that I may be introducing a bias problem.

Basically, what I want to ask is: what do I gain, and what do I lose, if I ensure the label distribution in the chunks? Is it good or bad? And, more importantly, why?

There seems to be a similar question here: https://stats.stackexchange.com/questions/117643/why-use-stratified-cross-validation-why-does-this-not-damage-variance-related-b

Anyway, you'll get different models if one training fold has 1% positives and another has 6%. It's better to build on data with a balanced (matching) label distribution across folds.
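As an illustration of the stratified alternative (a sketch assuming scikit-learn, which the answer doesn't mention), StratifiedKFold pins each fold's positive rate to the dataset-wide 5%, while plain KFold lets it drift from fold to fold:

    import numpy as np
    from sklearn.model_selection import KFold, StratifiedKFold

    y = np.zeros(1000, dtype=int)
    y[:50] = 1                            # 5% positives
    X = np.zeros((1000, 1))               # dummy features; only y matters here

    for name, cv in [("KFold", KFold(5, shuffle=True, random_state=0)),
                     ("StratifiedKFold", StratifiedKFold(5, shuffle=True, random_state=0))]:
        # positive rate in each test fold
        rates = [y[test].mean() for _, test in cv.split(X, y)]
        print(name, ["%.3f" % r for r in rates])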

How is there bias?

