machine learning - When doing cross validation, what changes if you ensure that the class distribution in the training and test set is equal to the whole set?


Let's take a binary classification problem.

When doing k-fold cross validation and splitting the randomly shuffled dataset into k chunks, how likely are the chunks to have the same label distribution, as a function of k?

If the class distribution is uneven, say 95% of the dataset is negatives and 5% positives, it seems pretty likely that for low values of k the label distribution across chunks will be uneven. Of course this is also true for extreme values of k such as k = (size of dataset), but it already shows up at low values of k, such as 5. A quick simulation below illustrates this.
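To make the worry concrete, here is a minimal simulation sketch (assuming Python with NumPy; the post itself names no language) that counts how many positives land in each chunk under plain random k-fold splitting:

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 1000, 5
    y = np.zeros(n, dtype=int)
    y[:50] = 1                        # 5% positives, as in the question

    counts = []
    for _ in range(1000):             # repeat random (unstratified) k-fold splits
        folds = np.array_split(rng.permutation(y), k)
        counts.append([fold.sum() for fold in folds])

    counts = np.array(counts)
    # each fold of 200 samples should hold about 10 positives on average
    print("positives per fold: expected 10, observed min",
          counts.min(), "max", counts.max())

Across repeated shuffles, some folds come up noticeably short of the expected 10 positives, which is exactly the scenario the question worries about.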

My main fear is that I may not have enough positive examples in the training set during some phase of the cross validation. On the other hand, if I go ahead and ensure an equal label distribution in the chunks, it seems to me that I may be introducing a bias problem.

Basically, what I want to ask is: what do I gain, and what do I lose, if I ensure the label distribution in the chunks? Is it good or bad? And, more importantly, why?

There seems to be a similar question here: https://stats.stackexchange.com/questions/117643/why-use-stratified-cross-validation-why-does-this-not-damage-variance-related-b

Anyway, you'll get different models if one training fold has 1% positives and another has 6%. It's better to build on data with a balanced (matching) label distribution across folds.
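As an illustration of the stratified alternative (a sketch assuming scikit-learn, which the answer doesn't mention), StratifiedKFold pins each fold's positive rate to the dataset-wide 5%, while plain KFold lets it drift from fold to fold:

    import numpy as np
    from sklearn.model_selection import KFold, StratifiedKFold

    y = np.zeros(1000, dtype=int)
    y[:50] = 1                            # 5% positives
    X = np.zeros((1000, 1))               # dummy features; only y matters here

    for name, cv in [("KFold", KFold(5, shuffle=True, random_state=0)),
                     ("StratifiedKFold", StratifiedKFold(5, shuffle=True, random_state=0))]:
        # positive rate in each test fold
        rates = [y[test].mean() for _, test in cv.split(X, y)]
        print(name, ["%.3f" % r for r in rates])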

How is there bias?

