When using machine learning models like gradient boosted trees or CNNs for binary classification, is it required (or considered an always-recommended good practice) to balance the number of positive and negative examples during training?
Given P positive examples and N negative examples, where P << N, I can think of several choices (let's set aside the validation and test sets):
Choice A) No balancing at all: put all P + N examples into the training set with uniform weights, ignoring the class ratio.
Choice B) Put all P + N examples into the training set, but weight each positive example 1/(2P) and each negative example 1/(2N), so that the total weight of the positive class equals that of the negative class (a rough code sketch of choices B and C follows this list).
Choice C) Take all P positive examples, sample P negative examples out of the N, and train on these 2P examples with uniform weights.
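
To make B and C concrete, here is a minimal sketch in Python/NumPy with made-up labels; the `sample_weight` keyword at the end is my assumption of how a scikit-learn/XGBoost-style API would consume the weights, not part of the question itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy labels purely for illustration: 30 positives, 970 negatives (P << N).
y = np.zeros(1000, dtype=int)
y[:30] = 1

P = int((y == 1).sum())
N = int((y == 0).sum())

# Choice B: weight each positive 1/(2P) and each negative 1/(2N),
# so each class contributes a total weight of 1/2.
weights = np.where(y == 1, 1.0 / (2 * P), 1.0 / (2 * N))
# Typically passed as model.fit(X, y, sample_weight=weights) in
# scikit-learn / XGBoost / LightGBM style APIs (assumption; check your library).

# Choice C: keep all P positives, undersample P of the N negatives,
# and train on the resulting 2P examples with uniform weights.
pos_idx = np.flatnonzero(y == 1)
neg_idx = rng.choice(np.flatnonzero(y == 0), size=P, replace=False)
balanced_idx = np.concatenate([pos_idx, neg_idx])
# Train on X[balanced_idx], y[balanced_idx].
```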
What are the pros and cons of each approach, and which one(s) do we usually go with?