A standard solution, known as hard negative mining, is to iteratively grow, or bootstrap, a small set of negative examples by selecting those negatives for which the detector triggers a false positive alarm. This strategy leads to an iterative training algorithm that alternates between learning the detection model given the current set of examples, and then using the learned model to find hard negatives to add to the training set. In other word, we fed the model with a batch of positive RoIs and random subset of negative RoIs ,then we harvest false positive examples from the learning model, the false positives are then added to the training set and then the model is trained again. This process is iterated several times until satisfaction.
I quote from the paper Bootstrapping Face Detection with Hard Negative Examples which is publied on arXiv:1608.02236v1 [cs.CV] 7 Aug 2016
