Contents
1. How to achieve end-to-end training
2. How to speed up object detection through shared computation
3. How to further improve object detection accuracy
3.1 architecture diagram
3.2 multi-scale training and testing
3.3 feature fusion and enhancement or multiple layers exploiting
3.4 training strategy and loss function
3.5 better proposal and balance
3.6 contextual reasoning
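Two-stage anchor-based object detectors generally achieve relatively high detection accuracy. Their development has followed three main directions:
1. How to achieve end-to-end training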
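R-CNN is far from end-to-end: its pipeline consists of Selective Search (SS) to pick candidate regions, backbone pre-training, backbone fine-tuning, training multiple binary SVM classifiers, and training multiple bounding-box regressors. Fast R-CNN is trained end to end except for SS: the classifier and the regressor are built into the network and implemented with several fully connected layers. Faster R-CNN achieves truly end-to-end training by using an RPN in place of SS to generate candidate regions.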
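2. How to speed up object detection through shared computation
SPPnet introduces the SPP layer, which removes the requirement that input images have a fixed size. It also maps candidate regions onto the feature map, so features are extracted only once for the whole image; this greatly reduces feature-extraction time and speeds up detection. R-FCN introduces position-sensitive score maps so that the head computation is shared, which greatly reduces the time spent on classification and regression. Light-Head R-CNN speeds up detection by reducing the number of channels in the feature map.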
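To make the fixed-length property of the SPP layer concrete, here is a minimal NumPy sketch. The function name `spp_pool` and the pyramid levels (1, 2, 4) are assumptions chosen for illustration; this is not the SPPnet authors' implementation.

```python
import numpy as np

def spp_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map into a fixed-length vector.

    Each pyramid level n splits the map into an n x n grid of bins and
    keeps the per-channel maximum of every bin, so the output length
    depends only on C and the levels, not on H or W.
    """
    c, h, w = feature_map.shape
    pooled = []
    for n in levels:
        # Bin edges cover the whole map even when H or W is not divisible by n.
        ys = np.linspace(0, h, n + 1)
        xs = np.linspace(0, w, n + 1)
        for i in range(n):
            y0, y1 = int(np.floor(ys[i])), int(np.ceil(ys[i + 1]))
            for j in range(n):
                x0, x1 = int(np.floor(xs[j])), int(np.ceil(xs[j + 1]))
                pooled.append(feature_map[:, y0:y1, x0:x1].max(axis=(1, 2)))
    return np.concatenate(pooled)

# Two feature maps of different spatial size give vectors of the same length.
a = spp_pool(np.random.rand(8, 13, 17))
b = spp_pool(np.random.rand(8, 30, 9))
print(a.shape, b.shape)
```

With 8 channels and levels (1, 2, 4) there are 1 + 4 + 16 = 21 bins, so both vectors have length 168 and can feed the same fully connected layer.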
3. How to further improve object detection accuracy
3.1 architecture diagram
R-FCN: object detection via region-based fully convolutional networks. In NIPS, 2016
ME R-CNN: multi-expert region-based CNN for object detection. In ICCV, 2017
Couplenet: Coupling global structure with local parts for object detection. In ICCV, 2017 (CoupleNet)
Cascade R-CNN: delving into high quality object detection. In CVPR, 2018 (Cascade R-CNN trains several cascaded detectors with increasing IoU thresholds.)
Scale-aware trident networks for object detection. In ICCV, 2019 (TridentNet studies the relationship between the network's receptive field and detection performance: feature layers of a given size are trained only on objects of a particular size and detect only objects of that size.)
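The effect of increasing IoU thresholds can be sketched in a few lines of plain Python. The thresholds (0.5, 0.6, 0.7) and the helper names `iou` and `stage_labels` are assumptions for illustration. A real cascade also refines the boxes between stages, which this sketch leaves out.

```python
def iou(box, gt):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box[0], gt[0]), max(box[1], gt[1])
    ix2, iy2 = min(box[2], gt[2]), min(box[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / (area_box + area_gt - inter)

def stage_labels(proposals, gt, thresholds=(0.5, 0.6, 0.7)):
    """For each stage threshold, mark every proposal as positive (True) or negative (False)."""
    ious = [iou(p, gt) for p in proposals]
    return [[v >= t for v in ious] for t in thresholds]

gt = (0, 0, 10, 10)
proposals = [(0, 0, 10, 10), (0, 0, 10, 6.5), (0, 0, 10, 5.5)]  # IoU 1.0, 0.65, 0.55
for t, labels in zip((0.5, 0.6, 0.7), stage_labels(proposals, gt)):
    print(t, labels)
```

Later stages accept fewer proposals as positives, so each detector in the cascade is trained on progressively better-localized samples.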
3.2 multi-scale training and testing
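An analysis of scale invariance in object detection - SNIP. In CVPR, 2018 (SNIP studies the relationship between object size and network performance and, with the help of an image pyramid, lets the network detect only objects of a suitable size, which improves detection accuracy.)
Autofocus: Efficient multi-scale inference. In ICCV, 2019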
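The idea of handling only objects of a suitable size at each pyramid level can be sketched as a simple filter. The function name `valid_objects`, the size measure (square root of the box area), and the range (32, 256) are hypothetical choices for illustration, not values taken from the SNIP paper.

```python
def valid_objects(boxes, image_scale, valid_range):
    """Return indices of boxes whose resized size falls inside valid_range.

    boxes: list of (x1, y1, x2, y2) in original-image pixels.
    image_scale: resize factor of the current pyramid level.
    valid_range: (min_size, max_size) in resized pixels (hypothetical values).
    """
    lo, hi = valid_range
    keep = []
    for idx, (x1, y1, x2, y2) in enumerate(boxes):
        # Use the square root of the area as the object size after resizing.
        size = image_scale * ((x2 - x1) * (y2 - y1)) ** 0.5
        if lo <= size <= hi:
            keep.append(idx)
    return keep

boxes = [(0, 0, 20, 20), (0, 0, 100, 100), (0, 0, 400, 400)]
print(valid_objects(boxes, 2.0, (32, 256)))  # upscaled image keeps the smaller objects
print(valid_objects(boxes, 0.5, (32, 256)))  # downscaled image keeps the larger objects
```

Each pyramid level therefore only sees objects that land in a comfortable size range after resizing, and the levels together cover all object sizes.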
3.3 feature fusion and enhancement or multiple layers exploiting
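A unified multi-scale deep convolutional neural network for fast object detection. In ECCV, 2016 (MS-CNN uses deconvolution to increase the resolution of the output feature maps and predicts directly from multi-scale feature maps.)
Hypernet: Towards accurate region proposal generation and joint object detection. In CVPR, 2016 (HyperNet fuses feature maps of different sizes and outputs a single-scale feature map with strong semantic information and rich location information.)
Beyond skip connections: Top-down modulation for object detection. In CoRR, 2016
Feature pyramid networks for object detection. In CVPR, 2017 (FPN introduces multi-scale feature fusion and predicts from multi-scale features, which reduces the impact of object-size variation and greatly improves detection accuracy.)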
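The top-down fusion used by FPN-style networks can be illustrated with a small NumPy sketch. It assumes that every input map already has the same number of channels (FPN uses a 1x1 lateral convolution for this) and that each level has exactly half the resolution of the previous one. The smoothing convolution applied after merging is omitted, and the function names are hypothetical.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) array."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def top_down_fuse(features):
    """Fuse feature maps ordered from fine (large) to coarse (small).

    Starting from the coarsest map, each finer map is added to the
    upsampled result of the level above it.
    """
    fused = [features[-1]]                      # start from the coarsest map
    for lateral in reversed(features[:-1]):
        fused.append(lateral + upsample2x(fused[-1]))
    return fused[::-1]                          # back to fine-to-coarse order

c3, c4, c5 = np.ones((4, 8, 8)), np.ones((4, 4, 4)), np.ones((4, 2, 2))
p3, p4, p5 = top_down_fuse([c3, c4, c5])
print(p3.shape, p4.shape, p5.shape)
```

Every output level keeps its own resolution while accumulating information from all coarser levels, which is why predictions can then be made on each scale separately.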
3.4 training strategy and loss function
G-CNN: an iterative grid based object detector. In CVPR, 2016
Training region-based object detectors with online hard example mining. In CVPR, 2016
A-fast-rcnn: Hard positive generation via adversary for object detection. In CVPR, 2017
Bounding box regression with uncertainty for accurate object detection. In CVPR, 2019 (KL loss models the bounding box as a Gaussian distribution and uses the standard deviation of the Gaussian to measure the uncertainty of the box localization.)
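One way to read the KL loss is as a regression loss in which the network also predicts how uncertain each coordinate is. The sketch below assumes the network outputs alpha = log(sigma^2) for a coordinate, drops constant terms, and switches to a smooth-L1-style term for errors above 1. The function and parameter names are hypothetical, and the exact formulation should be checked against the paper.

```python
import math

def kl_loss(x_pred, log_var, x_gt):
    """Uncertainty-aware regression loss for a single box coordinate.

    x_pred:  predicted coordinate (mean of the Gaussian).
    log_var: predicted alpha = log(sigma^2), the localization uncertainty.
    x_gt:    ground-truth coordinate.
    """
    d = abs(x_gt - x_pred)
    # Quadratic for small errors, linear for large ones (smooth-L1 style).
    data_term = 0.5 * d * d if d <= 1.0 else d - 0.5
    # exp(-alpha) down-weights the error when the predicted variance is large;
    # alpha / 2 penalizes claiming a large variance everywhere.
    return math.exp(-log_var) * data_term + 0.5 * log_var

print(kl_loss(0.0, 0.0, 3.0))  # large error, low predicted uncertainty
print(kl_loss(0.0, 2.0, 3.0))  # same error, higher predicted uncertainty
```

For a badly localized coordinate, predicting a larger variance lowers the loss, while for an accurate coordinate it raises the loss, so the predicted standard deviation ends up tracking how reliable the box is.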
3.5 better proposal and balance
Learning to rank proposals for object detection. In ICCV, 2019
Libra R-CNN: towards balanced learning for object detection. In CVPR, 2019
3.6 contextual reasoning
Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks. In CVPR, 2016
Object detection via a multiregion and semantic segmentation-aware CNN model. In ICCV, 2015
Contextual priming and feedback for faster R-CNN. In ECCV, 2016
Gated bi-directional CNN for object detection. In ECCV, 2016
Structure inference net: Object detection using scene-level context and instance-level relationships. In CVPR, 2018
Context refinement for object detection. In ECCV, 2018
Thundernet: Towards realtime generic object detection. In ICCV, 2019