[Paper Reading Notes] Self-training with Noisy Student improves ImageNet classification

2024-04-09

Motivation

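Use a small amount of labeled data to exploit large-scale unlabeled data for semi-supervised learning via self-training.

A teacher model generates pseudo labels that are used to train a student model. Noise is added while training the student so that it outperforms the teacher, and the process is iterated to obtain a better model.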

Principle

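A teacher-student framework based on self-training:

1. Train a teacher model on the labeled data.
2. Use the teacher model to generate pseudo labels for the large-scale unlabeled data.
3. Train a student model on the labeled data together with the pseudo-labeled unlabeled data.
4. Use the new student model as the teacher and go back to step 2. The experiments in the paper run three iterations.

Noisy Student Training adds noise while training the student model. The noise used in the paper is:

Input noise: RandAugment data augmentation
Model noise: dropout, stochastic depth

Adding noise is critical: without it, the student can at best approach the teacher's performance and will not surpass it.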
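The loop above can be sketched on a toy problem. This is only a minimal illustration under stated assumptions, not the paper's implementation: the model is a 2-D logistic regression instead of EfficientNet, Gaussian jitter stands in for RandAugment, dropout is applied to the input features, stochastic depth is omitted (the toy model has no residual blocks), and hard pseudo labels are used. All function names (`make_blobs`, `train`, `noisy_student`, `accuracy`) and hyperparameters are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    # w = [w1, w2, bias], x = (x1, x2)
    return sigmoid(w[0] * x[0] + w[1] * x[1] + w[2])


def make_blobs(n, rng):
    # Toy data: class 0 centred at (-2, -2), class 1 centred at (2, 2).
    data = []
    for i in range(n):
        y = i % 2
        c = 2.0 if y == 1 else -2.0
        data.append(((rng.gauss(c, 1.0), rng.gauss(c, 1.0)), float(y)))
    return data


def train(data, rng, noisy, epochs=30, lr=0.1, sigma=0.5, drop_p=0.2):
    """Fit logistic regression with SGD; inject noise only if `noisy`."""
    w = [0.0, 0.0, 0.0]
    keep = 1.0 - drop_p
    for _ in range(epochs):
        for (x1, x2), y in data:
            if noisy:
                # Input noise: Gaussian jitter (toy stand-in for RandAugment).
                x1 += rng.gauss(0.0, sigma)
                x2 += rng.gauss(0.0, sigma)
                # Model noise: inverted dropout on the input features.
                x1 = x1 / keep if rng.random() < keep else 0.0
                x2 = x2 / keep if rng.random() < keep else 0.0
            err = predict_proba(w, (x1, x2)) - y
            w[0] -= lr * err * x1
            w[1] -= lr * err * x2
            w[2] -= lr * err
    return w


def noisy_student(labeled, unlabeled, iterations=3, seed=0):
    rng = random.Random(seed)
    # Step 1: train the teacher on labeled data only.
    teacher = train(labeled, rng, noisy=False)
    for _ in range(iterations):
        # Step 2: the teacher (without noise) produces hard pseudo labels.
        pseudo = [(x, 1.0 if predict_proba(teacher, x) >= 0.5 else 0.0)
                  for x in unlabeled]
        # Step 3: train a noised student on labeled + pseudo-labeled data.
        student = train(labeled + pseudo, rng, noisy=True)
        # Step 4: the student becomes the next teacher.
        teacher = student
    return teacher


def accuracy(w, data):
    hits = sum((predict_proba(w, x) >= 0.5) == (y == 1.0) for x, y in data)
    return hits / len(data)
```

A call such as `noisy_student(make_blobs(20, rng), [x for x, _ in make_blobs(200, rng)])` returns the final weights, which can then be scored with `accuracy`. Note that noise is applied only when training the student; the teacher predicts on clean inputs when generating pseudo labels.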

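Model architecture

See the experiments section of the paper for details.

Key points:

A student model that is larger than the teacher gives better results.

Even when the two models are the same size, Noisy Student Training alone still brings a significant improvement.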

Key concepts

The three noise sources used in Noisy Student Training:

RandAugment [18]
Dropout [76]
Stochastic Depth [37]
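To make the three noise sources concrete, here is a rough sketch on plain Python lists. It is an illustration under assumptions, not code from the cited papers: `dropout` uses the common "inverted" form that rescales at training time, `stochastic_depth` randomly skips a residual branch during training and scales it by the survival probability at inference, and `rand_augment` applies `num_ops` randomly chosen operations at a shared magnitude. The operations in `TOY_OPS` are hypothetical 1-D stand-ins for the image transformations that RandAugment actually uses.

```python
import random


def dropout(values, drop_p, rng, training=True):
    """Inverted dropout: zero each unit with prob. drop_p, rescale the rest."""
    if not training or drop_p == 0.0:
        return list(values)
    keep = 1.0 - drop_p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def stochastic_depth(values, residual_fn, survival_p, rng, training=True):
    """Residual block whose branch is randomly skipped during training."""
    if training:
        if rng.random() >= survival_p:
            return list(values)  # branch dropped: identity path only
        branch = residual_fn(values)
        return [v + b for v, b in zip(values, branch)]
    # Inference: always keep the branch, scaled by its survival probability.
    branch = residual_fn(values)
    return [v + survival_p * b for v, b in zip(values, branch)]


# Hypothetical toy operations on a 1-D signal (real RandAugment uses image
# operations such as rotation, shear, or colour adjustments).
def _shift(values, magnitude):
    return [v + 0.1 * magnitude for v in values]


def _scale(values, magnitude):
    return [v * (1.0 + 0.1 * magnitude) for v in values]


def _flip(values, magnitude):
    return list(reversed(values))


TOY_OPS = [_shift, _scale, _flip]


def rand_augment(values, num_ops, magnitude, rng):
    """Apply num_ops uniformly sampled operations at one shared magnitude."""
    out = list(values)
    for _ in range(num_ops):
        op = rng.choice(TOY_OPS)
        out = op(out, magnitude)
    return out
```

RandAugment perturbs the input, while dropout and stochastic depth perturb the model itself. The latter two are switched off at inference (`training=False`), which matches the idea that the teacher predicts without noise while the student is trained with it.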

[18] Ekin D. Cubuk, Barret Zoph, Jonathon Shlens, and Quoc V. Le. RandAugment: Practical data augmentation with no separate search. arXiv preprint arXiv:1909.13719, 2019.

[76] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014.

[37] Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, and Kilian Q. Weinberger. Deep networks with stochastic depth. In European Conference on Computer Vision, pages 646–661. Springer, 2016.
