Channel Distillation: Channel-Wise Attention for Knowledge Distillation (Paper Reading Notes)

Title

Channel Distillation: Channel-Wise Attention for Knowledge Distillation

Summary

Core idea: the teacher's channel-wise attention distribution is transferred to the student model as knowledge.

Main weakness: the ablation study is too limited to provide strong evidence for the hypotheses stated earlier in the paper.

Research Objective

Use knowledge distillation to obtain a student model that is both accurate and small in parameter count.

Problem Statement

  1. The teacher is not good enough, and the student cannot accurately learn the essential information from the teacher. (the student cannot accurately absorb the teacher's knowledge)
  2. If the teacher is not completely correct, during training, if the student makes a decision with reference to the decisive result of the teacher, the poor output of the teacher will have a bad influence on the student instead. (the teacher is not 100% accurate, so its wrong predictions can mislead the student)
  3. There is a margin between the teacher and the student since they have different structures, which will make the student unable to find its own optimization space if we always let the teacher supervise it. (the two networks have different architectures and hence different parameter spaces, so the student should not be trained under the teacher's supervision the whole time)

Method(s)

  1. Transfer channel-wise information via an attention mechanism: Channel Distillation (CD) (see the code sketch after this list)

    • SENet-style channel attention (global average pooling over each channel's feature map):

      $w_c = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W}u_c(i,j)$

    • CD Loss:

      $CD(s,t)=\frac{\sum_{i=1}^{n}\sum_{j=1}^{c}\left(w^{ij}_s - w^{ij}_t\right)^2}{n\times c}$, where $i$ indexes samples and $j$ indexes channels.

  2. Guided Knowledge Distillation

    • Original KD: the standard distillation loss, i.e., the KL divergence between the teacher's and the student's temperature-softened output distributions.

    • GKD: addresses problem 2 above; only samples that the teacher classifies correctly contribute to the distillation loss, so the teacher's wrong predictions cannot mislead the student.

  3. Early Decay Teacher (EDT): addresses problem 3 above; the weight on the teacher-supervised losses is gradually decayed as training progresses, so that later in training the student relies mainly on the ground-truth labels and can explore its own optimization space.
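Below is a minimal PyTorch-style sketch of the three components above, written from the descriptions in this note rather than from the authors' released code; the function names, the linear EDT schedule, and the loss weighting in the final comment are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def channel_attention(features: torch.Tensor) -> torch.Tensor:
    """SE-style channel weights: global average pooling over H x W.
    features: (n, c, H, W) -> (n, c)."""
    return features.mean(dim=(2, 3))

def cd_loss(student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
    """CD loss: squared difference of channel attention weights,
    averaged over the n samples and c channels."""
    w_s = channel_attention(student_feat)
    w_t = channel_attention(teacher_feat)
    return ((w_s - w_t) ** 2).mean()

def gkd_loss(student_logits, teacher_logits, labels, temperature=4.0):
    """Guided KD (sketch): soft-target KL divergence computed only on the
    samples the teacher classifies correctly."""
    correct = teacher_logits.argmax(dim=1).eq(labels)  # (n,) bool mask
    if not correct.any():
        return student_logits.new_zeros(())
    p_t = F.softmax(teacher_logits[correct] / temperature, dim=1)
    log_p_s = F.log_softmax(student_logits[correct] / temperature, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * temperature ** 2

def edt_weight(epoch: int, total_epochs: int, initial_weight: float = 1.0) -> float:
    """Early Decay Teacher (illustrative linear schedule): the weight on the
    teacher-supervised losses shrinks as training progresses."""
    return initial_weight * max(0.0, 1.0 - epoch / total_epochs)

# Hypothetical total loss at a given epoch:
# loss = F.cross_entropy(student_logits, labels) \
#        + edt_weight(epoch, total_epochs) * (gkd_loss(student_logits, teacher_logits, labels)
#                                             + cd_loss(student_feat, teacher_feat))
```

Decaying the distillation terms with `edt_weight` is what lets the ground-truth cross-entropy dominate later in training, matching the motivation in problem 3.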

Evaluation

How do the authors evaluate their method? Are there any problems, or anything worth borrowing?

Metrics: Top-1 error, Top-5 error

Networks: ResNet-34 (teacher), ResNet-18 (student)
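For reference, a small sketch (not from the paper) of how the Top-1 / Top-5 error metrics can be computed from a model's logits:

```python
import torch

def topk_error(logits: torch.Tensor, labels: torch.Tensor, k: int = 1) -> float:
    """Fraction of samples whose true label is not among the top-k predictions."""
    topk = logits.topk(k, dim=1).indices           # (n, k)
    hit = topk.eq(labels.unsqueeze(1)).any(dim=1)  # (n,)
    return 1.0 - hit.float().mean().item()

# top1_err = topk_error(logits, labels, k=1)
# top5_err = topk_error(logits, labels, k=5)
```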

Issues: no figures, and the analysis is too thin.

Conclusion

Transfer the channel attention information from the teacher to the student, instead of just mimicking the teacher's label-level representation.

Notes

  1. Apart from CD, the other two ideas (GKD and EDT) could be combined with other distillation methods; would that work better than the CD proposed here?
  2. This paper transfers channel knowledge and label knowledge at the same time. Existing work first transfers hidden-layer feature maps via attention and then performs label distillation. Which is better?
  3. Since a teacher model is available, why not use the teacher to do data augmentation on unlabeled data?
  4. The architecture diagram is crude.