这三项工作打破了传统知识蒸馏必须依赖更强外部 Teacher(如 GPT-4)的定式,共同指向了一种 On-Policy Self-Distillation的新范式: 在数学推理任务中,SFT 存在训练与推理分布偏移的问题。OPSD (On-Policy Self-Distillation) 关注如何利用训练数据中隐含的特权信息——即 Ground Truth 答案 。
The event aims to introduce the local community to the traditions of the Spring Festival while encouraging students to deepen their interest in learning Chinese, Babikir said. He also noted that the ...