Summary: | As deep learning techniques have advanced, model parameter counts have grown steadily, incurring substantial memory consumption and limiting the deployment of such models in real-time applications. To reduce the number of model parameters and enhance the generalization capability of neural networks, we propose Decoupled MetaDistil, a decoupled meta-distillation method. It uses meta-learning to guide the teacher model, dynamically adjusting the knowledge transfer strategy based on feedback from the student model and thereby improving the student's generalization ability. Furthermore, we introduce a decoupled loss that explicitly transfers positive-sample knowledge while also exploiting the knowledge contained in negative samples. Extensive experiments demonstrate the effectiveness of our method.
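The abstract does not spell out the decoupled loss, so the following is only a minimal sketch of one plausible reading: splitting the standard distillation KL term into a positive-sample part (target class vs. the rest) and a negative-sample part (the distribution over non-target classes), in the style of decoupled knowledge distillation. The function name `decoupled_kd_loss` and the hyperparameters `alpha`, `beta`, and `T` are hypothetical, and the meta-learned teacher update is not shown.

```python
import torch
import torch.nn.functional as F

def decoupled_kd_loss(logits_s, logits_t, target, alpha=1.0, beta=2.0, T=4.0):
    """Sketch of a decoupled distillation loss (assumed form, not the
    paper's exact objective): a positive-sample term on the binary
    {target, non-target} split plus a negative-sample term over the
    non-target classes only."""
    n, c = logits_s.shape
    gt = F.one_hot(target, c).float()  # (n, c) ground-truth mask

    # Positive-sample term: compare how much probability mass student
    # and teacher place on the target class vs. all other classes.
    p_s = F.softmax(logits_s / T, dim=1)
    p_t = F.softmax(logits_t / T, dim=1)
    b_s = torch.stack([(p_s * gt).sum(1), (p_s * (1 - gt)).sum(1)], dim=1)
    b_t = torch.stack([(p_t * gt).sum(1), (p_t * (1 - gt)).sum(1)], dim=1)
    pos_loss = F.kl_div(b_s.log(), b_t, reduction="batchmean") * T * T

    # Negative-sample term: KL between student and teacher distributions
    # restricted to non-target classes (target logit masked out before
    # the softmax so it receives ~zero probability).
    masked_s = logits_s / T - 1000.0 * gt
    masked_t = logits_t / T - 1000.0 * gt
    neg_loss = F.kl_div(
        F.log_softmax(masked_s, dim=1),
        F.softmax(masked_t, dim=1),
        reduction="batchmean",
    ) * T * T

    return alpha * pos_loss + beta * neg_loss
```

Weighting the two terms separately (via `alpha` and `beta`) is what makes the loss "decoupled": the negative-sample term can be emphasized independently of the target-class term, rather than both being tied together as in the classic KD objective.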