Abstract: Knowledge distillation is an attractive approach for learning compact deep neural networks: a lightweight student model is trained by distilling knowledge from a complex teacher model.
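As a minimal sketch of the idea the abstract describes (not this paper's specific method), the standard temperature-scaled distillation loss of Hinton et al. can be written as below; the names `distillation_loss`, `T`, and `alpha` are illustrative placeholders.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft-label term: KL divergence between the temperature-scaled
    # teacher and student distributions, rescaled by T^2 so its
    # gradient magnitude is comparable to the hard-label term.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-label term: ordinary cross-entropy on ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    # alpha balances imitation of the teacher against fitting the labels.
    return alpha * soft + (1.0 - alpha) * hard
```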
Abstract: Out-of-distribution detection is critical to the reliable application of deep neural networks. To reduce model uncertainty, we propose a simple yet effective approach of ensemble knowledge ...
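This abstract is truncated, so the exact method is not stated here. Assuming the usual reading of "ensemble knowledge distillation", one common ingredient is to average the soft predictions of several teachers and distill the student toward that mean; the sketch below illustrates only that assumed step, with `ensemble_soft_labels` as a hypothetical helper.

```python
import torch

@torch.no_grad()
def ensemble_soft_labels(teacher_models, inputs):
    # Average the softmax outputs of several teachers; the mean
    # distribution can serve as the soft target for the student.
    probs = [m(inputs).softmax(dim=-1) for m in teacher_models]
    return torch.stack(probs).mean(dim=0)
```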
(Optional) If you are running decoding with gemma-2 models, you will also need to install flashinfer:
python -m pip install flashinfer -i https://flashinfer.ai/whl ...