Averaging Weights Leads to Wider Optima and Better Generalization This post is protected. Deep Learning 2023.11.05
GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks This post is protected. Multi task learning 2023.10.24
KNOWLEDGE DISTILLATION VIA SOFTMAX REGRESSION REPRESENTATION LEARNING This post is protected. Knowledge Distillation 2023.09.21