Domain Generalization Guided by Gradient Signal to Noise Ratio of Parameters

Jongmin Lim 2024. 6. 4. 16:46

Abstract

To compensate for over-parameterized models, many regularization techniques based on Dropout have been introduced.
However, they bring large gains only on classical benchmarks such as ImageNet, and their performance degrades once domain shift occurs.
This paper moves away from the classical approach of Bernoulli-sampled dropout mask construction and instead discards parameters with high GSNR.
It also uses a meta-learning approach to find the optimal dropout ratio.

1. Introduction

Recent DNNs rely on regularization methods to keep large models from overfitting the training data.

In particular, Dropout [1] randomly selects activations of fully connected layers and mutes them. A better-suited variant is DropBlock [2], which masks contiguous, spatially correlated regions of a feature map.

Existing regularization techniques achieve great results, but they assume that train and test data follow similar distributions. In the domain generalization setting, where there is a distribution shift between train and test data, their performance degrades.

The goal of this paper is to design a model that is robust to domain shift, so that it performs equally well on the source domains and on unseen test domains.

The model design rests on two observations.

1) A model should have a high Gradient Signal to Noise Ratio (GSNR).

A high GSNR means that performance does not drop sharply when the model is evaluated on unseen data [3].

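2) Repeatedly dropping the most predictive parameters makes the model focus on learning less dominant features, which improves performance on unseen domains.

In short, the paper 1) drops the parameters with the largest GSNR values, as shown in Fig. 1, and 2) uses meta-learning to set a separate dropout ratio $p$ for each block of the neural network, since different blocks favour different dropout ratios.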

Contribution

A GSNR-based dropout strategy
Selection of the optimal dropout ratio through a novel meta-learning framework

2. Related Work

2.1. Domain Generalization

The goal of DG is to learn a model that generalizes to any unseen domain.

2.2. Dropout Regularization

At each training iteration, Dropout builds a binary mask sampled from a Bernoulli distribution and uses it to randomly mask parameters.
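The Bernoulli mask construction described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's code: the function names `bernoulli_mask` and `apply_dropout` are hypothetical, and the rescaling by `1 / (1 - drop_ratio)` is the common "inverted dropout" convention, which this post does not mention.

```python
import random


def bernoulli_mask(n, drop_ratio, rng):
    """Return n mask entries: 0 (drop) with probability drop_ratio, else 1 (keep)."""
    return [0 if rng.random() < drop_ratio else 1 for _ in range(n)]


def apply_dropout(values, drop_ratio, rng):
    """Mask values with a fresh Bernoulli mask (requires drop_ratio < 1)."""
    mask = bernoulli_mask(len(values), drop_ratio, rng)
    # Assumption: inverted-dropout rescaling keeps the expected value unchanged.
    scale = 1.0 / (1.0 - drop_ratio)
    return [v * m * scale for v, m in zip(values, mask)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
masked = apply_dropout([1.0] * 8, drop_ratio=0.5, rng=rng)
print(masked)  # each entry is either 0.0 (dropped) or 2.0 (kept and rescaled)
```

A new mask is drawn on every call, which mirrors the "at each training iteration" part of the description. Every entry is dropped with the same probability, regardless of how informative it is.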

Many regularization techniques build on Dropout.

DropPath [4] mutes entire layers.
SpatialDropout [5] mutes entire channels.
CutOut [6] applies a random patch to the input image.
AlphaDropout [7], like CutOut, applies random patches to the input image but preserves the original mean and standard deviation.
DropBlock [8], which is similar to this work, drops square patches of a feature map.

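However, these existing methods are not effective in the domain generalization setting, where the domain shift is large [9].

This paper instead obtains better results by zeroing out the predictive parts of a feature map, such as those with the highest activation values or gradients [10].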

3. Methodology

3.2. Background

A model's GSNR is associated with strong generalization.

However, there is no clear way to increase a model's GSNR.
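As a point of reference, the GSNR of a single parameter is commonly defined as the squared mean of its per-sample gradient divided by the variance of that gradient. The sketch below computes this quantity in plain Python. The input format (a list of per-sample gradient vectors) and the names `gsnr` and `eps` are assumptions made for illustration, not taken from the paper's code.

```python
from statistics import fmean, pvariance


def gsnr(per_sample_grads, eps=1e-12):
    """Per-parameter GSNR: (mean gradient)^2 / (gradient variance).

    per_sample_grads: list of gradient vectors, one per training sample,
    each with one entry per parameter (hypothetical input format).
    eps: small constant to avoid division by zero (an assumption).
    """
    columns = list(zip(*per_sample_grads))  # gradients grouped by parameter
    return [fmean(col) ** 2 / (pvariance(col) + eps) for col in columns]


# Parameter 0: gradients agree across samples. Parameter 1: gradients flip sign.
grads = [
    [0.9, 1.0],
    [1.1, -1.0],
    [0.9, 1.0],
    [1.1, -1.0],
]
scores = gsnr(grads)
print(scores)  # parameter 0 scores far higher than parameter 1
```

A parameter whose gradient points the same way on every sample gets a high score, while one whose gradient changes sign from sample to sample gets a score near zero.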

3.3. Proposed Approach

[11] measured the importance of the most predictive features by examining gradient magnitudes.

3.3.1 GSNR-Guided DropBlock
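The selection step stated in the Introduction (drop the parameters with the largest GSNR values) can be sketched as a mask built from precomputed scores. This is only an illustration under stated assumptions: the function name `gsnr_guided_mask` is hypothetical, the deterministic top-k rule is an assumption (the paper may select parameters differently), and the block structure that DropBlock applies to feature maps is left out.

```python
def gsnr_guided_mask(gsnr_scores, drop_ratio):
    """Return a 0/1 mask that drops the highest-GSNR entries.

    gsnr_scores: one precomputed GSNR value per parameter (hypothetical input).
    drop_ratio: fraction of parameters to drop, between 0 and 1.
    """
    n_drop = int(len(gsnr_scores) * drop_ratio)
    # Indices sorted from highest to lowest GSNR.
    order = sorted(range(len(gsnr_scores)),
                   key=lambda i: gsnr_scores[i], reverse=True)
    dropped = set(order[:n_drop])
    return [0 if i in dropped else 1 for i in range(len(gsnr_scores))]


mask = gsnr_guided_mask([0.2, 5.0, 1.3, 0.1], drop_ratio=0.5)
print(mask)  # [1, 0, 0, 1]: the two highest-GSNR entries are dropped
```

Unlike a Bernoulli mask, this mask is fully determined by the scores, so the same entries are dropped for as long as they remain the most predictive ones.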

3.3.2 Meta-learning the Dropout Ratios

A learning-to-learn technique is used to avoid the cost of a grid search over dropout ratios.

4. Experiments

4.1. Experimental Setup

4.2. PACS Classification

4.3. Office-Home Classification

4.4. miniDomainNet Classification

4.5. OCIM Face Anti-Spoofing

4.6. Comparison with RSC

4.7. Model Analysis

Learned dropout ratios:

The learning-to-learn technique is used to select the dropout ratio of each residual block.

Stiffness:

Stiffness is measured to examine the model's generalization performance.

Inter-class stiffness is computed over pairs of samples whose labels $y_i, y_j$ differ, using the sign function.
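A minimal sketch of this measurement, assuming the usual sign-based form of stiffness: take the sign of the dot product between the loss gradients of two samples and average it over all pairs with different labels. The function names and the input format (one gradient vector per sample) are hypothetical, and the paper's exact formulation may differ.

```python
def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def inter_class_sign_stiffness(grads, labels):
    """Average sign of gradient dot products over pairs with y_i != y_j.

    grads: list of per-sample loss-gradient vectors (hypothetical format).
    labels: class label of each sample, aligned with grads.
    """
    total, count = 0, 0
    for i in range(len(grads)):
        for j in range(i + 1, len(grads)):
            if labels[i] != labels[j]:  # inter-class pairs only
                dot = sum(a * b for a, b in zip(grads[i], grads[j]))
                total += sign(dot)
                count += 1
    return total / count if count else 0.0


grads = [[1.0, 0.0], [1.0, 1.0], [-1.0, 0.0]]
labels = [0, 1, 1]
print(inter_class_sign_stiffness(grads, labels))  # 0.0: one aligned pair, one opposed pair
```

A value close to 1 means that a gradient step on a sample from one class tends to reduce the loss on samples from other classes as well, while a value close to -1 means the classes pull the parameters in opposite directions.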

Reference

https://arxiv.org/abs/2310.07361

