Efficient Variance Reduction for Meta-Learning

Jongmin Lim 2024. 5. 30. 16:03

Abstract

Meta-learning learns meta-knowledge from a large number of tasks.
However, the stochastic meta-gradient can have large variance because
- 1) data is sampled from each task, and
- 2) tasks are sampled from the overall task distribution.
The paper therefore proposes an approach that integrates variance reduction into first-order meta-learning algorithms such as Reptile:
- it preserves the bilevel formulation that underlies meta-learning
- unlike general bilevel variance reduction methods, it does not require a large number of task-specific parameters

1. Introduction

Meta-learning [1], or learning to learn [2], aims to adapt quickly to a new task by using meta-knowledge obtained from previously learned tasks.

Meta-learning has been applied successfully to few-shot learning, reinforcement learning, neural architecture search, semi-supervised learning, and other areas.

This paper focuses on MAML and its variants.

Mathematically, meta-learning is formulated as a bilevel optimization problem [3]:

the outer (upper-level) problem learns a meta-parameter that is useful across all tasks, and
the inner (lower-level) problem learns a task-specific model by adapting the meta-parameter.

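Because meta-learning trains on many tasks, the variance of the stochastic meta-gradient comes from two sources:

1) variance from sampling data within each task

2) variance from sampling tasks from the task distribution

This differs from ordinary machine learning, which solves only a single task.

As tasks become more diverse, meta-learning algorithms suffer from larger variance and converge more slowly [4,5].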
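The two sources can be separated with the law of total variance. As a sketch, write the stochastic meta-gradient as $g(w; \tau, \xi)$ with task $\tau \sim p(\tau)$ and data sample $\xi \sim \mathcal{D}_\tau$ (this notation is assumed here for illustration, not taken from the paper):

$$
\mathbb{E}\,\|g - \mathbb{E}[g]\|^2
= \underbrace{\mathbb{E}_{\tau}\,\mathbb{E}_{\xi \mid \tau}\,\big\|g - \mathbb{E}_{\xi \mid \tau}[g]\big\|^2}_{\text{data sampling within a task}}
+ \underbrace{\mathbb{E}_{\tau}\,\big\|\mathbb{E}_{\xi \mid \tau}[g] - \mathbb{E}[g]\big\|^2}_{\text{task sampling}}
$$

In single-task learning the second term vanishes. It grows as the per-task expected gradients spread apart, which is consistent with more diverse tasks producing larger variance.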

Variance reduction is a natural approach to speeding up convergence [6].

Examples include SVRG, SARAH, and STORM.

Recently, VFML [7] applied variance reduction to meta-learning by integrating STORM into Reptile, a first-order meta-learning algorithm.

However, VFML ignores the bilevel structure of meta-learning.

In addition, [8][9] perform variance reduction while keeping the bilevel structure, but they require task-specific parameters for every task.

This paper proposes an efficient variance reduction method that can be used with a variety of meta-learning algorithms.

It preserves the bilevel structure without requiring task-specific parameters for every task.
The paper also shows theoretically that variance reduction leads to faster convergence.

2. Related works

2.1. Meta-learning

Given a set of tasks $\mathcal{I}$, meta-learning [1] is defined as a bilevel optimization problem.

The outer loop (1) finds a suitable meta-initialization $w$, and the inner loop (2) adapts $w$ to each task.
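The equations labeled (1) and (2) are not reproduced in this post. As a rough sketch only, a common way to write such a bilevel problem is shown below. The symbols $\theta_\tau$ (task-specific parameter), $\mathcal{L}$ (loss), and $\mathcal{D}_\tau^{\text{tr}}$, $\mathcal{D}_\tau^{\text{val}}$ (training and validation data of task $\tau$) are assumptions for illustration and may differ from the paper's notation:

$$
\min_{w}\; \frac{1}{|\mathcal{I}|} \sum_{\tau \in \mathcal{I}} \mathcal{L}\big(\theta_\tau^{*}(w);\, \mathcal{D}_\tau^{\text{val}}\big)
\qquad \text{(outer level)}
$$

$$
\text{s.t.}\quad \theta_\tau^{*}(w) = \arg\min_{\theta}\; \mathcal{L}\big(\theta;\, w,\, \mathcal{D}_\tau^{\text{tr}}\big)
\quad \text{for each } \tau \in \mathcal{I}
\qquad \text{(inner level)}
$$

In MAML-style methods the inner $\arg\min$ is typically approximated by a few gradient steps started from $w$, which is how the inner level depends on the meta-initialization.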

2.2. Variance Reduction in Stochastic Optimization

Variance reduction [10] is mainly used to reduce the variance of stochastic gradients and thereby accelerate optimization.
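As one concrete instance, the STORM-style recursive estimator can be sketched as follows. It evaluates the gradient at the current and previous iterates on the same sample and corrects the running estimate with the difference. This is a minimal sketch on a toy objective: the function names, the fixed momentum parameter `a`, and the fixed step size are simplifying assumptions (STORM itself adapts both over time), and none of this is the paper's code.

```python
import numpy as np

def storm_estimate(grad_fn, x_new, x_old, d_old, sample, a):
    """STORM-style estimate: g(x_new; s) + (1 - a) * (d_old - g(x_old; s))."""
    g_new = grad_fn(x_new, sample)
    g_old = grad_fn(x_old, sample)  # same sample, previous iterate
    return g_new + (1.0 - a) * (d_old - g_old)

def noisy_quadratic_grad(x, sample):
    # Toy objective (assumption): gradient of 0.5 * ||x||^2 plus additive noise.
    return x + sample

def run_storm(steps=200, lr=0.1, a=0.1, dim=5, seed=0):
    rng = np.random.default_rng(seed)
    x_old = np.ones(dim)
    # The first estimate is a plain stochastic gradient.
    d = noisy_quadratic_grad(x_old, rng.normal(size=dim))
    x = x_old - lr * d
    for _ in range(steps):
        s = rng.normal(size=dim)
        d = storm_estimate(noisy_quadratic_grad, x, x_old, d, s, a)
        x_old, x = x, x - lr * d
    return x
```

With `a = 1` the estimator reduces to a plain stochastic gradient, and smaller `a` reuses more of the previous estimate, which is what damps the noise.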

3. Variance Reduction for Meta-Learning

This section proposes a variance reduction algorithm for meta-learning that preserves the bilevel structure without having to store a large number of task-specific parameters.

3.1. Proposed Algorithm

Reptile algorithm
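For reference, the basic Reptile update (sample a task, adapt from the meta-parameter with a few gradient steps, then move the meta-parameter toward the adapted parameters) can be sketched as below. This is the standard Reptile rule on a toy problem, not the paper's listing: the linear-regression tasks, the full-batch inner steps, and all names and hyperparameters are assumptions chosen to keep the example small.

```python
import numpy as np

def inner_adapt(w, X, y, lr, steps):
    """A few gradient steps on one task's squared loss, starting from w."""
    theta = w.copy()
    for _ in range(steps):
        grad = X.T @ (X @ theta - y) / len(y)
        theta -= lr * grad
    return theta

def reptile(sample_task, w0, meta_lr=0.1, inner_lr=0.05, inner_steps=5, iters=500):
    w = w0.copy()
    for _ in range(iters):
        X, y = sample_task()                 # task sampling + data sampling
        theta = inner_adapt(w, X, y, inner_lr, inner_steps)
        w += meta_lr * (theta - w)           # move toward the adapted parameters
    return w

def make_task_sampler(dim=3, n=20, seed=0):
    """Hypothetical task family: linear regressions whose weights cluster around a center."""
    rng = np.random.default_rng(seed)
    center = np.arange(1.0, dim + 1)
    def sample_task():
        true_w = center + 0.1 * rng.normal(size=dim)
        X = rng.normal(size=(n, dim))
        y = X @ true_w
        return X, y
    return sample_task

w = reptile(make_task_sampler(), np.zeros(3))
```

Each call to `sample_task` draws both a task and its data, so the update direction `theta - w` carries both sources of variance discussed in the introduction.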

VR-Reptile (variance-reduced Reptile)

Reference

https://proceedings.mlr.press/v162/yang22g.html
