
HyperShot: Few-Shot Learning by Kernel HyperNetworks

Jongmin Lim 2025. 2. 10. 12:25

Notable sentences

The goal of few-shot learning is not to recognize a fixed set of labels but to learn how to quickly adapt to new tasks with a small amount of training data.
For a unified taxonomy, we refer the reader to [4, 38].

Abstract

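The paper proposes HyperShot, a fusion of the kernel and hypernetwork paradigms.
Unlike existing approaches that adjust parameters through gradient updates, the model aims to switch the classification module depending on the task's embedding.
Concretely, the paper uses a hypernetwork that takes information aggregated from the support set as input and returns the parameters of a classifier.
In addition, it introduces a kernel-based representation of the support examples, which serves as the hypernetwork's input.
As a result, the model relies on the relations between the embeddings of the support examples rather than on the direct feature values provided by the backbone model.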

1. Introduction

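The goal of few-shot learning is not to recognize a fixed set of labels but to learn how to quickly adapt to new tasks that come with only a small amount of training data.
The paper builds on two families of few-shot learning techniques:

Kernel methods and Gaussian processes
Hypernetworks
- aggregate information from the support set and produce network weights for new tasks

This paper combines the hypernetwork paradigm with kernel methods:
- first, it examines the entire support set and extracts the information needed to distinguish each class
- then, it generates a decision rule based on these features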

[Process]

Features are extracted from the support set with a backbone architecture, and kernel values are computed between these features.
A hypernetwork then takes the kernel representation as input and generates a decision rule in the form of a classifier.

2. HyperShot: Hypernetwork for few-shot learning

2.1. Background

Few-shot learning

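Support set: $\mathcal{S}=\{(x_l,y_l)\}_{l=1}^{L}$
$L$: the number of labeled examples in the support set
$K$: the number of classes in the task
Query set: $\mathcal{Q}=\{(x_m,y_m)\}_{m=1}^{M}$
$M$: the number of query examples
Task: $\mathcal{T}=\{\mathcal{S},\mathcal{Q}\}$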
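To make the notation concrete, here is a minimal sketch of how a single task $\mathcal{T}=\{\mathcal{S},\mathcal{Q}\}$ might be sampled from a labeled pool. The function `sample_task`, its `n_shot`/`n_query` parameters, and the toy pool of scalar "images" are hypothetical, not from the paper. With `n_shot=1`, the support set has $L=K$ examples, one per class.

```python
import random
from collections import defaultdict

def sample_task(pool, n_way, n_shot, n_query, seed=None):
    """Hypothetical task sampler over a list of (x, y) pairs.

    Returns (support, query): support has n_way * n_shot pairs and
    query has n_way * n_query pairs. Labels are remapped to 0..n_way-1.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in pool:
        by_class[y].append(x)
    # Only classes with enough examples for both sets can be drawn.
    eligible = [c for c, xs in by_class.items() if len(xs) >= n_shot + n_query]
    classes = rng.sample(sorted(eligible), n_way)
    support, query = [], []
    for new_label, c in enumerate(classes):
        xs = rng.sample(by_class[c], n_shot + n_query)
        support += [(x, new_label) for x in xs[:n_shot]]
        query += [(x, new_label) for x in xs[n_shot:]]
    return support, query

# Toy pool: 5 classes with 10 scalar "images" each.
pool = [(10 * c + i, c) for c in range(5) for i in range(10)]
S, Q = sample_task(pool, n_way=3, n_shot=1, n_query=4, seed=0)
print(len(S), len(Q))  # 3 12
```

Support and query examples of a task never overlap, which is why both are drawn from a single `rng.sample` call per class.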

2.2. HyperShot-overview

[Main idea]

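Information is extracted from the support set with a parameterized kernel function.
- As a result, the model is robust to embeddings of new tasks that differ significantly from the training tasks.
- Query images are also classified using kernel values computed from the query image's perspective.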

[Process]

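[1] First, the support examples are passed through the encoder $E$, and the resulting embeddings are ordered so that examples of the same class are grouped together.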

$$
\mathcal{S}=\{(x_l,y_l)\}_{l=1}^{L}
$$

$$
E(x_l)=z_l
$$

$$
Z_S=[z_{\pi(1)},z_{\pi(2)},\dots,z_{\pi(K)}]^T
$$

Here $\pi(\cdot)$ is a bijective function (a one-to-one mapping) that fixes the order of the support embeddings.
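A minimal sketch of step [1] for the one-shot case, assuming that sorting by class label plays the role of $\pi$. `toy_encoder` is a hypothetical fixed linear map standing in for the backbone $E$; the real encoder is a trained network.

```python
def toy_encoder(x):
    """Hypothetical stand-in for the backbone E: 2-D input -> 3-D embedding."""
    w = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def ordered_support_embeddings(support, encoder):
    """Build Z_S: encode every support example and order the rows by class.

    Sorting indices by label acts as the bijection pi, so row i of Z_S
    holds the class-i embedding regardless of the sampling order.
    """
    pi = sorted(range(len(support)), key=lambda l: support[l][1])
    Z_S = [encoder(support[l][0]) for l in pi]
    labels = [support[l][1] for l in pi]
    return Z_S, labels

support = [([0.0, 2.0], 2), ([1.0, 0.0], 0), ([0.0, 1.0], 1)]
Z_S, labels = ordered_support_embeddings(support, toy_encoder)
print(labels)  # [0, 1, 2]
print(Z_S[0])  # [1.0, 0.0, 0.5]
```

The fixed ordering matters because the hypernetwork receives a flattened matrix: without it, the same support set shuffled differently would produce a different input vector.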

[2] Next, the kernel matrix $K_{S,S}$ is computed over the pairs of vectors stored in the rows of $Z_S$.

To do this, a parameterized kernel function $k(\cdot,\cdot)$ is used, and each element $k_{i,j}$ of $K_{S,S}$ is computed as

$$
k_{i,j}=k(z_{\pi(i)},z_{\pi(j)})
$$

As a result, $K_{S,S}$ encodes the relations between the support examples.
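A minimal sketch of step [2], assuming a plain dot-product kernel (identity transformation) over already-ordered embeddings. The $K \times K$ result depends only on pairwise similarities, not on the raw feature values.

```python
def dot_kernel(a, b):
    """k(a, b) = a^T b, i.e. a dot product with no learned transformation."""
    return sum(ai * bi for ai, bi in zip(a, b))

def kernel_matrix(Z_S, k=dot_kernel):
    """K_{S,S}[i][j] = k(z_pi(i), z_pi(j)) over the ordered rows of Z_S."""
    return [[k(zi, zj) for zj in Z_S] for zi in Z_S]

Z_S = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
K_SS = kernel_matrix(Z_S)
print(K_SS)  # [[1.0, 0.0, 1.0], [0.0, 4.0, 2.0], [1.0, 2.0, 2.0]]
```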

[3] $K_{S,S}$ is then flattened into a vector and used as the input of the hypernetwork $H(\cdot)$.

The role of the hypernetwork is to provide the parameters $\theta_T$ of the target model $T(\cdot)$, which classifies query objects.
Because of the hypernetwork $H(\cdot)$, the model can switch parameters between tasks without any gradient updates.
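A minimal sketch of step [3], under stated assumptions: $H$ is a hypothetical untrained two-layer MLP with random weights, and the target model is assumed to be a single linear layer, so $\theta_T$ is a $K \times K$ weight matrix plus $K$ biases. The hidden size and initialization are illustrative only.

```python
import random

def make_hypernetwork(n_classes, hidden=16, seed=0):
    """Hypothetical untrained MLP H: flattened K_{S,S} -> theta_T.

    theta_T = (weights, biases) of a linear target model that maps a
    kernel vector k_{x,S} (length K) to K class logits.
    """
    rng = random.Random(seed)
    d_in = n_classes * n_classes
    d_out = n_classes * n_classes + n_classes
    W1 = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(hidden)]
    W2 = [[rng.gauss(0, 0.1) for _ in range(hidden)] for _ in range(d_out)]

    def H(K_SS):
        v = [x for row in K_SS for x in row]  # flatten K_{S,S} into a vector
        h = [max(0.0, sum(w * x for w, x in zip(r, v))) for r in W1]  # ReLU layer
        out = [sum(w * x for w, x in zip(r, h)) for r in W2]
        K = n_classes
        weights = [out[i * K:(i + 1) * K] for i in range(K)]
        biases = out[K * K:]
        return weights, biases

    return H

H = make_hypernetwork(n_classes=3)
W_T, b_T = H([[1.0, 0.0, 1.0], [0.0, 4.0, 2.0], [1.0, 2.0, 2.0]])
print(len(W_T), len(W_T[0]), len(b_T))  # 3 3 3
```

Switching to a new task only requires a new forward pass of `H` on that task's kernel matrix; no gradient step on the target model is needed.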

[4] A query image $x$ is classified as follows.

The encoder $E$ maps the input image to a low-dimensional feature representation $z_x=E(x)$.
Then the kernel vector $k_{x,S}$ between the query embedding and the ordered support embeddings $Z_S$ is computed:

$$
k_{x,S}=[k(z_x,z_{\pi(1)}),k(z_x,z_{\pi(2)}),\dots,k(z_x,z_{\pi(K)})]
$$

$k_{x,S}$ is fed to the target model $T(\cdot)$.
- The target model uses the parameters $\theta_T$ generated by the hypernetwork $H(\cdot)$.
The target model returns a probability distribution over the classes, $p(y|\mathcal{S},x)$.
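A minimal sketch of step [4], assuming a dot-product kernel and a linear target model followed by softmax. In HyperShot, $\theta_T$ would come from the hypernetwork; here it is hand-picked (identity weights, zero bias) purely so that the result is easy to check.

```python
import math

def dot_kernel(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def classify_query(z_x, Z_S, theta_T, k=dot_kernel):
    """Return p(y | S, x) for one query embedding z_x.

    theta_T = (W, b) parameterizes a linear target model (an assumption
    for this sketch) applied to the kernel vector k_{x,S}.
    """
    k_xS = [k(z_x, z) for z in Z_S]  # kernel vector k_{x,S}
    W, b = theta_T
    logits = [sum(w * v for w, v in zip(row, k_xS)) + bi for row, bi in zip(W, b)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

Z_S = [[1.0, 0.0], [0.0, 1.0]]
theta_T = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
p = classify_query([2.0, 0.0], Z_S, theta_T)
print(max(range(2), key=lambda c: p[c]))  # 0
```

The query is compared only through its kernel values with the support embeddings, so the target model never sees $z_x$ directly.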

2.3. Kernel function

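The key component of the paper is the kernel function $k(\cdot,\cdot)$.
The kernel compares two embeddings with a dot product after a transformation $f(\cdot)$, i.e. $k(z_1,z_2)=f(z_1)^T f(z_2)$, where $f$ can be a parameterized transformation represented by an MLP or simply the identity operation ($f(z)=z$).
This paper uses the dot product.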
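A minimal sketch contrasting the two choices of $f$, assuming the form $k(z_1,z_2)=f(z_1)^T f(z_2)$. `make_transform` is a hypothetical one-layer tanh network standing in for the MLP (its weights would be part of $\theta_k$); it is randomly initialized, not trained.

```python
import math
import random

def make_transform(d_in, d_out, seed=0):
    """Hypothetical one-layer stand-in for the MLP f (weights ~ theta_k)."""
    rng = random.Random(seed)
    W = [[rng.gauss(0, 1.0) for _ in range(d_in)] for _ in range(d_out)]
    return lambda z: [math.tanh(sum(w * x for w, x in zip(row, z))) for row in W]

def identity(z):
    return list(z)

def make_kernel(f):
    """k(z1, z2) = f(z1)^T f(z2): a dot product of transformed embeddings."""
    def k(z1, z2):
        a, b = f(z1), f(z2)
        return sum(x * y for x, y in zip(a, b))
    return k

k_plain = make_kernel(identity)
k_mlp = make_kernel(make_transform(d_in=2, d_out=4))
z1, z2 = [1.0, 2.0], [3.0, -1.0]
print(k_plain(z1, z2))  # 1.0
print(k_mlp(z1, z2) == k_mlp(z2, z1))  # True
```

Both variants stay symmetric because $f$ is applied to each argument separately before the dot product.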

2.4. Training and prediction

$E := E_{\theta_E}$
- $\theta_E$: parameters of the encoder
$H := H_{\theta_H}$
- $\theta_H$: parameters of the hypernetwork
The kernel function $k$ is parameterized by $\theta_k$.

During the training stage,
- $\theta_H$, $\theta_k$, and $\theta_E$ are optimized jointly with the cross-entropy loss $\mathcal{L}$.
During the inference stage,
- a task $\mathcal{T}_*$ consists of labeled support examples $\mathcal{S}_*$ and unlabeled query examples $\mathcal{X}_*$
- the model computes the probability $p(y|\mathcal{S}_*,x)$ for each $x \in \mathcal{X}_*$
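A minimal sketch of the training objective for one task: the mean cross-entropy of the predicted distributions $p(y|\mathcal{S},x_m)$ over the query set. Backpropagating this value into $\theta_H$, $\theta_k$, and $\theta_E$ would require an autodiff framework and is not shown; the probabilities below are made-up inputs.

```python
import math

def task_cross_entropy(probs, labels):
    """Mean cross-entropy over a task's query set.

    probs[m] is the predicted distribution p(y | S, x_m) and labels[m]
    is the true class y_m of the m-th query example.
    """
    eps = 1e-12  # guard against log(0)
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(labels)

probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
labels = [0, 1]
print(round(task_cross_entropy(probs, labels), 4))  # 0.2899
```

At inference time the same distributions are reused, and the predicted class for each $x \in \mathcal{X}_*$ is simply the argmax.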

2.5. Adaptation to few-shot scenarios

The authors observed that slightly fine-tuning $\theta_H,\theta_k,\theta_E$ during task adaptation improves performance.
To do so, they build an adaptation task $\mathcal{T}_i=\{\mathcal{S}_*,\mathcal{S}_*\}$ from the support set examples (the support set serves as both the support and the query set) and tune $\theta_H,\theta_k,\theta_E$ on it.
Here, instead of the ordering function $\pi(\cdot)$, an aggregation function is applied to the support embeddings $z$ of each class, following "A closer look at few-shot video classification".
Thanks to the aggregation function, the kernel matrix is computed from aggregated values in the latent space of the encoding network.
Two aggregation functions were tried, fine-grained and averaged, and averaging performed better.
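A minimal sketch of the averaged aggregation, assuming each class's support embeddings are replaced by their mean before the kernel matrix is computed. This keeps $K_{S,S}$ at $K \times K$ even when every class has several shots. The function name and toy data are hypothetical.

```python
from collections import defaultdict

def aggregate_by_class(embeddings, labels):
    """Average the support embeddings of each class ('averaged' aggregation).

    Returns one row per class, ordered by label, so the kernel matrix
    built from the result is K x K regardless of the number of shots.
    """
    groups = defaultdict(list)
    for z, y in zip(embeddings, labels):
        groups[y].append(z)
    rows = []
    for y in sorted(groups):
        zs = groups[y]
        rows.append([sum(col) / len(zs) for col in zip(*zs)])
    return rows

# 2-way 2-shot toy support set of 2-D embeddings.
Z = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0], [0.0, 4.0]]
y = [0, 1, 0, 1]
print(aggregate_by_class(Z, y))  # [[2.0, 0.0], [0.0, 3.0]]
```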

4. Experiments

Reference

https://arxiv.org/abs/2203.11378

