-Day 29- Understanding Deep Learning Layers 2

Embedding, Recurrent

Sparse Representation

A scheme that maps a word or a meaning directly onto one specific dimension of a vector.

Distributed Representation of Words

A representation that stems from the distributional hypothesis: words that appear in similar contexts tend to have similar meanings.

With a distributed representation, unlike a sparse one, the similarity between words can be computed numerically.
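The difference can be sketched with cosine similarity. This is a minimal illustration in plain Python; the 2-dimensional dense vectors are hypothetical values picked by hand, not learned embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Sparse (one-hot) vectors: every word owns exactly one dimension.
sparse = {
    "coffee": [1.0, 0.0, 0.0],
    "cake":   [0.0, 1.0, 0.0],
    "dog":    [0.0, 0.0, 1.0],
}

# Distributed vectors: hypothetical hand-picked values for illustration only.
dense = {
    "coffee": [0.9, 0.1],
    "cake":   [0.8, 0.3],
    "dog":    [-0.7, 0.6],
}

# Any two different one-hot vectors are orthogonal, so this is always 0.
sparse_sim = cosine_similarity(sparse["coffee"], sparse["cake"])

# Dense vectors can express "close" and "far" as different similarity values.
dense_sim_close = cosine_similarity(dense["coffee"], dense["cake"])
dense_sim_far = cosine_similarity(dense["coffee"], dense["dog"])

print(sparse_sim, dense_sim_close, dense_sim_far)
```

With one-hot vectors every pair of distinct words gets the same similarity, so the number carries no information; with dense vectors the value depends on where the words sit in the space.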

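The Embedding layer implements the distributed representation of words: it builds a dictionary that represents n words in k dimensions, and that dictionary is the layer's weight, that is, its trainable parameters.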

Embedding Layer

The first step toward understanding the Embedding layer: one-hot encoding

Reference: Kakao Brain (카카오브레인), www.kakaobrain.com

Learning by example

import tensorflow as tf

vocab = {      # Define the vocabulary to use
    "i": 0,
    "need": 1,
    "some": 2,
    "more": 3,
    "coffee": 4,
    "cake": 5,
    "cat": 6,
    "dog": 7
}

sentence = "i i i i need some more coffee coffee coffee"
# Convert the sentence above into a list of word indices
_input = [vocab[w] for w in sentence.split()]  # [0, 0, 0, 0, 1, 2, 3, 4, 4, 4]

vocab_size = len(vocab)   # 8

one_hot = tf.one_hot(_input, vocab_size)
print(one_hot.numpy())    # Print the one-hot encoded vectors.

distribution_size = 2   # Use a 2-dimensional distributed representation so it is easy to read
linear = tf.keras.layers.Dense(units=distribution_size, use_bias=False)
one_hot_linear = linear(one_hot)

print("Linear Weight")
print(linear.weights[0].numpy())

print("\nOne-Hot Linear Result")
print(one_hot_linear.numpy())

# result
Linear Weight
[[-0.601049    0.5825348 ]
 [-0.7512912  -0.57942784]
 [ 0.76143837 -0.01176971]
 [-0.6355816  -0.2171467 ]
 [-0.46792722 -0.68049556]
 [ 0.31385565  0.56059027]
 [ 0.17235708  0.3716153 ]
 [ 0.60743225  0.16453111]]

One-Hot Linear Result
[[-0.601049    0.5825348 ]
 [-0.601049    0.5825348 ]
 [-0.601049    0.5825348 ]
 [-0.601049    0.5825348 ]
 [-0.7512912  -0.57942784]
 [ 0.76143837 -0.01176971]
 [-0.6355816  -0.2171467 ]
 [-0.46792722 -0.68049556]
 [-0.46792722 -0.68049556]
 [-0.46792722 -0.68049556]]

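The example above converts word indices into one-hot vectors and passes them through a linear layer. Each one-hot vector simply selects the matching row of the weight matrix, which is the same thing an Embedding layer does as a lookup.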
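The equivalence between "one-hot times weight" and "row lookup" can be checked without TensorFlow. The following is a minimal sketch in plain Python; the weight values are hypothetical and the helper names `one_hot` and `vector_times_matrix` are made up for this illustration.

```python
def one_hot(index, size):
    """Return a list of length `size` with 1.0 at `index` and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def vector_times_matrix(vector, matrix):
    """Multiply a row vector by a matrix given as a list of rows."""
    n_cols = len(matrix[0])
    return [
        sum(vector[r] * matrix[r][c] for r in range(len(matrix)))
        for c in range(n_cols)
    ]

# Hypothetical weight: 4 words, each represented in 2 dimensions.
weight = [
    [-0.60, 0.58],
    [-0.75, -0.58],
    [0.76, -0.01],
    [-0.64, -0.22],
]

indices = [0, 0, 1, 3]

# Path 1: one-hot encode, then multiply by the weight (the Dense approach).
via_matmul = [vector_times_matrix(one_hot(i, len(weight)), weight) for i in indices]

# Path 2: use the index to pick a row directly (the Embedding approach).
via_lookup = [weight[i] for i in indices]

print(via_matmul == via_lookup)
```

The lookup gives the same vectors while skipping the multiplications by zero, which is why an embedding is usually implemented as an indexed table rather than a matrix product.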

some_words = tf.constant([[3, 57, 35]])
# One sentence made up of word 3, word 57, and word 35.

print("Sentence to embed:", some_words.shape)
embedding_layer = tf.keras.layers.Embedding(input_dim=64, output_dim=100)
# Declare an Embedding layer that covers 64 words in total,
# with each word represented as a 100-dimensional distributed vector.

print("Embedded sentence:", embedding_layer(some_words).shape)
print("Embedding layer weight shape:", embedding_layer.weights[0].shape)

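The Embedding layer only looks up the vector that corresponds to each word index, so it is not differentiable with respect to its input. Gradients cannot flow to anything placed before it, although its own weight is still trained.

For this reason the Embedding layer must be connected directly to the input. The Keras layer takes integer word indices, which are equivalent to one-hot encoded word vectors.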

Recurrent Layer: RNN

A network for processing sequential data; its defining trait is that the same computation is repeated step by step.

Reference: Illustrated Guide to Recurrent Neural Networks: Understanding the Intuition (towardsdatascience.com)

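An RNN reuses the same weights at every time step: an input weight of shape (input dimension, output dimension), together with a recurrent weight that carries the hidden state forward. The inputs are processed one after another rather than all at once.

Because even a single sentence requires many sequential operations, an RNN is slower than other layers.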
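The step-by-step reuse of one set of weights can be sketched as follows. This is a minimal bias-free simple RNN in plain Python, assuming the common update h_t = tanh(x_t · W_x + h_{t-1} · W_h); the function name `rnn_forward` and all weight values are hypothetical.

```python
import math

def rnn_forward(inputs, w_x, w_h):
    """Bias-free simple RNN: h_t = tanh(x_t @ w_x + h_{t-1} @ w_h).

    inputs: list of time steps, each a list of input features
    w_x:    input weight, shape (input_dim, units)
    w_h:    recurrent weight, shape (units, units)
    Returns the hidden state at every time step.
    """
    units = len(w_h)
    h = [0.0] * units                 # initial hidden state
    outputs = []
    for x in inputs:                  # steps must run in order: step t needs h from step t-1
        h = [
            math.tanh(
                sum(x[i] * w_x[i][j] for i in range(len(x)))
                + sum(h[k] * w_h[k][j] for k in range(units))
            )
            for j in range(units)
        ]
        outputs.append(h)
    return outputs

# Hypothetical toy sizes: 3 time steps, input_dim=2, units=3.
inputs = [[1.0, 0.0], [0.5, -0.5], [0.0, 1.0]]
w_x = [[0.1, -0.2, 0.3],
       [0.4, 0.5, -0.6]]
w_h = [[0.1, 0.0, -0.1],
       [0.2, 0.1, 0.0],
       [-0.3, 0.2, 0.1]]

outputs = rnn_forward(inputs, w_x, w_h)
print(len(outputs), len(outputs[0]))   # one hidden state of size `units` per time step
```

The loop cannot be parallelized across time steps because each hidden state depends on the previous one, which is the source of the slowness mentioned above.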

[Source: Illustrated Guide to Recurrent Neural Networks]

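The figure illustrates the vanishing gradient problem: the further we move toward the end of the sentence, the fainter the information from the early inputs becomes.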
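The fading can be made concrete with a scalar toy RNN. The sketch below assumes h_t = tanh(w · h_{t-1} + x) and multiplies the chain-rule factors that connect the last state to the first one; `w` and `x` are arbitrary values chosen for illustration.

```python
import math

def gradient_wrt_first_state(num_steps, w=0.5, x=0.1):
    """Return d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1} + x)."""
    h = 0.0
    grad = 1.0
    for _ in range(num_steps):
        h = math.tanh(w * h + x)
        # Chain rule: d h_t / d h_{t-1} = w * (1 - tanh^2(...)) = w * (1 - h_t^2)
        grad *= w * (1.0 - h * h)
    return grad

grad_5 = gradient_wrt_first_state(5)
grad_20 = gradient_wrt_first_state(20)

print(grad_5, grad_20)
```

Each step contributes a factor smaller than 1 in this setting, so the product shrinks roughly exponentially with the number of steps, and the earliest inputs barely influence the final output or its gradient.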

sentence = "What time is it ?"
dic = {
    "is": 0,
    "it": 1,
    "What": 2,
    "time": 3,
    "?": 4
}

print("Sentence fed to the RNN:", sentence)

sentence_tensor = tf.constant([[dic[word] for word in sentence.split()]])

print("Word indices for the Embedding:", sentence_tensor.numpy())
print("Input sentence shape:", sentence_tensor.shape)

embedding_layer = tf.keras.layers.Embedding(input_dim=len(dic), output_dim=100)
emb_out = embedding_layer(sentence_tensor)

print("\nEmbedding output:", emb_out.shape)
print("Embedding layer weight shape:", embedding_layer.weights[0].shape)

# return_sequences=True returns the output of every time step.
rnn_seq_layer = tf.keras.layers.SimpleRNN(units=64, return_sequences=True, use_bias=False)
rnn_seq_out = rnn_seq_layer(emb_out)

print("\nRNN output (all step outputs):", rnn_seq_out.shape)
print("RNN layer weight shape:", rnn_seq_layer.weights[0].shape)

# Without return_sequences, only the output of the final time step is returned.
rnn_fin_layer = tf.keras.layers.SimpleRNN(units=64, use_bias=False)
rnn_fin_out = rnn_fin_layer(emb_out)

print("\nRNN output (final step output):", rnn_fin_out.shape)
print("RNN layer weight shape:", rnn_fin_layer.weights[0].shape)

#result
Sentence fed to the RNN: What time is it ?
Word indices for the Embedding: [[2 3 0 1 4]]
Input sentence shape: (1, 5)

Embedding output: (1, 5, 100)
Embedding layer weight shape: (5, 100)

RNN output (all step outputs): (1, 5, 64)
RNN layer weight shape: (100, 64)

RNN output (final step output): (1, 64)
RNN layer weight shape: (100, 64)

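Different tasks need different outputs. Classifying a sentence as positive or negative only needs the final step, whereas generating words needs the output at every step, so the return_sequences argument of the SimpleRNN layer has to be set to match the task.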

LSTM

An RNN layer designed to address the vanishing gradient problem.

Reference: Why is LSTM robust to gradient vanishing? (The reason why LSTM is strong on gradient vanishing), curt-park.github.io

lstm_seq_layer = tf.keras.layers.LSTM(units=64, return_sequences=True, use_bias=False)
lstm_seq_out = lstm_seq_layer(emb_out)

print("\nLSTM output (all step outputs):", lstm_seq_out.shape)
print("LSTM layer weight shape:", lstm_seq_layer.weights[0].shape)

lstm_fin_layer = tf.keras.layers.LSTM(units=64, use_bias=False)
lstm_fin_out = lstm_fin_layer(emb_out)

print("\nLSTM output (final step output):", lstm_fin_out.shape)
print("LSTM layer weight shape:", lstm_fin_layer.weights[0].shape)

#result
LSTM output (all step outputs): (1, 5, 64)
LSTM layer weight shape: (100, 256)
WARNING:tensorflow:Layer lstm_1 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.

LSTM output (final step output): (1, 64)
LSTM layer weight shape: (100, 256)

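An LSTM has four sets of weights, which is why the weight printed above has shape (100, 256) = (100, 4 × 64). They belong to structures called gates, which decide what information to remember and what to pass on to the next step.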

There is also the concept of a cell state: even when a long sentence comes in, old memories are kept in the cell state, and the gates add information to it or remove information from it.
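How the gates add to and remove from the cell state can be sketched with a scalar LSTM step. This is a minimal bias-free version of the commonly used formulation (input gate i, forget gate f, candidate g, output gate o), written in plain Python; the weight values and the names `lstm_step` and `run` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w):
    """One bias-free LSTM step with scalar input and scalar state.

    w maps each of the four weight sets ("i", "f", "g", "o")
    to a pair (input weight, recurrent weight).
    """
    def pre_activation(name):
        w_x, w_h = w[name]
        return w_x * x + w_h * h_prev

    i = sigmoid(pre_activation("i"))      # input gate: how much new information to add
    f = sigmoid(pre_activation("f"))      # forget gate: how much old memory to keep
    g = math.tanh(pre_activation("g"))    # candidate value to be written
    o = sigmoid(pre_activation("o"))      # output gate: how much of the memory to expose
    c = f * c_prev + i * g                # cell state: remove old, then add new
    h = o * math.tanh(c)                  # hidden state passed to the next step
    return h, c

def run(weights, steps=50, c0=0.8):
    """Feed the constant input 1.0 for `steps` steps and return the final cell state."""
    h, c = 0.0, c0
    for _ in range(steps):
        h, c = lstm_step(1.0, h, c, weights)
    return c

# Forget gate almost fully open (f close to 1), input gate almost closed (i close to 0).
keep = {"i": (-10.0, 0.0), "f": (10.0, 0.0), "g": (1.0, 0.0), "o": (1.0, 0.0)}
# Same weights, but the forget gate is almost closed (f close to 0).
erase = dict(keep, f=(-10.0, 0.0))

c_kept = run(keep)
c_erased = run(erase)
print(c_kept, c_erased)
```

With the forget gate open the initial memory survives 50 steps nearly unchanged, and with it closed the memory is wiped almost immediately, which is the add-or-remove behaviour described above.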

Reference: Understanding Long Short-Term Memory (LSTM), a Korean translation of the post Christopher Olah wrote in August 2015 (dgkim5360.tistory.com)

GRU

Reference: Gated Recurrent Units (GRU), https://yjjo.tistory.com/18

GRU is a kind of RNN that uses a gating mechanism. It was inspired by LSTM, has a simpler structure, and was proposed by Dr. Kyunghyun Cho.

A variant of LSTM that trains reasonably well even with a small amount of data; it has fewer weights than LSTM.
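The difference in weight count can be worked out by hand. The sketch below assumes the Keras weight layout, in which the input weight has shape (input_dim, blocks × units) and the recurrent weight has shape (units, blocks × units), with 1 block for SimpleRNN, 3 for GRU, and 4 for LSTM; biases are left out to match `use_bias=False` in the examples. The helper names are hypothetical.

```python
def recurrent_layer_shapes(input_dim, units, num_blocks):
    """Return (input weight shape, recurrent weight shape) for a bias-free layer."""
    kernel = (input_dim, num_blocks * units)
    recurrent_kernel = (units, num_blocks * units)
    return kernel, recurrent_kernel

def count_weights(shapes):
    """Total number of scalar weights across the given 2-D shapes."""
    return sum(rows * cols for rows, cols in shapes)

# Number of weight blocks per layer type (assumed Keras layout).
blocks = {"SimpleRNN": 1, "GRU": 3, "LSTM": 4}

input_dim, units = 100, 64   # same sizes as the examples in this post

shapes = {name: recurrent_layer_shapes(input_dim, units, n) for name, n in blocks.items()}
totals = {name: count_weights(s) for name, s in shapes.items()}

for name in blocks:
    print(name, shapes[name], totals[name])
```

The LSTM input weight comes out as (100, 256), matching the shape printed earlier, and the GRU needs three quarters of the LSTM's weights for the same number of units.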

Bidirectional RNN

The models so far only ran in the forward direction. A bidirectional model combines two RNNs that run in opposite directions.

import tensorflow as tf

sentence = "What time is it ?"
dic = {
    "is": 0,
    "it": 1,
    "What": 2,
    "time": 3,
    "?": 4
}

sentence_tensor = tf.constant([[dic[word] for word in sentence.split()]])

embedding_layer = tf.keras.layers.Embedding(input_dim=len(dic), output_dim=100)
emb_out = embedding_layer(sentence_tensor)

print("Input data shape:", emb_out.shape)

# return_sequences=True, so the output of every time step is returned.
bi_rnn = tf.keras.layers.Bidirectional(
    tf.keras.layers.SimpleRNN(units=64, use_bias=False, return_sequences=True)
)
bi_out = bi_rnn(emb_out)

print("Bidirectional RNN output (all step outputs):", bi_out.shape)

#result
Input data shape: (1, 5, 100)
Bidirectional RNN output (all step outputs): (1, 5, 128)

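Because weights are defined for both the forward and the backward direction, the layer has twice as many weights as a plain RNN, and the output dimension doubles from 64 to 128.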
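The doubling can be sketched with two scalar RNNs, one reading the sequence forward and one reading it backward, whose per-step outputs are concatenated. This is a minimal plain-Python illustration with hypothetical weights and function names, not the Keras implementation.

```python
import math

def run_rnn(inputs, w_x, w_h):
    """Scalar bias-free RNN; returns the hidden state at every step."""
    h = 0.0
    outputs = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h)
        outputs.append(h)
    return outputs

def bidirectional(inputs, forward_weights, backward_weights):
    """Run two independent RNNs in opposite directions and concatenate per step."""
    forward = run_rnn(inputs, *forward_weights)
    # The backward RNN reads the reversed sequence; its outputs are reversed
    # again so that position t lines up with position t of the forward pass.
    backward = run_rnn(inputs[::-1], *backward_weights)[::-1]
    return [[f, b] for f, b in zip(forward, backward)]

inputs = [0.2, -0.4, 0.6, 0.8, -1.0]   # hypothetical 5-step sequence
forward_weights = (0.5, 0.3)           # (w_x, w_h) for the forward RNN
backward_weights = (-0.7, 0.2)         # a separate (w_x, w_h) for the backward RNN

bi_out = bidirectional(inputs, forward_weights, backward_weights)
print(len(bi_out), len(bi_out[0]))     # 5 steps, 2 features per step (1 + 1)
```

Each direction keeps its own weights, so the weight count doubles, and each step's output holds one part that has seen the words before it and one part that has seen the words after it.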
