5. CNN-LSTM
In the previous chapter, Chapter 4, we used an LSTM to predict the number of COVID-19 confirmed cases in South Korea. The LSTM was first introduced by Hochreiter & Schmidhuber (1997) and has continued to develop through subsequent research.
In this chapter, we will experiment with another way to improve model performance. Performance can be improved by changing the batch size and number of epochs, curating the dataset, adjusting the dataset split ratios, changing the loss function, changing the model, and so on; in this exercise we will pursue the improvement by changing the model architecture. We will see whether a CNN-LSTM model can deliver better performance in predicting the number of COVID-19 confirmed cases in South Korea.
First, let's load the libraries needed for this chapter. We will use the basic torch, numpy, and pandas, tqdm for displaying progress, and the visualization libraries pylab and matplotlib, among others.
import torch
import os
import numpy as np
import pandas as pd
from tqdm import tqdm
import seaborn as sns
from pylab import rcParams
import matplotlib.pyplot as plt
from matplotlib import rc
from sklearn.preprocessing import MinMaxScaler
from pandas.plotting import register_matplotlib_converters
from torch import nn, optim
%matplotlib inline
%config InlineBackend.figure_format='retina'
sns.set(style='whitegrid', palette='muted', font_scale=1.2)
rcParams['figure.figsize'] = 14, 10
register_matplotlib_converters()
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
torch.manual_seed(RANDOM_SEED)
<torch._C.Generator at 0x7f782b5c3b88>
5.1 Dataset Download and Preprocessing
For the modeling exercise, we will load the daily COVID-19 confirmed cases data for South Korea, reusing the code introduced in section 2.1.
!git clone https://github.com/Pseudo-Lab/Tutorial-Book-Utils
!python Tutorial-Book-Utils/PL_data_loader.py --data COVIDTimeSeries
!unzip -q COVIDTimeSeries.zip
Cloning into 'Tutorial-Book-Utils'...
remote: Enumerating objects: 24, done.
remote: Counting objects: 100% (24/24), done.
remote: Compressing objects: 100% (20/20), done.
remote: Total 24 (delta 6), reused 14 (delta 3), pack-reused 0
Unpacking objects: 100% (24/24), done.
COVIDTimeSeries.zip is done!
We load the COVID-19 confirmed cases data with the pandas library and then carry out the data preprocessing practiced in Chapter 3. The dataset covers the period from January 22, 2020 to December 18, 2020.
confirmed = pd.read_csv('time_series_covid19_confirmed_global.csv')
confirmed[confirmed['Country/Region']=='Korea, South']
korea = confirmed[confirmed['Country/Region']=='Korea, South'].iloc[:,4:].T # drop the metadata columns and transpose so dates become the index
korea.index = pd.to_datetime(korea.index)
daily_cases = korea.diff().fillna(korea.iloc[0]).astype('int') # cumulative totals -> daily new cases
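As a quick illustration of this diff-based conversion with made-up numbers: cumulative totals of [1, 4, 4, 10] become daily counts of [1, 3, 0, 6], with the first day's count filled back in by fillna.
demo = pd.DataFrame([1, 4, 4, 10])  # hypothetical cumulative totals
print(demo.diff().fillna(demo.iloc[0]).astype('int').values.ravel())  # [1 3 0 6]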
def create_sequences(data, seq_length):
    # slide a window of length seq_length over the series:
    # each window is one input x and the value right after it is the target y
    xs = []
    ys = []
    for i in range(len(data)-seq_length):
        x = data.iloc[i:(i+seq_length)]
        y = data.iloc[i+seq_length]
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)
seq_length = 5
X, y = create_sequences(daily_cases, seq_length)
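Before splitting the data, let's check the shapes create_sequences produced. The dataset spans 332 days (2020-01-22 through 2020-12-18), so with a seq_length of 5 we expect 327 sequences; this is where the 327 in the split below comes from.
print(X.shape, y.shape)  # expected: (327, 5, 1) (327, 1)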
# split into training, validation, and test sets
train_size = int(327 * 0.8)
X_train, y_train = X[:train_size], y[:train_size]
X_val, y_val = X[train_size:train_size+33], y[train_size:train_size+33]
X_test, y_test = X[train_size+33:], y[train_size+33:]
MIN = X_train.min()
MAX = X_train.max()
def MinMaxScale(array, min, max):
    return (array - min) / (max - min)
# MinMax scaling (the inverse is x*(MAX-MIN)+MIN; MIN is 0 for this data, so section 5.5 simply multiplies by MAX)
X_train = MinMaxScale(X_train, MIN, MAX)
y_train = MinMaxScale(y_train, MIN, MAX)
X_val = MinMaxScale(X_val, MIN, MAX)
y_val = MinMaxScale(y_val, MIN, MAX)
X_test = MinMaxScale(X_test, MIN, MAX)
y_test = MinMaxScale(y_test, MIN, MAX)
# convert to Tensor form
def make_Tensor(array):
    return torch.from_numpy(array).float()
X_train = make_Tensor(X_train)
y_train = make_Tensor(y_train)
X_val = make_Tensor(X_val)
y_val = make_Tensor(y_val)
X_test = make_Tensor(X_test)
y_test = make_Tensor(y_test)
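As a quick sanity check on the split sizes (80% of 327 gives 261 training sequences, leaving 33 each for validation and test):
print(X_train.shape, X_val.shape, X_test.shape)
# expected: torch.Size([261, 5, 1]) torch.Size([33, 5, 1]) torch.Size([33, 5, 1])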
plt.plot(daily_cases.values)
[<matplotlib.lines.Line2D at 0x7f77cc4d1438>]
5.2 Defining the CNN-LSTM Model
5.2.1 1D CNN (1-Dimensional Convolutional Neural Network) / Conv1d
In Chapter 4 we predicted the number of confirmed cases with an LSTM model. In this chapter we add a CNN layer to the LSTM and carry out the prediction.
CNN models come in 1D, 2D, and 3D variants; what is usually called a CNN is the 2D variant commonly used for image classification. Here D stands for dimensional, and the 1D, 2D, or 3D form is chosen according to the shape of the input data.
Figure 5-1: Diagram of time series data (Source: Understanding 1D and 3D Convolution Neural Network | Keras)
Figure 5-1 visualizes how the kernel of a 1D CNN moves along one dimension: the kernel slides to the right as time progresses. This makes 1D CNNs well suited to time series data, as they can extract local features from neighboring values.
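Concretely, given an input sequence \(x_1, \dots, x_n\), a kernel of size \(k\) with weights \(w_1, \dots, w_k\), and a bias \(b\), a stride-1 1D convolution computes
\[y_i = \sum_{j=1}^{k} w_j \, x_{i+j-1} + b, \qquad i = 1, \dots, n-k+1\]
so an input of length \(n\) yields \(n-k+1\) outputs; the code below confirms this with \(n=5\) and \(k=2\).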
5.2.2 Testing a 1D CNN
Figures 5-2 & 5-3: 1D CNN visualization
Figures 5-2 and 5-3 visualize the structure of a 1-dimensional CNN. As the progression from Figure 5-2 to Figure 5-3 shows, with a stride of 1 the kernel moves one position at a time. Let's now look at a 1D CNN through a brief piece of code.
First we define a 1D CNN layer and store it in c. As in Figures 5-2 & 5-3, we set in_channels to 1, out_channels to 1, kernel_size to 2, and stride to 1. We then define an input variable to use as the input and pass it through c to produce the output values.
c = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=2, stride=1)
input = torch.Tensor([[[1,2,3,4,5]]]) # shape (batch=1, channels=1, length=5)
output = c(input)
output
tensor([[[-0.3875, -0.8842, -1.3808, -1.8774]]], grad_fn=<SqueezeBackward1>)
The 5 input elements passed through the 1D CNN with a kernel_size of 2 and produced 4 output values. Let's look at how those values are computed. First, we inspect the weight and bias values stored in c.
for param in c.parameters():
    print(param)
Parameter containing:
tensor([[[-0.1021, -0.3946]]], requires_grad=True)
Parameter containing:
tensor([0.5037], requires_grad=True)
The first value printed is the weight. Since kernel_size is 2, there are 2 weight values. The value printed next is the bias; a 1D CNN layer has one bias value. Now let's store these values in the variables w1, w2, and b.
w_list = []
for param in c.parameters():
    w_list.append(param)
w = w_list[0]
b = w_list[1]
w1 = w[0][0][0]
w2 = w[0][0][1]
print(w1)
print(w2)
print(b)
tensor(-0.1021, grad_fn=<SelectBackward>)
tensor(-0.3946, grad_fn=<SelectBackward>)
Parameter containing:
tensor([0.5037], requires_grad=True)
Through indexing, we stored the weight values in the variables w1, w2, and b. By applying the arithmetic used to compute \(y_1\) and \(y_2\) in Figures 5-2 and 5-3, we can reproduce the output values obtained from the 1D CNN. Calculating the value produced as the 1D CNN filter passes over 3 and 4 gives the following.
w1 * 3 + w2 * 4 + b
tensor([-1.3808], grad_fn=<AddBackward0>)
This matches the 3rd value of output, and we can see that the remaining values are computed in the same way.
output
tensor([[[-0.3875, -0.8842, -1.3808, -1.8774]]], grad_fn=<SqueezeBackward1>)
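To confirm that the remaining values follow the same pattern, we can sweep the filter by hand with the w1, w2, and b extracted above and compare the result against output (a quick sanity check):
x = [1, 2, 3, 4, 5]
manual = [w1 * x[i] + w2 * x[i+1] + b for i in range(len(x) - 1)]  # stride-1 sweep
print(torch.stack(manual).flatten())  # matches output: [-0.3875, -0.8842, -1.3808, -1.8774]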
5.3 Building the CNN-LSTM Model
Now let's build the CNN-LSTM model. The biggest difference from the LSTM model built in Chapter 4 is the addition of a 1D CNN layer. Looking at the code below, you can see that a 1D CNN layer has been added inside the CovidPredictor class via nn.Conv1d.
class CovidPredictor(nn.Module):
    def __init__(self, n_features, n_hidden, seq_len, n_layers):
        super(CovidPredictor, self).__init__()
        self.n_hidden = n_hidden
        self.seq_len = seq_len
        self.n_layers = n_layers
        self.c1 = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=2, stride=1) # added 1D CNN layer
        self.lstm = nn.LSTM(
            input_size=n_features,
            hidden_size=n_hidden,
            num_layers=n_layers
        )
        self.linear = nn.Linear(in_features=n_hidden, out_features=1)
    def reset_hidden_state(self):
        # the kernel_size=2 convolution shortens each sequence by one step, hence seq_len-1
        self.hidden = (
            torch.zeros(self.n_layers, self.seq_len-1, self.n_hidden),
            torch.zeros(self.n_layers, self.seq_len-1, self.n_hidden)
        )
    def forward(self, sequences):
        sequences = self.c1(sequences.view(len(sequences), 1, -1))
        lstm_out, self.hidden = self.lstm(
            sequences.view(len(sequences), self.seq_len-1, -1),
            self.hidden
        )
        last_time_step = lstm_out.view(self.seq_len-1, len(sequences), self.n_hidden)[-1]
        y_pred = self.linear(last_time_step)
        return y_pred
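Before moving on, a quick shape trace with a dummy input (a sketch; toy and dummy are illustrative names) shows why reset_hidden_state uses seq_len-1: the Conv1d with a kernel_size of 2 shortens each length-5 window to 4 steps before it reaches the LSTM.
toy = CovidPredictor(n_features=1, n_hidden=4, seq_len=5, n_layers=1)
toy.reset_hidden_state()
dummy = torch.randn(1, 5, 1)               # one sequence of 5 daily values
print(toy.c1(dummy.view(1, 1, -1)).shape)  # torch.Size([1, 1, 4]) after the conv
print(toy(dummy).shape)                    # torch.Size([1, 1]): one predicted value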
5.4 Model Training
We will train the model using the train_model function built in Chapter 4. We chose Adam as the optimizer and set the learning rate to 0.001. As the loss function we chose the MAE (Mean Absolute Error).
def train_model(model, train_data, train_labels, val_data=None, val_labels=None, num_epochs=100, verbose=10, patience=10):
    loss_fn = torch.nn.L1Loss() # MAE loss
    optimiser = torch.optim.Adam(model.parameters(), lr=0.001)
    train_hist = []
    val_hist = []
    for t in range(num_epochs):
        epoch_loss = 0
        for idx, seq in enumerate(train_data): # the hidden state must be reset for every sample
            model.reset_hidden_state()
            # train loss
            seq = torch.unsqueeze(seq, 0)
            y_pred = model(seq)
            loss = loss_fn(y_pred[0].float(), train_labels[idx]) # loss for one step
            # update weights
            optimiser.zero_grad()
            loss.backward()
            optimiser.step()
            epoch_loss += loss.item()
        train_hist.append(epoch_loss / len(train_data))
        if val_data is not None:
            with torch.no_grad():
                val_loss = 0
                for val_idx, val_seq in enumerate(val_data):
                    model.reset_hidden_state() # reset the hidden state per sequence
                    val_seq = torch.unsqueeze(val_seq, 0)
                    y_val_pred = model(val_seq)
                    val_step_loss = loss_fn(y_val_pred[0].float(), val_labels[val_idx])
                    val_loss += val_step_loss.item()
            val_hist.append(val_loss / len(val_data)) # append to val_hist
            ## print the loss every verbose epochs
            if t % verbose == 0:
                print(f'Epoch {t} train loss: {epoch_loss / len(train_data)} val loss: {val_loss / len(val_data)}')
            ## check for early stopping every patience epochs
            if (t % patience == 0) and (t != 0):
                ## early stop if the validation loss has grown
                if val_hist[t - patience] < val_hist[t]:
                    print('\n Early Stopping')
                    break
        elif t % verbose == 0:
            print(f'Epoch {t} train loss: {epoch_loss / len(train_data)}')
    return model, train_hist, val_hist
model = CovidPredictor(
n_features=1,
n_hidden=4,
seq_len=seq_length,
n_layers=1
)
Let's take a brief look at the prediction model.
print(model)
CovidPredictor(
(c1): Conv1d(1, 1, kernel_size=(2,), stride=(1,))
(lstm): LSTM(1, 4)
(linear): Linear(in_features=4, out_features=1, bias=True)
)
Now let's train the model.
model, train_hist, val_hist = train_model(
model,
X_train,
y_train,
X_val,
y_val,
num_epochs=100,
verbose=10,
patience=50
)
Epoch 0 train loss: 0.08868540743530025 val loss: 0.04381682723760605
Epoch 10 train loss: 0.03551809817384857 val loss: 0.033296383917331696
Epoch 20 train loss: 0.033714159246412786 val loss: 0.033151865005493164
Epoch 30 train loss: 0.03314930358741047 val loss: 0.03351602330803871
Epoch 40 train loss: 0.03311298256454511 val loss: 0.03455767780542374
Epoch 50 train loss: 0.033384358255242594 val loss: 0.03596664220094681
Epoch 60 train loss: 0.03306851693218524 val loss: 0.035104189068078995
Epoch 70 train loss: 0.03264325369823853 val loss: 0.03546909987926483
Epoch 80 train loss: 0.03269847107237612 val loss: 0.035008616745471954
Epoch 90 train loss: 0.033151885962927306 val loss: 0.034998856484889984
Let's visualize the training loss and the validation loss.
plt.plot(train_hist, label="Training loss")
plt.plot(val_hist, label="Val loss")
plt.legend()
<matplotlib.legend.Legend at 0x7f77c2ac9fd0>
We can see that both loss values converge.
5.5 Predicting the Number of Confirmed Cases
Now that model training is complete, let's predict the number of confirmed cases. When predicting, the hidden_state must be reset each time a new sequence comes in, so that the hidden_state from the previous sequence does not carry over. We use the torch.unsqueeze function to expand the dimensions of the input data into the 3-dimensional shape the model expects. We then extract the scalar value contained in each prediction and append it to the preds list.
pred_dataset = X_test
with torch.no_grad():
    preds = []
    for i in range(len(pred_dataset)):
        model.reset_hidden_state() # reset the hidden state for every new sequence
        y_test_pred = model(torch.unsqueeze(pred_dataset[i], 0))
        pred = torch.flatten(y_test_pred).item() # extract the scalar prediction
        preds.append(pred)
plt.plot(np.array(y_test)*MAX, label = 'True') # MIN is 0 here, so multiplying by MAX undoes the MinMax scaling
plt.plot(np.array(preds)*MAX, label = 'Pred')
plt.legend()
<matplotlib.legend.Legend at 0x7f77c29aafd0>
def MAE(true, pred):
    return np.mean(np.abs(true-pred))
MAE(np.array(y_test)*MAX, np.array(preds)*MAX)
247.63305325632362
The MAE of the LSTM-only model was about 250, so there is no large performance difference on the COVID-19 confirmed cases data. This can be attributed to the losses of both the LSTM and the CNN-LSTM converging to similar values, and also to the input data being too simple relative to the model architecture.
This concludes our exercise in predicting COVID-19 confirmed cases with the South Korea dataset and a CNN-LSTM model. In this tutorial we went from dataset exploration and preprocessing, to training an LSTM model and making predictions with it, and then one step further to a CNN-LSTM model.
Time series prediction loses accuracy when data is scarce. In this tutorial we trained the deep learning models using only the confirmed case counts; we encourage you to train deep learning models on a variety of other datasets as well.