5. Faster R-CNN


In Chapter 4, we built a medical mask detection model using RetinaNet, a one-stage detector. In this chapter, we will perform object detection with Faster R-CNN, a two-stage detector.

In sections 5.1 through 5.3, building on Chapters 2 and 3, we will load the data, split it into training and test sets, and define a dataset class. In section 5.4 we will load a pre-trained model through the torchvision API. In section 5.5 we will train the model with transfer learning, and in section 5.6 we will generate predictions and evaluate the model's performance.

Before running the experiment, note that Google Colab assigns a GPU at random, so you may run into out-of-memory errors.

We recommend checking the assigned GPU first and running the experiment only if its memory is sufficient. Resetting the runtime will give you a newly assigned GPU.

import torch

if torch.cuda.is_available():    
    device = torch.device("cuda")
    print('There are %d GPU(s) available.' % torch.cuda.device_count())
    print('We will use the GPU:', torch.cuda.get_device_name(0))

else:
    print('No GPU available, using the CPU instead.')
    device = torch.device("cpu")
There are 1 GPU(s) available.
We will use the GPU: Tesla T4
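If you also want to check how much memory the assigned GPU has before training, a minimal sketch like the one below will do (it assumes a CUDA device was detected above and uses the standard torch.cuda.get_device_properties call):

if torch.cuda.is_available():
    # total memory of the assigned GPU, reported in GiB
    props = torch.cuda.get_device_properties(0)
    print('Total GPU memory: %.1f GiB' % (props.total_memory / 1024**3))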

5.1 Loading the Data

For this modeling exercise, we load the data with the code from section 2.1: download the FaceMaskDetection dataset with the PL_data_loader.py file in Pseudo-Lab's Tutorial-Book-Utils GitHub repository, then unzip the archive.

!git clone https://github.com/Pseudo-Lab/Tutorial-Book-Utils
!python Tutorial-Book-Utils/PL_data_loader.py --data FaceMaskDetection
!unzip -q Face\ Mask\ Detection.zip
Cloning into 'Tutorial-Book-Utils'...
remote: Enumerating objects: 18, done.
remote: Counting objects: 100% (18/18), done.
remote: Compressing objects: 100% (15/15), done.
remote: Total 18 (delta 4), reused 8 (delta 2), pack-reused 0
Unpacking objects: 100% (18/18), done.
Face Mask Detection.zip is done!

5.2 Splitting the Data

As in section 3.3, we split the dataset. The code below randomly samples 170 images and moves them, together with their annotations, into the test folders.

import os
import random
import numpy as np
import shutil

print(len(os.listdir('annotations')))
print(len(os.listdir('images')))

!mkdir test_images
!mkdir test_annotations


random.seed(1234)
idx = random.sample(range(853), 170)

for img in np.array(sorted(os.listdir('images')))[idx]:
    shutil.move('images/'+img, 'test_images/'+img)

for annot in np.array(sorted(os.listdir('annotations')))[idx]:
    shutil.move('annotations/'+annot, 'test_annotations/'+annot)

print(len(os.listdir('annotations')))
print(len(os.listdir('images')))
print(len(os.listdir('test_annotations')))
print(len(os.listdir('test_images')))
853
853
683
683
170
170

We also import the packages needed for modeling. torchvision is used for image processing and ships with built-in packages for datasets and models.

import os
import numpy as np
import matplotlib.patches as patches
import matplotlib.pyplot as plt
from bs4 import BeautifulSoup
from PIL import Image
import torchvision
from torchvision import transforms, datasets, models
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
import time

5.3 Defining the Dataset Class

As in section 2.3, we define the helper functions for handling bounding boxes.

def generate_box(obj):
    
    xmin = float(obj.find('xmin').text)
    ymin = float(obj.find('ymin').text)
    xmax = float(obj.find('xmax').text)
    ymax = float(obj.find('ymax').text)
    
    return [xmin, ymin, xmax, ymax]

adjust_label = 1

def generate_label(obj):

    if obj.find('name').text == "with_mask":

        return 1 + adjust_label

    elif obj.find('name').text == "mask_weared_incorrect":

        return 2 + adjust_label

    return 0 + adjust_label

def generate_target(file): 
    with open(file) as f:
        data = f.read()
        soup = BeautifulSoup(data, "html.parser")
        objects = soup.find_all("object")

        num_objs = len(objects)

        boxes = []
        labels = []
        for i in objects:
            boxes.append(generate_box(i))
            labels.append(generate_label(i))

        boxes = torch.as_tensor(boxes, dtype=torch.float32) 
        labels = torch.as_tensor(labels, dtype=torch.int64) 
        
        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        
        return target

def plot_image_from_output(img, annotation):
    
    img = img.cpu().permute(1,2,0)
    
    fig,ax = plt.subplots(1)
    ax.imshow(img)
    
    for idx in range(len(annotation["boxes"])):
        xmin, ymin, xmax, ymax = annotation["boxes"][idx]

        if annotation['labels'][idx] == 1 :
            rect = patches.Rectangle((xmin,ymin),(xmax-xmin),(ymax-ymin),linewidth=1,edgecolor='r',facecolor='none')
        
        elif annotation['labels'][idx] == 2 :
            
            rect = patches.Rectangle((xmin,ymin),(xmax-xmin),(ymax-ymin),linewidth=1,edgecolor='g',facecolor='none')
            
        else :
        
            rect = patches.Rectangle((xmin,ymin),(xmax-xmin),(ymax-ymin),linewidth=1,edgecolor='orange',facecolor='none')

        ax.add_patch(rect)

    plt.show()
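As a quick sanity check, a small sketch like the one below parses the first annotation in the training folder and draws its ground-truth boxes with the functions above (it assumes the images/ and annotations/ folders from sections 5.1 and 5.2 are in place):

# parse one annotation file and draw its ground-truth boxes on the matching image
sample_name = sorted(os.listdir('images'))[0]
sample_img = transforms.ToTensor()(Image.open(os.path.join('images', sample_name)).convert("RGB"))
sample_target = generate_target(os.path.join('annotations', sample_name[:-3] + 'xml'))
plot_image_from_output(sample_img, sample_target)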

Next, as in section 4.3, we define the dataset class and the data loaders. The datasets are loaded through torch.utils.data.DataLoader with a batch size of 4 for training and 2 for the test set. Feel free to set the batch size according to your available memory.

class MaskDataset(object):
    def __init__(self, transforms, path):
        '''
        path: path to train folder or test folder
        '''
        # store the transform module and the image folder path
        self.transforms = transforms
        self.path = path
        self.imgs = list(sorted(os.listdir(self.path)))


    def __getitem__(self, idx): #special method
        # load the image and its annotation file
        file_image = self.imgs[idx]
        file_label = self.imgs[idx][:-3] + 'xml'
        img_path = os.path.join(self.path, file_image)
        
        if 'test' in self.path:
            label_path = os.path.join("test_annotations/", file_label)
        else:
            label_path = os.path.join("annotations/", file_label)

        img = Image.open(img_path).convert("RGB")
        # generate the target (boxes and labels) from the annotation file
        target = generate_target(label_path)
        
        if self.transforms is not None:
            img = self.transforms(img)

        return img, target

    def __len__(self): 
        return len(self.imgs)

data_transform = transforms.Compose([  # transforms.Compose: runs the listed transforms in sequence
        transforms.ToTensor() # ToTensor: converts the PIL image into a torch tensor
    ])

# images in a detection batch can have different sizes, so group them into tuples instead of stacking
def collate_fn(batch):
    return tuple(zip(*batch))

dataset = MaskDataset(data_transform, 'images/')
test_dataset = MaskDataset(data_transform, 'test_images/')

data_loader = torch.utils.data.DataLoader(dataset, batch_size=4, collate_fn=collate_fn)
test_data_loader = torch.utils.data.DataLoader(test_dataset, batch_size=2, collate_fn=collate_fn)

5.4 Loading the Model

torchvision.models.detection provides a Faster R-CNN API (torchvision.models.detection.fasterrcnn_resnet50_fpn), which makes the implementation straightforward. It supplies a model with a ResNet-50 backbone pre-trained on the COCO dataset, and you can control whether those weights are loaded with pretrained=True/False.

When loading the model, simply set num_classes to the desired number of classes. One thing to watch out for with Faster R-CNN is that num_classes must include the background class. In other words, add 1 to the actual number of classes in your dataset to account for background.

def get_model_instance_segmentation(num_classes):
  
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    return model
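As a quick way to see the effect of num_classes, the small sketch below (an illustrative check, not required for training) builds the model and prints the output sizes of the replaced box predictor:

m = get_model_instance_segmentation(4)
print(m.roi_heads.box_predictor.cls_score.out_features)  # 4  -> 3 classes + background
print(m.roi_heads.box_predictor.bbox_pred.out_features)  # 16 -> 4 box coordinates per class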

5.5 Transfer Learning

Let's apply transfer learning to Face Mask Detection. The Face Mask Detection dataset consists of 3 classes, but including the background class we set num_classes to 4 and then load the model.

If a GPU is available, we select it as the device and move the loaded model onto it.

model = get_model_instance_segmentation(4)

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu') 
model.to(device)
FasterRCNN(
  (transform): GeneralizedRCNNTransform(
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
      Resize(min_size=(800,), max_size=1333, mode='bilinear')
  )
  (backbone): BackboneWithFPN(
    (body): IntermediateLayerGetter(
      (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
      (bn1): FrozenBatchNorm2d(64)
      (relu): ReLU(inplace=True)
      (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
      (layer1): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(64)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(64)
          (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(256)
          (relu): ReLU(inplace=True)
          (downsample): Sequential(
            (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): FrozenBatchNorm2d(256)
          )
        )
        (1): Bottleneck(
          (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(64)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(64)
          (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(256)
          (relu): ReLU(inplace=True)
        )
        (2): Bottleneck(
          (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(64)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(64)
          (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(256)
          (relu): ReLU(inplace=True)
        )
      )
      (layer2): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(128)
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(128)
          (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(512)
          (relu): ReLU(inplace=True)
          (downsample): Sequential(
            (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
            (1): FrozenBatchNorm2d(512)
          )
        )
        (1): Bottleneck(
          (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(128)
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(128)
          (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(512)
          (relu): ReLU(inplace=True)
        )
        (2): Bottleneck(
          (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(128)
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(128)
          (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(512)
          (relu): ReLU(inplace=True)
        )
        (3): Bottleneck(
          (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(128)
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(128)
          (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(512)
          (relu): ReLU(inplace=True)
        )
      )
      (layer3): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(256)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(256)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(1024)
          (relu): ReLU(inplace=True)
          (downsample): Sequential(
            (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
            (1): FrozenBatchNorm2d(1024)
          )
        )
        (1): Bottleneck(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(256)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(256)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(1024)
          (relu): ReLU(inplace=True)
        )
        (2): Bottleneck(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(256)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(256)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(1024)
          (relu): ReLU(inplace=True)
        )
        (3): Bottleneck(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(256)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(256)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(1024)
          (relu): ReLU(inplace=True)
        )
        (4): Bottleneck(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(256)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(256)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(1024)
          (relu): ReLU(inplace=True)
        )
        (5): Bottleneck(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(256)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(256)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(1024)
          (relu): ReLU(inplace=True)
        )
      )
      (layer4): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(512)
          (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(512)
          (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(2048)
          (relu): ReLU(inplace=True)
          (downsample): Sequential(
            (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
            (1): FrozenBatchNorm2d(2048)
          )
        )
        (1): Bottleneck(
          (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(512)
          (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(512)
          (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(2048)
          (relu): ReLU(inplace=True)
        )
        (2): Bottleneck(
          (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(512)
          (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(512)
          (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(2048)
          (relu): ReLU(inplace=True)
        )
      )
    )
    (fpn): FeaturePyramidNetwork(
      (inner_blocks): ModuleList(
        (0): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
        (1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
        (2): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
        (3): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
      )
      (layer_blocks): ModuleList(
        (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
      (extra_blocks): LastLevelMaxPool()
    )
  )
  (rpn): RegionProposalNetwork(
    (anchor_generator): AnchorGenerator()
    (head): RPNHead(
      (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (cls_logits): Conv2d(256, 3, kernel_size=(1, 1), stride=(1, 1))
      (bbox_pred): Conv2d(256, 12, kernel_size=(1, 1), stride=(1, 1))
    )
  )
  (roi_heads): RoIHeads(
    (box_roi_pool): MultiScaleRoIAlign()
    (box_head): TwoMLPHead(
      (fc6): Linear(in_features=12544, out_features=1024, bias=True)
      (fc7): Linear(in_features=1024, out_features=1024, bias=True)
    )
    (box_predictor): FastRCNNPredictor(
      (cls_score): Linear(in_features=1024, out_features=4, bias=True)
      (bbox_pred): Linear(in_features=1024, out_features=16, bias=True)
    )
  )
)

The output above shows which layers Faster R-CNN is composed of. You can check whether a GPU is available with torch.cuda.is_available().

torch.cuda.is_available()
True

Now that the model is ready, let's train it. We set the number of epochs (num_epochs) to 10 and optimize with SGD. Each hyperparameter can be adjusted freely.

num_epochs = 10
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005,
                                momentum=0.9, weight_decay=0.0005)

Let's start training. We feed the batches from the data_loader created above into the model one at a time, then compute the loss and run the optimization step. The loss printed at each epoch lets you confirm that training is progressing.

print('----------------------train start--------------------------')
for epoch in range(num_epochs):
    start = time.time()
    model.train()
    i = 0    
    epoch_loss = 0
    for imgs, annotations in data_loader:
        i += 1
        imgs = list(img.to(device) for img in imgs)
        annotations = [{k: v.to(device) for k, v in t.items()} for t in annotations]
        loss_dict = model(imgs, annotations) 
        losses = sum(loss for loss in loss_dict.values())        

        optimizer.zero_grad()
        losses.backward()
        optimizer.step() 
        epoch_loss += losses
    print(f'epoch : {epoch+1}, Loss : {epoch_loss}, time : {time.time() - start}')
----------------------train start--------------------------
epoch : 1, Loss : 77.14759063720703, time : 252.42370867729187
epoch : 2, Loss : 48.91315460205078, time : 263.22984743118286
epoch : 3, Loss : 43.18947982788086, time : 264.4591932296753
epoch : 4, Loss : 36.07373046875, time : 265.2568733692169
epoch : 5, Loss : 31.8864688873291, time : 265.57766008377075
epoch : 6, Loss : 31.76308250427246, time : 265.0076003074646
epoch : 7, Loss : 31.24744415283203, time : 265.16882514953613
epoch : 8, Loss : 29.340274810791016, time : 265.73448038101196
epoch : 9, Loss : 25.922008514404297, time : 267.91367626190186
epoch : 10, Loss : 23.59230613708496, time : 266.9004054069519

If you want to save the trained weights, you can store them with torch.save and load them back at any time.

torch.save(model.state_dict(),f'model_{num_epochs}.pt')
model.load_state_dict(torch.load(f'model_{num_epochs}.pt'))
<All keys matched successfully>
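If you come back to this model in a fresh session, a minimal sketch like the one below re-creates the architecture and loads the saved weights (it assumes model_10.pt was saved as above and that get_model_instance_segmentation is defined):

# re-create the model and load the weights saved earlier
model = get_model_instance_segmentation(4)
model.load_state_dict(torch.load('model_10.pt', map_location=device))
model.to(device)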

5.6 Prediction

Once training is finished, let's inspect the predictions to see how well the model has learned. A prediction contains bounding-box coordinates (boxes), classes (labels), and scores (scores). The scores hold the confidence for each detection, so we define a make_prediction function that keeps only detections whose score exceeds a threshold of 0.5. We then print the result for the first batch of test_data_loader only.

def make_prediction(model, img, threshold):
    model.eval()
    preds = model(img)
    for id in range(len(preds)) :
        idx_list = []

        for idx, score in enumerate(preds[id]['scores']) :
            if score > threshold : 
                idx_list.append(idx)

        preds[id]['boxes'] = preds[id]['boxes'][idx_list]
        preds[id]['labels'] = preds[id]['labels'][idx_list]
        preds[id]['scores'] = preds[id]['scores'][idx_list]

    return preds
with torch.no_grad(): 
    # test set batch size = 2
    for imgs, annotations in test_data_loader:
        imgs = list(img.to(device) for img in imgs)

        pred = make_prediction(model, imgs, 0.5)
        print(pred)
        break
[{'boxes': tensor([[117.7811,   1.4936, 132.9596,  18.4192],
        [214.8204,  59.8669, 249.7893,  97.6275]], device='cuda:0'), 'labels': tensor([2, 2], device='cuda:0'), 'scores': tensor([0.9430, 0.9414], device='cuda:0')}, {'boxes': tensor([[218.8598,  99.3362, 260.0332, 138.8516],
        [130.5172, 109.1189, 179.2908, 152.5566],
        [ 29.2499,  88.7732,  45.5664, 104.5635],
        [ 40.9168, 109.1093,  67.3653, 140.0567],
        [165.5889,  90.0294, 179.4471, 109.1606],
        [ 83.7276,  84.3918,  94.5928,  96.4693],
        [302.4648, 130.4534, 332.0580, 158.8674],
        [258.4624,  90.7134, 269.2498, 102.2883],
        [  2.8419, 103.6409,  21.9580, 125.5492]], device='cuda:0'), 'labels': tensor([2, 2, 1, 1, 1, 1, 1, 1, 1], device='cuda:0'), 'scores': tensor([0.9962, 0.9918, 0.9900, 0.9894, 0.9891, 0.9653, 0.9652, 0.9573, 0.9046],
       device='cuda:0')}]
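To try the trained detector on an arbitrary image of your own, a small sketch like the one below works (the file name sample.jpg is hypothetical; data_transform, make_prediction, and plot_image_from_output were defined earlier in this chapter):

# run the trained detector on a single image file (the path is hypothetical)
img = data_transform(Image.open('sample.jpg').convert("RGB")).to(device)
with torch.no_grad():
    pred = make_prediction(model, [img], 0.5)
plot_image_from_output(img, pred[0])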

Let's draw bounding boxes on the image using the predictions, plotting with the plot_image_from_output function defined above. Target shows the ground-truth bounding boxes and Prediction shows the model's output. You can see that the model locates the ground-truth boxes well.

_idx = 1
print("Target : ", annotations[_idx]['labels'])
plot_image_from_output(imgs[_idx], annotations[_idx])
print("Prediction : ", pred[_idx]['labels'])
plot_image_from_output(imgs[_idx], pred[_idx])
Target :  tensor([1, 1, 1, 2, 2, 1, 1, 1])
(Figure: ground-truth bounding boxes drawn on the test image)
Prediction :  tensor([2, 2, 1, 1, 1, 1, 1, 1, 1], device='cuda:0')
(Figure: predicted bounding boxes drawn on the test image)

Next, let's evaluate the predictions on the whole test set. First, we collect the predictions and the ground-truth labels for all test data into preds_adj_all and annot_all, respectively.

from tqdm import tqdm

labels = []
preds_adj_all = []
annot_all = []

for im, annot in tqdm(test_data_loader, position = 0, leave = True):
    im = list(img.to(device) for img in im)
    #annot = [{k: v.to(device) for k, v in t.items()} for t in annot]

    for t in annot:
        labels += t['labels']

    with torch.no_grad():
        preds_adj = make_prediction(model, im, 0.5)
        preds_adj = [{k: v.to(torch.device('cpu')) for k, v in t.items()} for t in preds_adj]
        preds_adj_all.append(preds_adj)
        annot_all.append(annot)
100%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 85/85 [00:25<00:00,  3.34it/s]

We then compute the mAP with the utils_ObjectDetection.py file in the Tutorial-Book-Utils folder. The get_batch_statistics function computes statistics over the bounding boxes that satisfy the IoU (Intersection over Union) condition, and the ap_per_class function then computes the AP for each class.

%cd Tutorial-Book-Utils/
import utils_ObjectDetection as utils
/content/Tutorial-Book-Utils
sample_metrics = []
for batch_i in range(len(preds_adj_all)):
    sample_metrics += utils.get_batch_statistics(preds_adj_all[batch_i], annot_all[batch_i], iou_threshold=0.5) 

true_positives, pred_scores, pred_labels = [torch.cat(x, 0) for x in list(zip(*sample_metrics))]  # concatenate statistics across all batches
precision, recall, AP, f1, ap_class = utils.ap_per_class(true_positives, pred_scores, pred_labels, torch.tensor(labels))
mAP = torch.mean(AP)
print(f'mAP : {mAP}')
print(f'AP : {AP}')
mAP : 0.7182363990382057
AP : tensor([0.8694, 0.9189, 0.3664], dtype=torch.float64)

The AP values are reported only for the 3 actual classes, excluding background. Even after only 10 epochs of training, the results improve on the RetinaNet results from Chapter 4. In particular, the model reaches an AP of 0.9189 for the mask-wearing class, and even the incorrectly-worn-mask class reaches an AP of 0.3664. RetinaNet, with its FPN and focal loss, is generally known to perform well despite being a one-stage method. Its performance could of course be improved with hyperparameter tuning, but judging from the current experiments, Faster R-CNN performs better on this dataset.

This concludes the medical mask detection tutorial. In it we went all the way from preprocessing the dataset to training the model and generating predictions. To push performance further, you can increase the number of epochs or tune the hyperparameters. We hope you will apply object detection models freely to data of your own choosing.