5. Faster R-CNN

4์žฅ์—์„œ๋Š” One-Stage Detector ๋ชจ๋ธ์ธ RetinaNet์„ ํ™œ์šฉํ•ด ์˜๋ฃŒ์šฉ ๋งˆ์Šคํฌ ํƒ์ง€ ๋ชจ๋ธ์„ ๊ตฌ์ถ•ํ•ด๋ณด์•˜์Šต๋‹ˆ๋‹ค. ์ด๋ฒˆ ์žฅ์—์„œ๋Š” Two-Stage Detector์ธ Faster R-CNN์œผ๋กœ ๊ฐ์ฒด ํƒ์ง€๋ฅผ ํ•ด๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

5.1์ ˆ๋ถ€ํ„ฐ 5.3์ ˆ๊นŒ์ง€๋Š” 2์žฅ๊ณผ 3์žฅ์—์„œ ํ™•์ธํ•œ ๋‚ด์šฉ์„ ๋ฐ”ํƒ•์œผ๋กœ ๋ฐ์ดํ„ฐ๋ฅผ ๋ถˆ๋Ÿฌ์˜ค๊ณ  ํ›ˆ๋ จ์šฉ, ์‹œํ—˜์šฉ ๋ฐ์ดํ„ฐ๋กœ ๋‚˜๋ˆˆ ํ›„ ๋ฐ์ดํ„ฐ์…‹ ํด๋ž˜์Šค๋ฅผ ์ •์˜ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. 5.4์ ˆ์—์„œ๋Š” torchvision API๋ฅผ ํ™œ์šฉํ•˜์—ฌ ์‚ฌ์ „ ํ›ˆ๋ จ๋œ ๋ชจ๋ธ์„ ๋ถˆ๋Ÿฌ์˜ค๊ฒ ์Šต๋‹ˆ๋‹ค. 5.5์ ˆ์—์„œ๋Š” ์ „์ด ํ•™์Šต์„ ํ†ตํ•ด ๋ชจ๋ธ ํ•™์Šต์„ ์ง„ํ–‰ํ•œ ํ›„ 5.6์ ˆ์—์„œ ์˜ˆ์ธก๊ฐ’ ์‚ฐ์ถœ ๋ฐ ๋ชจ๋ธ ์„ฑ๋Šฅ์„ ํ™•์ธํ•ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

์‹คํ—˜์— ์•ž์„œ Google Colab์—์„œ๋Š” ๋žœ๋ค GPU๋ฅผ ํ• ๋‹นํ•˜๊ณ  ์žˆ๊ธฐ ๋•Œ๋ฌธ์— ๋ฉ”๋ชจ๋ฆฌ ๋ถ€์กฑํ˜„์ƒ์ด ์ผ์–ด๋‚  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๋จผ์ € GPU๋ฅผ ํ™•์ธ ํ›„์— ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ์ถฉ๋ถ„ํ•  ๊ฒฝ์šฐ ์‹คํ—˜์„ ํ•˜์‹œ๊ธธ ๊ถŒ์žฅํ•ฉ๋‹ˆ๋‹ค. ๋Ÿฐํƒ€์ž„์„ ์ดˆ๊ธฐํ™”ํ•  ๊ฒฝ์šฐ ์ƒˆ๋กœ์šด GPU๋ฅผ ํ• ๋‹น๋ฐ›์œผ์‹ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

import torch

if torch.cuda.is_available():    
    device = torch.device("cuda")
    print('There are %d GPU(s) available.' % torch.cuda.device_count())
    print('We will use the GPU:', torch.cuda.get_device_name(0))

else:
    print('No GPU available, using the CPU instead.')
    device = torch.device("cpu")
There are 1 GPU(s) available.
We will use the GPU: Tesla T4
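
If you want to see how much memory the assigned GPU has before running the experiments, a minimal sketch (not part of the original notebook) using torch.cuda.get_device_properties looks like this:

# Optional sketch: report the total memory of the assigned GPU (assumes a CUDA device is available)
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB total memory")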

5.1 ๋ฐ์ดํ„ฐ ๋ถˆ๋Ÿฌ์˜ค๊ธฐยถ

๋ชจ๋ธ๋ง ์‹ค์Šต์„ ์œ„ํ•ด 2.1์ ˆ์— ๋‚˜์˜จ ์ฝ”๋“œ๋ฅผ ํ™œ์šฉํ•˜์—ฌ ๋ฐ์ดํ„ฐ๋ฅผ ๋ถˆ๋Ÿฌ์˜ค๊ฒ ์Šต๋‹ˆ๋‹ค. ๊ฐ€์งœ์—ฐ๊ตฌ์†Œ ๊นƒํ—ˆ๋ธŒ์˜ Tutorial-Book-Utils์— ์žˆ๋Š” PL_data_loader.py ํŒŒ์ผ๋กœ FascMaskDetection ๋ฐ์ดํ„ฐ์…‹์„ ๋‹ค์šด๋ฐ›๊ณ  ์••์ถ• ํŒŒ์ผ์„ ํ‘ธ๋Š” ์ˆœ์„œ์ž…๋‹ˆ๋‹ค.

!git clone https://github.com/Pseudo-Lab/Tutorial-Book-Utils
!python Tutorial-Book-Utils/PL_data_loader.py --data FaceMaskDetection
!unzip -q Face\ Mask\ Detection.zip
Cloning into 'Tutorial-Book-Utils'...
remote: Enumerating objects: 18, done.
remote: Counting objects: 100% (18/18), done.
remote: Compressing objects: 100% (15/15), done.
remote: Total 18 (delta 4), reused 8 (delta 2), pack-reused 0
Unpacking objects: 100% (18/18), done.
Face Mask Detection.zip is done!

5.2 ๋ฐ์ดํ„ฐ ๋ถ„๋ฆฌยถ

3.3์ ˆ๊ณผ ๊ฐ™์ด ๋ฐ์ดํ„ฐ์…‹์„ ๋ถ„๋ฆฌํ•ด๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. ์•„๋ž˜ ์ฝ”๋“œ๋ฅผ ํ†ตํ•ด ์ž„์˜๋กœ 170์žฅ์„ ์ถ”์ถœํ•˜๊ณ  testํด๋”์— ์˜ฎ๊ธฐ๋Š” ์ž‘์—…์„ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค.

import os
import random
import numpy as np
import shutil

print(len(os.listdir('annotations')))
print(len(os.listdir('images')))

!mkdir test_images
!mkdir test_annotations


random.seed(1234)
idx = random.sample(range(853), 170)

for img in np.array(sorted(os.listdir('images')))[idx]:
    shutil.move('images/'+img, 'test_images/'+img)

for annot in np.array(sorted(os.listdir('annotations')))[idx]:
    shutil.move('annotations/'+annot, 'test_annotations/'+annot)

print(len(os.listdir('annotations')))
print(len(os.listdir('images')))
print(len(os.listdir('test_annotations')))
print(len(os.listdir('test_images')))
853
853
683
683
170
170

๋˜ํ•œ ๋ชจ๋ธ๋ง์— ํ•„์š”ํ•œ ํŒจํ‚ค์ง€๋ฅผ ๋ถˆ๋Ÿฌ์˜ค๊ฒ ์Šต๋‹ˆ๋‹ค. torchvision์€ ์ด๋ฏธ์ง€ ์ฒ˜๋ฆฌ๋ฅผ ํ•˜๊ธฐ ์œ„ํ•ด ์‚ฌ์šฉ๋˜๋ฉฐ ๋ฐ์ดํ„ฐ์…‹์— ๊ด€ํ•œ ํŒจํ‚ค์ง€์™€ ๋ชจ๋ธ์— ๊ด€ํ•œ ํŒจํ‚ค์ง€๊ฐ€ ๋‚ด์žฅ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.

import os
import numpy as np
import matplotlib.patches as patches
import matplotlib.pyplot as plt
from bs4 import BeautifulSoup
from PIL import Image
import torchvision
from torchvision import transforms, datasets, models
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
import time

5.3 ๋ฐ์ดํ„ฐ์…‹ ํด๋ž˜์Šค ์ •์˜ยถ

์ด๋ฒˆ์—๋Š” ๋ฐ”์šด๋”ฉ ๋ฐ•์Šค๋ฅผ ์œ„ํ•œ ํ•จ์ˆ˜๋“ค์„ 2.3์ ˆ๊ณผ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ ์ •์˜ํ•ด์ค๋‹ˆ๋‹ค.

def generate_box(obj):
    
    xmin = float(obj.find('xmin').text)
    ymin = float(obj.find('ymin').text)
    xmax = float(obj.find('xmax').text)
    ymax = float(obj.find('ymax').text)
    
    return [xmin, ymin, xmax, ymax]

adjust_label = 1

def generate_label(obj):

    if obj.find('name').text == "with_mask":

        return 1 + adjust_label

    elif obj.find('name').text == "mask_weared_incorrect":

        return 2 + adjust_label

    return 0 + adjust_label

def generate_target(file): 
    with open(file) as f:
        data = f.read()
        soup = BeautifulSoup(data, "html.parser")
        objects = soup.find_all("object")

        num_objs = len(objects)

        boxes = []
        labels = []
        for i in objects:
            boxes.append(generate_box(i))
            labels.append(generate_label(i))

        boxes = torch.as_tensor(boxes, dtype=torch.float32) 
        labels = torch.as_tensor(labels, dtype=torch.int64) 
        
        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        
        return target

def plot_image_from_output(img, annotation):
    
    img = img.cpu().permute(1,2,0)
    
    fig,ax = plt.subplots(1)
    ax.imshow(img)
    
    for idx in range(len(annotation["boxes"])):
        xmin, ymin, xmax, ymax = annotation["boxes"][idx]

        if annotation['labels'][idx] == 1 :
            rect = patches.Rectangle((xmin,ymin),(xmax-xmin),(ymax-ymin),linewidth=1,edgecolor='r',facecolor='none')
        
        elif annotation['labels'][idx] == 2 :
            
            rect = patches.Rectangle((xmin,ymin),(xmax-xmin),(ymax-ymin),linewidth=1,edgecolor='g',facecolor='none')
            
        else :
        
            rect = patches.Rectangle((xmin,ymin),(xmax-xmin),(ymax-ymin),linewidth=1,edgecolor='orange',facecolor='none')

        ax.add_patch(rect)

    plt.show()

๋˜ํ•œ 4.3์ ˆ์ฒ˜๋Ÿผ ๋ฐ์ดํ„ฐ์…‹ ํด๋ž˜์Šค์™€ ๋ฐ์ดํ„ฐ ๋กœ๋”๋ฅผ ์ •์˜ํ•ด์ค๋‹ˆ๋‹ค. ๋ฐ์ดํ„ฐ์…‹์€ torch.utils.data.DataLoader ํ•จ์ˆ˜๋ฅผ ํ†ตํ•ด ๋ฐฐ์น˜ ์‚ฌ์ด์ฆˆ๋Š” 4๋กœ ์ง€์ •ํ•˜์—ฌ ๋ถˆ๋Ÿฌ์˜ค๊ฒ ์Šต๋‹ˆ๋‹ค.๋ฐฐ์น˜ ์‚ฌ์ด์ฆˆ๋Š” ๊ฐœ์ธ์˜ ๋ฉ”๋ชจ๋ฆฌ ํฌ๊ธฐ์— ๋”ฐ๋ผ ์ž์œ ๋กญ๊ฒŒ ์„ค์ •ํ•˜๋ฉด ๋ฉ๋‹ˆ๋‹ค.

class MaskDataset(object):
    def __init__(self, transforms, path):
        '''
        path: path to train folder or test folder
        '''
        # transform module๊ณผ img path ๊ฒฝ๋กœ๋ฅผ ์ •์˜
        self.transforms = transforms
        self.path = path
        self.imgs = list(sorted(os.listdir(self.path)))


    def __getitem__(self, idx): #special method
        # load images and labels
        file_image = self.imgs[idx]
        file_label = self.imgs[idx][:-3] + 'xml'
        img_path = os.path.join(self.path, file_image)
        
        if 'test' in self.path:
            label_path = os.path.join("test_annotations/", file_label)
        else:
            label_path = os.path.join("annotations/", file_label)

        img = Image.open(img_path).convert("RGB")
        #Generate Label
        target = generate_target(label_path)
        
        if self.transforms is not None:
            img = self.transforms(img)

        return img, target

    def __len__(self): 
        return len(self.imgs)

data_transform = transforms.Compose([  # transforms.Compose : chains the operations in the list
        transforms.ToTensor() # ToTensor : converts a numpy/PIL image to a torch tensor
    ])

def collate_fn(batch):
    return tuple(zip(*batch))

dataset = MaskDataset(data_transform, 'images/')
test_dataset = MaskDataset(data_transform, 'test_images/')

data_loader = torch.utils.data.DataLoader(dataset, batch_size=4, collate_fn=collate_fn)
test_data_loader = torch.utils.data.DataLoader(test_dataset, batch_size=2, collate_fn=collate_fn)
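
As an optional sanity check (not in the original notebook), you can pull one batch from the loader to confirm that, thanks to collate_fn, each batch is a tuple of image tensors and a tuple of target dictionaries:

# Optional sketch: inspect one training batch (assumes the loaders above were created successfully)
sample_imgs, sample_targets = next(iter(data_loader))
print(len(sample_imgs))                  # 4 images per batch
print(sample_targets[0]['boxes'].shape)  # (number of boxes in the first image, 4)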

5.4 ๋ชจ๋ธ ๋ถˆ๋Ÿฌ์˜ค๊ธฐยถ

torchvision.models.detection์—์„œ๋Š” Faster R-CNN API(torchvision.models.detection.fasterrcnn_resnet50_fpn)๋ฅผ ์ œ๊ณตํ•˜๊ณ  ์žˆ์–ด ์‰ฝ๊ฒŒ ๊ตฌํ˜„์ด ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค. ์ด๋Š” COCO ๋ฐ์ดํ„ฐ์…‹์„ ResNet50์œผ๋กœ pre-trainedํ•œ ๋ชจ๋ธ์„ ์ œ๊ณตํ•˜๊ณ  ์žˆ์œผ๋ฉฐ, pretrained=True/False๋กœ ์„ค์ •ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

์ดํ›„ ๋ชจ๋ธ์„ ๋ถˆ๋Ÿฌ์˜ฌ ๋•Œ๋Š” num_classes์— ์›ํ•˜๋Š” ํด๋ž˜์Šค ๊ฐœ์ˆ˜๋ฅผ ์„ค์ •ํ•˜๊ณ  ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•˜๋ฉด ๋ฉ๋‹ˆ๋‹ค. Faster R-CNN ์‚ฌ์šฉ ์‹œ ์ฃผ์˜ํ•  ์ ์€ background ํด๋ž˜์Šค๋ฅผ ํฌํ•จํ•œ ๊ฐœ์ˆ˜๋ฅผ num_classes์— ๋ช…์‹œํ•ด์ฃผ์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์ฆ‰, ์‹ค์ œ ๋ฐ์ดํ„ฐ์…‹์˜ ํด๋ž˜์Šค ๊ฐœ์ˆ˜์— 1๊ฐœ๋ฅผ ๋Š˜๋ ค background ํด๋ž˜์Šค๋ฅผ ์ถ”๊ฐ€ํ•ด์ฃผ์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

def get_model_instance_segmentation(num_classes):
  
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    return model

5.5 ์ „์ด ํ•™์Šตยถ

Face Mask Detection์— ์ „์ด ํ•™์Šต์„ ์‹ค์‹œํ•ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. Face Mask Detection ๋ฐ์ดํ„ฐ์…‹์€ 3๊ฐœ์˜ ํด๋ž˜์Šค๋กœ ์ด๋ฃจ์–ด์ ธ ์žˆ์ง€๋งŒ background ํด๋ž˜์Šค๋ฅผ ํฌํ•จํ•˜์—ฌ num_classes๋ฅผ 4๋กœ ์„ค์ •ํ•œ ํ›„ ๋ชจ๋ธ์„ ๋ถˆ๋Ÿฌ์˜ต๋‹ˆ๋‹ค.

GPU๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋Š” ํ™˜๊ฒฝ์ด๋ผ๋ฉด device๋กœ ์ง€์ •ํ•˜์—ฌ ๋ถˆ๋Ÿฌ์˜จ ๋ชจ๋ธ์„ GPU์— ๋ณด๋‚ด์ค๋‹ˆ๋‹ค.

model = get_model_instance_segmentation(4)

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu') 
model.to(device)
FasterRCNN(
  (transform): GeneralizedRCNNTransform(
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
      Resize(min_size=(800,), max_size=1333, mode='bilinear')
  )
  (backbone): BackboneWithFPN(
    (body): IntermediateLayerGetter(
      (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
      (bn1): FrozenBatchNorm2d(64)
      (relu): ReLU(inplace=True)
      (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
      (layer1): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(64)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(64)
          (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(256)
          (relu): ReLU(inplace=True)
          (downsample): Sequential(
            (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): FrozenBatchNorm2d(256)
          )
        )
        (1): Bottleneck(
          (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(64)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(64)
          (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(256)
          (relu): ReLU(inplace=True)
        )
        (2): Bottleneck(
          (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(64)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(64)
          (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(256)
          (relu): ReLU(inplace=True)
        )
      )
      (layer2): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(128)
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(128)
          (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(512)
          (relu): ReLU(inplace=True)
          (downsample): Sequential(
            (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
            (1): FrozenBatchNorm2d(512)
          )
        )
        (1): Bottleneck(
          (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(128)
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(128)
          (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(512)
          (relu): ReLU(inplace=True)
        )
        (2): Bottleneck(
          (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(128)
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(128)
          (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(512)
          (relu): ReLU(inplace=True)
        )
        (3): Bottleneck(
          (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(128)
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(128)
          (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(512)
          (relu): ReLU(inplace=True)
        )
      )
      (layer3): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(256)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(256)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(1024)
          (relu): ReLU(inplace=True)
          (downsample): Sequential(
            (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
            (1): FrozenBatchNorm2d(1024)
          )
        )
        (1): Bottleneck(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(256)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(256)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(1024)
          (relu): ReLU(inplace=True)
        )
        (2): Bottleneck(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(256)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(256)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(1024)
          (relu): ReLU(inplace=True)
        )
        (3): Bottleneck(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(256)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(256)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(1024)
          (relu): ReLU(inplace=True)
        )
        (4): Bottleneck(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(256)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(256)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(1024)
          (relu): ReLU(inplace=True)
        )
        (5): Bottleneck(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(256)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(256)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(1024)
          (relu): ReLU(inplace=True)
        )
      )
      (layer4): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(512)
          (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(512)
          (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(2048)
          (relu): ReLU(inplace=True)
          (downsample): Sequential(
            (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
            (1): FrozenBatchNorm2d(2048)
          )
        )
        (1): Bottleneck(
          (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(512)
          (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(512)
          (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(2048)
          (relu): ReLU(inplace=True)
        )
        (2): Bottleneck(
          (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(512)
          (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(512)
          (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(2048)
          (relu): ReLU(inplace=True)
        )
      )
    )
    (fpn): FeaturePyramidNetwork(
      (inner_blocks): ModuleList(
        (0): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
        (1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
        (2): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
        (3): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
      )
      (layer_blocks): ModuleList(
        (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
      (extra_blocks): LastLevelMaxPool()
    )
  )
  (rpn): RegionProposalNetwork(
    (anchor_generator): AnchorGenerator()
    (head): RPNHead(
      (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (cls_logits): Conv2d(256, 3, kernel_size=(1, 1), stride=(1, 1))
      (bbox_pred): Conv2d(256, 12, kernel_size=(1, 1), stride=(1, 1))
    )
  )
  (roi_heads): RoIHeads(
    (box_roi_pool): MultiScaleRoIAlign()
    (box_head): TwoMLPHead(
      (fc6): Linear(in_features=12544, out_features=1024, bias=True)
      (fc7): Linear(in_features=1024, out_features=1024, bias=True)
    )
    (box_predictor): FastRCNNPredictor(
      (cls_score): Linear(in_features=1024, out_features=4, bias=True)
      (bbox_pred): Linear(in_features=1024, out_features=16, bias=True)
    )
  )
)

์œ„์˜ ์ถœ๋ ฅ๋˜๋Š” ๊ฒฐ๊ณผ๋ฅผ ํ†ตํ•ด Fastser R-CNN์ด ์–ด๋–ค layer๋“ค๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ๋Š”์ง€ ์•Œ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ๋•Œ, GPU ์‚ฌ์šฉ ๊ฐ€๋Šฅ ์—ฌ๋ถ€๋Š” torch.cuda.is_available()๋ฅผ ํ†ตํ•ด ์•Œ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

torch.cuda.is_available()
True

์ด์ œ ๋ชจ๋ธ์ด ๋งŒ๋“ค์–ด์กŒ์œผ๋‹ˆ ํ•™์Šต์„ ํ•ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. ํ•™์Šต ํšŸ์ˆ˜(num_epochs)๋Š” 10์œผ๋กœ ์ง€์ •ํ•˜๊ณ , SGD ๋ฐฉ๋ฒ•์„ ํ†ตํ•ด ์ตœ์ ํ™” ์‹œ์ผœ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. ๊ฐ ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ๋Š” ์ž์œ ๋กญ๊ฒŒ ์ˆ˜์ •ํ•˜์—ฌ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

num_epochs = 10
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005,
                                momentum=0.9, weight_decay=0.0005)
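
One optional way to experiment with the hyperparameters is to add a learning-rate scheduler; a minimal sketch (not used in this tutorial) with torch.optim.lr_scheduler.StepLR could look like this:

# Optional sketch: decay the learning rate during training (not used in this tutorial)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)
# if used, call lr_scheduler.step() once at the end of each epoch in the training loop below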

์ด์ œ ํ•™์Šต์„ ์‹œ์ผœ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. ์œ„์—์„œ ์ƒ์„ฑํ•œ data_loader์—์„œ ํ•œ ๋ฐฐ์น˜์”ฉ ์ˆœ์„œ๋Œ€๋กœ ๋ชจ๋ธ์— ์‚ฌ์šฉํ•˜๋ฉฐ, ์ดํ›„ ๋กœ์Šค ๊ณ„์‚ฐ์„ ํ†ตํ•ด ์ตœ์ ํ™”๋ฅผ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค. ๊ฐ ์—ํญ๋งˆ๋‹ค ์ถœ๋ ฅ๋˜๋Š” ๋กœ์Šค๋ฅผ ํ†ตํ•ด์„œ ํ•™์Šต์ด ์ง„ํ–‰๋˜๋Š”๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

print('----------------------train start--------------------------')
for epoch in range(num_epochs):
    start = time.time()
    model.train()
    i = 0    
    epoch_loss = 0
    for imgs, annotations in data_loader:
        i += 1
        imgs = list(img.to(device) for img in imgs)
        annotations = [{k: v.to(device) for k, v in t.items()} for t in annotations]
        loss_dict = model(imgs, annotations)  # in training mode the model returns a dict of losses
        losses = sum(loss for loss in loss_dict.values())

        optimizer.zero_grad()
        losses.backward()
        optimizer.step() 
        epoch_loss += losses
    print(f'epoch : {epoch+1}, Loss : {epoch_loss}, time : {time.time() - start}')
----------------------train start--------------------------
epoch : 1, Loss : 77.14759063720703, time : 252.42370867729187
epoch : 2, Loss : 48.91315460205078, time : 263.22984743118286
epoch : 3, Loss : 43.18947982788086, time : 264.4591932296753
epoch : 4, Loss : 36.07373046875, time : 265.2568733692169
epoch : 5, Loss : 31.8864688873291, time : 265.57766008377075
epoch : 6, Loss : 31.76308250427246, time : 265.0076003074646
epoch : 7, Loss : 31.24744415283203, time : 265.16882514953613
epoch : 8, Loss : 29.340274810791016, time : 265.73448038101196
epoch : 9, Loss : 25.922008514404297, time : 267.91367626190186
epoch : 10, Loss : 23.59230613708496, time : 266.9004054069519

ํ•™์Šต์‹œํ‚จ ๊ฐ€์ค‘์น˜๋ฅผ ์ €์žฅํ•˜๊ณ  ์‹ถ๋‹ค๋ฉด, torch.save๋ฅผ ์ด์šฉํ•˜์—ฌ ์ €์žฅํ•ด๋‘๊ณ  ๋‚˜์ค‘์— ์–ธ์ œ๋“ ์ง€ ๋ถˆ๋Ÿฌ์™€ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

torch.save(model.state_dict(),f'model_{num_epochs}.pt')
model.load_state_dict(torch.load(f'model_{num_epochs}.pt'))
<All keys matched successfully>
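
If you later reload the saved weights in a fresh session, a minimal sketch (assuming model_10.pt is in the working directory) is to rebuild the architecture first and then load the state dict:

# Optional sketch: reload the saved weights in a new session
model = get_model_instance_segmentation(4)  # rebuild the same architecture
model.load_state_dict(torch.load(f'model_{num_epochs}.pt', map_location=device))
model.to(device)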

5.6 ์˜ˆ์ธกยถ

๋ชจ๋ธ ํ•™์Šต์ด ๋๋‚ฌ์œผ๋ฉด ์ž˜ ํ•™์Šต๋˜์—ˆ๋Š”์ง€ ์˜ˆ์ธก ๊ฒฐ๊ณผ๋ฅผ ํ™•์ธํ•ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. ์˜ˆ์ธก๊ฒฐ๊ณผ์—๋Š” ๋ฐ”์šด๋”ฉ ๋ฐ•์Šค์˜ ์ขŒํ‘œ(boxes)์™€ ํด๋ž˜์Šค(labels), ์ ์ˆ˜(scores)๊ฐ€ ํฌํ•จ๋ฉ๋‹ˆ๋‹ค. ์ ์ˆ˜(scores)์—๋Š” ํ•ด๋‹น ํด๋ž˜์Šค์˜ ์‹ ๋ขฐ๋„ ๊ฐ’์ด ์ €์žฅ๋˜๋Š”๋ฐ threshold๋กœ 0.5 ์ด์ƒ์ธ ๊ฒƒ๋งŒ ์ถ”์ถœํ•˜๋„๋ก ํ•จ์ˆ˜make_prediction๋ฅผ ์ •์˜ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  test_data_loader์˜ ์ฒซ๋ฒˆ์งธ ๋ฐฐ์น˜์— ๋Œ€ํ•ด์„œ๋งŒ ๊ฒฐ๊ณผ๋ฅผ ์ถœ๋ ฅํ•ด๋ณด์•˜์Šต๋‹ˆ๋‹ค.

def make_prediction(model, img, threshold):
    model.eval()
    preds = model(img)
    for id in range(len(preds)) :
        idx_list = []

        for idx, score in enumerate(preds[id]['scores']) :
            if score > threshold : 
                idx_list.append(idx)

        preds[id]['boxes'] = preds[id]['boxes'][idx_list]
        preds[id]['labels'] = preds[id]['labels'][idx_list]
        preds[id]['scores'] = preds[id]['scores'][idx_list]

    return preds

with torch.no_grad(): 
    # test set batch size = 2
    for imgs, annotations in test_data_loader:
        imgs = list(img.to(device) for img in imgs)

        pred = make_prediction(model, imgs, 0.5)
        print(pred)
        break
[{'boxes': tensor([[117.7811,   1.4936, 132.9596,  18.4192],
        [214.8204,  59.8669, 249.7893,  97.6275]], device='cuda:0'), 'labels': tensor([2, 2], device='cuda:0'), 'scores': tensor([0.9430, 0.9414], device='cuda:0')}, {'boxes': tensor([[218.8598,  99.3362, 260.0332, 138.8516],
        [130.5172, 109.1189, 179.2908, 152.5566],
        [ 29.2499,  88.7732,  45.5664, 104.5635],
        [ 40.9168, 109.1093,  67.3653, 140.0567],
        [165.5889,  90.0294, 179.4471, 109.1606],
        [ 83.7276,  84.3918,  94.5928,  96.4693],
        [302.4648, 130.4534, 332.0580, 158.8674],
        [258.4624,  90.7134, 269.2498, 102.2883],
        [  2.8419, 103.6409,  21.9580, 125.5492]], device='cuda:0'), 'labels': tensor([2, 2, 1, 1, 1, 1, 1, 1, 1], device='cuda:0'), 'scores': tensor([0.9962, 0.9918, 0.9900, 0.9894, 0.9891, 0.9653, 0.9652, 0.9573, 0.9046],
       device='cuda:0')}]

์˜ˆ์ธก๋œ ๊ฒฐ๊ณผ๋ฅผ ์ด์šฉํ•˜์—ฌ ์ด๋ฏธ์ง€ ์œ„์— ๋ฐ”์šด๋”ฉ ๋ฐ•์Šค๋ฅผ ๊ทธ๋ ค๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. ์œ„์—์„œ ์ •์˜ํ•œ plot_image_from_output ํ•จ์ˆ˜๋กœ ๊ทธ๋ฆผ์„ ์ถœ๋ ฅํ•ฉ๋‹ˆ๋‹ค. Target์ด ์‹ค์ œ ๋ฐ”์šด๋”ฉ ๋ฐ•์Šค ์œ„์น˜์ด๋ฉฐ Prediction์ด ๋ชจ๋ธ์˜ ์˜ˆ์ธก ๊ฒฐ๊ณผ์ž…๋‹ˆ๋‹ค. ๋ชจ๋ธ์ด ์‹ค์ œ ๋ฐ”์šด๋”ฉ ๋ฐ•์Šค์˜ ์œ„์น˜๋ฅผ ์ž˜ ์ฐพ์€ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

_idx = 1
print("Target : ", annotations[_idx]['labels'])
plot_image_from_output(imgs[_idx], annotations[_idx])
print("Prediction : ", pred[_idx]['labels'])
plot_image_from_output(imgs[_idx], pred[_idx])
Target :  tensor([1, 1, 1, 2, 2, 1, 1, 1])
[Figure: test image with ground-truth bounding boxes (Ch5-Faster-R-CNN_38_1.png)]
Prediction :  tensor([2, 2, 1, 1, 1, 1, 1, 1, 1], device='cuda:0')
[Figure: test image with predicted bounding boxes (Ch5-Faster-R-CNN_38_3.png)]

์ด๋ฒˆ์—” ์ „์ฒด ์‹œํ—˜ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด์„œ ์˜ˆ์ธก ๊ฒฐ๊ณผ๋ฅผ ํ‰๊ฐ€ํ•ด๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. ๋จผ์ € ๋ชจ๋“  ์‹œํ—˜ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ์˜ˆ์ธก ๊ฒฐ๊ณผ์™€ ์‹ค์ œ label์„ ๊ฐ๊ฐ preds_adj_all, annot_all์— ๋‹ด์•„์ค๋‹ˆ๋‹ค.

from tqdm import tqdm

labels = []
preds_adj_all = []
annot_all = []

for im, annot in tqdm(test_data_loader, position = 0, leave = True):
    im = list(img.to(device) for img in im)
    #annot = [{k: v.to(device) for k, v in t.items()} for t in annot]

    for t in annot:
        labels += t['labels']

    with torch.no_grad():
        preds_adj = make_prediction(model, im, 0.5)
        preds_adj = [{k: v.to(torch.device('cpu')) for k, v in t.items()} for t in preds_adj]
        preds_adj_all.append(preds_adj)
        annot_all.append(annot)
100%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 85/85 [00:25<00:00,  3.34it/s]

๊ทธ๋ฆฌ๊ณ  Tutorial-Book-Utils ํด๋” ๋‚ด์— ์žˆ๋Š” utils_ObjectDetection.py ํŒŒ์ผ์„ ํ†ตํ•ด์„œ mAP ๊ฐ’์„ ์‚ฐ์ถœํ•ฉ๋‹ˆ๋‹ค. get_batch_statistics ํ•จ์ˆ˜๋ฅผ ํ†ตํ•ด IoU(Intersection of Union) ์กฐ๊ฑด์„ ๋งŒ์กฑํ•˜๋Š” ๋ฐ”์šด๋”ฉ ๋ฐ•์Šค๊ฐ„์˜ ํ†ต๊ณ—๊ฐ’์„ ๊ณ„์‚ฐํ›„ ap_per_class ํ•จ์ˆ˜๋ฅผ ํ†ตํ•ด ๊ฐ ํด๋ž˜์Šค์— ๋Œ€ํ•œ AP๊ฐ’์„ ๊ณ„์‚ฐํ•ด์ค๋‹ˆ๋‹ค.

%cd Tutorial-Book-Utils/
import utils_ObjectDetection as utils
/content/Tutorial-Book-Utils

sample_metrics = []
for batch_i in range(len(preds_adj_all)):
    sample_metrics += utils.get_batch_statistics(preds_adj_all[batch_i], annot_all[batch_i], iou_threshold=0.5) 

true_positives, pred_scores, pred_labels = [torch.cat(x, 0) for x in list(zip(*sample_metrics))]  # concatenate statistics across all batches
precision, recall, AP, f1, ap_class = utils.ap_per_class(true_positives, pred_scores, pred_labels, torch.tensor(labels))
mAP = torch.mean(AP)
print(f'mAP : {mAP}')
print(f'AP : {AP}')
mAP : 0.7182363990382057
AP : tensor([0.8694, 0.9189, 0.3664], dtype=torch.float64)

AP๊ฐ’์€ background ํด๋ž˜์Šค๋ฅผ ์ œ์™ธํ•œ ์‹ค์ œ 3๊ฐœ์˜ ํด๋ž˜์Šค์— ๋Œ€ํ•ด์„œ๋งŒ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค. 10๋ฒˆ๋งŒ ํ•™์Šตํ–ˆ์Œ์—๋„ ๋ถˆ๊ตฌํ•˜๊ณ  4์žฅ์˜ RetinaNet ๊ฒฐ๊ณผ๋ณด๋‹ค ํ–ฅ์ƒ๋œ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํŠนํžˆ๋‚˜ 1๋ฒˆ ํด๋ž˜์Šค์ธ ๋งˆ์Šคํฌ ์ฐฉ์šฉ ๊ฐ์ฒด์— ๋Œ€ํ•ด์„œ๋Š” 0.9189 AP์— ํ•ด๋‹นํ•˜๋Š” ์ •ํ™•๋„๊นŒ์ง€ ๋ณด์ด๊ณ  2๋ฒˆ ํด๋ž˜์Šค์ธ ๋งˆ์Šคํฌ๋ฅผ ์ œ๋Œ€๋กœ ์ฐฉ์šฉํ•˜๊ณ  ์žˆ์ง€ ์•Š๋Š” ๊ฐ์ฒด์—์„œ๋„ 0.3664 AP๋ฅผ ๋ณด์ด๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. RetinaNet์ด FPN๊ณผ Focal loss๋กœ one-stage method์ž„์—๋„ ๋†’์€ ์„ฑ๋Šฅ์„ ๋ณด์ธ๋‹ค๊ณ  ์ผ๋ฐ˜์ ์œผ๋กœ ์•Œ๋ ค์ ธ ์žˆ์Šต๋‹ˆ๋‹ค. ๋ฌผ๋ก  ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ ํŠœ๋‹์„ ํ†ตํ•ด RetinaNet์˜ ์„ฑ๋Šฅ์„ ์ตœ์ ํ™” ํ•ด๋„ ๋˜๊ฒ ์ง€๋งŒ, ํ˜„์žฌ ์‹คํ—˜ ๊ฒฐ๊ณผ๋กœ ๋ฏธ๋ค„๋ดค์„ ๋•Œ ์ด ๋ฐ์ดํ„ฐ์…‹์—๋Š” Faster-RCNN์ด ๋” ์ข‹์€ ์„ฑ๋Šฅ์„ ๋ณด์ด๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

์ด์ƒ์œผ๋กœ ์˜๋ฃŒ์šฉ ๋งˆ์Šคํฌ ํƒ์ง€ ํŠœํ† ๋ฆฌ์–ผ์„ ๋งˆ์น˜๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. ์ด๋ฒˆ ํŠœํ† ๋ฆฌ์–ผ์„ ํ†ตํ•ด์„œ ๋ฐ์ดํ„ฐ์…‹์„ ์ „์ฒ˜๋ฆฌํ•˜๋Š” ๊ฒƒ๋ถ€ํ„ฐ ๋ชจ๋ธ์„ ํ•™์Šตํ•˜๊ณ  ์˜ˆ์ธกํ•˜๋Š” ๊ฒƒ๊นŒ์ง€ ์ง„ํ–‰ํ•ด๋ณด์•˜์Šต๋‹ˆ๋‹ค. ๋” ์ข‹์€ ์„ฑ๋Šฅ์„ ๋‚ด๊ธฐ ์œ„ํ•ด์„œ๋Š” ํ•™์Šต ํšŸ์ˆ˜๋ฅผ ๋Š˜๋ฆฌ๊ฑฐ๋‚˜ ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ ํŠœ๋‹์„ ํ•ด๋ณด๋Š” ๋ฐฉ๋ฒ•๋„ ์žˆ์Šต๋‹ˆ๋‹ค. ์ž์‹ ์ด ์›ํ•˜๋Š” ๋ฐ์ดํ„ฐ์— ๊ฐ์ฒด ํƒ์ง€ ๋ชจ๋ธ์„ ์ž์œ ๋กญ๊ฒŒ ํ™œ์šฉํ•ด๋ณด์‹œ๊ธฐ ๋ฐ”๋ž๋‹ˆ๋‹ค.