4. Wh- Question
In Chapter 3 we generated True/False questions for each passage. In this chapter we will generate Wh- questions with a T5 model and a BERT model. According to the Cambridge Dictionary, Wh- questions are questions that begin with what, when, where, who, whom, which, whose, why, or how. In this chapter we will generate such Wh- questions in a multiple-choice format.
Section 4.1 loads the essay data used as model input, and Section 4.2 builds a function that extracts answer words. Section 4.3 defines a class that generates Wh- questions, Section 4.4 a class that evaluates the generated questions, and Section 4.5 a class that generates distractors. Finally, Section 4.6 uses these classes to generate Wh- questions from actual essay passages.
4.1 Downloading the Dataset
Before generating questions, we load the dataset saved in Chapter 2. This is the dataset that was preprocessed and saved in its corrected form. You can either load the file you saved yourself in Chapter 2 or load the copy stored in the Pseudo-Lab GitHub repository. The CoNLL+BEA_corrected_essays.pkl file contains a total of 170 passages, and these passages are used as the source texts for the Wh- questions.
!git clone https://github.com/Pseudo-Lab/Tutorial-Book-Utils
!python Tutorial-Book-Utils/PL_data_loader.py --data NLP-QG
Cloning into 'Tutorial-Book-Utils'...
remote: Enumerating objects: 30, done.
remote: Counting objects: 100% (30/30), done.
remote: Compressing objects: 100% (24/24), done.
remote: Total 30 (delta 9), reused 18 (delta 5), pack-reused 0
Unpacking objects: 100% (30/30), done.
CoNLL+BEA_corrected_essays.pkl is done!
import pickle
file_name = "CoNLL+BEA_corrected_essays.pkl"
open_file = open(file_name, "rb")
data = pickle.load(open_file)
open_file.close()
len(data)
170
Let's print the first passage as an example.
print(data[0])
Keeping the Secret of Genetic Testing What is genetic risk? Genetic risk refers to your chance of inheriting a disorder or disease. People get certain diseases because of genetic changes. How much a genetic change tells us about your chance of developing a disorder is not always clear. If your genetic results indicate that you have gene changes associated with an increased risk of heart disease, it does not mean that you definitely will develop heart disease. The opposite is also true. If your genetic results show that you do not have changes associated with an increased risk of heart disease, it is still possible that you develop heart disease. However, for some rare diseases, people who have certain gene changes are guaranteed to develop the disease. When we are diagnosed with certain genetic diseases, are we suppose to disclose this result to our relatives? My answer is no. On one hand, we do not want this potential danger havingfrightening effects in our families' later lives. When people around us know that we have certain diseases, their attitude will easily change, whether caring for us too much or keeping away from us. And both are not what we want since most of us just want to live as normal people. Surrounded by such concerns, it is very likely that we are distracted and worry about these problems. It is a concern that will be with us during our whole life, because we never know when the ''potential bomb'' will explode. On the other hand, if there are ways that can help us to control or cure the disease, we can go through these processes from the scope of the whole family. For example, if exercising is helpful reducing family potential disease, we can always look for more chances for the family to do exerciseso we keep track of all family members health conditions. At the same time, we are prepared to know when there are other members who have got this disease. Here I want to share Forests'sview on this issue. Although some people feel that an individual who is found to carry a dominant gene for Huntington's disease has an ethical obligation to disclose that fact to his or her siblings, there currently is no legal requirement to do so. In fact, requiring someone to communicate his or her own genetic risk to family members who are therefore also at risk is considered by many to be ethically dubious." Nothing is absolutely right or wrong. If a certain genetic test is very accurate and it is unavoidable and necessary to get treatment and tell others, it is OK to disclose the result. Above all, life is more important than secrets.
Before defining any functions, let's import the packages used throughout Chapter 4. Depending on your environment some of them may be missing, so please check and install them as needed. (The Colab environment does not come with the benepar package, so we install it with the code below.) benepar and nltk are packages for text parsing and tokenization, pandas provides the basic data frame, and numpy is used for numerical operations. torch is the model-building framework, and the transformers package ships with a variety of tokenizers and pretrained models.
!pip install -q benepar
import benepar
benepar.download('benepar_en3')
benepar_parser = benepar.Parser("benepar_en3")
import nltk
nltk.download('punkt')
from nltk import sent_tokenize, word_tokenize
import pandas as pd
import numpy as np
import torch
from transformers import (
AutoTokenizer,
AutoModelForSeq2SeqLM,
AutoModelForSequenceClassification,
pipeline
)
[nltk_data] Downloading package benepar_en3 to /root/nltk_data...
[nltk_data] Unzipping models/benepar_en3.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Unzipping tokenizers/punkt.zip.
Before moving on to modeling, we can check whether a GPU is available in the current runtime. If cuda is printed, a GPU can be used; if there is no GPU, everything below also runs on the CPU.
print('cuda' if torch.cuda.is_available() else 'cpu')
cuda
4.2 Extracting Answer Words
The first step in generating a Wh- question is to extract the words in the passage that will serve as answers. Each passage is parsed into constituency trees with benepar_parser, and only the spans labeled as noun phrases (NP) are extracted and stored in a list.
def get_flattened(t):
    # Join the leaves of a parse (sub)tree back into a single string
    sent_str_final = None
    if t is not None:
        sent_str = [" ".join(x.leaves()) for x in list(t)]
        sent_str_final = [" ".join(sent_str)]
        sent_str_final = sent_str_final[0]
    return sent_str_final

def get_NP(doc):
    # Parse every sentence of the passage and collect all noun-phrase (NP) spans
    answers = []
    trees = benepar_parser.parse_sents(sent_tokenize(doc))
    for sent_idx, tree in enumerate(trees):
        subtrees = tree.subtrees()
        for subtree in subtrees:
            if subtree.label() == "NP":
                answers.append(get_flattened(subtree))
    return answers
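As a quick sanity check (not part of the original notebook flow), you can run the extractor on the first passage and look at a few of the candidate answers it returns:

# Illustrative check: extract NP candidates from the first passage
candidate_answers = get_NP(data[0])
print(len(candidate_answers))   # number of noun phrases found
print(candidate_answers[:5])    # first few noun-phrase candidates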
4.3 Defining the Question Generation Class
Next, let's define the class that generates Wh- questions. The answer words extracted above are fed, together with the passage, into a T5-based Seq2SeqLM model to produce a question. The tokenizer and model used here are pretrained models provided by Hugging Face. The AutoModel API automatically detects which architecture the requested pretrained weights belong to and instantiates it: given the path of a T5 checkpoint it builds a T5 model and loads those weights, and given the path of a BERT checkpoint it builds a BERT model and loads those weights instead. In this QuestionGenerator class we load iarfmoose/t5-base-question-generator.
class QuestionGenerator():
    def __init__(self):
        QG_PRETRAINED = "iarfmoose/t5-base-question-generator"
        self.ANSWER_TOKEN = "<answer>"
        self.CONTEXT_TOKEN = "<context>"
        self.SEQ_LENGTH = 512

        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

        self.qg_tokenizer = AutoTokenizer.from_pretrained(QG_PRETRAINED, use_fast=False)
        self.qg_model = AutoModelForSeq2SeqLM.from_pretrained(QG_PRETRAINED)
        self.qg_model.to(self.device)
        self.qg_model.eval()

    def generate_question(self, answers, passage):
        # Build one "<answer> ... <context> ..." input per candidate answer and
        # let the T5 model generate a question for each of them.
        questions = []
        for ans in answers:
            qg_input = "{} {} {} {}".format(self.ANSWER_TOKEN, ans, self.CONTEXT_TOKEN, passage)
            encoded_input = self.qg_tokenizer(qg_input, padding='max_length', max_length=self.SEQ_LENGTH,
                                              truncation=True, return_tensors="pt").to(self.device)
            with torch.no_grad():
                output = self.qg_model.generate(input_ids=encoded_input["input_ids"])
            question = self.qg_tokenizer.decode(output[0], skip_special_tokens=True)
            questions.append(question)
        return questions
q_generator = QuestionGenerator()
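Before running the full pipeline, a small illustrative check (the sample_* names are ours, not part of the notebook) shows what the generator returns for a few extracted noun phrases:

# Illustrative check: generate questions for a few NP candidates from the first passage
sample_passage = data[0]
sample_answers = get_NP(sample_passage)[:3]
sample_questions = q_generator.generate_question(sample_answers, sample_passage)
for a, q in zip(sample_answers, sample_questions):
    print(q, "->", a)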
4.4 Defining the Question Evaluation Class
Next, let's write a class that evaluates the generated question-answer pairs. A Sequence Classification model scores how semantically well each question matches its answer. The QAEvaluator class uses iarfmoose/bert-base-cased-qa-evaluator.
class QAEvaluator():
    def __init__(self):
        QAE_PRETRAINED = "iarfmoose/bert-base-cased-qa-evaluator"
        self.SEQ_LENGTH = 512

        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

        self.qae_tokenizer = AutoTokenizer.from_pretrained(QAE_PRETRAINED)
        self.qae_model = AutoModelForSequenceClassification.from_pretrained(QAE_PRETRAINED)
        self.qae_model.to(self.device)

    def encode_qa_pairs(self, questions, answers):
        # Encode each (question, answer) pair as a single sequence-pair input
        encoded_pairs = []
        for i in range(len(questions)):
            encoded_qa = self.qae_tokenizer(text=questions[i], text_pair=answers[i], padding="max_length",
                                            max_length=self.SEQ_LENGTH, truncation=True, return_tensors="pt")
            encoded_pairs.append(encoded_qa.to(self.device))
        return encoded_pairs

    def get_scores(self, encoded_qa_pairs):
        # Score every pair and return the pair indices sorted from best to worst
        scores = {}
        self.qae_model.eval()
        with torch.no_grad():
            for i in range(len(encoded_qa_pairs)):
                scores[i] = self.qae_model(**encoded_qa_pairs[i])[0][0][1]
        return [k for k, v in sorted(scores.items(), key=lambda item: item[1], reverse=True)]
qa_evaluator = QAEvaluator()
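Continuing the illustrative check above, the evaluator can rank the sampled question-answer pairs (again, the sample_* variables are only for illustration):

# Illustrative ranking; the first index in the result is the best-scoring pair
sample_encoded = qa_evaluator.encode_qa_pairs(sample_questions, sample_answers)
ranking = qa_evaluator.get_scores(sample_encoded)
best = ranking[0]
print(sample_questions[best], "->", sample_answers[best])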
4.5 Defining the Distractor Generation Class
Finally, let's define the class that generates distractors. Apart from the answer word extracted in Section 4.2, three of the four answer choices need to be distractors. To create them, the first noun of the answer is located and masked inside the sentence that contains the answer, and a BERT fill-mask model generates a different word for that position. The DistractorGenerator class uses bert-base-cased.
class DistractorGenerator():
    def __init__(self):
        self.unmasker = pipeline('fill-mask', model='bert-base-cased')

    def generate_distractor(self, text, candidate, answers, NNs: list):
        # Mask the first noun of the answer inside its sentence, then replace it
        # with the `candidate`-th fill-mask prediction to build a distractor.
        divided = word_tokenize(text)
        substitute_word = NNs[0]
        mask_index = divided.index(substitute_word)
        divided.pop(mask_index)
        divided.insert(mask_index, '[MASK]')
        text = ' '.join(divided)
        unmasked_result = self.unmasker(text, top_k=10)[candidate]
        text = unmasked_result["sequence"]
        answers = answers.split(' ')
        answer_index = answers.index(substitute_word)
        answers.pop(answer_index)
        answers.insert(answer_index, unmasked_result["token_str"])
        return " ".join(answers)

def get_NN(distractor):
    # Collect the nouns (and verbs, for edge cases) appearing in the answer phrase
    NNs = []
    tree = benepar_parser.parse(distractor)
    subtrees = tree.subtrees()
    for subtree in subtrees:
        if subtree.label() in ["NN", "NNP", "NNS", "VB"]:  # VB for edge case
            NNs.extend(subtree.leaves())
    return NNs
d_generator = DistractorGenerator()
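As a brief illustration of how the two pieces fit together (best_answer and its_sentence below are made-up stand-ins for the values selected in Section 4.6; the answer phrase mirrors one that appears in the results table):

# Illustrative example: mask the noun of the answer and take the 10th fill-mask candidate
best_answer = "valuable experience"
its_sentence = "I hope to get valuable experience from this job."
nouns = get_NN(best_answer)        # nouns found in the answer phrase
fake = d_generator.generate_distractor(its_sentence, 9, best_answer, nouns)
print(fake)                        # "valuable ..." with the noun replaced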
4.6 Generating Wh- Questions
Now let's use the classes defined so far to generate Wh- questions for the passages. First we create a data frame to store the generated questions and answer choices, and then select 20 passages to use (the 20 passage IDs were chosen arbitrarily).
df_WHQuestions = pd.DataFrame({'id': np.zeros(20),
'passage': np.zeros(20),
'question': np.zeros(20),
'distractor_1': np.zeros(20),
'distractor_2': np.zeros(20),
'distractor_3': np.zeros(20),
'distractor_4': np.zeros(20)})
passage_id_list = [163, 28, 62, 57, 35, 26, 22, 151, 108, 55, 59, 129, 167, 143, 50, 161, 107, 56, 114, 71]
The classes defined above are applied in order. Along the way there is a step that keeps only answers of at most four words, and a step that finds the sentence containing the answer before the distractors are generated. The final question, answer, and distractors are then stored in the data frame.
df_idx = 0

for passage_id in passage_id_list:
    passage = data[passage_id]
    answers = get_NP(passage)
    questions = q_generator.generate_question(answers, passage)
    encoded_qa_pairs = qa_evaluator.encode_qa_pairs(questions, answers)
    scores = qa_evaluator.get_scores(encoded_qa_pairs)

    # Keep only answers that are at most 4 words long (len() <= 4)
    for i in range(len(scores)):
        index = scores[i]
        if len(answers[index].split(' ')) > 4:
            continue
        break

    # Find the sentence that contains the selected answer
    sentences = nltk.sent_tokenize(passage)
    for sentence in sentences:
        if answers[index] in sentence:
            target_sentence = sentence

    # Generate three distractors from the fill-mask candidates
    NNs = get_NN(answers[index])
    distractors = []
    for i in range(3):
        distractors.append(d_generator.generate_distractor(target_sentence, 9-i, answers[index], NNs))

    df_WHQuestions.loc[df_idx] = [passage_id, passage, questions[index].split("?")[0] + "?", answers[index]] + distractors
    print(f"finished {passage_id}")
    df_idx += 1
The final results are shown below. Note that, because the correct answer is stored in the first choice slot, the distractor_1 column actually holds the answer and the remaining columns hold the three generated distractors.
df_WHQuestions
|  | id | passage | question | distractor_1 | distractor_2 | distractor_3 | distractor_4 |
|---|---|---|---|---|---|---|---|
0 | 163.0 | The waters of the culinary seas had been calm ... | What is the best definition of a mother who ca... | A mother who works | A cook who works | A kid who works | A waitress who works |
1 | 28.0 | The world is increasingly becoming flat with a... | What are the disadvantages of social network s... | The cyber communication | The cyber network | The cyber environment | The cyber community |
2 | 62.0 | The best places for y... | What country is the best place to visit for yo... | a different country | a different village | a different generation | a different town |
3 | 57.0 | Puerquitour: A great experience for your mouth... | Where did we go to watch the movie Renoir? | a movie theatre | a new theatre | a big theatre | a pizza theatre |
4 | 35.0 | Nowadays, social media sites are commonly used... | What are the advantages of using social media ... | Substantial costs | Substantial parts | Substantial events | Substantial frames |
5 | 26.0 | Interpersonal skills, like any other skills re... | What are the two most popular social media app... | Skype and Facetime | Word and Facetime | Flash and Facetime | email and Facetime |
6 | 22.0 | Nowadays, with the advancement of technology, ... | What is the reason why a person may not have a... | anymore contact | anymore interactions | anymore dealings | anymore sex |
7 | 151.0 | In this century there have been many technolog... | What is the most important aspect of television? | The entertainment aspect | The technical aspect | The broadcasting aspect | The political aspect |
8 | 108.0 | I met a friend about one week ago, and he aske... | What is the feeling that you have now? | an awful feeling | an awful question | an awful think | an awful thinking |
9 | 55.0 | Dear Sir or Madam,\nI am writing to apply for ... | What do you hope to get from this job? | valuable experience | valuable ##s | valuable tips | valuable time |
10 | 59.0 | Anna knew that it was going to be a very speci... | What was the day she was going to meet her mot... | a very special day | a very special experience | a very special week | a very special one |
11 | 129.0 | On Britain's roads there is an ever-increasing... | What would be the easiest solution to this may... | an easy solution | an easy exit | an easy path | an easy reaction |
12 | 167.0 | According the Lunde, 35% of homicide victims a... | How many statistics are greater than Lundes' t... | Statistics from 56 | Appeals from 56 | quotes from 56 | gains from 56 |
13 | 143.0 | "In Vitro fertilisation" is the fertilisation ... | What is the definition of a woman who is given... | a post-menopausal woman | a post-menopausal man | a post-menopausal mother | a post-menopausal survivor |
14 | 50.0 | Dear Mrs. Ashby, \n\nYesterday I was in Green ... | What kind of food do you like to serve? | Italian pasta | Italian sandwiches | Italian sauce | Italian restaurants |
15 | 161.0 | Computers have definitely affected peoples liv... | What program does he use to make the calculati... | the Communications program | the Communications software | the Communications Unit | the Communications System |
16 | 107.0 | Cricket is my passion. I love playing, watchin... | What is the best way to learn more about cricket? | more about bowling | more about India | more about themselves | more about life |
17 | 56.0 | Well, I would like to talk about my school lif... | What is the best place to study in the UK? | university | UCLA | USC | Purdue |
18 | 114.0 | I have been learning English as a second langu... | How long ago did I decide to take the Cambridg... | One year | One hour | One decade | One weekend |
19 | 71.0 | Glad to hear that you've been invited to att... | How long will you wait for a candidate who's l... | one more minute | one more person | one more night | one more opportunity |
Looking at the generated questions and choices, the model seems to do reasonably well grammatically. However, because the choices were built by simply extracting and swapping nouns, some questions are not particularly central to the passage and some choices look out of place. Selecting the answers and distractors more carefully should yield more meaningful questions.
So far we have generated Wh- questions with pretrained models provided by Hugging Face. If you have a QA dataset such as SQuAD, you can also fine-tune these models on the domain you care about to generate more specialized questions.
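As a rough sketch of what preparing such fine-tuning data could look like (assuming the standard SQuAD v1.1 JSON layout and reusing the <answer>/<context> input format of the generator above; the file name is a placeholder):

import json

# Hypothetical path to a SQuAD-style file; replace with your own dataset
with open("train-v1.1.json") as f:
    squad = json.load(f)

pairs = []  # (source text, target question) pairs for seq2seq fine-tuning
for article in squad["data"]:
    for paragraph in article["paragraphs"]:
        context = paragraph["context"]
        for qa in paragraph["qas"]:
            answer = qa["answers"][0]["text"]
            source = "<answer> {} <context> {}".format(answer, context)
            pairs.append((source, qa["question"]))

print(len(pairs), pairs[0])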