
2025 SCG

20250514_Studying with ChatGPT_Day12

mawarikei 2025. 5. 14. 21:00

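Alright, today I've prepared advanced problems on the Hugging Face Transformers library. The earlier sessions covered basic examples such as loading models, classification, summarization, and translation. Today's problems are hands-on: you'll use the models directly to carry out custom tasks.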

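✅ Today's goals

Chaining sentence summarization with sentiment analysis
Context-based QA for user questions
Similarity comparison using sentence embeddings
Building a practical translator with error handling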

🧠 Problem 01: Filtering summaries by sentiment

Summarize each of the given texts, then filter the summaries and print only the positive ones.

from transformers import pipeline

summarizer = pipeline("summarization")
classifier = pipeline("sentiment-analysis")

texts = [
    "This movie had breathtaking visuals and a compelling story. I was deeply moved.",
    "The plot made no sense, and the acting was terrible. A total waste of time.",
    "Fantastic direction and well-developed characters. Highly recommend.",
    "The pacing was slow, and I fell asleep halfway through."
]

# 1. Summarize each text
# 2. Run sentiment analysis on each summary
# 3. Print only the positive summaries
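The filtering step (3) only depends on the shape of the pipeline output. A minimal sketch, assuming `classifier` is called on a list of summaries so it returns one `{"label": ..., "score": ...}` dict per summary; `filter_positive`, its `min_score` parameter, and the sample results below are hypothetical stand-ins, not real model output:

```python
def filter_positive(summaries, sentiment_results, min_score=0.0):
    """Keep the summaries whose sentiment label is POSITIVE.

    sentiment_results is assumed to have the shape returned by
    classifier(summaries): one {"label": ..., "score": ...} dict per summary.
    min_score is a hypothetical knob for dropping low-confidence predictions.
    """
    if len(summaries) != len(sentiment_results):
        raise ValueError("summaries and sentiment_results must be the same length")
    return [
        text
        for text, result in zip(summaries, sentiment_results)
        if result["label"] == "POSITIVE" and result["score"] >= min_score
    ]


# Hand-written stand-ins for the pipeline outputs (not real model results)
summaries = ["Breathtaking visuals.", "A total waste of time.", "Highly recommend."]
results = [
    {"label": "POSITIVE", "score": 0.99},
    {"label": "NEGATIVE", "score": 0.98},
    {"label": "POSITIVE", "score": 0.55},
]
print(filter_positive(summaries, results))                 # ['Breathtaking visuals.', 'Highly recommend.']
print(filter_positive(summaries, results, min_score=0.9))  # ['Breathtaking visuals.']
```

Keeping the filter as a plain function means it can be checked without downloading a model; in the real pipeline you would pass the summaries and `classifier(summaries)`.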

🔍 Problem 02: Context-based QA for user questions

Build a system that answers a user's question based on a given context.

from transformers import pipeline

qa = pipeline("question-answering")

context = """
OpenAI is a research and deployment company. Our mission is to ensure that artificial general intelligence benefits all of humanity.
We are governed by a nonprofit and unique capped-profit model. We build safe and powerful AI systems.
"""

question = "What is OpenAI's mission?"

# Infer the answer and print it
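For a single question, the `question-answering` pipeline returns a dict with `score`, `start`, `end`, and `answer`, where `start`/`end` are character offsets into `context`. A small sketch that returns a fallback message when the score is low; `answer_or_fallback`, the `0.3` threshold, and the hand-written `fake_result` are assumptions for illustration, not values produced by a model:

```python
def answer_or_fallback(result, context, min_score=0.3,
                       fallback="No confident answer found."):
    """Return the answer span, or a fallback message if the model is unsure.

    result is assumed to look like the question-answering pipeline output
    for one question: {"score": ..., "start": ..., "end": ..., "answer": ...},
    with start/end being character offsets into context.
    min_score=0.3 is an arbitrary placeholder, not a recommended value.
    """
    if result["score"] < min_score:
        return fallback
    # Slice the answer straight out of the original context via the offsets
    return context[result["start"]:result["end"]]


context = ("OpenAI is a research and deployment company. Our mission is to ensure "
           "that artificial general intelligence benefits all of humanity.")
answer = "to ensure that artificial general intelligence benefits all of humanity"
start = context.find(answer)

# Hand-written stand-in for qa(question=..., context=...); the score is made up
fake_result = {"score": 0.8, "start": start, "end": start + len(answer), "answer": answer}

print(answer_or_fallback(fake_result, context))
print(answer_or_fallback(dict(fake_result, score=0.05), context))  # No confident answer found.
```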

🔁 Problem 03: Embedding-based similarity comparison

Write code that finds the two most similar sentences among all sentence pairs.

from transformers import AutoTokenizer, AutoModel
import torch
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "I love machine learning.",
    "Artificial intelligence is fascinating.",
    "I went to the grocery store yesterday.",
    "Deep learning models can recognize images."
]

# 1. Embed each sentence with BERT
# 2. Compute the similarity between the embedding vectors
# 3. Print the most similar sentence pair
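Step 2 boils down to cosine similarity, cos(a, b) = (a · b) / (‖a‖ ‖b‖), which is what `sklearn.metrics.pairwise.cosine_similarity` computes for every pair of rows. A pure-Python sketch of steps 2 and 3, using made-up 3-dimensional vectors as stand-ins for the 768-dimensional BERT vectors; `cosine` and `most_similar_pair` are hypothetical helper names:

```python
import math
from itertools import combinations


def cosine(a, b):
    """cos(a, b) = (a . b) / (|a| * |b|); assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar_pair(vectors):
    """Return (i, j, similarity) for the pair i < j with the highest cosine."""
    # combinations() yields each unordered pair once and never pairs a vector with itself
    i, j = max(
        combinations(range(len(vectors)), 2),
        key=lambda pair: cosine(vectors[pair[0]], vectors[pair[1]]),
    )
    return i, j, cosine(vectors[i], vectors[j])


# Toy vectors standing in for real sentence embeddings
vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
print(most_similar_pair(vectors))
```

Because cosine similarity divides by each vector's L2 norm, no separate normalization step is needed before comparing.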

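🌐 Problem 04: Translator error handling and multilingual support

If translation fails, fall back to a backup translator; if translation is impossible altogether, handle the failure gracefully.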

from transformers import pipeline

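src_text = "이 서비스는 매우 훌륭하고 사용자 친화적입니다."  # "This service is excellent and user-friendly."

try:
    # Primary translation (ko → en)
    translator = pipeline("translation", model="Helsinki-NLP/opus-mt-ko-en")
    result = translator(src_text)
    print("Translation:", result[0]["translation_text"])
except Exception as e:
    print("Primary translation failed, using backup model:", e)
    try:
        # Backup translator
        translator = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-big-ko-en")  # example
        result = translator(src_text)
        print("Backup translation:", result[0]["translation_text"])
    except Exception as e2:
        # Catch Exception rather than using a bare except, so Ctrl+C still interrupts
        print("All translations failed. Check the input language:", e2)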
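The nested `try` blocks generalize to any number of fallbacks. A hedged sketch, assuming each translator is a callable that behaves like a translation pipeline (`fn(text) -> [{"translation_text": ...}]`); `translate_with_fallback` and the two stub translators are hypothetical helpers, used so the example runs without downloading anything:

```python
def translate_with_fallback(text, translators):
    """Try each translator in order and return the first successful result.

    translators is a list of (name, fn) pairs; each fn is assumed to behave
    like a translation pipeline: fn(text) -> [{"translation_text": ...}].
    Returns (name, translated_text, errors). name and translated_text are
    None if every translator failed; errors maps translator name -> exception.
    """
    errors = {}
    for name, fn in translators:
        try:
            output = fn(text)
            return name, output[0]["translation_text"], errors
        except Exception as exc:  # missing model, auth/network error, bad output, ...
            errors[name] = exc
    return None, None, errors


def broken_translator(text):
    # Simulates a model that cannot be loaded
    raise OSError("model not found")


def stub_translator(text):
    # Fixed output standing in for a real model
    return [{"translation_text": "This service is excellent and user-friendly."}]


name, translated, errors = translate_with_fallback(
    "이 서비스는 매우 훌륭하고 사용자 친화적입니다.",
    [("primary", broken_translator), ("backup", stub_translator)],
)
print(name, translated)  # backup This service is excellent and user-friendly.
```

With real models, an entry could look like `("opus-mt-ko-en", lambda t: pipeline("translation", model="Helsinki-NLP/opus-mt-ko-en")(t))`. That loads the model inside the `try`, so a missing model or an authentication error also moves on to the next fallback. Reloading on every call is slow, though, so caching the loaded pipelines would be worth considering.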

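📌 In practice, a translation model may not exist or you may hit authentication errors, so exception handling matters.

If you'd like, I can also provide explanations, reference solutions, or hints for each problem.

When you're ready, just say something like "Let's start with #1."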

# problem 01

from transformers import pipeline

# Pin an explicit summarization checkpoint instead of relying on the pipeline default
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
classifier = pipeline("sentiment-analysis")

texts = [
    "This movie had breathtaking visuals and a compelling story. I was deeply moved.",
    "The plot made no sense, and the acting was terrible. A total waste of time.",
    "Fantastic direction and well-developed characters. Highly recommend.",
    "The pacing was slow, and I fell asleep halfway through."
]

# 1. Summarize all texts in one batched call (short summaries of 3 to 14 tokens)
summary = summarizer(texts, max_length=14, min_length=3, do_sample=False)
summarized_text = [s["summary_text"] for s in summary]
print(summarized_text)

# 2. Sentiment analysis on the summaries; a list input returns one dict per item
sentiment_analysis_list = classifier(summarized_text)
print(sentiment_analysis_list)

# 3. Keep only the summaries labeled POSITIVE
positive_sentiment_list = [
    text
    for text, result in zip(summarized_text, sentiment_analysis_list)
    if result["label"] == "POSITIVE"
]
print(positive_sentiment_list)

# problem 02

qa = pipeline("question-answering")

context = """
OpenAI is a research and deployment company. Our mission is to ensure that artificial general intelligence benefits all of humanity.
We are governed by a nonprofit and unique capped-profit model. We build safe and powerful AI systems.
"""

question = "What is OpenAI's mission?"

result = qa(question=question, context=context)
print(result)

# problem 03

from itertools import combinations

import torch
from sklearn.metrics.pairwise import cosine_similarity
from transformers import AutoTokenizer, AutoModel

sentences = [
    "I love machine learning.",
    "Artificial intelligence is fascinating.",
    "I went to the grocery store yesterday.",
    "Deep learning models can recognize images."
]

# 1. Embed each sentence with BERT
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    outputs = model(**inputs)

# Use the [CLS] token (position 0) of the last hidden layer as the sentence vector
sentence_embeddings = outputs.last_hidden_state[:, 0, :]
print(sentence_embeddings.shape)  # torch.Size([4, 768])

# 2. Cosine similarity between every pair of embeddings.
#    cosine_similarity L2-normalizes the rows itself, so no manual normalization is needed.
sim_matrix = cosine_similarity(sentence_embeddings.numpy())

# 3. Print the most similar pair (only i < j, so self-pairs and duplicates are skipped)
best_i, best_j = max(
    combinations(range(len(sentences)), 2),
    key=lambda pair: sim_matrix[pair[0], pair[1]],
)
print(f"similarity = {sim_matrix[best_i, best_j]:.4f}")
print(sentences[best_i])
print(sentences[best_j])
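This solution uses the `[CLS]` vector (`last_hidden_state[:, 0, :]`) as the sentence embedding. For an off-the-shelf `bert-base-uncased` that hasn't been fine-tuned for sentence similarity, averaging the token vectors while skipping padding (mean pooling, which many sentence-transformers models use) is a commonly used alternative. This is a swapped-in technique, not part of the original problem. A pure-Python sketch for a single sentence; `mean_pool` is a hypothetical helper and the numbers are made up:

```python
def mean_pool(token_vectors, attention_mask):
    """Average the token vectors of one sentence, ignoring padding positions.

    token_vectors: seq_len x hidden nested list, like one row of
    outputs.last_hidden_state. attention_mask: seq_len list of 0/1,
    like one row of inputs["attention_mask"] (1 = real token, 0 = padding).
    """
    hidden = len(token_vectors[0])
    total = [0.0] * hidden
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            for k in range(hidden):
                total[k] += vec[k]
            count += 1
    if count == 0:
        raise ValueError("attention_mask contains no real tokens")
    return [t / count for t in total]


# Two real tokens followed by one padding token (made-up values)
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

In PyTorch the same computation over the whole batch is `mask = inputs["attention_mask"].unsqueeze(-1).float()` followed by `(outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)`.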

# problem 04

src_text = "이 서비스는 매우 훌륭하고 사용자 친화적입니다."  # "This service is excellent and user-friendly."

try:
    # Primary translation (ko → en)
    translator = pipeline("translation", model="Helsinki-NLP/opus-mt-ko-en")
    result = translator(src_text)
    print("Translation:", result[0]["translation_text"])
except Exception as e:
    print("Primary translation failed, using backup model:", e)
    try:
        # Backup translator
        translator = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-big-ko-en")  # example
        result = translator(src_text)
        print("Backup translation:", result[0]["translation_text"])
    except Exception as e2:
        print("All translations failed. Check the input language:", e2)
