한국어 트위터의 감정 분류를 위한 기계학습의 실증적 비교

한국어 트위터의 감정 분류를 위한 기계학습의 실증적 비교

ㆍ 저자명: 임좌상,김진만,Lim. Joa-Sang,Kim. Jin-Man
ㆍ 간행물명: 멀티미디어학회논문지
ㆍ 권/호정보: 2014년|17권 2호|pp.232-239 (8 pages)
ㆍ 발행정보: 한국멀티미디어학회
ㆍ 파일정보: 정기간행물|
PDF텍스트
ㆍ 주제분야: 기타

이 논문은 한국과학기술정보연구원과 논문 연계를 통해 무료로 제공되는 원문입니다.

서지반출

기타언어초록

온라인에서의 글쓰기가 늘어나면서, 기계학습을 통해 이를 분류하는 연구가 늘고 있다. 그럼에도 불구하고 한국어로 작성된 마이크로블로그를 대상으로 한 연구는 많지 않다. 또한 통계적으로 기계학습을 평가한 연구를 찾아보기 힘들다. 본 논문에서는 트위터를 대상으로, 표본을 추출하고, 형태소와 음절을 자질로 사용하여 기계학습에 따라 감정을 분류하였다. 그 결과 약 76%정도 트위터에 포함된 감정이 분류되었다. Support Vector Machine이 Na$ddot{i}$ve Bayes보다 정확했고, 선형모델도 비구조적인 텍스트 처리에 비선형모델에 상응하는 정확성을 보였다. 또한 형태소가 음절 자질에 비해 높은 정확성을 보이지 않았다.

기타언어초록

As online texts have been rapidly growing, their automatic classification gains more interest with machine learning methods. Nevertheless, comparatively few research could be found, aiming for Korean texts. Evaluating them with statistical methods are also rare. This study took a sample of tweets and used machine learning methods to classify emotions with features of morphemes and n-grams. As a result, about 76% of emotions contained in tweets was correctly classified. Of the two methods compared in this study, Support Vector Machines were found more accurate than Na$ddot{i}$ve Bayes. The linear model of SVM was not inferior to the non-linear one. Morphological features did not contribute to accuracy more than did the n-grams.

키워드

기계학습 서포트벡터머신 나이브베이즈 트위터감성분류 Machine Learning Support Vector Machine Na$ddot{i}$ve Bayes Twitter emotion classification

다운URL