Improving the Performance of Korean Base Phrase Recognition through Machine Learning Based on Feature Set Selection

ㆍ Authors: Hwang Young-Sook, Chung Hoo-Jung, Park So-Young, Kwak Young-Jae, Rim Hae-Chang
ㆍ Journal: Journal of KIISE: Software and Applications
ㆍ Volume/Issue: 2002 | Vol. 29, No. 9 | pp. 654-668 (15 pages)
ㆍ Publisher: Korean Institute of Information Scientists and Engineers (KIISE)
ㆍ File information: Periodical | PDF text
ㆍ Subject area: Other

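The full text of this paper is provided free of charge through a paper-linking arrangement with the Korea Institute of Science and Technology Information (KISTI).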

Abstract

In this paper, we present an empirical study on improving Korean text chunking using machine learning and feature set selection. We focus on two issues: selecting a feature set for Korean chunking and alleviating data sparseness. To select a proper feature set, we use a heuristic method that searches the space of feature sets, using the performance estimated by a machine learning algorithm as a measure of the "incremental usefulness" of a particular feature set. To alleviate data sparseness, we propose using a general part-of-speech tag set together with selective lexical information, taking the characteristics of the Korean language into account. Experimental results show that chunk tags and lexical information within a given context window are important features, whereas spacing unit information is less important than the others; these findings are independent of the machine learning technique used. Furthermore, using selective lexical information not only has a smoothing effect but also reduces the feature space compared with using all lexical information. With the selected feature space, Korean text chunking achieved precision/recall of 90.99%/92.52% with memory-based learning and 93.39%/93.41% with decision tree learning.
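
The feature set search described in the abstract can be illustrated with a minimal sketch. The abstract does not say which search strategy the authors used, so the code below assumes a simple greedy forward search: starting from an empty set, it repeatedly adds the feature whose inclusion most improves the estimated performance, and stops when no candidate helps. A toy 1-nearest-neighbour classifier with a mismatch-count distance stands in for memory-based learning. The function names (`forward_select`, `loo_accuracy`), the feature names, and the data are all hypothetical and chosen only for illustration.

```python
def loo_accuracy(rows, labels, feats):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier that
    measures distance as the number of mismatches on the chosen features."""
    correct = 0
    for i, row in enumerate(rows):
        best_dist, best_label = None, None
        for j, other in enumerate(rows):
            if i == j:
                continue  # hold out the instance being classified
            dist = sum(row[f] != other[f] for f in feats)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[j]
        correct += best_label == labels[i]
    return correct / len(rows)


def forward_select(features, evaluate, min_gain=0.0):
    """Greedy forward search: add the feature with the largest estimated
    gain until no remaining feature improves the score by more than min_gain."""
    selected = []
    best_score = evaluate(selected)
    remaining = list(features)
    while remaining:
        # Estimate performance for each one-feature extension of the current set.
        scored = [(evaluate(selected + [f]), f) for f in remaining]
        score, feat = max(scored, key=lambda pair: pair[0])
        if score - best_score <= min_gain:
            break  # no candidate is incrementally useful
        selected.append(feat)
        remaining.remove(feat)
        best_score = score
    return selected, best_score


# Hypothetical toy instances: one row per token, labelled with a chunk tag.
rows = [
    {"pos": "noun", "prev_chunk": "O",    "spacing": "first"},
    {"pos": "noun", "prev_chunk": "B-NP", "spacing": "first"},
    {"pos": "verb", "prev_chunk": "I-NP", "spacing": "inner"},
    {"pos": "noun", "prev_chunk": "O",    "spacing": "inner"},
    {"pos": "noun", "prev_chunk": "B-NP", "spacing": "inner"},
    {"pos": "verb", "prev_chunk": "I-NP", "spacing": "first"},
]
labels = ["B-NP", "I-NP", "B-VP", "B-NP", "I-NP", "B-VP"]

selected, score = forward_select(
    ["pos", "prev_chunk", "spacing"],
    lambda feats: loo_accuracy(rows, labels, feats),
)
print(selected, score)
```

Because the learner is passed in as the `evaluate` callback, the same search could in principle be run with a different learner, such as a decision tree, which is one way to check whether a selected feature set depends on the learning technique.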

Keywords

Korean Base Phrase Recognition, Machine Learning, Feature Set Selection, Decision Tree Learning, Memory-based Learning
