Improving the Performance of Statistical Automatic Text Categorization by Using Phrasal Patterns and Keyword Sets
(Korean title: 구문 패턴과 키워드 집합을 이용한 통계적 자동 문서 분류의 성능 향상)

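ㆍ Authors: Han Jeong-Gi (한정기), Park Min-Gyu (박민규), Jo Gwang-Je (조광제), Kim Jun-Tae (김준태)
ㆍ Journal: 정보처리논문지
ㆍ Volume/Issue: 2000 | Vol. 7, No. 4 | pp. 1150-1159 (10 pages)
ㆍ Publisher: Korea Information Processing Society (한국정보처리학회)
ㆍ File information: Periodical | PDF text
ㆍ Subject area: Other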

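The full text of this paper is provided free of charge through a paper-linking arrangement with the Korea Institute of Science and Technology Information (한국과학기술정보연구원).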


Abstract

This paper presents an automatic text categorization model that improves accuracy by combining statistical and knowledge-based categorization methods. The model applies the knowledge-based method first, and then applies the statistical method to the texts that the knowledge-based method leaves uncategorized. This combination improves categorization accuracy while still categorizing every text without failure. For statistical categorization, the vector model with Inverted Category Frequency (ICF) weighting is used. For knowledge-based categorization, Phrasal Patterns and Keyword Sets are introduced to represent sentence patterns, and categorization is performed by pattern matching. Experimental results on new articles show that combining the two categorization methods improves accuracy.
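
The two-stage flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not give formulas or data structures, so several details are assumptions: the ICF weight is taken to be log(N / cf), where N is the number of categories and cf is the number of categories containing the term; a phrasal pattern is modeled as an ordered sequence of keyword sets; and cosine similarity is used to compare vectors. All function names, rules, and training texts are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Whitespace tokenization; a real system would use morphological analysis.
    return text.lower().split()

def build_category_vectors(training):
    """training maps category -> list of texts; returns category -> {term: weight}."""
    tf = {cat: Counter(tok for doc in docs for tok in tokenize(doc))
          for cat, docs in training.items()}
    n_categories = len(tf)
    # cf[t]: number of categories whose training texts contain term t.
    cf = Counter(term for counts in tf.values() for term in counts)
    # Assumed weighting: term frequency in the category times log(N / cf).
    return {cat: {t: c * math.log(n_categories / cf[t]) for t, c in counts.items()}
            for cat, counts in tf.items()}

def statistical_categorize(text, vectors):
    """Always returns a category: the one with the highest cosine similarity."""
    counts = Counter(tokenize(text))
    text_norm = math.sqrt(sum(c * c for c in counts.values()))

    def cosine(vec):
        dot = sum(counts[t] * w for t, w in vec.items())
        norm = text_norm * math.sqrt(sum(w * w for w in vec.values()))
        return dot / norm if norm else 0.0

    return max(vectors, key=lambda cat: cosine(vectors[cat]))

def matches(pattern, tokens):
    """True if the tokens contain, in order, one member of each keyword set."""
    pos = 0
    for keyword_set in pattern:
        while pos < len(tokens) and tokens[pos] not in keyword_set:
            pos += 1
        if pos == len(tokens):
            return False
        pos += 1
    return True

def knowledge_based_categorize(text, rules):
    """rules is a list of (category, pattern); returns None when nothing matches."""
    tokens = tokenize(text)
    for category, pattern in rules:
        if matches(pattern, tokens):
            return category
    return None

def categorize(text, rules, vectors):
    # Stage 1: knowledge-based pattern matching.
    category = knowledge_based_categorize(text, rules)
    if category is not None:
        return category, "knowledge-based"
    # Stage 2: statistical fallback, so every text receives a category.
    return statistical_categorize(text, vectors), "statistical"

# Hypothetical toy data for illustration only.
training = {
    "sports": ["the team won the match", "the player scored a goal"],
    "economy": ["the bank raised interest rates", "stock prices fell sharply"],
}
rules = [("sports", [{"won", "lost"}, {"match", "game", "final"}])]
vectors = build_category_vectors(training)

print(categorize("our club won the final", rules, vectors))
print(categorize("interest rates and stock prices", rules, vectors))
```

In this sketch the first text is caught by the pattern rule, while the second matches no rule and falls through to the vector model. Terms that occur in every category (such as "the" here) receive a weight of zero under the assumed ICF formula, so they do not influence the statistical decision.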
