Automatic Detection of Morpho-Syntactic Errors in English Writing Using Association Rule Mining

Abstract

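This study generates association rules from refined data on English writing error types collected in a series of earlier studies, and uses the rules whose usefulness has been verified through learning to automatically detect morpho-syntactic errors in English writing data. Finding morpho-syntactic errors in English writing data takes considerable time and resources, so automation is essential. Whereas previous studies have concentrated on lexical errors using statistical models, or on syntactic processing grounded in a linguistic-theoretical framework, this study uses data mining to generate association rules from refined data, verifies them, and then detects morpho-syntactic errors. Earlier approaches require building a large knowledge base, such as rules written to fit a theoretical framework or large corpus data for building language models, whereas this study uses a small amount of refined data. The Apriori algorithm was used to generate morpho-syntactic association rules for English writing error types. Because the algorithm may generate invalid rules, statistical verification of rule usefulness, such as correlation testing and cosine similarity, was used so that only valid rules were learned, and the accumulated association rules were then applied in an experiment on automatically detecting English writing errors. The results indicate that the approach detects morpho-syntactic grammatical errors accurately.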
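The rule generation step can be illustrated with a minimal Apriori sketch. The feature names (`subj=3sg`, `verb=base`, `err=agreement`), the toy transactions, and both thresholds are hypothetical and chosen only for illustration; the study's actual feature encoding and parameter settings are not stated in this abstract.

```python
from itertools import combinations

# Hypothetical toy data: each transaction is the set of morpho-syntactic
# features observed in one clause, plus an error label when one was annotated.
TRANSACTIONS = [
    {"subj=3sg", "verb=base", "err=agreement"},
    {"subj=3sg", "verb=base", "err=agreement"},
    {"subj=3sg", "verb=3sg"},
    {"subj=pl", "verb=base"},
    {"subj=3sg", "verb=base", "err=agreement"},
    {"subj=pl", "verb=3sg", "err=agreement"},
]


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(itemset):
        return sum(1 for t in sets if itemset <= t) / n

    # Level 1: frequent single items.
    current = {}
    for item in {i for t in sets for i in t}:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


def generate_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(itemset, r)):
                # Subsets of a frequent itemset are frequent, so the lookup is safe.
                confidence = s / frequent[ante]
                if confidence >= min_confidence:
                    rules.append((ante, itemset - ante, s, confidence))
    return rules


frequent = apriori(TRANSACTIONS, min_support=0.4)
rules = generate_rules(frequent, min_confidence=0.9)
for ante, cons, s, c in rules:
    print(sorted(ante), "->", sorted(cons), f"support={s:.2f} confidence={c:.2f}")
```

Rules whose consequent is an error label, such as the agreement rule in this toy data, are the kind of candidate mal-rule that would then go on to statistical verification.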

English Abstract

Since manual detection of morpho-syntactic errors in English writing requires considerable time and resources, automating error detection is essential in both Computer-Assisted Language Learning and English learning studies. This study aims at automatic detection of morpho-syntactic errors in English writing using association rule mining, which proceeds in three steps. First, we generate association rules from the refined data. Second, we statistically verify the generated rules. Third, we test the verified rules on the test data. Previous studies have focused either on word errors, using language models built from large corpora, or on systems tied closely to complex grammatical theories. In contrast, this study uses a relatively small amount of data. We used the Apriori algorithm for the rule mining task. Since the rules generated by the algorithm can contain a great deal of noise, we apply statistical machine learning methods using the correlation coefficient and cosine similarity. This process sifts valid mal-rules for the automatic detection task out of the noise.
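The verification step can be sketched as a filter over candidate rules A -> B. The sketch below assumes the cosine measure supp(A ∪ B) / sqrt(supp(A) · supp(B)) and uses the phi coefficient as the correlation measure. Both the choice of phi and the threshold values are assumptions made for illustration, since the abstract does not specify which correlation statistic or cutoffs were used.

```python
from math import sqrt


def cosine(p_ab, p_a, p_b):
    """Cosine measure of a rule A -> B, computed from supports (relative frequencies)."""
    return p_ab / sqrt(p_a * p_b)


def phi(p_ab, p_a, p_b):
    """Phi correlation coefficient between the occurrence of A and of B."""
    denom = sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    # Degenerate case: A or B occurs in every transaction or in none.
    return (p_ab - p_a * p_b) / denom if denom else 0.0


def keep_rule(p_ab, p_a, p_b, min_cosine=0.65, min_phi=0.2):
    """Keep a rule only if both measures clear their (hypothetical) thresholds."""
    return cosine(p_ab, p_a, p_b) >= min_cosine and phi(p_ab, p_a, p_b) >= min_phi


# A rule whose antecedent and consequent co-occur far more often than chance.
print(keep_rule(p_ab=0.5, p_a=0.5, p_b=2 / 3))
# A rule whose two sides are statistically independent (p_ab == p_a * p_b).
print(keep_rule(p_ab=0.25, p_a=0.5, p_b=0.5))
```

Requiring both measures is one possible design: cosine rewards strong co-occurrence, while the correlation check rejects rules whose antecedent and consequent are merely frequent on their own.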

Keywords

Computer-Assisted Language Learning; English Writing Automatic Error Detection; Morpho-syntactic Analysis; English Writing Association Rule Mining; Data Mining; Machine Learning; Apriori Algorithm
