Intra-Sentence Segmentation Using Finite Automata for Efficient English Syntactic Analysis


Abstract (translated from the Korean)

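A practical English-Korean machine translation system must be able to analyze long sentences quickly and generate accurate translations. The complexity of English syntactic analysis increases sharply as sentences grow longer, and generating an accurate translation becomes correspondingly harder. To translate long sentences quickly and accurately, this paper proposes a method that uses finite automata to find appropriate segmentation positions in an input sentence and split the sentence at those positions. The context of a segmentable position is defined using the sequence of word types or parts of speech of the words in the input sentence, and these contexts are compiled into finite automata. The resulting automata recognize the contexts of possible segmentation positions in the input sentence and thereby select candidate segmentation positions. When several candidate positions exist, the appropriate segmentation positions are chosen using the order of the candidates and a priority among word types. Experiments compare segmentation precision and recall against existing sentence segmentation methods and show that segmentation performance improves.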

Abstract (English)

A practical English-Korean machine translation system must parse long sentences quickly and generate accurate target sentences for them. Parsing complexity increases exponentially with the length of the input sentence, and accurate translation of long sentences is correspondingly difficult. This paper proposes an intra-sentence segmentation method that uses finite automata to identify candidate segmentation positions in the input sentence. We define the context of segmentable positions using the word types or parts of speech of the words, and we construct finite automata from the defined contexts. Using these automata, we recognize the segmentable positions and collect the candidate segmentation positions. When there are multiple candidates, we determine the segmentation positions by considering the priority among the word types of the candidates and their positions in the input sentence. In the experiments, we compare segmentation precision and recall with those of other methods and show that the proposed method improves segmentation performance.
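
The pipeline described in the abstract (contexts defined over part-of-speech sequences, finite automata that recognize those contexts, and priority-based selection among candidates) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the tag names, the three context patterns in `PATTERNS`, the `PRIORITY` ordering, and the `min_gap` constraint are all assumptions made for the example, since the abstract does not list the actual contexts or priorities.

```python
from typing import Dict, List, Tuple

# Hypothetical contexts (NOT taken from the paper): a POS-tag sequence maps to
# (word_type, cut), where the split falls before the token at offset `cut`
# inside the matched sequence.
PATTERNS: Dict[Tuple[str, ...], Tuple[str, int]] = {
    ("SCONJ", "PRON"): ("subordinator", 0),
    ("COMMA", "CCONJ", "PRON"): ("coordinator", 1),
    ("CCONJ", "PRON", "VERB"): ("coordinator", 0),
}

# Hypothetical priority among word types (a lower number wins).
PRIORITY: Dict[str, int] = {"subordinator": 0, "coordinator": 1}


def build_automaton(patterns):
    """Compile the patterns into a trie-shaped deterministic finite automaton."""
    transitions: Dict[Tuple[int, str], int] = {}   # (state, tag) -> next state
    accepting: Dict[int, Tuple[str, int]] = {}     # state -> (word_type, cut)
    next_state = 1                                 # state 0 is the start state
    for tags, action in patterns.items():
        state = 0
        for tag in tags:
            key = (state, tag)
            if key not in transitions:
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        accepting[state] = action
    return transitions, accepting


def find_candidates(tags: List[str], transitions, accepting):
    """Run the automaton from every start position and collect candidate cuts."""
    candidates = set()
    for start in range(len(tags)):
        state = 0
        for i in range(start, len(tags)):
            state = transitions.get((state, tags[i]))
            if state is None:          # no transition: this context fails here
                break
            if state in accepting:
                word_type, cut = accepting[state]
                candidates.add((start + cut, word_type))
    return sorted(candidates)


def select_positions(candidates, n_tokens: int, min_gap: int = 3) -> List[int]:
    """Greedy choice by word-type priority, then by position in the sentence.

    A candidate is kept only if it lies at least `min_gap` tokens away from
    both sentence ends and from every position already chosen.
    """
    chosen: List[int] = []
    ordered = sorted(candidates, key=lambda c: (PRIORITY[c[1]], c[0]))
    for pos, _word_type in ordered:
        bounds = [0, n_tokens] + chosen
        if all(abs(pos - b) >= min_gap for b in bounds):
            chosen.append(pos)
    return sorted(chosen)


def segment(tokens: List[str], tags: List[str]) -> List[List[str]]:
    """Split a POS-tagged sentence at the selected segmentation positions."""
    transitions, accepting = build_automaton(PATTERNS)
    candidates = find_candidates(tags, transitions, accepting)
    cuts = select_positions(candidates, len(tokens))
    edges = [0] + cuts + [len(tokens)]
    return [tokens[a:b] for a, b in zip(edges, edges[1:])]


if __name__ == "__main__":
    tokens = ["He", "stayed", "home", "because", "it", "rained",
              ",", "and", "she", "left", "early"]
    tags = ["PRON", "VERB", "ADV", "SCONJ", "PRON", "VERB",
            "COMMA", "CCONJ", "PRON", "VERB", "ADV"]
    for part in segment(tokens, tags):
        print(" ".join(part))
```

Under these assumed patterns, the example sentence is split before "because" and before "and", so each segment could be parsed separately. Restarting the automaton at every token keeps the sketch short; a production recognizer would more likely scan the sentence in a single pass.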

Keywords

English-Korean machine translation; English syntactic analysis; finite automata; intra-sentence segmentation
