Construction of a Tagged Corpus and Cue-Phrase Patterns for Korean Hedge Sentence Recognition (한국어 Hedge 문장 인식을 위한 태깅 말뭉치 및 단서어구 패턴 구축)

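ㆍ Authors: Jeong Ju-Seok (정주석), Kim Jun-Hyeouk (김준혁), Kim Hae-Il (김해일), Oh Sung-Ho (오성호), Kang Sin-Jae (강신재)
ㆍ Journal: Journal of Korean Institute of Intelligent Systems (한국지능시스템학회 논문지)
ㆍ Volume/issue: 2011 | Vol. 21, No. 6 | pp. 761-766 (6 pages)
ㆍ Publisher: Korean Institute of Intelligent Systems (한국지능시스템학회)
ㆍ File information: Periodical | PDF text
ㆍ Subject area: Other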

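The full text of this paper is provided free of charge through a paper-linking arrangement with the Korea Institute of Science and Technology Information (KISTI).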

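Abstract (translated from Korean)

A hedge is a linguistic expression of uncertainty that an author uses when the content of their own writing is uncertain or open to doubt. Because of this uncertainty, a sentence containing a hedge is regarded as non-factual. Judging whether a sentence is factual is useful in many applications: as a preprocessing step in information retrieval, information extraction, and question answering, it helps produce more accurate results. In this paper, we build a Korean hedge corpus, extract hedge cue phrases from it, construct generalized cue-phrase patterns, and run Korean hedge recognition experiments. The experiments yielded an F1-measure of 78.6%.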

Abstract (English)

A hedge is a linguistic device for expressing uncertainty. Writers use hedges when they are uncertain or have doubts about the content of a sentence. Because of this uncertainty, sentences containing hedges are considered non-factual. Many applications need to determine whether a sentence is factual. Hedge detection benefits information retrieval, information extraction, and question-answering systems, which can target non-hedged sentences to obtain more accurate results. In this paper, we construct a Korean hedge corpus, extract generalized hedge cue-word patterns from it, and use them to detect hedges. In our experiments, we achieved an F1-measure of 78.6%.
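
The pipeline summarized in the abstract (match cue-phrase patterns against sentences, then score the predictions with F1) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the four regular expressions are common Korean hedging constructions chosen by hand, not the patterns extracted in the paper, and the names `CUE_PATTERNS`, `is_hedge`, `f1_score`, and the five-sentence toy sample are hypothetical. The paper's actual corpus, pattern set, and matching procedure may differ considerably.

```python
import re

# Illustrative cue patterns only: common Korean hedging constructions,
# NOT the pattern set extracted in the paper.
CUE_PATTERNS = [
    r"것\s*같",             # "... 것 같다" (it seems that ...)
    r"수도\s*있",           # "... 수도 있다" (might ...)
    r"지도\s*모[르른를]",    # "... 지도 모른다" (may ..., one cannot tell)
    r"듯\s*[하한합싶]",      # "... 듯하다" (appears to ...)
]
CUE_REGEX = re.compile("|".join(CUE_PATTERNS))


def is_hedge(sentence: str) -> bool:
    """Return True if any cue pattern occurs in the sentence."""
    return CUE_REGEX.search(sentence) is not None


def f1_score(gold, predicted) -> float:
    """F1 for the positive (hedge) class from parallel boolean sequences."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy sample: (sentence, gold hedge label).
SAMPLE = [
    ("내일은 비가 올 것 같다.", True),        # It seems it will rain tomorrow.
    ("그 결과는 틀렸을 수도 있다.", True),    # The result might be wrong.
    ("그는 이미 떠났을지도 모른다.", True),   # He may have already left.
    ("물은 섭씨 100도에서 끓는다.", False),   # Water boils at 100 degrees Celsius.
    ("아마 회의는 취소될 것이다.", True),     # Hedged by the adverb "아마" (probably).
]

gold = [label for _, label in SAMPLE]
predicted = [is_hedge(sentence) for sentence, _ in SAMPLE]
print(f"F1 = {f1_score(gold, predicted):.3f}")
```

On this toy sample the last sentence is missed because its hedge is carried by the adverb "아마" rather than by a sentence-final construction, which lowers recall. A gap of this kind is one plausible reason to derive patterns from a tagged corpus instead of writing them by hand.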

Keywords

Hedge detection; Sentence classification; Tagging tool; Korean hedge-tagged corpus; Natural language processing
