A Multi-Document Summarization Technique Using Semantic Analysis Between Tags

Korean Abstract

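With the recent rapid development and spread of the Internet, the number of documents created on the web grows day by day. However, even with the help of a search engine, users must spend considerable time and effort reviewing the retrieved documents one by one to find those that contain the information they want. To relieve this burden, multi-document summarization techniques that effectively summarize the core content of documents have been actively studied. Most existing multi-document summarization techniques, however, rely on probability theory and machine learning, so their training and summarization processes are costly, and they have difficulty analyzing newly emerging proper nouns. To overcome these drawbacks, this paper proposes a multi-document summarization technique that obtains, in real time from the folksonomy system Flickr, tag clusters for the words in the documents, including proper nouns, which reduces analysis time and computational cost. The technique analyzes the semantic relationships among the words to select the important ones, and then, based on each word's importance and its semantic relationships with other words, selects the key sentences.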

English Abstract

Recently, the number of documents created on the web has been increasing rapidly day by day, owing to the rapid development and spread of the Internet. To find the information they need, users must manually review all of the retrieved documents despite the assistance of search engines, which requires considerable time and effort. To address this problem, various multi-document summarization techniques have been studied to efficiently summarize the core content of the original documents. However, most existing multi-document summarization techniques incur high costs in their learning and summarization processes because they are mainly based on probability theory and machine learning, and they fail to analyze newly emerging proper nouns. To overcome these drawbacks, we propose a novel multi-document summarization technique that obtains the tag cluster of each word in the documents, including proper nouns, from Flickr, one of the representative folksonomy systems, in order to reduce analysis time and computing cost. Based on these tag clusters, the technique analyzes the importance of the key words and the semantic relatedness among them, and then detects the key sentences in the documents.
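The abstract does not give the paper's actual scoring formulas, so the following is only a minimal sketch of the pipeline it describes, under loudly stated assumptions. The tag cluster of each word is assumed to be already fetched from Flickr and supplied as a plain set of tags (the toy data is invented). Semantic relatedness between two words is approximated by the Jaccard overlap of their tag sets. A word's importance is taken to be its summed relatedness to all other words, and a sentence's score is the summed importance of the key words it contains. The names `jaccard`, `word_importance`, `summarize`, `EXAMPLE_CLUSTERS` and `EXAMPLE_SENTENCES` are hypothetical, not from the paper.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Assumed relatedness measure: overlap of two tag sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def word_importance(tag_clusters: dict[str, set[str]]) -> dict[str, float]:
    """Assumed importance: a word's summed relatedness to every other word."""
    scores = {word: 0.0 for word in tag_clusters}
    for w1, w2 in combinations(tag_clusters, 2):
        sim = jaccard(tag_clusters[w1], tag_clusters[w2])
        scores[w1] += sim
        scores[w2] += sim
    return scores


def summarize(sentences: list[str], tag_clusters: dict[str, set[str]],
              num_keywords: int = 3, num_sentences: int = 1) -> list[str]:
    """Pick the sentences whose key words carry the most importance (assumed scoring)."""
    importance = word_importance(tag_clusters)
    # Key words: the num_keywords words with the highest importance.
    keywords = set(sorted(importance, key=importance.get, reverse=True)[:num_keywords])

    def score(sentence: str) -> float:
        # Naive tokenization: whitespace split, strip punctuation, lowercase.
        words = {w.strip(".,").lower() for w in sentence.split()}
        return sum(importance[w] for w in words & keywords)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    # Return the chosen sentences in their original document order.
    return [sentences[i] for i in sorted(ranked[:num_sentences])]


# Invented toy data standing in for tag clusters fetched from Flickr.
EXAMPLE_CLUSTERS = {
    "flickr": {"photo", "tag", "web", "image"},
    "folksonomy": {"tag", "web", "social"},
    "summary": {"text", "document"},
    "tag": {"tag", "photo", "social", "web"},
}
EXAMPLE_SENTENCES = [
    "Flickr is a folksonomy built on user tag data.",
    "A summary condenses a document.",
    "Users tag photos on Flickr.",
]

if __name__ == "__main__":
    # prints ['Flickr is a folksonomy built on user tag data.']
    print(summarize(EXAMPLE_SENTENCES, EXAMPLE_CLUSTERS))
```

Jaccard overlap is used here only because it needs nothing beyond the tag sets themselves and involves no training step, in keeping with the abstract's goal of avoiding costly learning; any other relatedness measure over tag clusters could be swapped in without changing the rest of the sketch.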

Keywords

Multi-Document Summarization, Tag Cluster, Semantic Analysis
