Vocabulary Coverage Improvement for Embedded Continuous Speech Recognition Using Knowledgebase
(Korean title: 지식베이스를 이용한 임베디드용 연속음성인식의 어휘 적용률 개선)

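ㆍ Authors: Kim Kwang-Ho (김광호), Lim Min-Kyu (임민규), Kim Ji-Hwan (김지환)
ㆍ Journal: 말소리 (Malsori)
ㆍ Volume/Issue: 2008 | Vol. 68, No. 5 | pp. 115-126 (12 pages)
ㆍ Publisher: 대한음성학회 (The Korean Society of Phonetic Sciences and Speech Technology)
ㆍ File information: Periodical | PDF text
ㆍ Subject area: Other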

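The full text of this paper is provided free of charge through a paper-linking arrangement with the Korea Institute of Science and Technology Information (KISTI).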

Abstract

In this paper, we propose a vocabulary coverage improvement method for embedded continuous speech recognition (CSR) using a knowledgebase. A CSR vocabulary is normally derived from a word frequency list, so its coverage depends on the corpus from which the frequencies were counted. In previous research, we presented an improved method of vocabulary generation using a part-of-speech (POS) tagged corpus: we analyzed all words paired with 101 of the 152 POS tags and decided on a set of words that must be included in vocabularies of any size. However, for the other 51 POS tags (e.g. nouns, verbs), the inclusion of words paired with those tags is still based on word frequencies counted on a corpus. In this paper, we propose a corpus-independent word inclusion method for noun-, verb-, and named entity (NE)-related POS tags using a knowledgebase. For noun-related POS tags, we generate synonym groups and analyze their relative importance using Google search. For verbs, we categorize them by lemma and analyze the relative importance of each lemma from pre-analyzed verb statistics. For NEs, we determine the inclusion order through Google search. The proposed method shows better coverage on a test corpus of short message service (SMS) text.
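
To make the notion of vocabulary coverage concrete, the following minimal Python sketch builds a fixed-size vocabulary from a word frequency list and measures the fraction of test-corpus tokens it covers. It is only an illustration under stated assumptions, not the authors' implementation: the function names `build_vocabulary` and `coverage`, the `must_include` parameter (standing in for words chosen independently of the corpus), and whitespace tokenization are all hypothetical.

```python
from collections import Counter


def build_vocabulary(train_tokens, size, must_include=()):
    """Return a vocabulary of at most `size` words.

    Words in `must_include` (hypothetical: words selected independently
    of the corpus) are added first; the remaining slots are filled in
    descending order of frequency in `train_tokens`.
    """
    # dict.fromkeys removes duplicates while preserving order.
    vocab = list(dict.fromkeys(must_include))[:size]
    seen = set(vocab)
    for word, _count in Counter(train_tokens).most_common():
        if len(vocab) >= size:
            break
        if word not in seen:
            vocab.append(word)
            seen.add(word)
    return set(vocab)


def coverage(vocab, test_tokens):
    """Fraction of test tokens that appear in the vocabulary."""
    if not test_tokens:
        return 0.0
    covered = sum(1 for token in test_tokens if token in vocab)
    return covered / len(test_tokens)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    train = "the cat sat on the mat the cat".split()
    test = "the dog sat".split()

    freq_only = build_vocabulary(train, size=2)
    with_forced = build_vocabulary(train, size=2, must_include=("dog",))

    print(coverage(freq_only, test))
    print(coverage(with_forced, test))
```

In this toy setup the purely frequency-based vocabulary depends entirely on the training text, whereas forcing in a word that the training text lacks can raise coverage on a test corpus from a different domain, which is the general motivation the abstract describes.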

Keywords

Vocabulary coverage, Embedded speech recognition, Knowledgebase
