Performance Comparison of Speaker Adaptation Methods and Improvement of the Duration Model in HMM-Based Korean Speech Synthesis

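ㆍ Authors: Lee, Hea-Min; Kim, Hyung-Soon (이혜민, 김형순)
ㆍ Journal: Phonetics and Speech Sciences (말소리와 음성과학)
ㆍ Volume/issue: 2012 | Vol. 4, No. 3 | pp. 111-117 (7 pages)
ㆍ Publisher: The Korean Society of Speech Sciences (한국음성학회)
ㆍ File information: Periodical | PDF text
ㆍ Subject area: Other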

The full text of this paper is provided free of charge through a link with the Korea Institute of Science and Technology Information (KISTI).

Abstract

In this paper, we compare the performance of several speaker adaptation methods for an HMM-based Korean speech synthesis system when only a small amount of adaptation data is available. According to objective and subjective evaluations, a hybrid method combining constrained structural maximum a posteriori linear regression (CSMAPLR) and maximum a posteriori (MAP) adaptation performs better than the other methods when only five minutes of adaptation data are available for the target speaker. In the objective evaluation, we find that the duration models are not adapted to the target speaker as well as the spectral envelope and pitch models are. To alleviate this problem, we propose a duration rectification method and a duration interpolation method. Both the objective and subjective evaluations show that incorporating the two proposed methods into the conventional speaker adaptation method is effective in improving duration model adaptation.

Keywords

Speech synthesis, HTS, speaker adaptation, duration model