Phan, T.-S. and Duong, T.-C. and Dinh, A.-T. and Vu, T.-T. and Luong, C.-M. (2013) Improvement of naturalness for an HMM-based Vietnamese speech synthesis using the prosodic information. In: 2013 IEEE RIVF International Conference on Computing and Communication Technologies: Research, Innovation, and Vision for Future, RIVF 2013, 10 November 2013 through 13 November 2013, Hanoi.
Abstract
Natural-sounding synthesized speech is the goal of HMM-based Text-to-Speech systems. Besides using context-dependent tri-phone units from a large speech corpus, many prosodic features have been used in full-context labels to improve the naturalness of HMM-based Vietnamese synthesizers. In the prosodic specification, tone, part-of-speech (POS), and intonation information are considered less important than positional information. Context-dependent information includes the phoneme sequence as well as prosodic information, because the naturalness of synthetic speech depends heavily on prosody such as pauses, tone, intonation pattern, and segmental duration. In this paper, we propose decision tree questions that use context-dependent tones and investigate the impact of POS and intonation tagging on the naturalness of an HMM-based voice. Experimental results from an objective evaluation and a MOS test show that our proposed method can improve the naturalness of an HMM-based Vietnamese TTS system. © 2013 IEEE.
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Divisions: | Faculties > Faculty of Information Technology |
Identification Number: | 10.1109/RIVF.2013.6719907 |
Additional Information: | Conference code: 102707. Language of original document: English. |
URI: | http://eprints.lqdtu.edu.vn/id/eprint/10031 |