LE QUY DON
Technical University

Improving Phonetic Recognition with Sequence-length Standardized MFCC Features and Deep Bi-Directional LSTM

Van Toan, P. and Thanh, H.N. and Thanh, T.M. (2019) Improving Phonetic Recognition with Sequence-length Standardized MFCC Features and Deep Bi-Directional LSTM. In: 5th NAFOSTED Conference on Information and Computer Science, NICS 2018, 23 November 2018 through 24 November 2018.

Abstract

Phonetic recognition is one of the most challenging problems in the field of speech analysis. Its applications include dialect identification [1], mispronunciation detection [2], spoken document retrieval [3], and so on. Approaches to the problem include improving feature selection on the input speech [4], applying deep learning techniques [5] [6] [7], or combining both [8]. For sequence data such as phonetic segments, architectures based on recurrent neural networks (RNNs) are an appropriate choice [9], and they become even more powerful when combined with improved feature selection on the input data. In our approach, we combine Mel Frequency Cepstral Coefficients (MFCC) with sequence-length standardization to represent the acoustic features of speech, and use several RNN models for phonetic classification. Our experiments are carried out on the Texas Instruments/Massachusetts Institute of Technology (TIMIT) [10] phone recognition dataset. In particular, our data processing and feature selection method gives consistently better results than other studies using the same neural network model. To date, we have achieved the lowest test error rate (13.05%) using a Bidirectional LSTM, which is the best result on the TIMIT dataset, a reduction of about 3.5% compared with the previous best result [5] [6]. © 2018 IEEE.
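
The abstract does not say how the sequence length of the MFCC features is standardized. As a minimal sketch of one common option, assuming each phone segment is a list of per-frame coefficient vectors, the hypothetical helper `standardize_length` below truncates long segments and zero-pads short ones so that every segment has the same number of frames. The function name, the padding value, and the target length are illustrative assumptions, not the authors' method.

```python
from typing import List

Frame = List[float]


def standardize_length(frames: List[Frame], target_len: int,
                       pad_value: float = 0.0) -> List[Frame]:
    """Return exactly target_len frames: truncate long input, pad short input.

    Assumption: end-padding with a constant value; the paper may use a
    different scheme (for example resampling along the time axis).
    """
    if target_len <= 0:
        raise ValueError("target_len must be positive")
    if not frames:
        raise ValueError("frames must contain at least one frame")
    n_coeffs = len(frames[0])
    if any(len(f) != n_coeffs for f in frames):
        raise ValueError("all frames must have the same number of coefficients")
    # Copy the kept frames so the caller's data is not mutated.
    fixed = [list(f) for f in frames[:target_len]]
    # Append constant frames until the target length is reached.
    while len(fixed) < target_len:
        fixed.append([pad_value] * n_coeffs)
    return fixed


# Hypothetical phone segments with 3 coefficients per frame
# (real MFCC vectors are usually longer).
short_segment = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
long_segment = [[float(i)] * 3 for i in range(6)]

short_fixed = standardize_length(short_segment, target_len=4)
long_fixed = standardize_length(long_segment, target_len=4)
```

With a fixed frame count per segment, the segments can be stacked into a single batch for a recurrent classifier such as a bidirectional LSTM without per-sequence length handling.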

Item Type: Conference or Workshop Item (Paper)
Divisions: Faculties > Faculty of Information Technology
Identification Number: 10.1109/NICS.2018.8606886
Uncontrolled Keywords: Data handling; Deep learning; Feature extraction; Information retrieval; Linguistics; Speech recognition; Statistical tests; Bidirectional LSTM; MFCC features; Phonetic recognition; Sequence lengths; TIMIT; Long short-term memory
Additional Information: Conference code: 144343. Language of original document: English.
URI: http://eprints.lqdtu.edu.vn/id/eprint/9407
