LE QUY DON Technical University

Enhancing Whisper Model for Vietnamese Specific Domain with Data Blending and LoRA Fine-Tuning

Phung, N.H. and Dang, D.T. and Ta, K.D. and Nguyen, K.T.A. and Tran, T.K. and Nguyen, C.T. (2024) Enhancing Whisper Model for Vietnamese Specific Domain with Data Blending and LoRA Fine-Tuning. In: International Conference on Intelligent Systems and Networks (ICISN 2024), 22-23 March 2024, Hanoi.

Full text not available from this repository.

Abstract

Recent advancements in Automatic Speech Recognition (ASR), particularly those driven by transformer-based architectures, have significantly improved ASR accuracy and robustness. The trend towards large-scale models, such as OpenAI's Whisper, has bridged the gap between automated systems and human-level performance in many languages. Despite the remarkable achievements of large-scale ASR models in general domains and high-resource languages, their effective adaptation to specialized domains, such as military applications, and to low-resource languages like Vietnamese poses significant challenges. This paper introduces a novel methodology designed to overcome these challenges. By blending publicly available general-domain data into a target-domain dataset, the proposed method enhances Whisper's performance in a specific domain while requiring only a modest target-domain dataset for fine-tuning. To address resource constraints, we adopt LoRA (Low-Rank Adaptation), a parameter-efficient fine-tuning method that allows large-scale ASR models to be fine-tuned with limited computational capacity. Our methodology is evaluated on a self-collected military information retrieval dataset and several Vietnamese general-domain datasets. This is one of the first studies to focus on enhancing ASR performance in a Vietnamese-specific domain. Our results show that the proposed method yields a 20% improvement in word error rate and a nearly 32% reduction in character error rate across the three versions of the Whisper model: small, base, and tiny. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024.
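
The full text is not available from this repository, but the abstract names the two core techniques clearly enough to sketch how they are commonly combined. The following is a minimal illustration only, not the authors' implementation: it assumes the HuggingFace datasets, peft, and transformers libraries, and the dataset names, blend ratio, and LoRA hyperparameters below are placeholder assumptions.

# Illustrative sketch of data blending + LoRA fine-tuning for Whisper.
# Assumptions (not from the paper): dataset names, the 70/30 blend ratio,
# LoRA rank/alpha, and the choice of attention projections to adapt.
from datasets import load_dataset, interleave_datasets
from peft import LoraConfig, get_peft_model
from transformers import WhisperForConditionalGeneration

# 1) Data blending: mix public general-domain Vietnamese speech with a
#    small target-domain set so adaptation does not erase general coverage.
general = load_dataset("mozilla-foundation/common_voice_13_0", "vi", split="train")
domain = load_dataset("audiofolder", data_dir="target_domain_audio",
                      split="train")  # hypothetical local target-domain set
blended = interleave_datasets([general, domain],
                              probabilities=[0.7, 0.3], seed=42)

# 2) LoRA fine-tuning: freeze Whisper and train small low-rank adapters
#    injected into the attention projections, shrinking the trainable
#    parameter count to a fraction of the full model.
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
lora_config = LoraConfig(r=8, lora_alpha=32, lora_dropout=0.05, bias="none",
                         target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights remain trainable

Blending counters catastrophic forgetting during domain adaptation, while LoRA keeps the memory and compute footprint small enough to fine-tune even the larger Whisper checkpoints on modest hardware.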

Item Type: Conference or Workshop Item (Paper)
Divisions: Offices > Office of International Cooperation
Identification Number: 10.1007/978-981-97-5504-2_18
Uncontrolled Keywords: Network security; Speech enhancement; Automatic speech recognition; Data blending; Fine-tuning; Large-scale; LoRA; Recognition accuracy; Recognition models; Specific domain; Vietnamese; Whisper; Speech recognition
Additional Information: International Conference on Intelligent Systems and Networks, ICISN 2024; Conference Date: 22-23 March 2024; Conference Code: 318189
URI: http://eprints.lqdtu.edu.vn/id/eprint/11379
