LE QUY DON
Technical University

Towards An Accurate and Effective Printed Document Reader for Visually Impaired People

Viet, A.P. and Duy, D.L. and Thi, V.A.T. and Duy, H.P. and Van, T.V. and Thu, L.B. (2022) Towards An Accurate and Effective Printed Document Reader for Visually Impaired People. In: 14th International Conference on Knowledge and Systems Engineering, KSE 2022, 19 October 2022 through 21 October 2022, Virtual, Online.

Full text not available from this repository.

Abstract

This paper introduces a solution that helps visually impaired or blind (VIB) people access printed and electronic documents independently. The solution's main strengths are cost-effectiveness and accuracy. Text extraction and read-aloud are performed entirely by a smartphone application. To make the application usable by VIB people, advanced image and speech processing technologies are leveraged to improve both the user experience and the accuracy of image-to-text conversion. To build accurate optical character recognition (OCR) models for low-quality images, we combine several techniques, including 1) generating a large, balanced dataset with various backgrounds, 2) correcting distortion and orientation, and 3) applying a sequence-to-sequence model with transformers as the encoder. For ease of use, the text-to-speech (TTS) model generates voice instructions at every interaction, and the interface is designed and adjusted according to user feedback. A test on a set of scanned documents showed the high accuracy of the OCR model, 98.6% at the character level, and the fluency of the TTS model. As indicated in a trial with VIB people, our application helps them read printed documents conveniently, and it is an affordable solution given the popularity of smartphones. © 2022 IEEE.
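
The abstract reports OCR accuracy "by characters" without stating how it is computed. One common convention, assumed here and not confirmed by the record, is one minus the character error rate: the total edit distance between recognized and reference text, divided by the number of reference characters. The function names below are hypothetical and serve only as a minimal sketch of that convention.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def char_accuracy(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Assumed metric: 1 - (total edit distance / total reference characters)."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    # Clamp at 0 because insertions can push the error count above the length.
    return max(0.0, 1.0 - total_errors / total_chars)


# A typical OCR confusion: "m" read as "rn" (one substitution, one insertion).
print(char_accuracy(["printed document"], ["printed docurnent"]))  # 0.875
```

Under this convention, a character accuracy of 98.6% would correspond to roughly 14 character errors per 1,000 reference characters.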

Item Type: Conference or Workshop Item (Paper)
Divisions: Faculties > Faculty of Information Technology
Identification Number: 10.1109/KSE56063.2022.9953768
Uncontrolled Keywords: Cost effectiveness; Image enhancement; Large dataset; Smartphones; Speech processing; Speech recognition; Blind people; Electronic document; Optical character recognition; Printed documents; Recognition models; Smart-phone applications; Speech models; Text to speech; Visually impaired people
Additional Information: 14th International Conference on Knowledge and Systems Engineering, KSE 2022; Conference Date: 19 October 2022 through 21 October 2022; Conference Code: 184621
URI: http://eprints.lqdtu.edu.vn/id/eprint/10716
