Van, T.P. and Thanh, T.M. (2017) Vietnamese news classification based on BoW with keywords extraction and neural network. In: 21st Asia Pacific Symposium on Intelligent and Evolutionary Systems, IES 2017, 15–17 November 2017.
Abstract
Text classification (TC) has become one of the main applications of natural language processing (NLP). Many methods have been studied for classifying text documents, such as Random Forest, Support Vector Machines and Naive Bayes. However, most of them have been applied to English documents, so research on Vietnamese text classification is still limited. Using a Vietnamese news corpus, we propose several methods for solving Vietnamese news classification problems. By employing Bag of Words (BoW) with keyword extraction and neural network approaches, we trained a machine learning model that achieves an average accuracy of approximately 99.75%. We also analyze the advantages and disadvantages of each method in order to find the best one for classifying Vietnamese news. © 2017 IEEE.
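The abstract names the pipeline (BoW over extracted keywords, fed to a neural network) but not its details. The following is a minimal sketch under stated assumptions, not the authors' implementation: keywords are assumed to be the top-k terms per class by a TF-IDF-style score, and the "neural network" is assumed to be a single-layer softmax classifier trained with stochastic gradient descent. The function names, the toy documents and labels, and the values of `k`, `epochs` and `lr` are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: text is already word-segmented, with the syllables of a
    # multi-syllable Vietnamese word joined by "_", so whitespace splitting works.
    return text.lower().split()


def extract_keywords(docs, labels, k):
    # Hypothetical keyword extraction: for each class, keep the k terms with
    # the highest (class term frequency x inverse document frequency) score.
    n_docs = len(docs)
    df = Counter()
    for d in docs:
        df.update(set(tokenize(d)))
    vocab = set()
    for c in sorted(set(labels)):
        tf = Counter()
        for d, y in zip(docs, labels):
            if y == c:
                tf.update(tokenize(d))
        # Ties are broken alphabetically so the result is deterministic.
        ranked = sorted(tf, key=lambda w: (-tf[w] * math.log(n_docs / df[w]), w))
        vocab.update(ranked[:k])
    return sorted(vocab)


def bow_vector(text, vocab):
    # Bag of Words restricted to the keyword vocabulary: raw term counts.
    counts = Counter(tokenize(text))
    return [float(counts[w]) for w in vocab]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def logits(W, b, x):
    return [sum(wj * xj for wj, xj in zip(W[c], x)) + b[c] for c in range(len(W))]


def train(X, y, n_classes, epochs=200, lr=0.1):
    # Single-layer softmax network trained with per-example SGD on cross-entropy loss.
    n_feat = len(X[0])
    W = [[0.0] * n_feat for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, t in zip(X, y):
            p = softmax(logits(W, b, x))
            for c in range(n_classes):
                g = p[c] - (1.0 if c == t else 0.0)  # d(loss)/d(logit_c)
                for j in range(n_feat):
                    W[c][j] -= lr * g * x[j]
                b[c] -= lr * g
    return W, b


def predict(W, b, x):
    z = logits(W, b, x)
    return max(range(len(z)), key=lambda c: z[c])


# Toy, hypothetical corpus: 0 = sports, 1 = economy.
docs = [
    "tin bóng_đá đội_tuyển thắng bóng_đá",
    "tin bóng_đá cầu_thủ ghi_bàn",
    "tin giá vàng tăng giá",
    "tin giá chứng_khoán giảm",
]
labels = [0, 0, 1, 1]

vocab = extract_keywords(docs, labels, k=2)  # "tin" occurs everywhere, so its IDF is 0
X = [bow_vector(d, vocab) for d in docs]
W, b = train(X, labels, n_classes=2)
print(vocab)
print(predict(W, b, bow_vector("tin bóng_đá hôm_nay", vocab)))
```

Restricting the Bag of Words to a keyword vocabulary keeps the network's input layer small. With real news data one would expect a proper Vietnamese word segmenter and a larger, multi-layer network, neither of which is shown here.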
Field | Value |
---|---|
Item Type: | Conference or Workshop Item (Paper) |
Divisions: | Faculties > Faculty of Information Technology |
Identification Number: | 10.1109/IESYS.2017.8233559 |
Uncontrolled Keywords: | Classification (of information); Decision trees; Extraction; Information retrieval; Learning algorithms; Natural language processing systems; Neural networks; Support vector machines; Keywords extraction; Machine learning models; News corpora; NLP (natural language processing); Random forests; Text classification; Text document; Vietnamese; Text processing |
Additional Information: | Conference code: 132093. Language of original document: English. |
URI: | http://eprints.lqdtu.edu.vn/id/eprint/9653 |