LE QUY DON
Technical University

An effective and efficient approach to classification with incomplete data

Tran, C.T. and Zhang, M. and Andreae, P. and Xue, B. and Bui, L.T. (2018) An effective and efficient approach to classification with incomplete data. Knowledge-Based Systems, 154. pp. 1-16. ISSN 0950-7051

Abstract

Many real-world datasets suffer from the unavoidable issue of missing values. Classification with incomplete data has to be handled carefully because inadequate treatment of missing values causes large classification errors. Using imputation to transform incomplete data into complete data is a common approach to classification with incomplete data. However, simple imputation methods are often not accurate, and powerful imputation methods are usually computationally intensive. A recent approach to handling incomplete data constructs an ensemble of classifiers, each tailored to a known pattern of missing data. The main advantage of this approach is that it can classify new incomplete instances without requiring any imputation. This paper proposes an improvement on the ensemble approach by integrating imputation and genetic-based feature selection. The imputation creates higher-quality training data. The feature selection reduces the number of missing patterns, which speeds up classification and greatly increases the fraction of new instances that can be classified by the ensemble. Experimental results show that the proposed method is more accurate and faster than previous common methods for classification with incomplete data. © 2018 Elsevier B.V.
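
The baseline idea mentioned in the abstract, an ensemble with one classifier per pattern of missing data, can be illustrated with a minimal Python sketch. This is not the authors' implementation: it leaves out the imputation and genetic-based feature selection that the paper adds, and every name in it (MISSING, CentroidClassifier, train_ensemble, classify) is hypothetical. It assumes numeric features, missing values encoded as None, and a toy nearest-centroid classifier standing in for whatever base learner is actually used.

```python
from collections import Counter

MISSING = None  # assumption: missing values are encoded as None


def observed_features(row):
    """Return the set of feature indices that are present in a row."""
    return frozenset(i for i, v in enumerate(row) if v is not MISSING)


class CentroidClassifier:
    """Toy nearest-centroid classifier restricted to a fixed feature subset."""

    def __init__(self, features):
        self.features = sorted(features)
        self.centroids = {}

    def fit(self, rows, labels):
        sums, counts = {}, Counter()
        for row, label in zip(rows, labels):
            acc = sums.setdefault(label, [0.0] * len(self.features))
            for k, f in enumerate(self.features):
                acc[k] += row[f]
            counts[label] += 1
        # Per-class mean of each feature in the subset.
        self.centroids = {
            lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()
        }
        return self

    def predict(self, row):
        def dist(centroid):
            return sum(
                (row[f] - centroid[k]) ** 2 for k, f in enumerate(self.features)
            )
        return min(self.centroids, key=lambda lab: dist(self.centroids[lab]))


def train_ensemble(rows, labels):
    """Train one classifier per observed-feature pattern seen in training."""
    ensemble = {}
    for pattern in {observed_features(r) for r in rows}:
        if not pattern:
            continue  # a row with no observed features defines no classifier
        # Use every training row in which all features of the pattern are present.
        usable = [
            (r, y) for r, y in zip(rows, labels) if pattern <= observed_features(r)
        ]
        ensemble[pattern] = CentroidClassifier(pattern).fit(*zip(*usable))
    return ensemble


def classify(ensemble, row):
    """Majority vote of the applicable classifiers; None if none applies."""
    present = observed_features(row)
    votes = Counter(
        clf.predict(row) for pattern, clf in ensemble.items() if pattern <= present
    )
    return votes.most_common(1)[0][0] if votes else None


# Illustrative data only, not taken from the paper.
rows = [[1.0, 1.0], [1.2, None], [5.0, 5.0], [None, 5.2]]
labels = ["a", "a", "b", "b"]
ensemble = train_ensemble(rows, labels)
```

In this sketch, classify(ensemble, [4.8, None]) is answered by the classifier trained on feature 0 alone, so no imputation is needed at prediction time. An instance such as [None, None] matches no classifier and yields None, which illustrates the coverage problem that, according to the abstract, feature selection helps reduce.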

Item Type: Article
Divisions: Faculties > Faculty of Information Technology
Identification Number: 10.1016/j.knosys.2018.05.013
Uncontrolled Keywords: Data handling; Feature extraction; Classification errors; Ensemble approaches; Ensemble learning; Ensemble of classifiers; Imputation; Incomplete data; Missing data; Real-world datasets; Classification (of information)
Additional Information: Language of original document: English.
URI: http://eprints.lqdtu.edu.vn/id/eprint/9537
