Le Quy Don Technical University

Improving performance of classification on incomplete data using feature selection and clustering

Tran, C.T. and Zhang, M. and Andreae, P. and Xue, B. and Bui, L.T. (2018) Improving performance of classification on incomplete data using feature selection and clustering. Applied Soft Computing Journal, 73. pp. 848-861. ISSN 1568-4946


Abstract

Missing values are an unavoidable issue in many real-world datasets. One of the most popular approaches to classification with incomplete data is to use imputation to replace missing values with plausible values. However, powerful imputation methods are too computationally intensive to run each time a classifier is applied to a new, unseen instance. This paper proposes new approaches to integrating imputation, clustering and feature selection for classification with incomplete data in order to improve efficiency without loss of accuracy. Clustering is used to reduce the number of instances used by the imputation. Feature selection is used to remove redundant and irrelevant features from the training data, which greatly reduces the cost of imputation. The paper also investigates the ability of Differential Evolution (DE) to search for feature subsets on incomplete data. Results show that the integration of imputation, clustering and feature selection not only improves classification accuracy, but also dramatically reduces the computation time required to estimate missing values when classifying new instances. © 2018 Elsevier B.V.
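To illustrate why clustering and feature selection cut the cost of imputing a new instance, here is a minimal Python sketch. It rests on several assumptions that do not come from the paper: the training data is already complete, plain k-means stands in for the clustering step, a k-nearest-neighbour mean stands in for the imputation method, and the selected feature indices are simply given (in the paper they come from a DE-based search, which is not reproduced here). The names `kmeans`, `impute` and `n_neighbors` are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical sketch: k-means and kNN-mean imputation are stand-ins,
# not the exact methods used in the paper.
Row = List[float]


def _dist(a: Sequence, b: Sequence[float], dims: Sequence[int]) -> float:
    """Euclidean distance computed only over the feature indices in `dims`."""
    return math.sqrt(sum((a[d] - b[d]) ** 2 for d in dims))


def kmeans(data: List[Row], k: int, iters: int = 20) -> Tuple[List[Row], List[List[Row]]]:
    """Plain k-means on complete training rows (deterministic, evenly spaced init).

    Assumes 1 <= k <= len(data).
    """
    dims = range(len(data[0]))
    step = len(data) // k
    centroids = [list(data[i * step]) for i in range(k)]
    clusters: List[List[Row]] = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for row in data:
            nearest = min(range(k), key=lambda c: _dist(row, centroids[c], dims))
            clusters[nearest].append(row)
        for c, members in enumerate(clusters):
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, clusters


def impute(x: List[Optional[float]], centroids: List[Row], clusters: List[List[Row]],
           selected: Sequence[int], n_neighbors: int = 3) -> Row:
    """Return x restricted to `selected` features, with missing ones (None) filled in.

    Only the nearest cluster is searched for neighbours, and only selected
    features are imputed, so both the instance count and feature count shrink.
    """
    observed = [d for d in selected if x[d] is not None]
    nonempty = [c for c in range(len(clusters)) if clusters[c]]
    # 1) Route x to the closest cluster, comparing observed selected features only.
    best = min(nonempty, key=lambda c: _dist(x, centroids[c], observed))
    # 2) Find the nearest neighbours inside that one cluster only.
    neighbours = sorted(clusters[best], key=lambda r: _dist(x, r, observed))[:n_neighbors]
    # 3) Fill each missing selected feature with the neighbours' mean.
    return [x[d] if x[d] is not None else sum(r[d] for r in neighbours) / len(neighbours)
            for d in selected]


train = [
    [1.0, 1.0, 10.0], [1.2, 0.9, 11.0], [0.8, 1.1, 9.0],
    [9.0, 9.0, 50.0], [9.2, 8.8, 52.0], [8.9, 9.1, 48.0],
]
centroids, clusters = kmeans(train, k=2)
selected = [0, 2]  # hypothetical output of a feature-selection step
print(impute([1.1, None, None], centroids, clusters, selected))  # [1.1, 10.0]
```

In this simplified setting, the new instance is compared against the rows of one cluster rather than the whole training set, and the unselected feature (index 1) is never imputed at all. These two reductions correspond to the two sources of savings described in the abstract.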

Item Type: Article
Divisions: Faculties > Faculty of Information Technology
Identification Number: 10.1016/j.asoc.2018.09.026
Uncontrolled Keywords: Data reduction; Evolutionary algorithms; Feature extraction; Optimization; Classification accuracy; Clustering; Differential Evolution; Improving performance; Imputation; Imputation methods; Incomplete data; Real-world datasets; Classification (of information)
Additional Information: Language of original document: English.
URI: http://eprints.lqdtu.edu.vn/id/eprint/9495
