Phan, A.V. and Nguyen, K.D.T. and Bui, L.T. (2022) Semi-supervised multitask learning using convolutional autoencoder for faulty code detection with limited data. Applied Intelligence. ISSN 0924-669X
Full text not available from this repository.
Abstract
Detecting faulty source code so that it can be fixed is an important task in software quality assurance. Building automated detectors with machine learning faces two major challenges: data imbalance and data shortage. To address these issues, this paper proposes a deep neural network and training procedures that allow learning with limited annotated data. The network is composed of an unsupervised auto-encoder and a supervised classifier. The two components share their first few layers, which act as a program feature extractor. Notably, we can leverage a large amount of unlabeled data from various sources to train the auto-encoder independently and then transfer it to the target domain. Additionally, sharing layers and jointly training the reconstruction and classification tasks encourage the network to learn more sophisticated features. We conducted experiments on four real datasets, varying the amount of labeled data and adding more unlabeled data. The results confirm that the multi-task model outperforms its single-task counterpart and that leveraging unlabeled data is beneficial. Specifically, when the labeled data is reduced from 100% to 75%, 50%, and 25%, the performance of several deep networks drops sharply, while that of our model decreases only gradually. © 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
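The abstract describes a shared feature extractor feeding two heads, an auto-encoder decoder and a classifier, trained jointly so that unlabeled programs still contribute a reconstruction signal. The following is a minimal pure-Python sketch of that idea, not the authors' implementation. Everything in it is an assumption for illustration: the hypothetical names `MultitaskConvAE` and `joint_loss`, the representation of a program as a sequence of token feature vectors, a single shared 1D convolution layer, global max pooling with a sigmoid output, and a loss of binary cross-entropy (labeled programs only) plus a weighted mean squared reconstruction error (all programs) with a hypothetical weight `lam`. Only the forward pass and loss are shown; gradients and optimization are omitted.

```python
import math
import random

# Assumption: a program is a list of positions, each a list of floats (token features).


def conv1d(x, kernels, bias):
    """Same-padded 1D convolution; kernels[o][k] is a weight vector for output channel o."""
    length, width = len(x), len(kernels[0])
    pad = width // 2
    out = []
    for t in range(length):
        row = []
        for o, kernel in enumerate(kernels):
            s = bias[o]
            for k in range(width):
                pos = t + k - pad
                if 0 <= pos < length:  # zero padding outside the sequence
                    s += sum(w * v for w, v in zip(kernel[k], x[pos]))
            row.append(s)
        out.append(row)
    return out


def relu(seq):
    return [[max(0.0, v) for v in row] for row in seq]


class MultitaskConvAE:
    """Hypothetical model: shared conv encoder, conv decoder head, classifier head."""

    def __init__(self, c_in, c_hidden, k=3, seed=0):
        rng = random.Random(seed)

        def init(c_out, c_inp):
            return [[[rng.uniform(-0.1, 0.1) for _ in range(c_inp)]
                     for _ in range(k)] for _ in range(c_out)]

        self.enc_w, self.enc_b = init(c_hidden, c_in), [0.0] * c_hidden  # shared layer
        self.dec_w, self.dec_b = init(c_in, c_hidden), [0.0] * c_in      # reconstruction head
        self.cls_w = [rng.uniform(-0.1, 0.1) for _ in range(c_hidden)]   # classification head
        self.cls_b = 0.0

    def encode(self, x):
        return relu(conv1d(x, self.enc_w, self.enc_b))

    def reconstruct(self, h):
        return conv1d(h, self.dec_w, self.dec_b)

    def classify(self, h):
        pooled = [max(col) for col in zip(*h)]  # global max pooling over positions
        z = sum(w * p for w, p in zip(self.cls_w, pooled)) + self.cls_b
        return 1.0 / (1.0 + math.exp(-z))  # estimated probability that the code is faulty


def joint_loss(model, batch, labels, lam=0.5):
    """labels[i] is 0 or 1 for labeled programs and None for unlabeled ones."""
    rec_total, cls_total, n_labeled = 0.0, 0.0, 0
    for x, y in zip(batch, labels):
        h = model.encode(x)  # the same shared features feed both heads
        x_hat = model.reconstruct(h)
        sq = [(a - b) ** 2 for row, row_hat in zip(x, x_hat) for a, b in zip(row, row_hat)]
        rec_total += sum(sq) / len(sq)  # reconstruction term uses every program
        if y is not None:  # classification term uses labeled programs only
            p = min(max(model.classify(h), 1e-7), 1 - 1e-7)
            cls_total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            n_labeled += 1
    rec = rec_total / len(batch)
    cls = cls_total / n_labeled if n_labeled else 0.0
    return cls + lam * rec, cls, rec


rng = random.Random(1)
programs = [[[rng.gauss(0, 1) for _ in range(4)] for _ in range(6)] for _ in range(3)]
labels = [1, 0, None]  # the third program is unlabeled
model = MultitaskConvAE(c_in=4, c_hidden=8)
total, cls, rec = joint_loss(model, programs, labels)
print(f"total={total:.4f} cls={cls:.4f} rec={rec:.4f}")
```

Under these assumptions, pretraining the auto-encoder on unlabeled code from other sources corresponds to calling `joint_loss` with every label set to `None`, so only the reconstruction term remains; the shared encoder weights learned that way could then be reused when labeled target-domain data is added.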
Item Type: | Article |
---|---|
Divisions: | Faculties > Faculty of Information Technology |
Identification Number: | 10.1007/s10489-022-03663-5 |
Uncontrolled Keywords: | Computer software selection and evaluation; Deep neural networks; Learning algorithms; Quality assurance; Signal encoding; Supervised learning; Auto encoders; Code detection; Convolutional autoencoder; Faulty code detection; Labeled data; Limited data; Self-supervised learning; Semi-supervised; Source codes; Unlabeled data; Convolution |
URI: | http://eprints.lqdtu.edu.vn/id/eprint/10464 |