Predicting the onset of type 2 diabetes using wide and deep learning with electronic health records

Nguyen, B.P. and Pham, H.N. and Tran, H. and Nghiem, N. and Nguyen, Q.H. and Do, T.T.T. and Tran, C.T. and Simpson, C.R. (2019) Predicting the onset of type 2 diabetes using wide and deep learning with electronic health records. Computer Methods and Programs in Biomedicine, 182: 105055. ISSN 0169-2607

Official URL: https://www.scopus.com/inward/record.uri?eid=2-s2....

Abstract

Objective: Diabetes is responsible for considerable morbidity, healthcare utilisation and mortality in both developed and developing countries. Current methods of treating diabetes are inadequate and costly, so prevention is an important step in reducing the burden of diabetes and its complications. Electronic health records (EHRs), whether for an individual or a population, have become important tools for understanding emerging disease trends. Using EHRs to predict the onset of diabetes could improve the quality and efficiency of medical care. In this paper, we apply a wide and deep learning model, which combines the strength of a generalised linear model over various features with that of a deep feed-forward neural network, to improve the prediction of the onset of type 2 diabetes mellitus (T2DM).

Materials and methods: The proposed method was implemented by training various models to minimise a logistic loss function using stochastic gradient descent. We applied this model to public hospital record data provided by Practice Fusion EHRs for the United States population. The dataset consists of de-identified electronic health records for 9948 patients, of whom 1904 have been diagnosed with T2DM. Prediction of diabetes in 2012 was based on data obtained from previous years (2009–2011). Class imbalance was handled by applying the Synthetic Minority Oversampling Technique (SMOTE) to each cross-validation training fold, in order to analyse performance when synthetic examples of the minority class are created. We used SMOTE at 150 and 300 percent, where 300 percent means that three new synthetic instances are created for each minority class instance. This results in approximate diabetes:non-diabetes distributions in the training set of 1:2 and 1:1, respectively.

Results: Our final ensemble model without SMOTE obtained an accuracy of 84.28%, an area under the receiver operating characteristic curve (AUC) of 84.13%, a sensitivity of 31.17% and a specificity of 96.85%. Using SMOTE at 150 and 300 percent did not improve AUC (83.33% and 82.12%, respectively) but increased sensitivity (49.40% and 71.57%, respectively), with a moderate decrease in specificity (90.16% and 76.59%, respectively).

Discussion and conclusions: Our algorithm has further optimised the prediction of diabetes onset using a novel state-of-the-art machine learning algorithm: the wide and deep learning neural network architecture. © 2019 Elsevier B.V.
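The class ratios and the reported accuracy in the abstract can be checked with simple arithmetic. The sketch below is illustrative only and is not the authors' code: the function names are hypothetical, and it assumes that each training fold preserves the overall class proportions and that the reported sensitivity and specificity apply to the full cohort of 9948 patients.

```python
# Cohort figures quoted in the abstract.
TOTAL_PATIENTS = 9948
T2DM_PATIENTS = 1904
NON_T2DM_PATIENTS = TOTAL_PATIENTS - T2DM_PATIENTS  # 8044


def minority_after_smote(n_minority, percent):
    """Minority-class size after SMOTE at the given percentage.

    Uses the abstract's convention: 300 percent adds three synthetic
    instances for each original minority instance.
    """
    return n_minority * (1 + percent / 100)


def majority_per_minority(n_minority, n_majority, percent):
    """Number of majority instances per minority instance after SMOTE."""
    return n_majority / minority_after_smote(n_minority, percent)


def accuracy_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Accuracy implied by sensitivity and specificity at a given class mix."""
    return (sensitivity * n_pos + specificity * n_neg) / (n_pos + n_neg)


# Assumption: folds are stratified, so fold-level ratios match cohort-level ratios.
ratio_150 = majority_per_minority(T2DM_PATIENTS, NON_T2DM_PATIENTS, 150)
ratio_300 = majority_per_minority(T2DM_PATIENTS, NON_T2DM_PATIENTS, 300)

# Assumption: the no-SMOTE rates are measured over the whole cohort.
implied_accuracy = accuracy_from_rates(
    0.3117, 0.9685, T2DM_PATIENTS, NON_T2DM_PATIENTS
)

if __name__ == "__main__":
    print(f"SMOTE 150%: 1:{ratio_150:.2f} diabetes:non-diabetes")
    print(f"SMOTE 300%: 1:{ratio_300:.2f} diabetes:non-diabetes")
    print(f"Implied accuracy without SMOTE: {implied_accuracy:.2%}")
```

Under these assumptions, the ratios come out at roughly 1:1.7 and 1:1.06, consistent with the approximate 1:2 and 1:1 distributions stated above, and the implied accuracy without SMOTE agrees with the reported 84.28%. The same calculation shows why a low sensitivity can coexist with a high accuracy: about four in five patients belong to the non-diabetes class, so specificity dominates the weighted average.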

Item Type:	Article
Divisions:	Faculties > Faculty of Information Technology
Identification Number:	10.1016/j.cmpb.2019.105055
Uncontrolled Keywords:	Deep neural networks; Developing countries; E-learning; Feedforward neural networks; Forecasting; Gradient methods; Health; Hospitals; Machine learning; Network architecture; Population statistics; Records management; Stochastic models; Stochastic systems; Electronic health record; Electronic health record (EHRs); Incidence; Onset; Receiver operating characteristic curves; Stochastic gradient descent; Synthetic minority over-sampling techniques; Type 2 diabetes mellitus; Learning algorithms; glucose; insulin; adult; aged; area under the curve; Article; deep feed forward neural network; deep learning; diastolic blood pressure; feed forward neural network; female; glucose blood level; human; insulin blood level; major clinical study; male; non insulin dependent diabetes mellitus; prediction; public hospital; receiver operating characteristic; sensitivity and specificity; systolic blood pressure; Diabetes Mellitus, Type 2; Electronic Health Records; Humans
Additional Information:	Language of original document: English.
URI:	http://eprints.lqdtu.edu.vn/id/eprint/9216
