A novel approach based on CatBoost and explainable artificial intelligence for diagnosis of COVID-19 cases using patients' symptoms

Document Type: Computer Article

Authors

1 Assistant Professor, Department of Computer Hardware Engineering, Faculty of Electrical and Computer Engineering

2 Department of Electrical and Computer Engineering, Semnan University, Semnan, Iran

3 Department of Computer Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran

Abstract

COVID-19, first identified in December 2019 in Wuhan, China, spread quickly around the world and remains a significant threat to global health. Despite the strategies already used to contain its spread, further measures are still needed to deal with its consequences. In this research, patients' clinical characteristics, compiled from similar studies, are used as input data to diagnose COVID-19. Several algorithms are compared: support vector machine, logistic regression, k-nearest neighbors (k = 9), naive Bayes, random forest, LightGBM, XGBoost and CatBoost. Among these, CatBoost showed the best results, with a sensitivity of 97.97% and reported accuracy figures of 97.72% and 96.96%. The hyperparameters of this algorithm were tuned by trial and error to achieve the desired results, and SHAP is used to interpret the model's output and to determine the impact of each feature on the prediction.
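The evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes the standard confusion-matrix definitions of sensitivity, precision and accuracy; the abstract does not state which definitions the authors used. The class `ConfusionCounts` and the counts in `example` are hypothetical, chosen only for illustration, and are not the paper's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for binary classifier outcomes."""
    tp: int  # infected patients correctly flagged as COVID-19
    fp: int  # non-infected patients wrongly flagged as COVID-19
    tn: int  # non-infected patients correctly cleared
    fn: int  # infected patients the classifier missed


def sensitivity(c: ConfusionCounts) -> float:
    """Share of truly infected patients that the classifier detects."""
    return c.tp / (c.tp + c.fn)


def precision(c: ConfusionCounts) -> float:
    """Share of positive predictions that are truly infected."""
    return c.tp / (c.tp + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Share of all predictions, positive or negative, that are correct."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


# Illustrative counts only (assumption): 100 infected and 100 non-infected patients.
example = ConfusionCounts(tp=90, fp=5, tn=95, fn=10)

if __name__ == "__main__":
    print(f"sensitivity = {sensitivity(example):.4f}")
    print(f"precision   = {precision(example):.4f}")
    print(f"accuracy    = {accuracy(example):.4f}")
```

Sensitivity is usually the measure of most interest in a diagnostic setting, because a missed infection (a false negative) tends to be more costly than a false alarm.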

