IDCOST: A Method for Increasing Data Criterion Service by Scoring Credit Imbalanced Data Using Applied SVM

Document Type: Computer Article

Authors

Department of Computer Engineering and Information Technology, Payam Noor University, Tehran, Iran

Abstract

Imbalanced credit data pose significant challenges in applied data mining. To address them, we propose IDCOST, a method that combines a scoring technique with a support vector machine (SVM) to enhance data criterion service. The approach integrates index feature selection with the IDCOST method, which reduces data redundancy and balances the feature-selected data sets using a valid index. Feature selection and kernel modification are used to improve accuracy while reducing computational complexity and execution time. On credit card fraud and credit card default data sets, the proposed method detects the target cases with higher sensitivity than other methods. It is a promising solution for credit data issues in applied SVM data mining and has the potential to improve data analysis accuracy and reduce computational complexity in various fields.
The IDCOST method is organized into pre-processing, training, validation, and testing stages. The pre-processing stage applies detector threshold clustering, the training stage applies sensitivity and feature validation to the models, and the testing stage scores each sample in the test data set. Accuracy is optimized by selecting an appropriate cluster head during data classification and by employing the scoring technique. By integrating index feature selection, the IDCOST method, and kernel modification, the approach detects credit card fraud and credit card default accurately while reducing data redundancy and computational complexity.
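
The emphasis on sensitivity rather than accuracy can be illustrated with a minimal sketch. It is not the IDCOST scoring procedure itself. The labels, the scores (standing in for SVM decision values), and both thresholds are hypothetical values chosen for illustration, and the function names are assumptions rather than names from the paper.

```python
from typing import List, Tuple


def confusion_counts(y_true: List[int], scores: List[float],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (tp, fn, fp, tn) when a sample is flagged if score >= threshold."""
    tp = fn = fp = tn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def sensitivity(y_true: List[int], scores: List[float], threshold: float) -> float:
    """Fraction of true positives (e.g. fraud cases) that are detected."""
    tp, fn, _, _ = confusion_counts(y_true, scores, threshold)
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(y_true: List[int], scores: List[float], threshold: float) -> float:
    """Fraction of all samples classified correctly."""
    tp, fn, fp, tn = confusion_counts(y_true, scores, threshold)
    return (tp + tn) / (tp + fn + fp + tn)


# Hypothetical imbalanced test set: 8 legitimate samples (0), 2 fraud samples (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# Hypothetical per-sample scores, e.g. signed distances from an SVM hyperplane.
scores = [-1.2, -0.9, -0.8, -0.5, -0.4, -0.3, -0.1, 0.2, -0.2, 0.7]

for threshold in (0.0, -0.25):
    print(threshold,
          accuracy(y_true, scores, threshold),
          sensitivity(y_true, scores, threshold))
```

With these illustrative numbers, both thresholds give the same accuracy, but only the lower one detects both minority-class samples. This is why accuracy alone can hide poor detection of the rare class in imbalanced credit data.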

Volume 23, Special Issue 81
Celebrating the 50th Anniversary of Semnan University- In Progress
July 2025
Pages 1-18
  • Receive Date: 13 July 2023
  • Revise Date: 01 August 2024
  • Accept Date: 01 January 2025