
Mitigating Risk in P2P Lending Network: Enhancing Predictions with GenAI and SMOTE

Lina Devakumar Louis¹, Andrew Dunton¹, Sourab Rajendra Saklecha¹, Swetha Neha Kutty Sivakumar¹, Abdul Sohail Ahmed¹, Smeet Sheth¹, and Shih Yu Chang²

Corresponding Author:

Lina Devakumar Louis

Affiliation(s):

1 Department of Applied Data Science, San Jose State University, Washington Sq, San Jose, CA 95192, United States

2 Assistant Professor, Department of Applied Data Science, San Jose State University, Washington Sq, San Jose, CA 95192, United States

Abstract:

Peer-to-peer (P2P) lending has transformed the finance market by connecting borrowers directly with investors, eliminating the need for conventional intermediaries such as banks. This transformation offers several advantages, including potentially lower interest rates for borrowers and higher returns for investors. However, it also introduces risks, particularly the possibility that borrowers default on their loans, which could lead to significant losses. Research indicates that classification models can be leveraged to address this risk. However, the available real-world datasets are heavily skewed, which can bias predictions and cause model overfitting. Existing research relies on conventional approaches such as the Synthetic Minority Over-sampling Technique (SMOTE) for balancing data, together with ensemble models. This study addresses these challenges through a comparative study of SMOTE and generative AI (GenAI) for data synthesis, to evaluate the effects of modern approaches. It also explores the inclusion of additional features compared with existing research. Ensemble modeling approaches were adopted for this study. Logistic Regression, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Random Forest were evaluated to determine the best base model for stacking. XGBoost, LightGBM, and AdaBoost were the three models selected for stacking. XGBoost outperformed all other models, achieving an average accuracy of 99.4% and an average F1-score of 97.4% using SMOTE synthesis. GenAI synthesis obtained similar performance.
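To make the balancing step concrete, the following is a minimal NumPy sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbors. It is an illustration under stated assumptions, not the pipeline used in the paper. The function name `smote_sketch`, its parameters, and the toy data are all hypothetical, and the sketch handles continuous features only.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolation.

    Hypothetical helper for illustration; assumes continuous features
    and at least two minority samples.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbors than other samples

    # Pairwise Euclidean distances within the minority class.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor

    # Indices of the k nearest minority neighbors of each sample.
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a base sample and one of its neighbors for every new point.
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]

    # Interpolate at a random position along the connecting segment.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[picked] - X_min[base])


# Toy imbalanced data (assumed for illustration): 95 majority, 5 minority.
data_rng = np.random.default_rng(42)
X_major = data_rng.normal(0.0, 1.0, size=(95, 2))
X_minor = data_rng.normal(3.0, 1.0, size=(5, 2))

# Oversample the minority class until both classes have 95 rows.
X_syn = smote_sketch(X_minor, n_new=len(X_major) - len(X_minor))
X_bal = np.vstack([X_major, X_minor, X_syn])
y_bal = np.array([0] * len(X_major) + [1] * (len(X_minor) + len(X_syn)))
```

In practice, oversampling of this kind is usually applied to the training split only, so that evaluation metrics such as the F1-score are computed on data with the original class distribution.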

Keywords:

Peer-to-Peer lending, loan default, imbalanced dataset, SMOTE, GenAI, Logistic Regression, KNN, Random Forest, Support Vector Machine, XGBoost, LightGBM, AdaBoost, ensemble

Cite This Paper:

Lina Devakumar Louis, Andrew Dunton, Sourab Rajendra Saklecha, Swetha Neha Kutty Sivakumar, Abdul Sohail Ahmed, Smeet Sheth, and Shih Yu Chang (2024). Mitigating Risk in P2P Lending Network: Enhancing Predictions with GenAI and SMOTE. Journal of Networking and Network Applications, Volume 4, Issue 2, pp. 48–59. https://doi.org/10.33969/J-NaNA.2024.040201.

References:

[1] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic Minority Over-sampling technique,” Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, Jun. 2002, doi: 10.1613/jair.953.

[2] Levi. “Synthetic Data Generation with the Highest Accuracy for Free.” MOSTLY AI, Nov. 21, 2023. Available at: https://mostly.ai/.

[3] Y. Pristyanto, A. F. Nugraha, I. Pratama, and A. Dahlan, “Ensemble Model Approach For Imbalanced Class Handling on Dataset,” in 2020 3rd International Conference on Information and Communications Technology (ICOIACT), Yogyakarta, Indonesia, 2020, pp. 17-21, doi: 10.1109/ICOIACT50329.2020.9331984.

[4] Mukherjee, Mimi, and Matloob Khushi. “SMOTE-ENC: A novel SMOTE-based method to generate synthetic data for nominal and continuous features.” Applied System Innovation 4.1 (2021): 18. Publisher: MDPI.

[5] Muslim, Much Aziz, Tiara Lailatul Nikmah, Dwika Ananda Agustina Pertiwi, Yosza Dasril, and others. “New model combination meta-learner to improve accuracy prediction P2P lending with stacking ensemble learning.” Intelligent Systems with Applications 18 (2023): 200204. Publisher: Elsevier.

[6] Shen, Feng, Xingchao Zhao, Zhiyong Li, Ke Li, and Zhiyi Meng. “A novel ensemble classification model based on neural networks and a classifier optimisation technique for imbalanced credit risk evaluation.” Physica A: Statistical Mechanics and its Applications 526 (2019): 121073. Publisher: Elsevier.

[7] Caruso, Giulia, SA Gattone, Francesca Fortuna, and Tonio Di Battista. “Cluster Analysis for mixed data: An application to credit risk evaluation.” Socio-Economic Planning Sciences 73 (2021): 100850. Publisher: Elsevier.

[8] Y. Chen and R. Zhang, “Research on credit card default prediction based on K-Means SMOTE and BP Neural Network,” Complexity, vol. 2021, pp. 1–13, Mar. 2021, doi: 10.1155/2021/6618841.

[9] H. Wang and L. Cheng, “CatBoost model with synthetic features in application to loan risk assessment of small businesses,” arXiv (Cornell University), Jun. 2021, doi: 10.48550/arxiv.2106.07954.

[10] N. Park, Y. H. Gu, and S. J. Yoo, “Synthesizing individual consumers’ credit historical data using generative adversarial networks,” Applied Sciences, vol. 11, no. 3, p. 1126, Jan. 2021, doi: 10.3390/app11031126.

[11] Wordsforthewise (n.d.). Lending Club. Retrieved from https://www.kaggle.com/datasets/wordsforthewise/lending-club (2018).

[12] M. Hossin and M. R. Sulaiman, “A Review on Evaluation Metrics for Data Classification Evaluations,” Int. J. Data Min. Knowl. Manag. Process, vol. 5, no. 2, pp. 01–11, 2015, doi: 10.5121/ijdkp.2015.5201.

[13] Chen, T., Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. In Proceedings of the 2016 Conference on Knowledge Discovery and Data Mining, 785-794.

[14] Wang, W., Chakraborty, G., Chakraborty, B. (2020). Predicting the Risk of Chronic Kidney Disease (CKD) Using Machine Learning Algorithm. Applied Sciences, 11(2).

[15] Wang, S., You, S. D., Zhou, S. (2023). Loan prediction using machine learning methods. Advances in Economics, Management and Political Sciences, 5(1), 210–215.