Performance Comparison of Logistic Regression and XGBoost for Credit Card Fraud Detection using Random Undersampling and Hyperparameter Tuning

Authors

  • Hasri Akbar Awal Rozaq, Graduate School of Informatics, Department of Computer Science, Gazi University, Türkiye
  • Deni Sutaji, Graduate School of Informatics, Department of Computer Science, Gazi University, Türkiye

DOI:

https://doi.org/10.54082/jiki.306

Keywords:

Fraud Detection, Hyperparameter Tuning, Imbalanced Data, Logistic Regression, Random Undersampling, XGBoost

Abstract

Credit card fraud is a growing problem driven by the rising volume of card transactions. This study investigates the effectiveness of Logistic Regression (LogReg) and Extreme Gradient Boosting (XGBoost) at identifying fraudulent transactions in a highly imbalanced dataset in which only 8% of the records represent fraudulent activity. To address the class imbalance, random undersampling was applied, reducing the number of legitimate transactions. This technique significantly improved LogReg's ability to detect fraud, raising its AUC-ROC from 0.7994 to 0.9089. XGBoost performed well even without hyperparameter tuning or random undersampling, indicating its robustness as a baseline model. The study highlights the critical importance of addressing class imbalance in fraud detection. Both LogReg and XGBoost demonstrated strong potential, particularly when combined with techniques such as undersampling or hyperparameter tuning. These findings underscore the need for effective data preprocessing to enhance the performance of machine learning models in detecting credit card fraud.
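The paper's own preprocessing code is not shown here, but the random-undersampling step the abstract describes can be sketched in a few lines of NumPy. This is an illustrative sketch only (the function name, random seed, and toy data are assumptions, not the authors' implementation): majority-class rows are randomly discarded until each class has as many rows as the minority (fraud) class.

```python
import numpy as np

def random_undersample(X, y, seed=42):
    """Randomly drop majority-class rows until all classes are balanced."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()  # size of the minority (fraud) class
    keep = []
    for c, n in zip(classes, counts):
        idx = np.flatnonzero(y == c)
        if n > n_min:
            # Sample without replacement from the majority class
            idx = rng.choice(idx, size=n_min, replace=False)
        keep.append(idx)
    keep = np.concatenate(keep)
    rng.shuffle(keep)
    return X[keep], y[keep]

# Toy data mirroring the paper's imbalance: 8% positives
X = np.arange(200, dtype=float).reshape(100, 2)
y = np.zeros(100, dtype=int)
y[:8] = 1

X_bal, y_bal = random_undersample(X, y)
print(len(y_bal), int(y_bal.sum()))  # 16 rows, 8 of them fraud
```

After balancing, either model can be fit on `X_bal, y_bal` as usual; the trade-off is that undersampling discards legitimate transactions, which is why it helps a linear model like LogReg more than an already robust ensemble like XGBoost.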


Published

2025-12-31

How to Cite

Rozaq, H. A. A., & Sutaji, D. (2025). Performance Comparison of Logistic Regression and XGBoost for Credit Card Fraud Detection using Random Undersampling and Hyperparameter Tuning. Jurnal Ilmu Komputer Dan Informatika, 5(2), 115–126. https://doi.org/10.54082/jiki.306