Comparative Analysis of XGBoost and LightGBM Algorithms in Credit Card Fraud Detection: Cost Sensitivity and Computational Complexity

Nefzah Atirah Qalby; Achmad Fitro; Achmad Kautsar; Riska Dhenabayu

doi:10.55227/ijhess.v5i6.2394

Authors

Nefzah Atirah Qalby, Universitas Negeri Surabaya (State University of Surabaya), Indonesia
Achmad Fitro, Digital Business/Faculty of Economics and Business, State University of Surabaya, Indonesia
Achmad Kautsar, Digital Business/Faculty of Economics and Business, State University of Surabaya, Indonesia
Riska Dhenabayu, Digital Business/Faculty of Economics and Business, State University of Surabaya, Indonesia

Abstract

Cybercrime in credit card transactions inflicts severe financial damage, and extreme class imbalance often biases conventional models toward high false positive rates. This study compares and optimizes two tree-ensemble algorithms, XGBoost and LightGBM, to develop a fraud detection model that is statistically accurate, computationally efficient, and minimizes real financial losses for banks. Using a dataset of 23,769 transactions with a 270:1 imbalance ratio, both models were tuned with the Tree-structured Parzen Estimator (TPE) and validated with stratified 5-fold cross-validation. Performance was evaluated through classification metrics, computational efficiency, and the Expected Cost of Misclassification, while explainable AI based on SHAP values ensured model transparency. The results show that LightGBM performs best, achieving perfect precision (1.000) and an F1-score of 0.9714 and limiting financial losses to Rp5,000,000. Although XGBoost trained faster, LightGBM’s 60-millisecond latency meets real-time requirements, providing a robust and transparent risk mitigation system for banking operations. Implementing this architecture significantly enhances IT efficiency and banking risk governance, strengthening banks’ competitiveness in the digital era.
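The abstract reports precision, F1-score, and a total misclassification cost without listing the underlying confusion-matrix counts. The minimal Python sketch below (standard library only) shows how these quantities relate for a binary classifier that treats fraud as the positive class. Solving F1 = 2PR / (P + R) for recall shows that precision 1.000 and F1 0.9714 imply a recall of roughly 0.944, assuming both figures come from the same evaluation. The confusion-matrix counts and the per-error costs `cost_fn` and `cost_fp` used below are hypothetical placeholders, not the study's data or its cost matrix, and the cost function assumes that correct decisions cost nothing.

```python
"""Sketch: fraud-detection metrics and a simple misclassification cost."""
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier where fraud is the positive class."""

    tp: int  # fraud correctly flagged
    fp: int  # legitimate transaction wrongly flagged (false alarm)
    fn: int  # fraud that was missed
    tn: int  # legitimate transaction correctly passed


def precision(cm: ConfusionMatrix) -> float:
    """Share of flagged transactions that are actually fraud."""
    flagged = cm.tp + cm.fp
    return cm.tp / flagged if flagged else 0.0


def recall(cm: ConfusionMatrix) -> float:
    """Share of actual fraud cases that were flagged."""
    actual_fraud = cm.tp + cm.fn
    return cm.tp / actual_fraud if actual_fraud else 0.0


def f1_score(cm: ConfusionMatrix) -> float:
    """Harmonic mean of precision and recall, in count form 2TP / (2TP + FP + FN)."""
    denom = 2 * cm.tp + cm.fp + cm.fn
    return 2 * cm.tp / denom if denom else 0.0


def recall_from_precision_and_f1(p: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, given precision P and F1."""
    return f1 * p / (2 * p - f1)


def misclassification_cost(cm: ConfusionMatrix, cost_fn: float, cost_fp: float) -> float:
    """Total cost under an ASSUMED cost matrix where correct decisions cost 0."""
    return cm.fn * cost_fn + cm.fp * cost_fp


if __name__ == "__main__":
    # Recall implied by the reported scores (precision 1.000, F1 0.9714).
    print(round(recall_from_precision_and_f1(1.0, 0.9714), 4))  # 0.9444

    # HYPOTHETICAL fold: counts chosen only to be consistent with those scores.
    cm = ConfusionMatrix(tp=17, fp=0, fn=1, tn=4736)
    print(precision(cm), round(recall(cm), 4), round(f1_score(cm), 4))  # 1.0 0.9444 0.9714

    # HYPOTHETICAL costs in rupiah; not the study's cost matrix.
    print(misclassification_cost(cm, cost_fn=1_000_000, cost_fp=100_000))  # 1000000
```

A precision of 1.000 means the evaluated predictions contain no false alarms. Under this simple cost model, any remaining cost therefore comes from missed fraud alone, so recall rather than precision drives the cost figure.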
