Comparative Analysis of XGBoost and LightGBM Algorithms in Credit Card Fraud Detection: Cost Sensitivity and Computational Complexity
DOI: https://doi.org/10.55227/ijhess.v5i6.2394
Abstract
Cybercrime in credit card transactions inflicts severe financial damage, and the extreme class imbalance typical of fraud data often biases conventional models toward high false positive rates. This study compares and optimizes the tree-ensemble algorithms XGBoost and LightGBM to develop a fraud detection model that is statistically accurate, computationally efficient, and minimizes real financial losses for banks. Using a dataset of 23,769 transactions with a 270:1 imbalance ratio, both models were tuned with the Tree-structured Parzen Estimator (TPE) and validated with stratified 5-fold cross-validation. Performance was evaluated through classification metrics, computational efficiency, and the Expected Cost of Misclassification, while explainable AI based on SHAP values provided model transparency. The results demonstrate LightGBM's superiority: it achieves perfect precision (1.000) and an F1-score of 0.9714 while reducing financial losses to Rp5,000,000. Although XGBoost trained faster, LightGBM's 60-millisecond latency meets real-time requirements, providing a robust, transparent risk-mitigation system for banking operations. Implementing this architecture significantly enhances banks' competitiveness through greater IT efficiency and stronger risk governance in the digital era.
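The abstract names the Expected Cost of Misclassification as an evaluation criterion but does not state the cost matrix behind it. The sketch below assumes the simplest cost-sensitive setup: each missed fraud (false negative) costs `cost_fn`, each false alarm (false positive) costs `cost_fp`, and correct decisions cost nothing. The `Confusion` class, all function names, the confusion-matrix counts, and the Rupiah costs are hypothetical illustrations, not the study's code or data; the counts were picked only because they are one of several matrices consistent with the reported precision (1.000) and F1-score (0.9714).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion matrix with fraud as the positive class (hypothetical helper)."""
    tp: int  # frauds correctly flagged
    fp: int  # legitimate transactions wrongly flagged (false alarms)
    fn: int  # frauds missed
    tn: int  # legitimate transactions correctly passed


def precision(c: Confusion) -> float:
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c: Confusion) -> float:
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f1(c: Confusion) -> float:
    # Harmonic mean of precision and recall, written in count form: 2TP / (2TP + FP + FN).
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


def misclassification_cost(c: Confusion, cost_fn: float, cost_fp: float) -> float:
    """Total cost under an ASSUMED cost matrix where correct decisions cost nothing."""
    return c.fn * cost_fn + c.fp * cost_fp


def expected_cost(c: Confusion, cost_fn: float, cost_fp: float) -> float:
    """Average misclassification cost per transaction."""
    n = c.tp + c.fp + c.fn + c.tn
    return misclassification_cost(c, cost_fn, cost_fp) / n


if __name__ == "__main__":
    # Hypothetical counts: one matrix (of several) consistent with
    # precision = 1.000 and F1 = 0.9714. NOT the study's actual results.
    cm = Confusion(tp=17, fp=0, fn=1, tn=4736)
    # Hypothetical per-error costs in Rupiah. NOT taken from the paper.
    COST_FN = 1_000_000
    COST_FP = 50_000
    print(f"precision = {precision(cm):.3f}")  # -> precision = 1.000
    print(f"recall    = {recall(cm):.4f}")     # -> recall    = 0.9444
    print(f"F1        = {f1(cm):.4f}")         # -> F1        = 0.9714
    print(f"total cost   = Rp{misclassification_cost(cm, COST_FN, COST_FP):,.0f}")
    print(f"cost per txn = Rp{expected_cost(cm, COST_FN, COST_FP):,.2f}")
```

Because precision equals 1.000 only when there are no false positives, the false-alarm term drops out in this setting, and the total cost depends entirely on the number of missed frauds and the per-miss cost the bank assigns.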
License
Copyright (c) 2026 Nefzah Atirah Qalby, Achmad Fitro, Achmad Kautsar, Riska Dhenabayu

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.