A Review on Novel Approach to Handle Imbalanced Credit Card Transactions

International Journal of Engineering Trends and Technology (IJETT)
  
© 2018 by IJETT Journal
Volume-62 Number-2
Year of Publication : 2018
Authors : Sudhansu R. Lenka, Bikram K. Ratha, Biswaranjan Nayak
DOI :  10.14445/22315381/IJETT-V62P214

Citation 

MLA Style: Sudhansu R. Lenka, Bikram K. Ratha, and Biswaranjan Nayak. "A Review on Novel Approach to Handle Imbalanced Credit Card Transactions." International Journal of Engineering Trends and Technology 62.2 (2018): 80-95.

APA Style: Sudhansu R. Lenka, Bikram K. Ratha, & Biswaranjan Nayak. (2018). A Review on Novel Approach to Handle Imbalanced Credit Card Transactions. International Journal of Engineering Trends and Technology, 62(2), 80-95.

Abstract
Credit and debit cards are now used for most purchases and payments. This heavy usage attracts criminals, who devise various techniques to carry out fraudulent transactions; as a result, billions of dollars are lost each year to ineffective fraud detection systems. Credit card transactions are highly imbalanced: the vast majority are genuine and very few are fraudulent. This imbalance poses a serious challenge for machine learning and data mining algorithms, and since a single algorithm rarely achieves adequate performance on its own, ensembles of classifiers are the usual way to handle the problem. In this paper, we first describe different approaches to fraud detection, the methods used to evaluate their performance, and the challenges a fraud detection model faces. Second, we present a comprehensive review of the class imbalance problem, the state of the art in ensemble techniques, and the assessment measures used to evaluate algorithm performance, and finally we run comparison tests among the ensemble-based methods. The ensemble-based methods are classified into categories for handling imbalanced fraudulent transactions, with each method grouped according to its working principle. The comparison tests show that the performance of the detection model can be improved by integrating random undersampling with bagging or boosting methods. Additionally, the results justify having ensemble-based methods apply pre-processing techniques before training the classifier.
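To make the abstract's main finding concrete, below is a minimal UnderBagging-style sketch in Python with scikit-learn: each base tree is trained on a balanced subsample obtained by randomly undersampling the majority (genuine) class, and the ensemble averages the predicted fraud probabilities. This is an illustrative sketch, not the authors' exact method; the synthetic data, function names, and parameter values are assumptions chosen for the example, and evaluation uses imbalance-aware measures (AUC-ROC and AUC-PR) rather than accuracy, which is misleading at a 99:1 class ratio.

```python
# Illustrative sketch of random undersampling combined with bagging
# (an UnderBagging-style ensemble); not the paper's exact algorithm.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(42)

# Synthetic stand-in for card transactions: ~1% "fraud" (class 1).
X, y = make_classification(n_samples=20_000, n_features=20,
                           weights=[0.99, 0.01], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

def under_bagging(X, y, n_estimators=25):
    """Train one tree per balanced, randomly undersampled subsample."""
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    trees = []
    for _ in range(n_estimators):
        # Undersample the majority class down to the minority-class size.
        maj_sample = rng.choice(majority, size=len(minority), replace=False)
        idx = np.concatenate([minority, maj_sample])
        trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return trees

def ensemble_scores(trees, X):
    """Average the predicted fraud probability across the ensemble."""
    return np.mean([t.predict_proba(X)[:, 1] for t in trees], axis=0)

trees = under_bagging(X_tr, y_tr)
scores = ensemble_scores(trees, X_te)

# Ranking-based measures are appropriate for highly imbalanced data.
print("AUC-ROC:", roc_auc_score(y_te, scores))
print("AUC-PR :", average_precision_score(y_te, scores))
```

Swapping the bagging loop for a boosting learner trained on the undersampled indices would give the RUSBoost-style variant the abstract also refers to; in both cases the key design choice is the same, namely rebalancing each training subsample before fitting the base classifier.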

Keywords
Bagging, Boosting, Cost-sensitive learning, Ensembles, Imbalanced data sets, Credit card fraud.