Instance Selection and Optimization of Neural Networks

Zongyuan Zhao, Shuxiang Xu, Byeong Ho Kang, Mir Md Jahangir Kabir, Yunling Liu

Abstract


Credit scoring is an important tool that financial institutions use in credit granting decisions. Credit scoring models assign a mark to each credit application: applications with high marks are treated as "good", while those with low marks are regarded as "bad". As data mining techniques develop, automatic credit scoring systems are increasingly welcomed for their high efficiency and objective judgments. Many machine learning algorithms have been applied to training credit scoring models, and artificial neural networks (ANNs) are among those that perform well. This paper presents a higher-accuracy credit scoring model based on multilayer perceptron (MLP) neural networks trained with the back propagation algorithm. Our work focuses on enhancing credit scoring models in three aspects: optimizing the data distribution in datasets using a new method called Average Random Choosing; comparing the effects of the numbers of training, validation, and test instances; and finding the most suitable number of hidden units. Another contribution of this paper is a summary of how the scoring accuracy of the models tends to change as the number of hidden units increases. The experimental results show that our methods can achieve high credit scoring accuracy with imbalanced datasets. Thus, credit granting decisions can be made by data mining methods using MLP neural networks.
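
The abstract does not define Average Random Choosing, so the sketch below does not attempt to reproduce it. Instead it illustrates a standard stratified split, a different and well-known technique that addresses a similar concern: keeping the ratio of "good" to "bad" applications roughly equal across the training, validation, and test partitions of an imbalanced dataset. The function name `stratified_split`, the 80/10/10 fractions, and the 700/300 class counts are all assumptions made for illustration, not values taken from the paper.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Return (train, validation, test) index lists.

    Each class is shuffled and divided separately, so every partition
    keeps approximately the same class ratio as the full dataset.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")

    rng = random.Random(seed)

    # Group instance indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, valid, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_valid = round(len(indices) * fractions[1])
        train.extend(indices[:n_train])
        valid.extend(indices[n_train:n_train + n_valid])
        # Whatever remains goes to the test partition.
        test.extend(indices[n_train + n_valid:])
    return train, valid, test


# Hypothetical imbalanced data: 700 "good" and 300 "bad" applications.
labels = ["good"] * 700 + ["bad"] * 300
train, valid, test = stratified_split(labels)
```

Changing `fractions` gives a simple way to compare different training-validation-test proportions, which is the second aspect the abstract mentions; each resulting partition could then be used to train and evaluate MLPs with varying numbers of hidden units.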

Keywords


Back Propagation; Credit Scoring; Multilayer Perceptron; Neural Network




Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.


IT in Industry (2012 - ) http://www.it-in-industry.com ISSN (Online): 2203-1731; ISSN (Print): 2204-0595