Abstract
The importance of credit access to improve economic opportunities in developing markets is well established in the literature. However, there exists a strong need to mitigate adverse selection problems in microlending. A risk scoring model that more accurately predicts the likelihood of repayment of potential borrowers can help address this market imperfection and benefit both lenders and borrowers. This paper compares the performance of nonparametric versus semiparametric and traditional parametric risk scoring models based on default probabilities. We show the advantages of relying on less structured, data-driven methods for risk scoring using both simulated data and data from credit loans granted to small and microenterprises in rural Peru. The estimation results indicate that nonparametric methods lead to a better evaluation of creditworthiness and can help prevent the inclusion of potential “bad” borrowers and the exclusion of “good” borrowers from sensitive microcredit markets.
Notes
As of December 2010, microfinance institutions reported reaching more than 205 million borrowers worldwide (Maes and Reed 2012). A separate issue pertains to whether microcredit has been an effective tool to lift poor people out of poverty by funding their microenterprises and increasing their wealth, considering that a large number of small businesses have been created through microcredits but only a few have matured into larger businesses. Recent work evaluating the impact of microfinance using randomized field experiments provides mixed evidence regarding the effects of microcredit on household income and consumption (e.g., Banerjee et al. 2010; Dupas and Robinson 2009; Karlan and Zinman 2011).
There are also concerns that lending institutions have managed to sustain low interest rates and relatively high default rates due to subsidies and soft loans. Grameen Bank, for example, which charges an average real interest rate of 10 %, experienced losses close to 18 % of their outstanding loans from 1985 to 1996 after properly adjusting for their portfolio size (Armendariz and Morduch 2005).
See also Schreiner (2000) for additional discussion on credit scoring in microfinance.
Microfinance data in developing countries have generally been underexploited, in part due to the lack of information sharing across lending institutions.
We could also consider a continuous variable measuring the percentage of loan (installments) repaid by each individual.
The assumption that the threshold is zero is without loss of generality provided that X includes a constant.
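The reasoning behind this normalization can be sketched as follows. The notation is an assumption made for illustration: \(Y\) is the binary outcome, \(\varepsilon \) the error term, \(c\) an arbitrary threshold, and \(X=(1,\tilde{X})\) with coefficients \(\beta =(\beta _0 ,\tilde{\beta })\).

\[
Y=1\left( \beta _0 +\tilde{X}'\tilde{\beta }+\varepsilon >c \right) =1\left( (\beta _0 -c)+\tilde{X}'\tilde{\beta }+\varepsilon >0 \right)
\]

A nonzero threshold is therefore absorbed into the intercept, so only \(\beta _0 -c\) is identified and setting \(c=0\) changes nothing else in the model.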
An alternative estimator can be found in Ichimura (1993), but it is less efficient than the estimator proposed by Klein and Spady for binary choice models.
Klein and Spady add a trimming function to the log likelihood function, although trimming does not seem to matter in their simulations. Single index models further require two identification conditions under which the parameter vector \(\beta \) and function \(g(\cdot )\) can be sensibly estimated. First, the set of explanatory variables \(X\) must contain at least one continuous variable. Second, \(\beta \) cannot be identified without some location and scale restrictions (normalizations). One popular location-normalization is to not include a constant in \(X\); one popular scale-normalization is to assume that the first component of \(X\) has a unit coefficient and that this first component is a continuous variable. For further details on single index model estimations refer to Li and Racine (2006).
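For reference, the log likelihood mentioned at the start of this note is commonly written in the following form. This is a sketch that omits the trimming function; the kernel \(K\), the bandwidth \(h\), and the leave-one-out notation \(\hat{g}_{-i}\) are assumptions made for illustration rather than the exact expressions used in the estimation.

\[
\mathcal {L}(\beta ,h)=\sum _{i=1}^n \left[ Y_i \ln \hat{g}_{-i} (X_i '\beta )+(1-Y_i )\ln \left( 1-\hat{g}_{-i} (X_i '\beta ) \right) \right] ,\quad \hat{g}_{-i} (X_i '\beta )=\frac{\sum _{j\ne i} Y_j K\left( (X_j -X_i )'\beta /h \right) }{\sum _{j\ne i} K\left( (X_j -X_i )'\beta /h \right) }
\]

Because \(\hat{g}_{-i}\) depends on the data only through the index \(X'\beta \), any shift in location or rescaling of \(\beta \) can be absorbed by the unknown function, which is why the normalizations above are needed.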
An alternative selection method is the standard rule-of-thumb procedure, in which the bandwidth for covariate \(X_s\) is defined as \(h_s = X_{s,sd}\, n^{-1/(4+q)}\), where \(X_{s,sd}\) is the sample standard deviation of \(X_s\), \(n\) is the number of observations in the working sample, and \(q\) is the total number of covariates in \(X\).
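A minimal sketch of this rule-of-thumb bandwidth, written in Python for illustration only (the estimations in the paper rely on Stata and R). The function name `rule_of_thumb_bandwidths` and the column-wise input layout are assumptions, not part of the original analysis.

```python
import statistics


def rule_of_thumb_bandwidths(columns):
    """Return one bandwidth per covariate: sd(X_s) * n ** (-1 / (4 + q)).

    `columns` is a list of q covariate columns, each holding n numbers
    (hypothetical input layout chosen for this sketch).
    """
    q = len(columns)          # total number of covariates
    n = len(columns[0])       # observations in the working sample
    scale = n ** (-1.0 / (4 + q))
    # statistics.stdev is the sample standard deviation (n - 1 divisor).
    return [statistics.stdev(col) * scale for col in columns]


# Toy example with a single covariate and n = 32 observations.
bandwidths = rule_of_thumb_bandwidths([list(range(32))])
```

The bandwidth shrinks as \(n\) grows and, for a fixed \(n\), widens as more covariates are added, since the exponent \(-1/(4+q)\) moves toward zero.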
In this sense, the local linear estimator is similar to the standard linear probability model. We thank an anonymous referee for noting this.
See Racine (2008) for further details on nonparametric conditional mode models.
While the Probit model is implemented in Stata, the single index and nonparametric models are implemented in R using the np package.
The McFadden et al. (1977) performance measure is equal to \(p_{11} +p_{22} -p_{12}^2 -p_{21}^2 \), where \(p_{ij} \) is the \(ij\)th entry (expressed as a fraction of the sum of all entries) in the 2 \(\times \) 2 confusion matrix of actual versus predicted (0,1) outcomes.
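As an illustration, the measure can be computed directly from a confusion matrix of counts. The sketch below is in Python; the function name `mcfadden_measure` and the example counts are hypothetical and are not taken from the paper's results.

```python
def mcfadden_measure(confusion):
    """Compute p11 + p22 - p12**2 - p21**2 from a 2x2 matrix of counts.

    confusion[i][j] is the number of observations with actual class i
    and predicted class j (0-based indices in this sketch).
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix has no observations")
    # Express every entry as a fraction of the sum of all entries.
    p = [[count / total for count in row] for row in confusion]
    return p[0][0] + p[1][1] - p[0][1] ** 2 - p[1][0] ** 2


# Made-up counts: rows are actual outcomes, columns are predicted outcomes.
score = mcfadden_measure([[40, 10], [20, 30]])
```

A classifier with no off-diagonal entries attains the maximum value of one, and misclassifications lower the measure both by shrinking the diagonal shares and through the squared off-diagonal terms.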
The Logit and linear probability models also perform very similarly to the Probit model. Details are available upon request.
Note also that the differences in the MSPEs across models are more pronounced for “high” asset values, largely explained by the much lower correct default classification rate of the Probit and single index models.
Of course, it is possible that the odds of defaulting are linear in all covariates; but even in this (implausible) scenario, data-driven methods will perform at least similarly to linear models.
The name of the bank is omitted for confidentiality reasons.
Unfortunately, we only have information on asset (real estate) ownership but not on asset value. We also do not have information on debt ratio.
We estimate a random-effects Probit model since a client may be observed more than once in the database.
We also considered alternative data partitions (70–30 and 50–50 %) and obtained qualitatively similar results. The results are also not sensitive to repeated 60–40 % data partitions.
As indicated above, the local linear model may yield fitted values greater than one or less than zero. In this case, the fitted values range between \(-\)0.01 and 1.06, where 14 observations (out of 1,739) are greater than one and one observation is less than zero.
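To see why a local linear fit to a binary outcome can leave the unit interval, consider the following Python sketch. It assumes a single continuous covariate, a Gaussian kernel, and a fixed bandwidth; the function name `local_linear_fit` and the toy data are hypothetical, and this is not the mixed-data estimator from the np package used in the paper.

```python
import math


def local_linear_fit(x0, xs, ys, h):
    """Local linear estimate of E[Y | X = x0] with a Gaussian kernel.

    Assumes one continuous covariate and a fixed bandwidth h > 0.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        d = x - x0
        w = math.exp(-0.5 * (d / h) ** 2)  # Gaussian kernel weight
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y
        t1 += w * d * y
    det = s0 * s2 - s1 * s1
    if det == 0.0:
        raise ValueError("covariate has no local variation at this bandwidth")
    # Intercept of the kernel-weighted least-squares line centered at x0.
    return (s2 * t0 - s1 * t1) / det


# Toy binary outcome that switches from 0 to 1 as the covariate grows.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 0, 1, 1, 1]
# A very large bandwidth makes the fit close to a global linear regression.
fitted = [local_linear_fit(x, xs, ys, h=100.0) for x in xs]
outside = [f for f in fitted if f < 0.0 or f > 1.0]
```

With a wide bandwidth the estimator behaves like the linear probability model, so fitted values at the extremes of the covariate range can fall below zero or above one, as in the few observations reported in this note.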
The predictive performance (both in-sample and out-of-sample) of the Logit and linear probability models is very similar to that of the Probit model. Further details are available upon request.
We also do not account for the probability of crop failure or climate conditions, but these variables are unlikely to explain default behavior in this case since the loans analyzed were granted to smallholder farmers operating in a particular rural area in Peru.
The nonparametric method also points toward a nonlinear relationship between the odds of defaulting and other covariates.
References
Armendariz B, Morduch J (2005) The economics of microfinance. MIT Press, Cambridge
Banerjee A, Duflo E, Glennerster R, Kinnan C (2010) The miracle of microfinance? Evidence from a randomized evaluation. Working paper, MIT Poverty Action Lab
Capon N (1982) Credit scoring systems: a critical analysis. J Market 46(2):82–91
Coleman B (2006) Microfinance in Northeast Thailand: who benefits and how much? World Dev 34(9):1612–1638
de Janvry A, McIntosh C, Sadoulet E (2010) The supply- and demand-side impacts of credit market information. J Dev Econ 93(2):173–188
Dupas P, Robinson J (2009) Savings constraints and microenterprise development: evidence from a field experiment in Kenya. NBER Working Paper No. 14693
Fan J, Gijbels I (1996) Local polynomial modeling and its applications. Chapman and Hall, London
Ghosh P, Mookherjee D, Ray D (2000) Credit rationing in developing countries: an overview of the theory. In: Mookherjee D, Ray D (eds) Readings in the theory of development economics. Blackwell, London
Hand D, Henley W (1997) Statistical classification methods in consumer credit scoring: a review. J R Stat Soc Ser A 160(3):523–541
Ichimura H (1993) Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. J Econom 58(1–2):71–120
Karlan D, Zinman J (2011) Microcredit in theory and practice: using randomized credit scoring for impact evaluation. Science 332:1278–1284
Khandker S (2005) Microfinance and poverty: evidence using panel data from Bangladesh. World Bank Econ Rev 19(2):263–286
Klein R, Spady R (1993) An efficient semiparametric estimator for binary response models. Econometrica 61(2):387–421
Li Q, Racine J (2004) Nonparametric estimation of regression functions with both categorical and continuous data. J Econom 119(1):99–130
Li Q, Racine J (2006) Nonparametric econometrics: theory and practice. Princeton University Press, Princeton
Luoto J, McIntosh C, Wydick B (2007) Credit information systems in less developed countries: a test with microfinance in Guatemala. Econ Dev Cult Change 55(2):313–334
Maes J, Reed L (2012) State of the Microcredit Summit Campaign Report 2012. Microcredit Summit Campaign
McFadden D, Puig C, Kirschner D (1977) Determinants of the long-run demand for electricity. Proc Am Stat Assoc 1:109–117
Pregibon D (1979) Data analytic methods for generalized linear models. PhD dissertation, University of Toronto
Racine J (1997) Consistent significance testing for nonparametric regression. J Bus Econ Stat 15(3):369–378
Racine J (2008) Nonparametric econometrics: a primer. Found Trends Econom 3(1):1–88
Racine J, Hart J, Li Q (2006) Testing the significance of categorical predictor variables in nonparametric regression models. Econom Rev 25(4):523–544
Schreiner M (2000) Credit scoring for microfinance: can it work? J Microfinance 2(2):105–118
Tukey J (1949) One degree of freedom for non-additivity. Biometrics 5(3):232–242
Acknowledgments
We would like to thank Qi Li, Carlos Martins-Filho, Robert Kunst, and two anonymous referees for their valuable comments. We also thank Christopher Marciniak for his valuable research assistance.
Cite this article
Hernandez, M.A., Torero, M. Parametric versus nonparametric methods in risk scoring: an application to microcredit. Empir Econ 46, 1057–1079 (2014). https://doi.org/10.1007/s00181-013-0703-8
DOI: https://doi.org/10.1007/s00181-013-0703-8