Consumer Debt: Predicting default with machine learning methods

2021

The aim of this thesis is to explore whether a machine learning model can create
value by predicting default at the time of credit application. Building on this,
the thesis evaluates whether a predictive model can be used to reduce future
monetary losses associated with accepting applicants who later default on their
consumer debt. Furthermore, we explore whether information from the
Norwegian Registry of Consumer Debt improves predictive performance.

The scope of the thesis is limited to customers in the Norwegian market
who were granted consumer debt by the examined company in the period
November 2019 to February 2020. Because the data was highly imbalanced,
several resampling techniques as well as cost-sensitive learning were explored.
The issue was ultimately addressed with cost-sensitive learning, by assigning
weights to the classes. The following machine learning (ML) models were
explored: an ML version of Logistic Regression, Random Forest, and eXtreme
Gradient Boosting (XGBoost). These models were optimized and compared with
traditional statistical models. The models were trained on a stratified random
sample consisting of 85% of the data. The results were obtained by applying the
models to the remaining 15% of the data, called the holdout data. The ML models
were individually optimized across three dimensions: variable selection,
hyperparameter tuning, and resampling technique. Ultimately, the
best-performing model was eXtreme Gradient Boosting trained on data with no
resampling, 66 variables, and a minority class weight of 36:1.
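
As an illustration of the stratified 85/15 split and the class weighting described above, the following is a minimal sketch in plain Python. The function names, the random seed, and the synthetic labels are assumptions made for this example and are not taken from the thesis. The sketch only shows how a stratified holdout and a 36:1 minority weight could be constructed, not how the authors implemented them.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.85, seed=0):
    """Return (train_idx, holdout_idx), splitting each class separately
    so that both parts keep roughly the same class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train_idx, holdout_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        cut = round(len(idx) * train_frac)
        train_idx.extend(idx[:cut])
        holdout_idx.extend(idx[cut:])
    return sorted(train_idx), sorted(holdout_idx)


def class_weights(labels, minority_weight=36.0):
    """Give each default (label 1) `minority_weight` times the weight
    of a non-default (label 0)."""
    return [minority_weight if y == 1 else 1.0 for y in labels]


# Synthetic labels for illustration only: 40 defaults among 1000 applicants.
labels = [1] * 40 + [0] * 960
train_idx, holdout_idx = stratified_split(labels)
weights = class_weights([labels[i] for i in train_idx])
```

Per-observation weights of this kind can be passed to most learners that accept sample weights. In XGBoost, a comparable effect is commonly obtained with the `scale_pos_weight` parameter, although the abstract does not state which mechanism was used.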

The study concludes that a machine learning model can create value by
predicting default at the time of credit application, as 44% of the applicants
who defaulted were correctly identified. This comes at the expense of
misclassifying 4% of the applicants who did not default. Nevertheless, monetary
losses are reduced, as the avoided loss exceeds the potential loss of income.
Additionally, information from the Norwegian Registry of Consumer Debt
increased performance by enabling more defaults to be predicted correctly.
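
The trade-off described above can be made concrete with a small sketch. The confusion-matrix counts below are chosen only so that they reproduce the reported 44% and 4% rates, and the monetary amounts are invented placeholders; none of these figures come from the thesis. The function and parameter names are likewise assumptions for illustration.

```python
def recall(tp, fn):
    """Share of actual defaulters that the model flags."""
    return tp / (tp + fn)


def false_positive_rate(fp, tn):
    """Share of non-defaulters that the model wrongly flags."""
    return fp / (fp + tn)


def net_saving(tp, fp, loss_per_default, income_per_good_customer):
    """Avoided loss from rejected defaulters minus the income forgone
    by rejecting applicants who would not have defaulted."""
    return tp * loss_per_default - fp * income_per_good_customer


# Hypothetical holdout counts: 100 defaulters, 2500 non-defaulters.
tp, fn = 44, 56
fp, tn = 100, 2400

# Hypothetical average amounts per applicant (placeholders, not thesis data).
saving = net_saving(tp, fp, loss_per_default=50_000,
                    income_per_good_customer=10_000)
```

Under these assumed amounts the net saving is positive. The sign depends entirely on the ratio between the average loss per default and the average income per non-defaulting customer, which is why the monetary evaluation matters alongside the classification rates.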

Keywords – Machine Learning, Consumer Debt, Debt Registry, BI

Master's thesis (MSc) in Master of Science in Business Analytics - Handelshøyskolen BI, 2021

Handelshøyskolen BI