Consumer Debt: Predicting default with machine learning methods
Master thesis
Date
2021
Collections
- Master of Science [1791]
Abstract
The aim of this thesis is to explore whether a machine learning model can create
value by predicting default at the time of credit application. By extension, the
thesis evaluates whether a predictive model can be used to reduce future
monetary losses associated with accepting applicants who later default on their
consumer debt. Furthermore, we explore whether or not information from the
Norwegian Registry of Consumer Debt improves the predictive performance.
The scope of the thesis is limited to customers in the Norwegian market
who were granted consumer debt by the examined company between November 2019
and February 2020. Several resampling techniques as well as
cost-sensitive learning were explored as the data was highly imbalanced. The
issue was ultimately addressed with cost-sensitive learning, by assigning weights
to the classes. The following machine learning (ML) models were explored: an ML
version of Logistic Regression, Random Forest, and eXtremeGradientBoosting.
These models were optimized and compared with traditional statistical models.
The models were trained on a stratified random selection consisting of 85% of
the data. The results were obtained by deploying the model on the remaining
15% of the data, called the holdout data. The ML models were individually
optimized across three dimensions: variable selection, hyperparameter tuning,
and resampling technique. Ultimately, the best performing model was
eXtremeGradientBoosting trained on data with no resampling, 66 variables
and a minority class weight of 36:1.
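
The stratified 85/15 split and the class weighting described above can be illustrated with a minimal sketch. This is not the thesis code: the function names `stratified_split` and `sample_weights`, the random seed, and the synthetic labels are assumptions made for illustration. Only the 15% holdout share and the 36:1 minority weight come from the abstract.

```python
import random
from collections import defaultdict


def stratified_split(labels, holdout_fraction=0.15, seed=0):
    """Split indices into train/holdout while keeping the class ratio in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train_idx, holdout_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)  # random selection within each class
        n_holdout = round(len(idx) * holdout_fraction)
        holdout_idx.extend(idx[:n_holdout])
        train_idx.extend(idx[n_holdout:])
    return sorted(train_idx), sorted(holdout_idx)


def sample_weights(labels, minority_weight=36.0):
    """Cost-sensitive weights: each default (label 1) counts 36 times a non-default."""
    return [minority_weight if y == 1 else 1.0 for y in labels]


# Synthetic, hypothetical labels: 1 = default, 0 = no default.
labels = [1] * 40 + [0] * 960
train_idx, holdout_idx = stratified_split(labels)
weights = sample_weights([labels[i] for i in train_idx])
```

Weights of this kind are typically passed to a learner when fitting, for example through a per-sample weight argument or a positive-class weight setting. Which mechanism the thesis used is not stated in the abstract.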
The study concludes that a machine learning model can create value by
predicting default at the time of credit application, as 44% of the applicants
who defaulted were predicted correctly. This comes at the expense of a 4%
misclassification of applicants who did not default. However, monetary losses are
reduced as the avoided loss exceeds the potential loss of income. Additionally,
the information from the Norwegian Debt Registry contributed to an increase
in performance by correctly predicting more defaults.
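
The figures reported above can be read as a recall of 44% on the default class and a false positive rate of 4% on the non-default class. The sketch below shows how such rates, and a simple net monetary effect, could be computed from holdout predictions. The function names, the synthetic predictions, and the two monetary amounts are hypothetical and are not taken from the thesis.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), with 1 = default and 0 = no default."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def net_benefit(tp, fp, loss_per_default, income_per_good_customer):
    """Avoided loss from rejected defaulters minus income lost on rejected good applicants."""
    return tp * loss_per_default - fp * income_per_good_customer


# Synthetic holdout set built to reproduce the reported rates:
# 100 defaulters (44 flagged) and 1000 non-defaulters (40 flagged).
y_true = [1] * 100 + [0] * 1000
y_pred = [1] * 44 + [0] * 56 + [1] * 40 + [0] * 960

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
recall = tp / (tp + fn)               # share of defaulters predicted correctly
false_positive_rate = fp / (fp + tn)  # share of non-defaulters misclassified

# Hypothetical amounts per applicant, for illustration only.
benefit = net_benefit(tp, fp, loss_per_default=50_000, income_per_good_customer=5_000)
```

A positive `benefit` corresponds to the abstract's condition that the avoided loss exceeds the potential loss of income. The actual amounts depend on the lender's loss and income per loan, which the abstract does not give.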
Keywords – Machine Learning, Consumer Debt, Debt Registry, BI
Description
Master's thesis (MSc) in Master of Science in Business Analytics - Handelshøyskolen BI, 2021