Classifying Profitable Customers Based on Meta Data with Machine Learning
Abstract
The objective of this master's thesis is to explore whether a machine learning model can predict sale outcomes from the metadata generated by potential leads as they navigate the company's website. Additionally, the research examines whether such a model can create value by improving the company's commercial process.
The data used to answer the research question comes from Universidad Insurgentes and covers the period from January 2020 to March 2022. The initial dataset contained more than 0.5 million samples and 41 attributes. Data preparation combined several techniques to handle high cardinality and missing values, and included feature engineering. The final dataset used for training and testing consisted of approximately 250,000 samples with 56 features.
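One common way to tame high-cardinality categorical attributes and missing values is to make "missing" an explicit category and pool rare levels into a shared "Other" bucket. The sketch below illustrates that idea in plain Python; it is not the thesis's actual pipeline. The function name `group_rare_categories`, the `min_count` threshold, and the `"Missing"`/`"Other"` labels are all hypothetical.

```python
from collections import Counter

MISSING = "Missing"  # hypothetical sentinel for absent values
OTHER = "Other"      # hypothetical bucket for rare levels


def group_rare_categories(values, min_count=2):
    """Fill missing entries and collapse rare categorical levels.

    values: list of category strings, where None marks a missing value.
    min_count: hypothetical cut-off; levels seen fewer times become OTHER.
    """
    # Treat missing values as their own explicit category.
    filled = [MISSING if v is None else v for v in values]
    counts = Counter(filled)
    # Keep frequent levels and the missing sentinel; pool everything else.
    return [
        v if counts[v] >= min_count or v == MISSING else OTHER
        for v in filled
    ]


if __name__ == "__main__":
    sources = ["Google", "Google", "Facebook", None, "Bing", "Facebook"]
    print(group_rare_categories(sources))
```

For the sample lead sources above, "Bing" appears only once and is pooled into "Other", while the missing entry is kept as its own "Missing" level.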
We trained and evaluated three machine learning models: eXtreme Gradient Boosting (XGBoost), CatBoost, and Light Gradient Boosting Machine (LightGBM). All three were compared against a simple logistic regression and against the profit of the default model.
Our study concludes that using machine learning to predict sales from CRM metadata offers a theoretical potential for profit gain. LightGBM is identified as the best-performing algorithm in the context of this thesis. We recommend a heuristic approach for maximizing profit and enrollments, and we include a nuanced discussion of the implied costs of implementing machine learning to predict sales.
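A simple heuristic for turning predicted sale probabilities into profit is to contact only leads scored above a cut-off and to pick the cut-off that maximizes profit on held-out data. The sketch below assumes a hypothetical fixed `revenue` per enrollment and `cost` per contacted lead, and treats "contact every lead" (threshold 0) as the default baseline. These values and that baseline are illustrative assumptions, not figures or definitions taken from the thesis.

```python
def campaign_profit(probs, labels, threshold, revenue=100.0, cost=10.0):
    """Profit from contacting every lead whose predicted probability >= threshold.

    probs: predicted probabilities of a sale (e.g. from any classifier).
    labels: 1 if the lead actually enrolled, else 0.
    revenue, cost: hypothetical value per enrollment and cost per contact.
    """
    profit = 0.0
    for p, y in zip(probs, labels):
        if p >= threshold:
            # Every contacted lead costs money; only enrollments earn revenue.
            profit += revenue * y - cost
    return profit


def best_threshold(probs, labels, revenue=100.0, cost=10.0):
    """Heuristic search: try each observed score as a cut-off, keep the best."""
    candidates = sorted(set(probs))
    return max(
        candidates,
        key=lambda t: campaign_profit(probs, labels, t, revenue, cost),
    )


if __name__ == "__main__":
    probs = [0.9, 0.8, 0.3, 0.2, 0.1]
    labels = [1, 1, 0, 1, 0]
    baseline = campaign_profit(probs, labels, threshold=0.0)  # contact everyone
    t = best_threshold(probs, labels)
    print(baseline, t, campaign_profit(probs, labels, t))
```

On this toy data, contacting everyone yields 250.0, while a cut-off of 0.2 skips one non-enrolling lead and yields 260.0. The gain comes entirely from avoided contact costs, which is why the assumed cost per lead drives the result.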
Keywords – Machine Learning, Sales prediction, Metadata, Commercial process, BI
Master's thesis (MSc), Master of Science in Business Analytics, Handelshøyskolen BI, 2022