False positive reduction endeavors with automated feature engineering
Master thesis
Date
2021
Abstract
Credit card fraud has been a problem for decades, and with the booming trend of
online shopping, fraud losses are expected to rise every year. Fraud detection
systems often generate more false positives than true positives in order to attain a
higher detection level of fraudulent transactions. These false positives have plagued
the fraud detection industry for years as they are expensive to investigate and require
extensive manual labor.
An automated feature engineering approach was implemented to reduce the high number
of false positives while preserving most of the true positives.
We generate a large feature space (1750 features) of rich features with no manual
intervention other than specifying the primitives. In addition, a feature reduction
method is implemented to retain the features with the highest predictive power and
so counteract the dimensionality problem that such a large feature space creates.
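The idea of generating features from a list of primitives and then pruning them can be sketched in a few lines. This is only an illustration under stated assumptions, not the thesis's pipeline: the transactions, the `PRIMITIVES` mapping, and both helper functions are hypothetical, and the reduction step is a crude constant-feature filter standing in for the predictive-power selection described above.

```python
from statistics import mean, pstdev

# Hypothetical transactions as (card_id, amount); not data from the thesis.
transactions = [
    ("A", 5.0), ("A", 30.0), ("A", 25.0),
    ("B", 5.0), ("B", 5.0),
]

# Aggregation primitives: the only manual input is this mapping.
PRIMITIVES = {
    "mean": mean,
    "max": max,
    "min": min,
    "std": pstdev,
    "count": len,
}


def generate_features(rows, primitives):
    """Apply every primitive to each card's list of amounts."""
    amounts_by_card = {}
    for card_id, amount in rows:
        amounts_by_card.setdefault(card_id, []).append(amount)
    return {
        card_id: {
            f"amount_{name}": float(fn(amounts))
            for name, fn in primitives.items()
        }
        for card_id, amounts in amounts_by_card.items()
    }


def drop_constant_features(features):
    """Stand-in reduction: drop features whose value is identical for all cards."""
    names = list(next(iter(features.values())).keys())
    keep = [
        name for name in names
        if len({row[name] for row in features.values()}) > 1
    ]
    return {
        card_id: {name: row[name] for name in keep}
        for card_id, row in features.items()
    }


features = generate_features(transactions, PRIMITIVES)
reduced = drop_constant_features(features)
```

With these toy rows, both cards share the same minimum amount, so `amount_min` carries no information and is dropped, while the other four features are kept. A real selection step would rank features against the fraud label rather than only removing constants.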
To benchmark our results, we created two additional datasets. The first dataset
included only the cleaned original features and is referred to as
the baseline. In the second dataset, we generated manual features from the original
data to reproduce the work of a domain expert. The proposed solution was tested
with XGBoost to quantify the effect of automated feature engineering on the
reduction of false positives and was compared against the two benchmark datasets.
Our analysis of the results shows that automated feature engineering can reduce
false positives by 84% while retaining 89% of the true positives compared
to the baseline dataset. In addition, we find no significant difference between
automated and manual feature engineering in discarding false positives; the two
methods perform comparably. However, the results suggest that an automated
approach can substantially cut feature engineering time while providing richer features
than manual feature engineering, suggesting a potential for bottom-line savings by
reducing the number of domain experts needed and improving efficiency in the
analytical life cycle.
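To make the two headline percentages concrete, the following minimal sketch shows how a false positive reduction and a true positive retention rate are computed relative to a baseline. The counts are hypothetical, chosen only so that the arithmetic reproduces 84% and 89%; they are not the confusion-matrix values from the thesis, and the function names are illustrative.

```python
def false_positive_reduction(fp_baseline, fp_new):
    """Share of baseline false positives that the new model no longer raises."""
    return (fp_baseline - fp_new) / fp_baseline


def true_positive_retention(tp_baseline, tp_new):
    """Share of baseline true positives that the new model still catches."""
    return tp_new / tp_baseline


# Hypothetical counts (assumption): picked to match the reported percentages.
baseline = {"tp": 100, "fp": 400}
automated = {"tp": 89, "fp": 64}

fp_cut = false_positive_reduction(baseline["fp"], automated["fp"])
tp_kept = true_positive_retention(baseline["tp"], automated["tp"])

print(f"False positives reduced by {fp_cut:.0%}")
print(f"True positives retained: {tp_kept:.0%}")
```

Both measures are relative to the baseline model's counts, so they describe the change in alerts an investigator would see rather than absolute precision or recall.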
Description
Master's thesis (MSc) in Master of Science in Business Analytics - Handelshøyskolen BI, 2021