False positive reduction endeavors with automated feature engineering

2021

Credit card fraud has been a problem for decades, and with the boom in

online shopping, fraud losses are expected to rise every year. Fraud detection

systems often generate more false positives than true positives in order to attain a

higher detection rate for fraudulent transactions. These false positives have plagued

the fraud detection industry for years as they are expensive to investigate and require

extensive manual labor.

An automated feature engineering approach was implemented to address the problem

of excessive false positives while retaining most of the true positives.

We generate a high-dimensional feature space (1750 rich features) without manual

intervention other than specifying the primitives. In addition, a feature reduction

method is implemented that retains the features with the highest predictive power,

counteracting the high dimensionality that the approach produces.
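
The idea of generating features from a list of primitives can be illustrated with a minimal sketch. It assumes a hypothetical table of transactions keyed by card and a hypothetical set of aggregation primitives; neither the data nor the function names come from the thesis, and this is not the tooling the authors used.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical transactions as (card_id, amount); illustrative only, not thesis data.
transactions = [
    ("card_a", 20.0),
    ("card_a", 35.0),
    ("card_a", 500.0),
    ("card_b", 12.0),
    ("card_b", 18.0),
]

# Aggregation primitives (assumed for illustration): the only manual input
# is this mapping from a primitive name to a function over a list of values.
PRIMITIVES = {"count": len, "mean": mean, "max": max, "min": min, "sum": sum}


def generate_features(rows, primitives):
    """Apply every primitive to each card's amounts; return one feature dict per card."""
    amounts_by_card = defaultdict(list)
    for card_id, amount in rows:
        amounts_by_card[card_id].append(amount)
    return {
        card_id: {f"{name}(amount)": fn(amounts) for name, fn in primitives.items()}
        for card_id, amounts in amounts_by_card.items()
    }


features = generate_features(transactions, PRIMITIVES)
for card_id, feats in features.items():
    print(card_id, feats)
```

Adding a primitive to the mapping adds one feature per aggregated column, which is how such a feature space can grow large without further manual work and why a subsequent reduction step is useful.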

To benchmark our results, we created two additional datasets. The first dataset

included only the cleaned original features and is referred to as

the baseline. In the second dataset, we generated manual features from the original

data to emulate the work of a domain expert. The proposed solution was tested

with XGBoost to quantify the effect of automated feature engineering on the

reduction of false positives and was compared to the benchmarking datasets.

Our analysis of the results shows that automated feature engineering can reduce

false positives by 84% while retaining 89% of the true positives compared

to the baseline dataset. In addition, we find no significant difference between

automated and manual feature engineering in the reduction of false positives.

However, the results suggest that an automated

approach can substantially cut feature engineering time while providing richer features

than manual feature engineering, indicating a potential for bottom-line savings by

reducing the number of domain experts required and improving efficiency in the analytical life

cycle.
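
The two headline figures are ratios against the baseline and can be computed from confusion-matrix counts. The sketch below uses hypothetical counts chosen only so that the arithmetic reproduces an 84% reduction and 89% retention; the actual counts are not given in this abstract, and the function names are assumptions.

```python
def false_positive_reduction(fp_baseline, fp_new):
    """Fraction of the baseline's false positives that the new model removes."""
    return (fp_baseline - fp_new) / fp_baseline


def true_positive_retention(tp_baseline, tp_new):
    """Fraction of the baseline's true positives that the new model still catches."""
    return tp_new / tp_baseline


# Hypothetical confusion-matrix counts (not from the thesis).
baseline = {"tp": 100, "fp": 1000}
engineered = {"tp": 89, "fp": 160}

fp_cut = false_positive_reduction(baseline["fp"], engineered["fp"])
tp_kept = true_positive_retention(baseline["tp"], engineered["tp"])

print(f"False positives reduced by {fp_cut:.0%}")
print(f"True positives retained:   {tp_kept:.0%}")
```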

Master's thesis (MSc), Master of Science in Business Analytics - Handelshøyskolen BI, 2021

Handelshøyskolen BI