Fraud Detection: Give me reasons
Juan Diego Bermeo, Jan 27
We will explore the two requirements specified by the client in the introduction of the series, and focus on the nuances that arise from the models built, their performance, and particularly their functionality in terms of the business objectives.
The article will center on how business understanding can lead not only to better feature engineering, but to a better understanding of the pattern modeled and its limitations.
Specifically, we will look at a single feature that on its own practically solves the entire classification problem as it currently is, but since we are dealing with adversaries that can adapt, we will also see how this feature and the high performance it brings is not the best option in the medium or long term, as our model would continually become obsolete and eventually un-updatable.
With this in mind, let’s proceed with the first requirement, by building a model that automatically identifies fraud.
1st Requirement: A model to predict fraud in mobile transactions
Given that we are dealing with somewhat of a toy dataset, in terms of the number of features, we can easily build a Random Forest classifier with some under-sampling to balance the dataset.
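As a rough illustration of the under-sampling step, the sketch below keeps every row of the rarer class and draws an equally sized random sample from the more common one. It is a minimal sketch in plain Python, not the code from the repository: the `undersample` helper, the `isFraud` column name, and the toy rows are all hypothetical.

```python
import random
from collections import Counter


def undersample(rows, label_key="isFraud", seed=0):
    """Keep all minority-class rows plus an equally sized random sample of the majority class."""
    rng = random.Random(seed)
    fraud = [r for r in rows if r[label_key] == 1]
    legit = [r for r in rows if r[label_key] == 0]
    # Whichever class is smaller is kept whole; the other one is sampled down.
    minority, majority = (fraud, legit) if len(fraud) <= len(legit) else (legit, fraud)
    balanced = minority + rng.sample(majority, k=len(minority))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 1,000 transactions, 10 of them fraudulent.
rows = [{"amount": 100.0 + i, "isFraud": 1 if i % 100 == 0 else 0} for i in range(1000)]
balanced = undersample(rows)
class_counts = Counter(r["isFraud"] for r in balanced)
print(class_counts)
```

The balanced rows would then be used to fit the classifier, while the test set keeps its original class proportions so that the reported metrics reflect the real imbalance.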
The following performance in terms of AUC-ROC and logloss is attained on the test set with random forest (The code for everything shown in this article can be found on a repository in BitBucket).
Random Forest

auc        logloss
[missing]  0.06435374

Confusion Matrix and Statistics

          Reference
Prediction      0      1
         0 162256      9
         1   3468    492

            Accuracy : 0.9791
               Kappa : 0.2164
         Sensitivity : 0.9791
         Specificity : 0.9820
      Pos Pred Value : 0.9999
      Neg Pred Value : 0.1242
          Prevalence : 0.9970
      Detection Rate : 0.9761
Detection Prevalence : 0.9762

These performance metrics are really good, with a rather small number of false positives, meaning that as far as the first objective or requirement is concerned, this model seems to satisfy it.
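The statistics above follow directly from the four counts in the confusion matrix, and recomputing them is a useful sanity check. The sketch below does so in plain Python. The reported values are consistent with class 0 (legitimate) being treated as the positive class; that is an inference from the numbers, and the `confusion_stats` helper is ours, not part of the original code.

```python
def confusion_stats(pred0_ref0, pred0_ref1, pred1_ref0, pred1_ref1):
    """Summary statistics for a 2x2 confusion matrix, taking class 0 as the positive class."""
    tp, fp = pred0_ref0, pred0_ref1  # predicted 0: truly 0 / truly 1
    fn, tn = pred1_ref0, pred1_ref1  # predicted 1: truly 0 / truly 1
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Agreement expected by chance, from the row and column totals (used by Cohen's kappa).
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total**2
    return {
        "accuracy": accuracy,
        "kappa": (accuracy - expected) / (1 - expected),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "pos_pred_value": tp / (tp + fp),
        "neg_pred_value": tn / (tn + fn),
        "prevalence": (tp + fn) / total,
        "detection_rate": tp / total,
        "detection_prevalence": (tp + fp) / total,
    }


# Counts from the random forest confusion matrix on the test set.
stats = confusion_stats(162256, 9, 3468, 492)
for name, value in stats.items():
    print(f"{name:>22} : {value:.4f}")
```

Read from the fraud side, the specificity is the share of fraudulent transactions that get caught, and the negative predictive value is the share of flagged transactions that are actually fraudulent.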
In a normal Kaggle setting, our work would be 100% complete at this point: we could proceed to deploy this model for our client and be done with our business engagement.
However, this is often not the case as most businesses are adamant about having explicability from the model to obtain insights they can use, either in the general strategy around the process modeled or in other related processes.
2nd Requirement: Understanding the factors that characterize a fraudulent transaction
We resorted to explicable models like logistic regression and decision trees to understand the factors.
After looking at the rules, the importance of the variables, and the coefficients, exploring the relationship between the features and the target variable visually through scatterplots and bar charts, incorporating basic business context, and going through more rounds of model building, the following results were obtained:
1 The strategy of fraudsters is to always extract the highest amount possible from the accounts, whether by extracting the full balance or by extracting/transferring the maximum amount allowed of 10 million.
This means that strategies and policies devised should be mindful of this behavior, and that reviewers of flagged fraudulent transactions should look for overly ambitious transfers, especially compared to past transactions of the customer.
2 Transferring the maximum amount almost completely describes the behavior of fraudsters, as a decision tree or logistic regression built with only this feature has the following performance:

auc        logloss    bac
[missing]  [missing]  0.99747902

Confusion Matrix

          Reference
Prediction      0      1
         0 165550      2
         1    174    499

A decision tree built with this feature and the original ones has the following performance:

auc        logloss    bac
[missing]  [missing]  0.99798287

Confusion Matrix

          Reference
Prediction      0      1
         0 165717      2
         1      7    499

3 Fraud can occur either by direct extraction in the form of a CASH_OUT as soon as the attacker takes control of the victim’s account, or through a TRANSFER to one or more mule accounts from which the money is later cashed out.
Both types of transaction have the same number of occurrences of fraud; however, the fraud rate is 4 times as high within TRANSFER as within CASH_OUT, because there are fewer TRANSFER transactions overall.
This means that when a reviewer sees a TRANSFER, he or she should be more suspicious than when seeing a CASH_OUT.
4 For fraudulent transactions with type TRANSFER, destination accounts have a balance of 0 before and after the transaction.
This means that either the transaction is blocked, or there is a problem in the transactional system when handling fraudulent transactions.
The reviewer then should also look for inconsistencies in the transaction itself.
For model building purposes, these zero values could be interpreted as missing values.
5 A closer look at figure 1 reveals an inconsistent pattern occurring in the transactions, as amounts transferred are much larger than the original balance of the account.
When looking further into it, we noticed that around 90% of transactions have this inconsistency, with 91% of them corresponding to unsubstantiated amounts between $9.540 and $939.000 of the currency used.
This, of course, also implies an inconsistency in the client’s transactional system, which should be looked into (in reality, this implies that the controls for the simulated transfers are not taking this into account).
6 Since the dataset does not show a lengthy history of transactions per client, one can hardly use their past behavior to compare and contrast the possible fraudulent transactions.
In this case, more data should be requested from the customer (in reality, it might be that the simulated data was parameterized this way).
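The balanced accuracy (bac) figures quoted in result 2 can be reproduced from the two confusion matrices alone, which makes the comparison between the single-feature model and the full model easy to verify. The sketch below assumes bac is the plain average of the per-class recalls; the `balanced_accuracy` helper is ours.

```python
def balanced_accuracy(pred0_ref0, pred0_ref1, pred1_ref0, pred1_ref1):
    """Average of the recall on class 0 (legitimate) and the recall on class 1 (fraud)."""
    recall_class0 = pred0_ref0 / (pred0_ref0 + pred1_ref0)
    recall_class1 = pred1_ref1 / (pred0_ref1 + pred1_ref1)
    return (recall_class0 + recall_class1) / 2


# Model built with only the maximum-transfer feature.
single_feature_bac = balanced_accuracy(165550, 2, 174, 499)
# Decision tree built with that feature plus the original ones.
all_features_bac = balanced_accuracy(165717, 2, 7, 499)

print(f"{single_feature_bac:.8f}")  # 0.99747902
print(f"{all_features_bac:.8f}")    # 0.99798287
```

Both models miss the same 2 fraudulent transactions out of 501; adding the original features only reduces the false alarms on legitimate transactions, from 174 to 7.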
This set of facts and results could be presented to the customer as the deliverable of the second requirement, with the caveat that result number 1 has some very clear implications on what the models might be learning, and how robust they might be to actions taken in response by fraudsters.
In order to understand better the point on robustness as well as how understanding the business problem helps guide the analysis and feature engineering, let’s take a closer look at how result number 1 was reached.
The first result can be derived from understanding how fraud takes place.
According to the business knowledge we have (from the paper), the fraudster agents can take control of the account either by faking customer support calls, stealing the phone that has the account linked, or SIM phishing swap.
In all of these scenarios, the fraudsters only have access to the account for a limited time, until the user notices and reports the theft.
Additionally, they have access to a very limited number of accounts, since this is not a highly scalable approach unless one also increases significantly the number of people involved in the operation.
These factors imply that the fraudster is not concerned with subtlety at all, so he or she will always go for a blunt strategy of highest possible yield from each individual account.
With this in mind, it makes complete sense to explore the pattern of the amount of the transaction compared to the original balance.
It is not a stretch to presume that fraudulent transactions would leave a very ambitious trail with respect to these two variables as indeed occurs when we plot them in figure 1.
The plot then leads us to the maximum_transfer feature that separates almost completely fraudulent transactions from legitimate ones.
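One possible way to encode such a feature is sketched below. The exact definition used in the repository is not given in this article, so this is an assumption: a transaction is flagged when it either empties the account or reaches the 10 million cap mentioned in result 1. The function signature and the example amounts are hypothetical.

```python
import math

MAX_TRANSFER_AMOUNT = 10_000_000  # maximum amount allowed per transaction, per result 1


def maximum_transfer(amount, old_balance, cap=MAX_TRANSFER_AMOUNT):
    """Return 1 if the transaction takes as much as it can: the cap or the whole balance."""
    if amount >= cap:
        return 1  # hits the per-transaction limit
    if old_balance > 0 and math.isclose(amount, old_balance):
        return 1  # empties the account
    return 0


# Hypothetical transactions as (amount, balance before the transaction).
examples = [(10_000_000, 25_000_000), (181.0, 181.0), (500.0, 12_000.0)]
flags = [maximum_transfer(amount, balance) for amount, balance in examples]
print(flags)  # [1, 1, 0]
```

A single binary column like this is all a decision tree needs to draw the sharp boundary seen in the plot, which is also what makes that boundary so easy for an adversary to step around.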
This is precisely how understanding the business context can drive a better and guided search for high quality features.
Of course, it is not the only way to get to them, in fact, we went through typically long EDA sessions before arriving at them.
Hopefully this example has you a little more convinced that always bearing in mind the business context and objectives of the problem is better, as it can get you to better features sooner.
However, the point that we want to drive home is that the route of understanding is not only faster, it also takes you further.
Notice that our classification problem has a very clear and precise decision frontier, so much so that our models would become obsolete very quickly given that our adversaries, the fraudsters, could very quickly adapt to it.
It would not take the fraudster long to notice they cannot withdraw the entire balance of the account or make transactions of 10 million, which is the separation that our models are drawing (directly or indirectly).
In response, they could simply transfer a smaller amount and avoid detection.
The model could be retrained so it could detect the new, smaller fraudulent transactions, but what would a fraudster do in response? Well, they would simply keep adapting to the lower threshold.
This cat and mouse game would continue until the threshold started conflicting with normal transactions and the service’s usability. The bank would then have to strike a balance between the service’s proper functioning and the fraud that is allowed to occur, at which point our model could no longer be retrained under the same paradigm.
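The dynamic described above can be made concrete with a toy simulation. Everything in it is an assumption made for illustration: the model is reduced to a single amount threshold, fraudsters respond to each retraining by halving their transfers, and the legitimate amounts and the 5% tolerance for false alarms are invented.

```python
def simulate_threshold_erosion(legit_amounts, start_threshold,
                               adapt_factor=0.5, max_false_alarm_rate=0.05,
                               max_rounds=100):
    """Lower the detection threshold round by round until it hurts legitimate users."""
    threshold = start_threshold
    history = []
    for round_no in range(1, max_rounds + 1):
        # Fraudsters move to an amount below the current threshold...
        fraud_amount = threshold * adapt_factor
        # ...and the retrained model now flags anything at or above that amount.
        threshold = fraud_amount
        flagged = sum(amount >= threshold for amount in legit_amounts)
        false_alarm_rate = flagged / len(legit_amounts)
        history.append((round_no, threshold, false_alarm_rate))
        if false_alarm_rate > max_false_alarm_rate:
            break  # the threshold now conflicts with normal usage
    return history


# Hypothetical legitimate transaction amounts.
legit_amounts = [50, 200, 1_000, 5_000, 20_000, 80_000,
                 150_000, 400_000, 900_000, 2_000_000]
history = simulate_threshold_erosion(legit_amounts, start_threshold=10_000_000)
for round_no, threshold, rate in history:
    print(round_no, threshold, rate)
```

With these made-up numbers the retraining loop breaks down on the third round, when the threshold drops to 1.25 million and starts flagging a legitimate transaction. The real process would be slower and noisier, but the direction is the same.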
The previous is a rather impractical situation for our client, as our framework for detecting fraud would inevitably age poorly.
Even worse, our product would fail to meet the customer’s expectations based on the promise made with the performance on the test set, since the model most definitely would not identify 99.9% of the fraudulent transactions that occur.
In conclusion, requirement 1 would not be satisfied, and we would find ourselves in a situation of complete under-delivery in light of the expectations set.
As you can see, by understanding appropriately our model and the problem it aims to represent, we can foretell these pitfalls, warn ourselves and our customers about them, and act proactively against them.
Our claim is that most likely this can only happen with an appropriate business or domain understanding of the problem and a proper alignment with its functional objectives.
Having that mindset is how we figured out this problem, and also possibly why you do not see a single kernel in Kaggle, where the dataset is also posted, that refers to this very practical and real world issue (granted, it is not an actual Kaggle competition).
We believe this also plays a huge role as to why so many Data Science projects feel so disheartening or as outright failures when finally evaluated.
What about the solution for the current problem? How can we correct the model so that it can better stand the passing of time? We will explore this in the next post, where we will induce robustness into the model.