Use Tableau to Do the Advanced AnalyticsFangyuan LiBlockedUnblockFollowFollowingJan 2Complex statistical models and arcane codes are what people think when the analytics is mentioned.
However, they are just some parts of the analytics.
In the real world, most of the audience doesn’t know algorithms or any programming stuff.
Delivering an analytical result that is easy to understand becomes the key for every company, and visualization is the tool to reach this goal.
Telling a good story is always the first thing to think before doing the visualization.
No one will be interested in the visualization with unclear and boring graphs.
CEOs, shareholders or customers can be the audience.
It’s important to determine the target audience before making the visualizations because the deliverable can be different for the different groups.
Moreover, the visualization is much more difficult than most people think.
It needs lots of time to conceive a good story using multiple and complicated techniques.
The project that I’ve done recently proves how efficient and useful the visualization is.
The scope is to deliver four analytic assets that each uses a specific ACL test to assist in the detection of incorrect or potentially fraudulent transactions:Invoices Greater than Approved Purchase OrdersBenford’s Law VisualizationVendor Payment TermsVendor Invoice FrequencyInvoices Greater Than A ThresholdInvoices Greater than Approved Purchase OrdersA purchase order is a document initiated by the buyer and sent to the seller.
In terms of the order, the purchase order precedes the invoice and defines the term of sales.
The invoice is provided by the entity performing service and serves as notification that the agreed scope of work in the purchase order has been completed and is ready to be staged for payment.
The primary goal of controllership organizations is to ensure that proper controls remain in place to identify when these two document amounts do not match.
Instances in which an invoice amount exceeds a purchase order amount are indicative of suspect behavior either on the part of the vendor or on the part of the transaction capture system.
When analyzing these differences using our dashboard, I did find some positive results.
In particular, the differences between purchase orders and invoices tend to moderate over time considering that I also had many instances in which the invoice amount was actually lower than the purchase order amount.
The time series below illustrates this moderating impact.
In general, days with high transaction volumes (the larger and darker blue dots) tend to moderate in terms of the overall difference, hovering around an average of between ten and thirty dollars each day, very close given the scale of the data.
Our analytic provides two other sources of information in a visual format: a list of all transactions on the selected date and a histogram identifying the distribution of differences on the selected date.
Fortunately, our histogram indicates that most transactions tend to hover around the zero difference mark.
Benford’s Law VisualizationDue to the number order of our numerical system of digits (i.
, a predictable sequence from zero through nine), “ real ” datasets that meet certain criteria will follow predictable behavior in the frequency these digits occur in a particular order (Berger & Hill, 2017).
In short, because the number one is before other digits in our numeric system, in practice, a greater number of observations in a “ real ” dataset are likely to begin with the number one than to begin with other digits later in the sequence.
Benford’s Law is most often applied when analyzing the frequency of first digits of all observations in a series — i.
, in a column of a table.
The law indicates expectations of how frequently a particular digit should appear relative to the total number of observations.
Below is a table indicating Benford’s Law expectations for first digits (Singleton, 2011):If a dataset adheres to the above distribution, it is said to follow Benford’s Law and thus likely be a naturally occurring dataset.
In contrast, if a dataset does not adhere to Benford’s distribution, then there is an increased likelihood that the dataset is, at least in part, synthetic or manipulated.
In performing analysis on the entirety of the Invoice File dataset from Stanley Black & Decker, I discovered that Benford’s Law is indeed applicable to invoice series.
Invoice data near-perfectly match Benford’s Law expectations based on the above bar graph.
A numeric breakdown of expected versus actual values is below.
To flag vendors whose body of invoices do not adhere to Benford’s Law, I stratified and clustered vendors based on two metrics.
First, I grouped vendors into four quartiles based on the total number of invoices present in the dataset.
Vendors with high volumes are of more interest than vendors with low interest as low volumes may introduce noise into Benford’s Law calculations.
For instance, I would not expect a vendor with only five transactions to adhere to any predictable distribution; there is simply not enough volume.
Following this grouping, I then calculated four tiers of differences in Benford’s Law expectations, calculating each digit’s actual versus expected value and taking the total sum across all differences as the indicator for the vendor.
This created four risk tiers.
Again, only those vendors with a high Benford Risk Level are of interest.
In contrast to the Benford’s Law distribution for all invoices, distributions for the 35 vendors in the bottom right of the above matrix skew wildly from expectations despite a very high number of invoices.
Below is one example of a vendor’s actual versus predicted first digit distributions (Vendor ID 30044959).
In this case, this vendor had over 300 invoices in the provided dataset.
This behavior is irregular.
While not definitive evidence of fraud, the resulting distribution is indicative of some kind of systemic pattern.
This is the limit of applying Benford’s Law to transaction data for individual vendors — not all irregular patterns can be attributed to fraud.
Nonetheless, I would recommend that Stanley Black & Decker apply scrutiny to the vendors I have flagged to confirm that these variances are expected and explainable.
Vendor Payment TermsVendor terms are used as guides indicating the amount of time in which a purchase order should be successfully cleared.
The theory behind this particular analytic is to identify vendors that potentially do not provide sufficient payment terms to complete the purchase-to-pay process.
Rushed processes can result in errors; this analytic is designed to flag vendors that place the controllership organization in a position to process information in a rushed manner.
In theory, vendor payment terms should generally be sufficient and realistic.
In practice, many of the vendors that the company engages with do not supply payment terms, resulting in payment terms of zero days (i.
, pay as soon as possible).
In terms of raw count, most vendors specify payment terms that equate to payment as soon as possible.
These records are assigned a payment term of zero days in the company’s SAP source.
Additionally, I can note from above that most vendor payment terms adhere to logical timeboxes.
There are spikes at 30, 60, 90, and 120 days out.
To a lesser degree, there are also peaks at other logical time periods — notably two weeks and 100 days.
I also applied the vendor payment term information to invoice transaction data per the customer’s recommendation.
Below is an all-in look at invoices by time to clear (blue bars) and the average vendor payment term (black lines) broken down by invoice risk classes that we defined in “Invoices Greater than Approved Purchase Orders”.
In general, this is a positive view of payment behavior.
For each relative risk class, the company is clearing invoices well within the timeframe needed to meet vendor payment terms on average.
However, using this view, we can also identify instances of where payment terms are not met for particular vendors or through particular periods of time.
Below is an example of a potential issue with one particular vendor (Vendor 0020001178):Here, the vendor’s payment terms fall around the 10-day mark, and the vendor has a number of invoices spanning from low to moderately high.
In this case, we can see that for all invoice risk class levels.
I leave to the company the determination of whether this is a problem (i.
, if this vendor will not be satisfied with the transaction and whether that outweighs following the process that is taking more time than the payment term).
However, this analytic can now capture these insights.
Additionally, I examined whether there was any temporal pattern into whether payment terms were met or now.
Below is a view showing the percentage of invoices cleared within payment terms by week.
Yellow indicates payment term targets were not met.
Blue indicates payment terms were met.
In this case, we have filtered payment term targets to greater than 30 days to filter out unrealistic targets.
Particularly earlier in the year, the company meets these targets very well.
Unsurprisingly, during the summer months these targets are missed more often.
This could be the result of individuals out of the office or other capacity constraints.
Over 78% of invoices were cleared under the vendor payment term target.
However, I recommend this analytic be used to view results on a vendor-by-vendor basis to maintain positive vendor relationships and inform any future discussions with vendors.
Invoice FrequencyThe theory behind this one is that it will be possible to determine when a vendor is engaging in high volumes of invoicing and to use this frequency information to identify “spikes” that could be indicative of data quality issues or fraud.
If a vendor generally has low invoice volumes, a dramatic rise would constitute irregular behavior that should be analyzed further.
This information can also be enriched with other invoice attributes, including items such as total invoice volume, day of the week, releaser and approver, etc.
This view enables the company to view how many invoices are coming in overtime and the potential volume that will need to be dealt with by the controllership organization.
Certain trends are readily apparent.
Major holidays coincide with a drop in invoice volume.
Summer months tend to trend downward in terms of the number of invoices.
More trends can be examined by using the multitude of filters available, including document date range, vendor, and day of the week.
For instance, below is the total transaction history for one particular vendor (Vendor 001009399).
this interface can be used to ask some interesting questions.
Why did this vendor have a steady uptick in invoice volume of the last week of June?.What happened on the spike in May?.Without additional context into the contents of these transactions, we cannot add much additional insight into the analytic.
However, a productivity monitor such as this one does enable business discovery functions for the controllership organization.
One particular recommendation for the company would be to build in approver and releaser information for these transactions and to enable those items as a filter to see how individual employees are performing over time.
Invoices Greater Than A ThresholdThis analytic is designed to allow the company to insert their own thresholds and yield the results in their authorization framework.
Although the end goal was simple, the company wanted to extract invoices greater than the client-configured threshold to create a listing for the level of authority testing by the audit.
The approach required some thought, especially regarding the fields to be used in flagging invoices and how to present the analytic in a digestible format.
Additionally, it required designing synthetic authorization groups for demonstration.
I created invoice risk levels by generating eight groups based on invoice reporting amount in which each risk class contains an equal number of invoices.
Within each of these invoice risk class definitions, the distribution of invoices is as expected.
In general, the distribution of invoices tends to fall as the invoice amount in reporting currency increases.
The Distribution for Risk Class Seven in Particular.
ConclusionUltimately, I have attempted to maintain a level of context throughout the development work to ensure the solutions will add value.
The code I delivered should be modular enough to integrate into specific workflows; the visuals I provided should be easy to emulate and connect to live data feeds.
These tests in an actionable framework that enables the business to make informed decisions.
This is how I apply visualizations to the analytics.
It requires the integration of quantitative and qualitative skills at the same time.
The most critical factor among all graphs and charts is business insight.
In another way, each graph should at least deliver one valuable opinion/insight that supports the recommendation or conclusion of the project.