Why a Business Analytics Problem Demands All of Your Data Science Skills

The rise (or re-emergence) of powerful machine learning (ML) and artificial intelligence (AI) techniques is driving this transformation to a large extent. However, setting aside the hype around these developments, we should recognize a basic fact: a data-driven analytics process is much like a complex and intricate dance performance, where a high degree of harmony between the participants (data extraction, wrangling, statistical modeling, business logic, etc.) is absolutely necessary for any measure of success.

Just like an intricate dance drama, there is a conductor: you, the data scientist, who needs to understand all aspects of the story, the tempo, the overall message, and the strengths, weaknesses, whims, and fancies of the individual artists. That is why doing high-quality analytics is so much more than being good at individual skills such as:

- data extraction or database manipulation (e.g. SQL)
- visualization and dashboarding (e.g. Tableau, ggplot2)
- data wrangling techniques (e.g. TidyR)
- statistical analysis (e.g. R, SAS, numpy/scipy)
- machine learning (e.g. scikit-learn, H2O, Caret)
- time-series analysis
- mathematical optimization (e.g. PuLP)
- discrete, stochastic simulation (e.g. ARENA or SimPy)
- cloud deployment (e.g. Shiny, Flask, JS)

Even a simple enough business problem can call for bringing together all of these analytics skills, and for a harmonious approach that analyzes the problem from multiple vantage points.

A case study to illustrate the complex interplay

We use an analytics case study to illustrate how all of these actors must come together to make the dance performance a grand success. The study is based on a real-life business problem faced by the electric power company of a large US metropolitan area (although no details will be shared).

Note: We will not do any actual data analysis or modeling, but rather give a rundown of how an analytics professional must think along a variety of dimensions to solve such a typical business problem using data science tools and techniques.

Imagine that the power company is deliberating about shutting off the power supply of willful defaulters, i.e. those customers who have the means to pay but have deliberately not paid their bills for some time. Note that this means a significant portion of the consumers should not be considered for this shut-off operation, because they may fall below a threshold income limit that qualifies them for delayed or deferred payments (which is taken care of by a welfare fund).

What is the main drama and who are the actors?

Fundamentally, in the context of for-profit businesses, almost any analytics problem can be cast as an optimization problem, where some aspect of the business needs to be maximized or minimized. In most cases, it is some kind of revenue maximization or cost minimization. Not surprisingly, our power company wants to minimize the total cost of this shut-off campaign. And shutting off power has many costs — some apparent, others less so:

- operational cost (the team for the physical shut-off, travel, equipment)
- lost revenue (some customers may eventually pay, along with a penalty, if the power is kept on, but shutting the power off turns this into straight lost revenue)
- seasonality (the lost revenue varies with the season, since power consumption depends greatly on the weather pattern)
- cost of reputation (shutting down an essential utility like power carries a societal cost and may damage the company's reputation)
- potential litigation cost (if an impacted customer somehow has grounds to bring a lawsuit against the company)

Therefore, we understand that we must formulate an optimization problem, subject to the constraints of the company's engineering and operational capacity. But let us start from the beginning and revisit this later.

Capacity and priority considerations

The bottom-line question is which shutoffs should be done each month, given the capacity constraints. One consideration is that some of the capacity (the workers' time) is taken up by travel, so perhaps the shutoffs can be scheduled in a way that increases the number that can be done.

Not every shutoff is equal. Some shutoffs should not be done at all, because if the power is left on, those customers are likely to pay the bill eventually. How can we identify which shutoffs should or should not be done? And, among the ones to shut off, how should they be prioritized?

Already we can see some key computational techniques we may have to employ to solve this problem: optimization, scheduling, classification, ranking, and prioritization.

The over-arching framework

The fundamental problem the power company is facing can be broken into two key components:

1. How to identify the customers who are unlikely to pay for the power in spite of having the means to pay.
2. How to optimally allocate resources and build a plan to shut off power to those customers over a period of time, considering multiple variables and constraints: (a) potential 'lost' revenue due to intentional non-payment, (b) employee time (for shut-off duty), (c) travel time, (d) the city layout and the relative positions (and accessibility) of those
shut-off locations, (e) traffic conditions, and (f) employee efficiency in carrying out the shut-off (all employees may not be equally swift at the operation).

The over-arching analytics framework can be illustrated as in the following diagram. It is still a high-level plan and does not show the intricate details.

The Classification and Clustering models

This is the front-end model, needed to classify the customers into multiple categories. Potentially, we can come up with three broad categories:

- Safe customers, who continue to pay (barring the occasional missed or late payment)
- Customers who are unable to pay for genuine reasons
- Customers who are risky, i.e. likely willful defaulters
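Although we are not building the actual model here, the front-end classification step could be sketched roughly as follows with scikit-learn (named in the skills list above). Everything in this snippet is hypothetical: the feature names, the labeling rule, and the synthetic data are illustrative stand-ins, not anything from the real power-company study.

```python
# Hypothetical sketch of the front-end customer classification model.
# Features and labels are invented for illustration only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 1000

# Synthetic stand-in features: [months_overdue, estimated_income, payment_ratio]
X = np.column_stack([
    rng.integers(0, 12, n),          # months with an overdue balance
    rng.normal(50_000, 15_000, n),   # estimated household income (USD)
    rng.uniform(0, 1, n),            # fraction of billed amount actually paid
])

# Toy labeling rule mirroring the three categories in the text:
# 0 = safe, 1 = genuinely unable to pay, 2 = risky (possible willful defaulter)
y = np.where(X[:, 2] > 0.8, 0, np.where(X[:, 1] < 35_000, 1, 2))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")
```

In practice the labels would come from historical payment outcomes (possibly seeded by clustering, as the section title suggests), and the model's per-customer "risky" probability would feed the downstream ranking and optimization steps.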

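Likewise, the second component of the framework — the capacity-constrained shut-off plan — could be roughed out as a tiny 0/1 selection model in PuLP (also named in the skills list). The customers, revenue figures, and crew-hours capacity below are invented for illustration; a real model would additionally encode routing between locations, traffic, seasonality, and per-employee efficiency.

```python
# A minimal sketch of the monthly shut-off plan as a 0/1 knapsack:
# pick which shutoffs to perform so the avoided revenue loss is maximized
# while job time plus travel time fits the crew's hours. All data is made up.
import pulp

# Hypothetical candidates: (monthly_lost_revenue_if_left_on, job_hours, travel_hours)
candidates = {
    "cust_A": (120.0, 1.0, 0.5),
    "cust_B": (300.0, 1.5, 2.0),
    "cust_C": (80.0, 0.5, 0.25),
    "cust_D": (450.0, 2.0, 1.0),
    "cust_E": (200.0, 1.0, 3.0),
}
CREW_HOURS = 5.0  # assumed total crew capacity for the month

prob = pulp.LpProblem("shutoff_plan", pulp.LpMaximize)
x = pulp.LpVariable.dicts("do_shutoff", candidates, cat="Binary")

# Objective: maximize the monthly revenue loss we stop accruing
prob += pulp.lpSum(rev * x[c] for c, (rev, _, _) in candidates.items())

# Capacity: job time plus travel time must fit within the crew's hours
prob += pulp.lpSum((job + trav) * x[c]
                   for c, (_, job, trav) in candidates.items()) <= CREW_HOURS

prob.solve(pulp.PULP_CBC_CMD(msg=False))
plan = sorted(c for c in candidates if x[c].value() > 0.5)
print("shut off this month:", plan)
```

In the full problem, the "lost revenue" coefficients would come from the classification model's risk scores, and the single capacity constraint would expand into a scheduling/routing formulation over the city layout.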