Build a Predictive Model on Snowflake in 1 day with Xpanse AI

????12:30pm — Lunch BreakClassic chicken curry with rice & chips mix and a salad.

2:00pm — Reviewing Project OutputsWe are back at our desk and the processing is already finished.

We are interested in 3 main outputs:Modelling DatasetPredictive ModelScoring CodeModelling DatasetFirstly — let’s take a look at the Modelling Dataset.

Xpanse AI aggregated all 5 tables into one flat Modelling Dataset with 780 columns.

Those columns (aka Features or Variables) contain aggregated information on customers’ behaviours and it’s something that traditionally would require months of manual work.

It’s quite astonishing to see months of Data Engineering designed and executed by a machine within 1 hour.

This is exactly what Machine Learning algos need.

Predictive ModelLet’s look what Machine Learning delivered.

From the business perspective Models are evaluated based on their “Gain” or “Uplift”.

Both are represented as charts and allow to assess how much better the Models are comparing to the random approach to targeting.

This is the Gain chart for our Model:Reading of that chart is quite simple: as an example — if we target top 20% of risky customers — we will capture 77% of all churners.

We can quickly check several cut-off points for the Retention campaign and decide how many customers we want to target with a retention offer.

Another way of looking at Model’s accuracy is the Cumulative Lift chart:If we were choosing the same 20% of our customer base to target with the Retention Campaign — we would “capture” 3.

9 times more future churners comparing to a random targeting.

It’s a quite powerful model.

Time to deploy it.

“Deploying” the model means installing it in a place where it can “score” our customers for their churn risk on a regular basis.

Since our data resides in the Snowflake database — this is the best place to deploy the model.

Scoring CodeHere is something that we truly love about Xpanse AI: You can export a deployable model as SQL code.

There are 2 SQL parts to itScoring Table ETL — this code preps the data in our data warehouse to the scoring formatScoring Model ETL — this is an SQL representation of the Machine Learning model ready to generate propensity scores based on the data in the Scoring Table.

This makes the Model deployment a breeze comparing to other tools.

Whenever new data arrives, e.


as part of our nightly batch load or even in real-time we can score the data to determine the likelihood of our customers churning by simply running those two SQL statements.

3:00pm — Deploying Scoring Model on SnowflakeLet’s take the two SQL statements and create two scripts with CREATE OR REPLACE TABLE statements.

The two scripts can be included in our nightly data flow and are executed after the main data warehouse model has been populated.

A great tool for data orchestration is Apache Airflow, which we have written extensively about in other posts.

Looking at the scripts you can get a better understanding on what exactly is going on.

The model is fully transparent.

That’s another nice side effect.

You can exactly follow the logic of the churn score for your clients.

4:00pm — Test RunOk.

Ready to rock.

Let’s execute the two scripts and look at the results.

Script number 1 puts the data into the required format for scoring against our model.

Let’s have a look at the output.

As you can see, this script has generated the features required by our model, which can be identified by column name VAR and a sequence number.

Please also note the default Score of 0.

The score will get updated in our next step when we determine the probability of our clients to churn against the modelSo let’s feed the features into our model and run the script number 2.

You can see that the score has now been populated.

The higher the score the higher the likelihood that a particular customer will churn.

Great stuff, all done!We can now feed the information of likely churners to our sales team who can take action by making the churn candidates some special offer.

As we all know, any analytics project without the appropriate action is pointless.

5:00pm — Guinness in a PubLet’s briefly recap what we did before we enjoy the black stuff.

In a first step we exported the training data from our Snowflake data warehouse to the Xpanse platformNext we gave Xpanse some information on where in the data set it can find the target variable and timestamp information.

In a third step the platform generated the Modelling Dataset and trained the Predictive Model.

Xpanse also generated the entire scoring ETL flow as a SQL script.

In the next step we deployed the scoring SQL scripts into our data warehouse and scheduled them for execution as part of our nightly ETL.

In the last step we scored the likelihood of our customers to churn.

All in one day.

SláinteOriginally published at sonra.

io on January 3, 2019.

.. More details

Leave a Reply