Are they lost forever?

Research and Proactive Outreach

It turns out that for these anonymous visitors, whom we call visitor non-converts because they never converted to a lead, we have one critical piece of information.
Each visitor to any web site arrives from an IP address, which is the internet address from which the visitor accessed our site.
Many of these IP addresses point back to Internet Service Providers (ISPs) such as Verizon or Comcast, which are a dead-end for us.
However, if someone who works at, say, Acme Consulting visits our site from the office, a technique called reverse DNS lookup can often translate that IP address back to the company Acme Consulting.
From there, we can point our research team at identifying people at Acme who fit the profile of our typical buyer (say, their COO or VP of Professional Services) and contacting them individually.
We don’t necessarily need to reach out to the exact person who actually visited our site, but just knowing that Acme has some interest in a product like ours is a good trigger for us to reach out.
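As a minimal sketch of the lookup step, assuming Python's standard `socket` module: `socket.gethostbyaddr` asks DNS for the hostname registered for an IP address. The `ISP_SUFFIXES` list, the `resolver` parameter (there so the function can be exercised without a network), and the example hostnames are illustrative assumptions, not our actual tooling.

```python
import socket
from typing import Callable, Optional

# Illustrative, incomplete list of hostname suffixes that indicate a consumer ISP.
ISP_SUFFIXES = ("comcast.net", "verizon.net")


def reverse_dns(ip: str, resolver: Callable = socket.gethostbyaddr) -> Optional[str]:
    """Return the lowercased hostname registered for an IP, or None if there is none."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except OSError:  # socket.herror and socket.gaierror are subclasses of OSError
        return None
    return hostname.lower()


def is_isp(hostname: str) -> bool:
    """True when the hostname belongs to one of the known ISP domains."""
    return any(hostname == s or hostname.endswith("." + s) for s in ISP_SUFFIXES)
```

A hostname is only a starting point. Turning something like `mail.acmeconsulting.example` into "Acme Consulting" is a separate step, and many addresses have no reverse record at all, which is why the lookup works often rather than always.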
Done right, this research and proactive outreach process should be fairly labor intensive.
Our team needs to make sure we’re reaching out to the right people and figuring out the right targeted, personalized touchpoints for each person.
Touchpoints that are not just templatized email spam, and may just as likely be a phone call, a comment on a blog post, a retweet of an interesting perspective, or an introduction to a relevant colleague.
Touchpoints that require an understanding of the market, the environment in which the recipient is working, and the pain points that the prospect is likely facing.
Most of all, touchpoints that actually provide value to the recipient.
The problem is we may easily have tens of thousands of visitors to our web site each month.
Far too many for just the research phase, much less for the labor-intensive personalized outreach.

Behavioral Analysis

This is where, in my opinion, things get interesting…where the data science comes into play.
Because we don’t have any hope of doing even the most basic research on each of the tens of thousands of IP addresses each month, we need to figure out how to prioritize.
We can use automated means to filter out known ISPs, already converted SQLs, current clients, and other visitors to narrow down the set of visitor non-converts, but that still leaves us with huge numbers that need to be further prioritized.
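The automated narrowing described above could look roughly like the sketch below. The `Visitor` record and the three lookup sets are hypothetical stand-ins for whatever a CRM and an ISP list would actually provide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Visitor:
    ip: str
    hostname: str  # result of a reverse DNS lookup, "" if none was found
    company: str   # company inferred from the hostname, "" if unknown


def filter_non_converts(visitors, isp_suffixes, converted_ips, client_companies):
    """Drop ISP traffic, already converted SQLs, and current clients."""
    kept = []
    for v in visitors:
        if not v.hostname or any(v.hostname.endswith(s) for s in isp_suffixes):
            continue  # unresolved address or known ISP: a dead end
        if v.ip in converted_ips:
            continue  # already converted to an SQL
        if v.company.lower() in client_companies:
            continue  # current client
        kept.append(v)
    return kept
```

Whatever survives this filter is the set of visitor non-converts that still needs prioritizing.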
Enter behavioral analysis.
We can tell an awful lot from things like web logs about how a visitor behaved on our site.
As mentioned, we break the content on our site out into different categories, such as our blog, our product pages, pages about our company, press releases, job descriptions, etc.
We find that certain pages (such as our pricing page) are good indicators of buying intent on their own.
However, it’s often combinations of behaviors that are more interesting.
A person who looks at a smaller number of pages but at a wide variety of content may be showing higher interest.
If a visitor looks at a lot of pages but only at one type, say, our job descriptions, that person is more likely a hiring candidate than a potential customer.
Someone who reads only our blog but doesn’t look at our product pages may be more interested in educating themselves about best practices in running a professional services firm than in finding a product to help them do so.
A visitor who looks at nothing but our product pages and screenshots may be trying to reverse engineer our application and not be at all interested in actually using it.
Our web logs also are able to tell us how a visitor first reached our site — whether they searched for something in Google or clicked through from a partner’s site.
They can show how long the visitor spent on each page, whether they signed up for premium content (such as our eBook), and whether they came back multiple times for more information.
By tracing each visitor through the data that we collect on our web site, we can do a fair amount of analysis about that person’s behavior and infer some things about their interests.
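To make the tracing concrete, here is a small sketch that rolls one visitor's page views up into the kinds of signals discussed above: how many pages, how varied the content, how long on each type, and how many separate days they came back. The `PageView` fields and category names are assumptions about what a web log might contain, not our actual schema.

```python
from collections import Counter, namedtuple

# One row per page view; "day" is the calendar date of the view.
PageView = namedtuple("PageView", "visitor_id category seconds day")


def summarize(views):
    """Summarize one visitor's page views into simple behavioral metrics."""
    total = len(views)
    if total == 0:
        raise ValueError("need at least one page view")
    counts = Counter(v.category for v in views)
    seconds = Counter()
    for v in views:
        seconds[v.category] += v.seconds
    return {
        "total_pages": total,
        "distinct_categories": len(counts),
        "fraction_by_category": {c: n / total for c, n in counts.items()},
        "avg_seconds_by_category": {c: seconds[c] / counts[c] for c in counts},
        "visit_days": len({v.day for v in views}),
    }
```

A visitor whose `fraction_by_category` is almost entirely job descriptions looks like a hiring candidate, while a small `total_pages` spread across several categories is the more interesting pattern.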
In the past, traditional marketing automation tools provided some capabilities to turn all this behavior into actionable information through a mechanism called lead scoring.
Users would configure the tool to understand that reading a blog post might be worth one point, viewing a product page might be worth two, and spending more than 30 seconds on the pricing page was worth five.
This was a bit of a brute-force chain saw approach to a problem that required the precision of a scalpel.
It required the lead scoring model to be set up manually based on some fairly arbitrary assumptions on the relative worth of each action, could rarely be applied retroactively to historical data, and often correlated poorly with real-world results.
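For reference, the manual approach amounts to something like the following sketch, using the example point values mentioned above. The category names and the `(category, seconds)` input shape are illustrative assumptions.

```python
def lead_score(page_views):
    """Score a visitor from an iterable of (category, seconds) page views.

    Hand-picked points: 1 per blog post, 2 per product page,
    5 for each pricing-page view lasting more than 30 seconds.
    """
    score = 0
    for category, seconds in page_views:
        if category == "blog":
            score += 1
        elif category == "product":
            score += 2
        elif category == "pricing" and seconds > 30:
            score += 5
    return score
```

Every number in that function is a guess someone had to make up front, which is exactly the arbitrariness the rest of this post tries to avoid.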

Machine Learning

Finally, we get to the artificial intelligence and machine learning.
We wanted to avoid the arbitrary nature of the manually compiled lead scoring mechanisms.
To do this, we decided to teach a machine learning classification model to determine the right weightings of all the behavioral inputs to try to predict propensity to convert.
For each visitor, both successful SQL conversions and visitor non-converts, we took all the behavioral data and turned it into a set of flags, ratios, numbers, and other metrics.
This will be different for each situation, but for us, this consisted of factors such as total number of pages viewed, fractions of pages that were of each type (blogs, product pages, job descriptions, etc.), average duration on different types of pages, etc.
Most importantly, we captured our key target attribute, a flag that indicated whether the visitor converted to an SQL or not.
With all of this data crunched, we turned it into a vector for each visitor and fed all of our visitor data into several machine learning classification algorithms.
After comparing their outputs, we settled on a particular neural network model.
(I won’t go into the details of the process we used to compare and select from among the different algorithms here.)
This dataset, about 40,000 visitors, served as both training and test data.
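To show the shape of the idea without any libraries, here is a deliberately simple stand-in: a logistic regression trained by batch gradient descent in plain Python. It is not the neural network we settled on. It only illustrates learning the weightings from labeled visitors instead of assigning them by hand. The three features and the eight toy rows are invented for illustration, and the hold-out evaluation a real project needs is omitted.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """p(1): the model's estimated probability that visitor x converts."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train(rows, labels, epochs=500, learning_rate=0.5):
    """Fit logistic regression by batch gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = predict_proba(weights, bias, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Hypothetical vectors: [fraction product pages, fraction job pages, viewed pricing (0/1)]
ROWS = [
    [0.6, 0.0, 1.0], [0.5, 0.1, 1.0], [0.4, 0.0, 1.0], [0.7, 0.0, 1.0],
    [0.0, 0.9, 0.0], [0.1, 0.8, 0.0], [0.2, 0.0, 0.0], [0.0, 0.0, 0.0],
]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = converted to an SQL, 0 = non-convert

WEIGHTS, BIAS = train(ROWS, LABELS)
```

The learned `WEIGHTS` play the role that the hand-assigned points played in lead scoring, except that they come from the data.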
As we gathered more data going forward, we could decide whether to test the new visitors against this initially trained model or to use the additional data to continue training the algorithm.
Either way, what we were after in the end was the model’s confusion matrix.
The confusion matrix shows that of all the visitors who actually did not convert, the algorithm predicted 99.6% of them correctly.
Of the visitors who did convert, it predicted about 80% correctly.
However, other than acknowledging that the algorithm was getting a decent hit rate, these correct predictions, the true negative and true positive cases, actually aren’t all that interesting to us.
What we’re really interested in are the wrong predictions, specifically the 0.4% that the algorithm thought should have converted but didn’t.
These false positives are effectively visitor non-converts (visitors who didn’t convert into a lead) who “acted” more like SQLs (visitors who did convert into leads).
These are the visitors that are worth taking a closer look at and, if we just take the binary 0/1 prediction at its word, this 0.4% alone turns into over a hundred potential prospects worth chasing.
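A small sketch of how the four confusion matrix cells, and the false positives in particular, can be pulled out of actual and predicted labels. The function names and the 0/1 label encoding are assumptions for illustration.

```python
def confusion_matrix(actual, predicted):
    """Count true/false negatives and positives for 0/1 labels."""
    counts = {"tn": 0, "fp": 0, "fn": 0, "tp": 0}
    for a, p in zip(actual, predicted):
        if a == 1:
            counts["tp" if p == 1 else "fn"] += 1
        else:
            counts["fp" if p == 1 else "tn"] += 1
    return counts


def per_class_accuracy(counts):
    """Share of actual non-converts and of actual converts predicted correctly."""
    return {
        "non_converts_correct": counts["tn"] / (counts["tn"] + counts["fp"]),
        "converts_correct": counts["tp"] / (counts["tp"] + counts["fn"]),
    }


def false_positive_ids(visitor_ids, actual, predicted):
    """Visitors who did not convert but whom the model predicted would."""
    return [v for v, a, p in zip(visitor_ids, actual, predicted) if a == 0 and p == 1]
```

As a sanity check on scale, 0.4% of 40,000 is 160, so "over a hundred" is what you would expect if the large majority of the dataset is non-converts.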
What’s even more interesting is that we don’t have to just take the binary 0/1 prediction at its word.
We can look at all the visitor non-converts (the 0.4% as well as the 99.6%) and look at the p(1) probability provided by the classification model.
That is, the probability between 0 and 1 that the visitor should have converted as predicted by the algorithm.
This gives us finer grained control over which visitor non-converts we chase.
If our research and outreach teams are totally slammed, we may only chase prospects with a 0.8 probability or greater (a small fraction of the 0.4%).
On the other hand, if they’ve got lots of capacity, we may chase prospects with a 0.4 probability or greater (thus dipping a bit into the 99.6%).
The point is that this approach doesn’t give us just a binary sense of should/should not have converted, it prioritizes our entire list for us based on the probability of predicted conversion.
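The prioritization itself is only a few lines. This sketch assumes each visitor non-convert has already been paired with its p(1) from the model. The visitor IDs and probabilities in the usage note are made up.

```python
def prioritize(scored_non_converts, threshold):
    """Return (visitor_id, p1) pairs with p1 >= threshold, highest p1 first."""
    selected = [(vid, p1) for vid, p1 in scored_non_converts if p1 >= threshold]
    return sorted(selected, key=lambda item: item[1], reverse=True)
```

With `scored = [("a", 0.92), ("b", 0.45), ("c", 0.10), ("d", 0.81)]`, a busy team calling `prioritize(scored, 0.8)` gets only `a` and `d`, while a team with spare capacity calling `prioritize(scored, 0.4)` also picks up `b`.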

Conclusions

I’d love to say that this process quadrupled our lead generation rate overnight.
I’d love to say that our entire marketing team went on vacation to Bora Bora for a month because their work for the next year was done.
But, I can’t.
Not because we didn’t get positive results (we did), but because we’re just at the very start of this process…it’s early days yet.
We’re starting to get a handle on the prospect identification piece and are working on getting all the processes, infrastructure, and training in place for the research and proactive outreach efforts.
What I can say is that in our initial very small proof of concept, we reached out to four companies that we identified and converted one into an SQL.
That’s too small a sample size to reliably predict actual conversion rates long term.
But, it was enough for us to expand our initial proof of concept into a larger initiative.
This sort of proof of concept, even if it hadn’t resulted in any positive conversions, is exactly the type of thinking we love to explore as a company to try to gain that little competitive edge.
To look at things a little differently.
To try to do things more efficiently.
To get better at identifying the red umbrellas.