Intelligent computing in Snowflake

In a little over a week, I’m heading over to Snowflake’s inaugural user summit in San Francisco, where I’ll be speaking on data sharing in the Australian Private Health Insurance industry.

Not only am I pumped for the conference, but while I’m over there I’m also really excited to visit Numenta’s office for a meetup event.

Snowflake have built the next generation data warehouse, and Numenta are building what I consider to be the next generation machine learning algorithm, so it’s going to be a big week!

I’ve had a long-standing interest in machine learning since I first studied it at university about 15 years ago.

It was in the middle of the last AI winter, so I wasn’t expecting a whole lot from the course as nobody was really talking about it.

But it was fascinating, philosophically challenging, and it left a lasting impression on me.

In my current day job, I get to work on data analytics processes that involve current generation machine learning.

But being aware of some of its limitations and forever curious about how human intelligence works, I have followed Numenta’s research closely and been part of their community for a number of years.

So to channel my excitement for my upcoming trip, this post is an experiment where I combine the two technologies: running Numenta’s current intelligent computing algorithm in-database on Snowflake, and using it to detect anomalies in sequences of data.

Isn’t AI already Intelligent?

In a nutshell, machine learning algorithms (including deep learning) allow us to make predictions by using historical data to train a model.

The idea with learning algorithms is that if some information has predictive power for a result and you have a lot of examples, you don’t have to specify an exact formula to make a prediction.

Instead, you can approximate it with brute force and some clever shortcuts.

credit: https://visualstudiomagazine.com

In some cases (like tree-based learning, in my previous Medium story) you get an explainable prediction, whereas with Neural Networks it can be more of a black box.

But in both cases, you’re usually looking to minimise the error for a specific prediction task so that you can automatically apply it to new data.

This post is not in any way intended to diminish how powerful these machine learning approaches are.

They are used all over the world by the largest companies and brightest minds, to solve real problems.

Given enough sample data, they can classify images, convert speech to text, or predict energy consumption.

And using rules instead of data, they can learn to win games like Go, or succeed at an Atari game (without the enjoyment part).

Not only that, even after we crack the difficult nut of mammalian intelligence, these traditional approaches will still remain critical for certain types of analysis where we don’t actually want human-like reasoning.

But while they fall under the banner of Artificial Intelligence and are even called “neural” networks, it’s important to recognise that they do work very differently to the way intelligence works in humans and other animals.

In contrast, the human brain predicts and learns continually using temporal streams of unlabeled data, and it does it very efficiently, using a single, general purpose algorithm.

credit: http://dmangus.blogspot.com

Our brains are also great at maintaining “invariant” representations and hierarchies of relationships, so we can easily recognise a dog from an unfamiliar angle and intuitively understand that it is more similar to a cat than it is to a fish.

This is why even our infants easily outperform the best deep learning algorithms at image recognition, and it’s at the heart of the realisation that fully autonomous road vehicles are a lot further away than first anticipated.

How is the Numenta AI approach different?

Numenta take a “biologically constrained” approach to developing their theory, known as Hierarchical Temporal Memory (HTM): they don’t incorporate anything into their algorithms that the human brain doesn’t do (according to what neuroscience research tells us).

The mission of Numenta is to understand how intelligence works in the brain, so that they can implement it in software and build intelligent systems.

They are still on the fringe from an industry perspective, but their research is progressing and is always fascinating, especially to anyone with a dual interest in computer science and neuroscience.

For example, recently there has been a lot of focus on grid cell structures and how an ancient navigational mechanism has been adapted for higher learning.

I expect the research update at June’s meetup event to revolve a lot around the role of the Thalamus, and how they are incorporating the role of attention into their theory.

Our brains contain the best general purpose learning algorithm that exists.

So while there’s still a lot of discovery ahead, what’s appealing to me about the theory is that as it does take shape, its learning will not be limited the way current ML is — the possibilities of what we could build with it are truly vast.

You can get started learning about HTM by watching HTM School on YouTube, which takes you through the fundamentals.

Or if you’re more of a book reader, you can’t go past On Intelligence, the book by Numenta co-founder Jeff Hawkins that originally outlined the foundations of HTM.

What can it do?

In addition to keeping the research itself open, Numenta have an open source project named nupic, which implements the learning algorithms described in HTM theory.

It is primarily a research vehicle, and it doesn’t behave like a fully working brain just yet.

If it did, I’m sure the technology would already be used in just about every system!

It has been ported to other languages by various community members, and some of the ports have even been modified with a few of their own ideas.

Aside from the open source projects, there are commercial applications emerging.

For example, Cortical.io are applying HTM specifically to natural language understanding.

Anyway, to answer the question, what it can do well currently is learn sequences without too much rigidity, and predict the next value.

This can in turn be used to detect anomalies in a sequence of data with some noise tolerance.

Snowflake implementation

I’m going to go through a very basic learning exercise in Snowflake, using simple sequences of numbers.

We need to build a couple of components to do this, but before getting into the detail, I want to start by showing a snapshot of:
- what we build in Snowflake (right hand side),
- where each part fits into HTM theory (middle), and
- what that corresponds to in biology (on the left).

Let’s work our way from the top to the bottom.

Encoders

Encoders are the first place our data passes through.

They are a shortcut abstraction of the brain’s initial sensory inputs, like those that feed the visual cortex or the auditory cortex.

The raw input is mapped into a sparse representation with a few important properties, described in more detail in this paper.

Fun fact: your neocortex has no concept of sight, sound or smell and doesn’t care which of your senses a signal came from.

It just learns and predicts using a single common mechanism, and encoders are the gateway to it.

I’m going to build a scalar encoder for Snowflake using a javascript UDF.

The input will be a single number, and the output will be an array of bits, where some are active and most aren’t.

I’ve saved myself some time by borrowing some of Numenta’s javascript code and making a few modifications. One of the properties of encoders (from the above paper) is: “The same input should always produce the same SDR as output.”

This means we can set the “immutable” attribute on the Snowflake UDF, and benefit greatly from caching.
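To give a sense of the shape of that UDF, here’s a minimal sketch, assuming the parameter order used in the queries below (value, output size in bits, minimum, maximum, number of active bits, dense-output flag). The parameter names and the simplified bit-placement logic are my own; the real function uses the borrowed Numenta encoder code.

```sql
-- Sketch only: the real SCALAR_ENCODER body is adapted from Numenta's javascript,
-- but the declaration (including IMMUTABLE, which enables caching) looks like this.
create or replace function SCALAR_ENCODER(
    VAL float, N_BITS float, MIN_VAL float, MAX_VAL float, W float, DENSE_OUTPUT boolean)
returns variant
language javascript
immutable
as
$$
  // Clamp the input into range, then place W contiguous active bits at a
  // position proportional to where the value sits between MIN_VAL and MAX_VAL.
  var n = Math.floor(N_BITS);
  var w = Math.floor(W);
  var v = Math.min(Math.max(VAL, MIN_VAL), MAX_VAL);
  var start = Math.floor((v - MIN_VAL) / (MAX_VAL - MIN_VAL) * (n - w));
  var activeIndexes = [];
  for (var i = start; i < start + w; i++) {
    activeIndexes.push(i);
  }
  if (DENSE_OUTPUT) {
    return activeIndexes;            // just the positions of the 1 bits
  }
  var bits = [];
  for (var j = 0; j < n; j++) {      // the full array of 0s and 1s, mostly 0s
    bits.push(j >= start && j < start + w ? 1 : 0);
  }
  return bits;
$$;
```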

To demonstrate with the values 1 to 5:

select 1 as value, SCALAR_ENCODER(1,1024,0,250,14,false) as ENCODED_VALUE
union all
select 2 as value, SCALAR_ENCODER(2,1024,0,250,14,false) as ENCODED_VALUE
union all
select 3 as value, SCALAR_ENCODER(3,1024,0,250,14,false) as ENCODED_VALUE
union all
select 4 as value, SCALAR_ENCODER(4,1024,0,250,14,false) as ENCODED_VALUE
union all
select 5 as value, SCALAR_ENCODER(5,1024,0,250,14,false) as ENCODED_VALUE

The encoder outputs a sparse array 1024 bits wide, with the input value in the range of 0–250 and a width of 14.

As you can see, numbers that are closer together have more overlap in bits.

We do this because we consider them to be “semantically similar”.
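If you want to check that overlap for yourself, one (hypothetical) way is to compare the index-position outputs of two neighbouring values with Snowflake’s array functions (the DENSE_OUTPUT flag that produces those index positions is explained a bit further down):

```sql
-- Count how many active bits the encodings of 3 and 4 share.
select array_size(array_intersection(
         SCALAR_ENCODER(3, 1024, 0, 250, 14, true),
         SCALAR_ENCODER(4, 1024, 0, 250, 14, true))) as SHARED_BITS;
```

The further apart the two values are, the fewer bits they share, until eventually they share none.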

We’re effectively building something akin to the hair cells in the cochlea, where each output bit could be active for a number of similar inputs.

credit: Numenta

Spatial Pooler

A Sparse Distributed Representation (credit: Numenta)

In computers, we encode information densely, but in our brains it’s very sparse, which is not storage-efficient but yields other important properties like noise tolerance.

These are known as Sparse Distributed Representations (SDRs).

So the role of the Spatial Pooler is to project the encoded input, as one of these SDRs, into mini-columns.

Mini-columns are vertically arranged groups of neurons that receive common input and are interconnected.

Sequence Memory

The brain is fundamentally a memory system, continually predicting future state based on sequences from the past, and strengthening/weakening synapses accordingly.

This is Hebbian learning, which I remember as “fire together, wire together”.

The brain can run on only ~20 watts because it’s continuous and efficient; it doesn’t process information the way we typically do with computers.

credit: brainworkshow.sparsey.com

As signals move up the brain’s hierarchy, chaotic raw sensory input eventually becomes stable concepts (and you can see this in the neuron firing patterns). The amount of overlap between two SDRs corresponds to semantic similarity.

The role of Sequence Memory, aka Temporal Memory, is to use the distal connections (that go laterally out to other mini-columns) to put the mini-columns into a predictive state based on their recognition of previous sequences.

For both the Spatial Pooler and Sequence Memory, I’m again going to borrow someone else’s code.

This time it’s HTM.js, a javascript HTM implementation built by Numenta community member Paul Lamb.

Again, all I’m doing is modifying it slightly to work in the Snowflake context.

In HTM.js, all of the different biological constructs (Cells, Columns, Layers, Synapses, etc.) are modeled as their own javascript prototypes in different files.

I’m going to lump them all into another Snowflake javascript User Defined Table Function (UDTF), along with all of the learning and predicting controller logic.

The source code for the final UDTF can be found here (it’s way too large to display here).

With this function in place, we can run queries that traverse down the column of a table and do one-shot learning of the sequence of values it finds.

The state of the HTM network resides in the memory of the UDTF executor, initially random but changing with each new value as it learns.

This of course means we deliberately won’t leverage any of the parallelism of the Snowflake engine that you normally get with UDTFs, because the processing order matters.
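For anyone who hasn’t written a Snowflake javascript UDTF before, here’s a stripped-down sketch of the shape the function takes; the placeholder comments stand in for the HTM.js prototypes and the learning/prediction logic in the real version linked above.

```sql
-- Skeleton only: the real HTM UDTF carries the full HTM.js code in its body.
create or replace function HTM(ENCODED_INPUT array)
returns table (ACTIVE array, PREDICTIVE array)
language javascript
as
$$
{
  initialize: function (argumentInfo, context) {
    // The HTM network (layers, mini-columns, cells, synapses) is constructed
    // here. It lives in this object for the duration of the query, which is
    // how learning state carries over from one row to the next.
    this.network = null;  // placeholder for the HTM.js objects
  },
  processRow: function (row, rowWriter, context) {
    // Feed the encoded row through the spatial pooler and sequence memory,
    // then emit which mini-columns became active and which of those were
    // already in a predictive state.
    var active = [];      // indexes of active mini-columns
    var predictive = [];  // the subset that had been correctly predicted
    // ... spatial pooling, temporal memory and Hebbian learning go here ...
    rowWriter.writeRow({ACTIVE: active, PREDICTIVE: predictive});
  }
}
$$;
```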

So let’s try it out! I’ll use Tableau where possible to keep this visual.

We’ll start off with a table that has numbers from 10 to 20, which loop back around to 10 and continue like this indefinitely.

I’ll call this table LOOPING_NUMBERS, and the column is called THE_NUMBER.
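Setting this up doesn’t need anything fancy; hypothetically, a few hundred rows of the looping pattern could be generated like this:

```sql
-- Hypothetical setup: 10,11,...,20,10,11,... repeated for a few hundred rows.
create or replace table LOOPING_NUMBERS as
select 10 + mod(seq4(), 11) as THE_NUMBER
from table(generator(rowcount => 550));
```

Because the UDTF learns in row order, the rows also need to come back in this same order when we query the table.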

Visually, the sequence is a repeating ramp from 10 up to 20. Let’s see if our HTM UDTF can learn this sequence.

But first, HTM.js actually takes densely represented sparse arrays.

Instead of a long array of 0s and 1s, sensibly it expects the input array to just hold the index positions of the 1s.

This is why I added the DENSE_OUTPUT parameter which, if set to true, returns each encoding in my 1,2,3,4,5 sequence as just a short list of active bit positions instead of the full 1024-bit array. Look, they fit on the screen now!

OK, so using our input table LOOPING_NUMBERS, we run the THE_NUMBER column first through the scalar encoder, then through the HTM network.

select THE_NUMBER,
       SCALAR_ENCODER(THE_NUMBER,1024,0,250,20,true) as ENCODED_INPUT,
       ACTIVE,
       PREDICTIVE
from LOOPING_NUMBERS,
     table(HTM(SCALAR_ENCODER(THE_NUMBER,1024,0,250,20,true)))

Here’s a sample of what we get back. I’ve scrolled down through the result set a bit to where the predictions begin.

As you might be thinking, it’s a bit hard to interpret like this.

The ACTIVE column contains the indexes of the active mini-columns, and PREDICTIVE are the active ones that were also in a predictive state.

To better understand what’s happening, we can simply use the success rate of predictions to calculate an anomaly score (for example, if 40 mini-columns are active but only 30 of them were predicted, the anomaly score is 10/40 = 0.25):

select THE_NUMBER,
       SCALAR_ENCODER(THE_NUMBER),
       ACTIVE,
       PREDICTIVE,
       ARRAY_SIZE(ACTIVE)-ARRAY_SIZE(PREDICTIVE) as NOT_PREDICTED_COUNT,
       NOT_PREDICTED_COUNT/ARRAY_SIZE(ACTIVE) as ANOMALY_SCORE
from LOOPING_NUMBERS,
     table(HTM(SPARSE_TO_DENSE(SCALAR_ENCODER(THE_NUMBER))));

Now that we have an anomaly score to work with, we can take a visual look at what the HTM network experiences. Kind of like being a new baby, nothing makes sense at first, then patterns begin to become familiar.

What’s with the little trail of occasional anomalies left along the top? This is an effect caused by the way repeating inputs are learned, and there are ways to mitigate it in the main python nupic code base; they just don’t exist in HTM.js.

So just trust me and ignore them :)

Now let’s throw it some curve-balls and see what happens.

Instead of 10,11,12,13 we’ll give it 10,11,13,13.
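One (hypothetical) way to feed it that blip is to append a perturbed cycle to the end of the table before re-running the anomaly query:

```sql
-- Append one cycle where the 12 has been replaced by a second 13.
insert into LOOPING_NUMBERS (THE_NUMBER)
values (10),(11),(13),(13),(14),(15),(16),(17),(18),(19),(20);
```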

The network detects an almost-certain anomaly.

The reason it’s not quite 100% is because there’s a fairly big overlap between 12 and 13 in the encoder output, so a tiny number of active columns were still predicted.

Now let’s really surprise it with a big sequence of 10s, cause that’s pretty different.

The network again spits out its coffee, and is briefly confused before adjusting to the new sequence.

Let’s just try one more: the repeating sequence 10, 12, 14, 16, 18, 20, 19, 17, 15, 13, 11. Similar to before, but the adjustment period is longer because the new sequence takes longer to repeat.

Now let’s do something interesting to finish up, and return to the original sequence of 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20.

You might be thinking that it’s been long forgotten with all the changes.

In reality, the network has enormous storage capacity and barely skips a beat before recognising it.

Summary

I hope you had as much fun reading this as I did wiring it all up.

If you’re anything like me, HTM is daunting at first but incredibly rewarding to study if you’re interested in how intelligence works.

I have come to see biologically-inspired learning algorithms as a crucial part of our path forward to building more intelligent systems.

Snowflake never fails to impress me as a cloud data warehouse, and it seems inevitable that we’ll continue to see it extended to become more and more versatile.

This is another example of a very non-traditional data warehouse task that was straightforward to implement using its engine.

And again, please drop by the Numenta community if you’re interested in learning more, and don’t forget to introduce yourself!