AddressNet: How to build a robust street address parser using a Recurrent Neural Network

Jason Rigby, Dec 5

Street addresses are complex beasts; they're designed to be systematic and unambiguous to us human-folk, but end up a disaster for machines. If you ask on StackOverflow the innocent question of…

Looking for a quick and dirty way to parse Australian street addresses into its parts:
3A/45 Jindabyne Rd, Oakleigh, VIC 3166

… you will be quickly told that "you're going to get data that's completely useless," and then you'll be shown a regular expression like this:

(?P<Unit_Number>\d*)\s*[/,,]\s*(?P<Street_Number>\d*)\s*(?P<Street_Name>[a-zA-Z\s]*),?\s*(?P<State>NSW|ACT|NT|QLD|SA|TAS|VIC|WA)\s*(?P<Post_Code>\d{4})|(?P<street_number>\d*)\s*(?P<street_name>[a-zA-Z\s]*),?\s*(?P<state>NSW|ACT|NT|QLD|SA|TAS|VIC|WA)\s*(?P<post_code>\d{4})

Dejected and saddened by the complex and unforgiving regular expression, you'll slink away and look up #ferrets on Instagram, because that's really what the internet is for.

I'm here to tell you not to be sad. I'll show you how you can build your own address parsing machine.

AddressNet, following the conventional neural network nomenclature of [Thing]+Net, is a nifty model that sorts out the bits of an address by labelling them as any one of 22 possible components, and is based on the GNAF database. I'll explain the process used in AddressNet at a high level and link to my favourite authors, whose elegant explanations deserve no reproduction here.

At the heart of the AddressNet model is a Recurrent Neural Network (RNN). RNNs are great at modelling sequential data (in this case, a sequence of letters). This kind of neural network is often shown diagrammatically as:

A typical representation of RNNs (left and right are equivalent)

In the above diagram, x is an item from the input data sequence, and y is some target estimation or output. The big ol' circular blobs in the middle contain all the matrix operations that produce the ys and hs, but be
sure to note that each blob is identical; the exact same internal parameters are applied to each incoming h and x, and that is what makes the network recurrent.

[Recommended reading: The Unreasonable Effectiveness of Recurrent Neural Networks]

The way RNNs operate can be superficially explained like this: for each item of data in the sequence, transform it in some way, using the hidden state from the previous step both to estimate a target output and to update the hidden state that gets handed off to the next step. This mysterious hidden state can be thought of as "knowledge gleaned as the sequence is scanned," and it functions as a memory of sorts. Many variations of RNN architecture centre around how the hidden state is managed, and the most common types are LSTM- and GRU-type RNNs (AddressNet uses the GRU type).

[Recommended reading: Understanding LSTM Networks; Understanding GRU networks]

Being able to carry forward information from earlier in the sequence is the key strength of RNNs. The ability to retain some memory of prior elements allows such a network to generate outcomes framed as "given what I have seen so far, what is the probability that the current element is a mistake?"

RNNs can be extended to be multi-layer (the outputs at each step are fed in as the inputs to another RNN), and they can also be paired with an equivalent RNN in which the horizontal arrows run in the reverse direction, passing the hidden state backwards and effectively looking into the "future" to learn about the "past." The latter is known as a bidirectional RNN.

[Recommended reading: Bidirectional recurrent neural networks / PDF]

A multi-layer bidirectional RNN is used in AddressNet on a per-character basis. Thus it was key for me to ensure that there was so much variability in the input data that the model's complexity could not be turned to "learning the noise." Experience with the trained model suggests that I hit the mark, but I would love you, the reader, to give it a go and let me know how it goes.
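To make the moving parts concrete, here is a minimal numpy sketch of a single-layer bidirectional GRU tagging each character of an address. Everything here is illustrative: the weights are random stand-ins for trained parameters, the vocabulary, embedding and hidden sizes are invented, the lowercasing is my own simplification, and the real AddressNet additionally stacks layers and, of course, trains the weights. Only the number of label classes (22) comes from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

VOCAB = "abcdefghijklmnopqrstuvwxyz0123456789 /,"  # toy character set
EMBED, HIDDEN, N_CLASSES = 8, 16, 22  # 22 label classes, as in AddressNet

def p(*shape):
    """Random parameters standing in for trained weights."""
    return rng.normal(scale=0.1, size=shape)

E = p(len(VOCAB), EMBED)              # character embedding table
Wz, Wr, Wh = p(3, HIDDEN, EMBED)      # input-to-hidden weights
Uz, Ur, Uh = p(3, HIDDEN, HIDDEN)     # hidden-to-hidden weights
bz, br, bh = p(3, HIDDEN)             # biases
Wy, by = p(N_CLASSES, 2 * HIDDEN), p(N_CLASSES)  # per-character classifier

def gru_step(x, h):
    """One GRU update; the same parameters are reused at every position."""
    z = sigmoid(Wz @ x + Uz @ h + bz)            # update gate
    r = sigmoid(Wr @ x + Ur @ h + br)            # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h) + bh)  # candidate state
    return (1.0 - z) * h + z * h_tilde           # blend old and new memory

def scan(xs):
    """Run the GRU over a sequence, collecting the hidden state at each step."""
    h, hs = np.zeros(HIDDEN), []
    for x in xs:
        h = gru_step(x, h)
        hs.append(h)
    return hs

def tag(text):
    xs = [E[VOCAB.index(c)] for c in text.lower()]
    fwd = scan(xs)               # left-to-right pass
    bwd = scan(xs[::-1])[::-1]   # right-to-left pass, re-aligned to the input
    labels = []
    for f, b in zip(fwd, bwd):
        logits = Wy @ np.concatenate([f, b]) + by
        labels.append(int(np.argmax(logits)))  # one class id per character
    return labels

labels = tag("3A/45 Jindabyne Rd, Oakleigh, VIC 3166")
print(labels)  # one (untrained, hence arbitrary) class id per character
```

Note how the backward pass sees the postcode before it reaches the street name, which is exactly the "looking into the future" trick that makes bidirectional tagging work.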
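Incidentally, it is easy to check just how "completely useless" that opening regular expression is. A quick sketch using Python's re module, with the backslashes its character classes need restored, run against the very address from the StackOverflow question:

```python
import re

# The StackOverflow pattern, with its escaped character classes (\d, \s) restored.
PATTERN = re.compile(
    r"(?P<Unit_Number>\d*)\s*[/,,]\s*(?P<Street_Number>\d*)\s*"
    r"(?P<Street_Name>[a-zA-Z\s]*),?\s*(?P<State>NSW|ACT|NT|QLD|SA|TAS|VIC|WA)\s*"
    r"(?P<Post_Code>\d{4})"
    r"|(?P<street_number>\d*)\s*(?P<street_name>[a-zA-Z\s]*),?\s*"
    r"(?P<state>NSW|ACT|NT|QLD|SA|TAS|VIC|WA)\s*(?P<post_code>\d{4})"
)

m = PATTERN.search("3A/45 Jindabyne Rd, Oakleigh, VIC 3166")
print(m.groupdict())
```

Because the unit "3A" contains a letter, `\d*` cannot swallow it, and the first possible match only begins at the comma after "Rd": the suburb "Oakleigh" is captured as `Street_Name`, while `Unit_Number` and `Street_Number` come back empty. The regex "works" and still mangles the question's own example.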