In the cult film Memento, the main character, Leonard Shelby, is unable to make new memories after suffering head injuries in an attack in which his wife died. As he tries to track down his wife’s killer and avenge her, he has to resort to tattooing clues onto his body and taking Polaroids in order to keep track of what he knows.
In the world of Machine Learning, conventional “feed-forward” neural networks are in the same predicament as Leonard. Neural networks consist of layers of highly connected nodes that propagate signals in a way analogous to neurons and synapses. In a traditional “feed-forward” network, data simply flows through the layers in order, from the first (input) layer to the final (output) layer.
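As a rough sketch of what this looks like in code (the layer sizes, weights and library choice below are purely illustrative, not taken from any particular system), a feed-forward pass is simply a chain of transformations applied to one input at a time:

```python
import numpy as np

# Minimal feed-forward pass: the signal moves strictly from input to output.
# All sizes and weights here are illustrative placeholders.
rng = np.random.default_rng(0)

x = rng.normal(size=4)         # a single input with 4 features
W1 = rng.normal(size=(8, 4))   # weights connecting the input layer to a hidden layer
W2 = rng.normal(size=(2, 8))   # weights connecting the hidden layer to the output layer

hidden = np.tanh(W1 @ x)       # the signal propagates forward through the hidden layer...
output = W2 @ hidden           # ...and on to the output layer; nothing is fed back

print(output)                  # each input is processed in complete isolation
```

Each pass processes one input with no record of anything that came before, which is exactly the limitation described above.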
This works well for problems where each input can be considered independently of the others. For example, to tell whether a picture contains a car or a horse, it is neither necessary nor helpful to know what the previous images showed.
In contrast, for problems such as handwriting recognition, speech recognition or the classification of time series, it is essential to retain information from previous data points when determining the output for each subsequent one. A feed-forward network is, like Leonard, unable to form a memory of what it has just seen, so that information cannot be used to help interpret what it sees next.
A class of neural networks known as recurrent networks solves this by feeding the output of each layer back into that layer as well as forwarding it to the next one. This is their equivalent of taking a Polaroid or tattooing the information onto themselves.
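A minimal sketch of that feedback loop (again with purely illustrative sizes and weights) might look like this: the hidden state computed at each step is passed back in alongside the next input.

```python
import numpy as np

# Minimal recurrent step: the previous hidden state is fed back in with each
# new input -- the network's own Polaroid of what it has seen so far.
# Sizes and weights are illustrative placeholders.
rng = np.random.default_rng(0)

W_in = rng.normal(size=(8, 4))      # weights applied to the current input
W_rec = rng.normal(size=(8, 8))     # recurrent weights applied to the remembered state

hidden = np.zeros(8)                # the memory starts out empty
sequence = rng.normal(size=(5, 4))  # a sequence of five inputs, 4 features each

for x_t in sequence:
    # each step combines the new input with the state carried over from the last step
    hidden = np.tanh(W_in @ x_t + W_rec @ hidden)

print(hidden)                       # the final state summarises the whole sequence
```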
Recurrent neural networks are therefore much more powerful than simple feed-forward ones. One common issue they encounter, however, is the “vanishing gradient problem”. Like all supervised machine learning algorithms, neural networks require training. The “vanishing gradient problem” occurs during training: as the error signal is propagated back through the network’s recurrent structure, it is repeatedly multiplied by factors that are typically smaller than one, so the updates get smaller and smaller the further back they travel. This can make training impractically slow or even stop it altogether.
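A toy calculation illustrates the effect (the factor of 0.9 below is invented purely for illustration): if the error signal shrinks by even a modest factor at every step it is propagated back, the update that reaches the earliest steps is vanishingly small.

```python
# Toy illustration of the vanishing gradient problem. Suppose the error
# signal is multiplied by 0.9 (an illustrative value) each time it is
# propagated back one step through the recurrent structure.
gradient = 1.0
per_step_factor = 0.9

for step in range(1, 101):
    gradient *= per_step_factor
    if step in (10, 50, 100):
        print(f"after {step:3d} steps the update has shrunk to ~{gradient:.2e}")

# after  10 steps the update has shrunk to ~3.49e-01
# after  50 steps the update has shrunk to ~5.15e-03
# after 100 steps the update has shrunk to ~2.66e-05
```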
Long Short-Term Memory (LSTM) networks were designed to solve this problem. Each node maintains an internal memory of the inputs it has received and learns both how to use that memory to affect future outputs and how long to remember each input for. This allows LSTM networks to be trained effectively and lets their outputs be influenced by inputs that occurred much earlier in the sequence.
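The core idea can be sketched in a few lines (this is a simplified, illustrative cell: biases are omitted and all weights are random placeholders). Learned “gates” decide what to store, what to forget and what to output at each step.

```python
import numpy as np

# Simplified LSTM step (biases omitted; weights are illustrative placeholders).
# The forget gate controls how long existing memory is kept, the input gate
# controls what is written into memory, and the output gate controls how much
# of the memory influences the current output. All three are learned in training.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_hidden = 4, 8
W_f, W_i, W_o, W_c = (rng.normal(size=(n_hidden, n_in + n_hidden)) for _ in range(4))

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([x_t, h_prev])
    f = sigmoid(W_f @ z)                   # forget gate: how long to remember
    i = sigmoid(W_i @ z)                   # input gate: what to store now
    o = sigmoid(W_o @ z)                   # output gate: what to reveal
    c = f * c_prev + i * np.tanh(W_c @ z)  # updated cell memory
    h = o * np.tanh(c)                     # output for this time step
    return h, c

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_in)):
    h, c = lstm_step(x_t, h, c)
print(h)
```

Because the forget gate can stay close to one for as long as a piece of information remains useful, the memory (and the gradient that flows through it) is not forced to shrink at every step, which is what makes these networks trainable over long sequences.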
LSTM networks have proven remarkably successful at tackling a broad range of difficult “sequence” problems, from speech recognition and automatic translation to generating text and music in specific styles. They are widely used commercially by the major technology companies to enhance their products and services.
Here at The Filter we have expertise in the theory and application of LSTM networks. Supported by funding awarded by the UK Government’s “Innovate UK” programme, we have developed capabilities to predict the intentions of shoppers on e-commerce websites based on their previous interactions with the site. As these interactions form a time series of events, an LSTM network is the perfect algorithm to apply: it is able to understand and interpret the whole series rather than having to consider each event individually.
Our software enables us to predict whether a customer is just browsing, researching and comparing alternatives with a specific purpose in mind, or knows what they want and is ready to purchase. This allows us to personalise the customer’s experience by displaying product and content recommendations that will maximise revenue and engagement.
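To give a flavour of how such a model could be put together, here is a hypothetical sketch in Keras. It is not our production system: the event vocabulary size, layer sizes and variable names are all invented for illustration. Each shopping session is treated as a sequence of event codes (page view, search, add to basket and so on) and the LSTM reads the whole sequence before predicting one of the three intents.

```python
import tensorflow as tf

# Hypothetical sketch of a shopper-intent classifier (illustrative only).
NUM_EVENT_TYPES = 50   # assumed size of the event vocabulary
NUM_INTENTS = 3        # browsing / researching / ready to purchase

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=NUM_EVENT_TYPES, output_dim=32),
    tf.keras.layers.LSTM(64),                                # reads the whole event sequence
    tf.keras.layers.Dense(NUM_INTENTS, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Training would use labelled sessions, for example:
# model.fit(event_sequences, intent_labels, epochs=10)
```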
This is a powerful technology, which we are now ready to exploit to optimise user experiences across digital media. The possibilities are endless!