You’ve probably used Google Translate or another machine translator, and you know how the translation process looks: you type text in one language, and the translator converts it into another language in real time, following the sequence of the words you type. This is essentially how sequence-to-sequence (seq2seq) learning works. Let’s see what applications this model can be used for and look at examples of sequence-to-sequence learning in Keras code.
What is Sequence-to-Sequence Learning
Sequence-to-sequence (also called seq2seq) models are a special class of Recurrent Neural Network architectures that are typically used for machine translation, question answering, chatbot development, text summarization, and similar tasks.
Here is how it works. A sequence-to-sequence model in Keras consists of two components: an encoder and a decoder. The encoder processes the input sequence and condenses its information into internal state vectors, also called context vectors. The outputs of the encoder are discarded, and only the internal states are preserved. The context vector encapsulates the information for all input elements and helps the decoder make more precise predictions.
The decoder, in turn, generates the output sequence. The final states of the encoder (the context vector) serve as the initial state of the decoder’s first cell, and each output the decoder produces is taken into account when it generates the following ones.
Most of the data in the world comes in the form of numbers, images, video frames, and audio sequences. With neural network architectures and sequence-to-sequence learning, this data does not have to stay in an unstructured format; it can be efficiently retrieved and structured, which helps businesses make the most of their data and use it to identify business problems.
“A lot of the data science talent today focuses its effort on solving problems that already exist. An equally important task for any successful data scientist or analyst is to identify and create new tasks that can be solved analytically. The latter is a very different exercise and does not need a lot of coding experience or mathematical background. All you need to know is what is possible and what is not, using a given tool.”
— Tavish Srivastava, Co-Founder and Chief Strategy Officer of Analytics.
Here’s an interesting example of an LSTM model in Keras: the development of a self-service (human-less) store assistant, where a customer interacts only with kiosks through audio-enabled, Google-like search tabs. With the help of sequence modeling, a customer can talk to the kiosk, ask questions, and give commands, and the kiosk will do a quick search and find an answer.
How to Perform Sequence-to-Sequence Learning
All the actions described above can be done with the help of the following steps.
1. Speech Recognition to understand what the customer is saying
2. Machine Language Translation from a source language to a known language (say English)
3. Named entity/Subject extraction to find the main subject of the customer’s query translated in step 2
4. Relation Classification to tag relationships between the various entities tagged in step 3
5. Path Query Answering (similar to Google search) on the entity relationships found in steps 3 & 4 using a core knowledge graph
6. Speech Generation to generate answers for the customer with all the relevant information found in step 5
7. Chatbot skill to have conversational ability and engage with customers just like a human
8. Text Summarization of customer feedback to work on key challenges/pain points
9. Product Sales Forecasting to replenish stock
“The power of this model lies in the fact that it can map sequences of different lengths to each other. As you can see, the inputs and outputs are not correlated, and their lengths can differ. This opens a whole new range of problems that can now be solved using such architecture.”
— Simeon Kostadinov, Lead Operations Manager at Speechify.
The sequence modeling structure is almost the same for every task; you just have to change the input and target data. Inputs and outputs can be any of the following: scalar, trend, text, image, audio, or video. Each combination can be described by its type (the category of the input or target), its elements (the number of elements in the input or target series), and its use cases (the applications that fall into that category).
It’s possible to train the model on any specific type of data. For example, if you train the model on a certain type of music, you can use it to create new songs, and if you train it on images of animals, you may be able to see what crossbreeds might look like. In general, the seq2seq model performs the following steps:
- An RNN layer (the encoder) receives a sequence as input. The outputs of the encoder are discarded, and only its final hidden state is kept. This final state is called the context, or conditioning, of the decoder and is used as the decoder’s initial state.
- Another RNN layer (the decoder) is trained to return the target characters of the data offset by one timestep, so that it can predict the next character given the previous characters.
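The one-step offset in the second step (commonly called teacher forcing) can be sketched in a few lines of plain Python. This is only an illustration: the start and end markers and the helper name `make_teacher_forcing_pair` are assumptions, not something the article prescribes.

```python
# Hypothetical start/end markers; any symbols absent from the data would do.
START, END = "\t", "\n"

def make_teacher_forcing_pair(target_text):
    """Return (decoder_input, decoder_target) as lists of characters.

    The decoder target is the decoder input shifted one timestep ahead.
    """
    full = [START] + list(target_text) + [END]
    decoder_input = full[:-1]   # what the decoder sees at each step
    decoder_target = full[1:]   # what it should predict at that step
    return decoder_input, decoder_target

decoder_input, decoder_target = make_teacher_forcing_pair("chat")
for seen, expected in zip(decoder_input, decoder_target):
    print(repr(seen), "->", repr(expected))
```

At every position the decoder is shown the true previous character and asked for the next one, which is what lets it continue a sequence on its own at prediction time.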
What are Seq2seq Models
As you already know, a seq2seq model takes a sequence of items (words, letters, time series, etc.) and outputs another sequence of items. At each step, the RNN function takes the current RNN state and a word vector and produces the next RNN state, which “encodes” the sentence so far.
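To make the state update concrete, here is a minimal sketch of a plain tanh RNN step in pure Python. The weights are random and the word vectors are made up; the names `rnn_step` and `encode` are illustrative assumptions, and a real model would use an LSTM or GRU layer from a library instead.

```python
import math
import random

def rnn_step(state, word_vector, w_state, w_input, bias):
    """Return the next state: tanh(W_state @ state + W_input @ word_vector + bias)."""
    next_state = []
    for i in range(len(bias)):
        total = bias[i]
        total += sum(w * s for w, s in zip(w_state[i], state))
        total += sum(w * x for w, x in zip(w_input[i], word_vector))
        next_state.append(math.tanh(total))
    return next_state

def encode(word_vectors, w_state, w_input, bias):
    """Apply rnn_step word by word; the final state summarizes the sentence."""
    state = [0.0] * len(bias)
    for word_vector in word_vectors:
        state = rnn_step(state, word_vector, w_state, w_input, bias)
    return state

random.seed(0)
hidden_size, embed_size = 3, 2
w_state = [[random.uniform(-0.5, 0.5) for _ in range(hidden_size)]
           for _ in range(hidden_size)]
w_input = [[random.uniform(-0.5, 0.5) for _ in range(embed_size)]
           for _ in range(hidden_size)]
bias = [0.0] * hidden_size

sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three made-up word vectors
context = encode(sentence, w_state, w_input, bias)
print(context)
```

The list returned by `encode` has a fixed length (`hidden_size`) no matter how many words the sentence contains, which is why it can be handed to a decoder as a context vector.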
Training a seq2seq model starts with converting each pair of sentences into tensors of word indices, using the index built for each language. Each sentence pair is then fed to the model, which tries to predict the correct output. At each step, the output of the model is compared with the true words to compute the loss and update the parameters.
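The two ingredients of this process, index conversion and a per-step loss, can be sketched without any framework. Everything below is an assumption made for illustration: the helper names, the `<eos>` marker, the single sentence pair, and the uniform distribution that stands in for real model output. Plain lists are used where a framework would use tensors.

```python
import math

def build_index(sentences):
    """Map each distinct word to an integer index (0 is reserved for <eos>)."""
    index = {"<eos>": 0}
    for sentence in sentences:
        for word in sentence.split():
            index.setdefault(word, len(index))
    return index

def to_indices(sentence, index):
    """Convert a sentence to word indices and append the end-of-sentence index."""
    return [index[word] for word in sentence.split()] + [index["<eos>"]]

def cross_entropy(predicted_probs, true_index):
    """Loss for one step: negative log-probability assigned to the true word."""
    return -math.log(predicted_probs[true_index])

pairs = [("i am cold", "j'ai froid")]
source_index = build_index(src for src, _ in pairs)
target_index = build_index(tgt for _, tgt in pairs)

source_ids = to_indices(pairs[0][0], source_index)
target_ids = to_indices(pairs[0][1], target_index)

# Stand-in for model output: a uniform distribution over the target vocabulary.
vocab_size = len(target_index)
step_probs = [[1.0 / vocab_size] * vocab_size for _ in target_ids]

loss = sum(cross_entropy(p, t) for p, t in zip(step_probs, target_ids)) / len(target_ids)
print(source_ids, target_ids, loss)  # the loss equals log(3) for a uniform guess
```

In a real training loop, `step_probs` would come from the decoder, and the loss would be backpropagated to update the parameters.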
Once the defined model is fit, it can be used to make predictions, specifically, to output a French translation for an English source text. The model defined for training has learned the weights for this operation, but its structure is not designed to be called recursively to generate one character at a time.
The encoder model is defined as taking the input layer from the encoder in the trained model (encoder_inputs) and outputting the hidden and cell state tensors (encoder_states). The decoder requires the hidden and cell states from the encoder as the initial state of the newly defined decoder model. Because the decoder is a separate standalone model, these states will be provided as input to the model, and therefore must first be defined as inputs.
Seq2seq and TensorFlow
In this case study, a translation model, called a seq2seq model or encoder-decoder neural network, was built in TensorFlow. The objective of the model is to translate English sentences into French sentences. The steps below outline the process of developing the model, starting with the definition of the encoder.
To build such a model, you can divide it into two small sub-models. The first sub-model is called the [E] Encoder, and the second is called the [D] Decoder. Building the model involves the following steps:
- Define input parameters to the encoder model
- Build encoder model
- Define input parameters to the decoder model
- Build decoder model for training
- Connect encoder and decoder models
- Define loss function, optimizer, and apply gradient clipping
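The last step mentions gradient clipping without saying which variant is used. As one possibility, here is a framework-free sketch of clipping by global norm, the behavior that TensorFlow offers through `tf.clip_by_global_norm`. The function name and the toy gradients are assumptions for illustration.

```python
import math

def clip_by_global_norm(gradients, max_norm):
    """Scale all gradients so that their joint L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    global_norm = math.sqrt(sum(g * g for grad in gradients for g in grad))
    if global_norm <= max_norm:
        return [list(grad) for grad in gradients], global_norm
    scale = max_norm / global_norm
    return [[g * scale for g in grad] for grad in gradients], global_norm

# Toy gradients for two parameter groups; their joint norm is sqrt(9 + 16 + 144) = 13.
grads = [[3.0, 4.0], [0.0, 12.0]]
clipped, norm_before = clip_by_global_norm(grads, max_norm=5.0)
norm_after = math.sqrt(sum(g * g for grad in clipped for g in grad))
print(norm_before, norm_after)
```

Clipping rescales every gradient by the same factor, so the update direction is preserved while its size is bounded. This is commonly used to keep recurrent networks from diverging when gradients explode.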
New models are required for the prediction step: a model for encoding English input sequences of characters, and a model that takes the sequence of French characters generated so far together with the encoding as input and predicts the next character in the sequence. Defining the inference models requires reference to elements of the model used for training in the example. Alternatively, one could define a new model with the same shapes and load the weights from a file.
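A hedged sketch of these two inference models in Keras might look as follows. The sizes, the `start_index` and `stop_index` values, and the helper `decode_sequence` are assumptions for illustration, and the layers are left untrained so that the block stays self-contained. In practice, the training model would be fit first and its layers reused.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy sizes and token indices, chosen only for illustration.
num_encoder_tokens, num_decoder_tokens, latent_dim = 8, 10, 16
max_decoder_len = 5
start_index, stop_index = 0, 1

# Layers of the training model (untrained here).
encoder_inputs = keras.Input(shape=(None, num_encoder_tokens))
_, state_h, state_c = layers.LSTM(latent_dim, return_state=True)(encoder_inputs)
encoder_states = [state_h, state_c]

decoder_inputs = keras.Input(shape=(None, num_decoder_tokens))
decoder_lstm = layers.LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_dense = layers.Dense(num_decoder_tokens, activation="softmax")
train_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
training_model = keras.Model([encoder_inputs, decoder_inputs],
                             decoder_dense(train_outputs))

# Inference model 1: input sequence -> encoder states.
encoder_model = keras.Model(encoder_inputs, encoder_states)

# Inference model 2: previous character + states -> next character + new states.
state_input_h = keras.Input(shape=(latent_dim,))
state_input_c = keras.Input(shape=(latent_dim,))
step_outputs, step_h, step_c = decoder_lstm(
    decoder_inputs, initial_state=[state_input_h, state_input_c])
decoder_model = keras.Model(
    [decoder_inputs, state_input_h, state_input_c],
    [decoder_dense(step_outputs), step_h, step_c])

def one_hot(index):
    """Return a (1, 1, num_decoder_tokens) one-hot array for a single token."""
    vec = np.zeros((1, 1, num_decoder_tokens), dtype="float32")
    vec[0, 0, index] = 1.0
    return vec

def decode_sequence(input_seq):
    """Greedy decoding: feed each predicted token back in as the next input."""
    states = list(encoder_model.predict(input_seq, verbose=0))
    target = one_hot(start_index)
    decoded = []
    for _ in range(max_decoder_len):
        probs, h, c = decoder_model.predict([target] + states, verbose=0)
        next_index = int(np.argmax(probs[0, -1, :]))
        if next_index == stop_index:
            break
        decoded.append(next_index)
        target, states = one_hot(next_index), [h, c]
    return decoded

sample = np.zeros((1, 3, num_encoder_tokens), dtype="float32")
print(decode_sequence(sample))  # arbitrary indices, since the weights are untrained
```

Because `decoder_lstm` and `decoder_dense` are the same layer objects used in `training_model`, any weights learned during training would be shared with the inference models automatically.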
Simple Code Example of Sequence-to-sequence Learning in Keras
Let’s take an example from the Keras blog. Their training model relies on three key features of Keras RNNs:
- The return_state constructor argument configures an RNN layer to return a list where the first entry is the outputs, and the next entries are the internal RNN states. This is used to recover the states of the encoder.
- The initial_state call argument specifies the initial state(s) of an RNN. This is used to pass the encoder states to the decoder as initial states.
- The return_sequences constructor argument configures an RNN to return its full sequence of outputs (instead of just the last output, which is the default behavior). This is used in the decoder.
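The three features above can be seen together in a minimal training model. This is a sketch in the spirit of the Keras blog example, not a copy of it: the sizes, the random one-hot data, and the helper `random_one_hot` are assumptions chosen to keep the block short and self-contained.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy sizes, chosen only for illustration.
num_encoder_tokens, num_decoder_tokens, latent_dim = 8, 10, 16

# return_state=True: the encoder LSTM returns its outputs plus the hidden and cell states.
encoder_inputs = keras.Input(shape=(None, num_encoder_tokens))
encoder_outputs, state_h, state_c = layers.LSTM(
    latent_dim, return_state=True)(encoder_inputs)
encoder_states = [state_h, state_c]  # encoder_outputs is discarded

# return_sequences=True: the decoder emits one output per timestep.
# initial_state: the decoder starts from the encoder states.
decoder_inputs = keras.Input(shape=(None, num_decoder_tokens))
decoder_outputs, _, _ = layers.LSTM(
    latent_dim, return_sequences=True, return_state=True)(
        decoder_inputs, initial_state=encoder_states)
decoder_outputs = layers.Dense(
    num_decoder_tokens, activation="softmax")(decoder_outputs)

model = keras.Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer="rmsprop", loss="categorical_crossentropy")

# Random one-hot data standing in for real character sequences.
rng = np.random.default_rng(0)

def random_one_hot(batch, steps, depth):
    ids = rng.integers(0, depth, size=(batch, steps))
    return np.eye(depth, dtype="float32")[ids]

encoder_x = random_one_hot(4, 6, num_encoder_tokens)
decoder_x = random_one_hot(4, 5, num_decoder_tokens)
decoder_y = random_one_hot(4, 5, num_decoder_tokens)

model.fit([encoder_x, decoder_x], decoder_y, epochs=1, verbose=0)
predictions = model.predict([encoder_x, decoder_x], verbose=0)
print(predictions.shape)  # (4, 5, 10): one distribution per decoder timestep
```

Note that the encoder input has six timesteps while the decoder works with five. The two sequences only meet through the state vectors, which is what allows input and output lengths to differ.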
You can find the whole code in the Keras LSTM tutorial.
Recent technological advances have significantly improved the capabilities of Machine Learning and Artificial Intelligence (ML/AI) systems. As a Machine Learning company, Proxet specializes in NLP, sequence models, sentiment analysis, facial and image recognition as well as voice technology. We also use a broad selection of Artificial Intelligence and Machine Learning web development techniques and solutions — from chatbots in healthcare to self-driving cars in smart transportation. Don’t hesitate to contact us to learn how machine learning can help your business grow!