Recurrent Neural Network: A.I. for Time Series

2 minute read

In a previous blog, we talked about deep learning and introduced the feed-forward neural network. This blog covers the concept of the Recurrent Neural Network: a class of deep learning techniques that performs better on time series data such as sensor readings and natural language.

Recurrent Neural Network

One major limitation of the feed-forward neural network is its inability to learn temporal features in sequential data. A network of neurons with one or more feedback loops (see Figure 1), called a recurrent neural network (RNN), has proven successful at learning from sequential data (Graves et al., 2013; Graves and Schmidhuber, 2009). A recurrent neuron is able to take into account information from the previous time step.

Figure 1. Recurrent network with RNN cell.

Given an input sequence \(X=\left(x^{<1>},x^{<2>},\ldots,x^{<T>}\right)\), a recurrent neural network computes the hidden vector sequence \(a=\left(a^{<1>},a^{<2>},\ldots,a^{<T>}\right)\) and the output vector sequence \(y=\left(y^{<1>},y^{<2>},\ldots,y^{<T>}\right)\) by iterating the following equations from \(t=1\) to \(T\):

\[\text{Equation 1: }a^{<t>}=\varphi\left(W_a^T\left[a^{<t-1>},x^{<t>}\right]+b_a\right)\] \[\text{Equation 2: }y^{<t>}=\varphi\left(W_y^T a^{<t>}+b_y\right)\]

where the \(W\) terms denote weight matrices and the \(b\) terms denote bias vectors. Equation 1 shows that the activation at the current time step is a function of the previous step \(t-1\). \(\varphi\) can be any non-linear activation function, although a squashing non-linearity such as the hyperbolic tangent (tanh) is commonly preferred in the literature (Pham et al., 2013). The notation \(\left[a^{<t-1>},x^{<t>}\right]\) represents the concatenation of the vectors \(a^{<t-1>}\) and \(x^{<t>}\). The simple RNN, however, has only short-term memory: it remembers information from just the previous time step, which is undesirable because sequential data often contain longer-term dependencies.
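To make Equations 1 and 2 concrete, here is a minimal sketch of a single recurrent step in NumPy. It assumes tanh as the non-linearity, and the names (`rnn_step`, `W_a`, `W_y`, `b_a`, `b_y`) simply mirror the notation above rather than any particular library's API.

```python
import numpy as np

def rnn_step(a_prev, x_t, W_a, b_a, W_y, b_y):
    """One forward step of a simple RNN cell (Equations 1 and 2)."""
    # Concatenate the previous activation and the current input: [a^<t-1>, x^<t>]
    concat = np.concatenate([a_prev, x_t])
    # Equation 1: the current activation depends on the previous time step
    a_t = np.tanh(W_a.T @ concat + b_a)
    # Equation 2: the output is computed from the current activation
    y_t = np.tanh(W_y.T @ a_t + b_y)
    return a_t, y_t

# Toy usage: iterate over a random input sequence, carrying a^<t> forward
rng = np.random.default_rng(0)
n_x, n_a, n_y, T = 3, 4, 2, 5
W_a = rng.normal(size=(n_a + n_x, n_a)); b_a = np.zeros(n_a)
W_y = rng.normal(size=(n_a, n_y));       b_y = np.zeros(n_y)
a_t = np.zeros(n_a)
for x_t in rng.normal(size=(T, n_x)):
    a_t, y_t = rnn_step(a_t, x_t, W_a, b_a, W_y, b_y)
```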

Long Short-term Memory Network

A variant of the RNN called the Long Short-Term Memory (LSTM) architecture was found to be better at exploiting time-dependent features, as it has a purpose-built memory cell. Several variants of the original LSTM model (Hochreiter and Schmidhuber, 1997) have been proposed; however, a large-scale study of LSTM variants (Greff et al., 2015) found that none of them significantly improved on the standard LSTM architecture. Hence, the standard LSTM architecture is used here. Figure 2 illustrates a single LSTM cell. The LSTM memorizes longer-term feature dependencies in the data through the use of gated functions.

Figure 2. A single LSTM cell.

An LSTM cell has two forms of memory: the short-term memory cell \(h^{<t>}\) and the long-term memory cell \(C^{<t>}\). As the name suggests, \(h^{<t>}\) only remembers information from the previous time step \(t-1\), while \(C^{<t>}\) is a selection of meaningful information retained as a result of seeing all previous time steps.

The long-term memory cell at the current time step, \(C^{<t>}\), drops ‘some’ memory from the previous time step \(C^{<t-1>}\) and then adds some new memories. All the information from the current time step is computed in the temporary cell \({\widetilde{C}}^{<t>}\). The flow of information into \(C^{<t>}\) is controlled by the forget gate \(\Gamma_f\) and the input gate \(\Gamma_u\). The following equations show how \(C^{<t>}\) is updated:

\[\text{Equation 3: }C^{<t>}=\Gamma_u\otimes{\widetilde{C}}^{<t>}+\Gamma_f\otimes C^{<t-1>}\] \[\text{Equation 4: }{\widetilde{C}}^{<t>}=\tanh\left(W_C^T\left[h^{<t-1>},x^{<t>}\right]+b_C\right)\] \[\text{Equation 5: }\Gamma_u=\sigma\left(W_u^T\left[h^{<t-1>},x^{<t>}\right]+b_u\right)\] \[\text{Equation 6: }\Gamma_f=\sigma\left(W_f^T\left[h^{<t-1>},x^{<t>}\right]+b_f\right)\]
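For completeness, here is a matching sketch of one LSTM cell update in NumPy. Equations 3–6 cover the long-term memory \(C^{<t>}\); the output gate and the update of the short-term memory \(h^{<t>}\) (part of the standard LSTM, though not written out above) are included so the cell runs end to end. All names are illustrative and mirror the notation above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(h_prev, c_prev, x_t, W_u, b_u, W_f, b_f, W_C, b_C, W_o, b_o):
    """One forward step of a standard LSTM cell."""
    concat = np.concatenate([h_prev, x_t])        # [h^<t-1>, x^<t>]
    gamma_u = sigmoid(W_u.T @ concat + b_u)       # Equation 5: input (update) gate
    gamma_f = sigmoid(W_f.T @ concat + b_f)       # Equation 6: forget gate
    c_tilde = np.tanh(W_C.T @ concat + b_C)       # Equation 4: temporary cell
    c_t = gamma_u * c_tilde + gamma_f * c_prev    # Equation 3: new long-term memory
    gamma_o = sigmoid(W_o.T @ concat + b_o)       # output gate (standard LSTM)
    h_t = gamma_o * np.tanh(c_t)                  # new short-term memory h^<t>
    return h_t, c_t
```

The element-wise products (\(\otimes\) in Equation 3) let the gates decide, dimension by dimension, how much of the old memory to keep and how much of the new candidate to add.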

Concluding Remarks

This blog explains the Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM). RNNs offer a powerful architecture for learning from time series data. However, in more recent years, Transformers have grown in popularity (GPT-3 uses Transformers). Check out the paper: Attention Is All You Need.