Deep Learning: An Introduction

3 minute read

What is Deep Learning?

Deep Learning is a class of Machine Learning techniques that works better on large sets of data. Deep Learning techniques, also referred to as Artificial Intelligence are inspired by the biological brain.

A baby understands the difference between objects by learning the difference between colours, textures, smells, etc. For example, the human brain can differentiate between apples and oranges because we have learned that apples are usually red/green while oranges are usually orange. We also learn the difference in texture between fruits. All these information are stored in brain cells (Neurons). When the brain sees an object, it filters what it sees (Input) and classifies the object based on the information it receives.

Deep Learning techniques use interconnected layers of Neuron Models to learn and store information. This interconnected network ingests a vast amount of input data and processes them through multiple layers that learn increasingly complex features of the data at each layer.

What is a Neuron?

A Neuron in an Artificial Neural networks (ANN) is a software-based calculator inspired by the biological brain. Neuronal models are the basic building block of any ANN. A general neuronal model contains three basic elements (Haykin et al., 2008): the synaptic weights, the summing junction, and the activation junction. The synaptic weights can also be viewed as the parameters of the neuron. The activation function of K gives the neuron non-linear properties.

Figure 1. A nonlinear model of a neuron \((k)\) with Input Vector \((X)\), Weight\((W)\), bias \((b)\), an Activation function and output y.

In mathematical term, a neuron k represented by Figure 1 may be described by the following equations:

\[\text{Equation 1: } a_K=w_K^T\ \ (x^{<t>}\ )+b_k\]

Equation 1 represents the linear mapping of a signal with the transposed weights vector \(w_K^T\) and a scalar bias term added by the term. The superscript \(t\) represents the temporal position of the signal in a sequence. \(x^{<t>}\) is a vector of n features. Equation 2 represents a non-linear mapping of, to the neuronal output via the activation function \(\varphi_K(.)\). x

A Network of Neurons: Neural Network

Figure 2. Fully connected feedforward network of shape with 4, 3 and 1 neurons in the Input, Hidden and Output layer respectively. The red line shows in Inputs and Output of Neuron 1 in layer 1.

The simplest network of neurons is called a feed-forward neural network (FNN). In an FNN, the neurons are arranged parallel to each other in a layer (hidden units), see Figure 2. An FNN is characterized by the number of hidden units and the number of neurons in each hidden units. Neuron \(K\) in layer \(L\) denoted as NLk, will have as input the vector of output \(Y_{L-1\ }\) from the previous layer. The output \(Y_L\) of a layer \(L\) can be written as follows:

\[\text{Equation 2: }A_L=W_L^T\left(Y_{L-1\ }\right)+B_L\] \[\text{Equation 3: }Y_L=\varphi_L\ (A_L)\]

The vector of weights and the bias of each neuron in a layer is represented by the weight matrix WL and the bias vector BL respectively. \(A_L\) and \(L\) is the vector of outputs for each neuron from the summing junction and the activation function respectively. The FNN architecture is symmetric, hence if the parameters of the network are initialised with zero, the neurons will effectively learn the same features (Rumelhart et al., 1986) . To break symmetry, small random initialisation was initially proposed (Lecun, 1985) . More recently, the Glorot Initialisation (Glorot and Bengio, 2010) has shown to improve the learning of a feedforward neural network.

Summary

This blog introduces the basic concepts of Artifical Neural Network: neurons, and neural networks. These two concepts are the foundation of Deep Learning models. There are several types of Deep Learning models, including:

  • Convolution Neural Network (CNN),
  • Recurrent Neural Network (RNN),
  • Generative Adversarial network (GAN) and,
  • Transformers (e.g: GPT-3)

which uses different types of Neuron Models, Architecture, or training techniques. However, the principles behind these approaches remain the same.