Introduction

April 7, 2024

This beginner-friendly deep learning guide is based loosely on the definitions offered in the Little Book of Deep Learning by François Fleuret.

Deep Learning (DL) historically falls under the field of statistical machine learning. Both are based on the ability to learn representations from data. Therefore, the foundations of deep learning are similar to those of machine learning.

Learn more about machine learning here: [[Machine Learning Basics]].

How does machine learning work?

It works through three foundational elements:

  • Training Dataset: In a supervised training setting, the training dataset consists of tuples $(x_n, y_n)$, where $x_n$ is the input data and $y_n$ is the output label.
  • Parametric Model: This is a piece of code that takes an input $x_n$ and produces a prediction $\hat{y}_n$.
  • Trainable Parameters: These are values in the parametric model that can be adjusted, or trained. They typically consist of weights and biases. (Hyperparameters, by contrast, configure the model and the training procedure; they are chosen before training rather than learned from the data.)

The parametric model depends on the trainable parameters to make a prediction. The trainable parameters are set by training on the training dataset.
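The three elements above can be made concrete with a minimal sketch. The function, dataset, and parameter values here are illustrative, not from the text: a one-dimensional linear model whose trainable parameters are a weight `w` and a bias `b`.

```python
import numpy as np

# Hypothetical parametric model: predict y from x using two
# trainable parameters, a weight w and a bias b.
def predict(x, w, b):
    return w * x + b

# Training dataset: tuples (x_n, y_n); here the labels follow y = 2x + 1.
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([1.0, 3.0, 5.0, 7.0])

# With well-chosen parameter values, the model reproduces the labels.
print(predict(xs, 2.0, 1.0))  # → [1. 3. 5. 7.]
```

Training is then the process of finding parameter values like `w = 2.0` and `b = 1.0` automatically, instead of setting them by hand.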

How is the model trained?

A model is trained by adjusting the trainable parameters to minimize the loss. The loss is a measure of how poorly the model is performing: the lower the loss, the better the predictions. The process of minimizing the loss is known as optimization.

Note that the loss is used in training as a simplified proxy for the true performance of the model; it is chosen to make optimization easier. True performance should be measured with other metrics, such as accuracy on held-out data.

Learn more about training here: [[Training]]

How is data represented and manipulated in DL?

In DL, tensors are used to represent and manipulate data. A tensor is a collection of scalars organized along several axes, or dimensions. In other words, tensors are generalizations of vectors (one axis) and matrices (two axes).

Tensors make computations faster because their shape information is stored separately from the underlying storage layout. This allows libraries to reshape and reorganize a tensor without copying its data.
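This can be seen directly with NumPy, where a reshape returns a view onto the same storage rather than a copy (a small sketch; PyTorch and other DL libraries behave similarly):

```python
import numpy as np

# A tensor's shape is metadata kept apart from its underlying
# storage, so reshaping can return a view of the same memory.
a = np.arange(12)          # shape (12,)
b = a.reshape(3, 4)        # shape (3, 4), no data copied
print(b.base is a)         # → True: b views a's storage
a[0] = 99
print(b[0, 0])             # → 99: the change is visible through b
```

Because `b` shares storage with `a`, the reshape costs almost nothing regardless of how large the tensor is.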

Learn more about Tensors here: [[Tensors]]

What are the hardware components used in DL?

The heavy computations that DL requires are almost always performed on a Graphics Processing Unit (GPU). A GPU has a highly parallel architecture, meaning that it can handle many computations simultaneously.

Learn more about GPUs here: [[Graphical Processing Units]]

There is a lot more to Deep Learning than what is covered above. To explore the full scope of the field in a beginner-friendly way, feel free to navigate through the links below: