# -*- coding: utf-8 -*-
"""intro_to_ai.ipynb
Automatically generated by Colaboratory.
Original file is located at
https://colab.research.google.com/drive/1Cmqgjp8P_JIUQcpQX_S5G6tWJibOvcC3
# An introduction to Artificial Intelligence, Neural Networks and Machine Learning
### What is intelligence?
"Intelligence can be defined as the ability to solve complex problems or make decisions with outcomes benefiting the actor" [https://www.hopkinsmedicine.org/news/articles/2020/10/qa--what-is-intelligence]
"Human intelligence, mental quality that consists of the abilities to learn from experience, adapt to new situations, understand and handle abstract concepts, and use knowledge to manipulate one’s environment." [https://www.britannica.com/science/human-intelligence-psychology]
To act in a way that benefits us, we need to predict the consequences of our actions. This requires building a mental model or representation of the world in which we can make predictions. To do this, we must perceive the regularities in the world and turn them into the rules of our model. So the perception of regularities is the essence of intelligence.
Mathematically or computationally, solving a problem can be represented by what is called a *function*. A function is something (that can be viewed as a machine, a piece of computer code, a process, a mathematical abstract entity...) that takes an input (also called variable or argument) and produces an output (also called result) depending on the given input. A function can be represented either by a descriptive or extensive representation, i.e. the set of all pairs (input, output), or by an operational representation, i.e. the sequence of elementary operations that must be done to produce the output from the given input.
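For example, both representations of the same very simple function can be written in Python (a minimal sketch; the dictionary "square_pairs" and the function "square" are only illustrative):
"""

# Extensive (descriptive) representation: the set of (input, output) pairs,
# only practical when the set of possible inputs is small
square_pairs = {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

# Operational representation: the elementary operations producing the output from the input
def square(x):
    return x * x

print(square_pairs[3], square(3))  # both give 9

"""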
### What is artificial intelligence?
"Artificial intelligence (AI), in its broadest sense, is intelligence exhibited by machines, particularly computer systems, as opposed to the natural intelligence of living beings." [https://en.wikipedia.org/wiki/Artificial_intelligence]
Technically, there are different ways to implement artificial intelligence.
For example, "an expert system is a computer program that is designed to solve complex problems and to provide decision-making ability like a human expert. It performs this by extracting knowledge from its knowledge base using the reasoning and inference rules according to the user queries." [https://www.javatpoint.com/expert-systems-in-artificial-intelligence]
But the most successful approach is based on imitation of nature. The intelligence of animals, including humans, is produced by their brains, which are natural neural networks. Artificial intelligence can also be produced by artificial neural networks.
### What is a neural network?
According to https://en.wikipedia.org/wiki/Neural_network, "A neural network is a group of interconnected units called neurons that send signals to one another. Neurons can be either biological cells or mathematical models. While individual neurons are simple, many of them together in a network can perform complex tasks. There are two main types of neural network.
In neuroscience, a biological neural network is a physical structure found in brains and complex nervous systems – a population of nerve cells connected by synapses.
In machine learning, an artificial neural network is a mathematical model used to approximate nonlinear functions. Artificial neural networks are used to solve artificial intelligence problems."
A neural network is parametrized by the weights of the connections between the neurons and the biases of the neurons. Each neuron computes a weighted sum of the outputs produced by the other neurons connected to it, adds what is called a "bias" associated with this neuron, then applies an "activation function" to this sum, and finally sends the result to other neurons. A network is generally organized in layers, each layer containing neurons that send their outputs to the inputs of the neurons of the next layer.
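As a minimal sketch (plain NumPy, with made-up inputs, weights and bias), the computation done by a single neuron looks like this:
"""

import numpy as np

def neuron(inputs, weights, bias):
    z = np.dot(inputs, weights) + bias  # weighted sum of the inputs, plus the bias
    return 1 / (1 + np.exp(-z))         # activation function (here the sigmoid)

# outputs of three neurons connected to this one, with illustrative weights and bias
print(neuron(np.array([0.5, -1.0, 2.0]), np.array([0.1, 0.4, -0.3]), bias=0.2))

"""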

### Why are neural networks so efficient?
Neural networks are universal function approximators.
"The Universal Approximation Theorem states that a neural network with at least one hidden layer of a sufficient number of neurons, and a non-linear activation function can approximate any continuous function to an arbitrary level of accuracy." [https://rukshanpramoditha.medium.com/the-intuition-behind-the-universal-approximation-theorem-for-neural-networks-ac4b000bfbfc#:~:text=In%20other%20words%2C%20a%20neural,networks%20are%20called%20universal%20approximators.]
### How can we use a neural network to implement a given function?
We can configure a neural network to approximate a function defined by a set of examples, i.e. a set of pairs of inputs and corresponding expected outputs, called the training set:
- if the input is x1, we want to get the output y1
- if the input is x2, we want to get the output y2
- if the input is x3, we want to get the output y3
- ... etc ...
Unlike in traditional computing, we no longer need to elaborate the detailed succession of elementary operations that must be done on the input to produce the output. We just need to find adequate values of the parameters of the neural network (its weights and its biases) to get a good enough approximation of the desired function. The approximation is considered good if, for each given input, the produced output is close to the expected output.
To find the best values of the parameters of the neural network, we start from random values and compute the total gap between the outputs given by the neural network and the expected outputs. This total gap is called the "loss".
There are different methods to compute the loss. For example, Mean Squared Error (MSE) takes the average of the squared differences between the target and the predicted outputs: if input x1 produces output o1, x2 produces o2, ..., xn produces on, then the loss is (1/n) * ((o1 - y1)^2 + (o2 - y2)^2 + ... + (on - yn)^2).
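For example (a minimal sketch with made-up numbers):
"""

# Mean Squared Error on three (made-up) examples
outputs = [2.5, 0.0, 1.8]   # what the network produced: o1, o2, o3
targets = [3.0, -0.5, 2.0]  # what we expected: y1, y2, y3
loss = sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)
print(loss)  # (0.25 + 0.25 + 0.04) / 3 ≈ 0.18

"""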
Then we see what happens if we modify one of the parameters: if the loss becomes smaller when we increase this parameter, we increase it; if the loss becomes smaller when we decrease it, we decrease it. We do this for all parameters, and we repeat the process until we get a good enough approximation. This operation is called "training" the neural network.
After this training, the neural network gives correct outputs for the inputs contained in the set of examples used for training, but it is also expected to give good enough outputs for other inputs not contained in the training set. The neural network learns from the examples; that is why this approach is called "machine learning". This comes from the generalization capacity of the neural network, which captures the logic, the regularities, of the correspondence between the inputs and the outputs of the training set. That is the reason why I think we can say that artificial intelligence is really intelligence.
Mathematically, a configuration of the parameters is a point in an N-dimensional space, where N is the number of parameters, and this operation of adjusting the parameters is called "gradient descent". It consists of modifying the parameters in the direction opposite to the gradient of the loss in this N-dimensional space. The gradient is a multidimensional derivative. It is as if you wanted to find the lowest point of a hilly landscape: you start from any place and you always walk downhill. In that case you are in a 2-dimensional space, with only two parameters (latitude and longitude). It is the same with neural networks, but with thousands of parameters.

The gradient is computed by a technique called "automatic differentiation". A simple way to understand it is through a mathematical trick called dual numbers.
Dual numbers are expressions of the form a + b e, where a and b are real numbers and e is a symbol satisfying e^2 = 0. We can do the usual mathematical operations on them, for example:
(a + b e) + (c + d e) = (a + c) + (b + d) e
(a + b e) (c + d e) = a c + (a d + b c) e
The advantage of dual numbers is that if we apply a function f to the dual number x+e, we get the value f(x), and we also get automatically the derivative f'(x), because we have f(x+e) = f(x) + f'(x) e.
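Here is a minimal sketch of dual-number arithmetic in Python (the class "Dual" and the function "g" are only illustrative, not part of any library):
"""

class Dual:
    # a dual number a + b*e with e**2 = 0 (b carries the derivative)
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b
    def __add__(self, other):
        return Dual(self.a + other.a, self.b + other.b)
    def __mul__(self, other):
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)

def g(x):
    return x * x + x  # g(x) = x^2 + x, so g'(x) = 2x + 1

result = g(Dual(3.0, 1.0))  # evaluate g at x + e with x = 3
print(result.a, result.b)   # 12.0 (= g(3)) and 7.0 (= g'(3))

"""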
### How is it implemented?
Neural networks and machine learning are often implemented using the Python programming language. There are different libraries providing predefined features useful for machine learning, such as PyTorch, TensorFlow, Keras...
Here we will use PyTorch.
### How does automatic differentiation work in PyTorch?
We will see how to use automatic differentiation to find the minimum of a function, for example:
f(x) = x^4 + 0.5 * x^3 - 3 * x^2 - 5 * x + 1
First we define this function in Python and plot it.
On this graph, we can see it has a minimum for x between 1 and 2, with f(x) between -10 and 0.
"""
import numpy as np
from matplotlib import pyplot as plt
def f(x):
    return x**4 + 0.5 * x**3 - 3 * x**2 - 5 * x + 1
x = np.arange(-3.0, 3.0, 0.01)
plt.plot(x, [f(x1) for x1 in x])
plt.show(block=False)
plt.pause(1)
"""We can compute "manually" the minimum by gradient descent :
"""
x = np.random.rand(1)[0]
eps = 1e-10
g = 1
while abs(g) > 0.001:
    y = f(x)
    g = (f(x+eps) - f(x)) / eps # the gradient
    x = x - 0.01 * g # optimize x
print(f"Minimum at x={x} : f(x)={y}")
"""Below is the equivalent code with automatic differentiation.
We first create a tensor which is, in this case, just a scalar with a random value, and we indicate that we need to differentiate with respect to it.
Then, until the gradient is small enough, we compute f(x) and its gradient, and we move x in the direction opposite to this gradient.
"""
import torch
# initialization
x = torch.tensor(np.random.rand(1)).requires_grad_(True) # differentiation d/dx required
while (x.grad is None or torch.abs(x.grad) > 0.001): # while the gradient is not small enough
    if (x.grad is not None):
        x.grad.data.zero_() # reset gradient
    y = f(x) # compute function
    y.backward() # compute gradient dy/dx
    x.data = x.data - 0.01 * x.grad.data # optimize: move in the direction opposite to the gradient
print(f"Minimum at x={x.item()} : f(x)={y.item()}")
"""The instruction "x.data = x.data - 0.01 * x.grad.data" moves x in direction opposite to the gradient. It is called the optimization. Here we dit the optimization "manually" by an adequate instruction, but there are predefined optimizers in PyTorch. They allow to find the optimal values of some parameters which must be instances of the class "torch.nn.Parameter"."""
# initialization
x = torch.tensor(np.random.rand(1)).requires_grad_(True) # differentiation d/dx required
p = torch.nn.Parameter(x) # the parameter to optimize
optimizer = torch.optim.SGD([p], lr=1e-2) # the predefined PyTorch optimizer
step = 0
while step < 50: # repeat 50 steps of optimization
    step = step + 1
    x = torch.nn.utils.parameters_to_vector(p)[0]
    y = f(x) # compute function
    optimizer.zero_grad()
    y.backward() # compute gradient dy/dx
    optimizer.step() # optimize
print(f"Minimum at x={x.item()} : f(x)={y.item()}")
"""If we have a GPU, we can use it to accelerate the execution :"""
# Get cpu or gpu device for training.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")
# initialization
x = torch.tensor(np.random.rand(1), device=device, requires_grad=True) # differentiation d/dx required; create the tensor directly on the device so it stays a leaf
p = torch.nn.Parameter(x) # the parameter to optimize
optimizer = torch.optim.SGD([p], lr=1e-2) # the predefined PyTorch optimizer
step = 0
while step < 50: # repeat 50 steps of optimization
    step = step + 1
    x = torch.nn.utils.parameters_to_vector(p)[0]
    y = f(x) # compute function
    optimizer.zero_grad()
    y.backward() # compute gradient dy/dx
    optimizer.step() # optimize
print(f"Minimum at x={x.item()} : f(x)={y.item()}")
"""The minimum of the function can also be found using a neural network with one neuron, which gives the value of x the the value 0 is given as input :"""
# Define model
class NeuralNetwork(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = torch.nn.Flatten()
        self.linear_relu_stack = torch.nn.Sequential(
            torch.nn.Linear(1, 1),
        )
    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
model = NeuralNetwork().to(device)
print(model)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
def train():
    model.train()
    x = model(torch.tensor([[0.0]]).to(device))
    y = f(x)
    optimizer.zero_grad()
    y.backward()
    optimizer.step()
epochs = 50
for e in range(epochs):
    train()
x = model(torch.tensor([[0.0]]).to(device))
y = f(x)
print(f"Minimum at x={x.item()} : f(x)={y.item()}")
"""### A real example : handwritten digits recognition
Import required libraries
"""
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from torchvision.transforms import ToTensor
from torch.utils.data import TensorDataset
"""Dataloaders used to load the datas (handwritted digits images and corresponding digits) from the MNIST database"""
transform = transforms.ToTensor()
training_data = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_data = datasets.MNIST(root='./data', train=False, download=True, transform=transform)
batch_size = 50
# Create data loaders.
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)
for X, y in test_dataloader:
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break
"""Define the structure of the neural network : an input layer for the input images (28 * 28 pixels), two intermediate layers with 512 neurons, and an output layer with 10 neurons to represent the 10 possible outputs, with ReLU activation function between the layers, which are fully connected.
We will first create the network manually.
"""
# Define model
def sigmoid(x): return 1/(1+torch.exp(-x))
lr = 10
class NeuralNetwork1(nn.Module):
    def __init__(self):
        super().__init__()
        self.W1 = torch.randn((784, 256), requires_grad=True, device=device)
        self.B1 = torch.randn(256, requires_grad=True, device=device)
        self.W2 = torch.randn((256, 256), requires_grad=True, device=device)
        self.B2 = torch.randn(256, requires_grad=True, device=device)
        self.W3 = torch.randn((256, 10), requires_grad=True, device=device)
        self.B3 = torch.randn(10, requires_grad=True, device=device)
    def forward(self, x):
        y1 = sigmoid((x @ self.W1) + self.B1)
        y2 = sigmoid((y1 @ self.W2) + self.B2)
        y3 = sigmoid((y2 @ self.W3) + self.B3)
        return y3
    def zero_grad(self):
        if self.W1.grad is not None: self.W1.grad.zero_()
        if self.B1.grad is not None: self.B1.grad.zero_()
        if self.W2.grad is not None: self.W2.grad.zero_()
        if self.B2.grad is not None: self.B2.grad.zero_()
        if self.W3.grad is not None: self.W3.grad.zero_()
        if self.B3.grad is not None: self.B3.grad.zero_()
    def optimize(self):
        self.W1.data -= lr * self.W1.grad.data
        self.B1.data -= lr * self.B1.grad.data
        self.W2.data -= lr * self.W2.grad.data
        self.B2.data -= lr * self.B2.grad.data
        self.W3.data -= lr * self.W3.grad.data
        self.B3.data -= lr * self.B3.grad.data
model1 = NeuralNetwork1().to(device)
"""Function training the model"""
loss_fn = torch.nn.CrossEntropyLoss()
def train(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)
        X = torch.flatten(X, 1, 3) # transform X of shape 50, 1, 28, 28 into 50, 28*28
        # Compute prediction
        pred = model(X)
        # Compute loss
        loss = loss_fn(pred, y)
        # Backpropagation
        model.zero_grad()
        loss.backward()
        # Optimize parameters
        model.optimize()
        if batch % 200 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")
"""Function testing the model"""
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            X = torch.flatten(X, 1, 3) # transform X of shape 50, 1, 28, 28 into 50, 28*28
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float32).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
"""Do training"""
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model1, loss_fn)
    test(test_dataloader, model1, loss_fn)
print("Done!")
"""We can also use predefined functions included in PyTorch to build the neural network :"""
# Define model
class NeuralNetwork2(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.Sigmoid(),
            nn.Linear(512, 512),
            nn.Sigmoid(),
            nn.Linear(512, 10)
        )
        self.optimizer = torch.optim.SGD(self.parameters(), lr=0.1)
    def forward(self, x):
        logits = self.linear_relu_stack(x)
        return logits
    def zero_grad(self):
        self.optimizer.zero_grad()
    def optimize(self):
        self.optimizer.step()
model2 = NeuralNetwork2().to(device)
print(model2)
"""Use PyTorch SGD optimizer"""
# optimizer = torch.optim.SGD(model2.parameters(), lr=0.1)
"""Train the neural network"""
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model2, loss_fn)
    test(test_dataloader, model2, loss_fn)
print("Done!")
"""Use the trained neural network"""
classes = [
"Zero ",
"One ",
"Two ",
"Three",
"Four ",
"Five ",
"Six ",
"Seven",
"Eight",
"Nine ",
]
model2.eval()
ngood = 0
nbad = 0
for i in range(100):
    x, y = test_data[i][0], test_data[i][1]
    x = x.to(device)
    x = torch.flatten(x, 1, 2)
    with torch.no_grad():
        pred = model2(x)
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
    if predicted == actual:
        result = "Good"
        ngood = ngood + 1
    else:
        result = "Bad"
        nbad = nbad + 1
    print(f'Predicted: "{predicted}", Actual: "{actual}", Result: {result}')
print(f'{ngood} good, {nbad} bad')