How to Build a Neural Network from Scratch with PyTorch

Neural networks are the backbone of modern artificial intelligence and deep learning. But how exactly do these "magic" algorithms work under the hood? The best way to truly understand neural networks is to build one from scratch.

In this in-depth tutorial, we'll demystify neural networks by walking through how to create one using PyTorch, the popular deep learning framework. By the end, you'll understand the core concepts and be able to architect your own custom models. Let's dive in!

Neural Networks 101: Key Concepts to Know

At a high level, a neural network is a machine learning model that learns patterns from data in order to make predictions. It takes in input data, performs a series of mathematical operations, and outputs a prediction.

For example, say we want to build a model that can classify images of handwritten digits as 0-9. We'd train a neural network by showing it thousands of labeled example images. Over time, it learns the distinguishing visual patterns of each digit class. Then when shown a new image, it can predict the correct digit.

Biological neurons in the brain were the original inspiration for neural networks. An artificial neuron takes in a weighted sum of input values, applies an activation function, and passes the signal to connected neurons, forming a network.
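
To make that concrete, here is a minimal sketch of one artificial neuron in PyTorch; the input values, weights, and bias below are arbitrary numbers chosen purely for illustration.

import torch

# One artificial neuron: a weighted sum of its inputs plus a bias,
# passed through a nonlinear activation (a sigmoid here).
inputs = torch.tensor([0.5, -1.2, 3.0])    # example incoming signals
weights = torch.tensor([0.8, 0.1, -0.4])   # one weight per input
bias = torch.tensor(0.2)

weighted_sum = torch.dot(inputs, weights) + bias
output = torch.sigmoid(weighted_sum)       # squashes the result into (0, 1)
print(output)

In a full network, that output would then feed into the neurons of the next layer.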

Here are some key building blocks of neural nets to know:

Neurons (or units): The nodes in each layer that hold a numeric value and have weighted connections to neurons in adjacent layers. The input layer neurons correspond to the input data (like pixel intensities for an image). Hidden layers transform the data to learn useful patterns. The output layer neurons represent the final prediction (like digit class probabilities).

Weights and biases: The learnable parameters of the model. Each neuron-to-neuron connection has an associated weight that determines the strength of the signal passed. Each neuron also has a bias term that shifts its output value. The weights and biases start as random values but are tuned during training to minimize prediction error.

Activation functions: The nonlinear functions applied to a neuron's output that allow the network to learn complex patterns. Popular choices include ReLU, sigmoid, and tanh. Without activation functions, neural nets could only learn linear relationships.

Loss functions: How the model's prediction error is quantified, which the training process tries to minimize. Mean squared error and cross-entropy are common loss functions. The loss is calculated on the training data and used to adjust the weights and biases.

Backpropagation: The workhorse training algorithm that enables a neural net to learn. For each training example, it:

  1. Does a forward pass to calculate the predicted outputs and loss
  2. Calculates the gradient of the loss with respect to each weight
  3. Updates the weights in the direction that reduces the loss (gradient descent)

By iteratively applying backpropagation across many training examples, the weights and biases are optimized to map inputs to correct outputs.
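
Here is a toy sketch of that loop in PyTorch for a single weight; the data, starting value, and learning rate are made up for illustration, and autograd handles the gradient computation for us.

import torch

# Learn a single weight w so that w * x approximates y (the true rule is y = 2x).
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([2.0, 4.0, 6.0])
w = torch.tensor(0.0, requires_grad=True)  # initial guess for the weight
lr = 0.1                                   # learning rate

for step in range(20):
  pred = w * x                        # 1. forward pass
  loss = ((pred - y) ** 2).mean()     #    mean squared error
  loss.backward()                     # 2. gradient of the loss w.r.t. w
  with torch.no_grad():
    w -= lr * w.grad                  # 3. step downhill (gradient descent)
  w.grad.zero_()                      # clear the gradient for the next iteration

print(w.item())  # close to 2.0 after a few iterations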

Those are the core concepts – now let's get our hands dirty and build a basic neural net!

Loading and Preparing the Dataset

We'll build a neural network to classify images of handwritten digits from the famous MNIST dataset. This dataset has 60,000 grayscale 28×28 pixel training images and 10,000 test images, labeled as 0-9.

First, let's import PyTorch and download the MNIST data:

import torch
from torchvision import datasets, transforms

# Download MNIST training and test datasets 
train_data = datasets.MNIST('data', train=True, download=True, transform=transforms.ToTensor())
test_data = datasets.MNIST('data', train=False, download=True, transform=transforms.ToTensor())

The transforms.ToTensor() converts the images from PIL format to PyTorch tensors and scales the pixel values from 0-255 to 0-1.
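
We can confirm this by inspecting a single sample from the training set:

image, label = train_data[0]
print(image.shape)               # torch.Size([1, 28, 28]): one channel, 28x28 pixels
print(image.min(), image.max())  # pixel values now lie between 0 and 1
print(label)                     # the digit this image depicts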

Let's create data loaders to batch and shuffle the data:

from torch.utils.data import DataLoader

batch_size = 64
train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)  
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

We can visualize some example digits:

import matplotlib.pyplot as plt

images, labels = next(iter(train_loader))
plt.figure(figsize=(10,5))
for i in range(10):
  plt.subplot(2,5,i+1)
  plt.imshow(images[i][0], cmap='gray')
  plt.title(labels[i].item())
  plt.xticks([])
  plt.yticks([])
plt.show()  

[Figure: a grid of example MNIST digits with their labels]

Great, we've loaded our data – now let's define the neural network!

Defining the Neural Network

We'll build a feedforward neural network, meaning the data flows straight through from input to output. Our network will have:

  • An input of 784 values (the 28*28 image pixels, flattened into a vector)
  • Two hidden layers with 128 neurons each
  • An output layer with 10 neurons (one per digit class)

Here's how we define the architecture in PyTorch:

from torch import nn

class NeuralNet(nn.Module):
  def __init__(self):
    super().__init__()
    self.input_layer = nn.Linear(28*28, 128)  
    self.hidden_layer = nn.Linear(128, 128)
    self.output_layer = nn.Linear(128, 10)
    self.relu = nn.ReLU()

  def forward(self, x):
    x = self.input_layer(x)
    x = self.relu(x)
    x = self.hidden_layer(x)
    x = self.relu(x)
    x = self.output_layer(x)
    return x

We create a class inheriting from nn.Module with two key methods:

  • __init__ defines the layers of the network. nn.Linear(in, out) is a fully-connected layer with in inputs and out outputs. We'll also use the rectified linear unit (ReLU) activation function.
  • forward defines the forward pass of data through the network. The input data is passed through the layers sequentially, with a ReLU after each linear layer. The final raw outputs are returned.
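
As a side note, the same architecture can also be written more compactly with nn.Sequential; this is just an alternative spelling of the model above, not a change to it:

# Equivalent model expressed with nn.Sequential: layers are applied in order.
model_seq = nn.Sequential(
  nn.Linear(28*28, 128),
  nn.ReLU(),
  nn.Linear(128, 128),
  nn.ReLU(),
  nn.Linear(128, 10),
)

Subclassing nn.Module becomes more flexible once the forward pass needs branching or layer reuse, so we'll stick with the class version here.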

Let's initialize an instance of our model and inspect it:

model = NeuralNet()
print(model)
NeuralNet(
  (input_layer): Linear(in_features=784, out_features=128, bias=True)
  (hidden_layer): Linear(in_features=128, out_features=128, bias=True)
  (output_layer): Linear(in_features=128, out_features=10, bias=True)
  (relu): ReLU()
)

The model has three linear layers, each with learnable weights and biases, plus a ReLU activation that is reused between them. The final layer outputs a vector of 10 raw values (called logits). To get a probability distribution over the classes, we apply the softmax function:

import torch.nn.functional as F

x = torch.randn(64, 784) # random input batch
logits = model(x)
probs = F.softmax(logits, dim=1)  
print(probs.shape)
torch.Size([64, 10])

The dim=1 argument tells softmax to normalize across the class dimension, i.e. the 10 logits for each example.
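
Continuing from the snippet above, each row of probs is now a probability distribution over the 10 classes, so a quick sanity check is that every row sums to 1:

print(probs.sum(dim=1))  # every entry is (approximately) 1.0
print(probs[0])          # class probabilities for the first example in the batch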

We're almost ready to train – but first we need to define the loss function and optimizer.

Defining the Loss and Optimizer

The cross-entropy loss is a common choice for multi-class classification. Conveniently, PyTorch provides an implementation that combines a log-softmax with the negative log likelihood loss:

criterion = nn.CrossEntropyLoss()
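
Because the loss applies the log-softmax internally, we will feed it raw logits rather than probabilities. As an optional sanity check (reusing the random logits from earlier, plus some made-up labels), the built-in loss matches the manual log-softmax plus negative log likelihood computation:

labels = torch.randint(0, 10, (64,))                       # fake labels for the random batch
loss_a = criterion(logits, labels)                         # cross-entropy on raw logits
loss_b = F.nll_loss(F.log_softmax(logits, dim=1), labels)  # manual log-softmax + NLL
print(loss_a.item(), loss_b.item())                        # the two values agree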

For the optimizer, we'll use Adam, which is known to work well for a wide range of models. It adapts the effective learning rate for each parameter based on estimates of its past gradients:

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  

We pass in the model parameters to optimize and set the learning rate hyperparameter. A higher lr can make training converge faster but may overshoot the optimum.

Now we‘re ready to train the model!

Training the Model

The training loop has 4 steps:

  1. Get a batch of data (and move it to the GPU if one is available; the code below stays on the CPU, with a GPU sketch shown after it)
  2. Zero the gradients, do a forward pass, and calculate the loss
  3. Backpropagate the gradients
  4. Take a step with the optimizer to update the weights

Here's the code:

epochs = 10

for epoch in range(epochs):
  total_loss = 0

  for batch in train_loader:
    images, labels = batch
    images = images.view(images.shape[0], -1) # flatten images

    optimizer.zero_grad()
    preds = model(images)
    loss = criterion(preds, labels)
    loss.backward()
    optimizer.step()

    total_loss += loss.item()

  print(f"Epoch: {epoch+1}, Loss: {total_loss/len(train_loader):.3f}")

This prints the training loss after each epoch:

Epoch: 1, Loss: 0.519  
Epoch: 2, Loss: 0.298
Epoch: 3, Loss: 0.241  
Epoch: 4, Loss: 0.203
Epoch: 5, Loss: 0.176
Epoch: 6, Loss: 0.156
Epoch: 7, Loss: 0.141
Epoch: 8, Loss: 0.129
Epoch: 9, Loss: 0.121
Epoch: 10, Loss: 0.113

The loss steadily decreases as the model learns – a good sign! After 10 epochs, the loss is 0.113, which is reasonably low. We could continue training to lower it further.

Evaluating the Trained Model

To evaluate how well the model actually performs, we run it on the held-out test set and measure the accuracy:

model.eval()
correct, total = 0, 0

with torch.no_grad():
  for batch in test_loader:
    images, labels = batch
    images = images.view(images.shape[0], -1)
    preds = model(images).argmax(dim=1)  
    correct += (preds == labels).sum().item()
    total += labels.shape[0]

print(f"Test accuracy: {correct/total:.3f}")  

The key parts:

  • model.eval() puts the model in evaluation mode (turns off things like dropout)
  • torch.no_grad() disables gradient calculations (not needed for evaluation)
  • .argmax(dim=1) takes the most probable class for each example
  • We count the number of matching predictions and labels to calculate accuracy

Running this gives:

Test accuracy: 0.961

Wow, over 96% accuracy on unseen data – not too shabby for our simple multilayer perceptron!

Improving the Model

There are many ways we could improve this basic model:

  • Add more hidden layers to learn hierarchical features
  • Increase the units in each layer for greater capacity
  • Train for more epochs until convergence
  • Tune hyperparameters like learning rate, batch size, optimizer
  • Apply regularization techniques like L2 weight decay or dropout to prevent overfitting (see the sketch after this list)
  • Augment the training data with random transformations for robustness
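
As one example of the regularization idea, here is a sketch of the same model with dropout layers added; the 0.2 drop probability is an arbitrary starting point rather than a tuned value.

class NeuralNetWithDropout(nn.Module):
  def __init__(self):
    super().__init__()
    self.input_layer = nn.Linear(28*28, 128)
    self.hidden_layer = nn.Linear(128, 128)
    self.output_layer = nn.Linear(128, 10)
    self.relu = nn.ReLU()
    self.dropout = nn.Dropout(p=0.2)  # randomly zeroes 20% of activations during training

  def forward(self, x):
    x = self.dropout(self.relu(self.input_layer(x)))
    x = self.dropout(self.relu(self.hidden_layer(x)))
    return self.output_layer(x)

Dropout is automatically disabled in model.eval() mode, which is exactly why we switch to it before measuring test accuracy.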

I encourage you to experiment and see how high you can get the test accuracy! You can find the full code for this tutorial in this Github repo.

Next Steps

I hope this tutorial gave you a solid understanding of the core concepts of neural networks. We covered a lot:

  • The key building blocks like neurons, weights, activation functions, loss, and backpropagation
  • Loading and preparing a dataset in PyTorch
  • Defining a multilayer perceptron model
  • Setting up the training loop and evaluating performance
  • Ideas for improving the basic model

For next steps, I recommend:

  • Coding the forward and backward passes from scratch (without PyTorch autograd)
  • Building other types of models like convolutional or recurrent neural networks
  • Applying neural networks to your own datasets and problems
  • Studying more advanced architectures and techniques used in state-of-the-art models

Neural networks are immensely powerful tools that are fun to experiment with. The field of deep learning is progressing rapidly with new innovations all the time. I'm excited to see what you'll create!
