Deep Learning Course — Mitra AI Projects

Neural Network Basics

Neural networks are the foundational building block of deep learning. Understand layers, activation functions, backpropagation, and gradient descent before moving on to specialized architectures.

Core Concepts

Concept	Plain-English Meaning
Neuron / Node	Takes weighted inputs, applies activation, outputs a value
Layer	A group of neurons — input, hidden, or output layers
Activation Function	Adds non-linearity — ReLU, Sigmoid, Tanh, Softmax
Forward Pass	Data flows from input → layers → output prediction
Loss Function	Measures how wrong the prediction is (MSE, Cross-Entropy)
Backpropagation	Computes gradients of loss w.r.t. each weight
Gradient Descent	Updates weights in the direction that reduces loss
Batch Normalization	Normalizes layer inputs — stabilizes and speeds training
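
To make the forward pass, loss, backpropagation, and gradient descent rows concrete, here is a minimal sketch that fits a single neuron to toy data. The data (y = 2x + 1), the learning rate, and the step count are assumptions chosen for illustration.

```python
import torch

# Toy regression data (made up for illustration): y = 2x + 1
x = torch.linspace(-1, 1, 32).unsqueeze(1)   # shape (32, 1)
y = 2 * x + 1

# One neuron: weighted input plus bias (no activation, since this is regression)
w = torch.zeros(1, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr = 0.1

losses = []
for step in range(200):
    y_hat = x @ w + b                    # forward pass
    loss = ((y_hat - y) ** 2).mean()     # loss function (MSE)
    loss.backward()                      # backpropagation: fills w.grad and b.grad
    with torch.no_grad():                # gradient descent: step against the gradient
        w -= lr * w.grad
        b -= lr * b.grad
    w.grad.zero_()                       # clear gradients before the next step
    b.grad.zero_()
    losses.append(loss.item())

print(f"w={w.item():.3f}, b={b.item():.3f}, final loss={losses[-1]:.2e}")
```

In practice an optimizer such as torch.optim.SGD or Adam performs the update and the gradient reset; the manual version is shown only to expose each step.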

Code Example

neural_net.py

import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, in_dim, hidden, out_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.BatchNorm1d(hidden),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden, out_dim)
        )
    def forward(self, x):
        return self.net(x)

model = MLP(784, 256, 10)
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
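
The model, optimizer, and loss function above are typically combined in a training loop. The following is a sketch under stated assumptions: the batch is random stand-in data, the layer stack is rewritten inline so the snippet runs on its own, and the 50-step count is arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Same layer stack as the MLP, written inline so this snippet is self-contained.
model = nn.Sequential(
    nn.Linear(784, 256), nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(0.3), nn.Linear(256, 10)
)
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Random stand-in batch (assumption): 64 flattened 28x28 images with integer labels.
x = torch.randn(64, 784)
y = torch.randint(0, 10, (64,))

def evaluate():
    """Loss on the batch with dropout off and BatchNorm using running statistics."""
    model.eval()
    with torch.no_grad():
        return loss_fn(model(x), y).item()

loss_before = evaluate()
for step in range(50):
    model.train()            # enable dropout and batch statistics
    optim.zero_grad()        # clear gradients from the previous step
    logits = model(x)        # forward pass
    loss = loss_fn(logits, y)
    loss.backward()          # backpropagation
    optim.step()             # weight update
loss_after = evaluate()

print(f"loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

A real run would iterate over a DataLoader and evaluate on held-out data; repeating one batch here only shows that the loop reduces the loss.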

⚠ Watch-outs

Sigmoid activations in hidden layers cause vanishing gradients in deep networks; use ReLU instead
Always normalize input features before feeding them into a neural network
Dropout is only active during training; call model.eval() for inference (this also switches BatchNorm to its running statistics)
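
The input-normalization and model.eval() points can be checked with a short sketch. The feature scales and the small model below are hypothetical, chosen only to make the behavior visible.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical raw features on very different scales
X = torch.randn(64, 4) * torch.tensor([1.0, 50.0, 0.01, 1000.0]) + 5.0

# Standardize each feature to zero mean and unit variance.
# In practice, compute mean/std on the training split only and reuse them at inference.
mean, std = X.mean(dim=0), X.std(dim=0)
X_norm = (X - mean) / (std + 1e-8)

model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Dropout(0.3), nn.Linear(16, 2))

model.train()                        # dropout active: repeated calls use different masks
train_a, train_b = model(X_norm), model(X_norm)

model.eval()                         # dropout disabled: outputs are deterministic
with torch.no_grad():
    eval_a, eval_b = model(X_norm), model(X_norm)

same_in_train = torch.equal(train_a, train_b)
same_in_eval = torch.equal(eval_a, eval_b)
print("identical outputs in train mode:", same_in_train)
print("identical outputs in eval mode:", same_in_eval)
```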

Interactive Notebook

Notebook: Neural Networks

Neural network concepts, activation functions, backpropagation, MLP classification

Quiz

Test your understanding -- 10 questions, 70% to pass.

Convolutional Neural Networks (CNNs)

Core Components

Layer	What it does
Conv2D	Slides a filter over the image to detect local patterns (edges, textures)
MaxPool2D	Downsamples the spatial dimensions and adds some invariance to small translations
BatchNorm	Normalizes feature maps for stable training
Flatten	Flattens the feature maps into a 1D vector for the FC layers
Dense / FC	Final classification or regression head
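
The layers in the table can be stacked into a small classifier. This is a minimal sketch, assuming 28x28 grayscale inputs and 10 classes; the channel counts are arbitrary. In PyTorch the layers are named nn.Conv2d, nn.MaxPool2d, nn.BatchNorm2d, nn.Flatten, and nn.Linear.

```python
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),   # (N, 1, 28, 28) -> (N, 8, 28, 28)
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # -> (N, 8, 14, 14)
    nn.Conv2d(8, 16, kernel_size=3, padding=1),  # -> (N, 16, 14, 14)
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # -> (N, 16, 7, 7)
    nn.Flatten(),                                # -> (N, 16 * 7 * 7)
    nn.Linear(16 * 7 * 7, 10),                   # classification head -> (N, 10)
)

images = torch.randn(4, 1, 28, 28)   # random stand-in batch of 4 images
logits = cnn(images)
print(logits.shape)
```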

RNNs & LSTMs

LSTM Gates

Gate	Purpose
Forget Gate	Decides what information to discard from the cell state
Input Gate	Decides which new information to add to the cell state
Output Gate	Decides what part of the cell state to output as hidden state
Cell State	Long-term memory that flows through the sequence
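
The gates in the table can be written out as a single time step. The sketch below follows the standard LSTM formulation and uses the same gate ordering (input, forget, candidate, output) as PyTorch's nn.LSTMCell, which serves as a reference check; the function name lstm_step and the sizes D and H are illustrative choices.

```python
import torch

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W: (4H, D), U: (4H, H), b: (4H,)."""
    gates = x_t @ W.T + h_prev @ U.T + b
    i, f, g, o = gates.chunk(4, dim=-1)
    i = torch.sigmoid(i)             # input gate: which new information to add
    f = torch.sigmoid(f)             # forget gate: what to discard from the cell state
    g = torch.tanh(g)                # candidate values to write into the cell state
    o = torch.sigmoid(o)             # output gate: what part of the cell state to expose
    c_t = f * c_prev + i * g         # cell state: long-term memory
    h_t = o * torch.tanh(c_t)        # hidden state
    return h_t, c_t

torch.manual_seed(0)
D, H = 3, 4                          # input size and hidden size (arbitrary)
cell = torch.nn.LSTMCell(D, H)
x_t = torch.randn(2, D)
h0, c0 = torch.randn(2, H), torch.randn(2, H)

with torch.no_grad():
    h_ref, c_ref = cell(x_t, (h0, c0))
    h_t, c_t = lstm_step(x_t, h0, c0, cell.weight_ih, cell.weight_hh,
                         cell.bias_ih + cell.bias_hh)

matches_reference = (torch.allclose(h_t, h_ref, atol=1e-5)
                     and torch.allclose(c_t, c_ref, atol=1e-5))
print("matches nn.LSTMCell:", matches_reference)
```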

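Code Example

import torch.nn as nn

class LSTMForecaster(nn.Module):
    def __init__(self, input_size, hidden, layers, out):
        super().__init__()
        # batch_first=True: inputs are (batch, seq_len, input_size)
        self.lstm = nn.LSTM(input_size, hidden, layers, batch_first=True)
        self.fc = nn.Linear(hidden, out)

    def forward(self, x):
        out, _ = self.lstm(x)          # out: (batch, seq_len, hidden)
        return self.fc(out[:, -1, :])  # predict from the last time step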

Transformers

Architecture Components

Component	Role
Multi-Head Attention	Attends to different positions using different representation subspaces
Feed-Forward Network	Per-position fully connected layers applied identically to each token
Layer Normalization	Normalizes each token's features (across the embedding dimension) to stabilize training
Positional Encoding	Injects position information (transformers have no recurrence)
Residual Connections	Skip connections help gradients flow in deep transformers
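
The components in the table fit together roughly as follows. This is a minimal post-norm encoder block with sinusoidal positional encoding, not a full transformer; the class name EncoderBlock, the helper sinusoidal_positions, and the sizes (d_model=32, 4 heads) are assumptions for illustration.

```python
import math
import torch
import torch.nn as nn

def sinusoidal_positions(seq_len, d_model):
    """Fixed sin/cos positional encoding of shape (seq_len, d_model); d_model must be even."""
    pos = torch.arange(seq_len).unsqueeze(1)                               # (seq_len, 1)
    div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

class EncoderBlock(nn.Module):
    def __init__(self, d_model=32, n_heads=4, d_ff=64):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)   # multi-head self-attention
        x = self.norm1(x + attn_out)       # residual connection + layer norm
        x = self.norm2(x + self.ff(x))     # per-position feed-forward, residual, layer norm
        return x

tokens = torch.randn(2, 10, 32)            # stand-in embeddings: (batch, seq_len, d_model)
pe = sinusoidal_positions(10, 32)
out = EncoderBlock()(tokens + pe)          # positions are added before the block
print(out.shape)
```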

Multimodal Models (Vision + Language)

Key Models

Model	Capability	Use case
CLIP (OpenAI)	Joint image-text embeddings	Image search, zero-shot classification
LLaVA	Visual instruction following	Image Q&A, captioning
GPT-4V	Vision + language reasoning	Document parsing, chart analysis
Whisper	Speech-to-text	Audio transcription, translation
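
Zero-shot classification with joint image-text embeddings, as listed for CLIP, comes down to comparing an image embedding against one text embedding per label. The sketch below shows only that comparison step: the embeddings are hand-made stand-ins rather than real encoder outputs, and the scale factor of 100 is an assumed value standing in for CLIP's learned temperature.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

labels = ["cat", "dog", "car"]
embed_dim = 8

# Stand-in text embeddings (assumption): a real system would encode prompts such as
# "a photo of a dog" with the text encoder. One-hot rows keep the example deterministic.
text_emb = F.normalize(torch.eye(len(labels), embed_dim), dim=-1)

# Stand-in image embedding (assumption): close to the "dog" text embedding plus noise.
image_emb = F.normalize(text_emb[1] + 0.1 * torch.randn(embed_dim), dim=-1)

logit_scale = 100.0                               # assumed temperature
logits = logit_scale * image_emb @ text_emb.T     # scaled cosine similarity per label
probs = logits.softmax(dim=-1)
predicted = labels[probs.argmax().item()]
print(predicted, probs.tolist())
```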

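Transfer Learning

Code Example

import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

num_classes = 10  # placeholder: set to the number of classes in your dataset

# ImageNet weights; equivalent to the deprecated pretrained=True flag
model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)

# Feature extraction: freeze backbone
for p in model.parameters():
    p.requires_grad = False

# Replace classifier head (the new layer is trainable by default)
model.fc = nn.Linear(2048, num_classes)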
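
Freezing the backbone is one option; the other common option is fine-tuning, where the pretrained weights are also updated. The sketch below contrasts the two by counting trainable parameters. The small backbone is a hypothetical stand-in so the snippet runs without downloading weights, and the learning rates are assumed values.

```python
import torch
import torch.nn as nn

def build_model(num_classes, freeze_backbone):
    # Small stand-in backbone (hypothetical); a real setup would load a pretrained network.
    backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU())
    if freeze_backbone:
        for p in backbone.parameters():
            p.requires_grad = False
    head = nn.Linear(64, num_classes)    # new task-specific head, always trainable
    return nn.Sequential(backbone, head)

def trainable_params(model):
    return [p for p in model.parameters() if p.requires_grad]

feature_extractor = build_model(num_classes=5, freeze_backbone=True)
fine_tuned = build_model(num_classes=5, freeze_backbone=False)

# Pass only trainable parameters to the optimizer.
opt_fe = torch.optim.Adam(trainable_params(feature_extractor), lr=1e-3)
# Fine-tuning commonly uses a smaller learning rate to avoid overwriting pretrained weights.
opt_ft = torch.optim.Adam(trainable_params(fine_tuned), lr=1e-4)

n_fe = sum(p.numel() for p in trainable_params(feature_extractor))
n_ft = sum(p.numel() for p in trainable_params(fine_tuned))
print("trainable (feature extraction):", n_fe)
print("trainable (fine-tuning):", n_ft)
```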

Model Compression

Compression Techniques

Technique	How it works	Typical benefit
Quantization	Reduce weight precision from float32 to int8	4× smaller, often 2–4× faster
Pruning	Remove weights or neurons below a threshold	Variable (10–90%)
Knowledge Distillation	Train a small "student" model to mimic a large "teacher"	10–100× smaller
ONNX Export	Convert to a portable format that optimized runtimes can execute	Inference speedup
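
Two of the techniques in the table are short enough to sketch directly. The first part applies magnitude pruning with torch.nn.utils.prune; the second is one common formulation of a distillation loss. The function name distillation_loss, the temperature T=2.0, the mixing weight alpha=0.5, and the random logits are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.utils.prune as prune

torch.manual_seed(0)

# Pruning: zero out the 50% of weights with the smallest absolute value.
layer = nn.Linear(20, 10)
prune.l1_unstructured(layer, name="weight", amount=0.5)
prune.remove(layer, "weight")              # bake the mask into the weight tensor
sparsity = (layer.weight == 0).float().mean().item()

# Knowledge distillation: the student matches the teacher's softened outputs
# while still fitting the true labels.
def distillation_loss(student_logits, teacher_logits, targets, T=2.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                            # rescale so gradients stay comparable across T
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1 - alpha) * hard

student_logits = torch.randn(8, 10)        # random stand-ins for model outputs
teacher_logits = torch.randn(8, 10)
targets = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, targets)
print(f"sparsity={sparsity:.2f}, distillation loss={loss.item():.3f}")
```

Zeroed weights only reduce file size or latency when the model is stored or executed in a sparse-aware way; the sketch shows the mechanics, not the savings.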

Course Outline

Module	Sections
Neural Network Basics	Core Concepts, Code Example, ⚠ Watch-outs, Interactive Notebook, Quiz
Convolutional Neural Networks (CNNs)	Core Components, Architectures Timeline, 💡 Vision Quality Inspection, Interactive Notebook, Quiz
RNNs & LSTMs	LSTM Gates, Code Example, Interactive Notebook, Quiz
Autoencoders & GANs	Autoencoder Structure, GAN Training, Interactive Notebook, Quiz
Transformers	Self-Attention Intuition, Architecture Components, Interactive Notebook, Quiz
Multimodal Models (Vision + Language)	Key Models, Interactive Notebook, Quiz
Transfer Learning	Two Approaches, Code Example, Interactive Notebook, Quiz
Model Compression	Compression Techniques, Interactive Notebook, Quiz