Deep Learning

Neural Network Basics

The foundational building block of all deep learning. Understand layers, activation functions, backpropagation, and gradient descent before moving to specialized architectures.

Core Concepts

Concept | Plain-English meaning
Neuron / Node | Takes weighted inputs, adds a bias, applies an activation, and outputs a value
Layer | A group of neurons: an input, hidden, or output layer
Activation Function | Adds non-linearity, e.g. ReLU, Sigmoid, Tanh, Softmax
Forward Pass | Data flows from the input through the layers to an output prediction
Loss Function | Measures how wrong the prediction is (e.g. MSE, Cross-Entropy)
Backpropagation | Applies the chain rule to compute the gradient of the loss w.r.t. each weight
Gradient Descent | Moves each weight a small step against its gradient, reducing the loss
Batch Normalization | Normalizes layer inputs over each mini-batch, stabilizing and speeding up training
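To make these concepts concrete, here is a minimal sketch in plain Python (no PyTorch, so every step is visible) of a single sigmoid neuron learning the logical OR function. It runs a forward pass, measures a binary cross-entropy loss, computes the gradients by hand (the job backpropagation automates), and applies gradient descent. The toy dataset, learning rate, and epoch count are illustrative assumptions, not values from this course.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy dataset: logical OR of two binary inputs
data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 1.0)]

w = [0.0, 0.0]  # weights, one per input
b = 0.0         # bias
lr = 0.5        # learning rate (illustrative)

def forward(x):
    # Forward pass: weighted sum of inputs plus bias, then a sigmoid activation
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)

def bce(p, y, eps=1e-12):
    # Loss: binary cross-entropy between predicted probability p and label y
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

for _ in range(1000):
    grad_w = [0.0, 0.0]
    grad_b = 0.0
    for x, y in data:
        p = forward(x)
        # Backpropagation: for sigmoid + BCE the chain rule gives dLoss/dz = p - y
        dz = p - y
        grad_w[0] += dz * x[0]
        grad_w[1] += dz * x[1]
        grad_b += dz
    # Gradient descent: step each parameter against its average gradient
    n = len(data)
    w[0] -= lr * grad_w[0] / n
    w[1] -= lr * grad_w[1] / n
    b -= lr * grad_b / n

avg_loss = sum(bce(forward(x), y) for x, y in data) / len(data)
predictions = [round(forward(x)) for x, _ in data]
print(predictions, round(avg_loss, 3))
```

A framework like PyTorch performs the same loop at scale: `loss.backward()` fills in the gradients and the optimizer applies the update.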

Code Example

neural_net.py
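import torch
import torch.nn as nn

class MLP(nn.Module):
    """Multi-layer perceptron: Linear -> BatchNorm -> ReLU -> Dropout -> Linear."""
    def __init__(self, in_dim, hidden, out_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),   # weighted sum of the inputs
            nn.BatchNorm1d(hidden),      # normalize hidden activations per mini-batch
            nn.ReLU(),                   # non-linearity
            nn.Dropout(0.3),             # randomly zero 30% of units (training mode only)
            nn.Linear(hidden, out_dim)   # output logits, one per class
        )

    def forward(self, x):
        return self.net(x)

# e.g. flattened 28x28 images (784 features) and 10 classes
model = MLP(784, 256, 10)
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()  # expects raw logits; applies log-softmax internally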

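⚠ Watch-outs

  • Vanishing gradients in deep networks: use ReLU rather than Sigmoid in hidden layers, because the sigmoid's derivative is at most 0.25 and repeated multiplication shrinks gradients layer by layer
  • Normalize input features (e.g. standardize to zero mean and unit variance) before feeding them into a neural network
  • Dropout is only active in training mode, and BatchNorm only uses running statistics in eval mode: call model.eval() before inference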
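The first two watch-outs can be checked with a few lines of plain Python. This sketch assumes unit weights, so only the activation derivatives are multiplied through a hypothetical 10-layer chain, and it standardizes a made-up feature column.

```python
import math
import statistics

def sigmoid_grad(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)  # maximum value is 0.25, reached at z = 0

def relu_grad(z):
    return 1.0 if z > 0 else 0.0

# Backprop multiplies one activation derivative per layer (unit weights assumed)
depth = 10
sigmoid_chain = sigmoid_grad(0.0) ** depth  # sigmoid's best case
relu_chain = relu_grad(1.0) ** depth        # active ReLU units pass gradients unchanged
print(f"sigmoid: {sigmoid_chain:.2e}  relu: {relu_chain}")

# Standardize one input feature to zero mean and unit variance
raw = [120.0, 135.0, 150.0, 165.0, 180.0]  # hypothetical raw feature values
mu = statistics.mean(raw)
sigma = statistics.pstdev(raw)
scaled = [(v - mu) / sigma for v in raw]
print([round(v, 3) for v in scaled])
```

Even in the sigmoid's best case the gradient shrinks by roughly a factor of a million over ten layers, while an active ReLU passes it through unchanged.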

Interactive Notebook

Notebook: Neural Networks
Neural network concepts, activation functions, backpropagation, MLP classification

Quiz

Test your understanding -- 10 questions, 70% to pass.

054 · wafer-defect-yolo

Convolutional Neural Networks (CNNs)

The dominant architecture for image classification, object detection, and visual inspection. CNNs learn spatial hierarchies of features through shared convolutional filters.

Core Components

Layer | What it does
Conv2D | Slides learned filters over the image to detect local patterns (edges, textures)
MaxPool2D | Downsamples feature maps by keeping the maximum in each window, adding some robustness to small shifts
BatchNorm | Normalizes feature maps for stable training
Flatten | Converts the stack of 2D feature maps into a 1D vector for the FC layers
Dense / FC | Final classification or regression head
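The first layers in the table can be illustrated without a framework. The sketch below assumes a single-channel 6×6 image and a hand-made vertical-edge filter (a trained CNN learns its filter values); `conv2d`, `max_pool2d`, and `flatten` are hypothetical helpers, not a library API.

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 2D cross-correlation of one channel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling (stride == size); leftover rows/cols are dropped."""
    return [
        [max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
         for j in range(0, len(fmap[0]) - size + 1, size)]
        for i in range(0, len(fmap) - size + 1, size)
    ]

def flatten(fmap):
    return [v for row in fmap for v in row]

# Hypothetical 6x6 image with a vertical edge between columns 1 and 2
image = [[0, 0, 1, 1, 1, 1] for _ in range(6)]
# Hand-made vertical-edge detector
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

fmap = conv2d(image, kernel)  # 4x4 feature map, strong response near the edge
pooled = max_pool2d(fmap)     # 2x2 after pooling
features = flatten(pooled)    # length-4 vector for a dense head
print(fmap[0], pooled, features)
```

Like PyTorch's `nn.Conv2d`, the sketch computes a cross-correlation (the kernel is not flipped).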

Architectures Timeline

LeNet (1998) → AlexNet (2012) → VGG (2014) → ResNet (2015) → EfficientNet (2019) → YOLO (detection)

💡 Vision Quality Inspection

Build a CNN-based defect detection system for manufacturing images using YOLO. See project-06 on this platform for a full guided kit.

Interactive Notebook

Notebook: CNNs
CIFAR-10 image classification, famous models, datasets, HuggingFace

Quiz

Test your understanding -- 10 questions, 70% to pass.

015 · time-series-deep-learning

RNNs & LSTMs

Recurrent architectures for sequential data: text, time series, speech, and music. LSTMs use gating to largely mitigate the vanishing gradient problem of vanilla RNNs.

LSTM Gates

Component | Purpose
Forget Gate | Decides what information to discard from the cell state
Input Gate | Decides which new information to add to the cell state
Output Gate | Decides which part of the cell state to expose as the hidden state
Cell State | Long-term memory carried along the sequence
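The gate interactions in the table can be written out for a single time step. This sketch assumes a scalar input and a hidden size of 1, with hand-picked weights in the hypothetical `params` dictionary; a real `nn.LSTM` learns a weight matrix and biases for each gate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step with scalar input, hidden state and cell state.
    p maps each gate name to hypothetical (w_x, w_h, bias) weights."""
    def pre(gate):
        w_x, w_h, bias = p[gate]
        return w_x * x + w_h * h_prev + bias
    f = sigmoid(pre("forget"))       # fraction of old cell state to keep (0..1)
    i = sigmoid(pre("input"))        # fraction of the candidate to write (0..1)
    g = math.tanh(pre("candidate"))  # candidate values to add (-1..1)
    o = sigmoid(pre("output"))       # fraction of the cell to expose (0..1)
    c = f * c_prev + i * g           # new long-term memory
    h = o * math.tanh(c)             # new hidden state (the step's output)
    return h, c

params = {
    "forget": (0.5, 0.5, 1.0),
    "input": (1.0, 0.5, 0.0),
    "candidate": (1.0, 1.0, 0.0),
    "output": (1.0, 0.5, 0.0),
}

h, c = 0.0, 0.0
for x in [0.5, -0.2, 0.8]:  # a short hypothetical input sequence
    h, c = lstm_step(x, h, c, params)
    print(f"x={x:+.1f}  h={h:.3f}  c={c:.3f}")
```

Because the cell state is updated additively (`f * c_prev + i * g`), gradients can flow through many steps without being squashed at every one.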

Code Example

lstm.py
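import torch.nn as nn

class LSTMForecaster(nn.Module):
    def __init__(self, input_size, hidden, layers, out):
        super().__init__()
        # batch_first=True: inputs are shaped (batch, seq_len, input_size)
        self.lstm = nn.LSTM(input_size, hidden, layers, batch_first=True)
        self.fc   = nn.Linear(hidden, out)

    def forward(self, x):
        out, _ = self.lstm(x)          # out: (batch, seq_len, hidden)
        # Predict from the hidden state at the last time step
        return self.fc(out[:, -1, :])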
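`LSTMForecaster` expects input shaped `(batch, seq_len, input_size)`. A sliding-window split of a univariate series, as used in the notebook, could be built like this; `make_windows` and the price values are hypothetical.

```python
def make_windows(series, window, horizon=1):
    """Split a 1-D series into (input window, target) pairs for forecasting.

    Each input is `window` consecutive values; the target is the value
    `horizon` steps after the window ends.
    """
    inputs, targets = [], []
    for start in range(len(series) - window - horizon + 1):
        end = start + window
        inputs.append(series[start:end])
        targets.append(series[end + horizon - 1])
    return inputs, targets

prices = [10, 11, 12, 13, 14, 15]  # hypothetical price series
X, y = make_windows(prices, window=3)
# X -> [[10, 11, 12], [11, 12, 13], [12, 13, 14]], y -> [13, 14, 15]

# Add a feature axis so each sample has shape (seq_len, input_size) = (3, 1)
X_seq = [[[v] for v in w] for w in X]
```

Converting `X_seq` with `torch.tensor(X_seq, dtype=torch.float32)` would give a tensor of shape `(3, 3, 1)`, suitable for `LSTMForecaster(input_size=1, ...)`.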

Interactive Notebook

Notebook: RNNs / LSTMs
Stock price trend prediction, sliding window, LSTM vs GRU

Quiz

Test your understanding -- 10 questions, 70% to pass.

014 · autoencoders-and-gans

Autoencoders & GANs

Generative and representation learning architectures. Autoencoders compress data into latent codes. GANs generate realistic new data through adversarial training.

Autoencoder Structure

Input X → Encoder → Latent z → Decoder → X̂ (reconstruction)

Loss = ||X − X̂||² (train to minimize the reconstruction error)
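To see what the reconstruction loss measures, this sketch fixes a hypothetical linear encoder (averaging a 2-D point into a 1-D code) and decoder by hand instead of training them. Points on the line x1 = x2 survive the bottleneck perfectly; anything else loses information and is penalized.

```python
def encode(x):
    # Hypothetical fixed encoder: compress a 2-D point into a 1-D latent code z
    return (x[0] + x[1]) / 2

def decode(z):
    # Hypothetical fixed decoder: expand the code back to 2-D
    return (z, z)

def reconstruction_loss(x):
    x_hat = decode(encode(x))
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))  # ||X - X_hat||^2

batch = [(1.0, 1.0), (2.0, 2.0), (3.0, 1.0)]
losses = [reconstruction_loss(x) for x in batch]
print(losses)  # [0.0, 0.0, 2.0]
print(sum(losses) / len(losses))
```

Training a real autoencoder adjusts the encoder and decoder weights so that this average loss is as small as possible on the data.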

GAN Training

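The Generator creates fake data; the Discriminator tries to distinguish real samples from fakes. The two compete: the Generator is trained to maximize the Discriminator's error, while the Discriminator is trained to minimize it. Ideally, training ends when the Discriminator can no longer tell real from fake and outputs about 0.5 for both.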
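The competing objectives can be written as two loss functions on the Discriminator's output probabilities. This sketch uses binary cross-entropy for the Discriminator and the common "non-saturating" Generator loss −log D(G(z)), a practical substitute for directly maximizing the Discriminator's error; the probabilities passed in are made-up values.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: D wants d_real -> 1 and d_fake -> 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Non-saturating generator loss: G wants D to output 1 on its fakes."""
    return -math.log(d_fake)

# Hypothetical Discriminator outputs (probability that the input is real)
print(discriminator_loss(0.9, 0.1), generator_loss(0.1))  # D winning: low D loss, high G loss
print(discriminator_loss(0.5, 0.5), generator_loss(0.5))  # equilibrium: D cannot tell
```

At the ideal equilibrium both outputs are 0.5 and the Discriminator loss equals 2·ln 2 ≈ 1.386.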

Interactive Notebook

Notebook: Autoencoders & GANs
MNIST compression, denoising, face generation with DCGAN

Quiz

Test your understanding -- 10 questions, 70% to pass.

073 · dtfs-transformer-system

Transformers

The architecture behind GPT, BERT, ViT, and Whisper. Self-attention lets every position in a sequence attend to every other position, which enables parallel computation across the sequence and captures long-range dependencies.

Self-Attention Intuition

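For each token, attention computes how much to "attend" to every other token. Mathematically: Attention(Q, K, V) = softmax(QKᵀ / √d_k) · V, where Q = queries, K = keys, V = values, and d_k is the dimension of the key vectors; dividing by √d_k keeps the dot products from growing with dimension and saturating the softmax. Multi-head attention runs this in parallel with different learned projections.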
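The formula can be traced by hand. This plain-Python sketch assumes a 3-token sequence whose query, key, and value vectors are given directly (a real layer produces them with learned projections); `attention` returns both the outputs and the attention weights.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, with Q, K, V given as lists of row vectors."""
    d_k = len(K[0])
    out, weights = [], []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)  # one weight per key, summing to 1
        weights.append(w)
        # Output = weighted average of the value vectors
        out.append([sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))])
    return out, weights

# Hypothetical 3-token sequence with d_k = 2
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out, weights = attention(Q, K, V)
for row in weights:
    print([round(w, 3) for w in row])
```

Token 0's query matches keys 0 and 2 equally, so they receive equal weights that are larger than key 1's. Each output row is a weighted average of the value vectors.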

Architecture Components

Component | Role
Multi-Head Attention | Attends to different positions through several learned representation subspaces in parallel
Feed-Forward Network | Position-wise fully connected network applied identically to each token
Layer Normalization | Normalizes each token's features across the embedding dimension, stabilizing training
Positional Encoding | Injects position information, since attention alone has no notion of token order (no recurrence)
Residual Connections | Skip connections that help gradients flow through deep transformer stacks
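Three of these components are short enough to write out. The sketch below assumes the fixed sinusoidal positional encoding from the original Transformer paper (many models learn positional embeddings instead) and a LayerNorm without its learned gain and bias; `toy_sublayer` is a hypothetical stand-in for attention or the feed-forward network. It adds positions, applies a residual connection, then normalizes each token separately.

```python
import math

def sinusoidal_encoding(pos, d_model):
    """Fixed sinusoidal positional encoding: sine on even dims, cosine on odd dims."""
    pe = []
    for i in range(d_model):
        angle = pos / (10000 ** (2 * (i // 2) / d_model))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe

def layer_norm(x, eps=1e-5):
    """Normalize ONE token's feature vector to zero mean and unit variance.
    The learned gain and bias of a real LayerNorm are omitted for brevity."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def toy_sublayer(v):
    # Hypothetical stand-in for attention or the feed-forward network
    return [0.5 * a for a in v]

d_model = 4
embeddings = [[1.0, 2.0, 3.0, 4.0], [2.0, 0.0, 2.0, 0.0]]  # hypothetical token embeddings

# 1) Add position information to each token embedding
x = [[e + p for e, p in zip(tok, sinusoidal_encoding(pos, d_model))]
     for pos, tok in enumerate(embeddings)]

# 2) Residual connection around the sublayer, then LayerNorm per token (post-LN)
y = [layer_norm([a + b for a, b in zip(v, toy_sublayer(v))]) for v in x]
```

Each row of `y` is normalized on its own, which is why LayerNorm works the same for any sequence length or batch size.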

Interactive Notebook

Notebook: Transformers
Sentiment analysis, extractive summarisation, BERT vs GPT

Quiz

Test your understanding -- 10 questions, 70% to pass.

034 · multimodal-vision-language

Multimodal Models (Vision + Language)

Models that understand and generate across multiple data modalities, such as images, text, and audio. CLIP, LLaVA, GPT-4V, and Gemini are all multimodal.

Key Models

Model | Capability | Use case
CLIP (OpenAI) | Joint image-text embeddings | Image search, zero-shot classification
LLaVA | Visual instruction following | Image Q&A, captioning
GPT-4V | Vision + language reasoning | Document parsing, chart analysis
Whisper | Speech-to-text | Audio transcription, translation
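CLIP's zero-shot classification compares an image embedding with embeddings of text prompts such as "a photo of a cat". This sketch assumes the embeddings are already computed (the 3-D vectors below are invented for illustration) and uses a fixed scale factor of 100 before the softmax; CLIP learns this scale during training.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def zero_shot(image_emb, label_embs, scale=100.0):
    """Score each text prompt by scaled cosine similarity, then softmax into probabilities."""
    logits = {label: scale * cosine(image_emb, emb) for label, emb in label_embs.items()}
    m = max(logits.values())
    exps = {label: math.exp(v - m) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Hypothetical embeddings; a real CLIP model produces them with its image and
# text encoders, in 512 or more dimensions.
image_emb = [0.9, 0.1, 0.2]
label_embs = {
    "a photo of a cat": [0.8, 0.2, 0.1],
    "a photo of a dog": [0.1, 0.9, 0.3],
    "a photo of a car": [0.2, 0.1, 0.9],
}
probs = zero_shot(image_emb, label_embs)
best = max(probs, key=probs.get)
print(best)
```

No classifier is trained: changing the label set only means embedding a new list of prompts.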

Interactive Notebook

Notebook: Multimodal Models
BLIP image captioning, Visual Q&A, CLIP zero-shot

Quiz

Test your understanding -- 10 questions, 70% to pass.


Transfer Learning

Use a model pre-trained on a large dataset as a starting point for your task. Dramatically reduces training time and data requirements.

Two Approaches

Feature Extraction

Freeze the pre-trained backbone. Only train a new head on top. Fast, works with small datasets.

Fine-Tuning

Unfreeze some or all pre-trained layers and continue training on your data with a small learning rate. Usually more accurate, but needs more data.

Code Example

transfer.py
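import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

num_classes = 120  # e.g. the 120 dog breeds used in the notebook

# Load ImageNet weights (newer torchvision deprecates pretrained=True in favor of weights=)
model = resnet50(weights=ResNet50_Weights.DEFAULT)

# Feature extraction: freeze the backbone
for p in model.parameters():
    p.requires_grad = False

# Replace the classifier head (in_features is 2048 for ResNet-50); new layers are trainable
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Optimize only the new head's parameters
optim = torch.optim.Adam(model.fc.parameters(), lr=1e-3)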
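The difference between the two approaches is which parameters receive updates. This toy sketch assumes a one-weight "backbone" and a one-weight "head" fitted to the made-up task y = 3x: with the backbone learning rate at zero it behaves like feature extraction, and with a small nonzero rate it fine-tunes. `train` is a hypothetical helper, not a library function.

```python
def train(data, w_backbone, w_head, lr_backbone, lr_head, steps=200):
    """Toy model y_hat = w_head * (w_backbone * x), trained with mean squared error.
    lr_backbone = 0 keeps the "pre-trained" backbone frozen (feature extraction);
    a small lr_backbone > 0 fine-tunes it alongside the head."""
    n = len(data)
    for _ in range(steps):
        g_backbone = g_head = 0.0
        for x, y in data:
            feature = w_backbone * x                 # backbone output
            err = w_head * feature - y               # prediction error
            g_head += 2 * err * feature / n          # dLoss/dw_head
            g_backbone += 2 * err * w_head * x / n   # dLoss/dw_backbone
        w_head -= lr_head * g_head
        w_backbone -= lr_backbone * g_backbone
    return w_backbone, w_head

data = [(1.0, 3.0), (2.0, 6.0), (-1.0, -3.0)]  # hypothetical task: y = 3x
pretrained_backbone = 1.0                      # hypothetical "pre-trained" weight

fe_b, fe_h = train(data, pretrained_backbone, 0.0, lr_backbone=0.0, lr_head=0.1)
ft_b, ft_h = train(data, pretrained_backbone, 0.0, lr_backbone=0.01, lr_head=0.1)
print(f"feature extraction: backbone={fe_b:.3f}, head={fe_h:.3f}")
print(f"fine-tuning:        backbone={ft_b:.3f}, head={ft_h:.3f}")
```

In PyTorch the same choices are made with `requires_grad = False` on frozen parameters, or by passing parameter groups with different `lr` values to the optimizer.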

Interactive Notebook

Notebook: Transfer Learning
ResNet50 dog breed classifier, fine-tuning strategy, 120 breeds

Quiz

Test your understanding -- 10 questions, 70% to pass.


Model Compression

Techniques that make large models smaller and faster so they can be deployed on edge devices, without sacrificing too much accuracy.

Compression Techniques

Technique | How it works | Typical benefit
Quantization | Reduce weight precision, e.g. from float32 to int8 | 4× smaller; often 2–4× faster, depending on hardware
Pruning | Remove weights or neurons whose magnitude falls below a threshold | Variable (10–90% of weights removed)
Knowledge Distillation | Train a small "student" model to mimic a large "teacher" | 10–100× smaller, depending on the student chosen
ONNX Export | Convert to a format optimized for inference runtimes | Inference speedup
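Three of these techniques reduce to short formulas. This sketch assumes symmetric per-tensor int8 quantization, magnitude pruning of a single tensor, and the temperature-scaled KL-divergence distillation loss from Hinton et al. (2015); the weights and logits are hypothetical, and production toolchains typically add calibration data and may use per-channel scales.

```python
import math

def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ~= scale * q, q in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [scale * v for v in q]

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights (ties may prune extra)."""
    k = int(len(weights) * sparsity)
    cutoff = sorted(abs(w) for w in weights)[k - 1] if k > 0 else -1.0
    return [0.0 if abs(w) <= cutoff else w for w in weights]

def softmax_t(logits, t):
    m = max(logits)
    exps = [math.exp((z - m) / t) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(teacher_logits, student_logits, t=2.0):
    """KL(teacher || student) on temperature-softened distributions, scaled by t^2."""
    p = softmax_t(teacher_logits, t)
    q = softmax_t(student_logits, t)
    return t * t * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

weights = [0.42, -1.27, 0.03, 0.88, -0.05, 0.61]  # hypothetical float32 weights
q, scale = quantize_int8(weights)
max_err = max(abs(a - b) for a, b in zip(weights, dequantize(q, scale)))
pruned = magnitude_prune(weights, sparsity=0.5)
kd = distillation_loss([4.0, 1.0, 0.5], [3.0, 1.5, 0.2])
print(q, round(max_err, 5), pruned, round(kd, 4))
```

The rounding error of quantization is at most half a step (`scale / 2`) per weight. In practice the distillation term is usually combined with the ordinary cross-entropy on the true labels.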

Interactive Notebook

Notebook: Model Compression
Pruning, quantisation, knowledge distillation demo

Quiz

Test your understanding -- 10 questions, 70% to pass.
