Key Concepts & Tools
free compute

GPU Training on Kaggle

30 hrs/week of T4 GPU time, free for all users

Free GPU Resources

Resource      Specs               Limit
T4 GPU (x1)   16GB VRAM           30 hrs/week
T4 GPU (x2)   32GB VRAM           30 hrs/week
P100 GPU      16GB HBM2           30 hrs/week
CPU           4 cores, 29GB RAM   Unlimited

Best Practices

  • Debug on CPU first, switch to GPU only for training runs
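  • Enable internet: Settings → Internet → On
  • Save outputs to /kaggle/working/ (kept as notebook output when you save a version)
  • Mixed precision (fp16): up to about 2x faster training and roughly half the memory for activations, depending on the model and GPU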
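
The memory claim for fp16 comes down to bytes per value. The sketch below works through that arithmetic in plain Python; the workload size is a made-up figure for illustration, not a measurement from any real model.

```python
# Bytes used to store one value in each floating-point format.
BYTES_PER_VALUE = {"fp32": 4, "fp16": 2}


def tensor_memory_gib(num_values: int, dtype: str) -> float:
    """Return the memory needed to hold num_values numbers of the given dtype, in GiB."""
    return num_values * BYTES_PER_VALUE[dtype] / 1024**3


# Hypothetical workload: 2 billion activation values kept for the backward pass.
num_activations = 2_000_000_000
fp32_gib = tensor_memory_gib(num_activations, "fp32")
fp16_gib = tensor_memory_gib(num_activations, "fp16")
print(f"fp32: {fp32_gib:.2f} GiB, fp16: {fp16_gib:.2f} GiB")
```

In practice the overall saving is usually smaller than half, because mixed-precision training typically keeps a full-precision copy of the weights and optimizer state.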

Quiz

Test your understanding of GPU Training on Kaggle -- 10 questions, 70% to pass.

free cloud notebooks

Google Colab

Free GPU/TPU with Google Drive integration

Colab vs Kaggle

Feature   Colab Free     Kaggle Free      Colab Pro
GPU       T4 (limited)   T4, 30 hrs/week  A100/V100
RAM       12GB           29GB             52GB
Session   12 hours       9 hours          24 hours

Key Commands

colab_basics.py
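# Run these in Colab notebook cells: lines starting with "!" are
# IPython shell escapes and are not valid in a plain .py script.

# Mount Google Drive (asks for authorization on first run)
from google.colab import drive
drive.mount('/content/drive')

# Check which GPU is attached
!nvidia-smi

# Install packages quietly
!pip install -q transformers datasets

# Copy a saved model to Drive so it outlives the session
# (assumes model.pkl already exists in the working directory)
import shutil
shutil.copy('model.pkl', '/content/drive/MyDrive/')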
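
The save-to-Drive step above fails if Drive is not mounted or the code runs outside Colab. A minimal sketch of a more defensive version follows. The helper names (in_colab, pick_save_dir, save_artifact) are hypothetical, and checking sys.modules for google.colab is a common heuristic rather than an official API.

```python
import shutil
import sys
import tempfile
from pathlib import Path

# Path where Colab exposes Drive after drive.mount('/content/drive').
DRIVE_ROOT = Path("/content/drive/MyDrive")


def in_colab() -> bool:
    """Heuristic: the google.colab package is already loaded inside a Colab kernel."""
    return "google.colab" in sys.modules


def pick_save_dir(local_fallback: Path) -> Path:
    """Use Drive when it is mounted, otherwise fall back to a local folder."""
    if in_colab() and DRIVE_ROOT.is_dir():
        return DRIVE_ROOT
    local_fallback.mkdir(parents=True, exist_ok=True)
    return local_fallback


def save_artifact(src: Path, local_fallback: Path) -> Path:
    """Copy src into the chosen directory and return the destination path."""
    dest_dir = pick_save_dir(local_fallback)
    return Path(shutil.copy(src, dest_dir))


# Demo with a throwaway file standing in for model.pkl.
tmp = Path(tempfile.mkdtemp())
model_file = tmp / "model.pkl"
model_file.write_bytes(b"placeholder weights")
saved = save_artifact(model_file, tmp / "backup")
print("saved to", saved)
```

Outside Colab the file lands in the local fallback folder, so the same notebook code can be debugged on a laptop before moving to a hosted session.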

Quiz

Test your understanding of Google Colab -- 10 questions, 70% to pass.

leading framework

PyTorch Foundations

Tensors, autograd, training loop, GPU

PyTorch Core Concepts

Concept      Description
Tensor       Multi-dimensional array with GPU support and autograd
Autograd     Automatic gradient computation via a computational graph
nn.Module    Base class for all neural network components
Optimizer    Updates weights (Adam, SGD, AdamW)
DataLoader   Batched, shuffled, parallel data iteration
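
To make "batched, shuffled iteration" concrete, here is a plain-Python sketch of the idea. It is not PyTorch's DataLoader (there is no parallel loading or tensor collation), and the function name iterate_batches is made up for this example.

```python
import random


def iterate_batches(items: list, batch_size: int, shuffle: bool = True, seed: int = 0):
    """Yield successive batches of items, optionally in shuffled order."""
    order = list(range(len(items)))
    if shuffle:
        random.Random(seed).shuffle(order)  # seeded so runs are reproducible
    for start in range(0, len(order), batch_size):
        yield [items[i] for i in order[start:start + batch_size]]


samples = list(range(10))
batches = list(iterate_batches(samples, batch_size=4))
print([len(b) for b in batches])  # → [4, 4, 2]
```

The last batch is smaller because 10 is not a multiple of 4. Each sample still appears exactly once per pass.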

Training Loop

training_loop.py
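# Assumes model, criterion, optimizer, epochs, X_train, and y_train are already defined.
for epoch in range(epochs):
    optimizer.zero_grad()              # 1. clear gradients from the previous step
    output = model(X_train)            # 2. forward pass
    loss = criterion(output, y_train)  # 3. compute loss against the training labels
    loss.backward()                    # 4. backpropagation
    optimizer.step()                   # 5. update weights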
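
The same five steps can be traced without any framework. The sketch below fits a one-parameter model y = w * x by gradient descent, with the gradient derived by hand in place of autograd. The toy data, learning rate, and function name are assumptions chosen for illustration.

```python
# Toy data generated from y = 3x, so the ideal weight is 3.0.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]


def train(epochs: int = 200, lr: float = 0.01):
    """Fit y = w * x with mean squared error; return (weight, final loss)."""
    w = 0.0
    loss = float("inf")
    for _ in range(epochs):
        grad = 0.0                                    # 1. clear the gradient
        preds = [w * x for x in xs]                   # 2. forward pass
        errors = [p - y for p, y in zip(preds, ys)]
        loss = sum(e * e for e in errors) / len(xs)   # 3. compute loss (MSE)
        grad = sum(2 * e * x for e, x in zip(errors, xs)) / len(xs)  # 4. d(loss)/dw
        w -= lr * grad                                # 5. update the weight
    return w, loss


weight, final_loss = train()
print(f"learned weight: {weight:.4f}, final loss: {final_loss:.2e}")
```

In PyTorch, step 4 is what loss.backward() automates and step 5 is what optimizer.step() performs for every parameter in the model.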

Quiz

Test your understanding of PyTorch Foundations -- 10 questions, 70% to pass.

google framework

TensorFlow / Keras

High-level Keras API for quick model building

Keras Sequential Model

keras_model.py
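import tensorflow as tf

# Assumes X_train has shape (n_samples, 30) and y_train holds 0/1 labels.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(30,)),                     # 30 input features
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.3),                    # drop 30% of units during training
    tf.keras.layers.Dense(1, activation='sigmoid')   # probability of the positive class
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
# Hold out 20% of the data for validation and stop once val_loss
# has not improved for 5 consecutive epochs.
model.fit(X_train, y_train, epochs=50, validation_split=0.2,
          callbacks=[tf.keras.callbacks.EarlyStopping(patience=5)])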
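
The patience setting is easier to reason about with the rule written out. Below is a simplified sketch of the patience logic in plain Python. It is not Keras's implementation (it ignores options such as min_delta and restore_best_weights), and the loss history is invented for the example.

```python
def early_stopping_epoch(val_losses: list, patience: int) -> int:
    """Return the 0-based epoch at which training stops.

    Training stops once the validation loss has failed to improve on its best
    value for `patience` consecutive epochs. If that never happens, the index
    of the last epoch is returned.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0          # improvement: reset the counter
        else:
            wait += 1         # no improvement this epoch
            if wait >= patience:
                return epoch
    return len(val_losses) - 1


# Hypothetical validation losses, one per epoch.
history = [0.90, 0.70, 0.60, 0.61, 0.62, 0.59, 0.60, 0.61, 0.63, 0.58, 0.57]
stopped_at = early_stopping_epoch(history, patience=3)
print("stopped at epoch", stopped_at)  # → stopped at epoch 8
```

The run stops at epoch 8 even though later epochs would have improved. That is the trade-off a small patience value makes in exchange for shorter training.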

PyTorch vs Keras

Aspect           PyTorch              Keras
Style            Explicit, Pythonic   High-level, concise
Research use     Dominant             Less common
Learning curve   Steeper              Easier
Production       TorchServe           TF Serving/TFLite

Quiz

Test your understanding of TensorFlow / Keras -- 10 questions, 70% to pass.

version control

Git & GitHub for ML

Git workflows, .gitignore, DVC for data versioning

ML Project .gitignore

.gitignore
# Python
.venv/
__pycache__/
*.py[cod]

# Data (use DVC/S3)
data/raw/
*.csv
*.parquet

# Models (use HuggingFace Hub/S3)
*.pkl
*.pt
*.h5

# Secrets
.env
*.pem

# ML outputs
mlruns/
logs/

DVC for Data Versioning

dvc_setup.sh
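pip install "dvc[s3]"     # quotes stop the shell from expanding the brackets
dvc init
dvc add data/train.csv    # creates data/train.csv.dvc
git add data/train.csv.dvc .gitignore
git commit -m "track data with DVC"
# dvc push needs a default remote, configured once with e.g.
#   dvc remote add -d storage s3://<your-bucket>/<path>   (placeholder names)
dvc push                  # upload the data to the S3 remote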
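
The reason a small .dvc file can stand in for a large dataset is that it records a content hash rather than the data itself. The sketch below illustrates that idea with the standard library. It is a simplification, not DVC's actual file format or code, and the pointer file name is made up.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_pointer(data_path: Path) -> Path:
    """Write a small JSON pointer next to the data file and return its path."""
    pointer = {
        "path": data_path.name,
        "md5": file_md5(data_path),
        "size": data_path.stat().st_size,
    }
    pointer_path = data_path.with_name(data_path.name + ".pointer.json")
    pointer_path.write_text(json.dumps(pointer, indent=2))
    return pointer_path


# Demo: a tiny CSV stands in for a large training set.
workdir = Path(tempfile.mkdtemp())
train_csv = workdir / "train.csv"
train_csv.write_bytes(b"x,y\n1,2\n3,4\n")
pointer_file = write_pointer(train_csv)
print(pointer_file.read_text())
```

Git tracks only the pointer. If the data changes, the hash changes, and the commit history shows which version of the data each commit used.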

Quiz

Test your understanding of Git & GitHub for ML -- 10 questions, 70% to pass.

cloud gpu

AWS EC2 with GPU

p2/p3 instances for production ML training

GPU Instance Types

Instance       Accelerator   Memory   Use case
p3.2xlarge     V100          16GB     Research training
p4d.24xlarge   8x A100       320GB    Large model training
g4dn.xlarge    T4            16GB     Inference, fine-tuning
inf2.xlarge    Inferentia2   32GB     Inference (cheap)

Cost Management

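  • Use Spot instances for training (up to 90% cheaper than On-Demand, but they can be interrupted at short notice)
  • Save checkpoints to S3 every N epochs so an interrupted run can resume
  • Stop instances immediately after training
  • Use SageMaker training jobs, which shut down their instances automatically on completion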
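
Checkpointing every N epochs matters most on Spot instances, where a run can be cut off partway. The sketch below simulates an interruption and a resume. A local JSON file stands in for an S3 object and the "training" is a placeholder, so the function and its parameters are illustrative assumptions only.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def train(total_epochs: int, every_n: int, ckpt_path: Path,
          stop_after: Optional[int] = None) -> dict:
    """Run a fake training loop that checkpoints every `every_n` epochs.

    stop_after simulates an interruption after that many epochs in this call.
    """
    # Resume from the last checkpoint if one exists.
    if ckpt_path.exists():
        state = json.loads(ckpt_path.read_text())
    else:
        state = {"epoch": 0, "weight": 0.0}

    epochs_run = 0
    while state["epoch"] < total_epochs:
        if stop_after is not None and epochs_run >= stop_after:
            break  # simulated interruption: work since the last checkpoint is lost
        state["weight"] += 0.5   # stand-in for one epoch of real training
        state["epoch"] += 1
        epochs_run += 1
        if state["epoch"] % every_n == 0 or state["epoch"] == total_epochs:
            ckpt_path.write_text(json.dumps(state))
    return state


ckpt = Path(tempfile.mkdtemp()) / "checkpoint.json"
train(total_epochs=10, every_n=3, ckpt_path=ckpt, stop_after=7)  # interrupted run
saved_epoch = json.loads(ckpt.read_text())["epoch"]              # last epoch on disk
final = train(total_epochs=10, every_n=3, ckpt_path=ckpt)        # resumed run
print("resumed from epoch", saved_epoch, "and finished at", final)
```

Here the interrupted run reached epoch 7 but only epoch 6 was on disk, so one epoch is repeated after the resume. A smaller N reduces repeated work at the cost of more frequent writes.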

Quiz

Test your understanding of AWS EC2 with GPU -- 10 questions, 70% to pass.

framework comparison

PyTorch vs TensorFlow

When to use which, ecosystem, career relevance

Detailed Comparison

Aspect                 PyTorch                        TensorFlow/Keras
Research papers        ~80% of ML papers              ~20% of ML papers
Industry startups      Dominant                       Common in Google products
Mobile deployment      ExecuTorch (newer)             TFLite (mature)
Distributed training   PyTorch DDP                    tf.distribute
Debugging              Easy (Python errors)           Harder (graph mode)
Job market             Strong (research + industry)   Strong (enterprise)

Recommendation

For learning DL: Start with Keras (simpler).
For research/fine-tuning LLMs: Use PyTorch.
For production at scale: Both are fine. Teams choose based on existing codebase.

Quiz

Test your understanding of PyTorch vs TensorFlow -- 10 questions, 70% to pass.

development tools

Jupyter & VS Code for ML

Notebook best practices, VS Code setup, debugging

Jupyter Best Practices

  • Keep cells small and focused on one thing
  • Run cells top to bottom to avoid hidden-state bugs
  • Restart and Run All before sharing
  • Use nbconvert to export to HTML/PDF for reports
  • Use %timeit for timing and %memit for memory profiling (%memit needs the memory_profiler extension: %load_ext memory_profiler)
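
Outside a notebook, the standard library offers rough equivalents of the two profiling magics. The sketch below uses timeit for timing and tracemalloc for peak memory. The workload function is a made-up example, and tracemalloc only counts allocations made through Python's allocator, so its numbers will not match %memit exactly.

```python
import timeit
import tracemalloc


def build_squares(n: int) -> list:
    """Example workload: build a list of the first n squares."""
    return [i * i for i in range(n)]


# Timing: repeat the measurement and keep the best (least noisy) result.
calls_per_run = 5
runs = timeit.repeat(lambda: build_squares(100_000), number=calls_per_run, repeat=3)
best_seconds = min(runs) / calls_per_run  # seconds per single call

# Memory: record the peak allocation while the workload runs.
tracemalloc.start()
result = build_squares(100_000)
current_bytes, peak_bytes = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"best time per call: {best_seconds * 1e3:.2f} ms")
print(f"peak traced memory: {peak_bytes / 1024:.0f} KiB")
```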

VS Code ML Extensions

Extension            Purpose
Python (Microsoft)   IntelliSense, debugging, testing
Jupyter              Run notebooks directly in VS Code
GitLens              Enhanced git history and blame
Error Lens           Inline error highlighting
Remote - SSH         Develop on remote GPU servers

Quiz

Test your understanding of Jupyter & VS Code for ML -- 10 questions, 70% to pass.

Key Concepts & Tools Complete!

Pass all topic quizzes to earn your certificate.
