free compute
GPU Training on Kaggle
30 hrs/week T4 GPU, free for all users
Free GPU Resources
| Resource | Specs | Limit |
|---|---|---|
| T4 GPU (x1) | 16GB VRAM | 30 hrs/week |
| T4 GPU (x2) | 2 x 16GB VRAM (32GB total) | 30 hrs/week |
| P100 GPU | 16GB HBM2 | 30 hrs/week |
| CPU | 4 cores, 29GB RAM | Unlimited |
Best Practices
- Debug on CPU first, switch to GPU only for training runs
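- Enable internet: Settings → Internet → On
- Save outputs to /kaggle/working/ (kept as notebook output when you save a version)
- Use mixed precision (fp16): often up to ~2x faster with roughly half the activation memory on Tensor Core GPUs such as the T4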
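The advice to save into /kaggle/working/ can be wrapped in a small helper so the same notebook also runs outside Kaggle. This is a minimal sketch under stated assumptions: `resolve_output_dir` and `save_metrics` are hypothetical names, the local `outputs` fallback is an illustrative choice rather than a Kaggle convention, and the metric values are made up for the demo.

```python
import json
import tempfile
from pathlib import Path

KAGGLE_WORKING = Path("/kaggle/working")  # output directory named in the list above


def resolve_output_dir(fallback="outputs", kaggle_dir=KAGGLE_WORKING):
    """Return kaggle_dir when it exists, else create and return a local fallback."""
    kaggle_dir = Path(kaggle_dir)
    if kaggle_dir.is_dir():
        return kaggle_dir
    local = Path(fallback)
    local.mkdir(parents=True, exist_ok=True)
    return local


def save_metrics(metrics, out_dir, name="metrics.json"):
    """Write a metrics dict as JSON into out_dir and return the file path."""
    path = Path(out_dir) / name
    path.write_text(json.dumps(metrics, indent=2))
    return path


# Demo in a throwaway directory so nothing touches a real working directory.
# The kaggle_dir passed here deliberately does not exist, forcing the fallback.
with tempfile.TemporaryDirectory() as tmp:
    out_dir = resolve_output_dir(fallback=Path(tmp) / "outputs",
                                 kaggle_dir=Path(tmp) / "not-kaggle")
    saved = save_metrics({"epoch": 1, "val_loss": 0.42}, out_dir)  # made-up values
    reloaded = json.loads(saved.read_text())
```

In a real notebook the call would simply be `resolve_output_dir()`, which picks /kaggle/working/ on Kaggle and the local folder elsewhere.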
free cloud notebooks
Google Colab
Free GPU/TPU with Google Drive integration
Colab vs Kaggle
| Feature | Colab Free | Kaggle Free | Colab Pro |
|---|---|---|---|
| GPU | T4 (limited) | T4 30hr/wk | A100/V100 |
| RAM | 12GB | 29GB | 52GB |
| Max session | Up to 12 hours | 12 hours (9 for TPU) | Up to 24 hours |
Key Commands
colab_basics.py
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')
# Check GPU
!nvidia-smi
# Install packages
!pip install -q transformers datasets
# Save to Drive
import shutil
shutil.copy('model.pkl', '/content/drive/MyDrive/')
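The save-to-Drive step above can be made a little more defensive. The sketch below is illustrative only: `in_colab` and `backup_file` are hypothetical helper names, and checking `sys.modules` for `google.colab` is a commonly used heuristic rather than an official API. The demo copies into a temporary folder that stands in for the mounted Drive path.

```python
import shutil
import sys
import tempfile
from pathlib import Path


def in_colab():
    """Heuristic: Colab runtimes have the google.colab package loaded."""
    return "google.colab" in sys.modules


def backup_file(src, dest_dir):
    """Copy src into dest_dir (creating the folder first) and return the new path."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(src, dest_dir))


# Demo: a temporary folder stands in for /content/drive/MyDrive/.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "model.pkl"
    src.write_bytes(b"demo bytes, not a real model")
    copied = backup_file(src, Path(tmp) / "drive" / "checkpoints")
    copy_ok = copied.read_bytes() == src.read_bytes()
    copied_name = copied.name
```

On Colab, the destination would be a folder under the mounted Drive, and `in_colab()` can guard the `drive.mount` call so the notebook still runs locally.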
leading framework
PyTorch Foundations
Tensors, autograd, training loop, GPU
PyTorch Core Concepts
| Concept | Description |
|---|---|
| Tensor | Multi-dim array + GPU support + autograd |
| Autograd | Automatic gradient computation via computational graph |
| nn.Module | Base class for all neural network components |
| Optimizer | Updates weights (Adam, SGD, AdamW) |
| DataLoader | Batched, shuffled, parallel data iteration |
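To make the Autograd row concrete, here is a toy reverse-mode automatic differentiation engine in plain Python. It is a teaching sketch, not how PyTorch is implemented: the `Value` class is a hypothetical name, it handles only scalars, and it supports only addition and multiplication.

```python
class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents
        self._backward_fn = None  # set by the operation that created this node

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def backward_fn():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward_fn = backward_fn
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def backward_fn():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward_fn = backward_fn
        return out

    def backward(self):
        """Walk the graph in reverse topological order, applying the chain rule."""
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            if node._backward_fn is not None:
                node._backward_fn()


# y = w * x + b, so dy/dw = x, dy/dx = w, dy/db = 1
w, x, b = Value(3.0), Value(2.0), Value(1.0)
y = w * x + b
y.backward()
```

After `y.backward()`, `w.grad` holds the value of `x` and `x.grad` holds the value of `w`, which is the same bookkeeping `loss.backward()` performs over tensors.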
Training Loop
training_loop.py
for epoch in range(epochs):
    optimizer.zero_grad()                # 1. clear gradients from the previous step
    output = model(X_train)              # 2. forward pass
    loss = criterion(output, y_train)    # 3. compute loss against the training targets
    loss.backward()                      # 4. backpropagation
    optimizer.step()                     # 5. update weights
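The loop above leaves `model`, `criterion`, `optimizer`, and the data undefined. One way to fill them in, as a minimal sketch: the synthetic regression data, the single `nn.Linear` layer, and the Adam learning rate are all illustrative assumptions, not choices taken from the course.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic data (an assumption for the demo): y = X @ true_w + small noise.
X_train = torch.randn(256, 3)
true_w = torch.tensor([[2.0], [-1.0], [0.5]])
y_train = X_train @ true_w + 0.1 * torch.randn(256, 1)

# Use the GPU when one is available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
X_train, y_train = X_train.to(device), y_train.to(device)

model = nn.Linear(3, 1).to(device)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)

epochs = 200
losses = []
for epoch in range(epochs):
    optimizer.zero_grad()                # 1. clear gradients
    output = model(X_train)              # 2. forward pass
    loss = criterion(output, y_train)    # 3. compute loss
    loss.backward()                      # 4. backpropagation
    optimizer.step()                     # 5. update weights
    losses.append(loss.item())

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

This trains on the full batch each step, as the original loop does; for larger datasets the inner body would usually iterate over a `DataLoader` instead.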
google framework
TensorFlow / Keras
High-level Keras API for quick model building
Keras Sequential Model
keras_model.py
import tensorflow as tf
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(30,)),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(X_train, y_train, epochs=50, validation_split=0.2,
          callbacks=[tf.keras.callbacks.EarlyStopping(patience=5)])
PyTorch vs Keras
| Aspect | PyTorch | Keras |
|---|---|---|
| Style | Explicit, Pythonic | High-level, concise |
| Research use | Dominant | Less common |
| Learning curve | Steeper | Easier |
| Production | TorchServe | TFServing/TFLite |
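The `EarlyStopping(patience=5)` callback in the Keras example earlier in this section stops training once validation loss stalls. The rule can be sketched in plain Python as follows. This is a simplified model of the patience idea, not Keras's implementation: `early_stopping_epoch` is a hypothetical name, options such as `min_delta` and `restore_best_weights` are ignored, and the loss history is made up.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return the 0-based epoch at which training would stop, or None.

    Simplified rule: stop once `patience` consecutive epochs pass without
    the validation loss improving on the best value seen so far.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Made-up validation losses: the best value arrives at epoch 2,
# then five epochs in a row fail to beat it.
history = [0.90, 0.70, 0.60, 0.61, 0.62, 0.60, 0.63, 0.64, 0.65]
stop_epoch = early_stopping_epoch(history, patience=5)
still_improving = early_stopping_epoch([3.0, 2.0, 1.0], patience=5)
```

Note that matching the best loss (epoch 5 here) does not count as an improvement under this rule, so the counter keeps running.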
version control
Git & GitHub for ML
Git workflows, .gitignore, DVC for data versioning
ML Project .gitignore
.gitignore
# Python
.venv/
__pycache__/
*.py[cod]
# Data (use DVC/S3)
data/raw/
*.csv
*.parquet
# Models (use HuggingFace Hub/S3)
*.pkl
*.pt
*.h5
# Secrets
.env
*.pem
# ML outputs
mlruns/
logs/
DVC for Data Versioning
dvc_setup.sh
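pip install "dvc[s3]"        # quotes stop shells such as zsh from expanding the brackets
dvc init
dvc add data/train.csv       # creates data/train.csv.dvc
git add data/train.csv.dvc .gitignore
git commit -m "track data with DVC"
dvc push                     # push to the default remote (S3); set one up first with `dvc remote add -d`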
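The small pointer file that `dvc add` commits to Git identifies the data by a content hash rather than by storing the data itself. The sketch below illustrates only that idea; `file_md5` is a hypothetical helper, the CSV contents are made up, and the real pointer file format is not reproduced here.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Hash a file in chunks so large datasets never have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo with a throwaway CSV: the hash changes only when the content changes.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "train.csv"
    data.write_text("id,label\n1,0\n2,1\n")
    hash_v1 = file_md5(data)
    hash_v1_again = file_md5(data)            # unchanged file, same hash
    data.write_text("id,label\n1,0\n2,1\n3,1\n")
    hash_v2 = file_md5(data)                  # edited file, different hash
```

Because Git tracks only the pointer, checking out an older commit and pulling the matching data restores the dataset version that commit was trained on.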
cloud gpu
AWS EC2 with GPU
p2/p3 instances for production ML training
GPU Instance Types
| Instance | GPU | VRAM | Use case |
|---|---|---|---|
| p3.2xlarge | V100 | 16GB | Research training |
| p4d.24xlarge | 8x A100 | 320GB | Large model training |
| g4dn.xlarge | T4 | 16GB | Inference, fine-tuning |
| inf2.xlarge | Inferentia2 | 32GB | Inference (cheap) |
Cost Management
- Use Spot instances for training (up to 90% cheaper than On-Demand, but they can be reclaimed with a two-minute warning)
- Save checkpoints to S3 every N epochs
- Stop instances immediately after training
- Use SageMaker for auto-termination on completion
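Spot instances and periodic checkpoints work together: if the instance is reclaimed, the next run resumes from the last saved epoch. The sketch below shows that pattern with a toy loop. Everything in it is an assumption for illustration: `train` and `latest_checkpoint` are hypothetical names, JSON files stand in for real model checkpoints, and `upload` is a placeholder callback where an S3 upload would go.

```python
import json
import tempfile
from pathlib import Path


def latest_checkpoint(ckpt_dir):
    """Return (epoch, path) of the newest checkpoint, or (0, None) if none exist."""
    found = sorted(Path(ckpt_dir).glob("epoch_*.json"))  # zero-padded names sort correctly
    if not found:
        return 0, None
    newest = found[-1]
    return json.loads(newest.read_text())["epoch"], newest


def train(total_epochs, ckpt_dir, save_every=2, upload=None, stop_after=None):
    """Toy loop: resume from the newest checkpoint, save every `save_every` epochs."""
    ckpt_dir = Path(ckpt_dir)
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    start_epoch, _ = latest_checkpoint(ckpt_dir)
    for epoch in range(start_epoch + 1, total_epochs + 1):
        # ... one epoch of real training would run here ...
        if epoch % save_every == 0:
            path = ckpt_dir / f"epoch_{epoch:04d}.json"
            path.write_text(json.dumps({"epoch": epoch}))
            if upload is not None:
                upload(path)  # placeholder for copying the file to S3
        if stop_after is not None and epoch >= stop_after:
            return epoch  # simulate a Spot interruption
    return total_epochs


with tempfile.TemporaryDirectory() as tmp:
    uploaded = []
    record = lambda p: uploaded.append(p.name)
    first_run = train(10, tmp, save_every=2, upload=record, stop_after=5)  # interrupted
    resumed_from, _ = latest_checkpoint(tmp)
    second_run = train(10, tmp, save_every=2, upload=record)               # resumes
```

The interrupted run loses only the work done since the last checkpoint (epoch 5 here), which is the trade-off that choosing N controls.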
framework comparison
PyTorch vs TensorFlow
When to use which, ecosystem, career relevance
Detailed Comparison
| Aspect | PyTorch | TensorFlow/Keras |
|---|---|---|
| Research papers | ~80% of ML papers | ~20% of ML papers |
| Industry startups | Dominant | Common in Google products |
| Mobile deployment | ExecuTorch (newer) | TFLite (mature) |
| Distributed training | PyTorch DDP | tf.distribute |
| Debugging | Easy (Python errors) | Harder (graph mode) |
| Job market | Strong (research+industry) | Strong (enterprise) |
Recommendation
For learning DL: Start with Keras (simpler).
For research/fine-tuning LLMs: Use PyTorch.
For production at scale: Both are fine. Teams choose based on existing codebase.
development tools
Jupyter & VS Code for ML
Notebook best practices, VS Code setup, debugging
Jupyter Best Practices
- Keep cells small and focused on one thing
- Run cells top-to-bottom to avoid hidden-state bugs
- Restart and Run All before sharing
- Use nbconvert to export to HTML/PDF for reports
- Use %timeit for timing and %memit for memory profiling (%memit requires the memory_profiler extension: %load_ext memory_profiler)
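Outside a notebook, the profiling magics have rough standard-library counterparts. The sketch below uses `timeit` and `tracemalloc` on a made-up workload (`build_squares` is a hypothetical function). Note the difference in what is measured: `tracemalloc` reports memory allocated by Python objects, whereas `%memit` reports process memory, so the numbers will not match exactly.

```python
import timeit
import tracemalloc


def build_squares(n=100_000):
    """Workload to profile: a list of n squares."""
    return [i * i for i in range(n)]


# Timing: best of 3 repeats, 5 calls each, reported per call.
per_call_seconds = min(timeit.repeat(build_squares, number=5, repeat=3)) / 5

# Memory: peak bytes allocated by Python while the workload runs.
tracemalloc.start()
result = build_squares()
_, peak_bytes = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"{per_call_seconds * 1e3:.2f} ms per call, peak {peak_bytes / 1e6:.1f} MB")
```

Taking the minimum over repeats reduces noise from other processes, which is the same reason `%timeit` runs a statement many times.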
VS Code ML Extensions
| Extension | Purpose |
|---|---|
| Python (Microsoft) | IntelliSense, debugging, testing |
| Jupyter | Run notebooks directly in VS Code |
| GitLens | Enhanced git history and blame |
| Error Lens | Inline error highlighting |
| Remote - SSH | Develop on remote GPU servers |