Machine Learning Course — Mitra AI Projects

001 · linear-regression-engine

Linear Regression

The foundation of supervised learning. Linear regression models the relationship between input features and a continuous output by fitting a straight line (or hyperplane) through the data.

Plain-English Explanation

Linear regression assumes that the output (label) is a weighted sum of the input features plus a bias term: ŷ = w₁x₁ + w₂x₂ + … + b. Training finds the weights that minimize the average prediction error (mean squared error) across all training examples. Think of it as fitting the "best straight line" through scattered data points.
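The weighted-sum idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not the course's reference implementation: the synthetic data, the true weights (3, −2) and the bias (5) are all made up so the recovered values can be checked.

```python
import numpy as np

# Synthetic, noise-free data (illustrative assumption): y = 3*x1 - 2*x2 + 5
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0

# Append a column of ones so the bias b is learned as one extra weight
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])

# Least-squares solution: the weights that minimize the mean squared error
params, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
w, b = params[:2], params[2]

# Predictions follow the formula y_hat = w1*x1 + w2*x2 + b
y_hat = X @ w + b
mse = np.mean((y - y_hat) ** 2)
print("weights:", w, "bias:", b, "MSE:", mse)
```

Because the data has no noise, the fit recovers the generating weights almost exactly; with real data the MSE stays above zero.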

When to Use / When Not to Use

✅ Use when:
Output is a continuous number (price, temperature, score)
You need interpretability (explainable coefficients)
Features have a roughly linear relationship with the target
You want a baseline model before trying complex approaches

❌ Avoid when:
Output is categorical (use classification instead)
Features have strong non-linear interactions
Data has many outliers (OLS is sensitive to them)
Features are highly correlated (multicollinearity)

Algorithm Variants

Variant	Key difference	When to use
Ordinary Least Squares (OLS)	Minimizes MSE directly	Baseline, small datasets
Ridge (L2 regularization)	Penalizes large weights	Multicollinearity, many features
Lasso (L1 regularization)	Can zero out weights	Feature selection needed
ElasticNet	L1 + L2 combined	Many features, some correlated
Polynomial Regression	Adds x², x³ terms	Curved relationships

Key Metrics

Metric	Formula	What it tells you
MAE	mean(|y − ŷ|)	Average absolute error, less sensitive to outliers than MSE
MSE	mean((y − ŷ)²)	Penalizes large errors more heavily
RMSE	√MSE	Same units as target, most interpretable
R² Score	1 − SS_res/SS_tot	% variance explained (1.0 = perfect)
Adjusted R²	1 − (1 − R²)(n − 1)/(n − p − 1)	Penalizes extra features (n samples, p features); fairer comparison across models
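To make the formulas in the table concrete, here is a small worked example that computes each metric by hand. The four target values and predictions are invented for illustration, and `p = 1` (one feature) is an assumption used only for the adjusted R² line.

```python
import numpy as np

# Made-up targets and predictions
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 12.0])

errors = y_true - y_pred                         # [1, 0, -1, -3]
mae = np.mean(np.abs(errors))                    # (1 + 0 + 1 + 3) / 4
mse = np.mean(errors ** 2)                       # (1 + 0 + 1 + 9) / 4
rmse = np.sqrt(mse)                              # back in the units of y

ss_res = np.sum(errors ** 2)                     # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2 = 1.0 - ss_res / ss_tot

# Adjusted R^2, assuming n samples and p features (p = 1 is hypothetical here)
n, p = len(y_true), 1
adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

print(f"MAE={mae} MSE={mse} RMSE={rmse:.3f} R2={r2:.3f} adjR2={adj_r2:.3f}")
```

Note how the single error of 3 contributes 9 of the 11 squared-error units, which is why MSE and RMSE react more strongly to large misses than MAE does.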

Code Example

linear_regression.py

from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error
import numpy as np

# X (feature matrix) and y (target vector) are assumed to be loaded already
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train OLS
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluate
r2  = r2_score(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"R²: {r2:.3f}  RMSE: {rmse:.3f}")

# Coefficients
print("Intercept:", model.intercept_)
print("Weights:", model.coef_)

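# With Ridge regularization (scale first so the penalty treats features equally)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

ridge = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
ridge.fit(X_train, y_train)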

Notebook & Demo

⚡

Interactive Notebook: Linear Regression

Cheese quality prediction — OLS, Ridge, Lasso, feature importance

First load: ~30–60 seconds (Pyodide Python runtime downloads) · Work saves automatically

Open Notebook →

Common Mistakes

⚠ Watch-outs

Not scaling features when using regularized variants (Ridge, Lasso need StandardScaler)
Ignoring the residuals plot — always check for patterns in errors
Using R² alone without checking absolute error magnitude
Assuming linearity without plotting feature vs. target scatter
Data leakage: fitting the scaler on all data before splitting
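The last watch-out is easy to get wrong, so here is a minimal sketch of the leak-free order of operations: split first, compute scaling statistics on the training rows only, then reuse those statistics on the test rows. The random data and the 160/40 split are assumptions for illustration; the arithmetic mirrors what a standard scaler does (subtract the mean, divide by the standard deviation).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(200, 3))   # made-up features

# 1. Split first (simple 80/20 split on shuffled row indices)
idx = rng.permutation(len(X))
train_idx, test_idx = idx[:160], idx[160:]
X_train, X_test = X[train_idx], X[test_idx]

# 2. Fit the scaling statistics on the training rows only
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)

# 3. Apply the same training statistics to both sets
X_train_sc = (X_train - mu) / sigma
X_test_sc = (X_test - mu) / sigma   # test rows never influence mu or sigma

print(X_train_sc.mean(axis=0).round(6), X_test_sc.mean(axis=0).round(3))
```

The scaled test set will not have exactly zero mean, and that is expected: it is being measured against statistics it did not help produce.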

Project Idea

💡 House Price Predictor

Build a Streamlit app that predicts property prices using area, location, and bedroom count. Train on a public dataset (for example Bengaluru Housing or California Housing; the Boston Housing loader was removed from scikit-learn in version 1.2). Add Ridge regularization and compare against OLS. Great for understanding feature importance and model transparency.

Browse Full Project Kits →

Quiz

Test your understanding of Linear Regression with 10 questions. Pass 70% to mark this topic complete.

Take Quiz →

002 · classification-engine

Classification

Predict which category a data point belongs to. Classification is one of the most common ML tasks in placement interviews and practical projects.

Plain-English Explanation

Instead of predicting a number (regression), classification predicts a label: spam or not spam, disease or healthy, fraud or legitimate. The model learns a decision boundary that separates classes. Logistic regression is the simplest classifier — despite the name, it outputs a probability of belonging to a class.
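As a rough sketch of how logistic regression turns a weighted sum into a probability, the snippet below applies the sigmoid function and then thresholds at 0.5. The weights, bias, and input points are hypothetical values chosen for illustration, not learned from data.

```python
import math

def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    """Probability of the positive class: sigmoid(w . x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

# Hypothetical weights and bias
w, b = [2.0, -1.0], 0.0

p_boundary = predict_proba([1.0, 2.0], w, b)   # z = 0, right on the decision boundary
p_pos = predict_proba([3.0, 1.0], w, b)        # z = 5, far on the positive side
label = int(p_pos >= 0.5)                      # threshold the probability to get a class

print(p_boundary, round(p_pos, 4), label)
```

Points where the weighted sum is zero sit exactly on the decision boundary and get probability 0.5; the further a point is from the boundary, the closer its probability moves to 0 or 1.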

When to Use

            ✅ Use when:
            Output is a category (yes/no, A/B/C)
You need probability estimates
Multi-class output is required

          

            ❌ Avoid when:
            Target is continuous (use regression)
Extreme class imbalance without handling
Very few labeled samples per class

          

Key Metrics

Metric	Formula	When to prioritize
Accuracy	(TP+TN)/total	Balanced classes
Precision	TP/(TP+FP)	Cost of false positives is high (spam filter)
Recall	TP/(TP+FN)	Cost of false negatives is high (cancer detection)
F1 Score	2·(P·R)/(P+R)	Imbalanced classes, need balance of P and R
ROC-AUC	Area under ROC curve	Threshold-independent performance

Code Example

classification.py

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_sc = scaler.fit_transform(X_train)
X_test_sc  = scaler.transform(X_test)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train_sc, y_train)

y_pred  = clf.predict(X_test_sc)
y_proba = clf.predict_proba(X_test_sc)[:, 1]

print(classification_report(y_test, y_pred))
print(f"ROC-AUC: {roc_auc_score(y_test, y_proba):.3f}")

Common Mistakes

⚠ Watch-outs

Using accuracy alone on imbalanced datasets (use F1 or AUC instead)
Not applying SMOTE or class_weight when classes are skewed
Confusing precision and recall — always ask which error is costlier
Choosing the threshold at 0.5 without evaluating the full ROC curve

Project Idea

💡 Placement Predictor

Build a logistic regression model to predict whether a student will get placed based on CGPA, internships, projects, and branch. Add a confusion matrix visualization and deploy on Streamlit.

Interactive Notebook

⚡

Interactive Notebook: Classification

Manuscript authenticity — Logistic Regression, ROC-AUC, confusion matrix

First load: ~30–60 seconds (Pyodide Python runtime downloads) · Work saves automatically

Open Notebook →

Quiz

Test your understanding of Classification with 10 questions. Pass 70% to mark this topic complete.

Take Quiz →

003 · tree-based-learning

Tree-Based Learning

Decision trees, random forests, and their variants. One of the most interview-tested families of ML algorithms.

Plain-English Explanation

A decision tree splits data by asking yes/no questions about features. At each node, it picks the split that best separates the classes (using Gini or entropy). Random Forest builds many trees on random subsets of data and features, then averages their outputs — reducing variance through ensemble averaging.
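To illustrate how a node "picks the split that best separates the classes", the sketch below computes Gini impurity and scans candidate thresholds on a single feature. It is a simplified teaching version under stated assumptions (one numeric feature, toy data, thresholds taken at observed values), not how a production library implements split search.

```python
def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best 'value <= threshold' question."""
    best = (None, float("inf"))
    n = len(values)
    # Skip the largest value: splitting there would leave the right side empty
    for t in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (t, score)
    return best

# Toy data: small feature values are class 0, large ones are class 1
feature = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = [0, 0, 0, 1, 1, 1]

threshold, impurity = best_split(feature, labels)
print("impurity before split:", gini(labels))
print("best threshold:", threshold, "impurity after split:", impurity)
```

A full tree repeats this search over every feature at every node, then recurses into the left and right subsets.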

Key Metrics & Hyperparameters

Parameter	Effect	Typical range
max_depth	Controls tree depth, limits overfitting	3–15
n_estimators (RF)	More trees = lower variance, more compute	100–500
min_samples_split	Min samples to allow a split	2–20
max_features	Features considered per split	sqrt(n), log2(n)
criterion	Split quality measure	gini, entropy

Code Example

random_forest.py

from sklearn.ensemble import RandomForestClassifier
import pandas as pd

rf = RandomForestClassifier(n_estimators=200, max_depth=8, random_state=42)
rf.fit(X_train, y_train)

# Feature importance (feature_names is assumed to hold the column names, e.g. list(X_train.columns))
fi = pd.Series(rf.feature_importances_, index=feature_names)
fi.sort_values(ascending=False).plot(kind='bar')

Common Mistakes

⚠ Watch-outs

Unpruned single decision trees overfit easily; constrain their depth or prefer ensembles (RF or Boosting)
Assuming feature importance from RF handles correlated features well (it does not)
Forgetting that RF can still overfit if max_depth is unconstrained

Interactive Notebook

⚡

Interactive Notebook: Tree-Based Learning

EV battery health — Decision Tree, Random Forest, feature importance

First load: ~30–60 seconds (Pyodide Python runtime downloads) · Work saves automatically

Open Notebook →

Quiz

Test your understanding of Tree-Based Learning with 10 questions. Pass 70% to mark this topic complete.

Take Quiz →

004 · boosting-revolution

Boosting

XGBoost, LightGBM, and gradient boosting. A family of algorithms that frequently wins Kaggle competitions on tabular data. Boosting sequentially trains weak learners, each correcting the previous one's errors.

Plain-English Explanation

Boosting trains trees sequentially. Each new tree focuses on the data points the previous ensemble got wrong. Gradient Boosting does this by fitting each tree to the negative gradient of the loss, which for squared error is simply the residual. XGBoost and LightGBM add regularization, histogram-based splits, and speed optimizations on top of this idea.
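The sequential idea can be sketched from scratch with one-split trees ("stumps") and squared-error loss. Everything here is an illustrative assumption: the sine-wave data, the learning rate of 0.3, the 50 rounds, and the helper names `fit_stump` and `predict_stump`. Real libraries add regularization, deeper trees, and much faster split finding.

```python
import numpy as np

def fit_stump(x, residual):
    """Find the single threshold on x that best fits the residuals (least squares)."""
    best = None
    for t in np.unique(x)[:-1]:            # skip the max so the right side is never empty
        left, right = residual[x <= t], residual[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, left_val, right_val = best
    return t, left_val, right_val

def predict_stump(stump, x):
    t, left_val, right_val = stump
    return np.where(x <= t, left_val, right_val)

# Made-up 1-D regression problem
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 6, size=80))
y = np.sin(x)

learning_rate = 0.3
pred = np.full_like(y, y.mean())           # round 0: predict the mean everywhere
mse_history = [np.mean((y - pred) ** 2)]
stumps = []

for _ in range(50):
    residual = y - pred                    # what the current ensemble still gets wrong
    stump = fit_stump(x, residual)         # the new weak learner targets those errors
    pred = pred + learning_rate * predict_stump(stump, x)
    stumps.append(stump)
    mse_history.append(np.mean((y - pred) ** 2))

print(f"MSE before: {mse_history[0]:.4f}  after 50 rounds: {mse_history[-1]:.4f}")
```

Each round shrinks the training error a little; the learning rate controls how much of each stump's correction is kept, which is why lower rates need more rounds.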

Algorithm Comparison

Algorithm	Speed	Memory	Best for
Gradient Boosting (sklearn)	Slow	Medium	Small-medium datasets, baseline
XGBoost	Fast	Higher	Structured data, Kaggle
LightGBM	Fastest	Low	Large datasets, categorical features
CatBoost	Fast	Medium	Many categorical features

Code Example

xgboost_example.py

import xgboost as xgb
from sklearn.model_selection import cross_val_score

model = xgb.XGBClassifier(
    n_estimators=300,
    learning_rate=0.05,
    max_depth=6,
    subsample=0.8,
    colsample_bytree=0.8,
    eval_metric='logloss'
)

scores = cross_val_score(model, X, y, cv=5, scoring='roc_auc')
print(f"CV AUC: {scores.mean():.3f} ± {scores.std():.3f}")

⚠ Watch-outs

Low learning_rate requires more n_estimators — balance them together
Early stopping on a validation set prevents overfitting and saves compute
XGBoost handles missing values natively — don't impute blindly before using it

Interactive Notebook

⚡

Interactive Notebook: Boosting

HDD failure prediction — XGBoost, LightGBM, early stopping

First load: ~30–60 seconds (Pyodide Python runtime downloads) · Work saves automatically

Open Notebook →

Quiz

Test your understanding of Boosting with 10 questions. Pass 70% to mark this topic complete.

Take Quiz →

005 · support-vector-machines

Support Vector Machines

SVMs find the maximum-margin hyperplane that separates classes. They are powerful for high-dimensional data and work well when classes are clearly separable.

Plain-English Explanation

SVM tries to find the widest possible "street" (margin) between two classes. The training points closest to the decision boundary are called support vectors, and they alone determine where the boundary sits. The kernel trick lets SVMs work in higher-dimensional spaces without explicitly computing those dimensions, which enables non-linear decision boundaries.
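The kernel trick can be verified numerically on a small case. For 2-D inputs, the degree-2 polynomial kernel (a · b)² equals an ordinary dot product in a 3-D feature space, so the kernel gets the higher-dimensional answer without ever building the higher-dimensional vectors. The two input vectors and the `gamma` value are arbitrary choices for illustration.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = v
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

def poly_kernel(a, b):
    """Homogeneous polynomial kernel of degree 2, computed in the original 2-D space."""
    return float(np.dot(a, b)) ** 2

def rbf_kernel(a, b, gamma=0.5):
    """RBF kernel: similarity decays with squared distance (gamma is an arbitrary choice here)."""
    return float(np.exp(-gamma * np.sum((a - b) ** 2)))

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

explicit = float(np.dot(phi(a), phi(b)))   # map to 3-D first, then take the dot product
implicit = poly_kernel(a, b)               # same number, never leaving 2-D

print(explicit, implicit, rbf_kernel(a, b))
```

The RBF kernel corresponds to an infinite-dimensional feature space, so there the explicit route is not even available; only the kernel value can be computed.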

Kernel Options

Kernel	When to use
Linear	Linearly separable data, text classification
RBF (Radial Basis Function)	Non-linear data, most common default
Polynomial	Polynomial feature interactions
Sigmoid	Neural network-like, less common

Code Example

svm.py

from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('svm', SVC(kernel='rbf', C=1.0, gamma='scale', probability=True))
])
pipe.fit(X_train, y_train)

⚠ Watch-outs

SVMs are slow on large datasets — prefer XGBoost or Random Forest when n > 50k
Always scale features before SVM — it is margin-based and very sensitive to scale
Hyperparameter C controls the trade-off between margin width and misclassifications

Interactive Notebook

⚡

Interactive Notebook: SVM

Drone motor fault detection — Linear & RBF kernels, C parameter tuning

First load: ~30–60 seconds (Pyodide Python runtime downloads) · Work saves automatically

Open Notebook →

Quiz

Test your understanding of SVM with 10 questions. Pass 70% to mark this topic complete.

Take Quiz →

006 · unsupervised-discovery

Clustering & Unsupervised Learning

Find natural groups in data without labels. Used in customer segmentation, document grouping, image compression, and anomaly detection preprocessing.

Algorithm Comparison

Algorithm	Must specify K?	Handles noise?	Best for
K-Means	Yes	No	Compact spherical clusters
DBSCAN	No	Yes	Arbitrary shapes, outlier detection
Hierarchical	No (choose afterwards by cutting the dendrogram)	Partially	Dendrogram visualization
GMM	Yes	Partially (soft assignments)	Overlapping clusters, soft assignment
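For the K-Means row, the underlying loop (assign each point to its nearest centroid, then move each centroid to the mean of its points) fits in a few lines. This is a bare-bones sketch on two made-up, well-separated blobs; the initialization from one point per blob is a simplification, and library implementations use smarter seeding such as k-means++.

```python
import numpy as np

rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(50, 2))
X = np.vstack([blob_a, blob_b])            # rows 0-49 are blob A, rows 50-99 are blob B

# Simplified initialization: one starting centroid taken from each blob
centroids = X[[0, 50]].copy()

for _ in range(10):
    # Assignment step: distance from every point to every centroid
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Update step: each centroid moves to the mean of its assigned points
    centroids = np.array([X[labels == k].mean(axis=0) for k in range(2)])

# Inertia: total squared distance from points to their assigned centroid
inertia = float(((X - centroids[labels]) ** 2).sum())
print(centroids.round(2), round(inertia, 2))
```

Because every point is pulled toward a single mean, this procedure naturally favors compact, roughly spherical clusters, which is the limitation noted in the table.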

Code Example

clustering.py

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Elbow method to choose K
inertias = []
for k in range(2, 11):
    km = KMeans(n_clusters=k, random_state=42)
    km.fit(X)
    inertias.append(km.inertia_)

# Silhouette score for validation (K=4 is a placeholder; pick K from the elbow curve)
labels = KMeans(n_clusters=4, random_state=42).fit_predict(X)
score  = silhouette_score(X, labels)
print(f"Silhouette: {score:.3f}")  # higher is better

Interactive Notebook

⚡

Interactive Notebook: Clustering

Smartwatch health segments — K-Means, DBSCAN, silhouette score

First load: ~30–60 seconds (Pyodide Python runtime downloads) · Work saves automatically

Open Notebook →

Quiz

Test your understanding of Clustering with 10 questions. Pass 70% to mark this topic complete.

Take Quiz →

007 · anomaly-detection-dimensionality

Anomaly Detection

Identify unusual data points — fraud, system failures, manufacturing defects. Most anomaly problems are highly imbalanced.

Key Approaches

Method	How it works	Best for
Isolation Forest	Anomalies are easier to isolate in random trees	Tabular data, general use
Local Outlier Factor	Compares a point's local density with that of its k nearest neighbors	Local, density-based outliers
Z-Score	Flags points beyond n standard deviations from the mean	Roughly Gaussian features, simple baselines
Autoencoder	High reconstruction error = anomaly	Complex patterns, images

Code Example

anomaly.py

from sklearn.ensemble import IsolationForest

iso = IsolationForest(contamination=0.05, random_state=42)
labels = iso.fit_predict(X)
# -1 = anomaly, 1 = normal
anomalies = X[labels == -1]

Interactive Notebook

⚡

Interactive Notebook: Anomaly Detection

Jet engine sensor anomalies — Isolation Forest, LOF, Z-score

First load: ~30–60 seconds (Pyodide Python runtime downloads) · Work saves automatically

Open Notebook →

Quiz

Test your understanding of Anomaly Detection with 10 questions. Pass 70% to mark this topic complete.

Take Quiz →

008 · naive-bayes-lda

Naive Bayes & LDA

Probabilistic classifiers grounded in Bayes' theorem. Fast, interpretable, and surprisingly effective for text classification and small datasets.

Plain-English Explanation

Naive Bayes applies Bayes' theorem with the "naive" assumption that all features are independent given the class. Despite this oversimplification, it works very well for text (bag-of-words). LDA (Linear Discriminant Analysis) finds the linear combination of features that best separates classes — it also doubles as a dimensionality reduction tool.
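A tiny from-scratch version helps show what the "naive" independence assumption buys: the class score is just a prior plus a sum of per-word log-probabilities. The four-message corpus, the labels, and the smoothing value `alpha=1.0` are toy assumptions; this sketches the multinomial variant only and skips words never seen in training.

```python
import math
from collections import Counter, defaultdict

# Toy training corpus (made up for illustration)
train = [
    ("win money now", "spam"),
    ("free money offer", "spam"),
    ("meeting at noon", "ham"),
    ("lunch meeting tomorrow", "ham"),
]

class_counts = Counter(label for _, label in train)
word_counts = defaultdict(Counter)
for text, label in train:
    word_counts[label].update(text.split())
vocab = {w for counts in word_counts.values() for w in counts}

def log_score(text, label, alpha=1.0):
    """log P(label) + sum of log P(word | label), with Laplace smoothing.

    This is the log-posterior up to a constant shared by all classes.
    """
    total = sum(word_counts[label].values())
    score = math.log(class_counts[label] / len(train))
    for word in text.split():
        if word not in vocab:
            continue                      # ignore words never seen in training
        p = (word_counts[label][word] + alpha) / (total + alpha * len(vocab))
        score += math.log(p)              # independence assumption: just add the logs
    return score

def classify(text):
    return max(class_counts, key=lambda label: log_score(text, label))

print(classify("free money"), classify("lunch meeting"))
```

Smoothing matters: without `alpha`, any word that never appeared with a class would give that class a probability of zero no matter what the other words say.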

Variants

Variant	Input type	Best for
GaussianNB	Continuous features	Real-valued features
MultinomialNB	Count features	Text, word frequency
BernoulliNB	Binary features	Binary feature vectors
LDA	Continuous	Multi-class + dimensionality reduction

Interactive Notebook

⚡

Interactive Notebook: Naive Bayes / LDA

Jira ticket classifier — MultinomialNB, LDA dimensionality reduction

First load: ~30–60 seconds (Pyodide Python runtime downloads) · Work saves automatically

Open Notebook →

Quiz

Test your understanding of Naive Bayes & LDA with 10 questions. Pass 70% to mark this topic complete.

Take Quiz →

009 · time-series-forecasting

Time Series Forecasting

Predict future values based on historical sequences. Used in stock prices, sales forecasting, energy demand, and inventory planning.

Key Concepts

Concept	Meaning
Stationarity	Mean, variance, and autocorrelation don't change over time. ARIMA assumes the series is stationary after differencing.
Seasonality	Repeating patterns at fixed intervals (weekly, monthly)
Trend	Long-term upward or downward movement in the data
Autocorrelation (ACF)	Correlation of the series with its own past values
Partial Autocorrelation (PACF)	Direct correlation at each lag, removing intermediate lags

Algorithm Family

Model	Best for
ARIMA	Stationary or differenced series, no seasonal component
SARIMA	Seasonal patterns with ARIMA
Prophet (Meta)	Business time series with holidays and multiple seasonalities
LSTM/GRU	Long sequences, complex non-linear patterns
XGBoost with lags	Tabular approach to time series

Project Idea

💡 Sales Forecast Dashboard

Build an Inventory Forecasting Dashboard using ARIMA and Prophet on retail sales data. Show actual vs. predicted with confidence intervals. Add Streamlit UI and a CSV upload feature. This maps directly to project-03 on this platform.

See Inventory Forecasting Kit →

Interactive Notebook

⚡

Interactive Notebook: Time Series

Data centre power demand — ARIMA, Prophet, lag features + GBM

First load: ~30–60 seconds (Pyodide Python runtime downloads) · Work saves automatically

Open Notebook →

Quiz

Test your understanding of Time Series with 10 questions. Pass 70% to mark this topic complete.

Take Quiz →

010 · hyperparameter-optimization

Hyperparameter Optimization

Systematically search for the best model configuration. Moving from manual tuning to automated, principled search.

Search Strategies

Strategy	How it works	Speed	Quality
Grid Search	Try every combination in a defined grid	Slow	Exhaustive
Random Search	Sample random combinations from distributions	Faster	Often better than grid
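Bayesian Optimization	Builds a surrogate model of the objective and uses it to pick the next trial	Efficient	Best for expensive models
Optuna / Hyperopt	Libraries whose default sampler is the Tree-structured Parzen Estimator (TPE), a Bayesian method	Very efficient	Strong default for most tuning jobs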

Code Example — Optuna

optuna_search.py

import optuna
import xgboost as xgb
from sklearn.model_selection import cross_val_score

def objective(trial):
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 100, 500),
        'max_depth':    trial.suggest_int('max_depth', 3, 12),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
    }
    model = xgb.XGBClassifier(**params)
    score = cross_val_score(model, X, y, cv=3, scoring='roc_auc').mean()
    return score

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=50)
print(study.best_params)

Interactive Notebook

⚡

Interactive Notebook: Hyperparameter Opt.

3D printer quality — Grid Search, Optuna Bayesian optimisation

First load: ~30–60 seconds (Pyodide Python runtime downloads) · Work saves automatically

Open Notebook →

Quiz

Test your understanding of Hyperparameter Optimization with 10 questions. Pass 70% to mark this topic complete.

Take Quiz →

Machine Learning Complete!