Machine Learning Cheatsheet
001 · linear-regression-engine

Linear Regression

The foundation of supervised learning. Linear regression models the relationship between input features and a continuous output by fitting a straight line (or hyperplane) through the data.

Plain-English Explanation

Linear regression assumes that the output (label) is a weighted sum of the input features plus a bias term: ŷ = w₁x₁ + w₂x₂ + … + b. Training finds the weights that minimize the average squared prediction error (mean squared error) across all training examples. Think of it as fitting the "best straight line" through scattered data points.
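
To make "finding the weights that minimize mean squared error" concrete, here is a minimal sketch for the single-feature case, where the best slope is cov(x, y) / var(x) and the bias follows from the means. The toy data and the helper names `fit_line` and `mse` are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (w, b) minimizing mean squared error for y ~ w*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


def mse(xs, ys, w, b):
    """Mean squared error of the line y = w*x + b on the data."""
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data: roughly y = 2x + 1 with small hand-added noise (illustrative only)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

w, b = fit_line(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}, MSE = {mse(xs, ys, w, b):.4f}")
```

With several features the same idea applies, but the solution is usually computed with linear algebra routines rather than by hand.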

When to Use / When Not to Use

✅ Use when:
  • Output is a continuous number (price, temperature, score)
  • You need interpretability (explainable coefficients)
  • Features have a roughly linear relationship with the target
  • You want a baseline model before trying more complex approaches
❌ Avoid when:
  • Output is categorical (use classification instead)
  • Features have strong non-linear interactions
  • Data has many outliers (OLS is sensitive to them)
  • Features are highly correlated (multicollinearity makes plain OLS coefficients unstable; Ridge or ElasticNet can help)

Algorithm Variants

Variant | Key difference | When to use
Ordinary Least Squares (OLS) | Minimizes MSE directly | Baseline, small datasets
Ridge (L2 regularization) | Penalizes large weights | Multicollinearity, many features
Lasso (L1 regularization) | Can zero out weights | Feature selection needed
ElasticNet | L1 + L2 combined | Many features, some correlated
Polynomial Regression | Adds x², x³ terms | Curved relationships

Key Metrics

Metric | Formula | What it tells you
MAE | mean(abs(y − ŷ)) | Average absolute error; less sensitive to outliers than MSE
MSE | mean((y − ŷ)²) | Penalizes large errors more heavily
RMSE | √MSE | Same units as the target, so easy to interpret
R² Score | 1 − SS_res/SS_tot | Fraction of variance explained (1.0 = perfect)
Adjusted R² | 1 − (1 − R²)(n − 1)/(n − p − 1) | Penalizes the number of features p; fairer comparison across models
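
The metrics in the table can be computed by hand on a tiny example, which helps when sanity-checking library output. This is a sketch on made-up numbers; `n_features` is an assumed value used only for adjusted R².

```python
import math

# Made-up targets and predictions for illustration
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 8.0, 9.5]
n_features = 1  # assumed number of input features (p)

n = len(y_true)
errors = [t - p for t, p in zip(y_true, y_pred)]

mae = sum(abs(e) for e in errors) / n          # mean absolute error
mse = sum(e ** 2 for e in errors) / n          # mean squared error
rmse = math.sqrt(mse)                          # back in the target's units

mean_y = sum(y_true) / n
ss_res = sum(e ** 2 for e in errors)                   # residual sum of squares
ss_tot = sum((t - mean_y) ** 2 for t in y_true)        # total sum of squares
r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)

print(f"MAE={mae:.3f} MSE={mse:.3f} RMSE={rmse:.3f} R2={r2:.3f} adjR2={adj_r2:.4f}")
```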

Code Example

linear_regression.py
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error
import numpy as np

# X (feature matrix) and y (target vector) are assumed to be loaded already
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # fixed seed for a reproducible split
)

# Train OLS
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluate on the held-out test set
r2 = r2_score(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"R²: {r2:.3f}  RMSE: {rmse:.3f}")

# Coefficients
print("Intercept:", model.intercept_)
print("Weights:", model.coef_)

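# With Ridge regularization (features are standardized first so the L2 penalty treats them equally)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

ridge = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
ridge.fit(X_train, y_train)
print(f"Ridge R²: {ridge.score(X_test, y_test):.3f}")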

Notebook & Demo

Interactive Notebook: Linear Regression
Cheese quality prediction: OLS, Ridge, Lasso, feature importance
First load: ~30–60 seconds (Pyodide Python runtime downloads) · Work saves automatically

Common Mistakes

⚠ Watch-outs

  • Not scaling features when using regularized variants (Ridge and Lasso penalize weight size, so standardize first, e.g. with StandardScaler)
  • Ignoring the residuals plot: always check for patterns in the errors
  • Using R² alone without checking absolute error magnitude
  • Assuming linearity without plotting feature vs. target scatter
  • Data leakage: fitting the scaler on all data before splitting
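
One common way to avoid the scaler-leakage mistake is to put the scaler inside a scikit-learn pipeline, so it is refitted on the training portion of every cross-validation fold. The sketch below uses synthetic data from `make_regression` as a stand-in for a real dataset; the alpha value is arbitrary.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real dataset (assumption for illustration)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# The scaler is part of the model, so each fold fits it on that fold's training rows only
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"CV R²: {scores.mean():.3f} ± {scores.std():.3f}")
```

Calling `fit_transform` on the full dataset before splitting would instead let test-set statistics influence the scaling.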

Project Idea

💡 House Price Predictor

Build a Streamlit app that predicts property prices using area, location, and bedroom count. Train on a public dataset (for example Bengaluru Housing; note that Boston Housing was removed from scikit-learn in version 1.2, and California Housing is a common substitute). Add Ridge regularization and compare against OLS. Great for understanding feature importance and model transparency.


Quiz

Test your understanding of Linear Regression with 10 questions. Pass 70% to mark this topic complete.

002 · classification-engine

Classification

Predict which category a data point belongs to. Classification is one of the most common ML tasks in placement interviews and practical projects.

Plain-English Explanation

Instead of predicting a number (regression), classification predicts a label: spam or not spam, disease or healthy, fraud or legitimate. The model learns a decision boundary that separates classes. Logistic regression is one of the simplest classifiers; despite the "regression" in its name, it is used for classification and outputs the probability of belonging to a class.
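
As a rough sketch of how logistic regression turns a weighted sum into a class probability, the example below applies the sigmoid function and a 0.5 cut-off. The weights, bias, and function names are hypothetical, not learned from data.

```python
import math


def sigmoid(z):
    """Squash any real number into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of the positive class for one example."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


# Hypothetical weights for a two-feature model
weights = [1.5, -2.0]
bias = 0.25

p = predict_proba([2.0, 1.0], weights, bias)  # z = 3.0 - 2.0 + 0.25 = 1.25
label = int(p >= 0.5)                          # decision boundary at z = 0
print(f"probability = {p:.3f}, predicted label = {label}")
```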

When to Use

✅ Use when:
  • Output is a category (yes/no, A/B/C)
  • You need probability estimates
  • Multi-class output is required
❌ Avoid when:
  • Target is continuous (use regression)
  • Classes are extremely imbalanced and you have no plan to handle it (resampling, class weights)
  • Very few labeled samples per class

Key Metrics

Metric | Formula | When to prioritize
Accuracy | (TP+TN)/total | Balanced classes
Precision | TP/(TP+FP) | Cost of false positives is high (spam filter)
Recall | TP/(TP+FN) | Cost of false negatives is high (cancer detection)
F1 Score | 2·(P·R)/(P+R) | Imbalanced classes, need balance of P and R
ROC-AUC | Area under ROC curve | Threshold-independent performance
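
The formulas above can be checked on a made-up confusion matrix. The counts below are invented to show how accuracy can look strong on an imbalanced problem while recall stays modest.

```python
# Invented confusion-matrix counts for an imbalanced binary problem
tp, fp, fn, tn = 40, 10, 20, 930

total = tp + fp + fn + tn
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

Here accuracy is 0.97 even though one in three positives is missed, which is why F1 or recall is often the better headline number on skewed data.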

Code Example

classification.py
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.preprocessing import StandardScaler

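# X_train, X_test, y_train, y_test are assumed to come from an earlier train_test_split
# Fit the scaler on the training split only, then reuse it on the test split
scaler = StandardScaler()
X_train_sc = scaler.fit_transform(X_train)
X_test_sc  = scaler.transform(X_test)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train_sc, y_train)

y_pred  = clf.predict(X_test_sc)
y_proba = clf.predict_proba(X_test_sc)[:, 1]  # probability of the positive class (binary case)

print(classification_report(y_test, y_pred))
print(f"ROC-AUC: {roc_auc_score(y_test, y_proba):.3f}")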

Common Mistakes

⚠ Watch-outs

  • Using accuracy alone on imbalanced datasets (use F1 or AUC instead)
  • Not applying SMOTE or class_weight when classes are skewed
  • Confusing precision and recall: always ask which error is costlier
  • Defaulting to a 0.5 threshold without examining the full ROC (or precision-recall) curve
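
The last watch-out can be illustrated by sweeping the decision threshold and watching precision and recall trade off. This is a plain-Python sketch on invented labels and scores; `precision_recall_at` is a hypothetical helper, not a library function.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, t in zip(predicted, y_true) if p and t == 1)
    fp = sum(1 for p, t in zip(predicted, y_true) if p and t == 0)
    fn = sum(1 for p, t in zip(predicted, y_true) if not p and t == 1)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Invented labels and model scores for illustration
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.65, 0.90]

for threshold in (0.3, 0.5, 0.7):
    p, r = precision_recall_at(y_true, scores, threshold)
    print(f"threshold={threshold:.1f}  precision={p:.2f}  recall={r:.2f}")
```

Lowering the threshold raises recall at the cost of precision, so the right cut-off depends on which error is costlier.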

Project Idea

💡 Placement Predictor

Build a logistic regression model to predict whether a student will get placed based on CGPA, internships, projects, and branch. Add a confusion matrix visualization and deploy on Streamlit.

Interactive Notebook

Interactive Notebook: Classification
Manuscript authenticity: Logistic Regression, ROC-AUC, confusion matrix

003 · tree-based-learning

Tree-Based Learning

Decision trees, random forests, and their variants. One of the most frequently interview-tested families of ML algorithms.

Plain-English Explanation

A decision tree splits data by asking yes/no questions about features. At each node, it picks the split that best separates the classes (using Gini impurity or entropy). Random Forest builds many trees, each on a bootstrap sample of the rows with a random subset of features considered at each split, then combines their outputs by voting or averaging, which reduces variance.
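
To show what "picks the split that best separates the classes" means, the sketch below computes Gini impurity and searches for the best threshold on a single feature. The data and function names are invented; real implementations are far more optimized.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 for a pure node, higher for mixed nodes."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(values, labels, threshold):
    """Weighted impurity of the two children for the question 'value <= threshold?'."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(values, labels):
    """Try each observed value (except the max) and keep the lowest impurity."""
    candidates = sorted(set(values))[:-1]  # splitting at the max leaves the right side empty
    return min(candidates, key=lambda t: split_gini(values, labels, t))


# Invented toy data
ages = [22, 25, 30, 35, 40, 45]
bought = ["no", "no", "no", "yes", "yes", "yes"]

t = best_threshold(ages, bought)
print(f"root impurity = {gini(bought):.2f}, best split: age <= {t}")
```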

Key Metrics & Hyperparameters

Parameter | Effect | Typical range
max_depth | Controls tree depth, limits overfitting | 3–15
n_estimators (RF) | More trees = lower variance, more compute | 100–500
min_samples_split | Min samples to allow a split | 2–20
max_features | Features considered per split | sqrt(n), log2(n)
criterion | Split quality measure | gini, entropy

Code Example

random_forest.py
from sklearn.ensemble import RandomForestClassifier
import pandas as pd

rf = RandomForestClassifier(n_estimators=200, max_depth=8, random_state=42)
rf.fit(X_train, y_train)

# Feature importance (feature_names is assumed to be the list of column names of X_train)
fi = pd.Series(rf.feature_importances_, index=feature_names)
fi.sort_values(ascending=False).plot(kind='bar')

Common Mistakes

⚠ Watch-outs

  • Single decision trees overfit easily: constrain their depth or prefer ensembles (RF or Boosting)
  • Assuming feature importance from RF handles correlated features well (it does not)
  • Forgetting that RF can still overfit if max_depth is unconstrained

Interactive Notebook

Interactive Notebook: Tree-Based Learning
EV battery health: Decision Tree, Random Forest, feature importance

004 · boosting-revolution

Boosting

XGBoost, LightGBM, and gradient boosting. A family of algorithms that frequently wins Kaggle competitions on tabular data. Boosting sequentially trains weak learners, each correcting the previous ones' errors.

Plain-English Explanation

Boosting trains trees sequentially. Each new tree focuses on the data points the previous ensemble got wrong. Gradient Boosting does this by fitting each tree to the residual errors. XGBoost and LightGBM add regularization, histogram-based splits, and speed optimizations on top of this idea.
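
The residual-fitting idea can be sketched in a few lines for squared-error regression: start from the mean, then repeatedly fit a small tree to what is still unexplained. This is a bare-bones illustration using scikit-learn trees on a toy sine curve, not how XGBoost or LightGBM are implemented; the learning rate and round count are arbitrary.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Deterministic toy data: a noise-free sine curve
X = np.linspace(0, 6, 60).reshape(-1, 1)
y = np.sin(X).ravel()

learning_rate = 0.3
prediction = np.full_like(y, y.mean())        # round 0: predict the mean everywhere
mse_history = [np.mean((y - prediction) ** 2)]
trees = []

for _ in range(50):
    residuals = y - prediction                # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residuals)                    # the new tree learns the residuals
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)
    mse_history.append(np.mean((y - prediction) ** 2))

print(f"MSE before boosting: {mse_history[0]:.4f}, after 50 rounds: {mse_history[-1]:.4f}")
```

Each round shrinks the training error a little; the learning rate controls how much of each tree's correction is applied.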

Algorithm Comparison

Algorithm | Speed | Memory | Best for
Gradient Boosting (sklearn) | Slow | Medium | Small-medium datasets, baseline
XGBoost | Fast | Higher | Structured data, Kaggle
LightGBM | Fastest | Low | Large datasets, categorical features
CatBoost | Fast | Medium | Many categorical features

Code Example

xgboost_example.py
import xgboost as xgb
from sklearn.model_selection import cross_val_score

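# X and y are assumed to be a loaded feature matrix and binary label vector
model = xgb.XGBClassifier(
    n_estimators=300,
    learning_rate=0.05,
    max_depth=6,
    subsample=0.8,          # row sampling per tree
    colsample_bytree=0.8,   # column sampling per tree
    eval_metric='logloss'
)  # use_label_encoder is omitted: it is deprecated in recent XGBoost releases

scores = cross_val_score(model, X, y, cv=5, scoring='roc_auc')
print(f"CV AUC: {scores.mean():.3f} ± {scores.std():.3f}")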

⚠ Watch-outs

  • A low learning_rate requires more n_estimators: tune the two together
  • Early stopping on a validation set prevents overfitting and saves compute
  • XGBoost handles missing values natively, so don't impute blindly before using it
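
Early stopping can be sketched with scikit-learn's own gradient boosting, which holds out part of the training data and stops once the validation loss stalls. XGBoost and LightGBM expose their own early-stopping options, whose exact spelling depends on the library version, so this example deliberately stays with scikit-learn. The dataset is synthetic and the settings are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic data stands in for a real dataset (assumption for illustration)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

model = GradientBoostingClassifier(
    n_estimators=500,         # upper bound on boosting rounds
    learning_rate=0.1,
    validation_fraction=0.2,  # held out internally to monitor the loss
    n_iter_no_change=10,      # stop after 10 rounds without improvement
    random_state=0,
)
model.fit(X, y)

# n_estimators_ is the number of rounds actually trained
print(f"Trees actually fitted: {model.n_estimators_} of 500")
```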

Interactive Notebook

Interactive Notebook: Boosting
HDD failure prediction: XGBoost, LightGBM, early stopping

005 · support-vector-machines

Support Vector Machines

SVMs find the maximum-margin hyperplane that separates classes. They are powerful for high-dimensional data and work well when classes are clearly separable.

Plain-English Explanation

SVM tries to find the widest possible "street" (margin) between two classes. The data points that lie on the edge of the margin are called support vectors. The kernel trick lets SVMs work in higher-dimensional spaces without explicitly computing coordinates in those spaces, enabling non-linear decision boundaries.
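
A small numeric check makes the kernel trick less abstract: for 2-D inputs, the degree-2 polynomial kernel (x · z)² gives the same number as an ordinary dot product in a 3-D feature space, without ever building that space. The function names here are illustrative only.

```python
import math


def poly_kernel(x, z):
    """Degree-2 polynomial kernel k(x, z) = (x . z)^2 for 2-D inputs."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def explicit_map(x):
    """The 3-D feature map this kernel implicitly corresponds to."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


x, z = (1.0, 2.0), (3.0, 0.5)

via_kernel = poly_kernel(x, z)                        # computed in the original 2-D space
via_mapping = dot(explicit_map(x), explicit_map(z))   # computed in the mapped 3-D space
print(via_kernel, via_mapping)
```

For kernels such as RBF the implicit space is infinite-dimensional, so the shortcut is the only practical option.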

Kernel Options

Kernel | When to use
Linear | Linearly separable data, text classification
RBF (Radial Basis Function) | Non-linear data, most common default
Polynomial | Polynomial feature interactions
Sigmoid | Neural network-like, less common

Code Example

svm.py
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('svm', SVC(kernel='rbf', C=1.0, gamma='scale', probability=True))
])
pipe.fit(X_train, y_train)

⚠ Watch-outs

  • SVMs are slow on large datasets: as a rule of thumb, prefer XGBoost or Random Forest when n > 50k
  • Always scale features before SVM, because it is margin-based and very sensitive to scale
  • Hyperparameter C controls the trade-off between margin width and misclassifications

Interactive Notebook

Interactive Notebook: SVM
Drone motor fault detection: Linear & RBF kernels, C parameter tuning

006 · unsupervised-discovery

Clustering & Unsupervised Learning

Find natural groups in data without labels. Used in customer segmentation, document grouping, image compression, and anomaly detection preprocessing.

Algorithm Comparison

Algorithm | Must specify K? | Handles noise? | Best for
K-Means | Yes | No | Compact spherical clusters
DBSCAN | No | Yes | Arbitrary shapes, outlier detection
Hierarchical | No (choose after building the tree) | Partially | Dendrogram visualization
GMM | Yes | Soft | Overlapping clusters, soft assignment
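
The DBSCAN row can be illustrated with a tiny hand-made dataset: two dense groups and one isolated point. No cluster count is given, and the isolated point is labeled -1 (noise). The coordinates, `eps`, and `min_samples` are chosen for this toy case only.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hand-made points: two dense groups plus one isolated point
X = np.array([
    [1.0, 1.0], [1.1, 1.0], [0.9, 1.1], [1.0, 0.9],   # dense group A
    [5.0, 5.0], [5.1, 5.0], [4.9, 5.1], [5.0, 4.9],   # dense group B
    [9.0, 0.0],                                       # isolated point
])

# eps is the neighborhood radius; min_samples is the density needed to form a cluster
db = DBSCAN(eps=0.5, min_samples=3)
labels = db.fit_predict(X)

n_clusters = len(set(labels) - {-1})  # -1 marks noise, not a cluster
print(f"clusters found: {n_clusters}, labels: {labels.tolist()}")
```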

Code Example

clustering.py
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

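# Elbow method to choose K: plot inertias against k and look for the bend
# (X is assumed to be a loaded, scaled feature matrix)
inertias = []
for k in range(2, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    km.fit(X)
    inertias.append(km.inertia_)

# Silhouette score for validation (k=4 is a placeholder; use the k suggested by the elbow)
labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X)
score = silhouette_score(X, labels)
print(f"Silhouette: {score:.3f}")  # ranges from -1 to 1; higher is better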

Interactive Notebook

Interactive Notebook: Clustering
Smartwatch health segments: K-Means, DBSCAN, silhouette score

007 · anomaly-detection-dimensionality

Anomaly Detection

Identify unusual data points such as fraud, system failures, or manufacturing defects. Most anomaly problems are highly imbalanced, since anomalies are rare by definition.

Key Approaches

Method | How it works | Best for
Isolation Forest | Anomalies are easier to isolate in random trees | Tabular data, general use
Local Outlier Factor | Compares density to k nearest neighbors | Cluster-based outliers
Z-Score | Flags points beyond n standard deviations | Gaussian, simple baselines
Autoencoder | High reconstruction error = anomaly | Complex patterns, images
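
The Z-score row is simple enough to sketch with the standard library. The readings below are invented, and the threshold of 3 standard deviations is a common convention rather than a rule.

```python
import statistics

# Invented sensor readings with one obvious spike
readings = [10.1, 9.8, 10.0, 10.2, 9.9, 10.0, 10.1, 9.7, 10.3, 9.9,
            10.0, 10.2, 9.8, 10.1, 25.0]

mean = statistics.mean(readings)
std = statistics.pstdev(readings)   # population standard deviation
threshold = 3.0                     # flag points more than 3 std devs from the mean

anomalies = [x for x in readings if abs(x - mean) / std > threshold]
print(f"mean={mean:.2f} std={std:.2f} anomalies={anomalies}")
```

Because the outlier itself inflates the mean and standard deviation, this baseline can miss anomalies in very small samples; robust variants use the median instead.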

Code Example

anomaly.py
from sklearn.ensemble import IsolationForest

# contamination is the expected fraction of anomalies; X is assumed to be a NumPy array
iso = IsolationForest(contamination=0.05, random_state=42)
labels = iso.fit_predict(X)
# -1 = anomaly, 1 = normal
anomalies = X[labels == -1]

Interactive Notebook

Interactive Notebook: Anomaly Detection
Jet engine sensor anomalies: Isolation Forest, LOF, Z-score

008 · naive-bayes-lda

Naive Bayes & LDA

Probabilistic classifiers grounded in Bayes' theorem. Fast, interpretable, and surprisingly effective for text classification and small datasets.

Plain-English Explanation

Naive Bayes applies Bayes' theorem with the "naive" assumption that all features are independent given the class. Despite this oversimplification, it often works well for text (bag-of-words). LDA (Linear Discriminant Analysis) finds the linear combination of features that best separates classes; it also doubles as a dimensionality reduction tool.

Variants

Variant | Input type | Best for
GaussianNB | Continuous features | Real-valued features
MultinomialNB | Count features | Text, word frequency
BernoulliNB | Binary features | Binary feature vectors
LDA | Continuous | Multi-class + dimensionality reduction
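
As a minimal sketch of the MultinomialNB row, the example below classifies short ticket-style texts from word counts. The six training sentences and their labels are invented, and a real classifier would need far more data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented ticket texts and labels for illustration
texts = [
    "app crashes on login with error",
    "error when saving file crash",
    "server crash after update error",
    "please add dark mode option",
    "add export to pdf feature",
    "feature request add keyboard shortcuts",
]
labels = ["bug", "bug", "bug", "feature", "feature", "feature"]

# CountVectorizer builds bag-of-words counts; MultinomialNB models those counts per class
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(texts, labels)

bug_prediction = clf.predict(["crash with error after login"])[0]
feature_prediction = clf.predict(["please add a new feature"])[0]
print(bug_prediction, feature_prediction)
```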

Interactive Notebook

Interactive Notebook: Naive Bayes / LDA
Jira ticket classifier: MultinomialNB, LDA dimensionality reduction

009 · time-series-forecasting

Time Series Forecasting

Predict future values based on historical sequences. Used in stock prices, sales forecasting, energy demand, and inventory planning.

Key Concepts

Concept | Meaning
Stationarity | Mean and variance don't change over time. ARMA-type models assume it; ARIMA reaches it by differencing.
Seasonality | Repeating patterns at fixed intervals (weekly, monthly)
Trend | Long-term upward or downward movement in the data
Autocorrelation (ACF) | Correlation of the series with its own past values
Partial Autocorrelation (PACF) | Direct correlation at each lag, removing the effect of intermediate lags
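
Differencing and autocorrelation can both be computed by hand on a toy series. The sketch below builds a series with a linear trend and a period-4 pattern, removes the trend by first differencing, and checks that the autocorrelation peaks at the seasonal lag. The series and helper names are invented for illustration.

```python
def difference(series, lag=1):
    """First (or lag-k) differences: removes a linear trend when lag=1."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def autocorrelation(series, lag):
    """Sample autocorrelation of the series with itself shifted by `lag`."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[i] - mean) * (series[i - lag] - mean) for i in range(lag, n))
    return num / denom


# Toy series: linear trend plus a repeating pattern of period 4
pattern = [0.0, 2.0, 0.0, -2.0]
series = [0.5 * t + pattern[t % 4] for t in range(40)]

diffed = difference(series)  # trend removed, seasonal pattern remains

print(f"ACF of differenced series at lag 2: {autocorrelation(diffed, 2):.2f}")
print(f"ACF of differenced series at lag 4: {autocorrelation(diffed, 4):.2f}")
```

A strong positive value at lag 4 and a negative one at lag 2 is the kind of signature that suggests a seasonal period of 4.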

Algorithm Family

Model | Best for
ARIMA | Stationary or differenced series, no seasonal component
SARIMA | Seasonal patterns with ARIMA
Prophet (Meta) | Business time series with holidays and multiple seasonalities
LSTM/GRU | Long sequences, complex non-linear patterns
XGBoost with lags | Tabular approach to time series
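
The "tabular approach" in the last row means turning past values into feature columns so that any regressor can be trained on them. A minimal sketch, with an invented sales series and a hypothetical helper `make_lag_features`:

```python
def make_lag_features(series, n_lags):
    """Build rows [y(t-1), y(t-2), ..., y(t-n_lags)] with target y(t)."""
    rows, targets = [], []
    for t in range(n_lags, len(series)):
        rows.append(series[t - n_lags:t][::-1])  # most recent value first
        targets.append(series[t])
    return rows, targets


# Invented sales figures
sales = [10, 12, 13, 15, 18, 21]
X, y = make_lag_features(sales, n_lags=3)

for features, target in zip(X, y):
    print(features, "->", target)
```

When evaluating such a model, split by time (train on earlier rows, test on later ones) rather than shuffling, otherwise future values leak into training.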

Project Idea

💡 Sales Forecast Dashboard

Build an Inventory Forecasting Dashboard using ARIMA and Prophet on retail sales data. Show actual vs. predicted with confidence intervals. Add Streamlit UI and a CSV upload feature. This maps directly to project-03 on this platform.


Interactive Notebook

Interactive Notebook: Time Series
Data centre power demand: ARIMA, Prophet, lag features + GBM

010 · hyperparameter-optimization

Hyperparameter Optimization

Systematically search for the best model configuration. This topic moves from manual tuning to automated, principled search.

Search Strategies

Strategy | How it works | Speed | Quality
Grid Search | Try every combination in a defined grid | Slow | Exhaustive over the grid
Random Search | Sample random combinations from distributions | Faster | Often better than grid for the same budget
Bayesian Optimization | Builds a surrogate model of the objective | Efficient | Best for expensive models
Optuna / Hyperopt | Tree-structured Parzen Estimator (TPE) | Very efficient | Strong default in practice
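
The difference between the first two strategies can be sketched without any ML library. Below, a made-up `objective` function stands in for a cross-validated score; grid search enumerates fixed values, while random search spends the same budget sampling from ranges. All names and numbers are illustrative assumptions.

```python
import itertools
import random


def objective(learning_rate, max_depth):
    """Stand-in for a cross-validated score; peaks at lr=0.1, depth=6."""
    return -((learning_rate - 0.1) ** 2) * 100 - ((max_depth - 6) ** 2) * 0.01


# Grid search: every combination of the listed values
grid = {"learning_rate": [0.01, 0.1, 0.3], "max_depth": [3, 6, 9]}
grid_trials = [dict(zip(grid, combo)) for combo in itertools.product(*grid.values())]
best_grid = max(grid_trials, key=lambda p: objective(**p))

# Random search: same number of trials, sampled from ranges (log-uniform for the rate)
rng = random.Random(0)
random_trials = [
    {"learning_rate": 10 ** rng.uniform(-2, -0.5), "max_depth": rng.randint(3, 9)}
    for _ in range(len(grid_trials))
]
best_random = max(random_trials, key=lambda p: objective(**p))

print("grid best:  ", best_grid)
print("random best:", best_random)
```

Random search tries nine distinct learning rates here instead of three, which is why it tends to do well when only a few hyperparameters really matter.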

Code Example: Optuna

optuna_search.py
import optuna
import xgboost as xgb
from sklearn.model_selection import cross_val_score

# X and y are assumed to be a loaded feature matrix and binary label vector
def objective(trial):
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 100, 500),
        'max_depth':    trial.suggest_int('max_depth', 3, 12),
        # The trial name matches the XGBoost argument so best_params can be reused directly
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
    }
    model = xgb.XGBClassifier(**params)
    score = cross_val_score(model, X, y, cv=3, scoring='roc_auc').mean()
    return score

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=50)
print(study.best_params)

Interactive Notebook

Interactive Notebook: Hyperparameter Opt.
3D printer quality: Grid Search, Optuna Bayesian optimisation

Machine Learning Complete!

Pass all topic quizzes, then claim your course certificate.
