
ML Models at a Glance

When to use · Key params · Metrics · sklearn code
mitraaiprojects.com

Regression Models

Model | When to use | Key params
Linear Reg | Linear relationships, interpretability | none
Ridge (L2) | Multicollinearity, many features | alpha
Lasso (L1) | Feature selection (drives some coefficients to exactly zero) | alpha
ElasticNet | Correlated + irrelevant features | alpha, l1_ratio
Polynomial | Curved relationships (expand features, then fit linearly) | degree
SVR | Small-to-medium data, non-linear | C, kernel, epsilon
Random Forest | Non-linear, robust to outliers | n_estimators
XGBoost | Tabular data, often top accuracy | learning_rate, n_estimators, max_depth
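The practical difference between the L2 and L1 penalties is that Lasso sets some coefficients to exactly zero while Ridge only shrinks them. The sketch below makes that visible. It assumes a synthetic dataset from make_regression in which 5 of the 20 features are informative. The alpha and l1_ratio values are illustrative and not tuned.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet

# Synthetic data (assumption): 20 features, only 5 actually drive the target
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

models = {
    "Linear": LinearRegression(),
    "Ridge (alpha=1)": Ridge(alpha=1.0),
    "Lasso (alpha=1)": Lasso(alpha=1.0),
    "ElasticNet (alpha=1, l1_ratio=0.5)": ElasticNet(alpha=1.0, l1_ratio=0.5),
}

n_zero = {}
for name, model in models.items():
    model.fit(X, y)
    # Count coefficients that are exactly zero; only L1-based penalties produce them
    n_zero[name] = int(np.sum(model.coef_ == 0))
    print(f"{name:<36} zero coefficients: {n_zero[name]}")
```

Expect Linear and Ridge to report zero zeroed coefficients. Lasso should zero out most of the uninformative features.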

Regression Metrics

Metric | Formula | Best for
MAE | mean(abs(y-ŷ)) | Robust to outliers
MSE | mean((y-ŷ)²) | Penalises large errors
RMSE | √MSE | Same units as target
R² | 1 - SS_res/SS_tot | Explained variance
MAPE | mean(abs(y-ŷ)/abs(y)) | Scale-free %; undefined when any y = 0
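As a sanity check on the formulas above, here is a minimal sketch that computes each one with NumPy on a tiny hand-made example (the four values are arbitrary) and confirms it matches scikit-learn. Note that sklearn returns MAPE as a fraction, not a percentage.

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             r2_score, mean_absolute_percentage_error)

# Arbitrary toy values, chosen only for illustration
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])

# Formulas from the table, written out directly
mae = np.mean(np.abs(y_true - y_pred))
mse = np.mean((y_true - y_pred) ** 2)
rmse = np.sqrt(mse)
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
mape = np.mean(np.abs(y_true - y_pred) / np.abs(y_true))  # breaks if any y_true == 0

# sklearn equivalents agree
assert np.isclose(mae, mean_absolute_error(y_true, y_pred))
assert np.isclose(mse, mean_squared_error(y_true, y_pred))
assert np.isclose(r2, r2_score(y_true, y_pred))
assert np.isclose(mape, mean_absolute_percentage_error(y_true, y_pred))

print(f"MAE={mae:.3f} MSE={mse:.4f} RMSE={rmse:.3f} R2={r2:.3f} MAPE={mape:.1%}")
```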

Classification Models

Model | Strengths | Weaknesses | Key params
Logistic Reg | Fast, interpretable, well-calibrated probabilities | Linear decision boundary only | C, max_iter
Decision Tree | Interpretable, no scaling needed | Overfits easily | max_depth, min_samples_leaf
Random Forest | Robust, handles missing values, feature importances | Slower, hard to interpret | n_estimators, max_depth
XGBoost | Often top accuracy on tabular data, built-in regularisation | Many hyperparameters to tune | learning_rate, n_estimators, max_depth
LightGBM | Very fast boosting, scales to large data | Overfits small datasets | num_leaves, min_child_samples
SVM | Strong on small data, non-linear via kernels | Slow on large data | C, kernel, gamma
KNN | Simple, no training step, non-linear | Slow prediction, needs feature scaling | n_neighbors, metric
Naive Bayes | Fast, good for text and small data | Assumes feature independence | var_smoothing (GaussianNB), alpha (MultinomialNB)
MLP | Learns complex patterns, flexible | Black box, slow to train | hidden_layer_sizes, learning_rate_init
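To get a quick baseline across several of the models above, you can cross-validate each one with the same splits and the same metric. This is a minimal sketch that assumes sklearn's built-in breast-cancer dataset. It covers only the sklearn estimators: XGBoost and LightGBM are separate packages, and MLP is left out to keep the run short. The hyperparameter values are illustrative defaults, not tuned choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)  # example dataset (assumption)

# Scale-sensitive models (LogReg, SVM, KNN) get a StandardScaler; trees don't need one
candidates = {
    "Logistic Reg": make_pipeline(StandardScaler(), LogisticRegression(C=1.0, max_iter=1000)),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(C=1.0, kernel="rbf", gamma="scale")),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Naive Bayes": GaussianNB(),
}

results = {}
for name, clf in candidates.items():
    # Same 5 folds and same metric for every model, so the scores are comparable
    scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
    results[name] = scores.mean()
    print(f"{name:<14} AUC {scores.mean():.3f} ± {scores.std():.3f}")
```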

Classification Metrics

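Metric | Formula | Use when
Accuracy | correct/total | Balanced classes
Precision | TP/(TP+FP) | False positives costly (e.g. spam filtering)
Recall | TP/(TP+FN) | False negatives costly (e.g. cancer screening)
F1 | 2*P*R/(P+R) | Imbalanced classes
AUC-ROC | Area under ROC curve | Ranking quality, threshold-independent
PR-AUC | Area under precision-recall curve | Highly imbalanced classes
MCC | (TP*TN - FP*FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN)) | Very imbalanced; uses all four confusion-matrix cells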
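Every metric in the table except the AUCs comes from the four confusion-matrix counts. The sketch below works through the formulas on a hand-made set of 10 labels (arbitrary, for illustration). Here TP=3, FP=1, FN=1 and TN=5, so MCC = (3*5 - 1*1)/√(4*4*6*6) = 14/24. Each hand-computed value is checked against scikit-learn.

```python
import math
from sklearn.metrics import (confusion_matrix, accuracy_score, precision_score,
                             recall_score, f1_score, matthews_corrcoef)

# Arbitrary toy labels: 4 positives, 6 negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# For binary labels {0, 1}, ravel() yields the counts in this order
tn, fp, fn, tp = (int(v) for v in confusion_matrix(y_true, y_pred).ravel())

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

# Cross-check the hand-written formulas against sklearn
assert math.isclose(accuracy, accuracy_score(y_true, y_pred))
assert math.isclose(precision, precision_score(y_true, y_pred))
assert math.isclose(recall, recall_score(y_true, y_pred))
assert math.isclose(f1, f1_score(y_true, y_pred))
assert math.isclose(mcc, matthews_corrcoef(y_true, y_pred))

print(f"TP={tp} FP={fp} FN={fn} TN={tn}  P={precision:.2f} R={recall:.2f} "
      f"F1={f1:.2f} MCC={mcc:.3f}")
```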

Unsupervised Models

Model | Use | Key params
K-Means | Spherical clusters | n_clusters
DBSCAN | Arbitrary-shape clusters, outliers | eps, min_samples
Hierarchical | Dendrogram, choose K after fitting | n_clusters, linkage
PCA | Dimensionality reduction | n_components
t-SNE | Visualisation only | perplexity
Isolation Forest | Anomaly detection | contamination
LOF | Local-density anomaly detection | n_neighbors
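A few of these models can be run on the same data to see how their outputs differ. This is a minimal sketch that assumes synthetic make_blobs data with 3 true clusters. The eps, contamination and other values are illustrative only. In sklearn's convention, DBSCAN labels noise as -1 and IsolationForest labels anomalies as -1.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, DBSCAN
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

# Synthetic data (assumption): 300 points, 3 blobs, 5 features, then standardised
X, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=0)
X = StandardScaler().fit_transform(X)

# K-Means: you must choose n_clusters up front
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# DBSCAN: number of clusters is discovered from density; -1 marks noise points
dbscan_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# PCA: project to 2 components, e.g. for plotting
X_2d = PCA(n_components=2).fit_transform(X)

# Isolation Forest: contamination is the expected share of anomalies; -1 = anomaly
iso_labels = IsolationForest(contamination=0.05, random_state=0).fit_predict(X)
n_anomalies = int(np.sum(iso_labels == -1))

print("K-Means clusters:", len(set(kmeans_labels)))
print("DBSCAN clusters (excluding noise):", len(set(dbscan_labels) - {-1}))
print("PCA output shape:", X_2d.shape)
print("Isolation Forest anomalies:", n_anomalies)
```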

sklearn Cheat Sheet

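from sklearn.datasets import load_breast_cancer  # example dataset; use your own X, y
from sklearn.linear_model import LogisticRegression  # swap in any estimator from the tables above
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report
from sklearn.pipeline import Pipeline

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Pipeline (prevents leakage: the scaler is fit on training data only)
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('model', LogisticRegression(max_iter=1000))
])
pipe.fit(X_train, y_train)
y_pred = pipe.predict(X_test)
print(accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

# Cross-validation (the scaler is refit inside each training fold)
cv_scores = cross_val_score(pipe, X_train, y_train, cv=5, scoring='roc_auc')
print(f"AUC: {cv_scores.mean():.3f} ± {cv_scores.std():.3f}")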

Hyperparameter Tuning

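import optuna
from sklearn.datasets import load_breast_cancer  # example dataset; use your own X, y
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Grid Search: tries every value in the grid
model = make_pipeline(StandardScaler(), SVC())
grid = GridSearchCV(model, {'svc__C': [0.1, 1, 10]}, cv=5)  # step name 'svc' set by make_pipeline
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)

# Optuna (Bayesian; default sampler is TPE)
def objective(trial):
    C = trial.suggest_float('C', 0.01, 10, log=True)
    model = make_pipeline(StandardScaler(), SVC(C=C))
    return cross_val_score(model, X_train, y_train, cv=3).mean()

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=50)
print(study.best_params)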

Feature Engineering Quick Ref

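Transform | Code | Note
Log | np.log1p(df["col"]) | Handles zeros; needs values > -1
Sqrt | np.sqrt(df["col"]) | Non-negative values only
Box-Cox | stats.boxcox(df["col"]) | scipy.stats; strictly positive values; returns (values, lambda)
Bin | pd.cut(df["col"], 5) | 5 equal-width bins
Interaction | df["a"] * df["b"] |
Polynomial | PolynomialFeatures(2) | sklearn.preprocessing
TF-IDF | TfidfVectorizer() | sklearn.feature_extraction.text
Label enc | LabelEncoder() | Intended for the target y; use OrdinalEncoder() for feature columns
One-hot | pd.get_dummies(df["col"]) |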
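The transforms above can all be applied to one small DataFrame, shown in this minimal sketch. The column names and values ("income", "a", "b", "city", "review") are hypothetical, made up for illustration. For feature columns it uses OrdinalEncoder instead of LabelEncoder, as the table note suggests.

```python
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.preprocessing import PolynomialFeatures, OrdinalEncoder
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy data
df = pd.DataFrame({
    "income": [20_000.0, 35_000.0, 50_000.0, 120_000.0, 400_000.0],  # right-skewed
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [0.5, 0.5, 1.0, 1.0, 2.0],
    "city": ["Paris", "Tokyo", "Paris", "Lima", "Tokyo"],
    "review": ["great product", "bad product", "great value", "ok", "bad value"],
})

# Skew-reducing transforms
df["log_income"] = np.log1p(df["income"])
df["sqrt_income"] = np.sqrt(df["income"])
boxcox_vals, lam = stats.boxcox(df["income"])  # needs strictly positive input
df["boxcox_income"] = boxcox_vals

# Binning (3 equal-width bins, integer labels) and a simple interaction term
df["income_bin"] = pd.cut(df["income"], 3, labels=False)
df["a_x_b"] = df["a"] * df["b"]

# Degree-2 polynomial features of a, b: a, b, a^2, a*b, b^2
poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(df[["a", "b"]])

# Categorical encodings: ordinal codes vs one-hot columns
city_ord = OrdinalEncoder().fit_transform(df[["city"]])
city_onehot = pd.get_dummies(df["city"], prefix="city")

# TF-IDF on free text: sparse matrix, one column per vocabulary term
tfidf = TfidfVectorizer().fit_transform(df["review"])

print(df[["income", "log_income", "boxcox_income", "income_bin"]])
print("poly:", poly.shape, "one-hot:", list(city_onehot.columns), "tfidf:", tfidf.shape)
```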