
AI/ML Interview Prep — Quick Reference

Concepts · Formulas · Trade-offs · One-liners to memorise
mitraaiprojects.com

Must-Know Definitions (1 line each)

Term | One-liner
Overfitting | Model memorises training data, fails on new data
Underfitting | Model too simple, misses patterns in training data
Bias | Error from wrong assumptions: systematic misfit
Variance | Error from sensitivity to noise in the training set
Cross-entropy | −Σ y·log(p): entropy of y plus KL(y‖p), so minimising it minimises the divergence from the true distribution
Gradient descent | Iteratively move weights in the negative gradient direction
Backpropagation | Chain rule applied to compute gradients in neural nets
Attention | softmax(QK^T/√d_k)·V: weighted sum over values
Embedding | Dense vector representing a token/item semantically
Transfer learning | Reuse pretrained features for a new, related task
RAG | Retrieve relevant docs → inject as context → generate answer
Fine-tuning | Continue training a pretrained model on task-specific data
LoRA | Add trainable low-rank matrices to frozen pretrained weights
RLHF | SFT → reward model trained on human preferences → PPO to optimise the policy against it
Hallucination | LLM generates confident but factually wrong information
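The LoRA definition (frozen W plus a trainable low-rank product B·A) can be sketched in NumPy. This is a minimal illustration and not any library's API. The class name LoRALinear, the rank r = 8, the 512×512 shape and the zero initialisation of B are assumptions. Zero-initialising B is a common choice because it makes training start exactly from the pretrained model. The usual α/r scaling factor is left out so the code matches the plain W + B·A form.

```python
import numpy as np

class LoRALinear:
    """Hypothetical LoRA-style linear layer: y = x (W + B A)^T with W frozen."""

    def __init__(self, W, r, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                    # frozen pretrained weight (d_out x d_in)
        self.A = rng.normal(0.0, 0.01, size=(r, d_in))  # trainable, small random init
        self.B = np.zeros((d_out, r))                 # trainable, zero init => no change at start

    def effective_weight(self):
        return self.W + self.B @ self.A               # W + B·A, rank of the update <= r

    def __call__(self, x):
        return x @ self.effective_weight().T

    def trainable_params(self):
        return self.A.size + self.B.size              # r * (d_in + d_out)

W = np.random.default_rng(1).normal(size=(512, 512))
layer = LoRALinear(W, r=8)
x = np.ones((2, 512))
print(np.allclose(layer(x), x @ W.T))        # True: B starts at zero
print(layer.trainable_params(), W.size)      # 8192 262144
```

Only about 3% of the parameters of this layer would be trained, which is the point of the technique.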
HallucinationLLM generates confident but factually wrong information

Key Formulas

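Formula | Name
TP/(TP+FP) | Precision
TP/(TP+FN) | Recall
2PR/(P+R) | F1 score
1 − SS_res/SS_tot | R² score
mean((y − ŷ)²) | MSE
softmax(QK^T/√d_k)·V | Attention
−Σ y·log(p) | Cross-entropy
w ← w − α·∇L(w) | Gradient descent (step against the gradient)
W + B·A (B: d×r, A: r×k, r ≪ min(d, k)) | LoRA update
λ·[ρ·Σ|w| + (1−ρ)·Σw²] | Elastic Net penalty (ρ = L1 ratio)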
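A minimal sketch of the gradient-descent update rule, assuming a toy one-dimensional loss L(w) = (w − 3)² chosen only for illustration (its gradient is 2(w − 3)). Stepping against the gradient converges to the minimum at w = 3. Flipping the sign to w += α·∇L would climb the loss instead.

```python
def grad(w):
    # dL/dw for the toy loss L(w) = (w - 3)^2
    return 2 * (w - 3)

w, lr = 0.0, 0.1
for _ in range(100):
    w -= lr * grad(w)   # w <- w - alpha * grad L(w)

print(round(w, 4))      # 3.0
```

Each step here shrinks the distance to the minimum by a factor of (1 − 2·0.1) = 0.8, so 100 steps are plenty.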

Algorithm Complexity

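Algorithm | Train | Predict
Linear Regression (normal equation) | O(nd² + d³) | O(d)
KNN (brute force) | O(1) | O(nd)
Decision Tree | O(n·d·log n) | O(depth)
Random Forest | O(T·n·d·log n) | O(T·depth)
SVM (kernel) | O(n²) to O(n³) | O(sv·d)
Self-attention (per layer) | O(n²·d) | O(n²·d)
n = samples (sequence length for self-attention), d = features, T = trees, sv = support vectors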
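The O(n²·d) cost of self-attention comes from the n×n score matrix QK^T. What follows is a minimal single-head NumPy sketch. It assumes no masking and no learned projections (Q = K = V = X for brevity), and the sizes n = 6 and d = 4 are made up.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for a single head, no masking."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (n, n): the O(n^2 * d) step
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # each row sums to 1
    return weights @ V, weights                       # weighted sum over values

rng = np.random.default_rng(0)
n, d = 6, 4
X = rng.normal(size=(n, d))
out, attn = scaled_dot_product_attention(X, X, X)
print(out.shape, attn.shape)   # (6, 4) (6, 6)
```

Doubling n quadruples the size of `attn`, which is why long contexts are expensive.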

Common Trade-offs to Know

Choice A vs Choice B | When to choose B
Accuracy vs Interpretability | Regulated domains (medical, legal)
Precision vs Recall | FN costly (cancer screening)
Bias (simple) vs Variance (complex) | More data → can afford more complexity
Fine-tuning vs RAG | Knowledge changes frequently
LSTM vs Transformer | Default for most sequence tasks when data and compute allow; LSTM can still suit small data or streaming
Ridge vs Lasso | Need feature selection
GPU training vs CPU inference | Small model → CPU saves cost
Deep model vs Ensemble | Structured/tabular → tree ensemble
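The precision vs recall trade-off is usually tuned through the decision threshold. The small sketch below uses hypothetical labels and scores made up for illustration. Lowering the threshold raises recall at the cost of precision.

```python
# Hypothetical ground truth and model scores (invented for the example)
y_true = [0, 0, 1, 0, 1, 0, 1, 1]
scores = [0.10, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]

def precision_recall_at(threshold):
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

for th in (0.3, 0.5, 0.75):
    p, r = precision_recall_at(th)
    print(f"threshold={th}: precision={p:.2f} recall={r:.2f}")
# threshold=0.3: precision=0.57 recall=1.00
# threshold=0.5: precision=0.75 recall=0.75
# threshold=0.75: precision=1.00 recall=0.50
```

For a cancer-screening style problem you would pick a low threshold and accept the extra false positives.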

Evaluation Metrics Quick Ref

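Metric | Task | Key insight
Accuracy | Classification | Misleading for imbalanced classes
AUC-ROC | Binary classification | Threshold-independent ranking quality
PR-AUC | Imbalanced classification | More informative than ROC when positives are rare
BLEU | Translation | Modified n-gram precision × brevity penalty (outputs shorter than the reference are penalised)
ROUGE-L | Summarisation | Longest-common-subsequence overlap (recall-oriented)
MAPE | Forecasting | Scale-free %, blows up (or is undefined) when actuals ≈ 0
Silhouette | Clustering | Range −1 to 1: ≈1 well separated, ≈0 overlapping, negative = likely in the wrong cluster
Perplexity | Language modelling | exp(mean negative log-likelihood); lower = model assigns higher probability to held-out text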
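Perplexity is the exponential of the mean negative log-likelihood the model assigns to the true tokens. The minimal sketch below assumes you already have those per-token probabilities, and the values shown are invented.

```python
import math

def perplexity(token_probs):
    """exp(mean negative log-likelihood) of the probabilities given to the true tokens."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

print(perplexity([0.25] * 10))        # ~4.0: as uncertain as a uniform 4-way guess
print(perplexity([0.9, 0.8, 0.95]))   # close to 1: these tokens are predicted well
```

A perplexity of k can be read as "on average, as uncertain as choosing uniformly among k tokens".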

Viva-Killer Questions with Answers

Question | Answer keyword
Why scale before SVM? | The margin (and RBF kernel distances) depend on feature scale, so large-range features dominate
Random Forest vs XGBoost? | RF: parallel bagging, variance reduction. XGB: sequential boosting, bias reduction
Why use log loss, not MSE, for classification? | MSE penalises confident wrong predictions too weakly and its gradient vanishes through a saturated sigmoid; log loss grows without bound as p(true class) → 0
Why not always use Adam instead of SGD? | SGD (with momentum) often generalises better for CNNs; Adam converges faster and is standard for NLP/transformers
Vanishing gradient fix? | ReLU (no saturation), ResNet skip connections, LSTM gates, BatchNorm
What is k in k-fold? | Number of equal-sized folds; k = 5 or 10 is standard. Each fold serves as the validation set exactly once
Why use attention instead of an RNN? | Parallel (not sequential) computation, a direct path between any two positions for long-range dependencies, no vanishing gradients through recurrence
What is data leakage? | Test (or future) information leaks into training, e.g. fitting a scaler on the full data before the split
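The k-fold and data-leakage answers fit together, because scaling statistics should be computed on the training folds only. The minimal NumPy sketch below uses synthetic data. The helper name kfold_indices and the z-score scaling are assumptions for illustration, and the model-fitting step is left as a placeholder comment.

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Hypothetical helper: shuffle 0..n-1 and split into k nearly equal folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

rng = np.random.default_rng(42)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))   # synthetic features
folds = kfold_indices(len(X), k=5)

for i, val_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    # Fit scaling statistics on the training folds only (no leakage from val_idx)
    mu = X[train_idx].mean(axis=0)
    sigma = X[train_idx].std(axis=0)
    X_train = (X[train_idx] - mu) / sigma
    X_val = (X[val_idx] - mu) / sigma    # validation fold reuses the training stats
    # ... fit a model on X_train and score it on X_val here ...
```

Computing `mu` and `sigma` on all of `X` before the loop would be exactly the leakage described in the table.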

Coding Interview Patterns (ML)

Linear Regression (one gradient step on MSE, m = number of samples):
w -= lr * (2/m) * X.T @ (X @ w - y)
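Here is the one-line linear-regression update wrapped in a full training loop. This sketch assumes synthetic noiseless data and no intercept term. The learning rate of 0.1 was picked for this toy problem.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 200, 3
X = rng.normal(size=(m, d))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w                          # noiseless synthetic targets (demo assumption)

w = np.zeros(d)
lr = 0.1
for _ in range(500):
    grad = (2 / m) * X.T @ (X @ w - y)  # gradient of mean((X w - y)^2)
    w -= lr * grad

print(np.round(w, 3))                   # close to true_w
```

With standardised features the loss is well conditioned, so a fixed learning rate converges quickly. Unscaled features would need a smaller step.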

Softmax:
def softmax(x):
    e = np.exp(x - x.max())   # subtract max for numerical stability
    return e / e.sum()
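This is a slightly more general version of the softmax snippet that works along any axis, for example row-wise over a batch of logits. Subtracting the max does not change the result, because softmax is unchanged when the same constant is added to every input. It does, however, prevent overflow in np.exp for large logits.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # largest exponent becomes exp(0) = 1
    return e / e.sum(axis=axis, keepdims=True)

big = np.array([1000.0, 1001.0, 1002.0])            # naive np.exp(big) would overflow to inf
print(softmax(big))                                 # finite, equal to softmax([0, 1, 2])
print(softmax(np.array([[1.0, 2.0], [3.0, 3.0]])).sum(axis=1))  # each row sums to 1
```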

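K-Means (one iteration):
dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)  # (n, K) distances
labels = np.argmin(dist, axis=1)                                       # assign step
centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])  # update step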
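The K-means snippet is one iteration of Lloyd's algorithm, and a complete loop might look like the sketch below. The farthest-first initialisation, the stopping rule (centroids stop moving) and the handling of empty clusters are assumptions chosen to keep the example deterministic. k-means++ is the more common initialisation in practice, and a purely random start can get stuck in a poor local optimum.

```python
import numpy as np

def init_farthest_first(X, K):
    """Hypothetical init: start at X[0], then repeatedly add the point farthest from all chosen centroids."""
    centroids = [X[0]]
    for _ in range(1, K):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    return np.array(centroids)

def kmeans(X, K, n_iter=100):
    centroids = init_farthest_first(X, K)
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)  # (n, K)
        labels = dist.argmin(axis=1)                                          # assign step
        new_centroids = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]  # keep empty clusters in place
            for k in range(K)
        ])                                                                    # update step
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

rng = np.random.default_rng(1)
blob_a = rng.normal(loc=(0, 0), scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=(5, 5), scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])
labels, centroids = kmeans(X, K=2)
```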

Cosine similarity:
np.dot(a,b) / (np.linalg.norm(a)*np.linalg.norm(b))
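The cosine-similarity formula can also be written as a reusable function. The small eps in the denominator makes a zero vector return 0 instead of nan. It is an assumption of this sketch, not part of the standard definition.

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-12):
    """cos(theta) = a.b / (|a| |b|); eps guards against zero vectors (assumption)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

print(cosine_similarity([1, 2, 3], [2, 4, 6]))   # ~1.0: same direction, length ignored
print(cosine_similarity([1, 0], [0, 1]))         # 0.0: orthogonal
print(cosine_similarity([1, 0], [-1, 0]))        # ~-1.0: opposite directions
```

Because magnitude is ignored, cosine similarity is the usual choice for comparing embeddings.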

Precision/Recall:
precision = TP / (TP + FP)
recall    = TP / (TP + FN)
f1 = 2*precision*recall / (precision+recall)
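The sketch below counts TP, FP and FN directly from 0/1 label arrays and guards against empty denominators. Returning 0.0 in that case is a convention choice (some libraries make it configurable). The function name and example labels are hypothetical.

```python
import numpy as np

def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall, F1 from 0/1 arrays; 0.0 when a denominator is zero."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

print(precision_recall_f1([1, 0, 1, 1, 0], [1, 1, 1, 0, 0]))  # all three ~0.667 (TP=2, FP=1, FN=1)
```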

RMSE:
np.sqrt(np.mean((y_true - y_pred)**2))
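This hypothetical helper puts RMSE alongside MSE and the R² formula from the Key Formulas table. R² is undefined when all targets are equal (SS_tot = 0), and this sketch does not guard against that case.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (MSE, RMSE, R^2) as plain floats."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)
    rmse = np.sqrt(mse)                              # same units as y
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot                         # R^2 = 1 - SS_res / SS_tot
    return float(mse), float(rmse), float(r2)

y = [3.0, 5.0, 7.0, 9.0]
print(regression_metrics(y, [3.0, 5.0, 7.0, 9.0]))  # (0.0, 0.0, 1.0): perfect fit
print(regression_metrics(y, [6.0, 6.0, 6.0, 6.0]))  # always predicting the mean gives R^2 = 0.0
```

A negative R² means the model does worse than simply predicting the mean of y.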