track 01
EDA with Pandas
Systematic exploration before modelling. Understand shape, types, missing values, distributions, correlations, and outliers.
The EDA Checklist
| Step | Code | What you learn |
|---|---|---|
| Shape | df.shape | Rows, columns |
| Dtypes | df.dtypes | Numeric, categorical, datetime |
| Missing | df.isnull().sum() | Null count per column |
| Stats | df.describe() | Min, max, mean, std, quartiles |
| Unique vals | df['col'].value_counts() | Category frequencies |
| Correlations | df.corr(numeric_only=True) | Linear relationships between numeric columns |
| Outliers | IQR / Z-score | Extreme values |
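
As a rough illustration, the checklist can be run top to bottom on a small frame. The DataFrame below is synthetic and its column names (`age`, `income`, `city`) are made up for this sketch; `numeric_only=True` is passed to `corr()` so the text column is skipped.

```python
import numpy as np
import pandas as pd

# Synthetic data, invented purely for illustration.
df = pd.DataFrame({
    "age": [23, 35, 41, np.nan, 29, 52],
    "income": [32_000, 48_000, 51_000, 45_000, np.nan, 250_000],
    "city": ["Pune", "Delhi", "Pune", "Mumbai", "Delhi", "Pune"],
})

shape = df.shape                          # (rows, columns)
dtypes = df.dtypes                        # dtype of each column
missing = df.isnull().sum()               # null count per column
stats = df.describe()                     # summary stats, numeric columns only
city_counts = df["city"].value_counts()   # frequency of each category
corr = df.corr(numeric_only=True)         # pairwise Pearson correlation

print(shape)
print(missing)
print(city_counts)
```

Keeping each result in its own variable makes it easy to revisit a step later, for example when a surprising value in `stats` sends you back to `missing`.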
Outlier Detection
outliers.py
# IQR method
Q1, Q3 = df['col'].quantile([0.25, 0.75])
IQR = Q3 - Q1
lower, upper = Q1 - 1.5*IQR, Q3 + 1.5*IQR
outliers = df[(df['col'] < lower) | (df['col'] > upper)]
# Z-score method
z = (df['col'] - df['col'].mean()) / df['col'].std()
outliers = df[z.abs() > 3]
Interactive Notebook
Notebook: EDA with Pandas
shape, dtypes, missing values, distributions, correlations, outlier detection
First load ~30-60s · Saves automatically
track 02
Matplotlib Deep Dive
Full control over every visual element. Publication-quality charts with the object-oriented API.
Object-Oriented API
matplotlib_oo.py
import matplotlib.pyplot as plt
# x and y are assumed to be equal-length sequences defined elsewhere
fig, ax = plt.subplots(figsize=(9, 5))
ax.plot(x, y, color='#6b21a8', lw=2, label='data')
ax.set_title('Title', fontweight='bold')
ax.set_xlabel('X Label')
ax.legend()
ax.grid(alpha=0.3)
# hide the top and right spines for a cleaner frame
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.tight_layout()
plt.show()
Subplots with Gridspec
gridspec.py
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
fig = plt.figure(figsize=(12, 6))
gs = gridspec.GridSpec(2, 3)  # 2 rows x 3 columns
ax1 = fig.add_subplot(gs[:, 0:2])  # both rows, first two columns
ax2 = fig.add_subplot(gs[0, 2])  # top-right
ax3 = fig.add_subplot(gs[1, 2])  # bottom-right
Interactive Notebook
Notebook: Matplotlib Deep Dive
line charts, subplots, gridspec, annotations, custom formatters
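
The annotation and custom-formatter topics listed above are not covered by the snippets in this track, so here is a minimal sketch of both on a single object-oriented axes. The data is synthetic, `thousands` is a hypothetical helper name, and the `Agg` backend is selected only so the script runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

# Synthetic monthly values, invented for illustration.
months = list(range(1, 13))
revenue = [12000, 13500, 12800, 15000, 16200, 19800,
           18100, 17500, 16900, 17200, 18000, 18500]

def thousands(value, _pos):
    """Hypothetical helper: format a tick value such as 15000 as '15k'."""
    return f"{value / 1000:.0f}k"

fig, ax = plt.subplots(figsize=(9, 5))
ax.plot(months, revenue, color="#6b21a8", lw=2)
ax.yaxis.set_major_formatter(FuncFormatter(thousands))

# Annotate the maximum with an arrow pointing at the data point.
peak_idx = revenue.index(max(revenue))
ax.annotate(
    "peak",
    xy=(months[peak_idx], revenue[peak_idx]),
    xytext=(months[peak_idx] + 1.5, revenue[peak_idx]),
    arrowprops={"arrowstyle": "->"},
    va="center",
)
ax.set_xlabel("Month")
ax.set_ylabel("Revenue")
fig.tight_layout()

# Render to an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=150)
```

`FuncFormatter` wraps any function that takes a tick value and position and returns a string, so the same pattern works for currency or percentage labels.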
track 03
Seaborn Statistical Charts
Built-in statistical summaries, beautiful defaults, and FacetGrid for multi-panel plots.
Key Seaborn Plots
| Function | Use | Key params |
|---|---|---|
| histplot() | Distribution + optional KDE | kde=True, hue= |
| boxplot() | Quartiles + outliers by group | x=, y=, hue= |
| violinplot() | Full distribution shape | inner='quartile' |
| regplot() | Scatter + regression line + CI | line_kws=, ci= |
| heatmap() | Matrix of values | annot=True, fmt='.2f' |
| FacetGrid() | Multi-panel by category | col=, row=, hue= |
Interactive Notebook
Notebook: Seaborn Statistical
histplot, boxplot, violin, regplot, heatmap, FacetGrid
track 04
Plotly Interactive Charts
Hover, zoom, filter, animate. Charts that respond to the user in the browser.
Plotly Express Quick Reference
| Function | Chart type |
|---|---|
| px.scatter() | Scatter with optional colour, size, animation |
| px.bar() | Bar chart, supports animation_frame |
| px.line() | Line chart |
| px.histogram() | Histogram |
| px.box() | Box plot |
| px.choropleth() | World map coloured by value |
| px.treemap() | Hierarchical treemap |
Save Interactive Chart
save.py
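# fig is an existing Plotly figure, e.g. one returned by a px.* function
fig.write_html('chart.html')  # standalone interactive file to share
fig.write_image('chart.png')  # static PNG; requires the kaleido package
fig.write_image('chart.pdf')  # PDF for reports; also requires kaleido
Interactive Notebook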
Notebook: Plotly Interactive
scatter, animated bar, multi-panel subplots, hover, choropleth
track 05
Real-World EDA Project
End-to-end analysis from raw data to actionable business recommendations.
EDA Project Structure
| Phase | Output |
|---|---|
| 1. Business question | What decision will this analysis inform? |
| 2. Data understanding | Shape, dtypes, missing, quality issues |
| 3. Univariate analysis | Distribution of each variable |
| 4. Bivariate analysis | Relationships between pairs of variables |
| 5. Multivariate analysis | Group comparisons, correlation heatmap |
| 6. Insights | Patterns in plain English with numbers |
| 7. Recommendations | Specific, prioritised, actionable next steps |
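
Phases 3 to 6 can be sketched in a few lines of pandas. The `sales` table below is synthetic, and its columns (`region`, `revenue`, `headcount`) are assumptions chosen for this example, not data from a real project.

```python
import pandas as pd

# Synthetic sales records, invented for illustration.
sales = pd.DataFrame({
    "region": ["North", "South", "South", "East",
               "North", "South", "East", "West"],
    "revenue": [120, 200, 180, 90, 110, 220, 100, 80],
    "headcount": [5, 4, 3, 4, 6, 4, 5, 4],
})

# Phase 3, univariate: distribution of a single variable.
revenue_summary = sales["revenue"].describe()

# Phase 4, bivariate: relationship between a pair of variables.
rev_vs_headcount = sales["revenue"].corr(sales["headcount"])

# Phase 5, group comparison: totals per region as a share of the whole.
by_region = sales.groupby("region")[["revenue", "headcount"]].sum()
share = by_region / by_region.sum() * 100

# Phase 6, insight: state the pattern in plain English with numbers.
top = share["revenue"].idxmax()
insight = (
    f"{top} drives {share.loc[top, 'revenue']:.0f}% of revenue "
    f"with {share.loc[top, 'headcount']:.0f}% of headcount"
)
print(insight)  # South drives 55% of revenue with 31% of headcount
```

Building the insight sentence from the computed values keeps the wording and the numbers in sync when the data is refreshed.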
Interactive Notebook
Notebook: Real-World EDA Project
full EDA pipeline from raw data to actionable insights and recommendations
track 06
Storytelling with Data
Charts exist to communicate insights, not display data. Every design choice should serve the message.
The 5 Principles
| Principle | In practice |
|---|---|
| Right chart type | Bar for comparison, line for trend, scatter for correlation |
| Remove chart junk | No 3D, no heavy borders, minimal gridlines |
| Pre-attentive attributes | Use colour to highlight ONE thing only |
| Direct labelling | Label lines directly instead of using a legend |
| One story per chart | Split complex charts into focused panels |
Title Examples
| Bad | Good |
|---|---|
| Monthly Revenue | Revenue peaked in June -- H1 target exceeded by 9% |
| Student Scores | Students studying less than 3 hrs/week fail 42% more often |
| Sales by Region | South drives 38% of revenue but only 22% of headcount |
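
Several of the principles and the insight-style title can be combined in one small Matplotlib chart. This is a sketch under stated assumptions: the region shares are synthetic, the colours are arbitrary, and the `Agg` backend is used only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Synthetic revenue share by region (percent), invented for illustration.
regions = ["North", "East", "West", "South"]
share = [24, 21, 17, 38]

# Pre-attentive attribute: grey for context, one accent colour for the story.
top_idx = share.index(max(share))
colors = ["#c4c4c4"] * len(regions)
colors[top_idx] = "#6b21a8"

fig, ax = plt.subplots(figsize=(7, 4))
bars = ax.barh(regions, share, color=colors)

# Direct labelling: print each value at the end of its bar, no legend.
ax.bar_label(bars, fmt="%d%%", padding=3)

# Remove chart junk: hide spines and the now-redundant x axis.
for side in ("top", "right", "bottom"):
    ax.spines[side].set_visible(False)
ax.xaxis.set_visible(False)

# Insight title built from the data rather than a generic label.
title = f"{regions[top_idx]} drives {share[top_idx]}% of revenue"
ax.set_title(title, loc="left", fontweight="bold")
fig.tight_layout()
```

A bar chart is used because the message is a comparison between categories; the same highlighting idea applies to a line chart when the message is a trend.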
Interactive Notebook
Notebook: Storytelling with Data
bad vs good charts, chart selection guide, IBCS colour rules
track 07
Tableau / Power BI Basics
BI tools for non-technical stakeholders. When to use Python vs when to use a BI tool.
Python vs BI Tools
| Use Python when | Use BI when |
|---|---|
| Custom statistical analysis | Executives need self-service |
| Data cleaning and wrangling | Connecting live databases |
| ML model building | Scheduled refresh dashboards |
| Complex custom charts | Simple interactive filters |
| Reproducible pipelines | Non-technical users |
Tableau vs Power BI
| Feature | Tableau | Power BI |
|---|---|---|
| Formula language | Tableau calc fields | DAX (Excel-like) |
| Best ecosystem | Any data source | Microsoft / Azure |
| Visualisation | Richer, more flexible | Good, improving fast |
| Price | Higher (roughly $70/user/mo, varies by licence tier) | Lower (roughly $10/user/mo, varies by licence tier) |
| Free tier | Tableau Public | Power BI Desktop |
Interactive Notebook
Notebook: Tableau / Power BI
Concepts and when to use BI tools (notebook focuses on Python EDA principles)