🛠 Workshop: Build a Professional Chart Set
Starting from the Session 9 EDA notebook → create 5 publication-quality charts (bar, line, scatter, heatmap, box) → customize colors, annotations and labels → combine them into a multi-panel figure → export PNG + SVG. All in a Jupyter Notebook!
🎯 Workshop goals
After completing this workshop, you will be able to:
- Create 5 different chart types (bar, line, scatter, heatmap, box plot) with data from the Session 9 EDA
- Customize charts professionally: a professional color palette, annotations, labels and Tufte-style decluttering
- Build a multi-panel figure: a 2×2 or 2×3 summary dashboard on a single page
- Export in high quality: PNG (300 dpi) + SVG (vector) for reports and presentations
- Apply IBCS and accessibility principles: consistent notation and a colorblind-safe palette
🧰 Requirements
| Requirement | Details |
|---|---|
| Knowledge | Completed Session 9 (EDA) + the Session 10 theory (Matplotlib & Seaborn) |
| Tools | Jupyter Notebook (local) OR Google Colab (online) |
| Python | Python 3.8+ |
| Libraries | pandas, numpy, matplotlib (3.6+ for the `seaborn-v0_8-*` style names used in Bonus 3), seaborn (preinstalled on Colab) |
| Input | HR Employee dataset from the Session 9 workshop (or regenerate it below) |
| Time | 75–100 minutes |
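💡 Naming convention
Name the notebook `HoTen_Buoi10_Visualization.ipynb`, replacing `HoTen` with your full name. Split the notebook into clear Markdown sections, following the Reproducible Analysis practice from Session 9!
📦 Dataset: HR Employee Analytics (from Session 9)
Using the Session 9 dataset
If you completed the Session 9 workshop, reuse that dataset. If not, copy the code below into Cell 1 to generate a new one. Even if you reuse the Session 9 dataset, still run the configuration, palette (`COLORS`, `PALETTE_5`), `departments` and monthly-revenue parts of this cell, because later cells depend on them.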
```python
# Cell 1: Setup — Libraries + Dataset
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.ticker import FuncFormatter
import warnings
warnings.filterwarnings("ignore")

# === CONFIG VISUALIZATION ===
plt.rcParams["figure.figsize"] = (12, 7)
plt.rcParams["font.size"] = 12
plt.rcParams["axes.titlesize"] = 14
plt.rcParams["axes.labelsize"] = 12
plt.rcParams["figure.dpi"] = 100
sns.set_style("whitegrid")
sns.set_palette("colorblind")  # Accessibility: colorblind-safe
pd.set_option("display.max_columns", None)
pd.set_option("display.float_format", "{:,.2f}".format)

# === PROFESSIONAL COLOR PALETTE ===
# Categorical colors: IBM Design Library colorblind-safe palette.
# Green/red are kept for IBCS variance coloring (pair them with labels, not color alone).
COLORS = {
    "primary": "#648FFF",    # Blue
    "secondary": "#DC267F",  # Magenta
    "accent1": "#FE6100",    # Orange
    "accent2": "#FFB000",    # Gold
    "accent3": "#785EF0",    # Violet
    "positive": "#2E7D32",   # Green (IBCS: positive variance)
    "negative": "#C62828",   # Red (IBCS: negative variance)
    "neutral": "#333333",    # Dark gray
    "light": "#B0B0B0",      # Light gray
}
PALETTE_5 = [COLORS["primary"], COLORS["secondary"],
             COLORS["accent1"], COLORS["accent2"], COLORS["accent3"]]

# === GENERATE HR EMPLOYEE DATASET (1,500 employees) ===
np.random.seed(42)
n = 1500
departments = ["Engineering", "Marketing", "Sales", "HR", "Finance"]
dept_weights = [0.35, 0.20, 0.25, 0.10, 0.10]
positions = ["Junior", "Mid", "Senior", "Lead", "Manager"]

data = []
for i in range(n):
    dept = np.random.choice(departments, p=dept_weights)
    exp = np.random.randint(0, 26)
    age = max(22, min(58, exp + np.random.randint(22, 28)))
    # Position is derived from years of experience
    if exp <= 2: pos = "Junior"
    elif exp <= 5: pos = "Mid"
    elif exp <= 10: pos = "Senior"
    elif exp <= 15: pos = "Lead"
    else: pos = "Manager"
    # Salary (million VND) = position base x department multiplier x ~15% random noise
    base_salary = {
        "Junior": 10, "Mid": 16, "Senior": 25, "Lead": 35, "Manager": 45
    }[pos]
    dept_bonus = {
        "Engineering": 1.15, "Finance": 1.10, "Marketing": 1.0,
        "Sales": 0.95, "HR": 0.90
    }[dept]
    salary = base_salary * dept_bonus * (1 + np.random.normal(0, 0.15))
    salary = max(8, round(salary, 1))
    perf = round(np.clip(np.random.normal(3.5, 0.8), 1, 5), 1)
    satisfaction = round(np.clip(5.5 - perf * 0.5 + np.random.normal(0, 0.6), 1, 5), 1)
    tenure = round(min(exp, max(0.5, np.random.exponential(4))), 1)
    training = int(np.clip(np.random.normal(40, 20), 0, 120))
    # Attrition: 15% base, raised for low satisfaction, unhappy high performers
    # and experienced-but-low-paid staff
    attrition_prob = 0.15
    if satisfaction < 2.5: attrition_prob += 0.25
    if perf > 4.0 and satisfaction < 3.0: attrition_prob += 0.15
    if exp > 5 and salary < 15: attrition_prob += 0.20
    attrition = "Yes" if np.random.random() < attrition_prob else "No"
    data.append({
        "employee_id": f"E{i+1:04d}",
        "department": dept,
        "position": pos,
        "age": age,
        "gender": np.random.choice(["Male", "Female"], p=[0.6, 0.4]),
        "salary": salary,
        "experience_years": exp,
        "tenure_years": tenure,
        "performance_score": perf,
        "satisfaction_score": satisfaction,
        "training_hours": training,
        "attrition": attrition,
    })
df = pd.DataFrame(data)

# Monthly revenue data (for the line chart), billion VND
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
monthly_revenue = [25, 28, 32, 30, 35, 38, 36, 40, 42, 46, 38, 58]

print(f"✅ Dataset loaded: {df.shape[0]:,} employees × {df.shape[1]} columns")
print(f"✅ Color palette: {len(PALETTE_5)} colorblind-safe colors")
print(f"✅ Monthly revenue: {len(months)} months")
print(f"\n📋 Departments: {df['department'].value_counts().to_dict()}")
print(f"📋 Attrition: {df['attrition'].value_counts().to_dict()}")
```
Part 1: Setup & Data Overview
Verify dataset, define helper functions, set professional defaults
Step 1.1: Helper Functions
```python
# Cell 2: Helper functions for professional charts
import os

def format_vnd(x, _):
    """Tick formatter: value in million VND, e.g. 25 -> '25M'"""
    return f"{x:.0f}M"

def format_billion(x, _):
    """Tick formatter: value in billion VND, e.g. 40 -> '40B'"""
    return f"{x:.0f}B"

def save_chart(fig, filename, formats=("png", "svg")):
    """Save chart in multiple formats at high quality (300 dpi for raster output)"""
    os.makedirs("charts", exist_ok=True)  # works even if Cell 3 has not been run yet
    for fmt in formats:
        filepath = f"charts/{filename}.{fmt}"
        fig.savefig(filepath, dpi=300, bbox_inches="tight",
                    facecolor="white", edgecolor="none")
        print(f"  💾 Saved: {filepath}")

def add_data_labels(ax, bars, fmt="{:.0f}", offset=0.5, fontsize=10):
    """Add data labels on top of vertical bars"""
    for bar in bars:
        height = bar.get_height()
        ax.text(bar.get_x() + bar.get_width() / 2., height + offset,
                fmt.format(height), ha="center", va="bottom",
                fontsize=fontsize, fontweight="bold")

def tufte_style(ax, keep_left=True):
    """Apply Tufte-style declutter to axes (spines[[...]] indexing needs Matplotlib 3.4+)"""
    ax.spines[["top", "right"]].set_visible(False)
    if not keep_left:
        ax.spines["left"].set_visible(False)
        ax.tick_params(left=False)

print("✅ Helper functions defined: format_vnd, format_billion, save_chart, add_data_labels, tufte_style")
```
Step 1.2: Create the output folder
```python
# Cell 3: Create the charts/ folder
import os
os.makedirs("charts", exist_ok=True)
print("✅ Created directory: charts/")
```
Part 2: Chart 1, Bar Chart (Comparison)
Purpose: compare headcount and average salary across the 5 departments, applying IBCS notation + Tufte-style decluttering.
Step 2.1: Bar Chart, Headcount by Department
```python
# Cell 4: Chart 1 — Bar Chart: Headcount + Avg Salary by Department
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))

# --- Panel A: Headcount by Department ---
dept_counts = df["department"].value_counts().sort_values(ascending=True)
bars1 = ax1.barh(dept_counts.index, dept_counts.values,
                 color=PALETTE_5[:len(dept_counts)], edgecolor="white", linewidth=0.5)
# Data labels: count and share of total headcount
for bar, val in zip(bars1, dept_counts.values):
    ax1.text(val + 5, bar.get_y() + bar.get_height() / 2,
             f"{val} ({val / len(df) * 100:.0f}%)",
             va="center", fontsize=11, fontweight="bold")
ax1.set_title("Headcount by Department", fontsize=14, fontweight="bold", loc="left")
ax1.set_xlabel("Number of Employees")
tufte_style(ax1)
ax1.set_xlim(0, max(dept_counts.values) * 1.25)

# --- Panel B: Average Salary by Department ---
dept_salary = df.groupby("department")["salary"].agg(["mean", "median"]).sort_values("mean")
y = np.arange(len(dept_salary))
bar_height = 0.35  # barh's third argument is the bar height (thickness)
# IBCS-style: Mean = solid, Median = outline
bars_mean = ax2.barh(y + bar_height / 2, dept_salary["mean"], bar_height,
                     color=COLORS["neutral"], label="Mean")
bars_median = ax2.barh(y - bar_height / 2, dept_salary["median"], bar_height,
                       facecolor="white", edgecolor=COLORS["neutral"],
                       linewidth=1.5, label="Median")
ax2.set_yticks(y)
ax2.set_yticklabels(dept_salary.index)
ax2.set_title("Avg Salary by Department — Mean vs Median",
              fontsize=14, fontweight="bold", loc="left")
ax2.set_xlabel("Salary (Million VND)")
ax2.xaxis.set_major_formatter(FuncFormatter(format_vnd))
ax2.legend(frameon=False, loc="lower right")
tufte_style(ax2)

# Plain-text title: the default Matplotlib font has no emoji glyphs
plt.suptitle("Chart 1: Department Overview", fontsize=16, y=1.02, fontweight="bold")
plt.tight_layout()
save_chart(fig, "01_bar_department")
plt.show()
```
Markdown interpretation (Cell 5): add a Markdown cell after Chart 1, for example:
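Interpretation of Chart 1:
- Engineering has the largest headcount (~35%), consistent with a tech company
- Engineering also has the highest average salary, in line with its 1.15 salary multiplier in the data generator
- Panel B compares mean and median: mean > median indicates a right-skewed distribution (a few high earners pull the mean up), mean < median a left-skewed one. Check which case your chart shows for each department before writing the conclusion; because salaries here are built from position bands with many Managers, the mean is not guaranteed to exceed the median
- A department where mean ≈ median has a more symmetric salary distribution; the departments with the largest mean-median gap deserve a closer look for pay inequality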
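The mean-vs-median reasoning above can also be checked with numbers instead of by eye. Below is a minimal sketch that uses only pandas. `skew_summary` is a hypothetical helper, not part of the workshop code, and the toy numbers are made up for illustration. On the workshop data you would call `skew_summary(df, "department", "salary")`.

```python
import pandas as pd

def skew_summary(data: pd.DataFrame, group_col: str, value_col: str) -> pd.DataFrame:
    """Per-group mean, median, their gap and the sample skewness of value_col."""
    summary = data.groupby(group_col)[value_col].agg(["mean", "median", "skew"])
    summary["gap"] = summary["mean"] - summary["median"]
    # The sign of the mean-median gap gives the skew direction
    summary["direction"] = summary["gap"].apply(
        lambda g: "right-skewed" if g > 0 else ("left-skewed" if g < 0 else "symmetric")
    )
    return summary.sort_values("gap")

# Hypothetical toy data: group A has one very high earner, group B one very low earner
toy = pd.DataFrame({
    "department": ["A"] * 5 + ["B"] * 5,
    "salary": [10, 10, 11, 12, 40, 5, 30, 31, 32, 32],
})
print(skew_summary(toy, "department", "salary"))
```

The `skew` column should agree in sign with `gap`. When the two disagree, the distribution is irregular (for example, multimodal), and the violin plot in Part 6 is the better tool.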
Part 3: Chart 2, Line Chart (Trend)
Purpose: show the monthly revenue trend, detect the November anomaly, and highlight it with annotations.
Step 3.1: Line Chart, Monthly Revenue Trend
```python
# Cell 6: Chart 2 — Line Chart: Monthly Revenue Trend
fig, ax = plt.subplots(figsize=(14, 6))

# Main line
ax.plot(months, monthly_revenue, color=COLORS["primary"],
        linewidth=2.5, marker="o", markersize=8, markerfacecolor="white",
        markeredgecolor=COLORS["primary"], markeredgewidth=2, zorder=3)
# Fill area under line
ax.fill_between(months, monthly_revenue, alpha=0.1, color=COLORS["primary"])

# Highlight anomaly: November (category position 10 on the x-axis)
# (46 -> 38 is -17.4%; annotation text is plain, the default font has no emoji glyphs)
ax.plot("Nov", 38, "o", markersize=14, markerfacecolor=COLORS["negative"],
        markeredgecolor="white", markeredgewidth=2, zorder=4)
ax.annotate("Anomaly: -17%\nvs Oct (46B → 38B)",
            xy=(10, 38), xytext=(7.5, 30),
            fontsize=10, fontweight="bold", color=COLORS["negative"],
            arrowprops=dict(arrowstyle="->", color=COLORS["negative"], lw=1.5),
            bbox=dict(boxstyle="round,pad=0.3", facecolor="#FFEBEE", edgecolor=COLORS["negative"]))

# Highlight peak: December (position 11)
ax.plot("Dec", 58, "o", markersize=14, markerfacecolor=COLORS["positive"],
        markeredgecolor="white", markeredgewidth=2, zorder=4)
ax.annotate("Peak: 58B\n+53% recovery",
            xy=(11, 58), xytext=(9, 62),
            fontsize=10, fontweight="bold", color=COLORS["positive"],
            arrowprops=dict(arrowstyle="->", color=COLORS["positive"], lw=1.5),
            bbox=dict(boxstyle="round,pad=0.3", facecolor="#E8F5E9", edgecolor=COLORS["positive"]))

# Data labels for all points
for i, (m, r) in enumerate(zip(months, monthly_revenue)):
    if m not in ["Nov", "Dec"]:  # Don't double-label annotated points
        ax.text(i, r + 1.5, f"{r}B", ha="center", fontsize=9, color=COLORS["neutral"])

# Trend line (linear regression)
x_numeric = np.arange(len(months))
z = np.polyfit(x_numeric, monthly_revenue, 1)
p = np.poly1d(z)
ax.plot(months, p(x_numeric), "--", color=COLORS["light"],
        linewidth=1.5, label=f"Trend: +{z[0]:.1f}B/month")

# Styling
ax.set_title("Monthly Revenue Trend — FY2025", fontsize=14, fontweight="bold", loc="left")
ax.set_xlabel("Month")
ax.set_ylabel("Revenue (Billion VND)")
ax.yaxis.set_major_formatter(FuncFormatter(format_billion))
ax.legend(frameon=False, loc="upper left")
tufte_style(ax)
ax.set_ylim(15, 70)

# Source footnote
ax.text(0, -0.12, "Source: Finance Dept | Note: Nov dip under investigation (seasonal or one-time event)",
        transform=ax.transAxes, fontsize=8, color="gray", style="italic")
plt.tight_layout()
save_chart(fig, "02_line_revenue_trend")
plt.show()
```
Markdown interpretation (Cell 7):
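Interpretation of Chart 2:
- Revenue follows a steady upward trend from Jan to Oct, roughly +2B per month (the full-year fitted trend line is about +2.2B/month)
- November anomaly: a drop of about 17% (46B → 38B) that breaks the trend. Investigate whether it is a seasonal dip or a one-time event
- December recovers strongly (+53%), which suggests the November dip may be temporary
- Strong growth within the year: Jan 25B → Dec 58B (+132%). This is January-to-December growth within FY2025, not year-over-year (YoY) growth, which would require FY2024 data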
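The anomaly reasoning above (the month-over-month drop and the distance from the trend) can be made explicit in a table. The sketch below reuses the `months` and `monthly_revenue` values from Cell 1. `revenue_diagnostics` is a hypothetical helper, and the |z| > 2 cut-off on trend residuals is an illustrative rule of thumb, not a standard.

```python
import numpy as np
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
monthly_revenue = [25, 28, 32, 30, 35, 38, 36, 40, 42, 46, 38, 58]

def revenue_diagnostics(values, labels, z_threshold=2.0):
    """Month-over-month % change plus residuals from a linear trend, with outlier flags."""
    s = pd.Series(values, index=labels, dtype=float)
    x = np.arange(len(s))
    slope, intercept = np.polyfit(x, s.to_numpy(), 1)  # degree-1 fit returns [slope, intercept]
    residual = s - (intercept + slope * x)
    z = residual / residual.std()  # pandas std uses ddof=1
    table = pd.DataFrame({
        "revenue": s,
        "mom_pct": s.pct_change() * 100,
        "residual": residual.round(2),
        "flag": z.abs() > z_threshold,
    })
    return table, slope

diag, slope = revenue_diagnostics(monthly_revenue, months)
print(f"Trend slope: {slope:+.1f}B/month")
print(diag.round(1))
```

With these numbers, November falls about 17.4% below October. Both November (below the trend) and December (above it) exceed the cut-off, which matches the two annotated points in Chart 2.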
Part 4: Chart 3, Scatter Plot (Relationship)
Purpose: show the relationship between salary and experience, color the points by department, and highlight the underpaid group.
Step 4.1: Scatter Plot, Salary vs Experience
```python
# Cell 8: Chart 3 — Scatter Plot: Salary vs Experience
fig, ax = plt.subplots(figsize=(14, 8))

# Scatter plot: one color per department
for i, dept in enumerate(departments):
    subset = df[df["department"] == dept]
    ax.scatter(subset["experience_years"], subset["salary"],
               color=PALETTE_5[i], label=dept, alpha=0.5, s=30,
               edgecolor="white", linewidth=0.3)

# Trend line (all data)
x_all = df["experience_years"].values
y_all = df["salary"].values
z = np.polyfit(x_all, y_all, 1)
p = np.poly1d(z)
x_line = np.linspace(0, 25, 100)
ax.plot(x_line, p(x_line), "--", color=COLORS["negative"],
        linewidth=2, label=f"Trend: salary ≈ {z[1]:.1f} + {z[0]:.1f} × exp")

# Correlation annotation: label derived from r (rule of thumb: |r| >= 0.7 strong, >= 0.4 moderate)
corr = df["experience_years"].corr(df["salary"])
strength = "strong" if abs(corr) >= 0.7 else "moderate" if abs(corr) >= 0.4 else "weak"
direction = "positive" if corr > 0 else "negative"
ax.text(0.02, 0.95, f"r = {corr:.2f} ({strength} {direction})",
        transform=ax.transAxes, fontsize=11, fontweight="bold",
        bbox=dict(boxstyle="round", facecolor="lightyellow", edgecolor="orange"))

# Highlight underpaid group: experience >= 8 years but salary < 15M
underpaid = df[(df["experience_years"] >= 8) & (df["salary"] < 15)]
if len(underpaid) > 0:
    ax.scatter(underpaid["experience_years"], underpaid["salary"],
               facecolor="none", edgecolor=COLORS["negative"],
               s=100, linewidth=2, zorder=5, label=f"Underpaid ({len(underpaid)} employees)")
    # Point the arrow at the middle of the flagged group rather than a fixed spot
    ax.annotate(f"{len(underpaid)} underpaid employees\nExp ≥ 8yr but salary < 15M",
                xy=(underpaid["experience_years"].median(), underpaid["salary"].median()),
                xytext=(16, 8),
                fontsize=10, fontweight="bold", color=COLORS["negative"],
                arrowprops=dict(arrowstyle="->", color=COLORS["negative"], lw=1.5),
                bbox=dict(boxstyle="round,pad=0.3", facecolor="#FFEBEE",
                          edgecolor=COLORS["negative"]))

# Styling
ax.set_title("Salary vs Experience — by Department",
             fontsize=14, fontweight="bold", loc="left")
ax.set_xlabel("Experience (years)")
ax.set_ylabel("Salary (Million VND)")
ax.yaxis.set_major_formatter(FuncFormatter(format_vnd))
# Legend outside the plot area so it does not cover the r annotation (top-left)
ax.legend(frameon=True, facecolor="white", edgecolor="lightgray",
          loc="upper left", bbox_to_anchor=(1.01, 1), fontsize=9)
tufte_style(ax)
plt.tight_layout()
save_chart(fig, "03_scatter_salary_experience")
plt.show()
```
Markdown interpretation (Cell 9):
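Interpretation of Chart 3:
- Strong positive correlation (read the exact r from the top-left annotation): salary rises with experience
- Engineering (blue) points cluster above the trend line → salaries above average
- Underpaid group detected: employees with ≥ 8 years of experience but a salary < 15M sit far below the trend line → HR should review them (the group may be small; check the count in the legend)
- High earners (salary > 50M) are Managers in the higher-paying departments, which is consistent with their position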
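Chart 3 defines "underpaid" with fixed cut-offs (experience ≥ 8 years, salary < 15M). An alternative is to flag people whose salary sits far below the fitted trend line itself, as sketched below. `flag_below_trend` and the cut-off of `k = 1.5` residual standard deviations are hypothetical choices. A straight line is only a rough fit for this dataset, because salaries come from position bands.

```python
import numpy as np
import pandas as pd

def flag_below_trend(data: pd.DataFrame, x_col: str, y_col: str, k: float = 1.5) -> pd.DataFrame:
    """Fit y = a + b*x and flag rows lying more than k residual std devs below the line."""
    slope, intercept = np.polyfit(data[x_col], data[y_col], 1)
    residual = data[y_col] - (intercept + slope * data[x_col])
    return data.assign(residual=residual,
                       below_trend=residual < -k * residual.std())

# Hypothetical toy data: salary = 10 + 2 * experience, except one person far below
toy = pd.DataFrame({"experience_years": range(10),
                    "salary": [10, 12, 14, 16, 18, 20, 22, 24, 5, 28]})
flagged = flag_below_trend(toy, "experience_years", "salary")
print(flagged[flagged["below_trend"]])
```

On the workshop data, `flag_below_trend(df, "experience_years", "salary").query("below_trend")` lists the candidates. Treat the list as a starting point for review, not a verdict.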
Part 5: Chart 4, Heatmap (Correlation)
Purpose: a correlation matrix of all numeric variables, with the redundant upper triangle masked and the values annotated.
Step 5.1: Heatmap, Correlation Matrix
```python
# Cell 10: Chart 4 — Heatmap: Correlation Matrix
fig, ax = plt.subplots(figsize=(10, 8))

# Select numeric columns
numeric_cols = ["salary", "age", "experience_years", "tenure_years",
                "performance_score", "satisfaction_score", "training_hours"]
corr_matrix = df[numeric_cols].corr()

# Mask upper triangle incl. diagonal (Tufte: remove redundant data-ink)
mask = np.triu(np.ones_like(corr_matrix, dtype=bool))

# Heatmap
hm = sns.heatmap(corr_matrix, mask=mask, annot=True, fmt=".2f",
                 cmap="RdBu_r", center=0, vmin=-1, vmax=1,
                 linewidths=0.5, linecolor="white",
                 square=True, ax=ax,
                 cbar_kws={"shrink": 0.8, "label": "Correlation Coefficient"},
                 annot_kws={"fontsize": 11, "fontweight": "bold"})

# Highlight strong correlations: cell (row i, col j) spans x in [j, j+1], y in [i, i+1]
for i in range(len(corr_matrix)):
    for j in range(i):
        val = corr_matrix.iloc[i, j]
        if abs(val) > 0.7:
            ax.add_patch(plt.Rectangle((j, i), 1, 1, fill=False,
                                       edgecolor="gold", linewidth=3))

ax.set_title("Correlation Matrix — HR Employee Variables\n"
             "Gold border = |r| > 0.7 (strong correlation)",
             fontsize=14, fontweight="bold", loc="left")

# Rename labels for readability
short_labels = ["Salary", "Age", "Experience", "Tenure",
                "Performance", "Satisfaction", "Training"]
ax.set_xticklabels(short_labels, rotation=45, ha="right")
ax.set_yticklabels(short_labels, rotation=0)
plt.tight_layout()
save_chart(fig, "04_heatmap_correlation")
plt.show()
```
Markdown interpretation (Cell 11):
Interpretation of Chart 4:
- Strong positive: experience ↔ salary, and age ↔ experience (r ≈ 0.97 to 0.98, because age is generated as experience plus 22 to 27 years). For the same reason, age ↔ salary also exceeds 0.7. These cells get the gold border
- Multicollinearity warning: age ↔ experience is close to 1 → if you build a model, keep only one of the two
- Negative correlation: performance ↔ satisfaction is moderately negative (read the exact value from the heatmap) → high performers may be burning out
- Weak: training_hours barely correlates with salary or performance → the effectiveness of the training program needs review
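The gold-border logic from Cell 10 can also produce a plain table, which is easier to quote in a report than a colored cell. The sketch below scans the same lower-triangle cells with the same loop. `strong_pairs` is a hypothetical helper, and the toy data is illustrative only.

```python
import pandas as pd

def strong_pairs(data: pd.DataFrame, threshold: float = 0.7) -> pd.DataFrame:
    """List variable pairs with |r| > threshold from the lower triangle of the correlation matrix."""
    corr = data.select_dtypes("number").corr()
    cols = list(corr.columns)
    rows = []
    # Same cells as the masked heatmap: row i, column j < i (diagonal excluded)
    for i in range(len(cols)):
        for j in range(i):
            r = corr.iloc[i, j]
            if abs(r) > threshold:
                rows.append({"var_1": cols[i], "var_2": cols[j], "r": round(r, 2)})
    result = pd.DataFrame(rows, columns=["var_1", "var_2", "r"])
    # Strongest relationships first, regardless of sign
    return result.sort_values("r", key=lambda s: s.abs(), ascending=False).reset_index(drop=True)

# Hypothetical toy data: b is almost exactly 2 * a, c alternates and is unrelated
a = list(range(1, 11))
c = [1, -1] * 5
toy = pd.DataFrame({"a": a, "b": [2 * ai + 0.1 * ci for ai, ci in zip(a, c)], "c": c})
print(strong_pairs(toy))
```

On the workshop data, `strong_pairs(df[numeric_cols])` should list the same pairs that carry a gold border.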
Part 6: Chart 5, Box Plot (Distribution Comparison)
Purpose: compare salary distributions across departments to spot spread, outliers and pay-equity issues.
Step 6.1: Box Plot, Salary by Department
```python
# Cell 12: Chart 5 — Box Plot: Salary Distribution by Department + Violin
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 7))

# --- Panel A: Box Plot ---
dept_order = df.groupby("department")["salary"].median().sort_values().index.tolist()
bp = sns.boxplot(data=df, x="department", y="salary", order=dept_order,
                 palette=PALETTE_5, ax=ax1, width=0.6, linewidth=1.2,
                 flierprops=dict(marker="o", markerfacecolor=COLORS["negative"],
                                 markeredgecolor="white", markersize=6))
# Add median labels
medians = df.groupby("department")["salary"].median().reindex(dept_order)
for i, median in enumerate(medians):
    ax1.text(i, median + 0.5, f"{median:.1f}M",
             ha="center", fontsize=10, fontweight="bold", color="white",
             bbox=dict(boxstyle="round,pad=0.2", facecolor=COLORS["neutral"]))
ax1.set_title("Salary Distribution by Department — Box Plot",
              fontsize=13, fontweight="bold", loc="left")
ax1.set_xlabel("Department")
ax1.set_ylabel("Salary (Million VND)")
ax1.yaxis.set_major_formatter(FuncFormatter(format_vnd))
tufte_style(ax1)
ax1.tick_params(axis="x", rotation=30)

# --- Panel B: Violin Plot (more detail on distribution shape) ---
vp = sns.violinplot(data=df, x="department", y="salary", order=dept_order,
                    palette=PALETTE_5, ax=ax2, inner="quartile",
                    linewidth=1.2, cut=0)
ax2.set_title("Salary Distribution by Department — Violin Plot",
              fontsize=13, fontweight="bold", loc="left")
ax2.set_xlabel("Department")
ax2.set_ylabel("Salary (Million VND)")
ax2.yaxis.set_major_formatter(FuncFormatter(format_vnd))
tufte_style(ax2)
ax2.tick_params(axis="x", rotation=30)

# Plain-text title: the default Matplotlib font has no emoji glyphs
plt.suptitle("Chart 5: Pay Equity Analysis", fontsize=16, y=1.02, fontweight="bold")
plt.tight_layout()
save_chart(fig, "05_boxplot_salary_department")
plt.show()
```
Markdown interpretation (Cell 13):
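Interpretation of Chart 5:
- Engineering has the highest median salary and the widest IQR → a large spread (Junior vs Lead/Manager)
- HR has the smallest IQR → salaries are concentrated, with little variance
- Check for outliers (red points beyond the whiskers). With this synthetic data there may be few or none, because the whiskers reach 1.5 × IQR and the IQR is wide. In real data, high outliers in Sales could be commission-based high earners, and they would justify an audit of the pay structure
- The violin plot may show more than one peak (for example, in Engineering) → distinct salary clusters (Junior cluster vs Senior cluster), consistent with salaries being built from position bands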
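Instead of counting red points by eye, you can compute the same quantities the box plot draws. The sketch below applies Tukey's rule: whiskers at 1.5 × IQR, with points beyond the fences counted as outliers. This matches Matplotlib's default box-plot whisker setting. `iqr_outliers` is a hypothetical helper, and the toy data is made up.

```python
import pandas as pd

def iqr_outliers(data: pd.DataFrame, group_col: str, value_col: str,
                 whis: float = 1.5) -> pd.DataFrame:
    """Per-group quartiles, IQR and counts of points beyond the box-plot whisker fences."""
    rows = {}
    for name, s in data.groupby(group_col)[value_col]:
        q1, q3 = s.quantile(0.25), s.quantile(0.75)  # linear interpolation, like np.percentile
        iqr = q3 - q1
        low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
        rows[name] = {"q1": q1, "q3": q3, "iqr": iqr,
                      "n_low": int((s < low_fence).sum()),
                      "n_high": int((s > high_fence).sum())}
    return pd.DataFrame.from_dict(rows, orient="index")

# Hypothetical toy data: group A has one extreme high value, group B has none
toy = pd.DataFrame({
    "department": ["A"] * 6 + ["B"] * 5,
    "salary": [10, 11, 12, 13, 14, 40, 20, 21, 22, 23, 24],
})
print(iqr_outliers(toy, "department", "salary"))
```

Running `iqr_outliers(df, "department", "salary")` on the workshop data gives the IQR ranking and outlier counts needed to back up the bullets above.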
Part 7: Multi-Panel Figure, Summary Dashboard
Combine the 4 most important charts into one figure with a 2×2 layout: a professional dashboard for the CEO.
Step 7.1: Multi-Panel Dashboard
```python
# Cell 14: Multi-Panel Dashboard — 2×2
fig, axes = plt.subplots(2, 2, figsize=(18, 14))
# One fixed color per department, reused in every panel so a color means the same thing everywhere
dept_colors = dict(zip(departments, PALETTE_5))

# ===== Panel 1 (Top-Left): Bar Chart — Headcount =====
dept_counts = df["department"].value_counts().sort_values(ascending=True)
bars = axes[0, 0].barh(dept_counts.index, dept_counts.values,
                       color=[dept_colors[d] for d in dept_counts.index], edgecolor="white")
for bar, val in zip(bars, dept_counts.values):
    axes[0, 0].text(val + 5, bar.get_y() + bar.get_height() / 2,
                    f"{val}", va="center", fontsize=10, fontweight="bold")
axes[0, 0].set_title("A. Headcount by Department", fontsize=13, fontweight="bold", loc="left")
axes[0, 0].set_xlabel("Employees")
tufte_style(axes[0, 0])

# ===== Panel 2 (Top-Right): Line Chart — Revenue Trend =====
axes[0, 1].plot(months, monthly_revenue, color=COLORS["primary"],
                linewidth=2.5, marker="o", markersize=6,
                markerfacecolor="white", markeredgecolor=COLORS["primary"])
axes[0, 1].fill_between(months, monthly_revenue, alpha=0.1, color=COLORS["primary"])
# Anomaly highlight
axes[0, 1].plot("Nov", 38, "o", markersize=12,
                markerfacecolor=COLORS["negative"], markeredgecolor="white", zorder=4)
axes[0, 1].set_title("B. Monthly Revenue Trend — FY2025",
                     fontsize=13, fontweight="bold", loc="left")
axes[0, 1].set_ylabel("Revenue (B VND)")
axes[0, 1].tick_params(axis="x", rotation=45)
tufte_style(axes[0, 1])

# ===== Panel 3 (Bottom-Left): Scatter — Salary vs Experience =====
# Same colorblind-safe department colors as the other panels (instead of a separate colormap)
for dept in departments:
    subset = df[df["department"] == dept]
    axes[1, 0].scatter(subset["experience_years"], subset["salary"],
                       color=dept_colors[dept], label=dept, alpha=0.4, s=15,
                       edgecolor="white", linewidth=0.2)
# Trend line
z = np.polyfit(df["experience_years"], df["salary"], 1)
p = np.poly1d(z)
x_line = np.linspace(0, 25, 100)
axes[1, 0].plot(x_line, p(x_line), "--", color=COLORS["negative"], linewidth=1.5)
corr = df["experience_years"].corr(df["salary"])
axes[1, 0].text(0.02, 0.92, f"r = {corr:.2f}", transform=axes[1, 0].transAxes,
                fontsize=10, fontweight="bold",
                bbox=dict(boxstyle="round", facecolor="lightyellow"))
axes[1, 0].legend(frameon=False, fontsize=9, loc="lower right")
axes[1, 0].set_title("C. Salary vs Experience", fontsize=13, fontweight="bold", loc="left")
axes[1, 0].set_xlabel("Experience (years)")
axes[1, 0].set_ylabel("Salary (M VND)")
tufte_style(axes[1, 0])

# ===== Panel 4 (Bottom-Right): Box Plot — Salary by Dept =====
dept_order = df.groupby("department")["salary"].median().sort_values().index.tolist()
sns.boxplot(data=df, x="department", y="salary", order=dept_order,
            palette=[dept_colors[d] for d in dept_order],
            ax=axes[1, 1], width=0.6, linewidth=1)
axes[1, 1].set_title("D. Salary Distribution by Department",
                     fontsize=13, fontweight="bold", loc="left")
axes[1, 1].set_xlabel("Department")
axes[1, 1].set_ylabel("Salary (M VND)")
axes[1, 1].tick_params(axis="x", rotation=30)
tufte_style(axes[1, 1])

# ===== Global Title & Layout (plain text: no emoji glyphs in the default font) =====
fig.suptitle("HR Analytics Dashboard — Q4/2025\n"
             "TechVN | 1,500 Employees | Prepared by: Data Analytics Team",
             fontsize=18, fontweight="bold", y=1.02)
# Footnote
fig.text(0.5, -0.02,
         "Data source: HR Database (Jan 2026) | Charts follow IBCS & Tufte standards | Colorblind-safe palette",
         ha="center", fontsize=9, color="gray", style="italic")
plt.tight_layout()
save_chart(fig, "dashboard_hr_analytics")
plt.show()
```
Markdown interpretation (Cell 15):
Dashboard Summary (Key Takeaways):
- Workforce: Engineering dominates headcount (35%), aligned with a tech company profile
- Revenue: strong upward trend with an anomaly in Nov (−17%) that needs investigation
- Compensation: strong experience-salary correlation (see the r value in panel C); an underpaid group was detected
- Pay equity: Engineering has the widest salary spread; review any outliers flagged by the box plot before drawing conclusions about individual departments
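The dashboard above uses a plain 2×2 grid, but the goals also mention 2×3 layouts. The sketch below shows one such layout and assumes Matplotlib 3.3+ for `subplot_mosaic`. The panel names and their arrangement are hypothetical. It builds a `Figure` directly so it runs without a display. In the notebook, `plt.subplot_mosaic(LAYOUT, figsize=(18, 10))` gives the same layout and shows it inline.

```python
from matplotlib.figure import Figure

# Hypothetical 2x3 layout: the revenue trend spans two columns on the top row
LAYOUT = [["trend", "trend", "headcount"],
          ["scatter", "box", "heatmap"]]

def dashboard_layout(layout=LAYOUT, figsize=(18, 10)):
    """Create an empty dashboard with one named Axes per panel."""
    fig = Figure(figsize=figsize)
    axd = fig.subplot_mosaic(layout)  # dict: panel name -> Axes
    for name, ax in axd.items():
        ax.set_title(name, loc="left", fontweight="bold")
    return fig, axd

fig, axd = dashboard_layout()
print(sorted(axd))
```

Each entry of `axd` is an ordinary Axes, so the panel code from Parts 2 to 6 can be pasted in, for example `axd["trend"].plot(months, monthly_revenue)`. Naming panels avoids the `axes[1, 0]` bookkeeping when the layout changes.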
Part 8: High-Quality Export
Make sure all 5 charts + the dashboard are exported as PNG (300 dpi) + SVG.
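The sketch below shows what the 300 dpi setting in `save_chart` means in practice. A raster's pixel size equals the figure size in inches times the dpi, while SVG stores vector paths that scale without pixelation. The sketch assumes Matplotlib with Pillow, which Matplotlib 3.3+ installs as a dependency, and writes to in-memory buffers instead of `charts/`. `save_chart` also passes `bbox_inches="tight"`, which crops the canvas, so the real files will be smaller than `figsize × dpi`.

```python
import io

import matplotlib.image as mpimg
from matplotlib.figure import Figure

fig = Figure(figsize=(4, 3))  # 4 x 3 inches
ax = fig.subplots()
ax.plot([0, 1, 2], [1, 3, 2])

# PNG is raster: pixel size = inches x dpi (no bbox_inches="tight" here, so no cropping)
png_buf = io.BytesIO()
fig.savefig(png_buf, format="png", dpi=300)
png_buf.seek(0)
height, width = mpimg.imread(png_buf).shape[:2]
print(width, height)  # 1200 900

# SVG is vector text: a line plot is stored as paths, not as an embedded bitmap
svg_buf = io.BytesIO()
fig.savefig(svg_buf, format="svg")
svg_text = svg_buf.getvalue().decode("utf-8")
print("<svg" in svg_text, "<image" in svg_text)  # True False
```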
Step 8.1: Verify Exports
```python
# Cell 16: Verify all exports
import os
print("=" * 60)
print("📁 EXPORT VERIFICATION")
print("=" * 60)
expected_files = [
    "01_bar_department",
    "02_line_revenue_trend",
    "03_scatter_salary_experience",
    "04_heatmap_correlation",
    "05_boxplot_salary_department",
    "dashboard_hr_analytics",
]
found = 0
for name in expected_files:
    for fmt in ["png", "svg"]:
        filepath = f"charts/{name}.{fmt}"
        if os.path.exists(filepath):
            found += 1
            size_kb = os.path.getsize(filepath) / 1024
            print(f"  ✅ {filepath} ({size_kb:.0f} KB)")
        else:
            print(f"  ❌ MISSING: {filepath}")
# All PNG/SVG files in charts/, including bonus and overview images
total_files = len([f for f in os.listdir("charts") if f.endswith((".png", ".svg"))])
print(f"\n📊 Expected files found: {found} / {len(expected_files) * 2} (6 charts × 2 formats)")
print(f"📁 All chart files in charts/: {total_files}")
```
Step 8.2: Export a Summary Image
```python
# Cell 17: Summary — all charts overview
from matplotlib.image import imread
fig, axes = plt.subplots(2, 3, figsize=(20, 12))
chart_pngs = [f"charts/{name}.png" for name in expected_files]
for ax, png_path, name in zip(axes.flat, chart_pngs, expected_files):
    if os.path.exists(png_path):
        img = imread(png_path)
        ax.imshow(img)
    ax.set_title(name.replace("_", " ").title(), fontsize=10)
    ax.axis("off")
fig.suptitle("All Charts — Workshop Session 10", fontsize=16, fontweight="bold")
plt.tight_layout()
plt.savefig("charts/00_overview_all_charts.png", dpi=150, bbox_inches="tight")
plt.show()
print("✅ Overview image saved: charts/00_overview_all_charts.png")
```
🌟 Bonus Challenges
Bonus 1: Attrition Rate by Department, Risk-Colored Bar Chart
```python
# Cell 18: Bonus 1 — Attrition Analysis
fig, ax = plt.subplots(figsize=(12, 6))

# Attrition rate (%) by department
attrition_rate = df.groupby("department")["attrition"].apply(
    lambda x: (x == "Yes").mean() * 100
).sort_values(ascending=True)

# Color by risk level: < 20% low (green), 20-30% medium (gold), >= 30% high (red)
colors = [COLORS["positive"] if r < 20 else
          COLORS["accent2"] if r < 30 else
          COLORS["negative"] for r in attrition_rate]
bars = ax.barh(attrition_rate.index, attrition_rate.values, color=colors, edgecolor="white")

# Data labels + text risk indicators (emoji circles do not render in the default font)
for bar, rate in zip(bars, attrition_rate.values):
    risk = "Low" if rate < 20 else "Medium" if rate < 30 else "High"
    ax.text(rate + 0.5, bar.get_y() + bar.get_height() / 2,
            f"{rate:.1f}% ({risk})", va="center", fontsize=11, fontweight="bold")

ax.set_title("Attrition Rate by Department — Risk Assessment",
             fontsize=14, fontweight="bold", loc="left")
ax.set_xlabel("Attrition Rate (%)")
ax.axvline(20, color=COLORS["accent2"], linestyle="--", linewidth=1, alpha=0.5, label="Warning threshold (20%)")
ax.legend(frameon=False)
tufte_style(ax)
plt.tight_layout()
save_chart(fig, "bonus_attrition_rate")
plt.show()
```
Bonus 2: Custom Style Theme
```python
# Cell 19: Bonus 2 — Custom Matplotlib Style
def apply_corporate_theme():
    """Apply corporate/professional Matplotlib theme"""
    plt.rcParams.update({
        # Figure
        "figure.figsize": (12, 7),
        "figure.dpi": 100,
        "figure.facecolor": "white",
        # Font
        "font.size": 12,
        "font.family": "sans-serif",
        "axes.titlesize": 14,
        "axes.labelsize": 12,
        # Axes
        "axes.spines.top": False,
        "axes.spines.right": False,
        "axes.facecolor": "white",
        "axes.edgecolor": "#333333",
        "axes.linewidth": 0.8,
        # Grid
        "axes.grid": True,
        "grid.alpha": 0.3,
        "grid.linewidth": 0.5,
        "grid.color": "#CCCCCC",
        # Ticks
        "xtick.color": "#333333",
        "ytick.color": "#333333",
        # Legend
        "legend.frameon": False,
        "legend.fontsize": 10,
        # Save
        "savefig.dpi": 300,
        "savefig.bbox": "tight",
        "savefig.facecolor": "white",
    })
    sns.set_palette("colorblind")
    print("✅ Corporate theme applied!")

apply_corporate_theme()

# Test theme
fig, ax = plt.subplots()
ax.bar(["Q1", "Q2", "Q3", "Q4"], [42, 48, 45, 52], color=COLORS["primary"])
ax.set_title("Theme Test — Corporate Style")
ax.set_ylabel("Revenue (B VND)")
plt.show()
```
Bonus 3: Chart Style Comparison (Optional)
```python
# Cell 20: Bonus 3 — Chart Style Comparison
fig = plt.figure(figsize=(18, 5))
styles = ["default", "seaborn-v0_8-whitegrid", "ggplot"]  # seaborn-v0_8-* needs Matplotlib 3.6+
for k, style_name in enumerate(styles, start=1):
    # Create each Axes inside the style context; otherwise the style's
    # background, grid and color cycle are not applied to it
    with plt.style.context(style_name):
        ax = fig.add_subplot(1, 3, k)
        ax.bar(["A", "B", "C", "D"], [25, 40, 30, 35])
        ax.set_title(f"Style: {style_name}", fontsize=12)
        ax.set_ylabel("Value")
plt.suptitle("Matplotlib Style Gallery — Choose Your Favorite", fontsize=15, fontweight="bold")
plt.tight_layout()
save_chart(fig, "bonus_style_comparison")
plt.show()
```
📋 Deliverable
After completing the workshop, submit:
| # | File | Description |
|---|---|---|
| 1 | HoTen_Buoi10_Visualization.ipynb | Complete Jupyter Notebook; Restart & Run All succeeds |
| 2 | charts/01_bar_department.png | Chart 1: Bar chart — Headcount & Avg Salary |
| 3 | charts/02_line_revenue_trend.png | Chart 2: Line chart — Monthly Revenue Trend |
| 4 | charts/03_scatter_salary_experience.png | Chart 3: Scatter plot — Salary vs Experience |
| 5 | charts/04_heatmap_correlation.png | Chart 4: Heatmap — Correlation Matrix |
| 6 | charts/05_boxplot_salary_department.png | Chart 5: Box plot — Salary Distribution |
| 7 | charts/dashboard_hr_analytics.png | Multi-panel summary dashboard |
💡 Pre-submission checklist
- [ ] Restart & Run All: the notebook runs from top to bottom without errors
- [ ] 5 chart files in the `charts/` folder (PNG, 300 dpi)
- [ ] Dashboard with 4 summary panels
- [ ] Every chart has a title, axis labels, annotations and an interpretation
- [ ] Colorblind-safe palette: use `colorblind` or the Wong palette
- [ ] Markdown cells: an interpretation after every chart
📊 Rubric: Scoring
| Criterion | Points | Description |
|---|---|---|
| Chart 1: Bar Chart (Phần 2) | 15 | Horizontal bar, data labels, mean vs median comparison, IBCS-style |
| Chart 2: Line Chart (Phần 3) | 15 | Trend line, anomaly annotation, data labels, fill area |
| Chart 3: Scatter Plot (Phần 4) | 15 | Color by department, trend line, correlation value, underpaid highlight |
| Chart 4: Heatmap (Phần 5) | 15 | Masked triangle, annot values, strong correlation highlight |
| Chart 5: Box Plot (Phần 6) | 15 | Sorted by median, median labels, violin bonus, dept comparison |
| Multi-Panel Dashboard (Phần 7) | 15 | 2×2 layout, consistent style, global title, footnote |
| Export & Verify (Phần 8) | 5 | PNG (300dpi) + SVG export, verification cell |
| Notebook Quality | 5 | Markdown sections, Restart & Run All OK, interpretation cells, no debug cells |
| Bonus | +15 | Attrition analysis (+5), Corporate theme (+5), Style comparison (+5) |
| Total | 100 + 15 bonus | |
⚠️ Important notes
- Restart & Run All before submitting: the notebook must run from top to bottom without errors
- Every chart must have a Markdown interpretation: a chart without an interpretation scores 0 points
- Charts must be exported as PNG files in the `charts/` folder, not only displayed inline
- Use a colorblind-safe palette: you lose 3 points for using a red-green default
- Chart titles should state an insight (the finding), not just a label (a description of the data)
- Code must include comments that explain the important logic