🧠 Case Study: Visualization, turning data into stories in practice

In this session we covered Matplotlib & Seaborn, chart selection, and visualization best practices based on IBCS and Tufte. Now let's see how those skills are applied in practice: from The Economist's visualization standard, to the interactive COVID-19 charts of NYT and FiveThirtyEight, and finally a Vietnamese dashboard before and after applying best practices.

Case Study 1: The Economist, the Data Visualization Standard for International Journalism

Background

The Economist is one of the world's leading political and economic magazines, known not only for its analysis but also for the quality of its data visualization. In 2017 The Economist set up a Graphic Detail team dedicated to data viz, and in 2019 it published its chart style guide, which became a reference for both journalism and Data Analytics.

Each week The Economist publishes an average of 25-30 charts in print and online. Every chart follows a strict set of rules covering typeface, color, and how the title is written. The goal: readers grasp the insight within 5 seconds without reading the accompanying article.

Problem

Before the style guide was standardized, The Economist's charts depended on each graphic designer's personal taste. The result was inconsistency: print charts looked different from web charts, and this week's issue looked different from last week's. Most importantly, some charts were attractive but misleading, breaking the very principles The Economist criticized in other publications.

In 2019 the Graphic Detail team decided to build a visualization style guide that could apply to every chart, every topic, and every platform.

Solution: The Economist Style Guide

Principle 1: The title tells the story instead of describing the data:

The Economist style guide requires the chart title to be an insight statement, not a description of the data. The subtitle supplies additional context.

python

import matplotlib.pyplot as plt
import numpy as np

years = np.arange(2015, 2026)
asia = [3.2, 3.8, 4.5, 5.1, 5.8, 4.2, 5.5, 6.8, 7.9, 8.5, 9.2]
europe = [2.1, 2.3, 2.5, 2.6, 2.7, 1.8, 2.4, 2.9, 3.1, 3.3, 3.4]
americas = [2.8, 3.0, 3.2, 3.4, 3.5, 2.5, 3.3, 3.8, 4.1, 4.3, 4.5]

fig, axes = plt.subplots(1, 2, figsize=(18, 6))

# ❌ BEFORE: descriptive title, no insight
axes[0].plot(years, asia, marker='o', linewidth=2)
axes[0].plot(years, europe, marker='s', linewidth=2)
axes[0].plot(years, americas, marker='^', linewidth=2)
axes[0].set_title('E-commerce Revenue by Region (2015-2025)', fontsize=12)
axes[0].legend(['Asia', 'Europe', 'Americas'])
axes[0].set_ylabel('Revenue (Trillion $)')
axes[0].grid(True)

# ✅ AFTER: The Economist style
axes[1].plot(years, asia, color='#E3120B', linewidth=2.5, label='Asia')
axes[1].plot(years, europe, color='#006BA2', linewidth=2.5, label='Europe')
axes[1].plot(years, americas, color='#86BCB6', linewidth=2.5, label='Americas')

# Title = insight, subtitle = context.
# pad=28 lifts the title above the subtitle at y=1.03 so the two lines do not overlap.
axes[1].set_title('Asia pulls far ahead in the e-commerce race',
                   fontsize=13, fontweight='bold', loc='left', pad=28)
axes[1].text(0, 1.03, 'E-commerce revenue by region, trillion USD',
             transform=axes[1].transAxes, fontsize=10, color='gray')

# Direct labeling instead of a legend box
axes[1].text(2025.2, 9.2, 'Asia', fontsize=10, color='#E3120B', fontweight='bold')
axes[1].text(2025.2, 4.5, 'Americas', fontsize=10, color='#86BCB6', fontweight='bold')
axes[1].text(2025.2, 3.4, 'Europe', fontsize=10, color='#006BA2', fontweight='bold')

# Remove chartjunk
axes[1].spines['top'].set_visible(False)
axes[1].spines['right'].set_visible(False)
axes[1].set_ylabel('')
axes[1].yaxis.grid(True, alpha=0.3, linestyle='-')
axes[1].set_axisbelow(True)

# Source line
axes[1].text(0, -0.12, 'Source: eMarketer, Statista',
             transform=axes[1].transAxes, fontsize=8, color='gray', style='italic')

plt.tight_layout()
plt.show()

The Economist Title Rule

Title type	Example	Assessment
Descriptive	"GDP Growth by Country, 2020-2025"	❌ Readers must find the insight themselves
Declarative (insight)	"China's GDP growth outpaces the world since 2021"	✅ Insight is clear within 3 seconds
Interrogative (question)	"Is China's growth sustainable?"	⚠️ Use to spark curiosity, but it needs a clear subtitle

The Economist uses declarative titles for 85% of its charts. Rule of thumb: cover the title and look at the chart alone; if it could be read in the wrong direction, the title must state the intended reading explicitly.

Principle 2: A standardized color palette:

The Economist uses a limited palette: red (#E3120B) for the main highlight, blue (#006BA2) for secondary series, and gray for context, with at most 5-6 colors across all chart types. The reason: consistency helps readers recognize the brand and reduces cognitive load.

python

import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

# The Economist Color Palette
economist_colors = {
    'Red (Primary)': '#E3120B',
    'Dark Blue': '#006BA2',
    'Teal': '#86BCB6',
    'Dark Gray': '#3D3D3D',
    'Light Gray': '#DBDBDB',
    'Orange': '#DB6D00'
}

fig, ax = plt.subplots(figsize=(12, 4))
ax.axis('off')

for i, (name, color) in enumerate(economist_colors.items()):
    rect = mpatches.FancyBboxPatch((i * 1.8 + 0.1, 0.3), 1.5, 1.2,
                                     boxstyle="round,pad=0.1",
                                     facecolor=color, edgecolor='white', linewidth=2)
    ax.add_patch(rect)
    text_color = 'white' if color in ['#E3120B', '#006BA2', '#3D3D3D'] else '#2C3E50'
    ax.text(i * 1.8 + 0.85, 0.9, name, ha='center', va='center',
            fontsize=8, color=text_color, fontweight='bold')
    ax.text(i * 1.8 + 0.85, 0.55, color, ha='center', va='center',
            fontsize=8, color=text_color)

ax.set_xlim(0, 11)
ax.set_ylim(0, 2)
ax.set_title('The Economist: Standard Color Palette (6 colors for every chart)',
             fontsize=13, fontweight='bold', pad=15)
plt.tight_layout()
plt.show()

Principle 3: Annotate instead of leaving readers to infer:

The Economist never leaves a chart "bare": every important data point is annotated directly. Readers do not have to consult a legend or estimate values from the Y axis; the insight jumps out at them.

python

import matplotlib.pyplot as plt
import numpy as np

quarters = ['Q1\n2024', 'Q2\n2024', 'Q3\n2024', 'Q4\n2024', 'Q1\n2025', 'Q2\n2025', 'Q3\n2025', 'Q4\n2025']
inflation = [3.8, 3.5, 3.2, 3.1, 2.9, 2.7, 2.4, 2.1]

fig, ax = plt.subplots(figsize=(12, 6))

# Line plot
ax.plot(quarters, inflation, color='#E3120B', linewidth=3, marker='o', markersize=8,
        markerfacecolor='white', markeredgecolor='#E3120B', markeredgewidth=2)

# Annotate the key points, The Economist style
ax.annotate('Inflation peak\n3.8%', xy=(0, 3.8), xytext=(0.8, 4.3),
            fontsize=10, fontweight='bold', color='#E3120B',
            arrowprops=dict(arrowstyle='->', color='#E3120B', lw=1.5),
            ha='center')

ax.annotate('SBV target met\n2.1%', xy=(7, 2.1), xytext=(5.5, 1.5),
            fontsize=10, fontweight='bold', color='#006BA2',
            arrowprops=dict(arrowstyle='->', color='#006BA2', lw=1.5),
            ha='center')

# Target line
ax.axhline(y=2.5, color='#006BA2', linestyle='--', alpha=0.6, linewidth=1.5)
ax.text(7.3, 2.55, 'Target 2.5%', fontsize=9, color='#006BA2')

# Style
ax.set_title('Vietnam inflation falls for 8 straight quarters and meets the SBV target',
             fontsize=13, fontweight='bold', loc='left', pad=15)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.set_ylabel('CPI YoY (%)', fontsize=11)
ax.yaxis.grid(True, alpha=0.3)
ax.set_axisbelow(True)

ax.text(0, -0.15, 'Source: General Statistics Office of Vietnam',
        transform=ax.transAxes, fontsize=8, color='gray', style='italic')

plt.tight_layout()
plt.show()

Results

The Economist principle	Before style guide	After style guide
Title	Generic description: "GDP by Country"	Insight: "China outpaces the world"
Colors	8-10 colors, at the designer's discretion	6 standard colors, consistent across issues
Labeling	Legend box in a corner of the chart	Direct labels on the data
Gridlines	Full horizontal + vertical grid	Horizontal gridlines only, faint
Source	Missing or incomplete	Mandatory, bottom left

Lessons from The Economist for DAs

A style guide is not a luxury, it is a necessity. Every team needs one standard set of chart rules: color palette, font, title format. Consistency creates professionalism.
The title is the most important element. The CEO reads the title first and the numbers second. A good title is 70% of a successful chart.
Less is more. The Economist drops the legend box, vertical gridlines, and 3D effects. Every pixel on the chart must serve the insight.

Takeaways for DAs

Build a personal or team style guide: pick 4-6 fixed colors and one consistent title format, and apply them to every report.
Declarative title > descriptive title: always put the insight in the title and use the subtitle for context.
Direct labeling saves reading time: viewers do not have to look back and forth between legend and data.
Remove chartjunk, following Tufte: anything that does not "speak data" is clutter, including 3D, shadows, gradients, and redundant borders.
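
One way to make a team style guide stick is to encode it once as Matplotlib defaults instead of repeating `spines[...]` and color arguments in every script. The sketch below is only an illustration under assumptions: `HOUSE_COLORS`, `apply_house_style`, and `add_source` are hypothetical names, and the palette simply reuses hex codes from the palette example above rather than any official Economist specification.

```python
import matplotlib.pyplot as plt
from cycler import cycler

# Hypothetical team palette, reusing hex codes from the palette example
HOUSE_COLORS = ['#E3120B', '#006BA2', '#86BCB6', '#3D3D3D', '#DB6D00']


def apply_house_style():
    """Set global Matplotlib defaults once so every new chart starts consistent."""
    plt.rcParams.update({
        'axes.prop_cycle': cycler(color=HOUSE_COLORS),  # default line/bar colors
        'axes.spines.top': False,                       # no top border
        'axes.spines.right': False,                     # no right border
        'axes.grid': True,
        'axes.grid.axis': 'y',                          # horizontal gridlines only
        'grid.alpha': 0.3,
        'axes.axisbelow': True,                         # grid drawn behind the data
        'axes.titlelocation': 'left',
        'axes.titleweight': 'bold',
    })


def add_source(ax, text):
    """Add a small italic source line under the bottom-left corner of the axes."""
    ax.text(0, -0.12, f'Source: {text}', transform=ax.transAxes,
            fontsize=8, color='gray', style='italic')


apply_house_style()
fig, ax = plt.subplots(figsize=(8, 4))
line, = ax.plot([2021, 2022, 2023], [1.0, 1.4, 2.1])  # illustrative numbers
ax.set_title('Write the insight here, not a description')
add_source(ax, 'illustrative data')
```

Calling `apply_house_style()` at the top of a report script means individual charts only need their data, an insight title, and a source line; call `plt.show()` or `fig.savefig(...)` to render the figure.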

Case Study 2: NYT / FiveThirtyEight, Interactive Charts for COVID-19 Data

Background

When the COVID-19 pandemic broke out in early 2020, the world faced an unprecedented data overload: cases, deaths, vaccinations, and variants, updated hourly from 200+ countries. The challenge was not a lack of data but how to convey complex data to millions of ordinary people with no Data Analytics background.

The New York Times (NYT) and FiveThirtyEight became the two gold standards for COVID-19 data visualization. The NYT tracking page peaked at 3.5 billion pageviews per month, becoming the most trusted source of COVID information in the US. FiveThirtyEight stood out for uncertainty visualization: showing the confidence range of a forecast helped the public understand that "a prediction is not a single number."

Problem

COVID-19 data posed 3 visualization challenges:

Constantly changing scale: case counts went from dozens to millions within months. A linear scale hides the early phase, while a log scale is hard for the public to read.
Complex temporal patterns: weekly seasonality (fewer tests on weekends means fewer reported cases, not a real decline), reporting lags, and backfill corrections.
Regional granularity: 3,000+ US counties had to be compared at once, each with a different trajectory.

Solution: the NYT & FiveThirtyEight Approach

Approach 1 (NYT): A rolling average to smooth out noise:

NYT recognized that raw daily case counts caused unnecessary panic (a Monday drop, a Tuesday spike due to reporting lag). The fix: a 7-day rolling average as the main line, with raw data as the background.

python

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Simulate COVID-19 daily data with weekly seasonality
np.random.seed(42)
days = 180
dates = pd.date_range('2024-07-01', periods=days)

# Base trend (wave shape) + weekly noise
trend = 500 + 800 * np.sin(np.linspace(0, 2*np.pi, days)) + np.random.normal(0, 100, days)
# Weekly seasonality: lower on weekends
weekly_factor = np.array([1.0, 1.1, 1.15, 1.1, 1.0, 0.7, 0.6] * 26)[:days]
daily_cases = (trend * weekly_factor).clip(50)

df = pd.DataFrame({'date': dates, 'cases': daily_cases.astype(int)})
df['rolling_7d'] = df['cases'].rolling(7).mean()

fig, axes = plt.subplots(1, 2, figsize=(18, 6))

# ❌ Raw daily: noisy and misleading
axes[0].bar(df['date'], df['cases'], color='#DBDBDB', width=1.0)
axes[0].set_title('❌ Raw daily cases: weekly noise is misleading',
                   fontsize=11, fontweight='bold', loc='left')
axes[0].set_ylabel('Cases per day')
axes[0].tick_params(axis='x', rotation=45)

# ✅ NYT style: rolling average + raw as background
axes[1].bar(df['date'], df['cases'], color='#DBDBDB', width=1.0, label='Daily cases')
axes[1].plot(df['date'], df['rolling_7d'], color='#E3120B', linewidth=2.5,
             label='7-day average')
axes[1].set_title('✅ NYT style: the rolling average shows the real trend',
                   fontsize=11, fontweight='bold', loc='left')
axes[1].set_ylabel('Cases per day')
axes[1].legend(frameon=False, fontsize=10)
axes[1].tick_params(axis='x', rotation=45)

for ax in axes:
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)

plt.tight_layout()
plt.show()

NYT Rolling Average: Why 7 days?

Seven days because COVID testing has a weekly cycle: fewer tests on weekends means fewer reported cases, and the backlog is added early the following week. A 7-day window always contains each weekday exactly once, so the average cancels this cycle and shows the real trend.

Lesson for DAs: any data with seasonality (daily, weekly, monthly) should be smoothed before it is presented, using a window that matches the cycle length. Raw data is good for EDA; a rolling average is better for communication.
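
To see why the window length matters, here is a small sketch on a deliberately artificial series: a flat level of 1,000 cases per day multiplied by the same weekday factors used in the simulation above. The series and the 5-day comparison window are assumptions made for illustration, not NYT data.

```python
import numpy as np
import pandas as pd

# Four weeks starting on a Monday; the underlying level is flat at 1,000 per day
dates = pd.date_range('2024-07-01', periods=28)
weekly_factor = np.tile([1.0, 1.1, 1.15, 1.1, 1.0, 0.7, 0.6], 4)
cases = pd.Series(1000 * weekly_factor, index=dates)

# A 7-day window contains every weekday exactly once, so the cycle cancels out
roll7 = cases.rolling(7).mean().dropna()

# A 5-day window holds a different mix of weekdays each day, so the cycle leaks through
roll5 = cases.rolling(5).mean().dropna()

print('7-day window range:', round(roll7.min(), 1), 'to', round(roll7.max(), 1))
print('5-day window range:', round(roll5.min(), 1), 'to', round(roll5.max(), 1))
```

The 7-day average is constant at 950 (the mean of the seven weekday factors times 1,000), while the 5-day average still swings between 880 and 1,070 even though nothing real changed. Note that `rolling(7)` is a trailing window, so the smoothed line lags the data by a few days; `rolling(7, center=True)` removes the lag at the cost of leaving the most recent days empty.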

Approach 2 (FiveThirtyEight): The uncertainty fan chart:

FiveThirtyEight is known for its election and COVID forecasts, and its biggest differentiator is that it always shows uncertainty. Instead of saying "we forecast 50,000 cases per day," it says "we forecast 30,000-70,000 cases per day with 80% confidence."

python

import numpy as np
import matplotlib.pyplot as plt

# Simulate forecast with uncertainty bands
days_forecast = 60
dates_forecast = np.arange(days_forecast)

# Central forecast
central = 1000 + 50 * dates_forecast + 200 * np.sin(dates_forecast / 10)

# Uncertainty bands (widening over time — realistic)
uncertainty_50 = 80 + 5 * dates_forecast    # 50% CI
uncertainty_80 = 150 + 10 * dates_forecast   # 80% CI
uncertainty_95 = 250 + 18 * dates_forecast   # 95% CI

fig, ax = plt.subplots(figsize=(14, 7))

# Historical "actual" data, simulated so that it ends near the forecast start (about 1000)
np.random.seed(42)
historical_days = 30
hist_dates = np.arange(-historical_days, 0)
hist_values = 1000 + 20 * hist_dates + np.random.normal(0, 60, historical_days)
ax.plot(hist_dates, hist_values, color='#2C3E50', linewidth=2, label='Actual')

# Forecast bands, FiveThirtyEight style (lightest = widest interval)
ax.fill_between(dates_forecast, central - uncertainty_95, central + uncertainty_95,
                alpha=0.15, color='#E3120B', label='95% CI')
ax.fill_between(dates_forecast, central - uncertainty_80, central + uncertainty_80,
                alpha=0.25, color='#E3120B', label='80% CI')
ax.fill_between(dates_forecast, central - uncertainty_50, central + uncertainty_50,
                alpha=0.4, color='#E3120B', label='50% CI')
ax.plot(dates_forecast, central, color='#E3120B', linewidth=2, linestyle='--', label='Central forecast')

# Vertical line at forecast start
ax.axvline(x=0, color='gray', linestyle=':', linewidth=1.5, alpha=0.7)
ax.text(1, ax.get_ylim()[1] * 0.95, '← Actual | Forecast →',
        fontsize=10, color='gray', style='italic')

# Style: pad=28 keeps the title clear of the subtitle placed at y=1.03
ax.set_title('Case forecast: the confidence range widens over time',
             fontsize=13, fontweight='bold', loc='left', pad=28)
ax.text(0, 1.03, 'A fan chart shows uncertainty: the further ahead the forecast, the wider the range',
        transform=ax.transAxes, fontsize=10, color='gray')

ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.set_xlabel('Day', fontsize=11)
ax.set_ylabel('Cases per day', fontsize=11)
ax.legend(frameon=False, fontsize=9, loc='upper left')
ax.yaxis.grid(True, alpha=0.3)
ax.set_axisbelow(True)

ax.text(0, -0.12, 'Source: simulation in the style of FiveThirtyEight',
        transform=ax.transAxes, fontsize=8, color='gray', style='italic')

plt.tight_layout()
plt.show()
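
The band widths in the example above are hard-coded for illustration. In practice, fan-chart bands are often derived from many simulated forecast paths by taking percentiles at each horizon. The sketch below shows that idea with a toy random-walk model; the model, its parameters, and the helper name `band` are all assumptions, not FiveThirtyEight's actual method.

```python
import numpy as np

rng = np.random.default_rng(42)
n_paths, horizon = 2000, 60

# Toy model (an assumption): daily changes are random and accumulate,
# so simulated paths drift further apart the further ahead we look
daily_change = rng.normal(loc=50, scale=40, size=(n_paths, horizon))
paths = 1000 + daily_change.cumsum(axis=1)


def band(paths, level):
    """Return (lower, upper) arrays bounding the central `level` percent of paths per day."""
    tail = (100 - level) / 2
    lower, upper = np.percentile(paths, [tail, 100 - tail], axis=0)
    return lower, upper


median = np.median(paths, axis=0)
lo50, hi50 = band(paths, 50)
lo80, hi80 = band(paths, 80)
lo95, hi95 = band(paths, 95)

print('95% band width on day 1: ', round(hi95[0] - lo95[0]))
print('95% band width on day 60:', round(hi95[-1] - lo95[-1]))
```

Each `(lower, upper)` pair can be passed straight to `ax.fill_between(np.arange(horizon), lower, upper, ...)` in place of the fixed `central ± uncertainty` arrays. The widening of the fan then comes from the model itself rather than from a hand-picked slope.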

Approach 3: Small multiples for regional comparison:

NYT uses small multiples, a grid of identical charts, to compare COVID trajectories across states. Every small chart shares the same scale and format; only the data differs, so the eye can scan across the grid and quickly pick out patterns.

python

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd  # needed for pd.Series(...).rolling below

# Simulated regional data: 6 areas in Vietnam
regions = ['TP.HCM', 'Hà Nội', 'Đà Nẵng', 'Bình Dương', 'Đồng Nai', 'Long An']
np.random.seed(42)

fig, axes = plt.subplots(2, 3, figsize=(16, 8), sharey=True, sharex=True)
fig.suptitle('Each region follows its own trajectory: one national number cannot represent them all',
             fontsize=13, fontweight='bold')
fig.text(0.5, 0.93, 'New cases per day, 7-day average, simulated data',
         ha='center', fontsize=10, color='gray')

days = 120
dates = np.arange(days)

for idx, (ax, region) in enumerate(zip(axes.flat, regions)):
    # Each region gets a wave with a different phase and amplitude
    phase = idx * 15
    amplitude = 300 + idx * 150
    cases = amplitude * np.sin((dates - phase) / 20) + 500 + np.random.normal(0, 30, days)
    cases = cases.clip(10)
    rolling = pd.Series(cases).rolling(7).mean()

    ax.fill_between(dates, rolling.values, alpha=0.3, color='#E3120B')
    ax.plot(dates, rolling.values, color='#E3120B', linewidth=1.5)
    ax.set_title(region, fontsize=11, fontweight='bold', loc='left')
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.yaxis.grid(True, alpha=0.2)
    ax.set_axisbelow(True)

plt.tight_layout(rect=[0, 0, 1, 0.90])
plt.show()

Results

Technique	Problem solved	Impact
7-day rolling average	Weekly noise in raw data	Removes false alarms caused by reporting lag and shows the real trend
Uncertainty fan chart	Single-point forecasts create false confidence	The public understands that forecasts carry a range of uncertainty and can decide more sensibly
Small multiples	Too many regions to compare on one chart	One small chart per region at the same scale, for fast pattern scanning
Log scale toggle	Linear scale hides early growth	Lets readers switch between linear and log depending on context
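
The log scale row has no code example in this case study, so here is a minimal static sketch of the same idea: the same series drawn once on a linear axis and once on a log axis. The series (cases doubling every 5 days) is a made-up assumption chosen to make the effect obvious.

```python
import numpy as np
import matplotlib.pyplot as plt

days = np.arange(60)
cases = 10 * 2 ** (days / 5)  # hypothetical outbreak: doubling every 5 days

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
for ax, scale in zip(axes, ['linear', 'log']):
    ax.plot(days, cases, color='#E3120B', linewidth=2)
    ax.set_yscale(scale)  # the only difference between the two panels
    ax.set_title(f'{scale.capitalize()} scale', loc='left', fontweight='bold')
    ax.set_xlabel('Day')
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
axes[0].set_ylabel('Cases per day')
plt.tight_layout()
```

On the linear panel the first month looks flat and the growth seems to appear from nowhere; on the log panel constant-rate growth is a straight line, so a change in slope signals a change in growth rate. Because a log axis is harder for a general audience, a static report would usually show it next to the linear view, as here, or explain it in the subtitle. Call `plt.show()` to render the figure.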

Lessons from NYT/FiveThirtyEight for DAs

Smooth before you communicate: raw data for EDA, rolling average for the report. Do not let noise hide the trend.
Show uncertainty: never say "Q1 revenue will be 50 billion"; say "40-60 billion with 80% confidence." A fan chart helps stakeholders understand risk.
Small multiples > spaghetti chart: instead of 6 lines tangled on one chart (spaghetti), use 6 small charts at the same scale. They are far easier to read.
Let users explore: NYT lets readers toggle log/linear, hover over data points, and zoom the timeline. In a static report, use annotation in place of interactivity.

Takeaways for DAs

A rolling average is a must for data with seasonality: daily sales, website traffic, support tickets.
Uncertainty visualization raises your professionalism: a DA who presents a confidence range instead of a single number tends to earn more trust from managers.
Small multiples solve the "too many categories" problem, instead of picking the top 5 and dropping the rest.
Narrative guides the eye: annotation, highlighting, and a storytelling title help readers follow exactly the insight you want to convey.

Case Study 3: A Vietnamese Dashboard, Before / After Applying Visualization Best Practices

Background

A mid-sized logistics company in Ho Chi Minh City, with 800 employees and 15,000 deliveries per day, has a 4-person Data Analytics team. The team uses Python + Matplotlib to build a weekly dashboard that is sent to the Board of Directors every Monday morning.

The dashboard covers total deliveries, delivery success rate, average delivery time, revenue, and a breakdown by region.

Problem: the "Before" Dashboard

The team's dashboard v1 had the typical problems:

10 charts on a single page: the CEO opens it and does not know where to look first
A pie chart for market share: 8 regions in 8 rainbow colors
Descriptive titles: "Orders by month", "Revenue by region"
No benchmark: a number is up 5%, but compared with what? What is the target?
Inconsistent fonts, spacing, and colors: every chart has its own style

The CEO once gave this feedback: "The dashboard has too many charts, yet I can't tell whether the business is doing well or badly. Every Monday it takes me 15 minutes to make sense of it."

Dashboard v1, before the improvements:

python

import matplotlib.pyplot as plt
import numpy as np

# ❌ Dashboard v1: many charts, inconsistent, no insight
fig, axes = plt.subplots(2, 3, figsize=(18, 10))
fig.suptitle('WEEKLY DASHBOARD', fontsize=16)

months = ['Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
orders = [58000, 61000, 63500, 65000, 68000, 72000]
revenue = [8.2, 8.7, 9.1, 9.3, 9.8, 10.5]
success_rate = [94.2, 93.8, 94.5, 95.1, 94.7, 95.3]

# Chart 1: bar chart with garish colors
colors_rainbow = ['#FF0000', '#FF7F00', '#FFFF00', '#00FF00', '#0000FF', '#8B00FF']
axes[0, 0].bar(months, orders, color=colors_rainbow)
axes[0, 0].set_title('Orders by month')
axes[0, 0].grid(True)

# Chart 2: pie chart by region with too many slices
regions = ['TP.HCM', 'Hà Nội', 'Đà Nẵng', 'Cần Thơ', 'Hải Phòng', 'Biên Hòa', 'Bình Dương', 'Other']
region_pct = [32, 25, 12, 8, 7, 6, 5, 5]
axes[0, 1].pie(region_pct, labels=regions, autopct='%d%%', startangle=90,
               colors=['#e6194b', '#3cb44b', '#ffe119', '#4363d8', '#f58231', '#911eb4', '#42d4f4', '#bfef45'])
axes[0, 1].set_title('Order share by region')

# Chart 3: line chart with no context
axes[0, 2].plot(months, success_rate, marker='o', linewidth=2)
axes[0, 2].set_title('Delivery success rate')
axes[0, 2].set_ylim(90, 100)
axes[0, 2].grid(True)

# Chart 4: revenue, generic
axes[1, 0].bar(months, revenue, color='steelblue')
axes[1, 0].set_title('Revenue (bn VND)')

# Charts 5-6: empty placeholders
axes[1, 1].text(0.5, 0.5, 'Chart 5\n(other KPI)', ha='center', va='center', fontsize=14)
axes[1, 1].set_title('Avg delivery time')
axes[1, 2].text(0.5, 0.5, 'Chart 6\n(complaint rate)', ha='center', va='center', fontsize=14)
axes[1, 2].set_title('Complaint rate')

plt.tight_layout()
plt.show()

Solution: the "After" Dashboard with Best Practices

The team drew on The Economist style guide, IBCS, and the principle "one dashboard tells one story":

Redesign principles:

KPI cards at the top: the 4 headline numbers (total orders, revenue, delivery success rate, average delivery time). The CEO knows the situation after a 3-second glance.
Title = insight: "Orders up 6% MoM but delivery success down 0.4 pp"
Consistency: 1 color palette, 1 font
Max 4 charts: drop any chart that does not lead to an action
Benchmark lines: target, same period last year
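
The change labels on KPI cards are easy to get wrong when typed by hand, especially the difference between a relative change (%) and a percentage-point change (pp) for metrics that are already rates. A small helper can compute and format them. This is a sketch only: `format_delta` and its `kind` argument are hypothetical names, and the two success rates in the last call are illustrative values.

```python
def format_delta(current, previous, kind='pct'):
    """Format the change between two values for a KPI card.

    kind='pct': relative change in percent (for counts and amounts).
    kind='pp' : difference in percentage points (for metrics that are already rates).
    """
    if kind == 'pct':
        delta = (current - previous) / previous * 100
        unit = '%'
    else:
        delta = current - previous
        unit = ' pp'
    arrow = '▲' if delta > 0 else '▼' if delta < 0 else '='
    return f'{arrow} {abs(delta):.1f}{unit}'


orders = [58000, 61000, 63500, 65000, 68000, 72000]
revenue = [8.2, 8.7, 9.1, 9.3, 9.8, 10.5]

print(format_delta(orders[-1], orders[-2]))    # ▲ 5.9%
print(format_delta(revenue[-1], revenue[-2]))  # ▲ 7.1%
print(format_delta(94.9, 95.3, kind='pp'))     # ▼ 0.4 pp
```

The arrow only encodes direction. Whether a rise is good or bad depends on the metric (a falling delivery time is good), so the card color is best chosen separately from the arrow.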

python

import matplotlib.pyplot as plt
import numpy as np

# ✅ Dashboard v2: clean, focused, insight-driven
fig = plt.figure(figsize=(18, 12))
fig.patch.set_facecolor('#FAFAFA')

# Standard color palette
PRIMARY = '#2C3E50'
ACCENT = '#E3120B'
POSITIVE = '#27AE60'
NEGATIVE = '#E74C3C'
GRAY = '#BDC3C7'

# ═══ TOP: KPI Cards ═══
fig.text(0.05, 0.95, 'Weekly Operations Dashboard: Week 50/2025',
         fontsize=16, fontweight='bold', color=PRIMARY)
fig.text(0.05, 0.92, 'Total orders are up but delivery success slipped slightly: review the Hà Nội region',
         fontsize=11, color='gray')

# (label, value, change vs previous period, color of the change text)
kpi_data = [
    ('Total orders', '72,000', '▲ 5.9%', POSITIVE),
    ('Revenue (bn VND)', '10.5', '▲ 7.1%', POSITIVE),
    ('Delivery success', '95.3%', '▼ 0.4pp', NEGATIVE),
    ('Avg delivery time', '2.1 days', '▼ 0.2 days', POSITIVE),
]

for i, (label, value, change, color) in enumerate(kpi_data):
    x_start = 0.05 + i * 0.235
    fig.text(x_start, 0.87, label, fontsize=10, color='gray')
    fig.text(x_start, 0.83, value, fontsize=22, fontweight='bold', color=PRIMARY)
    fig.text(x_start + 0.12, 0.84, change, fontsize=11, fontweight='bold', color=color)

# ═══ Chart 1: Trend, orders vs target ═══
ax1 = fig.add_axes([0.05, 0.45, 0.42, 0.32])

months = ['Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
orders = [58000, 61000, 63500, 65000, 68000, 72000]
target = [60000, 62000, 64000, 66000, 68000, 70000]

ax1.plot(months, orders, color=PRIMARY, linewidth=2.5, marker='o', markersize=6, label='Actual')
ax1.plot(months, target, color=GRAY, linewidth=1.5, linestyle='--', label='Target')
# Shade the gap to target: green where actual >= target, red where it falls short
ax1.fill_between(months, orders, target,
                 where=[o >= t for o, t in zip(orders, target)],
                 alpha=0.1, color=POSITIVE)
ax1.fill_between(months, orders, target,
                 where=[o < t for o, t in zip(orders, target)],
                 alpha=0.1, color=NEGATIVE)

# Orders equal the target in Nov and exceed it in Dec
ax1.set_title('Orders on or above target since Nov: strong year-end momentum',
              fontsize=11, fontweight='bold', loc='left')
ax1.legend(frameon=False, fontsize=9)
ax1.spines['top'].set_visible(False)
ax1.spines['right'].set_visible(False)
ax1.yaxis.grid(True, alpha=0.3)
ax1.set_axisbelow(True)

# ═══ Chart 2: Horizontal bar, orders by region ═══
ax2 = fig.add_axes([0.55, 0.45, 0.40, 0.32])

regions = ['TP.HCM', 'Hà Nội', 'Đà Nẵng', 'Cần Thơ', 'Hải Phòng']
region_orders = [23040, 18000, 8640, 5760, 5040]
region_success = [96.1, 93.2, 95.8, 95.5, 94.9]

# Bars for regions with a success rate below 94% get the accent color
colors_bar = [ACCENT if s < 94.0 else PRIMARY for s in region_success]
y_pos = np.arange(len(regions))

ax2.barh(y_pos, region_orders, color=colors_bar, height=0.6, edgecolor='white')

for i, (orders_val, success) in enumerate(zip(region_orders, region_success)):
    # Value labels sit inside the bars, so white text stays readable on every bar color
    ax2.text(orders_val - 800, i, f'{orders_val:,}', va='center', ha='right',
             fontsize=9, color='white', fontweight='bold')
    # Success rate is listed in a column to the right of the longest bar
    status = '⚠️' if success < 94.0 else ''
    ax2.text(region_orders[0] + 1200, i, f'{success}% {status}', va='center',
             fontsize=9, color=ACCENT if success < 94.0 else POSITIVE)

ax2.set_yticks(y_pos)
ax2.set_yticklabels(regions, fontsize=10)
ax2.set_title('Hà Nội: lowest delivery success rate, 93.2% ⚠️',
              fontsize=11, fontweight='bold', loc='left')
ax2.spines['top'].set_visible(False)
ax2.spines['right'].set_visible(False)
ax2.spines['bottom'].set_visible(False)
ax2.tick_params(bottom=False, labelbottom=False)

# ═══ Chart 3: Success rate trend ═══
ax3 = fig.add_axes([0.05, 0.06, 0.42, 0.30])

success_rate = [94.2, 93.8, 94.5, 95.1, 94.7, 95.3]
target_sr = [95.0] * 6

ax3.plot(months, success_rate, color=PRIMARY, linewidth=2.5, marker='o', markersize=6)
ax3.axhline(y=95.0, color=ACCENT, linestyle='--', linewidth=1.5, alpha=0.7)
ax3.text(5.1, 95.1, 'Target 95%', fontsize=9, color=ACCENT)

# Highlight the months that fall below target
for i, (m, sr) in enumerate(zip(months, success_rate)):
    if sr < 95.0:
        ax3.scatter(i, sr, color=NEGATIVE, s=60, zorder=5)

ax3.set_title('Delivery success hovers around target: not yet stable',
              fontsize=11, fontweight='bold', loc='left')
ax3.set_ylabel('%', fontsize=10)
ax3.set_ylim(92, 97)
ax3.spines['top'].set_visible(False)
ax3.spines['right'].set_visible(False)
ax3.yaxis.grid(True, alpha=0.3)
ax3.set_axisbelow(True)

# ═══ Chart 4: Delivery time distribution ═══
ax4 = fig.add_axes([0.55, 0.06, 0.40, 0.30])

np.random.seed(42)
delivery_times = np.concatenate([
    np.random.normal(1.8, 0.5, 5000),  # HCM: fast
    np.random.normal(2.5, 0.8, 3500),  # HN: slower
    np.random.normal(2.0, 0.6, 3000),  # Others
]).clip(0.5, 6)

ax4.hist(delivery_times, bins=40, color=PRIMARY, alpha=0.7, edgecolor='white')
ax4.axvline(x=2.0, color=ACCENT, linestyle='--', linewidth=2, label='Target: 2.0 days')
ax4.axvline(x=np.median(delivery_times), color=POSITIVE, linestyle='-', linewidth=2,
            label=f'Median: {np.median(delivery_times):.1f} days')

# Compute the long-tail share from the data so the title always matches the chart
pct_over_3 = (delivery_times > 3).mean() * 100
ax4.set_title(f'{pct_over_3:.0f}% of orders take more than 3 days: the long tail needs attention',
              fontsize=11, fontweight='bold', loc='left')
ax4.set_xlabel('Delivery time (days)', fontsize=10)
ax4.legend(frameon=False, fontsize=9)
ax4.spines['top'].set_visible(False)
ax4.spines['right'].set_visible(False)

# Source
fig.text(0.05, 0.01, 'Source: TMS, updated 15/12/2025 | Data Analytics team',
         fontsize=8, color='gray', style='italic')

plt.show()

Before / After Comparison

Criterion	Dashboard v1 (Before)	Dashboard v2 (After)
Number of charts	6 panels in the sample code (4 charts + 2 placeholders)	4 KPI cards + 4 selected charts
Title	"Orders by month" (descriptive)	"Orders on or above target since Nov" (insight)
Color	Rainbow (8 colors)	A fixed palette: primary, accent, positive, negative, plus gray for context
Benchmark	No target or comparison	Target lines + same period last year + color coding
Reading time	15 minutes; the CEO has to dig for the insight	30 seconds; the insight is visible in the titles + KPIs
Pie chart	Yes, 8 slices + 8 colors	No; replaced by a horizontal bar chart + success rate
Actionable	No, it only shows numbers	Yes, it highlights the problem region (Hà Nội ⚠️)

Before vs After: a change of mindset

Dashboard v1 answers: "What are the numbers?" Dashboard v2 answers: "Is the business doing well or badly? What needs to be done?"

The difference is not in the Python technique: it is the same Matplotlib and the same data. The difference is in the communication mindset: a chart has to tell a story, not just show data.

Takeaways for DAs

KPI cards at the top of the dashboard: an overview of the 3-4 most important numbers, with trend (▲/▼) and comparison against target.
Max 4-6 charts: each chart must lead to 1 action. Drop any chart that does not lead to a decision.
Consistent color = professional: pick 4-6 fixed colors and use them every week. Once the CEO is used to the colors, reading gets faster.
Highlight anomalies: use red or a ⚠️ icon for metrics below target, so the CEO sees at once where attention is needed.
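
Anomaly highlighting is more reliable when the rule lives in one place instead of being repeated as inline conditions per chart. The sketch below is one possible shape for such a rule. `status_color` and its `higher_is_better` flag are hypothetical names; the 94% threshold and the regional rates are the ones used in the dashboard v2 example.

```python
PRIMARY, ALERT = '#2C3E50', '#E3120B'


def status_color(value, target, higher_is_better=True):
    """Return the alert color when a metric misses its target, else the neutral color."""
    missed = value < target if higher_is_better else value > target
    return ALERT if missed else PRIMARY


region_success = {'TP.HCM': 96.1, 'Hà Nội': 93.2, 'Đà Nẵng': 95.8,
                  'Cần Thơ': 95.5, 'Hải Phòng': 94.9}

# Success rate: higher is better, alert below 94%
colors = {region: status_color(rate, target=94.0)
          for region, rate in region_success.items()}
flagged = [region for region, color in colors.items() if color == ALERT]
print(flagged)  # ['Hà Nội']

# Delivery time: lower is better, so 2.1 days against a 2.0-day target is a miss
print(status_color(2.1, target=2.0, higher_is_better=False) == ALERT)  # True
```

The `higher_is_better` flag matters because the same comparison means opposite things for different KPIs. The resulting list of colors can be passed directly to `ax.barh(..., color=list(colors.values()))`.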

Comparison & Summary

Criterion	The Economist	NYT / FiveThirtyEight	Vietnamese dashboard
Context	Weekly charts for journalism	COVID-19 tracking for the public	Weekly report for the Board of Directors
Audience	Professional readers short on time	The general public, at every level	CEO, CFO: they need actions
Key Technique	Declarative title, direct labeling, minimal color	Rolling average, uncertainty fan, small multiples	KPI cards, benchmark lines, color-coded anomalies
Style Rule	Max 5-6 colors, no chartjunk, mandatory source	Smooth noise, show uncertainty, enable exploration	Max 4-6 charts, insight title, consistent palette
Biggest Lesson	Title = insight, every pixel serves the data	Do not use raw data for communication; always show uncertainty	A dashboard tells a story, it does not just show numbers

Common principles from the 3 case studies

Charts must tell a story: The Economist uses the title, NYT uses the rolling average, the dashboard uses KPI cards. The methods differ, but they share one aim: the reader grasps the insight within seconds.
Less is more: The Economist removes chartjunk, NYT removes raw noise, the dashboard removes surplus charts. Anything that does not serve the insight is clutter.
Consistency builds trust: a standard color palette, a uniform format, source citations. Consistency = professionalism = stakeholder trust.
Know your audience: journalism needs declarative titles, the public needs simplicity, the CEO needs actionable insight. A chart that works for one audience can fail for another.

Thinking Exercises

Discussion questions

The Economist context: If you were building a style guide for the DA team at your company, what would your first 5 rules be? Use The Economist as a reference but adapt it to the Vietnamese context (for example, Vietnamese-friendly fonts and VND number formats).
NYT/FiveThirtyEight context: Your manager asks for a Q1/2026 revenue forecast. Do you present a single-point forecast (50 billion VND) or an uncertainty range (40-60 billion VND)? If the manager says "I need one exact number, don't give me a range," how do you respond?
Dashboard context: The CEO says, "The dashboard has too much on it, I don't know where to look." You are tracking 12 metrics. Which 4 do you keep, and what are your selection criteria?
Cross-case: All 3 case studies use annotation and highlighting rather than leaving readers to explore on their own. In your context, when should you annotate directly, and when should you make the dashboard interactive so users can explore for themselves?

Hands-on Exercise

Exercise: Dashboard Redesign Challenge

Scenario: You are a DA at a café chain with 50 branches. The CEO sends you an Excel file with December data:

Revenue by branch (50 branches)
Cups sold by drink type (15 types)
Customer satisfaction score (1-5)
Staff efficiency (orders per hour)

Requirements:

Choose the 4 most important KPI cards for the top of the dashboard
Design 3 charts (with the right chart type) with insight titles
Apply a color palette of at most 4 colors
Write annotations for at least 2 important data points
Code it in Matplotlib/Seaborn, applying best practices from all 3 case studies

Evaluation criteria:

Is each title an insight statement?
Are the colors consistent and meaningful (red = bad, green = good)?
Can the CEO understand the state of the business in 30 seconds?
Is any chart chartjunk (one that does not lead to an action)?
