📘 Session 16: Basic Forecasting: Predicting the future from past data

Businesses need to know what lies ahead. Forecasting does not require complex ML; sometimes a moving average is good enough.

🎯 Learning objectives

After this session, learners will be able to:

Understand forecasting methods: naïve, moving average, exponential smoothing
Analyze time series: trend, seasonality, cyclicality, noise
Evaluate forecast accuracy: MAE, MAPE, RMSE
Use Python (statsmodels) to produce forecasts

📋 Overview

In Session 15 you learned A/B Testing: designing experiments, analyzing results, and avoiding pitfalls. But once you know what is happening now, the next question is always: "What will happen next?"

Session 16 moves from "experimenting in the present" (Session 15) to "predicting the future" with Basic Forecasting. Instead of guessing by gut feeling, you use past data to project upcoming trends: next month's revenue, next quarter's product demand, or next week's traffic.

According to McKinsey (2025), companies that use data-driven forecasting reduce forecast error by 30-50% compared with relying on experience. Walmart improved forecast accuracy from 78% to 95% by switching to statistical forecasting. Amazon forecasts demand for 400M+ products every day to support next-day delivery. Thế Giới Di Động forecasts demand for each SKU to decide stock orders for its 2,000+ stores.

Forecasting does not require complex Machine Learning: a Simple Moving Average or Exponential Smoothing is sometimes accurate enough for business decisions. This session gives you a solid forecasting foundation before you move on (if needed) to more advanced methods.

mermaid

flowchart LR
    A["📥 Obtain<br/>Session 7: Python"] --> B["🧹 Scrub<br/>Session 8: Pandas"]
    B --> C["🔍 Explore<br/>Session 9: EDA"]
    C --> D["📊 iNterpret<br/>Sessions 10-11: Chart + BI"]
    D --> E["📖 Storytelling<br/>Session 12: Presentation"]
    E --> F["💼 Business Metrics<br/>Session 13: KPI, Funnel"]
    F --> G["🏭 Industry Cases<br/>Session 14: Domain Analytics"]
    G --> H["🧪 A/B Testing<br/>Session 15: Experiment"]
    H --> I["🔮 Forecasting<br/>✅ Session 16: Predict"]
    style I fill:#e8f5e9,stroke:#4caf50,stroke-width:3px

💡 Why Forecasting matters for Data Analysts

Situation	Without a forecast	With a forecast
CFO needs a Q3 budget plan	"Last year plus 10%" → large deviations	Decompose trend + seasonality → forecast with a known error band (e.g. ±5%)
Supply Chain orders stock for Tết	"We sold 10K last year, so order 12K this year" → overstock or stockouts	Holt-Winters demand forecast + prediction interval → right-sized orders
Marketing plans a campaign	"June traffic is usually low" → gut feeling	Seasonal decomposition → know which months are low or high
CEO wants a revenue target	"20% growth should be easy" → wishful thinking	ETS model + prediction interval → realistic target
Ops team needs a staffing plan	"Weekends need more people" → vague	Hourly demand forecast from time series → precise scheduling

📌 Part 1: Time Series Basics: Understanding time-ordered data

What is a Time Series?

Time Series: a sequence of observations recorded in time order at equally spaced intervals (daily, weekly, monthly, quarterly).

Example	Frequency	Observation
Monthly revenue	Monthly	Revenue (VND)
Daily number of orders	Daily	Order count
Quarterly GDP	Quarterly	GDP (VND)
Hourly temperature	Hourly	Temperature (°C)
Daily stock price	Daily	Close price (VND)
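As a minimal sketch of the "equally spaced" requirement, assuming pandas and hypothetical daily order counts, raw records can be aggregated into a regular monthly series with `resample`. The `'MS'` alias labels each month by its first day, the same frequency used in the revenue examples in this session.

```python
import numpy as np
import pandas as pd

# Hypothetical daily order counts for Q1 2024 (91 days, 2024 is a leap year)
days = pd.date_range("2024-01-01", "2024-03-31", freq="D")
rng = np.random.default_rng(0)
daily_orders = pd.Series(rng.poisson(120, len(days)), index=days, name="orders")

# Aggregate into an equally spaced monthly series ('MS' = month-start label)
monthly_orders = daily_orders.resample("MS").sum()

print(pd.infer_freq(daily_orders.index))  # 'D': one observation per day
print(monthly_orders)
```

Summing is the right aggregation for counts such as orders or revenue; for levels such as a stock price you would typically take the last value of each period instead.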

The 4 Components of a Time Series

A time series can usually be decomposed into 4 components:

mermaid

flowchart TD
    A["📈 Time Series Data<br/>Y(t)"] --> B["📊 Decomposition"]
    B --> C["📈 Trend (T)<br/>Long-term direction"]
    B --> D["🔄 Seasonality (S)<br/>Fixed repeating pattern"]
    B --> E["🌊 Cyclicality (C)<br/>Fluctuations without a fixed period"]
    B --> F["📡 Noise / Residual (ε)<br/>Random variation"]
    C --> G["Y(t) = T + S + C + ε<br/>(Additive)"]
    D --> G
    E --> G
    F --> G

Component	Definition	Example	Characteristics
Trend (T)	Long-term direction: up, down, or flat	Revenue grows 15%/year	Smooth, long-term direction
Seasonality (S)	Pattern that repeats with a fixed period	Sales spike at Tết, Christmas, Black Friday	Fixed frequency (12 months, 7 days)
Cyclicality (C)	Rises and falls WITHOUT a fixed period	Economic cycles of 5-10 years	Variable frequency, longer than a season
Noise (ε)	Random variation that cannot be explained	Orders randomly up or down on a given day	Unpredictable, zero mean

Seasonality vs Cyclicality: they are different!

Property	Seasonality	Cyclicality
Period	Fixed (12 months, 7 days, 4 quarters)	Not fixed (2-10 years)
Cause	Calendar, weather, holidays	Economic cycles, business cycles
Predictable?	✅ Yes: the pattern is known in advance	⚠️ Hard: you do not know when a cycle starts or ends
Example	Ice cream sells more in summer, warm clothes in winter	Real estate boom/bust, GDP expansion/contraction

Decomposition: Additive vs Multiplicative

There are 2 ways to decompose a time series:

Additive: $Y(t) = T(t) + S(t) + \varepsilon(t)$

Seasonal variation stays constant in absolute terms over time
Example: every December, revenue rises by an extra 500 million VND (a fixed amount)

Multiplicative: $Y(t) = T(t) \times S(t) \times \varepsilon(t)$

Seasonal variation scales with the trend
Example: every December, revenue is 30% above an average month (a proportion)
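A small numeric sketch of the difference, using hypothetical December figures: under the additive model the December uplift is the same amount every year, while under the multiplicative model it grows with the trend. Taking logs turns the multiplicative form into an additive one, which is one common way to handle it.

```python
import numpy as np

# Hypothetical underlying (trend) level in three consecutive Decembers, million VND
trend_dec = np.array([1000.0, 1500.0, 2000.0])

# Additive: December always adds the same 500 million VND
additive_dec = trend_dec + 500

# Multiplicative: December is always 30% above the trend level
multiplicative_dec = trend_dec * 1.30

print("Additive uplift:      ", additive_dec - trend_dec)        # same every year
print("Multiplicative uplift:", multiplicative_dec - trend_dec)  # grows with the trend

# In logs, the multiplicative uplift becomes a constant: log(1.30)
print("Log uplift:", np.log(multiplicative_dec) - np.log(trend_dec))
```

If the seasonal swings in a chart get wider as the series grows, the multiplicative form (or a log transform) is usually the better fit.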

python

import pandas as pd
import numpy as np
from statsmodels.tsa.seasonal import seasonal_decompose
import matplotlib.pyplot as plt

# Example: 3 years of synthetic monthly revenue
np.random.seed(16)
dates = pd.date_range('2023-01-01', periods=36, freq='MS')

# Build the series: trend + seasonality + noise
trend = np.linspace(800, 1500, 36)  # Trend rising from 800 → 1500
seasonal = 200 * np.sin(2 * np.pi * np.arange(36) / 12)  # 12-month seasonal cycle
noise = np.random.normal(0, 50, 36)  # Random noise

revenue = trend + seasonal + noise

ts = pd.Series(revenue, index=dates, name='Revenue (million VND)')

# Additive decomposition (the trend is a centered 12-month moving average,
# so the first and last 6 trend/residual values are NaN)
result = seasonal_decompose(ts, model='additive', period=12)

fig, axes = plt.subplots(4, 1, figsize=(12, 10))
result.observed.plot(ax=axes[0], title='Observed (original data)')
result.trend.plot(ax=axes[1], title='Trend')
result.seasonal.plot(ax=axes[2], title='Seasonality')
result.resid.plot(ax=axes[3], title='Residual (Noise)')
plt.tight_layout()
plt.show()

Stationary vs Non-Stationary

Property	Stationary	Non-Stationary
Mean	Constant over time	Changes (rises or falls)
Variance	Constant	Changes
Autocovariance	Depends only on the lag	Depends on the point in time
Example	Daily stock returns (fluctuating around a mean)	GDP (rising continuously)
Forecasting	Easier: more predictable	Harder: needs a transformation (e.g. differencing)

To check stationarity, use the Augmented Dickey-Fuller (ADF) test:

python

from statsmodels.tsa.stattools import adfuller

result_adf = adfuller(ts)
print(f"ADF Statistic: {result_adf[0]:.4f}")
print(f"p-value: {result_adf[1]:.4f}")
print(f"→ {'Stationary' if result_adf[1] < 0.05 else 'Non-Stationary'}")
# H₀: the series has a unit root (non-stationary)
# p < 0.05 → Reject H₀ → Stationary
# p ≥ 0.05 → Fail to reject H₀ → Non-Stationary

ACF and PACF: Autocorrelation

ACF (Autocorrelation Function): measures the correlation between $Y(t)$ and $Y(t-k)$, including indirect effects passed through the intermediate lags.
PACF (Partial ACF): measures the correlation between $Y(t)$ and $Y(t-k)$ after removing the effects of $Y(t-1), \ldots, Y(t-k+1)$.

python

from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 4))
plot_acf(ts, lags=24, ax=ax1, title='ACF: Autocorrelation')
# PACF only supports lags up to 50% of the sample size (36 points → fewer than 18)
plot_pacf(ts, lags=16, ax=ax2, method='ywm', title='PACF: Partial Autocorrelation')
plt.tight_layout()
plt.show()

# Significant ACF lags → the series depends on its past values
# PACF cuts off after lag k → an AR(k) model may be a good fit
# Seasonal spikes in the ACF (lags 12, 24) → a seasonal component is present

📌 Part 2: Forecasting Methods: From Naïve to Holt-Winters

Overview of the methods

mermaid

flowchart TD
    A["🔮 Forecasting Methods"] --> B["📌 Naïve<br/>Simplest baseline"]
    A --> C["📊 Moving Average<br/>Smoothing noise"]
    A --> D["📈 Exponential Smoothing<br/>Weighted recent data"]
    A --> E["🔥 Holt-Winters<br/>Trend + Seasonality"]
    B --> F["F(t+1) = Y(t)<br/>Use the last observed value"]
    C --> G["F(t+1) = mean(Y(t-k+1)...Y(t))<br/>Average k periods"]
    D --> H["F(t+1) = α·Y(t) + (1-α)·F(t)<br/>Weighted average"]
    E --> I["Level + Trend + Season<br/>Full decomposition model"]

Method 1: Naïve Forecast

Idea: forecast = the last observed value.

$$\hat{Y}_{t+1} = Y_t$$

Seasonal Naïve: forecast = the value from the same period one season earlier (e.g. the same month last year).

$$\hat{Y}_{t+1} = Y_{t+1-m}$$

where $m$ = seasonal period (12 for monthly data).

python

# Naïve forecast
naive_forecast = ts.iloc[-1]  # Last observed value (Dec 2025)
print(f"Naïve forecast (next month): {naive_forecast:.0f} million VND")

# Seasonal Naïve: next month is Jan 2026, so use Jan 2025 (12 months back)
seasonal_naive = ts.iloc[-12]  # Same month last year
print(f"Seasonal Naïve (next month): {seasonal_naive:.0f} million VND")

When to use Naïve?

✅ As a baseline to compare against more complex models
✅ When the data follows a random walk pattern (stock prices)
❌ Not good when there is a clear trend or strong seasonality
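The seasonal naïve formula above covers one step ahead. For longer horizons, a common convention is to repeat the last observed season. The function below is a hypothetical helper (not part of pandas or statsmodels), assuming a Series with a regular `DatetimeIndex` whose `freq` is set.

```python
import numpy as np
import pandas as pd


def seasonal_naive_forecast(series: pd.Series, m: int, h: int) -> pd.Series:
    """Hypothetical helper: seasonal naïve forecast for h steps ahead.

    Step j (1-based) reuses the value observed m * ceil(j / m) periods
    earlier, i.e. the last full season is repeated for horizons beyond m.
    """
    last_season = series.iloc[-m:].to_numpy()
    values = np.array([last_season[(j - 1) % m] for j in range(1, h + 1)])
    # Future dates continue the series' own frequency (e.g. 'MS')
    future_index = pd.date_range(series.index[-1], periods=h + 1,
                                 freq=series.index.freq)[1:]
    return pd.Series(values, index=future_index, name=series.name)


# Usage with hypothetical data: 24 months holding the values 0..23 in time order
dates = pd.date_range("2023-01-01", periods=24, freq="MS")
sales = pd.Series(np.arange(24, dtype=float), index=dates, name="sales")
fc = seasonal_naive_forecast(sales, m=12, h=15)
print(fc.head(3))  # Jan-Mar 2025 reuse Jan-Mar 2024 (values 12, 13, 14)
```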

Method 2: Simple Moving Average (SMA)

Idea: forecast = the average of the $k$ most recent observations, which smooths out noise.

$$\hat{Y}_{t+1} = \frac{1}{k} \sum_{i=0}^{k-1} Y_{t-i}$$

python

def simple_moving_average(series, window):
    """Rolling mean of the last `window` values.

    The value at time t is the SMA forecast for t+1.
    """
    return series.rolling(window=window).mean()

# SMA with windows 3, 6, 12
for k in [3, 6, 12]:
    sma = simple_moving_average(ts, k)
    print(f"SMA(k={k:2d}): next-month forecast = {sma.iloc[-1]:.0f} million VND")

Window $k$	Pros	Cons
Small ($k$ = 3)	Reacts quickly to changes	Sensitive to noise
Large ($k$ = 12)	Smoother, less noise	Reacts slowly, lags behind

Limitations of SMA:

All $k$ observations get equal weight, even though an observation from 12 months ago is usually less relevant than last month's
Does not capture trend (SMA always lags behind a trend)
Does not capture seasonality directly
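The claim that SMA lags behind a trend can be quantified. For a noise-free linear trend $Y_t = a + bt$, the average of the last $k$ values is $a + b\left(t - \frac{k-1}{2}\right)$, while the next value is $a + b(t+1)$, so the SMA forecast is always too low by $b\,\frac{k+1}{2}$. A quick check with hypothetical numbers ($a = 800$, $b = 20$):

```python
import numpy as np
import pandas as pd

# Hypothetical noise-free linear trend Y(t) = a + b*t (million VND)
a, b = 800.0, 20.0
y = pd.Series(a + b * np.arange(36))

for k in (3, 6, 12):
    # Rolling mean at t is the forecast for t+1, hence shift(1)
    sma_forecast = y.rolling(window=k).mean().shift(1)
    error = (y - sma_forecast).dropna()
    print(f"k={k:2d}: SMA error = {error.iloc[-1]:.1f}, "
          f"theory b*(k+1)/2 = {b * (k + 1) / 2:.1f}")
```

The larger the window, the larger the systematic under-forecast on a growing series, which is why methods with an explicit trend term (Holt, Holt-Winters) exist.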

Method 3: Exponential Smoothing (Simple/SES)

Idea: recent observations get higher weights, and the weights decay exponentially for older ones.

$$\hat{Y}_{t+1} = \alpha Y_t + (1 - \alpha) \hat{Y}_t$$

where $\alpha$ is the smoothing parameter, $0 < \alpha < 1$:

Large $\alpha$ (close to 1) → more weight on recent data → responsive but noisy
Small $\alpha$ (close to 0) → more weight on the past → smooth but slow to react

python

from statsmodels.tsa.holtwinters import SimpleExpSmoothing

# Simple Exponential Smoothing with a fixed α = 0.3
ses_model = SimpleExpSmoothing(ts).fit(smoothing_level=0.3, optimized=False)
ses_forecast = ses_model.forecast(steps=6)

print("📈 Simple Exponential Smoothing (α=0.3)")
print("   Forecast for the next 6 months:")
for date, val in ses_forecast.items():
    print(f"   {date.strftime('%Y-%m')}: {val:.0f} million VND")

# Optimized α (let statsmodels estimate the best α)
ses_opt = SimpleExpSmoothing(ts).fit(optimized=True)
print(f"\n   Optimal α = {ses_opt.params['smoothing_level']:.4f}")

Limitation of SES: it only produces a flat-line forecast and captures neither trend nor seasonality. Every forecast value is the same!

Method 4: Holt-Winters (Triple Exponential Smoothing)

Idea: extend SES to capture both trend and seasonality.

Method	Components	Parameters
SES	Level only	$\alpha$
Holt (Double)	Level + Trend	$\alpha, \beta$
Holt-Winters (Triple)	Level + Trend + Seasonality	$\alpha, \beta, \gamma$

Holt-Winters Additive:

$$\hat{Y}_{t+h} = \ell_t + h\,b_t + s_{t+h-m(k+1)}$$

where $k = \lfloor (h-1)/m \rfloor$ makes the seasonal term come from the last observed season (for $h \le m$ it is simply $s_{t+h-m}$), and:

$\ell_t$ = Level (smoothed value): $\ell_t = \alpha (Y_t - s_{t-m}) + (1 - \alpha)(\ell_{t-1} + b_{t-1})$
$b_t$ = Trend (slope): $b_t = \beta (\ell_t - \ell_{t-1}) + (1 - \beta) b_{t-1}$
$s_t$ = Seasonal component: $s_t = \gamma (Y_t - \ell_{t-1} - b_{t-1}) + (1 - \gamma) s_{t-m}$
$m$ = seasonal period (12 for monthly data)
$h$ = forecast horizon (number of steps ahead)
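A from-scratch sketch of the three update equations and the forecast formula, assuming the initial level, trend, and seasonal states are already known (statsmodels estimates these for you; here they are supplied by hand, and `holt_winters_additive` is an illustrative name, not a library API). The test case is a hypothetical noise-free series with slope 2 and a 4-period season; because the model is exactly right, the forecasts should reproduce the true continuation for any $\alpha, \beta, \gamma$.

```python
import numpy as np


def holt_winters_additive(y, m, alpha, beta, gamma,
                          level0, trend0, season0, h):
    """Illustrative additive Holt-Winters recursion (not a library API).

    level0, trend0: states just before the first observation (time -1).
    season0: the m initial seasonal states s_{-m}, ..., s_{-1}.
    Returns the 1..h step-ahead forecasts made after the last observation.
    """
    level, trend = level0, trend0
    seasonals = list(season0)  # seasonals[i] holds s_{i-m}
    for t, y_t in enumerate(y):
        s_prev = seasonals[t]  # s_{t-m}
        prev_level, prev_trend = level, trend
        level = alpha * (y_t - s_prev) + (1 - alpha) * (prev_level + prev_trend)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        seasonals.append(gamma * (y_t - prev_level - prev_trend)
                         + (1 - gamma) * s_prev)  # s_t, stored at index t + m
    last = len(y) - 1
    forecasts = []
    for step in range(1, h + 1):
        k = (step - 1) // m
        # s_{last + step - m(k+1)} is stored at index last + step - m(k+1) + m
        idx = last + step - m * (k + 1) + m
        forecasts.append(level + step * trend + seasonals[idx])
    return np.array(forecasts)


# Hypothetical noise-free series: level 100 at t=0, slope 2, 4-period season
m = 4
season = np.array([10.0, -5.0, 0.0, -5.0])
t = np.arange(12)
y = 100 + 2 * t + season[t % m]

fc = holt_winters_additive(y, m, alpha=0.5, beta=0.3, gamma=0.2,
                           level0=98.0, trend0=2.0, season0=season, h=6)
t_future = np.arange(12, 18)
truth = 100 + 2 * t_future + season[t_future % m]
print(np.round(fc, 6))
print("Matches true continuation:", np.allclose(fc, truth))
```

Forecasting 6 steps with a 4-period season exercises the $k = \lfloor (h-1)/m \rfloor$ term: steps 5 and 6 reuse the seasonal states of steps 1 and 2.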

python

from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Holt-Winters: additive trend + additive seasonality
hw_add = ExponentialSmoothing(
    ts,
    trend='add',           # Additive trend
    seasonal='add',        # Additive seasonality
    seasonal_periods=12    # Monthly data, yearly cycle
).fit(optimized=True)

# Forecast 6 months ahead
hw_forecast = hw_add.forecast(steps=6)

print("🔥 Holt-Winters Additive")
print(f"   α (level)    = {hw_add.params['smoothing_level']:.4f}")
print(f"   β (trend)    = {hw_add.params['smoothing_trend']:.4f}")
print(f"   γ (seasonal) = {hw_add.params['smoothing_seasonal']:.4f}")
print("\n   Forecast for the next 6 months:")
for date, val in hw_forecast.items():
    print(f"   {date.strftime('%Y-%m')}: {val:.0f} million VND")

# Compare Additive vs Multiplicative seasonality
hw_mul = ExponentialSmoothing(
    ts,
    trend='add',
    seasonal='mul',        # Multiplicative seasonality (requires positive data)
    seasonal_periods=12
).fit(optimized=True)

hw_mul_forecast = hw_mul.forecast(steps=6)
print(pd.DataFrame({'Additive': hw_forecast,
                    'Multiplicative': hw_mul_forecast}).round(0))

Comparing all methods

python

# Visualization: all methods
fig, ax = plt.subplots(figsize=(14, 6))

# Actual data
ts.plot(ax=ax, label='Actual', color='black', linewidth=2)

# SMA: shift(1) so the value at t is the forecast made with data up to t-1
sma_6 = simple_moving_average(ts, 6).shift(1)
sma_6.plot(ax=ax, label='SMA(6)', linestyle='--', alpha=0.7)

# SES fitted
ses_opt.fittedvalues.plot(ax=ax, label='SES', linestyle='--', alpha=0.7)

# Holt-Winters fitted
hw_add.fittedvalues.plot(ax=ax, label='Holt-Winters (Add)', linestyle='--', alpha=0.7)

# Forecasts
hw_forecast.plot(ax=ax, label='HW Forecast', color='red', linewidth=2, marker='o')

ax.set_title('Forecasting Methods Comparison')
ax.set_ylabel('Revenue (triệu VND)')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

📌 Part 3: Model Evaluation: How accurate is the forecast?

Train/Test Split for Time Series: NOT random!

⚠️ Important: splitting a time series is completely different from a random split!

Time series data has a temporal order, so you CANNOT shuffle it randomly and then split. At forecast time the future is unknown, so you train only on the past and test on the future.

mermaid

flowchart LR
    A["📊 Full Data<br/>Jan 2023 - Dec 2025<br/>36 months"] --> B["🏋️ Train Set<br/>Jan 2023 - Jun 2025<br/>30 months"]
    A --> C["🧪 Test Set<br/>Jul 2025 - Dec 2025<br/>6 months"]
    B --> D["Fit model on Train"]
    D --> E["Forecast 6 months"]
    E --> F["Compare Forecast vs Actual (Test)"]
    C --> F
    F --> G["📏 MAE, MAPE, RMSE"]

python

# Train/Test split: temporal, not random!
train = ts[:'2025-06']  # 30 months
test = ts['2025-07':]    # 6 months

print(f"Train: {train.index[0].strftime('%Y-%m')} → {train.index[-1].strftime('%Y-%m')} ({len(train)} months)")
print(f"Test:  {test.index[0].strftime('%Y-%m')} → {test.index[-1].strftime('%Y-%m')} ({len(test)} months)")

# ❌ NEVER DO THIS:
# from sklearn.model_selection import train_test_split
# train, test = train_test_split(ts, test_size=0.2, random_state=42)  # ← WRONG! (shuffles by default)
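A single split evaluates the model on only one 6-month window. Time series cross-validation (a rolling forecast origin) repeats the temporal split several times, always training on the past. A minimal sketch with scikit-learn's `TimeSeriesSplit`, assuming 36 monthly observations and 6-month test windows (the `test_size` argument needs scikit-learn 0.24 or later); in practice you would fit the model on each training fold and average MAE/MAPE across folds.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

positions = np.arange(36)  # stand-in for 36 monthly observations

# Expanding window: each fold trains on everything before a 6-month test window
tscv = TimeSeriesSplit(n_splits=3, test_size=6)
folds = list(tscv.split(positions))
for fold, (train_idx, test_idx) in enumerate(folds, start=1):
    print(f"Fold {fold}: train months 0-{train_idx[-1]} ({len(train_idx)}), "
          f"test months {test_idx[0]}-{test_idx[-1]}")
```

The last fold reproduces the 30/6 split used above; the earlier folds show whether the chosen model is consistently good or just lucky on one window.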

Forecast Accuracy Metrics

Metric	Formula	Meaning	Unit
MAE	$\frac{1}{n} \sum_t \lvert Y_t - \hat{Y}_t \rvert$	Average absolute error	Same unit as Y
MAPE	$\frac{100}{n} \sum_t \left\lvert \frac{Y_t - \hat{Y}_t}{Y_t} \right\rvert$	Average percentage error (undefined if any $Y_t = 0$)	%
RMSE	$\sqrt{\frac{1}{n} \sum_t (Y_t - \hat{Y}_t)^2}$	Penalizes large errors more heavily	Same unit as Y
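A worked example of the three formulas on three hypothetical months, done with plain NumPy so each step is visible. One error is three times larger than the others, which pushes RMSE above MAE.

```python
import numpy as np

actual = np.array([100.0, 200.0, 300.0])    # hypothetical actuals
forecast = np.array([110.0, 190.0, 330.0])  # hypothetical forecasts

errors = actual - forecast                     # [-10, 10, -30]
mae = np.mean(np.abs(errors))                  # (10 + 10 + 30) / 3
mape = np.mean(np.abs(errors / actual)) * 100  # (10% + 5% + 10%) / 3
rmse = np.sqrt(np.mean(errors ** 2))           # sqrt((100 + 100 + 900) / 3)

print(f"MAE = {mae:.2f}, MAPE = {mape:.2f}%, RMSE = {rmse:.2f}")
# → MAE = 16.67, MAPE = 8.33%, RMSE = 19.15
```

The same 10-unit error counts as 10% in the first month but only 5% in the second, so MAPE weights months with small actuals more heavily.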

python

from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np

def forecast_accuracy(actual, forecast, method_name=""):
    """Calculate MAE, MAPE, RMSE (MAPE assumes no zero actuals)."""
    # Compare by position, not by index label, so mismatched indexes cannot misalign
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    mae = mean_absolute_error(actual, forecast)
    mape = np.mean(np.abs((actual - forecast) / actual)) * 100
    rmse = np.sqrt(mean_squared_error(actual, forecast))

    print(f"📏 {method_name}")
    print(f"   MAE  = {mae:.2f} (off by {mae:.0f} million VND on average)")
    print(f"   MAPE = {mape:.2f}% (off by {mape:.1f}% on average)")
    print(f"   RMSE = {rmse:.2f}")
    return {'MAE': mae, 'MAPE': mape, 'RMSE': rmse}

Interpreting MAPE (a common rule of thumb; acceptable levels vary by industry)

MAPE	Rating	Example
< 10%	🟢 Excellent: highly accurate	Demand forecasting for mature products
10-20%	🟡 Good: acceptable for business	Revenue forecasting
20-30%	🟠 Fair: needs improvement	New product forecasting
> 30%	🔴 Poor: forecast is not reliable	Volatile or new markets

Comparing models on the Test Set

python

# Fit models on Train, then forecast the 6 test months
from statsmodels.tsa.holtwinters import SimpleExpSmoothing, ExponentialSmoothing

# 1. Naïve
naive_pred = pd.Series([train.iloc[-1]] * len(test), index=test.index)

# 2. Seasonal Naïve
snaive_pred = pd.Series(train.iloc[-12:-6].values, index=test.index)

# 3. SES
ses = SimpleExpSmoothing(train).fit(optimized=True)
ses_pred = ses.forecast(len(test))

# 4. Holt-Winters
hw = ExponentialSmoothing(train, trend='add', seasonal='add',
                          seasonal_periods=12).fit(optimized=True)
hw_pred = hw.forecast(len(test))

# Accuracy comparison
results = {}
results['Naïve'] = forecast_accuracy(test, naive_pred, "Naïve")
results['Seasonal Naïve'] = forecast_accuracy(test, snaive_pred, "Seasonal Naïve")
results['SES'] = forecast_accuracy(test, ses_pred, "SES")
results['Holt-Winters'] = forecast_accuracy(test, hw_pred, "Holt-Winters")

# Summary table
accuracy_df = pd.DataFrame(results).T
print("\n📊 FORECAST ACCURACY COMPARISON")
print(accuracy_df.to_string())
print(f"\n🏆 Best model: {accuracy_df['MAPE'].idxmin()} (MAPE = {accuracy_df['MAPE'].min():.2f}%)")

Residual Analysis

python

# Residual analysis for the best model (Holt-Winters in this example)
# Only 6 test residuals here; also inspect the in-sample residuals in hw.resid
residuals = test - hw_pred

fig, axes = plt.subplots(2, 2, figsize=(12, 8))

# 1. Residual plot
axes[0, 0].plot(residuals, marker='o')
axes[0, 0].axhline(y=0, color='r', linestyle='--')
axes[0, 0].set_title('Residuals over Time')

# 2. Histogram
axes[0, 1].hist(residuals, bins=10, edgecolor='black')
axes[0, 1].set_title('Residual Distribution')

# 3. Forecast vs Actual
axes[1, 0].plot(test.index, test.values, 'b-o', label='Actual')
axes[1, 0].plot(test.index, hw_pred.values, 'r--o', label='Forecast')
axes[1, 0].set_title('Forecast vs Actual')
axes[1, 0].legend()

# 4. Scatter: Actual vs Forecast
axes[1, 1].scatter(test.values, hw_pred.values)
min_val = min(test.min(), hw_pred.min())
max_val = max(test.max(), hw_pred.max())
axes[1, 1].plot([min_val, max_val], [min_val, max_val], 'r--')
axes[1, 1].set_xlabel('Actual')
axes[1, 1].set_ylabel('Forecast')
axes[1, 1].set_title('Actual vs Forecast (perfect = diagonal)')

plt.tight_layout()
plt.show()

# Check residuals: mean ≈ 0 and no visible pattern
print(f"Residual Mean: {residuals.mean():.2f} (should be ≈ 0)")
print(f"Residual Std:  {residuals.std():.2f}")

📌 Part 4: Practical Forecasting: Real-world applications

Revenue Forecasting

mermaid

flowchart TD
    A["📊 Historical Revenue<br/>3+ years of monthly data"] --> B["🔍 EDA<br/>Trend? Seasonality?"]
    B --> C["🧬 Decompose<br/>Additive or Multiplicative"]
    C --> D["🏋️ Train/Test Split<br/>80% train, 20% test (by time)"]
    D --> E["🔮 Fit Models<br/>Naïve, SMA, SES, HW"]
    E --> F["📏 Evaluate<br/>MAE, MAPE, RMSE"]
    F --> G["🏆 Select Best Model"]
    G --> H["🚀 Forecast Future<br/>+ Prediction Interval"]
    H --> I["📋 Report for Stakeholders"]

Demand Forecasting

Demand forecasting: predicting how many units of a product will be sold (and therefore need to be stocked) in the future.

Driver	Example	How to handle
Promotion	Black Friday sale → demand up 3x	Flag promotion days and forecast them separately
Holiday	Tết → demand rises in specific categories	Seasonal component (Tết moves between January and February, so a fixed 12-month season captures it only approximately)
Weather	Summer → more ice cream and drinks	External variable (advanced)
Competition	Competitor cuts prices → demand shifts	Monitor and adjust the forecast
Stockout	Out of stock → true demand is hidden	Impute true demand instead of using recorded sales
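For the Stockout row, one crude approach (an assumption for illustration, not a standard method) is to mask recorded sales on stockout days and interpolate demand from the surrounding days. With hypothetical daily data:

```python
import pandas as pd

# Hypothetical daily sales; the product was out of stock on Jun 4-5
days = pd.date_range("2024-06-01", periods=8, freq="D")
sales = pd.Series([50, 52, 49, 0, 3, 51, 53, 50], index=days, dtype=float)
stockout = pd.Series([False, False, False, True, True, False, False, False],
                     index=days)

# Recorded sales understate demand on stockout days: mask them, then interpolate
est_demand = sales.mask(stockout).interpolate(method="time")
print(pd.DataFrame({"sales": sales, "est_demand": est_demand}))
```

Linear interpolation ignores weekday patterns and promotions, so treat it as a starting point; the key idea is that the series you forecast should be demand, not the censored sales.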

Prediction Intervals: a forecast is more than a single number!

💡 A prediction interval often matters more than the point forecast!

A forecast of 1,200 million VND alone is not enough information. A forecast of 1,200 ± 150 million VND (95% prediction interval: 1,050-1,350) is actionable!

python

# Holt-Winters with prediction intervals
hw_full = ExponentialSmoothing(
    ts, trend='add', seasonal='add', seasonal_periods=12
).fit(optimized=True)

# Point forecast (HoltWintersResults has no get_forecast()/conf_int();
# statsmodels' ETSModel offers get_prediction() with analytic intervals)
forecast_mean = hw_full.forecast(steps=6)

# 95% prediction interval by simulation: 1,000 future paths built from
# bootstrapped residuals (error='add' to match the additive model)
simulations = hw_full.simulate(nsimulations=6, repetitions=1000,
                               error='add', random_errors='bootstrap',
                               random_state=16)
lower = simulations.quantile(0.025, axis=1)
upper = simulations.quantile(0.975, axis=1)

# Plot
fig, ax = plt.subplots(figsize=(14, 6))
ts.plot(ax=ax, label='Historical', color='black', linewidth=2)
forecast_mean.plot(ax=ax, label='Forecast', color='red', linewidth=2, marker='o')
ax.fill_between(forecast_mean.index, lower.values, upper.values,
                 alpha=0.2, color='red', label='95% Prediction Interval')
ax.set_title('Revenue Forecast with 95% Prediction Interval')
ax.set_ylabel('Revenue (triệu VND)')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
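Where does a "± 150" come from? Under the common assumption of normally distributed, uncorrelated errors, a 95% interval is the point forecast ± 1.96 standard deviations of the h-step forecast error (so ± 150 corresponds to a standard deviation of about 77). For the naïve method under a random-walk assumption, that standard deviation grows like $\sigma\sqrt{h}$, which is why intervals widen with the horizon. A sketch on hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(16)
# Hypothetical monthly revenue behaving like a random walk (million VND)
y = 1200 + np.cumsum(rng.normal(0, 40, 36))

# One-step naïve residuals e(t) = Y(t) - Y(t-1), and their standard deviation
residuals = np.diff(y)
sigma = residuals.std(ddof=1)

point = y[-1]                           # naïve point forecast for every horizon
h = np.arange(1, 7)                     # 1..6 months ahead
half_width = 1.96 * sigma * np.sqrt(h)  # assumes normal, uncorrelated errors

for step, width in zip(h, half_width):
    print(f"h={step}: {point:,.0f} ± {width:,.0f} "
          f"({point - width:,.0f} to {point + width:,.0f})")
```

The 4-month interval is exactly twice as wide as the 1-month one ($\sqrt{4} = 2$); simulation-based intervals such as the Holt-Winters code above relax the normality assumption.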

When should you NOT trust a forecast?

Situation	Why the forecast fails	Example
Black Swan events	Events that have never happened before	COVID-19 broke almost every forecast
Structural break	The business model changes fundamentally	Closing 50% of stores
Too little data	Fewer than 2 seasonal cycles	With 6 months of data, forecasting seasonality is guesswork
Regime change	Old patterns no longer hold	Inflation suddenly spikes
External shock	Factors outside the data	A new competitor enters the market

Business Reporting: Forecasts for stakeholders

📋 FORECAST REPORT: Monthly Revenue, H1 2026
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Method:  Holt-Winters (Additive), optimized parameters
Data:    Jan 2023 - Dec 2025 (36 months)
MAPE on test set: 5.8% (Excellent)

FORECAST (million VND):
┌──────────┬─────────┬─────────────────┐
│ Month    │ Point   │ 95% Interval    │
├──────────┼─────────┼─────────────────┤
│ Jan 2026 │ 1,580   │ 1,380 - 1,780   │
│ Feb 2026 │ 1,420   │ 1,190 - 1,650   │
│ Mar 2026 │ 1,510   │ 1,250 - 1,770   │
│ Apr 2026 │ 1,620   │ 1,330 - 1,910   │
│ May 2026 │ 1,590   │ 1,270 - 1,910   │
│ Jun 2026 │ 1,550   │ 1,200 - 1,900   │
└──────────┴─────────┴─────────────────┘

⚠️ ASSUMPTIONS:
- No unusual events (large promotions, store closures)
- The trend continues as over the past 3 years
- The seasonal pattern stays stable
- The prediction interval WIDENS with the horizon (see the chart)

✅ RECOMMENDATIONS:
- Use the point forecast for budgeting
- Use the lower bound (pessimistic) for cash flow planning
- Use the upper bound (optimistic) for capacity planning
- Re-forecast every month as new actual data arrives

🔗 Putting it all together

Forecasting in the DA journey

Session	Skill	Link to forecasting
Sessions 7-8	Python + Pandas	Handling time series data with Pandas
Session 9	EDA	Exploring patterns: trend, seasonality, outliers
Sessions 10-11	Visualization + BI	Time series charts, forecast dashboards
Session 12	Data Storytelling	Presenting forecasts to leadership
Session 13	Business Metrics	Revenue and demand are the metrics being forecast
Session 14	Industry Cases	Supply chain → demand forecasting, Finance → revenue forecasting
Session 15	A/B Testing	Experiments test hypotheses now; forecasts project outcomes forward
Session 16	Basic Forecasting	Predicting future values from historical data

Checklist "Forecasting Literacy"

✅ Understand the 4 components: trend, seasonality, cyclicality, noise
✅ Distinguish additive vs multiplicative decomposition
✅ Test for stationarity (ADF test)
✅ Apply Naïve, SMA, SES, Holt-Winters
✅ Split train/test correctly (temporal, NOT random)
✅ Evaluate accuracy: MAE, MAPE, RMSE
✅ Know which method to use when
✅ Forecast with prediction intervals
✅ Residual analysis: mean ≈ 0, no pattern
✅ Report forecasts to stakeholders (point + interval + assumptions)

📚 References

Resource	Author(s)	Key content
Forecasting: Principles and Practice (3rd ed.)	Rob Hyndman & George Athanasopoulos	The forecasting "bible", free online (otexts.com/fpp3)
statsmodels Documentation	statsmodels team	Python library for time series & statistics
Time Series Analysis and Its Applications	Shumway & Stoffer	In-depth time series textbook
Demand Forecasting for Retail	Fildes et al.	Forecasting applications in retail
Makridakis Competitions (M4, M5)	Spyros Makridakis	Benchmark forecasting competitions

🎯 Exercises and practice

Workshop: Revenue forecasting with 3 years of monthly data: decompose, fit models, evaluate, forecast 6 months
Case Study: Walmart demand forecasting, Amazon inventory, Thế Giới Di Động promotions
Mini Game: Forecast Simulator: 7 rounds of choosing a method; Gold tier = MAPE < 10%
Blog: Quân's story: a Supply Chain DA and lessons from forecasting Tết demand
Standards: Time Series Cross-Validation, Forecast Accuracy Metrics, Prediction Intervals
