🏆 Standards: Basic Forecasting

These standards help you evaluate a forecast model correctly, avoid overfitting, and communicate uncertainty: Time Series Cross-Validation for evaluation, Forecast Accuracy Metrics for comparison, and Prediction Intervals for uncertainty.

Overview of the standards for Session 16

Session 16 moves from A/B Testing (Session 15) to Basic Forecasting: predicting the future from past data. A good forecast needs more than a good model; it also needs rigorous evaluation, appropriate metrics, and clear communication of uncertainty:

Time Series Cross-Validation: expanding/sliding window evaluation → assess model stability and avoid overfitting
Forecast Accuracy Metrics: MAE, MAPE, RMSE, MASE → pick the right metric for the context
Prediction Intervals: quantify uncertainty → support business decision-making

📋 List of related standards

#	Standard	Organization / Author	Applies to Session 16
1	Time Series Cross-Validation	Hyndman & Athanasopoulos (FPP3)	Evaluation strategy for time series, with no random split
2	Forecast Accuracy Metrics	M-Competition (Makridakis)	Standard metrics cho forecast benchmarking
3	Prediction Intervals	NIST, Chatfield (1993)	Uncertainty quantification cho forecast output

1️⃣ Time Series Cross-Validation: Evaluating the right way

Introduction

Time Series Cross-Validation (TSCV) evaluates a forecast model by creating multiple train/test splits in chronological order. The key difference from ordinary k-fold CV: NO random shuffling, so the future always comes after the past.

TSCV is popularized by Forecasting: Principles and Practice (Hyndman & Athanasopoulos). The Makridakis Competitions (M3, M4, M5), among the largest benchmarks for forecasting methods, follow the same principle by scoring every forecast on a held-out final segment of each series.

Why NOT use a random split for time series?

Aspect	Random Split (k-Fold CV)	Time Series CV
Data leakage	❌ Future data leaks into train → overfit	✅ Train always precedes test
Temporal order	❌ Breaks the time order	✅ Preserves temporal order
Real-world simulation	❌ Does not resemble production	✅ Simulates it correctly: train on the past → predict the future
Autocorrelation	❌ Ignores dependency	✅ Data stays contiguous, autocorrelation preserved

2 Strategies: Expanding Window vs Sliding Window

EXPANDING WINDOW (Anchored):
━━━━━━━━━━━━━━━━━━━━━━━━━━
Fold 1: [Train: 1-12 ] → Test: [13-15]
Fold 2: [Train: 1-15 ] → Test: [16-18]
Fold 3: [Train: 1-18 ] → Test: [19-21]
Fold 4: [Train: 1-21 ] → Test: [22-24]
Fold 5: [Train: 1-24 ] → Test: [25-27]

→ The train set GROWS with every fold → more data = potentially better model
→ Suitable when older data is still relevant

SLIDING WINDOW (Rolling):
━━━━━━━━━━━━━━━━━━━━━━━━
Fold 1: [Train: 1-12 ] → Test: [13-15]
Fold 2: [Train: 4-15 ] → Test: [16-18]
Fold 3: [Train: 7-18 ] → Test: [19-21]
Fold 4: [Train: 10-21] → Test: [22-24]
Fold 5: [Train: 13-24] → Test: [25-27]

→ The train set has a FIXED size → the oldest observations drop out as the window moves forward
→ Suitable when older data is no longer relevant (concept drift)
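
The two fold layouts above can be generated programmatically. The following is a minimal sketch, not a reference implementation: `make_folds` and its parameter names are hypothetical, and periods are numbered from 1 to match the diagrams.

```python
def make_folds(n_obs, initial_train, horizon, step, method="expanding"):
    """Return (train_start, train_end, test_start, test_end) tuples, 1-based and inclusive."""
    if method not in ("expanding", "sliding"):
        raise ValueError("method must be 'expanding' or 'sliding'")
    folds = []
    train_end = initial_train
    # Stop as soon as the test window would run past the last observation
    while train_end + horizon <= n_obs:
        if method == "expanding":
            train_start = 1  # anchored at the first observation
        else:
            train_start = train_end - initial_train + 1  # fixed-length window
        folds.append((train_start, train_end, train_end + 1, train_end + horizon))
        train_end += step
    return folds

expanding = make_folds(27, initial_train=12, horizon=3, step=3, method="expanding")
sliding = make_folds(27, initial_train=12, horizon=3, step=3, method="sliding")
for fold in sliding:
    print(fold)
```

With 27 periods, an initial train size of 12, a horizon of 3 and a step of 3, both methods yield the five folds drawn above; only the train start differs.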

Implementation in Python

python

from statsmodels.tsa.holtwinters import ExponentialSmoothing
import numpy as np
import pandas as pd

def ts_cross_validate(series, min_train_size, forecast_horizon, step=1,
                      method='expanding', seasonal_periods=12):
    """
    Time Series Cross-Validation with an additive Holt-Winters model.

    Parameters:
        series: pd.Series with DatetimeIndex
        min_train_size: minimum training observations; also the
            window length when method='sliding'
        forecast_horizon: number of periods to forecast
        step: step size between folds
        method: 'expanding' or 'sliding'
        seasonal_periods: length of one seasonal cycle

    Returns:
        pd.DataFrame with one row per fold
    """
    if method not in ('expanding', 'sliding'):
        raise ValueError("method must be 'expanding' or 'sliding'")

    results = []
    n = len(series)

    for i in range(min_train_size, n - forecast_horizon + 1, step):
        # Positional slicing keeps the train set strictly before the test set
        start = 0 if method == 'expanding' else i - min_train_size
        train = series.iloc[start:i]
        test = series.iloc[i:i + forecast_horizon]

        # Fit model on the training window only
        model = ExponentialSmoothing(
            train, trend='add', seasonal='add',
            seasonal_periods=seasonal_periods
        ).fit(optimized=True)

        forecast = np.asarray(model.forecast(len(test)))
        actual = test.to_numpy()

        # Calculate metrics (MAPE is undefined if any actual value is 0)
        mae = np.mean(np.abs(actual - forecast))
        mape = np.mean(np.abs((actual - forecast) / actual)) * 100

        results.append({
            'fold': len(results) + 1,
            'train_size': len(train),
            'test_start': test.index[0],
            'MAE': mae,
            'MAPE': mape
        })

    return pd.DataFrame(results)

# Usage
# cv_results = ts_cross_validate(ts, min_train_size=24,
#                                  forecast_horizon=6, step=3)
# print(f"Mean CV MAPE: {cv_results['MAPE'].mean():.2f}%")
# print(f"Std CV MAPE:  {cv_results['MAPE'].std():.2f}%")

Checklist: Time Series CV

✅ NEVER randomly shuffle time series data
✅ The train set always comes BEFORE the test set in time
✅ Minimum train size ≥ 2 seasonal cycles (24 months for monthly data)
✅ Forecast horizon = business-relevant horizon (6 months, 1 quarter)
✅ Report the MEAN and STD of metrics across folds (not just one fold)
✅ Compare expanding vs sliding → choose the one that fits
✅ If MAPE varies a lot between folds → the model is not stable → investigate
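
The last two reporting items in the checklist can be sketched as a small helper. This is an illustration only: `summarize_folds` is a hypothetical name, and the `max_cv` threshold on the coefficient of variation is an assumed value, not a standard one.

```python
from statistics import mean, stdev

def summarize_folds(fold_mapes, max_cv=0.5):
    """Summarize per-fold MAPE values and flag unstable models.

    max_cv is an assumed, illustrative limit on std / mean.
    """
    avg = mean(fold_mapes)
    spread = stdev(fold_mapes)  # sample standard deviation across folds
    return {"mean": avg, "std": spread, "stable": spread / avg <= max_cv}

summary = summarize_folds([8.0, 9.0, 7.0, 8.0, 8.0])
print(summary)
```

A model whose fold MAPEs jump between, say, 2% and 30% would be flagged as unstable by the same function and deserves a closer look before deployment.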

Pros & cons

Pros	Cons
✅ Simulates a real forecasting scenario	❌ Computationally expensive (many folds)
✅ Detects overfitting: a model that does well on one period but poorly on another	❌ Needs enough data (≥ 3 seasonal cycles)
✅ Reports stability (std across folds) as well as accuracy	❌ Sliding window discards old data → potential information loss
✅ Widely used practice; the Makridakis Competitions also score forecasts strictly out of sample	❌ Does not handle structural breaks well

2️⃣ Forecast Accuracy Metrics: Choosing the right metric

Introduction

The Makridakis Competitions (M-Competitions) are a series of forecasting competitions started in 1982 (M1) by Spyros Makridakis, a professor at INSEAD. M4 (2018) and M5 (2020) are among the largest benchmarks, with hundreds of teams and tens of thousands of time series (100,000 in M4). The M-Competitions standardize how forecast accuracy is evaluated and establish best practices for metrics.

Metrics table

Metric	Formula	Scale-dependent?	Symmetric?	Handles $Y_{t} = 0$?
MAE	$\frac{1}{n} \sum | e_{t} |$	✅ Yes	✅	✅
RMSE	$\sqrt{\frac{1}{n} \sum e_{t}^{2}}$	✅ Yes	✅	✅
MAPE	$\frac{100}{n} \sum \left| \frac{e_{t}}{Y_{t}} \right|$	❌ No (%)	❌ No	❌ Undefined
sMAPE	$\frac{200}{n} \sum \frac{| e_{t} |}{| Y_{t} | + | \hat{Y}_{t} |}$	❌ No (%)	⚠️ Nearly	✅ (unless $\hat{Y}_{t} = 0$ as well)
MASE	$\frac{\mathrm{MAE}}{\mathrm{MAE}_{\text{naïve}}}$	❌ No (ratio)	✅	✅

where $e_{t} = Y_{t} - \hat{Y}_{t}$ (forecast error).
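
The formulas in the table translate directly into NumPy. The sketch below is illustrative; `forecast_metrics` is a hypothetical helper name, and returning NaN for MAPE when an actual value is zero is an assumed convention.

```python
import numpy as np

def forecast_metrics(actual, forecast):
    """Return MAE, RMSE, MAPE (%) and sMAPE (%) for two equal-length sequences."""
    y = np.asarray(actual, dtype=float)
    y_hat = np.asarray(forecast, dtype=float)
    e = y - y_hat  # forecast error e_t = Y_t - Y_hat_t
    return {
        "MAE": float(np.mean(np.abs(e))),
        "RMSE": float(np.sqrt(np.mean(e ** 2))),
        # MAPE is undefined if any actual value is zero, so return NaN in that case
        "MAPE": float(100 * np.mean(np.abs(e / y))) if np.all(y != 0) else float("nan"),
        # sMAPE breaks down only when actual and forecast are both zero
        "sMAPE": float(200 * np.mean(np.abs(e) / (np.abs(y) + np.abs(y_hat)))),
    }

metrics = forecast_metrics([100, 200, 400], [110, 180, 400])
print(metrics)
```

MAE and RMSE come out in the units of the data, while MAPE and sMAPE are percentages, which is why only the latter two can be compared across series with different scales.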

When to use which metric?

Metric	Use when	Do NOT use when
MAE	Comparing models on the same dataset and scale	Comparing ACROSS datasets (different scales)
RMSE	Large errors are costly and should be penalized more	All errors matter equally
MAPE	Communicating with the business (% is easy to grasp)	$Y_{t}$ can be 0 (intermittent demand)
MASE	Comparing ACROSS datasets, benchmarking vs naïve	The dataset is too short for a naïve baseline

MAPE: popular but full of pitfalls

MAPE is the most popular metric in business forecasting because it is easy to understand: "the forecast is off by 8% on average". However:

Pitfall 1: Asymmetric

$Y_{t} = 100$, $\hat{Y}_{t} = 150$ → error = $\frac{| 100 - 150 |}{100}$ = 50%
$Y_{t} = 150$, $\hat{Y}_{t} = 100$ → error = $\frac{| 150 - 100 |}{150}$ = 33%
Same absolute error (50), but a different MAPE! MAPE penalizes over-forecasts more heavily than under-forecasts.

Pitfall 2: Undefined when $Y_{t} = 0$

A product sells 0 units on one day → MAPE = division by zero
Intermittent demand (spare parts, luxury items) → MAPE cannot be used

Pitfall 3: Explodes when $Y_{t}$ is very small

$Y_{t} = 1$, $\hat{Y}_{t} = 3$ → MAPE = 200%, which is misleading!
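
The three pitfalls can be reproduced with a few lines of plain Python. This is a small sketch for checking the arithmetic above; `ape` is a hypothetical helper that computes the absolute percentage error of a single observation.

```python
def ape(actual, forecast):
    """Absolute percentage error (%) for a single observation."""
    if actual == 0:
        raise ZeroDivisionError("APE is undefined when the actual value is 0")
    return abs(actual - forecast) / abs(actual) * 100

over = ape(100, 150)   # forecast above actual
under = ape(150, 100)  # forecast below actual, same absolute error of 50
small = ape(1, 3)      # a tiny actual value inflates the percentage

print(f"over-forecast:  {over:.1f}%")
print(f"under-forecast: {under:.1f}%")
print(f"small actual:   {small:.1f}%")
```

The same absolute error of 50 units yields two different percentages, and an error of only 2 units on an actual value of 1 already reads as 200%.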

MASE: the metric recommended by Hyndman

MASE (Mean Absolute Scaled Error) was proposed by Hyndman & Koehler (2006). In the M4 competition it was one of the two accuracy measures, alongside sMAPE, that were combined into the overall ranking score.

$$\mathrm{MASE} = \frac{\mathrm{MAE}}{\frac{1}{n - m} \sum_{t = m + 1}^{n} | Y_{t} - Y_{t - m} |}$$

where $m$ is the seasonal period and $n$ is the number of in-sample (training) observations. The denominator is the in-sample MAE of the seasonal naïve forecast.

MASE	Meaning
< 1	Model beats seasonal naïve (good!)
= 1	Model is on par with naïve (bad: the model adds nothing)
> 1	Model is worse than naïve (very bad: delete the model)

python

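import numpy as np

def mase(actual, forecast, train, seasonal_period=12):
    """Calculate MASE (Mean Absolute Scaled Error).

    actual, forecast: out-of-sample values and their forecasts
    train: in-sample series used to scale the error
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    train = np.asarray(train, dtype=float)

    mae_model = np.mean(np.abs(actual - forecast))

    # Seasonal naïve MAE (in-sample): |Y_t - Y_{t-m}|, a lag-m difference
    naive_errors = np.abs(train[seasonal_period:] - train[:-seasonal_period])
    mae_naive = np.mean(naive_errors)

    return mae_model / mae_naive

# MASE < 1 → model beats seasonal naïve
# MASE > 1 → drop the model; seasonal naïve does better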
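
To make the scaling concrete, here is a small worked example with hypothetical quarterly data (so $m = 4$). The numbers and the helper name `seasonal_naive_mae` are assumptions chosen so the arithmetic can be checked by hand.

```python
import numpy as np

def seasonal_naive_mae(train, m):
    """In-sample MAE of the seasonal naive forecast: mean |Y_t - Y_{t-m}|."""
    train = np.asarray(train, dtype=float)
    return float(np.mean(np.abs(train[m:] - train[:-m])))

# Hypothetical quarterly series (m = 4): two years of training data
train = [10, 20, 30, 40, 12, 22, 33, 44]
actual = [14, 24]    # held-out values
forecast = [13, 26]  # model forecasts for the held-out values

scale = seasonal_naive_mae(train, m=4)  # (2 + 2 + 3 + 4) / 4
mae_model = float(np.mean(np.abs(np.subtract(actual, forecast))))  # (1 + 2) / 2
mase_value = mae_model / scale

print(f"scale={scale}, MAE={mae_model}, MASE={mase_value:.3f}")
```

Here the model's MAE of 1.5 is scaled by a seasonal naïve MAE of 2.75, giving a MASE below 1, so the model beats the baseline on this toy data.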

Business Reporting: which metric to report to whom?

Audience	Metric	Reason
CEO / CFO	MAPE	Easy to understand: "Forecast is off by 8%"
DA team	MAE + RMSE + MASE	Technical comparison between models
Supply Chain	MAE (units)	"Forecast is off by ±500 units" → actionable
Academic / Competition	MASE	Scale-free, benchmark vs naïve

3️⃣ Prediction Intervals: Quantify Uncertainty

Introduction

A Prediction Interval (PI) is a range that is expected to contain the future value with a stated probability. Instead of saying "next month's revenue = 1,500 million VND", say "next month's revenue = 1,500 million VND, 95% PI = [1,300, 1,700]".

Methods for computing PIs were reviewed by Chris Chatfield (1993), and PIs are applied widely in NIST (National Institute of Standards and Technology) guidelines. In business forecasting, treat the PI as a required part of every forecast: a point forecast alone is misleading.

Point Forecast vs Prediction Interval

Aspect	Point Forecast	Prediction Interval
Output	A single number	Range [lower, upper]
Uncertainty	❌ Hidden	✅ Shown explicitly
Decision-making	⚠️ Over-confident	✅ Informed decisions
Communication	"Revenue = 1,500M"	"Revenue = 1,300M — 1,700M (95%)"
Business usage	Single planning scenario	Best/Base/Worst planning scenarios

Computing Prediction Intervals for Holt-Winters

The prediction interval WIDENS with the forecast horizon: the further into the future, the less certain the forecast:

Forecast Horizon vs Prediction Interval Width:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Month 1: ███████░░░░░░░░░░░░░░░░░░░░░░░  (narrow, confident)
Month 2: ████████░░░░░░░░░░░░░░░░░░░░░░
Month 3: █████████░░░░░░░░░░░░░░░░░░░░░
Month 4: ██████████░░░░░░░░░░░░░░░░░░░░
Month 5: ███████████░░░░░░░░░░░░░░░░░░░
Month 6: ████████████░░░░░░░░░░░░░░░░░░  (wide, uncertain)

Legend: █ = interval width (illustrative), ░ = outside range
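
The naive (random-walk) forecast has a simple closed form that shows why intervals widen: under normally distributed errors, the h-step forecast standard deviation is the residual standard deviation times the square root of h. The sketch below uses that naive formula for illustration only, not Holt-Winters; the revenue level of 1,500 and residual standard deviation of 50 are assumed values.

```python
from math import sqrt
from statistics import NormalDist

def naive_interval(last_value, resid_std, horizon, level=0.95):
    """Prediction interval for a naive (random-walk) forecast, assuming normal errors."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for level=0.95
    half_width = z * resid_std * sqrt(horizon)  # h-step std grows like sqrt(h)
    return last_value - half_width, last_value + half_width

# Assumed inputs: last observed value 1500, residual std 50
intervals = {h: naive_interval(1500, 50, h) for h in range(1, 7)}
for h, (lo, hi) in intervals.items():
    print(f"Month {h}: [{lo:.0f}, {hi:.0f}]")
```

Under these assumptions the interval at month 4 is exactly twice as wide as at month 1, because the width scales with the square root of the horizon.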

python

from statsmodels.tsa.holtwinters import ExponentialSmoothing

# `ts` is assumed to be a monthly pd.Series with a DatetimeIndex
# Fit model
model = ExponentialSmoothing(
    ts, trend='add', seasonal='add', seasonal_periods=12
).fit(optimized=True)

# Simulation-based prediction intervals
simulations = model.simulate(
    nsimulations=6,       # 6 months ahead
    anchor='end',         # start simulating right after the sample
    repetitions=1000,     # 1000 Monte Carlo paths
    error='mul',          # Multiplicative error
    random_errors='bootstrap'  # Bootstrap from residuals
)

# Calculate intervals (one row per forecast month, one column per path)
pi_80 = {
    'lower': simulations.quantile(0.10, axis=1),
    'upper': simulations.quantile(0.90, axis=1)
}
pi_95 = {
    'lower': simulations.quantile(0.025, axis=1),
    'upper': simulations.quantile(0.975, axis=1)
}

# Print the interval for the first forecast month
print("Prediction Intervals (month 1):")
print(f"  80% PI: [{pi_80['lower'].iloc[0]:.0f}, {pi_80['upper'].iloc[0]:.0f}]")
print(f"  95% PI: [{pi_95['lower'].iloc[0]:.0f}, {pi_95['upper'].iloc[0]:.0f}]")

Business applications of Prediction Intervals

Level	Business use	Example
Lower bound (P10)	Pessimistic scenario — cash flow planning	"Worst case revenue ≥ 1,300M"
Point forecast (P50)	Base case — budgeting	"Expected revenue = 1,500M"
Upper bound (P90)	Optimistic scenario — capacity planning	"Best case revenue ≤ 1,700M"
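
The three planning levels in the table can be read off a set of simulated paths as percentiles. The sketch below is illustrative: the paths are synthetic random walks standing in for real model simulations, and the starting level of 1,500 and step noise of 100 are assumed values.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical stand-in for model output: 1000 simulated revenue paths, 6 months each
paths = 1500 + rng.normal(0, 100, size=(1000, 6)).cumsum(axis=1)

# One value per forecast month for each scenario
p10, p50, p90 = np.percentile(paths, [10, 50, 90], axis=0)
scenarios = {"pessimistic": p10, "base": p50, "optimistic": p90}

for month in range(6):
    print(f"Month {month + 1}: P10={p10[month]:.0f}  P50={p50[month]:.0f}  P90={p90[month]:.0f}")
```

The gap between P10 and P90 is an 80% interval, so roughly one outcome in ten is expected to fall below the pessimistic scenario and one in ten above the optimistic one.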

Checklist: Prediction Intervals

✅ ALWAYS report a prediction interval together with the point forecast
✅ Use an 80% PI for business planning (more common than 95%)
✅ Use a 95% PI for risk scenarios and stress testing
✅ Remind stakeholders: the PI widens with the horizon; further out = less certain
✅ If the PI is too wide → the model is not good enough or the data is too noisy
✅ Use simulation-based PIs (bootstrap) for non-normal errors
✅ Validate the PI with a coverage test: a 95% PI should contain the actual value ≈ 95% of the time

PI Coverage Test

python

def pi_coverage_test(actuals, lower_bounds, upper_bounds, target_coverage=0.95):
    """Test if prediction intervals achieve target coverage."""
    covered = sum((a >= l) and (a <= u)
                  for a, l, u in zip(actuals, lower_bounds, upper_bounds))
    actual_coverage = covered / len(actuals)

    print(f"Target coverage: {target_coverage:.0%}")
    print(f"Actual coverage: {actual_coverage:.0%}")
    print(f"Status: {'✅ OK' if abs(actual_coverage - target_coverage) < 0.05 else '❌ Review needed'}")

    return actual_coverage

Summary: when to use which standard?

Standard	Apply when	Output
Time Series CV	Evaluating a model before deployment	Mean + Std MAPE across folds
Forecast Metrics	Comparing models, reporting to stakeholders	MAE, MAPE, RMSE, MASE
Prediction Intervals	Communicate forecast uncertainty	Point forecast + 80%/95% PI

Workflow integrating the 3 standards

📊 FORECASTING BEST PRACTICES
━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Step 1: BUILD - Fit multiple models (Naïve, SMA, ETS, HW)
Step 2: EVALUATE - Time Series CV (expanding window, ≥ 5 folds)
Step 3: COMPARE - MAE/MAPE/RMSE per model, pick the best
Step 4: VALIDATE - MASE < 1 (beat naïve?)
Step 5: FORECAST - Point forecast + 95% PI
Step 6: REPORT - MAPE (business), MASE (technical), PI chart
Step 7: MONITOR - Track actual vs forecast monthly
Step 8: RE-TRAIN - Update the model when new data arrives
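
Steps 1 to 3 can be sketched end to end with two baseline forecasters and an expanding-window CV. This is a minimal illustration under stated assumptions: the series is synthetic (linear trend plus a fixed 12-month pattern), and `naive`, `seasonal_naive` and `cv_mae` are hypothetical helper names.

```python
import numpy as np

def naive(train, h, m):
    """Repeat the last observed value h times."""
    return np.repeat(train[-1], h)

def seasonal_naive(train, h, m):
    """Repeat the last full season, cycling if h is longer than m."""
    return np.resize(train[-m:], h)

def cv_mae(series, forecaster, m, min_train, h, step):
    """Mean and std of MAE over expanding-window folds."""
    series = np.asarray(series, dtype=float)
    maes = []
    for end in range(min_train, len(series) - h + 1, step):
        train, test = series[:end], series[end:end + h]  # train strictly before test
        maes.append(np.mean(np.abs(test - forecaster(train, h, m))))
    return float(np.mean(maes)), float(np.std(maes))

# Synthetic monthly series: linear trend plus a fixed 12-month pattern
t = np.arange(60)
series = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12)

results = {name: cv_mae(series, f, m=12, min_train=24, h=6, step=6)
           for name, f in [("naive", naive), ("seasonal_naive", seasonal_naive)]}
best = min(results, key=lambda name: results[name][0])

for name, (avg, spread) in results.items():
    print(f"{name}: MAE mean={avg:.2f}, std={spread:.2f}")
print("best:", best)
```

On this synthetic series the seasonal naive forecaster wins because it captures the 12-month pattern; any candidate model would then need to beat it (MASE < 1) to pass Step 4.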

💡 Best Practices Summary

Evaluation: Use Time Series CV, NOT a random split
Metrics: Report MAPE to the business, MASE for technical benchmarking
Uncertainty: Forecast = point + interval. Never point alone.
Monitoring: Track forecast accuracy continuously. MAPE drift > 5pp → investigate.
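
The monitoring rule can be expressed as a one-line check. The sketch below assumes that "drift" means an increase of recent MAPE over a baseline MAPE, measured in percentage points; `mape_drift_alert` is a hypothetical helper name.

```python
def mape_drift_alert(baseline_mape, recent_mape, threshold_pp=5.0):
    """Return True when recent MAPE exceeds the baseline by more than threshold_pp points."""
    return (recent_mape - baseline_mape) > threshold_pp

# Baseline MAPE of 8% from cross-validation, followed by three monthly readings
alerts = [mape_drift_alert(8.0, recent) for recent in (9.5, 12.0, 13.5)]
print(alerts)
```

Only the last reading, 5.5 percentage points above the baseline, crosses the threshold and would trigger an investigation.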
