🛠 Workshop — Dự Báo Doanh Thu

Từ dữ liệu 3 năm monthly revenue → decompose → fit models → evaluate → forecast 6 tháng tới với prediction interval. Output: Jupyter Notebook + forecast chart + accuracy comparison!

🎯 Mục tiêu workshop

Sau khi hoàn thành workshop này, bạn sẽ:

Tạo và explore time series dataset 3 năm
Decompose thành trend + seasonality + residual
Fit 4 models — SMA, SES, Holt, Holt-Winters
Evaluate — MAE, MAPE trên test set
Forecast 6 tháng tới với prediction interval

🧰 Yêu cầu

Yêu cầu	Chi tiết
Kiến thức	Đã hoàn thành Buổi 16 lý thuyết (Basic Forecasting)
Python	pandas, numpy, statsmodels, matplotlib (đã học Buổi 7-10)
Thời gian	90–120 phút
Output	Jupyter Notebook + Forecast Chart + Accuracy Report

💡 Naming convention

Đặt tên file: HoTen_Buoi16_Forecasting.ipynb

📦 Scenario: VietFresh Monthly Revenue Forecasting

Bạn là DA tại VietFresh — chuỗi cửa hàng thực phẩm sạch, 20 stores tại TP.HCM. CEO cần dự báo revenue 6 tháng tới để lập budget và kế hoạch mở thêm stores.

Business context:

Revenue đang tăng trưởng ~18%/năm
Seasonality rõ: Tết (tháng 1-2) spike, tháng 3-4 dip, tháng 9 (Trung Thu) spike nhẹ
Monthly revenue dao động 2-5 tỷ VND
CEO muốn: point forecast + confidence interval + accuracy report

Phần 1: Tạo Dataset (15 phút)

Bước 1.1: Generate 3-Year Monthly Revenue

python

# ============================================
# PHẦN 1: TẠO DATASET — 3 NĂM MONTHLY REVENUE
# ============================================

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

np.random.seed(16)  # Reproducible results

# ---- Parameters ----
n_months = 36  # 3 năm: Jan 2023 — Dec 2025
dates = pd.date_range('2023-01-01', periods=n_months, freq='MS')

# ---- Trend: tăng trưởng 18%/năm ----
# Revenue bắt đầu ~2.5 tỷ, kết thúc ~4.2 tỷ
base_revenue = 2500  # triệu VND
growth_per_month = (4200 - 2500) / 36  # ~47 triệu/tháng
trend = base_revenue + growth_per_month * np.arange(n_months)

# ---- Seasonality: monthly factors ----
# Tết (Jan-Feb) spike, Mar-Apr dip, Sep Trung Thu bump
seasonal_factors = [
    1.30,  # Jan — trước Tết, mua sắm nhiều
    1.35,  # Feb — Tết Nguyên Đán peak
    0.80,  # Mar — sau Tết, spending giảm
    0.85,  # Apr — recovery
    0.95,  # May — bình thường
    1.00,  # Jun — bình thường
    0.95,  # Jul — nghỉ hè, gia đình mua thêm
    1.05,  # Aug — back to school
    1.15,  # Sep — Trung Thu + bắt đầu Q4
    1.05,  # Oct — bình thường
    1.00,  # Nov — bình thường
    1.10,  # Dec — Giáng sinh + year-end
]
seasonal = np.tile(seasonal_factors, 3)  # 3 năm

# ---- Noise: random variation ----
noise = np.random.normal(0, 80, n_months)  # std = 80 triệu VND

# ---- Combine: Multiplicative ----
revenue = trend * seasonal + noise

# ---- Create DataFrame ----
df = pd.DataFrame({
    'date': dates,
    'revenue': revenue
}).set_index('date')

df['revenue'] = df['revenue'].round(0)

print("📊 VIETFRESH MONTHLY REVENUE (2023-2025)")
print("=" * 55)
print(f"  Period:     {df.index[0].strftime('%b %Y')} — {df.index[-1].strftime('%b %Y')}")
print(f"  Months:     {len(df)}")
print(f"  Min:        {df['revenue'].min():,.0f} triệu VND")
print(f"  Max:        {df['revenue'].max():,.0f} triệu VND")
print(f"  Mean:       {df['revenue'].mean():,.0f} triệu VND")
print(f"  Std:        {df['revenue'].std():,.0f} triệu VND")
print()
print(df.to_string())

Bước 1.2: Visualize Time Series

python

# ============================================
# VISUALIZE — Line Plot + Seasonal Markers
# ============================================

fig, axes = plt.subplots(2, 1, figsize=(14, 10))

# --- Plot 1: Full time series ---
ax1 = axes[0]
ax1.plot(df.index, df['revenue'], 'b-o', markersize=4, linewidth=1.5, label='Monthly Revenue')
ax1.fill_between(df.index, df['revenue'], alpha=0.1, color='blue')
ax1.set_title('VietFresh Monthly Revenue (2023–2025)', fontsize=14, fontweight='bold')
ax1.set_ylabel('Revenue (triệu VND)')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Highlight Tết months
for year in [2023, 2024, 2025]:
    for month in [1, 2]:
        tet_date = pd.Timestamp(f'{year}-{month:02d}-01')
        if tet_date in df.index:
            ax1.axvline(x=tet_date, color='red', alpha=0.2, linestyle='--')
ax1.annotate('Tết', xy=(pd.Timestamp('2024-02-01'), df['revenue'].max()),
             fontsize=10, color='red', alpha=0.7)

# --- Plot 2: Box plot by month ---
ax2 = axes[1]
df_monthly = df.copy()
df_monthly['month'] = df_monthly.index.month
df_monthly.boxplot(column='revenue', by='month', ax=ax2)
ax2.set_title('Revenue Distribution by Month (Seasonal Pattern)', fontsize=14, fontweight='bold')
ax2.set_xlabel('Month')
ax2.set_ylabel('Revenue (triệu VND)')
plt.suptitle('')  # Remove auto title

plt.tight_layout()
plt.show()

Phần 2: Decomposition (20 phút)

Bước 2.1: Decompose — Additive vs Multiplicative

python

# ============================================
# PHẦN 2: DECOMPOSITION
# ============================================

from statsmodels.tsa.seasonal import seasonal_decompose

ts = df['revenue']

# --- Additive Decomposition ---
decomp_add = seasonal_decompose(ts, model='additive', period=12)

# --- Multiplicative Decomposition ---
decomp_mul = seasonal_decompose(ts, model='multiplicative', period=12)

# --- Plot both ---
fig, axes = plt.subplots(4, 2, figsize=(16, 14))

# Additive (left)
decomp_add.observed.plot(ax=axes[0, 0], title='Observed (Additive)')
decomp_add.trend.plot(ax=axes[1, 0], title='Trend')
decomp_add.seasonal.plot(ax=axes[2, 0], title='Seasonality')
decomp_add.resid.plot(ax=axes[3, 0], title='Residual')

# Multiplicative (right)
decomp_mul.observed.plot(ax=axes[0, 1], title='Observed (Multiplicative)')
decomp_mul.trend.plot(ax=axes[1, 1], title='Trend')
decomp_mul.seasonal.plot(ax=axes[2, 1], title='Seasonality')
decomp_mul.resid.plot(ax=axes[3, 1], title='Residual')

for ax_row in axes:
    for ax in ax_row:
        ax.grid(True, alpha=0.3)

plt.suptitle('Additive vs Multiplicative Decomposition', fontsize=16, fontweight='bold', y=1.01)
plt.tight_layout()
plt.show()

# --- Seasonal Factors (Multiplicative) ---
print("\n📊 SEASONAL FACTORS (Multiplicative)")
print("=" * 40)
seasonal_vals = decomp_mul.seasonal[:12]
for i, (date, val) in enumerate(seasonal_vals.items()):
    month_name = date.strftime('%b')
    bar = '█' * int(val * 20)
    print(f"  {month_name}: {val:.3f} {bar}")

Bước 2.2: Stationarity Test

python

# ============================================
# STATIONARITY TEST — ADF
# ============================================

from statsmodels.tsa.stattools import adfuller

def adf_test(series, name=""):
    """Perform Augmented Dickey-Fuller test."""
    result = adfuller(series.dropna())
    print(f"📏 ADF Test — {name}")
    print(f"   ADF Statistic:  {result[0]:.4f}")
    print(f"   p-value:        {result[1]:.4f}")
    print(f"   Lags used:      {result[2]}")
    print(f"   Observations:   {result[3]}")
    print(f"   → {'✅ Stationary' if result[1] < 0.05 else '❌ Non-Stationary'} (at 5% significance)")
    print()

# Test original series
adf_test(ts, "Original Revenue")

# Test after differencing
adf_test(ts.diff().dropna(), "First Difference")

# Test after seasonal differencing
adf_test(ts.diff(12).dropna(), "Seasonal Difference (lag=12)")

Bước 2.3: ACF/PACF

python

# ============================================
# ACF & PACF
# ============================================

from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

fig, axes = plt.subplots(1, 2, figsize=(14, 4))

plot_acf(ts, lags=24, ax=axes[0], title='ACF — Autocorrelation Function')
plot_pacf(ts, lags=24, ax=axes[1], title='PACF — Partial Autocorrelation Function')

plt.tight_layout()
plt.show()

print("📊 ACF Interpretation:")
print("   → Significant lags ở 1, 12, 24 = seasonal autocorrelation")
print("   → Slow decay = trend (non-stationary)")

Phần 3: Fit Forecasting Models (25 phút)

Bước 3.1: Train/Test Split (Temporal!)

python

# ============================================
# PHẦN 3: FIT MODELS
# ============================================

# Train/Test split — TEMPORAL, không random!
train = ts[:'2025-06']  # 30 months (Jan 2023 — Jun 2025)
test = ts['2025-07':]    # 6 months (Jul 2025 — Dec 2025)

print("📊 TRAIN/TEST SPLIT")
print("=" * 55)
print(f"  Train: {train.index[0].strftime('%b %Y')} — {train.index[-1].strftime('%b %Y')} ({len(train)} months)")
print(f"  Test:  {test.index[0].strftime('%b %Y')} — {test.index[-1].strftime('%b %Y')} ({len(test)} months)")
print(f"  Split: {len(train)}/{len(test)} = {len(train)/len(ts):.0%}/{len(test)/len(ts):.0%}")

# Visualize split
fig, ax = plt.subplots(figsize=(14, 5))
train.plot(ax=ax, label='Train', color='blue', linewidth=2)
test.plot(ax=ax, label='Test', color='orange', linewidth=2, linestyle='--')
ax.axvline(x=test.index[0], color='red', linestyle=':', alpha=0.7, label='Split Point')
ax.set_title('Train/Test Split — Temporal')
ax.set_ylabel('Revenue (triệu VND)')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Bước 3.2: Model 1 — Naïve & Seasonal Naïve

python

# ============================================
# MODEL 1: NAÏVE & SEASONAL NAÏVE
# ============================================

# Naïve: forecast = last observed value
naive_forecast = pd.Series([train.iloc[-1]] * len(test), index=test.index, name='Naïve')

# Seasonal Naïve: forecast = same month last year
snaive_vals = train.iloc[-12:-12+len(test)].values  # Jul-Dec 2024
# Nếu không đủ, dùng available recent months
if len(snaive_vals) < len(test):
    snaive_vals = train.tail(12).iloc[-len(test):].values
snaive_forecast = pd.Series(snaive_vals, index=test.index, name='Seasonal Naïve')

print("📊 NAÏVE FORECASTS")
print("=" * 55)
for date, n_val, sn_val in zip(test.index, naive_forecast, snaive_forecast):
    print(f"  {date.strftime('%b %Y')}: Naïve={n_val:.0f} | S.Naïve={sn_val:.0f}")

Bước 3.3: Model 2 — Simple Moving Average (SMA)

python

# ============================================
# MODEL 2: SIMPLE MOVING AVERAGE
# ============================================

# SMA(3), SMA(6), SMA(12) — forecast = last SMA value (constant)
sma_forecasts = {}

for k in [3, 6, 12]:
    sma_val = train.iloc[-k:].mean()
    sma_forecasts[f'SMA({k})'] = pd.Series([sma_val] * len(test),
                                             index=test.index, name=f'SMA({k})')
    print(f"  SMA({k:2d}): {sma_val:.0f} triệu VND")

# Best SMA cho so sánh sau
sma_best = sma_forecasts['SMA(12)']

Bước 3.4: Model 3 — Simple Exponential Smoothing (SES)

python

# ============================================
# MODEL 3: SIMPLE EXPONENTIAL SMOOTHING (SES)
# ============================================

from statsmodels.tsa.holtwinters import SimpleExpSmoothing

# Fit SES — optimized α
ses_model = SimpleExpSmoothing(train).fit(optimized=True)
ses_forecast = ses_model.forecast(steps=len(test))
ses_forecast.name = 'SES'

print(f"📊 SIMPLE EXPONENTIAL SMOOTHING")
print(f"   Optimal α (smoothing_level) = {ses_model.params['smoothing_level']:.4f}")
print(f"   AIC = {ses_model.aic:.2f}")
print()
for date, val in ses_forecast.items():
    print(f"  {date.strftime('%b %Y')}: {val:.0f} triệu VND")

# Note: SES produces FLAT forecast — all values equal!
print(f"\n  ⚠️ SES forecast is FLAT: {ses_forecast.iloc[0]:.0f} for all months")
print(f"     → SES không capture trend hay seasonality")

Bước 3.5: Model 4 — Holt-Winters

python

# ============================================
# MODEL 4: HOLT-WINTERS (Triple Exponential Smoothing)
# ============================================

from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Holt-Winters — Additive trend + Additive seasonality
hw_add = ExponentialSmoothing(
    train,
    trend='add',
    seasonal='add',
    seasonal_periods=12,
    initialization_method='estimated'
).fit(optimized=True)

hw_add_forecast = hw_add.forecast(steps=len(test))
hw_add_forecast.name = 'HW Additive'

# Holt-Winters — Additive trend + Multiplicative seasonality
hw_mul = ExponentialSmoothing(
    train,
    trend='add',
    seasonal='mul',
    seasonal_periods=12,
    initialization_method='estimated'
).fit(optimized=True)

hw_mul_forecast = hw_mul.forecast(steps=len(test))
hw_mul_forecast.name = 'HW Multiplicative'

print("📊 HOLT-WINTERS MODELS")
print("=" * 55)
print(f"\n  HW Additive:")
print(f"    α (level)    = {hw_add.params['smoothing_level']:.4f}")
print(f"    β (trend)    = {hw_add.params['smoothing_trend']:.4f}")
print(f"    γ (seasonal) = {hw_add.params['smoothing_seasonal']:.4f}")
print(f"    AIC          = {hw_add.aic:.2f}")

print(f"\n  HW Multiplicative:")
print(f"    α (level)    = {hw_mul.params['smoothing_level']:.4f}")
print(f"    β (trend)    = {hw_mul.params['smoothing_trend']:.4f}")
print(f"    γ (seasonal) = {hw_mul.params['smoothing_seasonal']:.4f}")
print(f"    AIC          = {hw_mul.aic:.2f}")

print(f"\n  Forecast Comparison:")
print(f"  {'Month':<10} {'HW Add':>10} {'HW Mul':>10} {'Actual':>10}")
print(f"  {'-'*40}")
for date in test.index:
    print(f"  {date.strftime('%b %Y'):<10} {hw_add_forecast[date]:>10,.0f} {hw_mul_forecast[date]:>10,.0f} {test[date]:>10,.0f}")

Phần 4: Evaluate Models (20 phút)

Bước 4.1: Accuracy Metrics

python

# ============================================
# PHẦN 4: EVALUATE MODELS
# ============================================

from sklearn.metrics import mean_absolute_error, mean_squared_error

def evaluate_forecast(actual, forecast, model_name=""):
    """Calculate MAE, MAPE, RMSE for a forecast."""
    mae = mean_absolute_error(actual, forecast)
    mape = np.mean(np.abs((actual - forecast) / actual)) * 100
    rmse = np.sqrt(mean_squared_error(actual, forecast))

    return {
        'Model': model_name,
        'MAE': round(mae, 1),
        'MAPE (%)': round(mape, 2),
        'RMSE': round(rmse, 1)
    }

# Evaluate all models
results = []
models = {
    'Naïve': naive_forecast,
    'Seasonal Naïve': snaive_forecast,
    'SMA(12)': sma_best,
    'SES': ses_forecast,
    'HW Additive': hw_add_forecast,
    'HW Multiplicative': hw_mul_forecast
}

for name, forecast in models.items():
    r = evaluate_forecast(test, forecast, name)
    results.append(r)

# Create comparison table
results_df = pd.DataFrame(results).set_index('Model')
results_df = results_df.sort_values('MAPE (%)')

print("📊 FORECAST ACCURACY COMPARISON")
print("=" * 60)
print(results_df.to_string())
print()
print(f"🏆 Best Model: {results_df.index[0]} (MAPE = {results_df['MAPE (%)'].iloc[0]:.2f}%)")

# MAPE interpretation
best_mape = results_df['MAPE (%)'].iloc[0]
if best_mape < 10:
    quality = "🟢 EXCELLENT — Highly accurate"
elif best_mape < 20:
    quality = "🟡 GOOD — Acceptable for business"
elif best_mape < 30:
    quality = "🟠 FAIR — Needs improvement"
else:
    quality = "🔴 POOR — Forecast unreliable"
print(f"   Quality: {quality}")

Bước 4.2: Visual Comparison — Forecast vs Actual

python

# ============================================
# VISUAL COMPARISON
# ============================================

fig, axes = plt.subplots(2, 2, figsize=(16, 10))

# Plot 1: All models vs Actual
ax = axes[0, 0]
test.plot(ax=ax, label='Actual', color='black', linewidth=3, marker='o')
for name, forecast in models.items():
    forecast.plot(ax=ax, label=name, linestyle='--', alpha=0.7)
ax.set_title('All Models vs Actual')
ax.set_ylabel('Revenue (triệu VND)')
ax.legend(fontsize=8)
ax.grid(True, alpha=0.3)

# Plot 2: Best model detail
ax = axes[0, 1]
best_name = results_df.index[0]
best_forecast = models[best_name]
train[-6:].plot(ax=ax, label='Train (last 6m)', color='blue', linewidth=2, marker='o')
test.plot(ax=ax, label='Actual', color='black', linewidth=3, marker='o')
best_forecast.plot(ax=ax, label=f'{best_name} Forecast', color='red', linewidth=2,
                    marker='s', linestyle='--')
ax.set_title(f'Best Model: {best_name}')
ax.set_ylabel('Revenue (triệu VND)')
ax.legend()
ax.grid(True, alpha=0.3)

# Plot 3: Error by model (bar chart)
ax = axes[1, 0]
results_df['MAPE (%)'].plot(kind='barh', ax=ax, color=['green' if x < 10 else 'orange' if x < 20 else 'red'
                                                         for x in results_df['MAPE (%)']])
ax.set_title('MAPE by Model (%)')
ax.set_xlabel('MAPE (%)')
ax.axvline(x=10, color='green', linestyle='--', alpha=0.5, label='Good threshold (10%)')
ax.legend()

# Plot 4: Residuals of best model
ax = axes[1, 1]
residuals = test - best_forecast
ax.bar(range(len(residuals)), residuals.values, color=['green' if r > 0 else 'red' for r in residuals])
ax.axhline(y=0, color='black', linestyle='-')
ax.set_title(f'Residuals — {best_name}')
ax.set_ylabel('Error (triệu VND)')
ax.set_xticks(range(len(test)))
ax.set_xticklabels([d.strftime('%b') for d in test.index])

plt.suptitle('FORECAST EVALUATION DASHBOARD', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

Bước 4.3: Residual Analysis

python

# ============================================
# RESIDUAL ANALYSIS — Best Model
# ============================================

best_model_obj = hw_mul if best_name == 'HW Multiplicative' else hw_add
residuals_full = best_model_obj.resid

fig, axes = plt.subplots(2, 2, figsize=(12, 8))

# 1. Residuals over time
axes[0, 0].plot(residuals_full, marker='o', markersize=3)
axes[0, 0].axhline(y=0, color='r', linestyle='--')
axes[0, 0].set_title('Residuals over Time')
axes[0, 0].grid(True, alpha=0.3)

# 2. Residual histogram
axes[0, 1].hist(residuals_full.dropna(), bins=12, edgecolor='black', alpha=0.7)
axes[0, 1].axvline(x=0, color='r', linestyle='--')
axes[0, 1].set_title('Residual Distribution')

# 3. ACF of residuals
from statsmodels.graphics.tsaplots import plot_acf
plot_acf(residuals_full.dropna(), lags=18, ax=axes[1, 0], title='ACF of Residuals')

# 4. QQ plot
from scipy import stats
stats.probplot(residuals_full.dropna(), plot=axes[1, 1])
axes[1, 1].set_title('Q-Q Plot (Normality Check)')

plt.suptitle(f'Residual Analysis — {best_name}', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print(f"📊 RESIDUAL STATISTICS — {best_name}")
print(f"   Mean:     {residuals_full.mean():.2f} (should be ≈ 0)")
print(f"   Std:      {residuals_full.std():.2f}")
print(f"   Min:      {residuals_full.min():.2f}")
print(f"   Max:      {residuals_full.max():.2f}")
print(f"   Skewness: {residuals_full.skew():.2f} (should be ≈ 0)")

Phần 5: Forecast 6 Tháng Tới + Prediction Interval (15 phút)

Bước 5.1: Final Forecast

python

# ============================================
# PHẦN 5: FORECAST 6 THÁNG — JAN-JUN 2026
# ============================================

# Refit best model on ALL data (train + test)
final_model = ExponentialSmoothing(
    ts,  # Full 36 months
    trend='add',
    seasonal='mul',  # hoặc 'add' tùy best model
    seasonal_periods=12,
    initialization_method='estimated'
).fit(optimized=True)

# Forecast 6 months
forecast_horizon = 6
future_forecast = final_model.forecast(steps=forecast_horizon)

print("🔮 REVENUE FORECAST — JAN-JUN 2026")
print("=" * 55)
print(f"  Model: {best_name}")
print(f"  Training data: Jan 2023 — Dec 2025 ({len(ts)} months)")
print(f"  MAPE on holdout: {results_df['MAPE (%)'].iloc[0]:.2f}%")
print()
print(f"  {'Month':<12} {'Forecast':>15}")
print(f"  {'-'*27}")
for date, val in future_forecast.items():
    print(f"  {date.strftime('%b %Y'):<12} {val:>12,.0f} tr. VND")
print(f"  {'-'*27}")
print(f"  {'Total':<12} {future_forecast.sum():>12,.0f} tr. VND")

Bước 5.2: Prediction Intervals

python

# ============================================
# PREDICTION INTERVALS — Bootstrap Simulation
# ============================================

# Simulate 1000 forecast paths
n_simulations = 1000
simulations = final_model.simulate(
    nsimulations=forecast_horizon,
    repetitions=n_simulations,
    error='mul',
    random_errors='bootstrap'
)

# Calculate prediction intervals
pi_80_lower = simulations.quantile(0.10, axis=1)
pi_80_upper = simulations.quantile(0.90, axis=1)
pi_95_lower = simulations.quantile(0.025, axis=1)
pi_95_upper = simulations.quantile(0.975, axis=1)

print("🔮 FORECAST WITH PREDICTION INTERVALS")
print("=" * 75)
print(f"  {'Month':<10} {'Point':>10} {'80% PI':>22} {'95% PI':>22}")
print(f"  {'-'*65}")
for i, date in enumerate(future_forecast.index):
    pt = future_forecast.iloc[i]
    l80, u80 = pi_80_lower.iloc[i], pi_80_upper.iloc[i]
    l95, u95 = pi_95_lower.iloc[i], pi_95_upper.iloc[i]
    print(f"  {date.strftime('%b %Y'):<10} {pt:>10,.0f} [{l80:>8,.0f} — {u80:>8,.0f}] [{l95:>8,.0f} — {u95:>8,.0f}]")

print(f"\n  ⚠️ PI rộng dần theo horizon → uncertainty tăng")
print(f"  📏 PI width Month 1: ±{(pi_95_upper.iloc[0] - pi_95_lower.iloc[0])/2:,.0f} | Month 6: ±{(pi_95_upper.iloc[-1] - pi_95_lower.iloc[-1])/2:,.0f}")

Bước 5.3: Forecast Chart

python

# ============================================
# FINAL FORECAST CHART
# ============================================

fig, ax = plt.subplots(figsize=(16, 7))

# Historical data
ts.plot(ax=ax, label='Historical Revenue', color='blue', linewidth=2, marker='o', markersize=4)

# Point forecast
future_forecast.plot(ax=ax, label='Forecast', color='red', linewidth=2.5,
                      marker='s', markersize=6, linestyle='--')

# Prediction intervals
ax.fill_between(future_forecast.index, pi_95_lower.values, pi_95_upper.values,
                 alpha=0.1, color='red', label='95% Prediction Interval')
ax.fill_between(future_forecast.index, pi_80_lower.values, pi_80_upper.values,
                 alpha=0.2, color='red', label='80% Prediction Interval')

# Formatting
ax.set_title('VietFresh Revenue Forecast — Jan to Jun 2026', fontsize=16, fontweight='bold')
ax.set_ylabel('Revenue (triệu VND)', fontsize=12)
ax.set_xlabel('', fontsize=12)
ax.legend(loc='upper left', fontsize=10)
ax.grid(True, alpha=0.3)

# Annotate
ax.annotate(f'MAPE: {results_df["MAPE (%)"].iloc[0]:.1f}%\nModel: {best_name}',
            xy=(0.02, 0.95), xycoords='axes fraction',
            fontsize=10, verticalalignment='top',
            bbox=dict(boxstyle='round', facecolor='lightyellow', alpha=0.8))

plt.tight_layout()
plt.savefig('forecast_chart.png', dpi=150, bbox_inches='tight')
plt.show()
print("  💾 Chart saved: forecast_chart.png")

Phần 6: Forecast Report (10 phút)

Bước 6.1: Final Report cho CEO

python

# ============================================
# PHẦN 6: FORECAST REPORT
# ============================================

report = f"""
{'='*65}
📋 VIETFRESH REVENUE FORECAST REPORT
{'='*65}

PREPARED BY: [Your Name], Data Analyst
DATE:        {pd.Timestamp.now().strftime('%Y-%m-%d')}
MODEL:       {best_name}

━━━ DATA SUMMARY ━━━
  Historical period: Jan 2023 — Dec 2025 (36 months)
  Revenue range:     {ts.min():,.0f} — {ts.max():,.0f} triệu VND
  Average monthly:   {ts.mean():,.0f} triệu VND
  Growth rate:       ~18%/year

━━━ MODEL PERFORMANCE ━━━
  Test period:       Jul — Dec 2025 (6 months)
  MAE:               {results_df['MAE'].iloc[0]:,.1f} triệu VND
  MAPE:              {results_df['MAPE (%)'].iloc[0]:.2f}%
  RMSE:              {results_df['RMSE'].iloc[0]:,.1f} triệu VND
  Quality:           {quality}

━━━ MODELS COMPARED ━━━
{results_df.to_string()}

━━━ FORECAST: JAN — JUN 2026 ━━━
"""

for date, val in future_forecast.items():
    i = list(future_forecast.index).index(date)
    l95, u95 = pi_95_lower.iloc[i], pi_95_upper.iloc[i]
    report += f"  {date.strftime('%b %Y')}: {val:>8,.0f} triệu VND  (95% PI: {l95:>8,.0f} — {u95:>8,.0f})\n"

report += f"""
  Total H1/2026:     {future_forecast.sum():,.0f} triệu VND
  95% PI Total:      {pi_95_lower.sum():,.0f} — {pi_95_upper.sum():,.0f}

━━━ ASSUMPTIONS ━━━
  1. Không có sự kiện bất thường (promotion lớn, đóng cửa store)
  2. Trend tiếp tục tương tự 3 năm qua (~18%/year)
  3. Seasonal pattern ổn định (Tết, Trung Thu)
  4. Không có đối thủ mới hoặc market disruption

━━━ RECOMMENDATIONS ━━━
  📌 Sử dụng point forecast cho budgeting cơ bản
  📌 Sử dụng lower bound (pessimistic) cho cash flow planning
  📌 Sử dụng upper bound (optimistic) cho capacity/staffing
  📌 Re-forecast hàng tháng khi có actual data mới
  📌 Monitor: nếu actual nằm ngoài 95% PI → investigate

{'='*65}
"""

print(report)

✅ Deliverables Checklist

📦 WORKSHOP DELIVERABLES
━━━━━━━━━━━━━━━━━━━━━━━━

✅ Jupyter Notebook: HoTen_Buoi16_Forecasting.ipynb
    ├── Phần 1: Dataset generation + visualization
    ├── Phần 2: Decomposition (additive + multiplicative) + ADF test
    ├── Phần 3: 4 models fitted (Naïve, SMA, SES, Holt-Winters)
    ├── Phần 4: Accuracy comparison (MAE, MAPE, RMSE) + residual analysis
    ├── Phần 5: 6-month forecast + prediction intervals + chart
    └── Phần 6: Forecast report for CEO

✅ Forecast Chart: forecast_chart.png
    ├── Historical data (blue line)
    ├── Point forecast (red dashed)
    ├── 80% prediction interval (darker shade)
    └── 95% prediction interval (lighter shade)

✅ Accuracy Comparison Table:
    └── All 6 models ranked by MAPE

✅ Key Findings (3 bullet points):
    1. Best model + MAPE
    2. Seasonal pattern observation (which months peak/dip)
    3. Forecast range for H1/2026 (point + interval)

💡 Extensions (Nếu còn thời gian)

Extension 1: Time Series Cross-Validation

python

# Time Series CV — Expanding Window
from statsmodels.tsa.holtwinters import ExponentialSmoothing

cv_results = []
min_train = 24  # 2 years minimum

for i in range(min_train, len(ts) - 3, 3):  # step = 3 months
    train_cv = ts[:i]
    test_cv = ts[i:i+3]

    model_cv = ExponentialSmoothing(
        train_cv, trend='add', seasonal='mul', seasonal_periods=12
    ).fit(optimized=True)

    forecast_cv = model_cv.forecast(len(test_cv))
    mape_cv = np.mean(np.abs((test_cv - forecast_cv) / test_cv)) * 100

    cv_results.append({
        'fold': len(cv_results) + 1,
        'train_months': len(train_cv),
        'test_start': test_cv.index[0].strftime('%b %Y'),
        'MAPE': round(mape_cv, 2)
    })

cv_df = pd.DataFrame(cv_results)
print("📊 TIME SERIES CROSS-VALIDATION")
print(cv_df.to_string(index=False))
print(f"\nMean MAPE: {cv_df['MAPE'].mean():.2f}%")
print(f"Std MAPE:  {cv_df['MAPE'].std():.2f}%")

Extension 2: MASE Calculation

python

# MASE — Mean Absolute Scaled Error
def calculate_mase(actual, forecast, train_series, seasonal_period=12):
    """MASE: compared against seasonal naïve baseline."""
    mae_model = np.mean(np.abs(actual - forecast))
    naive_errors = np.abs(train_series.values[seasonal_period:] - train_series.values[:-seasonal_period])
    mae_naive = np.mean(naive_errors)
    return mae_model / mae_naive

for name, fcast in models.items():
    m = calculate_mase(test, fcast, train, seasonal_period=12)
    status = "✅ Better than naïve" if m < 1 else "❌ Worse than naïve"
    print(f"  {name:<20} MASE = {m:.3f}  {status}")

🛠 Workshop — Dự Báo Doanh Thu ​

🎯 Mục tiêu workshop ​

🧰 Yêu cầu ​

📦 Scenario: VietFresh Monthly Revenue Forecasting ​

Phần 1: Tạo Dataset (15 phút) ​

Bước 1.1: Generate 3-Year Monthly Revenue ​

Bước 1.2: Visualize Time Series ​

Phần 2: Decomposition (20 phút) ​

Bước 2.1: Decompose — Additive vs Multiplicative ​

Bước 2.2: Stationarity Test ​

Bước 2.3: ACF/PACF ​

Phần 3: Fit Forecasting Models (25 phút) ​

Bước 3.1: Train/Test Split (Temporal!) ​

Bước 3.2: Model 1 — Naïve & Seasonal Naïve ​

Bước 3.3: Model 2 — Simple Moving Average (SMA) ​

Bước 3.4: Model 3 — Simple Exponential Smoothing (SES) ​

Bước 3.5: Model 4 — Holt-Winters ​

Phần 4: Evaluate Models (20 phút) ​

Bước 4.1: Accuracy Metrics ​

Bước 4.2: Visual Comparison — Forecast vs Actual ​

Bước 4.3: Residual Analysis ​

Phần 5: Forecast 6 Tháng Tới + Prediction Interval (15 phút) ​

Bước 5.1: Final Forecast ​

Bước 5.2: Prediction Intervals ​

Bước 5.3: Forecast Chart ​

Phần 6: Forecast Report (10 phút) ​

Bước 6.1: Final Report cho CEO ​

✅ Deliverables Checklist ​

💡 Extensions (Nếu còn thời gian) ​

Extension 1: Time Series Cross-Validation ​

Extension 2: MASE Calculation ​

🛠 Workshop — Dự Báo Doanh Thu

🎯 Mục tiêu workshop

🧰 Yêu cầu

📦 Scenario: VietFresh Monthly Revenue Forecasting

Phần 1: Tạo Dataset (15 phút)

Bước 1.1: Generate 3-Year Monthly Revenue

Bước 1.2: Visualize Time Series

Phần 2: Decomposition (20 phút)

Bước 2.1: Decompose — Additive vs Multiplicative

Bước 2.2: Stationarity Test

Bước 2.3: ACF/PACF

Phần 3: Fit Forecasting Models (25 phút)

Bước 3.1: Train/Test Split (Temporal!)

Bước 3.2: Model 1 — Naïve & Seasonal Naïve

Bước 3.3: Model 2 — Simple Moving Average (SMA)

Bước 3.4: Model 3 — Simple Exponential Smoothing (SES)

Bước 3.5: Model 4 — Holt-Winters

Phần 4: Evaluate Models (20 phút)

Bước 4.1: Accuracy Metrics

Bước 4.2: Visual Comparison — Forecast vs Actual

Bước 4.3: Residual Analysis

Phần 5: Forecast 6 Tháng Tới + Prediction Interval (15 phút)

Bước 5.1: Final Forecast

Bước 5.2: Prediction Intervals

Bước 5.3: Forecast Chart

Phần 6: Forecast Report (10 phút)

Bước 6.1: Final Report cho CEO

✅ Deliverables Checklist

💡 Extensions (Nếu còn thời gian)

Extension 1: Time Series Cross-Validation

Extension 2: MASE Calculation