Appearance
🛠 Workshop — Dự Báo Doanh Thu
Từ dữ liệu 3 năm monthly revenue → decompose → fit models → evaluate → forecast 6 tháng tới với prediction interval. Output: Jupyter Notebook + forecast chart + accuracy comparison!
🎯 Mục tiêu workshop
Sau khi hoàn thành workshop này, bạn sẽ:
- Tạo và explore time series dataset 3 năm
- Decompose thành trend + seasonality + residual
- Fit 4 models — SMA, SES, Holt, Holt-Winters
- Evaluate — MAE, MAPE trên test set
- Forecast 6 tháng tới với prediction interval
🧰 Yêu cầu
| Yêu cầu | Chi tiết |
|---|---|
| Kiến thức | Đã hoàn thành Buổi 16 lý thuyết (Basic Forecasting) |
| Python | pandas, numpy, statsmodels, matplotlib (đã học Buổi 7-10) |
| Thời gian | 90–120 phút |
| Output | Jupyter Notebook + Forecast Chart + Accuracy Report |
💡 Naming convention
Đặt tên file: HoTen_Buoi16_Forecasting.ipynb
📦 Scenario: VietFresh Monthly Revenue Forecasting
Bạn là DA tại VietFresh — chuỗi cửa hàng thực phẩm sạch, 20 stores tại TP.HCM. CEO cần dự báo revenue 6 tháng tới để lập budget và kế hoạch mở thêm stores.
Business context:
- Revenue đang tăng trưởng ~18%/năm
- Seasonality rõ: Tết (tháng 1-2) spike, tháng 3-4 dip, tháng 9 (Trung Thu) spike nhẹ
- Monthly revenue dao động 2-5 tỷ VND
- CEO muốn: point forecast + confidence interval + accuracy report
Phần 1: Tạo Dataset (15 phút)
Bước 1.1: Generate 3-Year Monthly Revenue
python
# ============================================
# PHẦN 1: TẠO DATASET — 3 NĂM MONTHLY REVENUE
# ============================================
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
np.random.seed(16) # Reproducible results
# ---- Parameters ----
n_months = 36 # 3 năm: Jan 2023 — Dec 2025
dates = pd.date_range('2023-01-01', periods=n_months, freq='MS')
# ---- Trend: tăng trưởng 18%/năm ----
# Revenue bắt đầu ~2.5 tỷ, kết thúc ~4.2 tỷ
base_revenue = 2500 # triệu VND
growth_per_month = (4200 - 2500) / 36 # ~47 triệu/tháng
trend = base_revenue + growth_per_month * np.arange(n_months)
# ---- Seasonality: monthly factors ----
# Tết (Jan-Feb) spike, Mar-Apr dip, Sep Trung Thu bump
seasonal_factors = [
1.30, # Jan — trước Tết, mua sắm nhiều
1.35, # Feb — Tết Nguyên Đán peak
0.80, # Mar — sau Tết, spending giảm
0.85, # Apr — recovery
0.95, # May — bình thường
1.00, # Jun — bình thường
0.95, # Jul — nghỉ hè, gia đình mua thêm
1.05, # Aug — back to school
1.15, # Sep — Trung Thu + bắt đầu Q4
1.05, # Oct — bình thường
1.00, # Nov — bình thường
1.10, # Dec — Giáng sinh + year-end
]
seasonal = np.tile(seasonal_factors, 3) # 3 năm
# ---- Noise: random variation ----
noise = np.random.normal(0, 80, n_months) # std = 80 triệu VND
# ---- Combine: Multiplicative ----
revenue = trend * seasonal + noise
# ---- Create DataFrame ----
df = pd.DataFrame({
'date': dates,
'revenue': revenue
}).set_index('date')
df['revenue'] = df['revenue'].round(0)
print("📊 VIETFRESH MONTHLY REVENUE (2023-2025)")
print("=" * 55)
print(f" Period: {df.index[0].strftime('%b %Y')} — {df.index[-1].strftime('%b %Y')}")
print(f" Months: {len(df)}")
print(f" Min: {df['revenue'].min():,.0f} triệu VND")
print(f" Max: {df['revenue'].max():,.0f} triệu VND")
print(f" Mean: {df['revenue'].mean():,.0f} triệu VND")
print(f" Std: {df['revenue'].std():,.0f} triệu VND")
print()
print(df.to_string())Bước 1.2: Visualize Time Series
python
# ============================================
# VISUALIZE — Line Plot + Seasonal Markers
# ============================================
fig, axes = plt.subplots(2, 1, figsize=(14, 10))
# --- Plot 1: Full time series ---
ax1 = axes[0]
ax1.plot(df.index, df['revenue'], 'b-o', markersize=4, linewidth=1.5, label='Monthly Revenue')
ax1.fill_between(df.index, df['revenue'], alpha=0.1, color='blue')
ax1.set_title('VietFresh Monthly Revenue (2023–2025)', fontsize=14, fontweight='bold')
ax1.set_ylabel('Revenue (triệu VND)')
ax1.legend()
ax1.grid(True, alpha=0.3)
# Highlight Tết months
for year in [2023, 2024, 2025]:
for month in [1, 2]:
tet_date = pd.Timestamp(f'{year}-{month:02d}-01')
if tet_date in df.index:
ax1.axvline(x=tet_date, color='red', alpha=0.2, linestyle='--')
ax1.annotate('Tết', xy=(pd.Timestamp('2024-02-01'), df['revenue'].max()),
fontsize=10, color='red', alpha=0.7)
# --- Plot 2: Box plot by month ---
ax2 = axes[1]
df_monthly = df.copy()
df_monthly['month'] = df_monthly.index.month
df_monthly.boxplot(column='revenue', by='month', ax=ax2)
ax2.set_title('Revenue Distribution by Month (Seasonal Pattern)', fontsize=14, fontweight='bold')
ax2.set_xlabel('Month')
ax2.set_ylabel('Revenue (triệu VND)')
plt.suptitle('') # Remove auto title
plt.tight_layout()
plt.show()Phần 2: Decomposition (20 phút)
Bước 2.1: Decompose — Additive vs Multiplicative
python
# ============================================
# PHẦN 2: DECOMPOSITION
# ============================================
from statsmodels.tsa.seasonal import seasonal_decompose
ts = df['revenue']
# --- Additive Decomposition ---
decomp_add = seasonal_decompose(ts, model='additive', period=12)
# --- Multiplicative Decomposition ---
decomp_mul = seasonal_decompose(ts, model='multiplicative', period=12)
# --- Plot both ---
fig, axes = plt.subplots(4, 2, figsize=(16, 14))
# Additive (left)
decomp_add.observed.plot(ax=axes[0, 0], title='Observed (Additive)')
decomp_add.trend.plot(ax=axes[1, 0], title='Trend')
decomp_add.seasonal.plot(ax=axes[2, 0], title='Seasonality')
decomp_add.resid.plot(ax=axes[3, 0], title='Residual')
# Multiplicative (right)
decomp_mul.observed.plot(ax=axes[0, 1], title='Observed (Multiplicative)')
decomp_mul.trend.plot(ax=axes[1, 1], title='Trend')
decomp_mul.seasonal.plot(ax=axes[2, 1], title='Seasonality')
decomp_mul.resid.plot(ax=axes[3, 1], title='Residual')
for ax_row in axes:
for ax in ax_row:
ax.grid(True, alpha=0.3)
plt.suptitle('Additive vs Multiplicative Decomposition', fontsize=16, fontweight='bold', y=1.01)
plt.tight_layout()
plt.show()
# --- Seasonal Factors (Multiplicative) ---
print("\n📊 SEASONAL FACTORS (Multiplicative)")
print("=" * 40)
seasonal_vals = decomp_mul.seasonal[:12]
for i, (date, val) in enumerate(seasonal_vals.items()):
month_name = date.strftime('%b')
bar = '█' * int(val * 20)
print(f" {month_name}: {val:.3f} {bar}")Bước 2.2: Stationarity Test
python
# ============================================
# STATIONARITY TEST — ADF
# ============================================
from statsmodels.tsa.stattools import adfuller
def adf_test(series, name=""):
"""Perform Augmented Dickey-Fuller test."""
result = adfuller(series.dropna())
print(f"📏 ADF Test — {name}")
print(f" ADF Statistic: {result[0]:.4f}")
print(f" p-value: {result[1]:.4f}")
print(f" Lags used: {result[2]}")
print(f" Observations: {result[3]}")
print(f" → {'✅ Stationary' if result[1] < 0.05 else '❌ Non-Stationary'} (at 5% significance)")
print()
# Test original series
adf_test(ts, "Original Revenue")
# Test after differencing
adf_test(ts.diff().dropna(), "First Difference")
# Test after seasonal differencing
adf_test(ts.diff(12).dropna(), "Seasonal Difference (lag=12)")Bước 2.3: ACF/PACF
python
# ============================================
# ACF & PACF
# ============================================
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
fig, axes = plt.subplots(1, 2, figsize=(14, 4))
plot_acf(ts, lags=24, ax=axes[0], title='ACF — Autocorrelation Function')
plot_pacf(ts, lags=24, ax=axes[1], title='PACF — Partial Autocorrelation Function')
plt.tight_layout()
plt.show()
print("📊 ACF Interpretation:")
print(" → Significant lags ở 1, 12, 24 = seasonal autocorrelation")
print(" → Slow decay = trend (non-stationary)")Phần 3: Fit Forecasting Models (25 phút)
Bước 3.1: Train/Test Split (Temporal!)
python
# ============================================
# PHẦN 3: FIT MODELS
# ============================================
# Train/Test split — TEMPORAL, không random!
train = ts[:'2025-06'] # 30 months (Jan 2023 — Jun 2025)
test = ts['2025-07':] # 6 months (Jul 2025 — Dec 2025)
print("📊 TRAIN/TEST SPLIT")
print("=" * 55)
print(f" Train: {train.index[0].strftime('%b %Y')} — {train.index[-1].strftime('%b %Y')} ({len(train)} months)")
print(f" Test: {test.index[0].strftime('%b %Y')} — {test.index[-1].strftime('%b %Y')} ({len(test)} months)")
print(f" Split: {len(train)}/{len(test)} = {len(train)/len(ts):.0%}/{len(test)/len(ts):.0%}")
# Visualize split
fig, ax = plt.subplots(figsize=(14, 5))
train.plot(ax=ax, label='Train', color='blue', linewidth=2)
test.plot(ax=ax, label='Test', color='orange', linewidth=2, linestyle='--')
ax.axvline(x=test.index[0], color='red', linestyle=':', alpha=0.7, label='Split Point')
ax.set_title('Train/Test Split — Temporal')
ax.set_ylabel('Revenue (triệu VND)')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()Bước 3.2: Model 1 — Naïve & Seasonal Naïve
python
# ============================================
# MODEL 1: NAÏVE & SEASONAL NAÏVE
# ============================================
# Naïve: forecast = last observed value
naive_forecast = pd.Series([train.iloc[-1]] * len(test), index=test.index, name='Naïve')
# Seasonal Naïve: forecast = same month last year
snaive_vals = train.iloc[-12:-12+len(test)].values # Jul-Dec 2024
# Nếu không đủ, dùng available recent months
if len(snaive_vals) < len(test):
snaive_vals = train.tail(12).iloc[-len(test):].values
snaive_forecast = pd.Series(snaive_vals, index=test.index, name='Seasonal Naïve')
print("📊 NAÏVE FORECASTS")
print("=" * 55)
for date, n_val, sn_val in zip(test.index, naive_forecast, snaive_forecast):
print(f" {date.strftime('%b %Y')}: Naïve={n_val:.0f} | S.Naïve={sn_val:.0f}")Bước 3.3: Model 2 — Simple Moving Average (SMA)
python
# ============================================
# MODEL 2: SIMPLE MOVING AVERAGE
# ============================================
# SMA(3), SMA(6), SMA(12) — forecast = last SMA value (constant)
sma_forecasts = {}
for k in [3, 6, 12]:
sma_val = train.iloc[-k:].mean()
sma_forecasts[f'SMA({k})'] = pd.Series([sma_val] * len(test),
index=test.index, name=f'SMA({k})')
print(f" SMA({k:2d}): {sma_val:.0f} triệu VND")
# Best SMA cho so sánh sau
sma_best = sma_forecasts['SMA(12)']Bước 3.4: Model 3 — Simple Exponential Smoothing (SES)
python
# ============================================
# MODEL 3: SIMPLE EXPONENTIAL SMOOTHING (SES)
# ============================================
from statsmodels.tsa.holtwinters import SimpleExpSmoothing
# Fit SES — optimized α
ses_model = SimpleExpSmoothing(train).fit(optimized=True)
ses_forecast = ses_model.forecast(steps=len(test))
ses_forecast.name = 'SES'
print(f"📊 SIMPLE EXPONENTIAL SMOOTHING")
print(f" Optimal α (smoothing_level) = {ses_model.params['smoothing_level']:.4f}")
print(f" AIC = {ses_model.aic:.2f}")
print()
for date, val in ses_forecast.items():
print(f" {date.strftime('%b %Y')}: {val:.0f} triệu VND")
# Note: SES produces FLAT forecast — all values equal!
print(f"\n ⚠️ SES forecast is FLAT: {ses_forecast.iloc[0]:.0f} for all months")
print(f" → SES không capture trend hay seasonality")Bước 3.5: Model 4 — Holt-Winters
python
# ============================================
# MODEL 4: HOLT-WINTERS (Triple Exponential Smoothing)
# ============================================
from statsmodels.tsa.holtwinters import ExponentialSmoothing
# Holt-Winters — Additive trend + Additive seasonality
hw_add = ExponentialSmoothing(
train,
trend='add',
seasonal='add',
seasonal_periods=12,
initialization_method='estimated'
).fit(optimized=True)
hw_add_forecast = hw_add.forecast(steps=len(test))
hw_add_forecast.name = 'HW Additive'
# Holt-Winters — Additive trend + Multiplicative seasonality
hw_mul = ExponentialSmoothing(
train,
trend='add',
seasonal='mul',
seasonal_periods=12,
initialization_method='estimated'
).fit(optimized=True)
hw_mul_forecast = hw_mul.forecast(steps=len(test))
hw_mul_forecast.name = 'HW Multiplicative'
print("📊 HOLT-WINTERS MODELS")
print("=" * 55)
print(f"\n HW Additive:")
print(f" α (level) = {hw_add.params['smoothing_level']:.4f}")
print(f" β (trend) = {hw_add.params['smoothing_trend']:.4f}")
print(f" γ (seasonal) = {hw_add.params['smoothing_seasonal']:.4f}")
print(f" AIC = {hw_add.aic:.2f}")
print(f"\n HW Multiplicative:")
print(f" α (level) = {hw_mul.params['smoothing_level']:.4f}")
print(f" β (trend) = {hw_mul.params['smoothing_trend']:.4f}")
print(f" γ (seasonal) = {hw_mul.params['smoothing_seasonal']:.4f}")
print(f" AIC = {hw_mul.aic:.2f}")
print(f"\n Forecast Comparison:")
print(f" {'Month':<10} {'HW Add':>10} {'HW Mul':>10} {'Actual':>10}")
print(f" {'-'*40}")
for date in test.index:
print(f" {date.strftime('%b %Y'):<10} {hw_add_forecast[date]:>10,.0f} {hw_mul_forecast[date]:>10,.0f} {test[date]:>10,.0f}")Phần 4: Evaluate Models (20 phút)
Bước 4.1: Accuracy Metrics
python
# ============================================
# PHẦN 4: EVALUATE MODELS
# ============================================
from sklearn.metrics import mean_absolute_error, mean_squared_error
def evaluate_forecast(actual, forecast, model_name=""):
"""Calculate MAE, MAPE, RMSE for a forecast."""
mae = mean_absolute_error(actual, forecast)
mape = np.mean(np.abs((actual - forecast) / actual)) * 100
rmse = np.sqrt(mean_squared_error(actual, forecast))
return {
'Model': model_name,
'MAE': round(mae, 1),
'MAPE (%)': round(mape, 2),
'RMSE': round(rmse, 1)
}
# Evaluate all models
results = []
models = {
'Naïve': naive_forecast,
'Seasonal Naïve': snaive_forecast,
'SMA(12)': sma_best,
'SES': ses_forecast,
'HW Additive': hw_add_forecast,
'HW Multiplicative': hw_mul_forecast
}
for name, forecast in models.items():
r = evaluate_forecast(test, forecast, name)
results.append(r)
# Create comparison table
results_df = pd.DataFrame(results).set_index('Model')
results_df = results_df.sort_values('MAPE (%)')
print("📊 FORECAST ACCURACY COMPARISON")
print("=" * 60)
print(results_df.to_string())
print()
print(f"🏆 Best Model: {results_df.index[0]} (MAPE = {results_df['MAPE (%)'].iloc[0]:.2f}%)")
# MAPE interpretation
best_mape = results_df['MAPE (%)'].iloc[0]
if best_mape < 10:
quality = "🟢 EXCELLENT — Highly accurate"
elif best_mape < 20:
quality = "🟡 GOOD — Acceptable for business"
elif best_mape < 30:
quality = "🟠 FAIR — Needs improvement"
else:
quality = "🔴 POOR — Forecast unreliable"
print(f" Quality: {quality}")Bước 4.2: Visual Comparison — Forecast vs Actual
python
# ============================================
# VISUAL COMPARISON
# ============================================
fig, axes = plt.subplots(2, 2, figsize=(16, 10))
# Plot 1: All models vs Actual
ax = axes[0, 0]
test.plot(ax=ax, label='Actual', color='black', linewidth=3, marker='o')
for name, forecast in models.items():
forecast.plot(ax=ax, label=name, linestyle='--', alpha=0.7)
ax.set_title('All Models vs Actual')
ax.set_ylabel('Revenue (triệu VND)')
ax.legend(fontsize=8)
ax.grid(True, alpha=0.3)
# Plot 2: Best model detail
ax = axes[0, 1]
best_name = results_df.index[0]
best_forecast = models[best_name]
train[-6:].plot(ax=ax, label='Train (last 6m)', color='blue', linewidth=2, marker='o')
test.plot(ax=ax, label='Actual', color='black', linewidth=3, marker='o')
best_forecast.plot(ax=ax, label=f'{best_name} Forecast', color='red', linewidth=2,
marker='s', linestyle='--')
ax.set_title(f'Best Model: {best_name}')
ax.set_ylabel('Revenue (triệu VND)')
ax.legend()
ax.grid(True, alpha=0.3)
# Plot 3: Error by model (bar chart)
ax = axes[1, 0]
results_df['MAPE (%)'].plot(kind='barh', ax=ax, color=['green' if x < 10 else 'orange' if x < 20 else 'red'
for x in results_df['MAPE (%)']])
ax.set_title('MAPE by Model (%)')
ax.set_xlabel('MAPE (%)')
ax.axvline(x=10, color='green', linestyle='--', alpha=0.5, label='Good threshold (10%)')
ax.legend()
# Plot 4: Residuals of best model
ax = axes[1, 1]
residuals = test - best_forecast
ax.bar(range(len(residuals)), residuals.values, color=['green' if r > 0 else 'red' for r in residuals])
ax.axhline(y=0, color='black', linestyle='-')
ax.set_title(f'Residuals — {best_name}')
ax.set_ylabel('Error (triệu VND)')
ax.set_xticks(range(len(test)))
ax.set_xticklabels([d.strftime('%b') for d in test.index])
plt.suptitle('FORECAST EVALUATION DASHBOARD', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()Bước 4.3: Residual Analysis
python
# ============================================
# RESIDUAL ANALYSIS — Best Model
# ============================================
best_model_obj = hw_mul if best_name == 'HW Multiplicative' else hw_add
residuals_full = best_model_obj.resid
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
# 1. Residuals over time
axes[0, 0].plot(residuals_full, marker='o', markersize=3)
axes[0, 0].axhline(y=0, color='r', linestyle='--')
axes[0, 0].set_title('Residuals over Time')
axes[0, 0].grid(True, alpha=0.3)
# 2. Residual histogram
axes[0, 1].hist(residuals_full.dropna(), bins=12, edgecolor='black', alpha=0.7)
axes[0, 1].axvline(x=0, color='r', linestyle='--')
axes[0, 1].set_title('Residual Distribution')
# 3. ACF of residuals
from statsmodels.graphics.tsaplots import plot_acf
plot_acf(residuals_full.dropna(), lags=18, ax=axes[1, 0], title='ACF of Residuals')
# 4. QQ plot
from scipy import stats
stats.probplot(residuals_full.dropna(), plot=axes[1, 1])
axes[1, 1].set_title('Q-Q Plot (Normality Check)')
plt.suptitle(f'Residual Analysis — {best_name}', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()
print(f"📊 RESIDUAL STATISTICS — {best_name}")
print(f" Mean: {residuals_full.mean():.2f} (should be ≈ 0)")
print(f" Std: {residuals_full.std():.2f}")
print(f" Min: {residuals_full.min():.2f}")
print(f" Max: {residuals_full.max():.2f}")
print(f" Skewness: {residuals_full.skew():.2f} (should be ≈ 0)")Phần 5: Forecast 6 Tháng Tới + Prediction Interval (15 phút)
Bước 5.1: Final Forecast
python
# ============================================
# PHẦN 5: FORECAST 6 THÁNG — JAN-JUN 2026
# ============================================
# Refit best model on ALL data (train + test)
final_model = ExponentialSmoothing(
ts, # Full 36 months
trend='add',
seasonal='mul', # hoặc 'add' tùy best model
seasonal_periods=12,
initialization_method='estimated'
).fit(optimized=True)
# Forecast 6 months
forecast_horizon = 6
future_forecast = final_model.forecast(steps=forecast_horizon)
print("🔮 REVENUE FORECAST — JAN-JUN 2026")
print("=" * 55)
print(f" Model: {best_name}")
print(f" Training data: Jan 2023 — Dec 2025 ({len(ts)} months)")
print(f" MAPE on holdout: {results_df['MAPE (%)'].iloc[0]:.2f}%")
print()
print(f" {'Month':<12} {'Forecast':>15}")
print(f" {'-'*27}")
for date, val in future_forecast.items():
print(f" {date.strftime('%b %Y'):<12} {val:>12,.0f} tr. VND")
print(f" {'-'*27}")
print(f" {'Total':<12} {future_forecast.sum():>12,.0f} tr. VND")Bước 5.2: Prediction Intervals
python
# ============================================
# PREDICTION INTERVALS — Bootstrap Simulation
# ============================================
# Simulate 1000 forecast paths
n_simulations = 1000
simulations = final_model.simulate(
nsimulations=forecast_horizon,
repetitions=n_simulations,
error='mul',
random_errors='bootstrap'
)
# Calculate prediction intervals
pi_80_lower = simulations.quantile(0.10, axis=1)
pi_80_upper = simulations.quantile(0.90, axis=1)
pi_95_lower = simulations.quantile(0.025, axis=1)
pi_95_upper = simulations.quantile(0.975, axis=1)
print("🔮 FORECAST WITH PREDICTION INTERVALS")
print("=" * 75)
print(f" {'Month':<10} {'Point':>10} {'80% PI':>22} {'95% PI':>22}")
print(f" {'-'*65}")
for i, date in enumerate(future_forecast.index):
pt = future_forecast.iloc[i]
l80, u80 = pi_80_lower.iloc[i], pi_80_upper.iloc[i]
l95, u95 = pi_95_lower.iloc[i], pi_95_upper.iloc[i]
print(f" {date.strftime('%b %Y'):<10} {pt:>10,.0f} [{l80:>8,.0f} — {u80:>8,.0f}] [{l95:>8,.0f} — {u95:>8,.0f}]")
print(f"\n ⚠️ PI rộng dần theo horizon → uncertainty tăng")
print(f" 📏 PI width Month 1: ±{(pi_95_upper.iloc[0] - pi_95_lower.iloc[0])/2:,.0f} | Month 6: ±{(pi_95_upper.iloc[-1] - pi_95_lower.iloc[-1])/2:,.0f}")Bước 5.3: Forecast Chart
python
# ============================================
# FINAL FORECAST CHART
# ============================================
fig, ax = plt.subplots(figsize=(16, 7))
# Historical data
ts.plot(ax=ax, label='Historical Revenue', color='blue', linewidth=2, marker='o', markersize=4)
# Point forecast
future_forecast.plot(ax=ax, label='Forecast', color='red', linewidth=2.5,
marker='s', markersize=6, linestyle='--')
# Prediction intervals
ax.fill_between(future_forecast.index, pi_95_lower.values, pi_95_upper.values,
alpha=0.1, color='red', label='95% Prediction Interval')
ax.fill_between(future_forecast.index, pi_80_lower.values, pi_80_upper.values,
alpha=0.2, color='red', label='80% Prediction Interval')
# Formatting
ax.set_title('VietFresh Revenue Forecast — Jan to Jun 2026', fontsize=16, fontweight='bold')
ax.set_ylabel('Revenue (triệu VND)', fontsize=12)
ax.set_xlabel('', fontsize=12)
ax.legend(loc='upper left', fontsize=10)
ax.grid(True, alpha=0.3)
# Annotate
ax.annotate(f'MAPE: {results_df["MAPE (%)"].iloc[0]:.1f}%\nModel: {best_name}',
xy=(0.02, 0.95), xycoords='axes fraction',
fontsize=10, verticalalignment='top',
bbox=dict(boxstyle='round', facecolor='lightyellow', alpha=0.8))
plt.tight_layout()
plt.savefig('forecast_chart.png', dpi=150, bbox_inches='tight')
plt.show()
print(" 💾 Chart saved: forecast_chart.png")Phần 6: Forecast Report (10 phút)
Bước 6.1: Final Report cho CEO
python
# ============================================
# PHẦN 6: FORECAST REPORT
# ============================================
report = f"""
{'='*65}
📋 VIETFRESH REVENUE FORECAST REPORT
{'='*65}
PREPARED BY: [Your Name], Data Analyst
DATE: {pd.Timestamp.now().strftime('%Y-%m-%d')}
MODEL: {best_name}
━━━ DATA SUMMARY ━━━
Historical period: Jan 2023 — Dec 2025 (36 months)
Revenue range: {ts.min():,.0f} — {ts.max():,.0f} triệu VND
Average monthly: {ts.mean():,.0f} triệu VND
Growth rate: ~18%/year
━━━ MODEL PERFORMANCE ━━━
Test period: Jul — Dec 2025 (6 months)
MAE: {results_df['MAE'].iloc[0]:,.1f} triệu VND
MAPE: {results_df['MAPE (%)'].iloc[0]:.2f}%
RMSE: {results_df['RMSE'].iloc[0]:,.1f} triệu VND
Quality: {quality}
━━━ MODELS COMPARED ━━━
{results_df.to_string()}
━━━ FORECAST: JAN — JUN 2026 ━━━
"""
for date, val in future_forecast.items():
i = list(future_forecast.index).index(date)
l95, u95 = pi_95_lower.iloc[i], pi_95_upper.iloc[i]
report += f" {date.strftime('%b %Y')}: {val:>8,.0f} triệu VND (95% PI: {l95:>8,.0f} — {u95:>8,.0f})\n"
report += f"""
Total H1/2026: {future_forecast.sum():,.0f} triệu VND
95% PI Total: {pi_95_lower.sum():,.0f} — {pi_95_upper.sum():,.0f}
━━━ ASSUMPTIONS ━━━
1. Không có sự kiện bất thường (promotion lớn, đóng cửa store)
2. Trend tiếp tục tương tự 3 năm qua (~18%/year)
3. Seasonal pattern ổn định (Tết, Trung Thu)
4. Không có đối thủ mới hoặc market disruption
━━━ RECOMMENDATIONS ━━━
📌 Sử dụng point forecast cho budgeting cơ bản
📌 Sử dụng lower bound (pessimistic) cho cash flow planning
📌 Sử dụng upper bound (optimistic) cho capacity/staffing
📌 Re-forecast hàng tháng khi có actual data mới
📌 Monitor: nếu actual nằm ngoài 95% PI → investigate
{'='*65}
"""
print(report)✅ Deliverables Checklist
📦 WORKSHOP DELIVERABLES
━━━━━━━━━━━━━━━━━━━━━━━━
✅ Jupyter Notebook: HoTen_Buoi16_Forecasting.ipynb
├── Phần 1: Dataset generation + visualization
├── Phần 2: Decomposition (additive + multiplicative) + ADF test
├── Phần 3: 4 models fitted (Naïve, SMA, SES, Holt-Winters)
├── Phần 4: Accuracy comparison (MAE, MAPE, RMSE) + residual analysis
├── Phần 5: 6-month forecast + prediction intervals + chart
└── Phần 6: Forecast report for CEO
✅ Forecast Chart: forecast_chart.png
├── Historical data (blue line)
├── Point forecast (red dashed)
├── 80% prediction interval (darker shade)
└── 95% prediction interval (lighter shade)
✅ Accuracy Comparison Table:
└── All 6 models ranked by MAPE
✅ Key Findings (3 bullet points):
1. Best model + MAPE
2. Seasonal pattern observation (which months peak/dip)
3. Forecast range for H1/2026 (point + interval)💡 Extensions (Nếu còn thời gian)
Extension 1: Time Series Cross-Validation
python
# Time Series CV — Expanding Window
from statsmodels.tsa.holtwinters import ExponentialSmoothing
cv_results = []
min_train = 24 # 2 years minimum
for i in range(min_train, len(ts) - 3, 3): # step = 3 months
train_cv = ts[:i]
test_cv = ts[i:i+3]
model_cv = ExponentialSmoothing(
train_cv, trend='add', seasonal='mul', seasonal_periods=12
).fit(optimized=True)
forecast_cv = model_cv.forecast(len(test_cv))
mape_cv = np.mean(np.abs((test_cv - forecast_cv) / test_cv)) * 100
cv_results.append({
'fold': len(cv_results) + 1,
'train_months': len(train_cv),
'test_start': test_cv.index[0].strftime('%b %Y'),
'MAPE': round(mape_cv, 2)
})
cv_df = pd.DataFrame(cv_results)
print("📊 TIME SERIES CROSS-VALIDATION")
print(cv_df.to_string(index=False))
print(f"\nMean MAPE: {cv_df['MAPE'].mean():.2f}%")
print(f"Std MAPE: {cv_df['MAPE'].std():.2f}%")Extension 2: MASE Calculation
python
# MASE — Mean Absolute Scaled Error
def calculate_mase(actual, forecast, train_series, seasonal_period=12):
"""MASE: compared against seasonal naïve baseline."""
mae_model = np.mean(np.abs(actual - forecast))
naive_errors = np.abs(train_series.values[seasonal_period:] - train_series.values[:-seasonal_period])
mae_naive = np.mean(naive_errors)
return mae_model / mae_naive
for name, fcast in models.items():
m = calculate_mase(test, fcast, train, seasonal_period=12)
status = "✅ Better than naïve" if m < 1 else "❌ Worse than naïve"
print(f" {name:<20} MASE = {m:.3f} {status}")