🛠 Workshop — Phân Tích Dữ Liệu Ngành: Marketing / Finance / Supply Chain

Chọn 1 of 3 datasets → Xác định KPI phù hợp → Phân tích end-to-end → Mini dashboard + executive summary. Output: Analysis notebook + mini dashboard + insight report!

🎯 Mục tiêu workshop

Sau khi hoàn thành workshop này, bạn sẽ:

Chọn dataset theo ngành và xác định KPI phù hợp cho domain
Phân tích end-to-end — từ data cleaning đến insight
Xây mini dashboard với 4-6 charts quan trọng nhất
Viết executive summary — tóm tắt insight + recommendation
So sánh với KPI benchmark ngành

🧰 Yêu cầu

Yêu cầu	Chi tiết
Kiến thức	Đã hoàn thành Buổi 14 lý thuyết (Industry Case Studies)
Python	pandas, matplotlib, seaborn (đã học Buổi 7-10)
Thời gian	90–120 phút
Output	Analysis notebook + mini dashboard (PNG) + executive summary

💡 Naming convention

Đặt tên file: HoTen_Buoi14_[Marketing|Finance|SupplyChain].ipynb

📦 Chọn Dataset

Bạn chọn 1 trong 3 datasets dưới đây. Mỗi dataset mô phỏng dữ liệu thực tế của ngành:

#	Dataset	Ngành	Bài toán chính	Difficulty
🅰️	Marketing Campaign	Marketing Analytics	Channel attribution, RFM, ROAS optimization	⭐⭐ Medium
🅱️	Financial Statements	Finance Analytics	Variance analysis, financial ratios, P&L	⭐⭐⭐ Hard
🅲️	Inventory & Demand	Supply Chain Analytics	Turnover, demand forecast, fill rate	⭐⭐ Medium

🅰️ Dataset A: Marketing Campaign Analytics

Phần A0: Tạo Dataset (10 phút)

python

import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

np.random.seed(14)

# ============================================
# TABLE 1: CAMPAIGNS (5 channels, 12 months)
# ============================================
channels = ['Facebook', 'Google', 'TikTok', 'Email', 'SEO']
months = pd.date_range('2025-03-01', '2026-02-01', freq='MS')

campaign_records = []
for month in months:
    for ch in channels:
        base_spend = {'Facebook': 180, 'Google': 150, 'TikTok': 120, 'Email': 30, 'SEO': 50}[ch]
        spend = int(base_spend * (1 + 0.1 * np.sin(month.month * np.pi / 6) + np.random.normal(0, 0.1)) * 1_000_000)

        base_ctr = {'Facebook': 0.035, 'Google': 0.055, 'TikTok': 0.025, 'Email': 0.12, 'SEO': 0.04}[ch]
        impressions = int(spend / ({'Facebook': 8, 'Google': 15, 'TikTok': 5, 'Email': 2, 'SEO': 3}[ch]))
        clicks = int(impressions * base_ctr * np.random.uniform(0.8, 1.2))

        base_cvr = {'Facebook': 0.045, 'Google': 0.065, 'TikTok': 0.025, 'Email': 0.08, 'SEO': 0.055}[ch]
        conversions = int(clicks * base_cvr * np.random.uniform(0.8, 1.2))

        aov = int(np.random.normal(320000, 50000))
        revenue = conversions * aov

        campaign_records.append({
            'month': month, 'channel': ch, 'spend': spend,
            'impressions': impressions, 'clicks': clicks,
            'conversions': conversions, 'revenue': revenue, 'aov': aov
        })

campaigns = pd.DataFrame(campaign_records)
print(f"✅ Campaigns: {len(campaigns)} rows")
print(campaigns.head(10))

python

# ============================================
# TABLE 2: CUSTOMERS for RFM (3,000 customers)
# ============================================
n_customers = 3000

customers = pd.DataFrame({
    'customer_id': [f'CUST{i:04d}' for i in range(1, n_customers + 1)],
    'first_purchase': pd.date_range('2024-06-01', '2025-12-31', periods=n_customers),
    'channel': np.random.choice(channels, n_customers, p=[0.30, 0.25, 0.15, 0.10, 0.20]),
    'total_orders': np.random.poisson(lam=4, size=n_customers) + 1,
    'total_revenue': np.random.lognormal(mean=13.2, sigma=0.7, size=n_customers).astype(int),
    'last_purchase': pd.NaT
})

# Generate realistic last purchase dates
for idx in customers.index:
    days_active = np.random.exponential(scale=120)
    customers.loc[idx, 'last_purchase'] = customers.loc[idx, 'first_purchase'] + timedelta(days=int(days_active))
    if customers.loc[idx, 'last_purchase'] > pd.Timestamp('2026-02-15'):
        customers.loc[idx, 'last_purchase'] = pd.Timestamp('2026-02-15') - timedelta(days=np.random.randint(1, 30))

print(f"\n✅ Customers: {len(customers)} rows")
print(customers.head())

Phần A1: Channel Performance Analysis (20 phút)

python

# ============================================
# TASK A1: Calculate KPIs per channel
# ============================================

# Aggregate by channel
channel_perf = campaigns.groupby('channel').agg(
    total_spend=('spend', 'sum'),
    total_impressions=('impressions', 'sum'),
    total_clicks=('clicks', 'sum'),
    total_conversions=('conversions', 'sum'),
    total_revenue=('revenue', 'sum')
).reset_index()

# Calculate KPIs
channel_perf['CTR'] = (channel_perf['total_clicks'] / channel_perf['total_impressions'] * 100).round(2)
channel_perf['CVR'] = (channel_perf['total_conversions'] / channel_perf['total_clicks'] * 100).round(2)
channel_perf['CPC'] = (channel_perf['total_spend'] / channel_perf['total_clicks']).round(0)
channel_perf['CPA'] = (channel_perf['total_spend'] / channel_perf['total_conversions']).round(0)
channel_perf['ROAS'] = (channel_perf['total_revenue'] / channel_perf['total_spend']).round(2)

print("📊 CHANNEL PERFORMANCE SUMMARY")
print("=" * 80)
for _, row in channel_perf.iterrows():
    print(f"  {row['channel']:10s} | CTR: {row['CTR']:5.2f}% | CVR: {row['CVR']:5.2f}% | CPC: {row['CPC']:>7,.0f} | CPA: {row['CPA']:>9,.0f} | ROAS: {row['ROAS']:.2f}x")

# Best & worst channels
best = channel_perf.loc[channel_perf['ROAS'].idxmax(), 'channel']
worst = channel_perf.loc[channel_perf['ROAS'].idxmin(), 'channel']
print(f"\n🏆 Best ROAS: {best} | 🔴 Worst ROAS: {worst}")

Phần A2: RFM Segmentation (25 phút)

python

# ============================================
# TASK A2: RFM Analysis
# ============================================
ref_date = pd.Timestamp('2026-02-15')

customers['recency_days'] = (ref_date - customers['last_purchase']).dt.days
customers['R'] = pd.qcut(customers['recency_days'], q=5, labels=[5,4,3,2,1]).astype(int)
customers['F'] = pd.qcut(customers['total_orders'].rank(method='first'), q=5, labels=[1,2,3,4,5]).astype(int)
customers['M'] = pd.qcut(customers['total_revenue'], q=5, labels=[1,2,3,4,5]).astype(int)

# Segment mapping
def rfm_segment(row):
    if row['R'] >= 4 and row['F'] >= 4 and row['M'] >= 4:
        return 'Champions'
    elif row['R'] >= 4 and row['F'] >= 3:
        return 'Loyal Customers'
    elif row['R'] >= 4 and row['F'] <= 2:
        return 'New Customers'
    elif row['R'] >= 3 and row['F'] >= 3:
        return 'Potential Loyalists'
    elif row['R'] <= 2 and row['F'] >= 3:
        return 'At Risk'
    elif row['R'] <= 2 and row['F'] >= 4 and row['M'] >= 4:
        return 'Cant Lose Them'
    elif row['R'] <= 2 and row['F'] <= 2:
        return 'Lost'
    else:
        return 'Need Attention'

customers['segment'] = customers.apply(rfm_segment, axis=1)

# Summary
rfm_summary = customers.groupby('segment').agg(
    count=('customer_id', 'count'),
    avg_recency=('recency_days', 'mean'),
    avg_frequency=('total_orders', 'mean'),
    avg_monetary=('total_revenue', 'mean')
).round(0).sort_values('avg_monetary', ascending=False)

print("📊 RFM SEGMENT SUMMARY")
print("=" * 75)
for seg, row in rfm_summary.iterrows():
    print(f"  {seg:20s} | N: {row['count']:>5.0f} | R: {row['avg_recency']:>5.0f}d | F: {row['avg_frequency']:>4.0f} | M: {row['avg_monetary']:>10,.0f}")

# Actionable segments
at_risk = customers[customers['segment'].isin(['At Risk', 'Cant Lose Them'])]
print(f"\n🚨 ACTION NEEDED: {len(at_risk)} customers in At Risk / Cant Lose Them segments")
print(f"   → Estimated revenue at risk: {at_risk['total_revenue'].sum():,.0f} VND")

Phần A3: Mini Dashboard + Executive Summary (30 phút)

python

# ============================================
# TASK A3: Marketing Mini Dashboard
# ============================================
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
fig.suptitle('Marketing Analytics Dashboard — Brand Performance', fontsize=16, fontweight='bold')

# Chart 1: ROAS by Channel
ax1 = axes[0, 0]
colors = ['#4CAF50' if x >= 4 else '#FF9800' if x >= 2 else '#F44336' for x in channel_perf['ROAS']]
ax1.barh(channel_perf['channel'], channel_perf['ROAS'], color=colors)
ax1.set_xlabel('ROAS (x)')
ax1.set_title('ROAS by Channel')
ax1.axvline(x=4, color='green', linestyle='--', alpha=0.5, label='Target 4x')
ax1.legend()

# Chart 2: Monthly Revenue Trend
ax2 = axes[0, 1]
monthly_rev = campaigns.groupby('month')['revenue'].sum() / 1_000_000
ax2.plot(monthly_rev.index, monthly_rev.values, marker='o', color='#2196F3', linewidth=2)
ax2.set_title('Monthly Revenue Trend (M VND)')
ax2.tick_params(axis='x', rotation=45)
ax2.set_ylabel('Revenue (M VND)')

# Chart 3: RFM Segment Distribution
ax3 = axes[1, 0]
seg_counts = customers['segment'].value_counts()
colors_rfm = plt.cm.Set3(np.linspace(0, 1, len(seg_counts)))
ax3.pie(seg_counts.values, labels=seg_counts.index, autopct='%1.0f%%', colors=colors_rfm, textprops={'fontsize': 8})
ax3.set_title('Customer Segments (RFM)')

# Chart 4: Spend vs Revenue by Channel
ax4 = axes[1, 1]
x = channel_perf['total_spend'] / 1_000_000
y = channel_perf['total_revenue'] / 1_000_000
ax4.scatter(x, y, s=200, c='#9C27B0', alpha=0.7)
for i, ch in enumerate(channel_perf['channel']):
    ax4.annotate(ch, (x.iloc[i], y.iloc[i]), fontsize=9, ha='center', va='bottom')
ax4.set_xlabel('Total Spend (M VND)')
ax4.set_ylabel('Total Revenue (M VND)')
ax4.set_title('Spend vs Revenue by Channel')
# Break-even line
max_val = max(x.max(), y.max())
ax4.plot([0, max_val], [0, max_val], 'r--', alpha=0.3, label='Break-even')
ax4.legend()

plt.tight_layout()
plt.savefig('Marketing_Dashboard.png', dpi=150, bbox_inches='tight')
plt.show()
print("✅ Dashboard saved: Marketing_Dashboard.png")

python

# ============================================
# EXECUTIVE SUMMARY — Marketing
# ============================================
total_spend = campaigns['spend'].sum()
total_revenue = campaigns['revenue'].sum()
blended_roas = total_revenue / total_spend

print("=" * 60)
print("📋 EXECUTIVE SUMMARY — Marketing Analytics")
print("=" * 60)
print(f"\n📊 Overall Performance:")
print(f"   Total Spend:    {total_spend/1e9:>8.2f} tỷ VND")
print(f"   Total Revenue:  {total_revenue/1e9:>8.2f} tỷ VND")
print(f"   Blended ROAS:   {blended_roas:>8.2f}x")

print(f"\n🎯 Channel Insights:")
for _, row in channel_perf.sort_values('ROAS', ascending=False).iterrows():
    status = '🟢' if row['ROAS'] >= 4 else '🟡' if row['ROAS'] >= 2 else '🔴'
    print(f"   {status} {row['channel']:10s}: ROAS {row['ROAS']:.2f}x | CPA {row['CPA']:>9,.0f} VND")

print(f"\n👥 Customer Segments:")
print(f"   Champions:          {len(customers[customers['segment']=='Champions']):>5} customers")
print(f"   At Risk + Cant Lose:{len(at_risk):>5} customers (ACTION NEEDED)")
print(f"   Revenue at risk:    {at_risk['total_revenue'].sum()/1e6:>8,.0f}M VND")

print(f"\n💡 Recommendations:")
print(f"   1. Increase budget for {best} (highest ROAS)")
print(f"   2. Review {worst} — optimize or reduce spend")
print(f"   3. Launch win-back campaign for {len(at_risk)} At Risk customers")
print(f"   4. Invest in Champions referral program")

🅱️ Dataset B: Financial Statements Analytics

Phần B0: Tạo Dataset (10 phút)

python

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

np.random.seed(14)

# ============================================
# TABLE 1: MONTHLY P&L (12 months, Budget vs Actual)
# ============================================
months = pd.date_range('2025-03-01', '2026-02-01', freq='MS')
departments = ['Sales', 'Marketing', 'Engineering', 'Operations', 'HR', 'Finance']

pnl_records = []
for month in months:
    # Revenue (seasonal pattern)
    base_rev = 8500  # base monthly revenue in triệu VND
    seasonal = 1 + 0.15 * np.sin((month.month - 1) * np.pi / 6)
    budget_rev = int(base_rev * seasonal)
    actual_rev = int(budget_rev * np.random.normal(1.0, 0.06))

    # COGS (60-65% of revenue)
    budget_cogs = int(budget_rev * 0.62)
    actual_cogs = int(actual_rev * np.random.uniform(0.60, 0.66))

    # OpEx by department
    for dept in departments:
        base_opex = {'Sales': 450, 'Marketing': 380, 'Engineering': 520,
                     'Operations': 280, 'HR': 150, 'Finance': 120}[dept]
        budget_opex = int(base_opex * (1 + 0.005 * (month.month - 3)))
        actual_opex = int(budget_opex * np.random.normal(1.0, 0.08))

        pnl_records.append({
            'month': month, 'department': dept,
            'budget_revenue': budget_rev if dept == 'Sales' else 0,
            'actual_revenue': actual_rev if dept == 'Sales' else 0,
            'budget_cogs': budget_cogs if dept == 'Sales' else 0,
            'actual_cogs': actual_cogs if dept == 'Sales' else 0,
            'budget_opex': budget_opex,
            'actual_opex': actual_opex
        })

pnl = pd.DataFrame(pnl_records)
print(f"✅ P&L Data: {len(pnl)} rows")
print(pnl.head(10))

python

# ============================================
# TABLE 2: BALANCE SHEET (Quarterly)
# ============================================
quarters = ['Q2-2025', 'Q3-2025', 'Q4-2025', 'Q1-2026']
balance_sheet = pd.DataFrame({
    'quarter': quarters,
    'cash': [12500, 11800, 10200, 13500],
    'receivables': [8500, 9200, 11500, 8800],
    'inventory': [6200, 5800, 7500, 5900],
    'fixed_assets': [25000, 26500, 28000, 27500],
    'total_assets': [52200, 53300, 57200, 55700],
    'current_liabilities': [9800, 10500, 12800, 10200],
    'long_term_debt': [15000, 14500, 14000, 13500],
    'equity': [27400, 28300, 30400, 32000]
})
# Units: triệu VND

print(f"\n✅ Balance Sheet: {len(balance_sheet)} rows")
print(balance_sheet)

Phần B1: Variance Analysis (25 phút)

python

# ============================================
# TASK B1: Monthly Variance Analysis
# ============================================

# Aggregate monthly P&L
monthly_pnl = pnl.groupby('month').agg(
    budget_revenue=('budget_revenue', 'sum'),
    actual_revenue=('actual_revenue', 'sum'),
    budget_cogs=('budget_cogs', 'sum'),
    actual_cogs=('actual_cogs', 'sum'),
    budget_opex=('budget_opex', 'sum'),
    actual_opex=('actual_opex', 'sum')
).reset_index()

monthly_pnl['budget_gross_profit'] = monthly_pnl['budget_revenue'] - monthly_pnl['budget_cogs']
monthly_pnl['actual_gross_profit'] = monthly_pnl['actual_revenue'] - monthly_pnl['actual_cogs']
monthly_pnl['budget_net_profit'] = monthly_pnl['budget_gross_profit'] - monthly_pnl['budget_opex']
monthly_pnl['actual_net_profit'] = monthly_pnl['actual_gross_profit'] - monthly_pnl['actual_opex']

# Variance
monthly_pnl['rev_variance'] = monthly_pnl['actual_revenue'] - monthly_pnl['budget_revenue']
monthly_pnl['rev_var_pct'] = (monthly_pnl['rev_variance'] / monthly_pnl['budget_revenue'] * 100).round(1)
monthly_pnl['profit_variance'] = monthly_pnl['actual_net_profit'] - monthly_pnl['budget_net_profit']
monthly_pnl['profit_var_pct'] = (monthly_pnl['profit_variance'] / monthly_pnl['budget_net_profit'] * 100).round(1)

print("📊 MONTHLY VARIANCE ANALYSIS")
print("=" * 85)
print(f"{'Month':>10s} | {'Rev Budget':>10s} | {'Rev Actual':>10s} | {'Rev Var%':>8s} | {'Profit Bgt':>10s} | {'Profit Act':>10s} | {'Profit Var%':>10s}")
print("-" * 85)
for _, r in monthly_pnl.iterrows():
    rflag = '🟢' if r['rev_var_pct'] >= 0 else '🔴'
    pflag = '🟢' if r['profit_var_pct'] >= 0 else '🔴'
    print(f"{r['month'].strftime('%Y-%m'):>10s} | {r['budget_revenue']:>10,} | {r['actual_revenue']:>10,} | {rflag}{r['rev_var_pct']:>+5.1f}% | {r['budget_net_profit']:>10,} | {r['actual_net_profit']:>10,} | {pflag}{r['profit_var_pct']:>+7.1f}%")

Phần B2: Financial Ratios (20 phút)

python

# ============================================
# TASK B2: Financial Ratios from Balance Sheet
# ============================================

print("📊 FINANCIAL RATIOS — Quarterly")
print("=" * 75)

for _, row in balance_sheet.iterrows():
    current_ratio = (row['cash'] + row['receivables'] + row['inventory']) / row['current_liabilities']
    quick_ratio = (row['cash'] + row['receivables']) / row['current_liabilities']
    debt_equity = (row['current_liabilities'] + row['long_term_debt']) / row['equity']
    roe = 0  # Would need net income per quarter

    print(f"\n  📅 {row['quarter']}:")
    print(f"     Current Ratio:  {current_ratio:.2f}  {'🟢' if current_ratio >= 1.5 else '🟡' if current_ratio >= 1 else '🔴'}")
    print(f"     Quick Ratio:    {quick_ratio:.2f}  {'🟢' if quick_ratio >= 1 else '🔴'}")
    print(f"     Debt/Equity:    {debt_equity:.2f}  {'🟢' if debt_equity <= 1.5 else '🔴'}")
    print(f"     Total Assets:   {row['total_assets']:>10,}M VND")

Phần B3: Mini Dashboard + Executive Summary (30 phút)

python

# ============================================
# TASK B3: Finance Mini Dashboard
# ============================================
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
fig.suptitle('Financial Analytics Dashboard — Company Performance', fontsize=16, fontweight='bold')

# Chart 1: Revenue — Budget vs Actual
ax1 = axes[0, 0]
x_months = range(len(monthly_pnl))
ax1.bar([x - 0.2 for x in x_months], monthly_pnl['budget_revenue'], 0.4, label='Budget', color='#90CAF9', alpha=0.8)
ax1.bar([x + 0.2 for x in x_months], monthly_pnl['actual_revenue'], 0.4, label='Actual', color='#1565C0', alpha=0.8)
ax1.set_title('Revenue: Budget vs Actual')
ax1.set_ylabel('Revenue (M VND)')
ax1.legend()
ax1.set_xticks(x_months)
ax1.set_xticklabels([m.strftime('%b') for m in monthly_pnl['month']], rotation=45)

# Chart 2: Profit Variance Trend
ax2 = axes[0, 1]
colors_var = ['#4CAF50' if v >= 0 else '#F44336' for v in monthly_pnl['profit_variance']]
ax2.bar(x_months, monthly_pnl['profit_variance'], color=colors_var)
ax2.set_title('Monthly Profit Variance (M VND)')
ax2.axhline(y=0, color='black', linewidth=0.5)
ax2.set_xticks(x_months)
ax2.set_xticklabels([m.strftime('%b') for m in monthly_pnl['month']], rotation=45)

# Chart 3: Gross Margin Trend
ax3 = axes[1, 0]
monthly_pnl['actual_gm_pct'] = (monthly_pnl['actual_gross_profit'] / monthly_pnl['actual_revenue'] * 100)
monthly_pnl['budget_gm_pct'] = (monthly_pnl['budget_gross_profit'] / monthly_pnl['budget_revenue'] * 100)
ax3.plot(x_months, monthly_pnl['budget_gm_pct'], '--', color='gray', label='Budget GM%')
ax3.plot(x_months, monthly_pnl['actual_gm_pct'], '-o', color='#FF9800', label='Actual GM%')
ax3.set_title('Gross Margin % Trend')
ax3.set_ylabel('Gross Margin (%)')
ax3.legend()
ax3.set_xticks(x_months)
ax3.set_xticklabels([m.strftime('%b') for m in monthly_pnl['month']], rotation=45)

# Chart 4: Balance Sheet — Liquidity Ratios
ax4 = axes[1, 1]
current_ratios = [(row['cash'] + row['receivables'] + row['inventory']) / row['current_liabilities']
                  for _, row in balance_sheet.iterrows()]
quick_ratios = [(row['cash'] + row['receivables']) / row['current_liabilities']
                for _, row in balance_sheet.iterrows()]
x_q = range(len(quarters))
ax4.plot(x_q, current_ratios, '-o', label='Current Ratio', color='#2196F3', linewidth=2)
ax4.plot(x_q, quick_ratios, '-s', label='Quick Ratio', color='#4CAF50', linewidth=2)
ax4.axhline(y=1.5, color='blue', linestyle='--', alpha=0.3, label='Current target (1.5)')
ax4.axhline(y=1.0, color='green', linestyle='--', alpha=0.3, label='Quick target (1.0)')
ax4.set_title('Liquidity Ratios')
ax4.set_xticks(x_q)
ax4.set_xticklabels(quarters)
ax4.legend(fontsize=8)

plt.tight_layout()
plt.savefig('Finance_Dashboard.png', dpi=150, bbox_inches='tight')
plt.show()
print("✅ Dashboard saved: Finance_Dashboard.png")

python

# ============================================
# EXECUTIVE SUMMARY — Finance
# ============================================
total_budget_rev = monthly_pnl['budget_revenue'].sum()
total_actual_rev = monthly_pnl['actual_revenue'].sum()
total_budget_profit = monthly_pnl['budget_net_profit'].sum()
total_actual_profit = monthly_pnl['actual_net_profit'].sum()

rev_var = (total_actual_rev - total_budget_rev) / total_budget_rev * 100
profit_var = (total_actual_profit - total_budget_profit) / total_budget_profit * 100

months_miss_rev = (monthly_pnl['rev_var_pct'] < 0).sum()
months_miss_profit = (monthly_pnl['profit_var_pct'] < 0).sum()

print("=" * 60)
print("📋 EXECUTIVE SUMMARY — Finance Analytics")
print("=" * 60)
print(f"\n📊 Annual Performance (12 months):")
print(f"   Revenue:  {total_actual_rev/1000:>8,.0f} tỷ vs Budget {total_budget_rev/1000:>8,.0f} tỷ ({rev_var:+.1f}%)")
print(f"   Profit:   {total_actual_profit/1000:>8,.0f} tỷ vs Budget {total_budget_profit/1000:>8,.0f} tỷ ({profit_var:+.1f}%)")
print(f"   Rev miss months:    {months_miss_rev}/12")
print(f"   Profit miss months: {months_miss_profit}/12")

print(f"\n📊 Liquidity (Latest Quarter):")
print(f"   Current Ratio: {current_ratios[-1]:.2f}")
print(f"   Quick Ratio:   {quick_ratios[-1]:.2f}")

# Identify worst variance department
dept_variance = pnl.groupby('department').agg(
    budget_opex=('budget_opex', 'sum'),
    actual_opex=('actual_opex', 'sum')
).reset_index()
dept_variance['var_pct'] = ((dept_variance['actual_opex'] - dept_variance['budget_opex']) / dept_variance['budget_opex'] * 100).round(1)
worst_dept = dept_variance.loc[dept_variance['var_pct'].idxmax()]

print(f"\n⚠️ Cost Alert:")
print(f"   Highest overrun: {worst_dept['department']} ({worst_dept['var_pct']:+.1f}% vs budget)")

print(f"\n💡 Recommendations:")
print(f"   1. Deep dive {worst_dept['department']} cost overrun — identify root cause")
print(f"   2. Monitor gross margin trend — {'DECLINING' if monthly_pnl['actual_gm_pct'].iloc[-1] < monthly_pnl['actual_gm_pct'].iloc[0] else 'STABLE'}")
print(f"   3. {'Strengthen receivables collection' if balance_sheet['receivables'].iloc[-1] > balance_sheet['receivables'].iloc[0] else 'Receivables well managed'}")
print(f"   4. Reforecast remaining quarters with updated assumptions")

🅲️ Dataset C: Supply Chain & Inventory Analytics

Phần C0: Tạo Dataset (10 phút)

python

import pandas as pd
import numpy as np
from datetime import timedelta
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

np.random.seed(14)

# ============================================
# TABLE 1: DAILY INVENTORY & SALES (365 days, 6 products)
# ============================================
products = ['Sữa tươi', 'Sữa chua', 'Nước trái cây', 'Sữa bột', 'Sữa đặc', 'Yogurt drink']
shelf_lives = {'Sữa tươi': 10, 'Sữa chua': 21, 'Nước trái cây': 90,
               'Sữa bột': 365, 'Sữa đặc': 180, 'Yogurt drink': 14}
base_demands = {'Sữa tươi': 1200, 'Sữa chua': 800, 'Nước trái cây': 600,
                'Sữa bột': 300, 'Sữa đặc': 400, 'Yogurt drink': 500}

dates = pd.date_range('2025-03-01', '2026-02-28')
inv_records = []

for product in products:
    inventory = int(base_demands[product] * 5)  # Start with 5 days stock
    for date in dates:
        # Demand with seasonality + day-of-week + noise
        seasonal = 1 + 0.2 * np.sin((date.month - 1) * np.pi / 6)
        dow_factor = 1.3 if date.dayofweek >= 5 else 1.0  # Weekend boost
        demand = int(base_demands[product] * seasonal * dow_factor * np.random.uniform(0.8, 1.2))

        # Replenishment (every 3-7 days depending on product)
        replenish_freq = max(3, shelf_lives[product] // 3)
        received = int(base_demands[product] * replenish_freq * 0.9) if date.day % replenish_freq == 0 else 0

        # Sales = min(demand, inventory)
        sold = min(demand, inventory + received)
        stockout = 1 if demand > inventory + received else 0
        waste = max(0, int(inventory * 0.01)) if shelf_lives[product] <= 21 else 0  # Perishable waste

        inventory = max(0, inventory + received - sold - waste)

        inv_records.append({
            'date': date, 'product': product,
            'demand': demand, 'sold': sold, 'received': received,
            'inventory_end': inventory, 'stockout': stockout,
            'waste': waste, 'shelf_life': shelf_lives[product]
        })

inventory_df = pd.DataFrame(inv_records)
print(f"✅ Inventory Data: {len(inventory_df)} rows")
print(inventory_df.head(10))

python

# ============================================
# TABLE 2: SUPPLIER PERFORMANCE (6 suppliers, 12 months)
# ============================================
suppliers = ['NCC_A (Sữa tươi)', 'NCC_B (Sữa chua)', 'NCC_C (Nước ép)',
             'NCC_D (Sữa bột)', 'NCC_E (Sữa đặc)', 'NCC_F (Yogurt)']

supplier_records = []
for month in pd.date_range('2025-03-01', '2026-02-01', freq='MS'):
    for sup in suppliers:
        on_time = np.random.uniform(0.85, 0.99)
        quality = np.random.uniform(0.95, 1.0)
        cost_idx = np.random.uniform(0.9, 1.1)  # vs market average

        supplier_records.append({
            'month': month, 'supplier': sup,
            'on_time_rate': round(on_time, 3),
            'quality_rate': round(quality, 3),
            'cost_index': round(cost_idx, 2),
            'orders_placed': np.random.randint(8, 20)
        })

suppliers_df = pd.DataFrame(supplier_records)
print(f"\n✅ Supplier Data: {len(suppliers_df)} rows")
print(suppliers_df.head())

Phần C1: Inventory KPIs (20 phút)

python

# ============================================
# TASK C1: Inventory Metrics per Product
# ============================================

print("📦 INVENTORY KPI DASHBOARD")
print("=" * 90)

for product in products:
    p = inventory_df[inventory_df['product'] == product]
    total_sold = p['sold'].sum()
    total_demand = p['demand'].sum()
    avg_inventory = p['inventory_end'].mean()
    total_waste = p['waste'].sum()
    stockout_days = p['stockout'].sum()

    turnover = total_sold / avg_inventory if avg_inventory > 0 else 0
    dos = avg_inventory / (total_sold / 365) if total_sold > 0 else 0
    fill_rate = total_sold / total_demand * 100
    waste_rate = total_waste / (total_sold + total_waste) * 100 if (total_sold + total_waste) > 0 else 0

    shelf = shelf_lives[product]
    risk = '🔴 WASTE RISK' if dos > shelf * 0.5 else '🟢 OK'

    print(f"\n  📦 {product:15s} (Shelf life: {shelf}d)")
    print(f"     Turnover:     {turnover:>6.1f}x/year")
    print(f"     Days of Supply: {dos:>5.1f} days      {risk}")
    print(f"     Fill Rate:    {fill_rate:>6.1f}%")
    print(f"     Waste Rate:   {waste_rate:>6.2f}%")
    print(f"     Stockout Days: {stockout_days:>4d}/{len(p)} days ({stockout_days/len(p)*100:.1f}%)")

Phần C2: Demand Pattern & Supplier Scorecard (25 phút)

python

# ============================================
# TASK C2: Demand Pattern Analysis
# ============================================

# Weekly demand pattern (day of week)
inventory_df['dow'] = inventory_df['date'].dt.day_name()
dow_demand = inventory_df.groupby(['product', 'dow'])['demand'].mean().unstack()
dow_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
dow_demand = dow_demand[dow_order]

print("📊 DEMAND BY DAY OF WEEK (Average daily demand)")
print("=" * 100)
print(dow_demand.round(0).to_string())

# Monthly seasonality
inventory_df['month_name'] = inventory_df['date'].dt.month_name()
monthly_demand = inventory_df.groupby(['product', inventory_df['date'].dt.month])['demand'].mean()

print("\n\n📊 SUPPLIER SCORECARD")
print("=" * 80)

# Aggregate supplier performance
sup_perf = suppliers_df.groupby('supplier').agg(
    avg_on_time=('on_time_rate', 'mean'),
    avg_quality=('quality_rate', 'mean'),
    avg_cost_idx=('cost_index', 'mean'),
    total_orders=('orders_placed', 'sum')
).round(3)

# Weighted score (30% on-time, 30% quality, 20% cost, 20% volume)
sup_perf['score'] = (
    sup_perf['avg_on_time'] * 30 +
    sup_perf['avg_quality'] * 30 +
    (2 - sup_perf['avg_cost_idx']) * 20 +  # Lower cost = higher score
    (sup_perf['total_orders'] / sup_perf['total_orders'].max()) * 20
).round(1)

sup_perf = sup_perf.sort_values('score', ascending=False)

for sup, row in sup_perf.iterrows():
    grade = '🟢 A' if row['score'] >= 75 else '🟡 B' if row['score'] >= 65 else '🔴 C'
    print(f"  {sup:25s} | On-time: {row['avg_on_time']:.1%} | Quality: {row['avg_quality']:.1%} | Cost: {row['avg_cost_idx']:.2f}x | Score: {row['score']:.0f} | {grade}")

Phần C3: Mini Dashboard + Executive Summary (30 phút)

python

# ============================================
# TASK C3: Supply Chain Mini Dashboard
# ============================================
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
fig.suptitle('Supply Chain Analytics Dashboard — Inventory & Demand', fontsize=16, fontweight='bold')

# Chart 1: Inventory Turnover by Product
ax1 = axes[0, 0]
turnover_data = []
for product in products:
    p = inventory_df[inventory_df['product'] == product]
    t = p['sold'].sum() / p['inventory_end'].mean() if p['inventory_end'].mean() > 0 else 0
    turnover_data.append(t)
colors_t = ['#4CAF50' if t >= 20 else '#FF9800' if t >= 10 else '#F44336' for t in turnover_data]
ax1.barh(products, turnover_data, color=colors_t)
ax1.set_xlabel('Turnover (x/year)')
ax1.set_title('Inventory Turnover by Product')

# Chart 2: Daily Demand Trend (top product)
ax2 = axes[0, 1]
top_product = 'Sữa tươi'
tp = inventory_df[inventory_df['product'] == top_product].set_index('date')
ax2.plot(tp.index, tp['demand'].rolling(7).mean(), color='#2196F3', linewidth=1.5, label='Demand (7-day MA)')
ax2.plot(tp.index, tp['sold'].rolling(7).mean(), color='#4CAF50', linewidth=1.5, label='Sold (7-day MA)', alpha=0.7)
ax2.set_title(f'Demand vs Sold Trend — {top_product}')
ax2.legend(fontsize=8)
ax2.tick_params(axis='x', rotation=45)

# Chart 3: Fill Rate by Product
ax3 = axes[1, 0]
fill_rates = []
for product in products:
    p = inventory_df[inventory_df['product'] == product]
    fr = p['sold'].sum() / p['demand'].sum() * 100
    fill_rates.append(fr)
colors_fr = ['#4CAF50' if fr >= 95 else '#FF9800' if fr >= 90 else '#F44336' for fr in fill_rates]
bars = ax3.bar(range(len(products)), fill_rates, color=colors_fr)
ax3.set_xticks(range(len(products)))
ax3.set_xticklabels(products, rotation=45, ha='right', fontsize=8)
ax3.set_ylabel('Fill Rate (%)')
ax3.set_title('Fill Rate by Product')
ax3.axhline(y=95, color='green', linestyle='--', alpha=0.5, label='Target 95%')
ax3.legend()
ax3.set_ylim(80, 101)

# Chart 4: Supplier Score
ax4 = axes[1, 1]
sup_names = [s.split(' ')[0] for s in sup_perf.index]
colors_sup = ['#4CAF50' if s >= 75 else '#FF9800' if s >= 65 else '#F44336' for s in sup_perf['score']]
ax4.barh(sup_names, sup_perf['score'], color=colors_sup)
ax4.set_xlabel('Composite Score')
ax4.set_title('Supplier Scorecard')
ax4.axvline(x=75, color='green', linestyle='--', alpha=0.5)

plt.tight_layout()
plt.savefig('SupplyChain_Dashboard.png', dpi=150, bbox_inches='tight')
plt.show()
print("✅ Dashboard saved: SupplyChain_Dashboard.png")

python

# ============================================
# EXECUTIVE SUMMARY — Supply Chain
# ============================================
total_demand = inventory_df['demand'].sum()
total_sold = inventory_df['sold'].sum()
total_waste = inventory_df['waste'].sum()
overall_fill = total_sold / total_demand * 100
total_stockout_days = inventory_df['stockout'].sum()

print("=" * 60)
print("📋 EXECUTIVE SUMMARY — Supply Chain Analytics")
print("=" * 60)
print(f"\n📦 Overall Performance (12 months):")
print(f"   Total Demand:     {total_demand:>12,} units")
print(f"   Total Sold:       {total_sold:>12,} units")
print(f"   Overall Fill Rate: {overall_fill:>10.1f}%")
print(f"   Total Waste:      {total_waste:>12,} units")
print(f"   Stockout Days:    {total_stockout_days:>12,} (across all products)")

# Worst performing product
worst_fill = 100
worst_product = ''
for product in products:
    p = inventory_df[inventory_df['product'] == product]
    fr = p['sold'].sum() / p['demand'].sum() * 100
    if fr < worst_fill:
        worst_fill = fr
        worst_product = product

print(f"\n⚠️ Attention:")
print(f"   Lowest fill rate: {worst_product} ({worst_fill:.1f}%)")
print(f"   Waste cost (est): {total_waste * 25000 / 1e6:,.0f}M VND")

# Best & worst supplier
best_sup = sup_perf.index[0]
worst_sup = sup_perf.index[-1]
print(f"\n🏆 Best Supplier:  {best_sup} (Score: {sup_perf.loc[best_sup, 'score']:.0f})")
print(f"🔴 Worst Supplier: {worst_sup} (Score: {sup_perf.loc[worst_sup, 'score']:.0f})")

print(f"\n💡 Recommendations:")
print(f"   1. Increase safety stock for {worst_product} — reduce stockouts")
print(f"   2. Negotiate improvement plan with {worst_sup}")
print(f"   3. Implement dynamic safety stock based on demand variability")
print(f"   4. Add weather data to demand forecasting model")

✅ Deliverables Checklist

Trước khi submit, kiểm tra:

✅ Dataset created (chosen: Marketing / Finance / Supply Chain)
✅ KPI Calculation
   ├── Marketing: CTR, CVR, CPC, CPA, ROAS + RFM segments
   ├── Finance: Revenue/Profit variance, Gross Margin, Financial Ratios
   └── Supply Chain: Turnover, DOS, Fill Rate, Waste Rate, Supplier Score
✅ Mini Dashboard (4 charts, saved as PNG)
✅ Executive Summary (printed — key metrics + insights + recommendations)
✅ At least 3 actionable recommendations

Tiêu chí đánh giá

Tiêu chí	Trọng số	Mô tả
KPI Calculation	25%	Tính đúng domain-specific KPIs, giải thích ý nghĩa
Analysis Depth	25%	Không chỉ tính KPI mà còn breakdown, find patterns
Dashboard	25%	4+ charts relevant, clear, well-labeled
Executive Summary	25%	Concise, has insight + data-backed recommendations

💡 Nâng cao (tùy chọn)

Nếu hoàn thành sớm, thử thêm:

Marketing: Attribution simulation — compare ROAS under different attribution models
Finance: Cash flow forecast 6 months + scenario analysis (best/base/worst)
Supply Chain: Build simple demand forecasting with exponential smoothing
Cross-industry: Do 2 datasets → compare insights across industries

🛠 Workshop — Phân Tích Dữ Liệu Ngành: Marketing / Finance / Supply Chain ​

🎯 Mục tiêu workshop ​

🧰 Yêu cầu ​

📦 Chọn Dataset ​

🅰️ Dataset A: Marketing Campaign Analytics ​

Phần A0: Tạo Dataset (10 phút) ​

Phần A1: Channel Performance Analysis (20 phút) ​

Phần A2: RFM Segmentation (25 phút) ​

Phần A3: Mini Dashboard + Executive Summary (30 phút) ​

🅱️ Dataset B: Financial Statements Analytics ​

Phần B0: Tạo Dataset (10 phút) ​

Phần B1: Variance Analysis (25 phút) ​

Phần B2: Financial Ratios (20 phút) ​

Phần B3: Mini Dashboard + Executive Summary (30 phút) ​

🅲️ Dataset C: Supply Chain & Inventory Analytics ​

Phần C0: Tạo Dataset (10 phút) ​

Phần C1: Inventory KPIs (20 phút) ​

Phần C2: Demand Pattern & Supplier Scorecard (25 phút) ​

Phần C3: Mini Dashboard + Executive Summary (30 phút) ​

✅ Deliverables Checklist ​

Tiêu chí đánh giá ​

🛠 Workshop — Phân Tích Dữ Liệu Ngành: Marketing / Finance / Supply Chain

🎯 Mục tiêu workshop

🧰 Yêu cầu

📦 Chọn Dataset

🅰️ Dataset A: Marketing Campaign Analytics

Phần A0: Tạo Dataset (10 phút)

Phần A1: Channel Performance Analysis (20 phút)

Phần A2: RFM Segmentation (25 phút)

Phần A3: Mini Dashboard + Executive Summary (30 phút)

🅱️ Dataset B: Financial Statements Analytics

Phần B0: Tạo Dataset (10 phút)

Phần B1: Variance Analysis (25 phút)

Phần B2: Financial Ratios (20 phút)

Phần B3: Mini Dashboard + Executive Summary (30 phút)

🅲️ Dataset C: Supply Chain & Inventory Analytics

Phần C0: Tạo Dataset (10 phút)

Phần C1: Inventory KPIs (20 phút)

Phần C2: Demand Pattern & Supplier Scorecard (25 phút)

Phần C3: Mini Dashboard + Executive Summary (30 phút)

✅ Deliverables Checklist

Tiêu chí đánh giá