Appearance
🛠 Workshop — Phân Tích Dữ Liệu Ngành: Marketing / Finance / Supply Chain
Chọn 1 of 3 datasets → Xác định KPI phù hợp → Phân tích end-to-end → Mini dashboard + executive summary. Output: Analysis notebook + mini dashboard + insight report!
🎯 Mục tiêu workshop
Sau khi hoàn thành workshop này, bạn sẽ:
- Chọn dataset theo ngành và xác định KPI phù hợp cho domain
- Phân tích end-to-end — từ data cleaning đến insight
- Xây mini dashboard với 4-6 charts quan trọng nhất
- Viết executive summary — tóm tắt insight + recommendation
- So sánh với KPI benchmark ngành
🧰 Yêu cầu
| Yêu cầu | Chi tiết |
|---|---|
| Kiến thức | Đã hoàn thành Buổi 14 lý thuyết (Industry Case Studies) |
| Python | pandas, matplotlib, seaborn (đã học Buổi 7-10) |
| Thời gian | 90–120 phút |
| Output | Analysis notebook + mini dashboard (PNG) + executive summary |
💡 Naming convention
Đặt tên file: HoTen_Buoi14_[Marketing|Finance|SupplyChain].ipynb
📦 Chọn Dataset
Bạn chọn 1 trong 3 datasets dưới đây. Mỗi dataset mô phỏng dữ liệu thực tế của ngành:
| # | Dataset | Ngành | Bài toán chính | Difficulty |
|---|---|---|---|---|
| 🅰️ | Marketing Campaign | Marketing Analytics | Channel attribution, RFM, ROAS optimization | ⭐⭐ Medium |
| 🅱️ | Financial Statements | Finance Analytics | Variance analysis, financial ratios, P&L | ⭐⭐⭐ Hard |
| 🅲️ | Inventory & Demand | Supply Chain Analytics | Turnover, demand forecast, fill rate | ⭐⭐ Medium |
🅰️ Dataset A: Marketing Campaign Analytics
Phần A0: Tạo Dataset (10 phút)
python
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
np.random.seed(14)
# ============================================
# TABLE 1: CAMPAIGNS (5 channels, 12 months)
# ============================================
channels = ['Facebook', 'Google', 'TikTok', 'Email', 'SEO']
months = pd.date_range('2025-03-01', '2026-02-01', freq='MS')
campaign_records = []
for month in months:
for ch in channels:
base_spend = {'Facebook': 180, 'Google': 150, 'TikTok': 120, 'Email': 30, 'SEO': 50}[ch]
spend = int(base_spend * (1 + 0.1 * np.sin(month.month * np.pi / 6) + np.random.normal(0, 0.1)) * 1_000_000)
base_ctr = {'Facebook': 0.035, 'Google': 0.055, 'TikTok': 0.025, 'Email': 0.12, 'SEO': 0.04}[ch]
impressions = int(spend / ({'Facebook': 8, 'Google': 15, 'TikTok': 5, 'Email': 2, 'SEO': 3}[ch]))
clicks = int(impressions * base_ctr * np.random.uniform(0.8, 1.2))
base_cvr = {'Facebook': 0.045, 'Google': 0.065, 'TikTok': 0.025, 'Email': 0.08, 'SEO': 0.055}[ch]
conversions = int(clicks * base_cvr * np.random.uniform(0.8, 1.2))
aov = int(np.random.normal(320000, 50000))
revenue = conversions * aov
campaign_records.append({
'month': month, 'channel': ch, 'spend': spend,
'impressions': impressions, 'clicks': clicks,
'conversions': conversions, 'revenue': revenue, 'aov': aov
})
campaigns = pd.DataFrame(campaign_records)
print(f"✅ Campaigns: {len(campaigns)} rows")
print(campaigns.head(10))python
# ============================================
# TABLE 2: CUSTOMERS for RFM (3,000 customers)
# ============================================
n_customers = 3000
customers = pd.DataFrame({
'customer_id': [f'CUST{i:04d}' for i in range(1, n_customers + 1)],
'first_purchase': pd.date_range('2024-06-01', '2025-12-31', periods=n_customers),
'channel': np.random.choice(channels, n_customers, p=[0.30, 0.25, 0.15, 0.10, 0.20]),
'total_orders': np.random.poisson(lam=4, size=n_customers) + 1,
'total_revenue': np.random.lognormal(mean=13.2, sigma=0.7, size=n_customers).astype(int),
'last_purchase': pd.NaT
})
# Generate realistic last purchase dates
for idx in customers.index:
days_active = np.random.exponential(scale=120)
customers.loc[idx, 'last_purchase'] = customers.loc[idx, 'first_purchase'] + timedelta(days=int(days_active))
if customers.loc[idx, 'last_purchase'] > pd.Timestamp('2026-02-15'):
customers.loc[idx, 'last_purchase'] = pd.Timestamp('2026-02-15') - timedelta(days=np.random.randint(1, 30))
print(f"\n✅ Customers: {len(customers)} rows")
print(customers.head())Phần A1: Channel Performance Analysis (20 phút)
python
# ============================================
# TASK A1: Calculate KPIs per channel
# ============================================
# Aggregate by channel
channel_perf = campaigns.groupby('channel').agg(
total_spend=('spend', 'sum'),
total_impressions=('impressions', 'sum'),
total_clicks=('clicks', 'sum'),
total_conversions=('conversions', 'sum'),
total_revenue=('revenue', 'sum')
).reset_index()
# Calculate KPIs
channel_perf['CTR'] = (channel_perf['total_clicks'] / channel_perf['total_impressions'] * 100).round(2)
channel_perf['CVR'] = (channel_perf['total_conversions'] / channel_perf['total_clicks'] * 100).round(2)
channel_perf['CPC'] = (channel_perf['total_spend'] / channel_perf['total_clicks']).round(0)
channel_perf['CPA'] = (channel_perf['total_spend'] / channel_perf['total_conversions']).round(0)
channel_perf['ROAS'] = (channel_perf['total_revenue'] / channel_perf['total_spend']).round(2)
print("📊 CHANNEL PERFORMANCE SUMMARY")
print("=" * 80)
for _, row in channel_perf.iterrows():
print(f" {row['channel']:10s} | CTR: {row['CTR']:5.2f}% | CVR: {row['CVR']:5.2f}% | CPC: {row['CPC']:>7,.0f} | CPA: {row['CPA']:>9,.0f} | ROAS: {row['ROAS']:.2f}x")
# Best & worst channels
best = channel_perf.loc[channel_perf['ROAS'].idxmax(), 'channel']
worst = channel_perf.loc[channel_perf['ROAS'].idxmin(), 'channel']
print(f"\n🏆 Best ROAS: {best} | 🔴 Worst ROAS: {worst}")Phần A2: RFM Segmentation (25 phút)
python
# ============================================
# TASK A2: RFM Analysis
# ============================================
ref_date = pd.Timestamp('2026-02-15')
customers['recency_days'] = (ref_date - customers['last_purchase']).dt.days
customers['R'] = pd.qcut(customers['recency_days'], q=5, labels=[5,4,3,2,1]).astype(int)
customers['F'] = pd.qcut(customers['total_orders'].rank(method='first'), q=5, labels=[1,2,3,4,5]).astype(int)
customers['M'] = pd.qcut(customers['total_revenue'], q=5, labels=[1,2,3,4,5]).astype(int)
# Segment mapping
def rfm_segment(row):
if row['R'] >= 4 and row['F'] >= 4 and row['M'] >= 4:
return 'Champions'
elif row['R'] >= 4 and row['F'] >= 3:
return 'Loyal Customers'
elif row['R'] >= 4 and row['F'] <= 2:
return 'New Customers'
elif row['R'] >= 3 and row['F'] >= 3:
return 'Potential Loyalists'
elif row['R'] <= 2 and row['F'] >= 3:
return 'At Risk'
elif row['R'] <= 2 and row['F'] >= 4 and row['M'] >= 4:
return 'Cant Lose Them'
elif row['R'] <= 2 and row['F'] <= 2:
return 'Lost'
else:
return 'Need Attention'
customers['segment'] = customers.apply(rfm_segment, axis=1)
# Summary
rfm_summary = customers.groupby('segment').agg(
count=('customer_id', 'count'),
avg_recency=('recency_days', 'mean'),
avg_frequency=('total_orders', 'mean'),
avg_monetary=('total_revenue', 'mean')
).round(0).sort_values('avg_monetary', ascending=False)
print("📊 RFM SEGMENT SUMMARY")
print("=" * 75)
for seg, row in rfm_summary.iterrows():
print(f" {seg:20s} | N: {row['count']:>5.0f} | R: {row['avg_recency']:>5.0f}d | F: {row['avg_frequency']:>4.0f} | M: {row['avg_monetary']:>10,.0f}")
# Actionable segments
at_risk = customers[customers['segment'].isin(['At Risk', 'Cant Lose Them'])]
print(f"\n🚨 ACTION NEEDED: {len(at_risk)} customers in At Risk / Cant Lose Them segments")
print(f" → Estimated revenue at risk: {at_risk['total_revenue'].sum():,.0f} VND")Phần A3: Mini Dashboard + Executive Summary (30 phút)
python
# ============================================
# TASK A3: Marketing Mini Dashboard
# ============================================
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
fig.suptitle('Marketing Analytics Dashboard — Brand Performance', fontsize=16, fontweight='bold')
# Chart 1: ROAS by Channel
ax1 = axes[0, 0]
colors = ['#4CAF50' if x >= 4 else '#FF9800' if x >= 2 else '#F44336' for x in channel_perf['ROAS']]
ax1.barh(channel_perf['channel'], channel_perf['ROAS'], color=colors)
ax1.set_xlabel('ROAS (x)')
ax1.set_title('ROAS by Channel')
ax1.axvline(x=4, color='green', linestyle='--', alpha=0.5, label='Target 4x')
ax1.legend()
# Chart 2: Monthly Revenue Trend
ax2 = axes[0, 1]
monthly_rev = campaigns.groupby('month')['revenue'].sum() / 1_000_000
ax2.plot(monthly_rev.index, monthly_rev.values, marker='o', color='#2196F3', linewidth=2)
ax2.set_title('Monthly Revenue Trend (M VND)')
ax2.tick_params(axis='x', rotation=45)
ax2.set_ylabel('Revenue (M VND)')
# Chart 3: RFM Segment Distribution
ax3 = axes[1, 0]
seg_counts = customers['segment'].value_counts()
colors_rfm = plt.cm.Set3(np.linspace(0, 1, len(seg_counts)))
ax3.pie(seg_counts.values, labels=seg_counts.index, autopct='%1.0f%%', colors=colors_rfm, textprops={'fontsize': 8})
ax3.set_title('Customer Segments (RFM)')
# Chart 4: Spend vs Revenue by Channel
ax4 = axes[1, 1]
x = channel_perf['total_spend'] / 1_000_000
y = channel_perf['total_revenue'] / 1_000_000
ax4.scatter(x, y, s=200, c='#9C27B0', alpha=0.7)
for i, ch in enumerate(channel_perf['channel']):
ax4.annotate(ch, (x.iloc[i], y.iloc[i]), fontsize=9, ha='center', va='bottom')
ax4.set_xlabel('Total Spend (M VND)')
ax4.set_ylabel('Total Revenue (M VND)')
ax4.set_title('Spend vs Revenue by Channel')
# Break-even line
max_val = max(x.max(), y.max())
ax4.plot([0, max_val], [0, max_val], 'r--', alpha=0.3, label='Break-even')
ax4.legend()
plt.tight_layout()
plt.savefig('Marketing_Dashboard.png', dpi=150, bbox_inches='tight')
plt.show()
print("✅ Dashboard saved: Marketing_Dashboard.png")python
# ============================================
# EXECUTIVE SUMMARY — Marketing
# ============================================
total_spend = campaigns['spend'].sum()
total_revenue = campaigns['revenue'].sum()
blended_roas = total_revenue / total_spend
print("=" * 60)
print("📋 EXECUTIVE SUMMARY — Marketing Analytics")
print("=" * 60)
print(f"\n📊 Overall Performance:")
print(f" Total Spend: {total_spend/1e9:>8.2f} tỷ VND")
print(f" Total Revenue: {total_revenue/1e9:>8.2f} tỷ VND")
print(f" Blended ROAS: {blended_roas:>8.2f}x")
print(f"\n🎯 Channel Insights:")
for _, row in channel_perf.sort_values('ROAS', ascending=False).iterrows():
status = '🟢' if row['ROAS'] >= 4 else '🟡' if row['ROAS'] >= 2 else '🔴'
print(f" {status} {row['channel']:10s}: ROAS {row['ROAS']:.2f}x | CPA {row['CPA']:>9,.0f} VND")
print(f"\n👥 Customer Segments:")
print(f" Champions: {len(customers[customers['segment']=='Champions']):>5} customers")
print(f" At Risk + Cant Lose:{len(at_risk):>5} customers (ACTION NEEDED)")
print(f" Revenue at risk: {at_risk['total_revenue'].sum()/1e6:>8,.0f}M VND")
print(f"\n💡 Recommendations:")
print(f" 1. Increase budget for {best} (highest ROAS)")
print(f" 2. Review {worst} — optimize or reduce spend")
print(f" 3. Launch win-back campaign for {len(at_risk)} At Risk customers")
print(f" 4. Invest in Champions referral program")🅱️ Dataset B: Financial Statements Analytics
Phần B0: Tạo Dataset (10 phút)
python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
np.random.seed(14)
# ============================================
# TABLE 1: MONTHLY P&L (12 months, Budget vs Actual)
# ============================================
months = pd.date_range('2025-03-01', '2026-02-01', freq='MS')
departments = ['Sales', 'Marketing', 'Engineering', 'Operations', 'HR', 'Finance']
pnl_records = []
for month in months:
# Revenue (seasonal pattern)
base_rev = 8500 # base monthly revenue in triệu VND
seasonal = 1 + 0.15 * np.sin((month.month - 1) * np.pi / 6)
budget_rev = int(base_rev * seasonal)
actual_rev = int(budget_rev * np.random.normal(1.0, 0.06))
# COGS (60-65% of revenue)
budget_cogs = int(budget_rev * 0.62)
actual_cogs = int(actual_rev * np.random.uniform(0.60, 0.66))
# OpEx by department
for dept in departments:
base_opex = {'Sales': 450, 'Marketing': 380, 'Engineering': 520,
'Operations': 280, 'HR': 150, 'Finance': 120}[dept]
budget_opex = int(base_opex * (1 + 0.005 * (month.month - 3)))
actual_opex = int(budget_opex * np.random.normal(1.0, 0.08))
pnl_records.append({
'month': month, 'department': dept,
'budget_revenue': budget_rev if dept == 'Sales' else 0,
'actual_revenue': actual_rev if dept == 'Sales' else 0,
'budget_cogs': budget_cogs if dept == 'Sales' else 0,
'actual_cogs': actual_cogs if dept == 'Sales' else 0,
'budget_opex': budget_opex,
'actual_opex': actual_opex
})
pnl = pd.DataFrame(pnl_records)
print(f"✅ P&L Data: {len(pnl)} rows")
print(pnl.head(10))python
# ============================================
# TABLE 2: BALANCE SHEET (Quarterly)
# ============================================
quarters = ['Q2-2025', 'Q3-2025', 'Q4-2025', 'Q1-2026']
balance_sheet = pd.DataFrame({
'quarter': quarters,
'cash': [12500, 11800, 10200, 13500],
'receivables': [8500, 9200, 11500, 8800],
'inventory': [6200, 5800, 7500, 5900],
'fixed_assets': [25000, 26500, 28000, 27500],
'total_assets': [52200, 53300, 57200, 55700],
'current_liabilities': [9800, 10500, 12800, 10200],
'long_term_debt': [15000, 14500, 14000, 13500],
'equity': [27400, 28300, 30400, 32000]
})
# Units: triệu VND
print(f"\n✅ Balance Sheet: {len(balance_sheet)} rows")
print(balance_sheet)Phần B1: Variance Analysis (25 phút)
python
# ============================================
# TASK B1: Monthly Variance Analysis
# ============================================
# Aggregate monthly P&L
monthly_pnl = pnl.groupby('month').agg(
budget_revenue=('budget_revenue', 'sum'),
actual_revenue=('actual_revenue', 'sum'),
budget_cogs=('budget_cogs', 'sum'),
actual_cogs=('actual_cogs', 'sum'),
budget_opex=('budget_opex', 'sum'),
actual_opex=('actual_opex', 'sum')
).reset_index()
monthly_pnl['budget_gross_profit'] = monthly_pnl['budget_revenue'] - monthly_pnl['budget_cogs']
monthly_pnl['actual_gross_profit'] = monthly_pnl['actual_revenue'] - monthly_pnl['actual_cogs']
monthly_pnl['budget_net_profit'] = monthly_pnl['budget_gross_profit'] - monthly_pnl['budget_opex']
monthly_pnl['actual_net_profit'] = monthly_pnl['actual_gross_profit'] - monthly_pnl['actual_opex']
# Variance
monthly_pnl['rev_variance'] = monthly_pnl['actual_revenue'] - monthly_pnl['budget_revenue']
monthly_pnl['rev_var_pct'] = (monthly_pnl['rev_variance'] / monthly_pnl['budget_revenue'] * 100).round(1)
monthly_pnl['profit_variance'] = monthly_pnl['actual_net_profit'] - monthly_pnl['budget_net_profit']
monthly_pnl['profit_var_pct'] = (monthly_pnl['profit_variance'] / monthly_pnl['budget_net_profit'] * 100).round(1)
print("📊 MONTHLY VARIANCE ANALYSIS")
print("=" * 85)
print(f"{'Month':>10s} | {'Rev Budget':>10s} | {'Rev Actual':>10s} | {'Rev Var%':>8s} | {'Profit Bgt':>10s} | {'Profit Act':>10s} | {'Profit Var%':>10s}")
print("-" * 85)
for _, r in monthly_pnl.iterrows():
rflag = '🟢' if r['rev_var_pct'] >= 0 else '🔴'
pflag = '🟢' if r['profit_var_pct'] >= 0 else '🔴'
print(f"{r['month'].strftime('%Y-%m'):>10s} | {r['budget_revenue']:>10,} | {r['actual_revenue']:>10,} | {rflag}{r['rev_var_pct']:>+5.1f}% | {r['budget_net_profit']:>10,} | {r['actual_net_profit']:>10,} | {pflag}{r['profit_var_pct']:>+7.1f}%")Phần B2: Financial Ratios (20 phút)
python
# ============================================
# TASK B2: Financial Ratios from Balance Sheet
# ============================================
print("📊 FINANCIAL RATIOS — Quarterly")
print("=" * 75)
for _, row in balance_sheet.iterrows():
current_ratio = (row['cash'] + row['receivables'] + row['inventory']) / row['current_liabilities']
quick_ratio = (row['cash'] + row['receivables']) / row['current_liabilities']
debt_equity = (row['current_liabilities'] + row['long_term_debt']) / row['equity']
roe = 0 # Would need net income per quarter
print(f"\n 📅 {row['quarter']}:")
print(f" Current Ratio: {current_ratio:.2f} {'🟢' if current_ratio >= 1.5 else '🟡' if current_ratio >= 1 else '🔴'}")
print(f" Quick Ratio: {quick_ratio:.2f} {'🟢' if quick_ratio >= 1 else '🔴'}")
print(f" Debt/Equity: {debt_equity:.2f} {'🟢' if debt_equity <= 1.5 else '🔴'}")
print(f" Total Assets: {row['total_assets']:>10,}M VND")Phần B3: Mini Dashboard + Executive Summary (30 phút)
python
# ============================================
# TASK B3: Finance Mini Dashboard
# ============================================
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
fig.suptitle('Financial Analytics Dashboard — Company Performance', fontsize=16, fontweight='bold')
# Chart 1: Revenue — Budget vs Actual
ax1 = axes[0, 0]
x_months = range(len(monthly_pnl))
ax1.bar([x - 0.2 for x in x_months], monthly_pnl['budget_revenue'], 0.4, label='Budget', color='#90CAF9', alpha=0.8)
ax1.bar([x + 0.2 for x in x_months], monthly_pnl['actual_revenue'], 0.4, label='Actual', color='#1565C0', alpha=0.8)
ax1.set_title('Revenue: Budget vs Actual')
ax1.set_ylabel('Revenue (M VND)')
ax1.legend()
ax1.set_xticks(x_months)
ax1.set_xticklabels([m.strftime('%b') for m in monthly_pnl['month']], rotation=45)
# Chart 2: Profit Variance Trend
ax2 = axes[0, 1]
colors_var = ['#4CAF50' if v >= 0 else '#F44336' for v in monthly_pnl['profit_variance']]
ax2.bar(x_months, monthly_pnl['profit_variance'], color=colors_var)
ax2.set_title('Monthly Profit Variance (M VND)')
ax2.axhline(y=0, color='black', linewidth=0.5)
ax2.set_xticks(x_months)
ax2.set_xticklabels([m.strftime('%b') for m in monthly_pnl['month']], rotation=45)
# Chart 3: Gross Margin Trend
ax3 = axes[1, 0]
monthly_pnl['actual_gm_pct'] = (monthly_pnl['actual_gross_profit'] / monthly_pnl['actual_revenue'] * 100)
monthly_pnl['budget_gm_pct'] = (monthly_pnl['budget_gross_profit'] / monthly_pnl['budget_revenue'] * 100)
ax3.plot(x_months, monthly_pnl['budget_gm_pct'], '--', color='gray', label='Budget GM%')
ax3.plot(x_months, monthly_pnl['actual_gm_pct'], '-o', color='#FF9800', label='Actual GM%')
ax3.set_title('Gross Margin % Trend')
ax3.set_ylabel('Gross Margin (%)')
ax3.legend()
ax3.set_xticks(x_months)
ax3.set_xticklabels([m.strftime('%b') for m in monthly_pnl['month']], rotation=45)
# Chart 4: Balance Sheet — Liquidity Ratios
ax4 = axes[1, 1]
current_ratios = [(row['cash'] + row['receivables'] + row['inventory']) / row['current_liabilities']
for _, row in balance_sheet.iterrows()]
quick_ratios = [(row['cash'] + row['receivables']) / row['current_liabilities']
for _, row in balance_sheet.iterrows()]
x_q = range(len(quarters))
ax4.plot(x_q, current_ratios, '-o', label='Current Ratio', color='#2196F3', linewidth=2)
ax4.plot(x_q, quick_ratios, '-s', label='Quick Ratio', color='#4CAF50', linewidth=2)
ax4.axhline(y=1.5, color='blue', linestyle='--', alpha=0.3, label='Current target (1.5)')
ax4.axhline(y=1.0, color='green', linestyle='--', alpha=0.3, label='Quick target (1.0)')
ax4.set_title('Liquidity Ratios')
ax4.set_xticks(x_q)
ax4.set_xticklabels(quarters)
ax4.legend(fontsize=8)
plt.tight_layout()
plt.savefig('Finance_Dashboard.png', dpi=150, bbox_inches='tight')
plt.show()
print("✅ Dashboard saved: Finance_Dashboard.png")python
# ============================================
# EXECUTIVE SUMMARY — Finance
# ============================================
total_budget_rev = monthly_pnl['budget_revenue'].sum()
total_actual_rev = monthly_pnl['actual_revenue'].sum()
total_budget_profit = monthly_pnl['budget_net_profit'].sum()
total_actual_profit = monthly_pnl['actual_net_profit'].sum()
rev_var = (total_actual_rev - total_budget_rev) / total_budget_rev * 100
profit_var = (total_actual_profit - total_budget_profit) / total_budget_profit * 100
months_miss_rev = (monthly_pnl['rev_var_pct'] < 0).sum()
months_miss_profit = (monthly_pnl['profit_var_pct'] < 0).sum()
print("=" * 60)
print("📋 EXECUTIVE SUMMARY — Finance Analytics")
print("=" * 60)
print(f"\n📊 Annual Performance (12 months):")
print(f" Revenue: {total_actual_rev/1000:>8,.0f} tỷ vs Budget {total_budget_rev/1000:>8,.0f} tỷ ({rev_var:+.1f}%)")
print(f" Profit: {total_actual_profit/1000:>8,.0f} tỷ vs Budget {total_budget_profit/1000:>8,.0f} tỷ ({profit_var:+.1f}%)")
print(f" Rev miss months: {months_miss_rev}/12")
print(f" Profit miss months: {months_miss_profit}/12")
print(f"\n📊 Liquidity (Latest Quarter):")
print(f" Current Ratio: {current_ratios[-1]:.2f}")
print(f" Quick Ratio: {quick_ratios[-1]:.2f}")
# Identify worst variance department
dept_variance = pnl.groupby('department').agg(
budget_opex=('budget_opex', 'sum'),
actual_opex=('actual_opex', 'sum')
).reset_index()
dept_variance['var_pct'] = ((dept_variance['actual_opex'] - dept_variance['budget_opex']) / dept_variance['budget_opex'] * 100).round(1)
worst_dept = dept_variance.loc[dept_variance['var_pct'].idxmax()]
print(f"\n⚠️ Cost Alert:")
print(f" Highest overrun: {worst_dept['department']} ({worst_dept['var_pct']:+.1f}% vs budget)")
print(f"\n💡 Recommendations:")
print(f" 1. Deep dive {worst_dept['department']} cost overrun — identify root cause")
print(f" 2. Monitor gross margin trend — {'DECLINING' if monthly_pnl['actual_gm_pct'].iloc[-1] < monthly_pnl['actual_gm_pct'].iloc[0] else 'STABLE'}")
print(f" 3. {'Strengthen receivables collection' if balance_sheet['receivables'].iloc[-1] > balance_sheet['receivables'].iloc[0] else 'Receivables well managed'}")
print(f" 4. Reforecast remaining quarters with updated assumptions")🅲️ Dataset C: Supply Chain & Inventory Analytics
Phần C0: Tạo Dataset (10 phút)
python
import pandas as pd
import numpy as np
from datetime import timedelta
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
np.random.seed(14)
# ============================================
# TABLE 1: DAILY INVENTORY & SALES (365 days, 6 products)
# ============================================
products = ['Sữa tươi', 'Sữa chua', 'Nước trái cây', 'Sữa bột', 'Sữa đặc', 'Yogurt drink']
shelf_lives = {'Sữa tươi': 10, 'Sữa chua': 21, 'Nước trái cây': 90,
'Sữa bột': 365, 'Sữa đặc': 180, 'Yogurt drink': 14}
base_demands = {'Sữa tươi': 1200, 'Sữa chua': 800, 'Nước trái cây': 600,
'Sữa bột': 300, 'Sữa đặc': 400, 'Yogurt drink': 500}
dates = pd.date_range('2025-03-01', '2026-02-28')
inv_records = []
for product in products:
inventory = int(base_demands[product] * 5) # Start with 5 days stock
for date in dates:
# Demand with seasonality + day-of-week + noise
seasonal = 1 + 0.2 * np.sin((date.month - 1) * np.pi / 6)
dow_factor = 1.3 if date.dayofweek >= 5 else 1.0 # Weekend boost
demand = int(base_demands[product] * seasonal * dow_factor * np.random.uniform(0.8, 1.2))
# Replenishment (every 3-7 days depending on product)
replenish_freq = max(3, shelf_lives[product] // 3)
received = int(base_demands[product] * replenish_freq * 0.9) if date.day % replenish_freq == 0 else 0
# Sales = min(demand, inventory)
sold = min(demand, inventory + received)
stockout = 1 if demand > inventory + received else 0
waste = max(0, int(inventory * 0.01)) if shelf_lives[product] <= 21 else 0 # Perishable waste
inventory = max(0, inventory + received - sold - waste)
inv_records.append({
'date': date, 'product': product,
'demand': demand, 'sold': sold, 'received': received,
'inventory_end': inventory, 'stockout': stockout,
'waste': waste, 'shelf_life': shelf_lives[product]
})
inventory_df = pd.DataFrame(inv_records)
print(f"✅ Inventory Data: {len(inventory_df)} rows")
print(inventory_df.head(10))python
# ============================================
# TABLE 2: SUPPLIER PERFORMANCE (6 suppliers, 12 months)
# ============================================
suppliers = ['NCC_A (Sữa tươi)', 'NCC_B (Sữa chua)', 'NCC_C (Nước ép)',
'NCC_D (Sữa bột)', 'NCC_E (Sữa đặc)', 'NCC_F (Yogurt)']
supplier_records = []
for month in pd.date_range('2025-03-01', '2026-02-01', freq='MS'):
for sup in suppliers:
on_time = np.random.uniform(0.85, 0.99)
quality = np.random.uniform(0.95, 1.0)
cost_idx = np.random.uniform(0.9, 1.1) # vs market average
supplier_records.append({
'month': month, 'supplier': sup,
'on_time_rate': round(on_time, 3),
'quality_rate': round(quality, 3),
'cost_index': round(cost_idx, 2),
'orders_placed': np.random.randint(8, 20)
})
suppliers_df = pd.DataFrame(supplier_records)
print(f"\n✅ Supplier Data: {len(suppliers_df)} rows")
print(suppliers_df.head())Phần C1: Inventory KPIs (20 phút)
python
# ============================================
# TASK C1: Inventory Metrics per Product
# ============================================
print("📦 INVENTORY KPI DASHBOARD")
print("=" * 90)
for product in products:
p = inventory_df[inventory_df['product'] == product]
total_sold = p['sold'].sum()
total_demand = p['demand'].sum()
avg_inventory = p['inventory_end'].mean()
total_waste = p['waste'].sum()
stockout_days = p['stockout'].sum()
turnover = total_sold / avg_inventory if avg_inventory > 0 else 0
dos = avg_inventory / (total_sold / 365) if total_sold > 0 else 0
fill_rate = total_sold / total_demand * 100
waste_rate = total_waste / (total_sold + total_waste) * 100 if (total_sold + total_waste) > 0 else 0
shelf = shelf_lives[product]
risk = '🔴 WASTE RISK' if dos > shelf * 0.5 else '🟢 OK'
print(f"\n 📦 {product:15s} (Shelf life: {shelf}d)")
print(f" Turnover: {turnover:>6.1f}x/year")
print(f" Days of Supply: {dos:>5.1f} days {risk}")
print(f" Fill Rate: {fill_rate:>6.1f}%")
print(f" Waste Rate: {waste_rate:>6.2f}%")
print(f" Stockout Days: {stockout_days:>4d}/{len(p)} days ({stockout_days/len(p)*100:.1f}%)")Phần C2: Demand Pattern & Supplier Scorecard (25 phút)
python
# ============================================
# TASK C2: Demand Pattern Analysis
# ============================================
# Weekly demand pattern (day of week)
inventory_df['dow'] = inventory_df['date'].dt.day_name()
dow_demand = inventory_df.groupby(['product', 'dow'])['demand'].mean().unstack()
dow_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
dow_demand = dow_demand[dow_order]
print("📊 DEMAND BY DAY OF WEEK (Average daily demand)")
print("=" * 100)
print(dow_demand.round(0).to_string())
# Monthly seasonality
inventory_df['month_name'] = inventory_df['date'].dt.month_name()
monthly_demand = inventory_df.groupby(['product', inventory_df['date'].dt.month])['demand'].mean()
print("\n\n📊 SUPPLIER SCORECARD")
print("=" * 80)
# Aggregate supplier performance
sup_perf = suppliers_df.groupby('supplier').agg(
avg_on_time=('on_time_rate', 'mean'),
avg_quality=('quality_rate', 'mean'),
avg_cost_idx=('cost_index', 'mean'),
total_orders=('orders_placed', 'sum')
).round(3)
# Weighted score (30% on-time, 30% quality, 20% cost, 20% volume)
sup_perf['score'] = (
sup_perf['avg_on_time'] * 30 +
sup_perf['avg_quality'] * 30 +
(2 - sup_perf['avg_cost_idx']) * 20 + # Lower cost = higher score
(sup_perf['total_orders'] / sup_perf['total_orders'].max()) * 20
).round(1)
sup_perf = sup_perf.sort_values('score', ascending=False)
for sup, row in sup_perf.iterrows():
grade = '🟢 A' if row['score'] >= 75 else '🟡 B' if row['score'] >= 65 else '🔴 C'
print(f" {sup:25s} | On-time: {row['avg_on_time']:.1%} | Quality: {row['avg_quality']:.1%} | Cost: {row['avg_cost_idx']:.2f}x | Score: {row['score']:.0f} | {grade}")Phần C3: Mini Dashboard + Executive Summary (30 phút)
python
# ============================================
# TASK C3: Supply Chain Mini Dashboard
# ============================================
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
fig.suptitle('Supply Chain Analytics Dashboard — Inventory & Demand', fontsize=16, fontweight='bold')
# Chart 1: Inventory Turnover by Product
ax1 = axes[0, 0]
turnover_data = []
for product in products:
p = inventory_df[inventory_df['product'] == product]
t = p['sold'].sum() / p['inventory_end'].mean() if p['inventory_end'].mean() > 0 else 0
turnover_data.append(t)
colors_t = ['#4CAF50' if t >= 20 else '#FF9800' if t >= 10 else '#F44336' for t in turnover_data]
ax1.barh(products, turnover_data, color=colors_t)
ax1.set_xlabel('Turnover (x/year)')
ax1.set_title('Inventory Turnover by Product')
# Chart 2: Daily Demand Trend (top product)
ax2 = axes[0, 1]
top_product = 'Sữa tươi'
tp = inventory_df[inventory_df['product'] == top_product].set_index('date')
ax2.plot(tp.index, tp['demand'].rolling(7).mean(), color='#2196F3', linewidth=1.5, label='Demand (7-day MA)')
ax2.plot(tp.index, tp['sold'].rolling(7).mean(), color='#4CAF50', linewidth=1.5, label='Sold (7-day MA)', alpha=0.7)
ax2.set_title(f'Demand vs Sold Trend — {top_product}')
ax2.legend(fontsize=8)
ax2.tick_params(axis='x', rotation=45)
# Chart 3: Fill Rate by Product
ax3 = axes[1, 0]
fill_rates = []
for product in products:
p = inventory_df[inventory_df['product'] == product]
fr = p['sold'].sum() / p['demand'].sum() * 100
fill_rates.append(fr)
colors_fr = ['#4CAF50' if fr >= 95 else '#FF9800' if fr >= 90 else '#F44336' for fr in fill_rates]
bars = ax3.bar(range(len(products)), fill_rates, color=colors_fr)
ax3.set_xticks(range(len(products)))
ax3.set_xticklabels(products, rotation=45, ha='right', fontsize=8)
ax3.set_ylabel('Fill Rate (%)')
ax3.set_title('Fill Rate by Product')
ax3.axhline(y=95, color='green', linestyle='--', alpha=0.5, label='Target 95%')
ax3.legend()
ax3.set_ylim(80, 101)
# Chart 4: Supplier Score
ax4 = axes[1, 1]
sup_names = [s.split(' ')[0] for s in sup_perf.index]
colors_sup = ['#4CAF50' if s >= 75 else '#FF9800' if s >= 65 else '#F44336' for s in sup_perf['score']]
ax4.barh(sup_names, sup_perf['score'], color=colors_sup)
ax4.set_xlabel('Composite Score')
ax4.set_title('Supplier Scorecard')
ax4.axvline(x=75, color='green', linestyle='--', alpha=0.5)
plt.tight_layout()
plt.savefig('SupplyChain_Dashboard.png', dpi=150, bbox_inches='tight')
plt.show()
print("✅ Dashboard saved: SupplyChain_Dashboard.png")python
# ============================================
# EXECUTIVE SUMMARY — Supply Chain
# ============================================
total_demand = inventory_df['demand'].sum()
total_sold = inventory_df['sold'].sum()
total_waste = inventory_df['waste'].sum()
overall_fill = total_sold / total_demand * 100
total_stockout_days = inventory_df['stockout'].sum()
print("=" * 60)
print("📋 EXECUTIVE SUMMARY — Supply Chain Analytics")
print("=" * 60)
print(f"\n📦 Overall Performance (12 months):")
print(f" Total Demand: {total_demand:>12,} units")
print(f" Total Sold: {total_sold:>12,} units")
print(f" Overall Fill Rate: {overall_fill:>10.1f}%")
print(f" Total Waste: {total_waste:>12,} units")
print(f" Stockout Days: {total_stockout_days:>12,} (across all products)")
# Worst performing product
worst_fill = 100
worst_product = ''
for product in products:
p = inventory_df[inventory_df['product'] == product]
fr = p['sold'].sum() / p['demand'].sum() * 100
if fr < worst_fill:
worst_fill = fr
worst_product = product
print(f"\n⚠️ Attention:")
print(f" Lowest fill rate: {worst_product} ({worst_fill:.1f}%)")
print(f" Waste cost (est): {total_waste * 25000 / 1e6:,.0f}M VND")
# Best & worst supplier
best_sup = sup_perf.index[0]
worst_sup = sup_perf.index[-1]
print(f"\n🏆 Best Supplier: {best_sup} (Score: {sup_perf.loc[best_sup, 'score']:.0f})")
print(f"🔴 Worst Supplier: {worst_sup} (Score: {sup_perf.loc[worst_sup, 'score']:.0f})")
print(f"\n💡 Recommendations:")
print(f" 1. Increase safety stock for {worst_product} — reduce stockouts")
print(f" 2. Negotiate improvement plan with {worst_sup}")
print(f" 3. Implement dynamic safety stock based on demand variability")
print(f" 4. Add weather data to demand forecasting model")✅ Deliverables Checklist
Trước khi submit, kiểm tra:
✅ Dataset created (chosen: Marketing / Finance / Supply Chain)
✅ KPI Calculation
├── Marketing: CTR, CVR, CPC, CPA, ROAS + RFM segments
├── Finance: Revenue/Profit variance, Gross Margin, Financial Ratios
└── Supply Chain: Turnover, DOS, Fill Rate, Waste Rate, Supplier Score
✅ Mini Dashboard (4 charts, saved as PNG)
✅ Executive Summary (printed — key metrics + insights + recommendations)
✅ At least 3 actionable recommendationsTiêu chí đánh giá
| Tiêu chí | Trọng số | Mô tả |
|---|---|---|
| KPI Calculation | 25% | Tính đúng domain-specific KPIs, giải thích ý nghĩa |
| Analysis Depth | 25% | Không chỉ tính KPI mà còn breakdown, find patterns |
| Dashboard | 25% | 4+ charts relevant, clear, well-labeled |
| Executive Summary | 25% | Concise, has insight + data-backed recommendations |
💡 Nâng cao (tùy chọn)
Nếu hoàn thành sớm, thử thêm:
- Marketing: Attribution simulation — compare ROAS under different attribution models
- Finance: Cash flow forecast 6 months + scenario analysis (best/base/worst)
- Supply Chain: Build simple demand forecasting with exponential smoothing
- Cross-industry: Do 2 datasets → compare insights across industries