Skip to content

🧠 Case Study — Capstone Analysis Sprint: 3 Projects thực tế

Trong buổi học này, chúng ta đã nắm sprint methodology, dashboard development, và executive summary writing. Bây giờ hãy xem 3 capstone projects ở các mức độ khác nhau — từ một project "trước và sau refactor", đến sprint process hoàn chỉnh, và customer segmentation end-to-end. Mỗi case study cho thấy process quan trọng hơn output — cách bạn đi từ raw data đến insight quyết định chất lượng cuối cùng.


Case Study 1: E-commerce Revenue Analysis — Before & After Dashboard

Bối cảnh

Dataset: Online retail — 25,000 transactions, 18 columns, 12 tháng (2025). Nguồn: Kaggle dataset, cleaned & augmented.

Business Context: ShopLocal — sàn e-commerce Việt Nam, GMV 50 tỷ VND/năm, 8,000 active sellers, 120,000 monthly buyers. CEO muốn biết: "Revenue đang đi đâu? Nên focus vào segment nào?"

5 Business Questions:

#QuestionType
Q1Revenue trend 12 tháng — tháng nào peak, tháng nào dip?Descriptive
Q2Top 10 product categories contribute bao nhiêu % revenue?Diagnostic
Q3Khách hàng mới vs returning khác nhau thế nào về AOV?Comparative
Q4Nên focus region nào để tăng revenue?Prescriptive
Q5Revenue Q1 2026 forecast?Predictive

❌ Version 1: Dashboard "trước khi peer review"

Lỗi phổ biến mà capstone projects thường mắc:

┌──────────────────────────────────────────────────────────────┐
│  ❌ DASHBOARD v1 — "Chart Dump"                              │
│                                                               │
│  ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐     │
│  │Pie   │ │Bar   │ │Line  │ │Scatter│ │Table │ │Area  │     │
│  │Chart │ │Chart │ │Chart │ │Plot  │ │     │ │Chart │     │
│  └──────┘ └──────┘ └──────┘ └──────┘ └──────┘ └──────┘     │
│                                                               │
│  ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐                        │
│  │Heatmap│ │Donut │ │Treemap│ │Gauge │                       │
│  │      │ │      │ │       │ │      │                        │
│  └──────┘ └──────┘ └──────┘ └──────┘                        │
│                                                               │
│  Problems:                                                    │
│  - 10 charts, 1 page — overwhelm                             │
│  - No visual hierarchy — equal size                           │
│  - No titling — "Chart 1", "Chart 2"                          │
│  - Colors: rainbow — mỗi chart 1 palette                     │
│  - No KPI cards — no context                                  │
│  - No filters — static view only                              │
└──────────────────────────────────────────────────────────────┘

Peer Review Feedback cho v1:

ReviewerFeedbackPriority
Reviewer A"Nhìn vào đâu trước? Không có hierarchy"🔴 High
Reviewer B"10 charts nhưng không chart nào answer question"🔴 High
Reviewer C"Pie chart 12 slices — unreadable. Dùng bar chart"🟡 Medium
Reviewer D"Color inconsistent — 'Electronics' xanh ở chart 1, đỏ ở chart 3"🟡 Medium
Reviewer E"Không có title, không biết dashboard này cho ai"🟡 Medium

✅ Version 2: Dashboard "sau khi refactor"

┌──────────────────────────────────────────────────────────────┐
│  ✅ DASHBOARD v2 — PAGE 1: EXECUTIVE OVERVIEW                │
│  "ShopLocal Revenue Dashboard — FY2025"                      │
│                                                               │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐        │
│  │ Revenue  │ │ Orders   │ │ AOV      │ │ Active   │        │
│  │ 52.3B    │ │ 25,041   │ │ 2.09M   │ │ Buyers   │        │
│  │ ▲ +12%   │ │ ▲ +8%    │ │ ▲ +3.5%  │ │ 118,420  │        │
│  │ vs LY    │ │ vs LY    │ │ vs LY    │ │ ▲ +15%   │        │
│  └──────────┘ └──────────┘ └──────────┘ └──────────┘        │
│                                                               │
│  ┌────────────────────────────┐ ┌────────────────┐           │
│  │  Revenue Trend (Line)     │ │ Top Categories │           │
│  │  ──────────────           │ │ (Horizontal Bar)│           │
│  │     /\    /\   📌+22%    │ │ Electronics 32%│           │
│  │    /  \  /  \  Campaign X│ │ Fashion    24%│           │
│  │   /    \/    \           │ │ Home       18%│           │
│  │  ────────────────────    │ │ Beauty     12%│           │
│  │  Annotated: Q3 dip,     │ │ Others     14%│           │
│  │  campaign impact         │ │               │           │
│  └────────────────────────────┘ └────────────────┘           │
│                                                               │
│  Filter: [Date Range ▼] [Region ▼] [Category ▼]             │
└──────────────────────────────────────────────────────────────┘

Những gì thay đổi:

AspectBefore (v1)After (v2)
Layout10 charts, 1 page3 pages, 3-4 charts each
HierarchyAll same sizeKPI cards top → trend middle → details bottom
Titles"Chart 1", "Chart 2"Descriptive: "Revenue Trend by Month (FY2025)"
ColorsRainbow3-color palette: blue (primary), gray (secondary), red (alert)
KPI CardsNone4 cards with YoY comparison
AnnotationsNoneCampaign impact, Q3 dip explanation
FiltersNoneDate, Region, Category slicer
QuestionsAnswered 0Each page answers 1-2 questions

Key Finding

"Pie chart 12 slices → bar chart horizontal đã tăng readability 300%. Peer review là bước tốn ít thời gian nhất nhưng tạo nhiều giá trị nhất."


Case Study 2: Healthcare Churn — Sprint Process hoàn chỉnh

Bối cảnh

Dataset: Subscription data từ một health-tech app — 15,000 users, 22 columns, 24 tháng. Bao gồm: user demographics, subscription plan, app usage, health metrics tracking, support tickets.

Sprint Timeline:

mermaid
gantt
    title Capstone Sprint — Healthcare Churn
    dateFormat HH:mm
    axisFormat %H:%M
    section Data Prep
        Data Audit Review      :a1, 08:00, 30m
        Data Cleaning          :a2, after a1, 90m
    section Analysis
        EDA                    :b1, after a2, 60m
        Q1-Q3 Analysis         :b2, after b1, 90m
        Q4-Q5 Analysis         :b3, after b2, 60m
    section Dashboard
        Mockup Sketch          :c1, after b3, 20m
        Dashboard Build        :c2, after c1, 120m
        Peer Review + Fix      :c3, after c2, 40m
    section Summary
        Executive Summary      :d1, after c3, 40m

Sprint Log

TimeTaskOutputIssue / Note
08:00Review data auditConfirmed: 4.2% missing in age, 8% in incomeOK — impute median
08:30Data cleaningCleaned dataset 14,370 rows (dropped 630 invalid)signup_date format inconsistent → fixed 3 formats
10:00EDA8 key distributions, correlation matrixlogin_frequency strongly correlated with churn (r = -0.62)
11:00Q1: Churn trendMonthly churn trending up: 3.1% → 5.4% over 12 monthsSpike in month 8 — coincides with pricing change
11:30Q2: Segment analysisFree-to-Premium < 3 months: churn 12.3%. Annual: 1.8%Monthly plan is the problem — not the product
12:00Lunch break
13:00Q3: Churn predictorsTop 3: login_frequency, feature_adoption, tenureUsers who log in < 2x/week → 5x churn risk
14:00Q4: Cohort analysisMonth-6 cohort: 40% retention. Month-12: 28%Steep drop at month 3 — onboarding issue
14:30Q5: Recommendations3 actionable recommendations with impact estimates
15:00Dashboard mockupSketched 3 pages on paperShowed to partner for quick feedback
15:20Dashboard buildPage 1: Overview (KPIs + trend)Used template layout from Buổi 11
16:00Dashboard buildPage 2: Segment deep diveAdded cross-filters between plan type and churn
16:40Dashboard buildPage 3: Recommendations + annotationsAnnotated the pricing change impact on trend chart
17:00Peer review3 feedback items: axis labels, tooltip detail, colorFixed all 3 in 15 minutes
17:20Executive summary1-page draft with 5 findingsFollowed What → So What → Now What template
18:00DoneNotebook (38 cells) + Dashboard (3 pages) + SummarySprint complete ✅

Analysis Deep Dive: Cohort Retention

Một trong những analyses mạnh nhất trong project này là cohort retention analysis:

python
# ============================================
# COHORT RETENTION ANALYSIS
# ============================================

# Create cohort groups by signup month
df['signup_month'] = df['signup_date'].dt.to_period('M')
df['activity_month'] = df['last_active_date'].dt.to_period('M')
df['cohort_age'] = (df['activity_month'] - df['signup_month']).apply(lambda x: x.n)

# Build cohort table
cohort = df.groupby(['signup_month', 'cohort_age'])['user_id'].nunique().reset_index()
cohort_pivot = cohort.pivot(index='signup_month', columns='cohort_age', values='user_id')

# Calculate retention rates
cohort_size = cohort_pivot[0]
retention = cohort_pivot.divide(cohort_size, axis=0) * 100

print("📊 Cohort Retention Table (%):")
print(retention.round(1))

Output:

Signup MonthM0M1M2M3M6M12
2024-01100%72%58%44%31%22%
2024-04100%68%51%38%25%
2024-07100%65%48%35%
2024-10100%61%42%

Finding: Retention đang xấu đi theo thời gian — cohort mới có M1 retention thấp hơn cohort cũ (61% vs 72%). Root cause: pricing change ở tháng 7 làm tăng expectation nhưng không cải thiện onboarding. Users trả nhiều hơn nhưng vẫn không biết dùng product.

Dashboard Highlight: Annotated Trend

Churn Rate Trend — 24 Months
6% ─                                          ╱── 5.4%
    │                              📌 Price   ╱
5% ─                              increase  ╱
    │                                ↓     ╱
4% ─           ─────────── ────────╱─────╱
    │          ╱                  ╱
3% ─ ────────╱                  ╱
    │  3.1%       Stable         ╱
2% ─              period        Post-price-change
    │                            acceleration
    └────┬────┬────┬────┬────┬────┬────┬────→
        M1   M4   M7   M10  M13  M16  M19  M22

Lesson Learned

What workedWhat didn'tNext time
Sprint log — track every hourSpent too long on formatting EDA chartsTime-box chart formatting: 5 min max per chart
Paper mockup before dashboardFirst attempt: too many filtersStart with 2 filters, add more if needed
Peer review before summaryWrote summary before final dashboard → had to rewriteDashboard first, summary last

Case Study 3: FinTech Customer Segmentation — Data to Action

Bối cảnh

Dataset: Transaction data từ fintech app (MoMo-like) — 40,000 users, 28 columns, data 2024-2025. Features: transaction frequency, average amount, categories (F&B, bills, transfer, savings), demographics.

Business Context: FinPay — ví điện tử 2 triệu users. Marketing team cần biết: "Có mấy nhóm users? Mỗi nhóm cần chiến lược gì?"

5 Business Questions:

#QuestionMethod
Q1Users phân bổ thế nào theo behavior?RFM Analysis
Q2Có bao nhiêu segments tự nhiên?K-Means Clustering
Q3Mỗi segment có đặc điểm gì?Segment profiling
Q4Segment nào có highest LTV?LTV calculation per segment
Q5Marketing strategy cho mỗi segment?Prescriptive recommendations

Analysis Process

Step 1: RFM Analysis

python
# ============================================
# RFM ANALYSIS
# ============================================
import pandas as pd
import numpy as np
from datetime import datetime

# Calculate RFM
snapshot_date = datetime(2025, 12, 31)
rfm = df.groupby('user_id').agg({
    'transaction_date': lambda x: (snapshot_date - x.max()).days,  # Recency
    'transaction_id': 'count',                                     # Frequency
    'amount': 'sum'                                                # Monetary
}).rename(columns={
    'transaction_date': 'Recency',
    'transaction_id': 'Frequency',
    'amount': 'Monetary'
})

# RFM Scoring (quintiles)
rfm['R_Score'] = pd.qcut(rfm['Recency'], 5, labels=[5,4,3,2,1])
rfm['F_Score'] = pd.qcut(rfm['Frequency'].rank(method='first'), 5, labels=[1,2,3,4,5])
rfm['M_Score'] = pd.qcut(rfm['Monetary'], 5, labels=[1,2,3,4,5])
rfm['RFM_Score'] = rfm['R_Score'].astype(str) + rfm['F_Score'].astype(str) + rfm['M_Score'].astype(str)

print(f"RFM Table:\n{rfm.head(10)}")

Step 2: K-Means Clustering

python
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Standardize
scaler = StandardScaler()
rfm_scaled = scaler.fit_transform(rfm[['Recency', 'Frequency', 'Monetary']])

# Elbow method
inertias = []
for k in range(2, 10):
    km = KMeans(n_clusters=k, random_state=42, n_init=10)
    km.fit(rfm_scaled)
    inertias.append(km.inertia_)

# Optimal: k=4 (elbow at 4)
km_final = KMeans(n_clusters=4, random_state=42, n_init=10)
rfm['Segment'] = km_final.fit_predict(rfm_scaled)

Step 3: Segment Profiling

SegmentNameUsersAvg RecencyAvg FrequencyAvg MonetaryLTVStrategy
0💎 Champions6,200 (15.5%)5 days48/month8.2M VND42MLoyalty rewards, early access
1🌟 Promising12,800 (32%)18 days22/month3.1M VND15MUpsell, cross-sell campaigns
2😴 At Risk14,000 (35%)45 days8/month1.2M VND4MRe-engagement, push notifications
3💤 Hibernating7,000 (17.5%)90 days2/month0.3M VND0.8MWin-back email, discount

Dashboard: Segment View

┌──────────────────────────────────────────────────────────────┐
│  PAGE 1: CUSTOMER SEGMENTS OVERVIEW                          │
│                                                               │
│  ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐    │
│  │ 💎 Champs │ │ 🌟 Promise│ │ 😴 At Risk│ │ 💤 Hibern │    │
│  │ 6,200     │ │ 12,800    │ │ 14,000    │ │ 7,000     │    │
│  │ LTV: 42M  │ │ LTV: 15M  │ │ LTV: 4M   │ │ LTV: 0.8M│    │
│  └───────────┘ └───────────┘ └───────────┘ └───────────┘    │
│                                                               │
│  ┌────────────────────────────────┐ ┌──────────────────┐     │
│  │  Scatter: Frequency vs         │ │ Revenue Contrib  │     │
│  │  Monetary (color = segment)    │ │ by Segment (Bar) │     │
│  │     💎💎                       │ │ Champs: 52%      │     │
│  │    💎   🌟🌟                   │ │ Promise: 28%     │     │
│  │  🌟🌟🌟 😴😴                  │ │ At Risk: 15%     │     │
│  │  😴😴  💤💤💤                  │ │ Hibern: 5%       │     │
│  └────────────────────────────────┘ └──────────────────┘     │
│                                                               │
│  ┌────────────────────────────────────────────────────────┐   │
│  │  Segment Migration: How users move between segments    │   │
│  │  (Sankey or alluvial diagram)                          │   │
│  └────────────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────────┘

Executive Summary Excerpt

📊 KEY FINDING:
15.5% users (Champions) contribute 52% revenue.
35% users (At Risk) have not transacted in 45+ days.

💡 RECOMMENDATION:
1. Champions: Loyalty program → retain 95%+ → protect 52% revenue
   Cost: 200M VND/year. Expected: retain 5,900 Champions → save 248B revenue

2. At Risk: Re-engagement push campaign → convert 20% back to Promising
   Cost: 50M VND. Expected: reactivate 2,800 users → +3.36B revenue

3. Promising → Champions: Cross-sell savings products → increase frequency
   Cost: 100M VND. Expected: 15% conversion → 1,920 new Champions → +19.2B LTV

What Made This Project Stand Out

mermaid
flowchart LR
    A["❌ Typical Project<br/>Descriptive only<br/>Charts without insight<br/>No actionable recs"] --> B["✅ This Project<br/>Descriptive → Diagnostic → Prescriptive<br/>Each chart answers a question<br/>Each finding has cost/revenue impact"]
ElementAverage CapstoneThis Capstone
QuestionsVague: "analyze customers"Specific: "How many segments? What LTV?"
AnalysisDescriptive onlyRFM + Clustering + LTV
DashboardStatic chartsInteractive: click segment → see details
Findings"Segment A is bigger""Segment A = 52% revenue, losing 5%/quarter"
Recommendations"Should do marketing""Loyalty program: 200M cost → 248B revenue protection"

🔑 So sánh 3 Case Studies

Tiêu chíE-commerce (CS1)Healthcare (CS2)FinTech (CS3)
Dataset size25,000 rows15,000 rows40,000 rows
Key lessonBefore/After dashboard refactorSprint process disciplineSegmentation → Action
Analysis depthDescriptive + DiagnosticCohort + Trend + DiagnosticRFM + ML (K-Means) + LTV
Dashboard pages3 pages3 pages3 pages
Strongest elementVisual improvementSprint log & time managementBusiness impact quantification
Peer review impactRedesigned entire layoutFixed 3 formatting issuesAdded revenue impact to recs

💡 Lessons for Your Capstone

1. BEFORE/AFTER = powerful proof of learning
   → Show v1 dashboard AND v2 → demonstrates growth

2. SPRINT LOG = professional discipline
   → Track every hour → accountable → efficient

3. QUANTIFY EVERYTHING
   → Don't say "should improve retention"
   → Say "retain 2,800 users → +3.36B revenue, ROI = 67x"

4. PEER REVIEW = free consulting
   → 5 minutes feedback saves 2 hours rework

5. 3 PAGES IS ENOUGH
   → Overview + Deep Dive + Recommendations
   → Quality > Quantity, always

🔗 Liên kết