🛠 Workshop: Analysis Sprint, from Raw Data to Dashboard

Run a complete analysis sprint: Data Cleaning → EDA → Answer 5 Business Questions → Build a 3-page Dashboard → Peer Review → Executive Summary Draft. Output: Jupyter Notebook + Dashboard v1 + 1-page Summary.

🎯 Workshop objectives

After completing this workshop, you will be able to:

Clean data: handle missing values, duplicates, and data types in the capstone dataset
Complete an EDA: distributions, correlations, segment analysis
Answer 5 business questions: each with evidence and a finding
Build a dashboard: 3 pages (Overview, Deep Dive, Insights and Recommendations)
Peer review: give and receive structured feedback
Draft an executive summary: 5 findings in the What → So What → Now What format

🧰 Requirements

Requirement	Details
Knowledge	Completed Session 18 (Data Collection + Data Audit) and the Session 19 theory
Data	Capstone dataset collected in Session 18
Tools	Python (pandas, matplotlib, seaborn, scipy), Power BI / Tableau
Time	120–150 minutes (time-boxed sprint)
Output	Jupyter Notebook + Dashboard v1 + Executive Summary Draft

💡 Naming convention

Notebook: HoTen_Buoi19_CapstoneEDA.ipynb (replace HoTen with your full name)
Dashboard: HoTen_Buoi19_Dashboard.pbix (or a Tableau workbook)
Summary: HoTen_Buoi19_ExecSummary.md (or .docx)

⏰ Sprint Timeline

┌──────────────────────────────────────────────────────────────┐
│  ⏰ SPRINT TIMELINE: 150 MINUTES                             │
│                                                               │
│  ┌────────────────┐  00:00 — 00:15   Sprint Planning          │
│  │ 🎯 PLAN        │  Review data audit, confirm 5 questions   │
│  └────────────────┘                                           │
│                                                               │
│  ┌────────────────┐  00:15 — 00:45   Data Cleaning            │
│  │ 🧹 CLEAN       │  Missing values, duplicates, types        │
│  └────────────────┘                                           │
│                                                               │
│  ┌────────────────┐  00:45 — 01:30   EDA + Business Questions │
│  │ 🔍 ANALYZE     │  Distributions, correlations, Q1—Q5      │
│  └────────────────┘                                           │
│                                                               │
│  ┌────────────────┐  01:30 — 02:00   Dashboard Build          │
│  │ 📊 DASHBOARD   │  3 pages: Overview, Deep Dive, Insights  │
│  └────────────────┘                                           │
│                                                               │
│  ┌────────────────┐  02:00 — 02:20   Peer Review              │
│  │ 👥 REVIEW      │  Self-check 3 min, review 12, fix 5      │
│  └────────────────┘                                           │
│                                                               │
│  ┌────────────────┐  02:20 — 02:30   Executive Summary        │
│  │ 📝 SUMMARY     │  5 findings, 3 recommendations            │
│  └────────────────┘                                           │
└──────────────────────────────────────────────────────────────┘

⚠️ Strict time-box!

If cleaning is not finished after 30 minutes, stop, accept "good enough" cleaning for now, and move on. In a sprint, progress beats perfection. You can come back and polish later.

Part 1: Sprint Planning (15 minutes)

Step 1.1: Review Data Audit

Open your Data Audit from Session 18 and fill in the table:

┌──────────────────────────────────────────────────────────────┐
│  📋 SPRINT PLANNING — DATA REVIEW                            │
│                                                               │
│  Dataset: ________________________                            │
│  Rows: ________   Columns: ________                          │
│  Date Range: ________________________                         │
│  Source: ________________________                              │
│                                                               │
│  Key Columns (top 10):                                        │
│  1. _____________ (type: _____, missing: ___%)               │
│  2. _____________ (type: _____, missing: ___%)               │
│  3. _____________ (type: _____, missing: ___%)               │
│  4. _____________ (type: _____, missing: ___%)               │
│  5. _____________ (type: _____, missing: ___%)               │
│  6. _____________ (type: _____, missing: ___%)               │
│  7. _____________ (type: _____, missing: ___%)               │
│  8. _____________ (type: _____, missing: ___%)               │
│  9. _____________ (type: _____, missing: ___%)               │
│  10. ____________ (type: _____, missing: ___%)               │
└──────────────────────────────────────────────────────────────┘
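
Most of this table can be filled in from the DataFrame itself rather than by hand. The following is a minimal sketch, assuming the dataset is already loaded as a pandas DataFrame; the helper name `audit_columns` and the toy columns are hypothetical and not part of the workshop materials.

```python
import numpy as np
import pandas as pd

def audit_columns(df: pd.DataFrame, top_n: int = 10) -> pd.DataFrame:
    """Return dtype and missing percentage for the first `top_n` columns."""
    report = pd.DataFrame({
        "type": df.dtypes.astype(str),
        "missing_pct": (df.isnull().mean() * 100).round(1),
    })
    return report.head(top_n)

# Toy data standing in for your capstone dataset (hypothetical columns)
demo = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "order_value": [120.0, np.nan, 95.5, 300.0],
    "customer_type": ["New", "Returning", None, "New"],
})

audit = audit_columns(demo)
print(f"Rows: {len(demo):,}   Columns: {demo.shape[1]}")
print(audit)
```

Swap `demo` for your own `df` and copy the printed values into the form above.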

Step 1.2: Confirm 5 Business Questions

┌──────────────────────────────────────────────────────────────┐
│  📋 5 BUSINESS QUESTIONS                                      │
│                                                               │
│  Q1 (Descriptive):   ____________________________________    │
│  Analysis method:    ____________________________________    │
│                                                               │
│  Q2 (Diagnostic):    ____________________________________    │
│  Analysis method:    ____________________________________    │
│                                                               │
│  Q3 (Comparative):   ____________________________________    │
│  Analysis method:    ____________________________________    │
│                                                               │
│  Q4 (Predictive/Diagnostic): ____________________________    │
│  Analysis method:    ____________________________________    │
│                                                               │
│  Q5 (Prescriptive):  ____________________________________    │
│  Analysis method:    ____________________________________    │
└──────────────────────────────────────────────────────────────┘

Part 2: Data Cleaning (30 minutes)

Step 2.1: Setup & Load

python

# ============================================
# CAPSTONE ANALYSIS SPRINT
# Author: [Your Name]
# Date: [Today's Date]
# Dataset: [Your Dataset Name]
# ============================================

# Table of Contents:
# 1. Setup & Load Data
# 2. Data Cleaning
# 3. EDA — Univariate
# 4. EDA — Bivariate
# 5. Q1: [Your Question]
# 6. Q2: [Your Question]
# 7. Q3: [Your Question]
# 8. Q4: [Your Question]
# 9. Q5: [Your Question]
# 10. Summary of Findings

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# Settings
plt.rcParams['figure.figsize'] = (12, 6)
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('Set2')
np.random.seed(42)

print("✅ Libraries loaded!")

Step 2.2: Load & Inspect

python

# ============================================
# LOAD DATA
# ============================================

df = pd.read_csv('data/your_dataset.csv')  # ← REPLACE WITH YOUR PATH

# Quick overview
print(f"📊 Dataset Shape: {df.shape[0]:,} rows × {df.shape[1]} columns")
print(f"\n📋 Data Types:")
print(df.dtypes)
print(f"\n🔍 First 5 rows:")
df.head()

Step 2.3: Missing Values

python

# ============================================
# MISSING VALUES AUDIT
# ============================================

missing = df.isnull().sum()
missing_pct = (missing / len(df) * 100).round(1)
missing_report = pd.DataFrame({
    'Missing': missing,
    '%': missing_pct
}).query('Missing > 0').sort_values('%', ascending=False)

if len(missing_report) > 0:
    print(f"⚠️ MISSING VALUES:")
    print(missing_report)
    print(f"\nTotal rows with any missing: {df.isnull().any(axis=1).sum():,}")
else:
    print("✅ No missing values!")

Step 2.4: Handle Missing + Duplicates + Types

python

# ============================================
# DATA CLEANING
# ============================================

# 1. Handle missing values (customize per your dataset)
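# Strategy (by % missing in a column): < 5% → drop rows, 5-30% → impute, > 30% → drop column
# Example (assign back instead of chained inplace=True, which newer pandas versions warn about):
# df['column_name'] = df['column_name'].fillna(df['column_name'].median())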
# df.dropna(subset=['critical_column'], inplace=True)

# YOUR CLEANING CODE HERE:
# _______________________________________
# _______________________________________
# _______________________________________

# 2. Remove duplicates
before = len(df)
df.drop_duplicates(inplace=True)
after = len(df)
print(f"🔁 Duplicates removed: {before - after}")

# 3. Fix data types
# df['date_column'] = pd.to_datetime(df['date_column'])
# df['category_column'] = df['category_column'].astype('category')

# YOUR TYPE FIXES HERE:
# _______________________________________

# 4. Final check
print(f"\n✅ Cleaned dataset: {df.shape[0]:,} rows × {df.shape[1]} columns")
print(f"   Missing values: {df.isnull().sum().sum()}")
print(f"   Duplicates: {df.duplicated().sum()}")
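
The < 5% / 5-30% / > 30% rule of thumb from the comment above can be applied column by column. The sketch below is one possible reading of that rule, not a prescribed implementation: the function name `handle_missing`, the choice of median (numeric) or mode (other types) for imputation, and the toy data are all assumptions.

```python
import numpy as np
import pandas as pd

def handle_missing(df, drop_row_max=5.0, drop_col_min=30.0):
    """Apply the <5% / 5-30% / >30% missing-value rule to each column."""
    out = df.copy()
    missing_pct = out.isnull().mean() * 100

    # > 30% missing: drop the whole column
    drop_cols = missing_pct[missing_pct > drop_col_min].index
    out = out.drop(columns=drop_cols)

    # 5-30% missing: impute (median for numeric columns, mode otherwise)
    in_band = (missing_pct >= drop_row_max) & (missing_pct <= drop_col_min)
    for col in missing_pct[in_band].index:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            out[col] = out[col].fillna(out[col].mode().iloc[0])

    # < 5% missing: drop the affected rows
    row_cols = missing_pct[(missing_pct > 0) & (missing_pct < drop_row_max)].index
    return out.dropna(subset=list(row_cols))

# Toy data: one column per bucket (hypothetical)
n = 100
demo = pd.DataFrame({
    "a": np.arange(n, dtype=float),
    "b": np.arange(n, dtype=float),
    "c": np.arange(n, dtype=float),
    "d": np.arange(n),
})
demo.loc[:1, "a"] = np.nan      # 2% missing  -> rows dropped
demo.loc[10:19, "b"] = np.nan   # 10% missing -> imputed
demo.loc[50:, "c"] = np.nan     # 50% missing -> column dropped

cleaned = handle_missing(demo)
print(cleaned.shape)
print(cleaned.isnull().sum().sum())
```

Imputation runs before rows are dropped, so medians are computed on the full column. Treat the thresholds as a starting point and override them for columns your business questions depend on.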

Step 2.5: Save Cleaned Data

python

# Save cleaned version
df.to_csv('data/your_dataset_cleaned.csv', index=False)
print("💾 Cleaned dataset saved!")
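
One thing to keep in mind: CSV stores plain text, so the datetime and category types fixed in Step 2.4 are not preserved when the file is read back. A minimal sketch of restoring them at load time, using an in-memory buffer in place of the real file and hypothetical column names:

```python
import io
import pandas as pd

demo = pd.DataFrame({
    "order_date": pd.to_datetime(["2024-01-05", "2024-02-03"]),
    "customer_type": pd.Categorical(["New", "Returning"]),
})

buffer = io.StringIO()  # stands in for 'data/your_dataset_cleaned.csv'
demo.to_csv(buffer, index=False)
buffer.seek(0)

# Re-declare the types when loading the cleaned file
reloaded = pd.read_csv(
    buffer,
    parse_dates=["order_date"],
    dtype={"customer_type": "category"},
)
print(reloaded.dtypes)
```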

Part 3: EDA + Business Questions (45 minutes)

Step 3.1: Univariate Analysis

python

# ============================================
# EDA — UNIVARIATE
# ============================================

# Numeric columns distribution
numeric_cols = df.select_dtypes(include=[np.number]).columns.tolist()
n_cols = min(len(numeric_cols), 6)  # Show top 6

fig, axes = plt.subplots(2, 3, figsize=(16, 10))
axes = axes.flatten()

for idx, col in enumerate(numeric_cols[:n_cols]):
    ax = axes[idx]
    df[col].hist(bins=30, ax=ax, color='steelblue', edgecolor='white', alpha=0.8)
    ax.axvline(df[col].median(), color='red', linestyle='--', label=f'Median: {df[col].median():,.1f}')
    ax.set_title(col, fontsize=12, fontweight='bold')
    ax.legend(fontsize=9)

# Hide empty subplots
for idx in range(n_cols, 6):
    axes[idx].set_visible(False)

plt.suptitle('Distribution of Key Numeric Variables', fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

# Summary stats
print("📊 Summary Statistics:")
print(df[numeric_cols].describe().round(2))

Step 3.2: Bivariate Analysis

python

# ============================================
# EDA — BIVARIATE (Correlation)
# ============================================

corr = df[numeric_cols].corr()

plt.figure(figsize=(12, 8))
mask = np.triu(np.ones_like(corr, dtype=bool))
sns.heatmap(corr, mask=mask, annot=True, cmap='RdBu_r', center=0,
            fmt='.2f', square=True, linewidths=0.5,
            cbar_kws={'label': 'Correlation'})
plt.title('Correlation Matrix — Key Variables', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

# Top positive & negative correlations
# Keep only the upper triangle so each pair appears once and self-correlations are excluded
upper = np.triu(np.ones_like(corr, dtype=bool), k=1)
corr_pairs = corr.where(upper).stack().dropna().sort_values()
print("\n🔝 Top 5 Positive Correlations:")
print(corr_pairs.tail(5))
print("\n🔻 Top 5 Negative Correlations:")
print(corr_pairs.head(5))
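
The workshop objectives also list segment analysis, which the correlation matrix does not cover because it only looks at numeric columns. A minimal sketch of a segment breakdown with `groupby`, assuming a categorical column and a numeric metric; the names `segment` and `order_value` and the toy values are hypothetical.

```python
import pandas as pd

demo = pd.DataFrame({
    "segment": ["A", "A", "B", "B", "B", "C"],
    "order_value": [100, 140, 80, 90, 100, 300],
})

segment_summary = (
    demo.groupby("segment")["order_value"]
        .agg(n="count", mean="mean", median="median", total="sum")
        # Share of the overall total contributed by each segment
        .assign(share_pct=lambda t: (t["total"] / t["total"].sum() * 100).round(1))
        .sort_values("total", ascending=False)
)
print(segment_summary)
```

Reporting `n` next to the mean makes it easier to spot segments whose averages rest on very few rows.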

Step 3.3: Answer Business Questions

Template for each question (copy, paste, and customize):

python

# ============================================
# Q1: [YOUR QUESTION HERE]
# ============================================

# Analysis
# _______________________________________
# _______________________________________

# Visualization
# _______________________________________

# Finding
print("""
┌──────────────────────────────────────────────────────┐
│  📋 FINDING #1                                        │
│                                                       │
│  WHAT: ___________________________________________   │
│  SO WHAT: ________________________________________   │
│  NOW WHAT: _______________________________________   │
│                                                       │
│  Evidence: [chart above / statistic]                 │
│  Confidence: [High / Medium / Low]                   │
└──────────────────────────────────────────────────────┘
""")

Example for Q3 (Comparative Analysis):

python

# ============================================
# Q3: How do new and returning customers differ on [metric]?
# ============================================

# Group comparison
group_a = df[df['customer_type'] == 'New']['order_value']
group_b = df[df['customer_type'] == 'Returning']['order_value']

# Stats
print(f"New Customers:     Mean = {group_a.mean():,.0f}, Median = {group_a.median():,.0f}, N = {len(group_a):,}")
print(f"Returning Customers: Mean = {group_b.mean():,.0f}, Median = {group_b.median():,.0f}, N = {len(group_b):,}")

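# Welch's t-test (does not assume equal variances; NaN values are skipped)
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False, nan_policy='omit')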
print(f"\nt-test: t = {t_stat:.3f}, p = {p_value:.4f}")
print(f"Significant? {'✅ Yes' if p_value < 0.05 else '❌ No'} (α = 0.05)")

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Box plot
df.boxplot(column='order_value', by='customer_type', ax=axes[0])
axes[0].set_title('Order Value by Customer Type')
axes[0].set_xlabel('')

# Distribution overlay
group_a.hist(bins=30, ax=axes[1], alpha=0.6, label='New', color='steelblue')
group_b.hist(bins=30, ax=axes[1], alpha=0.6, label='Returning', color='coral')
axes[1].set_title('Distribution Comparison')
axes[1].legend()

plt.suptitle('Q3: New vs Returning Customer — Order Value', fontsize=13, fontweight='bold')
plt.tight_layout()
plt.show()

# Finding (illustrative numbers; replace with your own results)
print("""
┌──────────────────────────────────────────────────────┐
│  📋 FINDING #3                                        │
│                                                       │
│  WHAT: Returning customers have 35% higher AOV       │
│  SO WHAT: A 10% retention gain → est. 15% revenue    │
│           lift from the AOV effect                   │
│  NOW WHAT: Invest in loyalty program + personalized  │
│            recommendations for returning customers   │
│                                                       │
│  Evidence: t-test p < 0.001, effect size = 0.42      │
│  Confidence: High                                    │
└──────────────────────────────────────────────────────┘
""")
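
The example finding cites an effect size, but the Q3 code only runs a t-test. The workshop does not say which effect size measure is meant; one common choice for comparing two group means is Cohen's d, sketched here as an assumption. The function name `cohens_d` and the sample values are hypothetical.

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a, b = a[~np.isnan(a)], b[~np.isnan(b)]  # ignore missing values
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return (b.mean() - a.mean()) / np.sqrt(pooled_var)

# Toy order values (hypothetical)
new = [100, 110, 120, 130, 140]
returning = [120, 130, 140, 150, 160]

d = cohens_d(new, returning)
print(f"Cohen's d = {d:.2f}")
```

With your data, call `cohens_d(group_a, group_b)`. A frequently quoted rule of thumb reads roughly 0.2 as small, 0.5 as medium, and 0.8 as large, but what counts as meaningful depends on the business context.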

Step 3.4: Summary of Findings

python

# ============================================
# SUMMARY OF FINDINGS
# ============================================

print("""
╔══════════════════════════════════════════════════════════════╗
║  📊 SUMMARY OF FINDINGS                                      ║
║                                                               ║
║  Q1: ___________________________________________________     ║
║      Finding: ___________________________________________     ║
║                                                               ║
║  Q2: ___________________________________________________     ║
║      Finding: ___________________________________________     ║
║                                                               ║
║  Q3: ___________________________________________________     ║
║      Finding: ___________________________________________     ║
║                                                               ║
║  Q4: ___________________________________________________     ║
║      Finding: ___________________________________________     ║
║                                                               ║
║  Q5: ___________________________________________________     ║
║      Finding: ___________________________________________     ║
║                                                               ║
║  📌 TOP INSIGHT: ________________________________________     ║
║  💡 TOP RECOMMENDATION: _________________________________     ║
╚══════════════════════════════════════════════════════════════╝
""")

Part 4: Dashboard Build (30 minutes)

Step 4.1: Export Cleaned Data

python

# Export for Power BI / Tableau
df.to_csv('data/dashboard_data.csv', index=False)

# If needed — create summary tables
# monthly_summary = df.groupby('month').agg({...}).reset_index()
# monthly_summary.to_csv('data/monthly_summary.csv', index=False)

print("📦 Data exported for dashboard!")
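
The commented-out `monthly_summary` line above leaves the aggregation open. A minimal sketch of what such a summary table could look like, assuming a datetime column and a numeric metric; the column names `order_date` and `order_value` and the chosen aggregates are hypothetical.

```python
import pandas as pd

demo = pd.DataFrame({
    "order_date": pd.to_datetime(
        ["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-14", "2024-02-28"]
    ),
    "order_value": [100.0, 200.0, 150.0, 50.0, 100.0],
})

# Month key as text (e.g. "2024-01") so it survives the CSV export unchanged
demo["month"] = demo["order_date"].dt.to_period("M").astype(str)

monthly_summary = (
    demo.groupby("month")
        .agg(
            orders=("order_value", "count"),
            revenue=("order_value", "sum"),
            avg_order_value=("order_value", "mean"),
        )
        .reset_index()
)
print(monthly_summary)
# monthly_summary.to_csv('data/monthly_summary.csv', index=False)
```

Pre-aggregating in pandas keeps the dashboard file small, at the cost of losing row-level drill-down for that table.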

Step 4.2: Dashboard Mockup (5 minutes)

Before opening Power BI / Tableau, sketch the layout on paper:

┌──────────────────────────────────────────────────────────────┐
│  📊 MY DASHBOARD MOCKUP                                      │
│                                                               │
│  PAGE 1: _______________                                      │
│  ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐                        │
│  │ KPI  │ │ KPI  │ │ KPI  │ │ KPI  │                        │
│  │ ___  │ │ ___  │ │ ___  │ │ ___  │                        │
│  └──────┘ └──────┘ └──────┘ └──────┘                        │
│  ┌──────────────────┐ ┌──────────────┐                       │
│  │  Chart: ________  │ │  Chart: ____ │                      │
│  │  Type: _________  │ │  Type: _____ │                      │
│  └──────────────────┘ └──────────────┘                       │
│                                                               │
│  PAGE 2: _______________                                      │
│  [Sketch your deep dive page]                                 │
│                                                               │
│  PAGE 3: _______________                                      │
│  [Sketch your insights/recommendations page]                  │
└──────────────────────────────────────────────────────────────┘

Step 4.3: Build Dashboard (25 minutes)

Checklist per page:

PAGE 1 — OVERVIEW:
☐ Import cleaned data
☐ Create date table (if needed)
☐ KPI Card 1: _______ (with YoY/MoM comparison)
☐ KPI Card 2: _______ (with comparison)
☐ KPI Card 3: _______ (with comparison)
☐ KPI Card 4: _______ (with comparison)
☐ Main chart: _______ (type: _______)
☐ Secondary chart: _______ (type: _______)
☐ Filters: Date, _______, _______

PAGE 2 — DEEP DIVE:
☐ Segment comparison chart: _______
☐ Cross-filter working? ☐ Yes ☐ No
☐ Detail chart/table: _______

PAGE 3 — INSIGHTS:
☐ Annotated key finding chart
☐ Recommendations summary
☐ Dashboard title + date range + source
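
The KPI cards on Page 1 call for a YoY/MoM comparison. In Power BI or Tableau this is normally built inside the tool itself; the pandas sketch below is only a way to cross-check the numbers in the notebook before trusting the card. The monthly revenue values are hypothetical.

```python
import pandas as pd

monthly_revenue = pd.Series(
    [1000.0, 1200.0, 900.0],
    index=pd.period_range("2024-01", periods=3, freq="M"),
    name="revenue",
)

# Month-over-month change in percent; the first month has no prior period
mom_pct = monthly_revenue.pct_change() * 100

kpi = pd.DataFrame({"revenue": monthly_revenue, "mom_pct": mom_pct.round(1)})
print(kpi)
```

For a year-over-year figure on monthly data, `pct_change(periods=12)` compares each month with the same month one year earlier, provided the series has no missing months.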

Part 5: Peer Review (20 minutes)

Step 5.1: Self-Review (3 minutes)

Before the peer review, check your own work:

NOTEBOOK SELF-CHECK:
☐ Kernel → Restart & Run All → no errors
☐ All sections have markdown headers
☐ 5 findings documented (What/So What/Now What)
☐ Charts have titles + labels
☐ No unused/debug cells

DASHBOARD SELF-CHECK:
☐ 3 pages built
☐ KPI cards with comparison
☐ Charts have titles + axis labels
☐ Filters working
☐ Colors consistent

Step 5.2: Peer Review Session (12 minutes)

Pair up with a classmate. Each person:

Present (3 minutes): Walk through notebook → dashboard → findings
Review (5 minutes): Partner fills in the review form

Peer Review Form

┌──────────────────────────────────────────────────────────────┐
│  📋 PEER REVIEW FORM                                          │
│                                                               │
│  Builder: ________________  Reviewer: ________________       │
│                                                               │
│  NOTEBOOK                                      Score (1-5)   │
│  ☐ Code runs without errors                    [___]         │
│  ☐ Structure clear (headers, sections)          [___]         │
│  ☐ Findings documented (What/So What/Now What)  [___]         │
│  ☐ Charts informative + labeled                [___]         │
│                                                               │
│  DASHBOARD                                     Score (1-5)   │
│  ☐ Answers business questions                   [___]         │
│  ☐ Visual hierarchy (KPI → Trend → Detail)     [___]         │
│  ☐ Colors consistent, charts appropriate        [___]         │
│  ☐ Interactivity works (filters, cross-filter)  [___]         │
│                                                               │
│  🌟 WHAT WORKS WELL:                                         │
│  _________________________________________________________   │
│  _________________________________________________________   │
│                                                               │
│  🔧 TOP 3 IMPROVEMENTS:                                      │
│  1. _____________________________________________________    │
│  2. _____________________________________________________    │
│  3. _____________________________________________________    │
│                                                               │
│  Overall: [___] / 5                                          │
└──────────────────────────────────────────────────────────────┘

Step 5.3: Fix Top 3 (5 minutes)

Take the feedback and fix the top 3 issues right away:

FIX LOG:
☐ Issue 1: _________________ → Fixed: ☐
☐ Issue 2: _________________ → Fixed: ☐
☐ Issue 3: _________________ → Fixed: ☐

Part 6: Executive Summary (10 minutes)

Step 6.1: Draft Summary

Write one page using this template:

markdown

# Executive Summary — [Your Project Title]

**Author:** [Name] | **Date:** [Date] | **Dataset:** [Description, N rows, period]

---

## Objective
[1-2 sentences: the problem, the data, the business context]

## Key Findings

### Finding 1: [Title]
- **What:** [Data says...]
- **So What:** [Business impact...]
- **Now What:** [Recommended action...]

### Finding 2: [Title]
- **What:** [Data says...]
- **So What:** [Business impact...]
- **Now What:** [Recommended action...]

### Finding 3: [Title]
- **What:** [Data says...]
- **So What:** [Business impact...]
- **Now What:** [Recommended action...]

### Finding 4: [Title]
- **What:** [Data says...]
- **So What:** [Business impact...]
- **Now What:** [Recommended action...]

### Finding 5: [Title]
- **What:** [Data says...]
- **So What:** [Business impact...]
- **Now What:** [Recommended action...]

## Top 3 Recommendations

| # | Recommendation | Expected Impact | Priority |
|---|---------------|----------------|---------|
| 1 | [Action] | [Quantified impact] | 🔴 High |
| 2 | [Action] | [Quantified impact] | 🟡 Medium |
| 3 | [Action] | [Quantified impact] | 🟢 Low |

## Appendix
- Jupyter Notebook: [filename]
- Dashboard: [filename]
- Data Source: [description]

✅ Sprint Completion Checklist

┌──────────────────────────────────────────────────────────────┐
│  ✅ SPRINT COMPLETION — FINAL CHECK                           │
│                                                               │
│  DELIVERABLE 1: JUPYTER NOTEBOOK                              │
│  ☐ File: HoTen_Buoi19_CapstoneEDA.ipynb                     │
│  ☐ Restart & Run All = no errors                             │
│  ☐ Structured sections (≤ 50 cells, organized)               │
│  ☐ 5 business questions answered with evidence               │
│  ☐ Findings documented (What/So What/Now What)               │
│  ☐ Charts titled + labeled                                   │
│                                                               │
│  DELIVERABLE 2: DASHBOARD v1                                  │
│  ☐ File: HoTen_Buoi19_Dashboard.pbix / .twbx                │
│  ☐ 3 pages: Overview + Deep Dive + Insights                 │
│  ☐ KPI cards with period comparison                          │
│  ☐ Filters working                                           │
│  ☐ Peer reviewed — top issues fixed                          │
│                                                               │
│  DELIVERABLE 3: EXECUTIVE SUMMARY DRAFT                       │
│  ☐ File: HoTen_Buoi19_ExecSummary.md                        │
│  ☐ 5 findings (What/So What/Now What)                        │
│  ☐ 3 recommendations with quantified impact                  │
│  ☐ ≤ 1 page                                                 │
│                                                               │
│  PEER REVIEW:                                                 │
│  ☐ Reviewed by: ________________ (score: ___/5)             │
│  ☐ Top 3 feedback items addressed                            │
│                                                               │
│  📌 STATUS: ☐ COMPLETE  ☐ NEEDS ITERATION                   │
└──────────────────────────────────────────────────────────────┘

📊 Grading rubric

Criterion	Excellent (5)	Good (4)	Adequate (3)	Needs Work (2)
Data Cleaning	Complete, documented, no missing	Complete, minor gaps	Partial, some issues	Incomplete, many errors
EDA	Deep, insightful, well-visualized	Good coverage, clear charts	Basic distributions only	Minimal, messy
Business Questions	5 questions, strong evidence + findings	4-5 questions, adequate evidence	3 questions, weak evidence	< 3 questions
Dashboard	3+ pages, interactive, annotated, polished	3 pages, working, clean	2 pages, basic	1 page, broken
Executive Summary	5 findings, quantified recommendations	4-5 findings, some quantification	3 findings, vague recs	< 3 findings
Peer Review	Given + received, all fixes applied	Given + received, most fixes	Received only	Not done
