📘 Session 18: Capstone Project Kickoff (Starting Your Job-Hunting Project)

Your portfolio decides 70% of a DA interview. Build a real project, not a homework exercise.

🎯 Learning Objectives

By the end of this session, you will be able to:

Choose a capstone topic with available data, a business context, and the right scope
Collect a dataset and assess its quality
Build an analysis plan: questions → methods → deliverables → timeline
Set up a project repository (Git) for your portfolio

📋 Overview

Over the previous 17 sessions, you have covered the entire Data Analytics workflow: Excel/Google Sheets (Sessions 2-5), SQL (Session 6), Python + Pandas (Sessions 7-8), EDA + Visualization (Sessions 9-11), Storytelling (Session 12), Business Metrics (Session 13), Industry Analytics (Session 14), A/B Testing (Session 15), Time Series (Session 16), and Machine Learning basics (Session 17). You have the skills. But skills without a portfolio are invisible to recruiters.

Session 18 opens the 3-session Capstone Project series: one project that pulls together everything you have learned into a real portfolio piece, strong enough to convince recruiters that you are a capable DA. This is not homework; it is your job-hunting project.

According to LinkedIn Talent Solutions (2025), 72% of hiring managers check a candidate's portfolio/GitHub before a DA interview, and resumes with a portfolio project get 3.2x the callback rate of resumes that only list skills. A good capstone project is evidence that you can do the work, not just talk about it.

```mermaid
flowchart LR
    A["📥 Skills<br/>Sessions 1-17"] --> B["🎯 Project Kickoff<br/>✅ Session 18: Choose topic + Setup"]
    B --> C["📊 Analysis & Dashboard<br/>Session 19: Analysis + BI"]
    C --> D["🎤 Presentation<br/>Session 20: Present + Portfolio"]
    style B fill:#e8f5e9,stroke:#4caf50,stroke-width:3px
```

💡 Portfolio vs. Resume: What Do Recruiters Think?

Situation	Without a Portfolio	With a Portfolio
Applying for a Junior DA role	"Knows SQL, Python, Tableau" (everyone writes that)	GitHub repo: e-commerce analysis, dashboard, recommendations → interview
Interviewer says "show me your work"	"Um... I don't have a concrete project yet" 😬	"Here's a churn analysis I did, from EDA to model: the insights and what I recommended" → offer
Competing against 200 applicants	1 of 200 near-identical CVs	1 of 15 people with a portfolio → top of the shortlist
Salary negotiation	"I'd like X million VND" with no evidence	"My project produced an insight that saves Y billion VND" → negotiating power

📌 Part 1: Choosing a Capstone Topic (Real Projects, Not Exercises)

5 Criteria for a Good Topic

Not every topic suits a capstone. A good topic meets 5 criteria, remembered by the mnemonic DRBST:

Criterion	Description	❌ Bad example	✅ Good example
D: Data Available	A real dataset, large enough (≥ 1,000 rows) and clean enough	Random data you generated yourself	A 50K-row Kaggle dataset with context
R: Real Business Context	A clear business problem, not "analysis for fun"	"Analyze the Titanic dataset" (old, overused)	"Analyze churn → retention strategy"
B: Business Impact	Insights lead to concrete actions with measurable ROI	"Descriptive statistics of demographics"	"Segment customers → personalized marketing → +15% conversion"
S: 2-Week Scope	Big enough to showcase your skills, small enough to finish in 2 weeks	"Build an end-to-end production ML pipeline"	"EDA + Dashboard + 5 insights + recommendations"
T: Tell-able Story	Can be told as a compelling story in a 5-minute presentation	"Regression on dataset XYZ"	"Why are Gen Z customers leaving, and what's the fix?"

12 Suggested Topics (vetted for capstone fit)

```mermaid
mindmap
  root((Capstone<br/>Ideas))
    E-commerce
      Customer Segmentation
      Sales Forecasting
      Product Recommendation
    Marketing
      Campaign ROI Analysis
      Channel Attribution
      Customer Journey
    HR & People
      Employee Attrition
      Salary Benchmarking
      Hiring Funnel Analysis
    Finance & Operations
      Revenue Forecasting
      Supply Chain Optimization
      Credit Risk Analysis
```

The 5 topics recruiters rate most highly:

#	Topic	Dataset	Business Questions	Tools
1	E-commerce Customer Segmentation	Brazilian E-commerce (Olist), 100K orders	Which segment is most valuable? Retention rate? Upsell opportunities?	Python, SQL, Tableau
2	Marketing Campaign ROI	Marketing Campaign dataset (Kaggle), 50K customers	Which campaign has the highest ROI? Which channel converts best? Budget allocation?	Python, Excel, Power BI
3	HR Employee Attrition	IBM HR Analytics, 1,470 employees	Who is likely to leave? Which factors drive attrition? Cost of turnover?	Python, SQL, Dashboard
4	Sales Forecasting	Retail Sales dataset, 500K transactions	Revenue prediction for Q+1? Seasonal patterns? Product trends?	Python, Time Series, BI
5	Customer Churn Analysis	Telecom/SaaS Churn, 7K-10K customers	Churn rate by segment? Predictive factors? Retention strategies?	Python, ML, Dashboard

Avoid 5 Common Pitfalls

⚠️ Capstone Pitfalls: Don't Fall Into These!

Topic too broad: "Analyze the entire Vietnamese e-commerce industry" → impossible to finish in 2 weeks. Fix: narrow it down, e.g. "Customer segmentation analysis of one specific e-commerce dataset"
Data not available: "Analyze company X's internal data" → you can't get access. Fix: use a public dataset (Kaggle, UCI, government data)
Titanic/Iris syndrome: overused datasets → recruiters have seen them 1,000 times. Fix: choose a less common dataset or take a distinctive approach
EDA only, no insight: "Nice charts, but so what?" Fix: every chart answers one business question, and every analysis leads to a recommendation
No README: a bare repo where nobody can tell what you did. Fix: treat the README as a "landing page" with context, methodology, and key findings

📌 Part 2: Data Collection (Gathering and Evaluating a Dataset)

Reliable Data Sources

Source	Data type	Pros	Cons	Examples
Kaggle	Diverse, structured	Clean, community notebooks, context	Overused; some datasets are synthetic	E-commerce, HR, Marketing
UCI ML Repository	Academic datasets	Well-documented, benchmark	Old, small	Wine Quality, Bank Marketing
Google Dataset Search	Cross-domain	Real-world, diverse	Uneven quality	Government, research data
Government Open Data	Public sector	Authentic, large	Messy; local portals may be Vietnamese-only	data.gov, data.gov.vn
Public APIs	Real-time data	Fresh, scalable	Rate limits, auth required	Twitter/X, Spotify, Weather
Web Scraping	Custom collection	Exactly what you need	Legal risks, maintenance	Tiki, Shopee (check the ToS first!)

Dataset Evaluation Checklist

Before committing to a dataset, score it against this 10-point checklist; a dataset should score ≥ 7/10 to be a good capstone fit:

╔════════════════════════════════════════════════════════════════╗
║             DATASET EVALUATION CHECKLIST (≥ 7/10)              ║
╠════════════════════════════════════════════════════════════════╣
║  1. ☐ Size ≥ 1,000 rows (ideally ≥ 5,000)                      ║
║  2. ☐ ≥ 8 columns, mixing numeric + categorical                ║
║  3. ☐ Has a date/time column for temporal analysis             ║
║  4. ☐ Missing values ≤ 30% overall                             ║
║  5. ☐ Has documentation / data dictionary                      ║
║  6. ☐ Clear business context (you know what the data is about) ║
║  7. ☐ Suitable target variable (if prediction is needed)       ║
║  8. ☐ License permits use (CC, public domain)                  ║
║  9. ☐ Data not too old (≤ 5 years)                             ║
║ 10. ☐ Not overused (≠ Titanic, Iris, Boston Housing)           ║
╚════════════════════════════════════════════════════════════════╝

Quick Data Profiling with Python

```python
import pandas as pd

# Load the dataset (replace with your own file path)
df = pd.read_csv('your_dataset.csv')

# --- DATA AUDIT REPORT ---
print("=" * 60)
print("📊 DATA AUDIT REPORT")
print("=" * 60)

# 1. Shape
print(f"\n📐 Shape: {df.shape[0]:,} rows × {df.shape[1]} columns")

# 2. Data types
print("\n📋 Data Types:")
print(df.dtypes.value_counts())

# 3. Missing values per column, worst first
missing = df.isnull().sum()
missing_pct = (missing / len(df) * 100).round(1)
missing_report = pd.DataFrame({
    'Missing': missing,
    'Percent': missing_pct
}).query('Missing > 0').sort_values('Percent', ascending=False)
print("\n❓ Missing Values:")
print(missing_report if len(missing_report) > 0 else "No missing values!")

# 4. Numeric summary
print("\n📈 Numeric Summary:")
print(df.describe().round(2))

# 5. Categorical summary (text, string, and category columns)
cat_cols = df.select_dtypes(include=['object', 'string', 'category']).columns
for col in cat_cols:
    print(f"\n🏷️ {col}: {df[col].nunique()} unique values")
    print(df[col].value_counts().head(5))

# 6. Duplicate rows
dups = df.duplicated().sum()
print(f"\n🔄 Duplicates: {dups} ({dups/len(df)*100:.1f}%)")

# 7. Minimum-size check: covers checklist item 1 only, not the full 10-point score
print(f"\n{'=' * 60}")
print(f"✅ MINIMUM SIZE (≥ 1,000 rows): {'PASS' if df.shape[0] >= 1000 else 'FAIL'}")
print(f"{'=' * 60}")
```
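
The audit script only checks row count. Items 1 to 4 of the checklist can all be scored by a script, so here is a minimal sketch of a hypothetical helper, `score_auto_criteria`, that does just those four; items 5 to 10 (documentation, context, target, license, age, overuse) still need your own judgment. Treating `object`/`string`/`category` columns as categorical and datetime dtypes as the date column are assumptions, and the toy frame is made-up data.

```python
import pandas as pd


def score_auto_criteria(df: pd.DataFrame) -> dict:
    """Score checklist items 1-4 (the ones a script can check).

    Hypothetical helper: items 5-10 still need a human review.
    """
    n_numeric = df.select_dtypes(include='number').shape[1]
    n_categorical = df.select_dtypes(include=['object', 'string', 'category']).shape[1]
    n_datetime = df.select_dtypes(include='datetime').shape[1]
    # Share of ALL cells that are missing, in percent (100% for an empty frame)
    overall_missing_pct = df.isnull().to_numpy().mean() * 100 if df.size else 100.0

    return {
        '1_size_ge_1000': bool(len(df) >= 1000),
        '2_ge_8_mixed_cols': bool(df.shape[1] >= 8 and n_numeric > 0 and n_categorical > 0),
        '3_has_datetime_col': bool(n_datetime > 0),
        '4_missing_le_30pct': bool(overall_missing_pct <= 30),
    }


# Toy frame: 1,200 rows, 8 columns (5 numeric, 2 text, 1 datetime)
toy = pd.DataFrame({
    'order_date': pd.date_range('2021-01-01', periods=1200, freq='D'),
    'customer_id': [f'C{i % 300}' for i in range(1200)],
    'city': ['Hanoi', 'HCMC', 'Da Nang'] * 400,
    'price': [10.0] * 1200,
    'qty': [1] * 1200,
    'discount': [0.0] * 1200,
    'rating': [5] * 1200,
    'shipping_days': [3] * 1200,
})
scores = score_auto_criteria(toy)
print(scores)
print(f"Auto-checked items passed: {sum(scores.values())}/4")
```

Add the number of passed items here to your manual score for items 5 to 10 to get the total out of 10.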

Data Ethics: Privacy, Consent, Bias

When you collect and use data for your capstone, always consider 3 ethical dimensions:

Dimension	Question to ask	Action
Privacy	Does the data contain PII (names, emails, phone numbers)?	Anonymize it before publishing to GitHub
Consent	Was the data collected with consent?	Prefer public/open datasets
Bias	Is the data representative? Is it skewed by demographics?	Document known limitations in the README

🚨 Never:

Upload data containing PII to a public repo
Scrape data in violation of a site's Terms of Service
Use data from a current or former employer without permission
Ignore license restrictions
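
To act on the Privacy row before pushing data to GitHub, one option is to drop direct identifiers you don't need and replace join keys with salted hashes. This is a minimal sketch with hypothetical column names (`customer_id`, `name`, `email`, `phone`) and a placeholder salt. Note that salted hashing is pseudonymization rather than full anonymization: keep the real salt out of the repo (for example in an environment variable), and check the dataset license before publishing even the transformed data.

```python
import hashlib

import pandas as pd


def pseudonymize(df: pd.DataFrame, drop_cols, hash_cols, salt: str) -> pd.DataFrame:
    """Drop direct identifiers and replace join keys with salted SHA-256 hashes."""
    out = df.drop(columns=[c for c in drop_cols if c in df.columns])
    for col in hash_cols:
        # Same input + same salt -> same hash, so joins and repeat counts still work
        out[col] = out[col].astype(str).map(
            lambda v: hashlib.sha256((salt + v).encode('utf-8')).hexdigest()[:16]
        )
    return out


raw = pd.DataFrame({
    'customer_id': ['C001', 'C002', 'C001'],
    'name': ['An', 'Binh', 'An'],
    'email': ['an@example.com', 'binh@example.com', 'an@example.com'],
    'order_value': [120.0, 80.5, 45.0],
})

# 'replace-with-secret' is a placeholder; load the real salt from an environment variable
safe = pseudonymize(raw, drop_cols=['name', 'email', 'phone'],
                    hash_cols=['customer_id'], salt='replace-with-secret')
print(safe)
```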

📌 Part 3: Project Planning (From Questions to Deliverables)

Writing Good Business Questions

A good capstone starts with 3-5 business questions that are specific, answerable with data, and lead to action:

❌ Bad question	✅ Good question	Why it's good
"What's interesting in the data?"	"Which customer segment has the highest LTV?"	Specific metric (LTV), actionable (where to focus marketing)
"What's the churn rate?"	"How does churn rate differ across contract types, and which factors predict churn?"	Comparison + predictive drivers (prediction, not proof of causation), leads to a retention strategy
"Sales by month"	"What is the revenue growth trend over 12 months? Which seasonal patterns affect inventory planning?"	Temporal analysis + business application
"Demographic statistics"	"Which customers respond best to email marketing campaigns?"	Target audience → improved ROAS
"A correlation chart"	"Which factors most strongly influence employee attrition, and what is the estimated cost per turnover?"	Feature importance + financial impact

Project Brief Template

╔════════════════════════════════════════════════════════════════╗
║                        📋 PROJECT BRIEF                        ║
╠════════════════════════════════════════════════════════════════╣
║ Title: [Short, catchy project name]                            ║
║ Author: [Full name]                                            ║
║ Date: [Start date]                                             ║
╠════════════════════════════════════════════════════════════════╣
║ 🎯 BUSINESS PROBLEM:                                           ║
║ [1-2 sentences describing the business problem]                ║
║                                                                ║
║ 📊 DATASET:                                                    ║
║ - Source: [Kaggle / API / Scraping]                            ║
║ - Size: [X rows × Y columns]                                   ║
║ - Time range: [From - To]                                      ║
║ - Key variables: [List 5-8 important variables]                ║
║                                                                ║
║ ❓ BUSINESS QUESTIONS (3-5):                                   ║
║ 1. [Question 1]                                                ║
║ 2. [Question 2]                                                ║
║ 3. [Question 3]                                                ║
║ 4. [Question 4] (optional)                                     ║
║ 5. [Question 5] (optional)                                     ║
║                                                                ║
║ 🔧 METHODS & TOOLS:                                            ║
║ - Analysis: [EDA, Segmentation, Regression, ...]               ║
║ - Tools: [Python, SQL, Tableau/Power BI]                       ║
║ - Libraries: [pandas, matplotlib, scikit-learn, ...]           ║
║                                                                ║
║ 📦 DELIVERABLES:                                               ║
║ - [ ] Jupyter Notebook (analysis + code)                       ║
║ - [ ] Dashboard (Tableau / Power BI)                           ║
║ - [ ] Presentation (key insights + recommendations)            ║
║ - [ ] GitHub README (project documentation)                    ║
║                                                                ║
║ 📅 TIMELINE:                                                   ║
║ - Week 1: Data cleaning + EDA + initial insights               ║
║ - Week 2: Deep analysis + Dashboard + Presentation             ║
╚════════════════════════════════════════════════════════════════╝

Questions → Methods Mapping

```mermaid
flowchart TD
    Q1["❓ Q1: Which segment<br/>is most valuable?"] --> M1["📊 RFM Analysis<br/>+ Clustering"]
    Q2["❓ Q2: Churn rate<br/>by segment?"] --> M2["📈 Cohort Analysis<br/>+ Survival"]
    Q3["❓ Q3: Revenue<br/>prediction?"] --> M3["🤖 Linear Regression<br/>+ Time Series"]
    Q4["❓ Q4: Is the campaign<br/>effective?"] --> M4["🧪 A/B Test Analysis<br/>+ ROI Calc"]
    Q5["❓ Q5: Which factor<br/>matters most?"] --> M5["🌲 Feature Importance<br/>+ Correlation"]

    M1 --> D["📦 Deliverables"]
    M2 --> D
    M3 --> D
    M4 --> D
    M5 --> D

    D --> D1["📓 Notebook"]
    D --> D2["📊 Dashboard"]
    D --> D3["🎤 Presentation"]
```
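
As one concrete instance of the Q1 → RFM Analysis mapping, the sketch below scores customers on Recency, Frequency, and Monetary value with pandas. The column names (`customer_id`, `order_date`, `amount`), the 3-level scores, and the toy orders are assumptions for illustration; real projects often use quintiles and then cluster or label the resulting segments.

```python
import pandas as pd

orders = pd.DataFrame({
    'customer_id': ['A', 'A', 'B', 'C', 'C', 'C', 'D'],
    'order_date': pd.to_datetime([
        '2024-06-01', '2024-06-20', '2024-01-15', '2024-05-01',
        '2024-06-10', '2024-06-25', '2024-03-03',
    ]),
    'amount': [50.0, 70.0, 20.0, 40.0, 60.0, 90.0, 30.0],
})

# Reference date: one day after the last order in the data
snapshot = orders['order_date'].max() + pd.Timedelta(days=1)

rfm = orders.groupby('customer_id').agg(
    recency=('order_date', lambda d: (snapshot - d.max()).days),
    frequency=('order_date', 'count'),
    monetary=('amount', 'sum'),
)


def score(series: pd.Series, higher_is_better: bool, bins: int = 3) -> pd.Series:
    """Rank-based 1..bins score; rank(method='first') avoids duplicate bin edges."""
    ranks = series.rank(method='first', ascending=higher_is_better)
    return pd.qcut(ranks, bins, labels=list(range(1, bins + 1))).astype(int)


rfm['R'] = score(rfm['recency'], higher_is_better=False)  # fewer days since last order = better
rfm['F'] = score(rfm['frequency'], higher_is_better=True)
rfm['M'] = score(rfm['monetary'], higher_is_better=True)
rfm['RFM_score'] = rfm['R'] + rfm['F'] + rfm['M']
print(rfm.sort_values('RFM_score', ascending=False))
```

Ranking before `qcut` is a design choice: small or tied data (like the two one-order customers here) would otherwise produce duplicate bin edges and an error.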

Detailed Timeline: 2 Weeks

Day	Task	Output	Checklist
Day 1	Chọn dataset + Data audit	Data audit report	☐ Dataset downloaded ☐ Profile complete ☐ Viability confirmed
Day 2	Data cleaning + Wrangling	Clean dataset	☐ Missing handled ☐ Types correct ☐ Outliers checked
Day 3-4	EDA — Univariate + Bivariate	EDA notebook + charts	☐ Distributions ☐ Correlations ☐ Key patterns found
Day 5	Answer Q1-Q2 (descriptive)	Analysis notebook	☐ Segments identified ☐ Metrics calculated
Day 6-7	Answer Q3-Q5 (analytical/predictive)	Model/analysis results	☐ Models trained/analysis done ☐ Results validated
Day 8-9	Dashboard design + build	Interactive dashboard	☐ KPI cards ☐ Charts ☐ Filters ☐ Story flow
Day 10	Key insights + Recommendations	Insight document	☐ 5+ insights ☐ Actions ☐ Expected impact
Day 11-12	Presentation slides	Slide deck 10-15 slides	☐ Problem ☐ Data ☐ Analysis ☐ Insights ☐ Recs
Day 13	Polish GitHub repo + README	Complete repo	☐ README ☐ Code clean ☐ .gitignore ☐ License
Day 14	Review + Practice present	Final deliverables	☐ Peer review ☐ Dry run ☐ All links work
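
To turn the Day 1 to Day 14 plan into calendar dates, a small helper can map each phase onto real dates from your start day. This is a sketch: the phase list copies the table above, and counting calendar days (weekends included) is an assumption you may want to change.

```python
from datetime import date, timedelta

# (first_day, last_day, task), copied from the timeline table
PHASES = [
    (1, 1, 'Choose dataset + data audit'),
    (2, 2, 'Data cleaning + wrangling'),
    (3, 4, 'EDA: univariate + bivariate'),
    (5, 5, 'Answer Q1-Q2 (descriptive)'),
    (6, 7, 'Answer Q3-Q5 (analytical/predictive)'),
    (8, 9, 'Dashboard design + build'),
    (10, 10, 'Key insights + recommendations'),
    (11, 12, 'Presentation slides'),
    (13, 13, 'Polish GitHub repo + README'),
    (14, 14, 'Review + practice presentation'),
]


def build_schedule(start: date):
    """Return (start_date, end_date, task) tuples, where Day 1 == start."""
    return [
        (start + timedelta(days=first - 1), start + timedelta(days=last - 1), task)
        for first, last, task in PHASES
    ]


for begin, end, task in build_schedule(date(2025, 3, 3)):
    span = begin.isoformat() if begin == end else f'{begin.isoformat()} → {end.isoformat()}'
    print(f'{span:<26} {task}')
```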

📌 Phần 4: Portfolio Setup — Git, GitHub, README

Git Basics: 10 Commands You Need

Git is a version control system: it tracks changes to your code, acts as a backup, and (through GitHub) lets you showcase projects to recruiters. For a DA, your GitHub profile works as an online CV.

```bash
# ============================================
# GIT BASICS: THE 10 MOST IMPORTANT COMMANDS
# ============================================

# 1. First-time configuration
git config --global user.name "Your Name"
git config --global user.email "email@example.com"

# 2. Create a new repo
git init

# 3. Check status
git status

# 4. Stage files
git add .                    # Stage all changes in the current folder
git add notebook.ipynb       # Stage one specific file

# 5. Commit (save a snapshot)
git commit -m "Initial commit: add dataset and EDA notebook"

# 6. View commit history
git log --oneline

# 7. Connect a remote (GitHub)
git remote add origin https://github.com/username/project-name.git

# 8. Push to GitHub (name the branch "main" first; older Git versions default to "master")
git branch -M main
git push -u origin main

# 9. Pull (fetch and merge updates from the remote)
git pull origin main

# 10. Show diff: unstaged changes (use "git diff HEAD" to compare with the last commit)
git diff
```

Commit Message Best Practices

```bash
# ❌ Bad:
git commit -m "update"
git commit -m "fix"
git commit -m "stuff"

# ✅ Good, format: <type>: <description>
git commit -m "feat: add EDA notebook with sales distribution analysis"
git commit -m "fix: handle missing values in customer_age column"
git commit -m "docs: update README with project description and methodology"
git commit -m "data: add cleaned dataset v2 with outlier treatment"
git commit -m "viz: create Tableau dashboard for customer segmentation"
```
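
The `<type>: <description>` format loosely follows the Conventional Commits convention, where `feat` and `fix` are standard types and `docs` is a common extra; `data` and `viz` are custom types used in this course. If you want to check a message before committing, here is a minimal sketch of a validator. The allowed types and the 10-character minimum description are assumptions, and hooking it into Git (for example as a `commit-msg` hook) is left out.

```python
import re

ALLOWED_TYPES = ('feat', 'fix', 'docs', 'data', 'viz')
# Builds ^(feat|fix|docs|data|viz): (.{10,})$
PATTERN = re.compile(rf"^({'|'.join(ALLOWED_TYPES)}): (.{{10,}})$")


def is_good_commit_message(message: str) -> bool:
    """Check the first line of a commit message against <type>: <description>."""
    first_line = message.strip().splitlines()[0] if message.strip() else ''
    return PATTERN.match(first_line) is not None


for msg in ['update', 'fix', 'feat: add EDA notebook with sales distribution analysis',
            'docs: update README with project description and methodology']:
    print(f'{is_good_commit_message(msg)!s:<6} {msg}')
```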

GitHub Repository Structure

```
📁 capstone-ecommerce-analysis/
├── 📄 README.md                  ← Landing page: the project's "sales pitch"
├── 📄 .gitignore                 ← Files Git should not track
├── 📄 LICENSE                    ← MIT or CC BY 4.0
├── 📄 requirements.txt           ← Python dependencies (used in "How to Run")
├── 📁 data/
│   ├── 📁 raw/                   ← Original data, NEVER edit
│   │   └── ecommerce_data.csv
│   ├── 📁 processed/             ← Cleaned data
│   │   └── ecommerce_clean.csv
│   └── 📄 data_dictionary.md     ← Describes each column
├── 📁 notebooks/
│   ├── 📄 01_data_cleaning.ipynb
│   ├── 📄 02_eda.ipynb
│   ├── 📄 03_analysis.ipynb
│   └── 📄 04_modeling.ipynb      ← (optional)
├── 📁 dashboards/
│   ├── 📄 dashboard_screenshot.png
│   └── 📄 dashboard_link.md      ← Tableau Public / Power BI link
├── 📁 reports/
│   ├── 📄 presentation.pdf
│   └── 📄 key_insights.md
└── 📁 src/                       ← (optional) Reusable functions
    └── 📄 utils.py
```
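
Creating this layout by hand is tedious, so a short `pathlib` script can scaffold it. The sketch below is hypothetical: it creates the folders and placeholder files from the tree (skipping data files, the screenshot, the PDF, and optional items), writes the smallest valid nbformat-4 JSON into the notebooks so Jupyter can open them, and never overwrites a file that already exists.

```python
import json
from pathlib import Path

FOLDERS = ['data/raw', 'data/processed', 'notebooks', 'dashboards', 'reports', 'src']
FILES = [
    'README.md', '.gitignore', 'LICENSE', 'requirements.txt',
    'data/data_dictionary.md',
    'notebooks/01_data_cleaning.ipynb', 'notebooks/02_eda.ipynb',
    'notebooks/03_analysis.ipynb',
    'dashboards/dashboard_link.md', 'reports/key_insights.md',
]
# Smallest valid nbformat-4 notebook, so the placeholders open in Jupyter
EMPTY_NOTEBOOK = {'cells': [], 'metadata': {}, 'nbformat': 4, 'nbformat_minor': 5}


def scaffold(root: str) -> Path:
    """Create the capstone layout under root without overwriting existing files."""
    base = Path(root)
    for folder in FOLDERS:
        (base / folder).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        path = base / name
        if path.exists():
            continue  # never overwrite work you already have
        if path.suffix == '.ipynb':
            path.write_text(json.dumps(EMPTY_NOTEBOOK, indent=1), encoding='utf-8')
        else:
            path.touch()  # empty placeholder file
    return base
```

Usage, from the folder where you keep your projects: `scaffold('capstone-ecommerce-analysis')`. Filling in `README.md`, `.gitignore`, and `LICENSE` is still up to you.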

`.gitignore` for a DA Project

```gitignore
# Data files: ignored by default; share them via Git LFS or a download link.
# Narrow or remove these patterns if your data is small enough to commit.
*.csv
*.xlsx
*.parquet
data/raw/

# Jupyter checkpoints
.ipynb_checkpoints/

# Python
__pycache__/
*.pyc
*.pyo
.env
venv/
.venv/

# OS files
.DS_Store
Thumbs.db
desktop.ini

# IDE
.vscode/
.idea/

# Tableau
*.twbx
*.hyper
```

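💡 Dataset too large?

If a data file is larger than 100 MB, do NOT push it to GitHub: GitHub blocks files over 100 MiB and warns about files over 50 MiB. Instead:

Add the data folder to .gitignore
In the README, link to where the dataset can be downloaded (Kaggle link, Google Drive)
In the notebook, include the download command: kaggle datasets download -d <owner>/<dataset-name> (prefix it with ! in a Jupyter cell)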
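
Before running `git add .`, it can help to list files that would hit GitHub's size limit. A minimal sketch, assuming you run it from the repo root; the 100 MB threshold mirrors the tip above (measured in MiB, as GitHub does), and skipping the `.git` folder is the only exclusion it applies.

```python
from pathlib import Path

LIMIT_MB = 100  # GitHub rejects any single file larger than 100 MiB


def find_large_files(root: str = '.', limit_mb: float = LIMIT_MB) -> list:
    """Return (relative_path, size_mb) for files above limit_mb, largest first."""
    base = Path(root)
    hits = []
    for path in base.rglob('*'):
        rel = path.relative_to(base)
        if '.git' in rel.parts or not path.is_file():
            continue  # skip Git internals and directories
        size_mb = path.stat().st_size / (1024 * 1024)
        if size_mb > limit_mb:
            hits.append((rel.as_posix(), round(size_mb, 1)))
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Usage: `find_large_files('.')` from the repo root; anything it returns belongs in `.gitignore` or behind a download link.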

README.md Template: A "Landing Page" for Your Project

The README is the first thing a recruiter reads. Within 30 seconds, it should answer 5 questions: What is the problem? What data did you use? How did you analyze it? What did you find? What do you recommend?

```markdown
# 📊 [Project Title: Catchy & Descriptive]

> One-line summary of the project (1-2 sentences)

![Dashboard Screenshot](dashboards/dashboard_screenshot.png)

## 🎯 Business Problem
[2-3 sentences: What is the problem? Why does it matter? What is the impact?]

## 📊 Dataset
| Attribute | Detail |
|-----------|--------|
| **Source** | [Kaggle / UCI / API] (link) |
| **Size** | X rows × Y columns |
| **Period** | Month/Year to Month/Year |
| **Key Variables** | var1, var2, var3, var4, var5 |

## ❓ Key Questions
1. [Business Question 1]
2. [Business Question 2]
3. [Business Question 3]

## 🔧 Tools & Methods
- **Python**: pandas, matplotlib, seaborn, scikit-learn
- **SQL**: PostgreSQL / MySQL queries
- **BI**: Tableau / Power BI dashboard
- **Analysis**: EDA, Segmentation, Regression, A/B Testing

## 📈 Key Findings
1. **Finding 1**: [Insight + metric + so what]
2. **Finding 2**: [Insight + metric + so what]
3. **Finding 3**: [Insight + metric + so what]

## 💡 Recommendations
1. [Action 1]: Expected impact [metric]
2. [Action 2]: Expected impact [metric]
3. [Action 3]: Expected impact [metric]

## 📁 Project Structure
[Describe the folder structure; see the example above]

## 🚀 How to Run
1. Clone the repo: `git clone https://github.com/username/project.git`
2. Install dependencies: `pip install -r requirements.txt`
3. Run the notebooks in order: 01 → 02 → 03 (→ 04 if included)

## 👤 Author
**[Name]**, Data Analyst
- 🔗 [LinkedIn](https://www.linkedin.com/in/your-profile)
- 📧 email@example.com
```
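
To confirm a README covers the template's key sections before you share the link, a small script can look for the `##` headings. A sketch under assumptions: the required section names are taken from the template, and matching is a case-insensitive substring test on lines starting with `## `, so emoji prefixes don't matter.

```python
REQUIRED_SECTIONS = [
    'Business Problem', 'Dataset', 'Key Questions', 'Tools & Methods',
    'Key Findings', 'Recommendations', 'How to Run',
]


def missing_readme_sections(readme_text: str) -> list:
    """Return required section names that have no matching '## ' heading."""
    headings = [
        line.lstrip('#').strip().lower()
        for line in readme_text.splitlines()
        if line.startswith('## ')
    ]
    return [
        section for section in REQUIRED_SECTIONS
        if not any(section.lower() in heading for heading in headings)
    ]
```

Usage: `missing_readme_sections(Path('README.md').read_text(encoding='utf-8'))` (with `from pathlib import Path`) returns an empty list when every section is present.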

LinkedIn & Personal Site Integration

A finished portfolio project should be published in 3 places:

Platform	Content	Purpose
GitHub	Code + Notebook + README	Showcase technical skills
LinkedIn	Summary post + screenshots	Networking + visibility
Tableau Public / Power BI	Interactive dashboard	Showcase BI skills

LinkedIn post template:

```
📊 Just completed my Data Analytics Capstone Project!

🎯 Problem: [1 sentence]
📈 Key Finding: [Most striking insight + metric]
💡 Recommendation: [Action + expected impact]

Tools: Python | SQL | Tableau
🔗 GitHub: [link]
🔗 Dashboard: [link]

#DataAnalytics #Portfolio #DataScience #Python
```

🔑 Session 18 Summary

```mermaid
flowchart TD
    A["1️⃣ Choose a topic<br/>DRBST criteria"] --> B["2️⃣ Collect data<br/>Evaluate dataset"]
    B --> C["3️⃣ Project Planning<br/>Questions → Methods"]
    C --> D["4️⃣ Set up Git repo<br/>README + Structure"]
    D --> E["🎯 Ready for<br/>Session 19: Analysis!"]

    style E fill:#e8f5e9,stroke:#4caf50,stroke-width:3px
```

Key Takeaways

#	Takeaway	Action
1	Portfolio > Resume	Have at least 1 capstone project on GitHub
2	Good topic = DRBST	Data, Real context, Business impact, 2-week Scope, Tell-able story
3	Audit data BEFORE analysis	10-point checklist; use the dataset only if it scores ≥ 7/10
4	3-5 business questions	Specific, data-answerable, actionable
5	README = Landing page	30 seconds to impress a recruiter
6	Git workflow	init → add → commit → push, and commit often!

Preparing for Session 19

Before Session 19 (Analysis & Dashboard), make sure you have:

☐ Chosen a topic that meets all 5 DRBST criteria
☐ Downloaded the dataset and written a data audit report (≥ 7/10 on the checklist)
☐ Written a Project Brief (3-5 questions + methods + tools)
☐ Set up a Git repo and pushed README.md to GitHub
☐ Done basic data cleaning (missing values, types, outliers)

