🛠 Workshop: Processing Data Files with Python

Start from a raw sales CSV file, write 5 Python functions, and export a complete report to CSV and JSON, all inside a Jupyter Notebook.

🎯 Workshop goals

After completing this workshop, you will be able to:

Set up Jupyter Notebook or Google Colab so you are ready to write Python
Read a CSV file with csv.DictReader and understand the resulting dict-based data structure
Write 5 functions that analyze sales revenue, applying what you learned in Session 7
Write the results to a new CSV file and a JSON file, completing the input → process → output pipeline
Follow PEP 8: clean code, docstrings, and clear variable names

🧰 Requirements

Requirement	Details
Knowledge	Completed Session 7: data types, control flow, functions, file handling
Tools	Jupyter Notebook (local) OR Google Colab (online)
Python	Python 3.8+ (preinstalled on Colab)
Libraries	Standard library only: `csv`, `json`, `os` (no Pandas needed)
Time	60–90 minutes

💡 Why use the standard library?

This workshop deliberately uses csv + json instead of Pandas so that you understand the fundamentals of file processing. When you learn Pandas in Sessions 8–9, you will see how much it simplifies, and knowing what happens behind the scenes will help you debug more effectively.

📦 Dataset: DataMart sales data

Description

The file sales_data.csv contains sales data for the online store DataMart for January 2026: 20 orders, each with product, category, price, quantity, status, order date, and region.

CSV file structure

Column	Type	Description	Example
`order_id`	int	Order ID	1001
`product_name`	str	Product name	Tai nghe Bluetooth
`category`	str	Category	Điện Tử
`price`	float	Unit price (VNĐ)	350000
`quantity`	int	Quantity	2
`status`	str	Status	completed / cancelled
`order_date`	str	Order date	2026-01-05
`region`	str	Region	Bắc / Nam / Trung
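
The column types in the table above can also be written down as data. The following is a minimal sketch and not part of the workshop's required code: `COLUMN_TYPES` and `convert_row` are hypothetical names introduced here for illustration. Each column is mapped to the Python callable that converts its raw string value.

```python
# Hypothetical helper (not part of the workshop solution): describe the
# CSV schema as a dict that maps each column name to its converter.
COLUMN_TYPES = {
    "order_id": int,
    "product_name": str,
    "category": str,
    "price": float,
    "quantity": int,
    "status": str,
    "order_date": str,
    "region": str,
}


def convert_row(raw_row):
    """Return a new dict with every value converted to its declared type."""
    return {
        column: convert(raw_row[column])
        for column, convert in COLUMN_TYPES.items()
    }


# A raw row as the csv module would deliver it: every value is a string.
raw = {
    "order_id": "1001",
    "product_name": "Tai nghe Bluetooth",
    "category": "Điện Tử",
    "price": "350000",
    "quantity": "2",
    "status": "completed",
    "order_date": "2026-01-03",
    "region": "Bắc",
}
order = convert_row(raw)
print(order["price"] * order["quantity"])  # 700000.0
```

Keeping the schema in one dict means a new column only needs one new entry, at the cost of being less explicit than converting each field by hand.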

Sample Data

Create the file sales_data.csv with the following content (copy it into a Jupyter cell or create the file manually):

python

# Cell 1: Create the sample CSV file
csv_content = """order_id,product_name,category,price,quantity,status,order_date,region
1001,Tai nghe Bluetooth,Điện Tử,350000,2,completed,2026-01-03,Bắc
1002,Áo thun nam basic,Thời Trang,250000,3,completed,2026-01-03,Nam
1003,Cà phê rang xay 500g,Thực Phẩm,120000,5,completed,2026-01-05,Trung
1004,Sạc dự phòng 10000mAh,Điện Tử,280000,1,cancelled,2026-01-05,Bắc
1005,Nồi cơm điện 1.8L,Gia Dụng,850000,1,completed,2026-01-06,Nam
1006,Quần jean slim fit,Thời Trang,450000,2,completed,2026-01-07,Bắc
1007,Kem chống nắng SPF50,Mỹ Phẩm,320000,3,completed,2026-01-08,Trung
1008,Bình giữ nhiệt 500ml,Gia Dụng,180000,4,completed,2026-01-08,Nam
1009,Laptop văn phòng 14",Điện Tử,12500000,1,completed,2026-01-10,Bắc
1010,Trà oolong hộp 100 túi,Thực Phẩm,95000,6,cancelled,2026-01-10,Trung
1011,Sữa rửa mặt trà xanh,Mỹ Phẩm,150000,2,completed,2026-01-12,Nam
1012,Áo thun nam basic,Thời Trang,250000,1,completed,2026-01-13,Trung
1013,Tai nghe Bluetooth,Điện Tử,350000,1,completed,2026-01-14,Nam
1014,Bàn phím cơ gaming,Điện Tử,1200000,1,completed,2026-01-15,Bắc
1015,Cà phê rang xay 500g,Thực Phẩm,120000,3,completed,2026-01-16,Bắc
1016,Nồi cơm điện 1.8L,Gia Dụng,850000,1,cancelled,2026-01-17,Trung
1017,Quần jean slim fit,Thời Trang,450000,1,completed,2026-01-18,Nam
1018,Kem chống nắng SPF50,Mỹ Phẩm,320000,2,completed,2026-01-20,Bắc
1019,Sạc dự phòng 10000mAh,Điện Tử,280000,2,completed,2026-01-22,Nam
1020,Bình giữ nhiệt 500ml,Gia Dụng,180000,3,completed,2026-01-25,Trung"""

with open("sales_data.csv", "w", encoding="utf-8") as f:
    f.write(csv_content)

print("✅ File sales_data.csv created successfully!")

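⚠️ Encoding

Always pass encoding="utf-8" when reading or writing files that contain Vietnamese text. Without it, Python falls back to the platform's default encoding (often not UTF-8 on Windows), which can raise UnicodeDecodeError or display garbled characters.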
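
Both failure modes can be reproduced in memory without touching any file. This is a small sketch under the assumption that encoding and decoding a string behaves the same way as writing and reading a text file with that encoding. The choice of "latin-1" and "ascii" as the mismatched encodings is only for illustration.

```python
text = "Điện Tử"

# Encode to bytes the way open(..., "w", encoding="utf-8") would.
raw_bytes = text.encode("utf-8")

# Decoding with the same encoding restores the original text.
round_trip = raw_bytes.decode("utf-8")
print(round_trip == text)  # True

# Decoding with a different single-byte encoding does not fail,
# but produces garbled characters instead of Vietnamese letters.
garbled = raw_bytes.decode("latin-1")
print(garbled == text)  # False

# Other encodings cannot decode these bytes at all and raise an error.
try:
    raw_bytes.decode("ascii")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
print(decode_failed)  # True
```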

Part 1: Environment setup

Option A: Google Colab (recommended for beginners)

Go to colab.research.google.com
Click "New Notebook" and name it HoTen_Buoi07_Workshop.ipynb (replace HoTen with your full name)
Copy Cell 1 (create the CSV) from above, paste it into the first cell, and press Shift + Enter to run it
Verify: run !cat sales_data.csv and check that the file contents are printed

Option B: Jupyter Notebook (local)

Open Terminal / Command Prompt
Run: pip install jupyter (if not already installed)
Run: jupyter notebook (the browser opens automatically)
Click "New" → "Python 3" and name the notebook
Copy Cell 1 and run it

💡 Notebook structure

Split the notebook into clearly separated cells:

Cell 1: Create the sample CSV
Cell 2: Import libraries
Cell 3: Read the file
Cell 4–8: The 5 analysis functions
Cell 9–10: Export the results
Add Markdown cells between code cells for notes

Part 2: Reading the CSV file

Step 2.1: Import libraries

python

# Cell 2: Import libraries
import csv
import json
import os

Step 2.2: Read the CSV with DictReader

python

# Cell 3: Read the CSV file
def read_sales_data(filename):
    """Read the sales CSV file and return a list of dicts.

    Args:
        filename (str): Path to the CSV file.

    Returns:
        list[dict]: List of orders, one dict per order.
    """
    orders = []
    with open(filename, "r", encoding="utf-8") as file:
        reader = csv.DictReader(file)
        for row in reader:
            # DictReader yields every value as a string, so convert types here
            order = {
                "order_id": int(row["order_id"]),
                "product_name": row["product_name"],
                "category": row["category"],
                "price": float(row["price"]),
                "quantity": int(row["quantity"]),
                "status": row["status"],
                "order_date": row["order_date"],
                "region": row["region"],
            }
            orders.append(order)
    return orders


# Read the data
sales = read_sales_data("sales_data.csv")

# Sanity check
print(f"📊 Total orders: {len(sales)}")
print(f"📋 First order: {sales[0]}")
print(f"📋 Columns: {list(sales[0].keys())}")

Expected output:

📊 Total orders: 20
📋 First order: {'order_id': 1001, 'product_name': 'Tai nghe Bluetooth', 'category': 'Điện Tử', 'price': 350000.0, 'quantity': 2, 'status': 'completed', 'order_date': '2026-01-03', 'region': 'Bắc'}
📋 Columns: ['order_id', 'product_name', 'category', 'price', 'quantity', 'status', 'order_date', 'region']
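
The type conversions in the reader function are needed because csv.DictReader returns every value as a string. A minimal sketch of that behavior follows. It uses io.StringIO as an in-memory stand-in for a file, which is an assumption made here so the example runs without touching the disk (io is standard library, but outside the three modules this workshop imports).

```python
import csv
import io

# In-memory stand-in for a CSV file with a header row and one data row.
sample = io.StringIO("order_id,price,quantity\n1001,350000,2\n")

reader = csv.DictReader(sample)
row = next(reader)

print(reader.fieldnames)   # ['order_id', 'price', 'quantity']
print(type(row["price"]))  # <class 'str'>

# Multiplying the two raw strings would raise TypeError; convert first.
revenue = float(row["price"]) * int(row["quantity"])
print(revenue)             # 700000.0
```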

⚠️ Common errors when reading CSV

FileNotFoundError: the file is not in the notebook's working directory → check with os.path.exists(filename)
KeyError: a column name is misspelled → print(row.keys()) for the first row to inspect the actual names
ValueError during type conversion: a row has an empty value → add if row["price"]: before converting
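
The KeyError case can be caught before any row is processed by comparing the header against the columns the code expects. The sketch below is one possible approach, not the workshop's required solution. `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names, and io.StringIO stands in for real files.

```python
import csv
import io

# Assumption for this sketch: these are the columns the analysis needs.
REQUIRED_COLUMNS = {"order_id", "price", "quantity", "status"}


def missing_columns(file_obj):
    """Return the sorted list of required columns absent from the CSV header."""
    reader = csv.DictReader(file_obj)
    header = reader.fieldnames or []  # fieldnames is None for an empty file
    return sorted(REQUIRED_COLUMNS - set(header))


good = io.StringIO("order_id,price,quantity,status\n1001,350000,2,completed\n")
bad = io.StringIO("order_id,prise,quantity,status\n1001,350000,2,completed\n")

good_result = missing_columns(good)
bad_result = missing_columns(bad)
print(good_result)  # []
print(bad_result)   # ['price']
```

Reporting the missing names up front gives a clearer message than a KeyError raised from inside the row loop.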

Part 3: Writing the analysis functions

Function 1: `total_revenue()` (total revenue)

python

# Cell 4: Function 1, total revenue
def total_revenue(orders):
    """Compute total revenue from completed orders.

    Revenue = price × quantity for each order with status == 'completed'.

    Args:
        orders (list[dict]): List of orders.

    Returns:
        float: Total revenue (VNĐ).
    """
    total = 0.0
    for order in orders:
        if order["status"] == "completed":
            total += order["price"] * order["quantity"]
    return total


# Test
revenue = total_revenue(sales)
print(f"💰 Total revenue (completed): {revenue:,.0f} VNĐ")

Expected output:

💰 Total revenue (completed): 22,630,000 VNĐ
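
The same formula can be written more compactly with sum() and a generator expression. This is an alternative sketch, not a replacement for the loop version: `total_revenue_compact` is a hypothetical name, and the three sample orders are a hand-picked subset of the dataset (orders 1001, 1002, and 1004) so the arithmetic is easy to check by hand.

```python
def total_revenue_compact(orders):
    """Sum price * quantity over completed orders using a generator expression."""
    return sum(
        order["price"] * order["quantity"]
        for order in orders
        if order["status"] == "completed"
    )


sample_orders = [
    {"price": 350000.0, "quantity": 2, "status": "completed"},  # 700,000
    {"price": 250000.0, "quantity": 3, "status": "completed"},  # 750,000
    {"price": 280000.0, "quantity": 1, "status": "cancelled"},  # excluded
]

sample_total = total_revenue_compact(sample_orders)
print(f"{sample_total:,.0f} VNĐ")  # 1,450,000 VNĐ
```

One difference from the loop version: for an empty list, sum() returns the integer 0 rather than the float 0.0.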

Function 2: `avg_order_value()` (average order value)

python

# Cell 5: Function 2, average order value
def avg_order_value(orders):
    """Compute the average value of a completed order.

    Args:
        orders (list[dict]): List of orders.

    Returns:
        float: Average value (VNĐ). Returns 0.0 if there are no completed orders.
    """
    total = 0.0
    count = 0
    for order in orders:
        if order["status"] == "completed":
            total += order["price"] * order["quantity"]
            count += 1

    # Avoid division by zero when nothing is completed
    if count == 0:
        return 0.0
    return total / count


# Test
avg = avg_order_value(sales)
print(f"📈 Average order value: {avg:,.0f} VNĐ")

Expected output:

📈 Average order value: 1,331,176 VNĐ

Function 3: `top_product()` (top product by revenue)

python

# Cell 6: Function 3, product with the highest revenue
def top_product(orders):
    """Find the product with the highest total revenue (completed orders only).

    Revenue is aggregated by product name, then the product with the
    highest total is selected.

    Args:
        orders (list[dict]): List of orders.

    Returns:
        dict: {"name": str, "revenue": float, "quantity_sold": int}.
              "name" is None if there are no completed orders.
    """
    product_stats = {}

    for order in orders:
        if order["status"] == "completed":
            name = order["product_name"]
            revenue = order["price"] * order["quantity"]
            qty = order["quantity"]

            if name not in product_stats:
                product_stats[name] = {"revenue": 0.0, "quantity_sold": 0}

            product_stats[name]["revenue"] += revenue
            product_stats[name]["quantity_sold"] += qty

    # Edge case: no completed orders, so there is nothing to rank
    if not product_stats:
        return {"name": None, "revenue": 0.0, "quantity_sold": 0}

    # Pick the product name whose aggregated revenue is highest
    best_product = max(product_stats, key=lambda name: product_stats[name]["revenue"])

    return {
        "name": best_product,
        "revenue": product_stats[best_product]["revenue"],
        "quantity_sold": product_stats[best_product]["quantity_sold"],
    }


# Test
top = top_product(sales)
print(f"🏆 Top product by revenue: {top['name']}")
print(f"   Revenue: {top['revenue']:,.0f} VNĐ")
print(f"   Units sold: {top['quantity_sold']}")

Expected output:

🏆 Top product by revenue: Laptop văn phòng 14"
   Revenue: 12,500,000 VNĐ
   Units sold: 1
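
Once revenue has been aggregated per product, the "find the highest" step is a lookup over a dict. The sketch below isolates that step on a small, hand-computed dict of per-product totals (three products from the dataset, completed orders only). The names `product_revenue`, `best`, and `ranking` are introduced here for illustration.

```python
# Per-product revenue for three products (completed orders only).
product_revenue = {
    "Tai nghe Bluetooth": 1_050_000.0,    # 700,000 + 350,000
    "Bàn phím cơ gaming": 1_200_000.0,
    "Cà phê rang xay 500g": 960_000.0,    # 600,000 + 360,000
}

# max() over a dict iterates its keys; key= tells it what to compare.
best = max(product_revenue, key=product_revenue.get)
print(best)  # Bàn phím cơ gaming

# default= avoids ValueError when the dict is empty.
nothing = max({}, key=lambda name: 0, default=None)
print(nothing)  # None

# sorted() with the same key gives a full ranking, highest first.
ranking = sorted(product_revenue, key=product_revenue.get, reverse=True)
print(ranking[:2])  # ['Bàn phím cơ gaming', 'Tai nghe Bluetooth']
```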

Function 4: `revenue_by_category()` (revenue by category)

python

# Cell 7: Function 4, revenue by category
def revenue_by_category(orders):
    """Compute total revenue per product category (completed orders only).

    Args:
        orders (list[dict]): List of orders.

    Returns:
        dict: {category_name: {"revenue": float, "order_count": int}}
    """
    categories = {}

    for order in orders:
        if order["status"] == "completed":
            cat = order["category"]
            revenue = order["price"] * order["quantity"]

            # First time this category appears: start its counters at zero
            if cat not in categories:
                categories[cat] = {"revenue": 0.0, "order_count": 0}

            categories[cat]["revenue"] += revenue
            categories[cat]["order_count"] += 1

    return categories


# Test
cat_revenue = revenue_by_category(sales)
print("📊 Revenue by category:")
print("-" * 50)
for cat, stats in sorted(cat_revenue.items(), key=lambda x: x[1]["revenue"], reverse=True):
    print(f"  {cat:15s} | {stats['revenue']:>15,.0f} VNĐ | {stats['order_count']} orders")

Expected output:

📊 Revenue by category:
--------------------------------------------------
  Điện Tử         |      15,310,000 VNĐ | 5 orders
  Thời Trang      |       2,350,000 VNĐ | 4 orders
  Gia Dụng        |       2,110,000 VNĐ | 3 orders
  Mỹ Phẩm         |       1,900,000 VNĐ | 3 orders
  Thực Phẩm       |         960,000 VNĐ | 2 orders
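
The "create the entry if it is missing, then update it" pattern used for grouping can also be expressed with collections.defaultdict. The following is an alternative sketch, not the workshop's required implementation. `revenue_by_key` is a hypothetical generalization that groups by any column, and collections is standard library but outside the three modules listed in the requirements.

```python
from collections import defaultdict


def revenue_by_key(orders, key):
    """Group completed orders by `key` and total their revenue and count."""
    groups = defaultdict(lambda: {"revenue": 0.0, "order_count": 0})
    for order in orders:
        if order["status"] != "completed":
            continue
        stats = groups[order[key]]  # entry is created on first access
        stats["revenue"] += order["price"] * order["quantity"]
        stats["order_count"] += 1
    return dict(groups)


sample_orders = [
    {"category": "Điện Tử", "region": "Bắc", "price": 350000.0, "quantity": 2, "status": "completed"},
    {"category": "Điện Tử", "region": "Bắc", "price": 280000.0, "quantity": 1, "status": "cancelled"},
    {"category": "Thời Trang", "region": "Nam", "price": 250000.0, "quantity": 3, "status": "completed"},
]

by_category = revenue_by_key(sample_orders, "category")
by_region = revenue_by_key(sample_orders, "region")
print(by_category)
print(by_region)
```

Passing the column name as a parameter lets one function serve both the per-category and the per-region breakdown.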

Function 5: `classify_orders()` (order classification)

python

# Cell 8: Function 5, classify orders by value
# Classification thresholds (named constants instead of magic numbers)
VIP_THRESHOLD = 5_000_000
STANDARD_THRESHOLD = 1_000_000


def classify_orders(orders):
    """Classify completed orders by value into VIP / Standard / Basic.

    Criteria:
    - VIP: revenue >= 5,000,000 VNĐ
    - Standard: 1,000,000 <= revenue < 5,000,000
    - Basic: revenue < 1,000,000

    Args:
        orders (list[dict]): List of orders.

    Returns:
        dict: {"VIP": list, "Standard": list, "Basic": list}
              Each list contains dicts: {"order_id", "product_name", "revenue"}
    """
    classified = {"VIP": [], "Standard": [], "Basic": []}

    for order in orders:
        # Skip cancelled orders
        if order["status"] != "completed":
            continue

        revenue = order["price"] * order["quantity"]
        order_info = {
            "order_id": order["order_id"],
            "product_name": order["product_name"],
            "revenue": revenue,
        }

        if revenue >= VIP_THRESHOLD:
            classified["VIP"].append(order_info)
        elif revenue >= STANDARD_THRESHOLD:
            classified["Standard"].append(order_info)
        else:
            classified["Basic"].append(order_info)

    return classified


# Test
result = classify_orders(sales)
for tier, orders_list in result.items():
    print(f"\n{'='*50}")
    print(f"📦 {tier} (orders: {len(orders_list)})")
    print(f"{'='*50}")
    for o in orders_list:
        print(f"  #{o['order_id']} | {o['product_name']:25s} | {o['revenue']:>12,.0f} VNĐ")

Expected output:

==================================================
📦 VIP (orders: 1)
==================================================
  #1009 | Laptop văn phòng 14"      |   12,500,000 VNĐ

==================================================
📦 Standard (orders: 1)
==================================================
  #1014 | Bàn phím cơ gaming        |    1,200,000 VNĐ

==================================================
📦 Basic (orders: 15)
==================================================
  #1001 | Tai nghe Bluetooth        |      700,000 VNĐ
  ...
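
The tier rules are easiest to verify at their boundaries. The sketch below pulls the decision into a small helper and checks values just above and below each threshold. `get_tier` is a hypothetical name introduced for illustration, and the thresholds are repeated so the block runs on its own.

```python
VIP_THRESHOLD = 5_000_000
STANDARD_THRESHOLD = 1_000_000


def get_tier(revenue):
    """Return 'VIP', 'Standard', or 'Basic' for a single order's revenue."""
    if revenue >= VIP_THRESHOLD:
        return "VIP"
    if revenue >= STANDARD_THRESHOLD:
        return "Standard"
    return "Basic"


# Boundary values: a revenue equal to a threshold belongs to the higher tier.
for value in (5_000_000, 4_999_999, 1_000_000, 999_999, 960_000):
    print(f"{value:>10,} -> {get_tier(value)}")
```

For example, order 1007 (320,000 × 3 = 960,000 VNĐ) falls below the Standard threshold and is therefore Basic. A helper like this could also be reused wherever the same thresholds are applied, so the rules live in one place.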

Part 4: Exporting the results

4.1: Write the summary report to JSON

python

# Cell 9: Export the JSON report
def export_json_report(orders, output_filename):
    """Build the summary report and write it to a JSON file.

    Args:
        orders (list[dict]): List of orders.
        output_filename (str): Name of the output JSON file.

    Returns:
        dict: The report that was written to the file.
    """
    # Collect all metrics
    top = top_product(orders)
    cat_rev = revenue_by_category(orders)
    classified = classify_orders(orders)

    report = {
        "report_title": "DataMart sales report - January 2026",
        "generated_date": "2026-02-18",
        "summary": {
            "total_revenue": total_revenue(orders),
            "avg_order_value": round(avg_order_value(orders), 0),
            "total_orders": len(orders),
            "completed_orders": sum(1 for o in orders if o["status"] == "completed"),
            "cancelled_orders": sum(1 for o in orders if o["status"] == "cancelled"),
        },
        "top_product": top,
        "revenue_by_category": {
            cat: {"revenue": stats["revenue"], "order_count": stats["order_count"]}
            for cat, stats in cat_rev.items()
        },
        # Store only the number of orders in each tier
        "order_classification": {
            tier: len(order_list)
            for tier, order_list in classified.items()
        },
    }

    with open(output_filename, "w", encoding="utf-8") as file:
        json.dump(report, file, ensure_ascii=False, indent=2)

    print(f"✅ JSON report saved: {output_filename}")
    return report


# Export the file
report = export_json_report(sales, "sales_report.json")

# Display the contents
print("\n📄 Report contents:")
print(json.dumps(report, ensure_ascii=False, indent=2))

💡 ensure_ascii=False

When writing JSON that contains Vietnamese text, pass ensure_ascii=False. Otherwise "Điện Tử" is written as "\u0110i\u1ec7n T\u1eed", which is still valid JSON but unreadable for humans. Add indent=2 to make the file easier to read.
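
The difference is easy to see with json.dumps, which returns the JSON text as a string instead of writing it to a file. This is a small illustrative sketch, and the variable names are chosen here for the example.

```python
import json

data = {"category": "Điện Tử"}

escaped = json.dumps(data)                      # default: ensure_ascii=True
readable = json.dumps(data, ensure_ascii=False)

print(escaped)   # non-ASCII characters appear as \uXXXX escapes
print(readable)  # Vietnamese characters are kept as-is

# Both strings decode back to the same Python object.
print(json.loads(escaped) == json.loads(readable))  # True
```

Since both forms load to identical data, the setting only affects how readable the file is, not what it contains.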

4.2: Write the classification details to a new CSV

python

# Cell 10: Export the classified orders to CSV
def export_classified_csv(orders, output_filename):
    """Write the list of classified orders to a CSV file.

    Args:
        orders (list[dict]): Original list of orders.
        output_filename (str): Name of the output CSV file.
    """
    fieldnames = ["order_id", "product_name", "category", "revenue", "tier"]

    with open(output_filename, "w", encoding="utf-8", newline="") as file:
        writer = csv.DictWriter(file, fieldnames=fieldnames)
        writer.writeheader()

        for order in orders:
            # Only completed orders are classified
            if order["status"] != "completed":
                continue

            revenue = order["price"] * order["quantity"]

            # Same thresholds as classify_orders()
            if revenue >= VIP_THRESHOLD:
                tier = "VIP"
            elif revenue >= STANDARD_THRESHOLD:
                tier = "Standard"
            else:
                tier = "Basic"

            writer.writerow({
                "order_id": order["order_id"],
                "product_name": order["product_name"],
                "category": order["category"],
                "revenue": revenue,
                "tier": tier,
            })

    print(f"✅ Classified CSV file saved: {output_filename}")


# Export the file
export_classified_csv(sales, "classified_orders.csv")

# Verify: read back the file that was just written
print("\n📄 Contents of classified_orders.csv:")
with open("classified_orders.csv", "r", encoding="utf-8") as f:
    print(f.read())

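⚠️ newline="" when writing CSV

Always pass newline="" to open() when writing CSV. Without it, on Windows the \r\n that the csv module writes at the end of each row is translated to \r\r\n, which shows up as blank lines between data rows. This is the convention recommended by the documentation of Python's csv module.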
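
To see what the csv module actually writes, the file can be read back in binary mode. The sketch below is an illustration only: it writes a two-row file into a temporary directory (a throwaway location chosen for this example, using the standard tempfile module, which is outside the three modules this workshop imports) and prints the raw bytes.

```python
import csv
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.csv")

    # newline="" passes the writer's own "\r\n" row endings through unchanged.
    with open(path, "w", encoding="utf-8", newline="") as file:
        writer = csv.writer(file)
        writer.writerow(["order_id", "tier"])
        writer.writerow([1009, "VIP"])

    # Binary mode shows the exact bytes on disk, with no newline translation.
    with open(path, "rb") as file:
        raw = file.read()

print(raw)  # b'order_id,tier\r\n1009,VIP\r\n'
```

Each row ends with exactly one \r\n. The doubled \r\r\n only appears when newline="" is omitted on a platform that translates \n on write.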

Part 5: Bonus challenges 🚀

Finished the main part? Try these advanced challenges:

Bonus 1: Revenue by region

python

# Bonus 1: Function that computes revenue per region
def revenue_by_region(orders):
    """Compute total revenue per region (Bắc, Trung, Nam), completed orders only."""
    regions = {}
    for order in orders:
        if order["status"] == "completed":
            region = order["region"]
            revenue = order["price"] * order["quantity"]
            if region not in regions:
                regions[region] = 0.0
            regions[region] += revenue
    return regions

# Test
region_rev = revenue_by_region(sales)
for region, rev in sorted(region_rev.items(), key=lambda x: x[1], reverse=True):
    print(f"  {region}: {rev:>15,.0f} VNĐ")

Bonus 2: Find the day with the highest revenue

python

# Bonus 2: Function that finds the best sales day
def best_sales_day(orders):
    """Find the day with the highest total revenue (completed orders only).

    Returns {"date": None, "revenue": 0.0} if there are no completed orders.
    """
    daily = {}
    for order in orders:
        if order["status"] == "completed":
            date = order["order_date"]
            revenue = order["price"] * order["quantity"]
            if date not in daily:
                daily[date] = 0.0
            daily[date] += revenue

    # max() raises ValueError on an empty dict, so handle that case first
    if not daily:
        return {"date": None, "revenue": 0.0}

    best_date = max(daily, key=daily.get)
    return {"date": best_date, "revenue": daily[best_date]}

# Test
best = best_sales_day(sales)
print(f"📅 Best sales day: {best['date']} ({best['revenue']:,.0f} VNĐ)")

Bonus 3: Error handling with try/except

python

# Bonus 3: Add error handling to the CSV reader function
def read_sales_data_safe(filename):
    """Read the CSV with error handling for missing files and bad rows."""
    if not os.path.exists(filename):
        print(f"❌ File not found: {filename}")
        return []

    orders = []
    error_rows = 0

    try:
        with open(filename, "r", encoding="utf-8") as file:
            reader = csv.DictReader(file)
            # start=2 because line 1 of the file is the header
            for i, row in enumerate(reader, start=2):
                try:
                    order = {
                        "order_id": int(row["order_id"]),
                        "product_name": row["product_name"],
                        "category": row["category"],
                        "price": float(row["price"]),
                        "quantity": int(row["quantity"]),
                        "status": row["status"],
                        "order_date": row["order_date"],
                        "region": row["region"],
                    }
                    orders.append(order)
                # TypeError covers short rows, where missing values are None
                except (ValueError, KeyError, TypeError) as e:
                    error_rows += 1
                    print(f"⚠️ Error on line {i}: {e}")
    except Exception as e:
        print(f"❌ Error reading file: {e}")
        return []

    print(f"✅ Read successfully: {len(orders)} orders | Errors: {error_rows} rows")
    return orders

# Test
safe_data = read_sales_data_safe("sales_data.csv")
safe_data_bad = read_sales_data_safe("khong_ton_tai.csv")

📋 Deliverables

Files to submit

#	File	Description
1	`HoTen_Buoi07_Workshop.ipynb`	Completed Jupyter Notebook
2	`sales_data.csv`	Input CSV file
3	`sales_report.json`	JSON summary report
4	`classified_orders.csv`	CSV file of classified orders

Notebook requirements

[ ] All cells run successfully (no errors)
[ ] A Markdown cell explains each code cell before it
[ ] The 5 required functions work correctly
[ ] Output files (JSON + CSV) are created successfully
[ ] Code follows PEP 8 (snake_case variable names, 4-space indentation)
[ ] Every function has a docstring

📊 Rubric: grading criteria

Criterion	Excellent (9–10)	Good (7–8)	Pass (5–6)	Fail (< 5)
Functions (40%)	5/5 functions correct + docstrings + edge cases handled	5/5 correct + docstrings	4/5 correct	Fewer than 4 functions working
File I/O (25%)	Reads CSV + writes JSON + CSV completely, with error handling	Reads + writes successfully, no error handling	Reads CSV OK, file writing incomplete	Cannot read/write files
Code Quality (20%)	Full PEP 8 + meaningful names + no magic numbers	Basic PEP 8 + acceptable variable names	Code runs but does not follow PEP 8	Messy, hard-to-read code
Notebook (15%)	Clear Markdown explanations + clean output + bonus challenges	Has Markdown + output	Code only, no explanations	Missing cells, has errors

💡 Tips for a high score

Add a Markdown cell before each code cell explaining what you are doing
Re-run the whole notebook from the top (Kernel → Restart & Run All) to make sure no cell depends on out-of-order execution
Do the bonus challenges to show you have mastered the material and earn bonus points
Use English variable names: total_revenue rather than tong_doanh_thu
