Python ETL Pipeline (Polars + DuckDB)

Rules for building data pipelines with Python, Polars, DuckDB, and Celery, with FastAPI orchestration and strict type safety.

# DataFlow Pipeline - CLAUDE.md

## Tech Stack
- Python 3.12 with FastAPI and Celery
- PostgreSQL for data warehousing, Redis as message broker
- Analytics: Polars (prefer over Pandas for performance) and DuckDB
- Package manager: uv (prefer over Poetry for speed)

## Commands
- Install deps: uv sync
- Run API: uv run fastapi dev
- Run tests: uv run pytest
- Type check: uv run mypy src/
- Format: uv run ruff format
- Lint: uv run ruff check --fix
- Local infra: docker compose up -d (PostgreSQL + Redis)

## Architecture
- API routes: Handle HTTP requests, validate input with Pydantic
- Services: Business logic layer, orchestrates operations
- Repositories: Data access layer, database queries
- Workers: Celery tasks for background ETL jobs
- ETL phases: Extract (API/file ingestion) → Transform (Polars) → Load (PostgreSQL/DuckDB)
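
The layering above could be sketched as follows. This is a minimal illustration using only the standard library so that it runs without any infrastructure. The names `Event`, `EventRepository`, `InMemoryEventRepository`, and `EtlService` are hypothetical, the in-memory repository stands in for a PostgreSQL-backed one, and the transform step is plain Python where the rules above call for Polars.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Event:
    """One cleaned record flowing through the pipeline (hypothetical shape)."""

    user_id: int
    amount: float


class EventRepository(Protocol):
    """Data access layer: the only place that talks to storage."""

    def save_all(self, events: list[Event]) -> int: ...


class InMemoryEventRepository:
    """Stand-in for a PostgreSQL-backed repository, for illustration only."""

    def __init__(self) -> None:
        self.rows: list[Event] = []

    def save_all(self, events: list[Event]) -> int:
        self.rows.extend(events)
        return len(events)


def extract(raw_lines: list[str]) -> list[dict[str, str]]:
    """Extract: parse 'user_id,amount' lines into raw string records."""
    records: list[dict[str, str]] = []
    for line in raw_lines:
        user_id, amount = line.split(",")
        records.append({"user_id": user_id.strip(), "amount": amount.strip()})
    return records


def transform(records: list[dict[str, str]]) -> list[Event]:
    """Transform: cast types and drop non-positive amounts.

    Plain Python here; the real project would use Polars for this step.
    """
    events = [Event(int(r["user_id"]), float(r["amount"])) for r in records]
    return [e for e in events if e.amount > 0]


class EtlService:
    """Service layer: orchestrates the phases, owns no storage details."""

    def __init__(self, repository: EventRepository) -> None:
        self._repository = repository

    def run(self, raw_lines: list[str]) -> int:
        """Run extract -> transform -> load and return the rows loaded."""
        return self._repository.save_all(transform(extract(raw_lines)))
```

With this split, an API route or a Celery task would only need to build a repository and call `EtlService(repository).run(...)`, which keeps HTTP and queue concerns out of the business logic.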

## Code Style
- Type hints required on ALL functions
- Max function length: 30 lines
- Use Polars over Pandas for all new transformations
- Use DuckDB for analytics queries (it runs in-process, so there is no separate analytics cluster to operate)
- Ruff for formatting and linting (replaces black + isort + flake8)
- mypy strict mode enabled
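
The source does not say how the 30-line limit is enforced, and mypy strict mode already rejects untyped definitions. Purely as an illustration of what the first two rules mean in practice, a hypothetical checker built on the standard `ast` module might look like this. The function name `find_style_violations` and the message format are assumptions, not part of the project.

```python
import ast

MAX_FUNCTION_LINES = 30  # limit taken from the Code Style rules


def find_style_violations(source: str) -> list[str]:
    """Return messages for functions that are too long or lack type hints."""
    violations: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # Count lines from the `def` line through the last line of the body.
        length = (node.end_lineno or node.lineno) - node.lineno + 1
        if length > MAX_FUNCTION_LINES:
            violations.append(
                f"{node.name}: {length} lines (max {MAX_FUNCTION_LINES})"
            )
        if node.returns is None:
            violations.append(f"{node.name}: missing return annotation")
        # This sketch checks named parameters only, not *args / **kwargs.
        params = node.args.posonlyargs + node.args.args + node.args.kwonlyargs
        for arg in params:
            # self/cls are conventionally left unannotated.
            if arg.annotation is None and arg.arg not in {"self", "cls"}:
                violations.append(f"{node.name}: parameter '{arg.arg}' is untyped")
    return violations
```

A function such as `def add(a: int, b: int) -> int: ...` yields no messages, while one with a bare parameter or no return annotation is reported by name.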

## Testing
- Minimum 85% test coverage
- Integration tests run against real PostgreSQL (docker compose)
- Use pytest fixtures for database setup/teardown
- Test ETL pipelines end-to-end with sample data
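
An end-to-end test with sample data might be shaped like the sketch below. Everything in it is an assumption made for illustration: `run_pipeline` is a tiny hypothetical pipeline, and an in-memory `sqlite3` database stands in for PostgreSQL only so that the example runs anywhere. Under the rules above, the real integration tests would connect to the PostgreSQL instance started by docker compose, with setup and teardown moved into pytest fixtures. `tmp_path` is pytest's built-in temporary directory fixture.

```python
import csv
import sqlite3
from pathlib import Path


def run_pipeline(csv_path: Path, conn: sqlite3.Connection) -> int:
    """Tiny hypothetical pipeline: read a CSV, keep valid rows, load them."""
    with csv_path.open(newline="") as handle:
        rows = [
            (int(r["user_id"]), float(r["amount"]))
            for r in csv.DictReader(handle)
        ]
    valid = [row for row in rows if row[1] > 0]  # drop non-positive amounts
    conn.execute("CREATE TABLE IF NOT EXISTS events (user_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", valid)
    conn.commit()
    return len(valid)


def test_pipeline_end_to_end(tmp_path: Path) -> None:
    """Feed sample data in one end and assert on what lands in the table."""
    sample = tmp_path / "events.csv"
    sample.write_text("user_id,amount\n1,9.5\n2,-3.0\n3,20\n")
    conn = sqlite3.connect(":memory:")  # setup
    try:
        assert run_pipeline(sample, conn) == 2
        loaded = conn.execute(
            "SELECT user_id, amount FROM events ORDER BY user_id"
        ).fetchall()
        assert loaded == [(1, 9.5), (3, 20.0)]
    finally:
        conn.close()  # teardown
```

The test asserts on the loaded rows rather than on intermediate values, so it keeps passing if the transform step is later rewritten, as long as the output is unchanged.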

## Key Decisions
- Polars over Pandas: up to roughly 10x faster on large datasets (workload dependent), with better memory efficiency
- uv over Poetry: Faster dependency resolution
- DuckDB for analytics: SQL on local files, no cluster needed
- Celery for orchestration: Reliable task queue with retry logic