From 6332621114a45f0c30170f23f10ffbf569a3ef55 Mon Sep 17 00:00:00 2001
From: clcl777 <77223796+clcl777@users.noreply.github.com>
Date: Sat, 4 Oct 2025 18:43:19 +0900
Subject: [PATCH 1/4] add cursor document

---
 .cursor/rules/documentation.mdc      | 48 ++++++++++++++++++++
 .cursor/rules/enums-generation.mdc   | 53 +++++++++++++++++++++++
 .cursor/rules/fetch-strategies.mdc   | 65 ++++++++++++++++++++++++++++
 .cursor/rules/project-structure.mdc  | 58 +++++++++++++++++++++++++
 .cursor/rules/protobuf.mdc           | 42 ++++++++++++++++++
 .cursor/rules/python-conventions.mdc | 41 ++++++++++++++++++
 .cursor/rules/testing.mdc            | 52 ++++++++++++++++++++++
 7 files changed, 359 insertions(+)
 create mode 100644 .cursor/rules/documentation.mdc
 create mode 100644 .cursor/rules/enums-generation.mdc
 create mode 100644 .cursor/rules/fetch-strategies.mdc
 create mode 100644 .cursor/rules/project-structure.mdc
 create mode 100644 .cursor/rules/protobuf.mdc
 create mode 100644 .cursor/rules/python-conventions.mdc
 create mode 100644 .cursor/rules/testing.mdc

diff --git a/.cursor/rules/documentation.mdc b/.cursor/rules/documentation.mdc
new file mode 100644
index 00000000..e49dabd9
--- /dev/null
+++ b/.cursor/rules/documentation.mdc
@@ -0,0 +1,48 @@
+---
+globs: docs/*.md,mkdocs.yml,README.md
+---
+
+# Documentation Guidelines
+
+## Documentation Structure
+- [README.md](mdc:README.md) - Main project documentation with examples and background
+- [docs/](mdc:docs/) - Extended documentation using MkDocs
+- [mkdocs.yml](mdc:mkdocs.yml) - MkDocs configuration
+
+## Documentation Files
+- [docs/index.md](mdc:docs/index.md) - Documentation homepage
+- [docs/airports.md](mdc:docs/airports.md) - Airport enum usage and examples
+- [docs/fallbacks.md](mdc:docs/fallbacks.md) - Fallback strategy documentation
+- [docs/filters.md](mdc:docs/filters.md) - Filter creation guide
+- [docs/local.md](mdc:docs/local.md) - Local Playwright setup and usage
+
+## Building Documentation
+```bash
+mkdocs build
+mkdocs serve  # For local preview
+```
+
+## Documentation Style
+- Use clear, concise language
+- Include code examples with type annotations
+- Explain the "why" not just the "how"
+- Add real-world use cases
+- Keep examples up-to-date with the API
+
+## README.md
+- Keep the README focused on getting started quickly
+- Include installation instructions
+- Show a complete, working example
+- Link to full documentation
+- Explain the project's unique value (Protobuf-based, fast, strongly-typed)
+
+## Code Examples in Docs
+- All code examples should be tested and working
+- Use realistic data (valid airports, future dates)
+- Show type annotations to highlight strong typing
+- Demonstrate error handling where relevant
+
+## Changelog
+- Document major changes in README
+- Use version numbers to track API changes
+- Highlight breaking changes clearly
diff --git a/.cursor/rules/enums-generation.mdc b/.cursor/rules/enums-generation.mdc
new file mode 100644
index 00000000..be923af5
--- /dev/null
+++ b/.cursor/rules/enums-generation.mdc
@@ -0,0 +1,53 @@
+---
+globs: enums/*.py,enums/*.csv,fast_flights/_generated_enum.py
+---
+
+# Airport Enum Generation
+
+## Overview
+The project auto-generates a Python enum of all airports from a CSV file to provide autocomplete functionality for airport codes.
+
+## Files
+- [enums/airports.csv](mdc:enums/airports.csv) - Source data for airports (IATA codes and names)
+- [enums/generate_enums.py](mdc:enums/generate_enums.py) - Generator script
+- [fast_flights/_generated_enum.py](mdc:fast_flights/_generated_enum.py) - Generated Airport enum
+
+## Regenerating Enums
+To regenerate the airport enums after updating the CSV:
+```bash
+python enums/generate_enums.py
+```
+
+## CSV Format
+The airports.csv should have:
+- First row: headers
+- Each subsequent row: airport data with IATA code and airport name
+
+## Generated Code
+- Never manually edit `fast_flights/_generated_enum.py`
+- The generated `Airport` enum provides IDE autocomplete
+- Each enum value is the IATA 3-letter code
+
+## Usage
+```python
+from fast_flights import FlightData, Airport
+
+# With enum (provides autocomplete)
+flight = FlightData(
+    date="2025-01-01",
+    from_airport=Airport.TAIPEI_SONGSHAN_AIRPORT,  # Autocomplete available!
+    to_airport=Airport.TOKYO_HANEDA_AIRPORT
+)
+
+# With string (also works)
+flight = FlightData(
+    date="2025-01-01",
+    from_airport="TPE",
+    to_airport="HND"
+)
+```
+
+## Adding New Airports
+1. Add the airport to [enums/airports.csv](mdc:enums/airports.csv)
+2. Run `python enums/generate_enums.py`
+3. The new airport will be available in the `Airport` enum
diff --git a/.cursor/rules/fetch-strategies.mdc b/.cursor/rules/fetch-strategies.mdc
new file mode 100644
index 00000000..8fa2b1bd
--- /dev/null
+++ b/.cursor/rules/fetch-strategies.mdc
@@ -0,0 +1,65 @@
+---
+description: Guidelines for implementing and working with different fetch strategies
+---
+
+# Fetch Strategy Implementation
+
+## Overview
+The project supports multiple fetch strategies to handle different scenarios (rate limiting, blocking, regional restrictions, etc.).
+
+## Strategy Files
+- [fast_flights/primp.py](mdc:fast_flights/primp.py) & [fast_flights/primp.pyi](mdc:fast_flights/primp.pyi) - Fast HTTP client with browser impersonation (default)
+- [fast_flights/fallback_playwright.py](mdc:fast_flights/fallback_playwright.py) - Serverless Playwright fallback
+- [fast_flights/local_playwright.py](mdc:fast_flights/local_playwright.py) - Local Playwright implementation
+- [fast_flights/bright_data_fetch.py](mdc:fast_flights/bright_data_fetch.py) - Bright Data proxy integration
+
+## Fetch Strategy Pattern
+All fetch strategies should:
+1. Accept `params: dict` as parameter (containing `tfs`, `hl`, `tfu`, `curr`)
+2. Return a `Response` object (or compatible object) with:
+   - `.status_code` property
+   - `.text` property (HTML content)
+   - `.text_markdown` property (for error messages)
+
+## Response Interface
+```python
+class Response:
+    status_code: int
+    text: str
+    text_markdown: str  # Markdown-formatted version for debugging
+```
+
+## When to Use Each Strategy
+
+### primp (default/common mode)
+- Fastest option
+- Uses browser impersonation to avoid basic detection
+- Works for most requests
+- No external dependencies beyond the primp library
+
+### Fallback Playwright
+- Automatically triggered when primp fails (in fallback mode)
+- Uses serverless Playwright functions
+- Handles JavaScript rendering and more complex anti-bot measures
+- Slower but more reliable
+
+### Local Playwright
+- For development and testing
+- Requires local Playwright installation (`pip install fast-flights[local]`)
+- Useful for debugging issues with the scraper
+
+### Bright Data
+- For production use with proxy service
+- Requires Bright Data credentials
+- Most reliable but has cost implications
+
+## Error Handling
+- Fetch strategies should raise `AssertionError` if status code is not 200
+- Core module catches `AssertionError` and falls back to Playwright in fallback mode
+- If parsing fails, retry with force-fallback mode to ensure JavaScript rendering
+
+## Adding New Strategies
+1. Create a new file in `fast_flights/` (e.g., `my_fetch.py`)
+2. Implement a function that accepts `params: dict` and returns `Response`
+3. Add the strategy to the mode options in [fast_flights/core.py](mdc:fast_flights/core.py)
+4. Update type hints to include the new mode literal
diff --git a/.cursor/rules/project-structure.mdc b/.cursor/rules/project-structure.mdc
new file mode 100644
index 00000000..6baeeba0
--- /dev/null
+++ b/.cursor/rules/project-structure.mdc
@@ -0,0 +1,58 @@
+---
+alwaysApply: true
+---
+
+# Fast-Flights Project Structure
+
+## Overview
+This is a Python-based Google Flights scraper that uses Protocol Buffers to decode Google's base64-encoded flight data. The project provides a strongly-typed API for fetching flight information.
+
+## Core Architecture
+
+### Main Entry Points
+- [fast_flights/__init__.py](mdc:fast_flights/__init__.py) - Public API exports
+- [fast_flights/core.py](mdc:fast_flights/core.py) - Primary flight fetching logic with multiple fetch modes
+
+### Key Components
+
+1. **Flight Data & Filters**
+   - [fast_flights/flights_impl.py](mdc:fast_flights/flights_impl.py) - Typed implementations of `FlightData`, `Passengers`, and `TFSData`
+   - [fast_flights/filter.py](mdc:fast_flights/filter.py) - Filter creation utilities
+
+2. **Protocol Buffers**
+   - [fast_flights/flights.proto](mdc:fast_flights/flights.proto) - Flight data protobuf schema
+   - [fast_flights/cookies.proto](mdc:fast_flights/cookies.proto) - Cookie/consent protobuf schema
+   - [fast_flights/flights_pb2.py](mdc:fast_flights/flights_pb2.py) - Generated protobuf code
+   - [fast_flights/cookies_pb2.py](mdc:fast_flights/cookies_pb2.py) - Generated protobuf code
+
+3. **Fetch Strategies**
+   - `primp` Client (default) - Fast HTTP client with browser impersonation
+   - `fallback_playwright` - Serverless Playwright fallback for blocked requests
+   - `local_playwright` - Local Playwright for development/testing
+   - `bright_data` - Bright Data proxy service
+
+4. **Data Processing**
+   - [fast_flights/schema.py](mdc:fast_flights/schema.py) - Response data models (`Result`, `Flight`)
+   - [fast_flights/decoder.py](mdc:fast_flights/decoder.py) - JavaScript data decoder for alternative data source
+   - HTML parsing using `selectolax.lexbor`
+
+5. **Generated Code**
+   - [fast_flights/_generated_enum.py](mdc:fast_flights/_generated_enum.py) - Auto-generated Airport enum
+   - [enums/generate_enums.py](mdc:enums/generate_enums.py) - Script to generate airport enums from CSV
+
+## Fetch Modes
+- `common` - Uses primp HTTP client only
+- `fallback` - Tries primp, falls back to Playwright if it fails
+- `force-fallback` - Forces Playwright usage
+- `local` - Uses local Playwright installation
+- `bright-data` - Uses Bright Data proxy service
+
+## Data Sources
+- `html` - Parses HTML response (default)
+- `js` - Parses JavaScript data from `