Polla App — Reliable jackpot ingestion for Chilean Loto

Aggregate next-jackpot (próximo pozo) estimates from vetted community mirrors, enforce provenance, and publish updates to Google Sheets without touching polla.cl.

Features

Orchestrates multi-source ingestion with a unified registry (pozos, resultadoslotochile, openloto) and deterministic fallbacks.
Ensures data integrity via SHA-256 content-hash verification and magnitude-based consensus quarantine (10% threshold).
Publishes structured JSONL outputs and comparison reports with full provenance traceability.
Ships a Click-based CLI (run, publish, pozos, health) with dry-run diffing and automated guardrails.
Handles rate limiting gracefully with jittered exponential backoff, and respects robots.txt.
Locks in behaviour with fixture-driven pytest suites, plus doctests executed in CI to catch documentation drift.
Simplifies day-to-day DX with Make targets, Black/Ruff/Mypy automation, and GitHub Actions parity.
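The integrity checks listed above can be illustrated with a minimal sketch. It is not the project's implementation: the `SourceReading` record and `split_by_consensus` helper are hypothetical, and reading the 10% threshold as relative deviation from the median of all sources is an assumption.

```python
import hashlib
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceReading:
    """Hypothetical record: one jackpot estimate scraped from one source."""

    source: str
    amount_clp: int
    raw_html: str

    @property
    def content_hash(self) -> str:
        # SHA-256 of the raw payload, usable for provenance and change detection.
        return hashlib.sha256(self.raw_html.encode("utf-8")).hexdigest()


def split_by_consensus(
    readings: list[SourceReading], threshold: float = 0.10
) -> tuple[list[SourceReading], list[SourceReading]]:
    """Return (accepted, quarantined) readings by relative deviation from the median."""
    if not readings:
        return [], []
    median = statistics.median(r.amount_clp for r in readings)
    accepted: list[SourceReading] = []
    quarantined: list[SourceReading] = []
    for reading in readings:
        if median == 0:
            deviation = 0.0 if reading.amount_clp == 0 else float("inf")
        else:
            deviation = abs(reading.amount_clp - median) / median
        # Assumption: anything more than `threshold` (10%) away from the median is quarantined.
        (quarantined if deviation > threshold else accepted).append(reading)
    return accepted, quarantined
```

Using the median rather than the mean keeps a single outlier mirror from dragging the consensus value toward itself.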

Tech Stack

Python 3.10+, Click CLI, Requests + BeautifulSoup parsers
Google Sheets integration via gspread + google-auth
Testing: Pytest (+ doctests), Faker fixtures
Tooling: Ruff, Black, Mypy, GitHub Actions (tests, docs, health)

Architecture at a Glance

%%{init: {"themeVariables": {"fontSize":"16px"}, "flowchart": {"htmlLabels": false, "wrap": true}}}%%
flowchart TB
  A[CLI command] --> B[Pipeline Orchestrator]
  B --> C{Source loader}
  C -->|ResultadosLotoChile| D[Primary scrape]
  C -->|OpenLoto fallback| E[Fallback scrape]
  D --> F[Normalizer]
  E --> F[Normalizer]
  F --> G["Artifacts<br/>(JSONL, reports, state)"]
  G --> H{Publish?}
  H -->|Yes| I[Google Sheets via gspread]
  H -->|No| J[Quarantine + logs]
  B --> K["Structured logging<br/>(spans + metrics)"]
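The source loader in the diagram tries a primary source and falls back when it fails. A minimal sketch of that deterministic fallback, assuming each source is a plain callable that raises on failure; `SourceError` and `load_with_fallback` are hypothetical names, not the project's API.

```python
from collections.abc import Callable, Sequence


class SourceError(Exception):
    """Hypothetical error raised by a loader when its source is unusable."""


Loader = Callable[[], dict]


def load_with_fallback(loaders: Sequence[tuple[str, Loader]]) -> tuple[str, dict]:
    """Try loaders in declared order; return (source_name, payload) of the first success."""
    errors: list[str] = []
    for name, loader in loaders:
        try:
            return name, loader()
        except SourceError as exc:
            # Record why this source failed, then move on to the next one.
            errors.append(f"{name}: {exc}")
    raise SourceError("all sources failed; " + "; ".join(errors))
```

Because the order is fixed, the same upstream failures always select the same source, which keeps reruns reproducible.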

Quick Start

Ensure Python 3.10+ is available (use pyenv local 3.10.13 or your preferred manager).

Create an isolated environment and install dependencies:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pip install -r requirements-dev.txt

Run the pozos pipeline locally:

python -m polla_app run \
  --sources pozos \
  --normalized artifacts/normalized.jsonl \
  --comparison-report artifacts/comparison_report.json \
  --summary artifacts/run_summary.json
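The normalized output is JSONL: one JSON object per line. A minimal sketch for writing and reading such a file; the field names used when trying it out (`source`, `amount_clp`, `content_hash`) are assumptions, so check a real `artifacts/normalized.jsonl` for the actual schema.

```python
import json
from collections.abc import Iterator
from pathlib import Path
from typing import Any


def write_jsonl(path: Path, records: list[dict[str, Any]]) -> None:
    """Write one compact JSON object per line (UTF-8, non-ASCII kept as-is)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path: Path) -> Iterator[dict[str, Any]]:
    """Yield records lazily, skipping blank lines."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)
```

Line-delimited JSON lets downstream tools stream records without loading the whole file.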

Optional: dry-run publishing to Google Sheets once credentials are configured:

python -m polla_app publish \
  --normalized artifacts/normalized.jsonl \
  --comparison-report artifacts/comparison_report.json \
  --summary artifacts/run_summary.json \
  --worksheet "Normalized" \
  --discrepancy-tab "Discrepancies" \
  --dry-run
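With `--dry-run`, the publish step reports what would change instead of writing to the sheet. A minimal sketch of such a diff, assuming rows are keyed by a draw identifier; `diff_rows` and the `draw_id` key are hypothetical, and the real command's comparison may differ.

```python
from typing import Any

Row = dict[str, Any]


def diff_rows(current: list[Row], desired: list[Row], key: str = "draw_id") -> dict[str, list[Any]]:
    """Compare existing sheet rows with new rows by `key`; nothing is written."""
    current_by_key = {row[key]: row for row in current}
    desired_by_key = {row[key]: row for row in desired}
    added = [k for k in desired_by_key if k not in current_by_key]
    removed = [k for k in current_by_key if k not in desired_by_key]
    changed = [
        k for k in desired_by_key
        if k in current_by_key and current_by_key[k] != desired_by_key[k]
    ]
    return {"added": added, "changed": changed, "removed": removed}
```

An empty diff means a real publish would be a no-op, which is a cheap guardrail before touching a shared spreadsheet.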

Configuration

Name	Type	Default	Required	Description
`GOOGLE_SPREADSHEET_ID`	string	—	For `publish`	Key (ID) of the target Google Sheets spreadsheet; the tab is chosen with `--worksheet`.
`GOOGLE_SERVICE_ACCOUNT_JSON`	JSON string	—	Conditional	Inline service account credentials (alternative to file).
`GOOGLE_CREDENTIALS` / `CREDENTIALS`	JSON string	—	Conditional	Legacy env vars recognised for service account auth.
`service_account.json`	file	—	Conditional	Disk-based credentials if env vars are not supplied.
`ALT_SOURCE_URLS`	JSON string	`{}`	No	Override source URLs for mirrors or testing.
`POLLA_USER_AGENT`	string	Library default	No	Custom HTTP user agent for polite scraping.
`POLLA_RATE_LIMIT_RPS`	float	unset	No	Per-host requests-per-second throttle.
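The table lists several places credentials can come from. A minimal sketch of resolving them, assuming precedence runs top to bottom as listed (inline JSON, then the legacy variables, then `service_account.json`); the actual order in `polla_app` may differ, and `resolve_service_account_info` is a hypothetical helper. The returned dict has the shape google-auth's `service_account.Credentials.from_service_account_info` expects.

```python
from __future__ import annotations

import json
import os
from collections.abc import Mapping
from pathlib import Path
from typing import Any

# Assumed precedence: the first non-empty variable wins.
ENV_VARS = ("GOOGLE_SERVICE_ACCOUNT_JSON", "GOOGLE_CREDENTIALS", "CREDENTIALS")


def resolve_service_account_info(
    env: Mapping[str, str] | None = None,
    fallback_file: Path = Path("service_account.json"),
) -> dict[str, Any]:
    """Return parsed service-account JSON from env vars, else from a file on disk."""
    env = os.environ if env is None else env
    for name in ENV_VARS:
        raw = env.get(name, "").strip()
        if raw:
            return json.loads(raw)
    if fallback_file.is_file():
        return json.loads(fallback_file.read_text(encoding="utf-8"))
    raise RuntimeError(
        "No service account credentials: set one of "
        + ", ".join(ENV_VARS)
        + f" or provide {fallback_file}"
    )
```

Passing `env` explicitly keeps the helper testable without mutating the process environment.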

Quality & Tests

pytest -q – executes unit/integration suites against offline fixtures; expect all tests to pass in under 10 s.
ruff check polla_app tests – enforces linting, naming, and import hygiene.
mypy polla_app – enforces strict typing (missing third-party stubs are ignored).
black --check polla_app tests – maintains consistent formatting.
pytest --doctest-glob='*.md' README.md docs -q – ensures documentation examples stay executable.

CI mirrors these commands through .github/workflows/tests.yml and .github/workflows/docs.yml so local runs match automation. Add pytest --cov=polla_app when you need a coverage report.¹
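Doctests make documented examples double as regression tests. A minimal sketch of a docstring-tested helper; `parse_clp_amount` is hypothetical and assumes amounts are written Chilean-style, with `.` as the thousands separator and an optional `millones` suffix.

```python
import re

# A leading digit is required so a stray "." is never parsed as an amount.
_AMOUNT_RE = re.compile(r"\$?\s*(\d[\d.]*)\s*(millones)?", re.IGNORECASE)


def parse_clp_amount(text: str) -> int:
    """Parse the first Chilean-peso amount in `text` into whole pesos.

    >>> parse_clp_amount("$4.200 millones")
    4200000000
    >>> parse_clp_amount("$850.000")
    850000
    """
    match = _AMOUNT_RE.search(text)
    if match is None:
        raise ValueError(f"no amount found in {text!r}")
    # "." separates thousands in es-CL, so drop it before converting.
    value = int(match.group(1).replace(".", ""))
    return value * 1_000_000 if match.group(2) else value
```

pytest's `--doctest-modules` flag collects docstring examples like these, complementing the `--doctest-glob` run over Markdown.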

Performance & Reliability

A scheduled health.yml workflow runs the offline health checks daily to catch data-source drift before operators do.
scripts/benchmark_pozos_parsing.py provides a quick regression guard for parsing speed; keep the median parse time under 150 ms on commodity hardware.
Structured metrics emitted via polla_app.obs.metric simplify alerting and feed SLO reviews (docs/SLOs.md).
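The jittered exponential backoff listed under Features can be sketched as follows. This is an illustration, not the project's code: it uses the "full jitter" variant (a uniform random delay between zero and a capped exponential bound), and `backoff_delays` and `call_with_backoff` are hypothetical names. The `sleep` parameter is injectable so tests run instantly.

```python
from __future__ import annotations

import random
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


def backoff_delays(
    retries: int, base: float = 0.5, cap: float = 30.0, rng: random.Random | None = None
) -> list[float]:
    """Full-jitter delays: attempt i waits uniform(0, min(cap, base * 2**i)) seconds."""
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap, base * 2**attempt)) for attempt in range(retries)]


def call_with_backoff(
    fn: Callable[[], T],
    retries: int = 4,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn up to retries + 1 times, sleeping a jittered delay after each retryable failure."""
    for delay in backoff_delays(retries):
        try:
            return fn()
        except retry_on:
            sleep(delay)
    return fn()  # final attempt; its exception propagates to the caller
```

Capping the bound keeps delays from growing without limit, and the jitter spreads retries from concurrent jobs so they do not hit a mirror in lockstep.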

Roadmap

Expand publish command to surface mismatch deltas via Slack/webhooks for quicker operator response.
Wire Codecov and fail PRs below agreed coverage thresholds.¹
Add smoke-test fixtures for newly emerging aggregator mirrors.

Why It Matters

Demonstrates operational empathy: dry-run defaults, quarantine support, and explicit provenance reduce on-call stress.
Highlights disciplined scraping practices respectful of third-party infrastructure and legal boundaries.
Shows ability to automate reliability checks end-to-end (health workflow, observability hooks, structured metrics).
Illustrates developer-experience focus through reproducible CLI, Make targets, and strict typing/linting gates.
Proves comfort with secure credential handling when integrating with Google Workspace APIs.

Contributing & License

Contributions are welcome—see CONTRIBUTING.md for style, testing, and review expectations.

This project is distributed under the MIT License.

Footnotes

¹ TODO: Enable Codecov (or a GitHub Actions coverage summary) to visualise and gate coverage in CI.
