Conversation
LucasHJin
commented
Mar 10, 2026
- Added a CLAUDE.md and a PRD.md plan file to be referenced for deployment improvements
Pull request overview
Adds planning/documentation artifacts to guide deployment and future technical improvements for the Argus codebase.
Changes:
- Added a deployment-focused PRD with phased tickets and success criteria.
- Added a “Claude context” file capturing architecture, file map, conventions, and environment variables.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| plan/PRD.md | Introduces phased deployment/automation/schema/testing roadmap and success metrics. |
| plan/CLAUDE.md | Adds architectural/context reference for contributors (stack, structure, conventions, env vars). |
|| # | Ticket | Priority |
||---|--------|----------|
|| 1 | Create backend Dockerfile (FastAPI + uvicorn, multi-stage build) | P0 |
|| 2 | Create frontend Dockerfile (Vite build -> nginx static serve) | P0 |
|| 3 | Create `docker-compose.yml` with services: backend, frontend, postgres+pgvector, redis | P0 |
|| 4 | Add `.dockerignore` files (exclude `.env`, `node_modules`, `.venv`, `__pycache__`) | P0 |
The Markdown tables are written with a double leading pipe (e.g., || # | Ticket | Priority |), which renders as an extra empty first column in GitHub Markdown. Use a single leading pipe (| # | Ticket | Priority |) for each table row to avoid the blank column and keep column alignment consistent.
|| Component | Technology | Notes |
||-----------|-----------|-------|
|| Frontend | React 19 + TypeScript 5 + Vite 6 | SPA, no SSR |
|| Globe | react-globe.gl (three.js wrapper) | Heavy bundle — lazy-load |
|| Styling | Tailwind CSS 3 + CSS custom properties | Design tokens in `index.css` |
|| Backend | FastAPI + Uvicorn (Python 3.11+) | Async throughout |
The Tech Stack and other tables also start rows with || ..., which creates an unintended empty first column in GitHub Markdown. Switch these rows to a single leading | so the tables render with the expected number of columns.
│ ├── agent_service.py # Graph-RAG pipeline orchestration
│ ├── agent_tools.py # DB query tools (search, relate, detail, impact)
│ ├── gemini_client.py # Gemini API: synthesis, confidence, realtime analysis
│ ├── scraping_service.py # Polymarket + Kalshi + GDELT orchestrator
│ └── content_repository.py # (duplicate — also in ingestion/)
├── repositories/
│ └── content_repository.py # Market signal row persistence
├── embeddings/
services/content_repository.py is listed in the project tree, but there is no backend/app/services/content_repository.py in the repo. The existing implementations are backend/app/repositories/content_repository.py and backend/app/ingestion/content_repository.py, so the tree should be updated to match the actual paths (or the duplicate description moved to the correct locations).
### Backend
- **Async everywhere** — use `async def` for all route handlers and service methods
- **asyncpg** for DB access (not SQLAlchemy ORM)
- **Pydantic v2** for request/response models
- **Raw SQL** for queries (no ORM) — parameterize all user inputs with `$1, $2` syntax
- **Environment variables** via `python-dotenv` and `os.getenv()`
- **Scraper output** normalized via `row_format.make_content_row()` before DB insert
The Backend conventions say DB access is asyncpg-only and "Async everywhere", but several production routes are currently synchronous and use psycopg2 (e.g., backend/app/routers/content.py defines def get_content_points() / def get_content_arcs() and opens psycopg2.connect()). Please update this section to reflect the current mixed sync/async state, while still stating the desired direction for new work.
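To make the desired direction concrete, here is a minimal sketch of an async handler-style query using raw SQL with numbered placeholders. The route, table, and column names are assumptions for illustration, not actual Argus code:

```python
# Hypothetical sketch of the stated convention: async service code + raw SQL
# with numbered placeholders ($1..$4) so asyncpg binds user input server-side
# instead of interpolating it into the query string.

POINTS_QUERY = """
    SELECT id, title, latitude, longitude
    FROM content
    WHERE latitude BETWEEN $1 AND $2
      AND longitude BETWEEN $3 AND $4
"""

async def fetch_content_in_box(pool, south, north, west, east):
    # pool is assumed to be an asyncpg.Pool; conn.fetch() parameterizes
    # $1..$4, so no user value ever appears in the SQL text itself.
    async with pool.acquire() as conn:
        return await conn.fetch(POINTS_QUERY, south, north, west, east)
```

New work could follow this shape while the existing psycopg2-based sync routes are migrated incrementally.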
```
DATABASE_URL=postgresql+asyncpg://user:pass@host:5432/dbname # Required
GEMINI_API_KEY=... # Required for agent
GEMINI_MODEL=gemini-2.5-flash # Optional, default shown
OPENAI_API_KEY=... # Required for embeddings (until local model migration)
ACLED_API_KEY=... # Required for ACLED ingestion
```
The sample DATABASE_URL uses the SQLAlchemy-style postgresql+asyncpg://... scheme, but the codebase connects with psycopg2.connect(DATABASE_URL) in multiple places and with asyncpg.create_pool(dsn=settings.database_url) elsewhere; those drivers expect a libpq/asyncpg DSN like postgresql://... (without +asyncpg). Using postgresql+asyncpg:// will fail for psycopg2 and likely for asyncpg. Align this example with backend/.env.example (which uses postgresql://...).
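Since both psycopg2 and asyncpg expect a plain `postgresql://` DSN, one low-risk option, if the env var might still arrive in the SQLAlchemy-style form, is to normalize the scheme at startup. This is a sketch; the function name is made up:

```python
def normalize_dsn(url: str) -> str:
    """Strip a SQLAlchemy-style driver suffix (e.g. postgresql+asyncpg://)
    down to the plain postgresql:// scheme that psycopg2 and asyncpg expect."""
    scheme, sep, rest = url.partition("://")
    if sep and "+" in scheme:
        scheme = scheme.split("+", 1)[0]  # "postgresql+asyncpg" -> "postgresql"
    return scheme + sep + rest
```

Aligning the documented example with `backend/.env.example` is still the cleaner fix; the normalizer just guards against the wrong form slipping through.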
## Known Issues & Tech Debt

1. **Duplicate `content_repository.py`** — exists in both `repositories/` and `ingestion/`. Must consolidate.
2. **No shared DB pool** — `ingestion/db.py` has its own pool; other services use inline `asyncpg.connect()`. Need a single shared pool.
We should explicitly document that the pool's `max_size` needs to be set so we won't throw a "too many connections" error if there are too many users. I believe Postgres's default `max_connections` is 100.
|| 16 | Design `BaseScraper` ABC interface: `async fetch() -> list[NormalizedRow]` with built-in rate limiting, error handling, and dedup. Use `kalshi.py` rate limiter and `ingestion_service.py` error patterns as reference. | P0 |
|| 17 | Write new production scrapers implementing `BaseScraper` for each data source (determine which sources to keep/add based on product needs) | P0 |
|| 18 | Create unified scrape entrypoint (`run_daily_pipeline.py`) that runs all scrapers + embedding backfill in sequence | P0 |
|| 19 | Add cron scheduler (APScheduler in a separate container, or cron in docker-compose) | P0 |
We need a locking mechanism here. If `run_daily_pipeline.py` hangs, we don't want the next cron job to spawn a zombie process. We could use a Redis-based lock for this, but I need to do more research.
We could probably use locking in PostgreSQL for simplicity, but I agree about the cron overlap problem.
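If we go the Postgres route, session-level advisory locks give this almost for free: the next cron run calls `pg_try_advisory_lock`, gets `false` while the previous run still holds the lock, and exits instead of piling up. A sketch (the job name and the asyncpg usage are assumptions); the key derivation is pure Python and stable across runs:

```python
import hashlib

def advisory_lock_key(job_name: str) -> int:
    """Map a job name to a stable signed 64-bit key for pg_advisory_lock."""
    digest = hashlib.sha256(job_name.encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

async def run_exclusively(conn, job_name: str, job):
    # conn is assumed to be an asyncpg connection. pg_try_advisory_lock
    # returns false immediately if another session (e.g. a still-running
    # cron job) holds the key, so overlapping runs just skip this tick.
    key = advisory_lock_key(job_name)
    got_lock = await conn.fetchval("SELECT pg_try_advisory_lock($1)", key)
    if not got_lock:
        return False  # previous run still in progress
    try:
        await job()
        return True
    finally:
        await conn.execute("SELECT pg_advisory_unlock($1)", key)
```

A nice property of session-level advisory locks: if the pipeline process crashes, its connection dies and Postgres releases the lock automatically, so a failed run can't wedge the schedule the way a stale Redis key could.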
|| Backend | FastAPI + Uvicorn (Python 3.11+) | Async throughout |
|| Database | PostgreSQL 15+ with pgvector + pgcrypto | Extensions required |
|| AI Model | Google Gemini 2.5-flash | Structured JSON output |
|| Embeddings | OpenAI text-embedding-3-small (1536 dims) | Migration target: local sentence-transformers |
Why are we migrating to sentence-transformers? Is it better than OpenAI?
Cheaper embeddings, so we don't just waste money every day this is running. Unless you guys don't care, in which case we can stick with the OpenAI text-embedding model.
|| Method | Path | Purpose | AI Cost |
||--------|------|---------|---------|
|| GET | `/content/points` | All content with lat/lng (last 31 days) | None |
This is going to create an insanely huge JSON payload, and it will probably lag lower-end computers. Apparently react-globe.gl has built-in clustering. Why don't we make it so you need to zoom in a certain amount to actually see the details of a point, and when you're really zoomed out, points in the same area merge into one? We need to experiment with this, though, because I'm not sure whether it will look good.
Makes sense. What are your thoughts on server-side viewport filtering (i.e., pass additional parameters: GET /content/points?bbox=west,south,east,north&zoom=level) and then only rendering what is shown (front half + zoom)? When zoomed out we combine points, and show individual points when zoomed in. But I don't think there is clustering in react-globe.gl from what I can see.
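A sketch of what server-side filtering plus zoom-dependent merging could look like (all names here are hypothetical, and the grid size per zoom level would need tuning): parse the bbox query param, keep only points inside the viewport, then snap survivors to a grid whose cell size shrinks as zoom increases, merging points that land in the same cell.

```python
from collections import defaultdict

def parse_bbox(raw: str):
    """Parse a 'west,south,east,north' query param into floats."""
    west, south, east, north = (float(v) for v in raw.split(","))
    return west, south, east, north

def cluster_points(points, bbox, zoom: int):
    """points: iterable of (id, lat, lng) tuples. Returns one cluster per
    occupied grid cell in the viewport: a count, the centroid, and member ids."""
    west, south, east, north = bbox
    cell = 10.0 / (2 ** zoom)  # degrees per cell; halves at each zoom level
    cells = defaultdict(list)
    for pid, lat, lng in points:
        if south <= lat <= north and west <= lng <= east:
            cells[(int(lat // cell), int(lng // cell))].append((pid, lat, lng))
    return [
        {
            "count": len(members),
            "lat": sum(m[1] for m in members) / len(members),
            "lng": sum(m[2] for m in members) / len(members),
            "ids": [m[0] for m in members],
        }
        for members in cells.values()
    ]
```

This ignores antimeridian-crossing bboxes and does the filtering in Python; in practice the bbox filter would move into the SQL WHERE clause, with only the merging done app-side (or in SQL too).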
│ └── acled/
│ ├── acled_client.py # ACLED API client
│ └── acled_normalizer.py
└── scrapers/
Keep the scrapers for inspiration; marginally improve on them.
- `id` UUID PK (gen_random_uuid)
- `title`, `body`, `url` (UNIQUE)
- `latitude`, `longitude` (nullable floats)
- `image_url`, `s3_url` — DEPRECATED, to be dropped (Cloudinary/S3 no longer used)
We will be using S3 for media storage.
Yeah, from my understanding it was that we don't need both `image_url` and `s3_url`, because we can just use `image_url` for S3.
|| GET | `/content/arcs?threshold=0.7` | Similarity arcs via pgvector cosine | None |
|| GET | `/content/{id}` | Single content item detail | None |
|| POST | `/content/{id}/confidence-score` | Gemini credibility scoring (0.31-1.0) | 1 Gemini call |
|| POST | `/content/{id}/realtime-analysis` | Gemini + Google Search grounding | 1 Gemini call |
How much credit are we going to burn...
LMFAO yeah, low-key I think one of the challenges is just figuring out how to minimize credit burn.
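One cheap lever, sketched below but not wired to the real endpoints: a confidence score for a given content id doesn't need recomputing on every view, so caching Gemini results per (analysis type, content id) with a TTL caps spend at roughly one call per item per TTL window. The cache class and the `call_gemini` callable are hypothetical stand-ins:

```python
import time

class TTLCache:
    """Minimal in-memory TTL cache; Redis would be the multi-process version."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        hit = self._store.get(key)
        if hit is None:
            return None
        value, expires_at = hit
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazily evict expired entries
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

# Usage sketch: only call Gemini on a cache miss.
scores = TTLCache(ttl_seconds=24 * 3600)

async def confidence_score(content_id: str, call_gemini):
    cached = scores.get(("confidence", content_id))
    if cached is not None:
        return cached  # no Gemini call, no credit burned
    result = await call_gemini(content_id)  # hypothetical expensive call
    scores.set(("confidence", content_id), result)
    return result
```

Since the endpoint already takes the content id in the path, this slots in without any API change; repeated views of the same point become free after the first call.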
[WIP] Add deployment planning documentation
Co-authored-by: Tpypan <155002057+Tpypan@users.noreply.github.com>
chore: add deployment issue board with all pre-deploy tasks
need to use issue board for cursor hackathon