Performance improvements and content poster embed migration #193

Open
user1303836 wants to merge 5 commits into main from feature/improvements

@user1303836
Owner

Summary

  • Concurrent source fetching: Replace sequential source polling with asyncio.gather + Semaphore(5) so that up to five sources are fetched in parallel, which shortens the total poll cycle because it no longer waits on each source in turn
  • Batch database queries: Eliminate N+1 queries in _fetch_source (content existence checks), summarize_pending (source lookups), and _handle_first_posting_backfill (source lookups) by using batch operations
  • Discord embed formatting: Switch content poster from plain text messages to structured Discord embeds with color-coded source types, timestamps, and clean bullet-point summaries
  • Substack deduplication: Simplify SubstackAdapter by extending RSSAdapter directly, removing ~100 lines of duplicated RSS parsing logic
  • Concurrent RSS discovery: Probe RSS feed paths concurrently instead of sequentially for faster feed detection
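
For reviewers who want a picture of the concurrent fetching pattern, here is a minimal sketch. It is not the code in pipeline.py: `fetch_source`, `fetch_source_safe`, and `poll_due_sources` are hypothetical stand-ins, the simulated latency replaces real network calls, and returning 0 on failure is an assumption about how errors might be swallowed.

```python
import asyncio

MAX_CONCURRENT_FETCHES = 5  # mirrors the Semaphore(5) named in the summary


async def fetch_source(source_id: str) -> int:
    """Hypothetical per-source fetch; returns the number of items found."""
    await asyncio.sleep(0.01)  # simulated network latency
    if source_id == "broken":
        raise RuntimeError("feed unreachable")
    return len(source_id)


async def fetch_source_safe(source_id: str, semaphore: asyncio.Semaphore) -> int:
    """Cap concurrency and keep one failing source from affecting the others."""
    async with semaphore:
        try:
            return await fetch_source(source_id)
        except Exception:
            # Assumption: a failed source contributes zero items.
            return 0


async def poll_due_sources(source_ids: list[str]) -> list[int]:
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_FETCHES)
    # asyncio.gather returns results in the same order as its inputs.
    return await asyncio.gather(
        *(fetch_source_safe(source_id, semaphore) for source_id in source_ids)
    )
```

Wrapping each fetch in its own error handler means a single unreachable feed does not raise out of `gather` and discard the results of the other sources.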

Changes

Pipeline performance (pipeline.py, repository.py)

  • Add content_items_exist() batch method to Repository (single IN query vs N individual queries)
  • Replace per-item content_item_exists() calls with batch lookup in _fetch_source
  • Replace per-item get_source_by_id() calls with get_sources_by_ids() in summarize_pending and _handle_first_posting_backfill
  • Add _is_due_for_poll() helper and _fetch_source_safe() error wrapper
  • Fetch due sources concurrently with asyncio.gather and Semaphore(5)
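
The batch existence check can be illustrated roughly as follows. This is a sketch under assumptions, not the Repository implementation: it uses the standard-library sqlite3 module, and the `content_items` table and `external_id` column are hypothetical names chosen for the example.

```python
import sqlite3


def content_items_exist(conn: sqlite3.Connection, external_ids: list[str]) -> set[str]:
    """Return the subset of external_ids already stored, using one IN query."""
    if not external_ids:
        return set()  # avoid building an invalid empty "IN ()" clause
    placeholders = ", ".join("?" for _ in external_ids)
    rows = conn.execute(
        f"SELECT external_id FROM content_items WHERE external_id IN ({placeholders})",
        external_ids,
    )
    return {row[0] for row in rows}


# Illustrative in-memory database with two items already stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE content_items (external_id TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO content_items VALUES (?)", [("a",), ("b",)])

fetched = ["a", "b", "c"]
existing = content_items_exist(conn, fetched)  # one round trip instead of three
new_items = [item for item in fetched if item not in existing]
```

Returning a set keeps the per-item membership test cheap in the caller. Very large batches may need to be split into chunks, since databases limit the number of bound parameters per statement.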

Content poster (content_poster.py, content_posting.py)

  • Replace format_message() with format_embed() returning discord.Embed
  • Add _truncate_summary_at_bullet() to cleanly truncate summaries at bullet boundaries
  • Update post_content() to send embeds instead of plain text
  • Color-coded embeds by source type
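
One way truncation at bullet boundaries might work is sketched below. The function name matches the PR's helper without the leading underscore, but the body is a guess at the behavior rather than the actual code, and the default limit of 4096 (Discord's cap on embed description length) is an assumption about which limit the helper targets.

```python
def truncate_summary_at_bullet(summary: str, limit: int = 4096) -> str:
    """Drop trailing bullet lines until the summary fits within `limit` characters."""
    if len(summary) <= limit:
        return summary

    kept: list[str] = []
    length = 0
    for line in summary.splitlines():
        # The extra 1 is the newline joining this line to the ones already kept.
        added = len(line) + (1 if kept else 0)
        if length + added > limit:
            break
        kept.append(line)
        length += added

    if not kept:
        # Even the first line is too long, so fall back to a hard cut.
        return summary[:limit]
    return "\n".join(kept)
```

Cutting on whole lines means the embed never ends with half a bullet, at the cost of sometimes leaving unused space below the limit.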

Adapters (substack.py, rss_discovery.py)

  • SubstackAdapter now extends RSSAdapter, removing duplicated parsing
  • RSS discovery probes paths concurrently with asyncio.gather
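
Concurrent path probing could look something like the sketch below. The candidate paths, `probe_path`, and `discover_feed` are hypothetical, and the probe is simulated with a fixed set of known feed URLs instead of real HTTP requests.

```python
import asyncio
from typing import Optional

# Illustrative candidates, listed in priority order.
CANDIDATE_PATHS = ["/feed", "/rss", "/atom.xml"]


async def probe_path(base_url: str, path: str) -> Optional[str]:
    """Hypothetical probe: return the feed URL if the path serves a feed."""
    await asyncio.sleep(0.01)  # simulated request latency
    known_feeds = {"https://example.com/rss", "https://example.com/atom.xml"}
    url = base_url.rstrip("/") + path
    return url if url in known_feeds else None


async def discover_feed(base_url: str) -> Optional[str]:
    results = await asyncio.gather(
        *(probe_path(base_url, path) for path in CANDIDATE_PATHS),
        return_exceptions=True,  # a failing probe must not abort the others
    )
    # Results keep the order of CANDIDATE_PATHS, so the first hit is the
    # highest-priority feed even though all probes ran concurrently.
    for result in results:
        if isinstance(result, str):
            return result
    return None
```

Because `gather` preserves input order, probing concurrently does not change which feed wins when several paths respond.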

Test plan

  • All 606 tests pass
  • ruff check . passes
  • ruff format --check . passes
  • mypy src/ passes
  • 2 new tests for content_items_exist batch method
  • Updated pipeline tests for batch mocks and concurrent fetching
  • Updated content poster tests for embed format
  • Updated RSS discovery tests for concurrent probing

@greptile

Pipeline performance:
- Concurrent source fetching with asyncio.gather and Semaphore(5)
- Batch content_items_exist query replaces per-item existence checks
- Batch get_sources_by_ids replaces per-item source lookups in summarize_pending and backfill

Content poster:
- Switch from plain text messages to Discord embeds with structured formatting
- Add summary truncation that preserves bullet point boundaries

Substack adapter:
- Deduplicate by extending RSSAdapter directly

RSS discovery:
- Concurrent path probing for faster RSS feed detection
greptile-apps[bot]

This comment was marked as resolved.

Add Discord autocomplete handlers for /source remove, /source info,
/source toggle, /github remove, and /github toggle commands. Filters
suggestions by guild and matches on partial name input.
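
The filtering behind these autocomplete handlers might be sketched as a plain function like the one below. The `Source` dataclass and `autocomplete_source_names` are hypothetical, case-insensitive substring matching is an assumption about what "partial name input" means here, and the cap of 25 reflects the maximum number of autocomplete choices Discord accepts.

```python
from dataclasses import dataclass

MAX_AUTOCOMPLETE_CHOICES = 25  # Discord accepts at most 25 autocomplete choices


@dataclass(frozen=True)
class Source:
    guild_id: int
    name: str


def autocomplete_source_names(
    sources: list[Source], guild_id: int, current: str
) -> list[str]:
    """Return names from this guild that contain the partially typed input."""
    needle = current.casefold()
    matches = [
        source.name
        for source in sources
        if source.guild_id == guild_id and needle in source.name.casefold()
    ]
    return matches[:MAX_AUTOCOMPLETE_CHOICES]
```

An empty input matches every name, so the user sees the guild's sources as soon as the option is focused.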