Improve slide content extraction: better prompt, higher DPI, text fallback, retry #18

Open
BrandonS7 wants to merge 1 commit into Birmingham-AI:main from BrandonS7:improve/slide-extraction

What this does

Addresses slide content capture gaps where information was being lost during PDF processing.

1. Higher resolution rendering (150 → 250 DPI)

Dense slides with small text, chart labels, and fine details were hard for the vision model to read at 150 DPI. Rendering is bumped to 250 DPI so that small text and labels stay legible.
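To give a sense of what the DPI bump means in pixels, here is a minimal sketch of the arithmetic. It assumes a 4:3 slide of 720 x 540 PDF points (10 in x 7.5 in); `rendered_size` is a hypothetical helper for illustration, not a function from this PR.

```python
POINTS_PER_INCH = 72  # PDF user-space units per inch


def rendered_size(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a PDF page rasterized at the given DPI."""
    width_px = round(width_pt * dpi / POINTS_PER_INCH)
    height_px = round(height_pt * dpi / POINTS_PER_INCH)
    return width_px, height_px


# Assumed slide size: 720 x 540 points (a 4:3 slide).
old_size = rendered_size(720, 540, 150)  # (1500, 1125)
new_size = rendered_size(720, 540, 250)  # (2500, 1875)
```

Under that assumption the image grows from 1500 x 1125 to 2500 x 1875, roughly 2.8 times as many pixels, so expect larger image payloads per vision call.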

2. Enhanced vision prompt

The old prompt asked for just title + key_points. The new prompt explicitly asks for:

  • Text inside diagrams, flowcharts, and architecture diagrams
  • Data from tables/charts with actual numbers and labels
  • Code snippets and technical terms
  • URLs, references, and citations
  • A new raw_text field for anything not captured in key_points

Completeness is now prioritized over brevity.
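A rough sketch of what a prompt and response shape along these lines could look like. The prompt wording and the `normalize_slide` helper are illustrative assumptions; only the field names `title`, `key_points`, and `raw_text` come from the description above.

```python
# Hypothetical prompt text; the actual wording in this PR may differ.
VISION_PROMPT = """Extract ALL content from this slide. Prefer completeness over brevity.
Include:
- text inside diagrams, flowcharts, and architecture diagrams
- data from tables and charts, with actual numbers and labels
- code snippets and technical terms
- URLs, references, and citations
Return a JSON object with the keys "title", "key_points" (a list of strings),
and "raw_text" (anything not captured in key_points)."""


def normalize_slide(parsed: dict) -> dict:
    """Coerce a parsed model reply into the three expected fields."""
    points = parsed.get("key_points") or []
    return {
        "title": str(parsed.get("title") or "").strip(),
        # Drop empty entries so downstream code sees only real points.
        "key_points": [str(p).strip() for p in points if str(p).strip()],
        "raw_text": str(parsed.get("raw_text") or "").strip(),
    }
```

Normalizing the reply this way means a response that omits `raw_text` or returns `null` for a field still yields a predictable structure.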

3. Text extraction fallback

Previously, if vision analysis failed or returned minimal content (<20 chars), the slide was silently skipped. Processing now falls back to direct PDF text extraction via pypdf, and if vision returned partial content, the two are combined. A slide is therefore no longer silently dropped when vision fails.
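The fallback decision can be sketched roughly as follows. The helper names are hypothetical, and treating "partial" as non-empty vision output shorter than 20 characters is my reading of the description rather than confirmed behavior. The pypdf calls used (`PdfReader`, `reader.pages`, `page.extract_text()`) are the library's standard text-extraction API.

```python
from typing import Optional

MIN_VISION_CHARS = 20  # threshold quoted in the description


def extract_page_text(pdf_path: str, page_index: int) -> str:
    """Pull the embedded text layer of one page with pypdf."""
    from pypdf import PdfReader  # imported lazily; only needed on fallback

    reader = PdfReader(pdf_path)
    return (reader.pages[page_index].extract_text() or "").strip()


def resolve_slide_text(vision_text: Optional[str], pdf_text: str) -> str:
    """Choose vision output, PDF text, or a combination of both."""
    vision_text = (vision_text or "").strip()
    pdf_text = pdf_text.strip()
    if len(vision_text) >= MIN_VISION_CHARS:
        return vision_text  # vision result is substantial enough on its own
    if vision_text and pdf_text:
        return f"{vision_text}\n{pdf_text}"  # partial vision output: keep both
    return pdf_text or vision_text  # vision failed or was empty
```

Note that image-only slides have no text layer, so `extract_page_text` can still return an empty string for them.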

4. Retry on vision API failures

Vision API calls now retry once with a 1-second backoff before giving up. Transient API errors no longer cause permanent slide loss.
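A minimal sketch of the retry behavior, assuming the vision call is wrapped in a zero-argument callable. `call_with_retry` and its parameters are hypothetical names; the injectable `sleep` is there only to make the sketch easy to test.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retries: int = 1,
    backoff_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call fn, retrying after a fixed delay; return None if every attempt fails."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            # Broad catch for illustration only; real code would likely catch
            # the API client's specific error types.
            if attempt == retries:
                return None
            sleep(backoff_seconds)
    return None
```

Returning `None` on final failure lets the caller hand the slide to the text extraction fallback instead of raising.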

5. More robust JSON parsing

Replaced brittle markdown fence splitting with regex extraction (re.search), which tolerates code fences or surrounding prose around the JSON the model returns.
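One way the regex-based extraction could look, as a sketch. The pattern and the name `extract_json_object` are assumptions; the PR only states that `re.search` is used.

```python
import json
import re
from typing import Optional


def extract_json_object(reply: str) -> Optional[dict]:
    """Find the outermost {...} span in a model reply and parse it as JSON."""
    # Greedy match from the first "{" to the last "}", across newlines.
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        return None
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None
```

Because the match is greedy, a reply containing two separate JSON objects would fail to parse and return `None`, which a caller can treat like any other vision failure.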


Net effect: more information captured per slide, fewer silent skips, more resilient to API hiccups. No changes to upload routes, embedding storage format, or CLI interface.
