@pingSubhajit
Contributor

Summary

This PR enhances the pdf:text-layer extractor with improved PDF.js configuration and reduces warning noise, while also making the install wizard fully responsive for mobile and tablet devices.

Changes

PDF Text Layer Extractor Improvements

  • Proper PDF.js asset configuration: Dynamically resolves and configures standardFontDataUrl and cMapUrl paths to prevent "standardFontDataUrl" warnings during extraction
  • Reduced console noise: Sets PDF.js verbosity level to ERRORS only, eliminating non-critical warnings in CLI output
  • Page-level concurrency: Extracts text from multiple pages in parallel (respects assetProcessing.concurrency setting, capped at 8)
  • Memory management: Properly calls doc.destroy() after extraction to free resources
  • Early termination: Stops parsing additional pages once maxOutputChars is reached
  • Improved error handling: Gracefully handles per-page extraction failures without aborting the entire document
  • Added TypeScript declarations: New pdfjs-dist-legacy.d.ts for the legacy PDF.js module exports
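A minimal sketch of how the extractor behaviour listed above could fit together, assuming a document object shaped like the PDF.js document proxy (numPages, 1-indexed getPage, getTextContent, destroy). The names buildPdfJsOptions, extractTextLayer, PdfDocLike and pdfjsRoot are hypothetical and not taken from the PR; how the PR actually resolves the asset paths is not shown here, so the sketch simply takes a root directory or URL as input.

```typescript
// Minimal structural types mirroring the parts of PDF.js used in this sketch.
interface TextItemLike { str?: string }
interface PdfPageLike { getTextContent(): Promise<{ items: TextItemLike[] }> }
interface PdfDocLike {
  numPages: number;
  getPage(pageNumber: number): Promise<PdfPageLike>; // 1-indexed, as in PDF.js
  destroy(): Promise<void>;
}

const VERBOSITY_ERRORS = 0; // numeric value of VerbosityLevel.ERRORS in PDF.js
const MAX_PAGE_CONCURRENCY = 8; // cap stated in the PR description

// Hypothetical helper: builds getDocument() parameters so PDF.js can find its
// bundled fonts and CMaps. `pdfjsRoot` is assumed to point at the pdfjs-dist
// package directory (or an equivalent URL); both URLs need a trailing slash.
function buildPdfJsOptions(data: Uint8Array, pdfjsRoot: string) {
  const root = pdfjsRoot.endsWith("/") ? pdfjsRoot : pdfjsRoot + "/";
  return {
    data,
    standardFontDataUrl: root + "standard_fonts/",
    cMapUrl: root + "cmaps/",
    cMapPacked: true,
    verbosity: VERBOSITY_ERRORS,
  };
}

// Hypothetical extractor: a small worker pool over page numbers.
async function extractTextLayer(
  doc: PdfDocLike,
  opts: { concurrency: number; maxOutputChars: number },
): Promise<string> {
  const workerCount = Math.max(
    1,
    Math.min(opts.concurrency, MAX_PAGE_CONCURRENCY, doc.numPages),
  );
  const pages = new Array<string>(doc.numPages).fill("");
  let nextPage = 1;
  let totalChars = 0;

  const worker = async (): Promise<void> => {
    // Early termination: stop claiming pages once enough text is collected.
    while (nextPage <= doc.numPages && totalChars < opts.maxOutputChars) {
      const pageNumber = nextPage++;
      try {
        const page = await doc.getPage(pageNumber);
        const content = await page.getTextContent();
        const text = content.items.map((item) => item.str ?? "").join(" ");
        pages[pageNumber - 1] = text;
        totalChars += text.length;
      } catch {
        // A failing page is skipped instead of aborting the whole document.
      }
    }
  };

  try {
    await Promise.all(Array.from({ length: workerCount }, () => worker()));
  } finally {
    await doc.destroy(); // release PDF.js resources even if something threw
  }
  // Join in page order and enforce the output limit.
  return pages
    .filter((text) => text.length > 0)
    .join("\n\n")
    .slice(0, opts.maxOutputChars);
}
```

With the real library, the object returned by buildPdfJsOptions would be passed to getDocument, and the resolved document handed to extractTextLayer. Because pages run in parallel, a few pages already in flight may still finish after the character limit is reached; the final slice keeps the output within maxOutputChars.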

Install Wizard Responsive Design

  • Mobile-friendly header with adjusted padding and gap spacing
  • Mobile Reset button in header (visible on smaller screens)
  • Collapsible step navigation sidebar on mobile with horizontal scrollable step buttons
  • Live preview panel displayed inline on mobile/tablet (below xl breakpoint)
  • Responsive grid layouts for chunk settings (1/2/3 columns based on breakpoint)
  • Responsive review summary grid and action buttons
  • Full-width navigation buttons on mobile with proper visibility handling
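The breakpoint behaviour above can be summarised as a small lookup. This is only an illustrative sketch: the PR names the xl breakpoint but does not state pixel values or which breakpoints trigger the 2- and 3-column layouts, so the values below assume Tailwind CSS defaults, and the helper names and the "side" placement label are hypothetical.

```typescript
// Assumed breakpoints in px (Tailwind CSS defaults); not stated in the PR.
const BREAKPOINTS = { md: 768, xl: 1280 } as const;

// Hypothetical helper: number of chunk-settings columns for a viewport width.
// Which breakpoint maps to 2 or 3 columns is an assumption.
function chunkSettingsColumns(viewportWidth: number): 1 | 2 | 3 {
  if (viewportWidth >= BREAKPOINTS.xl) return 3;
  if (viewportWidth >= BREAKPOINTS.md) return 2;
  return 1;
}

// Hypothetical helper: the live preview is shown inline below the xl breakpoint.
function previewPlacement(viewportWidth: number): "inline" | "side" {
  return viewportWidth < BREAKPOINTS.xl ? "inline" : "side";
}
```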

Cleanup

  • Removed deep-merge.ts from the registry file copy list (it is no longer needed as a separate file)
  • Updated init test to reflect the removal

Testing

  • Verified PDF extraction produces cleaner console output
  • Confirmed install wizard displays correctly across mobile, tablet, and desktop viewports

pingSubhajit self-assigned this on Jan 8, 2026

vercel bot commented Jan 8, 2026


  • Project: unrag-web
  • Deployment: Ready
  • Review: Preview, Comment
  • Updated (UTC): Jan 8, 2026 6:47pm

pingSubhajit merged commit 6e4218f into release/v0.2.8 on Jan 8, 2026
3 checks passed
pingSubhajit deleted the feat/logging-for-pdf-text-layer-extractor branch on January 8, 2026 at 18:47
pingSubhajit added a commit that referenced this pull request Jan 9, 2026
* fix: deep merge file not present after installation (#13)
* feat: robust logging for pdf:text-layer extractor (#14)
* docs: new supported runtimes page in the docs
* feat: Add unrag doctor command for installation validation and troubleshooting (#15)
* chore: add spec for eval harness feature
* feat: Add reranker battery with Cohere and custom reranker support (#16)
* chore: bump patch version & updated new badge