feat: add interlinear PDF export pipeline #134

Draft
delgado-jacob wants to merge 1 commit into globalbibletools:main from delgado-jacob:feature_pdf_export

Conversation

@delgado-jacob
Contributor

closes: #109

Feature

For Translators:

  • Export interlinear PDFs directly from the Language Settings page
  • Choose specific books and chapter ranges, or export everything
  • Select between "Standard" (word-by-word) and "Parallel" (two-column) layouts
  • Download links with automatic expiration tracking

For Snapshots:

  • Automatic PDF generation when language snapshots are created
  • PDFs are archived alongside snapshot data for historical record

Architecture

New export Module (src/modules/export/)

  • InterlinearPdfGenerator - Core PDF rendering using PDFKit with SBL Hebrew/Greek fonts
  • ExportRequestRepository - Tracks export requests, status, and download URLs
  • ExportStorageRepository - S3/LocalStack integration for PDF storage
  • Server actions for requesting exports and polling status
  • React components for the export UI with real-time status updates

Background Jobs

  • export_interlinear_pdf - Generates PDFs asynchronously, merges multi-book exports
  • cleanup_exports - Purges expired exports from storage
  • create_snapshot_interlinear_pdf - Generates archival PDFs during snapshot creation
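
As a rough illustration of what a cleanup pass like `cleanup_exports` might decide on each run, the sketch below computes an expiry timestamp and selects the records that are past it. The `ExportRecord` shape and the `expiryFor` and `findExpired` helpers are assumptions made for this example, not code from the PR; the 7-day value is the default stated in the description.

```typescript
// Hypothetical record shape; field names are assumptions, not the PR's schema.
interface ExportRecord {
  id: string;
  storageKey: string;
  expiresAt: Date;
}

// Computes an expiry timestamp `ttlDays` after the creation time.
function expiryFor(createdAt: Date, ttlDays: number): Date {
  const msPerDay = 24 * 60 * 60 * 1000;
  return new Date(createdAt.getTime() + ttlDays * msPerDay);
}

// Returns the records whose expiry time is at or before `now`.
function findExpired(records: ExportRecord[], now: Date): ExportRecord[] {
  return records.filter((r) => r.expiresAt.getTime() <= now.getTime());
}
```

A cleanup job would then delete the stored object for each returned record before removing or marking the record itself, so a failed delete can be retried on the next run.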

Database Changes

  • New export_request table tracking request lifecycle (pending → in progress → complete/failed)
  • New export_request_book junction table for multi-book exports
  • Three new job types registered
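
The request lifecycle described above (pending → in progress → complete/failed) can be sketched as a small transition table. The status names follow the `export_request_status` enum in the migration; the `ALLOWED_TRANSITIONS` table and `canTransition` helper are assumptions for illustration, and the PR may enforce the lifecycle differently.

```typescript
// Status names match the enum in the migration; the table itself is an assumption.
type ExportRequestStatus = "PENDING" | "IN_PROGRESS" | "COMPLETE" | "FAILED";

// Each status maps to the statuses it may move to next.
const ALLOWED_TRANSITIONS: Record<ExportRequestStatus, readonly ExportRequestStatus[]> = {
  PENDING: ["IN_PROGRESS"],
  IN_PROGRESS: ["COMPLETE", "FAILED"],
  COMPLETE: [],
  FAILED: [],
};

// True when moving from `from` to `to` follows the lifecycle above.
function canTransition(from: ExportRequestStatus, to: ExportRequestStatus): boolean {
  return ALLOWED_TRANSITIONS[from].includes(to);
}
```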

Key Design Decisions

  1. Async Generation: PDFs are generated in background jobs rather than blocking the UI, with polling for status updates
  2. Per-Book Chunking with Merge: Multi-book exports generate individual PDFs per book, then merge them using pdf-lib—this prevents memory issues with large exports
  3. Script Detection: Automatically detects Hebrew vs Greek source text to apply correct fonts and RTL/LTR layout
  4. Expiring Downloads: Export URLs expire (configurable, default 7 days) with automatic cleanup job
  5. Snapshot Integration: Snapshot creation now triggers interlinear PDF generation, archived at snapshots/{languageId}/{snapshotId}/interlinear.pdf
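
One way the script detection in item 3 could work is by counting code points in the Unicode Hebrew block against those in the Greek blocks. This is a minimal sketch under that assumption; `detectScript` and `directionFor` are hypothetical names, and the PR's actual detection logic is not shown in this conversation.

```typescript
type SourceScript = "hebrew" | "greek";
type Direction = "ltr" | "rtl";

// Counts code points in the Hebrew block (U+0590-U+05FF) versus the
// Greek and Greek Extended blocks (U+0370-U+03FF, U+1F00-U+1FFF).
// Returns undefined when the text contains neither script.
function detectScript(text: string): SourceScript | undefined {
  let hebrew = 0;
  let greek = 0;
  for (const ch of text) {
    const cp = ch.codePointAt(0) ?? 0;
    if (cp >= 0x0590 && cp <= 0x05ff) {
      hebrew++;
    } else if ((cp >= 0x0370 && cp <= 0x03ff) || (cp >= 0x1f00 && cp <= 0x1fff)) {
      greek++;
    }
  }
  if (hebrew === 0 && greek === 0) return undefined;
  return hebrew >= greek ? "hebrew" : "greek";
}

// Hebrew is laid out right-to-left; Greek left-to-right.
function directionFor(script: SourceScript): Direction {
  return script === "hebrew" ? "rtl" : "ltr";
}
```

The result would feed the `sourceScript` and `direction` options of the generator, which in turn select the font and layout direction.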

Testing

  • Unit tests for PDF generator, jobs, actions, and React components
  • Integration tests for repository layer
  • LocalStack integration tests for S3 operations
  • Test coverage for error paths and edge cases

Member

@arrocke arrocke left a comment

Wow, thanks for the contribution! There is a lot of good stuff in here, but a PR this size is very difficult to review. I'm going to treat this like a prototype to work through product/design decisions together. Then let's chop the PR up into smaller PRs for easier technical review. There will be a lot to work through, and this will make it less overwhelming to review and faster to get changes shipped for testing.

In the future, I encourage you to seek clarity on vague tickets before starting work and open PRs with smaller changes. We have a feature flag system to be able to do this with incomplete features.

We can use this PR to discuss the bigger picture of this feature, so I will leave some high level feedback here to start that discussion. From there, here is a way we can break this work up into smaller PRs for easier review:

  1. Introduce localstack: I'm thrilled you are introducing this. There are a number of other places where we could take advantage of this. This PR could simply introduce localstack and its initialization scripts.
  2. Basic export job infrastructure: Set up the UI to trigger the new job and report its status. At this stage the job is a no-op. This new UI can be behind a feature flag.
  3. Upload logic: Add the logic to upload an empty PDF in the job.
  4. PDF generation: Add the MVP PDF generation logic to the job. This will be the meat of the work, but we'll be able to focus on just that logic now that all of the infrastructure is taken care of.

Please don't be discouraged by this feedback. My goal is to clarify how we can best work together. I'm grateful for the work you've put in thus far to develop this feature. Please let me know if you have any questions.

Comment on lines +50 to +54
await enqueueJob(SNAPSHOT_JOB_TYPES.CREATE_SNAPSHOT_INTERLINEAR_PDF, {
  languageId: snapshot.languageId,
  languageCode: language.code,
  snapshotId: snapshot.id,
});
Member

The snapshot system is meant for backup/restore purposes, not for exporting content for general consumption. We can remove PDF generation from the snapshot module.

@@ -0,0 +1,36 @@
create type export_request_status as enum ('PENDING', 'IN_PROGRESS', 'COMPLETE', 'FAILED');

create table export_request (
Member

I'd prefer to track all of this in the payload and data columns of the job table. Both of those columns can hold arbitrary JSON. Jobs are ephemeral so I find referential integrity of job data to be less important and having a single mechanism for tracking job progress is useful. Do you have any concerns with this approach?
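
To make the suggestion concrete, here is a hedged sketch of how export state might be typed if it lived in the job's JSON `payload` and `data` columns. Every field name below is an assumption for discussion, as are the `roundTrip` and `describeProgress` helpers; none of it is taken from the existing job table or from the PR.

```typescript
// Hypothetical input stored in the job's `payload` column.
interface ExportJobPayload {
  languageCode: string;
}

// Hypothetical progress/result stored in the job's `data` column.
type ExportJobData =
  | { phase: "generating"; booksDone: number; booksTotal: number }
  | { phase: "uploaded"; downloadUrl: string; expiresAt: string };

// Serialises and re-parses a value, as storing it in a JSON column would.
function roundTrip<T>(value: T): T {
  return JSON.parse(JSON.stringify(value)) as T;
}

// Turns the stored data into a short status line for the UI.
function describeProgress(data: ExportJobData): string {
  switch (data.phase) {
    case "generating":
      return `${data.booksDone}/${data.booksTotal} books`;
    case "uploaded":
      return `ready until ${data.expiresAt}`;
  }
}
```

Dates are kept as ISO strings here because a `Date` does not survive a JSON round trip as a `Date`.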

Contributor Author

Sounds good, no concerns.

import { defineConfig } from "vitest/config";
import tsconfigPaths from "vite-tsconfig-paths";

export default defineConfig({
Member

I'm not sure we want to require localstack to be running for integration tests. I think we can get away with stubbing the S3 API and asserting that it is called with the right data.
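
A minimal sketch of the stubbing approach, assuming the upload logic sits behind a small interface. `PdfStorage`, `StubStorage`, `storeExport` and the key layout are all hypothetical names for this example; the real module's API and the project's test helpers may differ.

```typescript
// Hypothetical storage interface; the real module's API may differ.
interface PdfStorage {
  upload(key: string, body: Uint8Array, contentType: string): Promise<void>;
}

// Hand-rolled stub that records every call instead of talking to S3.
class StubStorage implements PdfStorage {
  calls: { key: string; size: number; contentType: string }[] = [];

  async upload(key: string, body: Uint8Array, contentType: string): Promise<void> {
    this.calls.push({ key, size: body.byteLength, contentType });
  }
}

// Hypothetical unit under test: stores a PDF under a per-language key.
async function storeExport(
  storage: PdfStorage,
  languageCode: string,
  pdf: Uint8Array,
): Promise<string> {
  const key = `exports/${languageCode}/interlinear.pdf`;
  await storage.upload(key, pdf, "application/pdf");
  return key;
}
```

A test would pass a `StubStorage` to `storeExport` and assert on `calls`, which checks the key, size and content type without LocalStack running.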

@@ -0,0 +1,57 @@
services:
Member

Is there a reason to break this out into its own file? Our compose.yaml is for local development only, since everything is deployed separately in production environments.

Contributor Author

No, it was just a precaution to avoid interfering with any other workflows. I can consolidate it.

Comment on lines +223 to +226
<InterlinearExportPanel
  languageCode={languageSettings.code}
  books={books}
/>
Member

I'd like to move this to its own page in the admin language view. The export module would own that view. The settings page is reserved for configuration of the language.

Comment on lines +10 to +17
export interface PdfGeneratorOptions {
  pageSize?: "letter" | "a4";
  layout: "standard" | "parallel";
  direction?: "ltr" | "rtl";
  header?: { title?: string; subtitle?: string };
  footer?: { generatedAt?: Date; pageOffset?: number; pageTotal?: number };
  sourceScript?: "hebrew" | "greek";
}
Member

For this first version, I'd like to simplify this to a single button for the user. That button converts to a loading UI while the job is in progress and then reports when the job is done.

For the PDF, let's use your standard layout and export all books by default even if there are no glosses for that book yet. This is a true MVP of this feature and we can expand options later.
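
The single-button flow described here could be modelled as a pure mapping from the latest polled job status to button state. This is a sketch only: the `JobStatus` values, the `ExportButtonState` shape and the label text are assumptions, not the project's existing job or UI types.

```typescript
// Hypothetical status values returned by polling the export job.
type JobStatus = "pending" | "in_progress" | "complete" | "failed";

interface ExportButtonState {
  label: string;
  disabled: boolean;
  showSpinner: boolean;
}

// Maps the latest job status (undefined before any export) to button state.
function buttonStateFor(status: JobStatus | undefined): ExportButtonState {
  if (status === undefined) {
    return { label: "Export PDF", disabled: false, showSpinner: false };
  }
  switch (status) {
    case "pending":
    case "in_progress":
      return { label: "Exporting…", disabled: true, showSpinner: true };
    case "complete":
      return { label: "Export complete", disabled: false, showSpinner: false };
    case "failed":
      return { label: "Export failed, try again", disabled: false, showSpinner: false };
  }
}
```

Keeping the mapping free of React makes it easy to unit test, and the component only has to render whatever state it returns.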

@delgado-jacob
Contributor Author

> There is a lot of good stuff in here, but a PR this size is very difficult to review. I'm going to treat this like a prototype to work through product/design decisions together. Then let's chop the PR up into smaller PRs for easier technical review.

Makes sense, I'll work on splitting things up. Are you OK with stacked branches, or would you prefer a single branch in PR at a time?

> Please don't be discouraged by this feedback.

Not at all. My main objective was to start getting familiar with the platform and prototype something for discussion/iteration. I realize it's a large change set and have no issues breaking it up.

@arrocke
Member

arrocke commented Dec 23, 2025

Stacked branches are completely fine. Since we squash a PR into a single commit, the branch history can be a little messy and it will all consolidate to a single commit in the main branch. Open stacked PRs whenever you'd like, just leave them in draft state until you have something you want to merge. PRs are the best place to ask questions about the changes you are making so opening them early and mentioning me with specific questions will speed up the review time later rather than addressing all of that at the end. I'll try to be prompt in responding.

Development

Successfully merging this pull request may close these issues.

pdf interlinear export

2 participants