A full-stack internship project for uploading documents, processing them asynchronously, tracking live progress, reviewing extracted data, finalizing approved records, and exporting results.
This project is designed around a realistic document operations workflow:
- users upload one or more documents
- the backend stores file metadata and creates processing jobs
- background workers process files asynchronously
- progress updates are streamed back to the UI
- reviewers inspect and edit extracted fields
- finalized records can be exported as JSON or CSV
The main goal is to demonstrate clean system design with Next.js, FastAPI, PostgreSQL, Redis, and Celery.
The recommended scenario for this project is invoice processing for an operations or accounts team.
Example extracted fields:
- vendor name
- invoice number
- invoice date
- total amount
- tax amount
- currency
- payment status
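One way to carry these fields through the backend is a small typed record. The sketch below is illustrative only: the class name `InvoiceFields`, the field types, and the example values in the comments are assumptions, not part of the project spec.

```python
from dataclasses import dataclass, asdict
from datetime import date
from decimal import Decimal
from typing import List, Optional


@dataclass
class InvoiceFields:
    """Extracted invoice fields. Every value is optional because extraction may miss it."""

    vendor_name: Optional[str] = None
    invoice_number: Optional[str] = None
    invoice_date: Optional[date] = None
    total_amount: Optional[Decimal] = None
    tax_amount: Optional[Decimal] = None
    currency: Optional[str] = None        # assumption: a code such as "USD"
    payment_status: Optional[str] = None  # assumption: values such as "paid" or "unpaid"

    def missing_fields(self) -> List[str]:
        """Return the names of fields a reviewer still has to fill in."""
        return [name for name, value in asdict(self).items() if value is None]
```

A review screen could use `missing_fields()` to highlight what the extractor left empty.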
Core features:
- multi-document upload
- async background processing
- job status tracking
- realtime or near-realtime progress updates
- searchable document dashboard
- detail page for extracted output
- editable review workflow
- retry for failed jobs
- finalize approved records
- export as JSON and CSV
The frontend should include these screens:
Route: / or /documents
Purpose:
- show all uploaded documents
- search by filename or invoice number
- filter by status
- sort by upload date or status
- display progress and job state
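The search, filter, and sort behavior listed above could be backed by logic along these lines. This is a minimal in-memory sketch: `DocumentRow` and `query_documents` are hypothetical names, and a real backend would most likely push the same conditions into a database query.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class DocumentRow:
    """Hypothetical row shape for the dashboard listing."""

    filename: str
    invoice_number: Optional[str]
    status: str
    uploaded_at: datetime


def query_documents(
    rows: List[DocumentRow],
    search: Optional[str] = None,
    status: Optional[str] = None,
    sort_by: str = "uploaded_at",
    descending: bool = True,
) -> List[DocumentRow]:
    """Filter by search term and status, then sort by upload date or status."""
    if sort_by not in ("uploaded_at", "status"):
        raise ValueError(f"unsupported sort key: {sort_by}")
    result = rows
    if search:
        needle = search.lower()
        # Match the term against the filename or the invoice number.
        result = [
            r for r in result
            if needle in r.filename.lower()
            or needle in (r.invoice_number or "").lower()
        ]
    if status:
        result = [r for r in result if r.status == status]
    return sorted(result, key=lambda r: getattr(r, sort_by), reverse=descending)
```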
Route: /upload
Purpose:
- upload one or more documents
- show selected file list
- submit files for background processing
Route: /documents/[id]
Purpose:
- show file metadata
- show processing state
- show extracted data
- show progress timeline
- show failure state if processing fails
Route: /documents/[id]/review
Purpose:
- edit extracted invoice fields
- save reviewed output
- prepare the document for finalization
Route: /documents/[id]/export
Purpose:
- export finalized data as JSON
- export finalized data as CSV
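The two export formats can be produced with the Python standard library alone. The sketch below assumes a record is a flat dictionary keyed by the extracted field names and that the document status is stored as a lowercase string; `EXPORT_COLUMNS`, `export_json`, and `export_csv` are hypothetical names.

```python
import csv
import io
import json
from typing import Any, Dict

# Assumed column order, taken from the example extracted fields.
EXPORT_COLUMNS = [
    "vendor_name", "invoice_number", "invoice_date",
    "total_amount", "tax_amount", "currency", "payment_status",
]


def _require_finalized(status: str) -> None:
    """Refuse to export anything that has not been finalized."""
    if status != "finalized":
        raise PermissionError("only finalized documents can be exported")


def export_json(record: Dict[str, Any], status: str) -> str:
    """Serialize the known columns of a finalized record as JSON."""
    _require_finalized(status)
    return json.dumps({col: record.get(col) for col in EXPORT_COLUMNS}, indent=2)


def export_csv(record: Dict[str, Any], status: str) -> str:
    """Serialize a finalized record as a header row plus one data row."""
    _require_finalized(status)
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys that are not export columns.
    writer = csv.DictWriter(buffer, fieldnames=EXPORT_COLUMNS, extrasaction="ignore")
    writer.writeheader()
    writer.writerow(record)
    return buffer.getvalue()
```

Writing through `csv.DictWriter` rather than joining strings by hand keeps values that contain commas, such as a vendor name, correctly quoted.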
POST /api/documents/upload
GET /api/documents
GET /api/documents/{id}
GET /api/jobs/{id}
POST /api/jobs/{id}/retry
GET /api/jobs/{id}/events
GET /api/jobs/{id}/stream
PUT /api/documents/{id}/review
POST /api/documents/{id}/finalize
GET /api/documents/{id}/export/json
GET /api/documents/{id}/export/csv
GET /health
Queued -> Processing -> Completed -> Finalized
Processing -> Failed -> Retry -> Queued
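The status flow can be enforced with a small transition table. This is a sketch under two assumptions: a job fails from the Processing state, and "Retry" is an action that moves a failed job back to Queued rather than a stored state. The names `JobStatus`, `transition`, and `retry` are hypothetical.

```python
from enum import Enum


class JobStatus(str, Enum):
    QUEUED = "queued"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"
    FINALIZED = "finalized"


# Each status maps to the set of statuses it may move to next.
ALLOWED_TRANSITIONS = {
    JobStatus.QUEUED: {JobStatus.PROCESSING},
    JobStatus.PROCESSING: {JobStatus.COMPLETED, JobStatus.FAILED},
    JobStatus.COMPLETED: {JobStatus.FINALIZED},
    JobStatus.FAILED: {JobStatus.QUEUED},  # reached only through a retry
    JobStatus.FINALIZED: set(),            # terminal
}


def transition(current: JobStatus, target: JobStatus) -> JobStatus:
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


def retry(current: JobStatus) -> JobStatus:
    """Re-queue a failed job; any other status is rejected."""
    if current is not JobStatus.FAILED:
        raise ValueError("only failed jobs can be retried")
    return transition(current, JobStatus.QUEUED)
```

Keeping the table in one place means the retry endpoint and the worker reject invalid moves in the same way.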
- job_queued
- job_started
- document_parsing_started
- document_parsing_completed
- field_extraction_started
- field_extraction_completed
- job_completed
- job_failed
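If the stream endpoint uses Server-Sent Events, each progress event becomes a small text frame with an `event:` line, a `data:` line, and a blank line. The sketch below shows one way to build such frames; `format_sse` and the payload keys (`job_id`, `progress`) are assumptions for illustration.

```python
import json
from typing import Any, Dict

# Event names as listed in this document.
JOB_EVENTS = (
    "job_queued",
    "job_started",
    "document_parsing_started",
    "document_parsing_completed",
    "field_extraction_started",
    "field_extraction_completed",
    "job_completed",
    "job_failed",
)


def format_sse(event: str, data: Dict[str, Any]) -> str:
    """Build one Server-Sent Events frame for a known job event."""
    if event not in JOB_EVENTS:
        raise ValueError(f"unknown job event: {event}")
    # json.dumps emits a single line by default, so one data: line is enough.
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"
```

For example, `format_sse("job_started", {"job_id": "42", "progress": 10})` yields a frame that a browser `EventSource` listener registered for `job_started` can consume.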
- Frontend: Next.js, TypeScript, Tailwind CSS, TanStack Query, React Hook Form
- Backend: FastAPI, Pydantic, SQLAlchemy or SQLModel, Alembic
- Background processing and realtime: Celery, Redis, Redis Pub/Sub, Server-Sent Events or WebSocket
- Database: PostgreSQL
This repository currently contains the Next.js frontend scaffold. The structure below shows both the current frontend and the recommended backend layout.
async-document-processor/
    app/              # Next.js App Router pages
    public/           # static assets
    backend/          # FastAPI backend (recommended)
    package.json      # frontend dependencies and scripts
    tsconfig.json     # TypeScript config
    next.config.ts    # Next.js config
    README.md         # project documentation
app/
    layout.tsx                # root layout
    page.tsx                  # dashboard landing page
    globals.css               # global styles
    upload/
        page.tsx              # upload screen
    documents/
        page.tsx              # document list dashboard
        [id]/
            page.tsx          # document detail page
            review/
                page.tsx      # review and edit page
            export/
                page.tsx      # export page
backend/
    app/
        main.py                     # FastAPI entry point
        api/
            documents.py            # upload, list, detail routes
            jobs.py                 # job status, retry, events, stream
            review.py               # reviewed output and finalize routes
            export.py               # JSON and CSV export routes
            health.py               # health check route
        models/
            document.py             # database model for documents
            job.py                  # database model for processing jobs
            result.py               # database model for extracted results
        schemas/
            document.py             # request and response schemas
            job.py                  # job schemas
            review.py               # review and finalize schemas
            export.py               # export response schemas if needed
        services/
            document_service.py     # upload and document orchestration
            job_service.py          # job creation and tracking
            review_service.py       # save reviewed output
            export_service.py       # JSON and CSV generation
            storage_service.py      # local file storage handling
            pubsub_service.py       # Redis Pub/Sub publishing
        workers/
            celery_app.py           # Celery config
            tasks.py                # background processing tasks
Frontend pages and backend APIs should match like this:
/upload                  -> POST /api/documents/upload
/documents               -> GET /api/documents
/documents/[id]          -> GET /api/documents/{id}
/documents/[id]/review   -> PUT /api/documents/{id}/review
/documents/[id]/export   -> GET /api/documents/{id}/export/json
                            GET /api/documents/{id}/export/csv
job progress widgets     -> GET /api/jobs/{id}
                            GET /api/jobs/{id}/events
                            GET /api/jobs/{id}/stream
retry actions            -> POST /api/jobs/{id}/retry
- keep FastAPI as the main backend API instead of duplicating routes inside Next.js
- keep business logic inside services/, not directly in route files
- keep background task logic inside workers/
- keep schemas separate from database models
- use the app/ folder in Next.js only for UI pages and frontend logic
Install dependencies and run the Next.js app:
npm install
npm run dev
Frontend runs at:
http://localhost:3000
If your FastAPI app is in a backend/ folder, a typical local setup is:
pip install fastapi uvicorn celery redis sqlalchemy alembic python-multipart psycopg2-binary
uvicorn app.main:app --reload
Backend usually runs at:
http://localhost:8000
Build the project in this order:
- Create the dashboard and upload page.
- Create document detail and review pages.
- Implement document upload and listing APIs.
- Implement async job processing with Celery.
- Add Redis Pub/Sub progress updates.
- Implement review, finalize, and export flows.
Current status and notes:
- Next.js frontend is initialized
- default starter page should be replaced with the dashboard
- FastAPI backend should provide the main API surface
- Next.js API routes are optional and not required if FastAPI is used directly
- document processing must happen in background workers, not in request handlers
- exports should only be available for finalized documents
- failed jobs should be retryable
- progress updates should be visible from the dashboard and detail page