A production-style full-stack system for uploading documents, processing them asynchronously, tracking progress in real time, reviewing extracted output, and exporting finalized results.
This project is designed around a realistic document operations workflow. Users upload one or more files, the backend creates background jobs, Celery workers process the files asynchronously, Redis Pub/Sub streams progress updates, and the frontend allows users to review, edit, finalize, and export the result.
The focus is not advanced OCR or AI quality. The focus is strong system design, clean async execution, reliable progress tracking, and a practical human-in-the-loop review flow.
Manual document handling is slow, repetitive, and hard to track. In real teams, documents such as invoices, resumes, contracts, or onboarding forms often need:
- upload and storage
- background processing
- status visibility
- human review
- retry on failure
- export after approval
This system solves that with a clean async architecture instead of doing heavy work inside API requests.
An accounts team receives dozens of vendor invoices every day. A reviewer uploads invoice PDFs into the system. Each invoice is processed in the background to extract fields such as:
- vendor name
- invoice number
- invoice date
- total amount
- tax amount
- currency
- status
Because extraction may not always be perfect, the reviewer checks the output, edits incorrect values, finalizes the document, and exports the approved result as JSON or CSV.
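Extraction quality is not the focus here, so a mock extractor is enough to drive the review flow. The sketch below assumes the parsed invoice text contains `Label: value` lines. `InvoiceFields`, `LABELS`, and `extract_invoice_fields` are hypothetical names. Amounts are kept as strings so the reviewer can correct them without float rounding.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class InvoiceFields:
    vendor_name: Optional[str] = None
    invoice_number: Optional[str] = None
    invoice_date: Optional[str] = None
    total_amount: Optional[str] = None
    tax_amount: Optional[str] = None
    currency: Optional[str] = None
    status: Optional[str] = None


# Hypothetical mapping from labels found in the raw text to field names.
LABELS = {
    "vendor": "vendor_name",
    "invoice number": "invoice_number",
    "invoice date": "invoice_date",
    "total": "total_amount",
    "tax": "tax_amount",
    "currency": "currency",
    "status": "status",
}

# Matches "Some Label: some value", trimming whitespace around both parts.
LINE_RE = re.compile(r"^\s*([A-Za-z ]+?)\s*:\s*(.+?)\s*$")


def extract_invoice_fields(raw_text: str) -> InvoiceFields:
    """Scan raw text line by line and keep the first value seen for each known label."""
    fields = InvoiceFields()
    for line in raw_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        name = LABELS.get(match.group(1).lower())
        if name and getattr(fields, name) is None:
            setattr(fields, name, match.group(2))
    return fields
```

Fields the scan cannot find stay `None`. That gives the review screen an obvious list of values the reviewer still has to fill in.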
This scenario fits the assignment especially well because it demonstrates:
- asynchronous job execution
- progress tracking
- structured extraction
- manual correction
- final approval
- export workflow
- Frontend must use React or Next.js with TypeScript
- Backend must use Python with FastAPI
- Database must be PostgreSQL
- Background processing must use Celery
- Messaging and progress state must use Redis
- Redis Pub/Sub is mandatory for progress events
- Processing must not run inside the request-response cycle
- Frontend: Next.js with TypeScript, Tailwind CSS, TanStack Query, React Hook Form, and WebSocket or Server-Sent Events for live updates
- Backend: FastAPI, Pydantic, SQLAlchemy or SQLModel, Alembic
- Async processing: Celery, Redis as broker, Redis Pub/Sub for progress events
- Storage: PostgreSQL, local file storage for demo
- Orchestration: Docker Compose for local orchestration
- upload one or more documents
- create async processing jobs
- show job states: Queued, Processing, Completed, Failed
- publish live or near-real-time progress
- search, sort, and filter documents
- review extracted output
- edit and save reviewed output
- finalize approved records
- retry failed jobs
- export finalized output as JSON and CSV
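The export step can stay small. This sketch assumes the finalized output is a flat mapping of field names to values, such as the invoice fields listed earlier. `export_json`, `export_csv`, and `NotFinalizedError` are hypothetical names, and only the standard-library `json` and `csv` modules are used.

```python
import csv
import io
import json
from typing import Any, Mapping


class NotFinalizedError(Exception):
    """Raised when an export is requested for a record that is not finalized."""


def export_json(record: Mapping[str, Any], is_finalized: bool) -> str:
    if not is_finalized:
        raise NotFinalizedError("only finalized records can be exported")
    # sort_keys keeps the output stable, which makes exported files easy to diff.
    return json.dumps(dict(record), indent=2, sort_keys=True)


def export_csv(record: Mapping[str, Any], is_finalized: bool) -> str:
    if not is_finalized:
        raise NotFinalizedError("only finalized records can be exported")
    buffer = io.StringIO()
    # One header row plus one data row per finalized record.
    writer = csv.DictWriter(buffer, fieldnames=list(record.keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerow(dict(record))
    return buffer.getvalue()
```

The export endpoints would return these strings with `application/json` and `text/csv` content types. The finalization check lives in the export function, so neither endpoint can bypass it.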
```mermaid
flowchart LR
U["User"] --> F["Frontend (Next.js)"]
F --> A["FastAPI Backend"]
A --> P["PostgreSQL"]
A --> R["Redis"]
A --> S["File Storage"]
A --> C["Celery Queue"]
C --> W["Celery Worker"]
W --> R
W --> P
R --> L["Realtime Stream (WebSocket / SSE)"]
L --> F
```
```mermaid
flowchart TD
A["Upload document"] --> B["Store file and metadata"]
B --> C["Create job record: queued"]
C --> D["Enqueue Celery task"]
D --> E["Worker starts processing"]
E --> F["Publish job_started"]
F --> G["Parse document"]
G --> H["Publish parsing completed"]
H --> I["Extract structured fields"]
I --> J["Publish extraction completed"]
J --> K["Store final JSON result"]
K --> L{"Success?"}
L -->|Yes| M["Mark completed"]
L -->|No| N["Mark failed"]
M --> O["Push progress update to frontend"]
N --> O
O --> P["User reviews output"]
P --> Q["User edits and finalizes"]
Q --> R["Export JSON / CSV"]
```
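The processing flow can be sketched as a plain function that a Celery task body would call. Parsing, extraction, storage, and publishing are injected as callables, so the stage ordering can be tested without Redis or a database. `PipelineJob`, `run_pipeline`, and the progress percentages are illustrative assumptions, not fixed values.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

# publish(event_name, payload); in the real worker this would forward to Redis Pub/Sub.
Publish = Callable[[str, Dict[str, Any]], None]


@dataclass
class PipelineJob:
    id: int
    status: str = "queued"
    progress_percentage: int = 0
    current_stage: Optional[str] = None
    error_message: Optional[str] = None


def run_pipeline(
    job: PipelineJob,
    raw_bytes: bytes,
    parse: Callable[[bytes], str],
    extract: Callable[[str], Dict[str, Any]],
    store: Callable[[Dict[str, Any]], None],
    publish: Publish,
) -> Optional[Dict[str, Any]]:
    """Run parse -> extract -> store for one job, publishing an event at each stage.

    Returns the structured output on success, or None after marking the job failed.
    """

    def emit(event: str, progress: int) -> None:
        job.current_stage = event
        job.progress_percentage = progress
        publish(event, {"job_id": job.id, "progress": progress})

    try:
        job.status = "processing"
        emit("job_started", 5)
        emit("document_parsing_started", 10)
        text = parse(raw_bytes)
        emit("document_parsing_completed", 40)
        emit("field_extraction_started", 50)
        fields = extract(text)
        emit("field_extraction_completed", 90)
        store(fields)  # persist the structured result before reporting success
        job.status = "completed"
        emit("job_completed", 100)
        return fields
    except Exception as exc:  # any stage failure moves the job to Failed
        job.status = "failed"
        job.error_message = str(exc)
        publish("job_failed", {"job_id": job.id, "error": str(exc)})
        return None
```

Catching the exception inside the pipeline keeps the failure visible in the job record. The worker still ends cleanly instead of relying on Celery's own failure state.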
```mermaid
stateDiagram-v2
[*] --> Queued
Queued --> Processing
Processing --> Completed
Processing --> Failed
Failed --> Queued: Retry
Completed --> Finalized
```
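The state diagram maps directly onto a transition table, and the service layer can check that table before every status update. This sketch models Finalized as a status value. The data model instead stores an `is_finalized` flag on the result, and either approach works. `JobStatus`, `ALLOWED_TRANSITIONS`, and `transition` are hypothetical names.

```python
from enum import Enum
from typing import Dict, FrozenSet


class JobStatus(str, Enum):
    QUEUED = "queued"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"
    FINALIZED = "finalized"


# One entry per arrow in the state diagram.
ALLOWED_TRANSITIONS: Dict[JobStatus, FrozenSet[JobStatus]] = {
    JobStatus.QUEUED: frozenset({JobStatus.PROCESSING}),
    JobStatus.PROCESSING: frozenset({JobStatus.COMPLETED, JobStatus.FAILED}),
    JobStatus.FAILED: frozenset({JobStatus.QUEUED}),  # retry
    JobStatus.COMPLETED: frozenset({JobStatus.FINALIZED}),
    JobStatus.FINALIZED: frozenset(),  # terminal
}


class InvalidTransition(ValueError):
    """Raised when a status change is not an arrow in the state diagram."""


def transition(current: JobStatus, target: JobStatus) -> JobStatus:
    if target not in ALLOWED_TRANSITIONS[current]:
        raise InvalidTransition(f"cannot move from {current.value} to {target.value}")
    return target
```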
- `job_queued`
- `job_started`
- `document_parsing_started`
- `document_parsing_completed`
- `field_extraction_started`
- `field_extraction_completed`
- `job_completed`
- `job_failed`
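These event types can share a single JSON payload shape. The sketch assumes one Pub/Sub channel per job, and the `jobs:{id}:progress` naming scheme is hypothetical. The `publish` argument takes `(channel, message)`, which matches redis-py's `Redis.publish`, so the worker can pass a client's `publish` method directly.

```python
import json
from datetime import datetime, timezone
from typing import Callable, Optional

EVENT_TYPES = frozenset({
    "job_queued", "job_started",
    "document_parsing_started", "document_parsing_completed",
    "field_extraction_started", "field_extraction_completed",
    "job_completed", "job_failed",
})


def channel_for_job(job_id: int) -> str:
    # Hypothetical naming scheme: one Pub/Sub channel per job.
    return f"jobs:{job_id}:progress"


def build_event(job_id: int, event_type: str, progress: int,
                error: Optional[str] = None,
                now: Optional[datetime] = None) -> str:
    """Serialize one progress event; rejects unknown types and out-of-range progress."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type}")
    if not 0 <= progress <= 100:
        raise ValueError(f"progress out of range: {progress}")
    payload = {
        "job_id": job_id,
        "event": event_type,
        "progress": progress,
        "error": error,  # populated for job_failed so the UI can show the reason
        "timestamp": (now or datetime.now(timezone.utc)).isoformat(),
    }
    return json.dumps(payload)


def publish_event(publish: Callable[[str, str], object], job_id: int,
                  event_type: str, progress: int, error: Optional[str] = None) -> None:
    publish(channel_for_job(job_id), build_event(job_id, event_type, progress, error))
```

Because Pub/Sub messages are fire-and-forget, the latest progress should also be written to the job row. A client that connects late can then read the current state.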
- Documents: `POST /api/documents/upload`, `GET /api/documents`, `GET /api/documents/{id}`
- Jobs: `GET /api/jobs/{id}`, `POST /api/jobs/{id}/retry`, `GET /api/jobs/{id}/events`, `GET /api/jobs/{id}/stream`
- Review: `PUT /api/documents/{id}/review`, `POST /api/documents/{id}/finalize`
- Export: `GET /api/documents/{id}/export/json`, `GET /api/documents/{id}/export/csv`
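If `GET /api/jobs/{id}/stream` uses Server-Sent Events, each message relayed from Redis Pub/Sub needs to be framed in the SSE wire format: optional `id:`, an `event:` line, a `data:` line, and a blank line. `format_sse` is a hypothetical helper. In FastAPI, a generator yielding these strings could be returned through `StreamingResponse(..., media_type="text/event-stream")`.

```python
import json
from typing import Any, Dict, Optional


def format_sse(event: str, data: Dict[str, Any], event_id: Optional[int] = None) -> str:
    """Frame one Server-Sent Events message."""
    lines = []
    if event_id is not None:
        # Lets the browser resume with Last-Event-ID after a reconnect.
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # json.dumps without indent emits a single line, so one data: field suffices.
    lines.append(f"data: {json.dumps(data)}")
    # A blank line terminates the message.
    return "\n".join(lines) + "\n\n"
```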
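- Documents: `id`, `original_filename`, `stored_file_path`, `mime_type`, `file_size`, `uploaded_at`, `current_status`
- Processing jobs: `id`, `document_id`, `status`, `progress_percentage`, `current_stage`, `error_message`, `retry_count`, `created_at`, `started_at`, `completed_at`
- Extraction results: `id`, `document_id`, `raw_text`, `structured_output_json`, `reviewed_output_json`, `is_finalized`, `finalized_at`, `updated_at`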
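The three records can be sketched as plain dataclasses before they become SQLAlchemy or SQLModel models. The class names and the `effective_output` rule are assumptions. The rule says reviewer edits take precedence over the machine output, which keeps the original extraction available for comparison.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, Optional


@dataclass
class DocumentRecord:
    id: int
    original_filename: str
    stored_file_path: str
    mime_type: str
    file_size: int
    uploaded_at: datetime
    current_status: str = "queued"


@dataclass
class JobRecord:
    id: int
    document_id: int
    status: str = "queued"
    progress_percentage: int = 0
    current_stage: Optional[str] = None
    error_message: Optional[str] = None
    retry_count: int = 0
    created_at: Optional[datetime] = None
    started_at: Optional[datetime] = None
    completed_at: Optional[datetime] = None


@dataclass
class ExtractionResultRecord:
    id: int
    document_id: int
    raw_text: str = ""
    structured_output_json: Dict[str, Any] = field(default_factory=dict)
    reviewed_output_json: Optional[Dict[str, Any]] = None
    is_finalized: bool = False
    finalized_at: Optional[datetime] = None
    updated_at: Optional[datetime] = None

    def effective_output(self) -> Dict[str, Any]:
        # Assumed policy: reviewer edits win; otherwise fall back to the machine output.
        if self.reviewed_output_json is not None:
            return self.reviewed_output_json
        return self.structured_output_json
```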
- Upload screen for single or multiple documents
- Dashboard with search, filter, sort, and progress status
- Detail page for extracted output review
- Edit/finalize workflow page
- Export action for finalized records
- A reviewer uploads 10 invoices.
- The backend stores metadata and creates 10 queued jobs.
- Celery workers process the files in the background.
- Redis Pub/Sub emits progress updates.
- The frontend dashboard shows live status changes.
- One job fails and becomes retryable.
- Completed jobs open in a review screen.
- The reviewer corrects extracted fields.
- The reviewer finalizes the approved result.
- The system exports the final record as JSON or CSV.
- API routes for upload, list, detail, retry, finalize, and export
- service layer for orchestration and business logic
- worker layer for async processing
- repository/data layer for persistence
- typed schemas / DTOs
- clear separation between sync API handling and async execution
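The split between synchronous API handling and async execution comes down to the upload service: it records the document and a queued job, enqueues only the job id, and returns immediately. The repository protocol, `InMemoryRepository`, and `upload_document` are hypothetical names. With Celery, `enqueue` could be a task's `.delay` method (for a hypothetical `process_job` task).

```python
from typing import Any, Callable, Dict, Protocol


class DocumentRepository(Protocol):
    def create_document(self, filename: str, path: str, mime_type: str, size: int) -> int: ...
    def create_job(self, document_id: int) -> int: ...


class InMemoryRepository:
    """Stand-in for the PostgreSQL repository, useful in unit tests."""

    def __init__(self) -> None:
        self.documents: Dict[int, Dict[str, Any]] = {}
        self.jobs: Dict[int, Dict[str, Any]] = {}

    def create_document(self, filename: str, path: str, mime_type: str, size: int) -> int:
        doc_id = len(self.documents) + 1
        self.documents[doc_id] = {"filename": filename, "path": path,
                                  "mime_type": mime_type, "size": size}
        return doc_id

    def create_job(self, document_id: int) -> int:
        job_id = len(self.jobs) + 1
        self.jobs[job_id] = {"document_id": document_id, "status": "queued"}
        return job_id


def upload_document(repo: DocumentRepository, enqueue: Callable[[int], Any],
                    filename: str, path: str, mime_type: str, size: int) -> Dict[str, int]:
    """Record the upload and hand processing to the worker without doing it inline."""
    doc_id = repo.create_document(filename, path, mime_type, size)
    job_id = repo.create_job(doc_id)
    # Only the job id crosses the queue; the worker reloads everything else from the DB.
    enqueue(job_id)
    return {"document_id": doc_id, "job_id": job_id}
```

With a real database, enqueue only after the transaction commits. Otherwise a fast worker may look up a job row that is not visible yet.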
- invalid uploads should fail fast with clear validation errors
- worker exceptions should move the job to Failed
- failure details should be stored and visible
- failed jobs should support retry
- progress events should include failure states
- finalization should be guarded so that repeated requests cannot cause duplicate exports or inconsistent state
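These rules reduce to a few guard functions in the service layer. The accepted MIME types, the 10 MiB limit, and the retry cap are assumptions chosen for illustration. `ValidationError`, `ConflictError`, `JobState`, and `ResultState` are hypothetical names, and the errors would map to HTTP 422 and 409 responses.

```python
from dataclasses import dataclass
from typing import Optional

ALLOWED_MIME_TYPES = {"application/pdf", "image/png", "image/jpeg", "text/plain"}  # assumption
MAX_FILE_SIZE = 10 * 1024 * 1024  # assumption: 10 MiB


class ValidationError(ValueError):
    """Invalid upload; reject before anything is stored or queued."""


class ConflictError(RuntimeError):
    """Operation not allowed in the record's current state."""


def validate_upload(filename: str, mime_type: str, size: int) -> None:
    if not filename:
        raise ValidationError("filename is required")
    if mime_type not in ALLOWED_MIME_TYPES:
        raise ValidationError(f"unsupported file type: {mime_type}")
    if size <= 0:
        raise ValidationError("file is empty")
    if size > MAX_FILE_SIZE:
        raise ValidationError(f"file exceeds {MAX_FILE_SIZE} bytes")


@dataclass
class JobState:
    status: str
    retry_count: int = 0
    error_message: Optional[str] = None


def retry_job(job: JobState, max_retries: int = 3) -> JobState:
    # Only Failed jobs can be retried, so a repeated retry request is rejected
    # instead of enqueuing the same job twice.
    if job.status != "failed":
        raise ConflictError(f"only failed jobs can be retried (status={job.status})")
    if job.retry_count >= max_retries:
        raise ConflictError("retry limit reached")
    job.status = "queued"
    job.retry_count += 1
    job.error_message = None
    return job


@dataclass
class ResultState:
    job_status: str
    is_finalized: bool = False


def finalize(result: ResultState) -> ResultState:
    if result.job_status != "completed":
        raise ConflictError("only completed jobs can be finalized")
    if result.is_finalized:
        raise ConflictError("result is already finalized")
    result.is_finalized = True
    return result
```

In PostgreSQL, the same checks should run as conditional updates (for example `UPDATE ... WHERE status = 'failed'`), so two concurrent requests cannot both pass the guard.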
- Docker Compose setup
- automated tests
- authentication
- idempotent retry handling
- cancellation support
- storage abstraction for cloud object storage
- large-file edge-case handling
- local file storage is acceptable for demo purposes
- parsing logic can be simple or mocked
- progress visibility can be near-real-time
- export is only required for finalized records
- authentication is optional unless explicitly added
This design directly supports the expected evaluation areas:
- correctness of async workflow
- proper use of Celery
- correct Redis Pub/Sub integration
- backend API design
- frontend-backend integration
- database design
- progress tracking
- retry and error handling
- maintainable code structure
- engineering maturity
- GitHub repository
- README.md with setup and architecture overview
- sample files for testing
- sample exported outputs
- short demo video
- note on AI tool usage if used during development
For the strongest submission, prioritize:
- a clean async pipeline
- visible job progress
- reliable retry behavior
- a simple but polished UI
- readable architecture and documentation
The detailed requirement breakdown is available in PROJECT_REQUIREMENTS.md.