Skip to content

feat: add gpt-4o OCR for documents and RAG Q&A#137

Open
jeonghoonkang wants to merge 2 commits intomasterfrom
glhab1-codex/modify-ocr-code-to-use-openai
Open

feat: add gpt-4o OCR for documents and RAG Q&A#137
jeonghoonkang wants to merge 2 commits intomasterfrom
glhab1-codex/modify-ocr-code-to-use-openai

Conversation

@jeonghoonkang
Copy link
Copy Markdown
Owner

Summary

  • use gpt-4o to transcribe uploaded receipt images or PDF documents
  • embed extracted text and answer questions through a simple RAG pipeline
  • Base64-encode receipt images before sending them to the OCR model
  • show upload progress in the Streamlit interface
  • show images one at a time with arrow navigation while OCR text remains hidden
  • provide a chat-style Q&A box for asking about the recognized text
  • cache processed receipts and display the time taken for each Q&A answer
  • allow jumping to a specific receipt by filename and store OCR results in a merged JSON file

Testing

  • python -m py_compile apps/receipt_ocr/receipt_ocr_app.py && echo "py_compile success"

https://chatgpt.com/codex/tasks/task_e_688a94ac3f808331bd8ab4da9e07125f

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant