Skip to content

Small agentic AI app to extract information from documents, save it to files and send them through email

License

Notifications You must be signed in to change notification settings

ericmartinezr/document_ai

Repository files navigation

Document AI

Small agentic app to extract information from documents (pdfs only so far), save a few fields to a csv file and then send the file and the detail to an email

Installation

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Model

  • GPT OSS with Ollama

Initialization

Initialize the database (PGVector)

python init/stores.py

Initialize the documents

Note

  • The folder documents must exist and it should have pdf files inside
python init/documents.py

Execution

Note

  • The folder csv must exist. The files generated by the agents will be saved there.
  • It requires docker running since it's using Redis for caching
python main.py

TODO

Testing

  • Added a couple of tests based on [2]

Reference:

  1. https://docs.langchain.com/oss/python/langchain/test
  2. https://docs.langchain.com/langsmith/test-react-agent-pytest

Run tests

pytest --langsmith-output tests

Caching

  • Added Redis Cache
  • Probably investigate more about (e.g., KV Cache, or is it redis enough?)

Errors

Check the following error when initializating DeepSeek PDF

sqlalchemy.exc.DataError: (psycopg.DataError) PostgreSQL text fields cannot contain NUL (0x00) bytes
[SQL: INSERT INTO "public"."documents"("langchain_id", "content", "embedding", "langchain_metadata")VALUES (%(langchain_id)s, %(content)s, %(embedding)s, %(extra)s) ON CONFLICT ("langchain_id") DO UPDATE SET "content" = EXCLUDED."content", "embedding" = EXCLUDED."embedding", "langchain_metadata" = EXCLUDED."langchain_metadata";]
[parameters: {'langchain_id': '50e5e4f4-d8d0-4f47-bf4d-9d73200fa5f9', 'content': 'mechanismretrieves only the key-value entries {c𝑠}corresponding to the top-k index scores.\nThen, the attention outputu 𝑡 is ...

About

Small agentic AI app to extract information from documents, save it to files and send them through email

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages