Small agentic app that extracts information from documents (PDFs only so far), saves a few fields to a CSV file, and then emails the file together with the extracted details.
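A rough sketch of the last two steps of that pipeline (the extraction itself is done by the agents; every name below is illustrative, not the repo's actual API):

```python
# Sketch of the CSV and email steps described above. All names are illustrative;
# the real field extraction is done by the agents elsewhere in the repo.
import csv
import os
import smtplib
from email.message import EmailMessage


def append_fields_to_csv(fields: dict, path: str = "csv/output.csv") -> None:
    """Append one row of extracted fields, writing a header for a new file."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(fields))
        if new_file:
            writer.writeheader()
        writer.writerow(fields)


def email_csv(path: str, summary: str, to: str) -> None:
    """Send the CSV as an attachment together with a short summary."""
    msg = EmailMessage()
    msg["Subject"], msg["From"], msg["To"] = "Extracted fields", "agent@example.com", to
    msg.set_content(summary)
    with open(path, "rb") as f:
        msg.add_attachment(f.read(), maintype="text", subtype="csv",
                           filename=os.path.basename(path))
    with smtplib.SMTP("localhost") as smtp:  # assumed local SMTP relay
        smtp.send_message(msg)
```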
```
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
- GPT OSS with Ollama
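A minimal sketch of wiring the GPT OSS model served by Ollama into LangChain; the model tag and temperature are assumptions, not taken from this repo:

```python
# Sketch: point LangChain at a local GPT OSS model served by Ollama.
# The model tag "gpt-oss:20b" and the temperature are assumed values.
from langchain_ollama import ChatOllama

llm = ChatOllama(model="gpt-oss:20b", temperature=0)

# Quick smoke test that the local model responds.
print(llm.invoke("Reply with the single word: ready").content)
```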
```
python init/stores.py
```
Note:
- The folder `documents` must exist and contain PDF files.
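A rough sketch of what this initialization step could look like; the pgvector connection string, collection name, and embedding model below are assumptions, not the repo's actual configuration:

```python
# Sketch: load every PDF under documents/ and index it in a pgvector-backed store.
# Connection string, collection name, and embedding model are assumed values.
from pathlib import Path

from langchain_community.document_loaders import PyPDFLoader
from langchain_ollama import OllamaEmbeddings
from langchain_postgres import PGVector

embeddings = OllamaEmbeddings(model="nomic-embed-text")  # assumed embedding model
store = PGVector(
    embeddings=embeddings,
    collection_name="documents",
    connection="postgresql+psycopg://postgres:postgres@localhost:5432/postgres",
)

for pdf in Path("documents").glob("*.pdf"):
    store.add_documents(PyPDFLoader(str(pdf)).load())  # one Document per page
```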
```
python init/documents.py
```
Note:
- The folder `csv` must exist. The files generated by the agents will be saved there.
- Docker must be running, since Redis is used for caching.
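The Redis dependency mentioned above can serve as a LangChain LLM cache; a minimal sketch, assuming Redis is running locally via Docker (host and port are assumptions):

```python
# Sketch: cache LLM responses in the Redis instance running in Docker
# (e.g. `docker run -p 6379:6379 redis`). Host and port are assumed values.
import redis
from langchain.globals import set_llm_cache
from langchain_community.cache import RedisCache

client = redis.Redis(host="localhost", port=6379)
set_llm_cache(RedisCache(redis_=client))  # identical prompts are now served from Redis
```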
```
python main.py
```
- Added a couple of tests based on [2] (see the sketch after the references below).
References:
1. https://docs.langchain.com/oss/python/langchain/test
2. https://docs.langchain.com/langsmith/test-react-agent-pytest
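A minimal sketch of what one of these LangSmith-instrumented tests could look like, loosely following [2]; the `extract_fields` import and the sample PDF path are assumptions about this repo, not its actual API:

```python
# tests/test_agent.py: sketch of a LangSmith-instrumented pytest test.
# `extract_fields` and the sample PDF path are assumed names, not the repo's real API.
import pytest
from langsmith import testing as t


@pytest.mark.langsmith  # picked up when running `pytest --langsmith-output tests`
def test_extracts_some_fields():
    from main import extract_fields  # assumed entry point in this repo

    fields = extract_fields("documents/sample.pdf")
    t.log_inputs({"pdf": "documents/sample.pdf"})
    t.log_outputs(fields)
    assert fields  # at least one field was extracted
```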
```
pytest --langsmith-output tests
```
- Added Redis cache.
- Investigate caching further (e.g., is a dedicated KV cache needed, or is Redis enough?).
Check the following error raised when initializing the DeepSeek PDF:
```
sqlalchemy.exc.DataError: (psycopg.DataError) PostgreSQL text fields cannot contain NUL (0x00) bytes
[SQL: INSERT INTO "public"."documents" ("langchain_id", "content", "embedding", "langchain_metadata") VALUES (%(langchain_id)s, %(content)s, %(embedding)s, %(extra)s) ON CONFLICT ("langchain_id") DO UPDATE SET "content" = EXCLUDED."content", "embedding" = EXCLUDED."embedding", "langchain_metadata" = EXCLUDED."langchain_metadata";]
[parameters: {'langchain_id': '50e5e4f4-d8d0-4f47-bf4d-9d73200fa5f9', 'content': 'mechanismretrieves only the key-value entries {c𝑠}corresponding to the top-k index scores.\nThen, the attention outputu 𝑡 is ...
```
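A common workaround for this kind of error (a sketch, not something the repo does yet) is to strip NUL bytes from the extracted page content before it is written to Postgres, since PostgreSQL `text` columns reject `0x00`:

```python
# Sketch of a possible fix, assuming the PDF pages are loaded with a LangChain
# loader and written with add_documents(): strip NUL (0x00) bytes, which some
# PDF text extractions produce, before inserting into Postgres.
def strip_nul_bytes(docs):
    for doc in docs:
        doc.page_content = doc.page_content.replace("\x00", "")
    return docs

# e.g. store.add_documents(strip_nul_bytes(loader.load()))
```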