@pranshu-raj-211
Collaborator

@pranshu-raj-211 pranshu-raj-211 commented Jul 20, 2025

This PR aims to improve the repo by:

  1. Standardizing packaging to improve adoption: pyproject.toml now uses the standard PEP 621 format, which adds uv support.
  2. Making Groq LLM calls asynchronous.
  3. Making the file utilities asynchronous (for the most part).
  4. Adding performance benchmarking.

Why was this needed?

  1. The project previously used requirements.txt and a Poetry-based pyproject.toml. In my view uv is much better than Poetry and will see higher adoption.
  2. Processing each file took too long. Most of the time went to network calls, file I/O and PDF parsing, which this PR works on.
  3. Benchmarking was added to verify that the changes really do improve performance.
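
To illustrate point 2 above, here is a minimal sketch of keeping a blocking parse step off the event loop while several files are processed concurrently. It uses only the standard library; `parse_pdf`, `process_file` and `process_all` are hypothetical names for illustration, not the functions in this repo, and the `time.sleep` call merely stands in for real parsing work.

```python
import asyncio
import time


def parse_pdf(path: str) -> str:
    """Hypothetical stand-in for a blocking, CPU-heavy parse step."""
    time.sleep(0.05)  # simulates blocking work
    return f"parsed:{path}"


async def process_file(path: str) -> str:
    # asyncio.to_thread runs the blocking function in a worker thread,
    # so the event loop stays free to service other tasks meanwhile.
    return await asyncio.to_thread(parse_pdf, path)


async def process_all(paths: list[str]) -> list[str]:
    # gather schedules every file concurrently and preserves input order.
    return await asyncio.gather(*(process_file(p) for p in paths))


if __name__ == "__main__":
    print(asyncio.run(process_all(["a.pdf", "b.pdf", "c.pdf"])))
```

Note that a worker thread mainly helps when the parser blocks on I/O or releases the GIL; for pure-Python CPU-bound parsing, a process pool may be the better fit.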

Project is now uv and pip compatible.
Run uv sync to get started with uv.
Run uv sync --extra dev to install dev dependencies as well.
Network calls are now async (httpx), so the requests vulnerability no longer needs handling.
File operations are now async (aiofiles); added the aiofiles dependency.
CPU-intensive operations now run in a separate thread.
Next steps: improve the perf script, use config files for test configuration, improve the structure of the performance checks, and include failure and success rates.

Signed-off-by: pranshu-raj-211 <pranshuraj65536@gmail.com>
@pranshu-raj-211 pranshu-raj-211 added the enhancement New feature or request label Jul 20, 2025
@pranshu-raj-211 pranshu-raj-211 requested a review from JS12540 July 23, 2025 20:03
@pranshu-raj-211
Collaborator Author

pranshu-raj-211 commented Jul 23, 2025

This needs automated testing to verify that outputs stay the same across versions, since I plan to make a lot more of the code asynchronous.

Suggested pytest approach: unit tests for file I/O and crucial (data-modifying) functions, plus integration and e2e tests to ensure the pipeline doesn't break.
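
As a rough sketch of what such a unit test could look like, the example below checks that a synchronous helper and its async counterpart return identical output for the same fixture file. Both helpers are hypothetical stand-ins, not this repo's code, and the async one uses a worker thread so the sketch stays stdlib-only (the PR itself uses aiofiles). The function follows pytest's `test_*` naming, so pytest would collect it as written.

```python
import asyncio
import tempfile
from pathlib import Path


def read_text_sync(path: Path) -> str:
    """Hypothetical 'old' synchronous helper."""
    return path.read_text(encoding="utf-8")


async def read_text_async(path: Path) -> str:
    """Hypothetical 'new' async helper (assumption: thread-based here)."""
    return await asyncio.to_thread(path.read_text, encoding="utf-8")


def test_async_matches_sync() -> None:
    # Write a known fixture, then require identical output from both paths.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.txt"
        sample.write_text("name: Jane Doe\nskills: Python\n", encoding="utf-8")
        expected = read_text_sync(sample)
        actual = asyncio.run(read_text_async(sample))
        assert actual == expected
```

The same pattern could extend to the data-modifying functions: run the old and new code paths on a fixed input and assert the results are equal.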

The perf benchmarking also needs a rework: improve it, and maybe run multiple trials of parsing the same inputs and take the average.
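
One possible shape for the multi-trial idea, as a sketch only: time the same callable several times and report summary statistics. `benchmark` and its `trials` parameter are hypothetical names, not part of the existing perf script.

```python
import statistics
import time
from typing import Callable


def benchmark(func: Callable[[], object], trials: int = 5) -> dict[str, float]:
    """Run func `trials` times and summarize wall-clock durations in seconds."""
    durations = []
    for _ in range(trials):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return {
        "mean_s": statistics.mean(durations),
        # stdev needs at least two samples, so fall back to 0.0 for one trial.
        "stdev_s": statistics.stdev(durations) if trials > 1 else 0.0,
        "min_s": min(durations),
    }


if __name__ == "__main__":
    # Placeholder workload; in practice this would parse the same input file.
    print(benchmark(lambda: sum(range(100_000)), trials=5))
```

Reporting the spread alongside the mean makes it easier to tell a real improvement from run-to-run noise.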

@pranshu-raj-211
Collaborator Author

This PR also fixes the problem with the vulnerability in the version of the requests package that we're using, and improves existing methods by switching to async HTTP calls (httpx async client, Groq async client).
