Optimize API rate limiting and reduce request latency #16

@dsmithnautel

Description

Summary

Reduce the number of API calls and improve response times for resume/JD processing.

Current Issues

  • Rate limiting causes long wait times (up to 60s+ per retry)
  • Multiple retries on rate limit errors consume API quota
  • Single operations can take several minutes when rate limited

Optimization Tasks

Reduce API Calls

  • Cache parsed JD results to avoid re-parsing same URLs
  • Cache resume extraction results by file hash
  • Batch multiple small operations into single API calls where possible
  • Consider if extraction can be split into smaller, cacheable chunks
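The first two items above can be sketched as a content-addressed cache: hashing the file bytes (rather than keying on the filename) means a renamed but identical resume still hits the cache. This is a minimal sketch; `extract_fn` stands in for whatever expensive LLM extraction call the project actually makes, and the cache directory name is a placeholder.

```python
import hashlib
import json
from pathlib import Path

def file_hash(path: Path) -> str:
    """Hash file contents so identical files share one cache entry."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def cached_extract(path: Path, extract_fn, cache_dir: Path = Path(".cache/extractions")) -> dict:
    """Return the cached extraction for this file's content, or compute and store it."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{file_hash(path)}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    result = extract_fn(path)  # the expensive API call happens only on a cache miss
    cache_file.write_text(json.dumps(result))
    return result
```

The same pattern works for parsed JD results, keyed on a hash of the normalized URL instead of file bytes.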

Smarter Rate Limit Handling

  • Implement request queuing instead of immediate retries
  • Add configurable delay between requests (not just on error)
  • Track quota usage locally to predict rate limits before hitting them
  • Implement circuit breaker pattern for sustained rate limiting
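A possible shape for the queuing and local quota tracking above is a client-side sliding-window limiter: it computes how long to wait *before* sending, so requests are delayed locally instead of being fired, 429'd, and retried. The window size, request cap, and minimum inter-request gap are the configurable knobs mentioned above; the concrete numbers here are placeholders, not known provider limits.

```python
import time
from collections import deque

class LocalRateLimiter:
    """Block before the provider returns 429, instead of retrying after it."""

    def __init__(self, max_requests: int, window_s: float, min_gap_s: float = 0.0):
        self.max_requests = max_requests  # local estimate of the provider's quota
        self.window_s = window_s
        self.min_gap_s = min_gap_s        # configurable delay between requests
        self.sent = deque()               # timestamps of recent requests

    def wait_time(self, now=None) -> float:
        """Seconds to wait before the next request is safe to send."""
        now = time.monotonic() if now is None else now
        while self.sent and now - self.sent[0] >= self.window_s:
            self.sent.popleft()           # drop requests outside the window
        delay = 0.0
        if self.sent:
            delay = max(delay, self.sent[-1] + self.min_gap_s - now)
        if len(self.sent) >= self.max_requests:
            delay = max(delay, self.sent[0] + self.window_s - now)
        return max(delay, 0.0)

    def acquire(self) -> None:
        delay = self.wait_time()
        if delay > 0:
            time.sleep(delay)
        self.sent.append(time.monotonic())
```

A circuit breaker would sit on top of this: after N consecutive 429s, stop sending entirely for a cooldown period rather than queuing.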

Reduce Latency

  • Optimize prompts to reduce token count
  • Use streaming responses where applicable
  • Add progress indicators for long operations
  • Consider parallel processing where operations are independent (e.g. resume extraction and JD parsing)
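For the last item, resume extraction and JD parsing don't depend on each other, so they can run concurrently; total latency becomes the max of the two calls rather than their sum. A minimal sketch, where `extract_resume` and `parse_jd` are stand-ins for the project's actual API-calling functions:

```python
from concurrent.futures import ThreadPoolExecutor

def process_pair(resume_path, jd_url, extract_resume, parse_jd):
    """Run two independent API calls concurrently.

    Threads are appropriate here because the work is I/O-bound
    (waiting on the API), not CPU-bound.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        resume_future = pool.submit(extract_resume, resume_path)
        jd_future = pool.submit(parse_jd, jd_url)
        return resume_future.result(), jd_future.result()
```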

Fallback Strategies

  • Graceful degradation when rate limited (use cached/partial results)
  • Option to skip LLM calls and use rule-based extraction as fallback
  • Queue failed requests for background retry
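The graceful-degradation path above might look like the sketch below: try the LLM, and on a rate-limit error return a rule-based result tagged as degraded so the caller (or UI) knows the quality tier. `RateLimitError` is a stand-in for whichever exception the actual API client raises on 429, and both extractor functions are hypothetical.

```python
class RateLimitError(Exception):
    """Stand-in for the API client's rate-limit (HTTP 429) exception."""

def extract_with_fallback(text, llm_extract, rule_based_extract):
    """Prefer the LLM result, but degrade to rules instead of failing."""
    try:
        return {"source": "llm", "data": llm_extract(text)}
    except RateLimitError:
        # Flag the result so callers can surface the degraded quality,
        # and optionally queue the request for background retry.
        return {"source": "rules", "data": rule_based_extract(text), "degraded": True}
```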

Monitoring

  • Track API call counts and latencies
  • Alert when approaching rate limits
  • Log token usage for cost tracking
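The three monitoring items could start as a small in-process stats object plus a timing wrapper; a metrics backend can replace it later without changing call sites. Field names and the token-count parameter are assumptions, since the source doesn't specify how usage is reported.

```python
import time
from dataclasses import dataclass, field

@dataclass
class ApiStats:
    """In-process counters for API call volume, latency, and token spend."""
    calls: int = 0
    total_latency_s: float = 0.0
    total_tokens: int = 0
    latencies: list = field(default_factory=list)

    def record(self, latency_s: float, tokens: int = 0) -> None:
        self.calls += 1
        self.total_latency_s += latency_s
        self.total_tokens += tokens
        self.latencies.append(latency_s)

    @property
    def avg_latency_s(self) -> float:
        return self.total_latency_s / self.calls if self.calls else 0.0

def timed_call(stats: ApiStats, fn, *args, **kwargs):
    """Wrap any API call so its latency is recorded automatically."""
    start = time.monotonic()
    result = fn(*args, **kwargs)
    stats.record(time.monotonic() - start)
    return result
```

Alerting on approaching limits then becomes a check against `stats.calls` within the limiter's window, and token totals feed cost tracking directly.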

Priority

High: directly impacts user experience.
