Optimize API rate limiting and reduce request latency #16

@dsmithnautel

Description

Summary

Reduce the number of API calls and improve response times for resume/JD processing.

Current Issues

  • Rate limiting causes long wait times (up to 60s+ per retry)
  • Multiple retries on rate limit errors consume API quota
  • Single operations can take several minutes when rate limited

Optimization Tasks

Reduce API Calls

  • Cache parsed JD results to avoid re-parsing same URLs
  • Cache resume extraction results by file hash
  • Batch multiple small operations into single API calls where possible
  • Consider if extraction can be split into smaller, cacheable chunks
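The first two items above can be sketched as a content-addressed cache: hashing the file bytes (rather than keying on the filename) means a renamed but identical resume still hits the cache. This is a minimal sketch; `extract_fn` stands in for whatever expensive LLM extraction call the project actually makes, and the cache directory name is a placeholder.

```python
import hashlib
import json
from pathlib import Path

def file_hash(path: Path) -> str:
    """Hash file contents so identical files share one cache entry."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def cached_extract(path: Path, extract_fn, cache_dir: Path = Path(".cache/extractions")) -> dict:
    """Return the cached extraction for this file's content, or compute and store it."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{file_hash(path)}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    result = extract_fn(path)  # the expensive API call happens only on a cache miss
    cache_file.write_text(json.dumps(result))
    return result
```

The same pattern works for parsed JD results, keyed on a hash of the normalized URL instead of file bytes.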

Smarter Rate Limit Handling

  • Implement request queuing instead of immediate retries
  • Add configurable delay between requests (not just on error)
  • Track quota usage locally to predict rate limits before hitting them
  • Implement circuit breaker pattern for sustained rate limiting
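A possible shape for the queuing and local quota tracking above is a client-side sliding-window limiter: it computes how long to wait *before* sending, so requests are delayed locally instead of being fired, 429'd, and retried. The window size, request cap, and minimum inter-request gap are the configurable knobs mentioned above; the concrete numbers here are placeholders, not known provider limits.

```python
import time
from collections import deque

class LocalRateLimiter:
    """Block before the provider returns 429, instead of retrying after it."""

    def __init__(self, max_requests: int, window_s: float, min_gap_s: float = 0.0):
        self.max_requests = max_requests  # local estimate of the provider's quota
        self.window_s = window_s
        self.min_gap_s = min_gap_s        # configurable delay between requests
        self.sent = deque()               # timestamps of recent requests

    def wait_time(self, now=None) -> float:
        """Seconds to wait before the next request is safe to send."""
        now = time.monotonic() if now is None else now
        while self.sent and now - self.sent[0] >= self.window_s:
            self.sent.popleft()           # drop requests outside the window
        delay = 0.0
        if self.sent:
            delay = max(delay, self.sent[-1] + self.min_gap_s - now)
        if len(self.sent) >= self.max_requests:
            delay = max(delay, self.sent[0] + self.window_s - now)
        return max(delay, 0.0)

    def acquire(self) -> None:
        delay = self.wait_time()
        if delay > 0:
            time.sleep(delay)
        self.sent.append(time.monotonic())
```

A circuit breaker would sit on top of this: after N consecutive 429s, stop sending entirely for a cooldown period rather than queuing.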

Reduce Latency

  • Optimize prompts to reduce token count
  • Use streaming responses where applicable
  • Add progress indicators for long operations
  • Consider parallel processing where operations are independent (e.g. resume extraction and JD parsing)
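For the last item, resume extraction and JD parsing don't depend on each other, so they can run concurrently; total latency becomes the max of the two calls rather than their sum. A minimal sketch, where `extract_resume` and `parse_jd` are stand-ins for the project's actual API-calling functions:

```python
from concurrent.futures import ThreadPoolExecutor

def process_pair(resume_path, jd_url, extract_resume, parse_jd):
    """Run two independent API calls concurrently.

    Threads are appropriate here because the work is I/O-bound
    (waiting on the API), not CPU-bound.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        resume_future = pool.submit(extract_resume, resume_path)
        jd_future = pool.submit(parse_jd, jd_url)
        return resume_future.result(), jd_future.result()
```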

Fallback Strategies

  • Graceful degradation when rate limited (use cached/partial results)
  • Option to skip LLM calls and use rule-based extraction as fallback
  • Queue failed requests for background retry
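The graceful-degradation path above might look like the sketch below: try the LLM, and on a rate-limit error return a rule-based result tagged as degraded so the caller (or UI) knows the quality tier. `RateLimitError` is a stand-in for whichever exception the actual API client raises on 429, and both extractor functions are hypothetical.

```python
class RateLimitError(Exception):
    """Stand-in for the API client's rate-limit (HTTP 429) exception."""

def extract_with_fallback(text, llm_extract, rule_based_extract):
    """Prefer the LLM result, but degrade to rules instead of failing."""
    try:
        return {"source": "llm", "data": llm_extract(text)}
    except RateLimitError:
        # Flag the result so callers can surface the degraded quality,
        # and optionally queue the request for background retry.
        return {"source": "rules", "data": rule_based_extract(text), "degraded": True}
```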

Monitoring

  • Track API call counts and latencies
  • Alert when approaching rate limits
  • Log token usage for cost tracking
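The three monitoring items could start as a small in-process stats object plus a timing wrapper; a metrics backend can replace it later without changing call sites. Field names and the token-count parameter are assumptions, since the source doesn't specify how usage is reported.

```python
import time
from dataclasses import dataclass, field

@dataclass
class ApiStats:
    """In-process counters for API call volume, latency, and token spend."""
    calls: int = 0
    total_latency_s: float = 0.0
    total_tokens: int = 0
    latencies: list = field(default_factory=list)

    def record(self, latency_s: float, tokens: int = 0) -> None:
        self.calls += 1
        self.total_latency_s += latency_s
        self.total_tokens += tokens
        self.latencies.append(latency_s)

    @property
    def avg_latency_s(self) -> float:
        return self.total_latency_s / self.calls if self.calls else 0.0

def timed_call(stats: ApiStats, fn, *args, **kwargs):
    """Wrap any API call so its latency is recorded automatically."""
    start = time.monotonic()
    result = fn(*args, **kwargs)
    stats.record(time.monotonic() - start)
    return result
```

Alerting on approaching limits then becomes a check against `stats.calls` within the limiter's window, and token totals feed cost tracking directly.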

Priority

High: directly impacts user experience.
