
Rate limiting #64

@bastianolea

Description

I've moved from running mall locally with ollama to using models provided by GitHub Copilot, but any request I make gets rate limited almost instantly, making it unusable.

Is there something I need to consider or configure? I'm only trying to summarize 24 rows, yet I hit an 8-second rate limit almost right away, followed by 50-second rate limits before it returns nothing.

options(.mall_chat = ellmer::chat_github(model = "mistral-ai/mistral-medium-2505"))

resumen <- datos |> 
  distinct(variable, tema) |> 
  mall::llm_summarize(tema, max_words = 7, pred_name = "tema_resumen")

It also seems the problem is caused by the requests being performed in parallel. Is there any way to change this?

Error in `mutate()`:
ℹ In argument: `tema_resumen = llm_vec_summarize(x = tema, max_words = max_words,
  additional_prompt = additional_prompt)`.
Caused by error in `req_perform_parallel()`:
! HTTP 429 Too Many Requests.
ℹ Rate limit of 24 per 60s exceeded for UserByMinute. Please wait 0 seconds before retrying.
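For now I've been able to work around it with batching. This is only a sketch, assuming the 24-requests-per-minute limit from the error message holds and that each row produces one request; `datos`, `variable`, and `tema` are from the example above, and the batch size of 20 is an arbitrary margin below the limit.

```r
library(dplyr)

# Workaround sketch: summarize in batches of at most 20 rows, pausing
# 60 seconds between batches so each batch stays under the reported
# 24-requests-per-minute limit.
datos_unicos <- datos |>
  distinct(variable, tema)

batches <- datos_unicos |>
  mutate(.batch = ceiling(row_number() / 20)) |>
  group_split(.batch)

resumen <- bind_rows(lapply(seq_along(batches), function(i) {
  if (i > 1) Sys.sleep(60)  # wait out the rate-limit window
  batches[[i]] |>
    mall::llm_summarize(tema, max_words = 7, pred_name = "tema_resumen")
})) |>
  select(-.batch)
```

This avoids the 429s but is obviously slow, so a built-in way to throttle or disable the parallel requests would still be much better.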

Metadata

Assignees

No one assigned

    Labels

    enhancement (new feature or request)
