ttok adds a 20 second delay when using llm-gpt4all offline compared to online, tested on mistral-7b-instruct-v0.
This is easily seen when looking at the CPU usage after asking the model a question as shown in this video:
llm-gpt4all.offline.and.no.CPU.usage.before.20.seconds.after.question.asked.to.mistral-7b-instruct-v0.webm
In the video, CPU usage spikes when the model is asked a question through a custom Datasette plugin (which also uses ttok to log the tokens used); after that there is no CPU activity until the 20 second mark. The delay does not occur when online.
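For reference, ttok reads text on stdin and prints a token count; the plugin's token logging is assumed to shell out to it roughly like this (a minimal sketch of the usage, not the actual plugin code; the prompt is arbitrary):

# ttok reads the prompt from stdin and prints the number of tokens
echo 'Fun fact about AI?' | ttok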
Code used:
Dockerfile:
FROM python:3.11
WORKDIR /code
COPY ./requirements.txt /code/requirements.txt
RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
# Download mistral-7b-instruct-v0
RUN llm -m mistral-7b-instruct-v0 "Fun fact about AI?" --no-log --no-stream
# Set default model
RUN llm models default mistral-7b-instruct-v0
# Fix no internet bug using https://github.com/simonw/llm-gpt4all/pull/18
COPY llm_gpt4all.py /usr/local/lib/python3.11/site-packages/

requirements.txt:
datasette
llm
llm-gpt4all
ttok
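With those two files in place, the behaviour can be reproduced by building the image once with network access and then running it without any (a sketch; the image tag is arbitrary):

docker build -t llm-gpt4all-offline .
# --network none reproduces the offline case where the 20 second delay appears
docker run --network none llm-gpt4all-offline llm "Fun fact about AI?" --no-stream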
ttok and llm-gpt4all are both at version 0.2.