Replies: 1 comment
Thanks. We have a few team members who specialize in smaller, open-source models. This is on the roadmap. Good resources!
I see that a similar discussion already exists here: #10
However, that thread is about using different models for different layers or layer components, whereas this suggestion is about replacing all uses of the OpenAI API with a local implementation, using my reference model or any model that is better.
So I feel it warrants a separate discussion.
When watching Dave's demo of the project, a big standout was his remark about timing out the API after only running the demo briefly, combined with the sheer number of inferences that will need to be generated.
I don't think this limitation is necessary, and depending on a third party is not ideal. The limiting factor should instead be the amount of compute available, and getting this to run on consumer hardware would be the best outcome.
As such, I suggest using the dolphin-2.1-mistral-7b model.
Specifically, a quantised version that can run with a maximum RAM requirement of only 7.63 GB and a download size of only 5.13 GB.
It can be run through the llama-cpp-python bindings, which keeps the project within its requirement of being Python-only.
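For concreteness, here is a minimal sketch of what that could look like, assuming a quantised GGUF file of dolphin-2.1-mistral-7b has already been downloaded. The file name and path are placeholders, and the context size and sampling settings are just examples, not recommendations:

```python
# Minimal sketch: run a quantised dolphin-2.1-mistral-7b GGUF file locally
# via llama-cpp-python. The model path is a placeholder for wherever the
# quantised file was downloaded to.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/dolphin-2.1-mistral-7b.Q5_K_M.gguf",  # placeholder path
    n_ctx=4096,           # context window
    n_gpu_layers=0,       # CPU-only; raise this if a GPU is available
    chat_format="chatml", # dolphin models use the ChatML prompt format
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarise what you can do in one sentence."},
    ],
    max_tokens=128,
    temperature=0.7,
)

print(response["choices"][0]["message"]["content"])
```

The call returns an OpenAI-style completion dict, which is part of why swapping it in for the existing API calls should be fairly mechanical.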
There are benefits to doing it this way:
Benefits of this model specifically:
Benefits of using the llama-cpp-python bindings:
This is just a suggestion, and this model will become outdated within the week.
But I think that this is truly the right way to go.
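As an aside on replacing the OpenAI API calls themselves: as far as I know, llama-cpp-python also ships an optional OpenAI-compatible server (installed via its server extra and started with something like `python -m llama_cpp.server --model <gguf file>`), so the existing request code might only need its base URL pointed at localhost. A rough sketch, assuming the server's default host and port:

```python
# Rough sketch: talk to a locally running llama-cpp-python server through its
# OpenAI-compatible /v1/chat/completions endpoint. The host and port below are
# the defaults I believe the server uses; adjust if it was started differently.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "Hello from a local model."},
        ],
        "max_tokens": 64,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```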