Replies: 1 comment
Thanks. We have a few team members who specialize in smaller, open-source models. This is on the roadmap. Good resources!
I see that a similar discussion already exists here: #10
However, that thread is about using different models for different layers or layer components, whereas this suggestion is about replacing all uses of the OpenAI API with a local implementation, using my reference model or any model that is better.
So I feel it warrants a separate discussion.
When watching Dave's demo of the project, a big standout was his remark about timing out the API after only running the demo briefly, combined with the sheer number of inferences that will need to be generated.
I don't think this limitation is necessary, and depending on a third party is not ideal. The limiting factor should instead be the amount of compute available, and getting this to run on consumer hardware would be the best outcome.
As such, I suggest using the dolphin-2.1-mistral-7b model.
Specifically, a quantised version that can run with a maximum RAM requirement of only 7.63 GB and a download size of only 5.13 GB.
It can be run through the llama-cpp-python bindings, which keeps the project within its requirement of being Python-only.
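For concreteness, here is a minimal sketch of what that could look like, assuming a quantised GGUF file of dolphin-2.1-mistral-7b has already been downloaded. The file name and path are placeholders, and the context size and sampling settings are just examples, not recommendations:

```python
# Minimal sketch: run a quantised dolphin-2.1-mistral-7b GGUF file locally
# via llama-cpp-python. The model path is a placeholder for wherever the
# quantised file was downloaded to.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/dolphin-2.1-mistral-7b.Q5_K_M.gguf",  # placeholder path
    n_ctx=4096,           # context window
    n_gpu_layers=0,       # CPU-only; raise this if a GPU is available
    chat_format="chatml", # dolphin models use the ChatML prompt format
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarise what you can do in one sentence."},
    ],
    max_tokens=128,
    temperature=0.7,
)

print(response["choices"][0]["message"]["content"])
```

The call returns an OpenAI-style completion dict, which is part of why swapping it in for the existing API calls should be fairly mechanical.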
There are benefits to doing it this way:
Benefits of this model specifically:
Benefits of using the llama-cpp-python bindings:
This is just a suggestion, and this model will become outdated within the week.
But I think that this is truly the right way to go.
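As an aside on replacing the OpenAI API calls themselves: as far as I know, llama-cpp-python also ships an optional OpenAI-compatible server (installed via its server extra and started with something like `python -m llama_cpp.server --model <gguf file>`), so the existing request code might only need its base URL pointed at localhost. A rough sketch, assuming the server's default host and port:

```python
# Rough sketch: talk to a locally running llama-cpp-python server through its
# OpenAI-compatible /v1/chat/completions endpoint. The host and port below are
# the defaults I believe the server uses; adjust if it was started differently.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "Hello from a local model."},
        ],
        "max_tokens": 64,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```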