This repository was archived by the owner on Sep 12, 2024. It is now read-only.

Can't run example on llama-2-13b-chat q4_0 #116

@gioragutt

Description


I apologize in advance if I omit any useful details; I'm just a simple dev with no knowledge of or background in DS, so I'm in trial-and-error land.

I followed the instructions from llama.cpp on the llama-2-13b-chat model, and I now have the q4_0 file: llama-2-13b-chat/ggml-model-q4_0.gguf.

I used the example code from this repo, changed (of course) to point to that model file, but loading fails:

The code:

import { LLM } from 'llama-node';
import { LLamaCpp } from 'llama-node/dist/llm/llama-cpp.js';
import path from 'path';

const model = path.resolve(
	process.cwd(),
	'../llama.cpp/models/llama-2-13b-chat/ggml-model-q4_0.gguf',
);

console.log(model);

const llama = new LLM(LLamaCpp);
/** @type {import('llama-node/dist/llm/llama-cpp').LoadConfig} */
const config = {
	modelPath: model,
	enableLogging: true,
	nCtx: 1024,
	seed: 0,
	f16Kv: false,
	logitsAll: false,
	vocabOnly: false,
	useMlock: false,
	embedding: false,
	useMmap: true,
	nGpuLayers: 128,
};

const template = `How are you?`;
const prompt = `A chat between a user and an assistant.
USER: ${template}
ASSISTANT:`;

const params = {
	nThreads: 4,
	nTokPredict: 2048,
	topK: 40,
	topP: 0.1,
	temp: 0.2,
	repeatPenalty: 1,
	prompt,
};

const run = async () => {
	await llama.load(config);

	await llama.createCompletion(params, response => {
		process.stdout.write(response.token);
	});
};

run();

The error:

Debugger listening on ws://127.0.0.1:59899/c72280cb-a098-4c15-859f-54025e513896
For help, see: https://nodejs.org/en/docs/inspector
Debugger attached.
/Users/gioraguttsait/Git/personal-repos/llm/llama.cpp/models/llama-2-13b-chat/ggml-model-q4_0.gguf
llama.cpp: loading model from /Users/gioraguttsait/Git/personal-repos/llm/llama.cpp/models/llama-2-13b-chat/ggml-model-q4_0.gguf
error loading model: unknown (magic, version) combination: 46554747, 00000001; is this really a GGML file?
llama_init_from_file: failed to load model
Waiting for the debugger to disconnect...
node:internal/process/promises:288
            triggerUncaughtException(err, true /* fromPromise */);
            ^

[Error: Failed to initialize LLama context from file: /Users/gioraguttsait/Git/personal-repos/llm/llama.cpp/models/llama-2-13b-chat/ggml-model-q4_0.gguf] {
  code: 'GenericFailure'
}

Node.js v18.17.1

I can see that the error complains about unexpected constants in the file (error loading model: unknown (magic, version) combination: 46554747, 00000001; is this really a GGML file?), and I notice that mine is a .gguf file, not a ggml one.
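For what it's worth, the first number in that message decodes cleanly: 46554747 is just the ASCII bytes of "GGUF" read as a little-endian 32-bit integer (and 00000001 would then be a version field). A quick sketch to verify the decoding:

```javascript
// Decode the "magic" from the error message: 0x46554747 written as a
// little-endian 32-bit integer yields the bytes 'G' 'G' 'U' 'F'.
const magic = Buffer.alloc(4);
magic.writeUInt32LE(0x46554747);
console.log(magic.toString('ascii')); // → GGUF
```

So the loader is reading a GGUF header it simply doesn't recognize.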

From a quick Google search, I got to this post on r/LocalLLaMA which states that GGUF is sort of a successor to GGML.
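To tell the two formats apart without guessing from the extension, a minimal sketch (not from the issue; `modelMagic` is a hypothetical helper) can peek at the first four bytes of the file. GGUF files start with the raw bytes "GGUF"; the older GGML-family magics (e.g. "ggml", "ggjt") are stored byte-reversed on disk, so their raw bytes read "lmgg", "tjgg", and so on:

```javascript
// Read the first 4 bytes of a model file and return them as ASCII,
// so GGUF vs. GGML-family containers can be distinguished by magic.
import fs from 'fs';

function modelMagic(filePath) {
	const buf = Buffer.alloc(4);
	const fd = fs.openSync(filePath, 'r');
	fs.readSync(fd, buf, 0, 4, 0);
	fs.closeSync(fd);
	return buf.toString('ascii');
}
```

Running this against the file from the log should print "GGUF", which would mean that if llama-node bundles a pre-GGUF llama.cpp, it simply cannot read this file.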

I have literally zero understanding of what I'm doing, and would appreciate it if someone could point me in the right direction for dealing with this. Even just pointing out keywords I might have missed, which could have led me to a better answer in the first place, would help 😅

Thanks in advance for your time!
