
Commit 320779a

Add libpsijent fallback
1 parent b5eeb1e commit 320779a

10 files changed: +151 −73 lines changed


.gitmodules

Lines changed: 3 additions & 0 deletions
@@ -1,3 +1,6 @@
 [submodule "libpsirngclient"]
 	path = libpsirngclient
 	url = https://github.com/nullspook/libpsirngclient.git
+[submodule "libpsijent"]
+	path = libpsijent
+	url = https://github.com/nullspook/libpsijent.git

CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -256,6 +256,7 @@ endif()
 #
 
 add_subdirectory(libpsirngclient)
+add_subdirectory(libpsijent)
 
 #
 # install

README.md

Lines changed: 10 additions & 4 deletions
@@ -31,6 +31,12 @@ cd build/bin
 
 **Note:** quantum-llama.cpp currently does not support `-DLLAMA_CURL=ON`.
 
+#### Fallback RNG option
+
+quantum-llama.cpp includes the [libpsijent](https://github.com/nullspook/libpsijent.git)
+hardware timing jitter RNG as a fallback when a psirng server is not available.
+Enable it by setting `PSIJENT_FALLBACK=ON` before running the `llama-*` programs.
+
 ---
 
 # llama.cpp
@@ -68,7 +74,7 @@ LLM inference in C/C++
 
 Getting started with llama.cpp is straightforward. Here are several ways to install it on your machine:
 
-- Install `llama.cpp` using [brew, nix or winget](docs/install.md)
+- Install `llama.cpp` using [brew, nix, or winget](docs/install.md)
 - Run with Docker - see our [Docker documentation](docs/docker.md)
 - Download pre-built binaries from the [releases page](https://github.com/ggml-org/llama.cpp/releases)
 - Build from source by cloning this repository - check out [our build guide](docs/build.md)
@@ -94,7 +100,7 @@ The main goal of `llama.cpp` is to enable LLM inference with minimal setup and s
 range of hardware - locally and in the cloud.
 
 - Plain C/C++ implementation without any dependencies
-- Apple silicon is a first-class citizen - optimized via ARM NEON, Accelerate and Metal frameworks
+- Apple silicon is a first-class citizen - optimized via ARM NEON, Accelerate, and Metal frameworks
 - AVX, AVX2, AVX512 and AMX support for x86 architectures
 - RVV, ZVFH, ZFH, ZICBOP and ZIHINTPAUSE support for RISC-V architectures
 - 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization for faster inference and reduced memory use
@@ -341,7 +347,7 @@ After downloading a model, use the CLI tools to run it locally - see below.
 
 `llama.cpp` requires the model to be stored in the [GGUF](https://github.com/ggml-org/ggml/blob/master/docs/gguf.md) file format. Models in other data formats can be converted to GGUF using the `convert_*.py` Python scripts in this repo.
 
-The Hugging Face platform provides a variety of online tools for converting, quantizing and hosting models with `llama.cpp`:
+The Hugging Face platform provides a variety of online tools for converting, quantizing, and hosting models with `llama.cpp`:
 
 - Use the [GGUF-my-repo space](https://huggingface.co/spaces/ggml-org/gguf-my-repo) to convert to GGUF format and quantize model weights to smaller sizes
 - Use the [GGUF-my-LoRA space](https://huggingface.co/spaces/ggml-org/gguf-my-lora) to convert LoRA adapters to GGUF format (more info: https://github.com/ggml-org/llama.cpp/discussions/10123)
@@ -539,7 +545,7 @@ To learn more about model quantization, [read this documentation](tools/quantize
 - Contributors can open PRs
 - Collaborators will be invited based on contributions
 - Maintainers can push to branches in the `llama.cpp` repo and merge PRs into the `master` branch
-- Any help with managing issues, PRs and projects is very appreciated!
+- Any help with managing issues, PRs, and projects is very appreciated!
 - See [good first issues](https://github.com/ggml-org/llama.cpp/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) for tasks suitable for first contributions
 - Read the [CONTRIBUTING.md](CONTRIBUTING.md) for more information
 - Make sure to read this: [Inference at the edge](https://github.com/ggml-org/llama.cpp/discussions/205)

libpsijent

Submodule libpsijent added at 16ffd40

src/CMakeLists.txt

Lines changed: 4 additions & 4 deletions
@@ -139,8 +139,8 @@ add_library(llama
     models/xverse.cpp
     models/mistral3.cpp
     models/graph-context-mamba.cpp
-    psirngclient-manager.cpp
-    psirngclient-manager.h
+    psirng-wrapper.cpp
+    psirng-wrapper.h
     )
 
 set_target_properties(llama PROPERTIES
@@ -150,10 +150,10 @@ set_target_properties(llama PROPERTIES
     )
 
 target_include_directories(llama PRIVATE .)
-target_include_directories(llama PUBLIC ../include ../libpsirngclient/src)
+target_include_directories(llama PUBLIC ../include ../libpsirngclient/src ../libpsijent/src)
 target_compile_features (llama PRIVATE cxx_std_17) # don't bump
 
-target_link_libraries(llama PUBLIC ggml psirngclient)
+target_link_libraries(llama PUBLIC ggml psirngclient psijent)
 
 if (BUILD_SHARED_LIBS)
     set_target_properties(llama PROPERTIES POSITION_INDEPENDENT_CODE ON)

src/llama-sampling.cpp

Lines changed: 4 additions & 17 deletions
@@ -3,7 +3,7 @@
 #include "llama-impl.h"
 #include "llama-vocab.h"
 #include "llama-grammar.h"
-#include "psirngclient-manager.h"
+#include "psirng-wrapper.h"
 
 #include "ggml-cpp.h"
 
@@ -216,11 +216,7 @@ static void llama_token_data_array_partial_sort_inplace(llama_token_data_array *
 }
 
 static int llama_sample_dist(llama_token_data_array * cur_p, std::mt19937 & rng) {
-    double chance;
-    int rand_result = psirngclient_randuniform(psirngclient_manager::get_psirngclient(), &chance, 1, 0.0, 1.0);
-    if (rand_result != PSIRNGCLIENT_RESULT_OK) {
-        GGML_ABORT("%s: psirngclient_randuniform error: %d", __func__, rand_result);
-    }
+    const double chance = psirng_wrapper::uniform01();
 
     double cumulative = 0.0;
     for (size_t i = 0; i < cur_p->size; ++i) {
@@ -1060,11 +1056,7 @@ static void llama_sampler_dist_apply(struct llama_sampler * smpl, llama_token_da
     // sample from the obtained probabilities and normalize the probs in a single pass
     // this is ~3x faster on Mac with full gpt-oss vocab than the version below
     //
-    double rnd;
-    int rand_result = psirngclient_randuniform(psirngclient_manager::get_psirngclient(), &rnd, 1, 0.0, 1.0);
-    if (rand_result != PSIRNGCLIENT_RESULT_OK) {
-        GGML_ABORT("%s: psirngclient_randuniform error: %d", __func__, rand_result);
-    }
+    const double rnd = psirng_wrapper::uniform01();
     double sum_run = 0.0f;
     const double sum_tgt = sum_cum*rnd;
 
@@ -2140,12 +2132,7 @@ static void llama_sample_xtc_apply(struct llama_sampler * smpl, llama_token_data
         return;
     }
 
-    double chance;
-    int rand_result = psirngclient_randuniform(psirngclient_manager::get_psirngclient(), &chance, 1, 0.0, 1.0);
-    if (rand_result != PSIRNGCLIENT_RESULT_OK) {
-        GGML_ABORT("%s: psirngclient_randuniform error: %d", __func__, rand_result);
-    }
-    if (chance > ctx->probability) {
+    if (const double chance = psirng_wrapper::uniform01(); chance > ctx->probability) {
         return;
     }
 

src/psirng-wrapper.cpp

Lines changed: 111 additions & 0 deletions
#include "psirng-wrapper.h"

#include <algorithm>
#include <cctype>
#include <cstdlib>
#include <stdexcept>
#include <string>

psirng_wrapper & psirng_wrapper::instance() {
    static psirng_wrapper instance;
    return instance;
}

psirng_wrapper::psirng_wrapper() {
    const char * psirng_host      = std::getenv("PSIRNG_HOST");
    const char * psirng_grpc_port = std::getenv("PSIRNG_GRPC_PORT");
    const char * psirng_cert_path = std::getenv("PSIRNG_CERT_PATH");

    // PSIJENT_FALLBACK accepts "yes", "on", "true" or "1", case-insensitively
    bool bool_psijent_fallback = false;
    if (const char * psijent_fallback = std::getenv("PSIJENT_FALLBACK")) {
        std::string str_psijent_fallback(psijent_fallback);
        std::transform(
            str_psijent_fallback.begin(),
            str_psijent_fallback.end(),
            str_psijent_fallback.begin(),
            [](const unsigned char c) {
                return static_cast<char>(std::tolower(c));
            }
        );
        bool_psijent_fallback = str_psijent_fallback == "yes"  ||
                                str_psijent_fallback == "on"   ||
                                str_psijent_fallback == "true" ||
                                str_psijent_fallback == "1";
    }

    int  result;
    bool should_init_psijent = false;

    if (psirng_host && psirng_grpc_port && psirng_cert_path) {
        result = psirngclient_init(&psirngclient_ptr, psirng_host, std::atoi(psirng_grpc_port), psirng_cert_path);
        if (result != PSIRNGCLIENT_RESULT_OK) {
            if (bool_psijent_fallback) {
                should_init_psijent = true;
            } else {
                throw std::runtime_error("failed to initialize psirng client: " + std::to_string(result));
            }
        } else if (!psirngclient_ishealthy(psirngclient_ptr)) {
            // drop the unhealthy client and reset the pointer so the
            // destructor and uniform01() never touch a freed handle
            psirngclient_free(psirngclient_ptr);
            psirngclient_ptr = nullptr;
            if (bool_psijent_fallback) {
                should_init_psijent = true;
            } else {
                throw std::runtime_error("psirng is not healthy");
            }
        }
    } else if (bool_psijent_fallback) {
        should_init_psijent = true;
    } else {
        throw std::runtime_error("psirng is not configured");
    }

    if (should_init_psijent) {
        result = psijent_init(&psijent_ptr);
        if (result != PSIJENT_RESULT_OK) {
            throw std::runtime_error("failed to initialize psijent");
        }

        result = psijent_start(psijent_ptr);
        if (result != PSIJENT_RESULT_OK) {
            psijent_free(psijent_ptr);
            psijent_ptr = nullptr;
            throw std::runtime_error("failed to start psijent");
        }

        if (const char * mantissa_length = std::getenv("PSIJENT_MANTISSA_LENGTH")) {
            psijent_mantissa_length = std::atoi(mantissa_length);
        }
    }
}

psirng_wrapper::~psirng_wrapper() {
    if (psirngclient_ptr) {
        psirngclient_free(psirngclient_ptr);
    }

    if (psijent_ptr) {
        psijent_free(psijent_ptr);
    }
}

double psirng_wrapper::uniform01() {
    const psirng_wrapper & instance = psirng_wrapper::instance();

    int    result;
    double value = 0.0;

    if (instance.psijent_ptr) {
        result = psijent_randuniform(instance.psijent_ptr, &value, 1, instance.psijent_mantissa_length);
        if (result != PSIJENT_RESULT_OK) {
            throw std::runtime_error("psijent_randuniform failed: " + std::to_string(result));
        }
    } else {
        result = psirngclient_randuniform(instance.psirngclient_ptr, &value, 1, 0.0, 1.0);
        if (result != PSIRNGCLIENT_RESULT_OK) {
            throw std::runtime_error("psirngclient_randuniform failed: " + std::to_string(result));
        }
    }

    return value;
}

src/psirng-wrapper.h

Lines changed: 17 additions & 0 deletions
#pragma once

#include "psirngclient.h"
#include "psijent.h"

class psirng_wrapper {
public:
    ~psirng_wrapper();
    static double uniform01();

private:
    psirng_wrapper();
    static psirng_wrapper & instance();

    psirngclient * psirngclient_ptr = nullptr;
    psijent      * psijent_ptr      = nullptr;
    int            psijent_mantissa_length = 52;
};

src/psirngclient-manager.cpp

Lines changed: 0 additions & 35 deletions
This file was deleted.

src/psirngclient-manager.h

Lines changed: 0 additions & 13 deletions
This file was deleted.
