A C++ static library for Ed25519 elliptic curve cryptography primitives, based on the SUPERCOP ref10 reference implementation.
Ed25519 is a high-speed, high-security elliptic curve used for digital signatures and key exchange. It operates on the twisted Edwards curve -x² + y² = 1 + d·x²·y² over the prime field GF(2²⁵⁵ - 19) (hence the name). This library provides the low-level building blocks — field arithmetic, curve point operations, scalar math, X25519 key exchange, and ristretto255 — so you can build protocols on top without worrying about the math underneath.
This is not a signature library. There's no sign() or verify() function here. Instead, you get the primitives that those operations are built from: constant-time scalar multiplication, multi-scalar multiplication, Diffie-Hellman key exchange, hash-to-curve, subgroup checking, and so on. If you need a turnkey Ed25519 signature scheme, look at libsodium or OpenSSL. If you need to build a protocol — Schnorr signatures, VRFs, Bulletproofs, threshold schemes — this library gives you the building blocks with full control over the math.
- Field arithmetic over GF(2²⁵⁵ - 19) — add, subtract, multiply, square, invert, and more
- Curve point operations — projective, extended, completed, and precomputed representations for efficient computation
- Scalar arithmetic modulo the group order l (~2²⁵²)
- Constant-time scalar multiplication — both fixed-base and variable-base, with no secret-dependent branches or memory access patterns
- Variable-time double scalar multiplication for signature verification
- Multi-scalar multiplication — compute s₁·P₁ + s₂·P₂ + ... + sₙ·Pₙ in a single pass using Straus (n≤32) and Pippenger (n>32) algorithms, with logarithmic speedup over serial
- Batch scalar multiplication — process N independent operations in parallel via AVX2 (4-way) and AVX-512 IFMA (8-way) horizontal SIMD
- X25519 Diffie-Hellman (RFC 7748) — constant-time Montgomery ladder on Curve25519 for key agreement, with automatic scalar clamping and secure erasure of intermediates
- Cofactor handling —
ge_mul8clears the cofactor,sc_clampapplies RFC 8032 bit clamping - Hash-to-curve — Elligator-like mapping from arbitrary bytes to curve points (
ge_fromfe_frombytes_vartime) - Subgroup checking — verify points lie in the prime-order subgroup (
ge_check_subgroup_precomp_vartime,ge_check_subgroup_precomp_negate_vartime) - Curve form conversion —
ge_p3_to_wei25519extracts the Wei25519 (short-Weierstrass) X-coordinate from an Ed25519 point, bridging Edwards and Weierstrass representations
- Ristretto255 (RFC 9496) — prime-order group abstraction over Ed25519, with encode, decode, equality, and hash-to-group operations
- SIMD acceleration (x86_64) — runtime-dispatched AVX2 and AVX-512 IFMA backends for scalar multiplication, verification, and multi-scalar multiplication, with automatic CPU feature detection via CPUID
- Secure memory erasure —
ed25519_secure_erasezeros secret data using platform-specific mechanisms the compiler can't optimize away - Cross-platform — MSVC, GCC, Clang, AppleClang, MinGW
Requires CMake 3.10+ and a C++17 compiler.
# Configure and build
cmake -S . -B build -DBUILD_TESTS=ON
cmake --build build --config Release -j
# Run tests
./build/ed25519-tests # Linux / macOS
./build/Release/ed25519-tests # Windows (MSVC)| Option | Default | Description |
|---|---|---|
BUILD_TESTS |
OFF |
Build the unit tests (ed25519-tests) |
BUILD_BENCHMARKS |
OFF |
Build the benchmark tool (ed25519-benchmark) |
FORCE_PORTABLE |
OFF |
Force the 32-bit portable implementation on 64-bit platforms (for testing/comparison) |
ENABLE_LTO |
OFF |
Enable link-time optimization |
ENABLE_AVX2 |
ON* |
Enable AVX2 SIMD backend with runtime dispatch |
ENABLE_AVX512 |
ON* |
Enable AVX-512 IFMA SIMD backend with runtime dispatch |
ENABLE_SPECTRE_MITIGATIONS |
OFF |
Enable MSVC /Qspectre Spectre mitigations (opt-in for hardened builds) |
BUILD_TOOLS |
OFF |
Build the test vector generator (ed25519-gen-testvectors) |
CMAKE_BUILD_TYPE |
Release |
Debug, Release, or RelWithDebInfo |
* On x86_64 only. Both default to OFF on other architectures or when FORCE_PORTABLE is set. Set either to OFF to benchmark individual code paths in isolation.
Include the master header to access all operations:
#include "ed25519.h"Or include individual headers for specific operations:
#include "fe_mul.h"
#include "ge_scalarmult_base_ct.h"
#include "sc_reduce.h"Link against the ed25519 static library target in your CMake project:
add_subdirectory(ed25519)
target_link_libraries(your_target ed25519)Schnorr-style signature using the library primitives:
#include "ed25519.h"
#include <cstring>
// Compute public key: A = clamp(seed) · B
void keypair(const unsigned char seed[32], unsigned char public_key[32]) {
unsigned char scalar[32];
memcpy(scalar, seed, 32);
sc_clamp(scalar);
ge_p3 A;
ge_scalarmult_base_ct(&A, scalar);
ge_p3_tobytes(public_key, &A);
ed25519_secure_erase(scalar, 32);
}
// Verify: check that [s]B - [h]A == R
bool verify(const ge_p3 *A, const unsigned char R[32],
const unsigned char s[32], const unsigned char h[32]) {
ge_p2 check;
ge_double_scalarmult_base_negate_vartime(&check, h, A, s);
unsigned char result[32];
ge_tobytes(result, &check);
return memcmp(result, R, 32) == 0;
}X25519 key exchange:
#include "x25519.h"
#include "ed25519_secure_erase.h"
// Alice generates a keypair
unsigned char alice_priv[32], alice_pub[32];
// ... fill alice_priv with 32 random bytes ...
x25519_base(alice_pub, alice_priv); // public key = scalar * basepoint(9)
// Bob does the same
unsigned char bob_priv[32], bob_pub[32];
// ... fill bob_priv with 32 random bytes ...
x25519_base(bob_pub, bob_priv);
// Both compute the same shared secret
unsigned char shared[32];
x25519(shared, alice_priv, bob_pub); // Alice's side
// x25519(shared, bob_priv, alice_pub); // Bob's side — same result
ed25519_secure_erase(alice_priv, 32);
ed25519_secure_erase(shared, 32);SIMD dispatch (x86_64) — initialize before any crypto operations:
// Fast heuristic — picks IFMA > AVX2 > baseline
ed25519_init();
// Or: benchmark all backends, pick fastest per-function (~1-2 seconds)
ed25519_init(true);Both modes are thread-safe and only execute once; subsequent calls are no-ops. On non-x86_64 platforms, ed25519_init() is a no-op.
The library is organized in three layers, matching the math of elliptic curve cryptography:
- Field elements (
fe_*) — Integers modulo p = 2²⁵⁵ - 19. These are the coordinates of curve points. - Group elements (
ge_*) — Points on the Ed25519 curve. Addition, doubling, scalar multiplication, encoding/decoding. - Scalars (
sc_*) — 256-bit integers modulo the group order l. Used as private keys and in signature equations.
The library uses two levels of dispatch to get the best performance on each platform without sacrificing portability.
Compile-time dispatch via ed25519_platform.h selects the field element representation:
- 64-bit (x86_64, ARM64): Field elements are
uint64_t[5]in radix-2⁵¹. Multiplication uses 128-bit products (__int128on GCC/Clang,_umul128on MSVC). - Everything else: Falls back to the portable implementation with
int32_t[10]in alternating 26/25-bit limbs.
Both representations are 40 bytes, so the fe type has the same layout regardless of backend. The FORCE_PORTABLE CMake option forces the 32-bit path on 64-bit platforms for testing.
Runtime dispatch (x86_64 only) selects SIMD-accelerated implementations for ten hot-path functions:
ge_scalarmult_ct— variable-base, constant-timege_scalarmult_base_ct— fixed-base, constant-timege_double_scalarmult_negate_vartime— verificationge_double_scalarmult_base_negate_vartime— verificationge_scalarmult_ct_batch— batch variable-base, constant-timege_double_scalarmult_base_negate_vartime_batch— batch verificationge_double_scalarmult_negate_vartime_batch_ss— batch DSM with shared scalars (returnsge_p2)ge_double_scalarmult_negate_vartime_batch_ss_p3— batch DSM with shared scalars (returnsge_p3, avoids expensive p2-to-p3 conversion)ge_multiscalar_mul_vartime— multi-scalar multiplicationge_multiscalar_mul_base_vartime— multi-scalar multiplication with base point
CPU features (AVX2, AVX-512F, AVX-512 IFMA) are detected via CPUID+XGETBV at startup. Call ed25519_init() for fast heuristic selection, or ed25519_init(true) to benchmark all available implementations and pick the fastest per-function.
Two SIMD backends accelerate the constant-time scalar multiplication and verification operations.
AVX2 uses _mm256_blendv_epi8 for branchless constant-time table selection (both MSVC and GCC). On MSVC, the entire scalar multiplication loop runs in radix-2²⁵·⁵ (fe10) representation to avoid uint128 struct emulation overhead, converting to/from radix-2⁵¹ only at entry and exit. On GCC/Clang, the standard radix-2⁵¹ chain ops are used (since __int128 is already efficient) with AVX2 cmov for table lookups. Batch operations use 4-way horizontal parallelism via fe10x4 (10 x __m256i), processing 4 independent operations per batch iteration.
AVX-512 IFMA uses vpmadd52lo/vpmadd52hi for field element multiplication, replacing the multi-instruction schoolbook multiply with hardware 52-bit fused multiply-accumulate. This backend includes a lazy normalization optimization: most IFMA operations skip input carry-propagation when bit-width bounds are known safe, with fe_normalize_weak inserted only at the few points where limbs could exceed 52 bits. Batch operations use 8-way horizontal parallelism via fe51x8 (5 x __m512i), processing 8 independent operations per batch iteration. IFMA dispatch is enabled for all compilers when AVX-512 IFMA is detected.
Performance-critical group element operations use a chain macro pattern to control inlining of field element mul/sq within larger operations:
- GCC/Clang: Field element multiply and square are force-inlined directly into group element bodies for maximum optimization.
- MSVC: Field element ops stay as extern calls because MSVC's optimizer regresses when inlining large bodies that involve its uint128 emulation structs.
This split is handled automatically by the fe51_chain.h, fe25_chain.h, fe10_avx2_chain.h, and fe_ifma_chain.h headers.
Arithmetic on elements of GF(2²⁵⁵ - 19). These are the coordinates of points on the curve — every point operation ultimately boils down to field element math.
- Arithmetic:
fe_add,fe_sub,fe_mul,fe_sq,fe_sq2,fe_neg,fe_mul121666 - Higher-order:
fe_invert,fe_pow22523,fe_divpowm1 - Serialization:
fe_frombytes,fe_tobytes - Utilities:
fe_copy,fe_cmov,fe_0,fe_1,fe_isnegative,fe_isnonzero
fe_mul121666 multiplies by the Montgomery curve constant (A+2)/4 = 121666, used internally by X25519.
Points on the Ed25519 curve. Multiple internal representations (projective, extended, completed, precomputed) are used for efficient computation — conversions between them are provided so you can chain operations.
Point types: ge_p2 (projective), ge_p3 (extended), ge_p1p1 (completed), ge_precomp, ge_cached
Arithmetic: ge_add, ge_sub, ge_madd, ge_msub, ge_p2_dbl, ge_p3_dbl, ge_mul8
Conversions: ge_p1p1_to_p2, ge_p1p1_to_p3, ge_p3_to_p2, ge_p3_to_cached, ge_p2_to_p3
Scalar multiplication:
ge_scalarmult_base_ct— fixed-base, constant-timege_scalarmult_ct— variable-base, constant-time
Batch operations:
ge_scalarmult_ct_batch— N independent variable-base scalarmultsge_double_scalarmult_base_negate_vartime_batch— N independent verification scalarmultsge_double_scalarmult_negate_vartime_batch— N independent general DSMge_double_scalarmult_negate_vartime_batch_ss— N DSM with shared scalars, returnsge_p2ge_double_scalarmult_negate_vartime_batch_ss_p3— same but returnsge_p3directly, avoiding expensive p2-to-p3 inversion
Multi-scalar multiplication:
ge_multiscalar_mul_vartime— Q = Σ sᵢ·Pᵢge_multiscalar_mul_base_vartime— Q = s₀·B + Σ sᵢ·Pᵢ
Verification: ge_double_scalarmult_negate_vartime, ge_double_scalarmult_base_negate_vartime
Precomputation: ge_dsm_precomp — builds a precomputation table for a point, used by the double scalar multiplication functions
Serialization: ge_tobytes, ge_p3_tobytes, ge_frombytes_vartime
Curve form conversion: ge_p3_to_wei25519 — extracts the Wei25519 short-Weierstrass X-coordinate from a ge_p3 point, using the birational map X_wei = (Z+Y)/(Z-Y) + A/3 where A=486662
Hash-to-curve: ge_fromfe_frombytes_vartime
Subgroup checking: ge_check_subgroup_precomp_vartime, ge_check_subgroup_precomp_negate_vartime — verify that l·A = 0 (identity) using precomputed DSM tables
Operations on 32-byte scalars modulo the group order l. These are the secret exponents and signature components.
- Arithmetic:
sc_add,sc_sub,sc_mul,sc_muladd,sc_mulsub - Reduction:
sc_reduce(s, len)— modular reduction mod l (supports 32-byte or 64-byte input) - Clamping:
sc_clamp— RFC 8032 bit clamping (clears bits 0-2 and 255, sets bit 254) - Validation:
sc_check_reduced(checks s < l),sc_check_clamped(checks RFC 8032 clamping),sc_isnonzero,sc_0
Prime-order group abstraction over the Ed25519 curve (RFC 9496). Ristretto maps equivalence classes of curve points to a unique canonical 32-byte encoding, eliminating cofactor-related pitfalls. All operations work with the same ge_p3 point type used by the rest of the library.
- Serialization:
ristretto255_decode(32 bytes toge_p3),ristretto255_encode(ge_p3to 32 bytes) - Comparison:
ristretto255_equals— constant-time equivalence check without encoding - Hash-to-group:
ristretto255_from_uniform_bytes— maps 64 uniform bytes (e.g. SHA-512 output) to a ristretto255 point via Elligator 2
Key exchange on the Montgomery form of Curve25519 (RFC 7748). Uses a constant-time Montgomery ladder with automatic scalar clamping.
x25519(shared_secret, scalar, point)— Compute the shared secret:shared_secret = clamp(scalar) * pointx25519_base(public_key, scalar)— Compute the public key:public_key = clamp(scalar) * basepoint(9)
Platform-agnostic: composed entirely of fe_* operations, so no x64/portable split is needed. All intermediate values are securely erased before return.
On x86_64, SIMD-accelerated backends are selected at runtime based on detected CPU features.
ed25519_init(bool autotune = false)— Initializes the dispatch table. Withoutautotune, uses CPUID heuristics to select a good backend (~microseconds). Withautotune=true, benchmarks all candidates and selects the empirically fastest per function (~1-2 seconds). Thread-safe; only the first call executes. On non-x86_64 platforms, this is a no-op.ed25519_cpu_features()— Returns a bitmask of detected CPU features (ED25519_CPU_AVX2,ED25519_CPU_AVX512F,ED25519_CPU_AVX512IFMA).ed25519_has_avx2(),ed25519_has_avx512f(),ed25519_has_avx512ifma()— Query individual CPU features.ed25519_get_dispatch()— Returns a const reference to the dispatch table (read-only; onlyed25519_initcan modify it).
ed25519_secure_erase(pointer, length)— Zeroes memory in a way the compiler can't optimize away. UsesSecureZeroMemory(MSVC/Windows),memset_s(C11),explicit_bzero(glibc 2.25+, OpenBSD, FreeBSD), or a volatile function pointer tomemsetas a last resort.
Build with -DBUILD_TESTS=ON to get the ed25519-tests executable. CLI options:
| Flag | Effect |
|---|---|
| (none) | Run with baseline dispatch (x64 or portable, no SIMD) |
--init |
Use CPUID heuristic dispatch before running tests |
--autotune |
Use benchmarked best-per-function dispatch before running tests |
The test suite has three layers:
Deterministic tests for operations not covered by the other layers: sc_reduce (32/64-byte), sc_clamp, sc_check_reduced, sc_check_clamped, sc_isnonzero, fe_frombytes/fe_tobytes roundtrip, ge_scalarmult_base_ct (RFC 8032 vectors), ge_frombytes (roundtrip, non-canonical rejection, off-curve rejection), ge_fromfe_frombytes, subgroup checking, ristretto255 RFC 9496 vectors (16 generator multiples, 29 bad encodings, 7 hash-to-group, 4 equivalence), batch scalar multiplication (N=0..17), multi-scalar multiplication (Straus and Pippenger), dispatch init/autotune, and edge cases (zero scalar, identity point, scalar=1, batch-vs-single cross-validation).
Independently validated deterministic vectors covering scalar, field element, group element, and ristretto255 operations. The pipeline:
- Generator (
tools/gen_test_vectors.cpp) — built with-DFORCE_PORTABLE=ONto use the trusted portable backend. Emits JSON totest_vectors/ed25519_test_vectors.json. - Validator (
tools/validate_test_vectors.py) — independently recomputes every result using PyNaCl (libsodium bindings). Validates all vectors before they're compiled into tests. - Converter (
tools/json_to_header.py) — converts JSON toinclude/ed25519_test_vectors.hwith constexpr C++ structs.
Operations covered: sc_add, sc_sub, sc_mul, sc_muladd, sc_mulsub, sc_reduce (32/64-byte), sc_clamp, fe_frombytes/fe_tobytes roundtrip, fe_add, fe_sub, fe_mul, fe_sq, fe_sq2, fe_neg, fe_invert, fe_pow22523, generator/identity points, ge_scalarmult_base_ct, ge_scalarmult_ct, ge_double_scalarmult_base_negate_vartime, ge_multiscalar_mul_vartime, ge_frombytes_vartime (valid/invalid), ge_p3_to_wei25519, ristretto255_encode/decode roundtrip, ristretto255_from_uniform_bytes.
25 randomized test functions exercising algebraic invariants with a fixed-seed PRNG (std::mt19937_64, seed=42, 256 iterations each):
- Scalar: commutativity, associativity, identity, inverse for add/mul; definition checks for muladd/mulsub; reduce idempotency; clamp bit properties
- Field element: frombytes/tobytes roundtrip idempotency; add/sub inverse; mul commutativity/associativity/distributivity; sq(f)==mul(f,f); f*invert(f)==1; cmov correctness
- Group element: tobytes/frombytes roundtrip; 0*P==identity, 1*P==P, l*P==identity; scalarmult linearity and compatibility; DSM matches serial
- Batch: batch CT/DSM/shared-scalar results match serial for various N
- MSM: MSM n=1 matches scalarmult, n=2..8 matches sum of individual; zero scalar, duplicate points, MSM base consistency
- Ristretto255: encode/decode roundtrip and determinism; equals reflexivity/symmetry; from_uniform_bytes determinism
- Cross-backend: captures a fingerprint of deterministic operation results before
ed25519_init(), compares after SIMD dispatch to ensure all backends produce identical output
# Build the generator with the portable backend
cmake -S . -B build-portable -DFORCE_PORTABLE=ON -DBUILD_TOOLS=ON
cmake --build build-portable --config Release --target ed25519-gen-testvectors
# Generate JSON
./build-portable/ed25519-gen-testvectors > test_vectors/ed25519_test_vectors.json
# Validate independently (requires PyNaCl: pip install pynacl)
python tools/validate_test_vectors.py test_vectors/ed25519_test_vectors.json
# Convert to C++ header
python tools/json_to_header.py test_vectors/ed25519_test_vectors.json include/ed25519_test_vectors.hBuild with -DBUILD_BENCHMARKS=ON to get the ed25519-benchmark executable. CLI options:
| Flag | Effect |
|---|---|
| (none) | Baseline dispatch (x64/portable), core operations only |
--init |
Use CPUID heuristic dispatch |
--autotune |
Use benchmarked best-per-function dispatch |
--batch |
Include batch scalar multiplication benchmarks (N=4,8,16,64) |
--msm |
Include multi-scalar multiplication benchmarks (N=2,4,...,256) with speedup vs serial |
Flags can be combined: ed25519-benchmark --init --batch --msm
GitHub Actions runs on every push, pull request, and on a weekly schedule. Every compiler is tested across four build configurations to ensure correctness at every layer:
| Configuration | CMake flags | What it tests |
|---|---|---|
| portable | -DFORCE_PORTABLE=ON |
32-bit int32_t[10] field arithmetic on all platforms |
| x64-baseline | -DENABLE_AVX2=OFF -DENABLE_AVX512=OFF |
64-bit radix-2⁵¹ backend with no SIMD dispatch |
| x64-avx2 | -DENABLE_AVX512=OFF |
AVX2 SIMD backend (runtime-dispatched if CPU supports it) |
| x64-full | (defaults) | Full build with AVX2 + AVX-512 IFMA backends |
Compilers under test:
| Platform | Compilers | Configs |
|---|---|---|
| Linux (x86_64) | GCC 11, GCC 12, Clang 14, Clang 15 | All four |
| macOS (ARM64) | AppleClang, LLVM Clang | portable, arm64 |
| Windows (x86_64) | MSVC, MinGW-GCC | All four |
Unit tests and benchmarks run for every combination.
This project is released into the public domain under the Unlicense.