@marcomd (owner) commented Jun 18, 2025

Description

  • Rust implementation of the extension using magnus 0.7
  • Updated extconf.rb to compile the extension with Rust (via rb_sys) when a Rust toolchain is available, falling back to C otherwise
    • To compile with C instead, simply remove rust from your .tool-versions file
  • Added embedding.rs and the Cargo configuration
  • Added rb_sys to handle Rust compilation
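For readers new to the Rust side, the numeric core of such an extension can live in plain Rust, with the magnus bindings layered on top. The sketch below is hypothetical and is not the embedding.rs from this PR: the `Embedding` type, its methods, and the choice of packed f32 storage are assumptions made for illustration, and no magnus code is shown.

```rust
/// Hypothetical embedding container holding packed f32 components.
/// Illustrative only: not the PR's embedding.rs, and no magnus bindings.
#[derive(Debug, Clone, PartialEq)]
pub struct Embedding {
    values: Vec<f32>,
}

impl Embedding {
    /// Builds an embedding from f64 input (Ruby Floats are doubles),
    /// narrowing each component to f32 (an assumption of this sketch)
    /// to halve the storage per component.
    pub fn from_f64(input: &[f64]) -> Self {
        Embedding {
            values: input.iter().map(|&x| x as f32).collect(),
        }
    }

    /// Number of components (the embedding size, e.g. 768).
    pub fn dim(&self) -> usize {
        self.values.len()
    }

    /// Borrows the packed components, e.g. to feed a similarity function.
    pub fn as_slice(&self) -> &[f32] {
        &self.values
    }

    /// Widens the components back to f64, e.g. to build a Ruby Array.
    pub fn to_f64(&self) -> Vec<f64> {
        self.values.iter().map(|&x| f64::from(x)).collect()
    }
}
```

Note that narrowing to f32 is lossy for values such as 0.1, so a round trip through `to_f64` only returns the exact input for values representable in f32.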

Compilation

The extension can be compiled with either C or Rust.

  1. Rust extension

If you don't have Rust installed yet, install it (here with asdf), then build:

asdf plugin add rust
asdf install rust 1.87.0
asdf set rust 1.87.0
bundle install
bundle exec rake compile

🧹 Cleaning artifacts...
     Removed 0 files
/Users/mark/.asdf/installs/ruby/3.4.4/bin/ruby extconf.rb
🔧 Rust toolchain detected, building Rust extension...
checking for clang... yes
checking for clang++... yes
checking for ar... yes
checking for install_name_tool... yes
generating target/release/libembedding.dylib (release)
cargo rustc  --manifest-path ./Cargo.toml --target-dir target --lib --profile release -- -C linker=clang -L native=/Users/mark/.asdf/installs/ruby/3.4.4/lib -L native=/opt/homebrew/opt/openssl@1.1/lib -L native=/opt/homebrew/Cellar/gmp/6.3.0/lib -C link-arg=-Wl,-undefined,dynamic_lookup -l pthread
   Compiling proc-macro2 v1.0.95
   Compiling memchr v2.7.5
   Compiling unicode-ident v1.0.18
   Compiling glob v0.3.2
   Compiling libc v0.2.173
   Compiling regex-syntax v0.8.5
   Compiling cfg-if v1.0.1
   Compiling minimal-lexical v0.2.1
   Compiling either v1.15.0
   Compiling bindgen v0.69.5
   Compiling libloading v0.8.8
   Compiling itertools v0.12.1
   Compiling rustc-hash v1.1.0
   Compiling lazycell v1.3.0
   Compiling shlex v1.3.0
   Compiling lazy_static v1.5.0
   Compiling aho-corasick v1.1.3
   Compiling nom v7.1.3
   Compiling clang-sys v1.8.1
   Compiling bitflags v2.9.1
   Compiling shell-words v1.1.0
   Compiling rb-sys-env v0.1.2
   Compiling seq-macro v0.3.6
   Compiling magnus v0.7.1
   Compiling regex-automata v0.4.9
   Compiling cexpr v0.6.0
   Compiling quote v1.0.40
   Compiling syn v2.0.103
   Compiling regex v1.11.1
   Compiling magnus-macros v0.6.0
   Compiling rb-sys-build v0.9.116
   Compiling rb-sys v0.9.116
   Compiling embedding v0.3.0 (/Users/mark/Lavoro/Gemme/rag_embeddings/ext/rag_embeddings)
    Finished `release` profile [optimized] target(s) in 15.40s
  2. C extension

Remove rust from your .tool-versions file to force C compilation, then run:

bundle exec rake compile

📦 Building C extension...
creating Makefile
compiling embedding.c
linking shared-object rag_embeddings/embedding.bundle

Performance Comparison: C vs Rust Extension

The benchmarks below compare the C and Rust implementations of the embedding extension across four embedding sizes (768, 2048, 3072 and 4096).

⚠️ Please note that this benchmark produces highly variable results, which should ideally be averaged over many runs. For simplicity, I limited myself to three runs and report the best result.

bundle exec rspec spec/performance_spec.rb --seed 57210
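Since the tables report the best of three runs, it can help to record both the best and the mean when repeating the benchmark. A minimal sketch, assuming you collect the per-run timings yourself; `summarize` and `RunSummary` are hypothetical helpers and not part of the spec suite.

```rust
/// Summary of repeated benchmark runs, in milliseconds.
#[derive(Debug, Clone, PartialEq)]
pub struct RunSummary {
    pub runs: usize,
    pub best: f64,
    pub mean: f64,
}

/// Summarizes per-run timings: `best` is the minimum (what the tables
/// report) and `mean` is the plain average across all runs.
/// Returns None when no runs were recorded.
pub fn summarize(runs_ms: &[f64]) -> Option<RunSummary> {
    if runs_ms.is_empty() {
        return None;
    }
    // Minimum over all runs; starts from +inf so any timing replaces it.
    let best = runs_ms.iter().copied().fold(f64::INFINITY, f64::min);
    let mean = runs_ms.iter().sum::<f64>() / runs_ms.len() as f64;
    Some(RunSummary { runs: runs_ms.len(), best, mean })
}
```

For example, made-up timings of 30, 20 and 25 ms give a best of 20 ms but a mean of 25 ms, which is why best-of-three figures can look more favorable than averages.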

Performance Metrics

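| Metric | Embedding Size | C Implementation | Rust Implementation | Rust vs C |
|---|---|---|---|---|
| Embedding Creation (10k ops) | 768 | 22 ms | 24 ms | ❌ (+2 ms) |
| Embedding Creation (10k ops) | 2048 | 57 ms | 40 ms | ✅ (-17 ms) |
| Embedding Creation (10k ops) | 3072 | 92 ms | 62 ms | ✅ (-30 ms) |
| Embedding Creation (10k ops) | 4096 | 112 ms | 72 ms | ✅ (-40 ms) |
| Cosine Similarity (10k ops) | 768 | 8 ms | 9 ms | ❌ (+1 ms) |
| Cosine Similarity (10k ops) | 2048 | 21 ms | 20 ms | ✅ (-1 ms) |
| Cosine Similarity (10k ops) | 3072 | 30 ms | 34 ms | ❌ (+4 ms) |
| Cosine Similarity (10k ops) | 4096 | 41 ms | 42 ms | ❌ (+1 ms) |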
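For reference, the cosine similarity benchmark exercises a computation of this shape: one pass over both vectors accumulating the dot product and the two squared norms, so the work grows linearly with the embedding size. This is an illustrative plain-Rust sketch assuming f32 components; it is not the PR's implementation, and returning `None` for mismatched lengths or zero vectors is a choice made for this sketch only.

```rust
/// Cosine similarity: dot(a, b) / (|a| * |b|).
/// Returns None if the lengths differ or either vector has zero norm.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    // Accumulate the dot product and both squared norms in a single pass.
    let (mut dot, mut norm_a, mut norm_b) = (0.0f32, 0.0f32, 0.0f32);
    for (&x, &y) in a.iter().zip(b.iter()) {
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    // A zero vector has no direction, so the similarity is undefined.
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a.sqrt() * norm_b.sqrt()))
}
```

Identical directions give 1.0, orthogonal vectors give 0.0, and opposite directions give -1.0 (up to f32 rounding).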

Memory Usage

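| Metric | Embedding Size | C Implementation | Rust Implementation | Rust vs C |
|---|---|---|---|---|
| RSS After Cleanup | 768 | 81.11 MB | 74.55 MB | ✅ (-6.56 MB) |
| RSS After Cleanup | 2048 | 160.64 MB | 181.06 MB | ❌ (+20.42 MB) |
| RSS After Cleanup | 3072 | 162.33 MB | 181.05 MB | ❌ (+18.72 MB) |
| RSS After Cleanup | 4096 | 196.02 MB | 239.89 MB | ❌ (+43.87 MB) |
| Peak RSS | 768 | 114.22 MB | 113.53 MB | ✅ (-0.69 MB) |
| Peak RSS | 2048 | 165.16 MB | 176.8 MB | ❌ (+11.64 MB) |
| Peak RSS | 3072 | 183.19 MB | 213.69 MB | ❌ (+30.5 MB) |
| Peak RSS | 4096 | 229.06 MB | 218.92 MB | ✅ (-10.14 MB) |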

Memory Efficiency

| Embedding Size | C Implementation | Rust Implementation | Rust vs C |
|---|---|---|---|
| 768 | 179.3% | 272.4% | ✅ (more efficient) |
| 2048 | 1733.5% | 667.4% | ❌ (less efficient) |
| 3072 | 170.3% | 117.2% | ❌ (less efficient) |
| 4096 | 340.4% | 395.3% | ✅ (more efficient) |

Test Duration

| Implementation | Total Time | File Load Time |
|---|---|---|
| C | 4.5 seconds | 0.47474 seconds |
| Rust | 4.18 seconds | 0.22027 seconds |
| Rust vs C | -0.32 s | -0.25 s |

Key Findings

Performance

  • C is slightly faster for small embeddings (768): 2 ms faster for creation and 1 ms faster for cosine similarity per 10k ops
  • Rust is significantly faster at creating larger embeddings (2048+), saving 17-40 ms per 10k ops (about 30-36% faster)
  • Cosine similarity performance is comparable between the two implementations (within 4 ms per 10k ops at every size)

Memory Usage

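  • Rust uses slightly less memory for small embeddings (768): 6.56 MB less RSS after cleanup and 0.69 MB lower peak RSS
  • C retains less memory for larger embeddings (2048+), with 18-44 MB lower RSS after cleanup; peak RSS is also lower with C at 2048 and 3072, while Rust peaks lower at 4096
  • Memory efficiency ratios are split: Rust scores higher at 768 and 4096, C at 2048 and 3072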

Overall

  • Rust is faster at creating larger embeddings, while cosine similarity is on par
  • C generally uses less memory for larger embedding sizes
  • Rust loads faster (file load time 0.22 s vs 0.47 s, roughly 2x)

marcomd added 2 commits June 18, 2025 17:30
@marcomd added the enhancement label (New feature or request) on Jun 18, 2025
@marcomd self-assigned this on Jun 18, 2025