
Conversation

@guoqingbao

The current version does not support multiple builder instances, and a builder instance cannot be reused for another build (as also mentioned in #2). I have fixed this by making the builder reusable, e.g., building PTX after building the lib.

The tested use case:

fn main() {
    println!("cargo:rerun-if-changed=build.rs");
    println!("cargo:rerun-if-changed=src/attention.cu");
    println!("cargo:rerun-if-changed=src/copy_blocks_kernel.cu");
    println!("cargo:rerun-if-changed=src/reshape_and_cache_kernel.cu");
    println!("cargo:rerun-if-changed=src/rotary_embedding_kernel.cu");
    let builder = bindgen_cuda::Builder::default();
    println!("cargo:info={builder:?}");
    builder.build_lib("libattention.a");

    let bindings = builder.build_ptx().unwrap();
    bindings.write("src/lib.rs").unwrap();

    println!("cargo:rustc-link-lib=attention"); 
}

@guoqingbao
Author

This is a different solution: it makes the builder reusable instead of using multiple thread pools as in #3.

@guoqingbao
Author

@ivarflakstad Hi Ivar, are you able to review this PR? I believe the PR at huggingface/candle#3221 depends on this.

authors = ["Nicolas Patry <patry.nicolas@protonmail.com>"]
name = "bindgen_cuda"
-version = "0.1.5"
+version = "0.1.7"
Collaborator

Suggested change
-version = "0.1.7"
+version = "0.1.5"

We can increment the version in a separate PR when creating a new release.

true
};
let ccbin_env = std::env::var("NVCC_CCBIN");
let nvcc_binary = if std::path::Path::new("/usr/local/cuda/bin/nvcc").exists() {
Collaborator

Nit: could create a utility à la get_nvcc_binary and avoid the repetition.

Owner

This is horrible. Let users set up their own path. This location is hardcoded for specific Linux distributions; it does not belong here.

If you really want, allow environment overrides. That's pretty bad too, but at least it won't be hardcoded.

Author

> This is horrible. Let users set up their own path. This location is hardcoded for specific Linux distributions; it does not belong here.
>
> If you really want, allow environment overrides. That's pretty bad too, but at least it won't be hardcoded.

This is intended as a fallback to the default nvcc path. At the moment, bindgen_cuda can panic even when nvcc is available in the standard installation location (/usr/local/cuda/bin).
Given that, we either provide a reasonable fallback here, or require every user to manually set and hardcode the path in their own environment. A get_nvcc_binary helper may be needed, as suggested by @ivarflakstad.
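
As a sketch only, such a helper could combine an environment override (per the owner's preference) with the standard-location fallback discussed above. The NVCC_PATH variable name and the function body are illustrative assumptions, not existing bindgen_cuda behavior:

use std::path::PathBuf;

// Hypothetical helper, not part of bindgen_cuda: resolve nvcc once instead of
// repeating the path check at every call site.
fn get_nvcc_binary() -> PathBuf {
    // An explicit override wins; NVCC_PATH is an illustrative name.
    if let Ok(path) = std::env::var("NVCC_PATH") {
        return PathBuf::from(path);
    }
    // Fall back to the conventional Linux install location.
    let default = PathBuf::from("/usr/local/cuda/bin/nvcc");
    if default.exists() {
        return default;
    }
    // Otherwise rely on nvcc being available on PATH.
    PathBuf::from("nvcc")
}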

-pub fn build_ptx(self) -> Result<Bindings, Error> {
-    let cuda_root = self.cuda_root.expect("Could not find CUDA in standard locations, set it manually using Builder().set_cuda_root(...)");
+pub fn build_ptx(&self) -> Result<Bindings, Error> {
+    let mut cuda_include_dir = PathBuf::from("/usr/local/cuda/include");
Owner

This does not belong here.

Comment on lines +85 to +92
    /// Force to use a given compute capability
    pub fn set_compute_cap(&mut self, cap: usize) {
        self.compute_cap = Some(cap);
    }

    pub fn get_compute_cap(&self) -> Option<usize> {
        self.compute_cap
    }
Owner

They already exist, remove this.

Author
@guoqingbao Jan 21, 2026

Sorry, this shouldn't be here; I was pushing to my own forked repo and forgot it had been pushed for a merge. That said, these interfaces address a real need: we haven't exposed compute_cap (initialized by default) to users, and sometimes they need a specific CUDA compute capability to build certain kernels, instead of relying on the env or nvidia-smi. For example, flash attention v2 kernels need to be compiled with sm_90 or below even when other kernels are compiled with sm_120+. Related issue: guoqingbao/vllm.rs#194
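
To make that use case concrete, here is a minimal build.rs sketch assuming the set_compute_cap method from this diff (90 meaning sm_90); the kernel set and library name are illustrative:

fn main() {
    println!("cargo:rerun-if-changed=build.rs");
    let mut builder = bindgen_cuda::Builder::default();
    // Assumes the set_compute_cap from this diff: force sm_90 for the flash
    // attention v2 kernels even if the rest of the project targets sm_120+.
    builder.set_compute_cap(90);
    builder.build_lib("libflashattention.a"); // illustrative library name
    println!("cargo:rustc-link-lib=flashattention");
}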

/// println!("cargo:rustc-link-lib=flash");
/// ```
-pub fn build_lib<P>(self, out_file: P)
+pub fn build_lib<P>(&self, out_file: P)
Owner

There were reasons to consume the builder...

Everything already works; instead of changing build_lib to take &self, you should clone the builder directly (i.e., implement Clone).
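
For comparison, a sketch of what that alternative would look like in a downstream build.rs, assuming Builder implemented Clone (it does not today) and kept the consuming build_lib(self) signature:

fn main() {
    let builder = bindgen_cuda::Builder::default();
    // The clone is consumed by build_lib(self); the original stays usable.
    builder.clone().build_lib("libattention.a");
    let bindings = builder.build_ptx().unwrap();
    bindings.write("src/lib.rs").unwrap();
    println!("cargo:rustc-link-lib=attention");
}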
