Closed
4 changes: 2 additions & 2 deletions Cargo.lock

Some generated files are not rendered by default.

43 changes: 38 additions & 5 deletions Cargo.toml
@@ -21,18 +21,51 @@ num-traits = { version = "0.2.14", default-features = false, features = [
    "libm",
], optional = true }
zerocopy = { version = "0.6.0", default-features = false, optional = true }
-rand = { version = "0.9.0", default-features = false, features = ["std", "std_rng", "thread_rng"], optional = true }
-rand_distr = { version = "0.5.0", default-features = false, optional = true }
+rand = { version = "0.9.0", default-features = false, features = [
+    "std",
+    "std_rng",
+    "thread_rng",
+], optional = true }
+rand_distr = { version = "0.5.1", default-features = false, optional = true }
rkyv = { version = "0.7", optional = true }
-cudarc = { version = "0.13.3", features = ["std", "cublas", "cublaslt", "curand", "driver", "nvrtc", "f16", "cuda-version-from-build-system", "dynamic-linking"], default-features=false, optional = true }
-mistralrs_cudarc_fork = { version = "0.12.1", features = ["std", "cublas", "cublaslt", "curand", "driver", "nvrtc", "f16", "cuda-version-from-build-system", "dynamic-linking"], default-features=false, optional = true }
+cudarc = { version = "0.16.3", features = [
+    "std",
+    "cublas",
+    "cublaslt",
+    "curand",
+    "driver",
+    "nvrtc",
+    "f16",
+    "cuda-version-from-build-system",
+    "dynamic-linking",
+], default-features = false, optional = true }
+mistralrs_cudarc_fork = { version = "0.12.1", features = [
+    "std",
+    "cublas",
+    "cublaslt",
+    "curand",
+    "driver",
+    "nvrtc",
+    "f16",
+    "cuda-version-from-build-system",
+    "dynamic-linking",
+], default-features = false, optional = true }

[features]
default = ["std"]
std = []
cuda = ["dep:cudarc"]
mistralrs_cudarc_fork = ["dep:mistralrs_cudarc_fork"]
-all = ["std", "num-traits", "rand_distr", "bytemuck", "zerocopy", "rkyv", "serde", "cuda"]
+all = [
+    "std",
+    "num-traits",
+    "rand_distr",
+    "bytemuck",
+    "zerocopy",
+    "rkyv",
+    "serde",
+    "cuda",
+]
num-traits = ["dep:num-traits"]
rand_distr = ["dep:rand_distr", "dep:rand", "std"]
bytemuck = ["dep:bytemuck"]
16 changes: 4 additions & 12 deletions src/lib.rs
@@ -1565,22 +1565,14 @@ from_t!(F8E4M3);
from_t!(F8E5M2);

#[cfg(feature = "cuda")]
-unsafe impl cudarc::driver::DeviceRepr for F8E4M3 {
-    fn as_kernel_param(&self) -> *mut std::ffi::c_void {
-        use std::ptr::addr_of;
-        addr_of!(self.0) as *const u8 as *mut _
-    }
-}
+unsafe impl cudarc::driver::DeviceRepr for F8E4M3 {}
Comment from the PR author:

Unsure why cudarc removed as_kernel_param; I could not find any documentation of that change. It was part of a much larger code reorganization.
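A minimal sketch of why an empty marker impl can replace the removed method, assuming F8E4M3 is a #[repr(transparent)] newtype over u8. The DeviceRepr trait below is a local stand-in for cudarc 0.16's method-less trait, and push_arg is a hypothetical helper standing in for however the new launch API collects kernel arguments. Once the caller takes the value's own address, the pointer the old as_kernel_param body computed adds nothing, because for a transparent newtype both point at the same byte.

```rust
use std::ffi::c_void;
use std::ptr::addr_of;

// Local stand-in for a method-less marker trait (not cudarc's actual item).
pub unsafe trait DeviceRepr {}

// Assumed layout: a transparent wrapper around the raw FP8 byte.
#[repr(transparent)]
#[derive(Clone, Copy)]
pub struct F8E4M3(u8);

unsafe impl DeviceRepr for F8E4M3 {}

/// What the removed as_kernel_param body computed: a pointer to the inner byte.
fn old_kernel_param(x: &F8E4M3) -> *mut c_void {
    addr_of!(x.0) as *const u8 as *mut c_void
}

/// Hypothetical argument collector: with a marker trait, the caller takes
/// the address of the value itself instead of asking the type for it.
fn push_arg<T: DeviceRepr>(args: &mut Vec<*mut c_void>, value: &T) {
    args.push(value as *const T as *mut c_void);
}

fn main() {
    let x = F8E4M3(0x38);
    let mut args = Vec::new();
    push_arg(&mut args, &x);
    // For a transparent newtype, both pointers name the same address.
    assert_eq!(args[0], old_kernel_param(&x));
    println!("same address: {}", args[0] == old_kernel_param(&x));
}
```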


#[cfg(feature = "cuda")]
unsafe impl cudarc::driver::ValidAsZeroBits for F8E4M3 {}

#[cfg(feature = "cuda")]
-unsafe impl cudarc::driver::safe::DeviceRepr for F8E5M2 {
-    fn as_kernel_param(&self) -> *mut std::ffi::c_void {
-        use std::ptr::addr_of;
-        addr_of!(self.0) as *const u8 as *mut _
-    }
-}
+unsafe impl cudarc::driver::safe::DeviceRepr for F8E5M2 {}

Comment on lines +1568 to +1575

💡 Verification agent

🧩 Analysis chain

Inconsistent use of DeviceRepr vs safe::DeviceRepr between the two types

F8E4M3 → implements driver::DeviceRepr
F8E5M2 → implements driver::safe::DeviceRepr

Unless there is a documented reason that one type must be “unsafe-param” and the other the “safe” variant, this asymmetry is surprising and makes the maintenance story harder (especially after the cudarc bump).

  1. If both types satisfy the invariants of the safe trait, prefer that for both:
 #[cfg(feature = "cuda")]
 unsafe impl cudarc::driver::DeviceRepr for F8E4M3 {}
+#[cfg(feature = "cuda")]
+unsafe impl cudarc::driver::safe::DeviceRepr for F8E4M3 {}
  2. If the old, now-removed as_kernel_param implementation is still required for the unsafe trait in 0.16, the build will fail with a trait-method-not-implemented error. Make sure cargo check --features=cuda passes.

Please double-check the cudarc 0.16 trait docs and align the two impls accordingly.
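A self-contained mock of the module layout helps check whether the two paths really name different traits. It assumes, as cudarc's driver module appears to do, that driver declares pub mod safe and re-exports it with pub use safe::*; the modules and types here are stand-ins, not cudarc itself. Under that layout both paths resolve to a single trait, so a bound written with either path accepts both impls, and a second impl for the same type through the other path would be rejected as a conflicting implementation (E0119).

```rust
// Stand-in for the assumed module layout (not cudarc itself).
mod driver {
    pub mod safe {
        /// Method-less marker trait, like the one the PR implements.
        pub unsafe trait DeviceRepr {}
    }
    // Glob re-export: driver::DeviceRepr is the same item as driver::safe::DeviceRepr.
    pub use self::safe::*;
}

#[repr(transparent)]
#[derive(Clone, Copy)]
struct F8E4M3(u8);

#[repr(transparent)]
#[derive(Clone, Copy)]
struct F8E5M2(u8);

// Implemented through the re-exported path...
unsafe impl driver::DeviceRepr for F8E4M3 {}
// ...and through the defining module's path.
unsafe impl driver::safe::DeviceRepr for F8E5M2 {}

// Uncommenting this fails with E0119 (conflicting implementations),
// because it names the same trait as the F8E4M3 impl above:
// unsafe impl driver::safe::DeviceRepr for F8E4M3 {}

/// Accepts any implementor, whichever path its impl was written with.
fn element_size<T: driver::DeviceRepr>(_: &T) -> usize {
    std::mem::size_of::<T>()
}

fn main() {
    println!("{}", element_size(&F8E4M3(0)));
    println!("{}", element_size(&F8E5M2(0)));
}
```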


🏁 Script executed:

#!/bin/bash
# Inspect implementations of DeviceRepr and safe::DeviceRepr in src/lib.rs around F8E4M3 and F8E5M2
rg -n "F8E4M3" src/lib.rs
rg -n "F8E5M2" src/lib.rs
sed -n '1550,1600p' src/lib.rs
rg -n "DeviceRepr" src/lib.rs

Length of output: 12573


🌐 Web query:

In cudarc 0.16, what are the definitions and requirements of cudarc::driver::DeviceRepr and cudarc::driver::safe::DeviceRepr traits?

💡 Result:

Definitions and Requirements of DeviceRepr in cudarc 0.16

cudarc::driver::DeviceRepr

The DeviceRepr trait in cudarc::driver is an unsafe Rust trait that describes types whose memory representation is suitable for copying between the host (CPU) and the device (GPU) using CUDA's driver API. In other words, it is a marker for types that can be safely and correctly transferred between Rust and CUDA device memory without introducing undefined behavior.

Requirements:

  • The trait is generally implemented for types that are #[repr(C)] or #[repr(transparent)] and contain only CUDA-compatible primitives (e.g., integers, floats).
  • Implementors must ensure that their type's memory layout matches what CUDA expects so that data can be faithfully reinterpreted on the device.
  • Types using custom representations or containing references, pointers, or Rust-managed resources must not implement this trait.
  • The trait is inherently unsafe, as incorrect implementations can lead to undefined behavior.

Typical signature:

pub unsafe trait DeviceRepr: Copy + Sized {}
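The layout requirements above can be enforced at compile time. This sketch assumes the FP8 types are #[repr(transparent)] newtypes over u8 (the definitions below are stand-ins for the float8 ones) and uses a hypothetical as_bytes helper to show the byte view a host-to-device copy would rely on. The const assertions fail the build if either type stops being exactly one byte with byte alignment.

```rust
use std::mem::{align_of, size_of};

/// Stand-ins for the float8 types: transparent wrappers around the raw byte.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F8E4M3(pub u8);

#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F8E5M2(pub u8);

// Compile-time layout checks: the build fails if either type is no longer
// exactly one byte with byte alignment.
const _: () = assert!(size_of::<F8E4M3>() == 1 && align_of::<F8E4M3>() == 1);
const _: () = assert!(size_of::<F8E5M2>() == 1 && align_of::<F8E5M2>() == 1);

/// Hypothetical helper: view FP8 values as the raw bytes a device copy would move.
pub fn as_bytes(values: &[F8E4M3]) -> &[u8] {
    // SAFETY: F8E4M3 is repr(transparent) over u8, so the slice has the same
    // length, size, and alignment when viewed as u8.
    unsafe { std::slice::from_raw_parts(values.as_ptr().cast::<u8>(), values.len()) }
}

fn main() {
    let xs = [F8E4M3(0x00), F8E4M3(0x38), F8E4M3(0x7e)];
    println!("{:?}", as_bytes(&xs));
}
```

Running it prints [0, 56, 126].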

cudarc::driver::safe::DeviceRepr

The DeviceRepr trait under the cudarc::driver::safe module refers to the same concept as above but is exposed through cudarc's "safe" API surface. This module wraps the lower-level, potentially unsafe CUDA driver interactions in a safer Rust interface wherever possible.

Requirements and Guarantees:

  • The trait in safe is typically a re-export or a wrapper of the root cudarc::driver::DeviceRepr, enforcing the same requirements and constraints: the type must have a valid, device-compatible binary representation.
  • In the "safe" context, this trait is used to generically constrain types that can be transferred or viewed in device memory using abstractions like CudaSlice<T>, CudaView<T>, and CudaViewMut<T>, which provide more ergonomic and memory-safe device buffer management[3].
  • The use of this trait helps enforce that only "POD" (plain-old-data) types can be moved to CUDA device memory, reducing user error in memory transfers.

Summary Table

| Trait location | Type of trait | Key requirements/guarantees |
| --- | --- | --- |
| cudarc::driver::DeviceRepr | Unsafe | Only for types compatible with CUDA device memory; must be Copy, Sized, and have a C-compatible layout; unsafe to implement incorrectly. |
| cudarc::driver::safe::DeviceRepr | Safe API | Same requirements as above; used in higher-level abstractions for safer device memory access. |

Practical Usage

These traits are primarily used to constrain the types that can be passed to or received from device buffers. They protect against common errors, such as attempting to transfer complex Rust types (structs with references, heap allocations, or non-C-compatible layouts) directly to the GPU, which would result in undefined or erroneous behavior.

"Both of these traits can generically accept memory allocated by the driver in the form of: driver::CudaSlice<T>, driver::CudaView<T>, and driver::CudaViewMut<T>..."[3]

Conclusion

  • DeviceRepr is a marker trait for types that are plain-old-data and have a memory layout directly compatible with CUDA device memory.
  • Only safe, simple types (like primitives and #[repr(C)] structs containing them) should implement these traits.
  • The trait is declared as unsafe in the core API and is used as a bound in cudarc's safe, ergonomic abstractions for device memory buffers[3].

If you require the exact trait definition, it can be reviewed in the cudarc source code on docs.rs[1] or the project's GitHub repository, but the above summarizes its definition and requirements accurately.

Citations:


Align DeviceRepr impls for F8E4M3 under both CUDA features

In cudarc 0.16 the safe::DeviceRepr trait has identical requirements to the root DeviceRepr but is the preferred, higher-level API. Currently:

  • Under feature = "cuda",
    • F8E4M3 implements cudarc::driver::DeviceRepr
    • F8E5M2 implements cudarc::driver::safe::DeviceRepr

  • Under feature = "mistralrs_cudarc_fork",
    • F8E4M3 implements mistralrs_cudarc_fork::driver::DeviceRepr
    • F8E5M2 implements mistralrs_cudarc_fork::driver::safe::DeviceRepr

Please make them symmetric by adding the missing safe impls for F8E4M3:

• In src/lib.rs under #[cfg(feature = "cuda")], after

unsafe impl cudarc::driver::DeviceRepr for F8E4M3 {}

add

+#[cfg(feature = "cuda")]
+unsafe impl cudarc::driver::safe::DeviceRepr for F8E4M3 {}

• In the same file under #[cfg(feature = "mistralrs_cudarc_fork")], after

unsafe impl mistralrs_cudarc_fork::driver::DeviceRepr for F8E4M3 {
    fn as_kernel_param(&self) -> *mut std::ffi::c_void { ... }
}

add

+#[cfg(feature = "mistralrs_cudarc_fork")]
+unsafe impl mistralrs_cudarc_fork::driver::safe::DeviceRepr for F8E4M3 {
+    fn as_kernel_param(&self) -> *mut std::ffi::c_void {
+        use std::ptr::addr_of;
+        addr_of!(self.0) as *const u8 as *mut _
+    }
+}

Then verify:

cargo check --features="cuda mistralrs_cudarc_fork"

to ensure you haven’t introduced any missing trait errors.

🤖 Prompt for AI Agents
In src/lib.rs around lines 1568 to 1575, the implementations of DeviceRepr
traits for F8E4M3 and F8E5M2 are inconsistent: F8E4M3 implements
cudarc::driver::DeviceRepr while F8E5M2 implements
cudarc::driver::safe::DeviceRepr under the "cuda" feature. To fix this, add an
unsafe impl of cudarc::driver::safe::DeviceRepr for F8E4M3 under the
#[cfg(feature = "cuda")] block after the existing DeviceRepr impl. Similarly,
under the #[cfg(feature = "mistralrs_cudarc_fork")] feature, add the
safe::DeviceRepr impl for F8E4M3 after its DeviceRepr impl. Finally, run cargo
check with both features enabled to confirm no trait errors remain.

#[cfg(feature = "cuda")]
unsafe impl cudarc::driver::ValidAsZeroBits for F8E5M2 {}
