A high-performance Rust framework for GPU computation with unified support for NVIDIA (CUDA) and AMD (ROCm/HIP) platforms.
⚠️ WARNING: This library is in an early experimental stage and NOT production ready. It has many limitations and known issues and is primarily intended for research and experimentation. APIs may change without notice, performance optimizations are incomplete, and error handling is still developing.
GPU Compute provides a tensor-based computational model similar to PyTorch/TensorFlow, but in native Rust with strong type safety and memory management. It takes care of the complexities of GPU programming behind a high-level API, so developers can focus on algorithms rather than hardware details.
- Cross-Platform: Unified API for both NVIDIA (CUDA) and AMD (ROCm/HIP) GPUs
- Tensor Operations: Comprehensive set of operations for numerical computing and ML
- Memory Management: Efficient device, host, and unified memory abstractions
- Kernel Execution: Easy launching of custom GPU kernels with automatic configuration
- Neural Network Primitives: Optimized convolutions, pooling, and normalization operations
- BLAS Integration: Level 1-3 BLAS operations for linear algebra
- JIT Compilation: Runtime code generation for specialized kernels
- Debugging Tools: Profiling, visualization, and memory inspection utilities
- Safe Abstractions: Rust wrappers around unsafe FFI calls with robust error handling
Add this to your `Cargo.toml`:

```toml
[dependencies]
# For NVIDIA GPUs
gpu-compute = { path = "path/to/gpu-compute", features = ["cuda"] }

# For AMD GPUs
gpu-compute = { path = "path/to/gpu-compute", features = ["rocm"] }

# For both
gpu-compute = { path = "path/to/gpu-compute", features = ["cuda", "rocm"] }
```

For NVIDIA GPUs, you will need:

- CUDA Toolkit (version 11.0 or higher recommended)
- Set the `CUDA_PATH` environment variable to your CUDA installation path
For AMD GPUs, you will need:

- ROCm installation (version 4.0 or higher recommended)
- Set the `ROCM_PATH` environment variable to your ROCm installation path
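For example, on a typical Linux setup (adjust the paths to match your installation), the environment can be pointed at the toolkit and the crate built like this:

```bash
# Example paths only; point these at your actual installations
export CUDA_PATH=/usr/local/cuda
export ROCM_PATH=/opt/rocm

# Build with the matching backend feature enabled
cargo build --features cuda   # or --features rocm
```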
Initialize the GPU subsystem once, do your work inside `with_context`, and shut it down when you are done:

```rust
use gpu_compute::context;
use gpu_compute::error::GpuResult;

fn main() -> GpuResult<()> {
    // Initialize GPU subsystem
    context::initialize()?;

    // Use the GPU context
    context::with_context(|ctx| {
        println!("Using device: {}", ctx.get_device(ctx.current_device()).unwrap().name());
        // Do GPU operations...
        Ok(())
    })?;

    // Cleanup
    context::shutdown()?;
    Ok(())
}
```
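If you prefer not to `unwrap()` the device lookup, a more defensive variant might look like the sketch below. This assumes `get_device` returns an `Option` of a device handle, as the `unwrap()` above suggests; adapt it if the real API differs. The snippet goes inside the `with_context` closure:

```rust
// Sketch only: assumes get_device returns Option<_>; adjust if it is a Result instead.
match ctx.get_device(ctx.current_device()) {
    Some(device) => println!("Using device: {}", device.name()),
    None => eprintln!("No usable GPU device found for the current context"),
}
```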
Create tensors on the device, run element-wise operations, and copy the result back to the host:

```rust
use gpu_compute::context;
use gpu_compute::error::GpuResult;
use gpu_compute::tensor::{Tensor, Shape};
use gpu_compute::ops::elementwise;

fn main() -> GpuResult<()> {
    context::initialize()?;

    context::with_context(|ctx| {
        // Create tensors on device
        let shape = Shape::new(vec![2, 3]);
        let a = Tensor::ones(shape.clone(), ctx.current_device())?;
        let b = Tensor::from_vec(vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape, ctx.current_device())?;

        // Perform operations
        let c = elementwise::add(&a, &b)?;

        // Transfer result to host and print
        let host_result = c.to_host_vec()?;
        println!("Result: {:?}", host_result);
        Ok(())
    })?;

    context::shutdown()?;
    Ok(())
}
```
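Other element-wise operations compose the same way. The sketch below assumes the `elementwise` module also provides `mul` with the same signature as `add` (check the module for what actually exists); it continues inside the `with_context` closure above, reusing `a` and `b`:

```rust
// Assumption: elementwise::mul exists with the same signature as elementwise::add.
let sum = elementwise::add(&a, &b)?;
let product = elementwise::mul(&sum, &b)?;
println!("Product: {:?}", product.to_host_vec()?);
```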
Matrix multiplication goes through the BLAS layer:

```rust
use gpu_compute::context;
use gpu_compute::error::GpuResult;
use gpu_compute::tensor::{Tensor, Shape};
use gpu_compute::ops::blas_level2_3;

fn main() -> GpuResult<()> {
    context::initialize()?;

    context::with_context(|ctx| {
        // Create matrices
        let a = Tensor::from_vec(
            vec![1.0, 2.0, 3.0, 4.0],
            Shape::new(vec![2, 2]),
            ctx.current_device(),
        )?;
        let b = Tensor::from_vec(
            vec![5.0, 6.0, 7.0, 8.0],
            Shape::new(vec![2, 2]),
            ctx.current_device(),
        )?;

        // Perform matrix multiplication
        let c = blas_level2_3::matmul(&a, &b, false, false)?;

        // Transfer result to host and print
        let host_result = c.to_host_vec()?;
        println!("Result: {:?}", host_result);
        Ok(())
    })?;

    context::shutdown()?;
    Ok(())
}
```
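The two boolean arguments to `matmul` presumably select whether `a` and `b` are treated as transposed, following the usual BLAS convention; assuming that reading, computing `a * bᵀ` with the matrices above would look like this:

```rust
// Assumption: the third and fourth arguments transpose `a` and `b` respectively.
let c_t = blas_level2_3::matmul(&a, &b, false, true)?;
println!("a * b^T: {:?}", c_t.to_host_vec()?);
```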
To build, test, and benchmark the crate:

```bash
# Build with CUDA support
cargo build --features cuda

# Build with ROCm support
cargo build --features rocm

# Run tests
cargo test --features cuda    # or --features rocm

# Run benchmarks
cargo bench --features cuda   # or --features rocm
```
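If your own crate forwards matching features to `gpu-compute` in its `[features]` table (for example `cuda = ["gpu-compute/cuda"]`), backend-specific code paths can be selected with ordinary conditional compilation; this is standard Cargo/Rust, not an API of this library:

```rust
// Hypothetical downstream helper: each block compiles only when its feature is enabled.
fn report_backends() {
    #[cfg(feature = "cuda")]
    println!("Built with CUDA support");

    #[cfg(feature = "rocm")]
    println!("Built with ROCm support");
}
```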
The built-in profiler can be used to time GPU work:

```rust
use gpu_compute::context;
use gpu_compute::error::GpuResult;
use gpu_compute::debug::profiler;

fn main() -> GpuResult<()> {
    context::initialize()?;

    context::with_context(|ctx| {
        // Start profiling
        let mut profile_session = profiler::ProfileSession::new("my_operation")?;

        // ... your GPU operations here ...

        // End profiling and print results
        profile_session.end()?;
        profile_session.print_summary();
        Ok(())
    })?;

    context::shutdown()?;
    Ok(())
}
```
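For instance, to time the element-wise addition from the earlier example, the session can simply wrap those calls (same APIs as above, combined here for illustration; this goes inside `with_context` with the tensor imports from the earlier examples):

```rust
// Profile a concrete operation end to end.
let mut session = profiler::ProfileSession::new("elementwise_add")?;

let shape = Shape::new(vec![2, 3]);
let a = Tensor::ones(shape.clone(), ctx.current_device())?;
let b = Tensor::ones(shape, ctx.current_device())?;
let c = elementwise::add(&a, &b)?;
println!("Result: {:?}", c.to_host_vec()?);

session.end()?;
session.print_summary();
```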
This library has several significant limitations and known issues:

- No Automatic Differentiation: Unlike PyTorch or TensorFlow, there is no autodiff support for deep learning
- Limited Tensor Operations: Many common operations are still missing or incomplete
- Performance Issues: Several operations have suboptimal implementations
- Memory Management: Inefficient memory slicing and no caching allocators
- No Multi-GPU Support: No infrastructure for distributed computation
- Limited Documentation: API documentation is minimal
- No Python Bindings: No easy integration with the Python ML ecosystem
- Testing Coverage: Incomplete test coverage for many components
Contributions are welcome! Please feel free to submit a Pull Request.
Areas where help is especially needed:
- Adding missing tensor operations
- Improving performance of existing operations
- Expanding test coverage
- Enhancing documentation
- Adding Python bindings
This project is licensed under the MIT License - see the LICENSE file for details.