GPU Compute

License: MIT · Status: Experimental

A high-performance Rust framework for GPU computation with unified support for NVIDIA (CUDA) and AMD (ROCm/HIP) platforms.

⚠️ WARNING: This library is in early experimental stage and NOT production ready

This project has many limitations and known issues, and is intended primarily for research and experimentation. APIs may change without notice, performance optimizations are incomplete, and error handling is still maturing.

Overview

GPU Compute provides a tensor-based computational model similar to PyTorch or TensorFlow, but in native Rust with strong type safety and memory management. It handles the complexities of GPU programming behind a high-level API, letting developers focus on algorithms rather than hardware details.

Key Features

  • Cross-Platform: Unified API for both NVIDIA (CUDA) and AMD (ROCm/HIP) GPUs
  • Tensor Operations: Comprehensive set of operations for numerical computing and ML
  • Memory Management: Efficient device, host, and unified memory abstractions
  • Kernel Execution: Easy launching of custom GPU kernels with automatic configuration
  • Neural Network Primitives: Optimized convolutions, pooling, and normalization operations
  • BLAS Integration: Level 1-3 BLAS operations for linear algebra
  • JIT Compilation: Runtime code generation for specialized kernels
  • Debugging Tools: Profiling, visualization, and memory inspection utilities
  • Safe Abstractions: Rust wrappers around unsafe FFI calls with robust error handling

Installation

Add this to your Cargo.toml:

[dependencies]
# For NVIDIA GPUs
gpu-compute = { path = "path/to/gpu-compute", features = ["cuda"] }

# For AMD GPUs
gpu-compute = { path = "path/to/gpu-compute", features = ["rocm"] }

# For both
gpu-compute = { path = "path/to/gpu-compute", features = ["cuda", "rocm"] }
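
Note that the dependency is specified by path rather than by version: the crate is not yet published on crates.io.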

Prerequisites

For CUDA support

  • CUDA Toolkit (version 11.0 or higher recommended)
  • Set CUDA_PATH environment variable to your CUDA installation path

For ROCm support

  • ROCm installation (version 4.0 or higher recommended)
  • Set ROCM_PATH environment variable to your ROCm installation path
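
For example, on Linux both variables might be set in your shell profile. The paths below are common install defaults, not part of this project; adjust them to match your installation:

export CUDA_PATH=/usr/local/cuda
export ROCM_PATH=/opt/rocm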

Quick Start

Basic Initialization

use gpu_compute::context;
use gpu_compute::error::GpuResult;

fn main() -> GpuResult<()> {
    // Initialize GPU subsystem
    context::initialize()?;
    
    // Use the GPU context
    context::with_context(|ctx| {
        println!("Using device: {}", ctx.get_device(ctx.current_device()).unwrap().name());
        
        // Do GPU operations...
        
        Ok(())
    })?;
    
    // Cleanup
    context::shutdown()?;
    
    Ok(())
}
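
The remaining examples follow this same pattern: initialize(), do the work inside with_context, then shutdown().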

Working with Tensors

use gpu_compute::context;
use gpu_compute::error::GpuResult;
use gpu_compute::tensor::{Tensor, Shape};
use gpu_compute::ops::elementwise;

fn main() -> GpuResult<()> {
    context::initialize()?;
    
    context::with_context(|ctx| {
        // Create tensors on device
        let shape = Shape::new(vec![2, 3]);
        let a = Tensor::ones(shape.clone(), ctx.current_device())?;
        let b = Tensor::from_vec(vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape, ctx.current_device())?;
        
        // Perform operations
        let c = elementwise::add(&a, &b)?;
        
        // Transfer result to host and print
        let host_result = c.to_host_vec()?;
        println!("Result: {:?}", host_result);
        
        Ok(())
    })?;
    
    context::shutdown()?;
    
    Ok(())
}
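
Since a is a 2×3 tensor of ones, the printed result should be Result: [2.0, 3.0, 4.0, 5.0, 6.0, 7.0].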

Examples

Matrix Multiplication

use gpu_compute::context;
use gpu_compute::error::GpuResult;
use gpu_compute::tensor::{Tensor, Shape};
use gpu_compute::ops::blas_level2_3;

fn main() -> GpuResult<()> {
    context::initialize()?;
    
    context::with_context(|ctx| {
        // Create matrices
        let a = Tensor::from_vec(
            vec![1.0, 2.0, 3.0, 4.0], 
            Shape::new(vec![2, 2]), 
            ctx.current_device()
        )?;
        
        let b = Tensor::from_vec(
            vec![5.0, 6.0, 7.0, 8.0], 
            Shape::new(vec![2, 2]), 
            ctx.current_device()
        )?;
        
        // Perform matrix multiplication (the two boolean flags presumably
        // toggle transposition of a and b, following BLAS convention)
        let c = blas_level2_3::matmul(&a, &b, false, false)?;
        
        // Transfer result to host and print
        let host_result = c.to_host_vec()?;
        println!("Result: {:?}", host_result);
        
        Ok(())
    })?;
    
    context::shutdown()?;
    
    Ok(())
}
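
For these row-major 2×2 inputs the product is [[19, 22], [43, 50]], so the printed vector should be [19.0, 22.0, 43.0, 50.0] (assuming to_host_vec returns elements in row-major order).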
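
Vector Dot Product

The feature list mentions Level 1-3 BLAS operations, but only a Level 2/3 call (matmul) is shown above. The sketch below assumes a blas_level1 module with a dot function, named by analogy with blas_level2_3; the actual module path, function name, and return type may differ.

use gpu_compute::context;
use gpu_compute::error::GpuResult;
use gpu_compute::tensor::{Tensor, Shape};
// Assumed module name, by analogy with ops::blas_level2_3
use gpu_compute::ops::blas_level1;

fn main() -> GpuResult<()> {
    context::initialize()?;
    
    context::with_context(|ctx| {
        // Create two length-4 vectors on the current device
        let shape = Shape::new(vec![4]);
        let x = Tensor::from_vec(vec![1.0, 2.0, 3.0, 4.0], shape.clone(), ctx.current_device())?;
        let y = Tensor::from_vec(vec![4.0, 3.0, 2.0, 1.0], shape, ctx.current_device())?;
        
        // Hypothetical Level 1 dot product: 1*4 + 2*3 + 3*2 + 4*1 = 20
        let d = blas_level1::dot(&x, &y)?;
        println!("Dot product: {:?}", d);
        
        Ok(())
    })?;
    
    context::shutdown()?;
    
    Ok(())
}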

Building and Testing

# Build with CUDA support
cargo build --features cuda

# Build with ROCm support
cargo build --features rocm
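
# Build with both backends
cargo build --features "cuda,rocm"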

# Run tests
cargo test --features cuda  # or --features rocm

# Run benchmarks
cargo bench --features cuda  # or --features rocm

Performance Profiling

use gpu_compute::context;
use gpu_compute::error::GpuResult;
use gpu_compute::debug::profiler;

fn main() -> GpuResult<()> {
    context::initialize()?;
    
    context::with_context(|ctx| {
        // Start profiling
        let mut profile_session = profiler::ProfileSession::new("my_operation")?;
        
        // ... your GPU operations here ...
        
        // End profiling and print results
        profile_session.end()?;
        profile_session.print_summary();
        
        Ok(())
    })?;
    
    context::shutdown()?;
    
    Ok(())
}

Current Limitations

This library has several significant limitations and known issues:

  • No Automatic Differentiation: Unlike PyTorch or TensorFlow, there is no autodiff support for deep learning
  • Limited Tensor Operations: Many common operations are still missing or incomplete
  • Performance Issues: Several operations have suboptimal implementations
  • Memory Management: Inefficient memory slicing and no caching allocators
  • No Multi-GPU Support: No infrastructure for distributed computation
  • Limited Documentation: API documentation is minimal
  • No Python Bindings: No easy integration with the Python ML ecosystem
  • Testing Coverage: Incomplete test coverage for many components

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Areas where help is especially needed:

  • Adding missing tensor operations
  • Improving performance of existing operations
  • Expanding test coverage
  • Enhancing documentation
  • Adding Python bindings

License

This project is licensed under the MIT License - see the LICENSE file for details.
