Implement CUDA tensor operations

## Title: Implement CUDA tensor operations

### Description:
We need at least the following tensor operations implemented in CUDA. The remaining operations can be implemented later.

### Tasks:
- [ ] Implement `cpu_to_cuda` to copy tensor data from CPU to CUDA.
- [ ] Implement `cuda_to_cpu` to copy tensor data from CUDA to CPU.
- [ ] Implement `free_cuda` to free CUDA memory allocated for tensor data.
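
A minimal sketch of how the three memory tasks above might look. The `Tensor` struct, its fields, the `bool` return values, and the assumption that host data is allocated with `malloc` are all hypothetical; the project's real tensor type and error handling may differ.

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical minimal tensor type, used only for this sketch.
struct Tensor {
    float* data;     // host or device pointer, depending on on_device
    int size;        // number of elements
    bool on_device;  // true when data lives in CUDA memory
};

// Copy tensor data from CPU to CUDA. Returns true on success.
// Assumes the host buffer was allocated with malloc.
bool cpu_to_cuda(Tensor* t) {
    if (t->on_device) return true;
    size_t bytes = (size_t)t->size * sizeof(float);
    float* d = nullptr;
    if (cudaMalloc((void**)&d, bytes) != cudaSuccess) return false;
    if (cudaMemcpy(d, t->data, bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(d);
        return false;
    }
    free(t->data);   // release the old host buffer
    t->data = d;
    t->on_device = true;
    return true;
}

// Copy tensor data from CUDA back to CPU. Returns true on success.
bool cuda_to_cpu(Tensor* t) {
    if (!t->on_device) return true;
    size_t bytes = (size_t)t->size * sizeof(float);
    float* h = (float*)malloc(bytes);
    if (h == nullptr) return false;
    if (cudaMemcpy(h, t->data, bytes, cudaMemcpyDeviceToHost) != cudaSuccess) {
        free(h);
        return false;
    }
    cudaFree(t->data);  // release the old device buffer
    t->data = h;
    t->on_device = false;
    return true;
}

// Free CUDA memory held by the tensor; does nothing for host tensors.
void free_cuda(Tensor* t) {
    if (!t->on_device) return;
    cudaFree(t->data);
    t->data = nullptr;
    t->on_device = false;
}
```

Tracking the location in a flag, as assumed here, lets each function be a no-op when the data is already where the caller wants it.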

### Kernel and Host Functions:
- [ ] Implement `add_tensor_cuda_kernel` for elementwise tensor addition.
- [ ] Implement `add_broadcasted_tensor_cuda_kernel` for elementwise tensor addition with broadcasting.
- [ ] Implement `sub_broadcasted_tensor_cuda_kernel` for elementwise tensor subtraction with broadcasting.
- [ ] Implement `sum_tensor_cuda_kernel` for summing elements in a tensor.
- [ ] Implement `max_tensor_cuda_kernel` for finding the maximum value in a tensor.
- [ ] Implement `min_tensor_cuda_kernel` for finding the minimum value in a tensor.
- [ ] Implement `sub_tensor_cuda_kernel` for elementwise tensor subtraction.
- [ ] Implement `elementwise_mul_tensor_cuda_kernel` for elementwise tensor multiplication.
- [ ] Implement `scalar_mul_tensor_cuda_kernel` for scalar multiplication of a tensor.
- [ ] Implement `scalar_div_tensor_cuda_kernel` for dividing a scalar by each element of a tensor.
- [ ] Implement `tensor_div_scalar_cuda_kernel` for dividing each element of a tensor by a scalar.
- [ ] Implement `tensor_div_tensor_cuda_kernel` for elementwise tensor division.
- [ ] Implement `matmul_tensor_cuda_kernel` for matrix multiplication of two tensors.
- [ ] Implement `batched_matmul_tensor_cuda_kernel` for batched matrix multiplication.
- [ ] Implement `broadcasted_batched_matmul_tensor_cuda_kernel` for batched matrix multiplication with broadcasting.
- [ ] Implement `tensor_pow_scalar_cuda_kernel` for raising tensor elements to a scalar power.
- [ ] Implement `scalar_pow_tensor_cuda_kernel` for raising a scalar base to the power of each tensor element.
- [ ] Implement `log_tensor_cuda_kernel` for computing the logarithm of each element in a tensor.
- [ ] Implement `equal_tensor_cuda_kernel` for checking elementwise equality between two tensors.
- [ ] Implement `equal_broadcasted_tensor_cuda_kernel` for checking elementwise equality between two tensors with broadcasting.
- [ ] Implement `ones_like_tensor_cuda_kernel` for creating a tensor of ones with the same shape as the input tensor.
- [ ] Implement `zeros_like_tensor_cuda_kernel` for creating a tensor of zeros with the same shape as the input tensor.
- [ ] Implement `transpose_1D_tensor_cuda_kernel` for transposing a 1D tensor.
- [ ] Implement `transpose_2D_tensor_cuda_kernel` for transposing a 2D tensor.
- [ ] Implement `transpose_3D_tensor_cuda_kernel` for transposing a 3D tensor.
- [ ] Implement `assign_tensor_cuda_kernel` for assigning data to a tensor.
- [ ] Implement `make_contiguous_tensor_cuda_kernel` for making a tensor contiguous in memory.
- [ ] Implement `sin_tensor_cuda_kernel` for applying the sine function elementwise to a tensor.
- [ ] Implement `cos_tensor_cuda_kernel` for applying the cosine function elementwise to a tensor.
- [ ] Implement `sigmoid_tensor_cuda_kernel` for applying the sigmoid function elementwise to a tensor.
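
As an illustration of the kernel and host function pairing, here is a minimal sketch of `add_tensor_cuda_kernel` with a launcher. The kernel signature, the host wrapper name `add_tensor_cuda`, the `float` element type, and the block size of 256 are assumptions rather than the project's actual interface. The other elementwise kernels in the list could plausibly follow the same indexing and bounds-check pattern.

```cuda
#include <cuda_runtime.h>

#define THREADS_PER_BLOCK 256  // assumed block size; tune for the target GPU

// One thread per element: out[i] = a[i] + b[i].
__global__ void add_tensor_cuda_kernel(const float* a, const float* b,
                                       float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // guard threads past the end of the last block
        out[i] = a[i] + b[i];
    }
}

// Hypothetical host wrapper. All three pointers must be device pointers
// to buffers holding at least n floats.
cudaError_t add_tensor_cuda(const float* a, const float* b,
                            float* out, int n) {
    if (n <= 0) return cudaSuccess;
    // Round up so that every element is covered by some thread.
    int blocks = (n + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
    add_tensor_cuda_kernel<<<blocks, THREADS_PER_BLOCK>>>(a, b, out, n);
    return cudaGetLastError();  // reports launch errors only
}
```

The launch is asynchronous, so the returned status covers launch failures only; a later `cudaMemcpy` or `cudaDeviceSynchronize` would surface errors raised while the kernel runs.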