Overview
Optimize the CPU kernel implementations in the csrc/cpu directory to use SIMD instructions (AVX2) and multi-threading (OpenMP) for faster tensor operations. The goal is to speed up inference workloads on CPU and make the library more efficient for production deployments.
Operations to Optimize
Binary Operations (Priority 1)
- add_ops, add_scalar_ops
- sub_ops, sub_scalar_ops
- mul_ops, mul_scalar_ops
- div_ops, div_scalar_ops
- pow_array_ops, pow_scalar_ops
- Broadcasted operations: add/sub/mul/div_broadcasted_array_ops
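For the scalar variants, the scalar operand can be broadcast into a vector register once, outside the loop. The following is a minimal sketch of that idea for multiplication. The function name and signature are hypothetical and may not match the actual mul_scalar_ops in csrc/cpu; the AVX path is only compiled when the compiler defines `__AVX2__`, otherwise the plain loop handles everything.

```c
#include <stddef.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Hypothetical sketch: out[i] = a[i] * scalar. */
void mul_scalar_sketch(const float* a, float scalar, float* out, size_t size) {
    size_t i = 0;
#if defined(__AVX2__)
    /* Broadcast the scalar into all 8 lanes once, before the loop. */
    const __m256 vs = _mm256_set1_ps(scalar);
    for (; i + 8 <= size; i += 8) {
        _mm256_storeu_ps(out + i, _mm256_mul_ps(_mm256_loadu_ps(a + i), vs));
    }
#endif
    /* Scalar tail (or the whole array when AVX2 is unavailable). */
    for (; i < size; i++) {
        out[i] = a[i] * scalar;
    }
}
```

The same shape should carry over to add, sub and div by swapping the intrinsic and the tail expression.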
Helper Operations (Priority 1)
- ones_array_ops, zeros_array_ops
- fill_array_ops
- linspace_array_ops
- arange_array_ops (already efficient, low priority)
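The fill-style helpers only write memory, so a vectorized version needs just a broadcast and a store. Below is a rough sketch, assuming a float buffer; `fill_array_sketch` is a hypothetical name, not the existing fill_array_ops signature. Under that assumption, ones and zeros could be thin wrappers that pass 1.0f or 0.0f.

```c
#include <stddef.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Hypothetical sketch: set the first `size` elements of out to value. */
void fill_array_sketch(float* out, float value, size_t size) {
    size_t i = 0;
#if defined(__AVX2__)
    const __m256 v = _mm256_set1_ps(value);  /* value in all 8 lanes */
    for (; i + 8 <= size; i += 8) {
        _mm256_storeu_ps(out + i, v);        /* unaligned 8-float store */
    }
#endif
    /* Remaining elements, or everything when AVX2 is unavailable. */
    for (; i < size; i++) {
        out[i] = value;
    }
}
```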
Future Operations (Priority 2)
- Unary ops: sqrt, exp, log, sin, cos, tan
- Reduction ops: sum, mean, min, max, var, std, clip, clamp
- Shape ops: transpose, reshape, smaller, greater, flatten
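Reductions do not fit the element-wise pattern directly, because every iteration updates the same accumulator. One possible approach, sketched here for sum, is an OpenMP `reduction` clause, which gives each thread a private partial sum and combines them at the end. The function name and the reuse of a 4096-element threshold are assumptions for illustration. Note that a parallel reduction changes the order of floating-point additions, so results may differ slightly from a serial loop.

```c
#include <stddef.h>

#define OMP_THRESHOLD 4096

/* Hypothetical sketch: sum of the first `size` elements of a. */
float sum_ops_sketch(const float* a, size_t size) {
    float total = 0.0f;
    /* Without OpenMP enabled the pragma is ignored and the loop runs serially.
       The if clause keeps small inputs on a single thread. */
    #pragma omp parallel for reduction(+:total) schedule(static) if (size >= OMP_THRESHOLD)
    for (long long i = 0; i < (long long)size; i++) {
        total += a[i];
    }
    return total;
}
```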
Core Operations (Priority 3)
- core file
- dtype & assignments
- contiguous ops on arrays
Implementation Strategy
Adaptive Thresholds
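```c
#define SIMD_THRESHOLD 64    // minimum element count for the AVX2 path
#define OMP_THRESHOLD 4096   // minimum element count for the OpenMP path
```

Template:

```c
void operation(const float* a, const float* b, float* out, size_t size) {
#if USE_AVX2
    if (size >= SIMD_THRESHOLD) {
        // Largest multiple of 8 that is <= size (8 floats per 256-bit register).
        const size_t simd_size = size & ~(size_t)7;
#ifdef _OPENMP
        if (size >= OMP_THRESHOLD) {
            #pragma omp parallel for schedule(static)
            for (size_t i = 0; i < simd_size; i += 8) {
                // SIMD body: load 8 floats from a and b, apply the op, store to out
            }
        } else
#endif
        {
            for (size_t i = 0; i < simd_size; i += 8) {
                // same SIMD body, single-threaded
            }
        }
        for (size_t i = simd_size; i < size; i++) {
            // scalar tail for the remaining size % 8 elements
        }
        return;
    }
#endif
#ifdef _OPENMP
    if (size >= OMP_THRESHOLD) {
        #pragma omp parallel for schedule(static)
        for (size_t i = 0; i < size; i++) {
            // scalar body, multi-threaded
        }
        return;
    }
#endif
    for (size_t i = 0; i < size; i++) {
        // scalar body
    }
}
```

| Operation | AVX2 Intrinsic |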
|---|---|
| Load | _mm256_loadu_ps |
| Store | _mm256_storeu_ps |
| Set scalar | _mm256_set1_ps |
| Add | _mm256_add_ps |
| Sub | _mm256_sub_ps |
| Mul | _mm256_mul_ps |
| Div | _mm256_div_ps |
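
As a concrete instance of the template, here is a sketch of element-wise addition built from the load, add and store intrinsics in the table. It is illustrative only: `add_ops_sketch` is a hypothetical name, the real add_ops signature may differ, and the `#ifdef` branches of the template are condensed into OpenMP `if` clauses. The SIMD path is guarded by the compiler-defined `__AVX2__` macro (rather than the issue's `USE_AVX2`) so the block also builds without AVX2 support.

```c
#include <stddef.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

#define SIMD_THRESHOLD 64
#define OMP_THRESHOLD 4096

/* Hypothetical sketch: out[i] = a[i] + b[i]. */
void add_ops_sketch(const float* a, const float* b, float* out, size_t size) {
    size_t done = 0;  /* elements already handled by the SIMD path */
#if defined(__AVX2__)
    if (size >= SIMD_THRESHOLD) {
        const long long simd_size = (long long)(size & ~(size_t)7);
        /* Threads are only used for large inputs; pragma is ignored without OpenMP. */
        #pragma omp parallel for schedule(static) if (size >= OMP_THRESHOLD)
        for (long long i = 0; i < simd_size; i += 8) {
            __m256 va = _mm256_loadu_ps(a + i);
            __m256 vb = _mm256_loadu_ps(b + i);
            _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
        }
        done = (size_t)simd_size;
    }
#endif
    /* Scalar tail after the SIMD path, or the full array when SIMD was skipped.
       After the SIMD path fewer than 8 elements remain, so this stays serial. */
    #pragma omp parallel for schedule(static) if (size - done >= OMP_THRESHOLD)
    for (long long i = (long long)done; i < (long long)size; i++) {
        out[i] = a[i] + b[i];
    }
}
```

With GCC or Clang this would typically be built with `-mavx2 -fopenmp`. The signed loop index is a conservative choice, since older OpenMP implementations only accept signed loop variables.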