Performance drop for small to medium sizes on f32 when using .process_with_scratch() #169

@Shnatsel

I've been benchmarking RustFFT with criterion to compare it against my own FFT implementation. To my surprise, RustFFT is faster for small to medium f32 sizes (up to 2097152 elements) when called through .process() than through .process_with_scratch(). The effect does not appear for f64, which is always faster with .process_with_scratch().

I can reproduce this on a Zen 4 CPU on Linux but not on an Apple M4 with macOS, where .process_with_scratch() is neutral or beneficial even for f32.

The exact code used for the measurements can be found in QuState/PhastFT#81

The command to run benchmarks is cargo bench --bench=bench RustFFT; you can run it before and after the PR linked above to reproduce the measurements.

I find this very surprising considering that the implementation of .process() is just this:

RustFFT/src/lib.rs

Lines 195 to 198 in 4758ab0

fn process(&self, buffer: &mut [Complex<T>]) {
    let mut scratch = vec![Complex::zero(); self.get_inplace_scratch_len()];
    self.process_with_scratch(buffer, &mut scratch);
}

I'm not sure what could be done about it, so feel free to close this. But it's an interesting enough and dramatic enough anomaly that I figured I should let you know.
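
For readers who want to see the two call patterns side by side without checking out the linked PR, here is a minimal std-only sketch. It does not use RustFFT: the `Transform` trait, the `Reverse` type, and the `time_it` helper are hypothetical stand-ins that only mirror the shape of the quoted `process()` (allocate zeroed scratch, then delegate). The wall-clock timing is a rough illustration, not a substitute for the criterion benchmarks described above.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for an FFT object; this is NOT RustFFT's trait.
trait Transform {
    fn get_inplace_scratch_len(&self) -> usize;
    fn process_with_scratch(&self, buffer: &mut [f32], scratch: &mut [f32]);

    /// Same shape as the quoted `process()`: allocate zeroed scratch
    /// on every call, then delegate to `process_with_scratch()`.
    fn process(&self, buffer: &mut [f32]) {
        let mut scratch = vec![0.0f32; self.get_inplace_scratch_len()];
        self.process_with_scratch(buffer, &mut scratch);
    }
}

/// Toy transform that reverses the buffer by copying through the scratch space.
struct Reverse {
    len: usize,
}

impl Transform for Reverse {
    fn get_inplace_scratch_len(&self) -> usize {
        self.len
    }

    fn process_with_scratch(&self, buffer: &mut [f32], scratch: &mut [f32]) {
        let n = buffer.len();
        scratch[..n].copy_from_slice(&buffer[..n]);
        for (dst, src) in buffer.iter_mut().zip(scratch[..n].iter().rev()) {
            *dst = *src;
        }
    }
}

/// Runs `f` a fixed number of times and returns the total elapsed time.
fn time_it<F: FnMut()>(iterations: u32, mut f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iterations {
        f();
    }
    start.elapsed()
}

fn main() {
    let n = 1 << 12;
    let t = Reverse { len: n };
    let mut a: Vec<f32> = (0..n).map(|i| i as f32).collect();
    let mut b = a.clone();

    // Pattern 1: scratch is allocated inside every call.
    let alloc_each_call = time_it(1000, || t.process(&mut a));

    // Pattern 2: scratch is allocated once and reused across calls.
    let mut scratch = vec![0.0f32; t.get_inplace_scratch_len()];
    let reused = time_it(1000, || t.process_with_scratch(&mut b, &mut scratch));

    // Both paths must compute the same result.
    assert_eq!(a, b);
    println!("allocate per call: {:?}", alloc_each_call);
    println!("reused scratch:    {:?}", reused);
}
```

The only difference between the two patterns is where the scratch buffer comes from, which is why the measured gap in favor of `.process()` is unexpected.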
