Performance drop for small to medium sizes on `f32` when using `.process_with_scratch()`

I've been benchmarking RustFFT with criterion to compare it against my own FFT implementation. To my surprise, for small to medium `f32` sizes (up to 2097152 elements, i.e. 2^21), RustFFT is faster with `.process()` than with `.process_with_scratch()`. The effect does not appear for `f64`, which is always faster with `.process_with_scratch()`.

I can reproduce this on a Zen 4 CPU on Linux, but not on an Apple M4 on macOS, where `.process_with_scratch()` is neutral or beneficial even for `f32`.

The exact code used for the measurements can be found in https://github.com/QuState/PhastFT/pull/81

The command to run benchmarks is `cargo bench --bench=bench RustFFT`; you can run it before and after the PR linked above to reproduce the measurements.

I find this very surprising considering that the implementation of `.process()` is just this:

https://github.com/ejmahler/RustFFT/blob/4758ab0dd6f256c50ac8987c75c9cb96152dc2ca/src/lib.rs#L195-L198

I'm not sure what could be done about it, so feel free to close this. But it's an interesting enough and dramatic enough anomaly that I figured I should let you know.

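The `process()` implementation referenced above (`src/lib.rs`, lines 195-198 at the linked commit):

    fn process(&self, buffer: &mut [Complex<T>]) {
        let mut scratch = vec![Complex::zero(); self.get_inplace_scratch_len()];
        self.process_with_scratch(buffer, &mut scratch);
    }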
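Since the only difference between the two entry points is whether a zeroed scratch buffer is allocated on every call, that difference can be isolated in a few lines. The following is a minimal sketch under stated assumptions, not a reproduction of the anomaly: it does not use RustFFT, `transform` and `transform_with_scratch` are hypothetical stand-ins that merely touch both buffers, and `time_it` is a crude timer rather than criterion. It only shows the two call shapes side by side and checks that they produce identical results.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for an in-place transform that needs scratch space.
/// This is NOT an FFT; it copies a scaled version of `buffer` into `scratch`
/// and adds it back, so each element ends up multiplied by 1.5.
fn transform_with_scratch(buffer: &mut [f32], scratch: &mut [f32]) {
    assert!(scratch.len() >= buffer.len());
    for (s, b) in scratch.iter_mut().zip(buffer.iter()) {
        *s = *b * 0.5;
    }
    for (b, s) in buffer.iter_mut().zip(scratch.iter()) {
        *b += *s;
    }
}

/// Same shape as the `process()` wrapper quoted above:
/// allocate a zeroed scratch buffer on every call, then delegate.
fn transform(buffer: &mut [f32]) {
    let mut scratch = vec![0.0f32; buffer.len()];
    transform_with_scratch(buffer, &mut scratch);
}

/// Crude wall-clock timer (an assumption for illustration; use criterion for real measurements).
fn time_it(iters: u32, mut f: impl FnMut()) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed()
}

fn main() {
    let n = 1 << 16; // arbitrary size chosen for the sketch
    let iters = 100;

    let mut a = vec![1.0f32; n];
    let mut b = a.clone();
    let mut scratch = vec![0.0f32; n];

    // Fresh scratch allocation per call.
    let fresh = time_it(iters, || transform(black_box(&mut a[..])));
    // One scratch buffer reused across all calls.
    let reused = time_it(iters, || {
        transform_with_scratch(black_box(&mut b[..]), black_box(&mut scratch[..]))
    });

    // Both paths perform the same arithmetic, so the results must match exactly.
    assert_eq!(a, b);
    println!("n = {}, iters = {}", n, iters);
    println!("fresh scratch per call: {:?}", fresh);
    println!("reused scratch:         {:?}", reused);
}
```

Build with optimizations (for example `rustc -O` or `cargo run --release`) before comparing the two timings; the numbers depend on the allocator and platform, so no expected output is given here.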
