Not a big deal, since the first iteration of the KISS_FFT benchmark works, but the 2nd and 3rd iterations report:
KISS FFT: N = 1024 in *** NOT ENOUGH TEMP MEMORY ***
Is there a memory leak?
FYI, I ran the benchmark on Teensy 3.2/3.5/3.6 and confirmed your results. I also ran it on a Dragonfly (STM32L4 @ 80 MHz) and an mbed K64F @ 120 MHz. Speedup relative to the T3.2 @ 96 MHz for float, N = 128:
Dragonfly: 7.7
mbed K64F: 12.97
The mbed K64F is faster than the Teensy 3.5, probably because mbed builds with ARM GCC at -O3.