Checklist
Issue or Suggestion Description
Hi,
When deploying a TensorFlow Lite model with ESP-TFLite-Micro on the ESP32-S3, I observed significantly lower inference speed than expected. Comparing against a standalone test application that invokes the fully connected operator directly, the test application runs about 5x faster than the ESP-TFLite-Micro version. What could be causing this performance gap?
The same IDF environment and the same sdkconfig were used for both the test_app and the tflite-micro builds.
esp-tflite-micro:

test_app in esp-nn:
