Checklist
Issue or Suggestion Description
Hi,
When deploying a TensorFlow Lite model with ESP-TFLite-Micro on the ESP32-S3, I observed significantly lower inference speed than expected. Comparing against a standalone test application that invokes the fully connected operator directly, the test application runs about 5x faster than the ESP-TFLite-Micro version. What could be causing this performance gap?
The same IDF environment and the same sdkconfig were used for both the test_app and the tflite-micro builds.
esp-tflite-micro:

test_app in esp-nn:
