Description
Checklist
- Checked the issue tracker for similar issues to ensure this is not a duplicate.
- Provided a clear description of your suggestion.
- Included any relevant context or examples.
Issue or Suggestion Description
Hi Espressif team,
I wanted to share a small ESP32-C3 language-runtime experiment and ask whether workloads like this are of interest from an esp-nn point of view.
We built a public demo line called Engram and deployed it on a commodity ESP32-C3.
Current public numbers:
- Host-side benchmark capability
  - LogiQA = 0.392523
  - IFEval = 0.780037
- Published board proof
  - LogiQA (642-item set): 249 / 642 = 0.3878504672897196
  - host_full_match = 642 / 642
  - runtime artifact size = 1,380,771 bytes
Important scope note:
This is not presented as unrestricted, open-input, native LLM generation on an MCU.
The board-side path is closer to a flash-resident, table-driven runtime with:
- packed token weights
- hashed lookup structures
- fixed compiled probe batches
- streaming fold / checksum style execution over precompiled structures
So this is not a standard dense neural-network kernel workload. It is much more lookup-heavy and traversal-heavy under tight RAM constraints.
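To make the workload shape concrete, here is a minimal C sketch of the execution model described above: a flash-resident table of packed entries, a hashed lookup, and a streaming fold over a precompiled probe batch. All names, the struct layout, and the FNV-style fold are illustrative assumptions, not Engram's actual format.

```c
/* Hypothetical sketch of a table-driven lookup + streaming fold runtime.
 * Layout and constants are illustrative, not Engram's real on-flash format. */
#include <stdint.h>
#include <stddef.h>

/* Packed token-weight entry, assumed to live in flash (e.g. const/DROM). */
typedef struct {
    uint32_t key_hash;   /* hashed token key */
    uint16_t weight;     /* packed weight value */
} packed_entry_t;

/* Linear-probing lookup into a flash-resident table; O(1) RAM. */
static const packed_entry_t *lookup(const packed_entry_t *table,
                                    size_t table_len, uint32_t key_hash)
{
    for (size_t i = 0; i < table_len; i++) {
        size_t slot = (key_hash + i) % table_len;
        if (table[slot].key_hash == key_hash)
            return &table[slot];
    }
    return NULL;
}

/* Streaming fold: accumulate a checksum-style reduction over a
 * precompiled batch of probe keys, keeping only one accumulator in RAM. */
uint32_t run_probe_batch(const packed_entry_t *table, size_t table_len,
                         const uint32_t *probe_keys, size_t n_probes)
{
    uint32_t acc = 0x811C9DC5u;                 /* FNV-1a style seed */
    for (size_t i = 0; i < n_probes; i++) {
        const packed_entry_t *e = lookup(table, table_len, probe_keys[i]);
        uint32_t v = e ? e->weight : 0;
        acc = (acc ^ v) * 16777619u;            /* fold step */
    }
    return acc;
}
```

The point of the sketch is the access pattern: sequential flash reads plus a tiny rolling accumulator, rather than dense multiply-accumulate over activations.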
Repo:
https://github.com/Alpha-Guardian/Engram
The question I’m interested in is whether systems like this suggest a useful class of low-level support below the usual dense operator path, for example:
- low-RAM packed lookup traversal
- flash-resident table access patterns
- lightweight reduction / fold kernels over compact structures
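As a concrete illustration of the kind of primitive meant above, here is a hypothetical signature for a fold kernel over 4-bit packed values. This is not an existing esp-nn API; it only shows the kernel shape (unpack sub-byte values from a compact flash-resident buffer, reduce into one accumulator) that such support might target.

```c
/* Hypothetical primitive: fold (sum) over 4-bit packed unsigned values.
 * NOT an esp-nn API; illustrates the suggested kernel shape only. */
#include <stdint.h>
#include <stddef.h>

/* Two 4-bit values per byte, little-nibble first; O(1) RAM reduction. */
int32_t packed_u4_fold_sum(const uint8_t *packed, size_t n_values)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n_values; i++) {
        uint8_t byte = packed[i >> 1];
        uint8_t v = (i & 1) ? (uint8_t)(byte >> 4) : (uint8_t)(byte & 0x0F);
        acc += v;
    }
    return acc;
}
```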
If the answer is “this is completely outside esp-nn scope”, that is also useful for us to know.