
Espresso + your ANE training work = complete ANE ecosystem? #49

@christopherkarani

Hi @maderix,

I've been following your Substack series "Inside the M4 Apple Neural Engine" and your work on ANE training (maderix/ANE). The efficiency numbers and your findings on the convolution vs. matmul throughput differences are impressive.

I'm working on the inference side of the same problem space with Espresso (https://github.com/christopherkarani/Espresso), a pure-Swift framework that achieves 519 tok/s on M3 Max for transformer inference using the same private MIL APIs (a 3.41x speedup over CoreML).

Our work is complementary:

  • You've mapped ANE training primitives and the full software stack to the IOKit layer
  • We've mapped ANE inference kernel fusion patterns (fused RWKV-style decode, lane-packed attention, triplet-layer fusion)

There's a natural joint story here: a combined piece on "The complete ANE developer toolkit — training and inference." The Hacker News thread on your Part 1 shows the community is hungry for this kind of deep work.

Would you be interested in:

  1. A joint blog post comparing our benchmark methodologies and results?
  2. Cross-referencing findings (your 19 TFLOPS FP16, our 519 tok/s decode path)?
  3. Exploring whether Espresso could serve as an inference runtime for models trained via your framework?
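On point 1, throughput numbers only compare cleanly if both sides time the same thing. A minimal sketch of the normalization I'd propose (Python for neutrality; `decode_step`, the warm-up count, and the step count are illustrative placeholders, not Espresso's or your harness):

```python
import time

def measure_tokens_per_second(decode_step, steps=256, warmup=8):
    """Time `steps` single-token decode calls and return tok/s.

    `decode_step` stands in for one forward pass. The warm-up iterations
    keep one-time costs (weight loading, ANE graph compilation) out of
    the steady-state rate, which is what both projects report.
    """
    for _ in range(warmup):
        decode_step()
    start = time.perf_counter()
    for _ in range(steps):
        decode_step()
    elapsed = time.perf_counter() - start
    return steps / elapsed
```

If we both report steady-state decode rate measured this way (warm-up excluded, single-token steps, wall clock), the 519 tok/s and your TFLOPS figures become directly relatable.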

This is a genuinely novel space, and combining our research could make a strong contribution to the community.

— Chris
https://github.com/christopherkarani/Espresso
