
Espresso + your ANE training work = complete ANE ecosystem? #49

@christopherkarani

Hi @maderix,

I've been following your Substack series "Inside the M4 Apple Neural Engine" and your work on ANE training (maderix/ANE). The efficiency numbers and your findings on the convolution vs. matmul throughput differences are impressive.

I'm working on the inference side of the same problem space with Espresso (https://github.com/christopherkarani/Espresso), a pure-Swift framework that achieves 519 tok/s on M3 Max for transformer inference using the same private MIL APIs (a 3.41x speedup over CoreML).

Our work is complementary:

  • You've mapped ANE training primitives and the full software stack to the IOKit layer
  • We've mapped ANE inference kernel fusion patterns (fused RWKV-style decode, lane-packed attention, triplet-layer fusion)

There's a natural joint story here: a combined piece on "The complete ANE developer toolkit — training and inference." The Hacker News thread on your Part 1 shows the community is hungry for this kind of deep work.

Would you be interested in:

  1. A joint blog post comparing our benchmark methodologies and results?
  2. Cross-referencing findings (your 19 TFLOPS FP16, our 519 tok/s decode path)?
  3. Exploring whether Espresso could serve as an inference runtime for models trained via your framework?
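On point 1, throughput numbers only compare cleanly if both sides time the same thing. A minimal sketch of the normalization I'd propose (Python for neutrality; `decode_step`, the warm-up count, and the step count are illustrative placeholders, not Espresso's or your harness):

```python
import time

def measure_tokens_per_second(decode_step, steps=256, warmup=8):
    """Time `steps` single-token decode calls and return tok/s.

    `decode_step` stands in for one forward pass. The warm-up iterations
    keep one-time costs (weight loading, ANE graph compilation) out of
    the steady-state rate, which is what both projects report.
    """
    for _ in range(warmup):
        decode_step()
    start = time.perf_counter()
    for _ in range(steps):
        decode_step()
    elapsed = time.perf_counter() - start
    return steps / elapsed
```

If we both report steady-state decode rate measured this way (warm-up excluded, single-token steps, wall clock), the 519 tok/s and your TFLOPS figures become directly relatable.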

This is a genuinely novel space, and combining our research could make a strong contribution to the community.

— Chris
https://github.com/christopherkarani/Espresso
