Introducing momo-kiji: CUDA for Apple Neural Engine
We're building the open-source SDK for the Apple Neural Engine, aiming to do for the ANE what CUDA did for GPU computing.
Apple Neural Engine has been around since 2017. It powers:
- Every Mac since M1
- Every modern iPhone and iPad
- Every Apple TV
Yet there's still no official SDK to use it.
Developers who want to harness ANE fall back to CoreML (a black box), reverse-engineer private APIs (undocumented and risky), or hope their models just work (they don't, consistently).
We're fixing that. Today, we're introducing momo-kiji.
The Problem
Imagine if NVIDIA released GPUs with no CUDA SDK. Developers would be frustrated, performance would be mysterious, and most of the GPU's potential would go untapped.
That's the ANE situation today.
The numbers are staggering:
- ANE is 10-100x more efficient than GPUs for inference
- Billions of Apple devices have ANE
- Yet developers have no standard way to use it
Why This Matters
- Efficiency: ANE runs inference at a fraction of the power a GPU would draw
- Privacy: Models run locally; user data never leaves the device
- Accessibility: Advanced ML reaches consumer hardware at scale
- Economics: On-device inference is becoming a billion-dollar market
The Current State
Today, developing for ANE is like the wild west:
- CoreML gives you limited control (it's a black box)
- Power users reverse-engineer private APIs (against the terms of service)
- Researchers publish findings scattered across papers and GitHub repos
- Performance varies wildly (up to 20x, and no one knows why)
This is unacceptable.
Introducing momo-kiji
momo-kiji is an open-source framework that brings clarity, control, and community to ANE development.
We're building what Apple hasn't: a unified, open SDK for the Apple Neural Engine.
What We're Building
- High-Level API: familiar to ML developers
- Open Intermediate Representation: a documented IR standard for ANE
- Compiler Framework: transparent, with optimization passes
- Debugging & Profiling Tools: built in, for understanding performance
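To make the IR and compiler ideas concrete, here is a toy sketch in plain Python of what an open intermediate representation with a documented optimization pass could look like. Everything here — the `Op` structure, the op names, the `fuse_mul_add` pass — is illustrative only, not the actual momo-kiji design:

```python
from dataclasses import dataclass

# Toy IR: a program is a flat list of ops over named tensors.
# Op names and fields are illustrative, not a real momo-kiji spec.
@dataclass(frozen=True)
class Op:
    name: str       # e.g. "mul", "add", "fma"
    inputs: tuple   # names of input tensors
    output: str     # name of the output tensor

def fuse_mul_add(program):
    """Example optimization pass: fold a mul feeding an add into one
    fused multiply-add, mirroring the FMA units common in NPUs."""
    fused, i = [], 0
    while i < len(program):
        cur = program[i]
        nxt = program[i + 1] if i + 1 < len(program) else None
        if (cur.name == "mul" and nxt is not None
                and nxt.name == "add" and cur.output in nxt.inputs):
            other = next(t for t in nxt.inputs if t != cur.output)
            fused.append(Op("fma", cur.inputs + (other,), nxt.output))
            i += 2  # consumed both ops
        else:
            fused.append(cur)
            i += 1
    return fused

# y = a * b; z = y + c   →   z = fma(a, b, c)
prog = [Op("mul", ("a", "b"), "y"), Op("add", ("y", "c"), "z")]
optimized = fuse_mul_add(prog)
```

The point isn't this particular pass; it's that with a documented IR, passes like this become ordinary, inspectable code rather than opaque behavior inside CoreML.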
Why Now
- Latest Research: The Orion paper reveals ANE internals in unprecedented detail
- Community Hunger: Developers want better tools and clear documentation
- Hardware Evolution: M5 with distributed accelerators changes the game
- Market Timing: On-device AI is exploding
The Dream
In five years, developers will write ANE kernels in Python, profile them with standard tools, and achieve 10x speedups.
Just like CUDA users do today.
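As a thought experiment, that authoring experience might feel like the sketch below. The `kernel` decorator stands in for a hypothetical momo-kiji compiler; the API shape is invented for illustration, and the stub simply runs the function elementwise on the CPU:

```python
# Hypothetical authoring experience: in a real SDK this decorator would
# compile the function for the ANE. Here it is a pure-Python stub that
# maps the function elementwise over its input arrays on the CPU.
def kernel(fn):
    def launch(*arrays):
        return [fn(*elems) for elems in zip(*arrays)]
    return launch

@kernel
def saxpy(a, x, y):
    # Elementwise a*x + y, the classic hello-world kernel from CUDA.
    return a * x + y

result = saxpy([2.0, 2.0], [1.0, 3.0], [10.0, 20.0])  # [12.0, 26.0]
```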
Getting Involved
This is Phase 1 (research & documentation). We're publishing everything — specs, design proposals, and roadmaps.
Bob Reilly created momo-kiji to democratize Apple Neural Engine development.