Compute-in-Memory (CIM) and Analog AI Chips: How Mythic, EnCharge AI, Rain AI, and Memristor Crossbars Are Eliminating the Von Neumann Bottleneck for Edge Inference in 2026
- Internet Pros Team
- May 14, 2026
- AI & Technology
There is a strange and somewhat heretical idea quietly reshaping the edge of the AI hardware industry in 2026: that the cleanest way to make neural networks faster and dramatically more efficient is to stop moving data altogether. For nearly eighty years, almost every computer built has followed the von Neumann architecture — fetch a number from memory, ship it across a bus to a separate processor, compute, ship the result back. AI workloads break that model. A transformer running on a GPU can spend more than 90 percent of its energy budget shuttling weights between HBM and the math units, not on the multiplications themselves. A handful of startups — Mythic, EnCharge AI, Rain AI, TetraMem, Mentium — and research giants IBM, HPE, and Samsung have spent the last decade building chips that do the opposite. They run the matrix multiplication inside the memory cells, using analog physics rather than digital arithmetic. The technology is called compute-in-memory (CIM), and in 2026 it is finally crossing from research curiosity to shipping silicon — powering hearing aids, smart cameras, drones, and the first generation of always-on edge AI devices that can run a 1-billion-parameter model on a coin-cell battery.
The Von Neumann Bottleneck, Explained in One Picture
Imagine a chef in a kitchen where the pantry is on the other side of town. Every time a recipe calls for an ingredient, the chef has to drive across the city, pick up one item, and drive back. The cooking itself takes milliseconds; the round trips take hours. That is what a modern AI accelerator does — only the "pantry" is DRAM and the "kitchen" is a tensor core, and the trip is roughly a thousand times more expensive in energy than the actual multiplication. Compute-in-memory flips the model: it moves the chef into the pantry. Weights live permanently inside the memory array, and the array itself performs the multiply-accumulate operation using the laws of physics — Ohm's law for multiplication (voltage times conductance equals current) and Kirchhoff's current law for summation (currents on a shared wire add naturally). A 256×256 memristor crossbar can complete a full matrix-vector multiplication in a single analog step, with no data movement and almost no energy.
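To make that concrete, here is a minimal software model of a single crossbar tile in plain NumPy. It is an illustration of the physics, not any vendor's design: the tile size, conductance range, noise level, and the differential-pair trick for signed weights are all assumptions chosen for clarity.

```python
import numpy as np

# Minimal NumPy model of one analog crossbar tile. Weights are stored as cell
# conductances (siemens), inputs arrive as voltages (volts), and every output
# line collects a summed current: Ohm's law does the per-cell multiply,
# Kirchhoff's current law does the addition. Sizes and noise are illustrative.
rng = np.random.default_rng(0)

N_IN, N_OUT = 256, 256          # illustrative tile size
G_MAX = 1e-4                    # assumed maximum cell conductance (100 uS)

weights = rng.uniform(-1, 1, (N_OUT, N_IN))       # trained floating-point weights

# A common trick for signed weights: one pair of cells per weight, sensed
# differentially, so positive and negative contributions subtract.
g_pos = np.clip(weights, 0, None) * G_MAX
g_neg = np.clip(-weights, 0, None) * G_MAX

v_in = rng.uniform(0.0, 0.2, N_IN)                # activations encoded as voltages

# One "analog step": the whole matrix-vector product settles in parallel.
i_out = g_pos @ v_in - g_neg @ v_in

# Real arrays are noisy; model a crude read-noise term ahead of the ADC.
i_read = i_out + rng.normal(0.0, 0.01 * np.abs(i_out).max(), N_OUT)

digital = (weights @ v_in) * G_MAX                # exact digital reference
print("worst-case error vs. digital:",
      f"{np.max(np.abs(i_read - digital)) / np.abs(digital).max():.2%}")
```

The thing to notice is that the whole product is one physical settling event; the energy cost is set by how much charge moves on those wires, not by a long sequence of fetch-and-multiply instructions.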
"We stopped asking how many TOPS we could squeeze out of a digital pipeline and started asking how few electrons we could move per inference. The answer turned out to be almost none — if you let the memory do the math."
The Four Memory Technologies Behind Analog AI in 2026
Compute-in-memory is not a single technology — it is a family of techniques that share one idea (compute happens in the memory array) but differ wildly in the underlying device. Four memory types dominate the 2026 landscape, each with its own physics, manufacturability story, and target workload.
Analog Flash (NOR)
The most mature option. Mythic uses 40-nm embedded NOR flash cells as analog weights, storing 8-bit values per cell. Manufacturable in standard CMOS fabs today — no exotic materials required. The trade-off: relatively slow programming and limited write endurance, making it ideal for fixed inference workloads.
ReRAM / Memristors
Resistive RAM cells whose conductance can be tuned across many analog levels. TetraMem, Weebit Nano, Crossbar Inc., and HPE's memristor work all target ReRAM crossbars. GlobalFoundries' 22FDX with embedded ReRAM is the first volume foundry node, and ReRAM is the leading candidate for in-situ on-chip learning.
Phase-Change Memory (PCM)
IBM Research Almaden has built multiple generations of PCM-based analog accelerators, culminating in the 2023 HERMES chip and its digital sibling, NorthPole. PCM offers excellent retention and multi-level storage but has historically struggled with conductance drift; 2026 silicon mitigates it with on-chip compensation circuits (a sketch of the compensation idea follows these four profiles).
Switched-Capacitor / SRAM CIM
EnCharge AI's approach: instead of exotic memory cells, use precisely matched capacitors in a standard CMOS process to do charge-domain multiply-accumulate. Less energy efficient than memristors in theory, but trivially manufacturable today — and the foundation of EnCharge's EN100 client-side AI accelerator.
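The drift problem mentioned under PCM is worth making concrete: a programmed conductance decays roughly as a power law over time, and a common mitigation is to periodically read reference cells with a known target value and rescale the whole array's output. The sketch below is illustrative only; the drift exponents, time scales, and calibration scheme are assumptions, not a description of IBM's compensation circuits.

```python
import numpy as np

# Illustrative model of PCM conductance drift and global compensation.
# Power-law drift G(t) = G0 * (t/t0)**(-nu) is standard in the PCM literature;
# the exponent values and the calibration scheme here are assumed.
rng = np.random.default_rng(1)

t0, t_read = 1.0, 3600.0                      # programmed at t0, read an hour later
g_programmed = rng.uniform(1e-6, 1e-4, 1024)  # target conductances
nu_cells = rng.normal(0.05, 0.002, 1024)      # per-cell drift exponents vary slightly

g_drifted = g_programmed * (t_read / t0) ** (-nu_cells)

# Global drift compensation: read reference cells with a known target value,
# estimate one scalar correction for the tile, and rescale every output.
ref_target = 5e-5
ref_drifted = ref_target * (t_read / t0) ** (-0.05)   # nominal exponent
alpha = ref_target / ref_drifted

g_corrected = g_drifted * alpha

def worst_case_error(g):
    return np.max(np.abs(g - g_programmed) / g_programmed)

print(f"error before compensation: {worst_case_error(g_drifted):.1%}")
print(f"error after compensation:  {worst_case_error(g_corrected):.1%}")
```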
Who Is Building Analog AI Chips in 2026
| Company | Technology | Where They Win |
|---|---|---|
| Mythic | Analog NOR-flash matrix processor (Mythic M2000 AMP) | The first analog AI startup to ship in production volume. Targets security cameras, drones, AR/VR, and industrial vision with up to 25 TOPS in under 3 watts — a power envelope no digital GPU can touch. |
| EnCharge AI | Switched-capacitor charge-domain CIM (EN100) | Princeton spinout focused on client-side and laptop AI. The EN100 PCIe card promises 200 TOPS at 8.25 watts — putting frontier-class inference on a workstation without a discrete GPU. |
| Rain AI | Memristor-based neuromorphic compute | Backed by Sam Altman, with a reported OpenAI letter of intent to buy its chips, Rain is targeting end-to-end analog training and inference. The bet: that analog can scale not just edge inference, but data-center-class training — a contrarian and high-stakes position. |
| IBM Research | PCM analog cores + digital NorthPole | The deepest publication record in analog AI and the most credible enterprise channel. IBM's HERMES chip demonstrated GPT-class workloads on PCM; NorthPole shows what a digital "near-memory" architecture can do in production. |
| TetraMem & Mentium | ReRAM crossbars on standard CMOS | University spinouts commercializing the foundational HP Labs memristor research. Targeting embedded AI, defense sensing, and aerospace, where radiation tolerance and sub-watt budgets matter more than raw TOPS. |
| Samsung & SK Hynix | HBM-PIM and AiM near-memory compute | Not pure analog CIM, but the most realistic path to in-memory compute inside the data center — putting simple compute units inside HBM stacks themselves to handle attention and embedding-lookup workloads. |
Why 2026 Is the Inflection Year
Analog AI has been "two years away" for a decade. What changed in 2026 is a convergence of three trends. First, edge inference demand exploded: every modern device — earbuds, hearing aids, doorbell cameras, AR glasses, factory sensors — now needs to run a small neural network locally, on a budget measured in milliwatts. A 30-watt digital GPU is not in the conversation. Second, quantization research caught up with analog noise: techniques like quantization-aware training, noise injection during training, and low-precision integer formats made it possible to deploy a model on a noisy analog substrate without destroying accuracy. Third, standard foundries finally offer the exotic memory: GlobalFoundries' 22FDX with ReRAM, TSMC's embedded MRAM, and Samsung's eFlash CIM offerings give analog AI startups a tape-out path that does not require building their own fab.
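The second trend is worth pausing on, because it is a training-side fix for a hardware-side problem. Below is a minimal sketch of the noise-injection idea in PyTorch; the noise magnitude and the choice to perturb weights (rather than activations) are illustrative assumptions, not any vendor's recipe.

```python
import torch
import torch.nn as nn

class NoisyLinear(nn.Linear):
    """Linear layer that perturbs its weights during training so the network
    learns to tolerate the read noise of an analog array. The 2% relative
    noise level is an illustrative assumption."""

    def __init__(self, in_features, out_features, weight_noise=0.02):
        super().__init__(in_features, out_features)
        self.weight_noise = weight_noise

    def forward(self, x):
        if self.training:
            # Relative Gaussian noise, proportional to each weight's magnitude.
            noise = torch.randn_like(self.weight) * self.weight_noise * self.weight.abs()
            return nn.functional.linear(x, self.weight + noise, self.bias)
        return super().forward(x)

# Drop-in replacement: swap nn.Linear for NoisyLinear, fine-tune for a few
# epochs, and the resulting weights degrade far less when mapped onto a noisy
# analog substrate.
model = nn.Sequential(NoisyLinear(512, 256), nn.ReLU(), NoisyLinear(256, 10))
logits = model(torch.randn(8, 512))
print(logits.shape)   # torch.Size([8, 10])
```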
The numbers are striking. Mythic's M2000 delivers its 25 TOPS in under 3 watts on real vision workloads — better than 8 TOPS per watt, several times what a comparable digital edge accelerator achieves. IBM's PCM prototypes have demonstrated 12.4 TOPS per watt in published benchmarks. EnCharge claims 30+ TOPS per watt for client LLM inference. Compare that to an NVIDIA Jetson Orin at roughly 1 TOPS per watt of measured efficiency on real workloads, and the gap is obvious — for the right workloads, analog is not 2x better. It is an order of magnitude.
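To see what those efficiency figures mean in practice, here is a back-of-the-envelope power estimate for an always-on camera workload. The model size and frame rate are illustrative assumptions, and the TOPS-per-watt figures are rounded from the numbers above.

```python
# Back-of-the-envelope power draw for a continuous vision workload.
# The 5 GOP-per-frame model and 30 fps rate are illustrative assumptions;
# the efficiency figures are rounded from the ones quoted above.
ops_per_frame = 5e9                          # operations per inference (assumed)
fps = 30                                     # frames per second
sustained_tops = ops_per_frame * fps / 1e12  # 0.15 TOPS of sustained throughput

for name, tops_per_watt in [("analog CIM (~10 TOPS/W)", 10),
                            ("digital edge SoC (~1 TOPS/W)", 1)]:
    watts = sustained_tops / tops_per_watt
    print(f"{name}: {watts * 1000:.0f} mW sustained")

# analog CIM (~10 TOPS/W): 15 mW sustained
# digital edge SoC (~1 TOPS/W): 150 mW sustained
```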
The Real Constraint: Software
The remaining bottleneck is not silicon. It is the toolchain. CUDA took NVIDIA twenty years to build, and every analog AI startup is essentially trying to recreate it from scratch — quantization-aware compilers, weight-mapping algorithms that partition a model across multiple crossbar tiles, drift-compensation routines that recalibrate periodically, and PyTorch / ONNX / ExecuTorch front-ends that hide all of it from the developer. The companies that win the analog AI race in 2026 will not be the ones with the best memristor; they will be the ones whose Python SDK lets a developer take a Hugging Face model, quantize it, map it onto the chip, and ship it to a fleet of edge devices in an afternoon. Mythic, EnCharge, and Syntiant have all rebuilt their software stacks twice in the last three years for exactly this reason.
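What does "weight mapping" actually involve? At its simplest, a layer that is larger than one physical array has to be split into tiles, each tile computes a partial product, and the partial results are accumulated digitally. The sketch below shows the idea with an assumed 256×256 tile size; a real toolchain also has to handle signed weights, ADC ranges, and per-tile calibration.

```python
import numpy as np

# Illustrative weight-mapping pass: partition one large layer across
# fixed-size crossbar tiles and stitch the partial sums back together.
# The 256x256 tile size is an assumption, not a specific vendor's array.
TILE = 256

def map_to_tiles(weight):
    """Split an (out, in) weight matrix into a grid of TILE x TILE blocks."""
    out_dim, in_dim = weight.shape
    tiles = {}
    for r in range(0, out_dim, TILE):
        for c in range(0, in_dim, TILE):
            tiles[(r // TILE, c // TILE)] = weight[r:r + TILE, c:c + TILE]
    return tiles

def tiled_matvec(tiles, x, out_dim):
    """Each tile computes a partial matrix-vector product (on hardware, one
    analog step per tile); the partial results are accumulated digitally."""
    y = np.zeros(out_dim)
    for (tr, tc), block in tiles.items():
        rows, cols = block.shape
        y[tr * TILE:tr * TILE + rows] += block @ x[tc * TILE:tc * TILE + cols]
    return y

rng = np.random.default_rng(2)
W = rng.standard_normal((1000, 768))       # e.g. one projection layer
x = rng.standard_normal(768)

tiles = map_to_tiles(W)
print("tiles needed:", len(tiles))                                   # 4 x 3 = 12
print("matches dense result:", np.allclose(tiled_matvec(tiles, x, 1000), W @ x))
```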
What Compute-in-Memory Means for Edge AI Buyers in 2026
- Always-on inference becomes free. Keyword spotting, gesture recognition, anomaly detection, and predictive maintenance can now run continuously on a coin-cell battery — workloads that simply did not exist at digital power budgets.
- Small LLMs fit on wearables. Llama 3.2 1B and Phi-3 Mini are deployable on analog edge chips with sub-watt power envelopes, opening the door to AR glasses and earbuds that run a private language model without a phone tether.
- Accuracy stops being free. Analog noise, drift, and limited bit-precision mean some models simply will not run on analog hardware without quantization-aware retraining — a real engineering cost.
- Workload fit matters more than peak TOPS. Analog wins on dense matrix multiplication with fixed weights; dynamic workloads with frequent weight updates are still better served by digital. Pick the chip to match the task.
- The software stack is the moat. Treat any analog AI vendor as a software company first. If their model mapping toolchain is immature, the silicon underneath will not save you.
Where Analog AI Goes Next
The 2026 conversation is dominated by edge inference, but the 2028 conversation will be different. Rain AI, IBM, and the hyperscaler research labs are all investigating analog training — using the same crossbar arrays not just for inference but for the gradient updates that train a network from scratch. If it works, it would collapse the energy cost of training by another order of magnitude, and reshape the economics of model development. Separately, photonic compute-in-memory — using optical resonators instead of resistors — is in late research at MIT, Princeton, and several stealth startups, and could combine the bandwidth advantages of silicon photonics with the energy advantages of analog computation. Both are speculative. Both have credible paths.
For now, compute-in-memory is doing something more practical: making it possible to put real machine intelligence on devices that were previously too small, too battery-constrained, or too cost-sensitive to run any AI at all. The future of edge intelligence in 2026 is not a smaller GPU. It is a memory array that does math — and it is already shipping.