Startup’s Analog AI Promises Power for PCs

Naveen Verma’s lab at Princeton University is like a museum of all the ways engineers have tried to make AI ultra-efficient by using analog phenomena instead of digital computing. At one bench lies the most energy-efficient magnetic-memory-based neural-network computer ever made. At another you’ll find a resistive-memory-based chip that can compute the largest matrix of numbers of any analog AI system yet.

Neither has a commercial future, according to Verma. Less charitably, this part of his lab is a graveyard.

Analog AI has captured chip architects’ imagination for years. It combines two key concepts that should make machine learning massively less energy intensive. First, it limits the costly movement of bits between memory chips and processors. Second, instead of the 1s and 0s of logic, it uses the physics of the flow of current to efficiently do machine learning’s key computation.

As attractive as the idea has been, various analog AI schemes have not delivered in a way that could really take a bite out of AI’s stupefying energy appetite. Verma would know. He’s tried them all.

But when IEEE Spectrum visited a year ago, there was a chip at the back of Verma’s lab that represents some hope for analog AI and for the energy-efficient computing needed to make AI useful and ubiquitous. Instead of calculating with current, the chip sums up charge. It might seem like an inconsequential difference, but it could be the key to overcoming the noise that hinders every other analog AI scheme.

This week, Verma’s startup EnCharge AI unveiled the first chip based on this new architecture, the EN100. The startup claims the chip tackles various AI work with performance per watt up to 20 times better than competing chips. It’s designed into a single processor card that adds 200 trillion operations per second at 8.25 watts, aimed at conserving battery life in AI-capable laptops. On top of that, a 4-chip, 1,000-trillion-operations-per-second card is targeted for AI workstations.
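For context, here is a quick back-of-envelope check of those figures in Python. The numbers are EnCharge’s claims, not independent measurements:

```python
# Quick sanity check of the performance figures quoted above.
# These numbers are EnCharge's claims, not independent measurements.

laptop_ops_per_s = 200e12        # 200 trillion operations per second, single-chip card
laptop_watts = 8.25              # reported power draw of that card

workstation_ops_per_s = 1000e12  # 1,000 trillion operations per second, 4-chip card

print(f"Laptop card efficiency: {laptop_ops_per_s / laptop_watts / 1e12:.1f} TOPS/W")
print(f"Workstation card, per chip: {workstation_ops_per_s / 4 / 1e12:.0f} TOPS")
```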

Current and Coincidence

In machine learning, “it turns out, by dumb luck, the main operation we’re doing is matrix multiplies,” says Verma. That’s basically taking an array of numbers, multiplying it by another array, and adding up the result of all those multiplications. Early on, engineers noticed a coincidence: Two fundamental rules of electrical engineering can do exactly that operation. Ohm’s Law says that you get current by multiplying voltage and conductance. And Kirchhoff’s Current Law says that if you have a bunch of currents coming into a point from a bunch of wires, the sum of those currents is what leaves that point. So basically, each of a bunch of input voltages pushes current through a resistance (conductance is the inverse of resistance), multiplying the voltage value, and all those currents add up to produce a single value. Math, done.
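To make the current-domain picture concrete, here is a minimal NumPy sketch of the scheme Verma describes: inputs applied as voltages, weights stored as conductances, Ohm’s law doing each multiply, and Kirchhoff’s current law doing the sum. The values are illustrative, not taken from any real device:

```python
import numpy as np

# Numerical sketch of current-domain analog matrix multiplication.
# Inputs are applied as voltages, weights are programmed as conductances,
# Ohm's law (I = V * G) does each multiply, and Kirchhoff's current law
# sums the currents on each output wire. All values are illustrative.

rng = np.random.default_rng(0)
voltages = rng.uniform(0.0, 1.0, size=8)             # input activations, in volts
conductances = rng.uniform(1e-6, 1e-4, size=(8, 4))  # weights, in siemens

# Each cell contributes I = V * G; each column wire sums its cells' currents.
currents = voltages @ conductances                    # amps flowing out of each column

# The same operation in ordinary digital linear algebra, for comparison.
reference = np.dot(voltages, conductances)
assert np.allclose(currents, reference)
print(currents)
```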

Sound good? Well, it gets better. Much of the data that makes up a neural network are the “weights,” the things by which you multiply the input. And moving that data from memory into a processor’s logic to do the work is responsible for a big fraction of the energy GPUs expend. Instead, in most analog AI schemes, the weights are stored in one of several types of nonvolatile memory as a conductance value (the resistances above). Because weight data is already where it needs to be to do the computation, it doesn’t have to be moved as much, saving a pile of energy.

The combination of free math and stationary data promises calculations that need just thousandths of a trillionth of a joule (a femtojoule) of energy. Unfortunately, that’s not nearly what analog AI efforts have been delivering.

The Trouble With Current

The fundamental problem with any kind of analog computing has always been the signal-to-noise ratio. Analog AI has it by the truckload. The signal, in this case the sum of all those multiplications, tends to be overwhelmed by the many possible sources of noise.

“The problem is, semiconductor devices are messy things,” says Verma. Say you’ve got an analog neural network where the weights are stored as conductances in individual RRAM cells. Such weight values are stored by setting a relatively high voltage across the RRAM cell for a defined period of time. The trouble is, you could set the exact same voltage on two cells for the same amount of time, and those two cells would wind up with slightly different conductance values. Worse still, those conductance values might change with temperature.

The differences might be small, but recall that the operation is adding up many multiplications, so the noise gets magnified. Worse, the resulting current is then turned into a voltage that serves as the input to the next layer of the neural network, a step that adds even more noise.
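A minimal Monte Carlo sketch makes the point. It assumes a 1 percent programming error per cell, which is an illustrative figure rather than a measured spec for any real device, and shows the accumulated error growing as the analog sum covers more rows:

```python
import numpy as np

# Minimal sketch of how small per-cell programming errors pile up in a long
# analog sum. The 1% relative conductance error is an illustrative assumption,
# not a measured figure for any particular device.

rng = np.random.default_rng(1)

for n_rows in (64, 256, 1024, 4096):
    voltages = rng.uniform(0.0, 1.0, size=n_rows)       # input activations (V)
    target_g = rng.uniform(1e-6, 1e-4, size=n_rows)     # intended conductances (S)
    actual_g = target_g * (1 + rng.normal(0.0, 0.01, n_rows))  # what the cells really hold

    error = abs(np.dot(voltages, actual_g) - np.dot(voltages, target_g))
    print(f"{n_rows:5d} rows -> error in summed current: {error:.3e} A")
```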

Researchers have attacked this problem from both a computer science perspective and a device physics one. In the hope of compensating for the noise, researchers have invented ways to bake some knowledge of the physical foibles of devices into their neural network models. Others have focused on making devices that behave as predictably as possible. IBM, which has done extensive research in this area, does both.

Such techniques are competitive, if not yet commercially successful, in smaller-scale systems, chips meant to provide low-power machine learning to devices at the edges of IoT networks. Early entrant Mythic AI has produced more than one generation of its analog AI chip, but it’s competing in a field where low-power digital chips are succeeding.

[Image: a black circuit board with a large silver chip at center. The EN100 card for PCs is a new analog AI chip architecture. Credit: EnCharge AI]

Capacitors All the Way Down

EnCharge’s solution strips out the noise by measuring the amount of charge rather than the flow of charge in machine learning’s multiply-and-accumulate mantra. In traditional analog AI, multiplication depends on the relationship among voltage, conductance, and current. In this new scheme, it depends on the relationship among voltage, capacitance, and charge—where basically, charge equals capacitance times voltage.

Why is that difference important? It comes down to the component that’s doing the multiplication. Instead of using some finicky, vulnerable device like RRAM, EnCharge uses capacitors.

A capacitor is basically two conductors sandwiching an insulator. A voltage difference between the conductors causes charge to accumulate on one of them. The thing that’s key about them for the purpose of machine learning is that their value, the capacitance, is determined by their size. (More conductor area or less space between the conductors means more capacitance.)

“The only thing they depend on is geometry, basically the space between wires,” Verma says. “And that’s the one thing you can control very, very well in CMOS technologies.” EnCharge builds an array of precisely valued capacitors in the layers of copper interconnect above the silicon of its processors.
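That dependence on geometry is just the parallel-plate relation, C = ε0 · εr · area / gap. A tiny sketch with purely illustrative dimensions (not EnCharge’s actual interconnect geometry) shows the scale of capacitance involved:

```python
# Parallel-plate estimate: capacitance depends only on geometry and the
# dielectric, C = eps0 * eps_r * area / gap. The dimensions below are purely
# illustrative, not EnCharge's actual interconnect geometry.

EPS0 = 8.854e-12        # F/m, permittivity of free space
eps_r = 3.9             # relative permittivity of silicon dioxide (assumed dielectric)

area = (1e-6) ** 2      # 1 micrometer x 1 micrometer plate
gap = 100e-9            # 100 nanometer spacing between metal layers

capacitance = EPS0 * eps_r * area / gap
print(f"{capacitance * 1e15:.2f} fF")   # a fraction of a femtofarad
```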

The weights, the data that makes up most of a neural network model, are stored in an array of digital memory cells, each connected to a capacitor. The data the neural network is analyzing is then multiplied by the weight bits using simple logic built into the cell, and the results are stored as charge on the capacitors. Then the array switches into a mode where all the charges from the results of multiplications accumulate and the result is digitized.
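Here is a simplified behavioral model of that multiply-then-accumulate sequence. It is a sketch of the general charge-domain idea, not EnCharge’s actual circuit: each unit capacitor is driven to the product of a 1-bit weight and its input, and shorting the capacitors together shares the charge so the final voltage encodes the sum of products:

```python
import numpy as np

# Behavioral sketch (not EnCharge's actual circuit) of a charge-domain
# multiply-accumulate. Each unit capacitor is driven to a voltage equal to the
# product of a 1-bit weight and its input; connecting the capacitors together
# shares their charge, so the final voltage is the (scaled) sum of products.

rng = np.random.default_rng(2)

unit_cap = 1e-15                          # 1 fF unit capacitor (illustrative value)
weights = rng.integers(0, 2, size=16)     # 1-bit weights held in the memory cells
inputs = rng.uniform(0.0, 1.0, size=16)   # input activations expressed as voltages

# Multiply phase: each cell's logic drives its capacitor to w_i * x_i volts,
# depositing charge q_i = C * w_i * x_i on it.
charges = unit_cap * weights * inputs

# Accumulate phase: the capacitors are shorted together, so the total charge
# redistributes over the total capacitance. The shared voltage is the average
# of the products, i.e. the dot product divided by the number of cells.
shared_voltage = charges.sum() / (unit_cap * len(weights))

dot_product = np.dot(weights, inputs)
assert np.isclose(shared_voltage * len(weights), dot_product)
print(f"shared voltage: {shared_voltage:.4f} V  ->  dot product: {dot_product:.4f}")
```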

While the initial invention, which dates back to 2017, was a big moment for Verma’s lab, he says the basic concept is quite old. “It’s called switched capacitor operation; it turns out we’ve been doing it for decades,” he says. It’s used, for example, in commercial high-precision analog-to-digital converters. “Our innovation was figuring out how you can use it in an architecture that does in-memory computing.”

Competition

Verma’s lab and EnCharge spent years proving that the technology was programmable and scalable, and co-optimizing it with an architecture and software stack that suit AI needs vastly different from what they were in 2017. The resulting products are with early-access developers now, and the company—which recently raised US $100 million from Samsung Venture, Foxconn, and others—plans another round of early-access collaborations.

But EnCharge is entering a competitive field, and among the competitors is the big kahuna, Nvidia. At its big developer event in March, GTC, Nvidia announced plans for a PC product built around its GB10 CPU-GPU combination and a workstation built around the upcoming GB300.

And there will be plenty of competition in the low-power space EnCharge is after. Some of those competitors even use a form of computing-in-memory. D-Matrix and Axelera, for example, borrow part of analog AI’s promise, embedding the memory in the computing, but do everything digitally. Each has developed custom SRAM memory cells that both store and multiply, and they do the summation operation digitally as well. There’s even at least one more-traditional analog AI startup in the mix, Sagence.

Verma is, unsurprisingly, optimistic. The new technology “means advanced, secure, and personalized AI can run locally, without relying on cloud infrastructure,” he said in a statement. “We hope this will radically expand what you can do with AI.”
