In-Memory Computing for Energy-Efficient AI

Key Points

  • AI powers everyday services like speech‑to‑text and chatbots, but the data movement between memory and CPU consumes a large share of the energy used by these systems.
  • Training massive deep‑learning models (e.g., large language models) can emit as much carbon as five cars and may take weeks in cloud clusters, highlighting the urgency for more energy‑efficient compute.
  • AI progress is categorized as narrow, broad, and general; as we move toward broader and more complex models, the demand for faster, greener hardware will only increase.
  • In‑memory computing proposes to fuse memory and processing, eliminating the costly back‑and‑forth data transfers that dominate runtime and power usage in traditional architectures.
  • By treating memory arrays as networks of resistive elements that can perform calculations directly, in‑memory computing offers a promising route to boost speed while dramatically lowering AI’s energy footprint.

Full Transcript

**Source:** [https://www.youtube.com/watch?v=BTnr8z-ePR4](https://www.youtube.com/watch?v=BTnr8z-ePR4)
**Duration:** 00:09:57

## Sections

- [00:00:00](https://www.youtube.com/watch?v=BTnr8z-ePR4&t=0s) **In-Memory Computing for Green AI** - Nicole explains that everyday AI services consume large amounts of energy, primarily due to data transfers between CPU and memory, and presents in-memory computing as a promising approach to make AI more energy efficient.
- [00:03:13](https://www.youtube.com/watch?v=BTnr8z-ePR4&t=193s) **In-Memory Computing for Efficient AI** - The speaker explains how integrating memory and compute via resistive crossbar arrays can eliminate data movement, increasing speed and energy efficiency for larger, more complex AI models.
- [00:06:22](https://www.youtube.com/watch?v=BTnr8z-ePR4&t=382s) **Mapping Neural Networks onto Crossbar Arrays** - This section explains how a neural-network layer is realized in hardware by programming the crossbar's conductance matrix to match the layer weights, encoding inputs as voltage vectors, and reading column currents to perform the required matrix-vector multiplication before applying the activation function.
- [00:09:36](https://www.youtube.com/watch?v=BTnr8z-ePR4&t=576s) **Contribute to Energy-Efficient AI** - The presenter encourages viewers to help create more energy-efficient AI by contributing to the open-source analog AI hardware toolkit via the provided links.
[0:00] How many times a day do you use AI? You may be surprised to find that AI powers many of the tech services you use throughout your day. Any time you use speech-to-text on your phone or a chatbot for customer service, you're using AI. Behind the scenes, these existing technologies are consuming lots of energy. One very exciting field has emerged to try to make AI more energy efficient, and that's in-memory computing. But you may be wondering: why is energy-efficient AI desirable?

[0:34] My name is Nicole Saulnier, and I'm a researcher with IBM working on in-memory computing. In a traditional computer there are two main blocks: a memory and a CPU, or central processing unit. These are connected by a bus, and data is transferred back and forth to execute instructions and perform computations. As transistors have continued to scale, the CPU has become faster and more energy efficient. This has increased the relative importance of the speed and energy limitations of transferring data back and forth between the memory and the CPU. In data-intensive computation such as deep learning, this data communication actually dominates the model runtimes and the energy consumption.

[1:40] To put that into perspective, we can think about some commonly used models today. Training just one very large natural language processing model consumes energy equivalent to the carbon footprint of five cars. And even in a cloud environment, where many computers are working together to solve the same problem, it can take over one week to train.

[2:20] To appreciate these energy and time constraints, we have to look at the field of AI and its trends. We can divide AI into three categories. In narrow AI, we're able to solve a single task with superhuman accuracy.
[2:46] In broad AI, we're performing multiple tasks within the same domain: things like diagnosing a patient with cancer and providing a treatment plan for them. And in general AI, we're working across domains, applying learning from one area to another with ease and often without any supervision. Today we're somewhere between narrow and broad AI, and we know that if we want to move further to the right, the size and complexity of our models are going to increase. This is going to drive a need for innovation and for more energy-efficient AI compute.

[3:32] And that's where in-memory computing comes in. Instead of spending all this time transferring our data back and forth, what if we could design a system that eliminated this data movement and performed the functions of both the memory and the CPU? Then we could potentially increase our speed and our energy performance.

[4:01] To think about this, it helps to break the problem down and start with what types of computations a memory could perform. One can think of a memory as an array of resistive elements, where each element can be programmed to some conductance value G, with G simply the inverse of the resistance. If we have a simple crossbar of two metal wires and we put one of our resistive elements between them, it can be programmed to conductance G1. We can apply some voltage V1 across it and calculate the current I1 flowing through the device: I1 = V1 × G1. This is dictated by Ohm's Law.

[4:56] Now, if we extend our array and add a second row of devices, the current through the second device can be expressed as I2, and the current coming out at the bottom of our column is I = I1 + I2.
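The two device-level operations just described, multiplication via Ohm's Law and accumulation of column currents, can be sketched in a few lines of Python. The conductance and voltage values here are illustrative assumptions, not values from the video:

```python
def element_current(voltage, conductance):
    """Ohm's Law: I = V * G (the multiplication the device performs)."""
    return voltage * conductance

def column_current(voltages, conductances):
    """Currents entering the shared column wire sum together,
    performing the accumulation I = I1 + I2 + ..."""
    return sum(element_current(v, g) for v, g in zip(voltages, conductances))

# Two devices on one column: conductances in siemens, voltages in volts.
G = [0.5, 0.25]
V = [2.0, 4.0]
I = column_current(V, G)   # 2.0*0.5 + 4.0*0.25 = 2.0 A
```

Note that one column effectively computes a dot product between the voltage vector and its column of conductances, which is the building block the transcript expands into a full matrix-vector multiplication.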
[5:18] This is just Kirchhoff's Current Law, which has performed an accumulation operation for us. So we're able to perform different operations with these devices: a multiplication with Ohm's Law and an addition with Kirchhoff's Law.

[5:43] We can now take our simple single column and expand it out into an array, putting an element at each cross point between the metal wires. Each element can be programmed to a different conductance value, and together they can be represented as a matrix G consisting of all of our different elements. We can then apply different voltages to each row of the array, represented as a vector V of input voltages. The currents coming out of the bottoms of the columns can then be represented by the resultant vector I, which is equal to our voltage vector V times our conductance matrix G. This is just a matrix-vector multiplication, or MVM, and it's super convenient, because it turns out that in AI inference workloads around 60 to 90 percent of the operations are these MVM operations.

[7:21] So we have that basic building block available to us. How do we actually map our neural network onto our hardware? Well, a layer of a neural network consists of many output neurons, and each output neuron, for instance neuron N, is driven by a set of input neurons through a set of weights. If the input to our neural network layer is X, we can express the output from the layer as Y = f(X × W), where W is the weight matrix and f is the activation function. Now we have to map this equation onto our memory array. The first thing we can do is program all of our conductances such that our conductance matrix G equals the weights of our layer. Then we can encode the inputs to our neural network layer X as a vector of input voltages V.
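The full-array matrix-vector multiplication described above, I = V × G, can be sketched with NumPy. The conductance matrix and voltage values are illustrative assumptions:

```python
import numpy as np

# Conductance matrix G: entry G[i, j] is the element at the crossing of
# row wire i and column wire j (illustrative values, in siemens).
G = np.array([[0.5, 0.1],
              [0.2, 0.4],
              [0.3, 0.2]])

# Input voltages applied to the three row wires.
V = np.array([1.0, 2.0, 0.5])

# Each column j collects sum_i V[i] * G[i, j]: the crossbar performs
# the whole matrix-vector multiplication I = V @ G in a single step.
I = V @ G   # array([1.05, 1.0])
```

In the analog array this entire product is read out as two column currents at once, which is why eliminating 60 to 90 percent of inference operations this way is so attractive.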
[8:45] Finally, we can collect the currents coming out of each column of the array and apply our activation function f to those currents. That gives us the output from our neural network layer, Y. In this way, we can use these concepts to map our neural network onto our memory array and perform analog in-memory computing for more energy-efficient AI.

[9:13] There are a lot of details that go into the design, the build, and the usage of these analog in-memory computing chips. You can join us and check out our AI hardware toolkit to learn more about different neural networks and simulate them, and you can also explore the various memory elements we have included. The best part is that you can contribute and join us to help make AI more energy efficient.

[9:44] Thanks for watching. If you liked the video, don't forget to like and subscribe to the channel, and check out the links below for access to our open-source analog AI hardware toolkit.
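As a digital stand-in for the full mapping the transcript walks through (program G = W, encode V = X, collect the column currents, apply the activation), here is a minimal sketch. The ReLU activation and all numeric values are assumptions for illustration; real analog hardware also cannot realize negative conductances directly and typically encodes signed weights with pairs of devices, a detail omitted here:

```python
import numpy as np

def crossbar_layer(X, W, f):
    """Simulate one neural-network layer mapped onto a crossbar array."""
    V = X        # inputs encoded as the row voltage vector (V = X)
    G = W        # conductances programmed to the layer weights (G = W)
    I = V @ G    # analog MVM: currents collected from the columns
    return f(I)  # activation applied to the column currents gives Y

def relu(z):
    """Example activation function (an assumption, not from the video)."""
    return np.maximum(z, 0.0)

# Illustrative inputs and weights for a 2-input, 2-output layer.
X = np.array([1.0, 2.0])
W = np.array([[0.5, -1.0],
              [0.25, 0.5]])
Y = crossbar_layer(X, W, relu)   # array([1.0, 0.0])
```

The same structure, with device noise and quantization models added, is what toolkits for analog in-memory computing simulate.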