‘For many AI applications, GPUs are compute overkill, consuming much more power and money than needed’: How Ampere Computing plans to ride the AI wave

Ampere Computing is a startup that’s making waves in the tech industry by challenging the dominance of tech giants like AMD, Nvidia, and Intel. With the rise of AI, the demand for computing power has skyrocketed, along with energy costs and demand on power grids. Ampere aims to address this with a low-power, high-performance solution.

Despite being the underdog, Ampere’s offering has been adopted by nearly all major hyperscalers worldwide. It has broken through the scaling wall multiple times with its CPUs, and the company plans to continue scaling in ways that legacy architectures can’t. We spoke to Ampere CPO Jeff Wittich about his company’s success and future plans.

I feel sometimes that challenger startups, like Ampere Computing, are stuck between a rock and a hard place. On one side, you’ve got multibillion-dollar companies like AMD, Nvidia and Intel, and on the other, hyperscalers like Microsoft, Google and Amazon that have their own offerings. How does it feel to be the little mammal in the land of dinosaurs?

It’s really an exciting time for Ampere. We may only be six years old, but as we predicted when we started the company, the need for a new compute solution for the cloud has never been stronger. The industry doesn’t need more dinosaurs – it needs something new.

The needs of the cloud have changed. The amount of computing power needed for today’s connected world is far greater than anyone could have ever imagined and will only grow more with the rise of AI. Simultaneously, energy costs have skyrocketed, demand on the world’s power grids is outpacing the supply, and new data center builds are being halted for a number of reasons. The convergence of these factors has created the perfect opportunity for Ampere to provide a much-needed low-power, high-performance solution that hasn’t been delivered by big, legacy players.

Because of our ability to provide this, we have grown rapidly and have been adopted by nearly all of the big hyperscalers around the world. We are also seeing increased adoption in the enterprise, as companies look to get the most out of their existing data center footprint. The increased demand we continue to see for Ampere products makes us confident that the industry sees our value.

Ampere has been the leader in high core count in the server CPU market for a few years. However, others – AMD and Intel – have been catching up. Given the immutable laws of physics, when do you foresee hitting a wall when it comes to physical cores, and how do you plan to smash through it?

As you mentioned, Ampere has been the leader in high core count, dense, and efficient compute for the past few years. Early on, we identified where the key challenges would emerge for cloud growth, and we’re addressing those exact challenges today with our Ampere CPUs. Our Ampere CPUs are perfect for cloud use cases of all kinds and across a wide range of workloads.

We’ve now broken through the scaling wall several times, being the first to 128 cores and now 192 cores. Innovation like this requires a new approach that breaks legacy constraints. Ampere’s new approach to CPU design, from the micro-architecture to the feature set, will allow us to continue scaling in ways that legacy architectures cannot.

Another credible threat looming on the horizon is the rise of RISC-V, with China putting its weight behind the architecture. What are your own personal views on that front? Could Ampere join team RISC one day?

Ampere’s core strategy is to develop sustainable processors to fuel compute both today and into the future. We will build our CPUs using the best available technologies to deliver leadership performance, efficiency, and scalability, as long as those technologies can be easily used by our customers to run the desired operating systems, infrastructure software, and user applications.

What can you tell us about the follow-up to Ampere One? Will it follow the same trajectory as Altra > One? More cores? Same frequency, more L2 cache per core? Will it be called Ampere 2 and still be single-threaded?

Over the next few years, we’ll continue to focus on releasing CPUs that are more efficient and that deliver higher core counts, as well as more memory bandwidth and IO capabilities. This will give us more and more throughput for increasingly important workloads like AI inferencing, while uniquely meeting the sustainability goals of cloud providers and users.

Our products will also continue to focus on delivering predictable performance to cloud users, eliminating noisy neighbor issues, and allowing providers to run Ampere CPUs at high utilization. We will introduce additional features that provide greater degrees of flexibility for cloud providers to meet the diverse set of customer applications. These are critical for Cloud Native workload performance now and in the future.

Given the focused approach of Ampere Computing, can you give us a brief description of your average customer and the sort of workloads they usually handle?

Because our CPUs are general purpose, they serve a broad spectrum of applications. We built our CPUs from the ground up as Cloud Native Processors, so they perform really well across nearly all cloud workloads – AI inference, web services, databases, and video processing are just a few examples. In many cases, we can deliver 2x the performance for these workloads at half the power of legacy x86 processors.

In terms of customers, we are working with nearly all of the big hyperscalers across the U.S., Europe and China. In the U.S., for example, you can find Ampere instances at Oracle Cloud, Google Cloud, Microsoft Azure and more. Ampere CPUs are also available across Europe at various cloud providers.

Beyond the big cloud providers, we are seeing a lot of traction in the enterprise through our offerings with OEMs like HPE and Supermicro. This is largely due to the increased efficiency and rack density these companies can achieve by deploying Ampere servers. Enterprises want to save power and do not want to build additional data centers that are not core to their business.

With the rise of AI, once “simple” devices are becoming increasingly intelligent, leading to greater demand for cloud computing in super-local areas. These edge deployments have stringent space and power requirements, and because of Ampere’s ability to provide such a high number of cores in a low power envelope, we see a lot of demand for these workloads as well.
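As a quick check on the "2x the performance at half the power" figure quoted in this answer, the implied gain in performance per watt can be worked out directly. The sketch below is illustrative only: the function name is hypothetical, and the two ratios are the round numbers from the interview, not measured data.

```python
def perf_per_watt_gain(perf_ratio: float, power_ratio: float) -> float:
    """Relative performance per watt of a chip versus a baseline.

    perf_ratio:  chip performance divided by baseline performance
    power_ratio: chip power draw divided by baseline power draw
    """
    if perf_ratio <= 0 or power_ratio <= 0:
        raise ValueError("ratios must be positive")
    return perf_ratio / power_ratio


# Round figures taken from the interview: 2x performance at half the power.
gain = perf_per_watt_gain(perf_ratio=2.0, power_ratio=0.5)
print(f"{gain:.1f}x performance per watt")
```

Under those two ratios, the claim amounts to roughly a fourfold improvement in performance per watt, which is the metric that matters most when rack power is the limiting factor.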

AI has become the main topic of conversation this year in the semiconductor industry and beyond. Will this change in 2024, in your opinion? How do you view this market?

We strongly believe AI will continue to be the main topic of conversation. But we do think the conversation will shift – and it’s already beginning to.

In 2024, many companies working on AI solutions will move from the initial training of neural networks to deploying them, also known as AI inference. Because AI inference can require 10 times more aggregate computing power than training, the ability to deploy AI at scale will become increasingly important. Achieving this required scale will be limited by performance, cost and availability, so organizations will look for alternatives to GPUs as they enter this next phase. CPUs, and particularly low-power, high-performance CPUs like those Ampere provides, will become an increasingly attractive choice given their ability to enable more efficient and cost-effective execution of AI inference models. GPUs will still be important for certain aspects of AI, but we expect to see the hype begin to settle down.

Secondly, sustainability and energy efficiency will become even more important next year in the context of AI. Today, data centers often struggle to cover their energy requirements. Increased AI usage will lead to even more demand for computing power in 2024, and for some AI workloads, that can require up to 20x more power. Because of this, sustainability and efficiency will become challenges for expansion. Data center operators will heavily prioritize efficiency in the new year to avoid jeopardizing growth.

How is Ampere addressing this new AI market opportunity with its products?

For many AI applications, GPUs are compute overkill, consuming much more power and money than needed. This is true for most inferencing, and especially so when running AI workloads in conjunction with other workloads like databases or web services. In these instances, replacing the GPU with a CPU saves power, space, and cost.

We are already seeing this coming to life for real-world workloads, and the benefit of using Ampere processors is strong. For example, if you run the popular generative AI model Whisper on our 128-core Altra CPU versus Nvidia’s A10 GPU card, we consume 3.6 times less power per inference. Compared to Nvidia Tesla T4 cards, we consume 5.6 times less.

Because of this, we have been seeing a substantial increase in demand for Ampere processors for AI inferencing, and we expect this to become a huge market for our products. Just a few weeks ago, Scaleway – one of Europe’s leading cloud providers – announced the upcoming general availability of new AI inference instances powered by Ampere. Additionally, over the last six months, we have seen a sevenfold usage increase in our AI software library. All of this speaks to the growing adoption of our products as a high-performance, low-power alternative for AI inferencing.
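For readers who want to reproduce a "times less power per inference" comparison like the one cited for Whisper above, one common approach is to divide average power draw by inference throughput to get energy per inference, then take the ratio between two devices. The sketch below assumes that definition; the interview does not say how Ampere measured its figures. The function name, device labels, and all numbers are hypothetical placeholders, not Ampere or Nvidia data.

```python
def energy_per_inference(avg_power_watts: float, inferences_per_second: float) -> float:
    """Joules per inference: average power (W = J/s) divided by throughput (1/s)."""
    if avg_power_watts <= 0 or inferences_per_second <= 0:
        raise ValueError("power and throughput must be positive")
    return avg_power_watts / inferences_per_second


# Hypothetical placeholder measurements for two unnamed devices.
device_a_joules = energy_per_inference(avg_power_watts=100.0, inferences_per_second=10.0)
device_b_joules = energy_per_inference(avg_power_watts=150.0, inferences_per_second=5.0)

# How many times less energy device A uses per inference than device B.
ratio = device_b_joules / device_a_joules
print(f"Device A: {device_a_joules:.1f} J, Device B: {device_b_joules:.1f} J, ratio {ratio:.1f}x")
```

Note that a chip with lower throughput can still come out ahead on this metric if its power draw is proportionally lower, which is the essence of the CPU-for-inference argument made here.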