Combining CDNA 3 and Zen 4 for the MI300 Data Center APU in 2023

In addition to their Zen CPU and RDNA client GPU architecture updates, AMD is also updating the roadmap for their CDNA server GPU architecture and related Instinct products this afternoon. And while the CPUs and client GPUs are arguably on a pretty straightforward path for the next two years, AMD is planning to shake up its server GPU offerings in a big way.

Let’s start first with AMD’s server GPU architectural roadmap. Following on from AMD’s current CDNA 2 architecture, which is used in the MI200 series of Instinct accelerators, will be CDNA 3. And unlike AMD’s other roadmaps, the company isn’t offering a two-year view here. Instead, the server GPU roadmap only goes out one year – to 2023 – with AMD’s next server GPU architecture launching next year.

Our first look at CDNA 3 comes with quite a bit of detail. With a 2023 launch, AMD isn’t withholding as much information as they are elsewhere. As a result, they’re revealing information about everything from the architecture itself to some basic details about one of the products CDNA 3 will end up in: a data center APU made of CPU and GPU chiplets.

Looking at things from a high level, GPUs based on the CDNA 3 architecture will be built on a 5nm process. And like the CDNA 2-based MI200 accelerators before them, they’ll rely on chiplets to combine memory, cache, and processor cores into a single package. Specifically, AMD calls this a “3D chiplet” design, meaning that chiplets will not only sit side by side on a substrate, but some chips will be stacked on top of other chiplets as well, à la the V-Cache on AMD’s Zen 3 CPUs.

That comparison is particularly apt here, as AMD is going to introduce its Infinity Cache technology into the CDNA 3 architecture. And, like the V-Cache example above, judging from AMD’s artwork, they’re going to stack the cache and the logic as separate dies, rather than integrating the cache into a monolithic die as they do on their client GPUs. Because of this stacking, the Infinity Cache chiplets for CDNA 3 will sit below the processor chiplets; AMD is seemingly placing the very power-hungry logic chiplets at the top of the stack so that they can be effectively cooled.

CDNA 3 will also be AMD’s first architecture to use the 4th generation Infinity Architecture. We’ll talk more about that in a separate article, but the short version is that, for GPUs, IA4 goes hand in hand with AMD’s chiplet innovations. In particular, it allows 2.5D/3D-stacked chips to be used with IA, letting all of the chiplets in a package share a unified and fully coherent memory subsystem. This is a major step beyond IA3 and the current MI200 accelerators, which, while providing memory coherency, do not have a unified memory address space. So whereas MI200 accelerators essentially function as two GPUs on a single package, IA4 will allow CDNA 3/MI300 accelerators to behave as a single chip, despite the disaggregated nature of the chiplets.

AMD’s diagrams also show HBM memory being used here once again. AMD isn’t specifying which version of HBM, but given the 2023 time frame, it’s a very safe bet that it will be HBM3.

Architecturally, AMD will also be taking several steps to improve the AI performance of their powerful accelerator. According to the company, they are adding support for new mixed-precision math formats. And while it’s not being explicitly stated today, AMD’s claimed >5x improvement in performance-per-watt on AI workloads strongly implies that AMD is significantly reworking and expanding their matrix cores for CDNA 3, as 5x is far more than process improvements alone can deliver.
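To make the idea of mixed precision concrete, here is a minimal, CPU-only C++ sketch of the technique that matrix cores implement in hardware: inputs held at low precision, with the products accumulated at higher precision. AMD hasn’t said which new formats CDNA 3 adds, so this example simply truncates FP32 values to BF16 precision as a stand-in for a low-precision input format; the to_bf16 and dot_mixed helpers are illustrative names, not anything from AMD’s toolchain.

```cpp
#include <cstdint>
#include <cstring>
#include <cstdio>

// Truncate a float to bfloat16 precision: keep the sign, the exponent,
// and the top 7 mantissa bits; drop the lower 16 bits of the mantissa.
float to_bf16(float x) {
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits &= 0xFFFF0000u;
    std::memcpy(&x, &bits, sizeof bits);
    return x;
}

// Dot product with BF16-precision inputs and an FP32 accumulator,
// mirroring what a matrix core does per output element: cheap,
// low-precision multiplies feeding a high-precision running sum.
float dot_mixed(const float* a, const float* b, int n) {
    float acc = 0.0f;  // high-precision accumulator
    for (int i = 0; i < n; ++i)
        acc += to_bf16(a[i]) * to_bf16(b[i]);
    return acc;
}

int main() {
    float a[4] = {1.001f, 2.002f, 3.003f, 4.004f};
    float b[4] = {0.5f, 0.25f, 0.125f, 0.0625f};
    printf("mixed-precision dot = %f\n", dot_mixed(a, b, 4));
    return 0;
}
```

The accumulator staying in FP32 is the key point: the individual inputs sacrifice precision (and storage/bandwidth), but the running sum does not, which is why mixed-precision formats can deliver large throughput gains with acceptable accuracy loss for AI workloads.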

MI300: AMD’s First Disaggregated Data Center APU

But AMD isn’t stopping at just building a bigger GPU, nor are they unifying the memory pool of a multi-chiplet architecture just so GPUs can work from shared memory. Instead, AMD’s ambitions are much greater than that. With both powerful CPU and GPU cores at their disposal, AMD is taking the next step in integration and building a disaggregated data center APU – a chip that combines CPU and GPU cores onto a single package.

The data center APU, currently codenamed MI300, is something AMD has been working toward for a while. With MI200 and Infinity Architecture 3 already allowing AMD CPUs and GPUs to work together with a coherent memory architecture, the next step has long been to bring the CPU and GPU even closer together, both in terms of packaging and memory architecture.

The unified memory architecture in particular offers MI300 a number of major advantages. From a performance standpoint, it eliminates redundant memory copies; processors no longer need to copy data over to their own dedicated memory pool in order to access or modify it. The unified memory pool also means there is no need for a second set of memory chips – in this case, the DRAM that would otherwise be attached to the CPU.
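As a rough illustration of why eliminating those staging copies matters, here is a hedged sketch using AMD’s existing HIP API. The first half shows the traditional discrete-GPU flow with its two explicit copies; the second half shows a unified-memory flow, with hipMallocManaged standing in for whatever mechanism MI300 actually exposes, since AMD hasn’t detailed its programming model.

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>
#include <cstdlib>

// Trivial kernel: scale each element in place.
__global__ void scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Traditional discrete-GPU flow: the GPU has its own memory pool,
    // so data must be staged into it and copied back afterwards.
    float* host = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) host[i] = 1.0f;
    float* dev;
    hipMalloc((void**)&dev, bytes);
    hipMemcpy(dev, host, bytes, hipMemcpyHostToDevice);  // staging copy #1
    scale<<<(n + 255) / 256, 256>>>(dev, 2.0f, n);
    hipMemcpy(host, dev, bytes, hipMemcpyDeviceToHost);  // staging copy #2
    hipFree(dev);

    // Unified flow: with one coherent memory pool, the CPU and GPU touch
    // the same allocation, so neither staging copy is needed.
    float* shared;
    hipMallocManaged((void**)&shared, bytes);
    for (int i = 0; i < n; ++i) shared[i] = 1.0f;     // CPU writes
    scale<<<(n + 255) / 256, 256>>>(shared, 2.0f, n); // GPU works in place
    hipDeviceSynchronize();
    printf("shared[0] = %f\n", shared[0]);            // CPU reads the result directly
    hipFree(shared);
    free(host);
    return 0;
}
```

With a physically unified memory pool like MI300’s, the second pattern is not just fewer lines of code: the data never crosses a package-to-package link at all, which is where both the bandwidth and latency savings come from.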

MI300 will combine CDNA 3 GPU chiplets and Zen 4 CPU chiplets onto a single processor package. Both pools of processors will, in turn, share the on-package HBM memory – and presumably the Infinity Cache as well.

As mentioned before, AMD will be heavily relying on chiplets to achieve this. The CPU cores, GPU cores, Infinity Cache, and HBM are all different chiplets, some of which are stacked on top of each other. So this will be a chip unlike anything else AMD has built before, and it will be AMD’s biggest effort yet to integrate chiplets into their product designs.

Meanwhile, AMD is being very explicit about their pursuit of leadership in memory bandwidth and application latency. If they can pull it off, it would be a major achievement for the company. That said, they’re not the first company to pair HBM with CPU cores – Intel’s HBM-equipped Sapphire Rapids Xeon CPUs will claim that feat first – so it will be interesting to see how MI300 compares in that regard.

On the more specific matter of AI performance, AMD claims that the APU will deliver better than 8x the training performance of the MI250X accelerator – further evidence that AMD is going to make some big improvements to their GPU matrix cores beyond what the MI200 series offers.

Overall, AMD’s server GPU trajectory is pretty similar to what we’ve seen Intel and NVIDIA announce in recent months. All three companies are building combined CPU+GPU products: NVIDIA with Grace Hopper (Grace + H100), Intel with Falcon Shores XPUs (mix-and-match CPU + GPU), and now AMD with MI300, using both CPU and GPU chiplets on a single package. In all three cases, these technologies aim to combine the best of CPUs with the best of GPUs for workloads that aren’t purely bound to either – and in AMD’s case, the company believes they have both the best CPUs and the best GPUs.

Expect to see a lot more on CDNA 3 and MI300 in the coming months.