India's Centre for Development of Advanced Computing (C-DAC) recently announced that it is working on a series of ARM-based CPUs, including the flagship AUM chip. Now, the organization has revealed the first details of its AUM CPU, which is aimed at the HPC segment.

India Readies C-DAC AUM, A Dual-Chiplet CPU Housing 96 ARM Cores, 96 GB HBM3, 128 PCIe Gen 5 Lanes & 320W TDP.

C-DAC said that it is working on a range of options for domestic applications, scaling from chips that power smart devices, IoT, and AR/VR up to HPC and data center use. Its Vega CPU series, which is based on dual-core and quad-core designs, will target entry-level clients that require low-power, low-cost chips and will cover at least 10% of India's chip requirement. C-DAC will also prepare its octa-core chips within the next three years as a follow-up to the Dhruv and Dhanush Plus chips.

But that's not all. C-DAC is also working on a power-efficient HPC chip that will target large-scale workloads as part of the National Supercomputing Mission (NSM) program. This chip is going to be called the C-DAC AUM.

The C-DAC AUM CPU is based on the ARM Neoverse V1 core architecture, codenamed Zeus. There are a total of 96 cores on the AUM chip, but they are divided into two chiplets, each housing 48 V1 cores. Each chiplet has its own memory, I/O, C2C/D2D interconnect, cache, security, and MSCP sub-systems. The two A48Z-based chiplets are connected using a D2D chiplet interconnect on the same interposer. Each chip also carries 96 MB of L2 cache and 96 MB of system cache.

For memory, the C-DAC AUM CPU uses 64 GB of HBM3-5600 while also packing 96 GB of HBM3 memory on-die and 8-channel DDR5-5200 memory (scalable up to 16 channels, for up to 332.8 GB/s of total bandwidth).
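
The 332.8 GB/s DDR5 figure can be sanity-checked with simple peak-bandwidth arithmetic. The sketch below is not from C-DAC; the function name and the channel widths are assumptions made for illustration. It shows that the quoted number matches either 8 channels at 64 bits each or 16 channels at 32 bits each (DDR5 splits each 64-bit DIMM interface into two 32-bit sub-channels), which may be how the "16 channels" in the announcement should be read.

```python
def ddr_peak_bandwidth_gbs(transfer_rate_mts, channels, channel_width_bits):
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes).

    transfer_rate_mts  -- transfers per second in millions (5200 for DDR5-5200)
    channels           -- number of memory channels
    channel_width_bits -- data width of one channel (assumed: 64 or 32)
    """
    bytes_per_transfer = channel_width_bits / 8
    bytes_per_second = transfer_rate_mts * 1e6 * bytes_per_transfer * channels
    return bytes_per_second / 1e9


if __name__ == "__main__":
    # Assumption: 8 full-width (64-bit) DDR5-5200 channels.
    print(round(ddr_peak_bandwidth_gbs(5200, 8, 64), 1))
    # Assumption: 16 DDR5 sub-channels of 32 bits each.
    print(round(ddr_peak_bandwidth_gbs(5200, 16, 32), 1))
    # For comparison: 16 full-width channels would double the figure.
    print(round(ddr_peak_bandwidth_gbs(5200, 16, 64), 1))
```

Under these assumptions, the first two configurations both land on the quoted 332.8 GB/s, while 16 full-width channels would give twice that. These are theoretical peaks; sustained bandwidth is lower in practice.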

That's a triple-memory subsystem with on-die, interposer, and off-chip memory solutions. The CPU will carry 64/128 PCIe Gen 5 lanes with support for CXL and run on a platform that can house two of these chips. The CPU will be fabricated on the TSMC 5nm process node. Clock speeds are said to range between 3.0 and 3.5 GHz. A CPU-only node featuring the C-DAC AUM will deliver up to 10 TFLOPs of performance per node and 4.6+ TFLOPs per socket, and a dual-socket server design can support up to 4 industry-standard GPU accelerators.
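
The per-socket and per-node TFLOPs figures line up with a back-of-the-envelope peak FP64 estimate. The sketch below is an illustration, not C-DAC's own method: it assumes each Neoverse V1 core has two 256-bit SVE pipelines and that a fused multiply-add counts as two floating-point operations. The function and parameter names are hypothetical.

```python
def peak_fp64_tflops(cores, clock_ghz, sve_pipes=2, vector_bits=256):
    """Theoretical peak double-precision TFLOPs for one socket.

    Assumptions (not stated in the announcement):
      - sve_pipes vector pipelines per core, each vector_bits wide
      - one fused multiply-add per lane per cycle, counted as 2 FLOPs
    """
    lanes = vector_bits // 64                  # FP64 lanes per vector
    flops_per_cycle = sve_pipes * lanes * 2    # FMA = multiply + add
    gflops = cores * flops_per_cycle * clock_ghz
    return gflops / 1000


if __name__ == "__main__":
    low = peak_fp64_tflops(96, 3.0)    # 96 cores at the low end of the clock range
    high = peak_fp64_tflops(96, 3.5)   # 96 cores at the high end
    print(f"per socket: {low:.3f} to {high:.3f} TFLOPs")
    print(f"dual-socket node: {2 * low:.3f} to {2 * high:.3f} TFLOPs")
```

At 3.0 GHz, the estimate comes out at roughly 4.6 TFLOPs per socket, and a two-socket node falls around the 10 TFLOPs mark across the stated clock range, which is consistent with the figures quoted above. Real workloads rarely reach theoretical peak.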

C-DAC will also prepare a set of HPC system software and development tools to leverage the full potential of its hardware. It expects to achieve 64 PetaFlops of compute power within the country by the end of 2024. The AUM chip is expected to hit shelves in 2023-2024.