The Microsoft Azure Maia AI Accelerator is the first chip designed by Microsoft for training large language models and inference in the Microsoft Cloud. (Image by Microsoft).
- Microsoft introduces two new chips designed to support AI infrastructure.
- The Microsoft Azure Maia AI Accelerator will be optimized for AI tasks and generative AI.
- The Microsoft Azure Cobalt CPU will be an Arm-based processor that is tailored to run general-purpose computing workloads.
Faced with the demand for AI workloads, Microsoft has taken matters into its own hands. As companies look for better AI infrastructure to support the development of AI use cases, delivering that infrastructure has become a challenge for tech companies around the world.
More AI use cases simply mean more computing power is needed, and more computing power means more data centers and more chips to handle those workloads. The question now is whether there are enough chips to do it all.
While chip shortages are usually cited as the reason progress in artificial intelligence stalls, there is also the rising cost of chips, as well as the challenge of making sure everything works together with minimal complexity. That includes ensuring cooling systems can cope with the heat data centers generate, which, as chips grow more complex, is no longer a certainty.
For Microsoft, AI will be key to the company's direction going forward, especially in the areas where it plans to develop solutions for customers. As such, Microsoft unveiled two of its own custom chips and integrated systems at its Ignite event. The Microsoft Azure Maia AI Accelerator will be optimized for artificial intelligence and generative AI tasks, while the Microsoft Azure Cobalt CPU will be an Arm-based processor tailored to run general-purpose computing workloads on the Microsoft Cloud.

The Microsoft Azure Maia AI Accelerator is the first chip designed by Microsoft for training large language models and inference in the Microsoft Cloud.
“Cobalt is the first CPU we’ve designed specifically for the Microsoft Cloud, and this 64-bit, 128-core ARM-based chip is the fastest of any cloud provider. It already runs parts of Microsoft Teams and Azure Communication Services, as well as Azure SQL. And next year we will make this available to users,” said Satya Nadella, CEO of Microsoft, in his keynote address at the event.
“Starting with the launch of the Maia 100, designed to run cloud AI workloads like LLM training and inference, this chip is manufactured using a five-nanometer process and has 105 billion transistors, making it one of the largest chips that can be made with current technology. And it goes beyond the chip, because we designed the Maia 100 as an end-to-end rack for AI,” added Nadella.
Expected to arrive in Microsoft data centers early next year, the chips will initially run the company’s services, such as Microsoft Copilot and the Azure OpenAI Service. They will then join an ever-expanding range of products from industry partners to help meet the growing demand for efficient, scalable and sustainable computing power, and the needs of customers looking to take advantage of the latest breakthroughs in cloud and artificial intelligence.

The Microsoft Azure Cobalt CPU is the first CPU that Microsoft has designed for the Microsoft Cloud. (Image by Microsoft)
Microsoft Azure Maia AI Accelerator and Microsoft Azure Cobalt CPU
Since the new Maia 100 AI Accelerator is expected to power some of the largest internal AI workloads running on Microsoft Azure, it made sense for OpenAI to provide feedback during its development.
According to Sam Altman, CEO of OpenAI, the company worked with Microsoft to design and test the new chip with its models. For Altman, Azure’s end-to-end AI architecture, now optimized down to the silicon with Maia, paves the way for training more capable models and makes those models cheaper for users.
Looking at the hardware stack, Brian Harry, a Microsoft technical fellow who leads the Azure Maia team, explained that vertical integration, that is, aligning chip design with a larger AI infrastructure built with Microsoft’s workloads in mind, can deliver big gains in performance and efficiency.
Meanwhile, Wes McCullough, corporate vice president of hardware product development at Microsoft, noted that the Cobalt 100 CPU is built on the Arm architecture, an energy-efficient chip design, and is optimized to deliver greater efficiency and performance in cloud-native offerings. McCullough added that the choice of Arm technology is a key element of Microsoft’s sustainability goals: the company aims to optimize performance per watt across its data centers, which essentially means getting more computing power for every unit of energy consumed.
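As a way to make “performance per watt” concrete, the toy calculation below compares two server configurations; all throughput and power figures are hypothetical, not Microsoft measurements, and are only meant to show how the metric is read.

```python
# Toy illustration of performance per watt; every figure here is hypothetical,
# not a measurement of Cobalt 100 or any other Microsoft hardware.
def perf_per_watt(throughput_ops_per_sec: float, power_watts: float) -> float:
    """Computing delivered per unit of energy drawn (operations per second per watt)."""
    return throughput_ops_per_sec / power_watts

# Two hypothetical servers: the first delivers more raw throughput,
# but the more energy-efficient design does more work per watt.
baseline = perf_per_watt(throughput_ops_per_sec=1.0e12, power_watts=500)
efficient = perf_per_watt(throughput_ops_per_sec=9.0e11, power_watts=350)
print(f"baseline:  {baseline:.2e} ops/s per W")
print(f"efficient: {efficient:.2e} ops/s per W")
```

In this made-up comparison the second configuration wins on work done per unit of energy, which is the sense in which an efficiency-oriented design can improve a data center’s overall footprint.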

A custom-built rack for the Maia 100 AI Accelerator and “sidekick” that cools the chips at Microsoft’s lab in Redmond, Washington (Image by Microsoft).
Partnership with Nvidia
In addition to its new chips, Microsoft is also continuing to build out its AI infrastructure in close collaboration with other silicon vendors and industry leaders, such as Nvidia and AMD. Azure is working closely with Nvidia to offer Nvidia H100 Tensor Core GPU-based virtual machines for midrange-to-large AI workloads, including Azure Confidential VMs.
The NC H100 v5 series of virtual machines (VMs), now available in public preview, is the latest addition to Microsoft’s portfolio of infrastructure purpose-built for high-performance computing (HPC) and AI workloads. The new Azure NC H100 v5 series is powered by Nvidia Hopper-generation H100 NVL 94GB PCIe Tensor Core GPUs and 4th-generation AMD EPYC Genoa processors, providing strong performance and flexibility for a wide range of AI and HPC applications.

Microsoft chairman and CEO Satya Nadella and Nvidia founder and CEO Jensen Huang at Microsoft Ignite 2023 (Image by Microsoft).
Azure NC H100 v5 VMs are designed to accelerate a wide range of AI and HPC workloads, including:
- Midrange model training and generative inference: unlike the massively scalable ND-series, which is powered by the same Nvidia Hopper technology, the NC-series is optimized for training and inference of AI models that require smaller datasets and less GPU parallelism. These include generative AI models such as DALL-E, which creates original images from text prompts, as well as traditional discriminative AI models for image classification, object detection, and natural language processing, which focus on predictive accuracy rather than generating new data.
- Traditional HPC modeling and simulation workloads: Azure NC H100 v5 VMs are also an ideal platform for running a variety of HPC workloads that require high compute, memory and GPU acceleration. These include scientific workloads such as computational fluid dynamics (CFD), molecular dynamics, quantum chemistry, weather forecasting and climate modeling, and financial analytics.
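For teams that want to check whether NC H100 v5 sizes are visible from their own subscription, the sketch below uses the Azure Python SDK (azure-identity and azure-mgmt-compute) to list GPU VM SKUs. It is a minimal illustration, not an official reference: the region, the "H100" name filter, and the capability keys queried are assumptions that may need adjusting to where and how the preview is offered.

```python
# Sketch: listing Azure VM SKUs to look for H100-based sizes.
# Assumes azure-identity and azure-mgmt-compute are installed and that the
# signed-in credential has Reader access to the subscription.
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

subscription_id = "<your-subscription-id>"  # placeholder
client = ComputeManagementClient(DefaultAzureCredential(), subscription_id)

# Restrict the query to one region; adjust to wherever the preview is available.
for sku in client.resource_skus.list(filter="location eq 'eastus'"):
    if sku.resource_type == "virtualMachines" and "H100" in (sku.name or ""):
        # Capabilities typically include vCPU count, memory, and GPU count per size.
        caps = {c.name: c.value for c in (sku.capabilities or [])}
        print(sku.name, caps.get("vCPUs"), caps.get("MemoryGB"), caps.get("GPUs"))
```

Running it simply prints any H100-based VM sizes the subscription can see in that region, along with their vCPU, memory and GPU figures as reported by the SKU metadata.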
Nvidia also introduced an AI foundry service to power the development and tuning of custom generative AI applications for enterprises and startups deployed on Microsoft Azure.
The Nvidia AI foundry service brings together three elements: a collection of Nvidia AI foundation models, the Nvidia NeMo framework and tools, and the Nvidia DGX Cloud AI supercomputing service. Together, these give enterprises an end-to-end solution for creating custom generative AI models. Companies can then deploy their custom models with Nvidia AI Enterprise software to power generative AI applications, including intelligent search, summarization and content generation.
“Enterprises need custom models to perform specialized skills trained on their company’s proprietary DNA – their data,” said Jensen Huang, founder and CEO of Nvidia. “Nvidia’s AI foundry service combines our generative AI model technologies, LLM training expertise and a massive AI factory. We’ve built this into Microsoft Azure so that businesses around the world can connect their custom model with Microsoft’s world-leading cloud services.”