Nvidia introduces its AI foundry service on Microsoft Azure, featuring the latest Nemotron-3 8B models.

Nvidia is deepening its collaboration with Microsoft. At the Ignite conference, hosted by the Satya Nadella-led tech giant, Nvidia unveiled an AI foundry service aimed at helping enterprises and startups build custom AI applications on the Azure cloud, including applications that tap enterprise data through retrieval-augmented generation (RAG).
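
For readers unfamiliar with the pattern, RAG retrieves the enterprise documents most relevant to a query and feeds them to the model as grounding context. The sketch below is a minimal, illustrative version of that loop (toy TF-IDF retrieval standing in for a production embedding model and vector database), not Nvidia's implementation:

```python
# Minimal sketch of the retrieval-augmented generation (RAG) loop --
# illustrative only, not Nvidia's implementation. TF-IDF retrieval here
# stands in for a production embedding model and vector database.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Q3 revenue grew 12% year over year, driven by cloud services.",
    "The support portal now requires single sign-on for all employees.",
    "Warranty claims must be filed within 30 days of purchase.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k enterprise documents most similar to the query."""
    vectorizer = TfidfVectorizer().fit(documents)
    scores = cosine_similarity(
        vectorizer.transform([query]), vectorizer.transform(documents)
    )[0]
    ranked = sorted(zip(scores, documents), reverse=True)
    return [doc for _, doc in ranked[:k]]

query = "How long do I have to file a warranty claim?"
context = "\n".join(retrieve(query))
# The grounded prompt is what would actually be sent to the custom LLM.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)
```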

Jensen Huang, Nvidia’s founder and CEO, highlighted, “Nvidia’s AI foundry service combines our generative AI model technologies, LLM training expertise, and a massive AI factory. We built this service on Microsoft Azure, enabling enterprises worldwide to seamlessly integrate their custom models with Microsoft’s top-tier cloud services.”

Alongside the service, Nvidia introduced new 8-billion-parameter Nemotron-3 models, which are part of the foundry offering, and announced plans to bring its next-generation H200 GPU to Microsoft Azure in the coming months.

So, how will the AI foundry service benefit Azure users? With Nvidia’s AI foundry service on Azure, cloud-based enterprises gain access to all the essential components needed to build custom, business-focused generative AI applications in one place: Nvidia’s AI foundation models, the NeMo framework, and the Nvidia DGX Cloud supercomputing service.

Manuvir Das, the VP of enterprise computing at Nvidia, emphasized, “For the first time, this entire process, from hardware to software, is available end to end on Microsoft Azure. Any customer can come and execute the entire enterprise generative AI workflow with Nvidia on Azure. They can procure the necessary technology components right within Azure. Simply put, it’s a collaborative effort between Nvidia and Microsoft.”

To provide enterprises with a wide range of foundation models for use with the foundry service in Azure environments, Nvidia is introducing a new family of Nemotron-3 8B models. These models support the creation of advanced enterprise chat and Q&A applications for sectors like healthcare, telecommunications, and financial services. They come with multilingual capabilities and will be accessible through the Azure AI model catalog, Hugging Face, and the Nvidia NGC catalog.
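
If the Hugging Face listing follows the usual transformers conventions, loading one of these checkpoints would look roughly like the sketch below. The repo id is a placeholder, and checkpoints released in NeMo format may require Nvidia's NeMo toolkit rather than the transformers library shown here:

```python
# Hypothetical loading sketch -- the repo id below is a placeholder; check
# the Azure AI model catalog, Hugging Face, or NGC for the actual listing.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/nemotron-3-8b-chat"  # placeholder, not a confirmed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Summarize our refund policy:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```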

Among the other foundation models available in the Nvidia catalog are Llama 2 (also coming to the Azure AI catalog), Stable Diffusion XL, and Mistral 7B.

Once users have chosen a model, they can move on to training and deploying custom applications with Nvidia DGX Cloud and AI Enterprise software, both available through the Azure marketplace. DGX Cloud provides customers with scalable instances and includes the AI Enterprise toolkit, which bundles the NeMo framework and Nvidia Triton Inference Server to speed up LLM customization on Azure’s enterprise-grade AI service.
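
Once a customized model is serving behind Triton, client code queries it over HTTP. The following is a rough client-side sketch using the tritonclient package; the server URL, model name, and tensor names ("text_input"/"text_output") are assumptions that depend entirely on the deployed model's configuration:

```python
# Client-side sketch for querying a model served by Triton Inference Server.
# The URL, model name, and tensor names are assumptions that depend on the
# deployed model's configuration.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Triton passes strings as BYTES tensors.
text = np.array([[b"What does our warranty cover?"]], dtype=object)
infer_input = httpclient.InferInput("text_input", [1, 1], "BYTES")
infer_input.set_data_from_numpy(text)

result = client.infer(model_name="custom_llm", inputs=[infer_input])
print(result.as_numpy("text_output"))
```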

Nvidia noted that the toolkit is also available as a separate product on the marketplace, letting users apply their existing Microsoft Azure Consumption Commitment credits to expedite model development.

Notably, Nvidia recently announced a similar partnership with Oracle, giving eligible enterprises the option to purchase these tools directly from the Oracle Cloud marketplace to train models and deploy them on Oracle Cloud Infrastructure (OCI).

Early users of the foundry service on Azure include major software companies such as SAP, Amdocs, and Getty Images, which are building and testing custom AI applications for a range of use cases.

Beyond the generative AI service, Microsoft and Nvidia have expanded their partnership to cover the chipmaker’s latest hardware. Microsoft unveiled new NC H100 v5 virtual machines for Azure, the industry’s first cloud instances to feature a pair of PCIe-based H100 GPUs connected via Nvidia NVLink. These machines provide nearly four petaflops of AI compute and 188GB of faster HBM3 memory (94GB per GPU).

The Nvidia H100 NVL GPU delivers up to 12 times the GPT-3 175B inference performance of the previous generation, making it well suited to both inference and mainstream training workloads.

Furthermore, Microsoft plans to add the new Nvidia H200 Tensor Core GPU to its Azure fleet next year. The H200 offers 141GB of HBM3e memory (roughly 1.8 times its predecessor’s capacity) and 4.8 TB/s of peak memory bandwidth (about a 1.4-times increase). It is built for large AI workloads, including generative AI training and inference, and, alongside Microsoft’s new Maia 100 AI accelerator, gives Azure users multiple options for running them.
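
Those multipliers check out against the H100’s commonly cited specs of 80GB of HBM3 and 3.35 TB/s of bandwidth (baseline figures assumed here, not part of the announcement):

```python
# Sanity check of the quoted multipliers against the H100's commonly cited
# specs (80GB HBM3, 3.35 TB/s) -- baseline figures assumed, not from the
# announcement.
h100_mem_gb, h100_bw_tbs = 80, 3.35
h200_mem_gb, h200_bw_tbs = 141, 4.8

print(f"memory capacity: {h200_mem_gb / h100_mem_gb:.2f}x")   # ~1.76x -> "1.8 times"
print(f"memory bandwidth: {h200_bw_tbs / h100_bw_tbs:.2f}x")  # ~1.43x -> "1.4 times"
```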

To accelerate LLM work on Windows devices, Nvidia announced several updates, including one for TensorRT-LLM for Windows. The update adds support for new large language models such as Mistral 7B and Nemotron-3 8B and delivers up to five times faster inference, making these models run more smoothly on desktops and laptops with GeForce RTX 30 Series and 40 Series GPUs that have at least 8GB of VRAM.

Nvidia also mentioned that TensorRT-LLM for Windows will be compatible with OpenAI’s Chat API through a new wrapper, enabling hundreds of developer projects and applications to run locally on a Windows 11 PC with RTX, rather than relying on cloud-based infrastructure.
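
In practice, OpenAI-compatible wrappers let existing Chat API code switch to a local backend simply by pointing the client at a different endpoint. A minimal sketch, assuming the wrapper exposes a local /v1 endpoint (the port and model name below are placeholders, not documented values):

```python
# Minimal sketch, assuming the TensorRT-LLM wrapper exposes an
# OpenAI-compatible endpoint locally. The base_url, port, and model name
# are placeholders, not documented values.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # local wrapper endpoint (assumed)
    api_key="not-needed-locally",         # no cloud key required for a local server
)

response = client.chat.completions.create(
    model="local-llm",  # placeholder model name
    messages=[{"role": "user", "content": "Draft a short release note."}],
)
print(response.choices[0].message.content)
```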
