Talks > 04/06/2025 Wolfgang De Salvador & Paul Edwards

Running AI training at scale: bridging the gap between traditional HPC infrastructure and cloud-native solutions

AI training workloads require infrastructure that mirrors what traditional high-performance computing (HPC) has demanded for decades: high-performance compute nodes, high-speed, low-latency interconnects, and fast, low-latency access to a unified storage namespace.

As a result, AI training tasks are still largely performed today using orchestration, compute, and storage paradigms similar to those found in traditional supercomputing environments.

Out of the box, cloud computing makes it possible to run AI training workflows with the same standard set of tools, schedulers, and methodologies established in on-premises systems.

Simultaneously, it offers opportunities to modernize and enhance these workflows, leading to more efficient resource utilization across all infrastructure layers, including compute and storage.

This can be achieved through technologies such as containerization, PaaS offerings, and object storage.

In this session, the audience will receive an overview of how to transition AI workloads from traditional HPC infrastructure to a cloud-native approach using a phased methodology.

The discussion will highlight the benefits of running AI workloads in the cloud and suggest how a non-disruptive approach can add incremental value and modernization to classical training workflows.

