GPUs are widely used to accelerate scientific applications, but their adoption in HPC clusters presents several drawbacks. First, in addition to increasing acquisition costs, using accelerators also increases maintenance and space costs. Second, energy consumption grows. Third, the GPUs in a cluster may present a low utilization rate. Virtualizing the GPUs of the cluster is therefore an appealing strategy for dealing with all of these drawbacks at once: cluster throughput is increased while costs and energy consumption are reduced.
In this talk the remote GPU virtualization technique will be presented along with its benefits. The talk will also introduce one of the frameworks that implement this virtualization mechanism: the rCUDA middleware. By using the rCUDA framework over a high-performance interconnect such as InfiniBand, the overhead of remote GPU virtualization is reduced to negligible values, with the net result that local and remote GPUs deliver similar performance. The rCUDA framework will be used as a case study to show that the remote GPU virtualization mechanism provides many benefits to clusters, such as doubling cluster throughput (in jobs/hour), reducing overall energy consumption by more than 40%, providing a flexible way of attaching GPUs to virtual machines in a cloud computing facility, and making a large number of GPUs available to a single-node application.
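A key property of remote GPU virtualization is transparency: the application is an unmodified CUDA binary, and the middleware redirects its CUDA API calls to a GPU server over the network. As a rough configuration sketch only (the exact library paths, environment variable names, and port are assumptions here, not taken from the abstract; rCUDA's own user guide is the authoritative reference), client-side usage follows this general pattern:

```shell
# Hypothetical sketch: run an unmodified CUDA application against a remote GPU.
# Assumed install path and variable names -- consult the rCUDA user guide.

# 1. Point the dynamic linker at the rCUDA client library, which replaces
#    the local CUDA runtime and forwards API calls over the interconnect.
export LD_LIBRARY_PATH=/opt/rCUDA/lib:$LD_LIBRARY_PATH

# 2. Tell the client how many remote GPUs to expose and where they live
#    (variable names assumed; gpu-server01 is a placeholder hostname).
export RCUDA_DEVICE_COUNT=1
export RCUDA_DEVICE_0=gpu-server01:8308

# 3. Launch the application exactly as before -- no source changes,
#    no recompilation of the CUDA code is required.
./my_cuda_app
```

The point of the sketch is the last step: because virtualization happens at the CUDA API level, the same binary runs whether the GPU is local or remote, which is what allows a cluster scheduler to pool and share GPUs across nodes.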