Opencl benchmark ucsd

In the age of ever-increasing data volumes, the overhead of data transfers is a major inhibitor of further performance improvements on many levels. In heterogeneous compute architectures, the overhead of transferring data (e.g., between host and graphics processing unit (GPU) memory) can still have a major impact on overall performance, even when the latest state-of-the-art interconnection technologies are used, such as NVLink-2 on the intra-node level1,2 and InfiniBand EDR on the inter-node level.1 For many data-intensive applications, scaling out to multiple nodes is the most feasible strategy to satisfy their resource demands. Unfortunately, even though the latest network technologies have reached speeds of 100 Gbps and more, these high-end inter-node links are mostly employed by hyperscalers internally, as many of their IaaS offerings are limited to network bandwidths ranging between 10 and 25 Gbps.3 In the vast majority of other data centers, NIC port speeds of 10, 25, or 40 Gbps are also still the norm.4 The increased latencies and limited bandwidths of such infrastructures aggravate the performance penalty of data movement for scale-out GPU workloads.

Preceding efforts of the research community have identified compression as a viable method for improving data transfer efficiency for certain application domains.5,6 To work around the issue of insufficient compression throughput, preceding investigations have proposed the use of offline I/O link compression, where the payload for data transfers is available in a pre-compressed form. More recently, however, hardware-accelerated compression techniques are becoming available in an increasing number of computer architectures.7,8 Even software-based compression facilities have gained notable levels of compression throughput.9 Based on these observations, we hypothesize that on-the-fly I/O link compression can be used to improve data transfer efficiency and consequently the overall performance of data-intensive scale-out GPU workloads, as illustrated in Figure 1.

Figure 1: Compared to uncompressed data transfers (left), on-the-fly I/O link compression may increase the effective bandwidth (right).
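
The reasoning behind Figure 1 can be made concrete with a back-of-the-envelope model: in a fully pipelined compress-transfer-decompress chain, the slowest stage dominates once the pipeline is filled, so compression pays off whenever shipping the smaller payload beats the raw link time and neither codec becomes the bottleneck. The sketch below is a minimal illustration of that arithmetic; the class name and all numbers (the 10 Gbps link, the 2:1 ratio, the codec throughputs) are assumptions chosen for the example, not figures from the article.

```java
/** Toy model of effective bandwidth under on-the-fly I/O link compression. */
public class EffectiveBandwidthModel {

    /** Seconds to move `bytes` over the link without compression. */
    static double uncompressedSeconds(double bytes, double linkBytesPerSec) {
        return bytes / linkBytesPerSec;
    }

    /**
     * Seconds for a fully pipelined compress-transfer-decompress chain:
     * once the pipeline is filled, the slowest stage dominates
     * (pipeline fill/drain time is ignored here).
     */
    static double compressedSeconds(double bytes, double linkBytesPerSec, double ratio,
                                    double compressBytesPerSec, double decompressBytesPerSec) {
        double linkTime = (bytes / ratio) / linkBytesPerSec;   // only compressed data crosses the link
        double compressTime = bytes / compressBytesPerSec;     // compressor consumes raw data
        double decompressTime = bytes / decompressBytesPerSec; // decompressor restores raw data
        return Math.max(linkTime, Math.max(compressTime, decompressTime));
    }

    public static void main(String[] args) {
        double gib = 1024.0 * 1024.0 * 1024.0;
        double payload = 4 * gib;      // assumed 4 GiB payload
        double link = 10e9 / 8;        // assumed 10 Gbps link, in bytes/s
        double ratio = 2.0;            // assumed compression ratio
        double compress = 3 * gib;     // assumed compressor throughput
        double decompress = 20 * gib;  // assumed (GPU) decompressor throughput

        double plain = uncompressedSeconds(payload, link);
        double piped = compressedSeconds(payload, link, ratio, compress, decompress);
        System.out.printf("uncompressed: %.2f s, compressed+pipelined: %.2f s (%.2fx)%n",
                plain, piped, plain / piped);
    }
}
```

With these assumed numbers the pipeline stays link-bound, so the gain approaches the compression ratio; with a slower compressor or a less compressible payload, the gain is capped by the codec or the ratio instead.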

To test this hypothesis, this article brings together two research projects that we have investigated independently of each other in the past. In the first project, we have argued on a theoretical level that on-the-fly I/O link compression based on the 842 compression algorithm might be employed to improve data transfer efficiency in heterogeneous computer architectures.10 Being a heavy-weight compression algorithm, 842 has been designed with main memory compression in mind and delivers decent compression ratios for various payloads. Even though preceding approaches have successfully employed on-the-fly I/O link compression using workload-specific lightweight compression techniques,11,12 we are not aware of any successful approaches that have employed heavy-weight compression techniques on commercially available off-the-shelf systems in order to improve data transfer efficiency across a wider range of workloads.13 To provide all components necessary for a practical evaluation of 842-based on-the-fly I/O link compression, we have then implemented GPU-based decompression facilities for the 842 algorithm.14 However, a practical evaluation based on scale-out GPU workloads has still been pending thus far.

In the second project, we have proposed CloudCL as a single-paradigm programming model for making scale-out GPU computing accessible to a wider audience.15 Essentially, CloudCL has extended the Aparapi framework with a dynamic, distributed job model and relies on the dOpenCL API forwarding library for scaling out job partials across various compute nodes. The improved ease of use of CloudCL has come at the price of limited scalability for data-intensive workloads, as the overhead of inter-node data transfers has diminished the benefits of additional scale-out compute resources.
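
For readers unfamiliar with the programming model that CloudCL builds on, the snippet below shows a plain Aparapi data-parallel kernel: the run() body is translated from JVM bytecode to OpenCL and executed on a GPU when one is available. This is a generic, illustrative example, not code from CloudCL, and it assumes the com.aparapi package coordinates used by current open-source Aparapi releases.

```java
import com.aparapi.Kernel;
import com.aparapi.Range;

/** Illustrative Aparapi kernel (SAXPY): y[i] = a * x[i] + y[i]. */
public class SaxpyExample {
    public static void main(String[] args) {
        final int n = 1 << 20;
        final float a = 2.0f;
        final float[] x = new float[n];
        final float[] y = new float[n];
        for (int i = 0; i < n; i++) { x[i] = i; y[i] = 1.0f; }

        // Aparapi translates this run() method to OpenCL; getGlobalId()
        // identifies the work-item, one per array element.
        Kernel kernel = new Kernel() {
            @Override
            public void run() {
                int i = getGlobalId();
                y[i] = a * x[i] + y[i];
            }
        };
        kernel.execute(Range.create(n));
        kernel.dispose();
        System.out.println("y[42] = " + y[42]);
    }
}
```

In CloudCL, kernels of this kind are wrapped into the dynamic, distributed job model, whose partials are forwarded to remote devices via dOpenCL; this is precisely why the volume of inter-node data transfers becomes the limiting factor for data-intensive workloads.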

With this article, we make the following main contributions:

  • We propose a pipelined architecture for integrating 842-based, transparent on-the-fly I/O link compression into scale-out GPU workflows (a sketch of the pipelining idea follows below).
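
As a rough, hypothetical illustration of the pipelining idea (a sketch under stated assumptions, not the architecture proposed in the article), the code below overlaps compression of the next chunk with the transfer of the previous one. Deflater merely stands in for an 842 codec, and send() is a placeholder for the actual link transfer; both are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.Deflater;

/** Hypothetical chunked pipeline: compress chunk i while chunk i-1 is in flight. */
public class CompressedTransferPipeline {

    static byte[] compress(byte[] chunk) {
        Deflater d = new Deflater(Deflater.BEST_SPEED);  // stand-in for an 842 compressor
        d.setInput(chunk);
        d.finish();
        byte[] out = new byte[chunk.length + 64];        // assumes the chunk is compressible
        int len = d.deflate(out);
        d.end();
        return Arrays.copyOf(out, len);
    }

    static void send(byte[] compressed) {
        // Placeholder for the actual link transfer (e.g., a socket or transport-buffer write).
    }

    public static void main(String[] args) throws Exception {
        ExecutorService compressor = Executors.newSingleThreadExecutor();
        byte[][] chunks = new byte[8][1 << 20];          // eight 1 MiB dummy chunks

        Future<byte[]> inFlight = compressor.submit(() -> compress(chunks[0]));
        for (int i = 1; i <= chunks.length; i++) {
            byte[] ready = inFlight.get();               // previous chunk finished compressing
            if (i < chunks.length) {
                final int next = i;
                inFlight = compressor.submit(() -> compress(chunks[next]));
            }
            send(ready);                                 // overlaps with compression of the next chunk
        }
        compressor.shutdown();
    }
}
```

A transparent integration would apply the same kind of overlap inside the transport layer, so that application code never has to deal with compressed buffers itself.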