Zero copy serialization using RMA in the HPX distributed task-based runtime

Document Type

Conference Proceeding

Publication Date

1-1-2017

Abstract

Increasing layers of abstraction between user code and the hardware on which it runs can lead to reduced performance unless careful attention is paid to how data is transferred between the layers of the software stack. For distributed HPC applications, those layers include the network interface, where data must be passed from the user's code and explicitly copied from one node to another. Message passing incurs relatively high latencies (compared to local copies) from the transmission of data across the network, as well as from the injection and retrieval of messages into and out of the network drivers, and these latencies may be exacerbated by unwanted copies of data being made at different levels of the stack. As memory bandwidth is becoming one of the limiting factors in the scalability of codes within a node, and messaging latency one of the limiting factors between nodes, it is important to reduce both memory transfers and latencies wherever possible. In this paper we show how the distributed asynchronous task-based runtime, HPX, has been developed to allow zero-copy transfers of argument data in user-defined remote function invocations, and we demonstrate the performance of our network layer in a state-of-the-art astrophysics code.

Publication Source (Journal or Book title)

Proceedings of the International Conference on WWW/Internet 2017 and Applied Computing 2017

First Page

151

Last Page

158
