UCX main features

High-level API features

  • Select either client/server connection establishment (similar to TCP) or a direct connection made by passing a remote address blob.

  • Support sharing resources between threads, or allocating dedicated resources per thread.

  • Event-driven or polling-driven progress.

  • Java and Python bindings.

  • Seamless handling of GPU memory.

Main APIs

  • Stream-oriented send/receive operations.

  • Tag-matched send/receive.

  • Remote memory access.

  • Remote atomic operations.
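
To make the tag-matched send/receive model above concrete, here is a minimal sketch in Python. It is a toy model, not the UCX API: the names ToyTagQueue, tags_match and FULL_MASK are hypothetical and exist only for illustration. It assumes 64-bit tags and a receive-side mask that selects which tag bits take part in the comparison.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

FULL_MASK = 0xFFFFFFFFFFFFFFFF  # hypothetical constant: compare all 64 tag bits


def tags_match(send_tag: int, recv_tag: int, mask: int = FULL_MASK) -> bool:
    """Tags match when they agree on every bit selected by the mask."""
    return (send_tag & mask) == (recv_tag & mask)


@dataclass
class ToyTagQueue:
    """Toy stand-in for an endpoint: stores messages until a receive matches them."""
    pending: List[Tuple[int, bytes]] = field(default_factory=list)

    def send(self, tag: int, payload: bytes) -> None:
        # A real library would transfer data; here the message is only queued.
        self.pending.append((tag, payload))

    def recv(self, tag: int, mask: int = FULL_MASK) -> Optional[bytes]:
        # Scan in arrival order so matching messages are delivered first-in, first-out.
        for i, (send_tag, payload) in enumerate(self.pending):
            if tags_match(send_tag, tag, mask):
                del self.pending[i]
                return payload
        return None  # nothing matched


queue = ToyTagQueue()
queue.send(0x00010005, b"from rank 1")
queue.send(0x00020005, b"from rank 2")

exact = queue.recv(0x00020005)                  # exact tag match
wildcard = queue.recv(0x00000005, mask=0xFFFF)  # compare only the low 16 bits
```

In this sketch the upper tag bits stand for a sender identifier and the lower 16 bits for a message type, so masking off the upper bits acts as a "receive from any sender" wildcard. That split is an illustrative convention, not something the library prescribes.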

Fabric support

  • RoCE

  • InfiniBand

  • TCP sockets

  • Shared memory (CMA, knem, xpmem, SysV, mmap)

  • Cray Gemini / Aries (ugni)

Platform support

  • Supported architectures: x86_64, Arm v8, Power.

  • Runs on virtual machines (using SR-IOV) and in containers (Docker, Singularity).

  • Can use either MLNX_OFED or inbox RDMA drivers.

  • Tested on major Linux distributions (RedHat/Ubuntu/SLES).

GPU support

  • CUDA (for NVIDIA GPUs)

  • ROCm (for AMD GPUs)

Protocols, Optimizations and Advanced Features

  • Automatic selection of best transports and devices.

  • Zero-copy with registration cache.

  • Scalable flow control algorithms.

  • Optimized memory pools.

  • Accelerated direct-verbs transport for Mellanox devices.

  • Pipeline protocols for GPU memory.

  • QoS and traffic isolation for RDMA transports.

  • Platform-specific (micro-architecture) optimizations, such as memcpy and memory barriers.

  • Multi-rail and RoCE link aggregation group support.

  • Support for bare-metal, container, and cloud environments.

  • Advanced protocols for transferring messages of different sizes.
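
As a rough illustration of choosing a protocol by message size, the sketch below picks between eager variants (short, buffered copy, zero-copy) and a rendezvous protocol. It is a toy model and not UCX code: the function choose_protocol and all three thresholds are hypothetical placeholders. In a real deployment the cut-over points depend on the transport, the device, and the configuration.

```python
def choose_protocol(size: int,
                    short_max: int = 64,
                    bcopy_max: int = 8 * 1024,
                    zcopy_max: int = 256 * 1024) -> str:
    """Return a protocol name for a message of `size` bytes.

    The thresholds are illustrative placeholders, not library defaults.
    """
    if size < 0:
        raise ValueError("message size must be non-negative")
    if size <= short_max:
        return "eager-short"   # payload travels inline with the header
    if size <= bcopy_max:
        return "eager-bcopy"   # payload is copied into a pre-registered buffer
    if size <= zcopy_max:
        return "eager-zcopy"   # payload is sent directly from user memory
    return "rendezvous"        # handshake first, then bulk transfer


choices = {size: choose_protocol(size) for size in (8, 4096, 65536, 1 << 20)}
```

The idea behind such a ladder is that copying is cheapest for small messages, while large messages justify the cost of memory registration and an extra handshake in exchange for avoiding copies.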