MPI4py Ping-Pong Example for Cluster

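This example shows a NumPy-based ping-pong benchmark that uses the mpi4py library on a cluster. Two MPI ranks exchange a NumPy byte array for message sizes from 1 byte to 262144 bytes (256 KiB), with 100 round trips per size. Rank 0 reports the average round-trip time (RTT), the one-way latency (half the RTT), and the effective bandwidth.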
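
The arithmetic behind the three reported metrics can be sketched in isolation. The helper below is a hypothetical illustration (the name `pingpong_metrics` does not appear in the script). It assumes, as the script does, that one-way latency is half the average round-trip time and that 1 MB means 10^6 bytes. The timing values are made up.

```python
def pingpong_metrics(nbytes, total_time_s, nrounds):
    """Return (rtt_us, latency_us, bandwidth_mb_s) for one message size.

    Hypothetical helper for illustration; assumes symmetric one-way latency.
    """
    avg_rtt_s = total_time_s / nrounds        # seconds per round trip
    rtt_us = avg_rtt_s * 1.0e6                # microseconds per round trip
    latency_us = rtt_us / 2.0                 # half a round trip
    # bytes per microsecond is numerically equal to MB/s (1 MB = 10^6 bytes)
    bandwidth_mb_s = nbytes / latency_us
    return rtt_us, latency_us, bandwidth_mb_s


# Made-up timing: 100 round trips of a 4096-byte message in 1 ms total
rtt, lat, bw = pingpong_metrics(4096, 1.0e-3, 100)
print(f"RTT={rtt:.2f} us | latency={lat:.2f} us | bandwidth={bw:.2f} MB/s")
# prints: RTT=10.00 us | latency=5.00 us | bandwidth=819.20 MB/s
```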

Python Script (pingpong_mpi4py.py)

#!/usr/bin/env python3
from mpi4py import MPI
import numpy as np
import sys
 
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
 
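# The benchmark pairs rank 0 with rank 1, so it needs exactly two processes
if size != 2:
    if rank == 0:
        print("Need exactly 2 processes!")
    sys.exit(1)

# Each rank's partner is the other rank (0 <-> 1)
partner = 1 - rank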
 
nrounds = 100
msg_sizes = [1, 8, 64, 512, 1024, 4096, 16384, 65536, 262144]
 
for nbytes in msg_sizes:
    # uint8 elements are 1 byte each, so the element count equals the message size in bytes
    nelems = max(1, nbytes // np.dtype(np.uint8).itemsize)
 
    sendbuf = np.zeros(nelems, dtype=np.uint8)
    recvbuf = np.empty(nelems, dtype=np.uint8)
 
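    # Synchronize both ranks before starting the timer
    comm.Barrier()

    t0 = MPI.Wtime()

    for _ in range(nrounds):
        if rank == 0:
            # Rank 0 sends the ping and waits for the reply
            comm.Send(sendbuf, dest=partner, tag=100)
            comm.Recv(recvbuf, source=partner, tag=200)
        else:
            # Rank 1 receives the ping and echoes it back
            comm.Recv(recvbuf, source=partner, tag=100)
            comm.Send(recvbuf, dest=partner, tag=200)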
 
    t1 = MPI.Wtime()
 
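    if rank == 0:
        total_time = t1 - t0
        avg_rtt = total_time / nrounds           # seconds per round trip
        latency_us = (avg_rtt / 2.0) * 1.0e6     # one-way latency in microseconds
        # bytes per microsecond equals 10^6 bytes per second, i.e. MB/s
        bandwidth_mb_s = nbytes / latency_us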
 
        print(
            f"size={nbytes:8d} bytes | "
            f"RTT={avg_rtt*1.0e6:10.2f} us | "
            f"latency={latency_us:10.2f} us | "
            f"bandwidth={bandwidth_mb_s:10.2f} MB/s"
        )
 
if rank == 0:
    print("Ping-pong benchmark completed.")
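
The script above times every round trip, including the very first ones. A common variation in ping-pong benchmarks is to run a few untimed warm-up exchanges first, since the initial messages may include one-time setup cost. The sketch below is not part of the original script: the function name `timed_pingpong` and the parameters `warmup` and `clock` are hypothetical, and `comm` is only assumed to offer `Barrier`, `Send`, and `Recv` like `MPI.COMM_WORLD`.

```python
import time


def timed_pingpong(comm, rank, sendbuf, recvbuf, nrounds=100, warmup=10,
                   clock=time.perf_counter):
    """Run untimed warm-up round trips, then return seconds for nrounds round trips.

    Hypothetical sketch: `warmup` and `clock` are illustrative parameters.
    """
    partner = 1 - rank

    def round_trip():
        if rank == 0:
            comm.Send(sendbuf, dest=partner, tag=100)
            comm.Recv(recvbuf, source=partner, tag=200)
        else:
            comm.Recv(recvbuf, source=partner, tag=100)
            comm.Send(recvbuf, dest=partner, tag=200)

    for _ in range(warmup):      # untimed exchanges before measuring
        round_trip()

    comm.Barrier()               # line up both ranks before starting the timer
    t0 = clock()
    for _ in range(nrounds):     # timed portion
        round_trip()
    return clock() - t0
```

Inside the benchmark loop this could replace the timed section, for example `total_time = timed_pingpong(comm, rank, sendbuf, recvbuf, nrounds, clock=MPI.Wtime)`. Whether warm-up changes the numbers on a given cluster would need to be measured.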

Slurm Job Script (slurm_pingpong.job)

#!/bin/bash
#SBATCH --job-name=pingpong_mpi4py
#SBATCH --partition=unite
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:01:00
#SBATCH --output=pingpong_%j.out
 
module purge
# Load the required modules
module add unite/python/3.14/mpi4py

# Start the example
mpirun -np "$SLURM_NTASKS" python3 ./pingpong_mpi4py.py

Run

sbatch slurm_pingpong.job

Example output

Loading unite/python/3.14/mpi4py
  Loading requirement: unite/python/3.14/python-3.14.0 unite/mpi/4.1
 
size=       1 bytes | RTT=     10.08 us | latency=      5.04 us | bandwidth=      0.20 MB/s
size=       8 bytes | RTT=      3.80 us | latency=      1.90 us | bandwidth=      4.21 MB/s
size=      64 bytes | RTT=      4.49 us | latency=      2.24 us | bandwidth=     28.54 MB/s
size=     512 bytes | RTT=     27.33 us | latency=     13.66 us | bandwidth=     37.47 MB/s
size=    1024 bytes | RTT=      6.46 us | latency=      3.23 us | bandwidth=    317.16 MB/s
size=    4096 bytes | RTT=      8.90 us | latency=      4.45 us | bandwidth=    920.86 MB/s
size=   16384 bytes | RTT=     15.53 us | latency=      7.77 us | bandwidth=   2109.61 MB/s
size=   65536 bytes | RTT=     29.86 us | latency=     14.93 us | bandwidth=   4390.07 MB/s
size=  262144 bytes | RTT=     72.79 us | latency=     36.40 us | bandwidth=   7202.47 MB/s
Ping-pong benchmark completed.
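
To post-process results such as those above, the output lines can be parsed back into numbers. The following is a minimal sketch, assuming the line format printed by the script; the names `LINE_RE` and `parse_results` are hypothetical, and the sample text reuses two lines from the example output.

```python
import re

# Pattern for one result line as printed by the benchmark script
LINE_RE = re.compile(
    r"size=\s*(\d+) bytes \| RTT=\s*([\d.]+) us \| "
    r"latency=\s*([\d.]+) us \| bandwidth=\s*([\d.]+) MB/s"
)


def parse_results(text):
    """Return a list of (nbytes, rtt_us, latency_us, bandwidth_mb_s) tuples."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:  # skip module-loading messages and the completion line
            rows.append((int(m.group(1)), *map(float, m.groups()[1:])))
    return rows


sample = """\
size=    4096 bytes | RTT=      8.90 us | latency=      4.45 us | bandwidth=    920.86 MB/s
size=  262144 bytes | RTT=     72.79 us | latency=     36.40 us | bandwidth=   7202.47 MB/s
Ping-pong benchmark completed.
"""

rows = parse_results(sample)
best = max(rows, key=lambda r: r[3])
print(f"peak bandwidth: {best[3]:.2f} MB/s at {best[0]} bytes")
# prints: peak bandwidth: 7202.47 MB/s at 262144 bytes
```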

Last modified: 2026/04/10 13:40 by nshegunov
