MPI4py Ping-Pong Example for Cluster
This example shows a NumPy-based ping-pong benchmark using the mpi4py library on a cluster. Two MPI ranks exchange a NumPy array for several message sizes and measure round-trip time, one-way latency, and effective bandwidth.
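All three reported quantities come from a single timed loop per message size. The following is a minimal sketch of that arithmetic, separated from the MPI calls so it can be checked on its own. The function name `pingpong_metrics` and the sample timing are illustrative assumptions, not part of the benchmark script.

```python
def pingpong_metrics(total_time_s, nrounds, nbytes):
    """Derive ping-pong metrics from one timed loop.

    total_time_s: wall time for all round trips, in seconds
    nrounds:      number of round trips in the loop
    nbytes:       message size in bytes
    """
    avg_rtt_us = total_time_s / nrounds * 1.0e6  # mean round-trip time in microseconds
    latency_us = avg_rtt_us / 2.0                # one-way latency: half a round trip
    bandwidth_mb_s = nbytes / latency_us         # bytes/us equals 10^6 bytes/s, i.e. MB/s
    return avg_rtt_us, latency_us, bandwidth_mb_s


# Hypothetical timing: 100 round trips of a 65536-byte message in 0.02 s
rtt, lat, bw = pingpong_metrics(0.02, 100, 65536)
print(f"RTT={rtt:.2f} us  latency={lat:.2f} us  bandwidth={bw:.2f} MB/s")
# prints: RTT=200.00 us  latency=100.00 us  bandwidth=655.36 MB/s
```

Note that "MB/s" here means 10^6 bytes per second, because the bandwidth is computed as bytes divided by microseconds.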
Python Script (pingpong_mpi4py.py)
#!/usr/bin/env python3
"""Ping-pong benchmark between two MPI ranks using mpi4py and NumPy buffers."""
from mpi4py import MPI
import numpy as np
import sys

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# The benchmark is defined for exactly one pair of ranks.
if size != 2:
    if rank == 0:
        print("Need exactly 2 processes!")
    sys.exit(1)

partner = 1 - rank  # rank 0 talks to rank 1 and vice versa
nrounds = 100       # round trips timed per message size
msg_sizes = [1, 8, 64, 512, 1024, 4096, 16384, 65536, 262144]  # bytes

for nbytes in msg_sizes:
    nelems = max(1, nbytes // np.dtype(np.uint8).itemsize)
    sendbuf = np.zeros(nelems, dtype=np.uint8)
    recvbuf = np.empty(nelems, dtype=np.uint8)

    # Synchronize both ranks before starting the timer.
    comm.Barrier()
    t0 = MPI.Wtime()
    for _ in range(nrounds):
        if rank == 0:
            # Ping: send first, then wait for the echo.
            comm.Send(sendbuf, dest=partner, tag=100)
            comm.Recv(recvbuf, source=partner, tag=200)
        else:
            # Pong: receive first, then echo the same buffer back.
            comm.Recv(recvbuf, source=partner, tag=100)
            comm.Send(recvbuf, dest=partner, tag=200)
    t1 = MPI.Wtime()

    if rank == 0:
        total_time = t1 - t0
        avg_rtt = total_time / nrounds           # seconds per round trip
        latency_us = (avg_rtt / 2.0) * 1.0e6     # one-way latency in microseconds
        bandwidth_mb_s = nbytes / latency_us     # bytes/us equals 10^6 bytes/s (MB/s)
        print(
            f"size={nbytes:8d} bytes | "
            f"RTT={avg_rtt*1.0e6:10.2f} us | "
            f"latency={latency_us:10.2f} us | "
            f"bandwidth={bandwidth_mb_s:10.2f} MB/s"
        )

if rank == 0:
    print("Ping-pong benchmark completed.")
Slurm Job Script (slurm_pingpong.job)
#!/bin/bash
#SBATCH --job-name=pingpong_mpi4py
#SBATCH --partition=unite
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:01:00
#SBATCH --output=pingpong_%j.out

module purge

# Load necessary modules
module add unite/python/3.14/mpi4py

# Start the example
mpirun -np "$SLURM_NTASKS" python3 ./pingpong_mpi4py.py
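The job writes the benchmark's standard output to pingpong_%j.out, where Slurm replaces %j with the job ID. As a rough sketch of how those result lines could be post-processed, the snippet below parses the line format printed by pingpong_mpi4py.py into a list of dictionaries. The function name `parse_pingpong_output`, the dictionary keys, and the sample values are illustrative assumptions, not measured results.

```python
import re

# Matches one result line as printed by pingpong_mpi4py.py on rank 0.
LINE_RE = re.compile(
    r"size=\s*(\d+) bytes \| "
    r"RTT=\s*([\d.]+) us \| "
    r"latency=\s*([\d.]+) us \| "
    r"bandwidth=\s*([\d.]+) MB/s"
)


def parse_pingpong_output(text):
    """Return one dict per result line; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m is None:
            continue
        rows.append({
            "size_bytes": int(m.group(1)),
            "rtt_us": float(m.group(2)),
            "latency_us": float(m.group(3)),
            "bandwidth_mb_s": float(m.group(4)),
        })
    return rows


# Made-up sample in the benchmark's output format (not real measurements)
sample = (
    "size=       1 bytes | RTT=      2.50 us | latency=      1.25 us | bandwidth=      0.80 MB/s\n"
    "size=   65536 bytes | RTT=    200.00 us | latency=    100.00 us | bandwidth=    655.36 MB/s\n"
    "Ping-pong benchmark completed.\n"
)
rows = parse_pingpong_output(sample)
for row in rows:
    print(row["size_bytes"], row["latency_us"], row["bandwidth_mb_s"])
```

To apply this to a real run, read the job's output file into a string and pass it to the function in place of `sample`.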
unite_python_mpi_4_py.1775817435.txt.gz · Last modified: 2026/04/10 13:37 by nshegunov
