examples
====Description====

This page provides examples of how to use the cluster. There are language-specific examples for **C/C++** and **Python**, which show how to compile and run applications written in those languages on the cluster. Additionally, there are examples which demonstrate how to use multiple **threads**, **MPI**, and the **GPU**s on the cluster.

----
====PyTorch====
Consider the following simple Python test script ("pytorch_test.py"):

<code python>
import torch

def test_pytorch():
    # Report the installed PyTorch version and whether a GPU is visible
    print("PyTorch version:", torch.__version__)
    print("CUDA available:", torch.cuda.is_available())

    if torch.cuda.is_available():
        print("GPU:", torch.cuda.get_device_name(0))
        device = torch.device("cuda")
    else:
        device = torch.device("cpu")

    # Simple tensor operation
    x = torch.tensor([1.0, 2.0, 3.0], device=device)
    y = torch.tensor([4.0, 5.0, 6.0], device=device)
    z = x + y
    print("x + y =", z)

test_pytorch()
</code>

To test it on the unite cluster you can use the following sbatch script to run it:
<code bash>
#!/bin/bash
#SBATCH --job-name=pytorch_test
#SBATCH --output=pytorch_test.out
#SBATCH --error=pytorch_test.err
#SBATCH --time=00:10:00
#SBATCH --partition=a40
#SBATCH --gres=gpu:1
#SBATCH --mem=4G
#SBATCH --cpus-per-task=2

# Load necessary modules (modify based on your system)
module load python/3.13

# Activate your virtual environment if needed
# source ~/venv/bin/activate

# Run the PyTorch script
python3.13 pytorch_test.py
</code>
----
====Pandas====
Consider the following simple Python test script ("pandas_test.py"):
<code python>
import pandas as pd
import numpy as np

# Create a simple DataFrame
data = {
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Test basic operations
print("Column sums:")
print(df.sum())

print("Column means:")
print(df.mean())

# Adding a new column
df['D'] = df['A'] + df['B']
print("DataFrame with new column 'D':")
print(df)

# Filtering rows
filtered_df = df[df['A'] > 1]
print("Rows where 'A' > 1:")
print(filtered_df)

# Check if NaN values exist
print("NaN values per column:")
print(df.isna().sum())
</code>

You can use the following sbatch script to run it:
<code bash>
#!/bin/bash
#SBATCH --job-name=pandas_test
#SBATCH --output=pandas_test.out
#SBATCH --error=pandas_test.err
#SBATCH --time=00:10:00
#SBATCH --partition=a40
#SBATCH --gres=gpu:1
#SBATCH --mem=4G
#SBATCH --cpus-per-task=2

# Load necessary modules (modify based on your system)
module load python/3.13

# Activate your virtual environment if needed
# source ~/venv/bin/activate

# Run the pandas script
python3.13 pandas_test.py
</code>
----
====Simple C/C++ program====
The following is a simple **C/C++** program which performs element-wise addition of 2 vectors. It does **not** use any dependent libraries:
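A minimal sketch of such a program (the exact listing, variable names, and vector size are illustrative and may differ) could look like this:

<code C++>
#include <iostream>
#include <vector>

#define VECTOR_SIZE 100000

int main() {
    // Allocate and initialize the two input vectors and the result vector
    std::vector<int> a(VECTOR_SIZE), b(VECTOR_SIZE), c(VECTOR_SIZE);
    for (int i = 0; i < VECTOR_SIZE; i++) {
        a[i] = i + 1;
        b[i] = (i + 1) * 2;
    }

    // Element-wise addition
    for (int i = 0; i < VECTOR_SIZE; i++) {
        c[i] = a[i] + b[i];
    }

    // Print the first few results as a sanity check
    std::cout << "First 5 results: ";
    for (int i = 0; i < 5; i++) {
        std::cout << c[i] << " ";
    }
    std::cout << std::endl;

    return 0;
}
</code>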
----
====C++ program which uses MPI====
The following is an example **C/C++** application which uses **MPI** to perform element-wise addition of two vectors. Each **MPI** task computes the addition of its local region and then sends it back to the leader. Using **MPI** with **Python** is similar, assuming you know how to manage **Python** dependencies on the cluster, which is described in a previous section. What is important here is to understand how to manage the resources of the system.
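A minimal sketch of such a program (the exact listing may differ; this sketch assumes the vector size is divisible by the number of **MPI** tasks and uses MPI_Scatter/MPI_Gather to distribute the work and collect the results):

<code C++>
#include <cstdio>
#include <vector>
#include <mpi.h>

#define VECTOR_SIZE 100000  // assumed to be divisible by the number of MPI tasks

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int local_n = VECTOR_SIZE / size;

    // The leader (rank 0) owns the full vectors; the other ranks only work on their local regions
    std::vector<int> a, b, c;
    if (rank == 0) {
        a.resize(VECTOR_SIZE);
        b.resize(VECTOR_SIZE);
        c.resize(VECTOR_SIZE);
        for (int i = 0; i < VECTOR_SIZE; i++) {
            a[i] = i + 1;
            b[i] = (i + 1) * 2;
        }
    }

    std::vector<int> local_a(local_n), local_b(local_n), local_c(local_n);

    // Distribute the local regions of both input vectors to all tasks
    MPI_Scatter(a.data(), local_n, MPI_INT, local_a.data(), local_n, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Scatter(b.data(), local_n, MPI_INT, local_b.data(), local_n, MPI_INT, 0, MPI_COMM_WORLD);

    // Each task computes the addition of its local region
    for (int i = 0; i < local_n; i++) {
        local_c[i] = local_a[i] + local_b[i];
    }

    // Send the partial results back to the leader
    MPI_Gather(local_c.data(), local_n, MPI_INT, c.data(), local_n, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        std::printf("First 5 results: ");
        for (int i = 0; i < 5; i++) {
            std::printf("%d ", c[i]);
        }
        std::printf("\n");
    }

    MPI_Finalize();
    return 0;
}
</code>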
The corresponding batch script compiles the program, launches it with the requested number of **MPI** tasks, and prints "Job completed!" when the job finishes.
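A sketch of such a batch script, assuming the source file is named //vector_sum_mpi.cpp// and that an MPI module (here called "openmpi") is available (the actual file name, module name, partition, and resource values may differ):

<code bash>
#!/bin/bash
#SBATCH --job-name=vector_sum_mpi
#SBATCH --output=vector_sum_mpi_%j.out
#SBATCH --error=vector_sum_mpi_%j.err
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=1
#SBATCH --time=00:10:00
#SBATCH --partition=unite

echo "Job started at: $(date)"
echo "Nodes: $SLURM_NODELIST"
echo "Total MPI tasks: $SLURM_NTASKS"

# Assumed module name; load whichever MPI module the cluster provides
module load openmpi

echo "Compiling..."
mpicxx -O3 vector_sum_mpi.cpp -o vector_sum_mpi

if [ $? -ne 0 ]; then
    echo "Compilation failed"
    exit 1
fi

echo "Running..."
# srun starts one process per allocated MPI task (2 nodes x 4 tasks = 8 tasks here)
srun ./vector_sum_mpi

echo "Job completed!"
echo "Job finished at: $(date)"
</code>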
----

====C++ program which uses multiple threads====

The following is a simple **C++** program which computes the sum of 2 vectors. It uses multiple **threads**. Each **thread** computes the sum for its respective region.
<code C++>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

#define VECTOR_SIZE 100000

void vector_add_worker(int thread_id, int start_idx, int end_idx,
                       const int* a, const int* b, int* c) {
    int elements = end_idx - start_idx;
    std::cout << "Thread " << thread_id << " processing " << elements
              << " elements" << std::endl;

    for (int i = start_idx; i < end_idx; i++) {
        c[i] = a[i] + b[i];
    }
}

int main(int argc, char** argv) {
    if (argc != 2) {
        std::cerr << "Usage: " << argv[0] << " <num_threads>" << std::endl;
        return 1;
    }

    int num_threads = std::stoi(argv[1]);
    if (num_threads <= 0) {
        std::cerr << "Number of threads must be positive" << std::endl;
        return 1;
    }

    std::cout << "Using " << num_threads << " threads" << std::endl;

    std::vector<int> a(VECTOR_SIZE);
    std::vector<int> b(VECTOR_SIZE);
    std::vector<int> c(VECTOR_SIZE);

    for (int i = 0; i < VECTOR_SIZE; i++) {
        a[i] = i + 1;
        b[i] = (i + 1) * 2;
    }

    int elements_per_thread = VECTOR_SIZE / num_threads;

    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; t++) {
        int start_idx = t * elements_per_thread;
        // The last thread also takes any remainder elements
        int end_idx = (t == num_threads - 1) ? VECTOR_SIZE : (t + 1) * elements_per_thread;

        threads.emplace_back(vector_add_worker, t, start_idx, end_idx,
                             a.data(), b.data(), c.data());
    }

    for (auto& thread : threads) {
        thread.join();
    }

    std::cout << "First 5 results: ";
    for (int i = 0; i < 5; i++) {
        std::cout << c[i] << " ";
    }
    std::cout << std::endl;

    return 0;
}
</code>
The following is the respective batch script for compiling and running the program. You can see the output of the program in the generated //vector_sum_threads_<job_id>.out// file.

<code bash>
#!/bin/bash
#SBATCH --job-name=vector_sum_threads
#SBATCH --output=vector_sum_threads_%j.out
#SBATCH --error=vector_sum_threads_%j.err
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --time=00:10:00
#SBATCH --partition=unite

echo "Job started at: $(date)"
echo "Job ID: $SLURM_JOB_ID"
echo "Node: $SLURM_NODELIST"
echo "CPUs per task: $SLURM_CPUS_PER_TASK"

module load gcc

echo "Compiling..."
g++ -std=c++11 -pthread -O3 vector_sum_threads.cpp -o vector_sum_threads

if [ $? -eq 0 ]; then
    echo "Compilation successful"
    echo ""

    echo "Running with $SLURM_CPUS_PER_TASK threads..."
    ./vector_sum_threads "$SLURM_CPUS_PER_TASK"

    echo "Job completed!"
    echo "Job finished at: $(date)"
else
    echo "Compilation failed"
    exit 1
fi
</code>
----

====C++ program which uses GPU====

The following is an example **CUDA** application which uses an **Nvidia GPU** to perform element-wise addition of two vectors. Using **CUDA** with **Python** is similar, assuming you know how to manage **Python** dependencies on the cluster, which is described in a previous section. What is important here is to understand how to manage the resources of the system.
<code C++>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <cuda_runtime.h>

#define CUDA_CHECK(call) \
    do { \
        cudaError_t error = call; \
        if (error != cudaSuccess) { \
            fprintf(stderr, "CUDA error at %s:%d - %s\n", __FILE__, __LINE__, \
                    cudaGetErrorString(error)); \
            exit(EXIT_FAILURE); \
        } \
    } while(0)

/*
 * CUDA kernel for vector addition
 * Each thread computes one element of the result vector
 *
 * Parameters:
 *   a - first input vector (device memory)
 *   b - second input vector (device memory)
 *   c - output vector (device memory)
 *   n - number of elements
 */
__global__ void vectorAddKernel(const float *a, const float *b, float *c, int n) {
    // Calculate global thread ID
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    // Check if thread is within bounds
    if (idx < n) {
        c[idx] = a[idx] + b[idx];
    }
}

int main() {
    const int N = 50'000'000;
    const size_t bytes = N * sizeof(float);

    printf("CUDA Vector Addition\n");
    printf("Vector size: %d elements\n", N);
    printf("Memory per vector: %.2f MB\n", bytes / (1024.0 * 1024.0));
    printf("\n");

    int deviceId;
    cudaDeviceProp props;
    CUDA_CHECK(cudaGetDevice(&deviceId));
    CUDA_CHECK(cudaGetDeviceProperties(&props, deviceId));

    printf("GPU Information:\n");
    printf("  Device name: %s\n", props.name);
    printf("  Compute capability: %d.%d\n", props.major, props.minor);
    printf("  Multiprocessors: %d\n", props.multiProcessorCount);
    printf("  Global memory: %.1f GB\n",
           props.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    printf("  Max threads per block: %d\n", props.maxThreadsPerBlock);
    printf("  Warp size: %d\n", props.warpSize);
    printf("\n");

    printf("Allocating host memory...\n");
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c_gpu = (float *)malloc(bytes);
    float *h_c_cpu = (float *)malloc(bytes);

    if (!h_a || !h_b || !h_c_gpu || !h_c_cpu) {
        fprintf(stderr, "Failed to allocate host memory\n");
        return 1;
    }

    for (int i = 0; i < N; i++) {
        h_a[i] = (float)rand() / RAND_MAX;
        h_b[i] = (float)rand() / RAND_MAX;
    }

    float *d_a, *d_b, *d_c;
    CUDA_CHECK(cudaMalloc(&d_a, bytes));
    CUDA_CHECK(cudaMalloc(&d_b, bytes));
    CUDA_CHECK(cudaMalloc(&d_c, bytes));

    CUDA_CHECK(cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice));

    int threadsPerBlock = 256;
    int blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock;

    printf("Launching kernel...\n");
    printf("  Threads per block: %d\n", threadsPerBlock);
    printf("  Blocks per grid: %d\n", blocksPerGrid);
    printf("\n");

    vectorAddKernel<<<blocksPerGrid, threadsPerBlock>>>(d_a, d_b, d_c, N);
    CUDA_CHECK(cudaDeviceSynchronize());

    CUDA_CHECK(cudaMemcpy(h_c_gpu, d_c, bytes, cudaMemcpyDeviceToHost));

    printf("First 5 results:\n");
    for (int i = 0; i < 5; i++) {
        printf("  %f + %f = %f\n", h_a[i], h_b[i], h_c_gpu[i]);
    }

    CUDA_CHECK(cudaFree(d_a));
    CUDA_CHECK(cudaFree(d_b));
    CUDA_CHECK(cudaFree(d_c));
    free(h_a);
    free(h_b);
    free(h_c_gpu);
    free(h_c_cpu);

    return 0;
}
</code>
The following is the respective batch script for compiling and running the program. You can see the output of the program in the generated //vector_sum_cuda_<job_id>.out// file.

<code bash>
#!/bin/bash
#SBATCH --job-name=vector_sum_cuda
#SBATCH --output=vector_sum_cuda_%j.out
#SBATCH --error=vector_sum_cuda_%j.err
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:1
#SBATCH --time=00:10:00
#SBATCH --partition=unite

echo "========================================"
echo "SLURM Job Information"
echo "========================================"
echo "Job ID: $SLURM_JOB_ID"
echo "Node: $SLURM_NODELIST"
echo "Partition: $SLURM_JOB_PARTITION"
echo "Job started at: $(date)"
echo ""

# Load the CUDA toolchain module (exact module version depends on the cluster)
module load nvidia/

echo "Compiling CUDA program..."
nvcc -O3 -o vector_sum_cuda vector_sum_cuda.cu

if [ $? -ne 0 ]; then
    echo "Compilation failed"
    exit 1
fi

echo "Compilation successful"
echo ""

echo "Running vector_sum_cuda..."
./vector_sum_cuda

echo ""
echo "Job finished at: $(date)"
</code>
----