slurm_tutorial
Differences
This shows you the differences between two versions of the page.
| slurm_tutorial [2025/04/07 19:36] – nshegunov | slurm_tutorial [2025/04/07 20:03] (current) – [SLURM - Simple Linux Utility for Resource Management] nshegunov | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| ====== SLURM - Simple Linux Utility for Resource Management ====== | ====== SLURM - Simple Linux Utility for Resource Management ====== | ||
| - | SLURM is an open-source, highly scalable workload manager for Linux clusters of all sizes. It provides job scheduling and resource management to optimize cluster utilization, and it is used by some of the world’s most powerful supercomputers. | + | SLURM is an open-source, highly scalable workload manager for Linux clusters of all sizes. It provides job scheduling and resource management to optimize cluster utilization, and it is used by some of the world’s most powerful supercomputers. |
| Please refer to [[https:// | Please refer to [[https:// | ||
| Line 21: | Line 21: | ||
| ===== Basic Architecture ===== | ===== Basic Architecture ===== | ||
| + | | {{ : | ||
| + | | SLURM architecture overview ([[https:// | ||
| - | | + | Slurm is built from several components that together manage the cluster resources. Below is a short summary: |
| - | * **slurmd** - Node daemon that runs on each compute node to execute assigned tasks. | + | |
| - | * **slurmdbd** (optional) - Handles | + | |
| + | | ||
| + | - Handles | ||
| + | - Usually consists | ||
| + | |||
| + | * **slurmd | ||
| + | | ||
| + | - Responsible for launching, monitoring, and cleaning up jobs on the node. | ||
| + | - Communicates with the slurmctld | ||
| + | |||
| + | * **slurmdbd | ||
| + | | ||
| + | - Works with an external | ||
| + | - Enables commands like **sacct** and **sreport** for usage reporting. | ||
| + | |||
| + | * **Client Commands** | ||
| + | - Tools used by users and admins to interact with Slurm: | ||
| + | - **sbatch** – submit batch jobs | ||
| + | - **srun** – run parallel jobs interactively | ||
| + | - **scancel** – cancel jobs | ||
| + | - **squeue** – view job queues | ||
| + | |||
| + | * **Central Database** '' | ||
| + | - Stores job and usage records. | ||
| + | - Used in conjunction with **slurmdbd** for accounting and reporting. | ||
| + | - Supports multiple clusters if needed. | ||
| Each component communicates over a secure protocol to coordinate resource usage and job execution efficiently. | Each component communicates over a secure protocol to coordinate resource usage and job execution efficiently. | ||
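
The client commands listed above can be illustrated with a minimal Python sketch that only builds the command lines and does not contact a cluster. The helper names (`sbatch_cmd`, `srun_cmd`, `squeue_cmd`, `scancel_cmd`, `sacct_cmd`, `parse_job_id`), the script name `job.sh`, the user name `alice`, and the job ID are hypothetical and chosen for illustration. The flags used (`--job-name`, `--ntasks`, `--time`, `--user`, `--jobs`, `--format`) are standard Slurm options, but a given site may require more, such as a partition or account.

```python
import re
import shlex


def sbatch_cmd(script, job_name=None, ntasks=None, time_limit=None):
    """Build the argv list for submitting a batch script with sbatch."""
    cmd = ["sbatch"]
    if job_name is not None:
        cmd.append(f"--job-name={job_name}")
    if ntasks is not None:
        cmd.append(f"--ntasks={ntasks}")
    if time_limit is not None:
        cmd.append(f"--time={time_limit}")
    cmd.append(script)
    return cmd


def srun_cmd(program, ntasks=1):
    """Build the argv list for running a program interactively with srun."""
    return ["srun", f"--ntasks={ntasks}", *program]


def squeue_cmd(user=None):
    """Build the argv list for viewing the queue, optionally for one user."""
    cmd = ["squeue"]
    if user is not None:
        cmd.append(f"--user={user}")
    return cmd


def scancel_cmd(job_id):
    """Build the argv list for cancelling a job by its numeric ID."""
    return ["scancel", str(job_id)]


def sacct_cmd(job_id):
    """Build the argv list for an accounting query (needs slurmdbd)."""
    return ["sacct", f"--jobs={job_id}", "--format=JobID,JobName,State,Elapsed"]


def parse_job_id(sbatch_stdout):
    """Extract the job ID from sbatch's 'Submitted batch job <id>' message."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_stdout)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch works
    # on a machine without Slurm installed.
    examples = [
        sbatch_cmd("job.sh", job_name="demo", ntasks=4, time_limit="00:10:00"),
        srun_cmd(["hostname"], ntasks=2),
        squeue_cmd(user="alice"),
        scancel_cmd(12345),
        sacct_cmd(12345),
    ]
    for cmd in examples:
        print(shlex.join(cmd))
```

On a login node with Slurm installed, each list could be passed to `subprocess.run(cmd, capture_output=True, text=True)`, and the standard output of the `sbatch` call fed to `parse_job_id` to obtain the ID used by `scancel` and `sacct`.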
slurm_tutorial.1744043780.txt.gz · Last modified: 2025/04/07 19:36 by nshegunov
