Submitting Jobs on PERUN – Example Slurm Scripts¶
This guide explains a basic Slurm batch script for the PERUN supercomputer and shows several common variants:
- CPU-only jobs
- Single-GPU jobs
- Multi-GPU jobs
- Multi-node jobs
All examples use bash and can be submitted with:
sbatch job.sh
1. Basic GPU Job Script – Explained¶
Below is a simple Slurm script that runs a Python program (test.py) on a GPU node:
#!/bin/bash
#SBATCH --nodelist=gpu08 # Force a specific node (for testing/debug only)
#SBATCH --partition=GPU # Partition with GPU nodes
#SBATCH --gres=gpu:8 # Request 8 GPUs on the node
#SBATCH --cpus-per-task=4 # CPUs per task (for threading, OMP, etc.)
#SBATCH --open-mode=append # Append to output file instead of overwriting
echo "--- [START] Job $SLURM_JOB_ID on node: $(hostname) ---"
# Set number of threads based on allocated CPUs per task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
echo ">>> Running test.py..."
python3 test.py
echo "--- [END] ---"
What Each Line Does¶
- #!/bin/bash: starts the script with the Bash shell.
- #SBATCH --nodelist=gpu08: forces the job to run on the node gpu08. This is useful mainly for debugging or when asked by admins.
- #SBATCH --partition=GPU: selects the GPU partition.
- #SBATCH --gres=gpu:8: requests 8 GPUs on the node. For a single GPU, change to --gres=gpu:1.
- #SBATCH --cpus-per-task=4: allocates 4 CPU cores for the job (per task), often used for OpenMP, data loading, etc.
- #SBATCH --open-mode=append: when writing to the same output file, new logs are appended instead of overwriting.
- export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK: sets the OpenMP thread count to the number of allocated CPUs.
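To confirm what Slurm actually granted, you can print the standard Slurm environment variables at the top of the job script (a minimal sketch; SLURM_GPUS_ON_NODE is only set for jobs that requested GPUs):
echo "Job ID:        $SLURM_JOB_ID"
echo "Node list:     $SLURM_JOB_NODELIST"
echo "CPUs per task: $SLURM_CPUS_PER_TASK"
echo "GPUs on node:  ${SLURM_GPUS_ON_NODE:-unset}"   # unset for CPU-only jobs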
Do Not Force a Specific Node
Using --nodelist= (e.g., gpu08) reduces scheduler flexibility.
For normal work, omit this line and let Slurm choose a node.
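Instead of pinning a node, you can check which nodes in a partition are free before submitting (assuming the partition name GPU used in the example above):
# Show node states in the GPU partition (idle, mixed, alloc, ...)
sinfo -p GPU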
2. CPU-Only Job Example¶
This example runs a Python script on CPU nodes only (no GPUs).
#!/bin/bash
#SBATCH --job-name=cpu_job # Name of the job
#SBATCH --partition=cpu # CPU partition
#SBATCH --ntasks=1 # One task (one process)
#SBATCH --cpus-per-task=8 # 8 CPU cores for this task
#SBATCH --time=01:00:00 # Max runtime 1 hour
#SBATCH --output=cpu_job_%j.out # Output file (%j = job ID)
echo "--- [START] CPU job $SLURM_JOB_ID on node: $(hostname) ---"
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
python3 my_cpu_script.py
echo "--- [END] ---"
When to Use CPU-Only Jobs
Use this for simulations, preprocessing, data analysis, or jobs that do not require GPUs.
3. Single-GPU Job Example¶
A more typical single-GPU job without forcing a specific node:
#!/bin/bash
#SBATCH --job-name=gpu_single
#SBATCH --partition=GPU # GPU partition
#SBATCH --gres=gpu:1 # Request 1 GPU
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=02:00:00
#SBATCH --output=gpu_single_%j.out
echo "--- [START] Single-GPU job $SLURM_JOB_ID on node: $(hostname) ---"
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
python3 train_single_gpu.py
echo "--- [END] ---"
GPU Visibility
Inside the job, only the requested GPUs should be visible (e.g., via nvidia-smi).
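A quick way to verify this from inside the job script (a small sketch; Slurm typically exports CUDA_VISIBLE_DEVICES for GPU jobs, depending on cluster configuration):
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
nvidia-smi --query-gpu=index,name --format=csv   # should list only the requested GPU(s)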
4. Multi-GPU Job on a Single Node¶
For deep learning or parallel GPU workloads, you might want multiple GPUs:
#!/bin/bash
#SBATCH --job-name=gpu_multi
#SBATCH --partition=GPU
#SBATCH --gres=gpu:4 # 4 GPUs on a single node
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --time=04:00:00
#SBATCH --output=gpu_multi_%j.out
echo "--- [START] Multi-GPU job $SLURM_JOB_ID on node: $(hostname) ---"
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# Example: PyTorch multi-GPU using DataParallel or DistributedDataParallel
python3 train_multi_gpu.py
echo "--- [END] ---"
Example Use Case
- Training a large neural network with 4 GPUs on a single node.
- Using frameworks like PyTorch, TensorFlow, JAX, etc.
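The overview above also lists multi-node jobs. A minimal multi-node sketch, assuming an MPI-style program launched with srun (my_mpi_app is a hypothetical placeholder for your own binary):
#!/bin/bash
#SBATCH --job-name=multi_node
#SBATCH --partition=cpu
#SBATCH --nodes=2                 # Two nodes
#SBATCH --ntasks-per-node=8       # 8 tasks on each node
#SBATCH --time=02:00:00
#SBATCH --output=multi_node_%j.out
echo "--- [START] Multi-node job $SLURM_JOB_ID on nodes: $SLURM_JOB_NODELIST ---"
srun ./my_mpi_app                 # srun launches one process per task across all nodes
echo "--- [END] ---"
For multi-node GPU training, combine --nodes with a per-node --gres=gpu:X request.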
5. Tips for Beginners¶
General Recommendations
- Start with small test jobs (short time, few cores/GPUs).
- Always set a reasonable --time= limit.
- Use squeue -u $USER to see your jobs.
- Use seff <jobid> to check job efficiency after it finishes.
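For quick reference, the two monitoring commands from the list above (seff is a standard Slurm contrib tool; replace <jobid> with a real job ID):
# List your pending and running jobs
squeue -u $USER
# Summarize CPU/memory efficiency of a finished job
seff <jobid>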
Common Mistakes
- Requesting GPUs but not using them.
- Running GPU jobs in the CPU partition.
- Forgetting --gres=gpu:X in GPU jobs.
- Overestimating runtime (very long jobs may wait longer in the queue).
6. Minimal Template You Can Adapt¶
#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --partition=cpu # or GPU
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=01:00:00
#SBATCH --output=my_job_%j.out
echo "Job $SLURM_JOB_ID on $(hostname)"
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
python3 my_script.py
You can copy any of these examples into a file (e.g., job.sh) and submit it with:
sbatch job.sh