
Submitting Jobs on PERUN – Example Slurm Scripts

This guide explains a basic Slurm batch script for the PERUN supercomputer and shows several common variants:

  • CPU-only jobs
  • Single-GPU jobs
  • Multi-GPU jobs
  • Multi-node jobs

All examples use bash and can be submitted with:

sbatch job.sh

1. Basic GPU Job Script – Explained

Below is a simple Slurm script that runs a Python program (test.py) on a GPU node:

#!/bin/bash
#SBATCH --nodelist=gpu08          # Force a specific node (for testing/debug only)
#SBATCH --partition=GPU           # Partition with GPU nodes
#SBATCH --gres=gpu:8              # Request 8 GPUs on the node
#SBATCH --cpus-per-task=4         # CPUs per task (for threading, OMP, etc.)
#SBATCH --open-mode=append        # Append to output file instead of overwriting

echo "--- [START] Job $SLURM_JOB_ID on node: $(hostname) ---"

# Set number of threads based on allocated CPUs per task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

echo ">>> Running test.py..."
python3 test.py

echo "--- [END] ---"

What Each Line Does

  • #!/bin/bash
    Starts the script with the Bash shell.

  • #SBATCH --nodelist=gpu08
    Forces the job to run on the node gpu08.
    This is mainly useful for debugging, or when administrators ask you to target a particular node.

  • #SBATCH --partition=GPU
    Selects the GPU partition.

  • #SBATCH --gres=gpu:8
    Requests 8 GPUs on the node.
    For a single GPU, change to --gres=gpu:1.

  • #SBATCH --cpus-per-task=4
    Allocates 4 CPU cores for the job (per task).
    Often used for OpenMP, data loading, etc.

  • #SBATCH --open-mode=append
    When writing to the same output file, new logs are appended instead of overwriting.

  • export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
    Sets the OpenMP thread count to the number of allocated CPUs
    (see the verification snippet after this list).
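
To verify that these directives took effect, you can print the corresponding Slurm environment variables at the start of the job. A minimal sketch (these are standard Slurm exports; exact values depend on the cluster configuration):

echo "Job ID:        $SLURM_JOB_ID"
echo "CPUs per task: $SLURM_CPUS_PER_TASK"
echo "Node list:     $SLURM_JOB_NODELIST"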

Avoid Forcing a Specific Node

Using --nodelist= (e.g., gpu08) reduces scheduler flexibility.
For normal work, omit this line and let Slurm choose a node.


2. CPU-Only Job Example

This example runs a Python script on CPU nodes only (no GPUs).

#!/bin/bash
#SBATCH --job-name=cpu_job          # Name of the job
#SBATCH --partition=cpu             # CPU partition
#SBATCH --ntasks=1                  # One task (one process)
#SBATCH --cpus-per-task=8           # 8 CPU cores for this task
#SBATCH --time=01:00:00             # Max runtime 1 hour
#SBATCH --output=cpu_job_%j.out     # Output file (%j = job ID)

echo "--- [START] CPU job $SLURM_JOB_ID on node: $(hostname) ---"

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

python3 my_cpu_script.py

echo "--- [END] ---"

When to Use CPU-Only Jobs

Use this for simulations, preprocessing, data analysis, or any other work that does not require a GPU.
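
To confirm the allocation from inside the job, you can compare the requested core count against what the node actually grants. A small sketch (nproc reflects CPU affinity on most Slurm configurations, but this is site-dependent):

echo "Requested CPUs: $SLURM_CPUS_PER_TASK"
nproc    # should print 8 here if Slurm pins the job to its allocated cores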


3. Single-GPU Job Example

A more typical single-GPU job without forcing a specific node:

#!/bin/bash
#SBATCH --job-name=gpu_single
#SBATCH --partition=GPU            # GPU partition
#SBATCH --gres=gpu:1               # Request 1 GPU
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=02:00:00
#SBATCH --output=gpu_single_%j.out

echo "--- [START] Single-GPU job $SLURM_JOB_ID on node: $(hostname) ---"

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

python3 train_single_gpu.py

echo "--- [END] ---"

GPU Visibility

Inside the job, only the requested GPUs should be visible (e.g., via nvidia-smi).
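
To check this from inside a job script, list the devices the job can see. A short sketch (Slurm typically sets CUDA_VISIBLE_DEVICES when GPUs are requested via --gres, but this depends on the cluster configuration):

echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
nvidia-smi -L    # should list exactly one GPU for this example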


4. Multi-GPU Job on a Single Node

For deep learning or parallel GPU workloads, you might want multiple GPUs:

#!/bin/bash
#SBATCH --job-name=gpu_multi
#SBATCH --partition=GPU
#SBATCH --gres=gpu:4               # 4 GPUs on a single node
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --time=04:00:00
#SBATCH --output=gpu_multi_%j.out

echo "--- [START] Multi-GPU job $SLURM_JOB_ID on node: $(hostname) ---"

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Example: PyTorch multi-GPU using DataParallel or DistributedDataParallel
python3 train_multi_gpu.py

echo "--- [END] ---"
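
The DistributedDataParallel route mentioned in the script's comment needs one process per GPU, which a plain python3 call does not provide. A minimal launch sketch using torchrun, assuming PyTorch is installed and train_multi_gpu.py implements DDP:

# Hypothetical DDP launch: one worker process per allocated GPU
torchrun --standalone --nproc_per_node=4 train_multi_gpu.py

DataParallel, by contrast, runs in a single process, so the plain python3 call in the script above is enough for it.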

Example Use Case

  • Training a large neural network with 4 GPUs on a single node.
  • Using frameworks like PyTorch, TensorFlow, JAX, etc.
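
The introduction also lists multi-node jobs. A minimal multi-node sketch, assuming the GPU partition permits multi-node allocations (the node and GPU counts and the script name are placeholders):

#!/bin/bash
#SBATCH --job-name=gpu_multinode
#SBATCH --partition=GPU
#SBATCH --nodes=2                  # 2 nodes
#SBATCH --gres=gpu:4               # 4 GPUs per node (gres counts are per node)
#SBATCH --ntasks-per-node=1        # one launcher process per node
#SBATCH --cpus-per-task=8
#SBATCH --time=04:00:00
#SBATCH --output=gpu_multinode_%j.out

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# srun starts one task on each node; each task is then responsible for
# spawning its local workers (e.g., torchrun with rendezvous options).
srun python3 train_multi_node.py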

5. Tips for Beginners

General Recommendations

  • Start with small test jobs (short time, few cores/GPUs).
  • Always set a reasonable --time= limit.
  • Use squeue -u $USER to see your jobs.
  • Use seff <jobid> to check job efficiency after it finishes
    (examples of both follow this list).
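
For example, after submitting a job:

squeue -u $USER    # your queued and running jobs
seff <jobid>       # efficiency report once the job has finished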

Common Mistakes

  • Requesting GPUs but not using them.
  • Running GPU jobs in the CPU partition.
  • Forgetting --gres=gpu:X in GPU jobs.
  • Overestimating runtime (very long jobs may wait longer in the queue).

6. Minimal Template You Can Adapt

#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --partition=cpu         # or GPU
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=01:00:00
#SBATCH --output=my_job_%j.out

echo "Job $SLURM_JOB_ID on $(hostname)"

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

python3 my_script.py

You can copy any of these examples into a file (e.g., job.sh) and submit it with:

sbatch job.sh

Possible extensions to this guide include templates for PyTorch (DDP, multi-node), TensorFlow, Jupyter on a compute node (tunneled over SSH), and jobs for specific PERUN modules.