
gpu-status – User Guide

The gpu-status tool lets you monitor the GPU status of your running jobs on the PERUN cluster directly from the command line – without logging in to the compute nodes.


Basic Usage

gpu-status              # List all your GPU jobs
gpu-status <jobid>      # Show detailed GPU status for a specific job
gpu-status --help       # Show help

Availability

The gpu-status command is available on the login node of the PERUN cluster without loading any module.


Examples

List All Your GPU Jobs

gpu-status

Example Use Case

You want to quickly find out which of your jobs are running on GPUs and how much time they have remaining – run gpu-status without any arguments.

The output displays a table of all your active GPU jobs:

Your GPU Jobs:
================================================================
JobID    Node     State      Runtime    Remaining
----------------------------------------------------------------
123456   gpu01    RUNNING    01:23:45   22:36:15
123789   gpu02    PENDING    00:00:00   23:59:59
Use: gpu-status <jobid> for detailed per-GPU breakdown
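If you want to script around gpu-status, the job table is easy to parse with standard tools. A minimal sketch – the heredoc hard-codes the example output above for illustration; on the cluster you would pipe the command output directly (gpu-status | awk ...):

```shell
# Extract the IDs of all RUNNING jobs from the job table.
# The third whitespace-separated column is the job state.
running_jobs=$(awk '$3 == "RUNNING" { print $1 }' <<'EOF'
Your GPU Jobs:
================================================================
JobID    Node     State      Runtime    Remaining
----------------------------------------------------------------
123456   gpu01    RUNNING    01:23:45   22:36:15
123789   gpu02    PENDING    00:00:00   23:59:59
EOF
)
echo "$running_jobs"
```

The header and separator lines never have "RUNNING" in the third field, so only real job rows are printed.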


Detailed Status for a Specific Job

gpu-status 123456

Example Use Case

You are training a model and want to verify that all allocated GPUs are actually being utilized and how much memory each process is consuming.

The output consists of three sections:

1. Job Information

Job 123456:
  Owner: your_username
  Node: gpu01
  State: RUNNING
  Runtime: 01:23:45
  Remaining: 22:36:15

2. GPU Overview

================================================================
GPU Status on gpu01 (Job 123456)
GPU Allocation: gpu:4
================================================================
GPU | Name              | GPU% | Mem% | Memory Used   | Temp | Power
----+-------------------+------+------+---------------+------+-------
  0 | NVIDIA H200       |  98% |  85% | 72340 / 80000 | 72°C | 650W
  1 | NVIDIA H200       |  97% |  84% | 71200 / 80000 | 71°C | 645W
  2 | NVIDIA H200       |  96% |  83% | 70800 / 80000 | 70°C | 640W
  3 | NVIDIA H200       |  97% |  84% | 71500 / 80000 | 71°C | 647W
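A quick way to confirm that every allocated GPU is actually busy is to scan the GPU% column for outliers. A minimal sketch using the example rows above (hard-coded here for illustration; on the cluster you would feed it the output of gpu-status with your job ID):

```shell
# Report the lowest GPU% across all GPUs in the overview table.
# With '|' as the field separator, GPU% is the third field;
# awk's numeric coercion ignores the trailing '%'.
min_util=$(awk -F'|' '/%/ { u = $3 + 0; if (min == "" || u < min) min = u }
                     END { print min }' <<'EOF'
  0 | NVIDIA H200       |  98% |  85% | 72340 / 80000 | 72°C | 650W
  1 | NVIDIA H200       |  97% |  84% | 71200 / 80000 | 71°C | 645W
  2 | NVIDIA H200       |  96% |  83% | 70800 / 80000 | 70°C | 640W
  3 | NVIDIA H200       |  97% |  84% | 71500 / 80000 | 71°C | 647W
EOF
)
echo "Lowest GPU utilization: ${min_util}%"
```

A lowest value far below the others (or near 0%) usually means one GPU was allocated but is not being used by your training processes.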

3. Process Details

Per-GPU Process Details:
================================================================
Process: python (PID: 98765)
----------------------------------------------------------------
  GPU 0: Memory: 72340 MiB
  GPU 1: Memory: 71200 MiB
  GPU 2: Memory: 70800 MiB
  GPU 3: Memory: 71500 MiB
  ---------------------------------------------------------------
  Total: 285840 MiB across 4 GPUs (avg: 71460 MiB/GPU)
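The totals in this section are simple sums over the per-GPU figures: 72340 + 71200 + 70800 + 71500 = 285840 MiB, and 285840 / 4 = 71460 MiB per GPU. The same arithmetic in shell, using the example values above:

```shell
# Recompute the total and per-GPU average memory from the
# example process details above.
total=$(( 72340 + 71200 + 70800 + 71500 ))
avg=$(( total / 4 ))
echo "Total: ${total} MiB across 4 GPUs (avg: ${avg} MiB/GPU)"
```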


GPU Overview Column Descriptions

Column        Description
------------  ----------------------------------
GPU           GPU card index (0, 1, 2, ...)
Name          GPU model name
GPU%          Compute utilization percentage
Mem%          GPU memory utilization percentage
Memory Used   Used / total GPU memory (MiB)
Temp          GPU temperature in °C
Power         Current power draw in Watts

Job States

State      Description
---------  ----------------------------------------------
RUNNING    Job is running – detailed GPU status is shown
PENDING    Job is queued – GPU status is not available
COMPLETED  Job has finished – GPU status is not available

Important

Detailed GPU output is available only for jobs in the RUNNING state. GPU statistics are not shown for PENDING or COMPLETED jobs.