
gpu-status – User Guide

The gpu-status tool lets you monitor the GPU status of your running jobs on the PERUN cluster directly from the command line – without logging in to the compute nodes.


Basic Usage

gpu-status              # List all your GPU jobs
gpu-status <jobid>      # Show detailed GPU status for a specific job
gpu-status --help       # Show help

Availability

The gpu-status command is available on the login node of the PERUN cluster without loading any module.


Examples

List All Your GPU Jobs

gpu-status

Example Use Case

You want to quickly find out which of your jobs are running on GPUs and how much time they have remaining – run gpu-status without any arguments.

The output displays a table of all your active GPU jobs:

Your GPU Jobs:
================================================================
JobID    Node     State      Runtime    Remaining
----------------------------------------------------------------
123456   gpu01    RUNNING    01:23:45   22:36:15
123789   gpu02    PENDING    00:00:00   23:59:59
Use: gpu-status <jobid> for detailed per-GPU breakdown
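If you want to script around gpu-status, the job table is easy to parse with standard tools. A minimal sketch – the heredoc hard-codes the example output above for illustration; on the cluster you would pipe the command output directly (gpu-status | awk ...):

```shell
# Extract the IDs of all RUNNING jobs from the job table.
# The third whitespace-separated column is the job state.
running_jobs=$(awk '$3 == "RUNNING" { print $1 }' <<'EOF'
Your GPU Jobs:
================================================================
JobID    Node     State      Runtime    Remaining
----------------------------------------------------------------
123456   gpu01    RUNNING    01:23:45   22:36:15
123789   gpu02    PENDING    00:00:00   23:59:59
EOF
)
echo "$running_jobs"
```

The header and separator lines never have "RUNNING" in the third field, so only real job rows are printed.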


Detailed Status for a Specific Job

gpu-status 123456

Example Use Case

You are training a model and want to verify that all allocated GPUs are actually being utilized and how much memory each process is consuming.

The output consists of three sections:

1. Job Information

Job 123456:
  Owner: your_username
  Node: gpu01
  State: RUNNING
  Runtime: 01:23:45
  Remaining: 22:36:15

2. GPU Overview

================================================================
GPU Status on gpu01 (Job 123456)
GPU Allocation: gpu:4
================================================================
GPU | Name              | GPU% | Mem% | Memory Used   | Temp | Power
----+-------------------+------+------+---------------+------+-------
  0 | NVIDIA H200       |  98% |  85% | 72340 / 80000 | 72°C | 650W
  1 | NVIDIA H200       |  97% |  84% | 71200 / 80000 | 71°C | 645W
  2 | NVIDIA H200       |  96% |  83% | 70800 / 80000 | 70°C | 640W
  3 | NVIDIA H200       |  97% |  84% | 71500 / 80000 | 71°C | 647W
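A quick way to confirm that every allocated GPU is actually busy is to scan the GPU% column for outliers. A minimal sketch using the example rows above (hard-coded here for illustration; on the cluster you would feed it the output of gpu-status with your job ID):

```shell
# Report the lowest GPU% across all GPUs in the overview table.
# With '|' as the field separator, GPU% is the third field;
# awk's numeric coercion ignores the trailing '%'.
min_util=$(awk -F'|' '/%/ { u = $3 + 0; if (min == "" || u < min) min = u }
                     END { print min }' <<'EOF'
  0 | NVIDIA H200       |  98% |  85% | 72340 / 80000 | 72°C | 650W
  1 | NVIDIA H200       |  97% |  84% | 71200 / 80000 | 71°C | 645W
  2 | NVIDIA H200       |  96% |  83% | 70800 / 80000 | 70°C | 640W
  3 | NVIDIA H200       |  97% |  84% | 71500 / 80000 | 71°C | 647W
EOF
)
echo "Lowest GPU utilization: ${min_util}%"
```

A lowest value far below the others (or near 0%) usually means one GPU was allocated but is not being used by your training processes.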

3. Process Details

Per-GPU Process Details:
================================================================
Process: python (PID: 98765)
----------------------------------------------------------------
  GPU 0: Memory: 72340 MiB
  GPU 1: Memory: 71200 MiB
  GPU 2: Memory: 70800 MiB
  GPU 3: Memory: 71500 MiB
  ---------------------------------------------------------------
  Total: 285840 MiB across 4 GPUs (avg: 71460 MiB/GPU)
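The totals in this section are simple sums over the per-GPU figures: 72340 + 71200 + 70800 + 71500 = 285840 MiB, and 285840 / 4 = 71460 MiB per GPU. The same arithmetic in shell, using the example values above:

```shell
# Recompute the total and per-GPU average memory from the
# example process details above.
total=$(( 72340 + 71200 + 70800 + 71500 ))
avg=$(( total / 4 ))
echo "Total: ${total} MiB across 4 GPUs (avg: ${avg} MiB/GPU)"
```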


GPU Overview Column Descriptions

Column        Description
------------  ----------------------------------
GPU           GPU card index (0, 1, 2, ...)
Name          GPU model name
GPU%          Compute utilization percentage
Mem%          GPU memory utilization percentage
Memory Used   Used / total GPU memory (MiB)
Temp          GPU temperature in °C
Power         Current power draw in Watts

Job States

State      Description
---------  ----------------------------------------------
RUNNING    Job is running – detailed GPU status is shown
PENDING    Job is queued – GPU status is not available
COMPLETED  Job has finished – GPU status is not available

Important

Detailed GPU output is available only for jobs in the RUNNING state. GPU statistics are not shown for PENDING or COMPLETED jobs.