gpu-status – User Guide
The gpu-status tool allows you to monitor GPU status for your running jobs on the PERUN cluster directly from the command line – without needing to log in to compute nodes.
Basic Usage
gpu-status # List all your GPU jobs
gpu-status <jobid> # Show detailed GPU status for a specific job
gpu-status --help # Show help
Availability
The gpu-status command is available on the login node of the PERUN cluster without loading any module.
Examples
List All Your GPU Jobs
Example Use Case
You want to quickly find out which of your jobs are running on GPUs and how much time they have remaining. Run gpu-status without any arguments.
The output displays a table of all your active GPU jobs:
Your GPU Jobs:
================================================================
JobID Node State Runtime Remaining
----------------------------------------------------------------
123456 gpu01 RUNNING 01:23:45 22:36:15
123789 gpu02 PENDING 0:00:00 23:59:59
Use: gpu-status <jobid> for detailed per-GPU breakdown
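If you want to process this listing in a script, the table can be split on whitespace. The following is a minimal sketch, assuming the output keeps the five-column layout shown above and that job IDs are purely numeric; `GpuJob` and `parse_job_list` are hypothetical helper names, not part of gpu-status.

```python
from typing import NamedTuple


class GpuJob(NamedTuple):
    """One row of the job table (hypothetical helper type)."""
    job_id: str
    node: str
    state: str
    runtime: str
    remaining: str


def parse_job_list(text: str) -> list[GpuJob]:
    """Extract job rows from text in the layout shown in the sample above."""
    jobs = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly five columns and start with a numeric job ID.
        # The title, column header, separators, and hint line are skipped.
        if len(fields) == 5 and fields[0].isdigit():
            jobs.append(GpuJob(*fields))
    return jobs


# Sample output copied from this guide.
SAMPLE = """\
Your GPU Jobs:
================================================================
JobID Node State Runtime Remaining
----------------------------------------------------------------
123456 gpu01 RUNNING 01:23:45 22:36:15
123789 gpu02 PENDING 0:00:00 23:59:59
Use: gpu-status <jobid> for detailed per-GPU breakdown
"""

running = [job.job_id for job in parse_job_list(SAMPLE) if job.state == "RUNNING"]
print(running)  # ['123456']
```

The resulting job IDs are the ones for which gpu-status <jobid> can show a detailed breakdown.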
Detailed Status for a Specific Job
Example Use Case
You are training a model and want to verify that all allocated GPUs are actually being utilized, and to see how much memory each process is consuming. Run gpu-status <jobid> with the ID of your job.
The output consists of three sections:
1. Job Information
2. GPU Overview
================================================================
GPU Status on gpu01 (Job 123456)
GPU Allocation: gpu:4
================================================================
GPU | Name | GPU% | Mem% | Memory Used | Temp | Power
----+-------------------+------+------+--------------+------+-------
0 | NVIDIA H200 | 98% | 85% | 72340 / 80000| 72°C | 650W
1 | NVIDIA H200 | 97% | 84% | 71200 / 80000| 71°C | 645W
2 | NVIDIA H200 | 96% | 83% | 70800 / 80000| 70°C | 640W
3 | NVIDIA H200 | 97% | 84% | 71500 / 80000| 71°C | 647W
3. Process Details
Per-GPU Process Details:
================================================================
Process: python (PID: 98765)
----------------------------------------------------------------
GPU 0: Memory: 72340 MiB
GPU 1: Memory: 71200 MiB
GPU 2: Memory: 70800 MiB
GPU 3: Memory: 71500 MiB
---------------------------------------------------------------
Total: 285840 MiB across 4 GPUs (avg: 71460 MiB/GPU)
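The Total line is the sum of the per-GPU memory values, and the average is that sum divided by the number of GPUs. The sketch below reproduces the arithmetic from the sample above. It assumes the `GPU <n>: Memory: <value> MiB` line format shown here and a single process; `memory_per_gpu` is a hypothetical helper name.

```python
import re

# Matches lines such as "GPU 0: Memory: 72340 MiB" (format taken from the sample).
MEMORY_LINE = re.compile(r"GPU\s+(\d+):\s+Memory:\s+(\d+)\s+MiB")


def memory_per_gpu(text: str) -> dict[int, int]:
    """Map GPU index to memory used in MiB for one process block."""
    return {int(index): int(mib) for index, mib in MEMORY_LINE.findall(text)}


# Sample output copied from this guide.
SAMPLE = """\
Process: python (PID: 98765)
GPU 0: Memory: 72340 MiB
GPU 1: Memory: 71200 MiB
GPU 2: Memory: 70800 MiB
GPU 3: Memory: 71500 MiB
Total: 285840 MiB across 4 GPUs (avg: 71460 MiB/GPU)
"""

usage = memory_per_gpu(SAMPLE)
total = sum(usage.values())
average = total / len(usage)
print(total, average)  # 285840 71460.0
```

A GPU whose value is far below the average may indicate an uneven distribution of work across the allocated cards.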
GPU Overview Column Descriptions
| Column | Description |
|---|---|
| GPU | GPU card index (0, 1, 2, ...) |
| Name | GPU model name |
| GPU% | Compute utilization percentage |
| Mem% | GPU memory utilization percentage |
| Memory Used | Used / total GPU memory (MiB) |
| Temp | GPU temperature in °C |
| Power | Current power draw in Watts |
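To check the GPU% column automatically, the overview rows can be split on the `|` separator. This is only a sketch under stated assumptions: it relies on the seven-column layout from the sample above, and the 50% threshold, `GpuRow`, `parse_overview`, and `underutilized` are hypothetical choices made for illustration.

```python
from typing import NamedTuple


class GpuRow(NamedTuple):
    """Selected columns of one GPU Overview row (hypothetical helper type)."""
    index: int
    name: str
    gpu_util: int  # GPU% column
    mem_util: int  # Mem% column


def parse_overview(text: str) -> list[GpuRow]:
    """Extract GPU rows from text in the GPU Overview layout shown above."""
    rows = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Data rows have seven cells and start with the numeric GPU index.
        if len(cells) == 7 and cells[0].isdigit():
            rows.append(
                GpuRow(
                    index=int(cells[0]),
                    name=cells[1],
                    gpu_util=int(cells[2].rstrip("%")),
                    mem_util=int(cells[3].rstrip("%")),
                )
            )
    return rows


def underutilized(rows: list[GpuRow], threshold: int = 50) -> list[int]:
    """Return indices of GPUs whose GPU% is below the (assumed) threshold."""
    return [row.index for row in rows if row.gpu_util < threshold]


# Sample output copied from this guide.
SAMPLE = """\
GPU | Name | GPU% | Mem% | Memory Used | Temp | Power
----+-------------------+------+------+--------------+------+-------
0 | NVIDIA H200 | 98% | 85% | 72340 / 80000| 72°C | 650W
1 | NVIDIA H200 | 97% | 84% | 71200 / 80000| 71°C | 645W
2 | NVIDIA H200 | 96% | 83% | 70800 / 80000| 70°C | 640W
3 | NVIDIA H200 | 97% | 84% | 71500 / 80000| 71°C | 647W
"""

print(underutilized(parse_overview(SAMPLE)))  # []
```

An empty list means every allocated GPU is above the threshold. An index in the result points to a card that the job has reserved but is barely using.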
Job States
| State | Description |
|---|---|
| RUNNING | Job is running – detailed GPU status is shown |
| PENDING | Job is queued – GPU status is not available |
| COMPLETED | Job has finished – GPU status is not available |
Important
Detailed GPU output is available only for jobs in the RUNNING state. GPU statistics are not shown for PENDING or COMPLETED jobs.