Julia

What is Julia?

Julia is a high-performance programming language designed for scientific computing, numerical analysis, and data science. It combines the ease of use of Python with performance approaching C/Fortran, making it ideal for computationally intensive tasks including machine learning, optimization, and mathematical modeling.

Base Environment

Julia is provided via the environment modules system.

Login Node Usage

Do NOT run Julia interactively on login nodes with the julia command.

Always submit Julia jobs through SLURM to CPU or GPU nodes.

Loading Julia

Load module

module load julia/1.11.2

Best Practice

Always verify the loaded version:

module list
julia --version

Basic Julia Script

hello.jl

Simple computation

println("Julia Test Script")
println("Julia version: ", VERSION)

# Vector operations
x = [1, 2, 3, 4, 5]
println("Sum: ", sum(x))
println("Mean: ", sum(x) / length(x))

# Function definition
function compute_squares(n)
    return [i^2 for i in 1:n]
end

squares = compute_squares(10)
println("Squares: ", squares)

SLURM Job Script (Serial)

julia_job.sh

Serial Julia job

#!/bin/bash
#SBATCH --job-name=julia_test
#SBATCH --partition=CPU
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=01:00:00
#SBATCH --output=julia_%j.out

module load julia/1.11.2

julia hello.jl

Submit job

sbatch julia_job.sh

Checking Job Output

View output

cat julia_*.out

Expected output:

Julia Test Script
Julia version: 1.11.2
Sum: 15
Mean: 3.0
Squares: [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

Multi-Threading

Julia supports shared-memory parallelism through the Base.Threads module for computationally intensive operations. The thread count is fixed at startup with the --threads flag or the JULIA_NUM_THREADS environment variable; it cannot be changed while Julia is running.
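
For loops over independent iterations, the Threads.@threads macro splits the work across the available threads. A minimal illustrative sketch (not part of the benchmark below):

using Base.Threads

println("Running with ", nthreads(), " threads")

# Each iteration writes to its own slot, so the loop is race-free
results = zeros(1000)
@threads for i in 1:1000
    results[i] = sum(sqrt(j) for j in 1:i)
end
println("Total: ", round(sum(results), digits=2))

The benchmark below takes a different route: the multiplication itself runs inside BLAS, whose thread pool is configured separately (see BLAS Configuration).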

matrix_multiply.jl

Multi-threaded matrix multiplication

using LinearAlgebra
using Base.Threads

# Force BLAS to use the correct number of threads
BLAS.set_num_threads(nthreads())

println("Julia threads: ", nthreads())
println("BLAS threads: ", BLAS.get_num_threads())

n = 9000
A = rand(n, n)
B = rand(n, n)

println("Matrix multiplication (", n, "x", n, ")...")
t = @elapsed C = A * B
println("Time: ", round(t, digits=3), "s")
println("GFLOPS: ", round(2*n^3 / t / 1e9, digits=2))

julia_threads.sh

Multi-threaded job script

#!/bin/bash
#SBATCH --job-name=julia_threads
#SBATCH --partition=CPU
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16   # match the largest --threads value used below
#SBATCH --mem=16G
#SBATCH --time=01:00:00
#SBATCH --output=julia_threads_%j.out

module load julia/1.11.2

echo "=== 1 thread ==="
julia --threads 1 matrix_multiply.jl

echo -e "\n=== 8 threads ==="
julia --threads 8 matrix_multiply.jl

echo -e "\n=== 16 threads ==="
julia --threads 16 matrix_multiply.jl

Submit

sbatch julia_threads.sh

Expected Output

With 16 threads on a 9000×9000 matrix multiplication:

=== 1 thread ===
Julia threads: 1
BLAS threads: 1
Matrix multiplication (9000x9000)...
Time: 15.292s
GFLOPS: 95.34

=== 8 threads ===
Julia threads: 8
BLAS threads: 8
Matrix multiplication (9000x9000)...
Time: 2.312s
GFLOPS: 630.5

=== 16 threads ===
Julia threads: 16
BLAS threads: 16
Matrix multiplication (9000x9000)...
Time: 1.441s
GFLOPS: 1012.11

BLAS Configuration

Always include BLAS.set_num_threads(nthreads()) at the start of your script.

Without this call, BLAS may default to 128 threads regardless of how many CPUs your job was allocated, oversubscribing the cores and causing severe performance degradation.
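
As a rule of thumb, start every multi-threaded compute script with this boilerplate (the same pattern used in matrix_multiply.jl above):

using LinearAlgebra
using Base.Threads

# Match the BLAS thread pool to the Julia thread count
BLAS.set_num_threads(nthreads())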

Package Management

Julia packages install to your home directory (~/.julia/) by default. Each user maintains their own package environment.

Installing packages (interactive on login node)

# Interactive Julia session (only for package installation on login node)
# Note: GPU packages like CUDA must be installed on GPU nodes - see CUDA section
module load julia/1.11.2
julia

In the Julia REPL:

using Pkg
Pkg.add("DataFrames")
Pkg.add("Plots")
Pkg.status()  # List installed packages
exit()
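
The same operations are also available through the built-in Pkg REPL mode: press ] at the julia> prompt to enter it (the prompt changes to pkg>), and press backspace to leave:

(@v1.11) pkg> add DataFrames
(@v1.11) pkg> status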

Disk Usage

Julia packages and precompilation cache can consume significant disk space.

Check usage:

du -sh ~/.julia

Clean unused packages and artifacts (in a freshly created environment where every installed package is still in use, nothing will be removed):

module load julia/1.11.2
julia
using Pkg
Pkg.gc()

Project Environments

For reproducible research, create project-specific environments:

Create project environment

mkdir my_research_project
cd my_research_project
julia --project=.

In Julia:

using Pkg
Pkg.status()
Pkg.add("DataFrames")
Pkg.status()

This creates Project.toml and Manifest.toml, both of which can be version controlled. Note that the second Pkg.status() lists only DataFrames, not Plots: Plots was installed into the default (base) environment, which is not part of this project.
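
A collaborator, or you on another machine, can recreate the exact package versions recorded in Manifest.toml with Pkg.instantiate. A minimal sketch, run from inside the project directory:

using Pkg
Pkg.activate(".")     # use the Project.toml in the current directory
Pkg.instantiate()     # install exactly the versions pinned in Manifest.toml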

Using project environments in SLURM

To run a script with this environment, point --project at the directory containing Project.toml and Manifest.toml:

julia --project=/path/to/my_research_project script.jl
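
Alternatively, the script can activate the environment itself, which keeps the invocation independent of launch flags. A sketch, assuming the same project path as above:

using Pkg
Pkg.activate("/path/to/my_research_project")

using DataFrames   # now resolved from the project environment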

GPU Computing with CUDA

Julia provides excellent GPU support through CUDA.jl for NVIDIA GPUs.

Installation Location

CUDA.jl must be installed on a GPU node, not the login node.

Installing on the login node will result in non-functional CUDA support.

Installing CUDA.jl

install_cuda.sh

Install CUDA.jl on GPU node

#!/bin/bash
#SBATCH --job-name=cuda_install
#SBATCH --partition=GPU
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:1
#SBATCH --mem=8G
#SBATCH --time=00:30:00
#SBATCH --output=cuda_install_%j.out

module load julia/1.11.2

echo "=== Installing CUDA.jl ==="
julia -e 'using Pkg; Pkg.add("CUDA"); Pkg.build("CUDA")'

echo ""
echo "=== Testing CUDA functionality ==="
julia -e 'using CUDA; println("CUDA functional: ", CUDA.functional()); if CUDA.functional(); println("GPU: ", CUDA.name(CUDA.device())); end'

Submit installation job

sbatch install_cuda.sh

Expected Output

=== Installing CUDA.jl ===
[package installation logs]

=== Testing CUDA functionality ===
CUDA functional: true
GPU: NVIDIA H200

GPU Computation Example

gpu_test.jl

GPU matrix multiplication benchmark

using CUDA

println("Julia GPU Test")
println("CUDA available: ", CUDA.functional())

if CUDA.functional()
    println("GPU: ", CUDA.name(CUDA.device()))
    println("CUDA version: ", CUDA.runtime_version())
    println("Compute capability: ", CUDA.capability(CUDA.device()))
    println()

    # GPU computation
    n = 10000
    A = CUDA.rand(n, n)
    B = CUDA.rand(n, n)

    println("Matrix multiplication on GPU (", n, "x", n, ")...")

    # Warmup
    C = A * B
    CUDA.synchronize()

    # Benchmark
    t = @elapsed begin
        C = A * B
        CUDA.synchronize()
    end

    println("GPU Time: ", round(t, digits=3), "s")
    println("GPU GFLOPS: ", round(2*n^3 / t / 1e9, digits=2))

    # CPU comparison
    A_cpu = rand(n, n)
    B_cpu = rand(n, n)

    println("\nCPU comparison:")
    t_cpu = @elapsed C_cpu = A_cpu * B_cpu
    println("CPU Time: ", round(t_cpu, digits=3), "s")
    println("Speedup: ", round(t_cpu/t, digits=2), "x")
else
    println("ERROR: CUDA not functional!")
end

julia_gpu.sh

GPU job script

#!/bin/bash
#SBATCH --job-name=julia_gpu
#SBATCH --partition=GPU
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:1
#SBATCH --mem=16G
#SBATCH --time=01:00:00
#SBATCH --output=julia_gpu_%j.out

module load julia/1.11.2

julia gpu_test.jl

Submit GPU job

sbatch julia_gpu.sh

Expected Output

On NVIDIA H200 with 10000×10000 matrices:

Julia GPU Test
CUDA available: true
GPU: NVIDIA H200
CUDA version: 13.0.0
Compute capability: 9.0.0

Matrix multiplication on GPU (10000x10000)...
GPU Time: 0.042s
GPU GFLOPS: 47775.25

CPU comparison:
CPU Time: 7.52s
Speedup: 179.64x
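
Beyond library routines such as matrix multiplication, CUDA.jl lets ordinary array code run on the GPU: cu(...) copies an array to GPU memory, broadcast expressions compile to fused GPU kernels, and Array(...) copies results back to the host. A minimal sketch:

using CUDA

x = cu(rand(Float32, 1_000_000))   # copy data to GPU memory
y = 2f0 .* x .+ 1f0                # broadcast runs as a single GPU kernel
println("Sum on GPU: ", sum(y))    # reduction executes on the GPU
y_host = Array(y)                  # copy the result back to host memory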

More Information

Documentation

Official Julia documentation:

https://docs.julialang.org/

CUDA.jl documentation:

https://cuda.juliagpu.org/stable/