Julia¶
What is Julia?¶
Julia is a high-performance programming language designed for scientific computing, numerical analysis, and data science. It combines the ease of use of Python with performance approaching C/Fortran, making it ideal for computationally intensive tasks including machine learning, optimization, and mathematical modeling.
Base Environment
Julia is provided via the environment modules system.
Login Node Usage
Do NOT run Julia computations interactively on login nodes with the julia command.
Always submit Julia jobs through SLURM to CPU or GPU nodes.
Loading Julia¶
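Load Julia through the environment modules system before use. Version 1.11.2 matches the job scripts on this page; run `module avail julia` to see which versions your site provides:

```bash
# List available Julia versions
module avail julia

# Load the version used throughout this page
module load julia/1.11.2

# Confirm the loaded version
julia --version
```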
Basic Julia Script¶
hello.jl
Simple computation
println("Julia Test Script")
println("Julia version: ", VERSION)
# Vector operations
x = [1, 2, 3, 4, 5]
println("Sum: ", sum(x))
println("Mean: ", sum(x) / length(x))
# Function definition
function compute_squares(n)
    return [i^2 for i in 1:n]
end
squares = compute_squares(10)
println("Squares: ", squares)
SLURM Job Script (Serial)¶
julia_job.sh
Serial Julia job
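A minimal serial job script, sketched here along the lines of the multi-threaded script later on this page; the partition name, memory, and time limit are assumptions to adjust for your workload:

```bash
#!/bin/bash
#SBATCH --job-name=julia_serial
#SBATCH --partition=CPU
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=00:10:00
#SBATCH --output=julia_serial_%j.out

module load julia/1.11.2
julia hello.jl
```

Submit with sbatch julia_job.sh.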
Checking Job Output
View output
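Assuming the job script writes to julia_serial_%j.out, where %j is the job ID:

```bash
cat julia_serial_*.out
```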
Expected output:
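For the hello.jl script above, the output should look like this (the version line reflects the loaded module):

```text
Julia Test Script
Julia version: 1.11.2
Sum: 15
Mean: 3.0
Squares: [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```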
Multi-Threading¶
Julia supports shared-memory parallelism for computationally intensive operations.
matrix_multiply.jl
Multi-threaded matrix multiplication
using LinearAlgebra
using Base.Threads
# Force BLAS to use the correct number of threads
BLAS.set_num_threads(nthreads())
println("Julia threads: ", nthreads())
println("BLAS threads: ", BLAS.get_num_threads())
n = 9000
A = rand(n, n)
B = rand(n, n)
println("Matrix multiplication (", n, "x", n, ")...")
t = @elapsed C = A * B
println("Time: ", round(t, digits=3), "s")
println("GFLOPS: ", round(2*n^3 / t / 1e9, digits=2))
julia_threads.sh
Multi-threaded job script
#!/bin/bash
#SBATCH --job-name=julia_threads
#SBATCH --partition=CPU
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16   # generally, match --cpus-per-task to the --threads value passed to Julia
#SBATCH --mem=16G
#SBATCH --time=01:00:00
#SBATCH --output=julia_threads_%j.out
module load julia/1.11.2
echo "=== 1 thread ==="
julia --threads 1 matrix_multiply.jl
echo -e "\n=== 8 threads ==="
julia --threads 8 matrix_multiply.jl
echo -e "\n=== 16 threads ==="
julia --threads 16 matrix_multiply.jl
Expected Output
With 16 threads on a 9000×9000 matrix multiplication:
=== 1 thread ===
Julia threads: 1
BLAS threads: 1
Matrix multiplication (9000x9000)...
Time: 15.292s
GFLOPS: 95.34
=== 8 threads ===
Julia threads: 8
BLAS threads: 8
Matrix multiplication (9000x9000)...
Time: 2.312s
GFLOPS: 630.5
=== 16 threads ===
Julia threads: 16
BLAS threads: 16
Matrix multiplication (9000x9000)...
Time: 1.441s
GFLOPS: 1012.11
BLAS Configuration
Always include BLAS.set_num_threads(nthreads()) at the start of your script.
Without this, BLAS may use 128 threads by default, causing severe performance degradation.
Package Management¶
Julia packages install to your home directory (~/.julia/) by default. Each user maintains their own package environment.
Installing packages (interactive on login node)
# Interactive Julia session (only for package installation on login node)
# Note: GPU packages like CUDA must be installed on GPU nodes - see CUDA section
module load julia/1.11.2
julia
In Julia REPL:
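For example, to install the Plots package referenced in the project-environments section below (a sketch; substitute whatever packages you need):

```julia
using Pkg
Pkg.add("Plots")   # installs into the base environment under ~/.julia/
Pkg.status()       # list installed packages
```

Press Ctrl+D or run exit() to leave the REPL.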
Disk Usage
Julia packages and precompilation cache can consume significant disk space.
Check usage:
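Standard du commands work here; the package depot lives under ~/.julia/:

```bash
du -sh ~/.julia
du -sh ~/.julia/*
```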
Clean old artifacts (if this is a new Julia environment with all packages in use, nothing will be removed):
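Pkg.gc() removes packages and artifacts that are no longer referenced by any project manifest:

```bash
julia -e 'using Pkg; Pkg.gc()'
```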
Project Environments¶
For reproducible research, create project-specific environments:
Create project environment
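A sketch of creating a project directory and starting Julia in it; MyProject is a placeholder name:

```bash
mkdir MyProject
cd MyProject
module load julia/1.11.2
julia
```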
In Julia:
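The following sketch assumes Plots was previously installed in the base environment, as in the package-installation example above:

```julia
using Pkg
Pkg.status()          # first status: base environment (shows e.g. Plots)
Pkg.activate(".")     # activate a project in the current directory
Pkg.add("DataFrames") # recorded in this project's Project.toml
Pkg.status()          # second status: only DataFrames
```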
This creates Project.toml and Manifest.toml which can be version controlled.
The second Pkg.status() command should list only DataFrames, not Plots, since Plots was installed in the base Julia environment.
Using project environments in SLURM
To run scripts using the defined environment, include the path to the folder containing Project.toml and Manifest.toml using --project:
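For example, inside a SLURM job script after loading the Julia module (the path and script name are placeholders):

```bash
julia --project=$HOME/MyProject my_script.jl
```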
GPU Computing with CUDA¶
Julia provides excellent GPU support through CUDA.jl for NVIDIA GPUs.
Installation Location
CUDA.jl must be installed on a GPU node, not the login node.
Installing on the login node will result in non-functional CUDA support.
Installing CUDA.jl¶
install_cuda.sh
Install CUDA.jl on GPU node
#!/bin/bash
#SBATCH --job-name=cuda_install
#SBATCH --partition=GPU
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:1
#SBATCH --mem=8G
#SBATCH --time=00:30:00
#SBATCH --output=cuda_install_%j.out
module load julia/1.11.2
echo "=== Installing CUDA.jl ==="
julia -e 'using Pkg; Pkg.add("CUDA"); Pkg.build("CUDA")'
echo ""
echo "=== Testing CUDA functionality ==="
julia -e 'using CUDA; println("CUDA functional: ", CUDA.functional()); if CUDA.functional(); println("GPU: ", CUDA.name(CUDA.device())); end'
Expected Output
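After the Pkg installation and precompilation logs, the test step should report a functional GPU; the name reflects the node's hardware (e.g. the H200 nodes used in the benchmark below):

```text
=== Testing CUDA functionality ===
CUDA functional: true
GPU: NVIDIA H200
```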
GPU Computation Example¶
gpu_test.jl
GPU matrix multiplication benchmark
using CUDA
println("Julia GPU Test")
println("CUDA available: ", CUDA.functional())
if CUDA.functional()
    println("GPU: ", CUDA.name(CUDA.device()))
    println("CUDA version: ", CUDA.runtime_version())
    println("Compute capability: ", CUDA.capability(CUDA.device()))
    println()

    # GPU computation
    n = 10000
    A = CUDA.rand(n, n)
    B = CUDA.rand(n, n)
    println("Matrix multiplication on GPU (", n, "x", n, ")...")

    # Warmup (first call triggers compilation)
    C = A * B
    CUDA.synchronize()

    # Benchmark
    t = @elapsed begin
        C = A * B
        CUDA.synchronize()
    end
    println("GPU Time: ", round(t, digits=3), "s")
    println("GPU GFLOPS: ", round(2*n^3 / t / 1e9, digits=2))

    # CPU comparison
    A_cpu = rand(n, n)
    B_cpu = rand(n, n)
    println("\nCPU comparison:")
    t_cpu = @elapsed C_cpu = A_cpu * B_cpu
    println("CPU Time: ", round(t_cpu, digits=3), "s")
    println("Speedup: ", round(t_cpu/t, digits=2), "x")
else
    println("ERROR: CUDA not functional!")
end
julia_gpu.sh
GPU job script
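A sketch modeled on the install_cuda.sh script above; memory and time limits are assumptions to adjust for your workload:

```bash
#!/bin/bash
#SBATCH --job-name=julia_gpu
#SBATCH --partition=GPU
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:1
#SBATCH --mem=32G
#SBATCH --time=00:30:00
#SBATCH --output=julia_gpu_%j.out

module load julia/1.11.2
julia gpu_test.jl
```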
Expected Output
On NVIDIA H200 with 10000×10000 matrices:
More Information¶
Documentation
Official Julia documentation: https://docs.julialang.org/
CUDA.jl documentation: https://cuda.juliagpu.org/stable/