One drive. One login. Every node sees the same home directory.
In Episode 3, we set up the network, installed Rocky Linux on all six nodes, configured DHCP and NAT, and hardened SSH. The cluster is networked and secured. Now it needs two things before Slurm makes any sense: shared storage and centralized authentication.
Without these two pieces, you are manually copying files to every node and creating the same user account six times. This episode fixes both problems.
1. Why Shared Storage Matters
Without NFS, submitting an MPI job across two nodes means your input data has to exist on both nodes. You either copy it manually or write a script to sync it. Neither is sustainable.
With NFS, the Samsung 990 Pro on arbiter (the management node) exports a single /home directory. Every node in the cluster mounts it. Write a script on the login node, run it from any compute node. The file is already there.
This also matters for Slurm. When a job writes output files, they land in /home on the NFS share. You do not need to SSH into compute nodes to retrieve results.
Prerequisites
Before starting this episode:
- All nodes are running Rocky Linux 10 with network configured (Episode 3)
- arbiter has the Samsung 990 Pro NVMe drive installed (Episode 2)
- SSH key-based login is working from arbiter to all other nodes
2. Ansible Setup
From this episode onward, we use Ansible to apply configuration across all nodes at once. Without it, every change means SSHing into six machines individually.
Ansible runs from arbiter. We keep it in /opt/ansible rather than a home directory so it stays off the NFS share. That directory holds the Ansible SSH private key and the vault password file, neither of which should be visible to every node in the cluster.
Install Ansible
[wpaik@arbiter ~]$ sudo dnf install ansible-core
[wpaik@arbiter ~]$ sudo mkdir -p /opt/ansible
[wpaik@arbiter ~]$ sudo chown wpaik:wpaik /opt/ansible
[wpaik@arbiter ~]$ cd /opt/ansible
SSH Key
Generate a dedicated key for Ansible and distribute it to all nodes:
[wpaik@arbiter ansible]$ mkdir .ssh
[wpaik@arbiter ansible]$ ssh-keygen -t ed25519 -f .ssh/worker_ed25519 -N ""
[wpaik@arbiter ansible]$ for node in 192.168.50.1 192.168.50.15 192.168.50.32 192.168.50.11 192.168.50.19; do
ssh-copy-id -i .ssh/worker_ed25519.pub wpaik@$node
done
Inventory and Config
Create hosts.ini:
[head]
carrier.cluster.local ansible_host=192.168.50.1
[management]
arbiter.cluster.local ansible_host=192.168.50.50 ansible_connection=local
[workers]
interceptor-01.cluster.local ansible_host=192.168.50.15
interceptor-02.cluster.local ansible_host=192.168.50.32
[gpu]
corsair-01.cluster.local ansible_host=192.168.50.11
[visualization]
observer.cluster.local ansible_host=192.168.50.19
[compute:children]
workers
gpu
[all_nodes:children]
head
management
workers
gpu
visualization
[all_nodes:vars]
ansible_user=wpaik
cluster_network=192.168.50.0/24
cluster_domain=cluster.local
cluster_realm=CLUSTER.LOCAL
Note that arbiter uses ansible_connection=local since it is the Ansible controller itself.
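The ssh-copy-id loop in the SSH Key step hard-codes five IP addresses that also live in hosts.ini. If you would rather keep one source of truth, a small helper can pull the ansible_host values out of the inventory and skip the controller. This is a minimal sketch that assumes the one-host-per-line INI layout shown here; inventory_hosts is a hypothetical name and not part of Ansible (ansible-inventory --list is the Ansible-native way to dump the parsed inventory as JSON).

```bash
#!/usr/bin/env bash
# Print the ansible_host address of every inventory entry that Ansible
# reaches over SSH: lines that set ansible_host= but do not use
# ansible_connection=local (arbiter, the controller itself).
# Usage: inventory_hosts [path-to-hosts.ini]   (defaults to ./hosts.ini)
inventory_hosts() {
    local inventory="${1:-./hosts.ini}"
    awk '
        /^[[:space:]]*[#;]/ { next }            # skip comment lines
        /ansible_connection=local/ { next }     # skip the controller
        {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^ansible_host=/) {
                    sub(/^ansible_host=/, "", $i)
                    print $i
                }
        }
    ' "$inventory"
}

# Example (same effect as the hard-coded loop):
#   for node in $(inventory_hosts); do
#       ssh-copy-id -i .ssh/worker_ed25519.pub "wpaik@$node"
#   done
```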
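Create ansible.cfg, along with the log directory it writes to (mkdir -p log), since Ansible creates the log file but not its parent directory: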
[defaults]
private_key_file = /opt/ansible/.ssh/worker_ed25519
inventory = ./hosts.ini
host_key_checking = False
log_path = ./log/ansible.log
vault_password_file = /opt/ansible/.ansible_vault_pw
remote_tmp = /var/tmp/.ansible-${USER}/tmp
The last line, remote_tmp, deserves a note because it is the one setting that only bites you later. By default Ansible writes its per-task staging files into ~/.ansible/tmp/ on the remote node. After the NFS setup in sections 3 and 4, every node’s /home lives on the NFS share, so that staging directory ends up on NFS. Files written there get the nfs_t SELinux context, which dnf refuses to handle when installing local RPMs in later episodes. The failure is misleading: dnf reports No match for argument for an RPM file that visibly exists on disk. Pinning remote_tmp to a local path on each node (/var/tmp is always local) sidesteps this entirely. It costs nothing now and saves a long debugging session in Episode 5.
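To see the problem remote_tmp avoids, it helps to check which filesystem actually backs a path. On a live node, findmnt -n -o FSTYPE -T /home/wpaik answers that directly. The sketch below does the same longest-prefix lookup against a /proc/mounts-style table, which makes it easy to try on a sample file. fs_type_for_path is a hypothetical helper, and it ignores the octal escapes /proc/mounts uses for spaces in mount paths.

```bash
#!/usr/bin/env bash
# Report the filesystem type backing a path by finding the longest mount
# point in a /proc/mounts-style table that is "/" or a leading directory
# of the path. Later lines win ties, matching how stacked mounts appear.
# Usage: fs_type_for_path PATH [MOUNTS_FILE]   (defaults to /proc/mounts)
fs_type_for_path() {
    local path="$1" mounts="${2:-/proc/mounts}"
    awk -v p="$path" '
        BEGIN { best_len = -1 }
        {
            mp = $2
            # /home covers /home/wpaik but must not cover /homework
            if (mp == "/" || p == mp || index(p, mp "/") == 1) {
                if (length(mp) >= best_len) { best_len = length(mp); best = $3 }
            }
        }
        END { if (best_len >= 0) print best }
    ' "$mounts"
}

# Example on a client after section 4:
#   fs_type_for_path /home/wpaik/.ansible/tmp   -> an nfs type
#   fs_type_for_path /var/tmp                   -> the local root filesystem type
```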
Verify connectivity:
[wpaik@arbiter ansible]$ ansible all -m ping
carrier.cluster.local | SUCCESS => { "ping": "pong" }
arbiter.cluster.local | SUCCESS => { "ping": "pong" }
interceptor-01.cluster.local | SUCCESS => { "ping": "pong" }
interceptor-02.cluster.local | SUCCESS => { "ping": "pong" }
corsair-01.cluster.local | SUCCESS => { "ping": "pong" }
observer.cluster.local | SUCCESS => { "ping": "pong" }
All six nodes responding. From here on, playbooks handle the repetitive work.
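The ansible.cfg above points vault_password_file at /opt/ansible/.ansible_vault_pw, but this section does not show that file being created. Since it holds a secret, it should be readable only by your user. A minimal sketch, with make_secret_file as a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Create (or overwrite) a file with mode 0600 and fill it from stdin.
# Permissions are tightened before any secret is written, and the umask
# change happens in a subshell so the caller's umask is untouched.
# Usage:
#   read -rsp 'Vault password: ' pw; echo
#   printf '%s\n' "$pw" | make_secret_file /opt/ansible/.ansible_vault_pw
#   unset pw
make_secret_file() {
    local target="$1"
    ( umask 077 && : > "$target" ) || return 1   # create or truncate as 0600
    chmod 600 "$target" || return 1              # fix a pre-existing looser mode
    cat > "$target"                              # copy the secret from stdin
}
```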
3. NFS Server Setup
All commands in this section run on arbiter.
Partition the NVMe Drive with LVM
A single large partition works, but LVM gives us the flexibility to allocate separate volumes for home directories, work storage, shared software, and scratch space. This mirrors how storage is typically organized on a real HPC cluster.
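The lvcreate commands below request 167G + 251G + 84G + 251G = 753G from the 931.5G drive, which leaves roughly 178G unallocated in vg_nfs. XFS can be grown but not shrunk, so spare space in the volume group lets you grow whichever volume fills first later with lvextend. The sketch below repeats that arithmetic for any plan. lv_plan_free is a hypothetical helper; sizes are in GiB to match lvcreate's G suffix, and the usable size reported by vgs can be marginally smaller than the raw disk because LVM reserves room for its metadata.

```bash
#!/usr/bin/env bash
# Sum planned logical volume sizes and report what is left in the volume
# group. awk handles the fractional sizes that bash arithmetic cannot.
# Returns 1 (and prints the overshoot) if the plan does not fit.
# Usage: lv_plan_free VG_SIZE LV_SIZE...
lv_plan_free() {
    local vg_size="$1"; shift
    printf '%s\n' "$@" | awk -v total="$vg_size" '
        { used += $1 }
        END {
            if (used > total) { printf "over by %.1f\n", used - total; exit 1 }
            printf "%.1f\n", total - used
        }'
}

# Example: lv_plan_free 931.5 167 251 84 251   # prints 178.5
```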
First, verify the NVMe drive:
[wpaik@arbiter ~]$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 223.6G 0 disk
├─sda1 8:1 0 600M 0 part /boot/efi
├─sda2 8:2 0 1G 0 part /boot
└─sda3 8:3 0 222G 0 part
├─rl-root 253:0 0 70G 0 lvm /
└─rl-swap 253:1 0 7.7G 0 lvm [SWAP]
nvme0n1 259:0 0 931.5G 0 disk
The SATA boot drive is sda. The NVMe is nvme0n1. Create a physical volume, volume group, and four logical volumes:
# Install LVM tools
$ sudo dnf install -y lvm2
# Create physical volume and volume group
$ sudo pvcreate /dev/nvme0n1
$ sudo vgcreate vg_nfs /dev/nvme0n1
# Create logical volumes
$ sudo lvcreate -L 167G -n lv_home vg_nfs
$ sudo lvcreate -L 251G -n lv_work vg_nfs
$ sudo lvcreate -L 84G -n lv_shared vg_nfs
$ sudo lvcreate -L 251G -n lv_scratch vg_nfs
# Format as XFS
$ sudo mkfs.xfs /dev/vg_nfs/lv_home
$ sudo mkfs.xfs /dev/vg_nfs/lv_work
$ sudo mkfs.xfs /dev/vg_nfs/lv_shared
$ sudo mkfs.xfs /dev/vg_nfs/lv_scratch
Create mount points and mount:
$ sudo mkdir -p /nfsdata/{home,work,shared,scratch}
$ sudo mount /dev/vg_nfs/lv_home /nfsdata/home
$ sudo mount /dev/vg_nfs/lv_work /nfsdata/work
$ sudo mount /dev/vg_nfs/lv_shared /nfsdata/shared
$ sudo mount /dev/vg_nfs/lv_scratch /nfsdata/scratch
Add to /etc/fstab for persistence:
$ echo '/dev/vg_nfs/lv_home /nfsdata/home xfs defaults 0 0' | sudo tee -a /etc/fstab
$ echo '/dev/vg_nfs/lv_work /nfsdata/work xfs defaults 0 0' | sudo tee -a /etc/fstab
$ echo '/dev/vg_nfs/lv_shared /nfsdata/shared xfs defaults 0 0' | sudo tee -a /etc/fstab
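$ echo '/dev/vg_nfs/lv_scratch /nfsdata/scratch xfs defaults 0 0' | sudo tee -a /etc/fstab
Bind mount /nfsdata/home to /home on arbiter itself, so the management node also uses the NFS storage. The bind mount hides whatever is currently in arbiter's local /home, so copy anything you still need from there into /nfsdata/home before running mount -a: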
$ echo '/nfsdata/home /home none bind 0 0' | sudo tee -a /etc/fstab
$ sudo mount -a
Verify the final layout:
[wpaik@arbiter ~]$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 223.6G 0 disk
├─sda1 8:1 0 600M 0 part /boot/efi
├─sda2 8:2 0 1G 0 part /boot
└─sda3 8:3 0 222G 0 part
├─rl-root 253:0 0 70G 0 lvm /
├─rl-swap 253:1 0 7.7G 0 lvm [SWAP]
└─rl-home 253:6 0 144.3G 0 lvm
nvme0n1 259:0 0 931.5G 0 disk
├─vg_nfs-lv_home 253:2 0 167G 0 lvm /home
│ /nfsdata/home
├─vg_nfs-lv_work 253:3 0 251G 0 lvm /nfsdata/work
├─vg_nfs-lv_shared 253:4 0 84G 0 lvm /nfsdata/shared
└─vg_nfs-lv_scratch 253:5 0 251G 0 lvm /nfsdata/scratch
The bind mount makes lv_home appear twice: once at /nfsdata/home (the actual mount point) and once at /home (the bind mount that arbiter itself uses). The other three volumes only mount at their /nfsdata paths on arbiter. Client nodes will mount them at /work, /shared, and /scratch via NFS.
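The echo ... | sudo tee -a /etc/fstab commands append unconditionally, so running them a second time (after a partial failure, for example) leaves duplicate entries in /etc/fstab. A guarded append avoids that. This is a sketch, not what the episode's playbooks do; add_fstab_line is a hypothetical name, and because it writes the file directly it has to run as root (for example from a sudo -i shell).

```bash
#!/usr/bin/env bash
# Append a line to an fstab-style file only if that exact line is not
# already present, so re-running the setup cannot duplicate mounts.
# Usage: add_fstab_line 'DEVICE MOUNTPOINT TYPE OPTIONS 0 0' [FILE]
#        (FILE defaults to /etc/fstab)
add_fstab_line() {
    local line="$1" fstab="${2:-/etc/fstab}"
    # -x: whole-line match, -F: fixed string (no regex surprises from paths)
    if ! grep -qxF -- "$line" "$fstab" 2>/dev/null; then
        printf '%s\n' "$line" >> "$fstab"
    fi
}

# Example:
#   add_fstab_line '/dev/vg_nfs/lv_home /nfsdata/home xfs defaults 0 0'
```

Running it twice with the same line leaves a single entry, which is the same property Ansible's lineinfile module gives you inside a playbook.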
Configure the NFS Server
$ sudo dnf install -y nfs-utils
$ sudo systemctl enable --now nfs-server
Configure /etc/exports:
/nfsdata/home 192.168.50.0/24(rw,sync,no_root_squash,no_subtree_check)
/nfsdata/work 192.168.50.0/24(rw,sync,no_root_squash,no_subtree_check)
/nfsdata/shared 192.168.50.0/24(rw,sync,no_root_squash,no_subtree_check)
/nfsdata/scratch 192.168.50.0/24(rw,sync,no_root_squash,no_subtree_check)
A quick note on the options: rw allows read and write. sync commits writes to disk before responding (safer). no_subtree_check turns off subtree checking, which adds per-request overhead and can misbehave when a file is renamed while a client has it open. no_root_squash lets root on client nodes act as root on the share, which Slurm will need later.
Note on no_root_squash: This is appropriate for a trusted internal cluster network. Our cluster is physically isolated on the 192.168.50.x subnet. On a shared cluster with untrusted users, use root_squash instead.
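The four export lines differ only in the directory, so they can be generated from a single option string. That keeps them consistent, and switching to root_squash for a less trusted network becomes a one-word change. This is a hypothetical helper, not part of nfs-utils. Note that piping its output into sudo tee /etc/exports (without -a) replaces the whole file.

```bash
#!/usr/bin/env bash
# Print one /etc/exports line per directory, all sharing the same client
# network and option string, so the exports cannot drift apart.
# Usage: render_exports NETWORK OPTIONS DIR...
render_exports() {
    local network="$1" options="$2"; shift 2
    local dir
    for dir in "$@"; do
        printf '%s %s(%s)\n' "$dir" "$network" "$options"
    done
}

# Example (overwrites /etc/exports, then re-exports):
#   render_exports 192.168.50.0/24 rw,sync,no_root_squash,no_subtree_check \
#       /nfsdata/{home,work,shared,scratch} | sudo tee /etc/exports
#   sudo exportfs -ra
```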
Apply and open the firewall:
$ sudo exportfs -ra
$ sudo firewall-cmd --permanent --add-service={nfs,rpc-bind,mountd}
$ sudo firewall-cmd --reload
# Verify
$ sudo showmount -e localhost
Export list for localhost:
/nfsdata/scratch 192.168.50.0/24
/nfsdata/shared 192.168.50.0/24
/nfsdata/work 192.168.50.0/24
/nfsdata/home 192.168.50.0/24
4. NFS Client Setup
Rather than SSHing into each node manually, use Ansible. Run from /opt/ansible on arbiter:
[wpaik@arbiter ansible]$ ansible-playbook playbooks/nfs_setup.yaml -K
What the playbook does on each client node: installs nfs-utils, sets the SELinux boolean for NFS home directories, creates mount points for /work, /shared, and /scratch, adds all four NFS mounts to /etc/fstab with _netdev, and mounts them.
The _netdev option marks these entries as network filesystems, so each mount is attempted only after networking is up instead of early in boot, when it would fail and could stall the boot. It does not make the node wait for arbiter itself, so bring arbiter up before the other nodes.
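The playbook's fstab entries are not reproduced here; the authoritative version is in the repository. As a sketch of what a client entry with _netdev looks like, the helper below prints one line per export. The nfs type and the defaults,_netdev option string are assumptions, not necessarily what nfs_setup.yaml writes, and nfs_fstab_line is a hypothetical name. The optional fourth argument is where something like nofail would go, which lets a node finish booting even if arbiter is unreachable.

```bash
#!/usr/bin/env bash
# Print an NFS client /etc/fstab entry marked _netdev so it is mounted
# only once the network is up.
# Usage: nfs_fstab_line SERVER EXPORT MOUNTPOINT [EXTRA_OPTIONS]
nfs_fstab_line() {
    local server="$1" export_path="$2" mountpoint="$3" extra="${4:-}"
    local options="defaults,_netdev"
    if [ -n "$extra" ]; then
        options="$options,$extra"
    fi
    printf '%s:%s %s nfs %s 0 0\n' "$server" "$export_path" "$mountpoint" "$options"
}

# Example: the four client mounts described above
#   for d in home work shared scratch; do
#       nfs_fstab_line arbiter.cluster.local "/nfsdata/$d" "/$d"
#   done
```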
The playbook also enables XFS quota on arbiter and reboots it to apply. This is covered in the full playbook in the GitHub repository.
Verify from carrier after rebooting:
[wpaik@carrier ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/rl-root 70G 5.4G 65G 8% /
arbiter.cluster.local:/nfsdata/home 167G 8.2G 159G 5% /home
arbiter.cluster.local:/nfsdata/work 251G 4.9G 247G 2% /work
arbiter.cluster.local:/nfsdata/shared 84G 23G 62G 27% /shared
arbiter.cluster.local:/nfsdata/scratch 251G 22G 230G 9% /scratch
Note: The playbook reboots worker and GPU nodes automatically. carrier (the head node) requires a manual reboot after the playbook completes since it is the SSH entry point into the cluster. After rebooting carrier, verify mounts with df -h.
Test that the share works:
# Create a test file from interceptor-01. /home itself is owned by root,
# so use sudo; no_root_squash lets root write to the share
[wpaik@interceptor-01 ~]$ sudo touch /home/nfs_test.txt
# Verify it appears on interceptor-02
[wpaik@interceptor-02 ~]$ ls /home/nfs_test.txt
/home/nfs_test.txt
One file, visible everywhere.
5. Time Synchronization (Chrony)
Before setting up FreeIPA, all nodes need to be synchronized to the same time source. FreeIPA uses Kerberos for authentication, and Kerberos will reject tickets if the clock difference between nodes exceeds 5 minutes. On a fresh cluster this is usually fine, but it is better to set it up explicitly.
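The 5-minute limit is Kerberos's default clock skew tolerance of 300 seconds, measured as an absolute difference. The sketch below reads the System time offset from chronyc tracking (in the format shown at the end of this section) and checks it against that limit. Both function names are hypothetical, and chrony reports the offset relative to its NTP source rather than to the KDC directly, so treat this as a sanity check rather than a guarantee.

```bash
#!/usr/bin/env bash
# Pull the "System time" offset out of `chronyc tracking` output on stdin.
# Offsets reported as "slow" are returned as negative numbers.
chrony_system_offset() {
    awk '/^System time/ { v = $4; if ($6 == "slow") v = -v; print v }'
}

# Succeed if |OFFSET| is below MAX_SKEW seconds (Kerberos default: 300).
# awk does the floating-point comparison; its exit status is the result.
# Usage: within_kerberos_skew OFFSET_SECONDS [MAX_SKEW_SECONDS]
within_kerberos_skew() {
    awk -v off="$1" -v max="${2:-300}" 'BEGIN {
        if (off < 0) off = -off
        exit !(off < max)
    }'
}

# Example on any node:
#   within_kerberos_skew "$(chronyc tracking | chrony_system_offset)" && echo "clock OK"
```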
carrier acts as the NTP server for the cluster. It syncs from external sources (time.cloudflare.com, pool.ntp.org) and serves time to all internal nodes. The other nodes sync from carrier.
[wpaik@arbiter ansible]$ ansible-playbook playbooks/chrony_setup.yaml -K
Verify sync status on any node after the playbook completes:
$ chronyc tracking
Reference ID : C0A83201 (carrier.cluster.local)
Stratum : 3
System time : 0.000123456 seconds fast of NTP time
Last offset : +0.000045678 seconds
RMS offset : 0.000089012 seconds
Reference ID pointing to carrier.cluster.local confirms the node is syncing from carrier.
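The hex value in that Reference ID line is carrier's IPv4 address written as eight hex digits: C0 A8 32 01 is 192.168.50.1. (For IPv6 sources the reference ID is a hash of the address, so this decoding only applies to IPv4.) A small hypothetical helper makes the conversion explicit:

```bash
#!/usr/bin/env bash
# Convert an IPv4 NTP reference ID (8 hex digits, one byte per pair)
# to dotted-quad notation. bash's printf accepts 0x-prefixed values for %d.
# Usage: refid_to_ipv4 C0A83201
refid_to_ipv4() {
    local hex="$1"
    if [[ ! "$hex" =~ ^[0-9A-Fa-f]{8}$ ]]; then
        echo "expected 8 hex digits, got: $hex" >&2
        return 1
    fi
    printf '%d.%d.%d.%d\n' "0x${hex:0:2}" "0x${hex:2:2}" "0x${hex:4:2}" "0x${hex:6:2}"
}
```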
6. The Problem with Local Users
NFS solves the file sharing problem. But it creates a new one.
NFS uses UID (User ID) and GID (Group ID) numbers to handle file permissions, not usernames. If the user will has UID 1001 on interceptor-01 but UID 1002 on interceptor-02 (because the accounts were created in a different order), the two nodes see different owners, and therefore different permissions, on the same NFS files.
# On interceptor-01
$ id will
uid=1001(will) gid=1001(will)
# On interceptor-02
$ id will
uid=1002(will) gid=1002(will)
# The NFS file owned by will on interceptor-01 (uid=1001)
# looks like it belongs to a different user on interceptor-02
You can work around this by manually synchronizing UIDs across every node. On a six-node cluster with a few users, that is tedious but manageable. On a real cluster with hundreds of users, it is not viable.
The proper solution is centralized authentication: one place where user accounts are defined, and every node pulls from that source. This is what FreeIPA provides.
Pre-flight: UID Alignment
NFS does not compare usernames. It compares the numeric UID and GID stamped on every file. If wpaik has UID 1000 on arbiter but UID 1001 on interceptor-01, every file written from interceptor-01 lands on the share owned by UID 1001, and arbiter cannot find a matching user. Reads and writes silently misbehave or fail outright.
For a fresh six-node build done in one sitting, this usually does not bite. Rocky’s installer assigns UID 1000 to the first user created during installation, so as long as wpaik was the first user on every node, the numbers line up by themselves. The hazard appears later: a node reinstalled out of band, a kickstart that differs between machines, or an extra account created during install before wpaik. The UID drifts, NFS quietly breaks, and the failure mode is confusing because everything else looks fine.
Check before mounting anything:
[wpaik@arbiter ansible]$ ansible all_nodes -a "id wpaik"
Every node should report the same uid= and gid=. If one differs, align it against arbiter’s value (typically 1000, but verify) before continuing.
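With six nodes the output is easy to eyeball, but a small filter can flag mismatches for you. This sketch assumes you run the check with Ansible's -o (one-line) flag so each node's result sits on one line starting with the host name. The exact one-line layout can vary between Ansible versions, so the parser relies only on the host name being the first field and on the uid=/gid= fields that id prints. check_id_alignment is a hypothetical name. Pass arbiter's value (for example 1000:1000) as the reference; without an argument, the first line read becomes the reference.

```bash
#!/usr/bin/env bash
# Read `id` results (one node per line, host name first) and report every
# node whose uid:gid differs from the reference. Returns 1 on any mismatch.
# Usage: ansible all_nodes -o -a "id wpaik" | check_id_alignment [UID:GID]
check_id_alignment() {
    awk -v ref="${1:-}" '
        BEGIN { bad = 0 }
        match($0, /uid=[0-9]+/) {
            uid = substr($0, RSTART + 4, RLENGTH - 4)
            gid = "?"
            if (match($0, /gid=[0-9]+/)) gid = substr($0, RSTART + 4, RLENGTH - 4)
            if (ref == "") ref = uid ":" gid      # no reference given: first node wins
            if (uid ":" gid != ref) {
                printf "%s: uid=%s gid=%s, expected %s\n", $1, uid, gid, ref
                bad = 1
            }
        }
        END { exit bad }
    '
}
```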
The fix runs on the misaligned node, as a different sudoer or as root, with no active wpaik session. The example below assumes arbiter has wpaik at UID 1000 and the misaligned node currently has 1001. Substitute your actual values.
# On the misaligned node, as root or another sudoer
[root@interceptor-01 ~]# who | grep wpaik # confirm no live session
[root@interceptor-01 ~]# pkill -KILL -u wpaik # kill any leftovers
# If NFS is already mounted, unmount first
[root@interceptor-01 ~]# umount /home # use -l if busy
# Renumber the account
[root@interceptor-01 ~]# groupmod -g 1000 wpaik
[root@interceptor-01 ~]# usermod -u 1000 -g 1000 wpaik
# Fix ownership of files under the old UID.
# -xdev keeps find on the local filesystem, so other partitions
# and NFS mounts (if any are still present) are not touched.
[root@interceptor-01 ~]# find / -xdev -uid 1001 -exec chown -h 1000 {} +
[root@interceptor-01 ~]# find / -xdev -gid 1001 -exec chgrp -h 1000 {} +
# Verify
[root@interceptor-01 ~]# id wpaik
uid=1000(wpaik) gid=1000(wpaik) groups=1000(wpaik),10(wheel)
If wpaik belonged to extra groups before (wheel, for example), check with groups wpaik and re-add anything that got dropped during the usermod.
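The two find commands in the fix rewrite ownership across the whole root filesystem. To preview how many files still carry the old UID, and to confirm afterwards that none are left, a counting helper is enough. count_owned_by is a hypothetical name; it uses the same -xdev restriction as the fix, and it should run as root so permission errors do not hide files.

```bash
#!/usr/bin/env bash
# Count files under DIR (default /) owned by a numeric UID, staying on
# DIR's own filesystem like the find commands in the fix.
# -printf '.' emits one byte per match, so unusual file names (even ones
# containing newlines) cannot skew the count.
# Usage: count_owned_by UID [DIR]
count_owned_by() {
    local uid="$1" dir="${2:-/}"
    find "$dir" -xdev -uid "$uid" -printf '.' 2>/dev/null | wc -c
}

# Example on the misaligned node:
#   count_owned_by 1001     # before the fix: how much will be re-owned
#   count_owned_by 1001     # after the fix: should print 0
```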
This is a stopgap. FreeIPA in Section 7 gives cluster users a centralized identity, so their UIDs match on every node by construction. Local accounts such as wpaik are not covered by that, so until then (and for those local accounts afterwards), UID alignment is something you manage by hand whenever a node joins the cluster out of cycle.
7. FreeIPA Server Installation
FreeIPA bundles several services into one package: LDAP (directory), Kerberos (authentication), DNS, and a certificate authority. The installation is opinionated and sets everything up together.
All commands in this section run on arbiter.
Prerequisites
FreeIPA requires a fully qualified domain name (FQDN). Verify it resolves correctly before proceeding:
[wpaik@arbiter ~]$ hostname -f
arbiter.cluster.local
[wpaik@arbiter ~]$ ping -c 1 arbiter.cluster.local
PING arbiter.cluster.local (192.168.50.50) 56(84) bytes of data.
Also verify at least 1.5 GB of free RAM. The installer is memory-hungry:
$ free -h
total used free
Mem: 15Gi 800Mi 14Gi
Install and Run the Server Setup
$ sudo dnf install -y freeipa-server freeipa-server-dns
$ sudo ipa-server-install \
--domain=cluster.local \
--realm=CLUSTER.LOCAL \
--ds-password=<your_directory_manager_password> \
--admin-password=<your_admin_password> \
--hostname=arbiter.cluster.local \
--ip-address=192.168.50.50 \
--no-ntp \
--unattended
A few things to note: --realm must be uppercase, --no-ntp skips NTP configuration since we manage time sync with Chrony separately, and --unattended skips interactive prompts. The installer takes 5-10 minutes and configures LDAP, Kerberos, and the CA.
After completion, open the required firewall ports:
$ sudo firewall-cmd --permanent --add-service={freeipa-ldap,freeipa-ldaps,kerberos,dns,http,https}
$ sudo firewall-cmd --reload
Verify the Installation
$ kinit admin
Password for admin@CLUSTER.LOCAL:
$ klist
Ticket cache: KCM:0
Default principal: admin@CLUSTER.LOCAL
Valid starting Expires Service principal
04/27/26 09:00:00 04/28/26 09:00:00 krbtgt/CLUSTER.LOCAL@CLUSTER.LOCAL
$ ipa user-find
--------------
1 user matched
--------------
User login: admin
Last name: Administrator
Only the built-in admin account exists so far. We will add regular users after enrollment.
Set the default shell to bash (the FreeIPA default is /bin/sh):
$ ipa config-mod --defaultshell=/bin/bash
8. FreeIPA Client Enrollment
Before enrolling, add arbiter to /etc/hosts on every node. Enrollment needs to resolve arbiter.cluster.local, and the server was installed without FreeIPA's integrated DNS, so an /etc/hosts entry is the simplest way to guarantee that lookup succeeds.
The Ansible playbook handles this automatically:
[wpaik@arbiter ansible]$ ansible-playbook playbooks/freeipa_setup.yaml -K
If you prefer to do it manually on each node:
# Add arbiter to /etc/hosts
$ echo "192.168.50.50 arbiter.cluster.local arbiter" | sudo tee -a /etc/hosts
# Install and enroll
$ sudo dnf install -y freeipa-client oddjob-mkhomedir
$ sudo ipa-client-install \
--server=arbiter.cluster.local \
--domain=cluster.local \
--realm=CLUSTER.LOCAL \
--principal=admin \
--password=<your_admin_password> \
--mkhomedir \
--no-ntp \
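--unattended
The --mkhomedir flag tells the system to create a home directory on first login. Since /home is NFS-mounted from arbiter, the directory lands on the NFS share and is immediately visible from all nodes. The directory is created as root (by oddjobd), which relies on the no_root_squash export option set earlier.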
After enrollment, confirm each node can reach the IPA server. The ipa command needs a Kerberos ticket, so get one first:
[wpaik@interceptor-01 ~]$ kinit admin
Password for admin@CLUSTER.LOCAL:
[wpaik@interceptor-01 ~]$ ipa user-find
If this returns a user list rather than a connection or credentials error, the client is enrolled and talking to the server.
Create a Test User
Back on arbiter:
[wpaik@arbiter ~]$ kinit admin
$ ipa user-add testuser \
--first=Test \
--last=User \
--password
$ ipa user-find testuser
--------------
1 user matched
--------------
User login: testuser
First name: Test
Last name: User
Home directory: /home/testuser
Login shell: /bin/bash
UID: 99100XXXX
GID: 99100XXXX
Notice the UID range. Unless you pass --idstart to ipa-server-install, FreeIPA reserves a randomly chosen ID range that sits well above the UIDs used by local accounts, avoiding any collision. The exact numbers differ from one installation to the next, but whatever FreeIPA assigns is identical on every node in the cluster.
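To check that claim on your own nodes, compare an IPA-assigned UID with the range useradd draws from for regular local accounts, which is set by UID_MIN and UID_MAX in /etc/login.defs (1000 and 60000 on a stock Rocky install, but check yours). ipa idrange-find on arbiter shows the range FreeIPA actually reserved. The helper below is a hypothetical sketch of that comparison, and the 60000 fallback is an assumption used only if the file has no UID_MAX line.

```bash
#!/usr/bin/env bash
# Succeed if UID is above UID_MAX from a login.defs-style file, i.e.
# outside the range useradd assigns to regular local accounts.
# Usage: uid_above_local_range UID [LOGIN_DEFS]   (defaults to /etc/login.defs)
uid_above_local_range() {
    local uid="$1" defs="${2:-/etc/login.defs}" uid_max
    uid_max=$(awk '$1 == "UID_MAX" { v = $2 } END { print v }' "$defs" 2>/dev/null)
    uid_max="${uid_max:-60000}"   # assumption: fallback when UID_MAX is not set
    (( uid > uid_max ))
}

# Example on any enrolled node:
#   uid_above_local_range "$(id -u testuser)" && echo "outside the local UID range"
```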
For ongoing user management, the scripts/user_creation.sh script in the GitHub repository handles the full process: FreeIPA account creation, home directory setup with correct NFS ownership, XFS quota, and Slurm accounting entry.
Accessing the FreeIPA Web UI
The FreeIPA web interface is reachable from outside the cluster using sshuttle, a VPN-over-SSH tool that routes traffic through the login node.
On your local machine:
# Install sshuttle
$ sudo dnf install sshuttle # Fedora/RHEL
# or: pip install sshuttle
# Add arbiter to your local /etc/hosts
$ echo "192.168.50.50 arbiter.cluster.local arbiter" | sudo tee -a /etc/hosts
# Open the tunnel (keep this terminal open)
$ sshuttle -r wpaik@carrier.cluster.local 192.168.50.0/24 --dns
Then open a browser and go to https://arbiter.cluster.local/ipa/ui/. Accept the self-signed certificate warning and log in with the admin credentials.
9. Verification
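SSH as the new user from the login node to a compute node. Because the password was set by an administrator, FreeIPA treats it as expired, so the first login also prompts you to choose a new one (not shown below):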
[wpaik@carrier ~]$ ssh testuser@interceptor-01
Password:
Creating home directory for testuser.
[testuser@interceptor-01 ~]$ pwd
/home/testuser
[testuser@interceptor-01 ~]$ id
uid=99100XXXX(testuser) gid=99100XXXX(testuser) groups=99100XXXX(testuser)
Now check the same user from a different node:
[testuser@interceptor-02 ~]$ id
uid=99100XXXX(testuser) gid=99100XXXX(testuser) groups=99100XXXX(testuser)
Same UID on both nodes. Files written on interceptor-01 have correct permissions on interceptor-02. The home directory is the same NFS path regardless of which node you land on.
One account. Every node. One home directory.
Troubleshooting Common Issues
Enrollment fails with DNS error:
The playbook adds arbiter.cluster.local to /etc/hosts before enrollment. If it still fails, verify the entry exists on the failing node:
$ getent hosts arbiter.cluster.local
192.168.50.50 arbiter.cluster.local arbiter
If missing, add it manually:
$ echo "192.168.50.50 arbiter.cluster.local arbiter" | sudo tee -a /etc/hosts
NFS mount fails after FreeIPA enrollment:
FreeIPA updates /etc/nsswitch.conf. Confirm that both sss and files are listed for passwd and group:
$ grep -E "^(passwd|group)" /etc/nsswitch.conf
passwd: sss files systemd
group: sss files systemd
If NFS mounts hang after enrollment:
$ sudo setsebool -P use_nfs_home_dirs 1
Home directory not created on first login:
$ sudo systemctl enable --now oddjobd
Node freezes on boot after NFS setup:
A stale resume=UUID in GRUB can cause boot hangs. From the GRUB menu, press e, remove the resume=UUID=... argument, then Ctrl+X to boot. Once up:
$ sudo grubby --update-kernel=ALL --remove-args="resume=UUID=<UUID>"
10. What is Next
The cluster now has shared storage and centralized authentication. Every node shares the same home directory and every user has a consistent identity across all nodes.
Next episode we install Slurm, the job scheduler. With NFS and FreeIPA already in place, Slurm has everything it needs to schedule jobs across nodes and write output files back to a shared location.
All configuration files and Ansible playbooks from this episode are in the GitHub repository.
Happy Computing!