Impermanence Rollout Strategy
Overview
This document covers rolling out impermanence (ephemeral root filesystem) to all hosts, using Juni as the template.
What is Impermanence?
Philosophy: The root filesystem (/) is wiped on every boot (tmpfs or a reset subvolume), so you must explicitly declare which state to persist.
Benefits:
- Clean system by default - no accumulated cruft
- Forces documentation of important state
- Easy rollback of undeclared changes (just reboot)
- Security (ephemeral root limits persistence of compromises)
- Reproducible server state
Current State
| Host | Impermanence | Notes |
|---|---|---|
| Juni | ✅ Implemented | bcachefs with @root/@persist subvolumes |
| H001 | ❌ Traditional | Most complex - many services |
| H002 | ❌ Traditional | NAS - may not need impermanence |
| H003 | ❌ Traditional | Router - good candidate |
| O001 | ❌ Traditional | Gateway - good candidate |
| L001 | ❌ Traditional | Headscale - good candidate |
Juni's Implementation (Reference)
Filesystem Layout
bcachefs (5 devices, 2x replication)
├── @root # Ephemeral - reset each boot
├── @nix # Persistent - Nix store
├── @persist # Persistent - bind mounts for state
└── @snapshots # Automatic snapshots
Boot Process
- Snapshot @root before resetting it
- Reset the @root subvolume (delete and recreate it)
- Boot into the clean system
- Bind-mount persisted paths from @persist
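The snapshot-and-reset steps above could be scripted roughly as follows. This is a minimal sketch and not Juni's bcache-impermanence tool. It makes these assumptions:

- The top level of the bcachefs filesystem (holding @root and @snapshots) is mounted at the first argument.
- bcachefs-tools provides `bcachefs subvolume snapshot`, `delete` and `create`.
- It runs early in boot, before @root is mounted as /.

`reset_root` and `run` are hypothetical names. Setting DRY_RUN=1 prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the boot-time reset for a Juni-style bcachefs layout.
# Assumes $1 is where the filesystem's top level (with @root, @snapshots) is mounted.

# Run a command, or only print it when DRY_RUN=1
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

reset_root() {
  local mnt=$1 stamp=$2
  # 1. Keep the previous boot's root as a snapshot so it can be diffed later
  run bcachefs subvolume snapshot "$mnt/@root" "$mnt/@snapshots/root-$stamp"
  # 2. Discard the old root and start from an empty subvolume
  run bcachefs subvolume delete "$mnt/@root"
  run bcachefs subvolume create "$mnt/@root"
}
```

For example, `DRY_RUN=1 reset_root /mnt "$(date +%Y%m%d-%H%M%S)"` previews the three commands without touching the disk.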
Persisted Paths (Juni)
environment.persistence."/persist" = {
  hideMounts = true;
  directories = [
    "/var/log"
    "/var/lib/nixos"
    "/var/lib/systemd"
    "/var/lib/tailscale"
    "/var/lib/flatpak"
    "/etc/NetworkManager/system-connections"
  ];
  files = [
    "/etc/machine-id"
    "/etc/ssh/ssh_host_ed25519_key"
    "/etc/ssh/ssh_host_ed25519_key.pub"
    "/etc/ssh/ssh_host_rsa_key"
    "/etc/ssh/ssh_host_rsa_key.pub"
  ];
  users.josh = {
    directories = [
      ".ssh"
      ".gnupg"
      "projects"
      ".config"
      ".local/share"
    ];
  };
};
Custom Tooling
Juni has bcache-impermanence with commands:
- ls - List snapshots
- gc - Garbage-collect old snapshots
- diff - Show changes since last boot (auto-excludes persisted paths)
Retention policy: 5 recent + 1/week for 4 weeks + 1/month
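One way to read this retention policy is sketched below as a hypothetical `snapshots_to_keep` function. It makes several assumptions that are not in the original:

- Snapshot names are plain dates (YYYY-MM-DD), fed one per line on stdin.
- "4 weeks" means snapshots younger than 28 days.
- Monthly snapshots are kept indefinitely.
- GNU `date` is available.

It prints the snapshots to keep, newest first. Anything else would be a `gc` candidate.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of "5 recent + 1/week for 4 weeks + 1/month".
# Input: snapshot names as YYYY-MM-DD dates, one per line on stdin.
# Argument: the reference date ("today") as YYYY-MM-DD. Requires GNU date.
snapshots_to_keep() {
  local now_s snap snap_s age_days week month keep count=0
  local -A seen_week=() seen_month=()
  now_s=$(date -u -d "$1" +%s)

  # Newest first, so the first snapshot seen in a week/month is its newest
  sort -r | while read -r snap; do
    keep=false
    snap_s=$(date -u -d "$snap" +%s)
    age_days=$(( (now_s - snap_s) / 86400 ))
    week=$(date -u -d "$snap" +%G-W%V)   # ISO year and week number
    month=$(date -u -d "$snap" +%Y-%m)

    # Rule 1: the 5 most recent snapshots
    if (( count < 5 )); then keep=true; fi
    # Rule 2: newest snapshot of each ISO week, within the last 28 days
    if (( age_days < 28 )) && [[ -z ${seen_week[$week]:-} ]]; then keep=true; fi
    # Rule 3: newest snapshot of each calendar month
    if [[ -z ${seen_month[$month]:-} ]]; then keep=true; fi

    seen_week[$week]=1
    seen_month[$month]=1
    count=$((count + 1))
    if $keep; then echo "$snap"; fi
  done
}
```

Because the input is sorted newest first, the first snapshot seen in a week or month is always that period's newest, so no second pass is needed.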
Common Pain Point: Finding What Needs Persistence
"I often have issues adding new persistent layers and knowing what I need to add"
Discovery Workflow
Method 1: Use the Diff Tool
Before rebooting after installing new software:
# On Juni
bcache-impermanence diff
This shows files created/modified outside persisted paths.
Method 2: Boot and Observe Failures
# After reboot, check for failures
journalctl -b | grep -i "no such file"
journalctl -b | grep -i "failed to"
journalctl -b | grep -i "permission denied"
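The three greps above can be folded into one pass that also counts hits per unit, which points at the service whose state went missing. Below is a sketch with a hypothetical `persistence_hints` function. It assumes journalctl's default short output format (`Mon DD HH:MM:SS host unit[pid]: message`), in which the fifth field is the syslog identifier.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count likely persistence-related errors per unit.
# Reads journalctl's default "short" output on stdin.
persistence_hints() {
  grep -iE 'no such file|failed to|permission denied' |
    awk '{
      id = $5                      # e.g. "tailscaled[812]:"
      sub(/\[[0-9]+\]:$/, "", id)  # drop "[pid]:"
      sub(/:$/, "", id)            # or a bare trailing ":" (e.g. "kernel:")
      print id
    }' |
    sort | uniq -c | sort -rn
}
```

Usage: `journalctl -b --no-pager | persistence_hints`. Units with the most matches are listed first.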
Method 3: Monitor File Changes
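# Before making changes (sort by path so each changed file shows up as an adjacent pair in the diff)
find /var /etc -type f -printf '%T@ %p\n' 2>/dev/null | sort -k2 > /tmp/before.txt
# After running services
find /var /etc -type f -printf '%T@ %p\n' 2>/dev/null | sort -k2 > /tmp/after.txt
# Compare: lines only in after.txt (">") are new or modified files
diff /tmp/before.txt /tmp/after.txt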
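A variant of Method 3 uses a marker file and find's `-newer` test instead of diffing two timestamp lists. Below is a sketch with hypothetical helper names. The marker path is arbitrary, and `-xdev` keeps find on the filesystem of each starting directory.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: mark a point in time, then list files modified after it.
mark_start() {
  touch "$1"                     # the marker's mtime is the reference point
}

changed_since() {
  local marker=$1; shift
  # Remaining arguments are the directories to scan, e.g. /var /etc
  find "$@" -xdev -type f -newer "$marker" 2>/dev/null | sort
}
```

Usage:

1. Run `mark_start /root/impermanence.marker`.
2. Install or run the new service.
3. Run `changed_since /root/impermanence.marker /var /etc`.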
Method 4: Service-Specific Patterns
Most services follow predictable patterns:
| Pattern | Example | Usually Needs Persistence |
|---|---|---|
| /var/lib/${service} | /var/lib/postgresql | Yes |
| /var/cache/${service} | /var/cache/nginx | Usually no |
| /var/log/${service} | /var/log/nginx | Optional |
| /etc/${service} | /etc/nginx | Only if runtime-generated |
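Checking a service against the patterns in the table above can be partly automated. Below is a sketch with a hypothetical `candidate_paths` helper. It only reports which conventional locations exist, next to the table's usual verdict. The optional root-prefix argument is an assumption added so it can also inspect a mounted image or test tree.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list which conventional state locations exist for a
# service, next to the usual verdict from the table. The optional second
# argument is a root prefix (default: the live system).
candidate_paths() {
  local svc=$1 root=${2:-}
  local entry path verdict
  local -a entries=(
    "/var/lib/$svc|yes"
    "/var/cache/$svc|usually no"
    "/var/log/$svc|optional"
    "/etc/$svc|only if runtime-generated"
  )
  for entry in "${entries[@]}"; do
    path=${entry%%|*}
    verdict=${entry#*|}
    if [[ -e "$root$path" ]]; then
      printf '%s\t%s\n' "$path" "$verdict"
    fi
  done
}
```

Usage: `candidate_paths postgresql` on the live host. Services that store state elsewhere still need a manual look.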
Server Impermanence Template
Minimal Server Persistence
environment.persistence."/persist" = {
  hideMounts = true;
  directories = [
    # Core system
    "/var/lib/nixos" # NixOS state DB
    "/var/lib/systemd/coredump"
    "/var/log"
    # Network
    "/var/lib/tailscale"
    "/etc/NetworkManager/system-connections" # only on hosts that enable NetworkManager
    # ACME certificates
    "/var/lib/acme"
  ];
  files = [
    "/etc/machine-id"
    "/etc/ssh/ssh_host_ed25519_key"
    "/etc/ssh/ssh_host_ed25519_key.pub"
    "/etc/ssh/ssh_host_rsa_key"
    "/etc/ssh/ssh_host_rsa_key.pub"
  ];
};
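Impermanence only bind-mounts (or links) what is already under /persist. It does not copy a host's existing data there. On a host being migrated, the current state therefore has to be seeded into /persist before the first ephemeral boot. Otherwise services start with empty state: new SSH host keys, a new machine-id, and so on.

Below is a minimal sketch with a hypothetical `seed_persist` helper. It assumes GNU cp (for `--parents`) and that the affected services are stopped while it runs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy existing state into the persistent volume,
# keeping the full path (e.g. /var/lib/tailscale -> /persist/var/lib/tailscale).
# Requires GNU cp for --parents; -a preserves owners, modes and timestamps.
seed_persist() {
  local persist=$1; shift
  local path
  for path in "$@"; do
    if [[ -e "$path" ]]; then
      cp -a --parents "$path" "$persist/"
    else
      echo "skipping missing path: $path" >&2
    fi
  done
}
```

Usage: `seed_persist /persist /var/lib/nixos /var/log /var/lib/tailscale /etc/machine-id /etc/ssh/ssh_host_ed25519_key /etc/ssh/ssh_host_ed25519_key.pub`. Extend the list with the host's per-host additions.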
Per-Host Additions
H001 (Services)
environment.persistence."/persist".directories = [
  # Add to minimal template:
  "/var/lib/forgejo"
  "/var/lib/zitadel"
  "/var/lib/openbao"
  "/bao-keys"
  "/var/lib/trilium"
  "/var/lib/opengist"
  "/var/lib/open-webui"
  "/var/lib/n8n"
  "/var/lib/nixarr/state"
  "/var/lib/containers" # Podman/container state
];
O001 (Gateway)
environment.persistence."/persist".directories = [
  # Add to minimal template:
  "/var/lib/vaultwarden"
  "/var/lib/postgresql"
  "/var/lib/fail2ban"
];
L001 (Headscale)
environment.persistence."/persist".directories = [
  # Add to minimal template:
  "/var/lib/headscale"
];
H003 (Router)
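environment.persistence."/persist".directories = [
  # Add to minimal template:
  "/var/lib/AdGuardHome"
  "/var/lib/dnsmasq"
];
environment.persistence."/persist".files = [
  # Add to minimal template:
  # LUKS key - CRITICAL. Persistence bind mounts are not available inside the
  # initrd, so if this key unlocks a device there it must also reach the initrd
  # another way (e.g. boot.initrd.secrets).
  "/boot/keyfile_nvme0n1p1"
];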
Rollout Strategy
Phase 1: Lowest Risk (VPS Hosts)
Start with L001 and O001:
- Easy to rebuild from scratch if something goes wrong
- Smaller state footprint
- Good practice before tackling complex hosts
L001 Steps:
- Back up /var/lib/headscale/
- Add impermanence module
- Test on spare VPS first
- Migrate
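The back-up step could be as simple as the hypothetical `backup_paths` helper below. It writes one timestamped tarball and then reads it back to check that it is intact. Stopping the service first (for example `systemctl stop headscale`) is an assumed precaution, so that its data is not copied mid-write.

```shell
#!/usr/bin/env bash
# Hypothetical helper: archive state paths into one timestamped tarball,
# then list the archive to confirm it is readable. Prints the archive path.
backup_paths() {
  local dest_dir=$1; shift
  local archive
  archive="$dest_dir/state-$(uname -n)-$(date +%Y%m%d-%H%M%S).tar.gz"
  # tar strips the leading "/" from member names (and says so on stderr)
  tar -czf "$archive" "$@" || return 1
  tar -tzf "$archive" > /dev/null || return 1
  echo "$archive"
}
```

Usage: `backup_paths /root/backups /var/lib/headscale`, then copy the printed archive off the host before migrating.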
O001 Steps:
- Back up Vaultwarden and PostgreSQL
- Add impermanence module
- Test carefully (Vaultwarden is critical!)
Phase 2: Router (H003)
H003 is medium complexity:
- Relatively small state
- But critical for network (test during maintenance window)
- LUKS keyfile needs special handling
Phase 3: Complex Host (H001)
H001 is most complex due to:
- Multiple containerized services
- Database state in containers
- Many stateful applications
Approach:
- Inventory all state paths (see backup docs)
- Test with snapshot before committing
- Gradual rollout with extensive persistence list
- May need to persist more than expected initially
Phase 4: NAS (H002) - Maybe Skip
H002 may not benefit from impermanence:
- Primary purpose is persistent data storage
- bcachefs replication already provides redundancy
- Impermanence adds complexity without clear benefit
Filesystem Options
Option A: bcachefs with Subvolumes (Like Juni)
Pros:
- Flexible, modern
- Built-in snapshots
- Replication support
Setup:
fileSystems = {
  "/" = {
    device = "/dev/disk/by-label/nixos";
    fsType = "bcachefs";
    options = [ "subvol=@root" ];
  };
  "/nix" = {
    device = "/dev/disk/by-label/nixos";
    fsType = "bcachefs";
    options = [ "subvol=@nix" ];
  };
  "/persist" = {
    device = "/dev/disk/by-label/nixos";
    fsType = "bcachefs";
    options = [ "subvol=@persist" ];
    neededForBoot = true;
  };
};
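Creating this layout on a freshly formatted filesystem might look like the sketch below. It assumes the new filesystem's top level is mounted at the first argument and that bcachefs-tools provides `bcachefs subvolume create`. `create_layout` and `run` are hypothetical names, and DRY_RUN=1 prints the commands instead of running them. Formatting itself (device list, replication settings) is left out because it depends on each host's disks.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: create the subvolumes used by the Option A layout.
# Assumes the new filesystem's top level is mounted at $1 (e.g. /mnt).

# Run a command, or only print it when DRY_RUN=1
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

create_layout() {
  local mnt=$1 sub
  for sub in @root @nix @persist @snapshots; do
    run bcachefs subvolume create "$mnt/$sub"
  done
}
```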
Option B: BTRFS with Subvolumes
Similar to bcachefs but more mature:
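# Reset @root on boot. postDeviceCommands only runs in the scripted initrd,
# not when boot.initrd.systemd.enable = true.
boot.initrd.postDeviceCommands = lib.mkAfter ''
  mkdir -p /mnt
  mount -o subvol=/ /dev/disk/by-label/nixos /mnt
  # `btrfs subvolume delete` fails on a subvolume that still contains nested
  # subvolumes (e.g. /var/lib/machines created by systemd), so remove those first
  btrfs subvolume list -o /mnt/@root | cut -f9 -d' ' | while read -r sub; do
    btrfs subvolume delete "/mnt/$sub"
  done
  btrfs subvolume delete /mnt/@root
  btrfs subvolume create /mnt/@root
  umount /mnt
'';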
Option C: tmpfs Root
Simplest, but anything written outside persisted paths is held in RAM (up to size=):
fileSystems."/" = {
  device = "none";
  fsType = "tmpfs";
  options = [ "defaults" "size=2G" "mode=755" ];
};
Best for: VPS hosts with limited disk but adequate RAM.
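A tmpfs root keeps every unpersisted write in memory until reboot, so it can be worth watching how full it gets. Below is a small sketch using GNU df's `--output=pcent`. The helper names and the 80% default threshold are hypothetical choices, not part of the original setup.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: report and check how full a mount (e.g. a tmpfs /) is.
# Requires GNU df for --output.
usage_pct() {
  # Second line of output is e.g. " 37%"; keep only the digits
  df --output=pcent "$1" | tail -n 1 | tr -dc '0-9'
}

check_root_usage() {
  local mount=${1:-/} limit=${2:-80} pct
  pct=$(usage_pct "$mount")
  if (( pct >= limit )); then
    echo "WARNING: $mount is ${pct}% full" >&2
    return 1
  fi
}
```

Running `check_root_usage` from a timer or monitoring check gives early notice before the tmpfs fills.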
Troubleshooting
Service Fails After Reboot
# Check what's missing
journalctl -xeu servicename
# Common fixes:
# 1. Add /var/lib/servicename to persistence
# 2. Ensure directory permissions are correct
# 3. Check if service expects specific files in /etc
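For systemd services, the unit's own `StateDirectory=` setting usually names the directory to persist. A hypothetical `state_dirs` helper can turn that value into absolute paths. It relies on `systemctl show -p StateDirectory --value UNIT` printing the directories relative to /var/lib, space-separated. Services with `DynamicUser=yes` keep the real directory under /var/lib/private, with /var/lib/<name> as a symlink, so that is the path to persist for them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a unit's StateDirectory= value (read from stdin,
# space-separated, relative to /var/lib) to absolute paths for persistence.
# Pass "yes" as $1 for DynamicUser=yes units, whose data is in /var/lib/private.
state_dirs() {
  local base=/var/lib entry
  if [[ ${1:-no} == yes ]]; then
    base=/var/lib/private
  fi
  for entry in $(cat); do
    echo "$base/${entry%%:*}"   # keep only the part before any ':'
  done
}
```

Usage: `systemctl show -p StateDirectory --value vaultwarden | state_dirs "$(systemctl show -p DynamicUser --value vaultwarden)"`. An empty result means the unit does not declare a state directory, and the log-based methods above are needed instead.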
"No such file or directory" Errors
# Find what's missing
journalctl -b | grep "No such file"
# Add missing paths to persistence
Slow Boot (Too Many Bind Mounts)
If you have many persisted paths, consider:
- Consolidating related paths
- Using symlinks instead of bind mounts for some paths
- Persisting parent directories instead of many children
Container State Issues
Containers may have their own state directories:
# For NixOS containers
environment.persistence."/persist".directories = [
  "/var/lib/nixos-containers"
];
# For Podman
environment.persistence."/persist".directories = [
  "/var/lib/containers/storage/volumes"
  # Not storage/overlay: image layers can be re-pulled, so persisting them
  # only saves re-downloading images after each boot
];
Tooling Improvements
Automated Discovery Script
Create a helper that runs periodically to detect unpersisted changes:
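#!/usr/bin/env bash
# /usr/local/bin/impermanence-check
# Lists files changed in the last 60 minutes that are outside persisted directories.
# Run from the flake directory; assumes the nixosConfigurations attribute matches
# this machine's hostname. Requires jq.
set -euo pipefail

HOST=$(hostname)

# Get list of persisted paths. impermanence turns each entry into an attrset,
# so map to its `directory` field and emit JSON (nix eval --raw only accepts strings).
mapfile -t PERSISTED < <(
  nix eval --json ".#nixosConfigurations.${HOST}.config.environment.persistence.\"/persist\".directories" \
    --apply 'map (d: d.directory)' | jq -r '.[]'
)

# Find modified files outside persisted paths (-xdev: stay on the root filesystem)
find / -xdev -type f -mmin -60 2>/dev/null | while read -r file; do
  is_persisted=false
  for path in "${PERSISTED[@]}"; do
    # Match the path itself or anything below it, but not /var/logfoo for /var/log
    if [[ "$file" == "$path" || "$file" == "$path"/* ]]; then
      is_persisted=true
      break
    fi
  done
  if ! $is_persisted; then
    echo "UNPERSISTED: $file"
  fi
done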
Pre-Reboot Check
Add to your workflow:
# Before rebooting
bcache-impermanence diff # or custom script
# Review changes, add to persistence if needed, then reboot
Action Items
Immediate
- Document all state paths for each host (see backup docs)
- Create shared impermanence module in flake
Phase 1 (L001/O001)
- Back up current state
- Add impermanence to L001
- Test thoroughly
- Roll out to O001
Phase 2 (H003)
- Plan maintenance window
- Add impermanence to H003
- Verify LUKS key persistence
Phase 3 (H001)
- Complete state inventory
- Test with extensive persistence list
- Gradual rollout