# Migrating Services Between Hosts

## Overview

This document covers procedures for migrating services between NixOS hosts with minimal downtime.

## General Migration Strategy

### Pre-Migration Checklist

- [ ] New host is configured in flake with identical service config
- [ ] New host has required secrets (agenix/sops)
- [ ] Network connectivity verified (Tailscale IP assigned)
- [ ] Disk space sufficient on new host
- [ ] Backup of current state completed

### Migration Types

| Type | Downtime | Complexity | Use When |
|------|----------|------------|----------|
| Cold migration | 5-30 min | Low | Simple services, maintenance windows |
| Warm migration | 2-5 min | Medium | Most services |
| Hot migration | <1 min | High | Databases with replication |

---

## Cold Migration (Simple)

Best for: Stateless or rarely-accessed services.

### Steps

```bash
# 1. Stop service on old host
ssh oldhost 'systemctl stop myservice'

# 2. Copy state to new host
# rsync cannot copy between two remote hosts, so run it on the old host
# (assumes oldhost can reach newhost over SSH)
ssh oldhost 'rsync -avz --progress /var/lib/myservice/ newhost:/var/lib/myservice/'

# 3. Start on new host
ssh newhost 'systemctl start myservice'

# 4. Update reverse proxy (if applicable)
# Edit nginx config: proxyPass = "http://"
# Rebuild: ssh proxy 'nixos-rebuild switch'

# 5. Verify service works

# 6. Clean up old host (after verification period)
ssh oldhost 'rm -rf /var/lib/myservice'
```

**Downtime:** Duration of rsync + service start + proxy update.

---

## Warm Migration (Recommended)

Best for: Most services with moderate state.

### Strategy

1. Sync state while service is running (initial sync)
2. Stop service briefly for final sync
3. Start on new host
4. Update routing

### Steps

```bash
# 1. Initial sync (service still running)
# Run rsync on the old host; it cannot copy between two remote hosts
ssh oldhost 'rsync -avz --progress /var/lib/myservice/ newhost:/var/lib/myservice/'

# 2. Stop service on old host
ssh oldhost 'systemctl stop myservice'

# 3. Final sync (quick - only changes since initial sync)
# --delete removes files on the target that no longer exist on the source
ssh oldhost 'rsync -avz --delete --progress /var/lib/myservice/ newhost:/var/lib/myservice/'

# 4. Start on new host
ssh newhost 'systemctl start myservice'

# 5. Update reverse proxy immediately
ssh proxy 'nixos-rebuild switch'

# 6. Verify
curl https://myservice.joshuabell.xyz
```

**Downtime:** 2-5 minutes (final rsync + start + proxy switch).

---

## Hot Migration (Database Services)

Best for: PostgreSQL, critical services requiring near-zero downtime.

### PostgreSQL Logical Replication

#### On Source (Old Host)

```nix
services.postgresql = {
  settings = {
    wal_level = "logical";
    max_replication_slots = 4;
    max_wal_senders = 4;
  };

  # Add replication user
  # The role also needs the REPLICATION attribute (or superuser) so the
  # subscription can connect.
  # Note: ensurePermissions was deprecated in NixOS 23.11; on newer releases
  # it may be unavailable, in which case grant SELECT with a manual SQL GRANT.
  ensureUsers = [{
    name = "replicator";
    ensurePermissions."ALL TABLES IN SCHEMA public" = "SELECT";
  }];
};
```

#### Set Up Replication

```sql
-- Logical replication does not copy DDL: the target database must already
-- contain the same table definitions before the subscription is created.

-- On source: Create publication
CREATE PUBLICATION my_pub FOR ALL TABLES;

-- On target: Create subscription
CREATE SUBSCRIPTION my_sub
  CONNECTION 'host=oldhost dbname=mydb user=replicator'
  PUBLICATION my_pub;
```

#### Cutover

```bash
# 1. Verify replication is caught up
# Check lag on target:
psql -d mydb -c 'SELECT * FROM pg_stat_subscription;'

# 2. Stop writes on source (maintenance mode)

# 3. Wait for final sync

# 4. Promote target (drop subscription), run on target
psql -d mydb -c 'DROP SUBSCRIPTION my_sub;'

# 5. Update application connection strings

# 6. Update reverse proxy
```

**Downtime:** <1 minute (just the cutover).

---

## Service-Specific Procedures

### Forgejo (Git Server)

**State locations:**

- `/var/lib/forgejo/data/` - Git repositories, LFS
- `/var/lib/forgejo/postgres/` - PostgreSQL database
- `/var/lib/forgejo/backups/` - Existing backups

**Procedure (Warm Migration):**

```bash
# 1. Put Forgejo in maintenance mode (optional)
ssh h001 'touch /var/lib/forgejo/data/maintenance'

# 2. Backup database (dump runs inside the container, file is written on the host)
ssh h001 'nixos-container run forgejo -- pg_dumpall -U forgejo > /var/lib/forgejo/backups/pre-migration.sql'

# 3. Initial sync (run on h001; rsync cannot copy between two remote hosts)
ssh h001 'rsync -avz --progress /var/lib/forgejo/ newhost:/var/lib/forgejo/'

# 4. Stop container
ssh h001 'systemctl stop container@forgejo'

# 5. Final sync
ssh h001 'rsync -avz --delete --progress /var/lib/forgejo/ newhost:/var/lib/forgejo/'

# 6. Start on new host
ssh newhost 'systemctl start container@forgejo'

# 7. Update O001 nginx
# Change: proxyPass = "http://100.64.0.13" → "http://"
ssh o001 'nixos-rebuild switch'

# 8. Verify
git clone https://git.joshuabell.xyz/test/repo.git

# 9. Remove maintenance mode
ssh newhost 'rm /var/lib/forgejo/data/maintenance'
```

**Downtime:** ~5 minutes.

### Zitadel (SSO)

**State locations:**

- `/var/lib/zitadel/postgres/` - PostgreSQL database
- `/var/lib/zitadel/backups/` - Backups

**Critical notes:**

- SSO is used by other services - coordinate downtime
- Test authentication after migration
- May need to clear client caches

**Procedure:** Same as Forgejo, using the Zitadel state locations listed above.

### Vaultwarden (Password Manager)

**State locations:**

- `/var/lib/vaultwarden/` - SQLite database, attachments

**Critical notes:**

- MOST CRITICAL SERVICE - users depend on this constantly
- Prefer hot migration or schedule during low-usage time
- Verify emergency access works after migration

**Procedure:**

```bash
# 1. Enable read-only mode (if supported)

# 2. Sync while running (run on o001; rsync cannot copy between two remote hosts)
ssh o001 'rsync -avz --progress /var/lib/vaultwarden/ newhost:/var/lib/vaultwarden/'

# 3. Quick cutover
# --delete drops stale SQLite -wal/-shm files copied during the live sync
ssh o001 'systemctl stop vaultwarden'
ssh o001 'rsync -avz --delete --progress /var/lib/vaultwarden/ newhost:/var/lib/vaultwarden/'
ssh newhost 'systemctl start vaultwarden'

# 4. Update DNS/proxy immediately

# 5. Verify with mobile app and browser extension
```

**Downtime:** 2-3 minutes (coordinate with users).

### Headscale

**State locations:**

- `/var/lib/headscale/` - SQLite database with node registrations

**Critical notes:**

- ALL mesh connectivity depends on this
- Existing connections continue during migration
- New connections will fail during downtime

**Procedure:**

```bash
# 1. Backup current state (on l001)
ssh l001 'restic -r /backup/l001 backup /var/lib/headscale --tag pre-migration'

# 2. Sync to new VPS (run on l001; rsync cannot copy between two remote hosts)
ssh l001 'rsync -avz --progress /var/lib/headscale/ newvps:/var/lib/headscale/'

# 3. Stop on old host
ssh l001 'systemctl stop headscale'

# 4. Final sync
ssh l001 'rsync -avz --delete --progress /var/lib/headscale/ newvps:/var/lib/headscale/'

# 5. Start on new host
ssh newvps 'systemctl start headscale'

# 6. Update DNS
# headscale.joshuabell.xyz → new IP

# 7. Verify
ssh newvps 'headscale nodes list'
tailscale status

# 8. Test new device joining
```

**Downtime:** 5-10 minutes (including DNS propagation time).

### AdGuard Home

**State locations:**

- `/var/lib/AdGuardHome/` - Config, query logs, filters

**Critical notes:**

- LAN DNS will fail during migration
- Configure backup DNS on clients first

**Procedure:**

```bash
# 1. Add temporary DNS to DHCP (e.g., 1.1.1.1)
# Or have clients use secondary DNS server

# 2. Quick migration (rsync runs on h003; it cannot copy between two remote hosts)
ssh h003 'systemctl stop adguardhome'
ssh h003 'rsync -avz --progress /var/lib/AdGuardHome/ newhost:/var/lib/AdGuardHome/'
ssh newhost 'systemctl start adguardhome'

# 3. Update DHCP to point to new host

# 4. Verify DNS resolution
dig @new-host-ip google.com
```

**Downtime:** 2-3 minutes (clients use backup DNS).

---

## Reverse Proxy Updates

When migrating services proxied through O001:

### Current Proxy Mappings (O001 nginx.nix)

| Domain | Backend |
|--------|---------|
| chat.joshuabell.xyz | 100.64.0.13 (H001) |
| git.joshuabell.xyz | 100.64.0.13 (H001) |
| notes.joshuabell.xyz | 100.64.0.13 (H001) |
| sec.joshuabell.xyz | 100.64.0.13 (H001) |
| sso.joshuabell.xyz | 100.64.0.13 (H001) |
| llm.joshuabell.xyz | 100.64.0.13:8095 (H001) |

### Updating Proxy

1. Edit `hosts/oracle/o001/nginx.nix`
2. Change `proxyPass` to new Tailscale IP
3. Commit and push
4. `ssh o001 'cd /etc/nixos && git pull && nixos-rebuild switch'`

Or, for a faster update without committing:

```bash
# Quick test (non-persistent)
# Note: NixOS normally generates nginx.conf into the read-only Nix store,
# so this in-place edit may not apply on every setup; treat it as a sketch.
# Dots are escaped so they match literally.
ssh o001 'sed -i "s/100\.64\.0\.13/100.64.0.XX/g" /etc/nginx/nginx.conf && nginx -s reload'

# Then update flake and rebuild properly
```

---

## Rollback Procedures

If migration fails:

### Quick Rollback

```bash
# 1. Stop on new host
ssh newhost 'systemctl stop myservice'

# 2. Start on old host (state should still be there)
ssh oldhost 'systemctl start myservice'

# 3. Revert proxy changes
ssh proxy 'nixos-rebuild switch --rollback'
```

### If Old State Was Deleted

```bash
# On the old host: restore from backup
restic -r /backup/oldhost restore latest --target / --include /var/lib/myservice

# Start service
systemctl start myservice

# Revert proxy
```

---

## Post-Migration Checklist

- [ ] Service responds correctly
- [ ] Authentication works (if applicable)
- [ ] Data integrity verified
- [ ] Monitoring updated to new host
- [ ] DNS/proxy pointing to new location
- [ ] Old host state cleaned up (after grace period)
- [ ] Backup job updated for new location
- [ ] Documentation updated

---

## Common Issues

### "Permission denied" on New Host

```bash
# Ensure correct ownership
chown -R serviceuser:servicegroup /var/lib/myservice

# Check SELinux/AppArmor if applicable
```

### Service Can't Connect to Database

```bash
# Verify PostgreSQL is running
systemctl status postgresql

# Check connection settings
grep -i database /var/lib/myservice/config.yaml
```

### SSL Certificate Issues

```bash
# Certificates are tied to domain, not host
# Should work automatically if domain unchanged

# If issues, force ACME renewal
systemctl restart acme-myservice.joshuabell.xyz.service
```

### Tailscale IP Changed

```bash
# Get new Tailscale IP
tailscale ip -4

# Update all references to old IP
# -F matches the address literally (dots are not regex wildcards)
grep -rF "100.64.0.XX" /etc/nixos/
```
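
The search for stale references under "Tailscale IP Changed" can be wrapped in a small helper. This is a minimal sketch, not part of the existing tooling: the function name `find_ip_refs` and its arguments are hypothetical, and it assumes GNU grep. It matches the address as a literal whole word, so searching for `100.64.0.1` does not also report `100.64.0.13`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list every file and line that still mentions an old
# Tailscale IP under a directory.
# Usage: find_ip_refs OLD_IP [DIR]   (DIR defaults to /etc/nixos)
find_ip_refs() {
  local old_ip="$1"
  local dir="${2:-/etc/nixos}"

  if [ -z "$old_ip" ]; then
    echo "usage: find_ip_refs OLD_IP [DIR]" >&2
    return 2
  fi

  # -r  recurse into DIR
  # -n  print line numbers
  # -F  treat the IP as a fixed string (dots are not regex wildcards)
  # -w  whole-word match, so 100.64.0.1 does not match inside 100.64.0.13
  grep -rnFw -- "$old_ip" "$dir"
}
```

For example, `find_ip_refs 100.64.0.13 /etc/nixos` prints matches as `file:line:content`. The exit status follows grep: 0 when references remain, 1 when none are found, which makes it usable as a post-migration check.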