
WireGuard mesh

Step-by-step guide to the self-hosted WireGuard tunnel between the control plane and every worker — keys, addresses, ports, troubleshooting.

This page explains the WireGuard mesh that carries every byte of control traffic between the WiseHosting control plane and its workers. If you are setting up a new worker, fixing a broken tunnel, or just trying to understand why ports look the way they do, start here.

Who needs to read this?

  • Operators standing up a new worker — follow Setting it up end to end.
  • On-callers debugging a worker that fell off the network — jump to Troubleshooting.
  • Curious newcomers — start with Why WireGuard? for the high-level picture.

Why WireGuard?

We previously used Tailscale to reach workers privately. WireGuard replaces it because:

  1. No third-party identity provider. A Tailscale outage or account problem could lock out the entire fleet. WireGuard keys live in /etc/wireguard/ on each host — fully self-contained.
  2. Stable, owned IPs. Every worker gets a deterministic 10.50.0.x address that we control, with no risk of churn from a coordination server.
  3. Minimal moving parts. A [Peer] block per worker, a wg syncconf to reload, and systemctl enable wg-quick@wg0 to make it persist. No daemons beyond the kernel module.
  4. Encrypted end-to-end. Curve25519 key exchange + ChaCha20-Poly1305 in the kernel. Once the tunnel is up, plain HTTP across it is fine — the kernel encrypts every packet before it leaves the NIC.

The tradeoff: you have to manually copy a public key between the two hosts the first time you bring a worker up. That's it. Once the peer block is on the CP, the tunnel comes back up automatically across reboots and network blips.

Topology at a glance

  • Subnet: 10.50.0.0/24. Private RFC-1918 space, no overlap with wisehosting-build (10.89.0.0/16) or Podman default (10.88.0.0/16).
  • Control plane address: 10.50.0.1. Always the first address; workers and the proxy always dial this.
  • Worker N address: 10.50.0.{N+1}. --worker-id 1 → 10.50.0.2, --worker-id 2 → 10.50.0.3, …
  • Proxy server address: 10.50.0.30. Fixed address for the dedicated proxy VPS; the proxy Traefik reaches worker containers via these WG IPs.
  • UDP port: 51821. We deliberately skip 51820 so a wg-easy instance can co-exist on a host.
  • Listen address: 0.0.0.0. The CP must be reachable on its public IPv4 to receive worker and proxy handshakes.
  • Worker → CP control URL: http://10.50.0.1:8081. Plain HTTP; encryption is handled by the kernel WireGuard layer, not TLS.
  • Proxy → CP config URL: http://10.50.0.1:8081. Same tunnel; the proxy polls /v1/traefik/proxy-config every 5 s.
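The worker-address rule in the table is just arithmetic; a one-line sketch (illustrative, not taken from the setup script):

```shell
# --worker-id N maps to 10.50.0.{N+1}; the CP keeps 10.50.0.1 for itself.
worker_id=2
worker_addr="10.50.0.$((worker_id + 1))"
echo "$worker_addr"   # → 10.50.0.3
```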

Public exposure stays small

The only ports the CP exposes publicly are the dashboard (:8081 via Cloudflare) and :51821/udp for WG handshakes. The proxy server exposes :80 and :443 for end-user app traffic. Workers expose no public ports — all container traffic reaches them via the proxy over WireGuard. Everything else is firewalled or simply not bound.

Key concepts (newcomers, start here)

WireGuard is intentionally tiny — there are really only four ideas to keep in mind.

  • Peers, not clients/servers. Both ends are equal. Each side has a private key and knows the other side's public key.
  • AllowedIPs — the list of remote addresses you route into and accept out of this tunnel. On the worker we set AllowedIPs = 10.50.0.1/32: only the CP. On the CP we set AllowedIPs = 10.50.0.<worker>/32 per worker. Each side restricts the other, so nothing else can ride the tunnel.
  • Endpoint — the public IP:port you handshake with. Workers know the CP's public IPv4. The CP doesn't need to know the worker's public IP; the kernel learns it from the first handshake packet.
  • PersistentKeepalive — every 25 seconds, the worker sends a keepalive so home-NAT / cloud-NAT mappings stay open. Without it, a worker behind NAT becomes unreachable from the CP after a few minutes of silence.

What the script does

scripts/wireguard-setup.sh is the only setup file you need. It has two modes:

# On the control plane:
./scripts/wireguard-setup.sh control

# On each worker:
./scripts/wireguard-setup.sh worker \
    --cp-pubkey   '<paste from CP /etc/wireguard/cp_public.key>' \
    --cp-public-ip '<CP public IPv4>' \
    --worker-id    2

Both modes:

  • Install wireguard-tools if missing (with a dpkg -i fallback if apt is in a half-broken state — common after a partial Podman upgrade).
  • Create /etc/wireguard/ with mode 0700.
  • Generate a Curve25519 keypair into cp_private.key/cp_public.key (CP) or worker_private.key/worker_public.key (worker), if one doesn't already exist.
  • Write /etc/wireguard/wg0.conf with mode 0600.
  • Open UDP 51821 in iptables (idempotent — uses -C to check first).
  • systemctl enable --now wg-quick@wg0.

Re-running the script is safe: existing keys are preserved, and the iptables rule check prevents duplicates.
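The firewall step's idempotency pattern can be sketched like this (ensure_wg_port is an illustrative name, not a function from wireguard-setup.sh):

```shell
# `iptables -C` exits non-zero when the rule is absent, so `-A` appends it
# exactly once and re-runs are no-ops. Run as root on the real host.
ensure_wg_port() {
    iptables -C INPUT -p udp --dport 51821 -j ACCEPT 2>/dev/null ||
        iptables -A INPUT -p udp --dport 51821 -j ACCEPT
}
```

Calling it twice leaves a single ACCEPT rule for UDP 51821.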

Control-plane mode

Generates wg0.conf like:

[Interface]
Address = 10.50.0.1/24
ListenPort = 51821
PrivateKey = <CP private key>

# Add a [Peer] block per worker (re-run this script with role=worker
# on each worker, then paste the worker's public key + endpoint here).

Note: the [Peer] blocks are empty initially — you append them as workers come online (see next).

Worker mode

The script needs three flags:

  • --cp-pubkey: paste the contents of /etc/wireguard/cp_public.key from the CP.
  • --cp-public-ip: the CP's public IPv4 (the script prints it via curl ipify.org after control mode finishes).
  • --worker-id: a small integer unique to this worker. The address becomes 10.50.0.{ID + 1}.

It produces:

[Interface]
Address = 10.50.0.{ID+1}/24
ListenPort = 51821
PrivateKey = <worker private key>

[Peer]
# control plane
PublicKey = <CP public key>
AllowedIPs = 10.50.0.1/32
Endpoint = <CP public IPv4>:51821
PersistentKeepalive = 25

At this point the worker's tunnel is half-finished: wg0 is up and configured to talk to the CP, but the CP has no [Peer] block for this worker yet, so the worker can't reach 10.50.0.1 until you add one.

Setting it up

The whole flow takes maybe two minutes per worker:

  1. CP side, once: sudo ./scripts/wireguard-setup.sh control. Save the output — it prints both the CP public key and the public IPv4. Workers will need both.
  2. Worker side, once per worker:
    sudo ./scripts/wireguard-setup.sh worker \
        --cp-pubkey   "<paste CP public key>" \
        --cp-public-ip "<paste CP public IPv4>" \
        --worker-id    2
    It prints the worker's own public key. Save that.
  3. CP side again — paste the worker's public key as a peer:
    sudo tee -a /etc/wireguard/wg0.conf <<EOF
    
    [Peer]
    # worker-2
    PublicKey = <worker public key>
    AllowedIPs = 10.50.0.3/32
    EOF
    Then reload without dropping existing peers:
    sudo wg syncconf wg0 <(wg-quick strip wg0)
    wg syncconf is the magic incantation: it diffs the running config against the file and applies only the delta. Other workers stay connected.
  4. Worker side, point the agent at the tunnel. Edit /etc/wisehosting/config.yaml:
    api_server:
      url: "http://10.50.0.1:8081"
    Then sudo systemctl restart wisehosting-worker.
  5. Verify:
    # On the worker:
    ping -c 1 10.50.0.1
    curl -fsS http://10.50.0.1:8081/healthz
    
    # On the CP:
    sudo wg show           # latest handshake should be < 30 s old
    journalctl -u wisehosting -f | grep "hub: worker"
    # → "hub: worker <id> (<name>) connected via WSS"
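Step 3 is easy to script; a hedged sketch (peer_block is a hypothetical helper, not part of wireguard-setup.sh) that generates the [Peer] stanza from a worker id:

```shell
# Generate the [Peer] stanza for a worker; append the output to the CP's
# /etc/wireguard/wg0.conf, then reload with:
#   wg syncconf wg0 <(wg-quick strip wg0)
peer_block() {
    local id="$1" pubkey="$2"
    printf '\n[Peer]\n# worker-%s\nPublicKey = %s\nAllowedIPs = 10.50.0.%s/32\n' \
        "$id" "$pubkey" "$((id + 1))"
}
peer_block 2 '<worker public key>'
```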

Why port 8081, not 8080?

The control plane HTTP server listens on :8081. The worker's Traefik listens on :8080 for end-user app traffic. They're different services on different hosts; the worker config's api_server.url points at the control plane, hence :8081.

Why plain HTTP across the tunnel?

The validator in internal/worker/agent.go deliberately permits http:// for loopback and RFC-1918 addresses, including 10.50.0.0/24:

// Allows http:// only when the host is loopback or an RFC-1918 address
// (the WireGuard tunnel encrypts those end-to-end). Public hosts must use https.

The reasoning: WireGuard already encrypts and authenticates every packet with ChaCha20-Poly1305, so adding TLS on top would re-encrypt the same bytes for no extra security and add a CA-management burden (self-signed cert pinning on every worker, expiry rotations, …). Public addresses still require https://, so a misconfigured worker that points at https://hosting.example.com and falls back to public DNS doesn't accidentally downgrade.
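The real validator is Go code in internal/worker/agent.go; this shell approximation only illustrates the address classes involved (loopback, 10/8, 172.16/12, 192.168/16):

```shell
# Return success when plain http:// is acceptable for the given host:
# loopback or RFC-1918. Anything else must use https://.
allow_plain_http() {
    case "$1" in
        127.*|10.*|192.168.*) return 0 ;;
        172.1[6-9].*|172.2[0-9].*|172.3[0-1].*) return 0 ;;
        *) return 1 ;;
    esac
}
allow_plain_http 10.50.0.1   && echo "10.50.0.1: http ok"
allow_plain_http 203.0.113.7 || echo "203.0.113.7: https required"
```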

Reloading after config changes

  • Add or remove peers: edit /etc/wireguard/wg0.conf, then sudo wg syncconf wg0 <(wg-quick strip wg0)
  • Full restart (briefly drops all peers): sudo systemctl restart wg-quick@wg0
  • Bring the tunnel down: sudo wg-quick down wg0
  • Bring the tunnel up: sudo wg-quick up wg0
  • Persist across reboots: sudo systemctl enable wg-quick@wg0

wg syncconf is preferred for live changes because it doesn't tear down the kernel interface — every other worker on the mesh keeps its handshake.

Troubleshooting

Tunnel is up locally but ping 10.50.0.1 fails

Run sudo wg show on the worker:

  • No handshake at all → the CP never received the first packet. Check that:
    • The CP's iptables -L INPUT actually has the udp dpt:51821 ACCEPT rule.
    • The hosting provider's firewall (DigitalOcean / Hetzner / GCP / …) allows inbound UDP 51821.
    • The --cp-public-ip you passed to the worker is correct. A wrong endpoint just times out silently.
  • Handshake older than 3 minutes → keepalives stopped. Confirm PersistentKeepalive = 25 is in the worker's [Peer] block. Some providers drop UDP NAT mappings aggressively.
  • Latest handshake < 30 s ago, ping still fails → the CP hasn't been told about this worker. Did you append the worker's [Peer] block on the CP and run wg syncconf?
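To check handshake ages across many peers at once, a helper like this can parse wg show wg0 latest-handshakes, which prints one "<pubkey> <epoch-seconds>" line per peer (stale_peers is a hypothetical helper, not shipped with the script):

```shell
# Flag peers whose last handshake is missing (epoch 0) or older than 180 s.
stale_peers() {
    local now="$1" pubkey ts
    while read -r pubkey ts; do
        if [ "$ts" -eq 0 ]; then
            echo "$pubkey: no handshake yet"
        elif [ $((now - ts)) -gt 180 ]; then
            echo "$pubkey: stale ($((now - ts)) s ago)"
        fi
    done
}
# On the CP:  wg show wg0 latest-handshakes | stale_peers "$(date +%s)"
```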

Permission denied reading /etc/wireguard/

The directory is 0700 and files are 0600 — that's intentional, only root reads them. Use sudo.

apt fails halfway through install_wg_tools

The script falls back to:

apt download wireguard-tools
dpkg -i wireguard-tools_*.deb

This handles the case where unrelated podman / buildah packages have left apt in a broken state (a common occurrence when the alvistack OBS repo goes missing). If even that fails, install wireguard-tools manually from your distro mirror.

Worker config still points at the old Tailscale address

If you're migrating from the previous Tailscale design, update /etc/wisehosting/config.yaml:

api_server:
  url: "http://10.50.0.1:8081"   # was 100.x.y.z under Tailscale

Then systemctl restart wisehosting-worker. The agent re-registers, fetches a new JWT, and reconnects WSS.

systemctl status wg-quick@wg0 says dead

sudo journalctl -u wg-quick@wg0 -n 50

Common causes:

  • Syntax error in wg0.conf (a stray space, a missing =).
  • Two [Interface] blocks (don't run the script twice with different role flags on the same host).
  • Kernel module missing (modprobe wireguard should succeed; on very old kernels you need wireguard-dkms).

Key rotation

Compromised key, periodic rotation, or rebuilding a host?

  1. On the affected host, rename the old keypair:
    # On the CP:
    sudo mv /etc/wireguard/cp_private.key{,.bak}
    sudo mv /etc/wireguard/cp_public.key{,.bak}
    # On a worker, rename worker_private.key and worker_public.key the same way.
  2. Re-run the matching role of the script. It will generate a fresh keypair and rewrite wg0.conf.
  3. Update the other side's [Peer] block with the new public key.
  4. sudo wg syncconf wg0 <(wg-quick strip wg0) on the other side.
  5. Delete the .bak files.

The control-plane API key (the one in /etc/wisehosting/config.yaml under worker.api_key) is separate from the WireGuard key and rotates independently.

Where this fits in the rest of the system

  • The control plane binds :8081 for everything: dashboard, OAuth, REST API, worker control endpoints (/v1/workers/*), and the two Traefik HTTP-provider endpoints (/v1/traefik/config legacy, /v1/traefik/proxy-config for the proxy server). Cloudflare WAF rules drop public hits to the worker and Traefik paths at the edge, but defence-in-depth means workers and the proxy only dial those paths via the tunnel.
  • The worker's WSS connection (/v1/workers/ws) goes to :8081 over the tunnel. See Worker & WSS reference for the connection lifecycle.
  • The proxy Traefik polls /v1/traefik/proxy-config every 5 s over the tunnel. See Proxy server setup for full details.
  • The proxy server also uses WireGuard to reach workers — container ports are bound on the worker's WG IP (10.50.0.x) and the proxy forwards HTTP directly to them. No public port on the worker is required.

Reference: full files

CP /etc/wireguard/wg0.conf (after two workers and the proxy server are joined):

[Interface]
Address = 10.50.0.1/24
ListenPort = 51821
PrivateKey = <CP private key>

[Peer]
# worker-1
PublicKey = <worker-1 public key>
AllowedIPs = 10.50.0.2/32

[Peer]
# worker-2
PublicKey = <worker-2 public key>
AllowedIPs = 10.50.0.3/32

[Peer]
# proxy server (192.99.14.173)
PublicKey = <proxy public key>
AllowedIPs = 10.50.0.30/32

Worker-1 /etc/wireguard/wg0.conf:

[Interface]
Address = 10.50.0.2/24
ListenPort = 51821
PrivateKey = <worker-1 private key>

[Peer]
# control plane
PublicKey = <CP public key>
AllowedIPs = 10.50.0.1/32
Endpoint = <CP public IPv4>:51821
PersistentKeepalive = 25

Proxy server /etc/wireguard/wg0.conf:

[Interface]
Address = 10.50.0.30/24
ListenPort = 51821
PrivateKey = <proxy private key>

[Peer]
# control plane
PublicKey = <CP public key>
AllowedIPs = 10.50.0.0/24
Endpoint = <CP public IPv4>:51821
PersistentKeepalive = 25

Proxy AllowedIPs is broader

The proxy peer uses AllowedIPs = 10.50.0.0/24 (the whole mesh) rather than just 10.50.0.1/32. This lets Traefik on the proxy route directly to any worker container IP in the mesh without an additional route. Workers keep their narrower AllowedIPs = 10.50.0.1/32 since they only need to reach the control plane.
