November 2020 – Going Clockless

I currently live with roommates, and as great as that is, it has the drawback that anything that happens to the router affects everyone. After the router crashed (more on that later) I decided to offload the VPN server to a separate device. Since I already have a PiHole set up, I decided to add it to that. I’m going to go through all the config needed for that here.

First, my router crashed. It looks like the config I used in my last post may have had some issues; the symptoms point to an as-yet unidentified resource leak somewhere. It started by showing unusually high CPU load on one of the cores, and high RAM usage. Over time, various non-essential services shut off one by one, starting with SSH access, and eventually the web management interface entirely. Interestingly, the core routing/NAT services never stopped functioning.

So, I factory-reset the unit and restored a very early backup from before I started using the /jffs/ partition at all. This got everyone’s internet back up pretty quickly. Now I just had to keep it from happening again. I decided to work with the VPN server on a spare Raspberry Pi I had lying around (as one does) instead of putting it back on the router. This way the VPN server had (nearly) no chance of breaking anyone else’s internet while I worked on it.

I chose to stick with OpenVPN, and decided to use certificate authentication. This is a little more effort to set up, but it means the server only needs the CA certificate to authenticate clients; it doesn’t need a list of usernames/passwords. To generate the certificates and keys I used EasyRSA: you build a Certificate Authority somewhere secure, then generate requests and sign them. I then moved the files to the devices that needed them over the local network.
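As a rough sketch of that request-and-sign flow, assuming EasyRSA 3 (the post doesn't say which version was used), the function below just prints the commands so they can be read over before running anything. The names "server" and "client1" are placeholders, not the ones used here.

```shell
#!/bin/sh
# Print the EasyRSA 3 steps for one CA, one server and one client certificate.
# Assumption: EasyRSA 3 command names; "server"/"client1" are placeholder names.
easyrsa_steps() {
  server=${1:-server}
  client=${2:-client1}
  cat <<EOF
./easyrsa init-pki
./easyrsa build-ca
./easyrsa gen-req $server nopass
./easyrsa sign-req server $server
./easyrsa gen-req $client nopass
./easyrsa sign-req client $client
EOF
}

easyrsa_steps
```

Run from the EasyRSA directory on the machine holding the CA, each printed line is one step; only the signed certificates, the matching keys and the CA certificate need to leave that machine.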

The OpenVPN installation on Debian systems creates an example configuration file for you in /usr/share/doc/openvpn/examples/. I started with that and set up most of the same settings as before. I enabled tls-crypt for additional security during the handshake. This has the side benefit that the control channel is encrypted as well, which makes the connection harder to identify as OpenVPN for anyone watching your client devices’ connections.

Now for the networking fun. First I needed the VPN to work locally, with the client on the local network sending pings through the VPN server and back to some other local machine. So I turned the VPN on, connected successfully (yay) and … nothing. A few minutes of tcpdump debugging later and I discovered that the VPN server (a Pi) was dropping all of my packets after receiving them. I already had IP forwarding turned on, but I had to add a firewall rule for the interface.
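For anyone retracing this, it's worth confirming forwarding really is on before blaming the firewall. A minimal sketch, assuming a Linux box that exposes the usual /proc flag; the optional path argument is only there so the check can be tried against a scratch file.

```shell
#!/bin/sh
# Succeeds if the kernel flag file says IPv4 forwarding is enabled.
# $1 (optional, hypothetical) overrides the path, e.g. for trying it on a test file.
forwarding_enabled() {
  flag_file=${1:-/proc/sys/net/ipv4/ip_forward}
  [ "$(cat "$flag_file" 2>/dev/null)" = "1" ]
}

# Usage:
#   forwarding_enabled && echo "forwarding on" || echo "forwarding off"
```

If that reports "on" and packets still vanish, watching both sides with something like sudo tcpdump -ni tun0 icmp and sudo tcpdump -ni eth0 icmp shows whether they enter the tunnel interface but never leave the LAN one, which points at the firewall.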

# Keep things simple, let's let clients send all traffic they want to
sudo ufw route allow in on tun0
sudo ufw reload

I could have done this directly with iptables, but I thought UFW was a nice way to get the rules to clean up after themselves for me, and it’s easier to go read the rule list later than to dump the entire firewall. A little testing and I have pings entering tun0 and exiting eth0, but still no replies. Huh.

More tcpdump on the target device (my PC) and it turns out that the pings were being received and responded to, but they were getting sent to the router, which dropped them because it doesn’t have a route to the VPN subnet. Well, that’s easy enough, the router has a “Routes” tab under the LAN page where I added the route. Finally, my pings got through, huzzah! This more-or-less indicates the VPN server itself is working correctly now. I still need to:

Enable external access by port-forwarding
Let the VPN clients access the internet through the router
- Route tunnel-traffic through the WAN interface on the router
- Route client connections (after exiting the tunnel) through the upstream VPN (a client running on the router)
- Route any internet traffic from the VPN Server itself to the VPN Client

The first one is easy: just add a port-forward profile under the WAN page of the router’s admin interface. The rest get a little more complicated. At first glance, you might expect the second one to “just work”: the router is already the default gateway for the VPN server, and the router lets the VPN server access the internet. You would be right, as long as the VPN Client on the router stays off or our traffic doesn’t go through it, but that’s boring and, frankly, just too easy for me to let myself get away with.

Once I turned on the VPN Client, an odd thing happened: my packets would get to the router, and I could see the log rules I added getting hit, but they never got replies. Thinking this sounded familiar, I checked for bad routing table information, but nothing came up; we only need the default gateway for this anyway.

It turns out that the default iptables rules generated by the router’s OpenVPN Client don’t NAT everything; they only set MASQUERADE on packets from the “local network” (192.168.1.0/24 for example). So traffic from my new VPN subnet was being sent out un-NATed (or dropped, since 10.8.0.0/24 sits in the private 10.0.0.0/8 range). After poking around a bit I determined I wasn’t going to find a way to enable this in the admin panel. Before diving into /jffs/scripts/ territory, I decided to run the NAT on the test Pi and make sure that fixed the issue.

sudo iptables -t nat -A POSTROUTING -s 10.8.0.0/24 -o eth0 -j MASQUERADE

Great, my pings now show up as coming from the server’s IP, connectivity is fine, and even better: I can ping remote hosts (like 8.8.8.8)! Still, I don’t really like double-NAT if I don’t need it, so let’s revisit the router’s scripting options. I’m still a little gun-shy after my last config attempts seem to have resulted in a resource leak, but I’ll just have to be more careful.
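A quick way to see which source subnets a box actually masquerades is to filter the NAT table listing. This is only a sketch: the helper name is made up, and it assumes the rule names its source with -s the way iptables -S prints it.

```shell
#!/bin/sh
# Reads `iptables -t nat -S POSTROUTING` output on stdin and succeeds if some
# MASQUERADE rule names the given source subnet.
# has_masquerade is a hypothetical helper name, not part of iptables.
has_masquerade() {
  grep -F -- "-s $1 " | grep -q -- "-j MASQUERADE"
}

# Usage (on the Pi or the router):
#   sudo iptables -t nat -S POSTROUTING | has_masquerade 10.8.0.0/24 \
#     && echo "VPN subnet is NATed" || echo "VPN subnet is NOT NATed"
```

On the router, before any fix, this would be expected to succeed for the LAN subnet and fail for 10.8.0.0/24, which matches the symptom described above.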

The iptables command is the same as the one I used on the Pi, and after removing other unneeded config from /jffs/scripts/firewall-start, it seems to be working smoothly. I did learn that the router seems to respond differently if you launch the script manually from an ssh session than if you let the router launch it, so I stopped testing it manually and instead triggered a firewall reload by adding/removing a useless static route to 10.0.0.0/24 in the admin panel. This reloads the firewall and invokes the firewall-start script. After many trials of doing that, it seems to be stable, so I’m assuming my earlier config (lots of unnecessary rule-deletes and such so I could run the script manually) may have been the issue, if there was one in the firewall-start script.
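One alternative to sprinkling rule-deletes through a script so it can be re-run is to check for a rule before appending it. The sketch below is not what the router script here contains; it assumes the router's iptables build supports -C (check), and ensure_rule is a made-up helper name.

```shell
#!/bin/sh
# Append an iptables rule only if an identical one is not already present.
# Usage: ensure_rule <table> <chain> <rule spec...>
# Assumption: the installed iptables understands -C.
ensure_rule() {
  table=$1
  chain=$2
  shift 2
  iptables -t "$table" -C "$chain" "$@" 2>/dev/null ||
    iptables -t "$table" -A "$chain" "$@"
}

# Usage, with the same rule spec as the Pi test:
#   ensure_rule nat POSTROUTING -s 10.8.0.0/24 -o eth0 -j MASQUERADE
```

Running it twice leaves a single copy of the rule, so the script behaves the same whether the router or an ssh session invokes it. The output interface on the router will differ from the Pi's eth0.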

With the masquerade rule in place, everything seems fine, I can connect to the VPN Server and all my traffic connects as expected, either locally, or through the VPN Client to the internet. So, I loaded a new client certificate on my phone and connected to mobile data. Aaand no connection. Figures. I turned off the VPN Client, and suddenly I can connect. Odd. Time for some router packet logging:

iptables -t mangle -I PREROUTING -p udp -m udp --dport 1194 -j LOG --log-prefix "Client->VPN mangle:PREROUTING "
iptables -t mangle -I POSTROUTING -p udp -m udp --sport 1194 -j LOG --log-prefix "VPN->Client mangle:POSTROUTING "

This was enough to discover that my outgoing packets were going through the VPN Client, and therefore the TLS handshake was failing to complete. I need VPN-tunnel traffic (encrypted stuff on port 1194) to go out the WAN instead. This involves matching on port parameters for the VPN-routing rules, which the web interface can’t do. Well, that’s frustrating: I can’t set the rules in the firewall-start script, because the VPN Client might not be ready and I’d be referencing a non-existent interface. Time to revisit openvpn-event.

I’d like to mention here that a lot of effort could have been saved if someone had thought to state in the documentation that the script is given the important info about why it’s being run (what event happened to which openvpn interface). Specifically, that the event is in the $script_type environment variable, and the interface in question is in $dev. The actual arguments to the script don’t really give you enough to tell what needs to be done. The arguments for an “up” event are identical to a “down” event.
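A cheap way to see this for yourself is to log the two variables next to the arguments every time the script fires. A minimal sketch; describe_event is a made-up helper name and the log format is arbitrary.

```shell
#!/bin/sh
# Build a one-line summary of why the event script was called.
# $script_type and $dev come from the environment; "$@" are the script's arguments.
describe_event() {
  printf 'event=%s dev=%s args=%s\n' "${script_type:-unknown}" "${dev:-unknown}" "$*"
}

# Usage, as the first line of the event script:
#   describe_event "$@" | logger -t openvpn-event
```

Comparing the logged lines for an "up" and a "down" shows that only the environment variables tell the two apart.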

With that learning in hand, I now felt like I could make a script that properly cleaned up after itself and didn’t cause a resource leak. But first, let’s take a look at how asuswrt-merlin adds the selective routing policies from the admin panel. Cutting to the results of that investigation: if you run ip rule show on the router, you get something like the following:

$ ip rule show

# VPN Client off/no policy rules:
0:	from all lookup local
32766:	from all lookup main
32767:	from all lookup default

# VPN Client on with some rules
0:	from all lookup local
10101:	from 192.168.1.3 lookup ovpnc1
10101:	from 192.168.1.192/26 lookup ovpnc1
10102:	from 10.8.0.0/16 lookup ovpnc1
32766:	from all lookup main
32767:	from all lookup default

That’s it, the policy rules state to use different routing tables depending on the source (or destination, if specified) address. We need to modify this to also use other match conditions. There are two steps here:

Identify all tunnel-traffic and mark it
Add an ip rule that matches marked traffic and routes it to main before it hits the VPN rules

To do this, I had the script (for “up” events on “tun11”) create a rule chain vpn-server-mark in the mangle table. You can’t use filter because you need to use the PREROUTING chain as a starting point.

iptables -t mangle -N vpn-server-mark
iptables -t mangle -A PREROUTING -j vpn-server-mark
iptables -t mangle -A vpn-server-mark -p tcp --sport 1194 -s 192.168.1.0/24 -j MARK --set-mark 0x8000/0x8000
iptables -t mangle -A vpn-server-mark -p udp --sport 1194 -s 192.168.1.0/24 -j MARK --set-mark 0x8000/0x8000

This sets bit 15 of the mark for any packets that

Come from our local network (you could remove the subnet check entirely) AND
Are either TCP or UDP port 1194 (the default that I’m using for the OpenVPN server)

The second step is to add a pair of rules for the VPN server with ip rule add:

# Send marked packets from the VPN server to the WAN
ip rule add from 192.168.1.3 fwmark 0x8000/0x8000 table main   priority 1000

# Send unmarked packets from the vpn server to the WAN
ip rule add from 192.168.1.3 fwmark 0x0000/0x8000 table ovpnc1 priority 1001

This now checks both the IP and source port of packets to make sure that VPN-tunnel traffic is sent directly to the client, not through another VPN.
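The value/mask notation is easy to misread, so here is the arithmetic behind it in plain shell. This is only an illustration of the bit operations (--set-mark clears the mask bits and ORs in the value; an fwmark selector matches when mark AND mask equals the value); both function names are made up.

```shell
#!/bin/sh
# apply_set_mark OLD VALUE MASK: the packet mark left behind by
# "-j MARK --set-mark VALUE/MASK" (clear the MASK bits, then OR in VALUE).
apply_set_mark() {
  echo $(( ($1 & ~($3)) | ($2) ))
}

# fwmark_matches MARK VALUE MASK: succeeds when "ip rule ... fwmark VALUE/MASK"
# would select a packet carrying MARK.
fwmark_matches() {
  [ $(( ($1) & ($3) )) -eq $(( $2 )) ]
}

# A packet that already carries mark 0x0001 and hits the MARK rule:
marked=$(apply_set_mark 0x0001 0x8000 0x8000)
printf 'mark after MARK rule: 0x%x\n' "$marked"   # prints 0x8001

fwmark_matches "$marked" 0x8000 0x8000 && echo "priority 1000 rule applies (table main)"
fwmark_matches 0x0001 0x0000 0x8000 && echo "unmarked packet falls to priority 1001 (table ovpnc1)"
```

Because only bit 15 is touched and tested, any other mark bits in use on the router are left alone.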

Make sure that you add a case to the script for the “down” event to undo all of this! The whole openvpn-event script:

#!/bin/sh
# The event is in $script_type and the interface in $dev (see above).
if [ "$script_type" = "up" ] && [ "$dev" = "tun11" ]; then
  echo "VPN Raised, setting routing rules"
  # Add ip rules for marked packets
  # Bit 15: Do Not Use VPN upstream
  ip rule add from 192.168.1.3 fwmark 0x8000/0x8000 table main   priority 1000
  ip rule add from 192.168.1.3 fwmark 0x0000/0x8000 table ovpnc1 priority 1001

  # Create rules to mark packets
  iptables -t mangle -N vpn-server-mark
  iptables -t mangle -A PREROUTING -j vpn-server-mark
  iptables -t mangle -A vpn-server-mark -p tcp --sport 1194 -s 192.168.1.0/24 -j MARK --set-mark 0x8000/0x8000
  iptables -t mangle -A vpn-server-mark -p udp --sport 1194 -s 192.168.1.0/24 -j MARK --set-mark 0x8000/0x8000
elif [ "$script_type" = "down" ] && [ "$dev" = "tun11" ]; then
  echo "VPN Lowered, removing routing rules"
  # Teardown: unhook the chain, empty it, delete it, then drop the ip rules.
  # The addresses here must match the ones used in the "up" branch exactly.
  iptables -t mangle -D PREROUTING -j vpn-server-mark
  iptables -t mangle --flush vpn-server-mark
  iptables -t mangle -X vpn-server-mark
  ip rule del from 192.168.1.3 fwmark 0x8000/0x8000 table main   priority 1000
  ip rule del from 192.168.1.3 fwmark 0x0000/0x8000 table ovpnc1 priority 1001
fi
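Since leftovers were the suspected cause of the earlier trouble, it can be reassuring to check that a "down" event really removed everything. A minimal sketch, assuming the two priorities used above; leftover_rules is a made-up helper name.

```shell
#!/bin/sh
# Reads `ip rule show` output on stdin and prints any rule still sitting at
# the two priorities the event script uses (1000 and 1001).
leftover_rules() {
  awk '$1 == "1000:" || $1 == "1001:"'
}

# Usage, after turning the VPN Client off:
#   ip rule show | leftover_rules
#   iptables -t mangle -S | grep vpn-server-mark
```

Both commands should print nothing once the teardown branch has run. If a rule lingers, the usual culprit is a mismatch between the "add" and "del" lines.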

Once I tested everything to my satisfaction, I moved the VPN server to my PiHole, which necessitated the changing of a bunch of IP addresses in the scripts, and the addition of the following on the PiHole:

# Tell dnsmasq to listen on the VPN tunnel, so it can serve DNS to VPN Clients
echo "interface=tun0" >> /etc/dnsmasq.d/02-ovpn.conf

That’s all. It seems to be working fine even after several VPN on/off cycles and moving between a local connection and mobile data for an external connection. If it breaks badly again, I’ll be back with another post!

VPN Chain Configuration with a Separate Server