VPN Chain Configuration with a Separate Server

I currently live with roommates, and as great as that is, it has the drawback that anything that happens to the router affects everyone. After the router crashed (more on that later) I decided to offload the VPN server to a separate device. Since I already have a PiHole set up, I decided to add it to that. I’m going to go through all the config needed for that here.

First, my router crashed. It looks like the config I used in my last post may have had some issues; the symptoms point to an as-yet unidentified resource leak somewhere. It started by showing unusually high CPU load on one of the cores, and high RAM usage. Over time, various non-essential services shut off one by one, starting with SSH access, and eventually the web management interface entirely. Interestingly, the core routing/NAT services never stopped functioning.

So, I factory-reset the unit and restored a very early backup from before I started using the /jffs/ partition at all. This got everyone’s internet back up pretty quickly. Now I just had to keep it from happening again. I decided to work with the VPN server on a spare Raspberry Pi I had lying around (as one does) instead of putting it back on the router. This way the VPN server had (nearly) no chance of breaking anyone else’s internet while I worked on it.

I chose to stick with OpenVPN, and decided to use certificate authentication. This is a little more effort to set up, but it means the server only needs the CA certificate to authenticate clients; it doesn’t need a list of usernames and passwords. To generate the certificates and keys I used EasyRSA: you build a Certificate Authority somewhere secure, then generate requests and sign them. I then moved the files to the devices that needed them over the local network.
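As a rough sketch of that EasyRSA flow, assuming EasyRSA 3 (which provides the init-pki, build-ca, gen-req and sign-req subcommands), the steps can be wrapped in two small helpers. The helper names, the EASYRSA variable and the nopass option (which leaves the private key unencrypted) are illustrative assumptions, not necessarily the exact commands used for this setup.

```shell
# Path to the easyrsa script; override it if yours lives elsewhere (assumption).
EASYRSA=${EASYRSA:-./easyrsa}

# Create a fresh PKI directory and the Certificate Authority.
build_ca() {
    "$EASYRSA" init-pki && "$EASYRSA" build-ca
}

# Generate a key + request and sign it with the CA.
# $1 = certificate type ("server" or "client"), $2 = name for the cert.
issue_cert() {
    "$EASYRSA" gen-req "$2" nopass && "$EASYRSA" sign-req "$1" "$2"
}
```

Run build_ca once on the machine that holds the CA, then something like issue_cert server pihole and issue_cert client phone (both names are placeholders).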

The OpenVPN installation on Debian systems creates an example configuration file for you in /usr/share/doc/openvpn/examples/. I started with that and set up most of the same settings as before. I enabled tls-crypt for additional security during the handshake. This has the side benefit that the TLS handshake itself is encrypted, so the connection is harder to identify as OpenVPN for anyone watching your client devices’ connections.

Now for the networking fun. First I needed the VPN to work locally, with the client on the local network sending pings through the VPN server and back to some other local machine. So I turned the VPN on, connected successfully (yay) and … nothing. A few minutes of tcpdump debugging later and I discovered that the VPN server (a Pi) was dropping all of my packets after receiving them. I already had IP forwarding turned on, but I had to add a firewall rule for the interface.

# Keep things simple, let's let clients send all traffic they want to
sudo ufw route allow in on tun0
sudo ufw reload

I could have done this directly with iptables, but I thought UFW was a nice way to get the rules to clean up after themselves for me, and it’s easier to go read the rule list later than to dump the entire firewall. A little testing and I have pings entering tun0 and exiting eth0, but still no replies. Huh.

More tcpdump on the target device (my PC) and it turns out that the pings were being received and responded to, but they were getting sent to the router, which dropped them because it doesn’t have a route to the VPN subnet. Well, that’s easy enough, the router has a “Routes” tab under the LAN page where I added the route. Finally, my pings got through, huzzah! This more-or-less indicates the VPN server itself is working correctly now. I still need to:

Enable external access by port-forwarding
Let the VPN clients access the internet through the router
- Route tunnel-traffic through the WAN interface on the router
- Route client connections (after exiting the tunnel) through the upstream VPN (a client running on the router)
- Route any internet traffic from the VPN Server itself to the VPN Client

The first one is easy: just add a port-forward profile under the WAN page of the router’s admin interface. The rest get a little more complicated. At first glance, you might expect the second one to “just work”: the router is already the default gateway for the VPN server, and the router lets the VPN server access the internet. You would actually be right, as long as the VPN Client on the router stays off or we don’t send our traffic through it, but that’s boring and, frankly, just too easy for me to let myself get away with.

Once I turned on the VPN Client, an odd thing happened: my packets would get to the router, and I could see the log rules I added getting hit, but they never got replies. Thinking this sounded familiar, I checked for any bad routing table information, but nothing came up; we only need the default gateway for this anyway.

It turns out that the default iptables rules generated by the router’s OpenVPN Client don’t NAT everything; they only set MASQUERADE on packets from the “local network” (192.168.1.0/24 for example). So traffic from my new VPN subnet was being sent out un-NATed (or dropped, because it’s a private RFC 1918 address range). After poking around a bit I determined I wasn’t going to find a way to enable this in the admin panel. Before diving into /jffs/scripts/ territory, I decided to run the NAT on the test Pi and make sure that fixed the issue.

sudo iptables -t nat -A POSTROUTING -s 10.8.0.0/24 -o eth0 -j MASQUERADE

Great, my pings now show up as the server’s IP, but connectivity is fine, and even better: I can ping remote hosts (like 8.8.8.8)! Still, I don’t really like double-NAT if I don’t need it, so let’s revisit the router’s scripting options. I’m still a little gun-shy after my last config attempts seem to have resulted in a resource leak, but I’ll just have to be more careful.

The iptables command is the same as what I used on the Pi, and after removing other unneeded config from /jffs/scripts/firewall-start, it seems to be working smoothly. I did learn that the router seems to respond differently if you launch the script manually from an SSH session versus letting the router launch it, so I stopped testing it manually and instead triggered a firewall reload by adding/removing a useless static route to 10.0.0.0/24 in the admin panel. This reloads the firewall and invokes the firewall-start script. After many trials of doing that, it seems to be stable, so I’m assuming my earlier config (lots of unnecessary rule-deletes and such so I could run the script manually) may have been the issue, if there was one in the firewall-start script.

With the masquerade rule in place, everything seems fine, I can connect to the VPN Server and all my traffic connects as expected, either locally, or through the VPN Client to the internet. So, I loaded a new client certificate on my phone and connected to mobile data. Aaand no connection. Figures. I turned off the VPN Client, and suddenly I can connect. Odd. Time for some router packet logging:

iptables -t mangle -I PREROUTING -p udp -m udp --dport 1194 -j LOG --log-prefix "Client->VPN mangle:PREROUTING "
iptables -t mangle -I POSTROUTING -p udp -m udp --sport 1194 -j LOG --log-prefix "VPN->Client mangle:POSTROUTING "

This was enough to discover that my outgoing packets were going through the VPN Client, and therefore the TLS handshake was failing to complete. I need VPN-tunnel traffic (encrypted stuff on port 1194) to go to the WAN. This involves matching on port parameters for the VPN-routing rules, which the web interface can’t do. Well, that’s frustrating: I can’t set the rules in the firewall-start script, because the VPN Client might not be ready and I’d be referencing a non-existent interface. Time to revisit openvpn-event.

I’d like to mention here that a lot of effort could have been saved if someone had thought to state in the documentation that the script is given the important info about why it’s being run (what event happened to which openvpn interface). Specifically, that the event is in the $script_type environment variable, and the interface in question is in $dev. The actual arguments to the script don’t really give you enough to tell what needs to be done. The arguments for an “up” event are identical to a “down” event.
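A minimal sketch of that dispatch, assuming only the two environment variables named above. The function name vpn_event_action and the setup/teardown/ignore labels are made up for illustration; tun11 is the client interface used later in this post.

```shell
# Decide what to do from the environment OpenVPN provides, not from "$@".
# Prints "setup", "teardown" or "ignore".
vpn_event_action() {
    case "${script_type:-}:${dev:-}" in
        up:tun11)   echo setup ;;     # client tunnel came up
        down:tun11) echo teardown ;;  # client tunnel went down
        *)          echo ignore ;;    # other events or other interfaces
    esac
}
```

While debugging, a line such as logger -t openvpn-event "type=$script_type dev=$dev args=$*" at the top of the script shows what each invocation actually received.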

With that learning in hand, I now felt like I could make a script that properly cleaned up after itself and didn’t cause a resource leak. But first, let’s take a look at how asuswrt-merlin adds the selective routing policies from the admin panel. Cutting to the results of that investigation: if you run ip rule show on the router, you get something like the following:

$ ip rule show

# VPN Client off/no policy rules:
0:	from all lookup local
32766:	from all lookup main
32767:	from all lookup default

# VPN Client on with some rules
0:	from all lookup local
10101:	from 192.168.1.3 lookup ovpnc1
10101:	from 192.168.1.192/26 lookup ovpnc1
10102:	from 10.8.0.0/16 lookup ovpnc1
32766:	from all lookup main
32767:	from all lookup default

That’s it, the policy rules state to use different routing tables depending on the source (or destination, if specified) address. We need to modify this to also use other match conditions. There are two steps here:

Identify all tunnel-traffic and mark it
Add an ip rule that matches marked traffic and routes it to main before it hits the VPN rules

To do this, I had the script (for “up” events on “tun11”) create a rule chain vpn-server-mark in the mangle table. You can’t use filter because you need to use the PREROUTING chain as a starting point.

iptables -t mangle -N vpn-server-mark
iptables -t mangle -A PREROUTING -j vpn-server-mark
iptables -t mangle -A vpn-server-mark -p tcp --sport 1194 -s 192.168.1.0/24 -j MARK --set-mark 0x8000/0x8000
iptables -t mangle -A vpn-server-mark -p udp --sport 1194 -s 192.168.1.0/24 -j MARK --set-mark 0x8000/0x8000

This sets bit 15 of the mark for any packets that

Come from our local network (you could remove the subnet check entirely) AND
Are either TCP or UDP port 1194 (the default that I’m using for the OpenVPN server)

The second point is to add a pair of rules for the VPN server with ip rule add

# Send marked packets from the VPN server to the WAN
ip rule add from 192.168.1.3 fwmark 0x8000/0x8000 table main   priority 1000

# Send unmarked packets from the vpn server to the WAN
ip rule add from 192.168.1.3 fwmark 0x0000/0x8000 table ovpnc1 priority 1001

This now checks both the IP and source port of packets to make sure that VPN-tunnel traffic is sent directly to the client, not through another VPN.
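To make the value/mask notation concrete, here is a small sketch of the arithmetic in plain shell. It models how MARK --set-mark value/mask updates a packet mark (clear the masked bits, then OR in the value) and how an ip rule fwmark value/mask test compares only the masked bits. The function names are hypothetical and nothing here touches the real firewall.

```shell
# Model of "-j MARK --set-mark value/mask".
# $1 = old mark, $2 = value, $3 = mask; prints the new mark in hex.
apply_set_mark() {
    printf '0x%x\n' $(( ($1 & ~$3) | $2 ))
}

# Model of "ip rule ... fwmark value/mask".
# $1 = packet mark, $2 = value, $3 = mask; succeeds when the masked bits match.
fwmark_matches() {
    [ $(( $1 & $3 )) -eq $(( $2 & $3 )) ]
}
```

With mask 0x8000 only bit 15 is examined, so a packet that already carries other mark bits (say 0x0001) still matches the 0x0000/0x8000 rule, and becomes 0x8001 once it is marked.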

Make sure that you add a case to the script for the “down” event to undo all of this! The whole openvpn-event script:

#!/bin/sh
# OpenVPN provides the event in $script_type and the interface in $dev.
if [ "$script_type" = "up" ] && [ "$dev" = "tun11" ]; then
  echo "VPN Raised, setting routing rules"
  # Add ip rules for marked packets (same address as the teardown below)
  # Bit 15: Do Not Use VPN upstream
  ip rule add from 192.168.1.3 fwmark 0x8000/0x8000 table main   priority 1000
  ip rule add from 192.168.1.3 fwmark 0x0000/0x8000 table ovpnc1 priority 1001

  # Create rules to mark packets
  iptables -t mangle -N vpn-server-mark
  iptables -t mangle -A PREROUTING -j vpn-server-mark
  iptables -t mangle -A vpn-server-mark -p tcp --sport 1194 -s 192.168.1.0/24 -j MARK --set-mark 0x8000/0x8000
  iptables -t mangle -A vpn-server-mark -p udp --sport 1194 -s 192.168.1.0/24 -j MARK --set-mark 0x8000/0x8000
elif [ "$script_type" = "down" ] && [ "$dev" = "tun11" ]; then
  echo "VPN Lowered, removing routing rules"
  # Teardown: unhook the chain, empty it, delete it, then drop the ip rules
  iptables -t mangle -D PREROUTING -j vpn-server-mark
  iptables -t mangle --flush vpn-server-mark
  iptables -t mangle -X vpn-server-mark
  ip rule del from 192.168.1.3 fwmark 0x8000/0x8000 table main   priority 1000
  ip rule del from 192.168.1.3 fwmark 0x0000/0x8000 table ovpnc1 priority 1001
fi

Once I tested everything to my satisfaction, I moved the VPN server to my PiHole, which necessitated the changing of a bunch of IP addresses in the scripts, and the addition of the following on the PiHole:

# Tell dnsmasq to listen on the VPN tunnel, so it can serve DNS to VPN Clients
echo "interface=tun0" >> /etc/dnsmasq.d/02-ovpn.conf

That’s all. It seems to be working fine even after several VPN on/off cycles and moving between a local connection and mobile data for an external connection. If it breaks badly again, I’ll be back with another post!

June 26, 2020

OpenVPN Chain Configuration on ASUSWRT-Merlin

I had a hard time finding this information, so I’m aggregating it here. What you need:

ASUS Router with ASUSWRT-Merlin installed
SSH client you can access the router from
A device to test your VPN Server (e.g. mobile phone with a data connection)
An Upstream VPN service

Step 1: VPN Server

Log in to the router and go to the VPN page, then “VPN Server”. Configure the OpenVPN server however you want; just note the IP subnet. I will use 10.8.0.0/24 as the VPN subnet in this writing, so replace it with whatever you use.

Step 2: VPN Client

Go to the VPN Client tab and create a connection to your upstream VPN provider. Use their OpenVPN configuration file, username/password, etc. At the bottom, set the “Force Internet traffic through tunnel” to “Policy Rules”. This enables a table of routing rules. This table is evaluated in order, and items marked for VPN are sent to the VPN Client interface. The default is WAN.

Start with your most specific rules. For example, we need all traffic destined for a VPN client to go over the WAN, otherwise it will be sent through the Upstream VPN, and that means the client won’t see it properly as a response, because it will have a different IP. This rule should go BEFORE any catch-all “Send everything through the VPN” rules. So, you could have a table like the following:

Desc	Source	Dest	Interface
VPN Clients		10.8.0.0/24	WAN
Local	192.168.1.0/24		VPN
VPN	10.8.0.0/24		VPN

An example configuration
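To see why the order matters, here is a toy model of the example table in plain shell. It assumes first-match-wins evaluation as described above and is only an illustration of the table, not how the firmware implements policy rules; all function names are made up.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1            # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed if address $1 lies inside CIDR block $2 (e.g. 10.8.0.0/24).
in_cidr() {
    net=${2%/*}; bits=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# Walk the example table top to bottom; $1 = source, $2 = destination.
route_for() {
    if   in_cidr "$2" 10.8.0.0/24;    then echo WAN   # VPN Clients (by destination)
    elif in_cidr "$1" 192.168.1.0/24; then echo VPN   # Local
    elif in_cidr "$1" 10.8.0.0/24;    then echo VPN   # VPN
    else echo WAN                                     # default
    fi
}
```

Here route_for 192.168.1.50 10.8.0.2 prints WAN because the destination rule is checked first; if that rule were moved below the “Local” rule, the same packet would be sent to the VPN instead.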

You can make this more complicated with some devices on your network using the Upstream VPN and some not using it. You can even follow this guide to have rules that make use of port numbers, MAC addresses, or any other firewall matching parameters.

Now, if you test this right now, it won’t work. Why? Well the router knows how to route packets, but it doesn’t know to apply NAT. You need to modify the nat table POSTROUTING chain for that, and AFAIK, that can only be done from a script, not the UI.

Enable SSH on your router (Administration->System) and custom scripts (“Enable JFFS custom scripts and configs” : Yes).

Login via SSH. Go to /jffs/scripts and create the file firewall-start. Make it executable. If the script doesn’t exist, you can just run the following in your SSH session:

echo '#!/bin/sh' > /jffs/scripts/firewall-start
echo "iptables -D POSTROUTING -t nat -s $(nvram get vpn_server1_sn)/24 -o tun1+ -j MASQUERADE" >> /jffs/scripts/firewall-start
echo "iptables -I POSTROUTING -t nat -s $(nvram get vpn_server1_sn)/24 -o tun1+ -j MASQUERADE" >> /jffs/scripts/firewall-start

chmod +x /jffs/scripts/firewall-start

Now, you can restart your router to make it use the script, or just run the script now: /jffs/scripts/firewall-start. This will be run anytime the firewall is cleared and re-enabled (reboot or settings change). Look in the system log for messages about it after a reboot if you have issues.

These rules simply tell the router to perform NAT on any outgoing packets from the VPN clients which are destined for the upstream VPN server(s). This way, the upstream VPN server only sees your router as a single client, and nothing gets messed up.

I hope this helps someone. Send me any questions you may have!

February 20, 2019

FPGA NCL Ring

After several hours of getting oscillatory behavior from the register ring with the resets tied to switches, I decided to try a different approach. I had been avoiding using the processing system in the design because I was working at a bench and I assumed that it would be simpler to only deal with the hardware. As part of a class at my university, I was setting up a set of remote-access systems for FPGA development, and I thought I’d try something interesting.

By using a GPIO in the design, I was able to avoid switch bouncing entirely, which seems to have helped. I built a 4-stage ring with a shared reset and an input to force a DATA0 or DATA1 out of one stage (OR with the corresponding signal between the registers).

The GPIO was connected to these 3 control signals, and the outputs of each stage (8 inputs). Setting the reset should bring all 4 registers to NULL, and forcing a DATA value should cause it to propagate to all but one register. The first register to receive that data signal will transition to requesting NULL, and therefore the register before it (the one whose outputs are being overridden) will remain at NULL. Once the force_data# line is cleared, register 2 receives NULL and requests data, and so on as the wavefronts cycle.

For more information on register rings, see this post.

The synchronous portion of the design interfaces to this using a GPIO module on a 100 MHz clock. This is almost certainly slower than the NCL portion of the design will operate, but by carefully measuring the system under reset states, or while a force_data# line is active, certain conditions can be tested. The synchronous part of the design has a lot of platform-specific blocks, but can be simplified to a processor connected to a General-Purpose Input-Output module. The code I used to test the module from the processor is given below (the struct is just used as a way to name bits in the GPIO’s registers).

#include "xil_types.h"    // u8, u32
#include "xparameters.h"  // XPAR_AXI_GPIO_0_BASEADDR

struct gpio_pins {
    union {
        struct {
            u8 s00 :1;
            u8 s01 :1;
            u8 s10 :1;
            u8 s11 :1;
            u8 s20 :1;
            u8 s21 :1;
            u8 s30 :1;
            u8 s31 :1;
            u32 :24;
        };
        u32 values;
    };
    u32 :32;
    struct {
        u8 rst :1;
        u8 force_data0 :1;
        u8 force_data1 :1;
        u32 :29;
    };
};

int main() {
    // Make a pointer to the pins, using the bitfield to interpret them.
    volatile struct gpio_pins* pins =
            (volatile struct gpio_pins*) (XPAR_AXI_GPIO_0_BASEADDR);
    struct gpio_pins reading0 = *pins;
    pins->force_data0 = 0;
    pins->force_data1 = 0;
    pins->rst = 1;
    struct gpio_pins reading1 = *pins;
    pins->rst = 0;
    struct gpio_pins reading2 = *pins;
    struct gpio_pins reading3 = *pins;
    int a;
    for (a = 0; a < 100; a++) {
        reading3.values |= pins->values;
    }
    if (reading3.values) {
        while (1)
            ;
    }
    pins->force_data0 = 1;
    struct gpio_pins reading4 = *pins;
    for (a = 0; a < 100; a++) {
        if (pins->values != 0x51) {
            while (1)
                ;
        }
    }
    pins->rst = 1;
    pins->force_data0 = 0;
    pins->force_data1 = 1;
    pins->rst = 0;
    struct gpio_pins reading5 = *pins;
    for (a = 0; a < 100; a++) {
        if (pins->values != 0xA2) {
            while (1)
                ;
        }
    }
    pins->force_data1 = 0;
    for (a = 0; a < 1000000; a++) {
        // DATA0 lines should not get set
        if ((pins->values &0x55)) {
            while (1)
                ;
        }
        // at least one DATA1 line should be set
        if ((pins->values &0xAA) == 0) {
            while (1)
                ;
        }
        // at least one DATA1 line should not be set
        if ((pins->values & 0xAA) == 0xAA) {
            while (1)
                ;
        }
    }
    pins->rst = 1;
    pins->force_data0 = 1;
    pins->rst = 0;
    pins->force_data0 = 0;
    for (a = 0; a < 1000000; a++) {
        // DATA1 lines should not get set
        if ((pins->values & 0xAA)) {
            while (1)
                ;
        }
        // at least one DATA0 line should be set
        if ((pins->values &0x55) == 0) {
            while (1)
                ;
        }
        // at least one DATA0 line should not be set
        if ((pins->values & 0x55) == 0x55) {
            while (1)
                ;
        }
    }
    pins->rst = 1;
    return 0;
}

The application first runs static tests, checking that all stages output 0 while in reset, and that when a data value is forced, the data propagates but does not fill the ring entirely. The last two loops test the dynamic behavior of the ring. I couldn’t hope to actually measure the movement of the waves, so I just check that the ring is in a valid state at any given time: when DATA0 was forced, no DATA1 lines were asserted (and vice versa), and there was a DATA wave and a NULL wave at all times (as measured). The loops take a couple of seconds to run, so I feel confident that any errors would have been caught. I put breakpoints in the while (1) loops, which were never hit, and the breakpoint at the last pins->rst = 1; line was always reached. Also, if a wave had ever disappeared, it would never have come back, so I’m confident that this worked as I wanted it to.

Manual inspection of the values (at low frequency using the debugger reading the GPIO pins) showed that the DATA0/DATA1 lines were behaving as expected as well.
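For reading those raw values by hand, a small decoder helps. This sketch assumes the bit layout implied by the struct field names (bit 2n is stage n’s DATA0 rail, bit 2n+1 is its DATA1 rail); the function name decode_ring is made up for illustration.

```shell
# Print the state of each of the 4 stages for a raw GPIO value ($1).
decode_ring() {
    value=$(( $1 ))
    out=""
    stage=0
    while [ "$stage" -lt 4 ]; do
        # Pull out the two rails belonging to this stage.
        rails=$(( (value >> (2 * stage)) & 3 ))
        case "$rails" in
            0) state=NULL ;;
            1) state=DATA0 ;;
            2) state=DATA1 ;;
            3) state=INVALID ;;   # both rails asserted at once
        esac
        out="${out:+$out }$state"
        stage=$(( stage + 1 ))
    done
    echo "$out"
}
```

Under that assumption, decode_ring 0x51 prints DATA0 NULL DATA0 DATA0 and decode_ring 0xA2 prints DATA1 NULL DATA1 DATA1, which matches the “all but one register” expectation in the static tests.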

What’s next? I want to do the same experiment with multiple data wavefronts (at least 5 stages) and then start building some simple state machines. After that, I want to look at methods of introducing timing into the system by using a clock or GPIO enable as a data input, with a TH22 gate used to wait for the input to match the wavefront.
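For reference, a TH22 gate is a 2-input threshold gate with hysteresis: the output rises only when both inputs are 1, falls only when both are 0, and otherwise holds its previous value. A behavioural sketch (the function name th22 and its argument order are just for illustration):

```shell
# Next output of a TH22 gate. $1 = input a, $2 = input b, $3 = previous output.
th22() {
    if   [ "$1" -eq 1 ] && [ "$2" -eq 1 ]; then echo 1     # both set: assert
    elif [ "$1" -eq 0 ] && [ "$2" -eq 0 ]; then echo 0     # both clear: release
    else echo "$3"                                         # disagree: hold
    fi
}
```

That holding behaviour is what would let a wavefront wait at the gate until the timing input agrees with it.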

February 17, 2019

More NCL on an FPGA

I have been running some tests with registers on an FPGA, specifically attempting to create a register ring and see what happens. Unfortunately, I have been getting an inverter loop where not desired (when the registers are released from reset, data lines start flipping). I did not get this behavior without the ring, so I think it’s more or less a delay problem in my design. Additionally, the simulator will not launch for me, which I am working on resolving. The upside is that I have created several Vivado blocks, which are at least close to working (they appear to work when not in a ring). I believe that once I get the register ring working, I will be able to start moving on to larger designs (finite state machines, etc).

January 24, 2019

First NCL FPGA Test

Last week, I took the NCL 2-rail register and tweaked it to work with the FPGA development environment I have access to at my university. Specifically, I put it into Xilinx Vivado 2018.3 as an RTL module in a block design.

Setup

I used a single 2-rail bit configuration and started with just one register. I tied the inputs to switches and the outputs to LEDs. One thing to note is that the from_next signal used to control the forward propagation of wavefronts was attached to the switch through an RC lowpass filter to de-bounce it.

I had to make some changes to the register component due to Vivado’s restrictions. I wanted to use a block design, so I had to make all the ports std_logic or std_logic_vector. I had been using a custom record type (ncl_pair) to group the DATA0 and DATA1 lines. For now, I just split them apart into two std_logic_vector(0 to 0) lines.

Aside on switch bouncing: Normally when a switch is flipped, there is a small amount of ‘bounce’ where the contact opens and closes rapidly for a short time (as the contacts get very close). By running the output of the switch through a lowpass filter, the fast on/off cycles are somewhat flattened, so that the output is only rising, or only falling when the switch is flipped. Otherwise, the register would have tried to pass several wavefronts before I could see it. For anyone duplicating this, I used a time constant of about 50 ms.

Testing

The register behaved as I expected it to from simulation. Initially, the data outputs were all off (NULL) with the to_prev signal asserted (requesting data). When I activated one of the switches, the output from the register stayed NULL and it kept requesting DATA, because I was still requesting NULL from the register (the from_next switch was left de-asserted). The bounce would have technically made the input toggle between DATA and NULL for a bit, but when NULL, all inputs would be NULL, so it didn’t break anything. When I toggled the from_next input to the register to a request for DATA, it output the value I had input (DATA0 or DATA1 depending on the test case) and switched to requesting NULL. I then de-asserted the data line input and set the from_next request to NULL, causing the outputs to switch to NULL.

Extension

With this success, I wanted to see if a slightly more complex test would work as well. I added 4 more registers (5 total). The design was still using one 2-rail bit, but now had the capacity to ‘store’ some wavefronts internally. Remember that a sequence of NCL registers can pass a NULL/DATA wave all the way through or can have several stored (approximately N/2) while they wait to pass them on. This held up in testing.

All the same inputs/outputs were used, but the from_next switch was going into the last register, the to_prev LED came out of the first register, and the data inputs/outputs were connected to the first/last register. I used the following test pattern with NULL waves between each:

DATA0, DATA1, DATA0, DATA0, DATA1, DATA1

I ran this several times with different orders of adding and removing data from the pipeline. E.g. the first time, I kept adding DATA/NULL at the input as long as the to_prev line kept toggling and removed DATA by toggling the from_next line when the pipeline was full. In the next run, I first added 2 DATA waves, then extracted 2 DATA waves, and so on.

I was able to get the same behavior I saw in simulation, which was very exciting.

Moving Forward

This test has given me some confidence that I can successfully map the NCL components to an FPGA. I plan to move forward by building a Vivado IP library. As I go, I will be making a VHDL library for 2-rail NCL. During my long break from working on this, I took a course that gave me experience with these tools designing hardware modules and using FPGAs to implement systems. During that class, I learned that implementing components structurally has its place, but is often a very slow way to design. I will be working on making a more expressive VHDL library than what I have been working with in previous posts (entirely structural). I will be discussing this more in future posts.

To summarize, I have two sub-projects to work on that are highly connected:

1. Building an expressive VHDL library for Null Convention Logic
2. Creating a Vivado IP Library for the components from (1).

The git repository is currently in a somewhat inconsistent state as I am re-doing the library and many components need to be re-done.