Routing table behaviour coupled with bound sockets - unexpected behaviour

Question

Problem

I am experiencing some rather odd behavior where a seemingly unrelated default gateway route is having unexpected side-effects. I managed to replicate this issue with a minimal example. The aim here is mostly educational and I stumbled upon this while experimenting with a more complex scenario. In short, I am managing to connect to a web server on 192.168.0.3 when I believe I should not.

My laptop is connected to my home network using WiFi (192.168.0.0/24 network). The routing table is listed below:

kevin@kevin-UX305LA:~$ ip route
default via 192.168.0.1 dev wlp2s0 proto dhcp metric 600 
169.254.0.0/16 dev wlp2s0 scope link metric 1000 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 
192.168.0.0/24 dev wlp2s0 proto kernel scope link src 192.168.0.210 metric 600

Both curl 192.168.0.3 and curl --interface wlp2s0 192.168.0.3 currently work and give the following response:

<html>
<head><title>301 Moved Permanently</title></head>
<body>
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx/1.25.2</center>
</body>
</html>

Now, I proceed to remove all the routes related to the 192.168.0.0/24 network such that the remaining routes are:

kevin@kevin-UX305LA:~$ sudo ip route del default via 192.168.0.1 
kevin@kevin-UX305LA:~$ sudo ip route del 192.168.0.0/24 
kevin@kevin-UX305LA:~$ ip route
169.254.0.0/16 dev wlp2s0 scope link metric 1000 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown

Additionally, ip route get with and without binding to an interface are shown below:

kevin@kevin-UX305LA:~$ ip route get 192.168.0.3
RTNETLINK answers: Network is unreachable
kevin@kevin-UX305LA:~$ ip route get oif wlp2s0 192.168.0.3
192.168.0.3 dev wlp2s0 src 192.168.0.210 uid 1000 
    cache

Running curl 192.168.0.3 gives curl: (7) Couldn't connect to server and running curl --interface wlp2s0 192.168.0.3 gives nothing (curl is blocked). Here's a snippet of what strace shows:

setsockopt(5, SOL_SOCKET, SO_BINDTODEVICE, "wlp2s0\0", 7) = 0
connect(5, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("192.168.0.3")}, 16) = -1 EINPROGRESS (Operation now in progress)

This is fine and what I was expecting, i.e., laptop shouldn't be able to reach 192.168.0.3.

Now here is the strange part. If I add a dummy default gateway (say the docker0 interface via a random address) such that my routing table is as follows:

kevin@kevin-UX305LA:~$ sudo ip route add default via 172.17.0.2
kevin@kevin-UX305LA:~$ ip route
default via 172.17.0.2 dev docker0 linkdown 
169.254.0.0/16 dev wlp2s0 scope link metric 1000 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown

Running curl 192.168.0.3 fails but running curl --interface wlp2s0 192.168.0.3 succeeds with the previous HTML reply.

kevin@kevin-UX305LA:~$ curl 192.168.0.3
curl: (7) Failed to connect to 192.168.0.3 port 80 after 3068 ms: No route to host
kevin@kevin-UX305LA:~$ ip route get 192.168.0.3
192.168.0.3 via 172.17.0.2 dev docker0 src 172.17.0.1 uid 1000 
    cache
kevin@kevin-UX305LA:~$ curl --interface wlp2s0 192.168.0.3
<html>
<head><title>301 Moved Permanently</title></head>
<body>
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx/1.25.2</center>
</body>
</html>
kevin@kevin-UX305LA:~$ ip route get oif wlp2s0 192.168.0.3
192.168.0.3 dev wlp2s0 src 192.168.0.210 uid 1000 
    cache

I read up on SO_BINDTODEVICE and found that it should still obey the routing table. Why does adding a default gateway through a different interface, via a random address, make curl --interface wlp2s0 192.168.0.3 succeed?

Address Details

The following command lists the addresses associated with the interfaces. I have checked that after each command executed above, the result of this command is always the same.

kevin@kevin-UX305LA:~$ ip -br addr
lo               UNKNOWN        127.0.0.1/8 ::1/128 
wlp2s0           UP             192.168.0.210/24 fe80::7a03:3420:b8b0:4db7/64 
docker0          DOWN           172.17.0.1/16

In summary 192.168.0.210 is this laptop, 192.168.0.3 is another machine on the network hosting a web server and 192.168.0.1 is the default gateway (my router).

Environment

I am running Linux Mint. Here is some info.

kevin@kevin-UX305LA:~$ uname -a
Linux kevin-UX305LA 5.15.0-84-generic #93-Ubuntu SMP Tue Sep 5 17:16:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
kevin@kevin-UX305LA:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Linuxmint
Description:    Linux Mint 21.2
Release:    21.2
Codename:   victoria

A.B · Accepted Answer · 2023-10-07T11:01:56.713

When a socket is bound to an interface, the kernel looks for adequate routes on that interface and reuses them (for their gateway). If it finds none, the packet is still emitted on that interface, even without a proper route. So this fallback only matters when no default route is already present on that interface.

Note: the lack of a gateway can cause problems with layer 2 interfaces (eg: trying to reach 8.8.8.8 would send an ARP request for 8.8.8.8). It wouldn't cause problems with layer 3 interfaces such as IP tunnels, which don't need a gateway at all. Since in this question the destination is in the LAN anyway and thus doesn't require a gateway, this doesn't cause problems either.

The effect of the curl commands can be verified by asking the kernel for its routing decision with ip route get:

$ ip route get 192.168.0.3
RTNETLINK answers: Network is unreachable
# ip route get oif wlp2s0 192.168.0.3
192.168.0.3 dev wlp2s0 src 192.168.0.210 uid 0 
    cache 
$ ip route get oif wlp2s0 8.8.8.8
8.8.8.8 dev wlp2s0 src 192.168.0.210 uid 1000 
    cache

which resolves fine to reach 192.168.0.3.
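To make the gateway note above concrete, here is a minimal sketch of a helper that reads one line of ip route get output and reports which address would be resolved at layer 2. The function name l2_target is hypothetical, and the parsing assumes the unicast output format shown above (destination first, optional "via GATEWAY"); it is not meant to handle "local" or error output.

```shell
#!/bin/sh
# Hypothetical helper (not part of iproute2): given the first line of
# `ip route get` output, print the address that would be resolved at
# layer 2. That is the gateway after "via" when one is present,
# otherwise the destination itself (the first field).
l2_target() {
    # Split the line into whitespace-separated fields.
    set -- $1
    dest=$1
    while [ $# -gt 0 ]; do
        if [ "$1" = "via" ] && [ $# -ge 2 ]; then
            echo "$2"
            return 0
        fi
        shift
    done
    echo "$dest"
}
```

It could be used as l2_target "$(ip route get oif wlp2s0 8.8.8.8 | head -n 1)". With the outputs above there is no "via", so the ARP request targets the destination directly.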

To prevent this, one can add a policy rule that will select an alternate route only when binding to the interface with the oif selector. Normally that would be used to provide a route with an adequate gateway to fix the problem described above with a lack of gateway, for example like this:

ip route add default via 192.168.0.1 dev wlp2s0 onlink table 1000
ip rule add oif wlp2s0 lookup 1000

But for this precise case, the goal is to prevent such a route from existing, which is more difficult, because oif wlp2s0 keeps only routes having dev wlp2s0 in them: a blackhole or unreachable route can't also include dev wlp2s0, so it will always be ignored. Instead this requires a bogus route that makes the final result fail. Choosing one's own address as a gateway is the same as choosing no gateway, so that won't work either. This requires picking an arbitrary address in the LAN that is guaranteed not to exist (so that it can't send an ICMP redirect, should it also be a router). Let's assume 192.168.0.4 doesn't exist and will be reserved for this use:

# ip route add default via 192.168.0.4 onlink dev wlp2s0 table 1000
# ip rule add oif wlp2s0 lookup 1000

which now gets:

$ ip route get oif wlp2s0 192.168.0.3
192.168.0.3 via 192.168.0.4 dev wlp2s0 table 1000 src 192.168.0.210 uid 1000 
    cache

which will trigger an ARP request for 192.168.0.4 and fail about 3 seconds later (the standard maximum delay for such an ARP attempt) with EHOSTUNREACH (No route to host):

curl: (7) Failed to connect to 192.168.0.3 port 80 after 3071 ms: Couldn't connect to server

Note that the arbitrary IP address doesn't even have to be within 192.168.0.0/24. This would have worked the same, as long as the ARP request fails in the end:

ip route add default via 192.0.2.2 onlink dev wlp2s0 table 1000

Note: the same trick wouldn't work with a layer 3 interface (eg: WireGuard or OpenVPN in tun mode) since there's no concept of gateway on it.
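Since this trick leaves a rule and a routing table entry behind, it may help to keep setup and teardown together. The following is only a sketch: bogus_gateway_cmds is a hypothetical helper that prints the commands instead of running them, so they can be reviewed first. The interface, gateway and table id are the ones assumed above.

```shell
#!/bin/sh
# Hypothetical helper: print the commands that install or remove the
# "bogus gateway" policy route for sockets bound to an interface.
# Arguments: action (add|del), interface, unused gateway address, table id.
bogus_gateway_cmds() {
    action=$1 iface=$2 gateway=$3 table=$4
    case $action in
        add)
            echo "ip route add default via $gateway onlink dev $iface table $table"
            echo "ip rule add oif $iface lookup $table"
            ;;
        del)
            # Remove the rule first so the table is no longer consulted,
            # then empty the table.
            echo "ip rule del oif $iface lookup $table"
            echo "ip route flush table $table"
            ;;
        *)
            echo "usage: bogus_gateway_cmds add|del IFACE GATEWAY TABLE" >&2
            return 1
            ;;
    esac
}
```

For example, bogus_gateway_cmds add wlp2s0 192.168.0.4 1000 prints the two commands used above, and piping the output to sudo sh would apply them. The del action prints the matching cleanup.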

UPDATE

Solving OP's problem

I didn't previously address OP's actual problem: connectivity was failing rather than succeeding, and adding a default route elsewhere made the attempt work.

The reason is that so far only half of the communication has been checked: the path from host to server, on which a packet can be emitted successfully. The return traffic from server to host has not been considered.

It was also discovered (through chat) that OP is using these rp_filter settings:

$ sysctl net.ipv4.conf.all.rp_filter
net.ipv4.conf.all.rp_filter = 2
$ sysctl net.ipv4.conf.wlp2s0.rp_filter
net.ipv4.conf.wlp2s0.rp_filter = 2

which puts the wlp2s0 interface into Loose Reverse Path Forwarding mode, defined in RFC 3704:

2.4. Loose Reverse Path Forwarding

Loose Reverse Path Forwarding (Loose RPF) is algorithmically similar to strict RPF, but differs in that it checks only for the existence of a route (even a default route, if applicable), not where the route points to.

Reverse Path Forwarding (RPF) checks that the reverse path (the return traffic from server to host) is compatible with the "normal" path, ie the route from host to server.

So when there's a default route (the most common scenario), Loose RPF always succeeds. Without a default route or any other route to the target, it fails. Forcing an interface always succeeds for outgoing traffic, but this doesn't change how the reverse traffic from server to host is handled. The reception of such ingress traffic is independent of the fact that the egress traffic was forced onto an interface, and cannot take that into account. Again this can be checked with ip route get when there's no default route anywhere. Starting from OP's case without the added rule:

# ip route get from 192.168.0.3 iif wlp2s0 to 192.168.0.210
RTNETLINK answers: Invalid cross-device link

Invalid cross-device link is the error the network stack uses to report that RPF failed: this path, which is what the return traffic from the server would take, is invalid. The implicit default route selected when binding to an interface doesn't affect ingress traffic, so it doesn't change this result: the connection times out (but not because of ARP, so not limited to 3s).

The same check with rp_filter=0:

# sysctl -w net.ipv4.conf.all.rp_filter=0
net.ipv4.conf.all.rp_filter = 0
# sysctl -w net.ipv4.conf.wlp2s0.rp_filter=0
net.ipv4.conf.wlp2s0.rp_filter = 0
# ip route get from 192.168.0.3 iif wlp2s0 to 192.168.0.210
local 192.168.0.210 from 192.168.0.3 dev lo table local 
    cache <local> iif wlp2s0

works fine since the check is not done anymore.
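Note that, per the kernel's ip-sysctl documentation, the value actually applied to an interface is the maximum of the "all" setting and the per-interface setting, so both have to be 0 for the check to be skipped. A minimal sketch of that rule follows; the function names effective_rp_filter and rp_filter_mode_name are hypothetical helpers, not existing tools.

```shell
#!/bin/sh
# Effective rp_filter mode for an interface: the larger of the
# net.ipv4.conf.all value ($1) and the per-interface value ($2).
effective_rp_filter() {
    all_value=$1
    iface_value=$2
    if [ "$all_value" -gt "$iface_value" ]; then
        echo "$all_value"
    else
        echo "$iface_value"
    fi
}

# Translate the numeric mode into a short description.
rp_filter_mode_name() {
    case $1 in
        0) echo "no source validation" ;;
        1) echo "strict" ;;
        2) echo "loose" ;;
        *) echo "unknown" ;;
    esac
}
```

On a live system the two inputs could come from sysctl -n net.ipv4.conf.all.rp_filter and sysctl -n net.ipv4.conf.wlp2s0.rp_filter. With all=0 but wlp2s0=2, the effective mode is still 2 (loose).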

Alternatively, after reverting to rp_filter=2, doing this:

# ip route add 192.168.0.3/32 via 172.17.0.2
# ip route get from 192.168.0.3 iif wlp2s0 to 192.168.0.210
local 192.168.0.210 from 192.168.0.3 dev lo table local 
    cache <local> iif wlp2s0

also makes it work.

Merely adding a wrong route for 192.168.0.3 makes it pass the Loose RPF algorithm. Of course adding a default route would have the same effect: as long as there's a route for 192.168.0.3, Loose RPF will pass.

So either disable the rp_filter check completely as done above, or have any route for 192.168.0.3, wherever it points, including a default route using another interface.
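The Loose RPF rule can be modelled in a few lines of shell. This is a simplified sketch of the RFC 3704 section 2.4 description, not the kernel's implementation: it only checks whether any listed prefix covers the source address, and ignores interfaces, link state, metrics and policy rules. The function names ip_to_int, in_prefix and loose_rpf are hypothetical.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# Runs in a subshell so the IFS change does not leak to the caller.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Succeed if address $1 falls inside prefix $2 (e.g. 192.168.0.0/24).
# "default" means 0.0.0.0/0; a bare address is treated as a /32.
in_prefix() {
    prefix=$2
    if [ "$prefix" = "default" ]; then
        prefix=0.0.0.0/0
    fi
    case $prefix in
        */*) ;;
        *) prefix=$prefix/32 ;;
    esac
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${prefix%/*}")
    len=${prefix#*/}
    if [ "$len" -eq 0 ]; then
        mask=0
    else
        mask=$(( (0xFFFFFFFF << (32 - $len)) & 0xFFFFFFFF ))
    fi
    [ $(( $addr & $mask )) -eq $(( $net & $mask )) ]
}

# Loose RPF: the source ($1) passes if ANY of the remaining arguments
# (route prefixes) covers it, regardless of where that route points.
loose_rpf() {
    src=$1
    shift
    for route in "$@"; do
        if in_prefix "$src" "$route"; then
            echo pass
            return 0
        fi
    done
    echo fail
    return 1
}
```

With OP's reduced table, loose_rpf 192.168.0.3 169.254.0.0/16 172.17.0.0/16 prints fail. Appending either default or 192.168.0.3/32 to the list makes it print pass, which mirrors the two workarounds above.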
