Updated:

This page contains notes from my various tests and experiments. It is a raw record of what I did, without correction for errors, or later update for things that I learn. Use at your own risk.

Persisting a Docker bridge

Since the robot will be running on a Docker bridge network, ideally that bridge would exist even when Docker is not running. Let me test how to do that.

How to manage persistent connections

First question, what mechanism is actually being used on my computers to manage the network? I’ll check two computers: my desktop running Ubuntu 20.04, and an AWS remote running Ubuntu 20.04 minimal.

What does systemd see? Desktop:

kent@ubutower:/etc/NetworkManager$ systemctl list-unit-files | grep etwork
network-manager.service                    enabled         enabled      
networkd-dispatcher.service                enabled         enabled      
NetworkManager-dispatcher.service          enabled         enabled      
NetworkManager-wait-online.service         enabled         enabled      
NetworkManager.service                     enabled         enabled      
systemd-network-generator.service          disabled        enabled      
systemd-networkd-wait-online.service       enabled-runtime enabled      
systemd-networkd.service                   enabled-runtime enabled      
systemd-networkd.socket                    disabled        enabled      
network-online.target                      static          enabled      
network-pre.target                         static          disabled     
network.target                             static          disabled     
kent@ubutower:/etc/NetworkManager$ 

AWS:

ubuntu@openvpn:/etc/netplan$ systemctl list-unit-files | grep etwork
network-manager.service                        enabled         enabled      
networkd-dispatcher.service                    enabled         enabled      
NetworkManager-dispatcher.service              enabled         enabled      
NetworkManager-wait-online.service             enabled         enabled      
NetworkManager.service                         enabled         enabled      
systemd-network-generator.service              disabled        enabled      
systemd-networkd-wait-online.service           enabled         enabled      
systemd-networkd.service                       enabled         enabled      
systemd-networkd.socket                        enabled         enabled      
network-online.target                          static          enabled      
network-pre.target                             static          disabled     
network.target                                 static          disabled

That did not help; the two are virtually the same. I’m pretty sure that both use netplan, so let’s check its config.

Desktop:

kent@ubutower:/etc/netplan$ ls
01-network-manager-all.yaml

AWS:

ubuntu@openvpn:/etc/netplan$ ls
50-cloud-init.yaml

So I am guessing that while the AWS system has NetworkManager installed, it is not actually using it. Let me try to confirm that by trying a NetworkManager connection. First, show NetworkManager at work on the desktop:

kent@ubutower:~$ sudo nmcli con add type bridge ifname brtest
Connection 'bridge-brtest' (6680a86e-3138-4421-955c-92212d5671aa) successfully added.

ip a shows the bridge. Reboot. After reboot, the bridge is recreated:

kent@ubutower:~$ ip a show brtest
3: brtest: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 72:64:de:24:af:9c brd ff:ff:ff:ff:ff:ff

Try the same thing on my AWS Ubuntu instance.

ubuntu@openvpn:~$ sudo nmcli con add type bridge ifname brtest
Connection 'bridge-brtest' (7f364d9c-1ee6-4bc1-a32d-00919e959d52) successfully added.

This time, unlike the desktop, ip a does not show the connection. I can see that NetworkManager knows about it:

ubuntu@openvpn:~$ sudo nmcli c
NAME           UUID                                  TYPE    DEVICE 
bridge-brtest  7f364d9c-1ee6-4bc1-a32d-00919e959d52  bridge  --  

but trying to bring it up gives me:

ubuntu@openvpn:~$ sudo nmcli c up bridge-brtest
Error: Connection activation failed: Activation failed because the device is unmanaged

So the conclusion I reach is this: a netplan hook is the way to persist a connection on both my desktop and on an AWS cloud instance.

Use sudo nmcli c delete bridge-brtest to delete the bridge.

Netplan bridge survival with Docker

The general plan for OpenVPN networking with ROS2 is to use a Docker bridge as the --net type, then add the OpenVPN tap0 link to that bridge. I need to see how Docker interacts with all of that. I recall reading somewhere that if Docker sees a bridge already in existence it will not recreate it. So the general plan is to persistently create the bridge, then link it to Docker.

I’m going to create a bridge using netplan that adds the same bridge that Docker would create, so that it works with Docker unloaded. Create a file /etc/netplan/90-rosbridge.yaml with this content, which I adapted from here:

network:
  version: 2
  renderer: networkd
  bridges:
    rosbridge:
      addresses: [10.231.168.1/20]
      mtu: 1500
      nameservers:
        addresses: [8.8.8.8, 8.8.4.4]
      parameters:
        stp: false
      dhcp4: false
      dhcp6: false

Use sudo netplan apply to get netplan to act on the change. That created the bridge:

kent@ubutower:/etc/netplan$ ip a show rosbridge
4: rosbridge: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether e2:47:07:91:c5:36 brd ff:ff:ff:ff:ff:ff
    inet 10.231.168.1/20 brd 10.231.175.255 scope global rosbridge
       valid_lft forever preferred_lft forever

Now have Docker create a network on top of that same bridge.

kent@ubutower:/etc/netplan$ docker network create --driver=bridge \
> --subnet=10.231.160.0/20 \
> --gateway=10.231.168.1 \
> --ip-range=10.231.168.128/29 \
> --opt com.docker.network.bridge.name=rosbridge \
> rosbridge

That worked. Check it from a container:

kent@ubutower:~$ docker run -it --rm --net rosbridge busybox
/ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
8: eth0@if9: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue 
    link/ether 02:42:0a:e7:a8:80 brd ff:ff:ff:ff:ff:ff
    inet 10.231.168.128/20 brd 10.231.175.255 scope global eth0
       valid_lft forever preferred_lft forever
/ # 

Yes. Now, I’m going to reboot with Docker disabled … OK I’m back. ip a shows the rosbridge, but I have a new problem: routes are messed up.

kent@ubutower:~$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.231.160.0    0.0.0.0         255.255.240.0   U     0      0        0 rosbridge
kent@ubutower:~$ ping 8.8.8.8
ping: connect: Network is unreachable

That’s bad. Removing the new file from /etc/netplan then sudo netplan apply did not fix the problem. ip a shows that the ethernet link is down. sudo ip link set up eno1 does not help. Hardware issue? Let me power down … Yep that fixed it. Weird. Add back the rosbridge create file to /etc/netplan, and reboot again.

Problem reappeared. Removed the /etc/netplan rosbridge file, rebooted; this time ethernet came back without a power-down. I tried a few more times: whenever the new /etc/netplan file is present, ethernet fails on reboot.

On a hunch, I removed the renderer line from the /etc/netplan file. sudo netplan apply still restores the bridge. How about on reboot? … That worked! So apparently, the renderer line in my first attempt:

network:
  version: 2
  renderer: networkd

handed the network over to networkd and removed NetworkManager from the startup, taking eno1 with it. Don’t want that line!
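For the record, the netplan file that survives reboot without breaking ethernet is the one above minus the renderer line:

```yaml
network:
  version: 2
  bridges:
    rosbridge:
      addresses: [10.231.168.1/20]
      mtu: 1500
      nameservers:
        addresses: [8.8.8.8, 8.8.4.4]
      parameters:
        stp: false
      dhcp4: false
      dhcp6: false
```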

Back to Docker. Use systemctl to start it. I can do docker run --net rosbridge … successfully.

Now on to OpenVPN. What I need to do is add the tap0 device to rosbridge when OpenVPN is connected. I got my AWS OpenVPN gateway up, so the tap0 interface exists on the desktop and the remote, and I can ping the remote from the desktop:

kent@ubutower:~$ ip a show dev tap0
7: tap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 100
    link/ether 1a:a0:1b:22:8e:3e brd ff:ff:ff:ff:ff:ff
    inet 10.231.160.2/20 brd 10.231.175.255 scope global tap0
       valid_lft forever preferred_lft forever
    inet6 fe80::18a0:1bff:fe22:8e3e/64 scope link 
       valid_lft forever preferred_lft forever

kent@ubutower:~$ ping 10.231.161.1
PING 10.231.161.1 (10.231.161.1) 56(84) bytes of data.
64 bytes from 10.231.161.1: icmp_seq=1 ttl=64 time=11.6 ms
64 bytes from 10.231.161.1: icmp_seq=2 ttl=64 time=12.1 ms

Now I want to move the tap0 device under the docker bridge.

kent@ubutower:~$ sudo ip link set master rosbridge dev tap0
kent@ubutower:~$ ip a show dev tap0
7: tap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master rosbridge state UNKNOWN group default qlen 100
    link/ether 1a:a0:1b:22:8e:3e brd ff:ff:ff:ff:ff:ff
    inet 10.231.160.2/20 brd 10.231.175.255 scope global tap0
       valid_lft forever preferred_lft forever
    inet6 fe80::18a0:1bff:fe22:8e3e/64 scope link 
       valid_lft forever preferred_lft forever

Note the ‘master rosbridge’ under tap0. Now from the remote system, I can ping the Docker bridge:

ubuntu@openvpn:~$ ping 10.231.168.1
PING 10.231.168.1 (10.231.168.1) 56(84) bytes of data.
64 bytes from 10.231.168.1: icmp_seq=1 ttl=64 time=23.1 ms
64 bytes from 10.231.168.1: icmp_seq=2 ttl=64 time=11.8 ms

I created a docker container on the desktop:

docker run -it --rm --net rosbridge busybox

and tried to ping the remote. It did not work. Checking the routing table:

kent@ubutower:~$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.0.11    0.0.0.0         UG    100    0        0 eno1
10.231.160.0    0.0.0.0         255.255.240.0   U     0      0        0 tap0
10.231.160.0    0.0.0.0         255.255.240.0   U     425    0        0 rosbridge
169.254.0.0     0.0.0.0         255.255.0.0     U     1000   0        0 rosbridge
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
192.168.0.0     0.0.0.0         255.255.255.0   U     100    0        0 eno1

we still have the tap0 route. Remove it:

kent@ubutower:~$ sudo ip route del 10.231.160.0/20 dev tap0
kent@ubutower:~$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.0.11    0.0.0.0         UG    100    0        0 eno1
10.231.160.0    0.0.0.0         255.255.240.0   U     425    0        0 rosbridge
169.254.0.0     0.0.0.0         255.255.0.0     U     1000   0        0 rosbridge
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
192.168.0.0     0.0.0.0         255.255.255.0   U     100    0        0 eno1

and now the ping works:

kent@ubutower:~$ docker run -it --rm --net rosbridge busybox
/ # ping 10.231.161.1
PING 10.231.161.1 (10.231.161.1): 56 data bytes
64 bytes from 10.231.161.1: seq=0 ttl=64 time=20.824 ms
64 bytes from 10.231.161.1: seq=1 ttl=64 time=10.746 ms

Checking the OpenVPN manual, I tried adding these directives to the .ovpn file used to start OpenVPN on the desktop (though I was not sure of the syntax):

ifconfig-noexec
up 'ip route del 10.231.160.0/20 dev tap0 && ip link set tap0 up'

Tried that, grrr: Options error: --up script fails with ‘ip’: No such file or directory (errno=2). OK, full paths then:

ifconfig-noexec
up '/usr/sbin/ip route del 10.231.160.0/20 dev tap0 && /usr/sbin/ip link set tap0 up'

Grrr, Fri Oct 29 11:40:20 2021 WARNING: External program may not be called unless ‘--script-security 2’ or higher is enabled. See --help text or man page for detailed info. OK again:

script-security 2
ifconfig-noexec
up 'ip route del 10.231.160.0/20 dev tap0 && ip link set tap0 up'

Grrr, Error: either “to” is duplicate, or “&&” is a garbage. OK, let’s let bash run it:

script-security 2
ifconfig-noexec
up "/bin/bash -c 'route del 10.231.160.0/20 dev tap0 && ip link set tap0 up'"

Still an error: route was the wrong command. Again, this time dropping the route deletion entirely (ifconfig-noexec should already keep OpenVPN from putting the address and route on tap0) and just attaching tap0 to the bridge:

script-security 2
ifconfig-noexec
up "/bin/bash -c 'ip link set master rosbridge dev tap0 && ip link set tap0 up'"

and this works: a Docker container started with --net rosbridge on the desktop can ping the remote tap0 interface.
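Putting it all together, the whole setup can be recapped as a sketch. The DRY_RUN flag and the run() wrapper are my own scaffolding, not part of netplan, Docker, or OpenVPN; with DRY_RUN=1 (the default) the script only prints each command, so the sequence can be eyeballed without root:

```shell
#!/bin/bash
# Recap of the persistent-bridge setup from these notes.
# DRY_RUN and run() are scaffolding for this sketch only.
DRY_RUN=${DRY_RUN:-1}
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else sudo "$@"; fi
}

# 1. Persistent bridge: /etc/netplan/90-rosbridge.yaml (no renderer line!)
run netplan apply

# 2. Docker network layered on the already-existing bridge
run docker network create --driver=bridge \
    --subnet=10.231.160.0/20 --gateway=10.231.168.1 \
    --ip-range=10.231.168.128/29 \
    --opt com.docker.network.bridge.name=rosbridge rosbridge

# 3. After OpenVPN creates tap0 (or via the 'up' directive in the .ovpn):
run ip link set master rosbridge dev tap0
run ip link set tap0 up
```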