passt (Plug A Simple Socket Transport) provides full, quasi-native network connectivity to virtual machines in user-mode without requiring any capabilities or privileges.
The data plane implements a translation layer between a Layer-2 virtual network interface and native Layer-4 (TCP, UDP, ping) sockets on the host, giving the illusion that application processes residing on the guest are running on the local host, from a networking perspective.
Built-in ARP, DHCP, NDP, and DHCPv6 implementations are designed to provide the guest with a network configuration that tightly resembles the host native configuration. With the default options, guest and host share IP addresses, routes, and port bindings.
Port forwarding and translation allow networking services running in the guest to be reachable from both local and remote hosts.
Unlike slirp4netns(1), passt doesn't implement a full TCP stack: the TCP translation layer has no stateful data buffering and operates by reflecting one peer's observed parameters (congestion window size, acknowledged data, etc.) to the corresponding peer.
Currently, the only supported hypervisor is qemu(1), connecting to passt by means of a UNIX domain socket. This is supported starting from qemu 7.2. For older qemu versions, see the qrap(1) wrapper.
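As a hedged illustration of the UNIX domain socket connection on qemu 7.2 and later, the sketch below assembles the arguments for a stream netdev attached to a virtio-net device. The helper name passt_netdev_args and the netdev id "s" are illustrative choices, not part of passt; the socket path is whatever passt reports at startup.

```shell
#!/bin/sh
# Print the qemu arguments that attach a virtio-net device to a passt
# UNIX domain socket (qemu 7.2 or later, "stream" netdev).
# passt_netdev_args is a hypothetical helper used only for illustration.
passt_netdev_args() {
    sock="$1"
    printf '%s\n' "-device virtio-net-pci,netdev=s -netdev stream,id=s,server=off,addr.type=unix,addr.path=${sock}"
}

# Example: show the arguments for the socket path used elsewhere in this page.
passt_netdev_args /tmp/passt_1.socket
```

The printed arguments would be appended to the rest of a qemu-system-x86_64 command line.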
pasta (Pack A Subtle Tap Abstraction) provides network namespaces with functionality equivalent to what passt offers for virtual machines.
If PID or --netns are given, pasta associates to an existing user and network namespace. Otherwise, pasta creates a new user and network namespace, and spawns the given command or a default shell within this context. A tap device within the network namespace is created to provide network connectivity.
For local TCP and UDP traffic only, pasta also implements a bypass path directly mapping Layer-4 sockets between init and target namespaces, for performance reasons.
Unless otherwise noted below, if conflicting or multiple options are given, the last one takes effect.
Note that this configuration option is deprecated and will be removed in a future version.
Specifying this option multiple times does not lead to multiple log files: the last given option takes effect.
Specifying this option multiple times does not lead to multiple capture files: the last given option takes effect.
Note: these addresses are also used as source address for packets directed to the guest or to the target namespace having a loopback or local source address, to allow mapping of local traffic to guest and target namespace. See the NOTES below for more details about this mechanism.
If no interfaces are given, the interface with the first default route for each IP version is selected. If no default routes are available and there is just one interface with any route, that interface will be chosen instead.
By default, the interface given by the default route is selected. If no default routes are available and there is just one interface with any route, that interface will be chosen instead.
By default, the interface given by the default route is selected. If no default routes are available and there is just one interface with any route, that interface will be chosen instead.
If addr is 'none', no address is mapped (this implies --no-map-gw). Only one IPv4 and one IPv6 address can be translated, and if the option is specified multiple times, the last one for each address type takes effect.
Default is to translate the guest's default gateway address, unless --no-map-gw is given, in which case no address is mapped.
If addr is 'none', no address is mapped. Only one IPv4 and one IPv6 address can be translated, and if the option is specified multiple times, the last one for each address type takes effect.
Default is no mapping.
This option implies the behaviour described for --one-off, once this socket is closed.
Specifying excluded ranges only implies that all other ports are forwarded. In this case, no failures are reported for unavailable ports, unless no ports could be forwarded at all.
Examples:
Default is none.
Note: unless overridden, UDP ports with numbers corresponding to forwarded TCP port numbers are forwarded too, without, however, any port translation. IPv6 bound ports are also forwarded for IPv4.
Default is none.
Specifying excluded ranges only implies that all other ports are forwarded. In this case, no failures are reported for unavailable ports, unless no ports could be forwarded at all.
Examples:
IPv6 bound ports are also forwarded for IPv4.
Default is auto.
Note: unless overridden, UDP ports with numbers corresponding to forwarded TCP port numbers are forwarded too, without, however, any port translation.
IPv6 bound ports are also forwarded for IPv4.
Default is auto.
Default is auto.
Default is auto.
This option can't be specified with a PID.
Without this option, pasta will terminate if the target network namespace is bound to the filesystem, and the given path is deleted, or if the target network namespace is represented by a procfs entry, and that entry is deleted, representing the fact that a process with the given PID terminated.
Default is to copy all the routing entries from the interface in the outer namespace to the target namespace, translating the output interface attribute to the outbound interface in the namespace.
Note that this configuration option is deprecated and will be removed in a future version. It is not expected to be of any use, and it simply reflects a legacy behaviour. If you have any use for this, refer to REPORTING BUGS below.
Default is to copy all the addresses, except for link-local ones, from the interface from the outer namespace to the target namespace.
Note that this configuration option is deprecated and will be removed in a future version. It is not expected to be of any use, and it simply reflects a legacy behaviour. If you have any use for this, refer to REPORTING BUGS below.
Default is to let the tap driver build a pseudorandom hardware address.
$ iperf3 -s -D
$ ./pasta
Outbound interface: eth0, namespace interface: eth0
ARP:
    address: 28:16:ad:39:a9:ea
DHCP:
    assign: 192.168.1.118
    mask: 255.255.255.0
    router: 192.168.1.1
NDP/DHCPv6:
    assign: 2a02:6d40:3ca5:2001:b81d:fa4a:8cdd:cf17
    router: fe80::62e3:27ff:fe33:2b01
#
# dhclient -4 --no-pid
# dhclient -6 --no-pid
# ip address show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast state UNKNOWN group default qlen 1000
    link/ether 5e:90:02:eb:b0:2a brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.118/24 brd 192.168.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 2a02:6d40:3ca5:2001:b81d:fa4a:8cdd:cf17/128 scope global
       valid_lft forever preferred_lft forever
    inet6 2a02:6d40:3ca5:2001:5c90:2ff:feeb:b02a/64 scope global dynamic mngtmpaddr
       valid_lft 3591sec preferred_lft 3591sec
    inet6 fe80::5c90:2ff:feeb:b02a/64 scope link
       valid_lft forever preferred_lft forever
# ip route show
default via 192.168.1.1 dev eth0
192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.118
# ip -6 route show
2a02:6d40:3ca5:2001:b81d:fa4a:8cdd:cf17 dev eth0 proto kernel metric 256 pref medium
2a02:6d40:3ca5:2001::/64 dev eth0 proto kernel metric 256 expires 3584sec pref medium
fe80::/64 dev eth0 proto kernel metric 256 pref medium
default via fe80::62e3:27ff:fe33:2b01 dev eth0 proto ra metric 1024 expires 3584sec pref medium
# iperf3 -c 127.0.0.1 -t1
Connecting to host 127.0.0.1, port 5201
[ 5] local 127.0.0.1 port 51938 connected to 127.0.0.1 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 4.46 GBytes 38.3 Gbits/sec 0 3.93 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-1.00 sec 4.46 GBytes 38.3 Gbits/sec 0 sender
[ 5] 0.00-1.41 sec 4.45 GBytes 27.1 Gbits/sec receiver

iperf Done.
# iperf3 -c ::1 -t1
Connecting to host ::1, port 5201
[ 5] local ::1 port 50108 connected to ::1 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 4.35 GBytes 37.4 Gbits/sec 0 4.99 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-1.00 sec 4.35 GBytes 37.4 Gbits/sec 0 sender
[ 5] 0.00-1.41 sec 4.35 GBytes 26.4 Gbits/sec receiver

iperf Done.
# ping -c1 -4 spaghetti.pizza
PING spaghetti.pizza (172.67.192.217) 56(84) bytes of data.
64 bytes from 172.67.192.217: icmp_seq=1 ttl=255 time=37.3 ms

--- spaghetti.pizza ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
# ping -c1 -6 spaghetti.pizza
PING spaghetti.pizza(2606:4700:3034::6815:147a (2606:4700:3034::6815:147a)) 56 data bytes
64 bytes from 2606:4700:3034::6815:147a: icmp_seq=1 ttl=255 time=35.6 ms

--- spaghetti.pizza ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 35.605/35.605/35.605/0.000 ms
# logout
$
Connect an existing user and network namespace
$ unshare -rUn
# echo $$
2446678

[From another terminal]

$ ./pasta 2446678
Outbound interface: eth0, namespace interface: eth0
ARP:
    address: 28:16:ad:39:a9:ea
DHCP:
    assign: 192.168.1.118
    mask: 255.255.255.0
    router: 192.168.1.1
NDP/DHCPv6:
    assign: 2a02:6d40:3ca5:2001:b81d:fa4a:8cdd:cf17
    router: fe80::62e3:27ff:fe33:2b01

[Back to the original terminal]

# dhclient -4 --no-pid
# dhclient -6 --no-pid
# ip address show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast state UNKNOWN group default qlen 1000
    link/ether fa:c1:2a:27:92:a9 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.118/24 brd 192.168.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 2a02:6d40:3ca5:2001:b81d:fa4a:8cdd:cf17/128 scope global
       valid_lft forever preferred_lft forever
    inet6 2a02:6d40:3ca5:2001:f8c1:2aff:fe27:92a9/64 scope global dynamic mngtmpaddr
       valid_lft 3594sec preferred_lft 3594sec
    inet6 fe80::f8c1:2aff:fe27:92a9/64 scope link
       valid_lft forever preferred_lft forever
$ ./passt -f -t 2222:22
Outbound interface: eth0
ARP:
    address: 28:16:ad:39:a9:ea
DHCP:
    assign: 192.168.1.118
    mask: 255.255.255.0
    router: 192.168.1.1
    search:
        redhat.com
NDP/DHCPv6:
    assign: 2a02:6d40:3ca5:2001:b81d:fa4a:8cdd:cf17
    router: fe80::62e3:27ff:fe33:2b01
    search:
        redhat.com
UNIX domain socket bound at /tmp/passt_1.socket

You can now start qrap:
    ./qrap 5 qemu-system-x86_64 ... -net socket,fd=5 -net nic,model=virtio
or directly qemu, patched with:
    qemu/0001-net-Allow-also-UNIX-domain-sockets-to-be-used-as-net.patch
as follows:
    qemu-system-x86_64 ... -net socket,connect=/tmp/passt_1.socket -net nic,model=virtio

[From another terminal]

$ ./qrap 5 qemu-system-x86_64 test.qcow2 -m 1024 -display none -nodefaults -nographic -net socket,fd=5 -net nic,model=virtio
Connected to /tmp/passt_1.socket

[Back to the original terminal]

passt: DHCP: ack to request
passt: from 52:54:00:12:34:56
passt: NDP: received NS, sending NA
passt: NDP: received RS, sending RA
passt: DHCPv6: received SOLICIT, sending ADVERTISE
passt: NDP: received NS, sending NA
passt: DHCPv6: received REQUEST/RENEW/CONFIRM, sending REPLY
passt: NDP: received NS, sending NA

[From yet another terminal]

$ ssh -p 2222 root@localhost
root@localhost's password:
[...]
# ip address show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.118/24 brd 192.168.1.255 scope global noprefixroute ens2
       valid_lft forever preferred_lft forever
    inet6 2a02:6d40:3ca5:2001:b81d:fa4a:8cdd:cf17/128 scope global noprefixroute
       valid_lft forever preferred_lft forever
    inet6 2a02:6d40:3ca5:2001:b019:9ae2:a2fe:e6b4/64 scope global dynamic noprefixroute
       valid_lft 3588sec preferred_lft 3588sec
    inet6 fe80::1f98:d09f:9309:9e77/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
Both passt and pasta can bind on ports with a loopback address (127.0.0.0/8 or ::1), depending on the configuration. Loopback destination or source addresses need to be changed before packets are delivered to the guest or target namespace: most operating systems would drop packets received with loopback addresses on non-loopback interfaces, and it would also be impossible for guest or target namespace to route answers back.
For convenience, the source address on these packets is translated to the address specified by the --map-host-loopback option (with some exceptions in pasta mode, see next section below). If not specified, this defaults, somewhat arbitrarily, to the address of the default IPv4 or IPv6 gateway (if any) -- this is known to be an existing, valid address on the same subnet. If --no-map-gw or --map-host-loopback none is specified, this translation is disabled and packets with loopback addresses are simply dropped.
Loopback destination addresses are translated to the observed external address of the guest or target namespace. For IPv6, the observed link-local address is used if the translated source address is link-local, otherwise the observed global address is used. For both IPv4 and IPv6, if no addresses have been seen yet, the configured addresses will be used instead.
For example, if passt or pasta receive a connection from 127.0.0.1, with destination 127.0.0.10, and the default IPv4 gateway is 192.0.2.1, while the last observed source address from guest or namespace is 192.0.2.2, this will be translated to a connection from 192.0.2.1 to 192.0.2.2.
Similarly, for traffic coming from guest or namespace, packets with destination address corresponding to the --map-host-loopback address will have their destination address translated to a loopback address.
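The translation rules above can be mimicked with a small shell sketch for the IPv4 case. It is only a model of the described behaviour, not code taken from passt: MAP_ADDR stands in for the --map-host-loopback address (assumed here to be the default gateway), GUEST_ADDR for the last observed guest or namespace address, and the reverse mapping is assumed to target 127.0.0.1. The function names are hypothetical.

```shell
#!/bin/sh
# Illustrative model of the loopback address translation (IPv4 only).
MAP_ADDR="192.0.2.1"    # assumed --map-host-loopback address (default gateway)
GUEST_ADDR="192.0.2.2"  # assumed last observed guest/namespace address

# Succeeds if the given IPv4 address lies in 127.0.0.0/8.
is_loopback4() {
    case "$1" in
    127.*) return 0 ;;
    *) return 1 ;;
    esac
}

# Host to guest: print "SRC DST" as the guest or namespace would see them.
to_guest() {
    src="$1"
    dst="$2"
    if is_loopback4 "$src"; then src="$MAP_ADDR"; fi
    if is_loopback4 "$dst"; then dst="$GUEST_ADDR"; fi
    echo "$src $dst"
}

# Guest to host: print the destination address used on the host side.
# Mapping to 127.0.0.1 specifically is an assumption of this sketch.
to_host_dst() {
    if [ "$1" = "$MAP_ADDR" ]; then
        echo "127.0.0.1"
    else
        echo "$1"
    fi
}

to_guest 127.0.0.1 127.0.0.10   # the example connection from the text
to_host_dst 192.0.2.1
```

With the values from the example, the first call prints the pair 192.0.2.1 192.0.2.2. The sketch does not model the case where translation is disabled and such packets are dropped.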
Depending on the configuration, pasta can bind to local ports in the init namespace, in the target namespace, or both, and forward connections and packets to corresponding ports in the other namespace.
To avoid unnecessary overhead, these connections and packets are not forwarded through the tap device connecting the namespaces: pasta creates a socket in the destination namespace, with matching Layer-4 protocol, and uses it to forward local data. For TCP, data is forwarded between the originating socket and the new socket using the splice(2) system call, and for UDP, a pair of recvmmsg(2) and sendmmsg(2) system calls deals with packet transfers.
Because it's not possible to bind sockets to foreign addresses, this bypass only applies to local connections and traffic. It also means that the address translation differs slightly from passt mode. Connections from loopback to loopback on the host will appear to come from the target namespace's public address within the guest, unless --host-lo-to-ns-lo is specified, in which case they will appear to come from loopback in the namespace as well. The latter behaviour used to be the default, but is usually undesirable, since it can unintentionally expose namespace local services to the host.
If the port forwarding configuration requires binding to ports with numbers lower than 1024, passt and pasta will try to bind to them, but will fail, unless, either:
sysctl -w net.ipv4.ip_unprivileged_port_start=443
Note: this is the recommended way of enabling passt and pasta to bind to ports with numbers below 1024.
This is, in general, not the recommended way, because passt and pasta might be used as a vector to effectively use this capability from another process.
However, if your environment is sufficiently controlled by an LSM (Linux Security Module) such as AppArmor, SELinux, Smack or TOMOYO, and no other processes can interact in such a way by virtue of this, granting this capability to passt and pasta only can effectively prevent other processes from utilising it.
Note that this will not work for automatic detection and forwarding of ports with pasta, because pasta will relinquish this capability at runtime.
To grant this capability, you can issue, as root:
for p in $(which passt passt.avx2); do
    setcap 'cap_net_bind_service=+ep' "${p}"
done
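Before choosing between the two approaches, it can help to check whether a given port already falls at or above the unprivileged threshold. The sketch below reads net.ipv4.ip_unprivileged_port_start from procfs; the function name can_bind_unprivileged is hypothetical, and the fallback value of 1024 for kernels that lack the sysctl is an assumption of this example.

```shell
#!/bin/sh
# Succeeds if PORT can be bound without privileges, given the threshold in
# net.ipv4.ip_unprivileged_port_start. An optional second argument overrides
# the value read from procfs, which is convenient for dry runs.
# can_bind_unprivileged is a hypothetical helper, not part of passt.
can_bind_unprivileged() {
    port="$1"
    start="${2:-$(cat /proc/sys/net/ipv4/ip_unprivileged_port_start 2>/dev/null || echo 1024)}"
    [ "$port" -ge "$start" ]
}

# Example: would forwarding to host port 443 need extra privileges here?
if can_bind_unprivileged 443; then
    echo "port 443: no extra privileges needed"
else
    echo "port 443: lower the sysctl threshold or grant CAP_NET_BIND_SERVICE"
fi
```

This only inspects the sysctl: it does not detect a capability already granted with setcap, and it does not tell whether the port is free.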
ICMP and ICMPv6 Echo requests coming from guest or target namespace are handled using so-called "ping" sockets, introduced in Linux 2.6.30. To preserve the original identifier (see RFC 792, page 14, for ICMP, and RFC 4443, section 4.1, for ICMPv6), passt and pasta try to bind these sockets using the observed source identifier as "port" -- that corresponds to Echo identifiers for "ping" sockets.
As bind(2) failures were seen with particularly restrictive SELinux policies, a fall-back mechanism maps different identifiers to different sockets, and identifiers in replies will be mapped back to the original identifier of the request. However, if bind(2) fails and the fall-back mechanism is used, echo requests will be forwarded with different, albeit unique, identifiers.
For ICMP and ICMPv6 Echo requests to work, the ping_group_range parameter needs to include the group ID (GID) of the passt or pasta process, see icmp(7).
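As described in icmp(7), ping_group_range holds a range of group IDs allowed to create "ping" sockets. A minimal sketch of a check against that range follows; the function name gid_in_ping_range is hypothetical, and the optional second argument exists only to make the check testable without touching the sysctl.

```shell
#!/bin/sh
# Succeeds if GID lies within the "LOW HIGH" range of
# net.ipv4.ping_group_range. An optional second argument replaces the
# value read from procfs.
# gid_in_ping_range is a hypothetical helper, not part of passt.
gid_in_ping_range() {
    gid="$1"
    range="${2:-$(cat /proc/sys/net/ipv4/ping_group_range 2>/dev/null)}"
    # Word splitting is intentional: the file holds two integers.
    set -- $range
    [ "$#" -eq 2 ] && [ "$gid" -ge "$1" ] && [ "$gid" -le "$2" ]
}

# Example: check the primary group of the current user.
if gid_in_ping_range "$(id -g)"; then
    echo "ping sockets allowed for GID $(id -g)"
else
    echo "ping sockets not allowed for GID $(id -g), see icmp(7)"
fi
```

A range whose lower bound exceeds its upper bound, such as "1 0", matches no group at all.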
As pasta connects to an existing namespace, or once it creates a new namespace, it will also ensure that the loopback interface, lo, is brought up. This is needed to bind ports using the loopback address in the namespace.
To synchronise the TCP sending window from host Layer-4 sockets to the TCP parameters announced in TCP segments sent over the Layer-2 interface, passt and pasta routinely query the size of the sending window seen by the kernel on the corresponding socket using the TCP_INFO socket option, see tcp(7). Before Linux 5.3, i.e. before Linux kernel commit 8f7baad7f035 ("tcp: Add snd_wnd to TCP_INFO"), the sending window (snd_wnd field) is not available.
If the sending window cannot be queried, it will always be announced as the current sending buffer size to guest or target namespace. This might affect throughput of TCP connections.
Currently, IGMP/MLD proxying (RFC 4605) and support for SCTP (RFC 4960) are not implemented.
TCP Selective Acknowledgment (RFC 2018), as well as Protection Against Wrapped Sequences (PAWS) and Round-Trip Time Measurement (RTTM), both described by RFC 7323, are currently not implemented.
Stefano Brivio <sbrivio@redhat.com>, David Gibson <david@gibson.dropbear.id.au>.
Please report issues on the bug tracker at https://passt.top/passt/bugs, or send a message to the passt-user@passt.top mailing list, see https://passt.top/passt/lists.
Copyright (c) 2020-2022 Red Hat GmbH.
passt and pasta are free software: you can redistribute them and/or modify them under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) any later version.
namespaces(7), qemu(1), qrap(1), slirp4netns(1).
High-level documentation is available at https://passt.top/passt/about/.