author Stefano Brivio <sbrivio@redhat.com> 2021-09-26 19:31:37 +0200
committer Stefano Brivio <sbrivio@redhat.com> 2021-09-27 01:28:02 +0200
commit cc8db1c5bcca6d84e7bd3ba7e0a5108b62937578
tree 3830b0c99106b988e2e4a6a6fbdd5571fe65ff9c /README.md
parent b216df04a1d6c57d5c92b61af29081af18f83311
README: pasta mode, CI, performance, updated links, etc.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Diffstat (limited to 'README.md')
-rw-r--r-- README.md 251
1 files changed, 185 insertions, 66 deletions
diff --git a/README.md b/README.md
index 14c89b2..85b2e90 100644
--- a/README.md
+++ b/README.md
@@ -1,11 +1,13 @@
+<span style="font-weight: bold; color: red;">While functional and tested to some extent, this project is still in early development phase: don't use in production or critical environments yet.</span>
+
# passt: Plug A Simple Socket Transport
-_passt_ implements a translation layer between a Layer-2 network interface (tap)
-and native Layer-4 sockets (TCP, UDP, ICMP/ICMPv6 echo) on a host. It doesn't
+_passt_ implements a translation layer between a Layer-2 network interface and
+native Layer-4 sockets (TCP, UDP, ICMP/ICMPv6 echo) on a host. It doesn't
require any capabilities or privileges, and it can be used as a simple
replacement for Slirp.
-<img src="/builds/passt_overview.png" usemap="#image-map" class="bright" style="z-index: 20; position: relative;">
+<img src="/builds/latest/web/passt_overview.png" usemap="#image-map" class="bright" style="z-index: 20; position: relative;">
<map name="image-map" id="map_overview">
<area class="map_area" target="_blank" href="https://man7.org/linux/man-pages/man7/tcp.7.html" coords="229,275,246,320,306,294,287,249" shape="poly">
<area class="map_area" target="_blank" href="https://lwn.net/Articles/420799/" coords="230,201,243,246,297,232,289,186" shape="poly">
@@ -35,7 +37,7 @@ replacement for Slirp.
<area class="map_area" target="_blank" href="https://man7.org/linux/man-pages/man4/veth.4.html" coords="1044,471,1090,461,1126,462,1150,464,1176,479,1160,491,1121,500,1081,501,1044,491,1037,483" shape="poly">
<area class="map_area" target="_blank" href="https://man7.org/linux/man-pages/man7/network_namespaces.7.html" coords="240,379,524,452" shape="rect">
<area class="map_area" target="_blank" href="https://man7.org/linux/man-pages/man7/netlink.7.html" coords="1119,278,1117,293,1165,304,1169,288" shape="poly">
- <area class="map_area" target="_blank" href="https://passt.top/passt/tree/passt.c#n195" coords="989,294,1040,264,1089,280,986,344" shape="poly">
+ <area class="map_area" target="_blank" href="https://passt.top/passt/tree/conf.c" coords="989,294,1040,264,1089,280,986,344" shape="poly">
</map>
<canvas id="map_highlight" style="border: 0px; z-index: 10; position: fixed; pointer-events: none"></canvas>
<script>
@@ -92,17 +94,35 @@ for (var i = 0; i < map_areas.length; i++) {
}
</script>
-- [General idea](#general-idea)
+# pasta: Pack A Subtle Tap Abstraction
+
+_pasta_ (same binary as _passt_, different command) offers equivalent
+functionality, for network namespaces: traffic is forwarded using a tap
+interface inside the namespace, without the need to create further interfaces on
+the host, hence not requiring any capabilities or privileges.
+
+It also implements a tap bypass path for local connections: packets with a local
+destination address are moved directly between Layer-4 sockets, avoiding Layer-2
+translations, using the _splice_(2) and _recvmmsg_(2)/_sendmmsg_(2) system calls
+for TCP and UDP, respectively.
+
+<img src="/builds/latest/web/pasta_overview.png" class="bright" style="z-index: 20; position: relative;">
+
+- [Motivation](#motivation)
- [Non-functional Targets](#non-functional-targets)
- [Interfaces and Environment](#interfaces-and-environment)
- [Services](#services)
- [Addresses](#addresses)
- [Protocols](#protocols)
- [Ports](#ports)
+- [Continuous Integration](#continuous-integration)
+- [Performance](#performance)
- [Try it](#try-it)
- [Contribute](#contribute)
-## General idea
+## Motivation
+
+### passt
When container workloads are moved to virtual machines, the network traffic is
typically forwarded by interfaces operating at data link level. Some components
@@ -110,19 +130,17 @@ in the containers ecosystem (such as _service meshes_), however, expect
applications to run locally, with visible sockets and processes, for the
purposes of socket redirection, monitoring, port mapping.
-To solve this issue, user mode networking as provided e.g. by _Slirp_,
-_libslirp_, _slirp4netns_ can be used. However, these existing solutions
-implement a full TCP/IP stack, replaying traffic on sockets that are local to
-the pod of the service mesh. This creates the illusion of application processes
-running on the same host, eventually separated by user namespaces.
+To solve this issue, user mode networking, as provided e.g. by _libslirp_,
+can be used. Existing solutions implement a full TCP/IP stack, replaying traffic
+on sockets that are local to the pod of the service mesh. This creates the
+illusion of application processes running on the same host, possibly separated
+by user namespaces.
While being almost transparent to the service mesh infrastructure, that kind of
solution comes with a number of downsides:
* three different TCP/IP stacks (guest, adaptation and host) need to be
- traversed for every service request. There are no chances to implement
- zero-copy mechanisms, and the amount of context switches increases
- dramatically
+ traversed for every service request
* addressing needs to be coordinated to create the pretense of consistent
addresses and routes between guest and host environments. This typically needs
a NAT with masquerading, or some form of packet bridging
@@ -135,21 +153,43 @@ solution comes with a number of downsides:
would if deployed with regular containers
_passt_ implements a thinner layer between guest and host, that only implements
-what's strictly needed to pretend processes are running locally. A further, full
-TCP/IP stack is not necessarily needed. Some sort of TCP adaptation is needed,
-however, as this layer runs without the `CAP_NET_RAW` capability: we can't
-create raw IP sockets on the pod, and therefore need to map packets at Layer-2
-to Layer-4 sockets offered by the host kernel.
+what's strictly needed to pretend processes are running locally. The TCP
+adaptation doesn't keep per-connection packet buffers, and reflects observed
+sending windows and acknowledgements between the two sides. This TCP adaptation
+is needed as _passt_ runs without the `CAP_NET_RAW` capability: it can't create
+raw IP sockets on the pod, and therefore needs to map packets at Layer-2 to
+Layer-4 sockets offered by the host kernel.
The problem and this approach are illustrated in more detail, with diagrams,
[here](https://gitlab.com/abologna/kubevirt-and-kvm/-/blob/master/Networking.md).
+### pasta
+
+On Linux, regular users can create network namespaces and run application
+services inside them. However, connecting namespaces to other namespaces and to
+external hosts requires the creation of network interfaces, such as `veth`
+pairs, which in turn needs elevated privileges or the `CAP_NET_ADMIN`
+capability. _pasta_, similarly to _slirp4netns_, solves this problem by creating
+a tap interface available to processes in the namespace, and mapping network
+traffic outside the namespace using native Layer-4 sockets.
+
+Existing approaches typically implement a full, generic TCP/IP stack for this
+translation between data and transport layers, without the possibility of
+speeding up local connections, and usually requiring NAT. _pasta_:
+* avoids the need for a generic, full-fledged TCP/IP stack by coordinating TCP
+connection dynamics between sender and receiver
+* offers a fast bypass path for local connections: if a process connects to
+another process on the same host across namespaces, data is directly forwarded
+using pairs of Layer-4 sockets
+* with default options, maps routing and addressing information to the
+namespace, avoiding any need for NAT
+
## Non-functional Targets
Security and maintainability goals:
* no dynamic memory allocation
-* ~2 000 LoC target
+* ~5 000 LoC target
* no external dependencies
## Interfaces and Environment
@@ -166,83 +206,125 @@ TCP. Two temporary solutions are available:
This approach, compared to using a _tap_ device, doesn't require any security
capabilities, as we don't need to create any interface.
+_pasta_ runs out of the box with any recent (post-3.8) Linux kernel.
+
## Services
-_passt_ provides some minimalistic implementations of networking services that
-can't practically run on the host:
+_passt_ and _pasta_ provide some minimalistic implementations of networking
+services:
* [ARP proxy](https://passt.top/passt/tree/arp.c), that resolves the address of
the host (which is used as gateway) to the original MAC address of the host
* [DHCP server](https://passt.top/passt/tree/dhcp.c), a simple implementation
- handing out one single IPv4 address to the guest, namely, the same address as
- the first one configured for the upstream host interface, and passing the
- nameservers configured on the host
+ handing out one single IPv4 address to the guest or namespace, namely, the
+ same address as the first one configured for the upstream host interface, and
+ passing the nameservers configured on the host
* [NDP proxy](https://passt.top/passt/tree/ndp.c), which can also assign prefix
and nameserver using SLAAC
* [DHCPv6 server](https://passt.top/passt/tree/dhcpv6.c): a simple
- implementation handing out one single IPv6 address to the guest, namely, the
- the same address as the first one configured for the upstream host interface,
- and passing the first nameserver configured on the host
+ implementation handing out one single IPv6 address to the guest or namespace,
+ namely, the same address as the first one configured for the upstream host
+ interface, and passing the nameservers configured on the host
## Addresses
-For IPv4, the guest is assigned, via DHCP, the same address as the upstream
-interface of the host, and the same default gateway as the default gateway of
-the host. Addresses are translated in case the guest is seen using a different
-address from the assigned one.
+For IPv4, the guest or namespace is assigned, via DHCP, the same address as the
+upstream interface of the host, and the same default gateway as the default
+gateway of the host. Addresses are translated in case the guest is seen using a
+different address from the assigned one.
+
+For IPv6, the guest or namespace is assigned, via SLAAC, the same prefix as the
+upstream interface of the host, the same default route as the default route of
+the host, and, if a DHCPv6 client is running in the guest or namespace, also the
+same address as the upstream address of the host. This means that, with a DHCPv6
+client in the guest or namespace, addresses don't need to be translated. Should
+the client use a different address, the destination address is translated for
+packets going to the guest or to the namespace.
-For IPv6, the guest is assigned, via SLAAC, the same prefix as the upstream
-interface of the host, the same default route as the default route of the
-host, and, if a DHCPv6 client is running on the guest, also the same address as
-the upstream address of the host. This means that, with a DHCPv6 client on the
-guest, addresses don't need to be translated. Should the client use a different
-address, the destination address is translated for packets going to the guest.
+### Local connections with _passt_
-For UDP and TCP, for both IPv4 and IPv6, packets addressed to a loopback address
-are forwarded to the guest with their source address changed to the address of
-the gateway or first hop of the default route. This mapping is reversed as the
-guest replies to those packets (on the same TCP connection, or using destination
-port and address that were used as source for UDP).
+For UDP and TCP, for both IPv4 and IPv6, packets from the host addressed to a
+loopback address are forwarded to the guest with their source address changed to
+the address of the gateway or first hop of the default route. This mapping is
+reversed in the other direction.
+
+### Local connections with _pasta_
+
+Packets addressed to a loopback address in either namespace are directly
+forwarded to the corresponding (or configured) port in the other namespace.
+Similarly to _passt_, packets from the non-init namespace addressed to the
+default gateway, which are therefore sent via the tap device, will have their
+destination address translated to the loopback address.
## Protocols
-_passt_ supports TCP, UDP and ICMP/ICMPv6 echo (requests and replies). More
-details about the TCP implementation are available
+_passt_ and _pasta_ support TCP, UDP and ICMP/ICMPv6 echo (requests and
+replies). More details about the TCP implementation are available
[here](https://passt.top/passt/tree/tcp.c), and for the UDP
implementation [here](https://passt.top/passt/tree/udp.c).
-An IGMP proxy is currently work in progress.
+An IGMP/MLD proxy is currently a work in progress.
## Ports
-To avoid the need for explicit port mapping configuration, _passt_ binds to all
-unbound non-ephemeral (0-49152) TCP and UDP ports. Binding to low ports (0-1023)
-will fail without additional capabilities, and ports already bound (service
-proxies, etc.) will also not be used.
+### passt
+
+To avoid the need for explicit port mapping configuration, _passt_ can bind to
+all unbound non-ephemeral (0-49152) TCP and UDP ports. Binding to low ports
+(0-1023) will fail without additional capabilities, and ports already bound
+(service proxies, etc.) will also not be used. Smaller subsets of ports, with
+port translations, are also configurable.
UDP ephemeral ports are bound dynamically, as the guest uses them.
-Service proxies and other services running in the container need to be started
-before _passt_ starts.
+If all ports are forwarded, service proxies and other services running in the
+container need to be started before _passt_ starts.
+
+### pasta
+
+With default options, _pasta_ scans for bound ports on init and non-init
+namespaces, and automatically forwards them from the other side. Port forwarding
+is fully configurable with command line options.
+
+## Continuous Integration
+
+<p><video id="ci_video" style="width: 90%; height: auto; max-height: 90%" controls>
+ <source src="/builds/latest/web/ci.webm" type="video/webm">
+</video></p>
+
+<script src="/builds/latest/web/ci.js"></script>
+
+Test logs [here](https://passt.top/builds/latest/test/).
+
+## Performance
+
+<script src="/builds/latest/web/perf.js"></script>
## Try it
+### passt
+
* build from source:
git clone https://passt.top/passt
cd passt
make
- * to make _passt_ not fork into background when it starts, and to get verbose
- debug information, build with:
+ * alternatively, static builds for x86_64, with or without AVX2 instructions,
+ as of the latest commit are also available for convenience
+ [here](https://passt.top/builds/latest/x86_64/avx2/) and
+ [here](https://passt.top/builds/latest/x86_64/). Convenience, non-official
+ packages for Debian (and derivatives) and RPM-based distributions are also
+ available there. These binaries and packages are simply built with:
- CFLAGS="-DDEBUG" make
+ CFLAGS="-static" make avx2
+ make pkgs
+ make static
+ make pkgs
-* a static build for x86_64 as of the latest commit is also available for
- convenience [here](https://passt.top/builds/static/). These binaries are
- simply built with:
+* have a look at the _man_ page for synopsis and options:
- CFLAGS="-static" make
+ man ./passt.1
* run the demo script, which creates a network namespace called `passt`,
 sets up a _veth_ pair and addresses, together with NAT for IPv4 and NDP
@@ -283,14 +365,51 @@ before _passt_ starts.
ssh 192.0.2.2
-## Contribute
+### pasta
-Send patches and issue reports to [sbrivio@redhat.com](mailto:sbrivio@redhat.com).
+* build from source:
-<p><video id="ci_video" style="width: 90%; height: auto; max-height: 90%" controls>
- <source src="/builds/ci.mp4" type="video/mp4">
-</video></p>
+ git clone https://passt.top/passt
+ cd passt
+ make
+
+ * alternatively, static builds for x86_64, with or without AVX2 instructions,
+ as of the latest commit are also available for convenience
+ [here](https://passt.top/builds/latest/x86_64/avx2/) and
+ [here](https://passt.top/builds/latest/x86_64/). Convenience, non-official
+ packages for Debian (and derivatives) and RPM-based distributions are also
+ available there. These binaries and packages are simply built with:
+
+ CFLAGS="-static" make avx2
+ make pkgs
+ make static
+ make pkgs
+
+* have a look at the _man_ page for synopsis and options:
+
+ man ./pasta.1
-<script src="/builds/perf.js"></script>
+* start pasta with:
+
+ ./pasta
+
+* you're now inside a new user and network namespace. For IPv6, SLAAC happens
+ right away as _pasta_ sets up the interface, but DHCPv6 support is available
+ as well. For IPv4, configure the interface with a DHCP client:
+
+ dhclient
+
+ and, optionally:
+
+ dhclient -6
+
+* and that's it, you should now have TCP connections, UDP, and ICMP/ICMPv6
+ echo working from/to the namespace for IPv4 and IPv6
+
+* to connect to a service inside the namespace, just connect to the same port
+ using the loopback address.
+
+## Contribute
-<script src="/builds/video_links.js"></script>
+Public bug tracker and mailing lists are coming soon. For the time being, send
+patches and issue reports to [sbrivio@redhat.com](mailto:sbrivio@redhat.com).
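As an illustration of the behaviour described under "Ports" for _passt_ (ports that are already bound, or that need extra capabilities, are simply not used), here is a minimal sketch in C. It is not taken from the passt sources: the function name `bind_port_range()`, its parameters, and the choice of the IPv4 loopback address are assumptions made for this example only. It uses a caller-provided array, in keeping with the "no dynamic memory allocation" target.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, not part of passt: try to bind and listen on one TCP
 * socket per port in [first, last] on the IPv4 loopback address. Ports that
 * cannot be bound (already in use, privileged, ...) are skipped instead of
 * being treated as fatal. Bound sockets are stored in fds[], which must have
 * room for (last - first + 1) entries; the caller owns and closes them.
 * Returns the number of sockets stored in fds[].
 */
int bind_port_range(uint16_t first, uint16_t last, int *fds, int *skipped)
{
	int n = 0;

	assert(first <= last);
	*skipped = 0;

	/* 32-bit counter so that last == 65535 doesn't wrap around */
	for (uint32_t port = first; port <= last; port++) {
		struct sockaddr_in addr;
		int s = socket(AF_INET, SOCK_STREAM, 0);

		if (s < 0)
			break;	/* no descriptors left: keep what we have */

		memset(&addr, 0, sizeof(addr));
		addr.sin_family = AF_INET;
		addr.sin_port = htons((uint16_t)port);
		addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

		if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
		    listen(s, 1) < 0) {
			/* e.g. EADDRINUSE or EACCES: leave this port alone */
			close(s);
			(*skipped)++;
			continue;
		}

		fds[n++] = s;
	}

	return n;
}
```

A caller would pass an array sized for the whole range and treat the `skipped` count as informational: a service proxy that already owns a port keeps it, which is why such services need to be started first when all ports are forwarded.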