passt - Plug A Simple Socket Transport

	Commit message	Author	Age	Files	Lines
...
*	hooks: Copy .webp diagram versions too	Stefano Brivio	2022-03-30	1	-0/+1
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	README: Drop red notice about early development phase	Stefano Brivio	2022-03-30	1	-3/+1
	Famous last words: it should be tested enough by now. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	contrib: Add example of Debian package files	Stefano Brivio	2022-03-30	7	-1/+66
	...using dh_apparmor to ship and apply AppArmor profiles. Tried on current Debian testing (Bookworm, 12). Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	contrib: Add example spec file for Fedora	Stefano Brivio	2022-03-30	1	-0/+95
	...with SELinux package, too. Tested on Fedora 35, but it should work on pretty much any version. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	tap: Re-read from tap in tap_handler_pasta() on buffer full	Stefano Brivio	2022-03-30	1	-2/+9
	read() will return zero if we pass a zero length, which makes no sense: instead, track explicitly that we exhausted the buffer, flush packets to handlers and redo. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	tap: Allow ioctl() and openat() for tap_ns_tun() re-initialisation	Stefano Brivio	2022-03-30	2	-1/+3
	If the tun interface disappears, we'll call tap_ns_tun() after the seccomp profile is applied: add ioctl() and openat() to it. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	tap, tcp, udp, icmp: Cut down on some oversized buffers	Stefano Brivio	2022-03-29	6	-31/+72
	The existing sizes provide no measurable differences in throughput and packet rates at this point. They were probably needed as batched implementations were not complete, but they can be decreased quite a bit now. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	passt, pasta: Add examples of SELinux policy modules	Stefano Brivio	2022-03-29	7	-0/+364
	These should cover any reasonably common use case in distributions. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	passt, pasta: Add examples of AppArmor policies	Stefano Brivio	2022-03-29	2	-0/+125
	These should cover any reasonably common use case in distributions. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	tcp: Fix warning by gcc 5.4 on ppc64le about comparison in CONN_OR_NULL()	Stefano Brivio	2022-03-29	1	-13/+13
	...we don't really need two extra bits, but it's easier to organise things differently than to silence this. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	passt: Accurate error reporting for sandbox()	Stefano Brivio	2022-03-29	1	-10/+26
	It's actually quite easy to make it fail depending on the environment: report errors accurately here. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	Makefile: Allow implicit test for bugprone-suspicious-string-compare checker	Stefano Brivio	2022-03-29	1	-4/+1
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	treewide: Fix android-cloexec-* clang-tidy warnings, re-enable checks	Stefano Brivio	2022-03-29	8	-31/+30
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp: Move flags before ts in struct udp_tap_port, avoid end padding	Stefano Brivio	2022-03-29	1	-3/+3
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	treewide: Mark constant references as const	Stefano Brivio	2022-03-29	29	-168/+192
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	treewide: Add include guards	Stefano Brivio	2022-03-29	15	-0/+75
	...at the moment, just for consistency with packet.h, icmp.h, tcp.h and udp.h. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	treewide: Packet abstraction with mandatory boundary checks	Stefano Brivio	2022-03-29	23	-700/+999
	Implement a packet abstraction providing boundary and size checks based on packet descriptors: packets stored in a buffer can be queued into a pool (without storage of its own), and data can be retrieved referring to an index in the pool, specifying offset and length. Checks ensure data is not read outside the boundaries of buffer and descriptors, and that packets added to a pool are within the buffer range with valid offset and indices. This implies a wider rework: usage of the "queueing" part of the abstraction mostly affects tap_handler_{passt,pasta}() functions and their callees, while the "fetching" part affects all the guest or tap facing implementations: TCP, UDP, ICMP, ARP, NDP, DHCP and DHCPv6 handlers. Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
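The pool-of-descriptors idea described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from packet.h: the names `pkt_desc`, `pkt_pool`, `pool_add()`, `pool_get()` and the `POOL_SLOTS` capacity are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SLOTS	8	/* hypothetical pool capacity */

/* Descriptor: where one packet sits inside the shared buffer */
struct pkt_desc {
	size_t offset;
	size_t len;
};

/* Pool: descriptors only, the packet bytes stay in 'buf' */
struct pkt_pool {
	const char *buf;
	size_t buf_size;
	size_t count;
	struct pkt_desc pkt[POOL_SLOTS];
};

/* Queue a packet, refusing anything not entirely inside the buffer */
static int pool_add(struct pkt_pool *p, const char *start, size_t len)
{
	uintptr_t base, s = (uintptr_t)start;

	assert(p && p->buf);
	base = (uintptr_t)p->buf;

	if (p->count >= POOL_SLOTS)
		return -1;
	if (s < base || len > p->buf_size || s - base > p->buf_size - len)
		return -1;

	p->pkt[p->count].offset = s - base;
	p->pkt[p->count].len = len;
	p->count++;
	return 0;
}

/* Fetch 'len' bytes at 'offset' in packet 'idx', or NULL if out of bounds */
static const void *pool_get(const struct pkt_pool *p, size_t idx,
			    size_t offset, size_t len)
{
	assert(p);

	if (idx >= p->count)
		return NULL;
	if (offset > p->pkt[idx].len || len > p->pkt[idx].len - offset)
		return NULL;

	return p->buf + p->pkt[idx].offset + offset;
}
```

Callers get a pointer only after both the index and the (offset, length) pair were validated, so a handler cannot read past the end of a packet by accident. The subtractions are written so that the checks themselves cannot overflow.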
*	util: Fix function declaration style of write_pidfile()	Stefano Brivio	2022-03-29	1	-1/+2
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	tcp, tcp_splice: Use less awkward syntax to swap in/out sockets from pools	Stefano Brivio	2022-03-29	2	-12/+10
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	dhcp: Minimum option length implied by RFC 951 is 60 bytes, not 62	Stefano Brivio	2022-03-29	1	-3/+5
	In section 3 ("Packet Format"), "vend" is 64 bytes long: minus the 4-byte magic, that's 60 bytes, not 62. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
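The arithmetic behind the 60-byte figure can be written down as a compile-time check. The macro names and the `opt_pad()` helper below are hypothetical and only illustrate one way such a minimum might be applied; they are not taken from dhcp.c.

```c
#include <assert.h>
#include <stddef.h>

#define BOOTP_VEND_LEN	64	/* "vend" field size, RFC 951 section 3 */
#define DHCP_MAGIC_LEN	4	/* magic cookie at the start of the vend area */
#define OPT_MIN		(BOOTP_VEND_LEN - DHCP_MAGIC_LEN)

static_assert(OPT_MIN == 60, "minimum option length is 60 bytes, not 62");

/* Bytes of padding needed to bring an options area up to the minimum */
static size_t opt_pad(size_t opt_len)
{
	return opt_len < OPT_MIN ? OPT_MIN - opt_len : 0;
}
```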
*	tcp: Fit struct tcp_conn into a single 64-byte cacheline	Stefano Brivio	2022-03-29	2	-137/+166
	...by:
	- storing the chained-hash next connection pointer as numeric reference rather than as pointer
	- storing the MSS as 14-bit value, and rounding it
	- using only the effective amount of bits needed to store the hash bucket number
	- explicitly limiting window scaling factors to 4-bit values (maximum factor is 14, from RFC 7323)
	- scaling SO_SNDBUF values, and using an 8-bit representation for the duplicate ACK sequence
	- keeping window values unscaled, as received and sent
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
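A few of the techniques listed here can be sketched with C bit-fields and a compile-time size check. This is an illustration under assumptions, not the actual struct tcp_conn: the struct name, field names, the 20-bit index width and the helper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MSS_BITS	14
#define MSS_SHIFT	(16 - MSS_BITS)	/* low bits dropped when storing */
#define WS_MAX		14		/* maximum shift count, RFC 7323 */

struct conn_sketch {
	unsigned int next_index	:20;	/* table index instead of a pointer */
	unsigned int ws_from_tap :4;	/* window scaling factors fit 4 bits */
	unsigned int ws_to_tap	:4;
	unsigned int mss	:MSS_BITS;	/* MSS, rounded down */
	int sock;
};

/* Fail the build, not the run, if the struct outgrows one cacheline */
static_assert(sizeof(struct conn_sketch) <= 64, "must fit one cacheline");

/* Store a 16-bit MSS in 14 bits by dropping the two low bits */
static void conn_set_mss(struct conn_sketch *c, uint16_t mss)
{
	c->mss = mss >> MSS_SHIFT;
}

/* Recover the MSS, rounded down to a multiple of 4 */
static uint16_t conn_get_mss(const struct conn_sketch *c)
{
	return (uint16_t)(c->mss << MSS_SHIFT);
}

/* Clamp a window scaling factor to the RFC 7323 maximum */
static uint8_t ws_clamp(uint8_t ws)
{
	return ws > WS_MAX ? WS_MAX : ws;
}
```

Rounding the MSS down is the safe direction: the stored value never exceeds what the peer advertised.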
*	README: Update Interfaces and Availability sections	Stefano Brivio	2022-03-29	1	-4/+9
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	README: Avoid "here" links	Stefano Brivio	2022-03-29	1	-20/+19
	They look a bit lame: rephrase sentences to avoid them. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	test/perf: Work-around for virtio_net hang before long streams from guest	Stefano Brivio	2022-03-29	2	-0/+30
	I didn't have time to investigate the root cause for the virtio_net TX hang yet. Add a quick work-around for the time being. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	tcp_splice: Close sockets right away on high number of open files	Stefano Brivio	2022-03-29	5	-7/+27
	We can't take for granted that the hard limit for open files is big enough to allow deferring socket closing to a timer. Store the value of RLIMIT_NOFILE we set at start, and use it to tell whether we're approaching the limit with pending, spliced TCP connections. If that's the case, close sockets right away as soon as they're not needed, instead of deferring this task to a timer. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
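The decision described here boils down to comparing a count of open descriptors against the stored limit. A minimal sketch follows, with assumptions stated plainly: the helper names are hypothetical, the limit is only read here (the commit stores the value it set), and the "more than half the limit" threshold is an arbitrary choice for illustration, not the one used in tcp_splice.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/resource.h>

/* Read the current soft limit for open files once, at start-up */
static int nofile_limit_get(rlim_t *limit)
{
	struct rlimit l;

	assert(limit);

	if (getrlimit(RLIMIT_NOFILE, &l))
		return -1;

	*limit = l.rlim_cur;
	return 0;
}

/* Assumed policy: past half of the limit, stop deferring close() */
static bool close_right_away(rlim_t open_fds, rlim_t limit)
{
	return open_fds > limit / 2;
}
```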
*	tcp: Rework timers to use timerfd instead of periodic bitmap scan	Stefano Brivio	2022-03-29	5	-241/+288
	With a lot of concurrent connections, the bitmap scan approach is not really sustainable. Switch to per-connection timerfd timers, set based on events and on two new flags, ACK_FROM_TAP_DUE and ACK_TO_TAP_DUE. Timers are added to the common epoll list, and implement the existing timeouts. While at it, drop the CONN_ prefix from flag names, otherwise they get quite long, and fix the logic to decide if a connection has a local, possibly unreachable endpoint: we shouldn't go through the rest of tcp_conn_from_tap() if we reset the connection due to a successful bind(2), and we'll get EACCES if the port number is low. Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	tcp, udp, util: Enforce 24-bit limit on socket numbers	Stefano Brivio	2022-03-29	5	-1/+42
	This should never happen, but there are no formal guarantees: ensure socket numbers are below SOCKET_MAX. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
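A check of this kind could be as small as the sketch below. SOCKET_MAX is named in the commit message; its definition as `1 << 24`, the `SOCKET_REF_BITS` macro and the `sock_check()` helper are assumptions derived from the "24-bit limit" in the subject line.

```c
#include <assert.h>
#include <limits.h>
#include <unistd.h>

#define SOCKET_REF_BITS	24			/* assumed from the subject */
#define SOCKET_MAX	(1 << SOCKET_REF_BITS)

static_assert(sizeof(int) * CHAR_BIT > SOCKET_REF_BITS, "int too narrow");

/* Return the socket if it fits in 24 bits, otherwise close it and fail */
static int sock_check(int s)
{
	if (s < 0)
		return -1;

	if (s >= SOCKET_MAX) {
		close(s);
		return -1;
	}

	return s;
}
```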
*	test, seccomp, Makefile: Switch to valgrind runs for passt functional tests	Stefano Brivio	2022-03-29	8	-14/+99
	Pass to seccomp.sh a list of additional syscalls valgrind needs as EXTRA_SYSCALLS in a new 'valgrind' make target, and add corresponding support in seccomp.sh itself. In test setup functions, start passt with valgrind, but not for performance tests. Add tests checking that valgrind exits without errors after all the other tests in the group are done. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	test: Add asciinema(1) as requirement for CI in README	Stefano Brivio	2022-03-28	1	-1/+1
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	Makefile: Enable a few hardening flags	Stefano Brivio	2022-03-28	1	-2/+8
	They don't have a measurable performance impact and make things a bit safer. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp: Use flags for local, loopback, and configured unicast binds	Stefano Brivio	2022-03-28	1	-25/+23
	There's no value in keeping a separate timestamp for activity and for aging of local binds, given that they have the same timeout. Reduce that to a single timestamp, with a flag indicating the local bind. Also use flags instead of separate int fields for loopback and configured unicast address usage as source address. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
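The single-timestamp-plus-flags layout could be sketched as follows. The struct, flag and function names are hypothetical and the fields are a guess at the general shape; this does not reproduce struct udp_tap_port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define PORT_LOCAL	(1 << 0)	/* bound to a local address */
#define PORT_LOOPBACK	(1 << 1)	/* loopback used as source address */
#define PORT_UNICAST	(1 << 2)	/* configured unicast used as source */
#define PORT_ALL	(PORT_LOCAL | PORT_LOOPBACK | PORT_UNICAST)

static_assert(PORT_ALL <= UINT8_MAX, "flags must fit in one byte");

struct port_sketch {
	int sock;
	uint8_t flags;	/* replaces separate int fields */
	time_t ts;	/* one timestamp for activity and aging */
};

/* Record activity: set flags and refresh the shared timestamp */
static void port_seen(struct port_sketch *p, uint8_t flags, time_t now)
{
	p->flags |= flags;
	p->ts = now;
}

/* One timeout applies, whatever the flags say */
static bool port_expired(const struct port_sketch *p, time_t now,
			 time_t timeout)
{
	return now - p->ts > timeout;
}
```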
*	dhcpv6, tap, tcp: Use IN6_ARE_ADDR_EQUAL instead of open-coded memcmp()	Stefano Brivio	2022-03-28	3	-9/+9
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp: Split buffer queueing/writing parts of udp_sock_handler()	Stefano Brivio	2022-03-28	1	-171/+193
	...it became too hard to follow: split it off to udp_sock_fill_data_v{4,6}. While at it, use IN6_ARE_ADDR_EQUAL(a, b), courtesy of netinet/in.h, instead of open-coded memcmp(). Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
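For comparison, the two ways of testing IPv6 address equality mentioned here look like this side by side. The wrapper function names are hypothetical; IN6_ARE_ADDR_EQUAL() itself comes from netinet/in.h.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/* Open-coded variant: correct, but the intent is less obvious */
static int addr6_equal_memcmp(const struct in6_addr *a,
			      const struct in6_addr *b)
{
	assert(a && b);

	return !memcmp(a, b, sizeof(*a));
}

/* Same comparison using the standard macro */
static int addr6_equal_macro(const struct in6_addr *a,
			     const struct in6_addr *b)
{
	assert(a && b);

	return IN6_ARE_ADDR_EQUAL(a, b);
}
```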
*	udp: Drop _splice from recv, send, sendto static buffer names	Stefano Brivio	2022-03-28	1	-29/+23
	It's already implied by the fact they don't have "l2" in their names, and dropping it improves readability a bit. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	test/lib/video: Fill in href attributes of video shortcuts	Stefano Brivio	2022-03-28	1	-1/+1
	...so that they can be indexed. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	tcp: Refactor to use events instead of states, split out spliced implementation	Stefano Brivio	2022-03-28	7	-1501/+1755
	Using events and flags instead of states makes the implementation much more straightforward: actions are mostly centered on events that occurred on the connection rather than states. An example is given by the ESTABLISHED_SOCK_FIN_SENT and FIN_WAIT_1_SOCK_FIN abominations: we don't actually care about which side started closing the connection to handle closing of connection halves. Split out the spliced implementation, as it has very little in common with the "regular" TCP path. Refactor things here and there to improve clarity. Add helpers to trace where resets and flag settings come from. No functional changes intended. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	util: Use standard int types	Stefano Brivio	2022-03-25	1	-6/+6
	...instead of kernel-like short notations. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	util: Drop CHECK_SET_MIN_MAX{,_PROTO_FD} macros	Stefano Brivio	2022-03-25	1	-14/+0
	...those were used when epoll references used to be socket numbers: they should have gone away a long time ago. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	pcap: Fix mistake in printed string	Stefano Brivio	2022-03-25	1	-1/+1
	Packets are saved to a file, not at it. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	conf, util, tap: Implement --trace option for extra verbose logging	Stefano Brivio	2022-03-25	9	-4/+55
	--debug can be a bit too noisy, especially as single packets or socket messages are logged: implement a new option, --trace, implying --debug, that enables all debug messages. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	README: Make it somewhat readable on mobile devices	Stefano Brivio	2022-03-04	6	-35/+161
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	hooks, README: gzipped js snippets, webp alternatives for png	Stefano Brivio	2022-03-02	2	-4/+17
	Upload gzipped js snippets for usage with gzip_static in nginx or equivalent. Convert png drawings to webp for smaller size, use them as alternatives in README. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	test/lib/setup: Unshare PID namespace in pasta_setup()	Stefano Brivio	2022-03-02	1	-4/+8
	...otherwise, we'll leave processes (dhclient) around. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	README: Don't preload CI recording, show poster from end of run	Stefano Brivio	2022-03-01	1	-1/+1
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	README: s/guest/namespace/ in pasta "Try it" section	Stefano Brivio	2022-03-01	1	-1/+1
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	Makefile, hooks: Static target precondition for pkgs, copy .avx2 builds	Stefano Brivio	2022-03-01	3	-9/+7
	Convenience packages are built from static builds anyway. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	demo/pasta: Clean up before rebuilding with -g	Stefano Brivio	2022-03-01	1	-0/+1
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	arp, dhcp: Fix strict aliasing warnings reported by gcc 4.9 with -Ofast	Stefano Brivio	2022-02-28	2	-8/+9
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	passt, pasta: Run-time selection of AVX2 build	Stefano Brivio	2022-02-28	9	-48/+92
	Build-time selection of AVX2 flags and routines is not practical for distributions, but limiting AVX2 usage to checksum routines with specific run-time detection doesn't allow for easy performance gains from auto-vectorisation of batched packet handling routines. For x86_64, build non-AVX2 and AVX2 binaries, and implement a simple wrapper replacing the current executable with the AVX2 build if it's available, and if AVX2 is supported by the current CPU. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
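A wrapper of the kind described here could be sketched as below, assuming GCC or clang on x86_64. The function names, locating the running binary via /proc/self/exe, and looking for a sibling file with an ".avx2" suffix are assumptions for illustration, not a copy of the project's code.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Build "<self>.avx2" into 'out': 0 on success, -1 if it doesn't fit */
static int avx2_path(const char *self, char *out, size_t size)
{
	int n;

	assert(self && out && size);

	n = snprintf(out, size, "%s.avx2", self);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Replace the current process with the AVX2 build, if usable */
static void avx2_exec(char **argv)
{
#if defined(__x86_64__) && defined(__GNUC__)
	char self[4096], path[4096 + sizeof(".avx2")];
	ssize_t n;

	__builtin_cpu_init();
	if (!__builtin_cpu_supports("avx2"))
		return;

	n = readlink("/proc/self/exe", self, sizeof(self) - 1);
	if (n < 0)
		return;
	self[n] = '\0';

	if (avx2_path(self, path, sizeof(path)))
		return;

	/* execv() only returns on failure: keep running this build then */
	execv(path, argv);
#else
	(void)argv;
#endif
}
```

Calling `avx2_exec(argv)` first thing in main() keeps the fallback implicit: if the CPU lacks AVX2 or the alternative binary is missing, execution simply continues in the generic build.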
*	test/distro/opensuse: Add Tumbleweed armv7l test	Stefano Brivio	2022-02-28	1	-0/+29
	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>