Commit message | Author | Age | Files | Lines
* contrib: Add example of Debian package files | Stefano Brivio | 2022-03-30 | 7 | -1/+66
  ...using dh_apparmor to ship and apply AppArmor profiles. Tried on current Debian testing (Bookworm, 12).
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* contrib: Add example spec file for Fedora | Stefano Brivio | 2022-03-30 | 1 | -0/+95
  ...with SELinux package, too. Tested on Fedora 35, but it should work on pretty much any version.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tap: Re-read from tap in tap_handler_pasta() on buffer full | Stefano Brivio | 2022-03-30 | 1 | -2/+9
  read() will return zero if we pass a zero length, which makes no sense: instead, track explicitly that we exhausted the buffer, flush packets to handlers and redo.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
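A minimal sketch of the re-read loop described in this commit, assuming a generic file descriptor rather than the actual tap device. The names `drain()`, `flush_fn` and `count_bytes()` are hypothetical and not taken from tap.c. The only point illustrated is that `read()` is never called with a zero length: once the buffer is exhausted, data is flushed and reading starts over.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical callback type: hands 'len' buffered bytes to packet handlers */
typedef void (*flush_fn)(const char *buf, size_t len, void *arg);

/* Example callback: just counts the bytes it was given */
static void count_bytes(const char *buf, size_t len, void *arg)
{
	(void)buf;
	*(size_t *)arg += len;
}

/* Read from fd until EOF, error or EAGAIN; returns total bytes read */
static long drain(int fd, char *buf, size_t cap, flush_fn flush, void *arg)
{
	size_t used = 0;
	long total = 0;

	assert(cap > 0);

	for (;;) {
		ssize_t n;

		if (used == cap) {
			/* Buffer exhausted: a zero-length read() would return
			 * zero and look like EOF, so flush and redo instead.
			 */
			flush(buf, used, arg);
			used = 0;
		}

		n = read(fd, buf + used, cap - used);
		if (n <= 0)
			break;

		used += (size_t)n;
		total += n;
	}

	if (used)
		flush(buf, used, arg);

	return total;
}
```

With a 4-byte buffer and 10 bytes pending, the callback is invoked three times (4, 4 and 2 bytes) before `read()` reports end of file.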
* tap: Allow ioctl() and openat() for tap_ns_tun() re-initialisation | Stefano Brivio | 2022-03-30 | 2 | -1/+3
  If the tun interface disappears, we'll call tap_ns_tun() after the seccomp profile is applied: add ioctl() and openat() to it.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tap, tcp, udp, icmp: Cut down on some oversized buffers | Stefano Brivio | 2022-03-29 | 6 | -31/+72
  The existing sizes provide no measurable differences in throughput and packet rates at this point. They were probably needed as batched implementations were not complete, but they can be decreased quite a bit now.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt, pasta: Add examples of SELinux policy modules | Stefano Brivio | 2022-03-29 | 7 | -0/+364
  These should cover any reasonably common use case in distributions.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt, pasta: Add examples of AppArmor policies | Stefano Brivio | 2022-03-29 | 2 | -0/+125
  These should cover any reasonably common use case in distributions.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp: Fix warning by gcc 5.4 on ppc64le about comparison in CONN_OR_NULL() | Stefano Brivio | 2022-03-29 | 1 | -13/+13
  ...we don't really need two extra bits, but it's easier to organise things differently than to silence this.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Accurate error reporting for sandbox() | Stefano Brivio | 2022-03-29 | 1 | -10/+26
  It's actually quite easy to make it fail depending on the environment: accurately report errors here.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Makefile: Allow implicit test for bugprone-suspicious-string-compare checker | Stefano Brivio | 2022-03-29 | 1 | -4/+1
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* treewide: Fix android-cloexec-* clang-tidy warnings, re-enable checks | Stefano Brivio | 2022-03-29 | 8 | -31/+30
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* udp: Move flags before ts in struct udp_tap_port, avoid end padding | Stefano Brivio | 2022-03-29 | 1 | -3/+3
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
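To illustrate why member order matters for end padding, here is a small sketch with two hypothetical layouts. These are not the actual members of struct udp_tap_port, which this log doesn't show: the field names and types are assumptions chosen only to make the effect visible.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout with the small member last */
struct ts_before_flags {
	int32_t sock;
	int64_t ts;	/* 8-byte aligned on most 64-bit ABIs */
	uint8_t flags;	/* followed by padding up to the struct alignment */
};

/* Same members, with the small one moved before the timestamp */
struct flags_before_ts {
	int32_t sock;
	uint8_t flags;	/* uses part of the hole that precedes ts anyway */
	int64_t ts;
};

/* Reordering can only help here: the second layout is never larger */
static_assert(sizeof(struct flags_before_ts) <= sizeof(struct ts_before_flags),
	      "moving flags before ts must not grow the struct");
```

On a typical x86_64 ABI the first layout takes 24 bytes and the second 16, but the exact figures depend on the platform's alignment rules.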
* treewide: Mark constant references as const | Stefano Brivio | 2022-03-29 | 29 | -168/+192
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* treewide: Add include guards | Stefano Brivio | 2022-03-29 | 15 | -0/+75
  ...at the moment, just for consistency with packet.h, icmp.h, tcp.h and udp.h.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* treewide: Packet abstraction with mandatory boundary checks | Stefano Brivio | 2022-03-29 | 23 | -700/+999
  Implement a packet abstraction providing boundary and size checks based on packet descriptors: packets stored in a buffer can be queued into a pool (without storage of its own), and data can be retrieved referring to an index in the pool, specifying offset and length.
  Checks ensure data is not read outside the boundaries of buffer and descriptors, and that packets added to a pool are within the buffer range with valid offset and indices.
  This implies a wider rework: usage of the "queueing" part of the abstraction mostly affects tap_handler_{passt,pasta}() functions and their callees, while the "fetching" part affects all the guest or tap facing implementations: TCP, UDP, ICMP, ARP, NDP, DHCP and DHCPv6 handlers.
  Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
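The idea of a descriptor pool with mandatory checks can be sketched as follows. This is only an illustration under assumptions: `struct desc`, `struct pool`, `pool_add()`, `pool_get()` and `POOL_SLOTS` are hypothetical names, and the real interface in packet.h may well differ. What it shows is that the pool holds no packet data, and that both queueing and fetching refuse anything outside the buffer or the descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SLOTS	16	/* hypothetical fixed capacity */

struct desc {
	uint32_t offset;	/* start of packet, relative to buf */
	uint32_t len;
};

struct pool {
	const char *buf;	/* backing buffer: the pool has no storage */
	size_t buf_size;
	size_t count;		/* descriptors in use */
	struct desc pkt[POOL_SLOTS];
};

/* Queue a packet: returns 0, or -1 if it doesn't fit in buffer or pool */
static int pool_add(struct pool *p, const char *start, size_t len)
{
	uintptr_t base, s;
	size_t offset;

	assert(p && p->buf);
	base = (uintptr_t)p->buf;
	s = (uintptr_t)start;

	if (p->count >= POOL_SLOTS || s < base)
		return -1;

	offset = s - base;
	if (offset > p->buf_size || len > p->buf_size - offset)
		return -1;
	if (offset > UINT32_MAX || len > UINT32_MAX)
		return -1;

	p->pkt[p->count].offset = (uint32_t)offset;
	p->pkt[p->count].len = (uint32_t)len;
	p->count++;

	return 0;
}

/* Fetch 'len' bytes at 'offset' of packet 'idx', NULL if out of bounds */
static const void *pool_get(const struct pool *p, size_t idx,
			    size_t offset, size_t len)
{
	assert(p && p->buf);

	if (idx >= p->count)
		return NULL;
	if (offset > p->pkt[idx].len || len > p->pkt[idx].len - offset)
		return NULL;

	return p->buf + p->pkt[idx].offset + offset;
}
```

A protocol handler would then ask for, say, a header-sized range at a given offset and treat a NULL return as a malformed packet, instead of doing pointer arithmetic on the raw buffer.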
* util: Fix function declaration style of write_pidfile() | Stefano Brivio | 2022-03-29 | 1 | -1/+2
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp, tcp_splice: Use less awkward syntax to swap in/out sockets from pools | Stefano Brivio | 2022-03-29 | 2 | -12/+10
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* dhcp: Minimum option length implied by RFC 951 is 60 bytes, not 62 | Stefano Brivio | 2022-03-29 | 1 | -3/+5
  In section 3 ("Packet Format"), "vend" is 64 bytes long: minus the 4-byte magic, that's 60 bytes, not 62.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
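The arithmetic behind this fix, written out as compile-time constants. The constant names are hypothetical and not taken from dhcp.c; the sizes are the 64-byte "vend" field of RFC 951 and the 4-byte magic cookie that precedes the options.

```c
#include <assert.h>

enum {
	BOOTP_VEND_LEN		= 64,	/* "vend" field, RFC 951 section 3 */
	MAGIC_COOKIE_LEN	= 4,	/* magic cookie preceding options */
	OPT_AREA_MIN		= BOOTP_VEND_LEN - MAGIC_COOKIE_LEN,
};

/* 64 - 4 = 60: fails to build if the sizes above stop adding up */
static_assert(OPT_AREA_MIN == 60, "minimum option area is 60 bytes");
```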
* tcp: Fit struct tcp_conn into a single 64-byte cacheline | Stefano Brivio | 2022-03-29 | 2 | -137/+166
  ...by:
  - storing the chained-hash next connection pointer as numeric reference rather than as pointer
  - storing the MSS as 14-bit value, and rounding it
  - using only the effective amount of bits needed to store the hash bucket number
  - explicitly limiting window scaling factors to 4-bit values (maximum factor is 14, from RFC 7323)
  - scaling SO_SNDBUF values, and using an 8-bit representation for the duplicate ACK sequence
  - keeping window values unscaled, as received and sent
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
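Two of the items above lend themselves to a short sketch: a 14-bit MSS and a 4-bit window scaling factor. The commit doesn't say how the MSS is rounded, so dropping the two low-order bits (rounding down to a multiple of four) is an assumption here, and the helper names are hypothetical rather than taken from tcp.c.

```c
#include <assert.h>
#include <stdint.h>

#define MSS_BITS	14			/* from the commit message */
#define MSS_SHIFT	(16 - MSS_BITS)		/* assumption: drop low bits */
#define WS_MAX		14			/* maximum shift, RFC 7323 */

static_assert(WS_MAX < (1 << 4), "window scale must fit in 4 bits");

/* Store a 16-bit MSS in 14 bits, losing the two low-order bits */
static uint16_t mss_pack(uint16_t mss)
{
	return (uint16_t)(mss >> MSS_SHIFT);
}

/* Recover the (rounded down) MSS from its 14-bit representation */
static uint16_t mss_unpack(uint16_t v)
{
	return (uint16_t)(v << MSS_SHIFT);
}

/* Limit a window scaling factor to the maximum allowed value */
static uint8_t ws_clamp(uint8_t ws)
{
	return ws > WS_MAX ? WS_MAX : ws;
}
```

A common Ethernet MSS of 1460 is a multiple of four and survives the round trip unchanged, while 1461 would come back as 1460.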
* README: Update Interfaces and Availability sections | Stefano Brivio | 2022-03-29 | 1 | -4/+9
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* README: Avoid "here" links | Stefano Brivio | 2022-03-29 | 1 | -20/+19
  They look a bit lame: rephrase sentences to avoid them.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test/perf: Work-around for virtio_net hang before long streams from guest | Stefano Brivio | 2022-03-29 | 2 | -0/+30
  I haven't had time to investigate the root cause for the virtio_net TX hang yet. Add a quick work-around for the time being.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp_splice: Close sockets right away on high number of open files | Stefano Brivio | 2022-03-29 | 5 | -7/+27
  We can't take for granted that the hard limit for open files is big enough to allow deferring the closing of sockets to a timer.
  Store the value of RLIMIT_NOFILE we set at start, and use it to understand if we're approaching the limit with pending, spliced TCP connections. If that's the case, close sockets right away as soon as they're not needed, instead of deferring this task to a timer.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
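A rough sketch of the check described here, using getrlimit(2). The commit stores the limit it sets at start, whereas this example simply reads the current soft limit; the helper names and the percentage threshold are hypothetical, as the log doesn't say how "approaching the limit" is decided.

```c
#include <assert.h>
#include <sys/resource.h>

/* Current soft limit for open files, or 0 if it can't be read */
static rlim_t nofile_limit(void)
{
	struct rlimit r;

	if (getrlimit(RLIMIT_NOFILE, &r))
		return 0;

	return r.rlim_cur;
}

/* Hypothetical policy: non-zero once open_fds exceeds pct% of the limit */
static int fd_pressure(rlim_t open_fds, rlim_t limit, unsigned pct)
{
	assert(pct <= 100);

	/* Divide first, so that a huge limit can't overflow */
	return open_fds > limit / 100 * pct;
}
```

A caller would close a socket immediately when `fd_pressure()` returns non-zero, and leave it to the timer otherwise.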
* tcp: Rework timers to use timerfd instead of periodic bitmap scan | Stefano Brivio | 2022-03-29 | 5 | -241/+288
  With a lot of concurrent connections, the bitmap scan approach is not really sustainable. Switch to per-connection timerfd timers, set based on events and on two new flags, ACK_FROM_TAP_DUE and ACK_TO_TAP_DUE. Timers are added to the common epoll list, and implement the existing timeouts.
  While at it, drop the CONN_ prefix from flag names, otherwise they get quite long, and fix the logic to decide if a connection has a local, possibly unreachable endpoint: we shouldn't go through the rest of tcp_conn_from_tap() if we reset the connection due to a successful bind(2), and we'll get EACCES if the port number is low.
  Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp, udp, util: Enforce 24-bit limit on socket numbers | Stefano Brivio | 2022-03-29 | 5 | -1/+42
  This should never happen, but there are no formal guarantees: ensure socket numbers are below SOCKET_MAX.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
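A sketch of what such a check might look like. SOCKET_MAX is named in the commit, but its definition as `1 << 24` is inferred from the title, and `struct sock_ref` and `sock_ref_set()` are hypothetical: they only show why a socket number stored in a 24-bit field has to be validated first.

```c
#include <assert.h>

#define SOCKET_REF_BITS	24			/* from the commit title */
#define SOCKET_MAX	(1 << SOCKET_REF_BITS)	/* assumed definition */

static_assert(SOCKET_REF_BITS < 31, "SOCKET_MAX must fit in an int");

/* Hypothetical compact reference with a 24-bit socket number */
struct sock_ref {
	unsigned s:SOCKET_REF_BITS;
	unsigned proto:8;
};

/* Store socket number s, returns -1 if it can't be represented */
static int sock_ref_set(struct sock_ref *ref, int s, unsigned proto)
{
	if (s < 0 || s >= SOCKET_MAX)
		return -1;

	ref->s = (unsigned)s;
	ref->proto = proto & 0xff;

	return 0;
}
```

Without the check, a larger file descriptor number would be silently truncated on assignment to the bit-field and later refer to the wrong socket.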
* test, seccomp, Makefile: Switch to valgrind runs for passt functional tests | Stefano Brivio | 2022-03-29 | 8 | -14/+99
  Pass to seccomp.sh a list of additional syscalls valgrind needs as EXTRA_SYSCALLS in a new 'valgrind' make target, and add corresponding support in seccomp.sh itself.
  In test setup functions, start passt with valgrind, but not for performance tests. Add tests checking that valgrind exits without errors after all the other tests in the group are done.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test: Add asciinema(1) as requirement for CI in README | Stefano Brivio | 2022-03-28 | 1 | -1/+1
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Makefile: Enable a few hardening flags | Stefano Brivio | 2022-03-28 | 1 | -2/+8
  They don't have a measurable performance impact and make things a bit safer.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* udp: Use flags for local, loopback, and configured unicast binds | Stefano Brivio | 2022-03-28 | 1 | -25/+23
  There's no value in keeping a separate timestamp for activity and for aging of local binds, given that they have the same timeout. Reduce that to a single timestamp, with a flag indicating the local bind.
  Also use flags instead of separate int fields for loopback and configured unicast address usage as source address.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* dhcpv6, tap, tcp: Use IN6_ARE_ADDR_EQUAL instead of open-coded memcmp() | Stefano Brivio | 2022-03-28 | 3 | -9/+9
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* udp: Split buffer queueing/writing parts of udp_sock_handler() | Stefano Brivio | 2022-03-28 | 1 | -171/+193
  ...it became too hard to follow: split it off to udp_sock_fill_data_v{4,6}. While at it, use IN6_ARE_ADDR_EQUAL(a, b), courtesy of netinet/in.h, instead of open-coded memcmp().
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
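For comparison, the two ways of testing IPv6 addresses for equality mentioned in the last two commits, side by side. The wrapper function names are hypothetical and exist only to make the snippet testable; IN6_ARE_ADDR_EQUAL() itself comes from netinet/in.h.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/* Open-coded comparison: the reader has to check size and negation */
static int addr6_equal_memcmp(const struct in6_addr *a,
			      const struct in6_addr *b)
{
	assert(a && b);
	return !memcmp(a, b, sizeof(*a));
}

/* Same result, with the intent stated by the macro name */
static int addr6_equal_macro(const struct in6_addr *a,
			     const struct in6_addr *b)
{
	assert(a && b);
	return IN6_ARE_ADDR_EQUAL(a, b) != 0;
}
```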
* udp: Drop _splice from recv, send, sendto static buffer names | Stefano Brivio | 2022-03-28 | 1 | -29/+23
  It's already implied by the fact they don't have "l2" in their names, and dropping it improves readability a bit.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test/lib/video: Fill in href attributes of video shortcuts | Stefano Brivio | 2022-03-28 | 1 | -1/+1
  ...so that they can be indexed.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp: Refactor to use events instead of states, split out spliced implementation | Stefano Brivio | 2022-03-28 | 7 | -1501/+1755
  Using events and flags instead of states makes the implementation much more straightforward: actions are mostly centered on events that occurred on the connection rather than states. An example is given by the ESTABLISHED_SOCK_FIN_SENT and FIN_WAIT_1_SOCK_FIN abominations: we don't actually care about which side started closing the connection to handle closing of connection halves.
  Split out the spliced implementation, as it has very little in common with the "regular" TCP path.
  Refactor things here and there to improve clarity. Add helpers to trace where resets and flag settings come from.
  No functional changes intended.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* util: Use standard int types | Stefano Brivio | 2022-03-25 | 1 | -6/+6
  ...instead of kernel-like short notations.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* util: Drop CHECK_SET_MIN_MAX{,_PROTO_FD} macros | Stefano Brivio | 2022-03-25 | 1 | -14/+0
  ...those were used when epoll references used to be socket numbers, they should have gone away a long time ago.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* pcap: Fix mistake in printed string | Stefano Brivio | 2022-03-25 | 1 | -1/+1
  Packets are saved *to* a file, not *at* it.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* conf, util, tap: Implement --trace option for extra verbose logging | Stefano Brivio | 2022-03-25 | 9 | -4/+55
  --debug can be a bit too noisy, especially as single packets or socket messages are logged: implement a new option, --trace, implying --debug, that enables all debug messages.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* README: Make it somewhat readable on mobile devices | Stefano Brivio | 2022-03-04 | 6 | -35/+161
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* hooks, README: gzipped js snippets, webp alternatives for png | Stefano Brivio | 2022-03-02 | 2 | -4/+17
  Upload gzipped js snippets for usage with gzip_static in nginx or equivalent. Convert png drawings to webp for smaller size, use them as alternatives in README.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test/lib/setup: Unshare PID namespace in pasta_setup() | Stefano Brivio | 2022-03-02 | 1 | -4/+8
  ...otherwise, we'll leave processes (dhclient) around.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* README: Don't preload CI recording, show poster from end of run | Stefano Brivio | 2022-03-01 | 1 | -1/+1
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* README: s/guest/namespace/ in pasta "Try it" section | Stefano Brivio | 2022-03-01 | 1 | -1/+1
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Makefile, hooks: Static target precondition for pkgs, copy .avx2 builds | Stefano Brivio | 2022-03-01 | 3 | -9/+7
  Convenience packages are anyway built from static builds.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* demo/pasta: Clean up before rebuilding with -g | Stefano Brivio | 2022-03-01 | 1 | -0/+1
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* arp, dhcp: Fix strict aliasing warnings reported by gcc 4.9 with -Ofast | Stefano Brivio | 2022-02-28 | 2 | -8/+9
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt, pasta: Run-time selection of AVX2 build | Stefano Brivio | 2022-02-28 | 9 | -48/+92
  Build-time selection of AVX2 flags and routines is not practical for distributions, but limiting AVX2 usage to checksum routines with specific run-time detection doesn't allow for easy performance gains from auto-vectorisation of batched packet handling routines.
  For x86_64, build non-AVX2 and AVX2 binaries, and implement a simple wrapper replacing the current executable with the AVX2 build if it's available, and if AVX2 is supported by the current CPU.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
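A possible shape for such a wrapper, as a sketch only. It relies on the GCC/clang builtin `__builtin_cpu_supports()` for detection and on execve(2) to replace the running image. The function names are hypothetical, and locating the AVX2 build by appending ".avx2" to `argv[0]` is an assumption based on the ".avx2" builds mentioned elsewhere in this log, not a description of the actual code.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

extern char **environ;

/* Returns 1 if the CPU supports AVX2, 0 otherwise or if we can't tell */
static int cpu_has_avx2(void)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
	__builtin_cpu_init();
	return __builtin_cpu_supports("avx2") != 0;
#else
	return 0;
#endif
}

/* Try to replace the current process with the AVX2 build. On success this
 * never returns; on any failure it returns and the generic build carries on.
 */
static void exec_avx2(char **argv)
{
	char path[4096];
	int n;

	assert(argv && argv[0]);

	if (!cpu_has_avx2())
		return;

	n = snprintf(path, sizeof(path), "%s.avx2", argv[0]);
	if (n < 0 || (size_t)n >= sizeof(path))
		return;

	execve(path, argv, environ);
	/* execve() returns only on error, e.g. no AVX2 build installed */
}
```

Calling `exec_avx2(argv)` first thing in `main()` keeps the decision out of the hot paths: the compiler is free to auto-vectorise the whole AVX2 build, while the generic binary still runs everywhere.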
* test/distro/opensuse: Add Tumbleweed armv7l test | Stefano Brivio | 2022-02-28 | 1 | -0/+29
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test/lib/term: Don't run demo when started as ./run | Stefano Brivio | 2022-02-28 | 1 | -1/+1
  I changed this in a previous commit by mistake, restore the original command.
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* seccomp, tcp: Add fcntl64 to pasta syscalls for armv6l, armv7l | Stefano Brivio | 2022-02-28 | 1 | -1/+1
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>