aboutgitcodebugslistschat
Commit message (Collapse)AuthorAgeFilesLines
...
* treewide: Add include guardsStefano Brivio2022-03-2915-0/+75
| | | | | | | ...at the moment, just for consistency with packet.h, icmp.h, tcp.h and udp.h. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* treewide: Packet abstraction with mandatory boundary checksStefano Brivio2022-03-2923-700/+999
| | | | | | | | | | | | | | | | | | | | Implement a packet abstraction providing boundary and size checks based on packet descriptors: packets stored in a buffer can be queued into a pool (without storage of its own), and data can be retrieved referring to an index in the pool, specifying offset and length. Checks ensure data is not read outside the boundaries of buffer and descriptors, and that packets added to a pool are within the buffer range with valid offset and indices. This implies a wider rework: usage of the "queueing" part of the abstraction mostly affects tap_handler_{passt,pasta}() functions and their callees, while the "fetching" part affects all the guest or tap facing implementations: TCP, UDP, ICMP, ARP, NDP, DHCP and DHCPv6 handlers. Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* util: Fix function declaration style of write_pidfile()Stefano Brivio2022-03-291-1/+2
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp, tcp_splice: Use less awkward syntax to swap in/out sockets from poolsStefano Brivio2022-03-292-12/+10
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* dhcp: Minimum option length implied by RFC 951 is 60 bytes, not 62Stefano Brivio2022-03-291-3/+5
| | | | | | | In section 3 ("Packet Format"), "vend" is 64 bytes long, minus the magic that's 60 bytes, not 62. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp: Fit struct tcp_conn into a single 64-byte cachelineStefano Brivio2022-03-292-137/+166
| | | | | | | | | | | | | | | | | | | | | | ...by: - storing the chained-hash next connection pointer as numeric reference rather than as pointer - storing the MSS as 14-bit value, and rounding it - using only the effective amount of bits needed to store the hash bucket number - explicitly limiting window scaling factors to 4-bit values (maximum factor is 14, from RFC 7323) - scaling SO_SNDBUF values, and using a 8-bit representation for the duplicate ACK sequence - keeping window values unscaled, as received and sent Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* README: Update Interfaces and Availability sectionsStefano Brivio2022-03-291-4/+9
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* README: Avoid "here" linksStefano Brivio2022-03-291-20/+19
| | | | | | They look a bit lame: rephrase sentences to avoid them. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test/perf: Work-around for virtio_net hang before long streams from guestStefano Brivio2022-03-292-0/+30
| | | | | | | I didn't have time to investigate the root cause for the virtio_net TX hang yet. Add a quick work-around for the moment being. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp_splice: Close sockets right away on high number of open filesStefano Brivio2022-03-295-7/+27
| | | | | | | | | | | | | We can't take for granted that the hard limit for open files is big enough as to allow to delay closing sockets to a timer. Store the value of RTLIMIT_NOFILE we set at start, and use it to understand if we're approaching the limit with pending, spliced TCP connections. If that's the case, close sockets right away as soon as they're not needed, instead of deferring this task to a timer. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp: Rework timers to use timerfd instead of periodic bitmap scanStefano Brivio2022-03-295-241/+288
| | | | | | | | | | | | | | | | | | With a lot of concurrent connections, the bitmap scan approach is not really sustainable. Switch to per-connection timerfd timers, set based on events and on two new flags, ACK_FROM_TAP_DUE and ACK_TO_TAP_DUE. Timers are added to the common epoll list, and implement the existing timeouts. While at it, drop the CONN_ prefix from flag names, otherwise they get quite long, and fix the logic to decide if a connection has a local, possibly unreachable endpoint: we shouldn't go through the rest of tcp_conn_from_tap() if we reset the connection due to a successful bind(2), and we'll get EACCES if the port number is low. Suggested by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp, udp, util: Enforce 24-bit limit on socket numbersStefano Brivio2022-03-295-1/+42
| | | | | | | This should never happen, but there are no formal guarantees: ensure socket numbers are below SOCKET_MAX. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test, seccomp, Makefile: Switch to valgrind runs for passt functional testsStefano Brivio2022-03-298-14/+99
| | | | | | | | | | | | | | Pass to seccomp.sh a list of additional syscalls valgrind needs as EXTRA_SYSCALLS in a new 'valgrind' make target, and add corresponding support in seccomp.sh itself. In test setup functions, start passt with valgrind, but not for performance tests. Add tests checking that valgrind exits without errors after all the other tests in the group are done. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test: Add asciinema(1) as requirement for CI in READMEStefano Brivio2022-03-281-1/+1
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Makefile: Enable a few hardening flagsStefano Brivio2022-03-281-2/+8
| | | | | | | They don't have a measurable performance impact and make things a bit safer. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* udp: Use flags for local, loopback, and configured unicast bindsStefano Brivio2022-03-281-25/+23
| | | | | | | | | | | There's no value in keeping a separate timestamp for activity and for aging of local binds, given that they have the same timeout. Reduce that to a single timestamp, with a flag indicating the local bind. Also use flags instead of separate int fields for loopback and configured unicast address usage as source address. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* dhcpv6, tap, tcp: Use IN6_ARE_ADDR_EQUAL instead of open-coded memcmp()Stefano Brivio2022-03-283-9/+9
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* udp: Split buffer queueing/writing parts of udp_sock_handler()Stefano Brivio2022-03-281-171/+193
| | | | | | | | | | ...it became too hard to follow: split it off to udp_sock_fill_data_v{4,6}. While at it, use IN6_ARE_ADDR_EQUAL(a, b), courtesy of netinet/in.h, instead of open-coded memcmp(). Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* udp: Drop _splice from recv, send, sendto static buffer namesStefano Brivio2022-03-281-29/+23
| | | | | | | It's already implied by the fact they don't have "l2" in their names, and dropping it improves readability a bit. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test/lib/video: Fill in href attributes of video shortcutsStefano Brivio2022-03-281-1/+1
| | | | | | ...so that they can be indexed. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp: Refactor to use events instead of states, split out spliced implementationStefano Brivio2022-03-287-1501/+1755
| | | | | | | | | | | | | | | | | | | | | Using events and flags instead of states makes the implementation much more straightforward: actions are mostly centered on events that occurred on the connection rather than states. An example is given by the ESTABLISHED_SOCK_FIN_SENT and FIN_WAIT_1_SOCK_FIN abominations: we don't actually care about which side started closing the connection to handle closing of connection halves. Split out the spliced implementation, as it has very little in common with the "regular" TCP path. Refactor things here and there to improve clarity. Add helpers to trace where resets and flag settings come from. No functional changes intended. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* util: Use standard int typesStefano Brivio2022-03-251-6/+6
| | | | | | ...instead of kernel-like short notations. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* util: Drop CHECK_SET_MIN_MAX{,_PROTO_FD} macrosStefano Brivio2022-03-251-14/+0
| | | | | | | ...those were used when epoll references used to be socket numbers, they should have gone away a long time ago. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* pcap: Fix mistake in printed stringStefano Brivio2022-03-251-1/+1
| | | | | | Packets are saved *to* a file, not *at* it. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* conf, util, tap: Implement --trace option for extra verbose loggingStefano Brivio2022-03-259-4/+55
| | | | | | | | --debug can be a bit too noisy, especially as single packets or socket messages are logged: implement a new option, --trace, implying --debug, that enables all debug messages. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* README: Make it somewhat readable on mobile devicesStefano Brivio2022-03-046-35/+161
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* hooks, README: gzipped js snippets, webp alternatives for pngStefano Brivio2022-03-022-4/+17
| | | | | | | | Upload gzipped js snippets for usage with gzip_static in nginx or equivalent. Convert png drawings to webp for smaller size, use them as alternatives in README. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test/lib/setup: Unshare PID namespace in pasta_setup()Stefano Brivio2022-03-021-4/+8
| | | | | | ...otherwise, we'll leave processes (dhclient) around. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* README: Don't preload CI recording, show poster from end of runStefano Brivio2022-03-011-1/+1
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* README: s/guest/namespace/ in pasta "Try it" sectionStefano Brivio2022-03-011-1/+1
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Makefile, hooks: Static target precondition for pkgs, copy .avx2 buildsStefano Brivio2022-03-013-9/+7
| | | | | | Convenience packages are anyway built from static builds. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* demo/pasta: Clean up before rebuilding with -gStefano Brivio2022-03-011-0/+1
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* arp, dhcp: Fix strict aliasing warnings reported by gcc 4.9 with -OfastStefano Brivio2022-02-282-8/+9
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt, pasta: Run-time selection of AVX2 buildStefano Brivio2022-02-289-48/+92
| | | | | | | | | | | | | Build-time selection of AVX2 flags and routines is not practical for distributions, but limiting AVX2 usage to checksum routines with specific run-time detection doesn't allow for easy performance gains from auto-vectorisation of batched packet handling routines. For x86_64, build non-AVX2 and AVX2 binaries, and implement a simple wrapper replacing the current executable with the AVX2 build if it's available, and if AVX2 is supported by the current CPU. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test/distro/opensuse: Add Tumbleweed armv7l testStefano Brivio2022-02-281-0/+29
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test/lib/term: Don't run demo when started as ./runStefano Brivio2022-02-281-1/+1
| | | | | | | I changed this in a previous commit by mistake, restore the original command. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* seccomp, tcp: Add fcntl64 to pasta syscalls for armv6l, armv7lStefano Brivio2022-02-281-1/+1
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* hooks/pre-push: Keep original cast on gzip, fix uploading with dashStefano Brivio2022-02-281-2/+3
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* demo/pasta: Exit namespace in 'ns' pane before restarting pastaStefano Brivio2022-02-281-0/+1
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* seccomp: Adjust list of allowed syscalls for armv6l, armv7lStefano Brivio2022-02-263-5/+9
| | | | | | | | | | | It looks like glibc commonly implements clock_gettime(2) with clock_gettime64(), and uses recv() instead of recvfrom(), send() instead of sendto(), and sigreturn() instead of rt_sigreturn() on armv6l and armv7l. Adjust the list of system calls for armv6l and armv7l accordingly. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Don't warn on failed madvise()Stefano Brivio2022-02-261-2/+1
| | | | | | | | A kernel might not be configured with CONFIG_TRANSPARENT_HUGEPAGE, especially on embedded systems. Ignore the error, it doesn't affect functionality. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Makefile: Fix up AUDIT_ARCH for armv6l, armv7lStefano Brivio2022-02-261-0/+1
| | | | | | | There's a single AUDIT_ARCH_ARM define available (and big-endian shouldn't be a concern with those). Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tap: Cast ETH_MAX_MTU to signed in comparisonsStefano Brivio2022-02-261-2/+2
| | | | | | At least gcc 8.3 and 10.2 emit a warning on armv6l and armv7l. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* seccomp.sh: Handle syscall number defines in the (x + y) formStefano Brivio2022-02-261-0/+3
| | | | | | | This is the case at least for current glibc headers on armv6l and armv7l. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* udp: Explicitly initialise sin6_scope_id and sin_zero in sockaddr_in{,6}Stefano Brivio2022-02-251-0/+2
| | | | | | | Not functionally needed, but gcc versions 7 to 9 (at least) will issue a warning otherwise. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Explicitly check return value of chdir()Stefano Brivio2022-02-251-1/+3
| | | | | | | | ...it doesn't actually matter as we're checking errno at the very end, but, depending on build flags, chdir() might be declared with warn_unused_result and the compiler issues a warning. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* hooks: Uploaded compressed .cast files tooStefano Brivio2022-02-231-3/+6
| | | | | | ...to benefit from gzip_static in nginx or equivalent. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt.1: Drop duplicate --dns sectionStefano Brivio2022-02-231-11/+1
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* conf, ndp: Disable router advertisements on --config-netStefano Brivio2022-02-232-1/+6
| | | | | | | | | If we statically configure a default route, and also advertise it for SLAAC, the kernel will try moments later to add the same route: ICMPv6: RA: ndisc_router_discovery failed to add default route Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* netlink: Avoid left-over bytes in request on MTU configurationStefano Brivio2022-02-231-4/+7
| | | | | | | | | When nl_link() configures the MTU, it shouldn't send extra bytes, otherwise we'll get a kernel warning: netlink: 4 bytes leftover after parsing attributes in process `pasta'. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>