aboutgitcodebugslistschat
path: root/Makefile
Commit message (Collapse)AuthorAgeFilesLines
* treewide: Fix android-cloexec-* clang-tidy warnings, re-enable checksStefano Brivio2022-03-291-9/+0
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test, seccomp, Makefile: Switch to valgrind runs for passt functional testsStefano Brivio2022-03-291-1/+7
| | | | | | | | | | | | | | Pass to seccomp.sh a list of additional syscalls valgrind needs as EXTRA_SYSCALLS in a new 'valgrind' make target, and add corresponding support in seccomp.sh itself. In test setup functions, start passt with valgrind, but not for performance tests. Add tests checking that valgrind exits without errors after all the other tests in the group are done. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Makefile: Enable a few hardening flagsStefano Brivio2022-03-281-2/+8
| | | | | | | They don't have a measurable performance impact and make things a bit safer. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Makefile, hooks: Static target precondition for pkgs, copy .avx2 buildsStefano Brivio2022-03-011-1/+1
| | | | | | Convenience packages are anyway built from static builds. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt, pasta: Run-time selection of AVX2 buildStefano Brivio2022-02-281-10/+28
| | | | | | | | | | | | | Build-time selection of AVX2 flags and routines is not practical for distributions, but limiting AVX2 usage to checksum routines with specific run-time detection doesn't allow for easy performance gains from auto-vectorisation of batched packet handling routines. For x86_64, build non-AVX2 and AVX2 binaries, and implement a simple wrapper replacing the current executable with the AVX2 build if it's available, and if AVX2 is supported by the current CPU. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Makefile: Fix up AUDIT_ARCH for armv6l, armv7lStefano Brivio2022-02-261-0/+1
| | | | | | | There's a single AUDIT_ARCH_ARM define available (and big-endian shouldn't be a concern with those). Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* pasta: By default, quit if filesystem-bound net namespace goes awayStefano Brivio2022-02-211-1/+2
| | | | | | | | | | | | This should be convenient for users managing filesystem-bound network namespaces: monitor the base directory of the namespace and exit if the namespace given as PATH or NAME target is deleted. We can't add an inotify watch directly on the namespace directory, that won't work with nsfs. Add an option to disable this behaviour, --no-netns-quit. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Makefile, conf, passt: Drop passt4netns references, explicit argc checkStefano Brivio2022-02-211-5/+2
| | | | | | | | | | | Nobody currently calls this as passt4netns, that was the name I used before 'pasta', drop any reference before it's too late. While at it, explicitly check that argc is bigger than or equal to one, just as a defensive measure: argv[0] being NULL is not an issue anyway. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Address new clang-tidy warnings from LLVM 13.0.1Stefano Brivio2022-01-301-1/+23
| | | | | | | | | | | | | | | | | | | | | | | | | clang-tidy from LLVM 13.0.1 reports some new warnings from these checkers: - altera-unroll-loops, altera-id-dependent-backward-branch: ignore for the moment being, add a TODO item - bugprone-easily-swappable-parameters: ignore, nothing to do about those - readability-function-cognitive-complexity: ignore for the moment being, add a TODO item - altera-struct-pack-align: ignore, alignment is forced in protocol headers - concurrency-mt-unsafe: ignore for the moment being, add a TODO item Fix bugprone-implicit-widening-of-multiplication-result warnings, though, that's doable and they seem to make sense. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp, netlink, HAS{BYTES_ACKED,MIN_RTT,GETRANDOM} and NETLINK_GET_STRICT_CHKStefano Brivio2022-01-261-0/+15
| | | | | | | | | | | | | | | | tcpi_bytes_acked and tcpi_min_rtt are only available on recent kernel versions: provide fall-back paths (incurring some grade of performance penalty). Support for getrandom() was introduced in Linux 3.17 and glibc 2.25: provide an alternate mechanism for that as well, reading from /dev/random. Also check if NETLINK_GET_STRICT_CHK is defined before using it: it's not strictly needed, we'll filter out irrelevant results from netlink anyway. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Makefile, seccomp: Fix build for i386, ppc64, ppc64leStefano Brivio2022-01-261-2/+12
| | | | | | | | | | | | | | | | | On some distributions, on ppc64, ulimit -s returns 'unlimited': add a reasonable default, and also make sure ulimit is invoked using the default shell, which should ensure ulimit is actually implemented. Also note that AUDIT_ARCH doesn't follow closely the naming reported by 'uname -m': convert for i386 and ppc as needed. While at it, move inclusion of seccomp.h after util.h, the former is less generic (cosmetic/clang-tidy only). Older kernel headers might lack a definition for AUDIT_ARCH_PPC64LE: define that explicitly if it's not available. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Makefile: cppcheck target: Suppress unmatchedSuppression, pass CFLAGSStefano Brivio2021-10-211-3/+16
| | | | | | | | | | Some of those warnings don't trigger even on systems with very similar toolchains, suppress unmatchedSuppression warnings, they're basically useless. While at it, pass CFLAGS to cppcheck. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Add cppcheck target, test, and address resulting warningsStefano Brivio2021-10-211-1/+31
| | | | | | | ...mostly false positives, but a number of very relevant ones too, in tcp_get_sndbuf(), tcp_conn_from_tap(), and siphash PREAMBLE(). Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Fix build with gcc 7, use std=c99, enable some more Clang checkersStefano Brivio2021-10-211-5/+1
| | | | | | | | | | | | | | Unions and structs, you all have names now. Take the chance to enable bugprone-reserved-identifier, cert-dcl37-c, and cert-dcl51-cpp checkers in clang-tidy. Provide a ffsl() weak declaration using gcc built-in. Start reordering includes, but that's not enough for the llvm-include-order checker yet. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Makefile, tcp: Don't try to use tcpi_snd_wnd from tcp_info on pre-5.3 kernelsStefano Brivio2021-10-211-0/+5
| | | | | | | Detect missing tcpi_snd_wnd in struct tcp_info at build time, otherwise build fails with a pre-5.3 linux/tcp.h header. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Add clang-tidy Makefile target and test, take care of warningsStefano Brivio2021-10-201-0/+75
| | | | | | | Most are just about style and form, but a few were actually serious mistakes (NDP-related). Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Address gcc 11 warningsStefano Brivio2021-10-201-0/+14
| | | | | | | | | A mix of unchecked return values, a missing permission mask for open(2) with O_CREAT, and some false positives from -Wstringop-overflow and -Wmaybe-uninitialized. Reported-by: Martin Hauke <mardnh@gmx.de> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* LICENSES: Add license text files, add missing notices, fix SPDX tagsStefano Brivio2021-10-201-0/+11
| | | | | | | | | | SPDX tags don't replace license files. Some notices were missing and some tags were not according to the SPDX specification, too. Now reuse --lint from the REUSE tool (https://reuse.software/) passes. Reported-by: Martin Hauke <mardnh@gmx.de> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Makefile: Install man pages to /usr/share/man instead of /usr/manStefano Brivio2021-10-201-5/+5
| | | | | Reported-by: Martin Hauke <mardnh@gmx.de> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Makefile: Prefix installation paths with $(DESTDIR)Stefano Brivio2021-10-191-9/+9
| | | | | | | | | Martin reports that DESTDIR is ignored in install/uninstall targets, see also: https://www.gnu.org/prep/standards/html_node/DESTDIR.html Reported-by: Martin Hauke <mardnh@gmx.de> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Static builds: don't redefine __vsyslog(), skip getpwnam() and ↵Stefano Brivio2021-10-161-1/+1
| | | | | | initgroups() Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt, pasta: Add seccomp supportStefano Brivio2021-10-141-9/+8
| | | | | | | | | | | | | | | | | | List of allowed syscalls comes from comments in the form: #syscalls <list> for syscalls needed both in passt and pasta mode, and: #syscalls:pasta <list> #syscalls:passt <list> for syscalls specifically needed in pasta or passt mode only. seccomp.sh builds a list of BPF statements from those comments, prefixed by a binary search tree to keep lookup fast. While at it, clean up a bit the Makefile using wildcards. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* conf, tap: Split netlink and pasta functions, allow interface configurationStefano Brivio2021-10-141-4/+5
| | | | | | | | | | Move netlink routines to their own file, and use netlink to configure or fetch all the information we need, except for the TUNSETIFF ioctl. Move pasta-specific functions to their own file as well, add parameters and calls to configure the tap interface in the namespace. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* pasta: Allow specifying paths and names of namespacesGiuseppe Scrivano2021-10-071-0/+1
| | | | | | | | | | | | | | | | | | Based on a patch from Giuseppe Scrivano, this adds the ability to: - specify paths and names of target namespaces to join, instead of a PID, also for user namespaces, with --userns - request to join or create a network namespace only, without entering or creating a user namespace, with --netns-only - specify the base directory for netns mountpoints, with --nsrun-dir Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> [sbrivio: reworked logic to actually join the given namespaces when they're not created, implemented --netns-only and --nsrun-dir, updated pasta demo script and man page] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Align pkt_buf to PAGE_SIZE (start and size), try to fit in huge pagesStefano Brivio2021-09-271-0/+1
| | | | | | | | | If transparent huge pages are available, madvise() will do the trick. While at it, decrease EPOLL_EVENTS for the main loop from 10 to 8, for slightly better socket fairness. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Makefile: Visually separate CFLAGS from input files in resulting cc commandsStefano Brivio2021-09-271-2/+4
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Makefile: Make sure destination directories exist on installStefano Brivio2021-09-011-0/+1
| | | | | | Mostly theoretical, but convenient for testing. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Makefile: Quick hack to build convenience Debian and RPM packagesStefano Brivio2021-09-011-1/+15
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* Makefile: Add install, uninstall targetsStefano Brivio2021-09-011-1/+16
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt, pasta: Introduce command-line options and port re-mappingStefano Brivio2021-09-011-11/+6
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* checksum: Introduce AVX2 implementation, unify helpersStefano Brivio2021-07-261-2/+17
| | | | | | | | | | | | | | | | | | | Provide an AVX2-based function using compiler intrinsics for TCP/IP-style checksums. The load/unpack/add idea and implementation is largely based on code from BESS (the Berkeley Extensible Software Switch) licensed as 3-Clause BSD, with a number of modifications to further decrease pipeline stalls and to minimise cache pollution. This speeds up considerably data paths from sockets to tap interfaces, decreasing overhead for checksum computation, with 16-64KiB packet buffers, from approximately 11% to 7%. The rest is just syscalls at this point. While at it, provide convenience targets in the Makefile for avx2, avx2_debug, and debug targets -- these simply add target-specific CFLAGS to the build. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Add PASTA mode, major reworkStefano Brivio2021-07-171-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PASTA (Pack A Subtle Tap Abstraction) provides quasi-native host connectivity to an otherwise disconnected, unprivileged network and user namespace, similarly to slirp4netns. Given that the implementation is largely overlapping with PASST, no separate binary is built: 'pasta' (and 'passt4netns' for clarity) both link to 'passt', and the mode of operation is selected depending on how the binary is invoked. Usage example: $ unshare -rUn # echo $$ 1871759 $ ./pasta 1871759 # From another terminal # udhcpc -i pasta0 2>/dev/null # ping -c1 pasta.pizza PING pasta.pizza (64.190.62.111) 56(84) bytes of data. 64 bytes from 64.190.62.111 (64.190.62.111): icmp_seq=1 ttl=255 time=34.6 ms --- pasta.pizza ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 34.575/34.575/34.575/0.000 ms # ping -c1 spaghetti.pizza PING spaghetti.pizza(2606:4700:3034::6815:147a (2606:4700:3034::6815:147a)) 56 data bytes 64 bytes from 2606:4700:3034::6815:147a (2606:4700:3034::6815:147a): icmp_seq=1 ttl=255 time=29.0 ms --- spaghetti.pizza ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 28.967/28.967/28.967/0.000 ms This entails a major rework, especially with regard to the storage of tracked connections and to the semantics of epoll(7) references. Indexing TCP and UDP bindings merely by socket proved to be inflexible and unsuitable to handle different connection flows: pasta also provides Layer-2 to Layer-2 socket mapping between init and a separate namespace for local connections, using a pair of splice() system calls for TCP, and a recvmmsg()/sendmmsg() pair for UDP local bindings. For instance, building on the previous example: # ip link set dev lo up # iperf3 -s $ iperf3 -c ::1 -Z -w 32M -l 1024k -P2 | tail -n4 [SUM] 0.00-10.00 sec 52.3 GBytes 44.9 Gbits/sec 283 sender [SUM] 0.00-10.43 sec 52.3 GBytes 43.1 Gbits/sec receiver iperf Done. epoll(7) references now include a generic part in order to demultiplex data to the relevant protocol handler, using 24 bits for the socket number, and an opaque portion reserved for usage by the single protocol handlers, in order to track sockets back to corresponding connections and bindings. A number of fixes pertaining to TCP state machine and congestion window handling are also included here. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Introduce packet capture implementationStefano Brivio2021-05-211-2/+2
| | | | | | | | With -DDEBUG, passt now saves guest-side traffic captures in pcap format at /tmp/passt_<ISO8601 timestamp>.pcap. The timestamp refers to time and date of start-up. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* qrap: Find qemu command if not passed, patch command lineStefano Brivio2021-05-101-1/+1
| | | | | | | | | | | | | | It might be impractical to pass options to qrap when using libvirt, because the <emulator/> tag expects a path to an executable, without further arguments. If the first argument is not a plausible socket number, and the second argument is not a valid executable, look up a qemu command from a list of possible names, then start it patching the command line to include the -netdev fd= parameter corresponding to the AF_UNIX domain socket we just opened. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Introduce a DHCPv6 serverStefano Brivio2021-04-131-2/+2
| | | | | | | | | | | | This implementation, similarly to the IPv4 DHCP one, hands out a single address, which is the same as the upstream address for the host. This avoids the need for address translation as long as the client runs a DHCPv6 client. The NDP "Managed" flag is now set in Router Advertisements. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Introduce ICMP echo proxyStefano Brivio2021-03-181-2/+2
| | | | | | | | It's nice to be able to confirm connectivity using ICMP or ICMPv6 echo requests, and "ping" sockets on Linux (IPPROTO_ICMP datagram) allow us to do that without any special capability. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp: Add siphash implementation for initial sequence numbersStefano Brivio2021-03-171-2/+2
| | | | | | | | | | | | | Implement siphash routines for initial TCP sequence numbers (12 bytes input for IPv4, 36 bytes input for IPv6), and while at it, also functions we'll use later on for hash table indices and TCP timestamp offsets (with 8, 20, 32 bytes of input). Use these to set the initial sequence number, according to RFC 6528, for connections originating either from the tap device or from sockets. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: New design and implementation with native Layer 4 socketsStefano Brivio2021-02-161-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a reimplementation, partially building on the earlier draft, that uses L4 sockets (SOCK_DGRAM, SOCK_STREAM) instead of SOCK_RAW, providing L4-L2 translation functionality without requiring any security capability. Conceptually, this follows the design presented at: https://gitlab.com/abologna/kubevirt-and-kvm/-/blob/master/Networking.md The most significant novelty here comes from TCP and UDP translation layers. In particular, the TCP state and translation logic follows the intent of being minimalistic, without reimplementing a full TCP stack in either direction, and synchronising as much as possible the TCP dynamic and flows between guest and host kernel. Another important introduction concerns addressing, port translation and forwarding. The Layer 4 implementations now attempt to bind on all unbound ports, in order to forward connections in a transparent way. While at it: - the qemu 'tap' back-end can't be used as-is by qrap anymore, because of explicit checks now introduced in qemu to ensure that the corresponding file descriptor is actually a tap device. For this reason, qrap now operates on a 'socket' back-end type, accounting for and building the additional header reporting frame length - provide a demo script that sets up namespaces, addresses and routes, and starts the daemon. A virtual machine started in the network namespace, wrapped by qrap, will now directly interface with passt and communicate using Layer 4 sockets provided by the host kernel. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt: Add IPv6 and NDP support, further fixes for IPv4 CTStefano Brivio2021-02-161-2/+2
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* merd: Rename to PASSTStefano Brivio2021-02-161-6/+6
| | | | | | Plug A Simple Socket Transport. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* merd: ARP and DHCP handlers, connection tracking fixesStefano Brivio2021-02-161-2/+2
| | | | | | | With this, merd provides a fully functional IPv4 environment to guests, requiring a single capability, CAP_NET_RAW. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* merd: Switch to AF_UNIX for qemu tap, provide wrapperStefano Brivio2021-02-161-3/+6
| | | | | | | | | | | We can bypass a full-fledged network interface between qemu and merd by connecting the qemu tap file descriptor to a provided UNIX domain socket: this could be implemented in qemu eventually, qrap covers this meanwhile. This also avoids the need for the AF_PACKET socket towards the guest. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* merd: Initial importStefano Brivio2021-02-161-0/+10
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>