aboutgitcodebugslistschat
Commit message (Collapse)AuthorAgeFilesLines
* tcp_splice: Allow splicing of connections from IPv4-mapped loopbackDavid Gibson2022-11-251-8/+11
| | | | | | | | | | | For non-spliced connections we now treat IPv4-mapped IPv6 addresses the same as the corresponding IPv4 addresses. However currently we won't splice a connection from ::ffff:127.0.0.1 the way we would one from 127.0.0.1. Correct this so that we can splice connections from IPv4 localhost that have been received on an IPv6 dual stack socket. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp: NAT IPv4-mapped IPv6 addresses like IPv4 addressesDavid Gibson2022-11-252-38/+59
| | | | | | | | | | | | | | | | | | | passt usually doesn't NAT, but it does do so for the remapping of the gateway address to refer to the host. Currently we perform this NAT with slightly different rules on both IPv4 addresses and IPv6 addresses, but not on IPv4-mapped IPv6 addresses. This means we won't correctly handle the case of an IPv4 connection over an IPv6 socket, which is possible on Linux (and probably other platforms). Refactor tcp_conn_from_sock() to perform the NAT after converting either address family into an inany_addr, so IPv4 and and IPv4-mapped addresses have the same representation. With two new helpers this lets us remove the IPv4 and IPv6 specific paths from tcp_conn_from_sock(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp: Remove v6 flag from tcp_epoll_refDavid Gibson2022-11-253-13/+7
| | | | | | | | | | | | | | | | | | | This bit in the TCP specific epoll reference indicates whether the connection is IPv6 or IPv4. However the sites which refer to it are already calling accept() which (optionally) returns an address for the remote end of the connection. We can use the sa_family field in that address to determine the connection type independent of the epoll reference. This does have a cost: for the spliced case, it means we now need to get that address from accept() which introduces an extran copy_to_user(). However, in future we want to allow handling IPv4 connectons through IPv6 sockets, which means we won't be able to determine the IP version at the time we create the listening socket and epoll reference. So, at some point we'll have to pay this cost anyway. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp: Fix small errors in tcp_seq_init() time handlingDavid Gibson2022-11-251-2/+2
| | | | | | | | | | | | | | | | | | | It looks like tcp_seq_init() is supposed to advance the sequence number by one every 32ns. However we only right shift the ns part of the timespec not the seconds part, meaning that we'll advance by an extra 32 steps on each second. I don't know if that's exploitable in any way, but it doesn't appear to be the intent, nor what RFC 6528 suggests. In addition, we convert from seconds to nanoseconds with a multiplication by '1E9'. In C '1E9' is a floating point constant, forcing a conversion to floating point and back for what should be an integer calculation (confirmed with objdump and Makefile default compiler flags). Spell out 1000000000 in full to avoid that. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp: Have tcp_seq_init() take its parameters from struct tcp_connDavid Gibson2022-11-251-26/+10
| | | | | | | | | | | tcp_seq_init() takes a number of parameters for the connection, but at every call site, these are already populated in the tcp_conn structure. Likewise we always store the result into the @seq_to_tap field. Use this to simplify tcp_seq_init(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp: Unify initial sequence number calculation for IPv4 and IPv6David Gibson2022-11-252-28/+19
| | | | | | | | | | | | | tcp_seq_init() has separate paths for IPv4 and IPv6 addresses, which means we will calculate different sequence numbers for IPv4 and equivalent IPv4-mapped IPv6 addresses. Change it to treat these the same by always converting the input address into an inany_addr representation and use that to calculate the sequence number. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp: Simplify tcp_hash_match() to take an inany_addrDavid Gibson2022-11-251-12/+4
| | | | | | | | | | | tcp_hash_match() can take either an IPv4 (struct in_addr) or IPv6 (struct in6_addr) address. It has two different paths for each of those cases. However, its only caller has already constructed an equivalent inany representation of the address, so we can have tcp_hash_match take that directly and use a simpler comparison with the inany_equals() helper. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp: Take tcp_hash_insert() address from struct tcp_connDavid Gibson2022-11-251-12/+5
| | | | | | | | | | | | | tcp_hash_insert() takes an address to control which hash bucket the connection will go into. However, an inany_addr representation of that address is already stored in struct tcp_conn. Now that we've made the hashing of IPv4 and IPv4-mapped IPv6 addresses equivalent, we can simplify tcp_hash_insert() to use the address in struct tcp_conn, rather than taking it as an extra parameter. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp: Hash IPv4 and IPv4-mapped-IPv6 addresses the sameDavid Gibson2022-11-252-34/+19
| | | | | | | | | | | | | | | | | In the tcp_conn structure, we represent the address with an inany_addr which could be an IPv4 or IPv6 address. However, we have different paths which will calculate different hashes for IPv4 and equivalent IPv4-mapped IPv6 addresses. This will cause problems for some future changes. Make the hash function work the same for these two cases, by taking an inany_addr directly. Since this represents IPv4 and IPv4-mapped IPv6 addresses the same way, it will trivially hash the same for both cases. Callers are changed to construct an inany_addr from whatever they have. Some of that will be elided in later changes. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* inany: Helper functions for handling addresses which could be IPv4 or IPv6David Gibson2022-11-255-61/+117
| | | | | | | | | | | | | | struct tcp_conn stores an address which could be IPv6 or IPv4 using a union. We can do this without an additional tag by encoding IPv4 addresses as IPv4-mapped IPv6 addresses. This approach is useful wider than the specific place in tcp_conn, so expose a new 'union inany_addr' like this from a new inany.h. Along with that create a number of helper functions to make working with these "inany" addresses easier. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp: Don't store hash bucket in connection structuresDavid Gibson2022-11-252-10/+24
| | | | | | | | | | | | | | | Currently when we insert a connection into the hash table, we store its bucket number so we can find it when removing entries. However, we can recompute the hash value from other contents of the structure so we don't need to store it. This brings the size of tcp_tap_conn down to 64 bytes again, which means it will fit in a single cacheline on common machines. This change also removes a non-obvious constraint that the hash table have less than twice TCP_MAX_CONNS buckets, because of the way TCP_HASH_BUCKET_BITS was constructed. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp: Remove splice from tcp_epoll_refDavid Gibson2022-11-254-46/+46
| | | | | | | | | | | | Currently the epoll reference for tcp sockets includes a bit indicating whether the socket maps to a spliced connection. However, the reference also has the index of the connection structure which also indicates whether it is spliced. We can therefore avoid the splice bit in the epoll_ref by unifying the first part of the non-spliced and spliced handlers where we look up the connection state. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp: Use the same sockets to listen for spliced and non-spliced connectionsDavid Gibson2022-11-253-102/+56
| | | | | | | | | | | | | | | | | | | | In pasta mode, tcp_sock_init[46]() create separate sockets to listen for spliced connections (these are bound to localhost) and non-spliced connections (these are bound to the host address). This introduces a subtle behavioural difference between pasta and passt: by default, pasta will listen only on a single host address, whereas passt will listen on all addresses (0.0.0.0 or ::). This also prevents us using some additional optimizations that only work with the unspecified (0.0.0.0 or ::) address. However, it turns out we don't need to do this. We can splice a connection if and only if it originates from the loopback address. Currently we ensure this by having the "spliced" listening sockets listening only on loopback. Instead, defer the decision about whether to splice a connection until after accept(), by checking if the connection was made from the loopback address. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp: Unify part of spliced and non-spliced conn_from_sock pathDavid Gibson2022-11-253-50/+77
| | | | | | | | | | | | | | In tcp_sock_handler() we split off to handle spliced sockets before checking anything else. However the first steps of the "new connection" path for each case are the same: allocate a connection entry and accept() the connection. Remove this duplication by making tcp_conn_from_sock() handle both spliced and non-spliced cases, with help from more specific tcp_tap_conn_from_sock and tcp_splice_conn_from_sock functions for the later stages which differ. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp: Separate helpers to create ns listening socketsDavid Gibson2022-11-253-49/+93
| | | | | | | | | | | | | | | | | tcp_sock_init*() can create either sockets listening on the host, or in the pasta network namespace (with @ns==1). There are, however, a number of differences in how these two cases work in practice though. "ns" sockets are only used in pasta mode, and they always lead to spliced connections only. The functions are also only ever called in "ns" mode with a NULL address and interface name, and it doesn't really make sense for them to be called any other way. Later changes will introduce further differences in behaviour between these two cases, so it makes more sense to use separate functions for creating the ns listening sockets than the regular external/host listening sockets. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp: Unify the IN_EPOLL flagDavid Gibson2022-11-253-19/+19
| | | | | | | | | | | | There is very little common between the tcp_tap_conn and tcp_splice_conn structures. However, both do have an IN_EPOLL flag which has the same meaning in each case, though it's stored in a different location. Simplify things slightly by moving this bit into the common header of the two structures. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp: Partially unify tcp_timer() and tcp_splice_timer()David Gibson2022-11-254-42/+37
| | | | | | | | | | These two functions scan all the non-splced and spliced connections respectively and perform timed updates on them. Avoid scanning the now unified table twice, by having tcp_timer scan it once calling the relevant per-connection function for each one. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp: Unify tcp_defer_handler and tcp_splice_defer_handler()David Gibson2022-11-254-33/+13
| | | | | | | | | | | These two functions each step through non-spliced and spliced connections respectively and clean up entries for closed connections. To avoid scanning the connection table twice, we merge these into a single function which scans the unified table and performs the appropriate sort of cleanup action on each one. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp: Unify spliced and non-spliced connection tablesDavid Gibson2022-11-255-71/+51
| | | | | | | | | | | | | | | | | | | | | | | Currently spliced and non-spliced connections are stored in completely separate tables, so there are completely independent limits on the number of spliced and non-spliced connections. This is a bit counter-intuitive. More importantly, the fact that the tables are separate prevents us from unifying some other logic between the two cases. So, merge these two tables into one, using the 'c.spliced' common field to distinguish between them when necessary. For now we keep a common limit of 128k connections, whether they're spliced or non-spliced, which means we save memory overall. If necessary we could increase this to a 256k or higher total, which would cost memory but give some more flexibility. For now, the code paths which need to step through all extant connections are still separate for the two cases, just skipping over entries which aren't for them. We'll improve that in later patches. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp: Improved helpers to update connections after movingDavid Gibson2022-11-252-10/+23
| | | | | | | | | | When we compact the connection tables (both spliced and non-spliced) we need to move entries from one slot to another. That requires some updates in the entries themselves. Add helpers to make all the necessary updates for the spliced and non-spliced cases. This will simplify later cleanups. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp: Add connection union typeDavid Gibson2022-11-253-0/+36
| | | | | | | | | | | | | | | | Currently, the tables for spliced and non-spliced connections are entirely separate, with different types in different arrays. We want to unify them. As a first step, create a union type which can represent either a spliced or non-spliced connection. For them to be distinguishable, the individual types need to have a common header added, with a bit indicating which type this structure is. This comes at the cost of increasing the size of tcp_tap_conn to over one (64 byte) cacheline. This isn't ideal, but it makes things simpler for now and we'll re-optimize this later. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp: Move connection state structures into a shared headerDavid Gibson2022-11-254-225/+245
| | | | | | | | | | | | | | | Currently spliced and non-spliced connections use completely independent tracking structures. We want to unify these, so as a preliminary step move the definitions for both variants into a new tcp_conn.h header, shared by tcp.c and tcp_splice.c. This requires renaming some #defines with the same name but different meanings between the two cases. In the process we correct some places that are slightly out of sync between the comments and the code for various event bit names. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp_splice: Helpers for converting from index to/from tcp_splice_connDavid Gibson2022-11-251-18/+25
| | | | | | | | | | | Like we already have for non-spliced connections, create a CONN_IDX() macro for looking up the index of spliced connection structures. Change the name of the array of spliced connections to be different from that for non-spliced connections (even though they're in different modules). This will make subsequent changes a bit safer. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp: Better helpers for converting between connection pointer and indexDavid Gibson2022-11-251-38/+45
| | | | | | | | | | | | | | The macro CONN_OR_NULL() is used to look up connections by index with bounds checking. Replace it with an inline function, which means: - Better type checking - No danger of multiple evaluation of an @index with side effects Also add a helper to perform the reverse translation: from connection pointer to index. Introduce a macro for this which will make later cleanups easier and safer. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp: Remove unused TCP_MAX_SOCKS constantDavid Gibson2022-11-251-1/+0
| | | | | | | Presumably it meant something in the past, but it's no longer used. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tcp_splice: #include tcp_splice.h in tcp_splice.cDavid Gibson2022-11-252-4/+1
| | | | | | | | | This obvious include was omitted, which means that declarations in the header weren't checked against definitions in the .c file. This shows up an old declaration for a function that is now static, and a duplicate Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* style: Minor corrections to function commentsDavid Gibson2022-11-252-6/+6
| | | | | | | Some style issues and a typo. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* clang-tidy: Suppress warning about assignments in if statementsDavid Gibson2022-11-251-0/+5
| | | | | | | | | | | | clang-tools 15.0.0 appears to have added a new warning that will always complain about assignments in if statements, which we use in a number of places in passt/pasta. Encountered on Fedora 37 with clang-tools-extra-15.0.0-3.fc37.x86_64. Suppress the new warning so that we can compile and test. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* README: Add link to weekly development meetingStefano Brivio2022-11-211-0/+4
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* README: Fix left-over and indentation for Podman example commandStefano Brivio2022-11-161-5/+3
| | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* README: The upcoming version of Podman adds support for pasta2022_11_16.ace074cStefano Brivio2022-11-161-3/+7
| | | | | | | Don't update the demo yet: we could just wait until Podman 4.3.2 is out and packaged. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* util, pasta: Add do_clone() wrapper around __clone2() and clone()Stefano Brivio2022-11-163-8/+31
| | | | | | | | | | | | | | | Spotted in Debian's buildd logs: on ia64, clone(2) is not available: the glibc wrapper is named __clone2() and it takes, additionally, the size of the stack area passed by the caller. Add a do_clone() wrapper handling the different cases, and also taking care of pointing the child's stack in the middle of the allocated area: on PA-RISC (hppa), handled by clone(), the stack grows up, and on ia64 the stack grows down, but the register backing store grows up -- and I think it might be actually used here. Suggested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test/lib/test: Clean up iperf3 JSON files before starting the serverStefano Brivio2022-11-161-2/+2
| | | | | | | | ...instead of doing it after the test. Now that we have pre-built guest images, we might also have old JSON files from previous, interrupted test runs. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tap: Revert recently added checks in tap_handler_passt()Stefano Brivio2022-11-161-16/+4
| | | | | | | | | | | | | | | This reverts commit 198f87835dc4 ("tap: Return -EIO from tap_handler_passt() on inconsistent packet stream") and commit 510dace86ccf ("tap: Keep stream consistent if qemu length descriptor spans two recv() calls"). I can hit occasional failures in perf/passt_tcp tests where we seem to be getting excess data at the end of a recv(), and for some reason I couldn't figure out yet, if we just ignore it, subsequent recv() calls from qemu return correct data. If we close the connection, qemu can't talk to us anymore, of course. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* arp, tap, util: Don't use perror() after seccomp filter is installedStefano Brivio2022-11-163-8/+10
| | | | | | | | | | | | | | | If stderr is closed, after we fork to background, glibc's implementation of perror() will try to re-open it by calling dup(), upon which the seccomp filter causes the process to terminate, because dup() is not included in the list of allowed syscalls. Replace perror() calls that might happen after isolation_postfork(). We could probably replace all of them, but early ones need a bit more attention as we have to check whether log.c functions work in early stages. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* Remove contrib/debian, Debian package development now happens on SalsaStefano Brivio2022-11-166-63/+0
| | | | | | | | | The development of the Debian package is now at: https://salsa.debian.org/sbrivio/passt Drop contrib/debian, it's finally obsolete. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* contrib/apparmor: Merge pasta and passt profiles, update rulesStefano Brivio2022-11-162-88/+51
| | | | | | | | | | | | | AppArmor resolves executable links before profile attachment rules are evaluated, so, as long as pasta is installed as a link to passt, there's no way to differentiate the two cases. Merge the two profiles and leave a TODO note behind, explaining two possible ways forward. Update the rules so that passt and pasta are actually usable, once the profile is installed. Most required changes are related to isolation and sandboxing features. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* README: Add links to Debian package trackerStefano Brivio2022-11-161-7/+10
| | | | | Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* Makefile: Change HPPA into PARISC while building PASST_AUDIT_ARCHStefano Brivio2022-11-161-0/+1
| | | | | | | | | | | The AUDIT_ARCH defines in seccomp.h corresponding to HPPA are AUDIT_ARCH_PARISC and AUDIT_ARCH_PARISC64. Build error spotted in Debian's buildd log on phantom.physik.fu-berlin.de. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* Makefile: It's AUDIT_ARCH_MIPSEL64, not AUDIT_ARCH_MIPS64ELStefano Brivio2022-11-161-0/+1
| | | | | | | | | | | | On mips64el, gcc -dumpmachine correctly reports mips64el as architecture prefix, but for some reason seccomp.h defines AUDIT_ARCH_MIPSEL64 and not AUDIT_ARCH_MIPS64EL. Mangle AUDIT_ARCH accordingly. Build error spotted in Debian's buildd logs from Loongson build. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* Makefile: Don't filter out -O2 from supplied flags for AVX2 buildsStefano Brivio2022-11-161-1/+1
| | | | | | | | | | Drop it from the internal FLAGS variable, but honour -O2 if passed in CFLAGS. In Debian packages, dpkg-buildflags uses it as hardening flag, and we get a QA warning if we drop it: https://qa.debian.org/bls/bytag/W-dpkg-buildflags-missing.html Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* Makefile: Honour passed CPPFLAGS, not just CFLAGSStefano Brivio2022-11-161-7/+7
| | | | | | | | | | | CPPFLAGS allow the user to pass pre-processor flags. This is unlikely to be needed at the moment, but the Debian Hardening Walkthrough reasonably requests it to be handled in order to fully support hardened build flags: https://wiki.debian.org/HardeningWalkthrough#Handling_dpkg-buildflags_in_your_upstream_build_system Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* conf, udp: Drop mostly duplicated dns_send arrays, rename related fieldsStefano Brivio2022-11-166-71/+62
| | | | | | | | | | | | | | | | | | | | | Given that we use just the first valid DNS resolver address configured, or read from resolv.conf(5) on the host, to forward DNS queries to, in case --dns-forward is used, we don't need to duplicate dns[] to dns_send[]: - rename dns_send[] back to dns[]: those are the resolvers we advertise to the guest/container - for forwarding purposes, instead of dns[], use a single field (for each protocol version): dns_host - and rename dns_fwd to dns_match, so that it's clear this is the address we are matching DNS queries against, to decide if they need to be forwarded Suggested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* conf: Fix mask calculation from prefix_len in conf_print()2022_11_10.4129764Stefano Brivio2022-11-101-1/+4
| | | | | | | Reported-by: Paul Holzinger <pholzing@redhat.com> Fixes: dd09cceaee21 ("Minor improvements to IPv4 netmask handling") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* tcp, udp: Don't initialise IPv6/IPv4 sockets if IPv4/IPv6 are not enabledStefano Brivio2022-11-102-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | If we disable a given IP version automatically (no corresponding default route on host) or administratively (--ipv4-only or --ipv6-only options), we don't initialise related buffers and services (DHCP for IPv4, NDP and DHCPv6 for IPv6). The "tap" handlers will also ignore packets with a disabled IP version. However, in commit 3c6ae625101a ("conf, tcp, udp: Allow address specification for forwarded ports") I happily changed socket initialisation functions to take AF_UNSPEC meaning "any enabled IP version", but I forgot to add checks back for the "enabled" part. Reported by Paul: on a host without default IPv6 route, but IPv6 enabled, connect, using IPv6, to a port handled by pasta, which tries to send data to a tap device without initialised buffers for that IP version and exits because the resulting write() fails. Simpler way to reproduce: pasta -6 and inbound IPv4 connection, or pasta -4 and inbound IPv6 connection. Reported-by: Paul Holzinger <pholzing@redhat.com> Fixes: 3c6ae625101a ("conf, tcp, udp: Allow address specification for forwarded ports") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
* passt: Move __setlogmask() calls before output unrelated to configurationStefano Brivio2022-11-101-8/+8
| | | | | | | | | | | ...so that we avoid printing some lines twice because log-level is still set to LOG_EMERG, as if logging configuration didn't happen yet. While at it, note that logging to stderr doesn't really depend on whether debug mode is enabled or not. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tap: Return -EIO from tap_handler_passt() on inconsistent packet streamStefano Brivio2022-11-101-2/+2
| | | | | | | | While it's important to fail in that case, it makes little sense to fail quietly: it's better to tell qemu explicitly that something went wrong and that we won't recover, by closing the socket. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* tap: Keep stream consistent if qemu length descriptor spans two recv() callsStefano Brivio2022-11-101-3/+15
| | | | | | | | | | | | | | | | | | | | | | | | | I got all paranoid after triggering a divide-by-zero general protection fault in passt with a qemu version without the virtio_net TX hang fix, while flooding UDP. I start thinking this was actually coming from some random changes I was playing with, but before reaching this conclusion I reviewed once more the relatively short path in tap_handler_passt() before we start using packet_*() functions, and found this. Never observed in practice, but artificially reproduced with changes in qemu's socket interface: if we don't receive from qemu a complete length descriptor in one recv() call, or if we receive a partial one at the end of one call, we currently disregard the rest, which would make the stream inconsistent. Nothing really bad happens, except that from that point on we would disregard all the packets we get until, if ever, we get the stream back in sync by chance. Force reading a complete packet length descriptor with a blocking recv(), if needed -- not just a complete packet later. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* test/memory/passt: Change passt.avx2 path to /bin in test itself2022_11_04.e308018Stefano Brivio2022-11-041-2/+2
| | | | | | | | Now that we install the binary in /bin, and we have a link from /usr/bin, change the path in the test itself as well. Otherwise it works with bash but not with dash for some reason. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
* passt, qrap, README: Update notes and documentation for AF_UNIX support in qemuStefano Brivio2022-11-045-28/+18
| | | | | | | | We can't get rid of qrap quite yet, but at least we should start telling users it's not going to be needed anymore starting from qemu 7.2. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>