passt - Plug A Simple Socket Transport

	Commit message (Collapse)	Author	Age	Files	Lines
*	pasta: Improve error handling on failure to join network namespace	Stefano Brivio	2023-05-23	1	-4/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In pasta_wait_for_ns(), open() failing with ENOENT is expected: we're busy-looping until the network namespace appears. But any other failure is not something we're going to recover from: return right away if we don't get either success or ENOENT. Now that pasta_wait_for_ns() can actually fail, handle that in pasta_start_ns() by reporting the issue and exiting. Looping on EPERM, when pasta doesn't actually have the permissions to join a given namespace, isn't exactly a productive thing to do. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	netlink: Fix comment about response buffer size for nl_req()	Stefano Brivio	2023-05-23	1	-1/+1
\| \| \| \| \| \|	Fixes: fde8004ab0b4 ("netlink: Use 8 KiB * netlink message header size as response buffer") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	isolation: Initially Keep CAP_SETFCAP if running as UID 0 in non-init	Stefano Brivio	2023-05-23	1	-3/+14
\| \| \| \| \| \| \| \| \| \| \|	If pasta spawns a child process while running as UID 0, which is only allowed from a non-init namespace, we need to keep CAP_SETFCAP before pasta_start_ns() is called: otherwise, starting from Linux 5.12, we won't be able to update /proc/self/uid_map with the intended mapping (from 0 to 0). See user_namespaces(7). Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	pasta: Detach mount namespace, (re)mount procfs before spawning command	Stefano Brivio	2023-05-23	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we want /proc contents to be consistent after pasta spawns a child process in a new PID namespace (only for operation without a pre-existing namespace), we need to mount /proc after the clone(2) call with CLONE_NEWPID, and we enable the child to do that by passing, in the same call, the CLONE_NEWNS flag, as described by pid_namespaces(7). This is not really a remount: in fact, passing MS_REMOUNT to mount(2) would make the call fail. We're in another mount namespace now, so it's a fresh mount that has the effect of hiding the existing one. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	util, conf: Add and use ns_is_init() helper	Stefano Brivio	2023-05-23	3	-15/+28
\| \| \| \| \| \| \| \| \| \|	We'll need this in isolate_initial(). While at it, don't rely on BUFSIZ: the earlier issue we had with musl reminded me it's not a magic "everything will fit" value. Size the read buffer to what we actually need from uid_map, and check for the final newline too, because uid_map is organised in lines. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	tap: Don't update ip6.addr_seen to ::	David Gibson	2023-05-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we receive packets from the tap side, we update the addr_seen fields to reflect the last known address of the guest or ns. For ip4.addr_seen we, sensibly, only update if the address we've just seen isn't 0 (0.0.0.0). This case can occur during early DHCP transactions. We have no equivalent case for IPv6. We're less likely to hit this, because DHCPv6 uses link-local addresses, however we can see an source address of :: with certain multicast operations. This can bite us if we try to make an incoming connection very early after starting pasta with --config-net: we may have only seen some of those multicast packets, updated addr_seen to :: and not had any "real" packets to update it to a global address. I've seen this with some of the avocado test conversions. In any case, it can never make sense to update addr_seen to ::, so explicitly exclude that case. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	correct -6 option in manpage2023_05_09.96f8d55	lemmi	2023-05-09	1	-1/+1
\| \| \| \| \| \|	Signed-off-by: lemmi <lemmi@nerd2nerd.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	passt: Fix error check for signal(), improve error messages	Stefano Brivio	2023-04-13	1	-2/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Valtteri reports that if SIGPIPE already has a disposition set by the parent process, such as systemd with the default setting of IgnoreSIGPIPE=yes, signal() will return the previous value, not zero, and this is not an error: check for SIG_ERR instead. While at it, split messages for failures of sigaction() and signal(), and report the actual error. Reported-by: Valtteri Vuorikoski <vuori@notcom.org> Fixes: 8534be076c73 ("Catch failures when installing signal handlers") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	nstool: Enter holder's cwd when changing mount ns with nstool exec	David Gibson	2023-04-08	1	-1/+11
\| \| \| \| \| \| \| \| \| \|	If we enter a mount namespace with nstool exec our working directory will be changed to / in the new mount ns. This is surprising if we haven't actually altered any mounts yet in the new ns. Instead, change the working directory to match that of the holder process in this situation. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	nstool: Advertise the holder's cwd (in its mountns) across the socket	David Gibson	2023-04-08	1	-0/+4
\| \| \| \| \| \| \| \|	This is possible useful in nstool info and has further uses for nstool exec. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	test: Use "nstool exec" to slightly simplify tests	David Gibson	2023-04-08	3	-26/+23
\| \| \| \| \| \| \| \|	Using this, rather than using "nstool info" to get the pid then manually connecting with nsenter makes things a little simpler. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	test: Initialise ${TRACE} properly	David Gibson	2023-04-08	1	-0/+3
\| \| \| \| \| \| \| \| \| \|	Unlike ${DEBUG} we don't initialize ${TRACE} to 0 if not set, which cases failures when testing it later. That failure acts as though it is false, however it emits spurious errors in script.log, which can make it harder to spot real errors. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	nstool: Add --keep-caps option to nstool exec	David Gibson	2023-04-08	1	-9/+78
\| \| \| \| \| \| \| \|	This allows you to run commands within a user namespace with the privilege that comes from owning that userns. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	nstool: Add nstool exec command to execute commands in an nstool namespace	David Gibson	2023-04-08	1	-2/+137
\| \| \| \| \| \| \| \| \|	This combines nstool info -pw <sock> with nsenter with various options for a more convenient and less verbose of entering existing nstool managed namespaces. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	nstool: Helpers to iterate through namespace types	David Gibson	2023-04-08	1	-10/+12
\| \| \| \| \| \| \|	Will make things a bit less verbose in future. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	nstool: Add magic number to advertized information	David Gibson	2023-04-08	1	-0/+9
\| \| \| \| \| \| \| \|	So that we'll probably give a better error if you point it at something that's not an nstool hold control socket. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	nstool: Detect what namespaces target is in	David Gibson	2023-04-08	1	-14/+143
\| \| \| \| \| \| \| \| \| \|	Give nstool the ability to detect what namespaces the target process is in, relative to where it's called. That is, those namespace types for which the target is not in the same namespace as the caller. For now, just print this information with "info", which can be useful for debugging. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	nstool: Replace "pid" subcommand with "info" subcommand	David Gibson	2023-04-08	2	-17/+55
\| \| \| \| \| \| \| \| \|	The new subcommand gives more information about the holder process and its namespace, and may be further extended in future. Add some options which give the old behaviour for existing scripts. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	nstool: Split some command line parsing and socket setup to subcommands	David Gibson	2023-04-08	1	-34/+68
\| \| \| \| \| \| \| \|	This will make it easier to differentiate the options to those commands further in future. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	nstool: Move description of its operation modes from comment to usage	David Gibson	2023-04-08	1	-15/+11
\| \| \| \| \| \| \|	Easier to see it there. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	nstool: Reverse parameters to nstool	David Gibson	2023-04-08	2	-28/+28
\| \| \| \| \| \| \| \|	Having the "subcommand" first is more conventional and will make it more natural for future extensions I have planned. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	nstool: Rename nsholder to nstool	David Gibson	2023-04-08	4	-29/+29
\| \| \| \| \| \| \|	In preparation for extending what it does. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	test: Remove race between commands run in the same context	David Gibson	2023-04-08	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	context_run() has a race condition if two commands are run in close proximity (generally involving at least one in the background). Because we always use the same name for the temporary fifo files, if another command is issued while the fifos for the first still exist, mkfifo will fail, typically causing the entire test script to jam. Create unique names for the temporary fifos to avoid this problem. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	passt: Relicense to GPL 2.0, or any later version	Stefano Brivio	2023-04-06	120	-780/+236
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In practical terms, passt doesn't benefit from the additional protection offered by the AGPL over the GPL, because it's not suitable to be executed over a computer network. Further, restricting the distribution under the version 3 of the GPL wouldn't provide any practical advantage either, as long as the passt codebase is concerned, and might cause unnecessary compatibility dilemmas. Change licensing terms to the GNU General Public License Version 2, or any later version, with written permission from all current and past contributors, namely: myself, David Gibson, Laine Stump, Andrea Bolognani, Paul Holzinger, Richard W.M. Jones, Chris Kuhn, Florian Weimer, Giuseppe Scrivano, Stefan Hajnoczi, and Vasiliy Ulyanov. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	fedora: Adjust path for SELinux policy and interface file to latest guidelines2023_03_29.b10b983	Stefano Brivio	2023-03-29	1	-9/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Forget about: https://fedoraproject.org/wiki/SELinux_Policy_Modules_Packaging_Draft and: https://fedoraproject.org/wiki/PackagingDrafts/SELinux_Independent_Policy The guidelines to follow are: https://fedoraproject.org/wiki/SELinux/IndependentPolicy Start from fixing the most pressing issue, that is, a path conflict with policy-selinux-devel about passt.if, and, while at it, adjust the installation paths for policy files too. Reported-by: Xose Vazquez Perez <xose.vazquez@gmail.com> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2182476 Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	fedora: Don't install useless SELinux interface file for pasta	Stefano Brivio	2023-03-29	1	-2/+0
\| \| \| \| \| \| \| \| \| \|	That was meant to be an example, and I just dropped it in the previous commit -- passt.if should be more than enough as a possible example. Reported-by: Carl G. <carlg@fedoraproject.org> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2182145 Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	selinux: Drop useless interface file for pasta	Stefano Brivio	2023-03-29	1	-25/+0
\| \| \| \| \| \| \| \| \|	This was meant to be an example, but I managed to add syntax errors to it. Drop it altogether. Reported-by: Carl G. <carlg@fedoraproject.org> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2182145 Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	conf: Allow binding to ports on an interface without a specific address	Stefano Brivio	2023-03-29	2	-1/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Somebody might want to bind listening sockets to a specific interface, but not a specific address, and there isn't really a reason to prevent that. For example: -t %eth0/2022 Alternatively, we support options such as -t 0.0.0.0%eth0/2022 and -t ::%eth0/2022, but not together, for the same port. Enable this kind of syntax and add examples to the man page. Reported-by: Paul Holzinger <pholzing@redhat.com> Link: https://github.com/containers/podman/issues/14425#issuecomment-1485192195 Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	tcp: Clear ACK_FROM_TAP_DUE also on unchanged ACK sequence from peer	Stefano Brivio	2023-03-29	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since commit cc6d8286d104 ("tcp: Reset ACK_FROM_TAP_DUE flag only as needed, update timer"), we don't clear ACK_FROM_TAP_DUE whenever we process an ACK segment, but, more correctly, only if we're really not waiting for a further ACK segment, that is, only if the acknowledged sequence matches what we sent. In the new function implementing this, tcp_update_seqack_from_tap(), we also reset the retransmission counter and store the updated ACK sequence. Both should be done iff forward progress is acknowledged, implied by the fact that the new ACK sequence is greater than the one we previously stored. At that point, it looked natural to also include the statements that clear and set the ACK_FROM_TAP_DUE flag inside the same conditional block: if we're not making forward progress, the need for an ACK, or lack thereof, should remain unchanged. There might be cases where this isn't true, though: without the previous commit 4e73e9bd655c ("tcp: Don't special case the handling of the ack of a syn"), this would happen if a tap-side client initiated a connection, and the server didn't send any data. At that point we would never, in the established state of the connection, call tcp_update_seqack_from_tap() with reported forward progress. That issue itself is fixed by the previous commit, now, but clearing ACK_FROM_TAP_DUE only on ACK sequence progress doesn't really follow any logic. Clear the ACK_FROM_TAP_DUE flag regardless of reported forward progress. If we clear it when it's already unset, conn_flag() will do nothing with it. This doesn't fix any known functional issue, rather a conceptual one. Fixes: cc6d8286d104 ("tcp: Reset ACK_FROM_TAP_DUE flag only as needed, update timer") Reported-by: David Gibson <david@gibson.dropbear.id.au> Analysed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	tcp: Don't special case the handling of the ack of a syn	David Gibson	2023-03-29	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	TCP treats the SYN packets as though they occupied 1 byte in the logical data stream described by the sequence numbers. That is, the very first ACK (or SYN-ACK) each side sends should acknowledge a sequence number one greater than the initial sequence number given in the SYN or SYN-ACK it's responding to. In passt we were tracking that by advancing conn->seq_to_tap by one when we send a SYN or SYN-ACK (in tcp_send_flag()). However, we also initialized conn->seq_ack_from_tap, representing the acks we've already seen from the tap side, to ISN+1, meaning we treated it has having acknowledged the SYN before it actually did. There were apparently reasons for this in earlier versions, but it causes problems now. Because of this when we actually did receive the initial ACK or SYN-ACK, we wouldn't see the acknoweldged serial number as advancing, and so wouldn't clear the ACK_FROM_TAP_DUE flag. In most cases we'd get away because subsequent packets would clear the flag. However if one (or both) sides didn't send any data, the other side would (correctly) keep sending ISN+1 as the acknowledged sequence number, meaning we would never clear the ACK_FROM_TAP_DUE flag. That would mean we'd treat the connection as if we needed to retransmit (although we had 0 bytes to retransmit), and eventaully (after around 30s) reset the connection due to too many retransmits. Specifically this could cause the iperf3 throughput tests in the testsuite to fail if set for a long enough test period. Correct this by initializing conn->seq_ack_from_tap to the ISN and only advancing it when we actually get the first ACK (or SYN-ACK). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	tcp: Clarify allowed state for tcp_data_from_tap()	David Gibson	2023-03-29	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Comments suggest that this should only be called for an ESTABLISHED connection. However, it's non-trivial to ascertain that from the actual control flow in the caller. Add an ASSERT() to make it very clear that this is only called in ESTABLISHED state. In fact, there were some circumstances where it could be called on a CLOSED connection. In a sense that is "established", but with that assert this does require specific (trivial) handling to avoid a spurious abort(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	tcp: Don't reset ACK_TO_TAP_DUE on any ACK, reschedule timer as needed2023_03_21.1ee2f7c	Stefano Brivio	2023-03-21	1	-4/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is mostly symmetric with commit cc6d8286d104 ("tcp: Reset ACK_FROM_TAP_DUE flag only as needed, update timer"): we shouldn't reset the ACK_TO_TAP_DUE flag on any inbound ACK segment, but only once we acknowledge everything we received from the guest or the container. If we don't, a client might unnecessarily hold off further data, especially during slow start, and in general we won't converge to the usable bandwidth. This is very visible especially with traffic tests on links with non-negligible latency, such as in the reported issue. There, a public iperf3 server sometimes aborts the test due do what appears to be a low iperf3's --rcv-timeout (probably less than a second). Even if this doesn't happen, the throughput will converge to a fraction of the usable bandwidth. Clear ACK_TO_TAP_DUE if we acknowledged everything, set it if we didn't, and reschedule the timer in case the flag is still set as the timer expires. While at it, decrease the ACK timer interval to 10ms. A 50ms interval is short enough for any bandwidth-delay product I had in mind (local connections, or non-local connections with limited bandwidth), but here I am, testing 1gbps transfers to a peer with 100ms RTT. Indeed, we could eventually make the timer interval dependent on the current window and estimated bandwidth-delay product, but at least for the moment being, 10ms should be long enough to avoid any measurable syscall overhead, yet usable for any real-world application. Reported-by: Lukas Mrtvy <lukas.mrtvy@gmail.com> Link: https://bugs.passt.top/show_bug.cgi?id=44 Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	tcp: When a connection flag it set, don't negate it for debug print	Stefano Brivio	2023-03-21	1	-1/+1
\| \| \| \| \| \| \| \| \|	Fix a copy and paste typo I added in commit 5474bc5485d8 ("tcp, tcp_splice: Get rid of false positive CWE-394 Coverity warning from fls()") and --debug altogether. Fixes: 5474bc5485d8 ("tcp, tcp_splice: Get rid of false positive CWE-394 Coverity warning from fls()") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	Fix false positive if cppcheck doesn't give a false positive	David Gibson	2023-03-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	da46fdac "tcp: Suppress knownConditionTrueFalse cppcheck false positive" introduced a suppression to work around a cppcheck bug causing a false positive warning. However, the suppression will itself cause a spurious unmatchedSuppression warning if used with a version of cppcheck from before the bug was introduced. That includes the packaged version of cppcheck in Fedora. Suppress the unmatchedSuppression as well. Fixes: da46fdac3605 ("tcp: Suppress knownConditionTrueFalse cppcheck false positive") Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	Work around weird false positives with cppcheck-2.9.1	David Gibson	2023-03-21	6	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 89e38f55 "treewide: Fix header includes to build with musl" added extra #includes to work with musl. Unfortunately with the cppcheck version I'm using (cppcheck-2.9-1.fc37.x86_64 in Fedora 37) this causes weird false positives: specifically cppcheck seems to hit a #error in <bits/unistd.h> complaining about including it directly instead of via <unistd.h> (which is not something we're doing). I have no idea why that would be happening; but I'm guessing it has to be a bug in the cpp implementation in that cppcheck version. In any case, it's possible to work around this by moving the include of <unistd.h> before the include of <signal.h>. So, do that. Fixes: 89e38f55405d ("treewide: Fix header includes to build with musl") Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp: Actually bind detected namespace ports in init namespace	Stefano Brivio	2023-03-21	1	-1/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When I reworked udp_init() to move most of the port binding logic to conf_ports, I accidentally dropped this bit of automatic port detection (and binding) at start-up. On -U auto, in pasta mode, udp_sock_init_ns() binds ports in the namespace that correspond to ports bound in the init namespace, but on -u auto, nothing actually happens after port detection. Add udp_sock_init_init() to deal with this, and while at it fix the comment to udp_sock_init_ns(): the latter takes care of outbound "connections". This is currently not covered by tests, and the UDP port needs to be already bound in the namespace when pasta starts (periodic detection for UDP is a missing feature at the moment). It can be checked like this: $ unshare -rUn # echo $$ 590092 # socat -u UDP-LISTEN:5555 STDOUT $ pasta -q -u auto 590092 $ echo "test" \| socat -u STDIN UDP:localhost:5555 Reported-by: Paul Holzinger <pholzing@redhat.com> Fixes: 3c6ae625101a ("conf, tcp, udp: Allow address specification for forwarded ports") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	pasta: fix tcp port forwarding in auto mode	Paul Holzinger	2023-03-21	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The logic in tcp_timer() was inverted. fwd_out should expose the host ports in the ns. Therfore it must read the ports on the host and then bind them in the netns. The same for fwd_in which checks ports in the ns and then exposes them on the host. Note that this only fixes tcp ports, udp does not seems to work at all right now with the auto mode. Signed-off-by: Paul Holzinger <pholzing@redhat.com> Fixes: 1128fa03fe73 ("Improve types and names for port forwarding configuration") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	fedora: Refresh SELinux labels in scriptlets, require -selinux package2023_03_17.dd23496	Stefano Brivio	2023-03-17	1	-5/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead of: https://fedoraproject.org/wiki/SELinux_Policy_Modules_Packaging_Draft follow this: https://fedoraproject.org/wiki/PackagingDrafts/SELinux_Independent_Policy which seems to make more sense and fixes the issue that, on a fresh install, without a reboot, the file contexts for the binaries are not actually updated. In detail: - labels are refreshed using the selinux_relabel_pre and selinux_relabel_post on install, upgrade, and uninstall - use the selinux_modules_install and selinux_modules_uninstall macros, instead of calling 'semodule' directly (no functional changes in our case) - require the -selinux package on SELinux-enabled environments and if the current system policy is "targeted" Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	Makefile: Enable external override for TARGET	Stefano Brivio	2023-03-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A cross-architecture build might pass a target-specific CC on 'make', and not on 'make install', and this is what happens in Debian cross-qa tests. Given that we select binaries to be installed depending on the target architecture, this means we would build AVX2 binaries in any case on a x86_64 build machine. By overriding TARGET in package build rules, we can tell the Makefile about the target architecture, also for the 'install' (Makefile) target. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	passt.1: Fix description of --mtu option	Stefano Brivio	2023-03-17	1	-2/+4
\| \| \| \| \| \| \| \|	By default, 65520 bytes are advertised, and zero disables DHCP and NDP options. Fixes: ec2b58ea4dc4 ("conf, dhcp, ndp: Fix message about default MTU, make NDP consistent") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	log: Avoid time_t/__syscall_slong_t format mismatch with long int on X32 ABI	Stefano Brivio	2023-03-17	1	-10/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On X32 (ILP32 using AMD64 system call ABI) and glibc, struct timespec::tv_nsec is __syscall_slong_t and not a long int, see also https://sourceware.org/bugzilla/show_bug.cgi?id=16437 and timespec(3type). Fine, we could cast that down to long and be done with it. But it turns out that also time_t (not guaranteed to be equivalent to any type) is a long long int, and there we can't downcast. To keep it simple, cast both to long long int, and change formats to %lli, to avoid format warnings from gcc. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	fedora: Install SELinux interface files to shared include directory2023_03_10.70c0765	Stefano Brivio	2023-03-10	1	-0/+4
\| \| \| \| \|	Link: https://github.com/fedora-selinux/selinux-policy/pull/1613 Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	contrib/selinux: Split interfaces into smaller bits	Stefano Brivio	2023-03-10	1	-10/+61
\| \| \| \| \| \| \|	...to fit accepted Fedora practices. Link: https://github.com/fedora-selinux/selinux-policy/pull/1613 Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	contrib/selinux: Drop unused passt_read_data() interface	Stefano Brivio	2023-03-10	1	-8/+0
\| \| \| \|	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	contrib/selinux: Drop "example" from headers: this is the actual policy	Stefano Brivio	2023-03-10	6	-6/+6
\| \| \| \|	Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	README: Update Features section, plus minor improvements2023_03_09.7c7625d	Stefano Brivio	2023-03-09	1	-47/+28
\| \| \| \| \| \| \|	...it's been a while. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	contrib: Drop libvirt out-of-tree patch, integration mostly works in 9.1.0	Stefano Brivio	2023-03-09	1	-427/+0
\| \| \| \| \| \| \| \|	...and in any case, this patch doesn't offer any advantage over the current upstream integration. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	contrib: Drop QEMU out-of-tree patches	Stefano Brivio	2023-03-09	2	-208/+0
\| \| \| \| \| \| \|	Native support was introduced with commit 13c6be96618c, QEMU 7.2. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	contrib: Drop Podman out-of-tree patch, integration is upstream now	Stefano Brivio	2023-03-09	1	-605/+0
\| \| \| \| \| \| \| \|	See https://github.com/containers/podman/pull/16141, shipped in Podman 4.4. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
*	tcp: Clamp MSS value when queueing data to tap, also for pasta	Stefano Brivio	2023-03-09	1	-14/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Tom reports that a pattern of repated ~1 MiB chunks downloads over NNTP over TLS, on Podman 4.4 using pasta as network back-end, results in pasta taking one full CPU thread after a while, and the download never succeeds. On that setup, we end up re-sending the same frame over and over, with a consistent 65 534 bytes size, and never get an acknowledgement from the tap-side client. This only happens for the default MTU value (65 520 bytes) or for values that are slightly smaller than that (down to 64 499 bytes). We hit this condition because the MSS value we use in tcp_data_from_sock(), only in pasta mode, is simply clamped to USHRT_MAX, and not to the actual size of the buffers we pre-cooked for sending, which is a bit less than that. It looks like we got away with it until commit 0fb7b2b9080a ("tap: Use different io vector bases depending on tap type") fixed the setting of iov_len. Luckily, since it's pasta, we're queueing up to two frames at a time, so the worst that can happen is a badly segmented TCP stream: we always have some space at the tail of the buffer. Clamp the MSS value to the appropriate maximum given by struct tcp{4,6}_buf_data_t, no matter if we're running in pasta or passt mode. While at it, fix the comments to those structs to reflect the current struct size. This is not really relevant for any further calculation or consideration, but it's convenient to know while debugging this kind of issues. Thanks to Tom for reporting the issue in a very detailed way and for providing a test setup. Reported-by: Tom Mombourquette <tom@devnode.com> Link: https://github.com/containers/podman/issues/17703 Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>