aboutgitcodebugslistschat
diff options
context:
space:
mode:
authorStefano Brivio <sbrivio@redhat.com>2024-05-08 09:25:24 +0200
committerStefano Brivio <sbrivio@redhat.com>2024-05-10 16:52:29 +0200
commit7e6a606c32341c81b0889a6791ec12e418a4eeec (patch)
treec07c7f14976fd78f9bd56d02581509ecc565e275
parent1ba76c9e8c14b21d8c2c7cb71abdaa85feb96605 (diff)
downloadpasst-7e6a606c32341c81b0889a6791ec12e418a4eeec.tar
passt-7e6a606c32341c81b0889a6791ec12e418a4eeec.tar.gz
passt-7e6a606c32341c81b0889a6791ec12e418a4eeec.tar.bz2
passt-7e6a606c32341c81b0889a6791ec12e418a4eeec.tar.lz
passt-7e6a606c32341c81b0889a6791ec12e418a4eeec.tar.xz
passt-7e6a606c32341c81b0889a6791ec12e418a4eeec.tar.zst
passt-7e6a606c32341c81b0889a6791ec12e418a4eeec.zip
tcp_splice: Set OUT_WAIT_ flag whenever pipe isn't emptied
In tcp_splice_sock_handler(), if we get EAGAIN on the second splice(), from pipe to receiving socket, that doesn't necessarily mean that the pipe is empty: the receiver buffer might be full instead. Hence, we can't use the 'never_read' flag to decide that there's nothing to wait for: even if we didn't read anything from the sending side in a given iteration, we might still have data to send in the pipe. Use read/written counters, instead. This fixes an issue where large bulk transfers would occasionally hang. From a corresponding strace: 0.000061 epoll_wait(4, [{events=EPOLLOUT, data={u32=29442, u64=12884931330}}], 8, 1000) = 1 0.005003 epoll_ctl(4, EPOLL_CTL_MOD, 211, {events=EPOLLIN|EPOLLRDHUP, data={u32=54018, u64=8589988610}}) = 0 0.000089 epoll_ctl(4, EPOLL_CTL_MOD, 115, {events=EPOLLIN|EPOLLRDHUP, data={u32=29442, u64=12884931330}}) = 0 0.000081 splice(211, NULL, 151, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable) 0.000073 splice(150, NULL, 115, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 1048576 0.000087 splice(211, NULL, 151, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable) 0.000045 splice(150, NULL, 115, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 520415 0.000060 splice(211, NULL, 151, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable) 0.000044 splice(150, NULL, 115, NULL, 1048576, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable) 0.000044 epoll_wait(4, [], 8, 1000) = 0 we're reading from socket 211 into to the pipe end numbered 151, which connects to pipe end 150, and from there we're writing into socket 115. We initially drop EPOLLOUT from the set of monitored flags for socket 115, because it already signaled it's ready for output. Then we read nothing from socket 211 (the sender had nothing to send), and we keep emptying the pipe into socket 115 (first 1048576 bytes, then 520415 bytes). This call of tcp_splice_sock_handler() ends with EAGAIN on the writing side, and we just exit this function without setting the OUT_WAIT_1 flag (and, in turn, EPOLLOUT for socket 115). However, it turns out, the pipe wasn't actually emptied, and while socket 211 had nothing more to send, we should have waited on socket 115 to be ready for output again. As a further step, we could consider not clearing EPOLLOUT at all, unless the read/written counters match, but I'm first trying to fix this ugly issue with a minimal patch. Link: https://github.com/containers/podman/issues/22575 Link: https://github.com/containers/podman/issues/22593 Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
-rw-r--r--tcp_splice.c2
1 files changed, 1 insertions, 1 deletions
diff --git a/tcp_splice.c b/tcp_splice.c
index 42b7be0..4c36b72 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -616,7 +616,7 @@ eintr:
if (errno != EAGAIN)
goto close;
- if (never_read)
+ if (conn->read[fromside] == conn->written[fromside])
break;
conn_event(c, conn,