<feed xmlns='http://www.w3.org/2005/Atom'>
<title>passt/udp_flow.c, branch bug165c</title>
<subtitle>Plug A Simple Socket Transport</subtitle>
<link rel='alternate' type='text/html' href='https://passt.top/passt/'/>
<entry>
<title>udp: Use epoll instance management for UDP flows</title>
<updated>2025-10-30T14:33:42+00:00</updated>
<author>
<name>Laurent Vivier</name>
<email>lvivier@redhat.com</email>
</author>
<published>2025-10-21T21:01:15+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=aaa8f347351841fc16bd2a012b84f2b976c9dfb2'/>
<id>aaa8f347351841fc16bd2a012b84f2b976c9dfb2</id>
<content type='text'>
Store the epoll id in the flow_common structure for UDP flows using
flow_epollid_set() and retrieve the corresponding epoll file descriptor
with flow_epollfd() instead of passing c-&gt;epollfd directly. This makes
UDP consistent with the recent TCP and ICMP changes.
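
For illustration only, the indirection can be sketched as a small table of epoll instances indexed by an id kept in each flow. All names below (struct sketch_flow, sketch_epoll_table, sketch_epollid_set(), sketch_epollfd()) are hypothetical. They are not the actual flow_common, flow_epollid_set() or flow_epollfd() definitions:

<![CDATA[
```c
#include <stdint.h>
#include <sys/epoll.h>

/* Hypothetical: table of epoll instances, e.g. one per worker thread */
#define SKETCH_EPOLLID_MAX	4
static int sketch_epoll_table[SKETCH_EPOLLID_MAX];

/* Hypothetical stand-in for the common part of a flow table entry */
struct sketch_flow {
	uint8_t epollid;	/* index into sketch_epoll_table[] */
};

/* Create the epoll instance for slot 'id': returns its fd, or -1 */
static int sketch_epoll_init(unsigned int id)
{
	if (id >= SKETCH_EPOLLID_MAX)
		return -1;
	sketch_epoll_table[id] = epoll_create1(EPOLL_CLOEXEC);
	return sketch_epoll_table[id];
}

/* Record which epoll instance the flow's sockets are registered with */
static int sketch_epollid_set(struct sketch_flow *f, unsigned int id)
{
	if (id >= SKETCH_EPOLLID_MAX)
		return -1;
	f->epollid = (uint8_t)id;
	return 0;
}

/* Resolve the stored id to the epoll fd to pass to epoll_ctl() */
static int sketch_epollfd(const struct sketch_flow *f)
{
	return sketch_epoll_table[f->epollid];
}
```
]]>

Callers would then pass sketch_epollfd(f) to epoll_ctl() instead of a single global epoll descriptor. This leaves room for one epoll instance per thread later.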

Signed-off-by: Laurent Vivier &lt;lvivier@redhat.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
</entry>
<entry>
<title>util: Move epoll registration out of sock_l4_sa()</title>
<updated>2025-10-30T14:32:30+00:00</updated>
<author>
<name>Laurent Vivier</name>
<email>lvivier@redhat.com</email>
</author>
<published>2025-10-21T21:01:12+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=05972c7c4daf0b2479a415bf7240944b999d9081'/>
<id>05972c7c4daf0b2479a415bf7240944b999d9081</id>
<content type='text'>
Move epoll_add() calls from sock_l4_sa() to the protocol-specific code
(icmp.c, pif.c, udp_flow.c) to give callers more control over epoll
registration. This allows sock_l4_sa() to focus solely on socket
creation and binding, while epoll management happens at a higher level.

Remove the data parameter from sock_l4_sa() and flowside_sock_l4(), as
it's no longer needed: callers now construct the full epoll_ref and
register the socket themselves after creation.
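
The sketch below shows a minimal version of the resulting split. It assumes a simplified 64-bit reference made of a type and a flow side index. The socket helper only creates and binds the socket, and the caller packs the reference and registers the socket. sketch_sock_l4(), sketch_open_and_register() and union sketch_ref are hypothetical stand-ins, not passt's sock_l4_sa() or union epoll_ref:

<![CDATA[
```c
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical, simplified epoll reference: type plus flow side index */
union sketch_ref {
	struct {
		uint32_t type;		/* kind of socket, e.g. UDP flow */
		uint32_t flowside;	/* which flow and side owns the fd */
	};
	uint64_t u64;
};

/* Create and bind a UDP socket only: no epoll registration here */
static int sketch_sock_l4(const struct sockaddr_in *sa)
{
	int s = socket(AF_INET, SOCK_DGRAM | SOCK_NONBLOCK | SOCK_CLOEXEC, 0);

	if (s < 0)
		return -errno;

	if (bind(s, (const struct sockaddr *)sa, sizeof(*sa)) < 0) {
		int rc = -errno;

		close(s);
		return rc;
	}

	return s;
}

/* Caller side: build the full reference, then register the socket */
static int sketch_open_and_register(int epollfd, const struct sockaddr_in *sa,
				    uint32_t type, uint32_t flowside)
{
	union sketch_ref ref = { .u64 = 0 };
	struct epoll_event ev = { .events = EPOLLIN };
	int s = sketch_sock_l4(sa);

	if (s < 0)
		return s;

	ref.type = type;
	ref.flowside = flowside;
	ev.data.u64 = ref.u64;

	if (epoll_ctl(epollfd, EPOLL_CTL_ADD, s, &ev) < 0) {
		int rc = -errno;

		close(s);
		return rc;
	}

	return s;
}
```
]]>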

Signed-off-by: Laurent Vivier &lt;lvivier@redhat.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
</entry>
<entry>
<title>epoll_ctl: Extract epoll operations</title>
<updated>2025-10-30T14:32:12+00:00</updated>
<author>
<name>Laurent Vivier</name>
<email>lvivier@redhat.com</email>
</author>
<published>2025-10-21T21:01:11+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=965ea66068e653934c0016281df86c17e2a65625'/>
<id>965ea66068e653934c0016281df86c17e2a65625</id>
<content type='text'>
Centralize the epoll_add() and epoll_del() helper functions in new
epoll_ctl.c and epoll_ctl.h files.

This also moves the union epoll_ref definition from passt.h to
epoll_ctl.h, where it's more logically placed.

The new epoll_add() helper simplifies adding file descriptors to epoll
by taking an epoll_ref and events, handling error reporting
consistently across all call sites.
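
A helper in that spirit might look roughly like this. The reference layout, the name sketch_epoll_add() and reporting via fprintf() are assumptions for the example. passt's own epoll_add() and logging may differ:

<![CDATA[
```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>

/* Hypothetical reference: fd and type packed into the 64-bit epoll data */
union sketch_epoll_ref {
	struct {
		int32_t fd;
		uint32_t type;
	};
	uint64_t u64;
};

/* Register ref.fd with 'epollfd' for 'events', reporting any failure in one
 * place.  Returns 0 on success, negative errno on failure. */
static int sketch_epoll_add(int epollfd, uint32_t events,
			    union sketch_epoll_ref ref)
{
	struct epoll_event ev = { .events = events, .data.u64 = ref.u64 };

	if (epoll_ctl(epollfd, EPOLL_CTL_ADD, ref.fd, &ev) < 0) {
		int rc = -errno;	/* save it: fprintf() may clobber errno */

		fprintf(stderr, "epoll_ctl(EPOLL_CTL_ADD) on fd %d: %s\n",
			(int)ref.fd, strerror(-rc));
		return rc;
	}

	return 0;
}
```
]]>

Every call site then reduces to filling in a reference and checking one return value.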

Signed-off-by: Laurent Vivier &lt;lvivier@redhat.com&gt;
[sbrivio: Include epoll_ctl.h from netlink.c as it's now needed there]
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
</entry>
<entry>
<title>util: Simplify epoll_del() interface to take epollfd directly</title>
<updated>2025-10-30T14:32:03+00:00</updated>
<author>
<name>Laurent Vivier</name>
<email>lvivier@redhat.com</email>
</author>
<published>2025-10-21T21:01:10+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=8bfa47a5cf0576dd18e8716e1c1e142954a0b72d'/>
<id>8bfa47a5cf0576dd18e8716e1c1e142954a0b72d</id>
<content type='text'>
Change epoll_del() to accept the epoll file descriptor directly instead
of the full context structure. This simplifies the interface. It also
aligns with the threading refactoring, because we will manage one
epollfd per thread, and this change reduces the dependency on the
context structure for basic epoll operations.
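
Here is a minimal sketch of the narrower interface, using a hypothetical name. Only the epoll descriptor is passed in, so each thread can hand over its own instance:

<![CDATA[
```c
#include <stddef.h>
#include <sys/epoll.h>

/* Hypothetical: remove 'fd' from the epoll instance 'epollfd'.  No context
 * structure is needed.  Returns 0, or -1 with errno set (e.g. ENOENT if
 * 'fd' is not registered with that instance). */
static int sketch_epoll_del(int epollfd, int fd)
{
	/* The event argument is ignored for EPOLL_CTL_DEL; NULL is accepted
	 * since Linux 2.6.9 */
	return epoll_ctl(epollfd, EPOLL_CTL_DEL, fd, NULL);
}
```
]]>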

Signed-off-by: Laurent Vivier &lt;lvivier@redhat.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
</entry>
<entry>
<title>Correct various function comment headers</title>
<updated>2025-06-04T10:32:04+00:00</updated>
<author>
<name>Laurent Vivier</name>
<email>lvivier@redhat.com</email>
</author>
<published>2025-05-19T08:52:56+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=2c883498b58a3dab702b3376a2ca828e61d3283d'/>
<id>2c883498b58a3dab702b3376a2ca828e61d3283d</id>
<content type='text'>
This commit refines function comment headers for improved accuracy
and consistency. Key changes include:

- Corrected parameter/return descriptions (e.g., `logtime`, `__daemon`).
- Added missing and removed incorrect parameter documentation (e.g.,
  `tcp_vu_sock_recv`, `ndp`).
- Standardized comments to the `/** ... */` style for functions
  like `udp_flow_close` and `ns_enter`.
- Ensured function names in comments consistently use `()`.
- Addressed minor typos and updated comments for renamed functions.

Signed-off-by: Laurent Vivier &lt;lvivier@redhat.com&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
</entry>
<entry>
<title>flow: close socket fd on error</title>
<updated>2025-05-12T19:04:57+00:00</updated>
<author>
<name>Laurent Vivier</name>
<email>lvivier@redhat.com</email>
</author>
<published>2025-05-12T16:47:00+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=8ec134109eb136432a29bdf5a14f8b1fd4e46208'/>
<id>8ec134109eb136432a29bdf5a14f8b1fd4e46208</id>
<content type='text'>
In eea8a76caf85 ("flow: fix podman issue #26073"), we remove the fd
from the epoll instance with epoll_del() in case of error, but we also
need to close it.

As flowside_sock_l4() also calls sock_l4_sa() via flowside_sock_splice(),
we can do it unconditionally.
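
The following is a rough sketch of the error path with both fixes applied. A socket that is already registered is first removed from the epoll instance, so epoll_wait() can't deliver a stale reference. It is then closed, so the descriptor doesn't leak. sketch_flow_sock() is a hypothetical stand-in for udp_flow_sock(), with socket creation and registration inlined:

<![CDATA[
```c
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a UDP socket, register it with reference 'ref', then connect it to
 * 'dst'.  Returns the socket on success, negative errno on failure. */
static int sketch_flow_sock(int epollfd, uint64_t ref,
			    const struct sockaddr_in *dst)
{
	struct epoll_event ev = { .events = EPOLLIN, .data.u64 = ref };
	int s, rc;

	s = socket(AF_INET, SOCK_DGRAM | SOCK_NONBLOCK | SOCK_CLOEXEC, 0);
	if (s < 0)
		return -errno;

	if (epoll_ctl(epollfd, EPOLL_CTL_ADD, s, &ev) < 0) {
		rc = -errno;
		close(s);
		return rc;
	}

	if (connect(s, (const struct sockaddr *)dst, sizeof(*dst)) < 0) {
		rc = -errno;	/* save before cleanup can clobber errno */
		epoll_ctl(epollfd, EPOLL_CTL_DEL, s, NULL); /* no stale ref */
		close(s);	/* and no leaked descriptor */
		return rc;
	}

	return s;
}
```
]]>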

Fixes: eea8a76caf85 ("flow: fix podman issue #26073")
Signed-off-by: Laurent Vivier &lt;lvivier@redhat.com&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
</entry>
<entry>
<title>flow: fix podman issue #26073</title>
<updated>2025-05-07T12:42:48+00:00</updated>
<author>
<name>Laurent Vivier</name>
<email>lvivier@redhat.com</email>
</author>
<published>2025-05-07T12:36:34+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=eea8a76caf85f4bae5f92b695d09b9ddea354b57'/>
<id>eea8a76caf85f4bae5f92b695d09b9ddea354b57</id>
<content type='text'>
While running pasta, we trigger the following assert:

  ASSERTION FAILED in udp_at_sidx (udp_flow.c:35): flow-&gt;f.type == FLOW_UDP

in udp_at_sidx(), reached through the following path:

  void udp_sock_handler(const struct ctx *c, union epoll_ref ref,
                        uint32_t events, const struct timespec *now)
  {
          struct udp_flow *uflow = udp_at_sidx(ref.flowside);

The invalid sidx comes from the epoll_ref returned by epoll_wait().

The assertion is preceded by the following error:

  Couldn't connect flow socket: Permission denied

It appears that an error occurs in udp_flow_sock() and the newly
created fd is not removed from the epoll interest list:

  static int udp_flow_sock(const struct ctx *c,
                           struct udp_flow *uflow, unsigned sidei)
  {
...
          s = flowside_sock_l4(c, EPOLL_TYPE_UDP, pif, side, fref.data);
          if (s &lt; 0) {
                  flow_dbg_perror(uflow, "Couldn't open flow specific socket");
                  return s;
          }

          if (flowside_connect(c, s, pif, side) &lt; 0) {
                  int rc = -errno;
                  flow_dbg_perror(uflow, "Couldn't connect flow socket");
                  return rc;
          }
...

flowside_sock_l4() calls sock_l4_sa(), which adds 's' to the epoll
interest list.

So, to handle a flowside_connect() failure cleanly, we need to remove
's' from the interest list using epoll_del().

Link: https://github.com/containers/podman/issues/26073
Signed-off-by: Laurent Vivier &lt;lvivier@redhat.com&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
</entry>
<entry>
<title>treewide: Improve robustness against sockaddrs of unexpected family</title>
<updated>2025-04-22T10:42:00+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2025-04-17T01:55:41+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=4668e9137806b551f6ee44609064cc40243c2b6b'/>
<id>4668e9137806b551f6ee44609064cc40243c2b6b</id>
<content type='text'>
inany_from_sockaddr() expects a socket address of family AF_INET or
AF_INET6 and ASSERT()s if it gets anything else.  In many of the callers we
can handle an unexpected family more gracefully, though, e.g. by failing
a single flow rather than killing passt.

Change inany_from_sockaddr() to return an error instead of ASSERT()ing,
and handle those errors in the callers.  Improve the reporting of any such
errors while we're at it.

With this greater robustness, allow inany_from_sockaddr() to take a void *
rather than specifically a union sockaddr_inany *.
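
Here is a simplified sketch of the more forgiving conversion. It assumes IPv4 addresses are stored as IPv4-mapped IPv6 addresses (::ffff:a.b.c.d). sketch_inany_from_sockaddr() and its plain struct in6_addr output are hypothetical simplifications, not passt's actual inany_from_sockaddr() or union inany_addr:

<![CDATA[
```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Convert an AF_INET or AF_INET6 socket address to an IPv6 (or IPv4-mapped)
 * address and a host-order port.  Returns 0 on success, or -1 for any other
 * family instead of aborting, so the caller can fail just that flow. */
static int sketch_inany_from_sockaddr(struct in6_addr *addr, in_port_t *port,
				      const void *sa)
{
	const struct sockaddr *s = sa;

	if (s->sa_family == AF_INET6) {
		const struct sockaddr_in6 *sa6 = sa;

		*addr = sa6->sin6_addr;
		*port = ntohs(sa6->sin6_port);
		return 0;
	}

	if (s->sa_family == AF_INET) {
		const struct sockaddr_in *sa4 = sa;

		/* Build ::ffff:a.b.c.d */
		memset(addr, 0, sizeof(*addr));
		addr->s6_addr[10] = 0xff;
		addr->s6_addr[11] = 0xff;
		memcpy(&addr->s6_addr[12], &sa4->sin_addr,
		       sizeof(sa4->sin_addr));
		*port = ntohs(sa4->sin_port);
		return 0;
	}

	return -1;	/* unexpected family: let the caller decide */
}
```
]]>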

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
</entry>
<entry>
<title>udp, udp_flow: Track our specific address on socket interfaces</title>
<updated>2025-04-10T17:46:16+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2025-04-10T07:16:40+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=59cc89f4cc018988428637d97745cc4c919126cb'/>
<id>59cc89f4cc018988428637d97745cc4c919126cb</id>
<content type='text'>
So far, for UDP flows (as for TCP connections), we didn't record our
address (oaddr) in the flow table entry for socket-based pifs.  That's
because we didn't have that information when a flow was initiated by a
datagram arriving at a "listening" socket bound to the 0.0.0.0 or ::
address.  Even when we did have the information, we didn't record it, to
simplify address matching on lookups.

This meant that in some circumstances we could send replies on a UDP flow
from a different address than the one the original request was sent to,
which is surprising and breaks certain setups.

We now have code in udp_peek_addr() which does determine our address for
incoming UDP datagrams.  We can use that information to properly populate
oaddr in the flow table for flows initiated from a socket.

In order to be able to consistently match datagrams to flows, we must
*always* have a specific oaddr, not an unspecified address (that's how the
flow hash table works).  So, we also need to fill in oaddr correctly for
flows we initiate *to* sockets.  Our forwarding logic doesn't specify
oaddr here, letting the kernel decide based on the routing table.  In this
case we need to call getsockname() after connect()ing the socket to find
which local address the kernel picked.
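
The getsockname() step can be sketched as follows, using a hypothetical helper name. Connecting a UDP socket sends nothing on the wire. It does make the kernel pick a route, and therefore a local address, plus an ephemeral port if the socket wasn't bound. getsockname() then reports both:

<![CDATA[
```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Connect UDP socket 's' to 'dst' and report the local address and port the
 * kernel chose for it in 'oaddr'.  Returns 0, or negative errno on failure. */
static int sketch_connect_and_get_oaddr(int s, const struct sockaddr_in *dst,
					struct sockaddr_in *oaddr)
{
	socklen_t len = sizeof(*oaddr);

	if (connect(s, (const struct sockaddr *)dst, sizeof(*dst)) < 0)
		return -errno;

	if (getsockname(s, (struct sockaddr *)oaddr, &len) < 0)
		return -errno;

	return 0;
}
```
]]>

The result would then be stored as the flow's oaddr. Lookups and replies then both use the same specific address.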

This adds getsockname() to our seccomp profile for all variants.

Link: https://bugs.passt.top/show_bug.cgi?id=99
Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
</entry>
<entry>
<title>udp_flow: Don't discard packets that arrive between bind() and connect()</title>
<updated>2025-04-07T19:44:31+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2025-04-04T10:15:42+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=9725e79888374a4e4060a2d798f3407c0006cc8a'/>
<id>9725e79888374a4e4060a2d798f3407c0006cc8a</id>
<content type='text'>
When we establish a new UDP flow we create connect()ed sockets that will
only handle datagrams for this flow.  However, there is a race between
bind() and connect() during which datagrams belonging to a different
flow might be queued on the socket.  Currently we handle this by simply
discarding any queued datagrams after the connect().  UDP protocols
should be able to handle such packet loss, but it's not ideal.

We now have the tools we need to handle this better, by redirecting any
datagrams received during that race to the appropriate flow.  We need to
use a deferred handler for this to avoid unexpectedly re-ordering datagrams
in some edge cases.
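
The sketch below illustrates the redirection idea only. After connect(), it drains any datagrams already sitting in the receive queue. It flags those whose source isn't the connected peer, so they can be dispatched to the matching flow instead of being dropped. The names are hypothetical. The sketch does not model the deferred handler the change uses to avoid reordering:

<![CDATA[
```c
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical dispatcher: 'own' is nonzero if the datagram came from the
 * connected peer (this flow), zero if it belongs to a different flow */
typedef void (*sketch_dispatch_fn)(const struct sockaddr_in *src, int own,
				   const char *buf, size_t len, void *opaque);

/* Drain datagrams queued on 's', e.g. during the bind()/connect() window.
 * Returns how many came from sources other than 'peer', or negative errno. */
static int sketch_drain_after_connect(int s, const struct sockaddr_in *peer,
				      sketch_dispatch_fn dispatch, void *opaque)
{
	int redirected = 0;

	for (;;) {
		char buf[2048];
		struct sockaddr_in src;
		socklen_t slen = sizeof(src);
		ssize_t n = recvfrom(s, buf, sizeof(buf), MSG_DONTWAIT,
				     (struct sockaddr *)&src, &slen);
		int own;

		if (n < 0) {
			if (errno == EAGAIN || errno == EWOULDBLOCK)
				return redirected;	/* queue is empty */
			return -errno;
		}

		own = src.sin_addr.s_addr == peer->sin_addr.s_addr &&
		      src.sin_port == peer->sin_port;
		if (!own)
			redirected++;

		if (dispatch)	/* forward instead of discarding */
			dispatch(&src, own, buf, (size_t)n, opaque);
	}
}
```
]]>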

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
[sbrivio: Update comment to udp_flow_defer()]
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
</entry>
</feed>
