passt/flow.h, branch bug165c

tcp, flow: Replace per-connection in_epoll flag with an epollid in flow_common

2025-10-30T14:32:50+00:00

The in_epoll boolean flag in tcp_tap_conn and tcp_splice_conn only tracked
whether a connection was registered with epoll, not which epoll instance.
This limited flexibility for future multi-epoll support.

Replace the boolean with an epollid field in flow_common that identifies
which epoll instance the flow is registered with.
Use FLOW_EPOLLID_INVALID to indicate when a flow is not registered with
any epoll instance. An epoll_id_to_fd[] mapping table translates
epoll ids to their corresponding epoll file descriptors.

Add helper functions:
- flow_in_epoll() to check if a flow is registered with epoll
- flow_epollfd() to retrieve the epoll fd for a flow's thread
- flow_epollid_register() to register an epoll fd with an epollid
- flow_epollid_set() to set the epollid of a flow
- flow_epollid_clear() to reset the epoll id of a flow

This change also simplifies tcp_timer_ctl() and conn_flag_do() by removing
the need to pass the context 'c', since the epoll fd is now directly
accessible from the flow structure via flow_epollfd().

Add a defensive check at the beginning of tcp_flow_repair_queue() to
avoid a false positive with "make clang-tidy":
  error: The 1st argument to 'send' is < 0 but should be >= 0
   3230 |                 ssize_t rc = send(conn->sock, p, MIN(len, chunk), 0);

Signed-off-by: Laurent Vivier 
Reviewed-by: David Gibson 
Signed-off-by: Stefano Brivio

util: Move epoll registration out of sock_l4_sa()

2025-10-30T14:32:30+00:00

Move epoll_add() calls from sock_l4_sa() to the protocol-specific code
(icmp.c, pif.c, udp_flow.c) to give callers more control over epoll
registration. This allows sock_l4_sa() to focus solely on socket
creation and binding, while epoll management happens at a higher level.

Remove the data parameter from sock_l4_sa() and flowside_sock_l4() as
it's no longer needed - callers now construct the full epoll_ref and
register the socket themselves after creation.

Signed-off-by: Laurent Vivier 
Reviewed-by: David Gibson 
Signed-off-by: Stefano Brivio

flow: add MAC address of LAN local remote hosts to flow

2025-10-30T11:01:01+00:00

When communicating with remote hosts on the local network, some guest
applications want to see the real MAC address of that host instead
of PASST/PASTA's own tap address. The flow_common structure is a
convenient location for storing that address, so we do that in this
commit.

Note that we don´t add actual usage of this address here, that will
be done in later commits.

Signed-off-by: Jon Maloy 
Reviewed-by: David Gibson 
Signed-off-by: Stefano Brivio

udp, udp_flow: Track our specific address on socket interfaces

2025-04-10T17:46:16+00:00

So far for UDP flows (like TCP connections) we didn't record our address
(oaddr) in the flow table entry for socket based pifs.  That's because we
didn't have that information when a flow was initiated by a datagram coming
to a "listening" socket with 0.0.0.0 or :: address.  Even when we did have
the information, we didn't record it, to simplify address matching on
lookups.

This meant that in some circumstances we could send replies on a UDP flow
from a different address than the originating request came to, which is
surprising and breaks certain setups.

We now have code in udp_peek_addr() which does determine our address for
incoming UDP datagrams.  We can use that information to properly populate
oaddr in the flow table for flow initiated from a socket.

In order to be able to consistently match datagrams to flows, we must
*always* have a specific oaddr, not an unspecified address (that's how the
flow hash table works).  So, we also need to fill in oaddr correctly for
flows we initiate *to* sockets.  Our forwarding logic doesn't specify
oaddr here, letting the kernel decide based on the routing table.  In this
case we need to call getsockname() after connect()ing the socket to find
which local address the kernel picked.

This adds getsockname() to our seccomp profile for all variants.

Link: https://bugs.passt.top/show_bug.cgi?id=99
Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

flow: Add flow_perror() helper

2025-02-18T12:33:12+00:00

Our general logging helpers include a number of _perror() variants which,
like perror(3) include the description of the current errno.  We didn't
have those for our flow specific logging helpers, though.  Fill this gap
with flow_perror() and flow_dbg_perror(), and use them where it's useful.

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

migrate: Migrate TCP flows

2025-02-17T07:29:03+00:00

This implements flow preparation on the source, transfer of data with
a format roughly inspired by struct tcp_tap_conn, plus a specific
structure for parameters that don't fit in the flow table, and flow
insertion on the target, with all the appropriate window options,
window scaling, MSS, etc.

Contents of pending queues are transferred as well.

The target side is rather convoluted because we first need to create
sockets and switch them to repair mode, before we can apply options
that are *not* stored in the flow table. This also means that, if
we're testing this on the same machine, in the same namespace, we need
to close the listening socket on the source before we can start moving
data.

Further, we need to connect() the socket on the target before we can
restore data queues, but we can't do that (again, on the same machine)
as long as the matching source socket is open, which implies an
arbitrary limit on queue sizes we can transfer, because we can only
dump pending queues on the source as long as the socket is open, of
course.

Co-authored-by: David Gibson 
Reviewed-by: David Gibson 
Tested-by: David Gibson 
Signed-off-by: Stefano Brivio

flow: Helpers to log details of a flow

2024-09-06T10:53:30+00:00

The details of a flow - endpoints, interfaces etc. - can be pretty
important for debugging.  We log this on flow state transitions, but it can
also be useful to log this when we report specific conditions.  Add some
helper functions and macros to make it easy to do that.

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

treewide: Use "our address" instead of "forwarding address"

2024-08-21T09:59:29+00:00

The term "forwarding address" to indicate the local-to-passt address was
well-intentioned, but ends up being kinda confusing.  As discussed on a
recent call, let's try "our" instead.

(While we're there correct an error in flow_initiate_af()s comments where
we referred to parameters by the wrong name).

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

udp: Handle "spliced" datagrams with per-flow sockets

2024-07-19T16:33:42+00:00

When forwarding a datagram to a socket, we need to find a socket with a
suitable local address to send it.  Currently we keep track of such sockets
in an array indexed by local port, but this can't properly handle cases
where we have multiple local addresses in active use.

For "spliced" (socket to socket) cases, improve this by instead opening
a socket specifically for the target side of the flow.  We connect() as
well as bind()ing that socket, so that it will only receive the flow's
reply packets, not anything else.  We direct datagrams sent via that socket
using the addresses from the flow table, effectively replacing bespoke
addressing logic with the unified logic in fwd.c

When we create the flow, we also take a duplicate of the originating
socket, and use that to deliver reply datagrams back to the origin, again
using addresses from the flow table entry.

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

udp: Create flows for datagrams from originating sockets

2024-07-19T16:33:39+00:00

This implements the first steps of tracking UDP packets with the flow table
rather than its own (buggy) set of port maps.  Specifically we create flow
table entries for datagrams received from a socket (PIF_HOST or
PIF_SPLICE).

When splitting datagrams from sockets into batches, we group by the flow
as well as splicesrc.  This may result in smaller batches, but makes things
easier down the line.  We can re-optimise this later if necessary.  For now
we don't do anything else with the flow, not even match reply packets to
the same flow.

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio