passt/tcp_splice.h, branch 2023_03_21.1ee2f7c

tcp: Remove splice from tcp_epoll_ref

2022-11-25T00:35:25+00:00

Currently the epoll reference for tcp sockets includes a bit indicating
whether the socket maps to a spliced connection.  However, the reference
also has the index of the connection structure which also indicates whether
it is spliced.  We can therefore avoid the splice bit in the epoll_ref by
unifying the first part of the non-spliced and spliced handlers where we
look up the connection state.

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

tcp: Use the same sockets to listen for spliced and non-spliced connections

2022-11-25T00:35:22+00:00

In pasta mode, tcp_sock_init[46]() create separate sockets to listen for
spliced connections (these are bound to localhost) and non-spliced
connections (these are bound to the host address).  This introduces a
subtle behavioural difference between pasta and passt: by default, pasta
will listen only on a single host address, whereas passt will listen on
all addresses (0.0.0.0 or ::).  This also prevents us using some additional
optimizations that only work with the unspecified (0.0.0.0 or ::) address.

However, it turns out we don't need to do this.  We can splice a connection
if and only if it originates from the loopback address.  Currently we
ensure this by having the "spliced" listening sockets listening only on
loopback.  Instead, defer the decision about whether to splice a connection
until after accept(), by checking if the connection was made from the
loopback address.

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

tcp: Unify part of spliced and non-spliced conn_from_sock path

2022-11-25T00:35:19+00:00

In tcp_sock_handler() we split off to handle spliced sockets before
checking anything else.  However the first steps of the "new connection"
path for each case are the same: allocate a connection entry and accept()
the connection.

Remove this duplication by making tcp_conn_from_sock() handle both spliced
and non-spliced cases, with help from more specific tcp_tap_conn_from_sock
and tcp_splice_conn_from_sock functions for the later stages which differ.

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

tcp: Partially unify tcp_timer() and tcp_splice_timer()

2022-11-25T00:34:58+00:00

These two functions scan all the non-splced and spliced connections
respectively and perform timed updates on them.  Avoid scanning the now
unified table twice, by having tcp_timer scan it once calling the
relevant per-connection function for each one.

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

tcp: Unify tcp_defer_handler and tcp_splice_defer_handler()

2022-11-25T00:34:54+00:00

These two functions each step through non-spliced and spliced connections
respectively and clean up entries for closed connections.  To avoid
scanning the connection table twice, we merge these into a single function
which scans the unified table and performs the appropriate sort of cleanup
action on each one.

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

tcp: Unify spliced and non-spliced connection tables

2022-11-25T00:34:51+00:00

Currently spliced and non-spliced connections are stored in completely
separate tables, so there are completely independent limits on the number
of spliced and non-spliced connections.  This is a bit counter-intuitive.

More importantly, the fact that the tables are separate prevents us from
unifying some other logic between the two cases.  So, merge these two
tables into one, using the 'c.spliced' common field to distinguish between
them when necessary.

For now we keep a common limit of 128k connections, whether they're spliced
or non-spliced, which means we save memory overall.  If necessary we could
increase this to a 256k or higher total, which would cost memory but give
some more flexibility.

For now, the code paths which need to step through all extant connections
are still separate for the two cases, just skipping over entries which
aren't for them.  We'll improve that in later patches.

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

tcp_splice: #include tcp_splice.h in tcp_splice.c

2022-11-25T00:34:30+00:00

This obvious include was omitted, which means that declarations in the
header weren't checked against definitions in the .c file.  This shows up
an old declaration for a function that is now static, and a duplicate

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

treewide: Add include guards

2022-03-29T13:35:38+00:00

...at the moment, just for consistency with packet.h, icmp.h,
tcp.h and udp.h.

Signed-off-by: Stefano Brivio

treewide: Packet abstraction with mandatory boundary checks

2022-03-29T13:35:38+00:00

Implement a packet abstraction providing boundary and size checks
based on packet descriptors: packets stored in a buffer can be queued
into a pool (without storage of its own), and data can be retrieved
referring to an index in the pool, specifying offset and length.

Checks ensure data is not read outside the boundaries of buffer and
descriptors, and that packets added to a pool are within the buffer
range with valid offset and indices.

This implies a wider rework: usage of the "queueing" part of the
abstraction mostly affects tap_handler_{passt,pasta}() functions and
their callees, while the "fetching" part affects all the guest or tap
facing implementations: TCP, UDP, ICMP, ARP, NDP, DHCP and DHCPv6
handlers.

Suggested-by: Stefan Hajnoczi 
Signed-off-by: Stefano Brivio

tcp_splice: Close sockets right away on high number of open files

2022-03-29T13:35:38+00:00

We can't take for granted that the hard limit for open files is
big enough as to allow to delay closing sockets to a timer.

Store the value of RTLIMIT_NOFILE we set at start, and use it to
understand if we're approaching the limit with pending, spliced
TCP connections. If that's the case, close sockets right away as
soon as they're not needed, instead of deferring this task to a
timer.

Signed-off-by: Stefano Brivio