passt/vu_common.c, branch 2026_01_17.81c97f6

tcp, udp: Pad batched frames for vhost-user modes to 60 bytes (802.3 minimum)

2025-12-08T03:47:46+00:00

For both TCP and UDP, instead of just expecting the first provided
buffer to be large enough to contain the headers we need (from 42
bytes for UDP data over IPv4 to 82 bytes for TCP with options over
IPv6), change that assumption: expect buffers to be at least
ETH_ZLEN bytes (60) in any case.

This looks reasonable because there are no known vhost-user
hypervisor implementations that would give us smaller buffers than
that, and with IPv6 we would hit an assertion failure anyway if we
ever had fewer than 60 bytes per buffer.

At this point, all we need to do is to pad the first (and only)
buffer, should data and headers use less than that.
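The padding step can be pictured with a short C sketch. It assumes the whole frame sits in one buffer of at least ETH_ZLEN bytes, as this commit now requires. The helper name frame_pad_to_zlen() is hypothetical and is not the function passt uses. ETH_ZLEN is defined locally only as a fallback for builds without <linux/if_ether.h>.

```c
#include <stddef.h>
#include <string.h>

/* ETH_ZLEN is 60 in <linux/if_ether.h>; fallback so the sketch builds
 * without Linux headers */
#ifndef ETH_ZLEN
#define ETH_ZLEN 60
#endif

/* Hypothetical helper: zero-pad a frame held in a single buffer of
 * @buf_size bytes up to the 802.3 minimum.  Returns the frame length to
 * report: ETH_ZLEN if padding was applied, @len otherwise. */
static size_t frame_pad_to_zlen(unsigned char *buf, size_t len,
				size_t buf_size)
{
	if (len >= ETH_ZLEN || buf_size < ETH_ZLEN)
		return len;	/* long enough already, or no room to pad */

	memset(buf + len, 0, ETH_ZLEN - len);
	return ETH_ZLEN;
}
```

For example, an empty-payload UDP/IPv4 frame of 42 bytes would be reported as 60 bytes, with bytes 42 to 59 zeroed.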

Link: https://bugs.passt.top/show_bug.cgi?id=166
Signed-off-by: Stefano Brivio 
Reviewed-by: David Gibson

vu_common: Clarify prototype of vu_collect()

2025-12-07T22:18:04+00:00

The last argument of vu_collect() can't exceed @size, but this isn't
documented and it wasn't obvious to me, to the point that I actually
proposed a change to "fix" callers so that they wouldn't overrun input
buffers.

Rename @frame_size to @collected, and clarify that it can only be up
to @size bytes.
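The following sketch shows the invariant the rename documents. It is a simplified stand-in, not vu_collect() itself. collect_sketch() and its parameters are hypothetical, and it models available guest buffers as a plain array of lengths. The last buffer is counted only up to what is still missing, so the collected byte count can fall short of @size when buffers run out, but it can never exceed it.

```c
#include <stddef.h>

/* Hypothetical stand-in for vu_collect(): use as many of the @n
 * available buffers (lengths in @buf_len) as needed for @size bytes.
 * *@collected receives the bytes gathered, which is <= @size.
 * Returns the number of buffers used. */
static size_t collect_sketch(const size_t *buf_len, size_t n, size_t size,
			     size_t *collected)
{
	size_t used = 0, total = 0;

	while (used < n && total < size) {
		size_t room = size - total;	/* never negative: total < size */
		size_t take = buf_len[used] < room ? buf_len[used] : room;

		total += take;
		used++;
	}

	*collected = total;
	return used;
}
```

With three 1000-byte buffers and @size set to 1500, two buffers are used and 1500 bytes are collected. With @size set to 5000, all three are used and only 3000 bytes are collected.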

Signed-off-by: Stefano Brivio 
Reviewed-by: David Gibson

epoll_ctl: Extract epoll operations

2025-10-30T14:32:12+00:00

Centralize epoll_add() and epoll_del() helper functions into new
epoll_ctl.c/h files.

This also moves the union epoll_ref definition from passt.h to
epoll_ctl.h where it's more logically placed.

The new epoll_add() helper simplifies adding file descriptors to epoll
by taking an epoll_ref and events, handling error reporting
consistently across all call sites.
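Here is a minimal sketch of the idea behind an epoll_add() helper: pack a source type and file descriptor into the 64-bit epoll_data, and report epoll_ctl() errors in one place. The union layout and the names sketch_ref and sketch_epoll_add() are hypothetical. passt's actual union epoll_ref and epoll_add() signature may differ. The sketch uses a C11 anonymous struct.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>

/* Hypothetical reference: a type tag plus an fd, carried in
 * epoll_data.u64 so the event loop can tell event sources apart */
union sketch_ref {
	struct {
		uint32_t type;
		int32_t fd;
	};
	uint64_t u64;
};

/* Add @ref.fd to @epollfd for @events; on failure, log and return
 * -errno, so callers don't each need their own error reporting */
static int sketch_epoll_add(int epollfd, uint32_t events,
			    union sketch_ref ref)
{
	struct epoll_event ev = { .events = events, .data.u64 = ref.u64 };

	if (epoll_ctl(epollfd, EPOLL_CTL_ADD, ref.fd, &ev) < 0) {
		int err = errno;

		fprintf(stderr, "epoll_ctl() add, fd %d: %s\n",
			ref.fd, strerror(err));
		return -err;
	}
	return 0;
}
```

When an event fires, the loop can copy ev.data.u64 back into the union to recover both the type and the fd without any extra lookup.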

Signed-off-by: Laurent Vivier 
[sbrivio: Include epoll_ctl.h from netlink.c as it's now needed there]
Reviewed-by: David Gibson 
Signed-off-by: Stefano Brivio

packet: Refactor vhost-user memory region handling

2025-09-03T18:43:48+00:00

This patch refactors the handling of vhost-user memory regions by
introducing a new `struct vdev_memory` to encapsulate the regions
array and their count (`nregions`) within the main `vu_dev` structure.

This new `vdev_memory` structure is then passed to the packet pool by
re-using the existing `p->buf` field. A `p->buf_size` of 0 indicates
that `p->buf` holds a pointer to `struct vdev_memory` instead of a
regular packet buffer. A new helper, `get_vdev_memory()`, is added to
abstract this access pattern.

The previous implementation used a marker at the end of the memory
regions array. We can now use all the slots.
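The "buf_size of 0 means buf is really a struct vdev_memory pointer" convention can be sketched as follows. Every type, field and function name here is hypothetical and prefixed with sketch_, and the region fields are simplified. Because the region count is stored next to the array, a lookup loop runs over nregions entries and needs no end marker, so every slot is usable.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_REGIONS 8	/* hypothetical limit */

struct sketch_region {
	uint64_t gpa;	/* guest physical start address */
	uint64_t size;	/* length in bytes */
	uint64_t qva;	/* local address where the region is mapped */
};

/* Count stored alongside the array: no end-of-array marker needed */
struct sketch_vdev_memory {
	unsigned int nregions;
	struct sketch_region regions[SKETCH_MAX_REGIONS];
};

struct sketch_pool {
	char *buf;		/* packet buffer, or struct sketch_vdev_memory * */
	size_t buf_size;	/* 0 means @buf holds struct sketch_vdev_memory * */
};

/* Hypothetical counterpart of get_vdev_memory() */
static struct sketch_vdev_memory *
sketch_get_vdev_memory(const struct sketch_pool *p)
{
	if (p->buf_size != 0)
		return NULL;	/* ordinary packet buffer */
	return (struct sketch_vdev_memory *)p->buf;
}

/* Translate a guest address, scanning all @nregions entries */
static void *sketch_gpa_to_va(const struct sketch_pool *p, uint64_t gpa)
{
	const struct sketch_vdev_memory *m = sketch_get_vdev_memory(p);
	unsigned int i;

	if (!m)
		return NULL;
	for (i = 0; i < m->nregions; i++) {
		const struct sketch_region *r = &m->regions[i];

		if (gpa >= r->gpa && gpa - r->gpa < r->size)
			return (void *)(uintptr_t)(r->qva + (gpa - r->gpa));
	}
	return NULL;	/* not in any region */
}
```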

Signed-off-by: Laurent Vivier 
Reviewed-by: David Gibson 
Signed-off-by: Stefano Brivio

tap: Use iov_tail with tap_add_packet()

2025-09-03T18:42:20+00:00

Use IOV_PEEK_HEADER() to get the Ethernet header from the iovec.

Move the workaround for multiple iovec arrays from vu_handle_tx() to
tap_add_packet(). Removing the offset from the iovec array should
reduce the iovec count to 1.
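The following sketch illustrates peeking at a header through an iovec "tail". A tail here means an iovec array plus a starting offset. The sketch drops entries that the offset fully covers, then either returns a pointer straight into the buffer when the header is contiguous, or copies it into caller-provided scratch space. struct sketch_iov_tail, sketch_tail_prune() and sketch_peek() are hypothetical. They are not passt's iov_tail or IOV_PEEK_HEADER() definitions.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical tail: data starting @off bytes into the first of @cnt
 * iovec entries */
struct sketch_iov_tail {
	const struct iovec *iov;
	size_t cnt;
	size_t off;
};

/* Drop entries fully covered by @off, leaving off < iov[0].iov_len */
static void sketch_tail_prune(struct sketch_iov_tail *t)
{
	while (t->cnt && t->off >= t->iov[0].iov_len) {
		t->off -= t->iov[0].iov_len;
		t->iov++;
		t->cnt--;
	}
}

/* Return @len header bytes at the start of the tail: a direct pointer if
 * contiguous, else a copy in @scratch (which must hold @len bytes).
 * NULL if the tail is empty or shorter than @len.  The tail is pruned
 * but the header itself is not consumed. */
static const void *sketch_peek(struct sketch_iov_tail *t, void *scratch,
			       size_t len)
{
	size_t copied = 0, i, off;

	sketch_tail_prune(t);
	if (!t->cnt)
		return NULL;

	if (t->iov[0].iov_len - t->off >= len)
		return (const char *)t->iov[0].iov_base + t->off;

	for (i = 0, off = t->off; i < t->cnt && copied < len; i++, off = 0) {
		size_t n = t->iov[i].iov_len - off;

		if (n > len - copied)
			n = len - copied;
		memcpy((char *)scratch + copied,
		       (const char *)t->iov[i].iov_base + off, n);
		copied += n;
	}
	return copied == len ? scratch : NULL;
}
```

After pruning, a frame that starts at the beginning of the second buffer is described by a single-entry tail, which matches the "iovec count of 1" case the commit refers to.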

Signed-off-by: Laurent Vivier 
Reviewed-by: David Gibson 
Signed-off-by: Stefano Brivio

tap: Make size of pool_tap[46] purely a tuning parameter

2025-03-20T19:33:09+00:00

Currently we attempt to size pool_tap[46] so they have room for the maximum
possible number of packets that could fit in pkt_buf (TAP_MSGS).  However,
the calculation isn't quite correct: TAP_MSGS is based on ETH_ZLEN (60) as
the minimum possible L2 frame size.  But ETH_ZLEN is based on physical
constraints of Ethernet, which don't apply to our virtual devices.  It is
possible to generate a legitimate frame smaller than this, for example an
empty payload UDP/IPv4 frame on the 'pasta' backend is only 42 bytes long.

Furthermore, the same limit applies to vhost-user, which is not limited
by the size of pkt_buf like the other backends.  In that case we don't even
have full control of the maximum buffer size, so we can't really calculate
how many packets could fit in there.

If we exceed TAP_MSGS we'll drop packets, rather than just using more
batches, which is moderately bad.  The fact that this needs to be sized
just so for correctness, not merely for tuning, is a fairly non-obvious
coupling between different parts of the code.

To make this more robust, alter the tap code so it doesn't rely on
everything fitting in a single batch of TAP_MSGS packets, instead breaking
into multiple batches as necessary.  This leaves TAP_MSGS as purely a
tuning parameter, which we can freely adjust based on performance measures.
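The batching idea can be illustrated with a small sketch: however many packets arrive, they are handed on in chunks of at most the batch size, so exceeding that size costs an extra batch instead of a drop. SKETCH_TAP_MSGS, struct sketch_stats and both functions are hypothetical. The sketch only shows the chunking arithmetic. The actual tap code may flush when a pool fills rather than splitting the input up front.

```c
#include <stddef.h>

#define SKETCH_TAP_MSGS 4	/* hypothetical batch size: tuning only */

/* Records what the hypothetical batch handler saw */
struct sketch_stats {
	size_t batches;
	size_t packets;
};

/* Hypothetical per-batch handler: just counts here */
static void sketch_handle_batch(struct sketch_stats *st, const int *pkts,
				size_t n)
{
	(void)pkts;
	st->batches++;
	st->packets += n;
}

/* Handle any number of packets in batches of at most SKETCH_TAP_MSGS */
static void sketch_handle_all(struct sketch_stats *st, const int *pkts,
			      size_t n)
{
	size_t start;

	for (start = 0; start < n; start += SKETCH_TAP_MSGS) {
		size_t left = n - start;

		sketch_handle_batch(st, pkts + start,
				    left < SKETCH_TAP_MSGS ? left
							   : SKETCH_TAP_MSGS);
	}
}
```

With a batch size of 4, ten packets produce three batches (4, 4 and 2) and none are dropped.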

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

packet: More cautious checks to avoid pointer arithmetic UB

2025-03-20T19:33:06+00:00

packet_check_range() and vu_packet_check_range() verify that the packet or
section of packet we're interested in lies in the packet buffer pool we
expect it to.  However, in doing so they don't avoid the possibility of
an integer overflow while performing pointer arithmetic, which is UB.  In
fact, AFAICT it's UB even to use arbitrary pointer arithmetic to construct
a pointer outside of a known valid buffer.

To do this safely, we can't calculate the end of a memory region with
pointer addition when the length is untrusted.  Instead we must work
out the offset of one memory region within another using pointer
subtraction, then do integer checks against the length of the outer region.
We then need to be careful about the order of checks so that those integer
checks can't themselves overflow.
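The following sketch shows the check order described above. First make sure the start is not before the region. Then compute the offset and check it against the region size. Finally compare the length against the space remaining after that offset. This way, neither ptr + len nor offset + len is ever computed. sketch_check_range() is a hypothetical name. It subtracts uintptr_t values rather than pointers, which is a variation on the pointer subtraction the commit describes. It also assumes a flat address space, where uintptr_t order matches address order, as on the platforms passt targets.

```c
#include <stddef.h>
#include <stdint.h>

/* Return 0 if [@ptr, @ptr + @len) lies within [@base, @base + @size),
 * -1 otherwise, without forming out-of-bounds pointers or overflowing */
static int sketch_check_range(const void *base, size_t size,
			      const void *ptr, size_t len)
{
	uintptr_t b = (uintptr_t)base, p = (uintptr_t)ptr;
	size_t off;

	if (p < b || p - b > size)
		return -1;		/* starts before, or after the end */
	off = (size_t)(p - b);		/* fits: p - b <= size */
	if (len > size - off)		/* size - off can't underflow here */
		return -1;
	return 0;
}
```

A huge untrusted length such as SIZE_MAX is rejected by the last comparison, whereas computing ptr + len first could overflow.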

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

vu_common: Tighten vu_packet_check_range()

2025-03-20T19:32:50+00:00

This function verifies that the given packet is within the mmap()ed memory
region of the vhost-user device.  We can do better, however.  The packet
should be not only within the mmap()ed range, but specifically in the
subsection of that range set aside for shared buffers, which starts at
dev_region->mmap_offset within it.
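The tighter bound can be sketched like this: a range is accepted only if it lies within the area that starts mmap_offset bytes into a region's mapping and extends for that region's size. Being merely somewhere inside the mmap()ed area is no longer enough. struct sketch_dev_region and sketch_vu_check_range() are hypothetical, with field names modelled on the text above. The sketch also assumes mmap_addr + mmap_offset does not overflow for a well-formed region.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region: mapped at @mmap_addr; shared buffers start
 * @mmap_offset bytes into the mapping and span @size bytes */
struct sketch_dev_region {
	uintptr_t mmap_addr;
	uint64_t mmap_offset;
	uint64_t size;
};

/* Return 0 if [@ptr, @ptr + @len) is inside the buffer area of one of
 * @nregions regions, -1 otherwise */
static int sketch_vu_check_range(const struct sketch_dev_region *r,
				 size_t nregions, const void *ptr, size_t len)
{
	uintptr_t p = (uintptr_t)ptr;
	size_t i;

	for (i = 0; i < nregions; i++) {
		uintptr_t start = r[i].mmap_addr + r[i].mmap_offset;
		uintptr_t off;

		if (p < start)
			continue;	/* before this region's buffer area */
		off = p - start;
		if (off <= r[i].size && len <= r[i].size - off)
			return 0;
	}
	return -1;
}
```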

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

packet: Don't pass start and offset separately to packet_check_range()

2025-02-18T07:43:12+00:00

Fundamentally what packet_check_range() does is to check whether a given
memory range is within the allowed / expected memory set aside for packets
from a particular pool.  That range could represent a whole packet (from
packet_add_do()) or part of a packet (from packet_get_do()), but it doesn't
really matter which.

However, we pass the start of the range as two parameters: @start which is
the start of the packet, and @offset which is the offset within the packet
of the range we're interested in.  We never use these separately, only as
(start + offset).  Simplify the interface of packet_check_range() and
vu_packet_check_range() to directly take the start of the relevant range.
This will allow some additional future improvements.

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

migrate: Skeleton of live migration logic

2025-02-12T18:47:07+00:00

Introduce facilities for guest migration on top of the current
vhost-user infrastructure, moving vu_migrate() and related functions
to migrate.c.

Versioned migration stages define function pointers to be called on
source or target, or data sections that need to be transferred.

The migration header consists of a magic number, a version number for the
encoding, and a "compat_version" which represents the oldest version which
is compatible with the current one.  We don't use it yet, but that allows
for the future possibility of backwards compatible protocol extensions.
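One way a target could use such a header is sketched below. The magic value, field widths, byte order and all sketch_ names are hypothetical and are not passt's actual migration format. The rule shown is that a stream is accepted if its magic matches and its compat_version is no newer than the newest version the target understands. Under that rule, a newer source can still feed an older target, provided the source declares that compatibility.

```c
#include <stdint.h>
#include <arpa/inet.h>

#define SKETCH_MAGIC	0x4d494752u	/* hypothetical magic number */
#define SKETCH_VERSION	2u		/* encoding this build writes */

/* Hypothetical header, fields in network byte order */
struct sketch_migrate_header {
	uint32_t magic;
	uint32_t version;		/* encoding version of the stream */
	uint32_t compat_version;	/* oldest version able to read it */
};

/* Source side: fill in the header for a stream readable by @compat */
static void sketch_header_fill(struct sketch_migrate_header *h,
			       uint32_t compat)
{
	h->magic = htonl(SKETCH_MAGIC);
	h->version = htonl(SKETCH_VERSION);
	h->compat_version = htonl(compat);
}

/* Target side: @local_version is the newest encoding we understand */
static int sketch_header_check(const struct sketch_migrate_header *h,
			       uint32_t local_version)
{
	if (ntohl(h->magic) != SKETCH_MAGIC)
		return -1;	/* not a migration stream */
	if (ntohl(h->compat_version) > local_version)
		return -1;	/* needs a newer target */
	return 0;
}
```

For instance, a version 2 stream that declares compat_version 1 would be accepted by a version 1 target. If it declared compat_version 2, the same target would reject it.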

Co-authored-by: David Gibson 
Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio