passt/tap.h, branch bug165c

arp/ndp: don't send messages on uninitialized tap interface

2025-11-27T21:29:25+00:00

When running pasta without --config-net, the tap interface is opened
and assigned a valid file descriptor, but intentionally not brought
up in the namespace. This is the expected behavior when the user wants
to configure the namespace manually.

However, in PASTA mode the code is attempting to send ARP announcements
and NDP messages (initial requests and unsolicited NAs) based solely on
whether c->fd_tap is valid, without checking if the interface actually
is up and ready to transmit. This results in send failures, and when
debug is activated (pasta -d) we see error printouts for these early
messages.

We now add new function tap_is_ready() which checks both conditions:
- Whether fd_tap is valid (all modes)
- Whether the tap interface is up (pasta mode only). In this mode, we
  use the existing c->pasta_conf_ns flag, which indicates whether
  pasta_ns_conf() configured and brought up the interface. This test
  is simple, and good enough for now.

We update all functions that send unsolicited ARP/NDP messages to
check with the new function before making any send attempt.

This eliminates spurious send errors when starting pasta without
--config-net, while preserving correct behavior when the interface
is properly initialized.

Signed-off-by: Jon Maloy 
Signed-off-by: Stefano Brivio

icmp: let icmp use mac address from flowside structure

2025-10-30T11:01:01+00:00

Even ICMP needs to be updated to use the external MAC address instead
of just the own tap address when applicable. We do that here.

Signed-off-by: Jon Maloy 
Reviewed-by: David Gibson 
Signed-off-by: Stefano Brivio

tap: change signature of function tap_push_l2h()

2025-10-30T11:01:01+00:00

In the next commit it must be possible for the callers of function
tap_push_l2h() to specify which source MAC address should be
added to the ethernet header sent over the tap interface. As a
preparation, we now add a new argument to that function, still
without any logical changes.

Signed-off-by: Jon Maloy 
Reviewed-by: David Gibson 
Signed-off-by: Stefano Brivio

packet: Refactor vhost-user memory region handling

2025-09-03T18:43:48+00:00

This patch refactors the handling of vhost-user memory regions by
introducing a new `struct vdev_memory` to encapsulate the regions
array and their count (`nregions`) within the main `vu_dev` structure.

This new `vdev_memory` structure is then passed to the packet pool by
re-using the existing `p->buf` field. A `p->buf_size` of 0 indicates
that `p->buf` holds a pointer to `struct vdev_memory` instead of a
regular packet buffer. A new helper, `get_vdev_memory()`, is added to
abstract this access pattern.

Previous implementation was using a marker at the end of the memory
regions array. We can now uses all the slots.

Signed-off-by: Laurent Vivier 
Reviewed-by: David Gibson 
Signed-off-by: Stefano Brivio

tap: Use iov_tail with tap_add_packet()

2025-09-03T18:42:20+00:00

Use IOV_PEEK_HEADER() to get the ethernet header from the iovec.

Move the workaround about multiple iovec array from vu_handle_tx() to
tap_add_packet(). Removing the offset out of the iovec array should
reduce the iovec count to 1.

Signed-off-by: Laurent Vivier 
Reviewed-by: David Gibson 
Signed-off-by: Stefano Brivio

style: Fix 'Return' comment style

2025-07-18T17:19:24+00:00

We always use imperative (no 'Returns:'), no tab after the ':' and
only one space, the first character is always lowercase.

This is fixed with:

 sed -i "s/Returns:/Return:/;s/Return:    /Return: /;s/Return:  */Return: /;s/Return: \([A-Z]\)/Return: \L\1/" *.[ch]

And manually updated to fix alignment of multiline comment and words
that must keep uppercase (like IPv4, TCP, UDP, Layer-4).

Signed-off-by: Laurent Vivier 
Signed-off-by: Stefano Brivio

tap: Make size of pool_tap[46] purely a tuning parameter

2025-03-20T19:33:09+00:00

Currently we attempt to size pool_tap[46] so they have room for the maximum
possible number of packets that could fit in pkt_buf (TAP_MSGS).  However,
the calculation isn't quite correct: TAP_MSGS is based on ETH_ZLEN (60) as
the minimum possible L2 frame size.  But ETH_ZLEN is based on physical
constraints of Ethernet, which don't apply to our virtual devices.  It is
possible to generate a legitimate frame smaller than this, for example an
empty payload UDP/IPv4 frame on the 'pasta' backend is only 42 bytes long.

Further more, the same limit applies for vhost-user, which is not limited
by the size of pkt_buf like the other backends.  In that case we don't even
have full control of the maximum buffer size, so we can't really calculate
how many packets could fit in there.

If we exceed do TAP_MSGS we'll drop packets, not just use more batches,
which is moderately bad.  The fact that this needs to be sized just so for
correctness not merely for tuning is a fairly non-obvious coupling between
different parts of the code.

To make this more robust, alter the tap code so it doesn't rely on
everything fitting in a single batch of TAP_MSGS packets, instead breaking
into multiple batches as necessary.  This leaves TAP_MSGS as purely a
tuning parameter, which we can freely adjust based on performance measures.

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

pcap: Correctly set snaplen based on tap backend type

2025-03-12T22:08:33+00:00

The pcap header includes a value indicating how much of each frame is
captured.  We always capture the entire frame, so we want to set this to
the maximum possible frame size.  Currently we do that by setting it to
ETH_MAX_MTU, but that's a confusingly named constant which might not always
be correct depending on the details of our tap backend.

Instead add a tap_l2_max_len() function that explicitly returns the maximum
frame size for the current mode and use that to set snaplen.  While we're
there, there's no particular need for the pcap header to be defined in a
global; make it local to pcap_init() instead.

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

tap: Use explicit defines for maximum length of L2 frame

2025-03-12T22:08:33+00:00

Currently in tap.c we (mostly) use ETH_MAX_MTU as the maximum length of
an L2 frame.  This define comes from the kernel, but it's badly named and
used confusingly.

First, it doesn't really have anything to do with Ethernet, which has no
structural limit on frame lengths.  It comes more from either a) IP which
imposes a 64k datagram limit or b) from internal buffers used in various
places in the kernel (and in passt).

Worse, MTU generally means the maximum size of the IP (L3) datagram which
may be transferred, _not_ counting the L2 headers.  In the kernel
ETH_MAX_MTU is sometimes used that way, but sometimes seems to be used as
a maximum frame length, _including_ L2 headers.  In tap.c we're mostly
using it in the second way.

Finally, each of our tap backends could have different limits on the frame
size imposed by the mechanisms they're using.

Start clearing up this confusion by replacing it in tap.c with new
L2_MAX_LEN_* defines which specifically refer to the maximum L2 frame
length for each backend.

Signed-off-by: David Gibson 
Signed-off-by: Stefano Brivio

udp: create and send ICMPv6 to local peer when applicable

2025-03-07T01:21:24+00:00

When a local peer sends a UDP message to a non-existing port on an
existing remote host, that host will return an ICMPv6 message containing
the error code ICMP6_DST_UNREACH_NOPORT, plus the IPv6 header, UDP header
and the first 1232 bytes of the original message, if any. If the sender
socket has been connected, it uses this message to issue a
"Connection Refused" event to the user.

Until now, we have only read such events from the externally facing
socket, but we don't forward them back to the local sender because
we cannot read the ICMP message directly to user space. Because of
this, the local peer will hang and wait for a response that never
arrives.

We now fix this for IPv6 by recreating and forwarding a correct ICMP
message back to the internal sender. We synthesize the message based
on the information in the extended error structure, plus the returned
part of the original message body.

Note that for the sake of completeness, we even produce ICMP messages
for other error types and codes. We have noticed that at least
ICMP_PROT_UNREACH is propagated as an error event back to the user.

Reviewed-by: David Gibson 
Signed-off-by: Jon Maloy 
[sbrivio: fix cppcheck warning, udp_send_conn_fail_icmp6() doesn't
 modify saddr which can be declared as const]
Signed-off-by: Stefano Brivio