<feed xmlns='http://www.w3.org/2005/Atom'>
<title>passt/icmp.h, branch 2023_03_21.1ee2f7c</title>
<subtitle>Plug A Simple Socket Transport</subtitle>
<link rel='alternate' type='text/html' href='https://passt.top/passt/'/>
<entry>
<title>icmp: Don't discard first reply sequence for a given echo ID</title>
<updated>2022-10-26T22:18:21+00:00</updated>
<author>
<name>Stefano Brivio</name>
<email>sbrivio@redhat.com</email>
</author>
<published>2022-10-26T15:55:53+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=f21204494064e28e9f0bb20f71a6071bead0d0e8'/>
<id>f21204494064e28e9f0bb20f71a6071bead0d0e8</id>
<content type='text'>
In pasta mode, ICMP and ICMPv6 echo sockets relay back to us any
reply we send: we're on the same host as the target, after all. We
discard them by comparing the last sequence we sent with the sequence
we receive.

However, on the first reply for a given identifier, the sequence
might be zero, depending on the implementation of ping(8): we need
another value to indicate that we haven't sent any sequence number yet.

Use -1 as initialiser in the echo identifier map.

This is visible with Busybox's ping, and was reported by Paul on the
integration at https://github.com/containers/podman/pull/16141, with:

  $ podman run --net=pasta alpine ping -c 2 192.168.188.1

...where only the second reply would be routed back.

Reported-by: Paul Holzinger &lt;pholzing@redhat.com&gt;
Fixes: 33482d5bf293 ("passt: Add PASTA mode, major rework")
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In pasta mode, ICMP and ICMPv6 echo sockets relay back to us any
reply we send: we're on the same host as the target, after all. We
discard them by comparing the last sequence we sent with the sequence
we receive.

However, on the first reply for a given identifier, the sequence
might be zero, depending on the implementation of ping(8): we need
another value to indicate that we haven't sent any sequence number yet.

Use -1 as initialiser in the echo identifier map.

This is visible with Busybox's ping, and was reported by Paul on the
integration at https://github.com/containers/podman/pull/16141, with:

  $ podman run --net=pasta alpine ping -c 2 192.168.188.1

...where only the second reply would be routed back.

Reported-by: Paul Holzinger &lt;pholzing@redhat.com&gt;
Fixes: 33482d5bf293 ("passt: Add PASTA mode, major rework")
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>treewide: Mark constant references as const</title>
<updated>2022-03-29T13:35:38+00:00</updated>
<author>
<name>Stefano Brivio</name>
<email>sbrivio@redhat.com</email>
</author>
<published>2022-03-26T06:23:21+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=48582bf47f5ef7a1bf136ca455d182addad08028'/>
<id>48582bf47f5ef7a1bf136ca455d182addad08028</id>
<content type='text'>
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>treewide: Packet abstraction with mandatory boundary checks</title>
<updated>2022-03-29T13:35:38+00:00</updated>
<author>
<name>Stefano Brivio</name>
<email>sbrivio@redhat.com</email>
</author>
<published>2022-03-25T12:02:47+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=bb708111833e23cafda1a5dd377e13400fa1e452'/>
<id>bb708111833e23cafda1a5dd377e13400fa1e452</id>
<content type='text'>
Implement a packet abstraction providing boundary and size checks
based on packet descriptors: packets stored in a buffer can be queued
into a pool (without storage of its own), and data can be retrieved
referring to an index in the pool, specifying offset and length.

Checks ensure data is not read outside the boundaries of buffer and
descriptors, and that packets added to a pool are within the buffer
range with valid offset and indices.

This implies a wider rework: usage of the "queueing" part of the
abstraction mostly affects tap_handler_{passt,pasta}() functions and
their callees, while the "fetching" part affects all the guest or tap
facing implementations: TCP, UDP, ICMP, ARP, NDP, DHCP and DHCPv6
handlers.

Suggested-by: Stefan Hajnoczi &lt;stefanha@redhat.com&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Implement a packet abstraction providing boundary and size checks
based on packet descriptors: packets stored in a buffer can be queued
into a pool (without storage of its own), and data can be retrieved
referring to an index in the pool, specifying offset and length.

Checks ensure data is not read outside the boundaries of buffer and
descriptors, and that packets added to a pool are within the buffer
range with valid offset and indices.

This implies a wider rework: usage of the "queueing" part of the
abstraction mostly affects tap_handler_{passt,pasta}() functions and
their callees, while the "fetching" part affects all the guest or tap
facing implementations: TCP, UDP, ICMP, ARP, NDP, DHCP and DHCPv6
handlers.

Suggested-by: Stefan Hajnoczi &lt;stefanha@redhat.com&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>passt: Fix build with gcc 7, use std=c99, enable some more Clang checkers</title>
<updated>2021-10-21T02:26:08+00:00</updated>
<author>
<name>Stefano Brivio</name>
<email>sbrivio@redhat.com</email>
</author>
<published>2021-10-21T02:26:08+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=dd942eaa480a0744fd64844f34233900a0da6893'/>
<id>dd942eaa480a0744fd64844f34233900a0da6893</id>
<content type='text'>
Unions and structs, you all have names now.

Take the chance to enable bugprone-reserved-identifier,
cert-dcl37-c, and cert-dcl51-cpp checkers in clang-tidy.

Provide a ffsl() weak declaration using gcc built-in.

Start reordering includes, but that's not enough for the
llvm-include-order checker yet.

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Unions and structs, you all have names now.

Take the chance to enable bugprone-reserved-identifier,
cert-dcl37-c, and cert-dcl51-cpp checkers in clang-tidy.

Provide a ffsl() weak declaration using gcc built-in.

Start reordering includes, but that's not enough for the
llvm-include-order checker yet.

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>LICENSES: Add license text files, add missing notices, fix SPDX tags</title>
<updated>2021-10-20T06:29:30+00:00</updated>
<author>
<name>Stefano Brivio</name>
<email>sbrivio@redhat.com</email>
</author>
<published>2021-10-19T10:43:28+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=087b5f4dbb9e3f767a8afbb6c1001c509965940b'/>
<id>087b5f4dbb9e3f767a8afbb6c1001c509965940b</id>
<content type='text'>
SPDX tags don't replace license files. Some notices were missing, and
some tags did not conform to the SPDX specification.

Now reuse --lint from the REUSE tool (https://reuse.software/) passes.

Reported-by: Martin Hauke &lt;mardnh@gmx.de&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
SPDX tags don't replace license files. Some notices were missing, and
some tags did not conform to the SPDX specification.

Now reuse --lint from the REUSE tool (https://reuse.software/) passes.

Reported-by: Martin Hauke &lt;mardnh@gmx.de&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tap: Completely de-serialise input message batches</title>
<updated>2021-09-26T23:28:02+00:00</updated>
<author>
<name>Stefano Brivio</name>
<email>sbrivio@redhat.com</email>
</author>
<published>2021-09-26T21:38:22+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=dd581730e54b934f80d5b6a820136707dc71c664'/>
<id>dd581730e54b934f80d5b6a820136707dc71c664</id>
<content type='text'>
Until now, messages would be passed to protocol handlers in a single
batch only if they happened to be dequeued in a row. Packets
interleaved between different connections would result in multiple
calls to the same protocol handler for a single connection.

Instead, keep track of incoming packet descriptors, arrange them in
sequences, and call protocol handlers only once input messages have
been completely sorted into batches.

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Until now, messages would be passed to protocol handlers in a single
batch only if they happened to be dequeued in a row. Packets
interleaved between different connections would result in multiple
calls to the same protocol handler for a single connection.

Instead, keep track of incoming packet descriptors, arrange them in
sequences, and call protocol handlers only once input messages have
been completely sorted into batches.

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>icmp: Work around possible failure on bind() due to e.g. broken SELinux policy</title>
<updated>2021-07-26T15:43:10+00:00</updated>
<author>
<name>Stefano Brivio</name>
<email>sbrivio@redhat.com</email>
</author>
<published>2021-07-26T13:26:36+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=9663378d6d6dcd8275d60b826356cc4be0538231'/>
<id>9663378d6d6dcd8275d60b826356cc4be0538231</id>
<content type='text'>
If we can't bind() ping sockets, the echo identifier sent out from
the socket won't be the original one seen from the tap. Binding a
ping socket doesn't require any security capability, but it might
still fail due to a broken SELinux policy, see for example:
	https://bugzilla.redhat.com/show_bug.cgi?id=1848929

Track the ICMP echo identifier as part of the epoll reference for
the socket and replace it in the reply on mismatch. We won't send
out the original identifier as sent from the guest, but that is still
better than missing replies.

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If we can't bind() ping sockets, the echo identifier sent out from
the socket won't be the original one seen from the tap. Binding a
ping socket doesn't require any security capability, but it might
still fail due to a broken SELinux policy, see for example:
	https://bugzilla.redhat.com/show_bug.cgi?id=1848929

Track the ICMP echo identifier as part of the epoll reference for
the socket and replace it in the reply on mismatch. We won't send
out the original identifier as sent from the guest, but that is still
better than missing replies.

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>passt: Add PASTA mode, major rework</title>
<updated>2021-07-17T09:04:22+00:00</updated>
<author>
<name>Stefano Brivio</name>
<email>sbrivio@redhat.com</email>
</author>
<published>2021-07-17T06:34:53+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=33482d5bf29312464b208beb01a5302257e82fe6'/>
<id>33482d5bf29312464b208beb01a5302257e82fe6</id>
<content type='text'>
PASTA (Pack A Subtle Tap Abstraction) provides quasi-native host
connectivity to an otherwise disconnected, unprivileged network
and user namespace, similarly to slirp4netns. Given that the
implementation is largely overlapping with PASST, no separate binary
is built: 'pasta' (and 'passt4netns' for clarity) both link to
'passt', and the mode of operation is selected depending on how the
binary is invoked. Usage example:

	$ unshare -rUn
	# echo $$
	1871759

	$ ./pasta 1871759	# From another terminal

	# udhcpc -i pasta0 2&gt;/dev/null
	# ping -c1 pasta.pizza
	PING pasta.pizza (64.190.62.111) 56(84) bytes of data.
	64 bytes from 64.190.62.111 (64.190.62.111): icmp_seq=1 ttl=255 time=34.6 ms

	--- pasta.pizza ping statistics ---
	1 packets transmitted, 1 received, 0% packet loss, time 0ms
	rtt min/avg/max/mdev = 34.575/34.575/34.575/0.000 ms
	# ping -c1 spaghetti.pizza
	PING spaghetti.pizza(2606:4700:3034::6815:147a (2606:4700:3034::6815:147a)) 56 data bytes
	64 bytes from 2606:4700:3034::6815:147a (2606:4700:3034::6815:147a): icmp_seq=1 ttl=255 time=29.0 ms

	--- spaghetti.pizza ping statistics ---
	1 packets transmitted, 1 received, 0% packet loss, time 0ms
	rtt min/avg/max/mdev = 28.967/28.967/28.967/0.000 ms

This entails a major rework, especially with regard to the storage of
tracked connections and to the semantics of epoll(7) references.

Indexing TCP and UDP bindings merely by socket proved to be
inflexible and unsuitable to handle different connection flows: pasta
also provides Layer-2 to Layer-2 socket mapping between init and a
separate namespace for local connections, using a pair of splice()
system calls for TCP, and a recvmmsg()/sendmmsg() pair for UDP local
bindings. For instance, building on the previous example:

	# ip link set dev lo up
	# iperf3 -s

	$ iperf3 -c ::1 -Z -w 32M -l 1024k -P2 | tail -n4
	[SUM]   0.00-10.00  sec  52.3 GBytes  44.9 Gbits/sec  283             sender
	[SUM]   0.00-10.43  sec  52.3 GBytes  43.1 Gbits/sec                  receiver

	iperf Done.

epoll(7) references now include a generic part in order to
demultiplex data to the relevant protocol handler, using 24
bits for the socket number, and an opaque portion reserved for
usage by the single protocol handlers, in order to track sockets
back to corresponding connections and bindings.

A number of fixes pertaining to TCP state machine and congestion
window handling are also included here.

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
PASTA (Pack A Subtle Tap Abstraction) provides quasi-native host
connectivity to an otherwise disconnected, unprivileged network
and user namespace, similarly to slirp4netns. Given that the
implementation is largely overlapping with PASST, no separate binary
is built: 'pasta' (and 'passt4netns' for clarity) both link to
'passt', and the mode of operation is selected depending on how the
binary is invoked. Usage example:

	$ unshare -rUn
	# echo $$
	1871759

	$ ./pasta 1871759	# From another terminal

	# udhcpc -i pasta0 2&gt;/dev/null
	# ping -c1 pasta.pizza
	PING pasta.pizza (64.190.62.111) 56(84) bytes of data.
	64 bytes from 64.190.62.111 (64.190.62.111): icmp_seq=1 ttl=255 time=34.6 ms

	--- pasta.pizza ping statistics ---
	1 packets transmitted, 1 received, 0% packet loss, time 0ms
	rtt min/avg/max/mdev = 34.575/34.575/34.575/0.000 ms
	# ping -c1 spaghetti.pizza
	PING spaghetti.pizza(2606:4700:3034::6815:147a (2606:4700:3034::6815:147a)) 56 data bytes
	64 bytes from 2606:4700:3034::6815:147a (2606:4700:3034::6815:147a): icmp_seq=1 ttl=255 time=29.0 ms

	--- spaghetti.pizza ping statistics ---
	1 packets transmitted, 1 received, 0% packet loss, time 0ms
	rtt min/avg/max/mdev = 28.967/28.967/28.967/0.000 ms

This entails a major rework, especially with regard to the storage of
tracked connections and to the semantics of epoll(7) references.

Indexing TCP and UDP bindings merely by socket proved to be
inflexible and unsuitable to handle different connection flows: pasta
also provides Layer-2 to Layer-2 socket mapping between init and a
separate namespace for local connections, using a pair of splice()
system calls for TCP, and a recvmmsg()/sendmmsg() pair for UDP local
bindings. For instance, building on the previous example:

	# ip link set dev lo up
	# iperf3 -s

	$ iperf3 -c ::1 -Z -w 32M -l 1024k -P2 | tail -n4
	[SUM]   0.00-10.00  sec  52.3 GBytes  44.9 Gbits/sec  283             sender
	[SUM]   0.00-10.43  sec  52.3 GBytes  43.1 Gbits/sec                  receiver

	iperf Done.

epoll(7) references now include a generic part in order to
demultiplex data to the relevant protocol handler, using 24
bits for the socket number, and an opaque portion reserved for
usage by the single protocol handlers, in order to track sockets
back to corresponding connections and bindings.

A number of fixes pertaining to TCP state machine and congestion
window handling are also included here.

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>icmp: Implement ping tracking based on echo identifiers</title>
<updated>2021-05-21T09:14:50+00:00</updated>
<author>
<name>Stefano Brivio</name>
<email>sbrivio@redhat.com</email>
</author>
<published>2021-05-21T09:14:50+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=d303cfdd55a7fb30bedd1727216e0a3926c4a95c'/>
<id>d303cfdd55a7fb30bedd1727216e0a3926c4a95c</id>
<content type='text'>
Open and bind a socket for each possible ICMP/ICMPv6 echo identifier,
and add a tracking mechanism. Otherwise, multiple pings in parallel
won't work, and a single ping to a different destination would make
an existing ping sequence stop working.

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Open and bind a socket for each possible ICMP/ICMPv6 echo identifier,
and add a tracking mechanism. Otherwise, multiple pings in parallel
won't work, and a single ping to a different destination would make
an existing ping sequence stop working.

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>udp, passt: Introduce socket packet buffer, avoid getsockname() for UDP</title>
<updated>2021-04-30T12:52:18+00:00</updated>
<author>
<name>Stefano Brivio</name>
<email>sbrivio@redhat.com</email>
</author>
<published>2021-04-30T12:52:18+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=e07f539ae0aa3ad623c4e8afcaca26906fd1eb17'/>
<id>e07f539ae0aa3ad623c4e8afcaca26906fd1eb17</id>
<content type='text'>
This is in preparation for scatter-gather IO on the UDP receive path:
save a getsockname() syscall by setting a flag if we get the numbering
of all bound sockets in a strict sequence (expected, in practice) and
repurpose the tap buffer to also serve as a socket receive buffer,
passing it down to protocol handlers.

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is in preparation for scatter-gather IO on the UDP receive path:
save a getsockname() syscall by setting a flag if we get the numbering
of all bound sockets in a strict sequence (expected, in practice) and
repurpose the tap buffer to also serve as a socket receive buffer,
passing it down to protocol handlers.

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
