<feed xmlns='http://www.w3.org/2005/Atom'>
<title>passt/pcap.h, branch 2025_05_03.587980c</title>
<subtitle>Plug A Simple Socket Transport</subtitle>
<link rel='alternate' type='text/html' href='https://passt.top/passt/'/>
<entry>
<title>pcap: Add an offset argument in pcap_iov()</title>
<updated>2024-10-04T12:51:02+00:00</updated>
<author>
<name>Laurent Vivier</name>
<email>lvivier@redhat.com</email>
</author>
<published>2024-10-03T14:51:05+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=fd8334b25dfa0cf4a93bb7fad6728f3bd0e31c6d'/>
<id>fd8334b25dfa0cf4a93bb7fad6728f3bd0e31c6d</id>
<content type='text'>
The offset is passed directly to pcap_frame() and allows
any headers that are not part of the frame to
capture to be skipped.

Signed-off-by: Laurent Vivier &lt;lvivier@redhat.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The offset is passed directly to pcap_frame() and allows
any headers that are not part of the frame to
capture to be skipped.

Signed-off-by: Laurent Vivier &lt;lvivier@redhat.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>treewide: Standardise variable names for various packet lengths</title>
<updated>2024-05-02T14:13:23+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2024-05-01T06:53:49+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=5566386f5f1134c86db82464a4c10656ef1e11fe'/>
<id>5566386f5f1134c86db82464a4c10656ef1e11fe</id>
<content type='text'>
At various points we need to track the lengths of a packet including or
excluding various different sets of headers.  We don't always use the same
variable names for doing so.  Worse in some places we use the same name
for different things: e.g. tcp_fill_headers[46]() use ip_len for the
length including the IP headers, but then tcp_send_flag() which calls it
uses it to mean the IP payload length only.

To improve clarity, standardise on these names:
   dlen:		L4 protocol payload length ("data length")
   l4len:		plen + length of L4 protocol header
   l3len:		l4len + length of IPv4/IPv6 header
   l2len:		l3len + length of L2 (ethernet) header

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
At various points we need to track the lengths of a packet including or
excluding various different sets of headers.  We don't always use the same
variable names for doing so.  Worse in some places we use the same name
for different things: e.g. tcp_fill_headers[46]() use ip_len for the
length including the IP headers, but then tcp_send_flag() which calls it
uses it to mean the IP payload length only.

To improve clarity, standardise on these names:
   dlen:		L4 protocol payload length ("data length")
   l4len:		plen + length of L4 protocol header
   l3len:		l4len + length of IPv4/IPv6 header
   l2len:		l3len + length of L2 (ethernet) header

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>pcap: add pcap_iov()</title>
<updated>2024-03-06T07:03:30+00:00</updated>
<author>
<name>Laurent Vivier</name>
<email>lvivier@redhat.com</email>
</author>
<published>2024-03-06T05:58:30+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=94502fa15e0b7386318e4a7dad78ea05462010d0'/>
<id>94502fa15e0b7386318e4a7dad78ea05462010d0</id>
<content type='text'>
Introduce a new function pcap_iov() to capture packet desribed by an IO
vector.

Update pcap_frame() to manage iovcnt &gt; 1.

Signed-off-by: Laurent Vivier &lt;lvivier@redhat.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Message-ID: &lt;20240303135114.1023026-2-lvivier@redhat.com&gt;
[dwg: Fixed trivial cppcheck regressions]
Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Introduce a new function pcap_iov() to capture packet desribed by an IO
vector.

Update pcap_frame() to manage iovcnt &gt; 1.

Signed-off-by: Laurent Vivier &lt;lvivier@redhat.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Message-ID: &lt;20240303135114.1023026-2-lvivier@redhat.com&gt;
[dwg: Fixed trivial cppcheck regressions]
Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>pcap: Handle short writes in pcap_frame()</title>
<updated>2024-02-29T05:35:01+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2024-02-28T01:52:04+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=dda7945ca9c9d2371fc37cfaed688f92bd627224'/>
<id>dda7945ca9c9d2371fc37cfaed688f92bd627224</id>
<content type='text'>
Currently pcap_frame() assumes that if write() doesn't return an error, it
has written everything we want.  That's not necessarily true, because it
could return a short write.  That's not likely to happen on a regular file,
but there's not a lot of reason not to be robust here; it's conceivable we
might want to direct the pcap fd at a named pipe or similar.

So, make pcap_frame() handle short frames by using the write_remainder()
helper.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
[sbrivio: Formatting fix, and avoid gcc warning in pcap_frame()]
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently pcap_frame() assumes that if write() doesn't return an error, it
has written everything we want.  That's not necessarily true, because it
could return a short write.  That's not likely to happen on a regular file,
but there's not a lot of reason not to be robust here; it's conceivable we
might want to direct the pcap fd at a named pipe or similar.

So, make pcap_frame() handle short frames by using the write_remainder()
helper.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
[sbrivio: Formatting fix, and avoid gcc warning in pcap_frame()]
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>passt: Relicense to GPL 2.0, or any later version</title>
<updated>2023-04-06T16:00:33+00:00</updated>
<author>
<name>Stefano Brivio</name>
<email>sbrivio@redhat.com</email>
</author>
<published>2023-04-05T18:11:44+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=ca2749e1bd520c6a1dbca24f1561ee31dd833a54'/>
<id>ca2749e1bd520c6a1dbca24f1561ee31dd833a54</id>
<content type='text'>
In practical terms, passt doesn't benefit from the additional
protection offered by the AGPL over the GPL, because it's not
suitable to be executed over a computer network.

Further, restricting the distribution under the version 3 of the GPL
wouldn't provide any practical advantage either, as long as the passt
codebase is concerned, and might cause unnecessary compatibility
dilemmas.

Change licensing terms to the GNU General Public License Version 2,
or any later version, with written permission from all current and
past contributors, namely: myself, David Gibson, Laine Stump, Andrea
Bolognani, Paul Holzinger, Richard W.M. Jones, Chris Kuhn, Florian
Weimer, Giuseppe Scrivano, Stefan Hajnoczi, and Vasiliy Ulyanov.

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In practical terms, passt doesn't benefit from the additional
protection offered by the AGPL over the GPL, because it's not
suitable to be executed over a computer network.

Further, restricting the distribution under the version 3 of the GPL
wouldn't provide any practical advantage either, as long as the passt
codebase is concerned, and might cause unnecessary compatibility
dilemmas.

Change licensing terms to the GNU General Public License Version 2,
or any later version, with written permission from all current and
past contributors, namely: myself, David Gibson, Laine Stump, Andrea
Bolognani, Paul Holzinger, Richard W.M. Jones, Chris Kuhn, Florian
Weimer, Giuseppe Scrivano, Stefan Hajnoczi, and Vasiliy Ulyanov.

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>udp: Use tap_send_frames()</title>
<updated>2023-01-23T17:55:04+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2023-01-06T00:43:22+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=54502cca7f7a89eadce92bb7a2b6ab61b878a5c6'/>
<id>54502cca7f7a89eadce92bb7a2b6ab61b878a5c6</id>
<content type='text'>
To send frames on the tap interface, the UDP uses a fairly complicated two
level batching.  First multiple frames are gathered into a single "message"
for the qemu stream socket, then multiple messages are send with
sendmmsg().  We now have tap_send_frames() which already deals with sending
a number of frames, including batching and handling partial sends.  Use
that to considerably simplify things.

This does make a couple of behavioural changes:
  * We used to split messages to keep them under 32kiB (except when a
    single frame was longer than that).  The comments claim this is
    needed to stop qemu from closing the connection, but we don't have any
    equivalent logic for TCP.  I wasn't able to reproduce the problem with
    this series, although it was apparently easy to reproduce earlier.

    My suspicion is that there was never an inherent need to keep messages
    small, however with larger messages (and default kernel buffer sizes)
    the chances of needing more than one resend for partial send()s is
    greatly increased.  We used not to correctly handle that case of
    multiple resends, but now we do.

  * Previously when we got a partial send on UDP, we would resend the
    remainder of the entire "message", including multiple frames.  The
    common code now only resends the remainder of a single frame, simply
    dropping any frames which weren't even partially sent.  This is what
    TCP always did and is probably a better idea for UDP too.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
To send frames on the tap interface, the UDP uses a fairly complicated two
level batching.  First multiple frames are gathered into a single "message"
for the qemu stream socket, then multiple messages are send with
sendmmsg().  We now have tap_send_frames() which already deals with sending
a number of frames, including batching and handling partial sends.  Use
that to considerably simplify things.

This does make a couple of behavioural changes:
  * We used to split messages to keep them under 32kiB (except when a
    single frame was longer than that).  The comments claim this is
    needed to stop qemu from closing the connection, but we don't have any
    equivalent logic for TCP.  I wasn't able to reproduce the problem with
    this series, although it was apparently easy to reproduce earlier.

    My suspicion is that there was never an inherent need to keep messages
    small, however with larger messages (and default kernel buffer sizes)
    the chances of needing more than one resend for partial send()s is
    greatly increased.  We used not to correctly handle that case of
    multiple resends, but now we do.

  * Previously when we got a partial send on UDP, we would resend the
    remainder of the entire "message", including multiple frames.  The
    common code now only resends the remainder of a single frame, simply
    dropping any frames which weren't even partially sent.  This is what
    TCP always did and is probably a better idea for UDP too.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>pcap: Replace pcapm() with pcap_multiple()</title>
<updated>2023-01-23T17:54:27+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2023-01-06T00:43:06+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=d3089eb0eab412e34bd91f7853e3e72429912edb'/>
<id>d3089eb0eab412e34bd91f7853e3e72429912edb</id>
<content type='text'>
pcapm() captures multiple frames from a msghdr, however the only thing it
cares about in the msghdr is the list of buffers, where it assumes there is
one frame to capture per buffer.  That's what we want for its single caller
but it's not the only obvious choice here (one frame per msghdr would
arguably make more sense in isolation).  In addition pcapm() has logic
that only makes sense in the context of the passt specific path its called
from: it skips the first 4 bytes of each buffer, because those have the
qemu vnet_len rather than the frame proper.

Make this clearer by replacing pcapm() with pcap_multiple() which more
explicitly takes one struct iovec per frame, and parameterizes how much of
each buffer to skip (i.e. the offset of the frame within the buffer).

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
pcapm() captures multiple frames from a msghdr, however the only thing it
cares about in the msghdr is the list of buffers, where it assumes there is
one frame to capture per buffer.  That's what we want for its single caller
but it's not the only obvious choice here (one frame per msghdr would
arguably make more sense in isolation).  In addition pcapm() has logic
that only makes sense in the context of the passt specific path its called
from: it skips the first 4 bytes of each buffer, because those have the
qemu vnet_len rather than the frame proper.

Make this clearer by replacing pcapm() with pcap_multiple() which more
explicitly takes one struct iovec per frame, and parameterizes how much of
each buffer to skip (i.e. the offset of the frame within the buffer).

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>treewide: Mark constant references as const</title>
<updated>2022-03-29T13:35:38+00:00</updated>
<author>
<name>Stefano Brivio</name>
<email>sbrivio@redhat.com</email>
</author>
<published>2022-03-26T06:23:21+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=48582bf47f5ef7a1bf136ca455d182addad08028'/>
<id>48582bf47f5ef7a1bf136ca455d182addad08028</id>
<content type='text'>
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>treewide: Add include guards</title>
<updated>2022-03-29T13:35:38+00:00</updated>
<author>
<name>Stefano Brivio</name>
<email>sbrivio@redhat.com</email>
</author>
<published>2022-03-25T23:05:31+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=965f603238a92b6ab8cd8a0592e0fb65c096b3e1'/>
<id>965f603238a92b6ab8cd8a0592e0fb65c096b3e1</id>
<content type='text'>
...at the moment, just for consistency with packet.h, icmp.h,
tcp.h and udp.h.

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
...at the moment, just for consistency with packet.h, icmp.h,
tcp.h and udp.h.

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>passt, pasta: Namespace-based sandboxing, defer seccomp policy application</title>
<updated>2022-02-21T12:41:13+00:00</updated>
<author>
<name>Stefano Brivio</name>
<email>sbrivio@redhat.com</email>
</author>
<published>2022-02-07T20:11:37+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=0515adceaa8f69a1d85ae3c8c550c37dd49b0c47'/>
<id>0515adceaa8f69a1d85ae3c8c550c37dd49b0c47</id>
<content type='text'>
To reach (at least) a conceptually equivalent security level as
implemented by --enable-sandbox in slirp4netns, we need to create a
new mount namespace and pivot_root() into a new (empty) mountpoint, so
that passt and pasta can't access any filesystem resource after
initialisation.

While at it, also detach IPC, PID (only for passt, to prevent
vulnerabilities based on the knowledge of a target PID), and UTS
namespaces.

With this approach, if we apply the seccomp filters right after the
configuration step, the number of allowed syscalls grows further. To
prevent this, defer the application of seccomp policies after the
initialisation phase, before the main loop, that's where we expect bad
things to happen, potentially. This way, we get back to 22 allowed
syscalls for passt and 34 for pasta, on x86_64.

While at it, move #syscalls notes to specific code paths wherever it
conceptually makes sense.

We have to open all the file handles we'll ever need before
sandboxing:

- the packet capture file can only be opened once, drop instance
  numbers from the default path and use the (pre-sandbox) PID instead

- /proc/net/tcp{,v6} and /proc/net/udp{,v6}, for automatic detection
  of bound ports in pasta mode, are now opened only once, before
  sandboxing, and their handles are stored in the execution context

- the UNIX domain socket for passt is also bound only once, before
  sandboxing: to reject clients after the first one, instead of
  closing the listening socket, keep it open, accept and immediately
  discard new connection if we already have a valid one

Clarify the (unchanged) behaviour for --netns-only in the man page.

To actually make passt and pasta processes run in a separate PID
namespace, we need to unshare(CLONE_NEWPID) before forking to
background (if configured to do so). Introduce a small daemon()
implementation, __daemon(), that additionally saves the PID file
before forking. While running in foreground, the process itself can't
move to a new PID namespace (a process can't change the notion of its
own PID): mention that in the man page.

For some reason, fork() in a detached PID namespace causes SIGTERM
and SIGQUIT to be ignored, even if the handler is still reported as
SIG_DFL: add a signal handler that just exits.

We can now drop most of the pasta_child_handler() implementation,
that took care of terminating all processes running in the same
namespace, if pasta started a shell: the shell itself is now the
init process in that namespace, and all children will terminate
once the init process exits.

Issuing 'echo $$' in a detached PID namespace won't return the
actual namespace PID as seen from the init namespace: adapt
demo and test setup scripts to reflect that.

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
To reach (at least) a conceptually equivalent security level as
implemented by --enable-sandbox in slirp4netns, we need to create a
new mount namespace and pivot_root() into a new (empty) mountpoint, so
that passt and pasta can't access any filesystem resource after
initialisation.

While at it, also detach IPC, PID (only for passt, to prevent
vulnerabilities based on the knowledge of a target PID), and UTS
namespaces.

With this approach, if we apply the seccomp filters right after the
configuration step, the number of allowed syscalls grows further. To
prevent this, defer the application of seccomp policies after the
initialisation phase, before the main loop, that's where we expect bad
things to happen, potentially. This way, we get back to 22 allowed
syscalls for passt and 34 for pasta, on x86_64.

While at it, move #syscalls notes to specific code paths wherever it
conceptually makes sense.

We have to open all the file handles we'll ever need before
sandboxing:

- the packet capture file can only be opened once, drop instance
  numbers from the default path and use the (pre-sandbox) PID instead

- /proc/net/tcp{,v6} and /proc/net/udp{,v6}, for automatic detection
  of bound ports in pasta mode, are now opened only once, before
  sandboxing, and their handles are stored in the execution context

- the UNIX domain socket for passt is also bound only once, before
  sandboxing: to reject clients after the first one, instead of
  closing the listening socket, keep it open, accept and immediately
  discard new connection if we already have a valid one

Clarify the (unchanged) behaviour for --netns-only in the man page.

To actually make passt and pasta processes run in a separate PID
namespace, we need to unshare(CLONE_NEWPID) before forking to
background (if configured to do so). Introduce a small daemon()
implementation, __daemon(), that additionally saves the PID file
before forking. While running in foreground, the process itself can't
move to a new PID namespace (a process can't change the notion of its
own PID): mention that in the man page.

For some reason, fork() in a detached PID namespace causes SIGTERM
and SIGQUIT to be ignored, even if the handler is still reported as
SIG_DFL: add a signal handler that just exits.

We can now drop most of the pasta_child_handler() implementation,
that took care of terminating all processes running in the same
namespace, if pasta started a shell: the shell itself is now the
init process in that namespace, and all children will terminate
once the init process exits.

Issuing 'echo $$' in a detached PID namespace won't return the
actual namespace PID as seen from the init namespace: adapt
demo and test setup scripts to reflect that.

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
