<feed xmlns='http://www.w3.org/2005/Atom'>
<title>passt, branch 2024_11_21.238c69f</title>
<subtitle>Plug A Simple Socket Transport</subtitle>
<link rel='alternate' type='text/html' href='https://passt.top/passt/'/>
<entry>
<title>tcp: Acknowledge keep-alive segments, ignore them for the rest</title>
<updated>2024-11-21T05:52:36+00:00</updated>
<author>
<name>Stefano Brivio</name>
<email>sbrivio@redhat.com</email>
</author>
<published>2024-11-19T19:53:44+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=238c69f9af458e41dea5ad8c988dbf65b05b5172'/>
<id>238c69f9af458e41dea5ad8c988dbf65b05b5172</id>
<content type='text'>
RFC 9293, 3.8.4 says:

   Implementers MAY include "keep-alives" in their TCP implementations
   (MAY-5), although this practice is not universally accepted.  Some
   TCP implementations, however, have included a keep-alive mechanism.
   To confirm that an idle connection is still active, these
   implementations send a probe segment designed to elicit a response
   from the TCP peer.  Such a segment generally contains SEG.SEQ =
   SND.NXT-1 and may or may not contain one garbage octet of data.  If
   keep-alives are included, the application MUST be able to turn them
   on or off for each TCP connection (MUST-24), and they MUST default to
   off (MUST-25).

but currently, tcp_data_from_tap() is not aware of this and will
schedule a fast re-transmit on the second keep-alive (because it's
also a duplicate ACK), ignoring the fact that the sequence number was
rewinded to SND.NXT-1.

ACK these keep-alive segments, reset the activity timeout, and ignore
them for the rest.

At some point, we could think of implementing an approximation of
keep-alive segments on outbound sockets, for example by setting
TCP_KEEPIDLE to 1, and a large TCP_KEEPINTVL, so that we send a single
keep-alive segment at approximately the same time, and never reset the
connection. That's beyond the scope of this fix, though.

Reported-by: Tim Besard &lt;tim.besard@gmail.com&gt;
Link: https://github.com/containers/podman/discussions/24572
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
RFC 9293, 3.8.4 says:

   Implementers MAY include "keep-alives" in their TCP implementations
   (MAY-5), although this practice is not universally accepted.  Some
   TCP implementations, however, have included a keep-alive mechanism.
   To confirm that an idle connection is still active, these
   implementations send a probe segment designed to elicit a response
   from the TCP peer.  Such a segment generally contains SEG.SEQ =
   SND.NXT-1 and may or may not contain one garbage octet of data.  If
   keep-alives are included, the application MUST be able to turn them
   on or off for each TCP connection (MUST-24), and they MUST default to
   off (MUST-25).

but currently, tcp_data_from_tap() is not aware of this and will
schedule a fast re-transmit on the second keep-alive (because it's
also a duplicate ACK), ignoring the fact that the sequence number was
rewinded to SND.NXT-1.

ACK these keep-alive segments, reset the activity timeout, and ignore
them for the rest.

At some point, we could think of implementing an approximation of
keep-alive segments on outbound sockets, for example by setting
TCP_KEEPIDLE to 1, and a large TCP_KEEPINTVL, so that we send a single
keep-alive segment at approximately the same time, and never reset the
connection. That's beyond the scope of this fix, though.

Reported-by: Tim Besard &lt;tim.besard@gmail.com&gt;
Link: https://github.com/containers/podman/discussions/24572
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: Reset ACK_TO_TAP_DUE flag whenever an ACK isn't needed anymore</title>
<updated>2024-11-21T05:51:25+00:00</updated>
<author>
<name>Stefano Brivio</name>
<email>sbrivio@redhat.com</email>
</author>
<published>2024-11-19T19:53:43+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=af464c4ffbb7a5341f8a7beedce8382d598dbaf7'/>
<id>af464c4ffbb7a5341f8a7beedce8382d598dbaf7</id>
<content type='text'>
We enter the timer handler with the ACK_TO_TAP_DUE flag, call
tcp_prepare_flags() with ACK_IF_NEEDED, and realise that we
acknowledged everything meanwhile, so we return early, but we also
need to reset that flag to avoid unnecessarily scheduling the timer
over and over again until more pending data appears.

I'm not sure if this fixes any real issue, but I've spotted this
in several logs reported by users, including one where we have some
unexpected bursts of high CPU load during TCP transfers at low rates,
from https://github.com/containers/podman/issues/23686.

Link: https://github.com/containers/podman/discussions/24572
Link: https://github.com/containers/podman/issues/23686
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We enter the timer handler with the ACK_TO_TAP_DUE flag, call
tcp_prepare_flags() with ACK_IF_NEEDED, and realise that we
acknowledged everything meanwhile, so we return early, but we also
need to reset that flag to avoid unnecessarily scheduling the timer
over and over again until more pending data appears.

I'm not sure if this fixes any real issue, but I've spotted this
in several logs reported by users, including one where we have some
unexpected bursts of high CPU load during TCP transfers at low rates,
from https://github.com/containers/podman/issues/23686.

Link: https://github.com/containers/podman/discussions/24572
Link: https://github.com/containers/podman/issues/23686
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ndp: Don't send unsolicited RAs if NDP is disabled</title>
<updated>2024-11-19T20:10:42+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2024-11-19T01:21:56+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=5ae21841acd7f55a4b57b99a5097ca99b84f07c4'/>
<id>5ae21841acd7f55a4b57b99a5097ca99b84f07c4</id>
<content type='text'>
We recently added support for sending unsolicited NDP Router Advertisement
packets.  While we (correctly) disable this if the --no-ra option is given
we incorrectly still send them if --no-ndp is set.  Fix the oversight.

Fixes: 6e1e44293ef9 ("ndp: Send unsolicited Router Advertisements")
Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We recently added support for sending unsolicited NDP Router Advertisement
packets.  While we (correctly) disable this if the --no-ra option is given
we incorrectly still send them if --no-ndp is set.  Fix the oversight.

Fixes: 6e1e44293ef9 ("ndp: Send unsolicited Router Advertisements")
Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ndp: Don't send unsolicited router advertisement if we can't, yet</title>
<updated>2024-11-19T20:10:14+00:00</updated>
<author>
<name>Stefano Brivio</name>
<email>sbrivio@redhat.com</email>
</author>
<published>2024-11-15T14:22:06+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=bf9492747df006a794f281d6c26ee38989b44d23'/>
<id>bf9492747df006a794f281d6c26ee38989b44d23</id>
<content type='text'>
ndp_timer() is called right away on the first epoll_wait() cycle,
when the communication channel to the guest isn't ready yet:

  1.0038: NDP: sending unsolicited RA, next in 264s
  1.0038: tap: failed to send 1 frames of 1

check that it's up before sending it. This effectively delays the
first gratuitous router advertisement, which is probably a good idea
given that we expect the guest to send a router solicitation right
away.

Fixes: 6e1e44293ef9 ("ndp: Send unsolicited Router Advertisements")
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
ndp_timer() is called right away on the first epoll_wait() cycle,
when the communication channel to the guest isn't ready yet:

  1.0038: NDP: sending unsolicited RA, next in 264s
  1.0038: tap: failed to send 1 frames of 1

check that it's up before sending it. This effectively delays the
first gratuitous router advertisement, which is probably a good idea
given that we expect the guest to send a router solicitation right
away.

Fixes: 6e1e44293ef9 ("ndp: Send unsolicited Router Advertisements")
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>selinux: Use auth_read_passwd() interface for all our getpwnam() needs</title>
<updated>2024-11-19T20:10:14+00:00</updated>
<author>
<name>Stefano Brivio</name>
<email>sbrivio@redhat.com</email>
</author>
<published>2024-11-14T22:48:54+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=5e2446667729d01ef8208d0e7e866cee09c8a3fb'/>
<id>5e2446667729d01ef8208d0e7e866cee09c8a3fb</id>
<content type='text'>
If passt or pasta are started as root, we need to read the passwd file
(be it /etc/passwd or whatever sssd provides) to find out UID and GID
of 'nobody' so that we can switch to it.

Instead of a bunch of allow rules for passwd_file_t and sssd macros,
use the more convenient auth_read_passwd() interface which should
cover our usage of getpwnam().

The existing rules weren't actually enough:

  # strace -e openat passt -f
  [...]
  Started as root, will change to nobody.
  openat(AT_FDCWD, "/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = 4
  openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 4
  openat(AT_FDCWD, "/lib64/libnss_sss.so.2", O_RDONLY|O_CLOEXEC) = 4
  openat(AT_FDCWD, "/var/lib/sss/mc/passwd", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
  openat(AT_FDCWD, "/var/lib/sss/mc/passwd", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
  openat(AT_FDCWD, "/etc/passwd", O_RDONLY|O_CLOEXEC) = 4

with corresponding SELinux warnings logged in audit.log.

Reported-by: Minxi Hou &lt;mhou@redhat.com&gt;
Analysed-by: Miloš Malik &lt;mmalik@redhat.com&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If passt or pasta are started as root, we need to read the passwd file
(be it /etc/passwd or whatever sssd provides) to find out UID and GID
of 'nobody' so that we can switch to it.

Instead of a bunch of allow rules for passwd_file_t and sssd macros,
use the more convenient auth_read_passwd() interface which should
cover our usage of getpwnam().

The existing rules weren't actually enough:

  # strace -e openat passt -f
  [...]
  Started as root, will change to nobody.
  openat(AT_FDCWD, "/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = 4
  openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 4
  openat(AT_FDCWD, "/lib64/libnss_sss.so.2", O_RDONLY|O_CLOEXEC) = 4
  openat(AT_FDCWD, "/var/lib/sss/mc/passwd", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
  openat(AT_FDCWD, "/var/lib/sss/mc/passwd", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
  openat(AT_FDCWD, "/etc/passwd", O_RDONLY|O_CLOEXEC) = 4

with corresponding SELinux warnings logged in audit.log.

Reported-by: Minxi Hou &lt;mhou@redhat.com&gt;
Analysed-by: Miloš Malik &lt;mmalik@redhat.com&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ndp: Send unsolicited Router Advertisements</title>
<updated>2024-11-14T18:00:40+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2024-11-14T03:33:10+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=6e1e44293ef991d8c946dd59fbbd65c54901b255'/>
<id>6e1e44293ef991d8c946dd59fbbd65c54901b255</id>
<content type='text'>
Currently, our NDP implementation only sends Router Advertisements (RA)
when it receives a Router Solicitation (RS) from the guest.  However,
RFC 4861 requires that we periodically send unsolicited RAs.

Linux as a guest also requires this: it will send an RS when a link first
comes up, but the route it gets from this will have a finite lifetime (we
set this to 65535s, the maximum allowed, around 18 hours).  When that
expires the guest will not send a new RS, but instead expects the route to
have been renewed (if still valid) by an unsolicited RA.

Implement sending unsolicited RAs on a partially randomised timer, as
required by RFC 4861.  The RFC also specifies that solicited RAs should
also be delayed, or even omitted, if the next unsolicited RA is soon
enough.  For now we don't do that, always sending an immediate RA in
response to an RS.  We can get away with this because in our use cases
we expect to just have passt itself and the guest on the link, rather than
a large broadcast domain.

Link: https://github.com/kubevirt/kubevirt/issues/13191
Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, our NDP implementation only sends Router Advertisements (RA)
when it receives a Router Solicitation (RS) from the guest.  However,
RFC 4861 requires that we periodically send unsolicited RAs.

Linux as a guest also requires this: it will send an RS when a link first
comes up, but the route it gets from this will have a finite lifetime (we
set this to 65535s, the maximum allowed, around 18 hours).  When that
expires the guest will not send a new RS, but instead expects the route to
have been renewed (if still valid) by an unsolicited RA.

Implement sending unsolicited RAs on a partially randomised timer, as
required by RFC 4861.  The RFC also specifies that solicited RAs should
also be delayed, or even omitted, if the next unsolicited RA is soon
enough.  For now we don't do that, always sending an immediate RA in
response to an RS.  We can get away with this because in our use cases
we expect to just have passt itself and the guest on the link, rather than
a large broadcast domain.

Link: https://github.com/kubevirt/kubevirt/issues/13191
Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>passt: Seed libc's pseudo random number generator</title>
<updated>2024-11-14T18:00:38+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2024-11-14T03:33:09+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=b39760cc7d89e69c7fb12eccc3df3bd15e2d5665'/>
<id>b39760cc7d89e69c7fb12eccc3df3bd15e2d5665</id>
<content type='text'>
We have an upcoming case where we need pseudo-random numbers to scatter
timings, but we don't need cryptographically strong random numbers.  libc's
built in random() is fine for this purpose, but we should seed it.  Extend
secret_init() - the only current user of random numbers - to do this as
well as generating the SipHash secret.  Using /dev/random for a PRNG seed
is probably overkill, but it's simple and we only do it once, so we might
as well.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We have an upcoming case where we need pseudo-random numbers to scatter
timings, but we don't need cryptographically strong random numbers.  libc's
built in random() is fine for this purpose, but we should seed it.  Extend
secret_init() - the only current user of random numbers - to do this as
well as generating the SipHash secret.  Using /dev/random for a PRNG seed
is probably overkill, but it's simple and we only do it once, so we might
as well.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>util: Add general low-level random bytes helper</title>
<updated>2024-11-14T18:00:36+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2024-11-14T03:33:08+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=71d5deed5eed3949ee09c5f0a53b4de0b09b4afc'/>
<id>71d5deed5eed3949ee09c5f0a53b4de0b09b4afc</id>
<content type='text'>
Currently secret_init() open codes getting good quality random bytes from
the OS, either via getrandom(2) or reading /dev/random.  We're going to
add at least one more place that needs random data in future, so make a
general helper for getting random bytes.  While we're there, fix a number
of minor bugs:
 - getrandom() can theoretically return a "short read", so handle that case
 - getrandom() as well as read can return a transient EINTR
 - We would attempt to read data from /dev/random if we failed to open it
   (open() returns -1), but not if we opened it as fd 0 (unlikely, but ok)
 - More specific error reporting

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently secret_init() open codes getting good quality random bytes from
the OS, either via getrandom(2) or reading /dev/random.  We're going to
add at least one more place that needs random data in future, so make a
general helper for getting random bytes.  While we're there, fix a number
of minor bugs:
 - getrandom() can theoretically return a "short read", so handle that case
 - getrandom() as well as read can return a transient EINTR
 - We would attempt to read data from /dev/random if we failed to open it
   (open() returns -1), but not if we opened it as fd 0 (unlikely, but ok)
 - More specific error reporting

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ndp: Make route lifetime a #define</title>
<updated>2024-11-14T18:00:34+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2024-11-14T03:33:07+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=a60703e89991d23345ed929328001e19f5bc47e0'/>
<id>a60703e89991d23345ed929328001e19f5bc47e0</id>
<content type='text'>
Currently we open-code the lifetime of the route we advertise via NDP to be
65535s (the maximum).  Change it to a #define.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently we open-code the lifetime of the route we advertise via NDP to be
65535s (the maximum).  Change it to a #define.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ndp: Use struct assignment in preference to memcpy() for IPv6 addresses</title>
<updated>2024-11-14T18:00:31+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2024-11-14T03:33:06+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=36c070e6e320b97bb4761e29c934f5f269e06b35'/>
<id>36c070e6e320b97bb4761e29c934f5f269e06b35</id>
<content type='text'>
There are a number of places we can simply assign IPv6 addresses about,
rather than the current mildly ugly memcpy().

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There are a number of places we can simply assign IPv6 addresses about,
rather than the current mildly ugly memcpy().

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
