<feed xmlns='http://www.w3.org/2005/Atom'>
<title>passt, branch 2023_10_04.f851084</title>
<subtitle>Plug A Simple Socket Transport</subtitle>
<link rel='alternate' type='text/html' href='https://passt.top/passt/'/>
<entry>
<title>dhcp: put option 53 at the beginning</title>
<updated>2023-10-04T21:39:58+00:00</updated>
<author>
<name>Stas Sergeev</name>
<email>stsp2@yandex.ru</email>
</author>
<published>2023-10-03T08:01:22+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=f851084c96e0a932e51c15ad9551e2cd85362147'/>
<id>f851084c96e0a932e51c15ad9551e2cd85362147</id>
<content type='text'>
... unless it is listed in option 55 (the parameter request list).
Many clients expect option 53 at the beginning.
mTCP has this code:
  if ( resp-&gt;options[0] != 53 ) {
    TRACE_WARN(( "Dhcp: first option was not a Dhcp msg type\n" ));
    return;
  }

wattcp32 has this:
static int DHCP_is_ack (void)
{
  const BYTE *opt = (const BYTE*) &amp;dhcp_in.dh_opt[4];

  return (opt[0] == DHCP_OPT_MSG_TYPE &amp;&amp; opt[1] == 1 &amp;&amp; opt[2] == DHCP_ACK);
}
static int DHCP_is_nack (void)
{
  const BYTE *opt = (const BYTE*) &amp;dhcp_in.dh_opt[4];

  return (opt[0] == DHCP_OPT_MSG_TYPE &amp;&amp; opt[1] == 1 &amp;&amp; opt[2] == DHCP_NAK);
}
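
A minimal sketch of the ordering described above, not passt's actual code:
struct dhcp_opt, opt_put() and dhcp_fill_opts() are hypothetical names, and
the "unless it is listed in 55" exception is left out for brevity. Option 53
is serialised first, the remaining options follow in their given order, and
the end option (255) closes the list.

<![CDATA[
```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical option record, not passt's data structure */
struct dhcp_opt {
	uint8_t code;		/* option number, 53 is the message type */
	uint8_t len;		/* length of the value, in bytes */
	const uint8_t *val;	/* option value, must be valid if len != 0 */
};

/* Append one option as code, length, value.
 * Return the new offset, or 0 if the option doesn't fit.
 */
static size_t opt_put(uint8_t *buf, size_t off, size_t size,
		      const struct dhcp_opt *o)
{
	if (off + 2 + o->len > size)
		return 0;

	buf[off++] = o->code;
	buf[off++] = o->len;
	if (o->len)
		memcpy(buf + off, o->val, o->len);

	return off + o->len;
}

/* Serialise options with option 53 first, then all the others in the
 * order given, then the end option. Return bytes written, 0 on overflow.
 */
size_t dhcp_fill_opts(uint8_t *buf, size_t size,
		      const struct dhcp_opt *opts, size_t n)
{
	size_t off = 0, i;

	/* First pass: message type only */
	for (i = 0; i < n; i++) {
		if (opts[i].code != 53)
			continue;
		if (!(off = opt_put(buf, off, size, &opts[i])))
			return 0;
	}

	/* Second pass: everything else */
	for (i = 0; i < n; i++) {
		if (opts[i].code == 53)
			continue;
		if (!(off = opt_put(buf, off, size, &opts[i])))
			return 0;
	}

	if (off + 1 > size)
		return 0;
	buf[off++] = 255;	/* end option */

	return off;
}
```
]]>

With this layout, a client that only looks at the first three option bytes
(as in the mTCP and wattcp32 snippets above) finds the message type there.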

Link: https://bugs.passt.top/show_bug.cgi?id=77
Signed-off-by: Stas Sergeev &lt;stsp2@yandex.ru&gt;
[sbrivio: s/options 53/option 53/ and s/other/others/ in comment]
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
... unless it is listed in option 55 (the parameter request list).
Many clients expect option 53 at the beginning.
mTCP has this code:
  if ( resp-&gt;options[0] != 53 ) {
    TRACE_WARN(( "Dhcp: first option was not a Dhcp msg type\n" ));
    return;
  }

wattcp32 has this:
static int DHCP_is_ack (void)
{
  const BYTE *opt = (const BYTE*) &amp;dhcp_in.dh_opt[4];

  return (opt[0] == DHCP_OPT_MSG_TYPE &amp;&amp; opt[1] == 1 &amp;&amp; opt[2] == DHCP_ACK);
}
static int DHCP_is_nack (void)
{
  const BYTE *opt = (const BYTE*) &amp;dhcp_in.dh_opt[4];

  return (opt[0] == DHCP_OPT_MSG_TYPE &amp;&amp; opt[1] == 1 &amp;&amp; opt[2] == DHCP_NAK);
}

Link: https://bugs.passt.top/show_bug.cgi?id=77
Signed-off-by: Stas Sergeev &lt;stsp2@yandex.ru&gt;
[sbrivio: s/options 53/option 53/ and s/other/others/ in comment]
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp, tap: Don't increase tap-side sequence counter for dropped frames</title>
<updated>2023-10-04T21:39:58+00:00</updated>
<author>
<name>Stefano Brivio</name>
<email>sbrivio@redhat.com</email>
</author>
<published>2023-09-22T21:35:08+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=a469fc393fa1dfadc7c51c2729550597ee171a8e'/>
<id>a469fc393fa1dfadc7c51c2729550597ee171a8e</id>
<content type='text'>
...so that we'll retry sending them, instead of more-or-less silently
dropping them. This happens quite frequently if our sending buffer on
the UNIX domain socket is heavily constrained (for instance, by the
208 KiB default memory limit).

It might be argued that dropping frames is part of the expected TCP
flow: we don't dequeue those from the socket anyway, so we'll
eventually retransmit them.

But we don't need the receiver to tell us (by way of duplicate or
missing ACKs) that we couldn't send them: we already know as
sendmsg() reports that. This seems to considerably increase
throughput stability and throughput itself for TCP connections with
default wmem_max values.

Unfortunately, the 16 bits left as padding in the frame descriptors
we use internally aren't enough to uniquely identify for which
connection we should update sequence numbers: create a parallel
array of pointers to sequence numbers and L4 lengths, of
TCP_FRAMES_MEM size, and go through it after calling sendmsg().
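
The parallel array might look roughly like the sketch below. The names
(struct seq_update, seq_advance_sent()) are illustrative assumptions, not
necessarily those used in tcp.c: one entry per queued frame points at the
sequence counter of the owning connection, and only the entries for frames
that were actually sent are applied.

<![CDATA[
```c
#include <stddef.h>
#include <stdint.h>

/* One entry per queued frame (hypothetical layout) */
struct seq_update {
	uint32_t *seq;	/* sequence counter of the connection owning the frame */
	uint16_t len;	/* L4 payload length of the frame */
};

/* Advance sequence counters for the first 'sent' frames only: frames the
 * socket didn't accept leave their counters untouched, so the same data
 * is picked up again on the next attempt.
 */
void seq_advance_sent(const struct seq_update *upd, size_t sent)
{
	size_t i;

	for (i = 0; i < sent; i++)
		*upd[i].seq += upd[i].len;
}
```
]]>

Here 'sent' stands for the number of frames sendmsg() reported as
transmitted, so the third of three queued frames is skipped if only two
went out.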

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
...so that we'll retry sending them, instead of more-or-less silently
dropping them. This happens quite frequently if our sending buffer on
the UNIX domain socket is heavily constrained (for instance, by the
208 KiB default memory limit).

It might be argued that dropping frames is part of the expected TCP
flow: we don't dequeue those from the socket anyway, so we'll
eventually retransmit them.

But we don't need the receiver to tell us (by way of duplicate or
missing ACKs) that we couldn't send them: we already know as
sendmsg() reports that. This seems to considerably increase
throughput stability and throughput itself for TCP connections with
default wmem_max values.

Unfortunately, the 16 bits left as padding in the frame descriptors
we use internally aren't enough to uniquely identify for which
connection we should update sequence numbers: create a parallel
array of pointers to sequence numbers and L4 lengths, of
TCP_FRAMES_MEM size, and go through it after calling sendmsg().

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: Force TCP_WINDOW_CLAMP before resetting STALLED flag</title>
<updated>2023-10-04T21:27:15+00:00</updated>
<author>
<name>Stefano Brivio</name>
<email>sbrivio@redhat.com</email>
</author>
<published>2023-09-22T21:21:20+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=d3192f67c492f6caa3d0779ae016c34e3d847b22'/>
<id>d3192f67c492f6caa3d0779ae016c34e3d847b22</id>
<content type='text'>
It looks like we need it as a workaround for this situation, readily
reproducible at least with a 6.5 Linux kernel, with default rmem_max
and wmem_max values:

- an iperf3 client on the host sends about 160 KiB, typically
  segmented into five frames by passt. We read this data using
  MSG_PEEK

- the iperf3 server on the guest starts receiving

- meanwhile, the host kernel advertised a zero-sized window to the
  sender, as expected

- eventually, the guest acknowledges all the data sent so far, and
  we drop it from the buffer, courtesy of tcp_sock_consume(), using
  recv() with MSG_TRUNC

- the client, however, doesn't get an updated window value, and
  even keepalive packets are answered with zero-window segments,
  until the connection is closed

It looks like dropping data from a socket using MSG_TRUNC doesn't
cause a recalculation of the window, which would be expected as a
result of any receiving operation that invalidates data on a buffer
(that is, not with MSG_PEEK).

Strangely enough, setting TCP_WINDOW_CLAMP via setsockopt(), even to
the previous value we clamped to, forces a recalculation of the
window which is advertised to the sender.

I couldn't quite confirm this issue by following all the possible
code paths in the kernel, yet. If confirmed, this should be fixed in
the kernel, but meanwhile this workaround looks robust to me (and it
will be needed for backward compatibility anyway).
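
As a rough, Linux-specific sketch of the workaround (the function name and
the way the clamp value is passed in are assumptions, not the actual code in
tcp.c): acknowledged data is dropped with recv() and MSG_TRUNC, then the
clamp is set again to the value already in use, which appears to trigger the
window update.

<![CDATA[
```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Discard 'len' bytes from the receive buffer of socket 's', then set
 * TCP_WINDOW_CLAMP again to 'clamp', the value previously in use.
 * Return 0 on success, -1 on failure with errno set.
 */
int tcp_consume_and_reclamp(int s, size_t len, int clamp)
{
	/* MSG_TRUNC on a TCP socket drops data without copying it */
	if (recv(s, NULL, len, MSG_DONTWAIT | MSG_TRUNC) < 0)
		return -1;

	/* Same value as before: only the side effect matters here */
	return setsockopt(s, IPPROTO_TCP, TCP_WINDOW_CLAMP,
			  &clamp, sizeof(clamp));
}
```
]]>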

Reported-by: Matej Hrica &lt;mhrica@redhat.com&gt;
Link: https://bugs.passt.top/show_bug.cgi?id=74
Analysed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It looks like we need it as a workaround for this situation, readily
reproducible at least with a 6.5 Linux kernel, with default rmem_max
and wmem_max values:

- an iperf3 client on the host sends about 160 KiB, typically
  segmented into five frames by passt. We read this data using
  MSG_PEEK

- the iperf3 server on the guest starts receiving

- meanwhile, the host kernel advertised a zero-sized window to the
  sender, as expected

- eventually, the guest acknowledges all the data sent so far, and
  we drop it from the buffer, courtesy of tcp_sock_consume(), using
  recv() with MSG_TRUNC

- the client, however, doesn't get an updated window value, and
  even keepalive packets are answered with zero-window segments,
  until the connection is closed

It looks like dropping data from a socket using MSG_TRUNC doesn't
cause a recalculation of the window, which would be expected as a
result of any receiving operation that invalidates data on a buffer
(that is, not with MSG_PEEK).

Strangely enough, setting TCP_WINDOW_CLAMP via setsockopt(), even to
the previous value we clamped to, forces a recalculation of the
window which is advertised to the sender.

I couldn't quite confirm this issue by following all the possible
code paths in the kernel, yet. If confirmed, this should be fixed in
the kernel, but meanwhile this workaround looks robust to me (and it
will be needed for backward compatibility anyway).

Reported-by: Matej Hrica &lt;mhrica@redhat.com&gt;
Link: https://bugs.passt.top/show_bug.cgi?id=74
Analysed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: Fix comment to tcp_sock_consume()</title>
<updated>2023-10-04T21:24:08+00:00</updated>
<author>
<name>Stefano Brivio</name>
<email>sbrivio@redhat.com</email>
</author>
<published>2023-09-22T21:08:29+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=feaeb4986c1143f39cd71cd88de6f2e5280beecc'/>
<id>feaeb4986c1143f39cd71cd88de6f2e5280beecc</id>
<content type='text'>
Note that tcp_sock_consume() doesn't update ACK sequence counters
anymore.

Fixes: cc6d8286d104 ("tcp: Reset ACK_FROM_TAP_DUE flag only as needed, update timer")
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Note that tcp_sock_consume() doesn't update ACK sequence counters
anymore.

Fixes: cc6d8286d104 ("tcp: Reset ACK_FROM_TAP_DUE flag only as needed, update timer")
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cppcheck: Work around bug in cppcheck 2.12.0</title>
<updated>2023-10-04T21:24:05+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2023-09-29T05:50:22+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=117b474f85f2b73278b2aa8930f1e3aa2b418373'/>
<id>117b474f85f2b73278b2aa8930f1e3aa2b418373</id>
<content type='text'>
cppcheck 2.12.0 (and maybe some other versions) thinks this if condition
is always true, which is demonstrably not true.  Work around the bug for
now.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
cppcheck 2.12.0 (and maybe some other versions) thinks this if condition
is always true, which is demonstrably not true.  Work around the bug for
now.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cppcheck: Use "exhaustive" level checking when available</title>
<updated>2023-10-04T21:24:00+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2023-09-29T05:50:21+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=8640d62af719ecb48ed8c8adb31fd99f02a6c49b'/>
<id>8640d62af719ecb48ed8c8adb31fd99f02a6c49b</id>
<content type='text'>
Recent enough cppcheck versions (at least as of cppcheck 2.12) give this
error processing conf.c:

conf.c:1179:1: information: ValueFlow analysis is limited in conf. Use --check-level=exhaustive if full analysis is wanted. [checkLevelNormal]

Adding --check-level=exhaustive doesn't seem to significantly increase the
cppcheck run time for us, so enable it when possible, suppressing that
warning.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Recent enough cppcheck versions (at least as of cppcheck 2.12) give this
error processing conf.c:

conf.c:1179:1: information: ValueFlow analysis is limited in conf. Use --check-level=exhaustive if full analysis is wanted. [checkLevelNormal]

Adding --check-level=exhaustive doesn't seem to significantly increase the
cppcheck run time for us, so enable it when possible, suppressing that
warning.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>conf: Remove overly cryptic selection of forward table</title>
<updated>2023-10-04T21:23:56+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2023-09-29T05:50:20+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=c0efa4e97f20848570552286403574bf228b6420'/>
<id>c0efa4e97f20848570552286403574bf228b6420</id>
<content type='text'>
In a couple of places in conf(), we use a local 'fwd' variable to reference
one of our forwarding tables.  The value depends on which command line
option we're currently looking at, and is initialized rather cryptically
from an assignment side-effect within the if condition checking that
option.

Newer versions of cppcheck complain about this assignment being an always
true condition, but in any case it's both clearer and slightly shorter to
use separate if branches for the two cases and set the forwarding parameter
more directly.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In a couple of places in conf(), we use a local 'fwd' variable to reference
one of our forwarding tables.  The value depends on which command line
option we're currently looking at, and is initialized rather cryptically
from an assignment side-effect within the if condition checking that
option.

Newer versions of cppcheck complain about this assignment being an always
true condition, but in any case it's both clearer and slightly shorter to
use separate if branches for the two cases and set the forwarding parameter
more directly.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cppcheck: Make many pointers const</title>
<updated>2023-10-04T21:23:35+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2023-09-29T05:50:19+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=6471c7d01b0aad9d144448290557fcd783562228'/>
<id>6471c7d01b0aad9d144448290557fcd783562228</id>
<content type='text'>
Newer versions of cppcheck (as of 2.12.0, at least) added a warning for
pointers which could be declared to point at const data, but aren't.
Based on that, make many pointers throughout the codebase const.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Newer versions of cppcheck (as of 2.12.0, at least) added a warning for
pointers which could be declared to point at const data, but aren't.
Based on that, make many pointers throughout the codebase const.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>siphash: Use incremental rather than all-at-once siphash functions</title>
<updated>2023-09-30T10:40:53+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2023-09-28T01:21:02+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=fc8f0f8c48ef12edbf60f74f37024917f5812385'/>
<id>fc8f0f8c48ef12edbf60f74f37024917f5812385</id>
<content type='text'>
We have a bunch of variants of the siphash functions for different data
sizes.  The callers, in tcp.c, need to pack the various values they want to
hash into a temporary structure, then call the appropriate version.  We can
avoid the copy into the temporary by directly using the incremental
siphash functions.

The length-specific hash functions also have an undocumented constraint
that the data pointer they take must, in fact, be aligned to avoid
unaligned accesses, which may cause crashes on some architectures.

So, prefer the incremental approach and remove the length-specific
functions.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We have a bunch of variants of the siphash functions for different data
sizes.  The callers, in tcp.c, need to pack the various values they want to
hash into a temporary structure, then call the appropriate version.  We can
avoid the copy into the temporary by directly using the incremental
siphash functions.

The length-specific hash functions also have an undocumented constraint
that the data pointer they take must, in fact, be aligned to avoid
unaligned accesses, which may cause crashes on some architectures.

So, prefer the incremental approach and remove the length-specific
functions.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>siphash, checksum: Move TBAA explanation to checksum.c</title>
<updated>2023-09-30T10:40:51+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2023-09-28T01:21:01+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=04b10a8d907d8ceb09e3adb4885c02f546841bbc'/>
<id>04b10a8d907d8ceb09e3adb4885c02f546841bbc</id>
<content type='text'>
A number of checksum and hash functions require workarounds for the odd
behaviour of Type-Based Alias Analysis.  We have a detailed comment about
this on siphash_8b() and other functions reference that.

Move the main comment to csume_16b() instead, because we're going to
reorganise things in siphash.c.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A number of checksum and hash functions require workarounds for the odd
behaviour of Type-Based Alias Analysis.  We have a detailed comment about
this on siphash_8b() and other functions reference that.

Move the main comment to csume_16b() instead, because we're going to
reorganise things in siphash.c.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
