| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
... unless it is listed in 55.
Many clients expect option 53 at the beginning.
mTCP has this code:
if ( resp->options[0] != 53 ) {
TRACE_WARN(( "Dhcp: first option was not a Dhcp msg type\n" ));
return;
}
wattcp32 has this:
static int DHCP_is_ack (void)
{
const BYTE *opt = (const BYTE*) &dhcp_in.dh_opt[4];
return (opt[0] == DHCP_OPT_MSG_TYPE && opt[1] == 1 && opt[2] == DHCP_ACK);
}
static int DHCP_is_nack (void)
{
const BYTE *opt = (const BYTE*) &dhcp_in.dh_opt[4];
return (opt[0] == DHCP_OPT_MSG_TYPE && opt[1] == 1 && opt[2] == DHCP_NAK);
}
Link: https://bugs.passt.top/show_bug.cgi?id=77
Signed-off-by: Stas Sergeev <stsp2@yandex.ru>
[sbrivio: s/options 53/option 53/ and s/other/others/ in comment]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
...so that we'll retry sending them, instead of more-or-less silently
dropping them. This happens quite frequently if our sending buffer on
the UNIX domain socket is heavily constrained (for instance, by the
208 KiB default memory limit).
It might be argued that dropping frames is part of the expected TCP
flow: we don't dequeue those from the socket anyway, so we'll
eventually retransmit them.
But we don't need the receiver to tell us (by the way of duplicate or
missing ACKs) that we couldn't send them: we already know as
sendmsg() reports that. This seems to considerably increase
throughput stability and throughput itself for TCP connections with
default wmem_max values.
Unfortunately, the 16 bits left as padding in the frame descriptors
we use internally aren't enough to uniquely identify for which
connection we should update sequence numbers: create a parallel
array of pointers to sequence numbers and L4 lengths, of
TCP_FRAMES_MEM size, and go through it after calling sendmsg().
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It looks like we need it as workaround for this situation, readily
reproducible at least with a 6.5 Linux kernel, with default rmem_max
and wmem_max values:
- an iperf3 client on the host sends about 160 KiB, typically
segmented into five frames by passt. We read this data using
MSG_PEEK
- the iperf3 server on the guest starts receiving
- meanwhile, the host kernel advertised a zero-sized window to the
sender, as expected
- eventually, the guest acknowledges all the data sent so far, and
we drop it from the buffer, courtesy of tcp_sock_consume(), using
recv() with MSG_TRUNC
- the client, however, doesn't get an updated window value, and
even keepalive packets are answered with zero-window segments,
until the connection is closed
It looks like dropping data from a socket using MSG_TRUNC doesn't
cause a recalculation of the window, which would be expected as a
result of any receiving operation that invalidates data on a buffer
(that is, not with MSG_PEEK).
Strangely enough, setting TCP_WINDOW_CLAMP via setsockopt(), even to
the previous value we clamped to, forces a recalculation of the
window which is advertised to the sender.
I couldn't quite confirm this issue by following all the possible
code paths in the kernel, yet. If confirmed, this should be fixed in
the kernel, but meanwhile this workaround looks robust to me (and it
will be needed for backward compatibility anyway).
Reported-by: Matej Hrica <mhrica@redhat.com>
Link: https://bugs.passt.top/show_bug.cgi?id=74
Analysed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
|
|
|
|
|
|
|
|
|
| |
Note that tcp_sock_consume() doesn't update ACK sequence counters
anymore.
Fixes: cc6d8286d104 ("tcp: Reset ACK_FROM_TAP_DUE flag only as needed, update timer")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
|
|
|
|
|
|
|
|
|
| |
cppcheck 2.12.0 (and maybe some other versions) things this if condition
is always true, which is demonstrably not true. Work around the bug for
now.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Recent enough cppcheck versions (at least as of cppcheck 2.12) give this
error processing conf.c:
conf.c:1179:1: information: ValueFlow analysis is limited in conf. Use --check-level=exhaustive if full analysis is wanted. [checkLevelNormal]
Adding --check-level=exhaustive doesn't seem to significantly increase the
cppcheck run time for us, so enable it when possible, suppressing that
warning.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In a couple of places in conf(), we use a local 'fwd' variable to reference
one of our forwarding tables. The value depends on which command line
option we're currently looking at, and is initialized rather cryptically
from an assignment side-effect within the if condition checking that
option.
Newer versions of cppcheck complain about this assignment being an always
true condition, but in any case it's both clearer and slightly shorter to
use separate if branches for the two cases and set the forwarding parameter
more directly.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Newer versions of cppcheck (as of 2.12.0, at least) added a warning for
pointers which could be declared to point at const data, but aren't.
Based on that, make many pointers throughout the codebase const.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have a bunch of variants of the siphash functions for different data
sizes. The callers, in tcp.c, need to pack the various values they want to
hash into a temporary structure, then call the appropriate version. We can
avoid the copy into the temporary by directly using the incremental
siphash functions.
The length specific hash functions also have an undocumented constraint
that the data pointer they take must, in fact, be aligned to avoid
unaligned accesses, which may cause crashes on some architectures.
So, prefer the incremental approach and remove the length-specific
functions.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
A number of checksum and hash functions require workarounds for the odd
behaviour of Type-Baased Alias Analysis. We have a detailed comment about
this on siphash_8b() and other functions reference that.
Move the main comment to csume_16b() instead, because we're going to
reorganise things in siphash.c.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Move a bunch of code from siphash.c to siphash.h, making it available to
other modules. This will allow places which need hashes of more complex
objects to construct them incrementally.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
| |
To improve type safety, encapsulate the internal state of the SipHash
algorithm into a dedicated structure type.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
The PREAMBLE macro sets up the SipHash initial internal state. It also
defines that state as a variable, which isn't macro hygeinic. With
previous changes simplifying this premable, it's now possible to replace it
with a macro which simply expands to a C initialisedrfor that state.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The SipHash algorithm starts with initializing the 32 bytes of internal
state with some magic numbers XORed with the hash key. However, our
implementation has a bug - rather than XORing the hash key, it *sets* the
initial state to copies of the key.
I don't know if that affects any of the cryptographic properties of SipHash
but it's not what we should be doing. Fix it.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The POSTAMBLE macro implements the finalisation steps of SipHash. It
relies on some variables in the environment, including returning the final
hash value that way. That isn't great hygeine.
In addition the PREAMBLE macro takes a length parameter which is used only
to initialize the 'b' value that's not used until the finalisation and is
also sometimes modified in a non-obvious way by the callers.
The 'b' value is always composed from the total length of the hash input
plus up to 7 bytes of "tail" data - that is the remainder of the hash input
after a multiple of 8 bytes has been consumed.
Simplify all this by replacing the POSTAMBLE macro with a siphash_final()
function which takes the length and tail data as parameters and returns the
final hash value.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have macros or inlines for a number of common operations in the siphash
functions. However, in a number of places we still open code feeding
another 64-bits of data into the hash function: an xor, followed by 2
rounds of shuffling, followed by another xor.
Implement an inline function for this, which results in somewhat shortened
code.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
|
| |
The SIPROUND(n) macro implements n rounds of SipHash shuffling. It relies
on 'v' and '__i' variables being available in the context it's used in
which isn't great hygeine. Replace it with an inline function instead.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some of the siphas_*b() functions return 64-bit results, others 32-bit
results, with no obvious pattern. siphash_32b() also appears to do this
incorrectly - taking the 64-bit hash value and simply returning it
truncated, rather than folding the two halves together.
Since SipHash proper is defined to give a 64-bit hash, make all of them
return 64-bit results. In the one caller which needs a 32-bit value,
tcp_seq_init() do the fold down to 32-bits ourselves.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have several workarounds for a clang-tidy bug where the checker doesn't
recognize that a number of system calls write to - and therefore initialise
- a socket address. We can't neatly use a suppression, because the bogus
warning shows up some time after the actual system call, when we access
a field of the socket address which clang-tidy erroneously thinks is
uninitialised.
Consolidate these workarounds into one place by using macros to implement
wrappers around affected system calls which add a memset() of the sockaddr
to silence clang-tidy. This removes the need for the individual memset()
workarounds at the callers - and the somewhat longwinded explanatory
comments.
We can then use a #define to not include the hack in "real" builds, but
only consider it for clang-tidy.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A classic gotcha of the standard C library is that its unwise to call any
variable 'index' because it will shadow the standard string library
function index(3). This can cause warnings from cppcheck amongst others,
and it also means that if the variable is removed you tend to get confusing
type errors (or sometimes nothing at all) instead of a nice simple "name is
not defined" error.
Strictly speaking this only occurs if <string.h> is included, but that
is so common that as a rule it's best to just avoid it always. We
have a number of places which hit this trap, so rename variables and
parameters to avoid it.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The reporter is running a SMTP server behind pasta, and the client
waits for the server's banner before sending any data. In turn, the
server waits for our ACK after sending SYN,ACK, which never comes.
If we use the ACK_IF_NEEDED indication to tcp_send_flag(), given that
there's no pending data, we delay sending the ACK segment at the end
of the three-way handshake until we have some data to send to the
server.
This was actually intended, as I thought we would lower the latency
for new connections, but we can't assume that the client will start
sending data first (SMTP is the typical example where this doesn't
happen).
And, trying out this patch with SSH (where the client starts sending
data first), the reporter actually noticed we have a lower latency
by forcing an ACK right away. Comparing a capture before the patch:
13:07:14.007704 IP 10.1.2.1.42056 > 10.1.2.140.1234: Flags [S], seq 1797034836, win 65535, options [mss 4096,nop,wscale 7], length 0
13:07:14.007769 IP 10.1.2.140.1234 > 10.1.2.1.42056: Flags [S.], seq 2297052481, ack 1797034837, win 65480, options [mss 65480,nop,wscale 7], length 0
13:07:14.008462 IP 10.1.2.1.42056 > 10.1.2.140.1234: Flags [.], seq 1:22, ack 1, win 65535, length 21
13:07:14.008496 IP 10.1.2.140.1234 > 10.1.2.1.42056: Flags [.], ack 22, win 512, length 0
13:07:14.011799 IP 10.1.2.140.1234 > 10.1.2.1.42056: Flags [P.], seq 1:515, ack 22, win 512, length 514
and after:
13:10:26.165364 IP 10.1.2.1.59508 > 10.1.2.140.1234: Flags [S], seq 4165939595, win 65535, options [mss 4096,nop,wscale 7], length 0
13:10:26.165391 IP 10.1.2.140.1234 > 10.1.2.1.59508: Flags [S.], seq 985607380, ack 4165939596, win 65480, options [mss 65480,nop,wscale 7], length 0
13:10:26.165418 IP 10.1.2.1.59508 > 10.1.2.140.1234: Flags [.], ack 1, win 512, length 0
13:10:26.165683 IP 10.1.2.1.59508 > 10.1.2.140.1234: Flags [.], seq 1:22, ack 1, win 512, length 21
13:10:26.165698 IP 10.1.2.140.1234 > 10.1.2.1.59508: Flags [.], ack 22, win 512, length 0
13:10:26.167107 IP 10.1.2.140.1234 > 10.1.2.1.59508: Flags [P.], seq 1:515, ack 22, win 512, length 514
the latency between the initial SYN segment and the first data
transmission actually decreases from 792µs to 334µs. This is not
statistically relevant as we have a single measurement, but it can't
be that bad, either.
Reported-by: cr3bs (from IRC)
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It turns out we never used 'clen' until commit 1f24d3efb499 ("dhcp:
support BOOTP clients"), and we always ignored option 55 (Parameter
Request List), while, according to RFC 2132, we MUST try to insert
the requested options in the order requested by the client.
The commit mentioned above made this visible because now every client
is reported as sending a DHCPREQUEST as an old BOOTP client, based on
the lack of option 53 (that is, zero length).
Fixes: b439984641ed ("merd: ARP and DHCP handlers, connection tracking fixes")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
|
|
To prepare the DHCPv6 domain search list option, we go over the
flattened list of domains, and replace both dots and zero bytes with a
counter of bytes in the next label, implementing the encoding
specified by section 3.1 of RFC 1035.
If there are multiple domains in the list, however, zero bytes serve
as markers for the end of a domain name, and we'll replace them with
the length of the first label of the next domain, plus one. This is
wrong. We should only convert the dots before the labels.
To distinguish between label separators and domain names separators,
for simplicity, introduce a dot before the first label of every
domain we copy to form the list. All dots are then replaced by label
lengths, and separators (zero bytes) remain as they are.
As we do this, we need to make sure we don't replace the trailing
dot, if present: that's already a separator. Skip copying it, and
just add separators as needed.
Now that we don't copy those, though, we might end up with
zero-length domains: skip them, as they're meaningless anyway.
And as we might skip domains, we can't use the index 'i' to check if
we're at the beginning of the option -- use 'srch' instead.
This is very similar to how we prepare the list for NDP option 31,
except that we don't need padding (RFC 8106, 5.2) here, and we should
refactor this into common functions, but it probably makes sense to
rework the NDP responder (https://bugs.passt.top/show_bug.cgi?id=21)
first.
Reported-by: Sebastian Mitterle <smitterl@redhat.com>
Link: https://bugs.passt.top/show_bug.cgi?id=75
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|