| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
| |
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
| |
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
| |
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
| |
...at the moment, just for consistency with packet.h, icmp.h,
tcp.h and udp.h.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement a packet abstraction providing boundary and size checks
based on packet descriptors: packets stored in a buffer can be queued
into a pool (without storage of its own), and data can be retrieved
referring to an index in the pool, specifying offset and length.
Checks ensure data is not read outside the boundaries of buffer and
descriptors, and that packets added to a pool are within the buffer
range with valid offset and indices.
This implies a wider rework: usage of the "queueing" part of the
abstraction mostly affects tap_handler_{passt,pasta}() functions and
their callees, while the "fetching" part affects all the guest or tap
facing implementations: TCP, UDP, ICMP, ARP, NDP, DHCP and DHCPv6
handlers.
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
| |
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
| |
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
| |
In section 3 ("Packet Format"), "vend" is 64 bytes long, minus the
magic that's 60 bytes, not 62.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
...by:
- storing the chained-hash next connection pointer as numeric
reference rather than as pointer
- storing the MSS as 14-bit value, and rounding it
- using only the effective amount of bits needed to store the hash
bucket number
- explicitly limiting window scaling factors to 4-bit values
(maximum factor is 14, from RFC 7323)
- scaling SO_SNDBUF values, and using a 8-bit representation for
the duplicate ACK sequence
- keeping window values unscaled, as received and sent
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
| |
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
| |
They look a bit lame: rephrase sentences to avoid them.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
| |
I didn't have time to investigate the root cause for the virtio_net
TX hang yet. Add a quick work-around for the moment being.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can't take for granted that the hard limit for open files is
big enough as to allow to delay closing sockets to a timer.
Store the value of RTLIMIT_NOFILE we set at start, and use it to
understand if we're approaching the limit with pending, spliced
TCP connections. If that's the case, close sockets right away as
soon as they're not needed, instead of deferring this task to a
timer.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With a lot of concurrent connections, the bitmap scan approach is
not really sustainable.
Switch to per-connection timerfd timers, set based on events and on
two new flags, ACK_FROM_TAP_DUE and ACK_TO_TAP_DUE. Timers are added
to the common epoll list, and implement the existing timeouts.
While at it, drop the CONN_ prefix from flag names, otherwise they
get quite long, and fix the logic to decide if a connection has a
local, possibly unreachable endpoint: we shouldn't go through the
rest of tcp_conn_from_tap() if we reset the connection due to a
successful bind(2), and we'll get EACCES if the port number is low.
Suggested by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
| |
This should never happen, but there are no formal guarantees: ensure
socket numbers are below SOCKET_MAX.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Pass to seccomp.sh a list of additional syscalls valgrind needs as
EXTRA_SYSCALLS in a new 'valgrind' make target, and add corresponding
support in seccomp.sh itself.
In test setup functions, start passt with valgrind, but not for
performance tests.
Add tests checking that valgrind exits without errors after all the
other tests in the group are done.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
| |
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
| |
They don't have a measurable performance impact and make things a
bit safer.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
There's no value in keeping a separate timestamp for activity and for
aging of local binds, given that they have the same timeout. Reduce
that to a single timestamp, with a flag indicating the local bind.
Also use flags instead of separate int fields for loopback and
configured unicast address usage as source address.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
| |
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
...it became too hard to follow: split it off to
udp_sock_fill_data_v{4,6}.
While at it, use IN6_ARE_ADDR_EQUAL(a, b), courtesy of netinet/in.h,
instead of open-coded memcmp().
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
| |
It's already implied by the fact they don't have "l2" in their
names, and dropping it improves readability a bit.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
| |
...so that they can be indexed.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using events and flags instead of states makes the implementation
much more straightforward: actions are mostly centered on events
that occurred on the connection rather than states.
An example is given by the ESTABLISHED_SOCK_FIN_SENT and
FIN_WAIT_1_SOCK_FIN abominations: we don't actually care about
which side started closing the connection to handle closing of
connection halves.
Split out the spliced implementation, as it has very little in
common with the "regular" TCP path.
Refactor things here and there to improve clarity. Add helpers
to trace where resets and flag settings come from.
No functional changes intended.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
| |
...instead of kernel-like short notations.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
| |
...those were used when epoll references used to be socket numbers,
they should have gone away a long time ago.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
| |
Packets are saved *to* a file, not *at* it.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
| |
--debug can be a bit too noisy, especially as single packets or
socket messages are logged: implement a new option, --trace,
implying --debug, that enables all debug messages.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
| |
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
| |
Upload gzipped js snippets for usage with gzip_static in nginx or
equivalent. Convert png drawings to webp for smaller size, use them
as alternatives in README.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
| |
...otherwise, we'll leave processes (dhclient) around.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
| |
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
| |
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
| |
Convenience packages are anyway built from static builds.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
| |
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
| |
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Build-time selection of AVX2 flags and routines is not practical for
distributions, but limiting AVX2 usage to checksum routines with
specific run-time detection doesn't allow for easy performance gains
from auto-vectorisation of batched packet handling routines.
For x86_64, build non-AVX2 and AVX2 binaries, and implement a simple
wrapper replacing the current executable with the AVX2 build if it's
available, and if AVX2 is supported by the current CPU.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
| |
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
| |
I changed this in a previous commit by mistake, restore the original
command.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
| |
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
| |
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
| |
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
It looks like glibc commonly implements clock_gettime(2) with
clock_gettime64(), and uses recv() instead of recvfrom(), send()
instead of sendto(), and sigreturn() instead of rt_sigreturn() on
armv6l and armv7l.
Adjust the list of system calls for armv6l and armv7l accordingly.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
| |
A kernel might not be configured with CONFIG_TRANSPARENT_HUGEPAGE,
especially on embedded systems. Ignore the error, it doesn't affect
functionality.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
| |
There's a single AUDIT_ARCH_ARM define available (and big-endian
shouldn't be a concern with those).
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
| |
At least gcc 8.3 and 10.2 emit a warning on armv6l and armv7l.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
| |
This is the case at least for current glibc headers on armv6l and
armv7l.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
| |
Not functionally needed, but gcc versions 7 to 9 (at least) will
issue a warning otherwise.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
| |
...it doesn't actually matter as we're checking errno at the very
end, but, depending on build flags, chdir() might be declared with
warn_unused_result and the compiler issues a warning.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
| |
...to benefit from gzip_static in nginx or equivalent.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|