| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Host routes can include a preferred source address (RTA_PREFSRC), which
must be one of the host's addresses. However when using pasta with -a the
namespace might be given a different address, not on the host. This seems
to occur pretty routinely depending on the network configuration systems
in place on the host.
With --config-net we will try to copy host routes to the namespace. If
one of those includes an RTA_PREFSRC, but the namespace doesn't have the
host address, this will fail with -EINVAL, causing pasta to fail.
Fix this by stripping off RTA_PREFSRC attributes from routes as we copy
them to the namespace. This is by no means infallible, bit it should at
least handle common cases for the time being.
Link: https://bugs.passt.top/show_bug.cgi?id=71
Link: https://github.com/containers/podman/pull/19699#issuecomment-1688769287
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Otherwise, we actually configure the address, but it's not usable
because no local route is added by the kernel.
Link: https://github.com/containers/podman/pull/19699
Fixes: cfe7509e5c16 ("netlink: Use struct in_addr for IPv4 addresses, not bare uint32_t")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
tcp_defer_handler() performs a potentially expensive linear scan of the
connection table. So, to mitigate the cost of that we skip if if we're not
under at least moderate pressure: either 30% of available connections or
30% (estimated) of available fds used.
But, the calculation for this has been broken since it was introduced: we
calculate "max_conns" based on c->tcp.conn_count, not TCP_MAX_CONNS,
meaning we only exit early if conn_count is less than 30% of itself, i.e.
never.
If that calculation is "corrected" to be based on TCP_MAX_CONNS, it
completely tanks the TCP CRR times for passt - from ~60ms to >1000ms on my
laptop. My guess is that this is because in the case of many short lived
connections, we're letting the table become much fuller before compacting
it. That means that other places which perform a table scan now have to
do much, much more.
For the time being, simply remove the tests, since they're not doing
anything useful. We can reintroduce them more carefully if we see a need
for them.
This also removes the only user of c->tcp.splice_conn_count, so that can
be removed as well.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
| |
This was overlooked when the file was created.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The in_epoll boolean is one of only two fields (currently) in the common
structure shared between tap and spliced connections. It seems like it
belongs there, because both tap and spliced connections use it, and it has
roughly the same meaning.
Roughly, however, isn't exactly: which fds this flag says are in the epoll
varies between the two connection types, and are in type specific fields.
So, it's only possible to meaningfully use this value locally in type
specific code anyway.
This common field is going to get in the way of more widespread
generalisation of connection / flow tracking, so move it to separate fields
in the tap and splice specific structures.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Because packets sent on the tap interface will always be going to the
guest/namespace, we more-or-less know what address they'll be going to. So
we pre-fill this destination address in our header buffers for IPv4. We
can't do the same for IPv6 because we could need either the global or
link-local address for the guest. In future we're going to want more
flexibility for the destination address, so this pre-filling will get in
the way.
Change the flow so we always fill in the IPv4 destination address for each
packet, rather than prefilling it from proto_update_l2_buf(). In fact for
TCP we already redundantly filled the destination for each packet anyway.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We partially prepopulate IP and TCP header structures including, amongst
other things the destination address, which for IPv4 is always the known
address of the guest/namespace. We partially precompute both the IPv4
header checksum and the TCP checksum based on this.
In future we're going to want more flexibility with controlling the
destination for IPv4 (as we already do for IPv6), so this precomputed value
gets in the way. Therefore remove the IPv4 destination from the
precomputed checksum and fold it into the checksum update when we actually
send a packet.
Doing this means we no longer need to recompute those partial sums when
the destination address changes ({tcp,udp}_update_l2_buf()) and instead
the computation can be moved to compile time. This means while we perform
slightly more computations on each packet, we slightly reduce the amount of
memory we need to access.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In tcp_seq_init() the meaning of "src" and "dst" isn't really clear since
it's used for connections in both directions. However, these values are
just feeding a hash, so as long as we're consistent and include all the
information we want, it doesn't really matter.
Oddly, for the "src" side we supply the (tap side) forwarding address but
the (tap side) endpoint port. This again doesn't really matter, but it's
confusing. So swap this with dstport, so "src" is always forwarding
and "dst" is always endpoint.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In a number of places the comments and variable names we use to describe
addresses and ports are ambiguous. It's not sufficient to describe a port
as "tap-facing" or "socket-facing", because on both the tap side and the
socket side there are two ports for the two ends of the connection.
Similarly, "local" and "remote" aren't particularly helpful, because it's
not necessarily clear whether we're talking from the point of view of the
guest/namespace, the host, or passt itself.
This patch makes a number of changes to be more precise about this. It
introduces two new terms in aid of this:
A "forwarding" address (or port) refers to an address which is local
from the point of view of passt itself. That is a source address for
traffic sent by passt, whether it's to the guest via the tap interface
or to a host on the internet via a socket.
The "endpoint" address (or port) is the reverse: a remote address
from passt's point of view, the destination address for traffic sent
by passt.
Between them the "side" (either tap/guest-facing or sock/host-facing)
and forwarding vs. endpoint unambiguously describes which address or
port we're talking about.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
The tap code passes the IPv4 or IPv6 destination address of packets it
receives to the protocol specific code. Currently that protocol code
doesn't use the source address, but we want it to in future. So, in
preparation, pass the IPv4/IPv6 source address of tap packets to those
functions as well.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|
|
In tap6_handler() saddr is initialized to the IPv6 source address from the
incoming packet. However part way through, but before organizing the
packet into a "sequence" we set it unconditionally to the guest's assigned
address. We don't do anything equivalent for IPv4.
This doesn't make a lot of sense: if the guest is using a different source
address it makes sense to consider these different sequences of packets and
we shouldn't try to combine them together.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
|