passt - Plug A Simple Socket Transport

	Commit message (Collapse)	Author	Age	Files	Lines
*	udp: Handle "spliced" datagrams with per-flow sockets	David Gibson	2024-07-19	1	-0/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When forwarding a datagram to a socket, we need to find a socket with a suitable local address to send it. Currently we keep track of such sockets in an array indexed by local port, but this can't properly handle cases where we have multiple local addresses in active use. For "spliced" (socket to socket) cases, improve this by instead opening a socket specifically for the target side of the flow. We connect() as well as bind()ing that socket, so that it will only receive the flow's reply packets, not anything else. We direct datagrams sent via that socket using the addresses from the flow table, effectively replacing bespoke addressing logic with the unified logic in fwd.c When we create the flow, we also take a duplicate of the originating socket, and use that to deliver reply datagrams back to the origin, again using addresses from the flow table entry. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	udp: Create flows for datagrams from originating sockets	David Gibson	2024-07-19	1	-0/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This implements the first steps of tracking UDP packets with the flow table rather than its own (buggy) set of port maps. Specifically we create flow table entries for datagrams received from a socket (PIF_HOST or PIF_SPLICE). When splitting datagrams from sockets into batches, we group by the flow as well as splicesrc. This may result in smaller batches, but makes things easier down the line. We can re-optimise this later if necessary. For now we don't do anything else with the flow, not even match reply packets to the same flow. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	flow, icmp: Use general flow forwarding rules for ICMP	David Gibson	2024-07-19	1	-32/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Current ICMP hard codes its forwarding rules, and never applies any translations. Change it to use the flow_target() function, so that it's translated the same as TCP (excluding TCP specific port redirection). This means that gw mapping now applies to ICMP so "ping <gw address>" will now ping the host's loopback instead of the actual gw machine. This removes the surprising behaviour that the target you ping might not be the same as you connect to with TCP. This removes the last user of flow_target_af(), so that's removed as well. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	flow, tcp: Flow based NAT and port forwarding for TCP	David Gibson	2024-07-19	1	-0/+53
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently the code to translate host side addresses and ports to guest side addresses and ports, and vice versa, is scattered across the TCP code. This includes both port redirection as controlled by the -t and -T options, and our special case NAT controlled by the --no-map-gw option. Gather this logic into fwd_nat_from_*() functions for each input interface in fwd.c which take protocol and address information for the initiating side and generates the pif and address information for the forwarded side. This performs any NAT or port forwarding needed. We create a flow_target() helper which applies those forwarding functions as needed to automatically move a flow from INI to TGT state. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	icmp: Manage outbound socket address via flow table	David Gibson	2024-07-19	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For now when we forward a ping to the host we leave the host side forwarding address and port blank since we don't necessarily know what source address and id will be used by the kernel. When the outbound address option is active, though, we do know the address at least, so we can record it in the flowside. Having done that, use it as the primary source of truth, binding the outgoing socket based on the information in there. This allows the possibility of more complex rules for what outbound address and/or id we use in future. To implement this we create a new helper which sets up a new socket based on information in a flowside, which will also have future uses. It behaves slightly differently from the existing ICMP code, in that it doesn't bind to a specific interface if given a loopback address. This is logically correct - the loopback address means we need to operate through the host's loopback interface, not ifname_out. We didn't need it in ICMP because ICMP will never generate a loopback address at this point, however we intend to change that in future. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	flow: Helper to create sockets based on flowside	David Gibson	2024-07-19	1	-0/+93
\| \| \| \| \| \| \| \| \|	We have upcoming use cases where it's useful to create new bound socket based on information from the flow table. Add flowside_sock_l4() to do this for either PIF_HOST or PIF_SPLICE sockets. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	tcp: Re-use flow hash for initial sequence number generation	David Gibson	2024-07-19	1	-6/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We generate TCP initial sequence numbers, when we need them, from a hash of the source and destination addresses and ports, plus a timestamp. Moments later, we generate another hash of the same information plus some more to insert the connection into the flow hash table. With some tweaks to the flow_hash_insert() interface and changing the order we can re-use that hash table hash for the initial sequence number, rather than calculating another one. It won't generate identical results, but that doesn't matter as long as the sequence numbers are well scattered. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	flow, tcp: Generalise TCP hash table to general flow hash table	David Gibson	2024-07-19	1	-5/+150
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move the data structures and helper functions for the TCP hash table to flow.c, making it a general hash table indexing sides of flows. This is largely code motion and straightforward renames. There are two semantic changes: * flow_lookup_af() now needs to verify that the entry has a matching protocol and interface as well as matching addresses and ports. * We double the size of the hash table, because it's now at least theoretically possible for both sides of each flow to be hashed. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	tcp, flow: Replace TCP specific hash function with general flow hash	David Gibson	2024-07-19	1	-3/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently we match TCP packets received on the tap connection to a TCP connection via a hash table based on the forwarding address and both ports. We hope in future to allow for multiple guest side addresses, or for multiple interfaces which means we may need to distinguish based on the endpoint address and pif as well. We also want a unified hash table to cover multiple protocols, not just TCP. Replace the TCP specific hash function with one suitable for general flows, or rather for one side of a general flow. This includes all the information from struct flowside, plus the pif and the L4 protocol number. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	flow: Common address information for target side	David Gibson	2024-07-19	1	-9/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Require the address and port information for the target (non initiating) side to be populated when a flow enters TGT state. Implement that for TCP and ICMP. For now this leaves some information redundantly recorded in both generic and type specific fields. We'll fix that in later patches. For TCP we now use the information from the flow to construct the destination socket address in both tcp_conn_from_tap() and tcp_splice_connect(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	flow: Common address information for initiating side	David Gibson	2024-07-19	1	-5/+91
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Handling of each protocol needs some degree of tracking of the addresses and ports at the end of each connection or flow. Sometimes that's explicit (as in the guest visible addresses for TCP connections), sometimes implicit (the bound and connected addresses of sockets). To allow more consistent handling across protocols we want to uniformly track the address and port at each end of the connection. Furthermore, because we allow port remapping, and we sometimes need to apply NAT, the addresses and ports can be different as seen by the guest/namespace and as by the host. Introduce 'struct flowside' to keep track of address and port information related to one side of a flow. Store two of these in the common fields of a flow to track that information for both sides. For now we only populate the initiating side, requiring that information be completed when a flows enter INI. Later patches will populate the target side. For now this leaves some information redundantly recorded in both generic and type specific fields. We'll fix that in later patches. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	flow: Record the pifs for each side of each flow	David Gibson	2024-05-22	1	-4/+52
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently we have no generic information flows apart from the type and state, everything else is specific to the flow type. Start introducing generic flow information by recording the pifs which the flow connects. To keep track of what information is valid, introduce new flow states: INI for when the initiating side information is complete, and TGT for when both sides information is complete, but we haven't chosen the flow type yet. For now, these states don't do an awful lot, but they'll become more important as we add more generic information. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	flow: Make side 0 always be the initiating side	David Gibson	2024-05-22	1	-4/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Each flow in the flow table has two sides, 0 and 1, representing the two interfaces between which passt/pasta will forward data for that flow. Which side is which is currently up to the protocol specific code: TCP uses side 0 for the host/"sock" side and 1 for the guest/"tap" side, except for spliced connections where it uses 0 for the initiating side and 1 for the target side. ICMP also uses 0 for the host/"sock" side and 1 for the guest/"tap" side, but in its case the latter is always also the initiating side. Make this generically consistent by always using side 0 for the initiating side and 1 for the target side. This doesn't simplify a lot for now, and arguably makes TCP slightly more complex, since we add an extra field to the connection structure to record which is the guest facing side. This is an interim change, which we'll be able to remove later. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>q Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	flow: Clarify and enforce flow state transitions	David Gibson	2024-05-22	1	-60/+86
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Flows move over several different states in their lifetime. The rules for these are documented in comments, but they're pretty complex and a number of the transitions are implicit, which makes this pretty fragile and error prone. Change the code to explicitly track the states in a field. Make all transitions explicit and logged. To the extent that it's practical in C, enforce what can and can't be done in various states with ASSERT()s. While we're at it, tweak the docs to clarify the restrictions on each state a bit. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	flow: Properly type callbacks to protocol specific handlers	David Gibson	2024-05-22	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The flow dispatches deferred and timer handling for flows centrally, but needs to call into protocol specific code for the handling of individual flows. Currently this passes a general union flow *. It makes more sense to pass the specific relevant flow type structure. That brings the check on the flow type adjacent to casting to the union variant which it tags. Arguably, this is a slight abstraction violation since it involves the generic flow code using protocol specific types. It's already calling into protocol specific functions, so I don't think this really makes any difference. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	icmp: Store ping socket information in flow table	David Gibson	2024-03-12	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently icmp_id_map[][] stores information about ping sockets in a bespoke structure. Move the same information into new types of flow in the flow table. To match that change, replace the existing ICMP timer with a flow-based timer for expiring ping sockets. This has the advantage that we only need to scan the active flows, not all possible ids. We convert icmp_id_map[][] to point to the flow table entries, rather than containing its own information. We do still use that array for locating the right ping flows, rather than using a "flow native" form of lookup for the time being. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Update id_sock description in comment to icmp_ping_new()] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	util: move IP stuff from util.[ch] to ip.[ch]	Laurent Vivier	2024-03-06	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Introduce ip.[ch] file to encapsulate IP protocol handling functions and structures. Modify various files to include the new header ip.h when it's needed. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-ID: <20240303135114.1023026-5-lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	flow: Clarify flow entry life cycle, introduce uniform logging	David Gibson	2024-02-29	1	-2/+75
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Our allocation scheme for flow entries means there are some non-obvious constraints on when what things can be done with an entry. Add a big doc comment explaining the life cycle. In addition, make a FLOW_START() macro to mark one of the important transitions. This encourages correct usage, by making it natural to only access the flow type specific structure after calling it. It also logs that a new flow has been created, which is useful for debugging. We also add logging when a flow's lifecycle ends. This doesn't need a new helper, because it can only happen either from flow_alloc_cancel() or from the flow deferred handler. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	flow: Add helper to determine a flow's protocol	David Gibson	2024-02-29	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Each flow already has a type field. This implies the protocol the flow represents, but also has more information: we have two ways to represent TCP flows, "tap" and "spliced". In order to generalise some of the flow mechanics, we'll need to determine a flow's protocol in terms of the IP (L4) protocol number. Introduce a constant table and helper macro to derive this from the flow type. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	flow: Avoid moving flow entries to compact table	David Gibson	2024-01-22	1	-49/+149
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently we always keep the flow table maximally compact: that is all the active entries are contiguous at the start of the table. Doing this sometimes requires moving an entry when one is freed. That's kind of fiddly, and potentially expensive: it requires updating the hash table for the new location, and depending on flow type, it may require EPOLL_CTL_MOD, system calls to update epoll tags with the new location too. Implement a new way of managing the flow table that doesn't ever move entries. It attempts to maintain some compactness by always using the first free slot for a new connection, and mitigates the effect of non compactness by cheaply skipping over contiguous blocks of free entries. See the "theory of operation" comment in flow.c for details. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>b [sbrivio: additional ASSERT(flow_first_free <= FLOW_MAX - 2) to avoid Coverity Scan false positive] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	flow: Enforce that freeing of closed flows must happen in deferred handlers	David Gibson	2024-01-22	1	-4/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, flows are only evern finally freed (and the table compacted) from the deferred handlers. Some future ways we want to optimise managing the flow table will rely on this, so enforce it: rather than having the TCP code directly call flow_table_compact(), add a boolean return value to the per-flow deferred handlers. If true, this indicates that the flow code itself should free the flow. This forces all freeing of flows to occur during the flow code's scan of the table in flow_defer_handler() which opens possibilities for future optimisations. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	flow: Abstract allocation of new flows with helper function	David Gibson	2024-01-22	1	-0/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently tcp.c open codes the process of allocating a new flow from the flow table: twice, in fact, once for guest to host and once for host to guest connections. This duplication isn't ideal and will get worse as we add more protocols to the flow table. It also makes it harder to experiment with different ways of handling flow table allocation. Instead, introduce a function to allocate a new flow: flow_alloc(). In some cases we currently check if we're able to allocate, but delay the actual allocation. We now handle that slightly differently with a flow_alloc_cancel() function to back out a recent allocation. We have that separate from a flow_free() function, because future changes we have in mind will need to handle this case a little differently. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	flow: Move flow_count from context structure to a global	David Gibson	2024-01-22	1	-5/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In general, the passt code is a bit haphazard about what's a true global variable and what's in the quasi-global 'context structure'. The flow_count field is one such example: it's in the context structure, although it's really part of the same data structure as flowtab[], which is a genuine global. Move flow_count to be a regular global to match. For now it needs to be public, rather than static, but we expect to be able to change that in future. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	flow: Move flow_log_() to near top of flow.c	David Gibson	2024-01-22	1	-18/+18
\| \| \| \| \| \| \| \| \| \| \| \|	flow_log_() is a very basic widely used function that many other functions in flow.c will end up needing. At present it's below flow_table_compact() which happens not to need it, but that's likely to change. Move it to near the top of flow.c to avoid forward declarations. Code motion only, no changes. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	flow, tcp: Add handling for per-flow timers	David Gibson	2024-01-22	1	-2/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	tcp_timer() scans the flow table so that it can run tcp_splice_timer() on each spliced connection. More generally, other flow types might want to run similar timers in future. We could add a flow_timer() analagous to tcp_timer(), udp_timer() etc. However, this would need to scan the flow table, which we would have just done in flow_defer_handler(). We'd prefer to just scan the flow table once, dispatching both per-flow deferred events and per-flow timed events if necessary. So, extend flow_defer_handler() to do this. For now we use the same timer interval for all flow types (1s). We can make that more flexible in future if we need to. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	flow, tcp: Add flow-centric dispatch for deferred flow handling	David Gibson	2024-01-22	1	-0/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	tcp_defer_handler(), amongst other things, scans the flow table and does some processing for each TCP connection. When we add other protocols to the flow table, they're likely to want some similar scanning. It makes more sense for cache friendliness to perform a single scan of the flow table and dispatch to the protocol specific handlers, rather than having each protocol separately scan the table. To that end, add a new flow_defer_handler() handling all flow-linked deferred operations. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	flow: Make flow_table.h #include the protocol specific headers it needs	David Gibson	2024-01-22	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	flow_table.h, the lower level flow header relies on having the struct definitions for every protocol specific flow type - so far that means tcp_conn.h. It doesn't include it itself, so tcp_conn.h must be included before flow_table.h. That's ok for now, but as we use the flow table for more things, flow_table.h will need the structs for all of them, which means the protocol specific .c files would need to include tcp_conn.h _and_ the equivalents for every other flow type before flow_table.h every time, which is weird. So, although we mostly lean towards the include style where .c files need to handle the include dependencies, in this case it makes more sense to have flow_table.h include all the protocol specific headers it needs. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	flow: Add missing include, stdio.h	Stefano Brivio	2023-12-27	1	-0/+1
\| \| \| \| \| \| \|	Reported-by: lemmi <lemmi@nerd2nerd.org> Link: https://github.com/void-linux/void-packages/actions/runs/7097192513/job/19316903568 Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	flow, tcp: Add logging helpers for connection related messages	David Gibson	2023-12-04	1	-0/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Most of the messages logged by the TCP code (be they errors, debug or trace messages) are related to a specific connection / flow. We're fairly consistent about prefixing these with the type of connection and the connection / flow index. However there are a few places where we put the index later in the message or omit it entirely. The template with the prefix is also a little bulky to carry around for every message, particularly for spliced connections. To help keep this consistent, introduce some helpers to log messages linked to a specific flow. It takes the flow as a parameter and adds a uniform prefix to each message. This makes things slightly neater now, but more importantly will help keep formatting consistent as we add more things to the flow table. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	flow: Make unified version of flow table compaction	David Gibson	2023-12-04	1	-0/+39
\| \| \| \| \| \| \| \| \| \| \|	tcp_table_compact() will move entries in the connection/flow table to keep it compact when other entries are removed. The moved entries need not have the same type as the flow removed, so it needs to be able to handle moving any type of flow. Therefore, move it to flow.c rather than being purportedly TCP specific. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	flow, tcp: Move TCP connection table to unified flow table	David Gibson	2023-12-04	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \|	We want to generalise "connection" tracking to things other than true TCP connections. Continue implenenting this by renaming the TCP connection table to the "flow table" and moving it to flow.c. The definitions are split between flow.h and flow_table.h - we need this separation to avoid circular dependencies: the definitions in flow.h will be needed by many headers using the flow mechanism, but flow_table.h needs all those protocol specific headers in order to define the full flow table entry. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
*	flow, tcp: Generalise connection types	David Gibson	2023-12-04	1	-0/+18
	Currently TCP connections use a 1-bit selector, 'spliced', to determine the rest of the contents of the structure. We want to generalise the TCP connection table to other types of flows in other protocols. Make a start on this by replacing the tcp_conn_common structure with a new flow_common structure with an enum rather than a simple boolean indicating the type of flow. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>