<feed xmlns='http://www.w3.org/2005/Atom'>
<title>passt/tcp_splice.h, branch 2023_03_21.1ee2f7c</title>
<subtitle>Plug A Simple Socket Transport</subtitle>
<link rel='alternate' type='text/html' href='https://passt.top/passt/'/>
<entry>
<title>tcp: Remove splice from tcp_epoll_ref</title>
<updated>2022-11-25T00:35:25+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2022-11-17T05:58:53+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=233b95e90f8e9391d93d2187f682eaf51b6f3dd1'/>
<id>233b95e90f8e9391d93d2187f682eaf51b6f3dd1</id>
<content type='text'>
Currently the epoll reference for tcp sockets includes a bit indicating
whether the socket maps to a spliced connection.  However, the reference
also has the index of the connection structure which also indicates whether
it is spliced.  We can therefore avoid the splice bit in the epoll_ref by
unifying the first part of the non-spliced and spliced handlers where we
look up the connection state.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently the epoll reference for tcp sockets includes a bit indicating
whether the socket maps to a spliced connection.  However, the reference
also has the index of the connection structure which also indicates whether
it is spliced.  We can therefore avoid the splice bit in the epoll_ref by
unifying the first part of the non-spliced and spliced handlers where we
look up the connection state.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: Use the same sockets to listen for spliced and non-spliced connections</title>
<updated>2022-11-25T00:35:22+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2022-11-17T05:58:52+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=d909fda1e81979da12ed4ab8b2f2a18fba756a07'/>
<id>d909fda1e81979da12ed4ab8b2f2a18fba756a07</id>
<content type='text'>
In pasta mode, tcp_sock_init[46]() create separate sockets to listen for
spliced connections (these are bound to localhost) and non-spliced
connections (these are bound to the host address).  This introduces a
subtle behavioural difference between pasta and passt: by default, pasta
will listen only on a single host address, whereas passt will listen on
all addresses (0.0.0.0 or ::).  This also prevents us using some additional
optimizations that only work with the unspecified (0.0.0.0 or ::) address.

However, it turns out we don't need to do this.  We can splice a connection
if and only if it originates from the loopback address.  Currently we
ensure this by having the "spliced" listening sockets listening only on
loopback.  Instead, defer the decision about whether to splice a connection
until after accept(), by checking if the connection was made from the
loopback address.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
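
Illustration, not part of the commit: a minimal sketch in C of the check
described above, assuming the peer address filled in by accept() is
available as a struct sockaddr.  The helper name peer_is_loopback() is
hypothetical and does not come from the passt sources, and IPv4-mapped
IPv6 addresses are deliberately not handled here.

<![CDATA[
```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: true if the peer address filled in by accept()
 * is a loopback address (127.0.0.0/8 or ::1), that is, a candidate for
 * splicing. Not the actual passt implementation.
 */
bool peer_is_loopback(const struct sockaddr *sa)
{
	if (sa->sa_family == AF_INET) {
		const struct sockaddr_in *sin = (const struct sockaddr_in *)sa;

		/* 127.0.0.0/8: top byte of the host-order address is 127 */
		return (ntohl(sin->sin_addr.s_addr) >> 24) == 127;
	}

	if (sa->sa_family == AF_INET6) {
		const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)sa;

		return IN6_IS_ADDR_LOOPBACK(&sin6->sin6_addr);
	}

	/* Any other address family is never treated as loopback */
	return false;
}
```
]]>

With a check of this kind, a single listening socket bound to the
unspecified address can serve both cases, and the choice between the
spliced and non-spliced paths is made per connection.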
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In pasta mode, tcp_sock_init[46]() create separate sockets to listen for
spliced connections (these are bound to localhost) and non-spliced
connections (these are bound to the host address).  This introduces a
subtle behavioural difference between pasta and passt: by default, pasta
will listen only on a single host address, whereas passt will listen on
all addresses (0.0.0.0 or ::).  This also prevents us using some additional
optimizations that only work with the unspecified (0.0.0.0 or ::) address.

However, it turns out we don't need to do this.  We can splice a connection
if and only if it originates from the loopback address.  Currently we
ensure this by having the "spliced" listening sockets listening only on
loopback.  Instead, defer the decision about whether to splice a connection
until after accept(), by checking if the connection was made from the
loopback address.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: Unify part of spliced and non-spliced conn_from_sock path</title>
<updated>2022-11-25T00:35:19+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2022-11-17T05:58:51+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=356c6e0677072f1a6cfe9b5d0648d89ab6fd5523'/>
<id>356c6e0677072f1a6cfe9b5d0648d89ab6fd5523</id>
<content type='text'>
In tcp_sock_handler() we split off to handle spliced sockets before
checking anything else.  However, the first steps of the "new connection"
path for each case are the same: allocate a connection entry and accept()
the connection.

Remove this duplication by making tcp_conn_from_sock() handle both spliced
and non-spliced cases, with help from the more specific
tcp_tap_conn_from_sock() and tcp_splice_conn_from_sock() functions for the
later stages, which differ.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In tcp_sock_handler() we split off to handle spliced sockets before
checking anything else.  However, the first steps of the "new connection"
path for each case are the same: allocate a connection entry and accept()
the connection.

Remove this duplication by making tcp_conn_from_sock() handle both spliced
and non-spliced cases, with help from the more specific
tcp_tap_conn_from_sock() and tcp_splice_conn_from_sock() functions for the
later stages, which differ.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: Partially unify tcp_timer() and tcp_splice_timer()</title>
<updated>2022-11-25T00:34:58+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2022-11-17T05:58:48+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=34476511f724d52ef4d944627ee741e33c1ba3d7'/>
<id>34476511f724d52ef4d944627ee741e33c1ba3d7</id>
<content type='text'>
These two functions scan all the non-spliced and spliced connections
respectively and perform timed updates on them.  Avoid scanning the now
unified table twice by having tcp_timer() scan it once, calling the
relevant per-connection function for each entry.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
These two functions scan all the non-spliced and spliced connections
respectively and perform timed updates on them.  Avoid scanning the now
unified table twice by having tcp_timer() scan it once, calling the
relevant per-connection function for each entry.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: Unify tcp_defer_handler and tcp_splice_defer_handler()</title>
<updated>2022-11-25T00:34:54+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2022-11-17T05:58:47+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=0eef48c4be5ae8d9de5fed4daaa1954323e08b96'/>
<id>0eef48c4be5ae8d9de5fed4daaa1954323e08b96</id>
<content type='text'>
These two functions each step through non-spliced and spliced connections
respectively and clean up entries for closed connections.  To avoid
scanning the connection table twice, we merge these into a single function
which scans the unified table and performs the appropriate sort of cleanup
action on each one.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
These two functions each step through non-spliced and spliced connections
respectively and clean up entries for closed connections.  To avoid
scanning the connection table twice, we merge these into a single function
which scans the unified table and performs the appropriate sort of cleanup
action on each one.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: Unify spliced and non-spliced connection tables</title>
<updated>2022-11-25T00:34:51+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2022-11-17T05:58:46+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=ee8f8e9564c5bd0ca25a9492e3102655d9ce0620'/>
<id>ee8f8e9564c5bd0ca25a9492e3102655d9ce0620</id>
<content type='text'>
Currently spliced and non-spliced connections are stored in completely
separate tables, so there are completely independent limits on the number
of spliced and non-spliced connections.  This is a bit counter-intuitive.

More importantly, the fact that the tables are separate prevents us from
unifying some other logic between the two cases.  So, merge these two
tables into one, using the 'c.spliced' common field to distinguish between
them when necessary.

For now we keep a common limit of 128k connections, whether they're spliced
or non-spliced, which means we save memory overall.  If necessary we could
increase this to a 256k or higher total, which would cost memory but give
some more flexibility.

For now, the code paths which need to step through all extant connections
are still separate for the two cases, just skipping over entries which
aren't for them.  We'll improve that in later patches.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
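
Illustration, not part of the commit: a minimal sketch in C of a single
table whose entries can hold either kind of connection, told apart by a
common 'spliced' field as described above.  Only the 'c.spliced' name
comes from the commit message; every other type, field and function name
below is hypothetical and the structures are heavily reduced.

<![CDATA[
```c
#include <stdbool.h>
#include <stddef.h>

/* Header shared by both connection types; 'spliced' tells them apart */
struct conn_common {
	bool spliced;
};

/* Hypothetical, heavily reduced non-spliced connection */
struct tap_conn {
	struct conn_common c;
	int sock;
};

/* Hypothetical, heavily reduced spliced connection */
struct splice_conn {
	struct conn_common c;
	int a, b;	/* the two sockets being spliced together */
};

/* One table entry holds either kind of connection */
union conn {
	struct conn_common c;
	struct tap_conn tap;
	struct splice_conn splice;
};

/* Count the spliced entries among the first n entries of a unified
 * table, looking only at the common field of each entry.
 */
size_t count_spliced(const union conn *table, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (table[i].c.spliced)
			count++;

	return count;
}
```
]]>

Because both structures start with the same header, code that steps
through the table can test the flag first and only then pick the
matching member of the union.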
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently spliced and non-spliced connections are stored in completely
separate tables, so there are completely independent limits on the number
of spliced and non-spliced connections.  This is a bit counter-intuitive.

More importantly, the fact that the tables are separate prevents us from
unifying some other logic between the two cases.  So, merge these two
tables into one, using the 'c.spliced' common field to distinguish between
them when necessary.

For now we keep a common limit of 128k connections, whether they're spliced
or non-spliced, which means we save memory overall.  If necessary we could
increase this to a 256k or higher total, which would cost memory but give
some more flexibility.

For now, the code paths which need to step through all extant connections
are still separate for the two cases, just skipping over entries which
aren't for them.  We'll improve that in later patches.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp_splice: #include tcp_splice.h in tcp_splice.c</title>
<updated>2022-11-25T00:34:30+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2022-11-17T05:58:39+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=9ffa0184e30322b160dba01f63f1ec6d8026674c'/>
<id>9ffa0184e30322b160dba01f63f1ec6d8026674c</id>
<content type='text'>
This obvious include was omitted, which means that declarations in the
header weren't checked against definitions in the .c file.  This shows up
an old declaration for a function that is now static, and a duplicate

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This obvious include was omitted, which means that declarations in the
header weren't checked against definitions in the .c file.  This shows up
an old declaration for a function that is now static, and a duplicate

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>treewide: Add include guards</title>
<updated>2022-03-29T13:35:38+00:00</updated>
<author>
<name>Stefano Brivio</name>
<email>sbrivio@redhat.com</email>
</author>
<published>2022-03-25T23:05:31+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=965f603238a92b6ab8cd8a0592e0fb65c096b3e1'/>
<id>965f603238a92b6ab8cd8a0592e0fb65c096b3e1</id>
<content type='text'>
...at the moment, just for consistency with packet.h, icmp.h,
tcp.h and udp.h.

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
...at the moment, just for consistency with packet.h, icmp.h,
tcp.h and udp.h.

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>treewide: Packet abstraction with mandatory boundary checks</title>
<updated>2022-03-29T13:35:38+00:00</updated>
<author>
<name>Stefano Brivio</name>
<email>sbrivio@redhat.com</email>
</author>
<published>2022-03-25T12:02:47+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=bb708111833e23cafda1a5dd377e13400fa1e452'/>
<id>bb708111833e23cafda1a5dd377e13400fa1e452</id>
<content type='text'>
Implement a packet abstraction providing boundary and size checks
based on packet descriptors: packets stored in a buffer can be queued
into a pool (without storage of its own), and data can be retrieved
referring to an index in the pool, specifying offset and length.

Checks ensure data is not read outside the boundaries of buffer and
descriptors, and that packets added to a pool are within the buffer
range with valid offset and indices.

This implies a wider rework: usage of the "queueing" part of the
abstraction mostly affects tap_handler_{passt,pasta}() functions and
their callees, while the "fetching" part affects all the guest or tap
facing implementations: TCP, UDP, ICMP, ARP, NDP, DHCP and DHCPv6
handlers.

Suggested-by: Stefan Hajnoczi &lt;stefanha@redhat.com&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Implement a packet abstraction providing boundary and size checks
based on packet descriptors: packets stored in a buffer can be queued
into a pool (without storage of its own), and data can be retrieved
referring to an index in the pool, specifying offset and length.

Checks ensure data is not read outside the boundaries of buffer and
descriptors, and that packets added to a pool are within the buffer
range with valid offset and indices.

This implies a wider rework: usage of the "queueing" part of the
abstraction mostly affects tap_handler_{passt,pasta}() functions and
their callees, while the "fetching" part affects all the guest or tap
facing implementations: TCP, UDP, ICMP, ARP, NDP, DHCP and DHCPv6
handlers.

Suggested-by: Stefan Hajnoczi &lt;stefanha@redhat.com&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp_splice: Close sockets right away on high number of open files</title>
<updated>2022-03-29T13:35:38+00:00</updated>
<author>
<name>Stefano Brivio</name>
<email>sbrivio@redhat.com</email>
</author>
<published>2022-03-18T23:33:46+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=92074c16a8dda83d79c4e9a534757b0e116f7157'/>
<id>92074c16a8dda83d79c4e9a534757b0e116f7157</id>
<content type='text'>
We can't take for granted that the hard limit for open files is
high enough to let us defer closing sockets to a timer.

Store the value of RLIMIT_NOFILE we set at start, and use it to
tell whether we're approaching the limit with pending, spliced
TCP connections. If that's the case, close sockets as soon as
they're no longer needed, instead of deferring this task to a
timer.

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
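
Illustration, not part of the commit: a minimal sketch in C of recording
the open files limit once and later using it to decide whether sockets
should be closed right away.  The function names, the use of getrlimit()
to read the soft limit (the commit stores the value it sets itself), and
the one-half threshold are all assumptions made for this sketch.

<![CDATA[
```c
#include <stdbool.h>
#include <sys/resource.h>

/* Soft limit for open files, recorded once at start-up */
static rlim_t nofile_limit;

/* Record the current RLIMIT_NOFILE soft limit; returns 0 on success */
int nofile_limit_init(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl))
		return -1;

	nofile_limit = rl.rlim_cur;
	return 0;
}

/* Hypothetical policy: with open_fds descriptors in use, report whether
 * we are past half of the recorded limit, in which case sockets should
 * be closed right away instead of being left to a timer. The 1/2
 * threshold is an arbitrary choice for this sketch.
 */
bool nofile_pressure(rlim_t open_fds)
{
	return open_fds > nofile_limit / 2;
}
```
]]>

A caller would run nofile_limit_init() once at start, then consult
nofile_pressure() with its own count of open descriptors whenever a
spliced connection is finished.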
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We can't take for granted that the hard limit for open files is
high enough to let us defer closing sockets to a timer.

Store the value of RLIMIT_NOFILE we set at start, and use it to
tell whether we're approaching the limit with pending, spliced
TCP connections. If that's the case, close sockets as soon as
they're no longer needed, instead of deferring this task to a
timer.

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
