<feed xmlns='http://www.w3.org/2005/Atom'>
<title>passt, branch 2022_10_15.b3f3591</title>
<subtitle>Plug A Simple Socket Transport</subtitle>
<link rel='alternate' type='text/html' href='https://passt.top/passt/'/>
<entry>
<title>passt.1: Add David to AUTHORS</title>
<updated>2022-10-15T00:10:36+00:00</updated>
<author>
<name>Stefano Brivio</name>
<email>sbrivio@redhat.com</email>
</author>
<published>2022-10-13T19:58:13+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=b3f359167be0ca9a6fea2554b2e2545177181269'/>
<id>b3f359167be0ca9a6fea2554b2e2545177181269</id>
<content type='text'>
I just realised while reading the man page.

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
I just realised while reading the man page.

Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Reviewed-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>conf: Bind inbound ports with CAP_NET_BIND_SERVICE before isolate_user()</title>
<updated>2022-10-15T00:10:36+00:00</updated>
<author>
<name>Stefano Brivio</name>
<email>sbrivio@redhat.com</email>
</author>
<published>2022-10-13T16:21:27+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=3e2eb4337bc06e2331b200b5805ed09244f92bf7'/>
<id>3e2eb4337bc06e2331b200b5805ed09244f92bf7</id>
<content type='text'>
Even if CAP_NET_BIND_SERVICE is granted, we'll lose the capability in
the target user namespace as we isolate the process, which means
we're unable to bind to low ports at that point.

Bind inbound ports, and only those, before isolate_user(). Keep the
handling of outbound ports (for pasta mode only) after the setup of
the namespace, because that's where we'll bind them.

To this end, initialise the netlink socket for the init namespace
before isolate_user() as well, as we actually need to know the
addresses of the upstream interface before binding ports, in case
they're not explicitly passed by the user.

As we now call nl_sock_init() twice, checking its return code from
conf() twice looks a bit heavy: make it exit(), instead, as we
can't do much if we don't have netlink sockets.

While at it:

- move the v4_only &amp;&amp; v6_only options check just after the first
  option processing loop, as this is more strictly related to
  option parsing proper

- update the man page, explaining that CAP_NET_BIND_SERVICE is
  *not* the preferred way to bind ports, because passt and pasta
  can be abused to allow other processes to make effective usage
  of it. Add a note about the recommended sysctl instead

- simplify nl_sock_init_do() now that it's called once for each
  case

Reported-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Even if CAP_NET_BIND_SERVICE is granted, we'll lose the capability in
the target user namespace as we isolate the process, which means
we're unable to bind to low ports at that point.

Bind inbound ports, and only those, before isolate_user(). Keep the
handling of outbound ports (for pasta mode only) after the setup of
the namespace, because that's where we'll bind them.

To this end, initialise the netlink socket for the init namespace
before isolate_user() as well, as we actually need to know the
addresses of the upstream interface before binding ports, in case
they're not explicitly passed by the user.

As we now call nl_sock_init() twice, checking its return code from
conf() twice looks a bit heavy: make it exit(), instead, as we
can't do much if we don't have netlink sockets.

While at it:

- move the v4_only &amp;&amp; v6_only options check just after the first
  option processing loop, as this is more strictly related to
  option parsing proper

- update the man page, explaining that CAP_NET_BIND_SERVICE is
  *not* the preferred way to bind ports, because passt and pasta
  can be abused to allow other processes to make effective usage
  of it. Add a note about the recommended sysctl instead

- simplify nl_sock_init_do() now that it's called once for each
  case

Reported-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Rename pasta_setup_ns() to pasta_spawn_cmd()</title>
<updated>2022-10-15T00:10:36+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2022-10-14T04:25:37+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=40abd447c80a01fbbfa5de70ad3c33bdc1380c06'/>
<id>40abd447c80a01fbbfa5de70ad3c33bdc1380c06</id>
<content type='text'>
pasta_setup_ns() no longer has much to do with setting up a namespace.
Instead it's really about starting the shell or other command we want to
run with pasta connectivity.  Rename it and its argument structure to be
less misleading.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
pasta_setup_ns() no longer has much to do with setting up a namespace.
Instead it's really about starting the shell or other command we want to
run with pasta connectivity.  Rename it and its argument structure to be
less misleading.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>isolation: Only configure UID/GID mappings in userns when spawning shell</title>
<updated>2022-10-15T00:10:36+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2022-10-14T04:25:36+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=eb3d03a588e8a246a30017add08c79527fa9a6a3'/>
<id>eb3d03a588e8a246a30017add08c79527fa9a6a3</id>
<content type='text'>
When in passt mode, or pasta mode spawning a command, we create a userns
for ourselves.  This is used both to isolate the pasta/passt process itself
and to run the spawned command, if any.

Since eed17a47 "Handle userns isolation and dropping root at the same time"
we've handled both cases the same, configuring the UID and GID mappings in
the new userns to map whichever UID we're running as to root within the
userns.

This mapping is desirable when spawning a shell or other command, so that
the user gets a root shell with reasonably clear abilities within the
userns and netns.  It's not necessarily essential, though.  When not
spawning a shell, it doesn't really have any purpose: passt itself doesn't
need to be root and can operate fine with an unmapped user (using some of
the capabilities we get when entering the userns instead).

Configuring the uid_map can cause problems if passt is running with any
capabilities in the initial namespace, such as CAP_NET_BIND_SERVICE to
allow it to forward low ports.  In this case the kernel makes files in
/proc/pid owned by root rather than the starting user to prevent the user
from interfering with the operation of the capability-enhanced process.
This includes uid_map meaning we are not able to write to it.

Whether this behaviour is correct in the kernel is debatable, but in any
case we might as well avoid problems by only initializing the user mappings
when we really want them.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When in passt mode, or pasta mode spawning a command, we create a userns
for ourselves.  This is used both to isolate the pasta/passt process itself
and to run the spawned command, if any.

Since eed17a47 "Handle userns isolation and dropping root at the same time"
we've handled both cases the same, configuring the UID and GID mappings in
the new userns to map whichever UID we're running as to root within the
userns.

This mapping is desirable when spawning a shell or other command, so that
the user gets a root shell with reasonably clear abilities within the
userns and netns.  It's not necessarily essential, though.  When not
spawning a shell, it doesn't really have any purpose: passt itself doesn't
need to be root and can operate fine with an unmapped user (using some of
the capabilities we get when entering the userns instead).

Configuring the uid_map can cause problems if passt is running with any
capabilities in the initial namespace, such as CAP_NET_BIND_SERVICE to
allow it to forward low ports.  In this case the kernel makes files in
/proc/pid owned by root rather than the starting user to prevent the user
from interfering with the operation of the capability-enhanced process.
This includes uid_map meaning we are not able to write to it.

Whether this behaviour is correct in the kernel is debatable, but in any
case we might as well avoid problems by only initializing the user mappings
when we really want them.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>isolation: Prevent any child processes gaining capabilities</title>
<updated>2022-10-15T00:10:36+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2022-10-14T04:25:35+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=fb449b16bd8ccf8b751dfa5aee54cb7de0525706'/>
<id>fb449b16bd8ccf8b751dfa5aee54cb7de0525706</id>
<content type='text'>
We drop our own capabilities, but it's possible that processes we exec()
could gain extra privilege via file capabilities.  It shouldn't be possible
for us to exec() anyway due to seccomp() and our filesystem isolation.  But
just in case, zero the bounding and inheritable capability sets to prevent
any such child from gainin privilege.

Note that we do this *after* spawning the pasta shell/command (if any),
because we do want the user to be able to give that privilege if they want.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We drop our own capabilities, but it's possible that processes we exec()
could gain extra privilege via file capabilities.  It shouldn't be possible
for us to exec() anyway due to seccomp() and our filesystem isolation.  But
just in case, zero the bounding and inheritable capability sets to prevent
any such child from gainin privilege.

Note that we do this *after* spawning the pasta shell/command (if any),
because we do want the user to be able to give that privilege if they want.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>isolation: Replace drop_caps() with a version that actually does something</title>
<updated>2022-10-15T00:10:36+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2022-10-14T04:25:34+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=c22ebccba8a09065e59d080ac7a2969a267974ce'/>
<id>c22ebccba8a09065e59d080ac7a2969a267974ce</id>
<content type='text'>
The current implementation of drop_caps() doesn't really work because it
attempts to drop capabilities from the bounding set.  That's not the set
that really matters, it's about limiting the abilities of things we might
later exec() rather than our own capabilities.  It also requires
CAP_SETPCAP which we won't usually have.

Replace it with a new version which uses setcap(2) to drop capabilities
from the effective and permitted sets.  For now we leave the inheritable
set as is, since we don't want to preclude the user from passing
inheritable capabilities to the command spawed by pasta.

Correctly dropping caps reveals that we were relying on some capabilities
we'd supposedly dropped.  Re-divide the dropping of capabilities between
isolate_initial(), isolate_user() and isolate_prefork() to make this work.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The current implementation of drop_caps() doesn't really work because it
attempts to drop capabilities from the bounding set.  That's not the set
that really matters, it's about limiting the abilities of things we might
later exec() rather than our own capabilities.  It also requires
CAP_SETPCAP which we won't usually have.

Replace it with a new version which uses setcap(2) to drop capabilities
from the effective and permitted sets.  For now we leave the inheritable
set as is, since we don't want to preclude the user from passing
inheritable capabilities to the command spawed by pasta.

Correctly dropping caps reveals that we were relying on some capabilities
we'd supposedly dropped.  Re-divide the dropping of capabilities between
isolate_initial(), isolate_user() and isolate_prefork() to make this work.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>isolation: Refactor isolate_user() to allow for a common exit path</title>
<updated>2022-10-15T00:10:36+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2022-10-14T04:25:33+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=ceb2061587b5113f58afc6944969ff79512a8767'/>
<id>ceb2061587b5113f58afc6944969ff79512a8767</id>
<content type='text'>
Currently, isolate_user() exits early if the --netns-only option is given.
That works for now, but shortly we're going to want to add some logic to
go at the end of isolate_user() that needs to run in all cases: joining a
given userns, creating a new userns, or staying in our original userns
(--netns-only).

To avoid muddying those changes, here we reorganize isolate_user() to have
a common exit path for all cases.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, isolate_user() exits early if the --netns-only option is given.
That works for now, but shortly we're going to want to add some logic to
go at the end of isolate_user() that needs to run in all cases: joining a
given userns, creating a new userns, or staying in our original userns
(--netns-only).

To avoid muddying those changes, here we reorganize isolate_user() to have
a common exit path for all cases.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Replace FWRITE with a function</title>
<updated>2022-10-15T00:10:36+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2022-10-14T04:25:32+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=ea5936dd3f6293fb761e3b670a0f40233e5396fd'/>
<id>ea5936dd3f6293fb761e3b670a0f40233e5396fd</id>
<content type='text'>
In a few places we use the FWRITE() macro to open a file, replace it's
contents with a given string and close it again.  There's no real
reason this needs to be a macro rather than just a function though.
Turn it into a function 'write_file()' and make some ancillary
cleanups while we're there:
  - Add a return code so the caller can handle giving a useful error message
  - Handle the case of short write()s (unlikely, but possible)
  - Add O_TRUNC, to make sure we replace the existing contents entirely

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In a few places we use the FWRITE() macro to open a file, replace it's
contents with a given string and close it again.  There's no real
reason this needs to be a macro rather than just a function though.
Turn it into a function 'write_file()' and make some ancillary
cleanups while we're there:
  - Add a return code so the caller can handle giving a useful error message
  - Handle the case of short write()s (unlikely, but possible)
  - Add O_TRUNC, to make sure we replace the existing contents entirely

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>isolation: Clarify various self-isolation steps</title>
<updated>2022-10-15T00:10:36+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2022-10-14T04:25:31+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=096e48669b56273d44a3242d7af4840b38335961'/>
<id>096e48669b56273d44a3242d7af4840b38335961</id>
<content type='text'>
We have a number of steps of self-isolation scattered across our code.
Improve function names and add comments to make it clearer what the self
isolation model is, what the steps do, and why they happen at the points
they happen.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We have a number of steps of self-isolation scattered across our code.
Improve function names and add comments to make it clearer what the self
isolation model is, what the steps do, and why they happen at the points
they happen.

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Remove unhelpful drop_caps() call in pasta_start_ns()</title>
<updated>2022-10-15T00:10:36+00:00</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2022-10-14T04:25:30+00:00</published>
<link rel='alternate' type='text/html' href='https://passt.top/passt/commit/?id=6909a8e3390913d656f55803fa6cf5d0b1d46812'/>
<id>6909a8e3390913d656f55803fa6cf5d0b1d46812</id>
<content type='text'>
drop_caps() has a number of bugs which mean it doesn't do what you'd
expect.  However, even if we fixed those, the call in pasta_start_ns()
doesn't do anything useful:

* In the common case, we're UID 0 at this point.  In this case drop_caps()
  doesn't accomplish anything, because even with capabilities dropped, we
  are still privileged.
* When attaching to an existing namespace with --userns or --netns-only
  we might not be UID 0.  In this case it's too early to drop all
  capabilities: we need at least CAP_NET_ADMIN to configure the
  tap device in the namespace.

Remove this call - we will still drop capabilities a little later in
sandbox().

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
drop_caps() has a number of bugs which mean it doesn't do what you'd
expect.  However, even if we fixed those, the call in pasta_start_ns()
doesn't do anything useful:

* In the common case, we're UID 0 at this point.  In this case drop_caps()
  doesn't accomplish anything, because even with capabilities dropped, we
  are still privileged.
* When attaching to an existing namespace with --userns or --netns-only
  we might not be UID 0.  In this case it's too early to drop all
  capabilities: we need at least CAP_NET_ADMIN to configure the
  tap device in the namespace.

Remove this call - we will still drop capabilities a little later in
sandbox().

Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Signed-off-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
