This document shows how to set up a Kata Containers environment using passt to
implement user-mode networking: unlike the other networking models currently
implemented, this kind of setup requires no elevated privileges or capabilities
as far as networking is concerned.
This proof-of-concept uses CRI-O as the container runtime, driven directly
rather than through a full Kubernetes environment.
# Pre-requisites
* Go and rust toolchains, typically provided by distribution packages
* the usual tools, such as git, make, etc.
* a 4.x qemu version, or more recent, with a working virtiofsd executable
(provided at least by Debian, Ubuntu, Fedora packages)
# Fetch and prepare components
## CRI-O
CRI-O is the container runtime. On one side, it implements the Kubernetes CRI
(Container Runtime Interface) -- the part we'll handle manually with `crictl`
here -- and on the other side it supports OCI (Open Container Initiative)
runtimes, Kata Containers being one of them.
### Fetch
git clone https://github.com/cri-o/cri-o.git
### Build
cd cri-o
make
### Install
As root:
make install
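To confirm the build and installation worked (assuming `/usr/local/bin` is in
`$PATH`), a quick check is:
crio --version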
### Configure
The configuration now resides at `/etc/crio/crio.conf`, the same path used by
distribution packages. Some configuration items of specific relevance for Kata
Containers are:
# Cgroup management implementation used for the runtime.
cgroup_manager = "cgroupfs"
# manage_ns_lifecycle determines whether we pin and remove namespaces
# and manage their lifecycle
manage_ns_lifecycle = true
and the following section, which can be added at the end, defines a special
type of runtime, the `vm` type. This is needed to run the Kata Containers
runtime instead of the default `crun`:
[crio.runtime.runtimes.kata]
runtime_path = "/usr/local/bin/containerd-shim-kata-v2"
runtime_type = "vm"
runtime_root = "/run/vc"
Note that we don't have a `containerd-shim-kata-v2` binary yet; we'll deal with
that in the next steps.
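If `/etc/crio/crio.conf` doesn't exist yet, one way to create it is to dump
CRI-O's default configuration and then edit it (a sketch, assuming the `crio`
binary built above, run as root):
crio config > /etc/crio/crio.conf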
## CNI plugins
CNI plugins are binaries, run by CRI-O, that configure networking on the host
as well as on the pod side. A few network topologies are offered, with very
limited capabilities.
### Fetch
git clone https://github.com/containernetworking/plugins
### Build
cd plugins
./build_linux.sh
### Install
As root:
mkdir -p /opt/cni/bin
cp bin/* /opt/cni/bin/
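The directory should now contain the plugin binaries. The configuration below
relies on `bridge` and `host-local` in particular, so it's worth checking that
they are in place:
ls /opt/cni/bin/bridge /opt/cni/bin/host-local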
### Configure
The path where CNI configurations are located is configurable in
`/etc/crio/crio.conf`, see the `network_dir` parameter there. Assuming the
default value, we need to provide at least one configuration under
`/etc/cni/net.d/`. For example:
# cat /etc/cni/net.d/50-kata-sandbox.conf
{
  "cniVersion": "0.3.0",
  "name": "crio-bridge",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.88.0.0/16",
    "routes": [
      { "dst": "0.0.0.0/0" }
    ]
  }
}
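A syntax error here can make pod creation fail in non-obvious ways; the file
can be checked for JSON validity with, for instance:
python3 -m json.tool /etc/cni/net.d/50-kata-sandbox.conf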
## crictl
`crictl` is needed to control CRI-O in lieu of Kubernetes.
### Fetch
git clone https://github.com/kubernetes-sigs/cri-tools.git
### Build
cd cri-tools
make
### Install
As root:
make install
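To avoid passing the CRI-O socket path on every `crictl` invocation, it can be
stored in `crictl`'s configuration file (as root, assuming CRI-O's default
socket location):
cat > /etc/crictl.yaml << EOF
runtime-endpoint: unix:///var/run/crio/crio.sock
image-endpoint: unix:///var/run/crio/crio.sock
EOF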
## mbuto
We'll use `mbuto` to build a minimal virtual machine image for use with the
Kata Containers runtime.
### Fetch
git clone https://mbuto.lameexcu.se/mbuto
## Kata Containers
### Fetch
git clone https://github.com/kata-containers/kata-containers
### Patch
The current upstream version doesn't support the _passt_ networking model yet;
use the patch from this directory to add it:
patch -p1 < 0001-virtcontainers-agent-Add-passt-networking-model-and-.patch
### Build
make -C src/runtime
make -C src/agent LIBC=gnu
### Install
As root:
make -C src/runtime install
cp src/agent/target/x86_64-unknown-linux-gnu/release/kata-agent /usr/libexec/
chmod 755 /usr/libexec/kata-agent
### Build the Virtual Machine image
cd mbuto
./mbuto -f /tmp/kata.img
See `mbuto -h` for additional parameters, such as choice of kernel version,
kernel modules, program add-ons, etc. `mbuto` will print some configuration
parameters to be used in the configuration of the Kata Containers runtime below.
For example:
$ ./mbuto -c lz4 -f /tmp/kata.img
Not running as root, won't keep cpio mounted
Size: bin 12M lib 59M kmod 1.4M total 70M compressed 33M
Kata Containers [hypervisor.qemu] configuration:
kernel = "/boot/vmlinuz-5.10.0-6-amd64"
initrd = "/tmp/kata.img"
### Configure
The configuration file at this point is located at
`/usr/share/defaults/kata-containers/configuration-qemu.toml`. Some parameters of general interest are:
[hypervisor.qemu]
kernel = "/boot/vmlinuz-5.10.0-6-amd64"
initrd = "/tmp/kata.img"
where we can use the values indicated earlier by `mbuto`. Currently, the
default path for the `virtiofsd` daemon doesn't work for all distributions;
make sure it matches your installation. For example, on Debian:
virtio_fs_daemon = "/usr/lib/qemu/virtiofsd"
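If in doubt about where the distribution ships `virtiofsd`, one way to locate
it:
find /usr/lib* /usr/libexec -name virtiofsd 2>/dev/null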
We'll then need to enable the `passt` networking model for the runtime. In the
`[runtime]` section:
internetworking_model = "passt"
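At this point, the host side of the setup can be sanity-checked with the
`kata-runtime` binary installed alongside the shim (command name as of Kata
Containers 2.x):
kata-runtime check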
# Run an example container
## Fetch
We'll now need a container image to run as an example. With `podman` installed
from a distribution package, we can import one:
podman pull docker.io/i386/busybox
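Alternatively, once CRI-O is running (see below), the image can be pulled
through the CRI itself:
crictl pull docker.io/i386/busybox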
## Configure
Now we can define configuration files for the pod and the container we want to
create and start:
$ cat pod-config.json
{
  "metadata": {
    "name": "kata-sandbox",
    "namespace": "default",
    "attempt": 1,
    "uid": "hdishd83djaidwnduwk28bcsb"
  },
  "logDirectory": "/tmp",
  "linux": {
  }
}
$ cat container-busybox.json
{
  "metadata": {
    "name": "kata-busybox"
  },
  "image": {
    "image": "docker.io/i386/busybox"
  },
  "command": [
    "sleep", "6000"
  ],
  "log_path": "kata-busybox.log",
  "linux": {
  }
}
## Run the container workload
Assuming we have `pod-config.json` and `container-busybox.json` defined above,
we can now:
### Start CRI-O
crio -l debug
### Create the pod and run a container inside it
c=$(crictl start $(crictl create $(crictl runp --runtime=kata pod-config.json) container-busybox.json pod-config.json))
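Or, the same sequence split into its three steps, which makes failures easier
to attribute:
pod=$(crictl runp --runtime=kata pod-config.json)
c=$(crictl create "$pod" container-busybox.json pod-config.json)
crictl start "$c"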
### Verify that addresses are properly configured
crictl exec $c ip addr show
## Enable support for ICMP/ICMPv6 Echo Request
_passt_ can replicate ICMP Echo Requests sent by the workload, and propagate the
replies back. However, as it's not running as root, we need to enable so-called
_ping_ sockets for unprivileged users. From the namespace created by CRI-O for
this container:
sysctl -w net.ipv4.ping_group_range="0 2147483647"
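With the sysctl applied, ping from the workload should now work (9.9.9.9 below
is just an arbitrary example of a reachable address):
crictl exec $c ping -c 2 9.9.9.9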
# Troubleshooting
## Redirect qemu's console output to file
Agent errors and kernel messages should be accessible via a named UNIX domain
socket at `/run/vc/vm/*/console.sock`, provided `agent.debug_console` is
enabled in `kernel_params` of `configuration.toml`, but this won't work if the
agent doesn't start. To capture them regardless, we can wrap `qemu` so that,
additionally, all the output is piped to a file:
$ cat /usr/local/bin/qemu.sh
#!/bin/sh
/usr/bin/qemu-system-x86_64 "$@" -serial file:/tmp/qemu.log 2>/tmp/qemu_err.log
Now, use this as the path for `qemu` in `configuration.toml`:
[hypervisor.qemu]
path = "/usr/local/bin/qemu.sh"
and don't forget to add `console=ttyS0` to the kernel parameters, so that kernel
messages will also be included:
kernel_params = "... console=ttyS0"
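While reproducing an issue, both log files can then be followed with:
tail -f /tmp/qemu.log /tmp/qemu_err.log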
## Debug console
See the `kata-console` script in the
[kata-vfio-tools repository](https://github.com/dgibson/kata-vfio-tools) for a
convenient helper to access the debug console provided by the agent.