This document shows how to set up a Kata Containers environment using passt to
implement user-mode networking: unlike the other networking models currently
implemented, this kind of setup requires no elevated privileges or capabilities
as far as networking is concerned.

This proof-of-concept uses CRI-O as the container runtime implementation, which
is controlled directly without resorting to a full Kubernetes environment.

# Pre-requisites

* Go and Rust toolchains, typically provided by distribution packages
* the usual tools, such as git, make, etc.
* QEMU 4.x or later, with a working virtiofsd executable
  (provided at least by Debian, Ubuntu and Fedora packages)

# Fetch and prepare components

## CRI-O

CRI-O is the container runtime. On one side, it implements the Kubernetes CRI
(Container Runtime Interface), which we'll drive manually with `crictl` here. On
the other side, it supports OCI (Open Container Initiative) runtimes, and Kata
Containers is one of them.

### Fetch

    git clone https://github.com/cri-o/cri-o.git

### Build

    cd cri-o
    make

### Install

As root:

    make install

### Configure

Configuration is now at `/etc/crio/crio.conf`. This would also be the case for
distribution packages. Some specific configuration items for Kata Containers
are:

    # Cgroup management implementation used for the runtime.
    cgroup_manager = "cgroupfs"

    # manage_ns_lifecycle determines whether we pin and remove namespaces
    # and manage their lifecycle
    manage_ns_lifecycle = true

and the following section, which can be added at the end, defines a special type
of runtime, the `vm` type. This is needed to run the Kata Containers runtime
instead of the default `crun` choice:

    [crio.runtime.runtimes.kata]
    runtime_path = "/usr/local/bin/containerd-shim-kata-v2"
    runtime_type = "vm"
    runtime_root = "/run/vc"

Note that we don't have a `containerd-shim-kata-v2` binary yet: we'll deal with
that in the next steps.
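If you'd rather script this step than edit the file by hand, here is a minimal
sketch, assuming the whole configuration lives in the single file described
above. The `add_kata_runtime` function name is hypothetical: it appends the
`[crio.runtime.runtimes.kata]` table only if it isn't already there, because a
TOML file can't define the same table twice.

```shell
#!/bin/sh
# add_kata_runtime FILE (hypothetical helper): append the Kata Containers
# runtime table shown above to a CRI-O configuration file, unless a
# [crio.runtime.runtimes.kata] table is already defined there.
add_kata_runtime() {
	conf="$1"

	# Redefining an existing TOML table would make the file invalid
	if grep -q '^\[crio\.runtime\.runtimes\.kata\]' "$conf" 2>/dev/null; then
		return 0
	fi

	cat >> "$conf" << 'EOF'

[crio.runtime.runtimes.kata]
runtime_path = "/usr/local/bin/containerd-shim-kata-v2"
runtime_type = "vm"
runtime_root = "/run/vc"
EOF
}
```

As root, call it as `add_kata_runtime /etc/crio/crio.conf`. The
`cgroup_manager` and `manage_ns_lifecycle` settings still need to be edited in
place.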

## CNI plugins

CNI plugins are binaries, run by CRI-O, that configure networking on both the
host and the pod side. A few network topologies are offered, with very limited
capabilities.

### Fetch

    git clone https://github.com/containernetworking/plugins

### Build

    cd plugins
    ./build_linux.sh

### Install

As root:

    mkdir -p /opt/cni/bin
    cp bin/* /opt/cni/bin/


### Configure

The path where CNI configurations are located is configurable in
`/etc/crio/crio.conf`, see the `network_dir` parameter there. Assuming the
default value, we need to provide at least one configuration under
`/etc/cni/net.d/`. For example:

    # cat /etc/cni/net.d/50-kata-sandbox.conf 
    {
        "cniVersion": "0.3.0",
        "name": "crio-bridge",
        "type": "bridge",
        "bridge": "cni0",
        "isGateway": true,
        "ipMasq": true,
        "ipam": {
            "type": "host-local",
            "subnet": "10.88.0.0/16",
            "routes": [
                { "dst": "0.0.0.0/0" }
            ]
        }
    }
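The same file can also be created non-interactively. A minimal sketch, with a
hypothetical `write_cni_conf` function taking the configuration directory as an
argument (`/etc/cni/net.d` if `network_dir` was left at its default):

```shell
#!/bin/sh
# write_cni_conf [DIR] (hypothetical helper): write the bridge configuration
# shown above as DIR/50-kata-sandbox.conf, creating DIR if needed.
write_cni_conf() {
	dir="${1:-/etc/cni/net.d}"

	mkdir -p "$dir" || return 1
	cat > "$dir/50-kata-sandbox.conf" << 'EOF'
{
    "cniVersion": "0.3.0",
    "name": "crio-bridge",
    "type": "bridge",
    "bridge": "cni0",
    "isGateway": true,
    "ipMasq": true,
    "ipam": {
        "type": "host-local",
        "subnet": "10.88.0.0/16",
        "routes": [
            { "dst": "0.0.0.0/0" }
        ]
    }
}
EOF
}
```

As root, running `write_cni_conf` without arguments writes to the default
location.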

## crictl

`crictl` is needed to control CRI-O in lieu of Kubernetes.

### Fetch

    git clone https://github.com/kubernetes-sigs/cri-tools.git

### Build

    cd cri-tools
    make

### Install

As root:

    make install
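By default, `crictl` reads its settings from `/etc/crictl.yaml`. Pointing it
explicitly at the CRI-O socket avoids relying on its default endpoint detection.
A minimal sketch, assuming CRI-O listens on its default socket,
`/var/run/crio/crio.sock` (see the `listen` option in `/etc/crio/crio.conf`);
the `write_crictl_conf` function name is hypothetical:

```shell
#!/bin/sh
# write_crictl_conf [FILE] (hypothetical helper): point crictl at the
# CRI-O socket, for both runtime and image operations.
# Assumes CRI-O's default socket path; adjust if "listen" was changed.
write_crictl_conf() {
	conf="${1:-/etc/crictl.yaml}"

	cat > "$conf" << 'EOF'
runtime-endpoint: unix:///var/run/crio/crio.sock
image-endpoint: unix:///var/run/crio/crio.sock
EOF
}
```

Alternatively, the same endpoint can be passed to each command with
`crictl --runtime-endpoint`.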

## mbuto

We'll use `mbuto` to build a minimal virtual machine image for use with the
Kata Containers runtime.

### Fetch

    git clone https://mbuto.lameexcu.se/mbuto

## Kata Containers

### Fetch

    git clone https://github.com/kata-containers/kata-containers

### Patch

The current upstream version doesn't support the _passt_ networking model yet,
so use the patch from this directory to add it:

    patch -p1 < 0001-virtcontainers-agent-Add-passt-networking-model-and-.patch

### Build

    make -C src/runtime
    make -C src/agent LIBC=gnu

### Install

As root:

    make -C src/runtime install
    cp src/agent/target/x86_64-unknown-linux-gnu/release/kata-agent /usr/libexec/
    chmod 755 /usr/libexec/kata-agent

### Build the Virtual Machine image

    cd mbuto
    ./mbuto -f /tmp/kata.img

See `mbuto -h` for additional parameters, such as the choice of kernel version,
kernel modules, program add-ons, etc. `mbuto` will print some configuration
parameters to be used in the configuration of the Kata Containers runtime below.
For example:

    $ ./mbuto -c lz4 -f /tmp/kata.img
    Not running as root, won't keep cpio mounted
    Size: bin   12M lib   59M kmod  1.4M total   70M compressed   33M
    Kata Containers [hypervisor.qemu] configuration:
    
    	kernel = "/boot/vmlinuz-5.10.0-6-amd64"
    	initrd = "/tmp/kata.img"
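To keep only the lines to copy into the Kata Containers configuration, the
output can be filtered. A minimal sketch, assuming the `kernel = ...` and
`initrd = ...` lines look like the example above; the `mbuto_kata_params` name
is hypothetical:

```shell
#!/bin/sh
# mbuto_kata_params (hypothetical helper): read mbuto output on standard
# input, print only the kernel and initrd assignments, without indentation.
mbuto_kata_params() {
	grep -E '^[[:space:]]*(kernel|initrd) = ' | sed 's/^[[:space:]]*//'
}
```

Standard error is merged here in case `mbuto` prints these lines there:

    ./mbuto -c lz4 -f /tmp/kata.img 2>&1 | mbuto_kata_params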

### Configure

The configuration file at this point is located at
`/usr/share/defaults/kata-containers/configuration-qemu.toml`. Some parameters of general interest are:

    [hypervisor.qemu]
    kernel = "/boot/vmlinuz-5.10.0-6-amd64"
    initrd = "/tmp/kata.img"

where we can use the values indicated earlier by `mbuto`. Currently, the default
path for the `virtiofsd` daemon doesn't work for all distributions, so make sure
it matches the actual location of the executable. For example, on Debian:

    virtio_fs_daemon = "/usr/lib/qemu/virtiofsd"

We'll then need to enable the `passt` networking model for the runtime. In the
`[runtime]` section:

    internetworking_model = "passt"
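These settings can also be applied with a small script instead of by hand. A
minimal sketch, assuming each key appears once, uncommented, in the file, and
that values contain neither `|` nor `&` (both special in the `sed` expression).
The `set_kata_option` name is hypothetical, and `sed -i` is a GNU extension:

```shell
#!/bin/sh
# set_kata_option FILE KEY VALUE (hypothetical helper): replace the value of an
# existing, uncommented "KEY = ..." line. VALUE must carry its own TOML
# quoting, e.g. '"passt"'. Fails if no such line is found.
set_kata_option() {
	file="$1" key="$2" value="$3"

	if ! grep -q "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
		echo "${key} not found (or commented out) in ${file}" >&2
		return 1
	fi

	# "|" as delimiter, as values are typically paths containing "/"
	sed -i "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$file"
}
```

For example, as root:

    f=/usr/share/defaults/kata-containers/configuration-qemu.toml
    set_kata_option "$f" kernel '"/boot/vmlinuz-5.10.0-6-amd64"'
    set_kata_option "$f" virtio_fs_daemon '"/usr/lib/qemu/virtiofsd"'
    set_kata_option "$f" internetworking_model '"passt"'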

# Run an example container

## Fetch

We'll now need a container image to run as an example. With `podman` installed
from distribution packages, we can pull one:

    podman pull docker.io/i386/busybox

## Configure

Now we can define configuration files for the pod and the container we want to
create and start:

    $ cat pod-config.json
    {
        "metadata": {
            "name": "kata-sandbox",
            "namespace": "default",
            "attempt": 1,
            "uid": "hdishd83djaidwnduwk28bcsb"
        },
        "logDirectory": "/tmp",
        "linux": {
        }
    }

    $ cat container-busybox.json
    {
      "metadata": {
          "name": "kata-busybox"
      },
      "image": {
          "image": "docker.io/i386/busybox"
      },
      "command": [
          "sleep", "6000"
      ],
      "log_path":"kata-busybox.log",
      "linux": {
      }
    }

## Run the container workload

Assuming we have `pod-config.json` and `container-busybox.json` defined above,
we can now:

### start CRI-O

    crio -l debug

### create the pod and run a container inside it

    c=$(crictl start $(crictl create $(crictl runp --runtime=kata pod-config.json) container-busybox.json pod-config.json))
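The one-liner above chains three `crictl` calls, each printing the ID consumed
by the next one. Spelled out step by step, with a check after each call, it
could look like the following sketch. The `run_kata_busybox` function name is
hypothetical, and it prints the container ID itself instead of relying on the
output of `crictl start`:

```shell
#!/bin/sh
# run_kata_busybox [POD_CONFIG] [CONTAINER_CONFIG] (hypothetical helper):
# create a pod with the "kata" runtime handler, create a container in it,
# start it, and print the container ID.
run_kata_busybox() {
	pod_config="${1:-pod-config.json}"
	ctr_config="${2:-container-busybox.json}"

	# Run the pod sandbox with the runtime defined in crio.conf
	pod=$(crictl runp --runtime=kata "$pod_config") || return 1

	# Create the container inside that pod
	ctr=$(crictl create "$pod" "$ctr_config" "$pod_config") || return 1

	# Start it, then report its ID
	crictl start "$ctr" > /dev/null || return 1
	echo "$ctr"
}
```

Usage: `c=$(run_kata_busybox pod-config.json container-busybox.json)`.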

### verify that addresses are properly configured

    crictl exec $c ip addr show

## Enable support for ICMP/ICMPv6 Echo Request

_passt_ can replicate ICMP Echo Requests sent by the workload, and propagate the
replies back. However, as passt doesn't run as root, we need to enable so-called
_ping_ sockets for unprivileged users. From the network namespace created by
CRI-O for this container:

    sysctl -w net.ipv4.ping_group_range="0 2147483647"
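Ping sockets are allowed for processes whose group ID falls within the range
set above. To check whether a given group qualifies, the current range can be
read back from `/proc`, from the same network namespace. A minimal sketch with a
hypothetical `ping_group_allowed` function which, as a simplification, checks a
single group ID (the effective one, by default) rather than all the groups of a
process:

```shell
#!/bin/sh
# ping_group_allowed [GID] [FILE] (hypothetical helper): succeed if GID lies
# within the "min max" range stored in FILE (default: the sysctl in /proc).
ping_group_allowed() {
	gid="${1:-$(id -g)}"
	file="${2:-/proc/sys/net/ipv4/ping_group_range}"

	read -r min max < "$file" || return 1
	[ "$gid" -ge "$min" ] && [ "$gid" -le "$max" ]
}
```

For example, `ping_group_allowed && echo "ping sockets allowed"`.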

# Troubleshooting

## Redirect qemu's console output to file

Agent errors and kernel messages should be accessible via the UNIX domain socket
at `/run/vc/vm/*/console.sock`, provided `agent.debug_console` is enabled in
`kernel_params` of `configuration.toml`, but this won't work if the agent
doesn't start. To get those messages anyway, we can wrap `qemu` and,
additionally, have all its output written to files:

    $ cat /usr/local/bin/qemu.sh
    #!/bin/sh

    exec /usr/bin/qemu-system-x86_64 "$@" -serial file:/tmp/qemu.log 2>/tmp/qemu_err.log

Now, make the script executable (for example, with `chmod 755 /usr/local/bin/qemu.sh`) and use it as the `qemu` path in `configuration.toml`:

    [hypervisor.qemu]
    path = "/usr/local/bin/qemu.sh"

and don't forget to add `console=ttyS0` to the kernel parameters, so that kernel
messages will also be included:

    kernel_params = "... console=ttyS0"

## Debug console

See the `kata-console` script in the
[kata-vfio-tools repository](https://github.com/dgibson/kata-vfio-tools) for a
convenient helper to access the debug console provided by the agent.