== systemd containers ==

systemd-nspawn and machinectl are lightweight container management tools.

They are deployed as part of systemd with the DebianPts:systemd-container package.


=== Usage example ===

Deploy FreedomBox on a Sid container. This will take around 1.2 GB of disk space.

{{{#!highlight bash numbers=disable
# create a new container using debootstrap
$ CDIR=/var/lib/machines/freedombox
$ sudo debootstrap sid $CDIR
$ sudo systemd-nspawn -D $CDIR --machine FreedomBox
root@FreedomBox:~# apt-get install -y freedombox-setup

# workaround for #862758
root@FreedomBox:~# apt-get install -y gir1.2-nm-1.0

# set root password and stop the container
root@FreedomBox:~# passwd root
root@FreedomBox:~# halt

# start the container and its services
$ sudo systemd-nspawn -D $CDIR --machine FreedomBox -b

# Browse to https://127.0.0.1/
}}}
= systemd-nspawn =

<<TableOfContents(3)>>

== About systemd-nspawn ==

systemd-nspawn may be used to run a command or OS in a light-weight namespace container. In many ways it is similar to [[DebianMan:chroot]], but more powerful since it fully virtualizes the file system hierarchy, as well as the process tree, the various IPC subsystems and the host and domain name.

This mechanism is also similar to [[LXC]], but is much simpler to configure and most of the necessary software is already installed on contemporary Debian systems.

== Host Preparation ==

The ''host'' (i.e. the system hosting one or more containers) needs to have the [[DebianPackage:systemd-container]] package installed.

{{{#!highlight bash numbers=disable
$ apt-get install systemd-container
}}}

The host should also have unprivileged user namespaces enabled (see the [[https://www.freedesktop.org/software/systemd/man/systemd-nspawn.html#-U|documentation]] for an explanation of why; note that some consider this a [[https://lwn.net/Articles/673597/|security risk]]):

{{{#!highlight bash numbers=disable
$ echo 'kernel.unprivileged_userns_clone=1' >/etc/sysctl.d/nspawn.conf
$ systemctl restart systemd-sysctl.service
}}}
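
Before changing the setting, it can help to check what the kernel currently reports. The following is a minimal sketch, assuming the sysctl is exposed as `/proc/sys/kernel/unprivileged_userns_clone`. The function name `userns_clone_status` and its optional file argument are illustrative and not part of any tool.

```shell
#!/bin/sh
# Report whether unprivileged user namespaces appear to be enabled.
# The optional argument (a file to read instead of the real sysctl file)
# is an illustrative addition that makes the function easy to try out.
userns_clone_status() {
    file="${1:-/proc/sys/kernel/unprivileged_userns_clone}"
    if [ ! -r "$file" ]; then
        # Kernels that do not provide this sysctl have no such file.
        echo "unknown"
        return 0
    fi
    case "$(cat "$file")" in
        1) echo "enabled" ;;
        0) echo "disabled" ;;
        *) echo "unknown" ;;
    esac
}

# Example: userns_clone_status
```

A result of `unknown` only means the file could not be read. It says nothing about whether user namespaces work on that kernel.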

== Creating a Debian Container ==

Each ''guest'' OS should also have the [[DebianPackage:systemd-container]] package installed. A suitable guest OS installation may be created using the DebianPackage:debootstrap or DebianPackage:cdebootstrap tools. For example, to create a new guest OS called `debian`:

{{{#!highlight bash numbers=disable
$ debootstrap --include=systemd-container stable /var/lib/machines/debian
I: Target architecture can be executed
I: Retrieving InRelease
I: Checking Release signature
...
}}}

After DebianPackage:debootstrap finishes, it is necessary to log in to the newly created container and make some changes to allow root logins:
{{{#!highlight bash numbers=disable
$ systemd-nspawn -D /var/lib/machines/debian -U --machine debian
Spawning container debian on /var/lib/machines/debian.
Press ^] three times within 1s to kill container.
Selected user namespace base 818610176 and range 65536.

# set root password
root@debian:~# passwd
New password:
Retype new password:
passwd: password updated successfully

# allow login via local tty
root@debian:~# echo 'pts/1' >> /etc/securetty # May need to set 'pts/0' instead

# logout from container
root@debian:~# logout
Container debian exited successfully.
}}}
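
Running the `echo ... >> /etc/securetty` step more than once leaves duplicate entries in the file. The following is a small sketch of an append that only adds the line when it is missing. The helper name `ensure_line` is hypothetical.

```shell
#!/bin/sh
# Append a line to a file only if it is not already present (exact match).
# Arguments: the line to add, then the file to add it to.
ensure_line() {
    line="$1" file="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example, inside the container as root:
#   ensure_line 'pts/1' /etc/securetty
```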

== Booting a Container ==

Once it has been set up, it is possible to boot a container using an instantiated [[DebianMan:systemd.service]]:
{{{#!highlight bash numbers=disable
# The part after the @ must match the container name used in the previous step
$ systemctl start systemd-nspawn@debian
}}}

== Checking Container State ==

To check the state of containers, use one of the following commands:

{{{#!highlight bash numbers=disable
$ machinectl list
MACHINE CLASS SERVICE OS VERSION ADDRESSES
debian container systemd-nspawn debian 10 -

# or
$ systemctl status systemd-nspawn@debian
● systemd-nspawn@debian.service - Container debian
   Loaded: loaded (/lib/systemd/system/systemd-nspawn@.service; disabled; vendor preset: enabled)
   Active: active (running) since ...
}}}
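
For use in scripts, the machine list can be checked without reading the table by eye. The following is a minimal sketch, assuming the machine name is the first column of the `machinectl list` output, as in the sample above. The function name `machine_is_listed` is illustrative.

```shell
#!/bin/sh
# Read a machine list on standard input and succeed (exit status 0)
# if the given name appears in the first column.
machine_is_listed() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Example, on the host:
#   machinectl list --no-legend | machine_is_listed debian && echo "running"
```

The `--no-legend` option of `machinectl` suppresses the header and footer lines, so only the machine rows reach the function.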

== Logging into a Container ==

To log in to a running container:
{{{#!highlight bash numbers=disable
$ machinectl login debian
Connected to machine debian. Press ^] three times within 1s to exit session.

Debian GNU/Linux 10 debian pts/0

debian login:
}}}

== Stopping a Container ==

To stop a running container from the host, do:
{{{#!highlight bash numbers=disable
$ systemctl stop systemd-nspawn@debian
}}}

Alternatively, you can stop the container from within the guest OS by running e.g. `halt`:
{{{#!highlight bash numbers=disable
$ machinectl login debian
Connected to machine debian. Press ^] three times within 1s to exit session.

Debian GNU/Linux 10 debian pts/0

debian login: root
Password: <something>
Last login: Wed Jan 22 21:53:00 CET 2020 on pts/1
Linux debian 5.4.0-3-amd64 #1 SMP Debian 5.4.13-1 (2020-01-19) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
root@debian:~# halt
...
Machine debian terminated
}}}

== Networking ==

The host communicates with the guest container using a virtual interface named `ve-<container_name>@if<X>` while the guest uses a virtual interface named `host@if<Y>` for the same purposes:
{{{
$ ip a show dev ve-debian
77: ve-debian@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether ... brd ff:ff:ff:ff:ff:ff link-netnsid 1
}}}

Enable and start [[SystemdNetworkd|systemd-networkd.service]] on the host and in the container to automatically provision the virtual link via DHCP, with routing onto the host's external network interfaces.

Alternatively, the interfaces can be configured manually, e.g. to set up IP forwarding, masquerading, etc.

A port in the container can be made reachable from the outside using the `Port=tcp:hostport:containerport` option in the `.nspawn` `[Network]` section. The port will then be reachable from outside the host [[https://github.com/systemd/systemd/issues/6106|but not on the host's own 127.0.0.1]].
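
As an illustration of the `Port=` option, the following sketch writes a minimal settings file that forwards a single TCP port. It assumes the per-container file lives at `/etc/systemd/nspawn/<container_name>.nspawn`. The function name `write_port_forward` and the example port numbers are hypothetical.

```shell
#!/bin/sh
# Write a minimal .nspawn file that forwards one TCP port.
# Arguments: container name, host port, container port, and optionally the
# target directory (defaults to /etc/systemd/nspawn, which needs root).
# Note: this overwrites any existing settings file for that container.
write_port_forward() {
    name="$1" hostport="$2" containerport="$3"
    dir="${4:-/etc/systemd/nspawn}"
    printf '[Network]\nPort=tcp:%s:%s\n' "$hostport" "$containerport" \
        > "$dir/$name.nspawn"
}

# Example (as root): forward host port 8080 to port 80 in "debian"
#   write_port_forward debian 8080 80
```

The container has to be restarted for a changed settings file to take effect.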

Container names can be resolved to IP addresses reachable from the host with no further configuration, but you may want to enable and start `systemd-resolved` on both the host and the container if they get resolved slowly. Then, from the host, `dig @127.0.0.53 my-container-hostname` should return the container's IP (`dig` is available in the package `dnsutils`).

=== Using host networking ===

You can disable private networking and make an nspawn container use host networking instead by adding the following lines to `/etc/systemd/nspawn/container-name.nspawn`:

{{{
[Network]
VirtualEthernet=no
}}}

Replace 'container-name' with the name of your container.

See the [[https://wiki.archlinux.org/title/Systemd-nspawn#Use_host_networking|Arch Linux wiki]] for more information.


== Using programs with Xorg ==

The container does not have any knowledge of your host's X server at first. If you want to run applications inside your container that should be able to use your host's X server and session, you need to specify the `DISPLAY` environment variable. A good way to do so interactively is using the `-E` option:

{{{#!highlight bash numbers=disable
$ systemd-nspawn -E DISPLAY="$DISPLAY" ...
}}}

The container now knows about the display but still has no permission to use it. One possible way to allow access to your X server is DebianMan:xhost. Tutorials on the web often suggest `xhost +`. '''Do not use this command''': it disables access control entirely, so that potentially ''anybody anywhere'' can connect to your X server. To revert it, use `xhost -`.

If you use a single-user machine, you may want to use the following variant, which allows only local (non-network) connections:

{{{
$ xhost +local:
non-network local connections being added to access control list
}}}

It is also possible to pass the required configuration through to the container. See [[https://wiki.archlinux.org/index.php/Systemd-nspawn#Avoiding_xhost|the Arch Linux wiki]] for one option.

=== Firefox example ===

It is possible to run Firefox within a Debian container, with the graphical output sent to an Xorg server running on the host.
An attacker would then need an exploit in Firefox as well as one in Xorg or the Linux container system to reach the rest of the computer.
This example should work with any Linux distribution. If the host is not running Debian, it is strongly advised to import Debian's GPG keyring first.


Create the container, in the traditional container location (`/var/lib/machines`), by simply running:

{{{
# debootstrap --force-check-gpg --include=systemd-container stable /var/lib/machines/deb-firefox/ https://deb.debian.org/debian
}}}

Note: you may prefer to change 'stable' to a specific release name, if you are looking for a specific package.


Once finished, install the only package needed on the host (an [[Xorg]] server should already be set up on the host):

{{{
# apt-get install systemd-container
}}}


Execute a shell in the container:

{{{
# systemd-nspawn --private-users=pick --private-users-chown -D /var/lib/machines/deb-firefox/
}}}

Note: the `private-users` options will automatically adjust all the users in the container to high
UIDs & GIDs, including the container's root user.


Modern browsers have many security bugs, and DebianMan:debootstrap, being minimal, does not enable the security repository by default. So, from the shell within the container, add the repository,
install security updates, install Firefox with a minimal set of dependencies, and then end the container session:

{{{
(container's root) # echo 'deb https://deb.debian.org/debian-security/ stable-security main' >> /etc/apt/sources.list
(container's root) # apt-get update && apt-get dist-upgrade -y
(container's root) # apt-get install --no-install-recommends -y firefox-esr
(container's root) # exit
}}}
Note: `--no-install-recommends` will prevent the install of things that are not necessary in the
container, e.g. graphics drivers, Xorg server, etc. and so halves the download size.


On the host, from a user already running an [[Xorg]] session, allow local connections (only) to [[Xorg]]:

{{{
$ xhost +local:
}}}

Note: this will not persist across reboots, and will allow any local user on the host to access anything on the Xorg server, until you run `xhost -local:` later.

We are now ready to run Firefox. Execute the following script as root/sudo. This will start the container and run Firefox as the main process (as opposed to a shell, as we did previously):

{{{#!highlight bash numbers=disable
#!/bin/sh
systemd-nspawn --setenv=DISPLAY=:0 \
  --bind-ro=/tmp/.X11-unix/ \
  --private-users=pick \
  --private-users-chown \
  -D /var/lib/machines/deb-firefox/ \
  --as-pid2 firefox-esr
}}}

Notice that an Xorg-server directory is being shared from the host to the container via bind
mount. Notice also we are assuming the host's variable of `DISPLAY=:0`. However, this variable increments each time Xorg is killed and then restarted e.g. with DebianMan:startx (e.g. to `DISPLAY=:1`).
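
Because the display number can change, a script may prefer to derive the socket from the current `DISPLAY` value rather than hard-coding `:0`. The following is a minimal sketch, assuming a local display of the form `:N` or `:N.S` whose socket is `/tmp/.X11-unix/XN`. The function name `x11_socket_path` is illustrative.

```shell
#!/bin/sh
# Print the X11 socket path for a local display such as ":0" or ":1.0".
# With no argument, the current value of DISPLAY is used.
x11_socket_path() {
    display="${1:-$DISPLAY}"
    num="${display#*:}"   # drop everything up to and including the colon
    num="${num%%.*}"      # drop an optional ".screen" suffix
    echo "/tmp/.X11-unix/X$num"
}

# Example: check that the socket for the current display exists
#   [ -S "$(x11_socket_path)" ] && echo "X socket found"
```

The same value can then be passed to the container with `--setenv=DISPLAY="$DISPLAY"`, provided the script is started from an environment in which `DISPLAY` is set.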

After a little loading, Firefox should now appear in your [[Xorg]] session. You may prefer to close your Xorg server to new connections now with `xhost -local:`, or view current connections with `xlsclients`.


From the host, verify that Firefox is running as a high UID user:

{{{
# ps -eo user,pid,command | grep firefox
}}}
||'''USER'''||'''PID'''||'''COMMAND'''||
||root ||2346 ||systemd-nspawn [...] --as-pid2 firefox-esr||
||1247589+||2373||firefox-esr||


Notice the high UID in the 'USER' column.


One drawback to this setup is that updates to Firefox will have to be done manually, by executing a shell (as done previously), and running `apt-get update && apt-get -y dist-upgrade`. This is because the way we are running the container is very minimal, to minimize performance loss, but this means there is nothing else running in the container that can check and install updates (e.g. via `crontab`).



''' Extra steps for increased isolation '''

Systemd-nspawn has a special mode (`--volatile`) which allows for the container's filesystem to be
run as read-only, with other pieces run within `tmpfs`, whereby any modifications are discarded
when the container exits. However, this mode ignores the container's `/root/` and `/home/`, so we have
to store the Firefox profile on the host.


Make a directory for storing your Firefox profile, find the automatically assigned high GID/UID, and
adjust the owner & group of the directory to match the root in the container:

{{{
# mkdir /var/lib/machines/firefox-profile/
# ls -lhd /var/lib/machines/deb-firefox/root/
# chown your_high_uid_here:your_high_gid_here /var/lib/machines/firefox-profile/
}}}
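
Instead of reading the UID and GID from the `ls` output by hand, they can be queried in the form that `chown` accepts. This is a small sketch, assuming GNU `stat` (as shipped in Debian's coreutils). The helper name `owner_of` is hypothetical.

```shell
#!/bin/sh
# Print the numeric "UID:GID" owner of a path, in the form chown accepts.
owner_of() {
    stat -c '%u:%g' "$1"
}

# Example (as root), using the directories from the steps above:
#   chown "$(owner_of /var/lib/machines/deb-firefox/root/)" \
#       /var/lib/machines/firefox-profile/
```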

We can then bind mount this directory into the container, so that it can store data across
`--volatile` container start/stops, and run the container in `systemd-nspawn --volatile` mode:

{{{#!highlight bash numbers=disable
#!/bin/sh
systemd-nspawn --setenv=DISPLAY=:0 \
  --bind-ro=/tmp/.X11-unix/ \
  --private-users=pick \
  --private-users-chown \
  -D /var/lib/machines/deb-firefox/ \
  --bind=/var/lib/machines/firefox-profile/:/root/.mozilla/ \
  --volatile \
  --as-pid2 firefox-esr
}}}

Now, the container will be non-persistent across start/stops, but the Firefox profile will be
persistent.

This setup was inspired by OpenBSD's Firefox defaults. Full example last tested: 2022-05-29


== PulseAudio Tweaks ==

[[PulseAudio]] does not work in a container out of the box. If you need it, make sure that the necessary libraries are installed (e.g. with `apt install pulseaudio`).

You probably want the container to use the host's PulseAudio server. To do so, first find the PulseAudio UNIX socket on the host. Note: the [[https://wiki.gentoo.org/wiki/PulseAudio#Allow_multiple_users_to_use_PulseAudio_concurrently|Gentoo wiki]] has an article on how to allow multiple users to use one PulseAudio server at the same time.

When you start the container, you need to bind the host socket in the file system of the guest and pass an environment variable `PULSE_SERVER` that defines where the socket is in the guest. Example:

{{{
$ systemd-nspawn -E PULSE_SERVER="unix:/pulse-guest.socket" --bind=/pulse-host.socket:/pulse-guest.socket ...
}}}
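
The host socket path in the example above is a placeholder. The following sketch prints a likely location, assuming PulseAudio runs per user and places its socket at `pulse/native` below `XDG_RUNTIME_DIR`. Other setups (e.g. a system-wide server) use a different path, and the function name `pulse_host_socket` is illustrative.

```shell
#!/bin/sh
# Print the usual per-user PulseAudio socket path on the host.
# Assumption: a per-user PulseAudio server with its socket below
# XDG_RUNTIME_DIR (falling back to /run/user/<uid> if that is unset).
pulse_host_socket() {
    echo "${XDG_RUNTIME_DIR:-/run/user/$(id -u)}/pulse/native"
}

# Example: check the result before binding it into the container
#   [ -S "$(pulse_host_socket)" ] && echo "PulseAudio socket found"
```

If the socket exists, its path can be used as the host side of the `--bind=` option shown above.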

----

CategorySoftware | CategoryVirtualization | CategorySystemAdministration
