ZFS

ZFS is a combined file system and logical volume manager designed by Sun Microsystems (now owned by Oracle). It was released as open-source software under the Common Development and Distribution License (CDDL) as part of the OpenSolaris project in November 2005. OpenZFS, announced in September 2013, brings together developers and users of the various open-source forks of the original ZFS on different platforms and is the truly open-source successor to the ZFS project.

Described as "the last word in filesystems", ZFS is scalable and includes extensive protection against data corruption, support for high storage capacities, efficient data compression, integration of the concepts of file system and volume management, snapshots and copy-on-write clones, continuous integrity checking with automatic repair, RAID-Z, and native NFSv4 ACLs, and it can be very precisely configured.

Status

Debian GNU/kFreeBSD users have been able to use ZFS since the release of Squeeze; for those who use the Linux kernel, it has been available from the contrib archive area in the form of DKMS source since the release of Stretch. There is also a deprecated userspace implementation based on the FUSE framework. This page demonstrates ZFS on Linux (ZoL), unless the kFreeBSD or FUSE implementation is specifically pointed out.

Due to potential legal incompatibilities between the CDDL and the GPL, even though both are OSI-approved free software licenses that comply with the DFSG, ZFS development is not supported by the Linux kernel. ZoL is a project funded by the Lawrence Livermore National Laboratory to develop a native Linux kernel module for its massive storage requirements and supercomputers.

Installation

ZFS on Linux is provided in the form of DKMS source for Debian users. You need to add the contrib section to your apt sources configuration to be able to get the packages. The Debian ZFS on Linux Team also recommends installing ZFS-related packages from the Backports archive, where upstream stable patches are tracked and compatibility is always maintained.
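
As a sketch (assuming the buster release used below; adjust the mirror and release to your system), the corresponding sources.list entries could look like:

    deb http://deb.debian.org/debian buster main contrib
    deb http://deb.debian.org/debian buster-backports main contrib

Once that is in place, use the following commands to install the packages: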

    apt update
    apt install linux-headers-`uname -r`
    apt install -t buster-backports dkms spl-dkms
    apt install -t buster-backports zfs-dkms zfsutils-linux

The example above separates the steps of installing the Linux headers, spl and zfs. It is fine to combine everything into one command, but being explicit avoids any chance of messing up versions; future updates will be taken care of by apt.
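
After installation, you can verify that the kernel module built and loaded correctly; a minimal check (zfs version requires ZoL 0.8.0 or newer):

    modprobe zfs
    zfs version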

Creating the Pool

Many disks can be added to a storage pool, and ZFS allocates space from it, so the first step of using ZFS is creating a pool. It is recommended to use more than one whole disk to take full advantage of the benefits, but it is fine to proceed with only one device or even just a partition.

In the world of ZFS, device names with a persistent path/id are usually used to identify a disk, because /dev/sdX names are subject to change by the operating system. These stable names can be retrieved with:

    ls -l /dev/disk/by-id/
    ls -l /dev/disk/by-path/

Basic Configuration

The most common pool configurations are mirror, raidz and raidz2; a plain striped pool or even a single-disk pool is also possible. Choose one of the following:

    # two-way mirror
    zpool create tank mirror scsi-35000cca2735cbc38 scsi-35000cca266cc4b3c

    # raidz, single parity, three disks
    zpool create tank raidz scsi-35000cca2735cbc38 scsi-35000cca266cc4b3c scsi-35000cca26c108480

    # raidz2, double parity, four disks
    zpool create tank raidz2 scsi-35000cca2735cbc38 scsi-35000cca266cc4b3c scsi-35000cca26c108480 scsi-35000cca266ccbdb4

    # plain stripe over two disks, no redundancy
    zpool create tank scsi-35000cca2735cbc38 scsi-35000cca266cc4b3c

    # single-disk pool, no redundancy
    zpool create tank scsi-35000cca26c108480
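
Whichever layout you choose, the resulting pool can be inspected afterwards, for example:

    zpool status tank
    zpool list tank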

Advanced Configuration

When building a pool with a larger number of disks, you are encouraged to arrange them into more than one group (vdev) and construct a striped pool over these vdevs. This allows a more flexible pool design that trades off among space, redundancy and performance.

Different configurations can have different I/O characteristics under a given workload pattern; please refer to the See Also section at the end of this page for more information.

    # stripe over five two-way mirrors (similar to RAID 10)
    zpool create tank mirror scsi-35000cca2735cbc38 scsi-35000cca266cc4b3c \
                    mirror scsi-35000cca26c108480 scsi-35000cca266ccbdb4 \
                    mirror scsi-35000cca266c75c74 scsi-35000cca26c0e84dc \
                    mirror scsi-35000cca266cda748 scsi-35000cca266cd14b4 \
                    mirror scsi-35000cca266cb8ae4 scsi-35000cca266cbad80

    # stripe over two raidz vdevs (similar to RAID 50)
    zpool create tank raidz scsi-35000cca2735cbc38 scsi-35000cca266cc4b3c scsi-35000cca26c108480 scsi-35000cca266ccbdb4 scsi-35000cca266c75c74 \
                    raidz scsi-35000cca26c0e84dc scsi-35000cca266cda748 scsi-35000cca266cd14b4 scsi-35000cca266cb8ae4 scsi-35000cca266cbad80

ZFS can use a fast SSD as a second-level cache (L2ARC) behind RAM (the ARC), which can improve the cache hit rate and thus overall performance. Because cache devices may be read from and written to very frequently while the pool is busy, consider using more durable SSD devices (SLC/MLC over TLC/QLC), preferably with the NVMe protocol. This cache is used only for read operations: data is written to the cache device only to serve future reads, and it plays no role in write operations.

    zpool add tank cache nvme-MT001600KWHAC_S3M0NA0K700264

ZFS can also use NVRAM/Optane/SSD devices as a SLOG (Separate ZFS Intent Log) device, which can be thought of as a kind of write cache, although that is far from the whole truth. SLOG devices speed up synchronous writes by sending those transactions to the SLOG in parallel with the slower disks; as soon as a transaction is safe on the SLOG, the operation is marked as completed, so the synchronous operation is unblocked sooner while resistance against power loss is not compromised. A mirrored SLOG setup is strongly recommended. Please also note that asynchronous writes are not sent to the SLOG by default; you can set the sync=always property on the working dataset and see whether performance improves.

    zpool add tank log mirror nvme-MT001600KWHAC_S3M0NA0K700244 nvme-MT001600KWHAC_S3M0NA0K700246
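
To see how the cache and log devices are used at runtime, zpool iostat can display per-device statistics:

    zpool iostat -v tank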

Provisioning file systems or volumes

After creating the zpool, we are able to provision file systems or volumes (ZVOLs). A ZVOL is a kind of block device whose space is allocated from the zpool; you can create another file system on top of it like on any other block device.

    # create a file system and mount it at /data
    mkdir -p /data
    zfs create -o mountpoint=/data tank/data

    # create a sparse 4 GB volume and format it with another file system
    zfs create -s -V 4GB tank/vol
    mkfs.ext4 /dev/zvol/tank/vol
    mount /dev/zvol/tank/vol /mnt
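
The resulting datasets and volumes can be listed with:

    zfs list -r tank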

    # ZFS handles mounts that are managed by it
    zfs destroy tank/data
    # need to umount first, because this mount is managed by the user
    umount /dev/zvol/tank/vol
    zfs destroy tank/vol

Snapshots

Snapshots are one of the most wanted features of a modern file system, and ZFS definitely supports them.

Creating and Managing Snapshots

    # create a snapshot of the dataset
    zfs snapshot tank/data@2019-06-24

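Snapshots can be listed, and a dataset can be rolled back to a snapshot, discarding every change made after it:

    zfs list -t snapshot
    zfs rollback tank/data@2019-06-24
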
    # destroy a snapshot when it is no longer needed
    zfs destroy tank/data@2019-06-24

Backup and Restore (with remote)

It is possible to back up a ZFS dataset to another pool with the zfs send/recv commands, even when that pool is located at the other end of a network.

    # create an initial snapshot
    zfs snapshot tank/data@initial
    # send it to another local pool named "tank2", calling the dataset "packman"
    zfs send tank/data@initial | zfs recv -F tank2/packman
    # send it to a remote pool named "tanker" on the remote side
    zfs send tank/data@initial | ssh remotehost zfs recv -F tanker/data
    # after using "tank/data" for a while, create another snapshot
    zfs snapshot tank/data@2019-06-24T18-10
    # incrementally send the new state to the remote side
    zfs send -i initial tank/data@2019-06-24T18-10 | ssh remotehost zfs recv -F tanker/data
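
To confirm the transfer, the snapshots on the receiving side can be listed:

    ssh remotehost zfs list -t snapshot -r tanker/data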

File Sharing

ZFS integrates with the operating system's NFS, CIFS and iSCSI servers; it does not implement its own servers but reuses the existing software. However, iSCSI integration is not yet available on Linux. It is recommended to enable xattr=sa and dnodesize=auto for these use cases.

NFS shares

To share a dataset through NFS, the nfs-kernel-server package needs to be installed:

    apt install nfs-kernel-server

Set the recommended properties on the target ZFS file system:

    zfs set xattr=sa dnodesize=auto tank/data

Configure a very simple NFS share (read/write for 192.168.0.0/24, read-only for 10.0.0.0/8):

    zfs set mountpoint=/data tank/data
    zfs set sharenfs="rw=192.168.0.0/24,ro=10.0.0.0/8" tank/data
    zfs share tank/data

Verify that the share is exported successfully:

    showmount -e 127.0.0.1
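
Optionally, perform a test mount from a client in an allowed network (a sketch: assuming the server is reachable at 192.168.0.10 and /mnt is free on the client):

    mount -t nfs 192.168.0.10:/data /mnt
    umount /mnt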

Stop the NFS share:

    zfs unshare tank/data
    # to disable the share permanently, do the following
    zfs set sharenfs=off tank/data

CIFS shares

CIFS is a dialect of the Server Message Block (SMB) protocol and can be used on Windows, VMS, several versions of Unix, and other operating systems.

To share a dataset through CIFS, the samba package needs to be installed:

    apt install samba

Because Microsoft Windows is not case sensitive, it is recommended to set casesensitivity=mixed on the dataset to be shared; this property can only be set at creation time:

    zfs create -o casesensitivity=mixed -o xattr=sa -o dnodesize=auto tank/data

Configure a very simple CIFS share:

    zfs set mountpoint=/data tank/data
    zfs set sharesmb=on tank/data
    zfs share tank/data

Verify that the share is exported successfully:

    smbclient -U guest -N -L localhost
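
Optionally, try connecting to the share. With sharesmb=on, the share name is derived from the dataset name (here it would typically be tank_data; use the name shown by the listing above):

    smbclient -U guest -N //localhost/tank_data -c 'ls'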

Stop the CIFS share:

    zfs unshare tank/data
    # to disable the share permanently, do the following
    zfs set sharesmb=off tank/data

Encryption

ZFS native encryption has been available since the ZoL 0.8.0 release. For any older version, the alternative solution is to wrap ZFS with LUKS (see cryptsetup). Creating an encrypted file system is straightforward, for example:

    zfs create -o encryption=on -o keyformat=passphrase tank/secret

ZFS will prompt you to enter the passphrase. Alternatively, the key location can be specified with the keylocation property.

ZFS can also encrypt a dataset during "recv":

    zfs send tank/data | zfs recv -o encryption=on -o keylocation=file:///path/to/my/raw/key backup/data

Before mounting an encrypted dataset, its key has to be loaded first (zfs load-key tank/secret). "zfs mount -l" provides a shortcut for the two steps:

    zfs mount -l tank/secret
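
When the dataset is no longer needed, it can be unmounted and its key unloaded again; the key status is visible as a property:

    zfs unmount tank/secret
    zfs unload-key tank/secret
    zfs get keystatus tank/secret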

Interoperability

The last version of ZFS released from OpenSolaris is zpool v28; after that, Oracle decided not to publish further updates, so version 28 has the best interoperability across all implementations. It is also the last pool version that zfs-fuse supports.

It was later decided that the open-source implementations would stay at zpool v5000 and keep any future changes tracked and controlled by feature flags. This is an incompatible change with respect to the closed-source successor, and v28 will remain the last interoperable pool version.

By default, new pools are created with all supported features enabled (use the -d option to disable this). If you want a pool at version 28:

    zpool create -o version=28 tank mirror scsi-35000cca2735cbc38 scsi-35000cca266cc4b3c
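
The feature flags supported by the local implementation, as well as their state on an existing pool, can be inspected with:

    zpool upgrade -v
    zpool get all tank | grep feature@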

All known OpenZFS implementations support zpool v5000 and feature flags in their major stable versions, including illumos, FreeBSD, ZFS on Linux and OpenZFS on OS X. There are differences in the supported features among these implementations; for example, support for the large_dnode feature flag was first introduced on Linux, and spacemap_v2 was not supported on Linux until ZoL 0.8.x. More features have differing availability beyond feature flags: xattr=sa is only available on Linux and OS X, whereas TRIM was not supported on Linux until ZoL 0.8.x.

Advanced Topics

These are not really advanced topics like the internals of ZFS and storage, but rather some topics that are not relevant to everyone.

    # use 4 KiB sectors (ashift=12), suitable for disks with 4K physical sectors
    zpool create -o ashift=12 tank mirror scsi-35000cca2735cbc38 scsi-35000cca266cc4b3c

    # enable lz4 compression for the whole pool
    zfs set compression=lz4 tank

    # dragons ahead, you have been warned
    zfs set dedup=on tank/data

    # properties set here are inherited by all child datasets
    zfs set xattr=sa tank

    # allow variable dnode sizes, recommended together with xattr=sa
    zfs set dnodesize=auto tank/data
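
The effective values of these properties can be reviewed at any time:

    zfs get compression,dedup,xattr,dnodesize tank/data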

See Also


CategoryStorage