Recommendations, best practices, and things tried that worked well. This refers to features of the Wheezy release or later.

ZFS

ZFS has its own Best Practices Guide already.

Snapshots

If /boot is on a ZFS filesystem, you can make snapshots before or after changing the kernel. GRUB is able to boot a kernel and modules from old snapshots (:TODO: syntax).
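For example, assuming /boot lives on a dataset named rpool/boot (a hypothetical name; substitute your own pool/dataset), taking a snapshot before a kernel upgrade might look like:

```shell
# Snapshot the dataset holding /boot before touching the kernel.
# The snapshot name after '@' is arbitrary.
zfs snapshot rpool/boot@pre-kernel-upgrade

# List snapshots of that dataset to confirm it was taken.
zfs list -t snapshot -r rpool/boot
```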

The autosnap scheme described in /etc/default/zfs is incredibly useful and efficient for often-changing user data such as /home. The 'Towers of Hanoi' or 'logarithmic' scheme balances the space taken by snapshots against the desire for a long backup history, preferring to keep recent snapshots that are closer together in time.

Database backups are best written as plain-text dumps such as SQL, which ZFS can apply compression to. If you take daily snapshots, simply overwrite the same file each day; the snapshots preserve the history. The autosnap scheme is suitable here too.
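As a sketch of the idea, assuming a PostgreSQL database named mydb and a compressed ZFS dataset mounted at /backup (both hypothetical names):

```shell
# Enable compression on the backup dataset once (dataset name is an assumption).
zfs set compression=lz4 tank/backup

# Overwrite the same dump file every day; ZFS snapshots keep the history,
# and identical unchanged blocks cost no extra space.
pg_dump mydb > /backup/mydb.sql
```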

The logrotate.conf option 'dateext' may be a good idea. That avoids renaming old log files each day, and makes it easy to find a particular day's log within snapshots. There is no unnecessary duplication if the same log file gets snapshotted on different days.

The 'nocompress' option can be used for logs saved to filesystems that are already compressed. It is probably not worth recompressing them, unless you need a better compression ratio such as using xz.
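The two options above might be combined in /etc/logrotate.conf like this (a sketch; adjust to your own layout):

```shell
# /etc/logrotate.conf fragment
# Name rotated logs by date instead of renaming .1 -> .2 -> ... each day.
dateext
# Skip gzip; the ZFS dataset holding /var/log is already compressed.
nocompress
```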

Deduplication

Be very careful with this! RAM requirements can be very high; I think something in the ballpark of (allocated data size / record size) * 512 bytes. So for 1 TiB of allocated data in 128 KiB records (the default for filesystems): an additional 4 GiB of RAM. But for just a 100 GiB zvol (where the default volblocksize=8 KiB): more than 6 GiB!
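The ballpark figures above can be checked with some shell arithmetic (the 512 bytes per dedup-table entry is the rough estimate used here, not an exact on-disk size):

```shell
# Rough dedup-table RAM estimate: (allocated data / record size) * 512 bytes.
dedup_ram_gib() {
    # $1 = allocated data in GiB, $2 = record size in KiB
    echo $(( $1 * 1024 * 1024 / $2 * 512 / 1024 / 1024 / 1024 ))
}

dedup_ram_gib 1024 128   # 1 TiB in 128 KiB records -> 4 (GiB)
dedup_ram_gib 100 8      # 100 GiB zvol, 8 KiB blocks -> 6 (GiB, truncated from ~6.25)
```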

If you have insufficient memory (such as 2 GiB RAM), it may actually still work for a while, but:

* writes/deletes are very slow because the dedup table must be checked first; if that doesn't fit into the ARC cache along with everything else in there, many small read ops are incurred. You can improve the speed of this somewhat by adding a small, faster (typically SSD) cache device for L2ARC, and perhaps set secondarycache=metadata so that its space is prioritised for this purpose.

* removing a large snapshot, which happens asynchronously, will need to allocate lots of kernel memory (which cannot be paged out to any swap disks) and may crash your system unexpectedly some time later. After enough time and a few reboot cycles the situation should fix itself, but this is still very scary, and a reason to use dedup+snapshots *only* when you do have enough memory: http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/47531
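The L2ARC suggestion above can be sketched as follows, assuming a pool named tank and an SSD at /dev/da2 (both hypothetical names):

```shell
# Add a small, fast SSD as an L2ARC cache device to the pool.
zpool add tank cache da2

# Prefer metadata (which includes the dedup table) in the L2ARC.
zfs set secondarycache=metadata tank
```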

Typical space savings from deduplication may be 5-20%, unless your data really is duplicated many times over, so consider whether it is worth it. Just adding a few more disks may be a cheaper way to increase storage space than adding an SSD cache to counter the performance loss, or the extra RAM to avoid downtime from the issue described above.

Encryption

A ZFS pool can work on geli-encrypted disks or partitions. You cannot boot directly from this, but you can attach the geli devices early in the boot process, then 'zpool import' the encrypted zpool, and its filesystems will be mounted.
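A sketch of the manual steps, assuming the encrypted provider is /dev/da1 and the pool is named tank (both hypothetical names):

```shell
# Attach the encrypted device; this prompts for the passphrase
# and creates the decrypted provider /dev/da1.eli.
geli attach /dev/da1

# Import the pool from the decrypted device; its filesystems mount automatically.
zpool import tank
```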

Partitioning

It is better to format whole disks (e.g. /dev/da0, /dev/da1) for ZFS, but this is not always possible: you cannot install the GRUB bootloader to a disk without partitioning it.

MSDOS partitions

msdos partition tables work, but with caveats:

You can't partition more than 2 TiB of space on a 512-byte sector disk (4096-byte sector disks allow 16 TiB). Use gpt in that case.

Alignment on SSD/flash media should not be a problem any more as of Wheezy since most tools snap to a 1 MiB boundary. That should also leave a large enough embedding area for GRUB before the first partition.

The FreeBSD kernel can't use a logical partition as a root filesystem, or (probably; unconfirmed) as a volume in a ZFS pool. Therefore avoid creating /dev/da0s2 as an extended partition as per MS-DOS/Windows convention; you can instead create up to four primary partitions.

Old BIOS limitations

Old (~10+ year) PC BIOSes might not support LBA, and so cannot address beyond the 1024th cylinder. On USB flash media that could be only 1 GiB at the start of the disk; on other disks this may be up to 8 GiB. Everything necessary for GRUB (filesystems containing /boot and /lib/modules) needs to be entirely within that region. Otherwise GRUB may hang at boot, or will be unable to find or load kernels or modules, depending on which disk blocks they're stored in.

FreeBSD documentation contains a reference to that issue dating from 1997; the limitation applies to the bootloader, when loading the kernel image and modules; once the kernel itself is running it can mount the root and other filesystems from anywhere on disk: http://svnweb.freebsd.org/base/release/5.0.0/usr.sbin/sysinstall/help/drives.hlp?annotate=23516#l70

Separate /boot partition

I recommend that /boot be the first partition. It makes sense because the partition table and GRUB boot blocks are within the first 1 MiB of the disk already.

GRUB will probably need to load kernel modules from /lib/modules. You could either create another special partition for it, or keep it on the root filesystem (subject to the size constraints above), or most conveniently you can move+symlink /lib/modules to /boot/modules (which update-grub2 will detect).
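The move+symlink option can be done like this (a sketch; run it with the system quiescent, and re-run update-grub2 afterwards):

```shell
# Move kernel modules onto the /boot filesystem, where GRUB can reach them,
# and leave a symlink behind so the rest of the system still finds them.
mv /lib/modules /boot/modules
ln -s /boot/modules /lib/modules
```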

Bug #651624

Partitions for ZFS may cause a problem if they extend to the very end of the disk. I suggest deliberately leaving at least 1 MiB of space when partitioning, or just make sure it is not the last partition.

Example

Example of an 80 GiB disk with an msdos partition table:

  0-1 MiB          -            -     MBR and GRUB boot blocks
  1-128 MiB        /dev/da0s1   ufs   /boot, containing /boot/modules (symlinked from /lib)
  128-1024 MiB     /dev/da0s2   swap
  1024-81919 MiB   /dev/da0s3   zfs   root filesystem
  81919-81920 MiB  (gap)

GPT partitions

GPT partition tables work well.

Unfortunately, for a PC BIOS to see a gpt disk as bootable, it may need a dummy msdos (legacy) partition table, which you can create with gptsync. The constraint mentioned above still applies to old, non-LBA BIOSes, so the filesystems for /boot and /lib/modules need entries in the legacy partition table, and need to fit entirely within the first 1 GiB of the disk (worst case).
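For example, assuming the disk is /dev/da0:

```shell
# Write a legacy MBR mirroring the first few GPT entries,
# so that an old BIOS sees the disk as bootable.
gptsync /dev/da0
```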

GRUB will need a BIOS Boot Partition to install to, which is recommended to be at the start of the disk. It is advisable that the filesystem(s) containing /boot and /lib/modules follow immediately after, to make sure there is space for them in the legacy partition table and within the first 1 GiB of the disk.

The remaining partitions are not limited in any way. The FreeBSD kernel understands the gpt partitioning scheme and can mount the root and other filesystems from it.

The above mentioned ZFS bug is not an issue with gpt, because the backup partition table will occupy the last LBA-addressable block on the disk.

NFS

The NFS server on Wheezy GNU/kFreeBSD (freebsd-nfs-server) refuses connections from Wheezy GNU/Linux clients when the NFS version is not specified: the client defaults to NFS v4, and the server refuses the mount request.

The solution is to add the -o vers=3 option to the mount command on the client, like this:

  $ sudo mount -t nfs -o vers=3 $SERVER_IP:/path/to/share /local/mount
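To make this permanent, an /etc/fstab entry on the client might look like this (server name and paths are placeholders):

```shell
# /etc/fstab fragment on the GNU/Linux client
server:/path/to/share  /local/mount  nfs  vers=3  0  0
```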

Make sure to also append -h 0.0.0.0 to the rpc.lockd arguments in its init file, as per 664812 (fixed in Wheezy).

Shutdown initscripts for NFS may be slow: see 664812

If you are using e.g. an OpenELEC client, you will need to enable TCP connections to nfsd; without this, mounting the NFS shares on OpenELEC will not work (tested with OpenELEC 2.0). You can do that by creating an /etc/default/nfsd file containing:

  DAEMON_ARGS="-t -u"

I also recommend the proto=tcp mount option on clients (which avoids path MTU issues, and more).