FileSystem > Btrfs
Btrfs was created to address the lack of pooling, snapshots, checksums, and integrated multi-device spanning in Linux file systems, particularly as the need for such features emerged when working at the petabyte scale. It aspires to be a multipurpose filesystem that scales well from massive block devices all the way down to cellular phones (Sailfish OS and Android). Because all reads are checksum-verified, Btrfs takes care to ensure that one's backups are not poisoned by silently corrupted source data—ZFS similarly ensures data integrity. Btrfs also pioneered the use of Reflinks (btrfs.readthedocs.io), which XFS later gained support for.
Btrfs has been part of the mainline Linux kernel since 2.6.29, and Debian's Btrfs support was introduced in DebianSqueeze.
Ext2/3/4 filesystems are upgradeable to Btrfs; however, upstream recommends backing up the data, creating a pristine btrfs filesystem with wipefs -a and mkfs.btrfs, and restoring from backup -- or replicating the existing data (eg: using tar, cpio, rsync et al).
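Sketched as a shell session, the recommended recreate-and-restore path might look like this (device names, mountpoints, and the choice of rsync are illustrative assumptions):

```shell
# Assumes the data on /dev/sdXN has already been backed up to
# /mnt/backup and the backup has been verified as restorable.
umount /dev/sdXN
wipefs -a /dev/sdXN    # clear old filesystem signatures
mkfs.btrfs /dev/sdXN   # create a pristine btrfs filesystem
mkdir -p /mnt/new
mount /dev/sdXN /mnt/new
# Restore with a tool that preserves ownership, ACLs, xattrs, and hardlinks:
rsync -aAXH /mnt/backup/ /mnt/new/
```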
Btrfs single, DUP, and raid1 profiles have been reliable since Linux 4.4, so long as the #Recommendations are observed and the pitfalls documented here are avoided (TODO: Refactor this document to make it easier to become apprised of the pitfalls and corner cases).
"Google is evaluating btrfs for its potential use in android, but currently the lack of native file-based encryption unfortunately makes it a nonstarter" (Filip Bystricky, linux-btrfs, 2017-06-09). It appears that ChromeOS has been using btrfs since version 69 (ubuntu.com,"Using LXD on Your Chromebook").
Facebook has "now deployed [btrfs] on millions of servers, driving significant efficiency gains", because "btrfs helped eliminate priority inversions caused by the journaling behavior of the previous filesystem, when used for I/O control with cgroup2", and "btrfs is the only filesystem implementation that currently works with resource isolation" (Facebook open-sources new suite of Linux kernel components and tools, code.fb.com, 2018-10-30).
Linux ≥ 5.5 and btrfs-progs ≥ 5.4 finally bring support for checksum algorithms that are stronger than CRC32C. xxHash, SHA256, and BLAKE2 are supported with kernel+btrfs-progs newer than these. Additionally, with these releases raid1c3 and raid1c4 profiles have finally been introduced. These new profiles have enhanced data redundancy. Where the well-tested raid1 profile supports two copies on N devices, raid1c3 supports three copies, and raid1c4 supports four.
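These features are selected at mkfs time. A sketch, assuming three illustrative devices and a kernel/btrfs-progs pair at least as new as the versions above:

```shell
# Stronger checksum plus three copies of both data and metadata.
# Volumes created this way cannot be mounted by kernels older than 5.5.
mkfs.btrfs --csum xxhash -m raid1c3 -d raid1c3 \
    /dev/sda1 /dev/sdb1 /dev/sdc1
```

Note the caution below about raid1c3/raid1c4 before relying on this in production.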
In the future, when a failing disk needs to be replaced it will be possible to add a third disk, rebalance to raid1c3, then drop the failing disk from the array. The degraded chunk problem is a long-term outstanding issue that has not been solved in any kernel as of 2022-06-07 (TODO: update this page when it's fixed). Thus a manual rebalance to raid1 profile is necessary if ever a disk is lost; a manual full rebalance rewrites all data and metadata. Also, as of this date, the use of raid1c3 and raid1c4 is not recommended because "we didn't really test read-repair that well" (Qu Wenruo, linux-btrfs, 2022-06-07); please do not use it on Bullseye!
A partial list of organisations who use btrfs in production can be found on btrfs.wiki.kernel.org's "Production Users" page.
Somewhat official upstream status is available here: The Btrfs Wiki: Status.
The DebianInstaller can format and install to single-disk Btrfs volumes, but does not yet support multi-disk btrfs volumes nor subvolume creation (Bug #686097). Daniel Pocock has a good article on how to Install Debian wheezy and jessie directly with btrfs RAID1; however, strictly speaking it showcases Btrfs' integrated multi-device flexibility. eg: Install to a single disk, add a second disk to the volume, rebalance while converting all data and metadata to raid1 profile.
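That single-disk-to-raid1 workflow can be sketched as follows (device name illustrative):

```shell
# After installing to a single disk, add a second device to the volume...
btrfs device add /dev/sdb1 /
# ...then rebalance, converting all data and metadata to raid1 profile.
btrfs balance start -dconvert=raid1 -mconvert=raid1 /
```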
So long as advanced features such as zstd compression are not enabled, raid1-profile Btrfs volumes created on msdos or gpt partitions are bootable using grub-pc or grub-efi without a dedicated /boot, and it should also be possible to boot from a volume created on a raw disk using grub-pc. If booting with EFI firmware, consult UEFI for additional requirements. When booting using EFI, with rootfs on btrfs, the creation of an EFI system partition is essential.
While support for swap files was added to linux-5.0, it is highly recommended to use a dedicated swap partition. Furthermore, enabling swap using a virtual block (loop) device is dangerous, because this "will only cause memory allocation lock-ups" (Martin Raiber, linux-btrfs).
DebianBullseye and DebianBuster have good btrfs support out-of-the-box. DebianStretch users are urged to upgrade to a newer release. This document will soon no longer document pitfalls that affect Stretch or older releases.
If one requires btrfs features from a newer kernel on a Debian stable system, then it is safer to track the latest LTS kernel rather than the latest version in backports, because tracking the linux-image backport has resulted in bugs which force a reboot and/or corrupt data. In addition to his extensive work fixing these regressions, Qu Wenruo writes "That's why we have LTS, and commercial company baked kernels" (linux-btrfs, 2019-12-29). In other words, to avoid regressions, use an LTS kernel, either vendor-supported (eg: DebianStable) or from kernel.org. DebianTesting and DebianUnstable are also necessarily affected by this issue, because they track the latest stable upstream kernel. That said, if one would like to participate in the effort to debug and stabilise btrfs and can risk encountering serious (but thankfully no longer grave) bugs, please use the newest kernel available to you. The upstream mailing list linux-btrfs appreciates this!
Users who do not yet have a backup strategy in place are urgently recommended to consult BackupAndRecovery, and to regularly verify that these backups are restorable.
Raid5 and Raid6 Profile Status
2020-10-06 Update [edited 2022-01-09]
The developers recommend against using the Raid 5 and Raid 6 profiles in production at this point. Large-scale data loss is seen as unlikely, but anyone building such an array should be prepared to tolerate occasional downtime or single-file restores from backup.
For btrfs metadata (-m flag on mkfs), these profiles carry an increased risk of corruption, use raid1 instead (or raid1c3/raid1c4 when building a Raid-6). Metadata should *never* be rebalanced.
For btrfs data (-d flag), experimentation with raid5 and raid6 is encouraged for users who have full, up-to-date backups, do a full read of the volume (eg: provided by a weekly full backup) and/or a scrub at least once a month, are willing to tolerate the occasional manual intervention, are comfortable asking for help upstream when things go wrong, and will never run btrfs check --repair unless advised to by an upstream developer. Everyone else needing multi-disk support should choose raid1c2/3/4 (for data and metadata) or raid10 profiles for the foreseeable future.
Quoting from Zygo Blaxell's email:
- Reads don't work properly in degraded mode.
- The admin tools are incomplete.
- The diagnostic tools are broken.
- It is not possible to recover from all theoretically recoverable failure events …The known issues are related to availability (it crashes a lot and isn't fully usable in degraded mode)
Refer to the full thread at linux-btrfs (Re: using raid56 on a private machine, 2020-10-05).
- Layering btrfs volumes on top of LVM may be implicated in passive causes of filesystem corruption (Needs citation -- Nicholas D Steeves).
There is currently (2019-07-07, linux ≤ 5.1.16) a bug that causes a two-disk raid1 profile to forever become read-only the second time it is mounted in a degraded state—for example due to a missing/broken/SATA link reset disk (unix.stackexchange.com, How to replace a disk drive that is physically no more there?).
As an alternative, Adam Borowski has submitted [PATCH] [NOT-FOR-MERGING] btrfs: make "too many missing devices" check non-fatal to linux-btrfs, which addresses this issue; the issue is also addressed by Qu Wenruo's yet-unmerged Btrfs: Per-chunk degradable check patch. The thread surrounding Borowski's patch is an excellent introduction to the debate surrounding whether or not btrfs volumes should be run in a degraded state.
This bug is still a major pitfall as of 2022-06-07, in all available kernels. See the thread Tried to replace a drive in a raid 1 and all hell broke loose (linux-btrfs, 2022-05-23) for more information. Tldr: mounting degraded (eg: unplugging one drive of a raid1 pair as a test) precipitates the creation of single rather than raid1 block groups. These single block groups must be manually rebalanced to raid1, or data will be lost if the disk that holds the single copy experiences any fault (for example if it is then unplugged as a robustness test). Unfortunately this rebalance may also necessitate rebalancing metadata, which is generally considered to be risky.
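Once the volume is healthy again, the stray single block groups can be converted back to raid1 with a filtered balance (mountpoint illustrative; the "soft" filter skips chunks already in the target profile):

```shell
# Convert leftover single-profile data block groups back to raid1.
btrfs balance start -dconvert=raid1,soft /mnt
# If single metadata block groups were also created, they too must be
# converted, though metadata balances are generally considered risky:
# btrfs balance start -mconvert=raid1,soft /mnt
btrfs balance status /mnt
```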
While using btrfs with bcache increases performance, in the past bcache has introduced grave errors such as Regression in 4.14: wrong data being read from bcache device (Pavel Goran, linux-bcache, 2017-11-16).
- Quotas and qgroups should not be used if one expects reasonable performance.
Subvolumes cannot yet be mounted with different btrfs-specific options; the first line for a given volume in /etc/fstab takes effect. eg: one cannot mount / with noatime and /var with nodatacow,compress=lzo (Btrfs Wiki, Mount options).
At present, nodatacow implies nodatasum; this means that anything with the nodatacow attribute does not receive the benefits of btrfs' checksum protection and self-healing (for raid levels >= 1). Disabling CoW (Copy on Write) means that a VM disk image will not be consistent if the host crashes or loses power. Nodatacow also carries the following additional danger on multidisk systems: because nodatasum is disabled there is no way to verify which disk in a two-disk raid1 profile volume contains the correct data. After a crash there is a roughly 50% probability that the bad copy will be read on each request; this is equivalently bad to MD RAID1's "the non-failed copy is always good" assumption. Consequently, it is almost always preferable to disable COW in the application, and nodatacow should only be used for disposable data.
Fedora enabled zstd compression, by default, in Fedora Workstation 34 (Linux 5.11.12), and this was announced in Fedora Workstation 34 feature focus: Btrfs transparent compression (fedoramagazine.org, 2021-04-14). If no major bugs have been found by the time DebianBookworm is released, then this advanced feature may begin to be recommended at that time.
- Mounting with -o autodefrag will often duplicate reflinked or snapshotted files when a balance is run.
- Any "btrfs filesystem defrag" operation can potentially duplicate reflinked or snapshotted blocks. Files with shared extents lose their shared reflinks, which are then duplicated with n-copies. The effect of the lack of "snapshot aware defrag" is that volumes that make heavy use of reflinks or snapshots will unexpectedly run out of free space. Avoid this by minimizing the use of snapshots, and instead use deduplicating backup software to store backups efficiently (eg: borgbackup).
And others from The Btrfs WikiGotchas
As a btrfs volume ages, its performance may degrade. This is because btrfs is a Copy On Write file system, and all COW filesystems eventually reach a heavily fragmented state—including ZFS, where free space becomes fragmented. Over time, frequently appended or updated-in-place files will become split across tens of thousands of extents. This affects unrotated logs and databases such as those used by Firefox and a variety of common desktop software. Fragmentation can be a major contributing factor to why COW volumes become slower over time.
ZFS addresses the performance problems of fragmentation using an intelligent Adaptive Replacement Cache (ARC), but the ARC requires massive amounts of RAM and only speeds up access to the hottest (most frequently and consistently accessed) data/metadata. When ZFS becomes so fragmented that the ARC cannot compensate for the slowness, the only way to recover performance is to send the pool to a new ZFS volume (or destroy the pool and rebuild from backups). Btrfs took a different approach and benefits from—some would say requires—periodic defragmentation. Btrfsmaintenance can be used to automate defragmentation and other btrfs maintenance tasks.
That said, it is almost certain that a database's own defragmentation/compaction operation will result in a better on-disk layout than "btrfs filesystem defragment". For example, "notmuch compact" produces a state that has between 10x and 100x fewer extents than the "btrfs filesystem defragment" equivalent.
Performance Considerations and Tuning
While segmenting datasets using subvolumes will usually speed up operations that require walking the backref tree, creating too many snapshots has the opposite effect. Too many snapshots (NOTE: check snapper's snapshot retention policy) will cause performance to crash somewhere between 12 snapshots per subvolume and/or 100 subvolumes per volume—including all snapshots (btrfs.wiki.kernel.org, Having many subvolumes can be very slow). An obscene number of snapshots can also sometimes wedge the volume into an unmountable state.
In the linux-btrfs thread Re: Understanding BTRFS RAID0 Performance (2018-10-08), Austin S. Hemmelgarn writes "If you can find some way to logically subdivide your workload, you should look at creating one subvolume per subdivision. This will reduce lock contention (and thus make bumping up the thread_pool option actually have some benefits)". So for example, on a combined web and mail server, /var/www/html and the location where maildirs are stored (usually /home/$user/Maildir or /var/spool/mail/$user) should be on different subvolumes (eg: make subvolumes named "@html", and "@home" or "@mail").
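As a sketch, assuming the top level of the volume is mounted at /btrfs-admin (mountpoint and label illustrative):

```shell
# Create per-workload subvolumes at the top level of the volume.
btrfs subvolume create /btrfs-admin/@html
btrfs subvolume create /btrfs-admin/@mail
# Then mount them at their working locations via /etc/fstab, eg:
#   LABEL=srv  /var/www/html    btrfs  defaults,subvol=@html  0  0
#   LABEL=srv  /var/spool/mail  btrfs  defaults,subvol=@mail  0  0
```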
- Use maildirs and not mbox spool files.
SAMBA in Buster and newer supports "the Btrfs clone-range IOCTL…File data does not traverse network or disk" when copying or moving a file within a SAMBA share, so long as "Btrfs Enhanced Server-Side Copy Offload" is enabled (Samba Wiki, Server-Side Copy).
Dpkg and thus apt can be slow on btrfs (Bug #635993). This is important to know if you want to use btrfs as your root filesystem on stable, and it is extremely annoying if you are using unstable or want to run sbuild. eatmydata helps a lot here, but a power failure can leave you with a broken dpkg database (citation needed, because the database can be recovered from a previous btrfs transaction--unless discard is enabled).
- Configuring sbuild to use overlayfs+tmpfs solves this issue, and has been flawless since mid-2017 (NDS, confirmed by two DDs, personal experience since mid 2017, and no btrfs+overlayfs bugs noted on linux-btrfs for LTS kernels during this time).
- Mounting with -o compress will amplify fragmentation. All COW filesystems necessarily fragment. There is also a positive correlation between the number of snapshots and the degree of fragmentation. Fragmentation manifests as higher than expected CPU usage on SSDs and increased read latency on rotational disks. The focus of btrfs development has recently been on stabilisation, bug fixes, and core features; however, the groundwork for seek optimisations and load balancing between disks will hopefully be merged by 2023 (TODO: has it been merged yet?). This said, some workloads show marked benefit from compression!
- Is there anything I can do to improve system responsiveness while running a scrub, balance, or defrag?
Yes, but only if the old CFQ or new blk-mq BFQ scheduler is enabled for the affected btrfs drives, because the "idle" ionice class is exclusively supported by these two schedulers.
echo -n cfq > /sys/block/sdX/queue/scheduler
# or
echo -n bfq > /sys/block/sdX/queue/scheduler
If the change is positive, make it permanent using /etc/rc.local, or a udev rule.
- Btrfs makes my desktop slow, is there anything I can do to restore a snappy feeling?
Yes, but at the potential cost of reduced interactivity during scrub, balance, and defrag operations.
echo -n deadline > /sys/block/sdX/queue/scheduler # mq-deadline is the default scheduler in Buster
Use your preferred method to make this permanent (eg: /etc/rc.local, or a udev rule). Please note that using deadline on a rotational boot disk is not a panacea for all btrfs performance issues, and this is very much a case of "your mileage may vary". (2022-01-09: mq-deadline and [mq]bfq appear to be equally bad when running indexing software, eg: Tracker, Baloo, Recoll)
- I want to run defrag to restore performance. Is it possible to reduce the incidence of unexpected out-of-space errors?
Yes. The default target extent size of defrag primarily affects files ≤32MB, because "on an average aged filesystem…whole files [are overwritten] breaking the reflinks" (David Sterba, btrfsmaintenance issue #43, 2018-01-23). Alternatively, skip all extents larger than the most reflinked files with btrfs filesystem defrag -t 4M /mountpoint. Also, care should be taken to only defrag source subvolumes, and to never defrag their snapshots. This may be automated with Btrfsmaintenance.
Many people have experienced upwards of six years of btrfs usage without issue, and this wiki page will continue to be updated with configuration recommendations known to be good and cautions against those known to cause issues.
- Use two (ideally three) equally sized disks, partition them identically, and add each partition to a btrfs raid1 profile volume.
- Alternatively, dedicate 1/3 of the disks to holding backups, because btrfs raid1 does not yet provide much benefit in throughput or IOPS.
Alternatively use raid1c3 or raid1c4 (Qu Wenruo, linux-btrfs, 2022-06-07)
- Do not enable or use transparent filesystem compression with a mount option, nor in fstab, nor with "chattr +c", nor with "btrfs filesystem defrag".
- Do not use quotas/qgroups.
- Keep regular backups, and periodically test that they are restorable.
- Do not enable mount -o discard, autodefrag.
- Overprovision an SSD when partitioning so periodic trim will not be required (SSD firmware does background garbage collection).
- Periodically run btrfs defrag against source subvolumes.
- Never run btrfs defrag against a child subvolume (eg: snapshots).
- Ensure that the number of snapshots per volume/filesystem never exceeds ~12; two or three times that might not cause ill effects, but keeping the count in the small double digits provides the greatest odds for avoiding morbid performance issues and out of space conditions. On the upside, many more btrfs snapshots can be taken before performance suffers when compared to LVM.
- Take care to keep a minimum of 10GB of free space per volume, at all times. If this minimum free space is not maintained then it will become necessary to run periodic balances to consolidate fragmented free space into contiguous chunks, and consequently performance will become less predictable (ie: poor). Also, with insufficient free space, btrfs has no "work-space" and without this work-space snapshots cannot be deleted; consequently, it may become impossible to delete snapshots. The workaround for this is to temporarily add ≥10GB of storage to the affected btrfs volume. On the upside, this is better than ZFS, whose performance will permanently crash once filled past a certain point (recreating the dataset is required)
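Free and unallocated space can be checked at a glance (mountpoint illustrative):

```shell
# Per-device allocation, including unallocated space and raid overhead:
btrfs filesystem usage /btrfs_mountpoint
# Quick summary of data/metadata/system chunk usage:
btrfs filesystem df /btrfs_mountpoint
```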
- Which package contains the tools?
btrfs-progs. Most interaction with Btrfs' advanced features requires these tools.
- Does btrfs really protect my data from hard drive corruption?
Yes, but this requires at least two disks in raid1 profile, aka raid1c2. Without at least two copies of data, corruption can be detected but not corrected. Btrfs raid5 or raid6 profiles will not protect your data, as of early 2022. Additionally, like for mdadm or lvm raid, "take care to make sure that the SCSI command timer (a kernel setting per block device) is longer than the drive's SCT ERC setting...If the command timer is shorter, bad sectors will not get reported as read errors for proper fixup, instead there will be a link reset and it's just inevitable there will be worse problems" (Chris Murphy, 2016-04-27, linux-btrfs). The Debian bug for this issue can be found here. For now do the following for all drives in the array, and then configure the system to set the SCSI command timer automatically on boot:
cat /sys/block/<dev>/device/timeout
smartctl -l scterc /dev/<dev>
# echo -n ((the scterc value)/10)+10 to /sys/block/<dev>/device/timeout
The default value is 30 seconds, which should be fine for disks that support SCT and likely have low timeout values like 7 seconds. For disks that fail smartctl -l scterc, and thus do not support SCT, set the timeout value to 120. Consider a timeout of 180 to be extra safe with large consumer-grade disks.
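One way to apply this automatically on boot is a udev rule; a sketch, where the rule filename and the 180-second value are assumptions to adjust for your drives:

```shell
# Illustrative rule file; tune the timeout per the advice above.
cat > /etc/udev/rules.d/60-scsi-timeout.rules <<'EOF'
ACTION=="add", SUBSYSTEM=="block", KERNEL=="sd[a-z]", \
  RUN+="/bin/sh -c 'echo 180 > /sys/block/%k/device/timeout'"
EOF
udevadm control --reload
```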
- Does btrfs support SSD optimizations?
Yes. For more details on using SSDs with Debian, refer to SSDOptimization. NOTE: Using "-o discard" with btrfs is generally unsafe. For an up-to-date discussion relevant to anything before Debian 11/bullseye see [LSF/MM TOPIC] More async operations for file systems - async discard (linux-btrfs via spinics).
- What are btrfs' raid1, raid10, raid1c3, and raid1c4 profiles?
- The first two are not classical RAID1 nor RAID10, but rather 2 or more copies distributed on n devices. Adding more devices does not make more copies; adding devices increases the size of the volume; both raid1 and raid10 profiles always make 2 copies. Adding more devices to increase redundancy is what upstream calls "raid1 profile n-copies" and was implemented in Linux 5.5 as raid1c3 (3 copies) and raid1c4 (4 copies); these features are incompatible with prior kernels. Btrfs' raid10 profile is currently unoptimised and usually performs identically to or worse than the same disks in raid1 profile. Given the raid10 profile's added complexity, it is clear that raid1 should continue to be preferred at this time.
- Does btrfs support compression?
Yes, but consider this functionality "under development" (IIRC: Fedora enables zstd by default, but it's a bleeding edge type distribution. Needs citation NDS). Add compress=lzo, compress=zlib, or compress=zstd, according to the priority of throughput (lzo), best compression and fewest bugs (zlib), or something in between the two (zstd). If "=choice" is not specified then zlib will be used:
/dev/sdaX / btrfs defaults,compress=choice 0 1
Change /dev/sdaX to the actual root device (UUID support in btrfs is a work-in-progress, but it works for mounting volumes; use the command blkid to get the UUID of all filesystems). Labels are also supported.
- How do I use per-directory transparent compression?
btrfs filesystem defragment -r -v -clzo /var
chattr +c /var
Adding the +c attribute ensures that any new file created inside the folder is compressed. Existing files will not be compressed; to compress existing files use "btrfs defrag".
- How do I work around systemd mount timeouts for large btrfs filesystems?
This may indicate a more serious issue, and should be investigated (ie run btrfs-check, count subvolumes, reconsider use of qgroups, raid56, etc); however, the mount timeout can be worked around by adding "x-systemd.mount-timeout=90" (or longer). Thanks to Graham Cobb for the report and the workaround in Bug #955413.
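For example, a hypothetical /etc/fstab entry might look like the following (UUID, mountpoint, and the 180-second value are placeholders; get the real UUID with blkid):

```shell
# Append an entry with a longer systemd mount timeout (illustrative values).
echo 'UUID=xxxx-placeholder /srv/big btrfs defaults,x-systemd.mount-timeout=180 0 0' >> /etc/fstab
```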
- What are the recommended options for installing to a pendrive, an SD card or a slow SSD drive?
When installing, use manual partitioning and select btrfs as the file system. On first boot, edit /etc/fstab with these options for possible improvements in throughput and latency (compression is used here because it is assumed that the pendrive contains throwaway or easily replaceable data). Zstd generally compresses better than lzo.
/dev/sdaX / btrfs noatime,compress=lzo,commit=0,ssd_spread,autodefrag 0 0
- But I have a super-small pendrive and keep running out of space! Now what?
Using another system, try something like this if your device is small (see the note above regarding compression and throwaway data):
mkdir /tmp/pendrive
mount /dev/sdX -o noatime,ssd_spread,compress /tmp/pendrive
btrfs sub snap -r /tmp/pendrive /tmp/pendrive/tmp_snapshot
btrfs send /tmp/pendrive/tmp_snapshot > /tmp/pendrive_snapshot.btrfs
umount /tmp/pendrive
wipefs -a /dev/sdX
mkfs.btrfs --mixed /dev/sdX
mount /dev/sdX -o noatime,ssd_spread,compress /tmp/pendrive
btrfs receive -f /tmp/pendrive_snapshot.btrfs /tmp/pendrive
# Convert snapshot into writeable subvolume
btrfs property set -ts /tmp/pendrive/tmp_snapshot ro false
# Rename subvolume
mv /tmp/pendrive/tmp_snapshot /tmp/pendrive/rootfs
# Alternatively, this conversion can be done thus:
# btrfs subvolume snap /tmp/pendrive/tmp_snapshot /tmp/pendrive/rootfs
# btrfs subvolume delete /tmp/pendrive/tmp_snapshot
# Now edit /tmp/pendrive/rootfs/etc/fstab to
# 1) Update UUID if using UUIDs
# 2) Use the "noatime,ssd_spread,compress" mount options
sync
btrfs fi sync /tmp/pendrive/
Now follow the procedure for enabling / on a subvolume. Also, the bootloader needs to be reinstalled if your pendrive is a bootable OS drive and not just a data drive (--TODO: Needs to be written).
- Can I encrypt a btrfs installation?
Yes, you can by selecting manual partitioning and creating an encryption (LUKS) volume and then a btrfs file system on top of that. For the moment, btrfs does not support direct encryption so the installer uses cryptsetup, but this is a planned feature, and experimental patches have recently been submitted to enable this (Anand Jain, linux-btrfs, Add btrfs encryption support) (—TENTATIVE CONCLUSION: I have tested btrfs on LUKS1, and LUKS2 using cryptsetup, without LVM. Since June 2019 I have experienced no issues on my desktop, so I have also transitioned my laptop to this setup. In both cases, this was initially with linux-4.19.x, then linux-5.4.x, and now linux-5.10.x. LUKS appears to make poor performance corner cases much worse, but the combination has flawlessly survived countless unplanned hard power-offs with disks that have mostly truthful firmware. --Nicholas D Steeves)
- Does btrfs work on RaspberryPi?
Yes, possibly improving filesystem I/O responsiveness. One may have to convert the filesystem to btrfs first from a PC and change the filesystem type from ext4 to btrfs in /etc/fstab before the first boot. See above for recommended sdcard /etc/fstab options.
- Fsck.btrfs doesn't do anything; how do I verify the integrity of my filesystem?
Rather than a fsck, btrfs has two methods to detect and repair corruption. The first method executes as a background process on a mounted volume. It verifies the checksums of all data and metadata; if a checksum fails, the block is marked as bad, and if a good copy is available on another device the scrub updates the bad copy using the good one: it heals the corruption. This operation runs at a default IO priority of idle, which strives to minimize the impact on other active processes; nevertheless, like any IO-intensive background job, it is best run at a time when the system is not busy. Scrubs may be automated by Btrfsmaintenance. To manually initiate a scrub:
btrfs scrub start /btrfs_mountpoint
To monitor its progress:
btrfs scrub status /btrfs_mountpoint
The second method checks an unmounted filesystem. It verifies that the metadata and filesystem structures of the volume are intact and uncorrupted. It should not usually be necessary to run this type of check. Please note that it runs read-only; this is by design, and there are usually better methods to recover a corrupted btrfs volume than the dangerous "--repair" option. Please do not use "--repair" unless an upstream linux-btrfs developer has assured you that it is the best course of action. To run a standard read-only metadata and filesystem structure verification:
btrfs check -p /dev/sdX
btrfs check -p /dev/disk/by-partuuid/UUID
- How can I quickly check to see if my btrfs volume has experienced errors, with per-device accounting of any possible errors?
Get an at-a-glance overview of all devices in your pool with the following:
btrfs dev stats /btrfs_mountpoint
Command output for a healthy two device raid1 volume should show all zeroes, like this:
[/dev/sdb1].write_io_errs 0
[/dev/sdb1].read_io_errs 0
[/dev/sdb1].flush_io_errs 0
[/dev/sdb1].corruption_errs 0
[/dev/sdb1].generation_errs 0
[/dev/sdc1].write_io_errs 0
[/dev/sdc1].read_io_errs 0
[/dev/sdc1].flush_io_errs 0
[/dev/sdc1].corruption_errs 0
[/dev/sdc1].generation_errs 0
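If errors were found and subsequently repaired (eg: via a scrub or device replacement), the counters can be zeroed so that any new errors stand out (mountpoint illustrative):

```shell
# Print the current per-device error counters, then reset them to zero.
btrfs device stats -z /btrfs_mountpoint
```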
- COW on COW: Don't do it!
This includes unionfs, databases that do their own COW, certain cowbuilder configurations, and virtual machine disk images like qcow. Please disable COW in the application if possible. Schroot+overlayfs has been safe since linux 4.9. For QEMU, refer to qemu-img(1) and take care to use raw images. If this is not possible, COW may be disabled for single empty directory like this
mkdir directory
chattr +C directory
Newly created files in this directory will inherit the nodatacow attribute. Alternatively, nodatacow can be applied to a single file, but only for empty files
touch file
chattr +C file
Please read the earlier warning about using nodatacow. Applications that support integrity checks and/or self-healing can somewhat mitigate the risk of nodatacow, but please note that nodatacow files cannot be "healed" in case of corruption.
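Whether the attribute actually took effect can be verified with lsattr (directory name illustrative):

```shell
mkdir vm-images
chattr +C vm-images   # only meaningful on an empty directory on btrfs
lsattr -d vm-images   # a "C" in the flags indicates nodatacow
```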
- What happens if I mix differently sized disks in a btrfs raid profile?
"RAID1 (and transitively RAID10) guarantees two copies on different disks, always. Only dup allows the copies to reside on the same disk. This is guaranteed is preserved, even when n=2k+1 and mixed-capacity disks. If disks run out of available chunks to satisfy the redundancy profile, the result is ENOSPC and requires the administrator to balance the file system before new allocations can succeed. The question essentially is asking if Btrfs will spontaneously degrade into "dup" if chunks cannot be allocated on some devices. That will never happen." (Justin Brown, 2016-06-03, linux-btrfs).
- Why doesn't updatedb index /home when /home is on its own subvolume?
Consult this thread on linux-btrfs. The workaround I use is to have each top-level subvolume (id=5 or subvol=/) mounted at /btrfs-admin/$LABEL, where /btrfs-admin is root:sudo 750, and this is what I use in /etc/updatedb.conf:
PRUNE_BIND_MOUNTS="no"
PRUNENAMES=".git .bzr .hg .svn"
PRUNEPATHS="/tmp /var/spool /media /btrfs-admin /var/cache /var/lib/lxc"
PRUNEFS="NFS nfs nfs4 rpc_pipefs afs binfmt_misc proc smbfs autofs iso9660 ncpfs coda devpts ftpfs devfs mfs shfs sysfs cifs lustre tmpfs usbfs udf fuse.glusterfs fuse.sshfs curlftpfs"
(With the exception of LXC rootfses I have a flat subvolume structure under each subvol=/. These subvolumes are mounted at specific mountpoints using fstab. Given that updatedb and locate work flawlessly, I'm inclined to conclude that this is the least disruptive configuration. If I used snapper I'd add it to PRUNEPATHS and rely on its facilities to find files that had been deleted, because I don't want to see n-duplicates-for-file when I use locate. A user who wanted locate to return duplicate paths could omit the path from PRUNEPATHS. --NicholasDSteeves)
Old but still relevant References
The number of snapshots per volume and per subvolume must be carefully monitored and/or automatically pruned, because too many snapshots can wedge the filesystem into an out of space condition or gravely degrade performance (Duncan, 2016-02-16, linux-btrfs). There are also reports that IO becomes sluggish and lags with far fewer snapshots, eg: only 86/subvolume on linux-4.0.4; this might be fixed in a newer kernel (Pete, 2016-03-11, linux-btrfs).
This command must be run as root, and it is recommended to ionice it to reduce the load on the system. To further reduce the IO load, flush data after defragmenting each file using:
sudo ionice -c idle btrfs filesystem defragment -f -t 32M -r $PATH
- Create a "For Advanced Users" section. Raid56, qgroups, nodatacow, deduplication, warnings, and pitfalls should be moved to this section. Prominently note that users who use the defaults don't need to worry about any of this stuff.
- Warn about ways to innocently make a system unbootable, while experimenting?
- Write FAQ entry on "my array is so slow!" -- needs research into both bcache and upstream's recommendation of btrfs raid1 of raid0 (either mdraid or hardware raid) pairs.
Rewrite "Does it work on RaspberryPi?" to not use btrfs convert?
- Add "Tuning for throughput" on SSD and rotational disks, and also "Tuning for Latency" to "Performance Considerations and Tuning" section. Also link to some other source for tuning for latency, throughput using different knobs like VM tuning of dirty pages.
- Add/write/link to a HOWTO for migrating from a default installation to a system with subvolumes for / and /home. --> See Btrfs migration which uses @rootfs as root subvolume.
- TODO: Check the wiki page on SSDs and add a section on overprovisioning when partitioning. Giving the firmware more unallocated space to work with allows the SSD to maintain more consistent performance as the disk is filled; the benefit is particularly noticeable on lower priced SSDs. Some rotational hard drives also benefit from overprovisioning in a practice known as "short stroking".
- Merge the following info to fstab, and also add a manpage link for fstab (from bin:mount) to that page:
- (Remember: all fstab mount options must be comma separated but NOT space separated, so do not insert a space after the comma or the equal symbol).
In order to check if you have written the options correctly before rebooting and therefore before being in trouble, run this command as root:
mount -o remount /
If no error is reported, everything is OK. Never try to boot with a broken fstab, or you will have to recover it manually, which is a more complicated procedure.
- TODO: benchmark Tracker, Baloo, and Recoll on btrfs vs ext4 and work with upstream. Tracker's upstream is already aware that various assumptions are not valid on btrfs and produce serious and excessive I/O.
Primary manpages: btrfs(5) btrfs(8) mkfs.btrfs(8) btrfs-balance(8) btrfs-device(8) btrfs-filesystem(8) btrfs-property(8) btrfs-scrub(8) btrfs-show(8) btrfs-subvolume(8) btrfstune(8), and others from btrfs-progs.
Btrfs on Wikipedia
Btrfs mailing list: linux-btrfs@vger.kernel.org