
Translation(s): none


Multi HDD/SSD Partitioning Scheme

A commonly recommended partitioning scheme, optimized for storing serious work, uses at least three devices:

  • Internal SSD: mirrors and speeds up the static part of the system and the user's important work-data. Here we assume a 128GB SSD and use only about 120GB of it, so that enough free blocks are always available (this improves "overprovisioning" and avoids slow write performance).
  • Internal HDD (iHDD), contains the whole system.
  • External (removable) HDD (eHDD), mirrors the whole system (for example, when the laptop is docked).
  • Optional (external, removable) HDDs (oHDDs), to mirror the whole system, rolling-backup style.

/!\ Note that if you use USB devices instead of (e)SATA, you may still have to work around Debian bug #624343.

If you want the removable disks to resync quickly (syncing only the required changes) and write performance is not too crucial for you, you can daisy-chain several raid devices with individual write-intent bitmaps.

The chain on md1 may be visualized like this:

  md1 --- md10 --- md1(...) --- md1n --- iHDD
   |       |        |            |
   SSD     eHDD     oHDD(...)    oHDDn

Data is written to md1. Looking at the physical devices, on one side we have the SSD as a member of md1 that does not have a bitmap, to avoid excessive SSD wear. On the other side we have the iHDD, which holds the bitmaps of all md devices in between. If the oHDDs are disconnected most of the time (only connected temporarily to get synced) and we put them to the right, we can usually avoid writing md10 bitmap updates to other external disks.

The filesystem descriptions below do not include oHDDs. If you have one, set up an additional mdX1 raid 1 array (that contains a partition on the iHDD and one on the oHDD, and has a write-intent bitmap). These mdX1 arrays then replace the iHDD partition in the details presented below.

The partitions on the HDDs should be ordered as in the following list to keep frequently accessed areas closer together (reduce seeks):

  • boot-fs (md0)
  • root-fs (md1)
  • var-fs (md2)
  • swap (md3)
  • home-fs (md4)
  • work-data-fs (md5)
  • bulk-data-fs (md6)
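
A rough sketch of such an iHDD layout with parted, assuming /dev/sdb is the iHDD, 8GB of RAM (so about 12GB of swap) and a GPT label; all device names and sizes are only examples to be adjusted:

  parted -s /dev/sdb mklabel gpt
  parted -s /dev/sdb mkpart boot 1MiB 351MiB
  parted -s /dev/sdb mkpart root 351MiB 21GiB
  parted -s /dev/sdb mkpart var 21GiB 36GiB
  parted -s /dev/sdb mkpart swap 36GiB 48GiB
  parted -s /dev/sdb mkpart home 48GiB 53GiB
  parted -s /dev/sdb mkpart work-data 53GiB 153GiB
  parted -s /dev/sdb mkpart bulk-data 153GiB 100%
  parted -s /dev/sdb set 1 raid on    # repeat "set N raid on" for each partition used by mdadm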

If /var is synced to a persistent ramdisk to avoid excessive wear on the SSD (or remains only on the HDDs), a failed SSD can still be fully reconstructed from the HDD. However, a failed HDD can only be fully reconstructed from an external HDD mirror, or from a backup. (Your latest work-data can nevertheless be reconstructed from the SSD, even if the iHDD, and with it the operating system, suddenly stops working while no eHDD is attached.)

The filesystems in more detail:

350MB boot-fs (md0(SSD, iHDD, eHDD), mounted at /boot): If you want to encrypt the root-fs, you need to encrypt the swap partition and create this separate 350MB boot-fs (md0).

  • raid 1 (SSD + all HDDs) with hdds for failure tolerance
  • no write intent bitmaps and thus no daisy-chain necessary, because the boot-fs is so small
  • create and add partitions with (g)parted and mdadm directly, if gnome-disk-utility (palimpsest) gives errors.
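
A minimal sketch of creating the boot-fs array with mdadm, assuming /dev/sda1, /dev/sdb1 and /dev/sdc1 are the boot partitions on the SSD, iHDD and eHDD (ext2 is just one common choice for /boot):

  # small array, no write-intent bitmap needed
  mdadm --create /dev/md0 --level=1 --raid-devices=3 /dev/sda1 /dev/sdb1 /dev/sdc1
  mkfs.ext2 /dev/md0
  mount /dev/md0 /boot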

20GB root-fs (md1_crypt opened from md1(SSD, md10(iHDD, eHDD)), mounted at /):

  • Keeps system separated from user data
  • Allows system mirroring to be temporarily write-buffered or disabled (on laptops if on the road) independently from user data mirroring (HDD idle/undocked)
  • syncing user data does not involve syncing system data (which makes it faster)
  • md1 (SSD + md10) without a bitmap to avoid SSD wear from frequent bitmap updates
  • md10 (iHDD, eHDD) with a bitmap to speed up syncs: mdadm --grow /dev/md10 -b internal
  • However, just setting "echo writemostly > /sys/block/md0/md/dev-<HDD-PARTITION>/state" only seemed to add the "W" flag, but not to stop slow/noisy/power-consuming reads from the HDD. Workaround: mdadm {--fail, --remove, --add} /dev/mdX --write-mostly /dev/sdXY (see the sketch after this list).

  • Make the HDD partitions a little (about 1MiB should be enough) larger than the SSD partition, so that the resulting md10 can hold the content of the main raid md1 plus the stacked superblocks and bitmaps.
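
A minimal sketch of the daisy-chained root-fs arrays, including the write-mostly flag mentioned above; /dev/sda2 (SSD root partition), /dev/sdb2 and /dev/sdc2 (slightly larger iHDD/eHDD root partitions) are only example names:

  # inner mirror of the two HDDs, with a write-intent bitmap for fast resyncs
  mdadm --create /dev/md10 --level=1 --raid-devices=2 --bitmap=internal /dev/sdb2 /dev/sdc2
  # outer mirror of SSD + md10, without a bitmap (avoids SSD wear);
  # --write-mostly keeps normal reads on the SSD
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 --write-mostly /dev/md10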

15GB var-fs (md2_crypt opened from md2(iHDD, eHDD), mounted at /var): It lets you see how variable /var actually is, because you experience HDD spin-ups in addition to those caused by saving to work-data (even when the root-fs/home-fs HDD raid members are write-buffered/disabled).

  • raid 1 (iHDD + eHDD)
  • SSD is not included to avoid excessive wear
  • with write intent bitmap for faster resyncs
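
A minimal sketch of the var-fs raid and its encryption layer; the device names, the md2_crypt mapping name and ext4 are only examples:

  mdadm --create /dev/md2 --level=1 --raid-devices=2 --bitmap=internal /dev/sdb3 /dev/sdc3
  cryptsetup luksFormat /dev/md2
  cryptsetup luksOpen /dev/md2 md2_crypt
  mkfs.ext4 /dev/mapper/md2_crypt
  mount /dev/mapper/md2_crypt /var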

1.x times the amount of installed RAM as swap (md3(iHDD, eHDD))

  • ensures redundancy for swap space
  • without a write intent bitmap to avoid the write speed penalty
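
One common way to get encrypted swap on md3 is a throwaway random key via /etc/crypttab; the swap_crypt name and cipher options are only examples:

  # /etc/crypttab
  swap_crypt  /dev/md3  /dev/urandom  swap,cipher=aes-xts-plain64,size=256
  # /etc/fstab
  /dev/mapper/swap_crypt  none  swap  sw  0  0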

Optionally, 5GB home-fs (md4_crypt opened from md4(SSD, md40(iHDD, eHDD)), mounted at /home): Even if you do not require a raid mirror for the boot- and root-fs, you may still want at least a small separate home-fs raid (different from the work-data-fs), because it allows reducing HDD spin-ups without the general risks of write buffering: the HDD can be removed from this home-fs raid (or be write-buffered) while on battery, while updates are still written to the SSD immediately. (Updates to the work-data-fs continue to spin up and be written to the HDD.) Create the home-fs raid with a few GB mounted as /home, to contain (mostly only) the configuration files, lists of most recently used files, desktop (environment) databases, etc. that do not warrant spinning up the HDD on every change. Then you can remove the HDD from that raid while on battery (see the sketch after the list below).

  • raid 1 (SSD + HDD) with hdd for failure tolerance
  • same setup as root-fs
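
A hypothetical sketch of detaching the HDD side (md40) from the home-fs mirror while on battery and re-attaching it later; since md4 carries no bitmap, the re-add causes a resync of md40:

  mdadm /dev/md4 --fail /dev/md40
  mdadm /dev/md4 --remove /dev/md40
  # later, when docked / back on AC power:
  mdadm /dev/md4 --add --write-mostly /dev/md40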

Still, even if you can prevent HDD spin-ups with this, you will have to use a persistent ramdisk (see profile-sync and goanything in SSDoptimization) for those files (or for the complete /home filesystem) to reduce the wear on the SSD caused by programs that constantly update logs, desktop databases, state files, etc. in /home.

100GB work-data-fs (md5_crypt opened from md5(SSD, md50(iHDD, eHDD)), mounted at /mnt/work-data): Using this only for /home/*/work-data allows keeping this raid mirror fully active while the HDD in the root-fs or home-fs raid is write-buffered or disabled. Thus writes to most-recently-used lists, browser caches, etc. do not wake up the HDD, but saving your work does.

  • raid 1 (SSD + HDD) with hdd for failure tolerance
  • same setup as above
  • Optionally, SSD + md-hdd (raid 1 with bitmap of an internal + external HDD)
    • ~/work-data (symlink into work-data-fs, or mountpoint on single-user systems)
    • ~/bulk-data (symlink into bulk-data-fs)
    • ~/volatile (transient RAM buffer synced to home-fs)
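
A minimal sketch of the per-user symlinks listed above (the paths are only examples):

  ln -s /mnt/work-data/$USER ~/work-data
  ln -s /mnt/bulk-data/$USER ~/bulk-data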

Remaining GBs for a large bulk-data-fs (md6_crypt opened from md6(iHDD, eHDD), mounted at /mnt/bulk-data):

  • raid 1 (internal HDD + external HDD)
  • with write intent bitmap to speed up syncs: mdadm --grow /dev/md6 -b internal