Differences between revisions 28 and 29
Revision 28 as of 2012-11-26 17:26:24
Size: 6258
Editor: jmtd
Comment: there is no "discard" option in lvm.conf. Only "issue_discard".
Revision 29 as of 2012-12-09 22:21:48
Size: 12223
Comment: partitioning
Line 12: Line 12:
== Preliminaries ==

 * Use a recent Linux kernel. (>3.2)
 * Have enough RAM to not need any swap space under normal workloads. However, do still set up a swap partition on a hdd, just in case and to be able to suspend to hdd.
 * To disable (or reduce) disk writes during each disk read access, use the "noatime" (or "relatime") mount option in /etc/fstab.
 * Create all filesystems as ext4.

== Partitioning Scheme ==

A commonly recommended setup for serious work on a desktop/laptop:

 * internal SSD (speeds up and mirrors the static part of the system, and the user's important work-data)
 * internal HDD (contains the whole system)
 * external (removable) HDD (mirrors the whole system)
 * optional: an additional external (removable) HDD (to mirror the whole system)
   With additional HDDs, set up additional md* raids (each mirroring an internal and external partition and having a write intent bitmap) that replace the internal HDD partitions in the raids below.
   (to mirror the whole system to multiple external HDDs "rolling-backup-style", create stacked raids with bitmaps)

If /var is kept on a persistent ramdisk (or only directly on the HDD) to avoid excessive wear on the SSD, a failed SSD can be fully reconstructed from the HDD, but a failed HDD can only be fully reconstructed from the external HDD mirror or a backup. (Nevertheless, the presented scheme still allows you to reconstruct your latest work-data from the SSD if the internal HDD has failed.)



350MB boot-fs (md0 on /boot):
If you want to encrypt the rootfs you need to encrypt the swap partition,
and create this separate 350MB boot-fs (md0).
 * raid 1 (SSD + HDD) with hdd for failure tolerance
 * no write intent bitmap to avoid SSD wear from frequent bitmap updates
 * Setting "echo writemostly > /sys/block/md0/md/dev-<HDD-PARTITION>/state" seemed to add the "W" flag, but did not avoid slow/noisy/power-consuming reads from the HDD.
Workaround: mdadm {--fail, --remove, --add} /dev/mdX --write-mostly /dev/sdXY
 * With an external HDD, make the HDD partitions a little (1MiB?) larger than on the SSD, to allow them to hold the main raid1.
 * Use mdadm directly if disk-utility (palimpsest) gives errors.


20GB root-fs (md1_crypt on /):
 * Keeps system separated from user data
 * Allows system mirroring to be temporarily write-buffered or disabled (on laptops if on the road) independently from user data mirroring (HDD idle/undocked)
 * syncing user data does not involve syncing system data (is faster)
 * raid 1 (SSD + HDD) with hdd for failure tolerance
 * same setup as above

or 5GB home-fs (md1_crypt on /home):
Even if you do not want a raid mirror for the boot- and root-fs, you may still want at least a small separate home-fs raid, because it reduces HDD spin-ups without the general write-buffering risks: the HDD can be removed (or write-buffered) separately when on battery, while updates are still written to the SSD immediately.
Create the home-fs raid with a few GBs mounted as /home, to contain (mostly only) the configuration files, lists of most recently used files, desktop (environment) databases, etc. that don't warrant spinning up the HDD on every change. Then you may remove the HDD from that raid when on battery.
 * raid 1 (SSD + HDD) with hdd for failure tolerance
 * same setup as above
Still, even if this prevents HDD spin-ups, reducing the wear on the SSD caused by programs that constantly update logs, desktop databases or state files etc. in /home requires a persistent ramdisk (see profile-sync-daemon and goanysync below) for those files (or for the complete /home).

100GB work-data-fs (md2_crypt on /mnt/work-data)
Using this only for /home/*/work-data lets you keep this raid mirror fully active while the HDD in the root-fs or home-fs raid is write-buffered or disabled. Thus writes to most-recently-used lists, browser caches, etc. do not wake up the HDD, but saving your work does.
 * raid 1 (SSD + HDD) with hdd for failure tolerance
 * same setup as above
 * Optionally, SSD + md-hdd (raid 1 with bitmap of an internal + external HDD)
  * ~/work-data (symlink into work-data-fs, or mountpoint on single-user systems)
  * ~/bulk-data (symlink into bulk-data-fs)
  * ~/volatile (transient RAM buffer synced to home-fs)

15 GB var-fs (md3_crypt on /var)
Keeping /var on its own HDD-only raid lets you see how variable /var actually is: any HDD spin-ups beyond those caused by saving to work-data come from /var (even if the root-fs/home-fs HDD raid members are write-buffered/disabled).
 * raid 1 (internal HDD + external HDD)

Large bulk-data-fs (/mnt/bulk-data):
 * raid 1 (internal HDD + external HDD)
 * with write intent bitmap to speed up syncs:
    mdadm --grow /dev/mdX -b internal

Line 16: Line 82:
 * To disable or reduce disk writes during disk read access, add the "noatime" or "relatime" mount options in /etc/fstab.
 * Set RAMTMP, RAMRUN and RAMLOCK to "yes" (in /etc/default/rcS or tmpfs since wheezy).
  /!\ RAMTMP will keep /tmp in RAM only, causing its content to be discarded on every shutdown! Using an increased commit interval or sync scripts (see below) should reduce disk writes significantly without discarding data on a regular basis.
 * Optionally, make system only flush data to the disk every 10 minutes or more:
 /!\ Attention: Increasing the flushing interval from the default 5 seconds (maybe even until proper shutdown) leaves your data much more vulnerable in case of lock-ups or power failures.
To stop constantly changing files from hitting the SSD directly:

Use a throwaway /tmp ramdisk (tmpfs), to completely avoid unnecessary writes:
debian: Set RAMTMP, RAMRUN and RAMLOCK to "yes" (in /etc/default/rcS or /etc/default/tmpfs since wheezy)
ubuntu: /etc/fstab: tmpfs /tmp tmpfs noatime,nosuid 0 0
 /!\ RAMTMP will keep /tmp in RAM only, causing its content to be discarded on every shutdown! Using a persistent ramdisk (see below) or an increased commit interval should reduce disk writes significantly without discarding data on a regular basis.


Use persistent ramdisks (dedicated read/write RAM buffers that get synced periodically and on startup/shutdown) to batch up SSD writes and HDD spin-ups.
  
With anything-sync-daemon or goanysync set up:
 * /home (synced to work-data-fs raid only once a day?): you only risk losing settings, since the true work in /home/*/work-data is on a dedicated raid
 * /home/*/work-data/volatile (synced more frequently, once per hour?)
 * /home/*/Downloads (synced to bulk-data-fs once a day?)
 * /var completely if supported (syncing once a day? This avoids spin-ups and also allows saving /var to the SSD); at least set this up for
  * /var/log if supported
  * /var/cache/apt/archives
     Configure apt to delete package files after installing, to minimize the data to sync.

Options to having logs copied into RAM: http://www.debian-administration.org/articles/661, http://www.tremende.com/ramlog, https://github.com/graysky2/anything-sync-daemon (if it supports this), or https://github.com/wor/goanysync
  
 
If /home is not on a persistent ramdisk, use profile-sync-daemon to have the browser database and cache copied into RAM during uptime (http://ubuntuforums.org/showthread.php?t=1921800 https://github.com/graysky2/profile-sync-daemon)
 * /home/*/<browser-cache-and-profiles> (synced to root-fs or home-fs)


Further improvement: Patch anything-sync-daemon or goanysync to use a (copy-on-write) union filesystem mount (e.g. http://aufs.sourceforge.net) to keep changes in RAM and only save to SSD on unmount/shutdown (aubrsync), instead of copying all data to RAM and having to sync it all back.



Alternatives to persistent ramdisk:
 
 * Make system only flush data to the disk every 10 minutes or more:
 /!\ Attention: Increasing the flushing interval from the default 5 seconds (maybe even until proper shutdown) leaves your data much more vulnerable in case of lock-ups or power failures, and seems to be a global setting.
Line 23: Line 118:
 * Alternatively, and more selectively than changing the global filesystem commit interval:
  * have the browser database and cache copied into RAM during uptime (http://ubuntuforums.org/showthread.php?t=1921800 https://github.com/graysky2/profile-sync-daemon)
  * consider having logs copied into RAM with http://www.debian-administration.org/articles/661, http://www.tremende.com/ramlog, https://github.com/graysky2/anything-sync-daemon (if it supports this), or https://github.com/wor/goanysync
  * Or, use a union filesystem mount (http://aufs.sourceforge.net) to keep changes in RAM and only save to SSD on unmount/shutdown (aubrsync)?

== Optimizations for SSDs ==

Performance of SSDs can be optimized as follows.
 * Use a recent Linux kernel. (>3.2)
 * Maybe install sysfsutils and add "block/sdX/queue/scheduler = noop" (or deadline) to /etc/sysfs.conf (adjust sdX to match your SSD).


== Optimized IO-Scheduler ==

Install sysfsutils and
  echo "block/sdX/queue/scheduler = deadline" >> /etc/sysfs.conf
(adjust sdX to match your SSD), then reboot, or apply it immediately with
  echo deadline > /sys/block/sdX/queue/scheduler

== Other Optimizations for SSDs ==

Performance of SSDs can also be influenced by these:

Translation(s): none


This page describes SSD optimization for a system with encrypted root and swap.

/!\ An important aspect in optimizing SSD performance is the file system and partition alignment (1 MiB borders aligned to the 4096 byte blocks of the hardware). This wiki page does not cover these issues.

Preliminaries

  • Use a recent Linux kernel. (>3.2)

  • Have enough RAM to not need any swap space under normal workloads. However, do still set up a swap partition on a hdd, just in case and to be able to suspend to hdd.
  • To disable (or reduce) disk writes during each disk read access, use the "noatime" (or "relatime") mount option in /etc/fstab.
  • Create all filesystems as ext4.

Partitioning Scheme

A commonly recommended setup for serious work on a desktop/laptop:

  • internal SSD (speeds up and mirrors the static part of the system, and the user's important work-data)
  • internal HDD (contains the whole system)
  • external (removable) HDD (mirrors the whole system)
  • optional: an additional external (removable) HDD (to mirror the whole system)
    • With additional HDDs, set up additional md* raids (each mirroring an internal and external partition and having a write intent bitmap) that replace the internal HDD partitions in the raids below. (To mirror the whole system to multiple external HDDs "rolling-backup-style", create stacked raids with bitmaps.)

If /var is kept on a persistent ramdisk (or only directly on the HDD) to avoid excessive wear on the SSD, a failed SSD can be fully reconstructed from the HDD, but a failed HDD can only be fully reconstructed from the external HDD mirror or a backup. (Nevertheless, the presented scheme still allows you to reconstruct your latest work-data from the SSD if the internal HDD has failed.)

350MB boot-fs (md0 on /boot):

If you want to encrypt the rootfs, you also need to encrypt the swap partition and create this separate 350MB boot-fs (md0).

  • raid 1 (SSD + HDD) with hdd for failure tolerance
  • no write intent bitmap to avoid SSD wear from frequent bitmap updates
  • Setting "echo writemostly > /sys/block/md0/md/dev-<HDD-PARTITION>/state" seemed to add the "W" flag, but did not avoid slow/noisy/power-consuming reads from the HDD.

Workaround: mdadm {--fail, --remove, --add} /dev/mdX --write-mostly /dev/sdXY

  • With an external HDD, make the HDD partitions a little (1MiB?) larger than on the SSD, to allow them to hold the main raid1.
  • Use mdadm directly if disk-utility (palimpsest) gives errors.
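
The write-mostly workaround above could be spelled out roughly as follows. This is only a sketch: the function name and the device names (/dev/md0, /dev/sdb1) are placeholders, and the helper merely prints the commands so they can be reviewed before being run as root. Keep in mind that removing and re-adding a member of an array without a bitmap normally triggers a full resync.

```shell
# Dry-run helper (hypothetical name): prints the mdadm commands that would
# fail, remove and re-add an HDD member with the write-mostly flag set.
readd_write_mostly() {
    array=$1    # placeholder, e.g. /dev/md0
    member=$2   # placeholder, e.g. /dev/sdb1 (the HDD partition)
    echo "mdadm $array --fail $member"
    echo "mdadm $array --remove $member"
    echo "mdadm $array --add --write-mostly $member"
}

readd_write_mostly /dev/md0 /dev/sdb1
```

Afterwards, /proc/mdstat should show the HDD member marked with "(W)".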

20GB root-fs (md1_crypt on /):

  • Keeps system separated from user data
  • Allows system mirroring to be temporarily write-buffered or disabled (on laptops if on the road) independently from user data mirroring (HDD idle/undocked)
  • syncing user data does not involve syncing system data (is faster)
  • raid 1 (SSD + HDD) with hdd for failure tolerance
  • same setup as above

or 5GB home-fs (md1_crypt on /home):

Even if you do not want a raid mirror for the boot- and root-fs, you may still want at least a small separate home-fs raid, because it reduces HDD spin-ups without the general write-buffering risks: the HDD can be removed (or write-buffered) separately when on battery, while updates are still written to the SSD immediately.

Create the home-fs raid with a few GBs mounted as /home, to contain (mostly only) the configuration files, lists of most recently used files, desktop (environment) databases, etc. that don't warrant spinning up the HDD on every change. Then you may remove the HDD from that raid when on battery.

  • raid 1 (SSD + HDD) with hdd for failure tolerance
  • same setup as above

Still, even if this prevents HDD spin-ups, reducing the wear on the SSD caused by programs that constantly update logs, desktop databases or state files etc. in /home requires a persistent ramdisk (see profile-sync-daemon and goanysync below) for those files (or for the complete /home).

100GB work-data-fs (md2_crypt on /mnt/work-data):

Using this only for /home/*/work-data lets you keep this raid mirror fully active while the HDD in the root-fs or home-fs raid is write-buffered or disabled. Thus writes to most-recently-used lists, browser caches, etc. do not wake up the HDD, but saving your work does.

  • raid 1 (SSD + HDD) with hdd for failure tolerance
  • same setup as above
  • Optionally, SSD + md-hdd (raid 1 with bitmap of an internal + external HDD)
    • ~/work-data (symlink into work-data-fs, or mountpoint on single-user systems)
    • ~/bulk-data (symlink into bulk-data-fs)
    • ~/volatile (transient RAM buffer synced to home-fs)

15 GB var-fs (md3_crypt on /var):

Keeping /var on its own HDD-only raid lets you see how variable /var actually is: any HDD spin-ups beyond those caused by saving to work-data come from /var (even if the root-fs/home-fs HDD raid members are write-buffered/disabled).

  • raid 1 (internal HDD + external HDD)

Large bulk-data-fs (/mnt/bulk-data):

  • raid 1 (internal HDD + external HDD)
  • with write intent bitmap to speed up syncs:
    • mdadm --grow /dev/mdX -b internal

Reducing writes to solid state disks "SSDs" or (laptop) hard disk drives "HDDs"

To stop constantly changing files from hitting the SSD directly:

Use a throwaway /tmp ramdisk (tmpfs), to completely avoid unnecessary writes:

  • debian: Set RAMTMP, RAMRUN and RAMLOCK to "yes" (in /etc/default/rcS or /etc/default/tmpfs since wheezy)
  • ubuntu: /etc/fstab: tmpfs /tmp tmpfs noatime,nosuid 0 0

  • /!\ RAMTMP will keep /tmp in RAM only, causing its content to be discarded on every shutdown! Using a persistent ramdisk (see below) or an increased commit interval should reduce disk writes significantly without discarding data on a regular basis.

Use persistent ramdisks (dedicated read/write RAM buffers that get synced periodically and on startup/shutdown) to batch up SSD writes and HDD spin-ups.

With anything-sync-daemon or goanysync set up:

  • /home (synced to work-data-fs raid only once a day?): you only risk losing settings, since the true work in /home/*/work-data is on a dedicated raid
  • /home/*/work-data/volatile (synced more frequently, once per hour?)
  • /home/*/Downloads (synced to bulk-data-fs once a day?)
  • /var completely if supported (syncing once a day? This avoids spin-ups and also allows saving /var to the SSD); at least set this up for
    • /var/log if supported
    • /var/cache/apt/archives
      • Configure apt to delete package files after installing, to minimize the data to sync.
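
To illustrate the persistent ramdisk idea behind the list above, here is a minimal sketch in plain shell. It is not how anything-sync-daemon or goanysync are implemented; the function names and the directories passed in are assumptions for illustration. The real daemons additionally take care of redirecting the original path to the RAM copy and of crash recovery.

```shell
# Sketch of a "persistent ramdisk": work on a RAM copy, write it back later.
# Function names are hypothetical; directories are supplied by the caller.

# Copy the on-disk directory into the RAM-backed directory (at startup).
ramdisk_load() {
    disk_dir=$1
    ram_dir=$2
    mkdir -p "$ram_dir"
    cp -a "$disk_dir/." "$ram_dir/"
}

# Write the RAM copy back to disk (periodically and at shutdown).
# Plain cp does not propagate deletions; rsync -a --delete would.
ramdisk_save() {
    ram_dir=$1
    disk_dir=$2
    cp -a "$ram_dir/." "$disk_dir/"
}
```

With a tmpfs mounted at, say, /dev/shm, ramdisk_load would run once at boot and ramdisk_save from a cron job and a shutdown script. Anything written between two saves is lost on a crash or power failure.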

Options to having logs copied into RAM: http://www.debian-administration.org/articles/661, http://www.tremende.com/ramlog, https://github.com/graysky2/anything-sync-daemon (if it supports this), or https://github.com/wor/goanysync

If /home is not on a persistent ramdisk, use profile-sync-daemon to have the browser database and cache copied into RAM during uptime (http://ubuntuforums.org/showthread.php?t=1921800 https://github.com/graysky2/profile-sync-daemon)

  • /home/*/<browser-cache-and-profiles> (synced to root-fs or home-fs)

Further improvement: Patch anything-sync-daemon or goanysync to use a (copy-on-write) union filesystem mount (e.g. http://aufs.sourceforge.net) to keep changes in RAM and only save to SSD on unmount/shutdown (aubrsync), instead of copying all data to RAM and having to sync it all back.

Alternatives to persistent ramdisk:

  • Make system only flush data to the disk every 10 minutes or more:

    /!\ Attention: Increasing the flushing interval from the default 5 seconds (maybe even until proper shutdown) leaves your data much more vulnerable in case of lock-ups or power failures, and seems to be a global setting.

    • Manually set "commit=600" mount option in /etc/fstab. See mount(8).
    • Or better, set up pm-utils (Debian BTS #659260) or laptop-mode-tools (also optimizes read buffers) to enable laptop-mode even under AC operation.
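
For trying out a longer commit interval before editing /etc/fstab, a remount could look like the sketch below. The helper name and the RUN switch are assumptions for illustration; by default it only prints the command, and the setting lasts until the next remount or reboot.

```shell
# Dry-run by default: prints the remount command instead of running it.
# Set RUN=1 (as root) to actually apply it. Helper name is hypothetical.
set_commit_interval() {
    mountpoint=$1   # e.g. /
    seconds=$2      # e.g. 600 for ten minutes
    cmd="mount -o remount,commit=$seconds $mountpoint"
    if [ "${RUN:-0}" = "1" ]; then
        $cmd
    else
        echo "$cmd"
    fi
}

set_commit_interval / 600
```

The same warning applies as for the fstab option: up to ten minutes of writes may be lost on a lock-up or power failure.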

Optimized IO-Scheduler

Install sysfsutils and

  • echo "block/sdX/queue/scheduler = deadline" >> /etc/sysfs.conf

(adjust sdX to match your SSD), then reboot, or apply it immediately with

  • echo deadline > /sys/block/sdX/queue/scheduler
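
To find out which sdX is the SSD and which scheduler is currently active, the kernel's "rotational" flag in sysfs can be consulted. The following is a sketch with a hypothetical function name; the flag may also be 0 for other non-rotational devices (and can be wrong behind some controllers), so check the result against your hardware.

```shell
# List non-rotational block devices with their scheduler line; the active
# scheduler is the one shown in [brackets]. The sysfs root is a parameter
# (default /sys) so the function can be tried on a copy of the tree.
list_ssd_schedulers() {
    sysfs_root=${1:-/sys}
    for dev in "$sysfs_root"/block/*; do
        [ -r "$dev/queue/rotational" ] || continue
        [ -r "$dev/queue/scheduler" ] || continue
        if [ "$(cat "$dev/queue/rotational")" = "0" ]; then
            echo "${dev##*/}: $(cat "$dev/queue/scheduler")"
        fi
    done
}

list_ssd_schedulers
```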

Other Optimizations for SSDs

Performance of SSDs can also be influenced by these:

  • Maybe enable the "discard" filesystem option for automatic/online TRIM. However, this is not strictly necessary if your SSD has enough overprovisioning (spare space) or you leave (unpartitioned) free space on the SSD (http://www.spinics.net/lists/raid/msg40866.html). Enabling online trim in fstab may just slow down some SSDs significantly (https://patrick-nagel.net/blog/archives/337).

    • Set "discard" mount option in /etc/fstab for the ext4 filesystem, swap partition, Btrfs, etc. See mount(8).
    • Set "issue_discards" option in /etc/lvm/lvm.conf for LVM. See lvm.conf(5).
    • Set "discard" option in /etc/crypttab for dm-crypt.

Note that using discard with on-disk cryptography (like dm-crypt) also has drawbacks with respect to security! See crypttab(5).

dm-crypt's /etc/crypttab:

#<target name>    <source device>            <key file>  <options>
var  UUID=01234567-89ab-cdef-0123-456789abcdef  none  luks,discard
  • You'll also need to update your initramfs: update-initramfs -u -k all

  • Optionally, set up an offline-trim cronjob that periodically runs "time fstrim -v" (or mdtrim) on the SSD mountpoints. Until software raid (the md device layer) has trim support, you could use something like mdtrim (https://github.com/Cyberax/mdtrim/).

  • With btrfs, set "ssd" mount option in /etc/fstab to enable the SSD optimized disk space allocation scheme.
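
The periodic offline trim mentioned above could be wrapped in a small script like the following sketch. The function name and the choice of mountpoints are assumptions; fstrim must be run as root, and it will fail on filesystems or device layers without discard support (such as md raid at the time of writing), which the loop tolerates.

```shell
# Trim each given mountpoint with fstrim, continuing past failures.
# Returns non-zero if any mountpoint could not be trimmed.
trim_mountpoints() {
    status=0
    for mp in "$@"; do
        fstrim -v "$mp" || status=1
    done
    return $status
}
```

A script in /etc/cron.weekly/ could then define this function and call, for example, trim_mountpoints / /boot, adjusted to the SSD mountpoints of your system.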

More: http://siduction.org/index.php?module=news&func=display&sid=78 http://forums.debian.net/viewtopic.php?f=16&t=76921 https://wiki.archlinux.org/index.php/SSD http://wiki.ubuntuusers.de/SSD

/etc/fstab

# /etc/fstab: static file system information.
#
# Use 'vol_id --uuid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
### SSD: discard,noatime
### match battery operation default for commit JOURNAL_COMMIT_TIME_AC in Add files in /etc/pm/config.d/*
/dev/mapper/goofy-root /               ext4    discard,noatime,commit=600,errors=remount-ro 0       1
# /boot was on /dev/sda1 during installation
UUID=709cbe4a-80c1-46cb-8bb1-dbce3059d1f7 /boot           ext4    discard,noatime,commit=600,defaults        0       2
### SSD: discard
/dev/mapper/goofy-swap none            swap    sw,discard              0       0
/dev/mapper/goofy-chroot /srv/chroot         btrfs    ssd,discard,noatime 0       2
/dev/scd0       /media/cdrom0   udf,iso9660 user,noauto     0       0

/etc/lvm/lvm.conf

...
# This section allows you to configure which block devices should
# be used by the LVM system.
devices {
...
    # Issue discards to a logical volumes's underlying physical volume(s) when
    # the logical volume is no longer using the physical volumes' space (e.g.
    # lvremove, lvreduce, etc).  Discards inform the storage that a region is
    # no longer in use.  Storage that supports discards advertise the protocol
    # specific way discards should be issued by the kernel (TRIM, UNMAP, or
    # WRITE SAME with UNMAP bit set).  Not all storage will support or benefit
    # from discards but SSDs and thinly provisioned LUNs generally do.  If set
    # to 1, discards will only be issued if both the storage and kernel provide
    # support.
    # 1 enables; 0 disables.
    #issue_discards = 0
    issue_discards = 1
}
...

Smaller system with SSD

See