HOWTO: Ceph as OpenStack nova-volumes/swift backend on Debian GNU/Linux wheezy


This howto provides guidelines for installing Ceph and for using it as the nova-volumes/swift backend for OpenStack.

The environment includes the following software:

All OpenStack nodes have already been set up with the OpenStack HOWTO or the Puppet-based HOWTO

In the rest of this howto, X ranges from 1 to 3


In formatted blocks :


Things to prepare beforehand :

Technical Choices

Only the OSD and MON daemons of Ceph are configured, because the metadata daemon (MDS) is only needed when using CephFS.


Configuring Ceph on all nodes

On each Ceph node, run:

# apt-get install -y ceph xfsprogs

Configure Ceph (/etc/ceph/ceph.conf) on each node with the following content:

    [global]
        auth supported = cephx
        keyring = /etc/ceph/keyring.admin

    [osd]
        osd data = /srv/ceph/osd$id
        osd journal = /srv/ceph/osd$id/journal
        osd journal size = 512
        keyring = /etc/ceph/keyring.$name

        ; working with ext4 (sileht: disable because xfs is used)
        ;filestore xattr use omap = true

        ; solve rbd data corruption (sileht: disable by default in 0.48)
        filestore fiemap = false

    [osd.11]
        host = ceph1
        cluster addr = 10.X.X.1:6800
        public addr = 192.168.X.1:6801
        devs = /dev/sda2
    [osd.12]
        host = ceph1
        cluster addr = 10.X.X.1:6802
        public addr = 192.168.X.1:6802
        devs = /dev/sdb2

    [osd.21]
        host = ceph2
        cluster addr = 10.X.X.2:6800
        public addr = 192.168.X.2:6801
        devs = /dev/sda2
    [osd.22]
        host = ceph2
        cluster addr = 10.X.X.2:6802
        public addr = 192.168.X.2:6803
        devs = /dev/sdb2

    [osd.31]
        host = ceph3
        cluster addr = 10.X.X.3:6800
        public addr = 192.168.X.3:6801
        devs = /dev/sda2
    [osd.32]
        host = ceph3
        cluster addr = 10.X.X.3:6802
        public addr = 192.168.X.3:6803
        devs = /dev/sdb2

    [mon]
        mon data = /srv/ceph/mon$id
    [mon.1]
        host = ceph1
        mon addr = 192.168.X.1:6789
    [mon.2]
        host = ceph2
        mon addr = 192.168.X.2:6789
    [mon.3]
        host = ceph3
        mon addr = 10.X.X.3:6789
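
Before running mkcephfs it can help to double-check how many daemons ceph.conf declares. The following is only a sketch: the helper name daemon_ids and the inline sample are illustrative, and it assumes the file uses the usual [osd.ID] and [mon.ID] section headers.

```python
import re

def daemon_ids(conf_text):
    """Return {'osd': [...], 'mon': [...]} from [osd.ID] / [mon.ID] headers."""
    found = {"osd": [], "mon": []}
    # Match headers such as "[osd.11]" or "[mon.1]", ignoring indentation.
    # The generic "[osd]" and "[mon]" sections are deliberately not counted.
    pattern = r"^[ \t]*\[(osd|mon)\.(\w+)\][ \t]*$"
    for kind, ident in re.findall(pattern, conf_text, re.M):
        found[kind].append(ident)
    return found

# Hypothetical excerpt of /etc/ceph/ceph.conf, for illustration only.
sample = """
[osd]
    osd journal size = 512
[osd.11]
    host = ceph1
[osd.12]
    host = ceph1
[mon.1]
    host = ceph1
"""

ids = daemon_ids(sample)
print(ids)
```

For the cluster described in this howto, the full file should yield six OSD ids (11, 12, 21, 22, 31, 32) and three MON ids (1, 2, 3).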

Prepare the fstab to match the ceph.conf file by adding the following lines on node cephX:

/dev/sda2       /srv/ceph/osdX1  xfs rw,noexec,nodev,noatime,nodiratime,barrier=0   0   0
/dev/sdb2       /srv/ceph/osdX2  xfs rw,noexec,nodev,noatime,nodiratime,barrier=0   0   0

On ceph1 this gives:

/dev/sda2       /srv/ceph/osd11  xfs rw,noexec,nodev,noatime,nodiratime,barrier=0   0   0
/dev/sdb2       /srv/ceph/osd12  xfs rw,noexec,nodev,noatime,nodiratime,barrier=0   0   0

On ceph2 this gives:

/dev/sda2       /srv/ceph/osd21  xfs rw,noexec,nodev,noatime,nodiratime,barrier=0   0   0
/dev/sdb2       /srv/ceph/osd22  xfs rw,noexec,nodev,noatime,nodiratime,barrier=0   0   0

On ceph3 this gives:

/dev/sda2       /srv/ceph/osd31  xfs rw,noexec,nodev,noatime,nodiratime,barrier=0   0   0
/dev/sdb2       /srv/ceph/osd32  xfs rw,noexec,nodev,noatime,nodiratime,barrier=0   0   0
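
The naming rule behind these entries (OSD id = node number followed by disk index) can be sketched in a few lines of Python. The function name fstab_lines and the tab-separated layout are choices made for this example; the devices and mount options are the ones used above.

```python
def fstab_lines(node, devices=("/dev/sda2", "/dev/sdb2")):
    """Build the fstab entries for node cephX, where X is `node`."""
    options = "rw,noexec,nodev,noatime,nodiratime,barrier=0"
    lines = []
    for disk, device in enumerate(devices, start=1):
        # The OSD id is the node number followed by the disk index, e.g. osd21.
        mount_point = "/srv/ceph/osd{}{}".format(node, disk)
        lines.append("\t".join([device, mount_point, "xfs", options, "0", "0"]))
    return lines

for line in fstab_lines(2):
    print(line)
```

The printed lines are meant to be reviewed and then appended to /etc/fstab by hand on the matching node.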

Create the mount points on each node:

on ceph1:

mkdir -p /srv/ceph/{mon1,osd1{1,2}}

on ceph2:

mkdir -p /srv/ceph/{mon2,osd2{1,2}}

on ceph3:

mkdir -p /srv/ceph/{mon3,osd3{1,2}}

Create the filesystems on each node:

mkfs.xfs -f /dev/sda2
mkfs.xfs -f /dev/sdb2

Mount the filesystems on each node:

mount /dev/sda2
mount /dev/sdb2

Prepare ssh key for mkcephfs

Ensure that ssh between the nodes does not ask for a password:

# ssh cephX uname -a
Linux 3.2.0-3-amd64 #1 SMP Thu Jun 28 09:07:26 UTC 2012 x86_64 GNU/Linux

If it does, create an ssh key pair and send it to all nodes:

# ssh-keygen
# cat .ssh/ >> .ssh/authorized_keys
# rsync -r .ssh/ root@ceph2:.ssh/
# rsync -r .ssh/ root@ceph3:.ssh/
# rsync -r .ssh/ root@ceph4:.ssh/
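
The check can also be scripted so that a missing key fails immediately instead of hanging on a password prompt. This is a sketch only: the node list and the function names are assumptions made for the example, while BatchMode and ConnectTimeout are standard OpenSSH client options.

```python
import subprocess

NODES = ["ceph1", "ceph2", "ceph3"]  # assumed node names, as used in this howto

def ssh_check_command(host):
    """Command that fails instead of prompting when key login is not set up."""
    # BatchMode=yes disables password prompts; ConnectTimeout bounds the wait.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
            host, "uname", "-a"]

def check_nodes(nodes=NODES):
    """Return {host: True/False}; True means passwordless ssh worked."""
    results = {}
    for host in nodes:
        results[host] = subprocess.call(ssh_check_command(host)) == 0
    return results
```

Calling check_nodes() on ceph1 should return True for every node before mkcephfs is started.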

Create the ceph cluster

On ceph1 initialize the cluster with:

# mkcephfs -a -c /etc/ceph/ceph.conf -k /etc/ceph/keyring.admin
temp dir is /tmp/mkcephfs.oqB5qpHXEi
preparing monmap in /tmp/mkcephfs.oqB5qpHXEi/monmap
/usr/bin/monmaptool --create --clobber --add 1 --add 2 --add 3 --print /tmp/mkcephfs.oqB5qpHXEi/monmap
/usr/bin/monmaptool: monmap file /tmp/mkcephfs.oqB5qpHXEi/monmap
/usr/bin/monmaptool: generated fsid e0a0b83d-f188-4baf-82f2-3102fbb1c194
epoch 0
fsid e0a0b83d-f188-4baf-82f2-3102fbb1c194
last_changed 2012-07-17 08:45:35.681299
created 2012-07-17 08:45:35.681299
0: mon.1
1: mon.2
2: mon.3
/usr/bin/monmaptool: writing epoch 0 to /tmp/mkcephfs.oqB5qpHXEi/monmap (3 monitors)
=== osd.11 ===
2012-07-17 08:45:35.792982 7fe7bcf55780 created object store /srv/ceph/osd11 journal /srv/ceph/osd11/journal for osd.11 fsid e0a0b83d-f188-4baf-82f2-3102fbb1c194
creating private key for osd.11 keyring /etc/ceph/keyring.admin
creating /etc/ceph/keyring.admin


2012-07-17 08:46:08.993851 7f165d1a6760  adding osd.21 at {host=ceph2,pool=default,rack=unknownrack}
2012-07-17 08:46:08.993895 7f165d1a6760  adding osd.22 at {host=ceph2,pool=default,rack=unknownrack}
2012-07-17 08:46:08.993926 7f165d1a6760  adding osd.31 at {host=ceph3,pool=default,rack=unknownrack}
2012-07-17 08:46:08.993956 7f165d1a6760  adding osd.32 at {host=ceph3,pool=default,rack=unknownrack}
/usr/bin/osdmaptool: writing epoch 1 to /tmp/mkcephfs.oqB5qpHXEi/osdmap
Generating admin key at /tmp/mkcephfs.oqB5qpHXEi/keyring.admin
creating /tmp/mkcephfs.oqB5qpHXEi/keyring.admin
Building initial monitor keyring
added entity osd.11 auth auth(auid = 18446744073709551615 key=AQAPCgVQqGJCMBAA6y4blmINAgB+nrX3wPla2Q== with 0 caps)
added entity osd.12 auth auth(auid = 18446744073709551615 key=AQAPCgVQeKF9NhAAM7EPeskDwikMl1vPi2pWpw== with 0 caps)
added entity osd.21 auth auth(auid = 18446744073709551615 key=AQAWCgVQKKOpAxAAHe7W7KyASI2xnkdOilzSFQ== with 0 caps)
added entity osd.22 auth auth(auid = 18446744073709551615 key=AQAhCgVQiC4aLxAA1pT/rOUHg07MLablCnlppg== with 0 caps)
added entity osd.31 auth auth(auid = 18446744073709551615 key=AQAmCgVQWLCnIhAA692Rhs2rws8yQLrT8vXaBw== with 0 caps)
added entity osd.32 auth auth(auid = 18446744073709551615 key=AQAyCgVQQLIrFBAA/2lJMVPzsBFypCihJubdxg== with 0 caps)
=== mon.1 ===
/usr/bin/ceph-mon: created monfs at /srv/ceph/mon1 for mon.1
=== mon.2 ===
pushing everything to ceph2
/usr/bin/ceph-mon: created monfs at /srv/ceph/mon2 for mon.2
=== mon.3 ===
pushing everything to ceph3
/usr/bin/ceph-mon: created monfs at /srv/ceph/mon3 for mon.3
placing client.admin keyring in /etc/ceph/keyring.admin

On ceph1, start all ceph services (-a) on all nodes:

# /etc/init.d/ceph -a start

On ceph1, check the status of Ceph:

# ceph -k /etc/ceph/keyring.admin -c /etc/ceph/ceph.conf health
2012-07-17 08:47:56.026981 mon <- [health]
2012-07-17 08:47:56.027389 mon.0 -> 'HEALTH_OK' (0)

# ceph -s
2012-07-17 13:30:28.537300    pg v1228: 6542 pgs: 6542 active+clean; 16 bytes data, 3387 MB used, 1512 GB / 1516 GB avail
2012-07-17 13:30:28.552231   mds e1: 0/0/1 up
2012-07-17 13:30:28.552267   osd e10: 6 osds: 6 up, 6 in
2012-07-17 13:30:28.552389   log 2012-07-17 10:21:54.329413 osd.31 10.X.X.3:6800/31088 1233 : [INF] 3.4 scrub ok
2012-07-17 13:30:28.552492   mon e1: 3 mons at {1=10.X.X.1:6789/0,2=10.X.X.2:6789/0,3=10.X.X.3:6789/0}

Note: check the number of mds/osd/mon daemons detected; here you should have 0 mds, 6 osd and 3 mon.

Note: if the cluster is not HEALTH_OK, start by checking that all servers are synchronized with NTP.
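
The daemon counts can be extracted from the status output automatically. The sketch below parses the line format shown above (the one printed before the 0.48 upgrade); the function name daemon_counts is illustrative and the regular expressions are only matched against the sample lines from this howto.

```python
import re

def daemon_counts(status_text):
    """Extract OSD and MON counts from the `ceph -s` output shown above."""
    counts = {}
    osd = re.search(r"osd e\d+: (\d+) osds: (\d+) up, (\d+) in", status_text)
    if osd:
        total, up, in_ = (int(value) for value in osd.groups())
        counts.update({"osd_total": total, "osd_up": up, "osd_in": in_})
    mon = re.search(r"mon e\d+: (\d+) mons", status_text)
    if mon:
        counts["mons"] = int(mon.group(1))
    return counts

sample = """\
2012-07-17 13:30:28.552267   osd e10: 6 osds: 6 up, 6 in
2012-07-17 13:30:28.552492   mon e1: 3 mons at {1=10.X.X.1:6789/0,2=10.X.X.2:6789/0,3=10.X.X.3:6789/0}
"""

counts = daemon_counts(sample)
print(counts)
```

A monitoring script could compare the result with the expected values (6 OSDs up and in, 3 monitors) and raise an alert on any difference.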

Testing RBD backend

All tests are done on ceph1

Get the auth key of the admin user:

# ceph-authtool --print-key /etc/ceph/keyring.admin | tee client.admin

Create a pool and a volume in the ceph cluster:

# rados lspools
# rados mkpool nova
# rados lspools
# rbd --pool nova create --size 1024 rbd-test
# rbd --pool nova ls

Prepare and mount the volume rbd-test on the node ceph1:

# modprobe rbd 
# rbd map rbd-test --pool nova --secret client.admin
# dmesg | tail
[63851.029151] rbd: rbd0: added with size 0x40000000
[66908.383667] libceph: client0 fsid 95d8f4b8-01d8-4b0d-8534-d4f1d32120c9
[66908.384701] libceph: mon2 session established
[66908.387263]  rbd0: unknown partition table

# mkfs.btrfs /dev/rbd0
# mount /dev/rbd0 /mnt
# touch /mnt/rbd-test
# ls /mnt/

# rbd showmapped
id      pool    image   snap    device
0       nova    rbd-test        -       /dev/rbd0
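
When several images are mapped, the device of a given image can be looked up from this output. A minimal sketch, assuming the five-column layout shown above; the function name mapped_devices is made up for the example.

```python
def mapped_devices(showmapped_output):
    """Map (pool, image) -> device from `rbd showmapped` output."""
    devices = {}
    lines = showmapped_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) == 5:
            _id, pool, image, _snap, device = fields
            devices[(pool, image)] = device
    return devices

sample = """\
id      pool    image   snap    device
0       nova    rbd-test        -       /dev/rbd0
"""

print(mapped_devices(sample))
```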

Remove the test volume

# umount /mnt
# rbd unmap /dev/rbd/nova/rbd-test
# rbd --pool nova rm rbd-test
Removing image: 100% complete...done.
# rbd --pool nova ls

Note: the nova pool is kept, because it is used later for the configuration of nova.

Installing a RADOS gateway

The gateway is installed on the ceph1 server. An Apache server is configured as a FastCGI frontend that passes requests to a radosgw daemon via a FastCGI script. A Ceph user is created to handle authentication (and access rights) between radosgw and Ceph.

All the RADOS gateway configuration is done on the ceph1 node.

Configuration of ceph:

In /etc/ceph/ceph.conf add this:

    [client.radosgw.gateway]
        host = ceph1.fqdn.tld
        keyring = /etc/ceph/keyring.radosgw.gateway
        rgw socket path = /tmp/radosgw.sock
        log file = /var/log/ceph/radosgw.log

Note: the client name (i.e. radosgw) can't be changed, because the init script uses this name and it is not configurable.

Note: the host must be the FQDN.

Copy the file on other nodes:

# scp /etc/ceph/ceph.conf ceph2:/etc/ceph/ceph.conf
# scp /etc/ceph/ceph.conf ceph3:/etc/ceph/ceph.conf

Installation of apache2

# apt-get install apache2 libapache2-mod-fastcgi radosgw
# a2enmod rewrite

Preparation of the Apache virtual host, in the file /etc/apache2/sites-available/radosgw:

<VirtualHost *:80>
        ServerName ceph1.fqdn.tld
        ServerAdmin root@ceph1
        DocumentRoot /var/www

        # rewriting rules are only needed for Amazon S3
        RewriteEngine On
        RewriteRule ^/([a-zA-Z0-9-_.]*)([/]?.*) /s3gw.fcgi?page=$1&params=$2&%{QUERY_STRING} [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]

        FastCgiExternalServer /var/www/s3gw.fcgi -socket /tmp/radosgw.sock
        <IfModule mod_fastcgi.c>
                <Directory /var/www>
                        Options +ExecCGI
                        AllowOverride All
                        SetHandler fastcgi-script
                        Order allow,deny
                        Allow from all
                        AuthBasicAuthoritative Off
                </Directory>
        </IfModule>

        AllowEncodedSlashes On
        ErrorLog /var/log/apache2/error.log
        CustomLog /var/log/apache2/access.log combined
        ServerSignature Off
</VirtualHost>
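
To see what the RewriteRule does to an incoming S3 request, its pattern can be tried out with Python's re module. This is an approximation only: Apache uses PCRE rather than Python's engine, the dash in the character class is escaped here for clarity, the E= environment flag is not modelled, and the function name rewrite is invented for the example.

```python
import re

# Same pattern as in the RewriteRule above, with the dash escaped.
RULE = re.compile(r"^/([a-zA-Z0-9\-_.]*)([/]?.*)")

def rewrite(path, query=""):
    """Mimic the substitution /s3gw.fcgi?page=$1&params=$2&%{QUERY_STRING}."""
    match = RULE.match(path)
    if match is None:
        return path  # the rule does not apply, the path is left untouched
    return "/s3gw.fcgi?page={}&params={}&{}".format(
        match.group(1), match.group(2), query)

print(rewrite("/my-new-bucket/photo.jpg", "acl"))
```

The first path component (the bucket name) ends up in the page parameter and the rest of the path in params, which is how the FastCGI script receives path-style S3 requests.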

Create the fcgi script /var/www/s3gw.fcgi:

#!/bin/sh
exec /usr/bin/radosgw -c /etc/ceph/ceph.conf -n client.radosgw.gateway

And make it executable:

# chmod +x /var/www/s3gw.fcgi

Enable the RADOS gateway VirtualHost and disable the default one:

# a2ensite radosgw
# a2dissite default

Create the keyring for RADOS gateway:

# ceph-authtool --create-keyring /etc/ceph/keyring.radosgw.gateway
# chmod +r /etc/ceph/keyring.radosgw.gateway

Generate a new key for RADOS Gateway in the keyring

# ceph-authtool /etc/ceph/keyring.radosgw.gateway -n client.radosgw.gateway --gen-key
# ceph-authtool -n client.radosgw.gateway --cap osd 'allow rwx' --cap mon 'allow r' /etc/ceph/keyring.radosgw.gateway

Copy this key to the main ceph keyring

# ceph -k /etc/ceph/keyring.admin  auth add client.radosgw.gateway -i /etc/ceph/keyring.radosgw.gateway
2012-07-17 18:12:33.216484 7f8a142e8760 read 117 bytes from /etc/ceph/keyring.rados.gateway
2012-07-17 18:12:33.218728 mon <- [auth,add,client.rados.gateway]
2012-07-17 18:12:33.221727 mon.0 -> 'added key for client.rados.gateway' (0)

Note: you need at least one MON daemon started to do that

Restart (or start) all RADOS-related services:

# service ceph restart
# service apache2 restart
# service radosgw start

Testing the RADOS gateway

Create a RADOS gateway user for the S3 REST API:

# radosgw-admin user create --uid="testuser" --display-name="First User"
2012-07-17 18:35:30.571933 7fe4f45bd780 cache put: name=.users.uid+testuser
2012-07-17 18:35:30.572058 7fe4f45bd780 adding .users.uid+testuser to cache LRU end
2012-07-17 18:35:30.572083 7fe4f45bd780 distributing notification oid=notify bl.length()=378
2012-07-17 18:35:30.572736 7fe4ee785700 RGWWatcher::notify() opcode=1 ver=1 bl.length()=378
2012-07-17 18:35:30.572765 7fe4ee785700 cache put: name=.users.uid+testuser
2012-07-17 18:35:30.572771 7fe4ee785700 moving .users.uid+testuser to cache LRU end
2012-07-17 18:35:32.574032 7fe4f45bd780 cache put: name=.users+J7ATWD6EXOEYSD4B6AOF
2012-07-17 18:35:32.574054 7fe4f45bd780 adding .users+J7ATWD6EXOEYSD4B6AOF to cache LRU end
2012-07-17 18:35:32.574070 7fe4f45bd780 distributing notification oid=notify bl.length()=390
2012-07-17 18:35:32.574813 7fe4ee785700 RGWWatcher::notify() opcode=1 ver=1 bl.length()=390
2012-07-17 18:35:32.574838 7fe4ee785700 cache put: name=.users+J7ATWD6EXOEYSD4B6AOF
2012-07-17 18:35:32.574844 7fe4ee785700 moving .users+J7ATWD6EXOEYSD4B6AOF to cache LRU end
{ "user_id": "testuser",
  "rados_uid": 0,
  "display_name": "First User",
  "email": "",
  "suspended": 0,
  "subusers": [],
  "keys": [
        { "user": "testuser",
          "access_key": "J7ATWD6EXOEYSD4B6AOF",
          "secret_key": "1M2OiTEVL4CviMVdXoj17HL8jTeqHTrk6MW+UBsN"}],
  "swift_keys": []}

Add a rados subuser for swift access

# radosgw-admin subuser create --uid=testuser --subuser=testuser:swift --access=full
# radosgw-admin key create --subuser=testuser:swift --key-type=swift
2012-07-18 09:08:29.942508 7ff36ab57700 RGWWatcher::notify() opcode=1 ver=1 bl.length()=513
2012-07-18 09:08:29.942543 7ff36ab57700 cache put: name=.users.swift+testuser:
2012-07-18 09:08:29.942550 7ff36ab57700 moving .users.swift+testuser: to cache LRU end
{ "user_id": "testuser",
  "rados_uid": 0,
  "display_name": "First User",
  "email": "",
  "suspended": 0,
  "subusers": [
        { "id": "testuser:swift",
          "permissions": "full-control"}],
  "keys": [
        { "user": "testuser",
          "access_key": "J7ATWD6EXOEYSD4B6AOF",
          "secret_key": "1M2OiTEVL4CviMVdXoj17HL8jTeqHTrk6MW+UBsN"}],
  "swift_keys": [
        { "user": "testuser:swift",
          "secret_key": "Cz9D3Ugx1P5RRWxwwgppAd9c4J5zBWXJwCWFJobZ"}]}

Note: the subuser name has two parts separated by a colon (:); this is mandatory for swift authentication.

Create a python script to test s3 connection:

import boto
import boto.s3.connection

access_key = 'J7ATWD6EXOEYSD4B6AOF'
secret_key = '1M2OiTEVL4CviMVdXoj17HL8jTeqHTrk6MW+UBsN'

# Connect to the RADOS gateway on ceph1 using path-style bucket addressing
conn = boto.connect_s3(
        aws_access_key_id = access_key,
        aws_secret_access_key = secret_key,
        host = 'ceph1',
        is_secure = False,  # the virtual host above only serves plain HTTP
        calling_format = boto.s3.connection.OrdinaryCallingFormat(),
        )

# Create a bucket, then list all buckets with their creation date
bucket = conn.create_bucket('my-new-bucket')
for bucket in conn.get_all_buckets():
        print("{name}\t{created}".format(
                name = bucket.name,
                created = bucket.creation_date,
        ))

Try it:

# python
my-new-bucket   2012-07-17T17:57:10.000Z

Test the swift connection

# apt-get install -y swift
# swift -A http://ceph1/auth/1.0 -U testuser:swift -K "Cz9D3Ugx1P5RRWxwwgppAd9c4J5zBWXJwCWFJobZ" list
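
The -A, -U and -K options of the swift client correspond to Swift v1.0 authentication: a GET on the auth URL carrying the X-Auth-User and X-Auth-Key headers. The sketch below (Python 3, standard library only) builds such a request without sending it; the function name swift_auth_request is an assumption made for the example.

```python
from urllib.request import Request

def swift_auth_request(auth_url, user, key):
    """Build (but do not send) a Swift v1.0 authentication request."""
    return Request(auth_url, headers={"X-Auth-User": user, "X-Auth-Key": key})

req = swift_auth_request(
    "http://ceph1/auth/1.0",
    "testuser:swift",
    "Cz9D3Ugx1P5RRWxwwgppAd9c4J5zBWXJwCWFJobZ",
)
print(req.get_method(), req.full_url)
# Sending it with urllib.request.urlopen(req) should, on success, return
# the X-Storage-Url and X-Auth-Token response headers.
```

This can be handy for debugging: if the swift command fails, sending the request by hand shows whether the problem lies in the authentication step or in the listing itself.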

Configuration of nova-volume/nova-compute

On ceph1, create a keyring for the ceph user used by nova (ie: client.nova)

# ceph-authtool --create-keyring /etc/ceph/keyring.nova
# ceph-authtool --gen-key --name client.nova --cap mon 'allow r' --cap osd 'allow rwx pool=nova' /etc/ceph/keyring.nova 

On ceph1, copy the key in the admin keyring

# ceph -k /etc/ceph/keyring.admin auth add client.nova -i /etc/ceph/keyring.nova
2012-07-19 15:23:24.258404 7f36d002f760 -1 read 118 bytes from /etc/ceph/keyring.nova
added key for client.nova

On ceph1, get the key with:

# ceph-authtool --print-key /etc/ceph/keyring.nova  -n client.nova

This key must be stored in libvirt on each compute node.

On the compute nodes (ceph1 and ceph2), create the file secret.xml like this:

<secret ephemeral='no' private='no'>
   <usage type='ceph'>
     <name>client.nova secret</name>
   </usage>
</secret>

On compute nodes (ceph1 and ceph2) import the secret in libvirt:

# virsh secret-define --file secret.xml
Secret 83890793-e8c7-a481-6a0d-68e5d2538330 created
# virsh secret-set-value --secret 83890793-e8c7-a481-6a0d-68e5d2538330 --base64 AQCGxwZQcBlBAxAAJAoe/H9vnOlpdh1WshvOsg==
Secret value set
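
Since the UUID differs on every compute node, it is convenient to pick it up from the output of secret-define rather than copying it by hand. A minimal sketch, with function names invented for the example and the output format taken from the transcript above:

```python
import re

def secret_uuid(define_output):
    """Extract the UUID from the `virsh secret-define` output line."""
    match = re.search(r"Secret ([0-9a-f-]{36}) created", define_output)
    return match.group(1) if match else None

def set_value_command(uuid, base64_key):
    """Argument list for `virsh secret-set-value`, safe from shell quoting."""
    return ["virsh", "secret-set-value", "--secret", uuid, "--base64", base64_key]

uuid = secret_uuid("Secret 83890793-e8c7-a481-6a0d-68e5d2538330 created")
print(set_value_command(uuid, "AQCGxwZQcBlBAxAAJAoe/H9vnOlpdh1WshvOsg=="))
```

The resulting list can be passed to subprocess.call() on the compute node; the same UUID is the one needed in nova.conf afterwards.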

On the compute nodes (ceph1 and ceph2), fill in /etc/nova/nova.conf like this (using the secret UUID of each node):


On ceph1 add this to /etc/default/nova-common

export CEPH_ARGS="--id nova"

On all nodes, declare the new Ceph client by adding this to /etc/ceph/ceph.conf:

    [client.nova]
        keyring = /etc/ceph/keyring.nova

Note: this is only needed on ceph1, but it is better to keep the same ceph.conf on all nodes.

On ceph1 setup permissions on the keyring:

chown nova:nova /etc/ceph/keyring.nova

On ceph1, install nova-volume:

# apt-get install nova-volume

On ceph1 and ceph2, restart the compute service:

# service nova-compute restart

On ceph1 restart nova volume

# service nova-volume restart

On ceph1, test listing the pools as the nova user, the way nova-volume does:

sudo -u nova CEPH_ARGS="--id nova" rados lspools

On ceph3 check that compute and volume are running:

# nova-manage service list
Binary           Host                                 Zone             Status     State Updated_At
nova-consoleauth ceph3               nova             enabled    :-)   2012-07-18 10:00:15
nova-scheduler   ceph3               nova             enabled    :-)   2012-07-18 10:00:18
nova-cert        ceph3               nova             enabled    :-)   2012-07-18 10:00:17
nova-compute     ceph1               nova             enabled    :-)   2012-07-18 10:00:18
nova-compute     ceph2               nova             enabled    :-)   2012-07-18 10:00:16
nova-network     ceph1               nova             enabled    :-)   2012-07-18 10:00:18
nova-network     ceph2               nova             enabled    :-)   2012-07-18 10:00:16
nova-volume      ceph1               nova             enabled    :-)   2012-07-18 10:00:16      
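
This check can be automated by looking at the State column. The sketch below assumes the column layout shown above; the function name down_services is illustrative, and the second line of the sample (with an XXX state, which nova-manage prints for services that stopped reporting) is made up for the example.

```python
def down_services(service_list_output):
    """Return (binary, host) pairs whose State column is not ':-)'."""
    down = []
    for line in service_list_output.strip().splitlines()[1:]:  # skip header
        fields = line.split()
        # Columns: Binary, Host, Zone, Status, State, Updated_At (date, time)
        if len(fields) >= 5 and fields[4] != ":-)":
            down.append((fields[0], fields[1]))
    return down

# Hypothetical output: the nova-volume line is altered for illustration.
sample = """\
Binary           Host                                 Zone             Status     State Updated_At
nova-compute     ceph1               nova             enabled    :-)   2012-07-18 10:00:18
nova-volume      ceph1               nova             enabled    XXX   2012-07-18 09:00:16
"""

print(down_services(sample))
```

An empty list means that every service, including nova-volume on ceph1, is reporting correctly.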

On ceph3 create a volume in nova

# nova volume-create --display_name first-rbd-volume 5
# nova volume-list
| ID |   Status  |   Display Name   | Size | Volume Type | Attached to |
| 1  | available | first-rbd-volume |  5   |     None    |             |

On ceph1 check that the rbd volume exists:

# rbd --pool nova ls

On ceph3 attach it on a VM (here the vm is named testvm):

# nova volume-attach testvm 1 /dev/vdb
# nova volume-list
| ID |   Status  |   Display Name   | Size | Volume Type | Attached to                          |
| 1  |   in-use  | first-rbd-volume |  5   |     None    | fdebd660-27b1-41a7-a1f7-bdb80ae5d7d5 |

Log in to the VM testvm:

# ssh root@testvm
# dmesg | tail
[ 7737.155376]  vdb: unknown partition table
# mkfs.ext4 /dev/vdb
mke2fs 1.41.14 (22-Dec-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
327680 inodes, 1310720 blocks
65536 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=1342177280
40 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 31 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

# mount /dev/vdb /mnt
# dd if=/dev/zero of=/mnt/big_file bs=4k count=100000
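
The figures in this transcript can be cross-checked with a little arithmetic: the block count reported by mkfs.ext4 should match the 5 GB volume created in nova, and the dd test file should fit well within it. A small sketch using only the numbers printed above:

```python
BLOCK_SIZE = 4096   # "Block size=4096" in the mkfs.ext4 output
BLOCKS = 1310720    # "1310720 blocks" in the mkfs.ext4 output

# Size of /dev/vdb as seen by the filesystem, in bytes and in GiB.
volume_bytes = BLOCK_SIZE * BLOCKS
volume_gib = volume_bytes // 1024 ** 3
print(volume_bytes, volume_gib)

# Size of the file written by dd: bs=4k count=100000.
dd_bytes = 4 * 1024 * 100000
print(dd_bytes, dd_bytes < volume_bytes)
```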

It seems to work fine :D

Upgrade to ceph 0.48 - First stable release (Optional)

On some architectures the sid version is still 0.47 (because of a compilation issue), as of 19 Jul 2012.

On all nodes, add this line to /etc/apt/sources.list

deb sid main

On all nodes, type:

# apt-get update
# apt-get install -y ceph ceph-common ceph-fuse libcephfs1

On all nodes, comment out the sid reference in /etc/apt/sources.list:

# deb sid main

On all nodes, then type:

# apt-get update

On ceph1 restart some services:

# /etc/init.d/ceph -a restart
# /etc/init.d/radosgw restart
# /etc/init.d/apache2 restart

On ceph1 check cluster status:

# ceph -s
   health HEALTH_OK
   monmap e1: 3 mons at {1=,2=,3=}, election epoch 14, quorum 0,1,2 1,2,3
   osdmap e43: 6 osds: 6 up, 6 in
    pgmap v2172: 6400 pgs: 4964 active+clean, 1436 active+clean+replay; 2053 bytes data, 3393 MB used, 1512 GB / 1516 GB avail
   mdsmap e1: 0/0/1 up
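
After the upgrade the status output has a new layout. To confirm that every placement group is accounted for, the pgmap line can be parsed as sketched below; the function name pg_states is illustrative and the regular expression is only matched against the sample line from this howto.

```python
import re

def pg_states(status_text):
    """Return (total_pgs, {state: count}) from the pgmap line of `ceph -s`."""
    match = re.search(r"pgmap v\d+: (\d+) pgs: ([^;]+);", status_text)
    if match is None:
        return None
    total = int(match.group(1))
    states = {}
    for item in match.group(2).split(","):
        count, state = item.split()
        states[state] = int(count)
    return total, states

sample = ("    pgmap v2172: 6400 pgs: 4964 active+clean, "
          "1436 active+clean+replay; 2053 bytes data, 3393 MB used, "
          "1512 GB / 1516 GB avail")

total, states = pg_states(sample)
print(total, states)
print(sum(states.values()) == total)
```

In the sample every placement group is in an active+clean state (some still replaying right after the restart), which is consistent with HEALTH_OK.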