HOWTO: Ceph as Openstack nova-volumes/swift backend on Debian GNU/Linux wheezy
This howto is not finished yet; don't use it for now.
This howto aims to provide guidelines to install Ceph and to use Ceph as nova-volumes/swift backends for Openstack.
The environment includes the following software:
The Ceph Gateway node <ceph1>:
- the Rados Gateway (radosgw)
Three Ceph nodes <cephX> with:
- the Object Storage (OSD)
- the Monitoring Daemon (MON)
- the RADOS Block Device (RBD)
An OpenStack “proxy” or "management" node <ceph1>
And one or more pure OpenStack “compute” nodes <cephX>
All OpenStack nodes have already been set up with the OpenStack Howto or the Puppet based HOWTO.
In the rest of this howto, X ranges from 1 to 3.
DOCUMENT CONVENTIONS
In formatted blocks:
command lines starting with a # must be run as root.
values between < and > must be replaced by your values.
PREREQUISITES
Things to prepare beforehand:
- 3 Servers:
  - each should have two network interfaces to ensure security; if only one interface is used the private part is more exposed to attacks coming from the public part:
    a _public_ one to communicate with the outside world (192.168.X.X)
    a _private_ one for the guest VLANs (10.X.X.Y)
- Network:
  - public network (192.168.X.X)
  - private network (10.X.X.Y). If the machines are not on a LAN, create one with OpenVPN.
- Disk:
- sda1 for / and *MON*
- sda2 for *OSD*
- sdb1 for swap
- sdb2 for *OSD*
- Distribution:
- Debian GNU/Linux wheezy
Technical Choices
Only the OSD and MON daemons of Ceph will be configured, because the metadata daemon (MDS) is only needed when using CephFS.
Installation
Configuring Ceph on all nodes
On *each Ceph node* do:
# apt-get install -y ceph
Configure Ceph (/etc/ceph/ceph.conf) with the following configuration:
[global]
auth supported = cephx
keyring = /etc/ceph/keyring.admin

[osd]
osd data = /srv/ceph/osd$id
osd journal = /srv/ceph/osd$id/journal
osd journal size = 512
keyring = /etc/ceph/keyring.$name
; working with ext4 (sileht: disable because xfs is used)
;filestore xattr use omap = true
; solve rbd data corruption (sileht: disable by default in 0.48)
filestore fiemap = false

[osd.11]
host = ceph1
osd addr = 10.X.X.1
devs = /dev/sda2

[osd.12]
host = ceph1
osd addr = 10.X.X.1
devs = /dev/sdb2

[osd.21]
host = ceph2
osd addr = 10.X.X.2
devs = /dev/sda2

[osd.22]
host = ceph2
osd addr = 10.X.X.2
devs = /dev/sdb2

[osd.31]
host = ceph3
osd addr = 10.X.X.3
devs = /dev/sda2

[osd.32]
host = ceph3
osd addr = 10.X.X.3
devs = /dev/sdb2

[mon]
mon data = /srv/ceph/mon$id

[mon.1]
host = ceph1
mon addr = 10.X.X.1:6789

[mon.2]
host = ceph2
mon addr = 10.X.X.2:6789

[mon.3]
host = ceph3
mon addr = 10.X.X.3:6789
Prepare the fstab to match the ceph.conf file by adding the following lines for node cephX:
/dev/sda2 /srv/ceph/osdX1 xfs rw,noexec,nodev,noatime,nodiratime,barrier=0 0 0
/dev/sdb2 /srv/ceph/osdX2 xfs rw,noexec,nodev,noatime,nodiratime,barrier=0 0 0
for ceph1 you have:
/dev/sda2 /srv/ceph/osd11 xfs rw,noexec,nodev,noatime,nodiratime,barrier=0 0 0
/dev/sdb2 /srv/ceph/osd12 xfs rw,noexec,nodev,noatime,nodiratime,barrier=0 0 0
for ceph2 you have:
/dev/sda2 /srv/ceph/osd21 xfs rw,noexec,nodev,noatime,nodiratime,barrier=0 0 0
/dev/sdb2 /srv/ceph/osd22 xfs rw,noexec,nodev,noatime,nodiratime,barrier=0 0 0
for ceph3 you have:
/dev/sda2 /srv/ceph/osd31 xfs rw,noexec,nodev,noatime,nodiratime,barrier=0 0 0
/dev/sdb2 /srv/ceph/osd32 xfs rw,noexec,nodev,noatime,nodiratime,barrier=0 0 0
Create the mount points on each node:
on ceph1:
mkdir -p /srv/ceph/{mon1,osd1{1,2}}
on ceph2:
mkdir -p /srv/ceph/{mon2,osd2{1,2}}
on ceph3:
mkdir -p /srv/ceph/{mon3,osd3{1,2}}
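The OSD partitions also need an xfs filesystem before they can be mounted on the directories created above. A minimal sketch for each node, assuming /dev/sda2 and /dev/sdb2 are dedicated to the OSDs and may be (re)formatted (this step is an assumption, not part of the original howto):

# apt-get install -y xfsprogs
# mkfs.xfs -f /dev/sda2
# mkfs.xfs -f /dev/sdb2
# mount -a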
The next steps are run on one node of your choice only.
Ensure you don't need a password for ssh between the nodes:
# ssh cephX uname -a
Linux cephX.domain.ltd 3.2.0-3-amd64 #1 SMP Thu Jun 28 09:07:26 UTC 2012 x86_64 GNU/Linux
If not, create an ssh key pair and copy it to all the nodes:
# ssh-keygen
# cat .ssh/id_rsa.pub >> .ssh/authorized_keys
# rsync -r .ssh/ root@ceph2:.ssh/
# rsync -r .ssh/ root@ceph3:.ssh/
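A quick way to confirm that passwordless ssh now works from this node to every Ceph node (a convenience check, not in the original howto):

# for n in ceph1 ceph2 ceph3; do ssh $n hostname; done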
Create the ceph cluster:
# mkcephfs -a -c /etc/ceph/ceph.conf -k /etc/ceph/keyring.admin
temp dir is /tmp/mkcephfs.oqB5qpHXEi
preparing monmap in /tmp/mkcephfs.oqB5qpHXEi/monmap
/usr/bin/monmaptool --create --clobber --add 1 169.254.6.21:6789 --add 2 169.254.6.22:6789 --add 3 169.254.6.23:6789 --print /tmp/mkcephfs.oqB5qpHXEi/monmap
/usr/bin/monmaptool: monmap file /tmp/mkcephfs.oqB5qpHXEi/monmap
/usr/bin/monmaptool: generated fsid e0a0b83d-f188-4baf-82f2-3102fbb1c194
epoch 0
fsid e0a0b83d-f188-4baf-82f2-3102fbb1c194
last_changed 2012-07-17 08:45:35.681299
created 2012-07-17 08:45:35.681299
0: 169.254.6.21:6789/0 mon.1
1: 169.254.6.22:6789/0 mon.2
2: 169.254.6.23:6789/0 mon.3
/usr/bin/monmaptool: writing epoch 0 to /tmp/mkcephfs.oqB5qpHXEi/monmap (3 monitors)
=== osd.11 ===
2012-07-17 08:45:35.792982 7fe7bcf55780 created object store /srv/ceph/osd11 journal /srv/ceph/osd11/journal for osd.11 fsid e0a0b83d-f188-4baf-82f2-3102fbb1c194
creating private key for osd.11 keyring /etc/ceph/keyring.admin
creating /etc/ceph/keyring.admin
...
2012-07-17 08:46:08.993851 7f165d1a6760 adding osd.21 at {host=ceph2,pool=default,rack=unknownrack}
2012-07-17 08:46:08.993895 7f165d1a6760 adding osd.22 at {host=ceph2,pool=default,rack=unknownrack}
2012-07-17 08:46:08.993926 7f165d1a6760 adding osd.31 at {host=ceph3,pool=default,rack=unknownrack}
2012-07-17 08:46:08.993956 7f165d1a6760 adding osd.32 at {host=ceph3,pool=default,rack=unknownrack}
/usr/bin/osdmaptool: writing epoch 1 to /tmp/mkcephfs.oqB5qpHXEi/osdmap
Generating admin key at /tmp/mkcephfs.oqB5qpHXEi/keyring.admin
creating /tmp/mkcephfs.oqB5qpHXEi/keyring.admin
Building initial monitor keyring
added entity osd.11 auth auth(auid = 18446744073709551615 key=AQAPCgVQqGJCMBAA6y4blmINAgB+nrX3wPla2Q== with 0 caps)
added entity osd.12 auth auth(auid = 18446744073709551615 key=AQAPCgVQeKF9NhAAM7EPeskDwikMl1vPi2pWpw== with 0 caps)
added entity osd.21 auth auth(auid = 18446744073709551615 key=AQAWCgVQKKOpAxAAHe7W7KyASI2xnkdOilzSFQ== with 0 caps)
added entity osd.22 auth auth(auid = 18446744073709551615 key=AQAhCgVQiC4aLxAA1pT/rOUHg07MLablCnlppg== with 0 caps)
added entity osd.31 auth auth(auid = 18446744073709551615 key=AQAmCgVQWLCnIhAA692Rhs2rws8yQLrT8vXaBw== with 0 caps)
added entity osd.32 auth auth(auid = 18446744073709551615 key=AQAyCgVQQLIrFBAA/2lJMVPzsBFypCihJubdxg== with 0 caps)
=== mon.1 ===
/usr/bin/ceph-mon: created monfs at /srv/ceph/mon1 for mon.1
=== mon.2 ===
pushing everything to ceph2
/usr/bin/ceph-mon: created monfs at /srv/ceph/mon2 for mon.2
=== mon.3 ===
pushing everything to ceph3
/usr/bin/ceph-mon: created monfs at /srv/ceph/mon3 for mon.3
placing client.admin keyring in /etc/ceph/keyring.admin
Start Ceph on all nodes:
# /etc/init.d/ceph -a start
Check the status of Ceph
# ceph -k /etc/ceph/keyring.admin -c /etc/ceph/ceph.conf health
2012-07-17 08:47:56.026981 mon <- [health]
2012-07-17 08:47:56.027389 mon.0 -> 'HEALTH_OK' (0)
# ceph -s
2012-07-17 13:30:28.537300    pg v1228: 6542 pgs: 6542 active+clean; 16 bytes data, 3387 MB used, 1512 GB / 1516 GB avail
2012-07-17 13:30:28.552231   mds e1: 0/0/1 up
2012-07-17 13:30:28.552267   osd e10: 6 osds: 6 up, 6 in
2012-07-17 13:30:28.552389   log 2012-07-17 10:21:54.329413 osd.31 10.X.X.3:6800/31088 1233 : [INF] 3.4 scrub ok
2012-07-17 13:30:28.552492   mon e1: 3 mons at {1=10.X.X.1:6789/0,2=10.X.X.2:6789/0,3=10.X.X.3:6789/0}
Testing RBD backend
Get auth key
# ceph-authtool --print-key /etc/ceph/keyring.admin | tee client.admin
AQADEQVQyAevJhAAnZAcUsmuf8tSLp+7jgXglQ==
Create a pool and a volume in it
# rados lspools
data
metadata
rbd
# rados mkpool nova
# rados lspools
data
metadata
rbd
nova
# rbd --pool nova create --size 1024 rbd-test
# rbd --pool nova ls
rbd-test
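Optionally, the image parameters can be double-checked before mapping it (an extra verification step assumed here, not in the original howto):

# rbd --pool nova info rbd-test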
Prepare and mount on a node:
# modprobe rbd
# rbd map rbd-test --pool nova --name client.admin --secret client.admin
# dmesg | tail
...
[63851.029151] rbd: rbd0: added with size 0x40000000
[66908.383667] libceph: client0 fsid 95d8f4b8-01d8-4b0d-8534-d4f1d32120c9
[66908.384701] libceph: mon2 169.254.6.21:6789 session established
[66908.387263]  rbd0: unknown partition table
# mkfs.btrfs /dev/rbd0
# mount /dev/rbd0 /mnt
# touch /mnt/rbd-test
# ls /mnt/
rbd-test
# rbd showmapped
id      pool    image      snap    device
0       nova    rbd-test   -       /dev/rbd0
Cleanup test
# umount /mnt
# rbd unmap /dev/rbd/nova/rbd-test
# rbd --pool nova rm rbd-test
Removing image: 100% complete...done.
# rbd --pool nova ls
RADOS gateway
It will be installed on the ceph1 server. The following configures Apache, the radosgw FastCGI script, and the authentication between the radosgw FastCGI script and Ceph.
Configuration of ceph:
In /etc/ceph/ceph.conf add this:
[client.radosgw.gateway]
host = ceph1
keyring = /etc/ceph/keyring.radosgw.gateway
rgw socket path = /tmp/radosgw.sock
log file = /var/log/ceph/radosgw.log
Copy the file to all the nodes:
# scp /etc/ceph/ceph.conf ceph2:/etc/ceph/ceph.conf
# scp /etc/ceph/ceph.conf ceph3:/etc/ceph/ceph.conf
Installation and configuration of apache2 on ceph1
# apt-get install apache2 libapache2-mod-fastcgi radosgw
# a2enmod rewrite
# /etc/init.d/apache2 restart
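On Debian the fastcgi module is normally enabled when libapache2-mod-fastcgi is installed; if it is not, enabling it by hand is a small extra step (an assumption, not part of the original howto):

# a2enmod fastcgi
# /etc/init.d/apache2 restart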
Prepare the Apache virtual host in the file /etc/apache2/sites-available/rgw.conf:
<VirtualHost *:80>
    ServerName ceph1.fqdn.tld
    ServerAdmin root@ceph1
    DocumentRoot /var/www

    # rewriting rules only needed for Amazon S3
    RewriteEngine On
    RewriteRule ^/([a-zA-Z0-9-_.]*)([/]?.*) /s3gw.fcgi?page=$1&params=$2&%{QUERY_STRING} [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]

    FastCgiExternalServer /var/www/s3gw.fcgi -socket /tmp/radosgw.sock

    <IfModule mod_fastcgi.c>
        <Directory /var/www>
            Options +ExecCGI
            AllowOverride All
            SetHandler fastcgi-script
            Order allow,deny
            Allow from all
            AuthBasicAuthoritative Off
        </Directory>
    </IfModule>

    AllowEncodedSlashes On
    ErrorLog /var/log/apache2/error.log
    CustomLog /var/log/apache2/access.log combined
    ServerSignature Off
</VirtualHost>
Create the fcgi script /var/www/s3gw.fcgi:
#!/bin/sh
exec /usr/bin/radosgw -c /etc/ceph/ceph.conf -n client.radosgw.gateway
And make it executable:
# chmod +x /var/www/s3gw.fcgi
Enable the RADOS gateway VirtualHost and disable the default one.
# a2ensite rgw.conf
# a2dissite default
Create the keyring for RADOS gateway:
# ceph-authtool --create-keyring /etc/ceph/keyring.radosgw.gateway
# chmod +r /etc/ceph/keyring.radosgw.gateway
Generate a new key for RADOS Gateway in the keyring
# ceph-authtool /etc/ceph/keyring.radosgw.gateway -n client.radosgw.gateway --gen-key
# ceph-authtool -n client.radosgw.gateway --cap osd 'allow rwx' --cap mon 'allow r' /etc/ceph/keyring.radosgw.gateway
Copy this key into the main Ceph keyring (a mon must be running on at least one node to do this):
# ceph -k /etc/ceph/keyring.admin auth add client.radosgw.gateway -i /etc/ceph/keyring.radosgw.gateway
2012-07-17 18:12:33.216484 7f8a142e8760 read 117 bytes from /etc/ceph/keyring.rados.gateway
2012-07-17 18:12:33.218728 mon <- [auth,add,client.rados.gateway]
2012-07-17 18:12:33.221727 mon.0 -> 'added key for client.rados.gateway' (0)
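To confirm the key and its capabilities were registered in the cluster, the auth entries can be listed (a verification step assumed here, not in the original howto; the exact output depends on the Ceph version):

# ceph -k /etc/ceph/keyring.admin auth list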
Restart all the services:
# service ceph restart
# service apache2 restart
# service radosgw start
Create a user to use the S3 REST API:
# radosgw-admin user create --uid="testuser" --display-name="First User"
2012-07-17 18:35:30.571933 7fe4f45bd780 cache put: name=.users.uid+testuser
2012-07-17 18:35:30.572058 7fe4f45bd780 adding .users.uid+testuser to cache LRU end
2012-07-17 18:35:30.572083 7fe4f45bd780 distributing notification oid=notify bl.length()=378
2012-07-17 18:35:30.572736 7fe4ee785700 RGWWatcher::notify() opcode=1 ver=1 bl.length()=378
2012-07-17 18:35:30.572765 7fe4ee785700 cache put: name=.users.uid+testuser
2012-07-17 18:35:30.572771 7fe4ee785700 moving .users.uid+testuser to cache LRU end
2012-07-17 18:35:32.574032 7fe4f45bd780 cache put: name=.users+J7ATWD6EXOEYSD4B6AOF
2012-07-17 18:35:32.574054 7fe4f45bd780 adding .users+J7ATWD6EXOEYSD4B6AOF to cache LRU end
2012-07-17 18:35:32.574070 7fe4f45bd780 distributing notification oid=notify bl.length()=390
2012-07-17 18:35:32.574813 7fe4ee785700 RGWWatcher::notify() opcode=1 ver=1 bl.length()=390
2012-07-17 18:35:32.574838 7fe4ee785700 cache put: name=.users+J7ATWD6EXOEYSD4B6AOF
2012-07-17 18:35:32.574844 7fe4ee785700 moving .users+J7ATWD6EXOEYSD4B6AOF to cache LRU end
{ "user_id": "testuser",
  "rados_uid": 0,
  "display_name": "First User",
  "email": "",
  "suspended": 0,
  "subusers": [],
  "keys": [
    { "user": "testuser",
      "access_key": "J7ATWD6EXOEYSD4B6AOF",
      "secret_key": "1M2OiTEVL4CviMVdXoj17HL8jTeqHTrk6MW+UBsN"}],
  "swift_keys": []}
Add a key to this user to use swift (note that the subuser name has two parts separated by a colon; this is mandatory for swift auth):
# radosgw-admin key create --uid testuser --subuser testuser:swift --key-type=swift
2012-07-18 09:08:28.838597 7ff37098f780 get_obj_state: rctx=0x7ff364001530 obj=.users.uid:testuser state=0x7ff364001758 s->prefetch_data=0
2012-07-18 09:08:28.838748 7ff37098f780 cache get: name=.users.uid+testuser : miss
2012-07-18 09:08:28.839317 7ff37098f780 cache put: name=.users.uid+testuser
....
2012-07-18 09:08:29.942508 7ff36ab57700 RGWWatcher::notify() opcode=1 ver=1 bl.length()=513
2012-07-18 09:08:29.942543 7ff36ab57700 cache put: name=.users.swift+testuser:
2012-07-18 09:08:29.942550 7ff36ab57700 moving .users.swift+testuser: to cache LRU end
{ "user_id": "testuser",
  "rados_uid": 0,
  "display_name": "First User",
  "email": "",
  "suspended": 0,
  "subusers": [],
  "keys": [
    { "user": "testuser",
      "access_key": "J7ATWD6EXOEYSD4B6AOF",
      "secret_key": "1M2OiTEVL4CviMVdXoj17HL8jTeqHTrk6MW+UBsN"}],
  "swift_keys": [
    { "user": "testuser:swift",
      "secret_key": "Cz9D3Ugx1P5RRWxwwgppAd9c4J5zBWXJwCWFJobZ"}]}
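If you need to display the user's access and secret keys again later, the user record can be dumped (a convenience step assumed here, not in the original howto):

# radosgw-admin user info --uid=testuser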
Create a Python script s3test.py to test the S3 connection:
import boto
import boto.s3.connection

access_key = 'J7ATWD6EXOEYSD4B6AOF'
secret_key = '1M2OiTEVL4CviMVdXoj17HL8jTeqHTrk6MW+UBsN'

conn = boto.connect_s3(
    aws_access_key_id = access_key,
    aws_secret_access_key = secret_key,
    host = 'ceph1',
    is_secure = False,
    calling_format = boto.s3.connection.OrdinaryCallingFormat(),
)

bucket = conn.create_bucket('my-new-bucket')
for bucket in conn.get_all_buckets():
    print "{name}\t{created}".format(
        name = bucket.name,
        created = bucket.creation_date,
    )
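The script needs the boto library; on Debian wheezy it is provided by the python-boto package (an assumed prerequisite, not mentioned in the original howto):

# apt-get install -y python-boto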
Try it:
# python s3test.py
my-new-bucket   2012-07-17T17:57:10.000Z
Test the swift connection
# apt-get install -y swift
# swift -A http://ceph1/auth/1.0 -U testuser:swift -K "Cz9D3Ugx1P5RRWxwwgppAd9c4J5zBWXJwCWFJobZ" list
my-new-bucket
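To go a bit further than listing containers, a quick round-trip through the gateway can be sketched with the same credentials (an assumed example, not part of the original howto; the file name is arbitrary):

# echo "hello ceph" > hello.txt
# swift -A http://ceph1/auth/1.0 -U testuser:swift -K "Cz9D3Ugx1P5RRWxwwgppAd9c4J5zBWXJwCWFJobZ" upload my-new-bucket hello.txt
# swift -A http://ceph1/auth/1.0 -U testuser:swift -K "Cz9D3Ugx1P5RRWxwwgppAd9c4J5zBWXJwCWFJobZ" list my-new-bucket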