Autocluster is a set of scripts for building virtual clusters to test
clustered Samba. It uses Linux's libvirt and KVM virtualisation
technology.
Autocluster is a collection of scripts, templates and configuration
files that allow you to create a cluster of virtual nodes very
quickly. You can create a cluster from scratch in less than 30
minutes. Once you have a base image you can then recreate a cluster
or create new virtual clusters in minutes.
The current implementation creates virtual clusters of RHEL5/6 nodes.
INSTALLING AUTOCLUSTER
======================
Before you start, make sure you have the latest version of
autocluster. To download autocluster do this:
  git clone git://git.samba.org/tridge/autocluster.git autocluster
Or to update it, run "git pull" in the autocluster directory.
You probably want to add the directory where autocluster is installed
to your PATH, otherwise things may quickly become tedious.
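For example, a minimal sketch, assuming you cloned autocluster into
your home directory (adjust the path to wherever you installed it):

  # Add to ~/.bashrc or equivalent to make it permanent
  export PATH="$HOME/autocluster:$PATH"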
HOST MACHINE SETUP
==================

This section explains how to set up a host machine to run virtual
clusters generated by autocluster.
1) Install and configure required software.
a) Install kvm, libvirt and expect.
Autocluster creates virtual machines that use libvirt to run under
KVM. This means that you will need to install both KVM and libvirt
on your host machine. Expect is used by the "waitfor" script and
should be available for installation from your distribution.
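For example, on RHEL6 a sketch of the installation is (package names
are an assumption and vary between distributions and releases):

  yum install qemu-kvm libvirt expect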
Autocluster should work with the standard RHEL6 qemu-kvm and
libvirt packages. However, you'll need to tell autocluster
where the KVM executable is:

  KVM=/usr/libexec/qemu-kvm
For RHEL5/CentOS5, useful packages for both kvm and libvirt used
to be available at:

  http://www.lfarkas.org/linux/packages/centos/5/x86_64/
However, since recent versions of RHEL5 ship with KVM, 3rd party
KVM RPMs for RHEL5 are now scarce.
RHEL5.4's KVM also has problems when autocluster uses virtio
shared disks, since multipath doesn't notice virtio disks. This
is fixed in RHEL5.6 and in a recent RHEL5.5 update - you should
be able to use the settings recommended above for RHEL6.
If you're still running RHEL5.4, have lots of time and disk
space, and like complexity, then see the sections below on "iSCSI
shared disks" and "Raw IDE system disks".
Useful packages ship with Fedora Core 10 (Cambridge) and later.
Some of the above notes on RHEL might apply to Fedora's KVM.
Useful packages ship with Ubuntu 8.10 (Intrepid Ibex) and later.
In recent Ubuntu versions (e.g. 10.10 Maverick Meerkat) the KVM
package is called "qemu-kvm". Older versions have a package
called "kvm".
For other distributions you'll have to backport distro sources or
compile from upstream source as described below.
* For KVM see the "Downloads" and "Code" sections at:

    http://www.linux-kvm.org/
b) Install guestfish or qemu-nbd and nbd-client.
Recent Linux distributions, including RHEL since 6.0, contain
guestfish. Guestfish (see http://libguestfs.org/ - binary packages
are available there for several distros) is a CLI for manipulating
KVM/QEMU disk images. Autocluster supports guestfish, so if
guestfish is available then you should use it. It should be more
reliable than NBD.
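If you need to install it, a sketch for RHEL6/Fedora (the package
name is an assumption - check your distribution's package lists):

  yum install libguestfs-tools
  guestfish --version    # confirm it is installed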
Autocluster attempts to use the best available method (guestmount
-> guestfish -> loopback) for accessing disk images. If it chooses
a suboptimal method, you can force the method:

  SYSTEM_DISK_ACCESS_METHOD=guestfish
If you can't use guestfish then you'll have to use NBD. For this
you will need the qemu-nbd and nbd-client programs, which
autocluster uses to loopback-nbd-mount the disk images when
configuring each node.
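For background, here is a rough sketch of what a loopback NBD mount
involves (illustrative only - autocluster drives this for you, and
the port, image path and device node here are arbitrary):

  modprobe nbd                                 # provide /dev/nbd0
  qemu-nbd --port 1025 /virtual/c1n1.qcow2 &   # export the image over NBD
  nbd-client localhost 1025 /dev/nbd0          # attach it to a device node
  mount /dev/nbd0 /mnt                         # mount the image's filesystem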
NBD for various distros:
qemu-nbd is only available in the old packages from lfarkas.org.
Recompiling the RHEL5 kvm package to support NBD is quite
straightforward. RHEL6 doesn't have an NBD kernel module, so it is
harder to retrofit for NBD support - use guestfish instead.
Unless you can find an RPM for nbd-client, you will need to
download the source from:

  http://sourceforge.net/projects/nbd/
qemu-nbd is in the qemu-kvm or kvm package.

nbd-client is in the nbd package.
qemu-nbd is in the qemu-kvm or kvm package. In older releases it
is called kvm-nbd, so you need to set the QEMU_NBD configuration
variable.

nbd-client is in the nbd-client package.
* As mentioned above, nbd can be found at:

    http://sourceforge.net/projects/nbd/
c) Environment and libvirt virtual networks
You will need to add the autocluster directory to your PATH.
You will need to configure the right kvm networking setup. The
files in host_setup/etc/libvirt/qemu/networks/ should help. This
command will install the right networks for kvm:

  rsync -av --delete host_setup/etc/libvirt/qemu/networks/ /etc/libvirt/qemu/networks/
Note that you'll need to edit the installed files to reflect any
changes to IPBASE, IPNET0, IPNET1, IPNET2 away from the defaults.
This is also true for named.conf.local and squid.conf (see below).
After this you might need to reload libvirt:

  /etc/init.d/libvirtd reload
You might also need to set:

  VIRSH_DEFAULT_CONNECT_URI=qemu:///system

in your environment so that virsh does KVM/QEMU things by default.
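For example, a sketch for making this persistent (assuming bash):

  # In ~/.bashrc or equivalent
  export VIRSH_DEFAULT_CONNECT_URI=qemu:///system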
2) If your install server is far away then you may need a caching web
proxy on your local network.
If you don't have one, then you can install a squid proxy on your
host and then set:

  WEBPROXY="http://10.0.0.1:3128/"
See host_setup/etc/squid/squid.conf for a sample config suitable
for a virtual cluster. Make sure it caches large objects and has
plenty of space. This will be needed to make downloading all the
RPMs to each client sane.
To test your squid setup, run a command like this:

  http_proxy=http://10.0.0.1:3128/ wget <some-url>
Check your firewall setup. If you have problems accessing the
proxy from your nodes (including from kickstart postinstall) then
check it again! Some distributions install nice "convenient"
firewalls by default that might block access to the squid port
from the nodes. On a current version of Fedora Core you may be
able to run system-config-firewall-tui to reconfigure the
firewall.
3) Set up a DNS server on your host. See host_setup/etc/bind/ for a
sample config that is suitable. It needs to redirect DNS queries
for your virtual domain to your Windows domain controller.
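For example, mirroring the rsync used for the libvirt networks above,
a sketch (the destination path and service name are assumptions -
bind's layout varies between distributions, and you must edit the
files for your setup first):

  rsync -av host_setup/etc/bind/ /etc/bind/
  /etc/init.d/named restart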
4) Download a RHEL install ISO.
CREATING A CLUSTER
==================

A cluster comprises a single base disk image, a copy-on-write disk
image for each node and some XML files that tell libvirt about each
node's virtual hardware configuration. The copy-on-write disk images
save a lot of disk space on the host machine because they each use the
base disk image - without them the disk image for each cluster node
would need to contain the entire RHEL install.
The cluster creation process can be broken down into 2 main steps:

1) Creating the base disk image.

2) Creating the per-node disk images and corresponding XML files.
However, before you do this you will need to create a configuration
file. See the "CONFIGURATION" section below for more details.
Here are more details on the "create cluster" process. Note that
unless you have done something extra special then you'll need to run
these commands as root.
1) Create the base disk image using:

  ./autocluster create base
The first thing this step does is to check that it can connect to
the YUM server. If this fails make sure that there are no
firewalls blocking your access to the server.
The install will take about 10 to 15 minutes and you will see the
packages installing in your terminal.
The installation process uses kickstart. The choice of postinstall
script is set using the POSTINSTALL_TEMPLATE variable. An example
is provided in base/all/root/scripts/gpfs-nas-postinstall.sh.
It makes sense to install packages that will be common to all
nodes into the base image. This saves time later when you're
setting up the cluster nodes. However, you don't have to do this
- you can set POSTINSTALL_TEMPLATE to "" instead - but then you
will lose the quick cluster creation/setup that is a major feature
of autocluster.
When that has finished you should mark that base image immutable,
like this:

  chattr +i /virtual/ac-base.img
That will ensure it won't change. This is a precaution as the
image will be used as a basis file for the per-node images, and if
it changes your cluster will become corrupt.
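You can verify the flag with lsattr:

  lsattr /virtual/ac-base.img    # an 'i' should appear in the flags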
2) Now run "autocluster create cluster", specifying a cluster name.
For example:

  autocluster create cluster c1
This will create and install the XML node descriptions and the
disk images for your cluster nodes, and any other nodes you have
configured. Each disk image is initially created as an "empty"
copy-on-write image, which is linked to the base image. Those
images are then accessed using guestfish or a loopback NBD mount,
and populated with system configuration files and other potentially
useful things (such as scripts).
BOOTING A CLUSTER
=================

At this point the cluster has been created but isn't yet running.
Autocluster provides a command called "vircmd", which is a thin
wrapper around libvirt's virsh command. vircmd takes a cluster name
instead of a node/domain name and runs the requested command on all
nodes in the cluster.
1) Now boot your cluster nodes like this:

  vircmd start c1
The most useful vircmd commands are:

  start    : boot a node
  shutdown : graceful shutdown of a node
  destroy  : power off a node immediately
2) You can watch boot progress like this:

  tail -f /var/log/kvm/serial.c1*
All the nodes have serial consoles, making it easier to capture
kernel panic messages and watch the nodes via ssh.
POST-CREATION SETUP
===================

Now you have a cluster of nodes, which might have a variety of
packages installed and configured in a common way. Now that the
cluster is up and running you might need to configure specialised
subsystems like GPFS or Samba. You can do this by hand or use the
sample scripts/configurations that are provided.
Now you can ssh into your nodes. You may like to look at the small
set of scripts in /root/scripts on the nodes. In particular:
  mknsd.sh           : sets up the local shared disks as GPFS NSDs
  setup_gpfs.sh      : sets up GPFS, creates a filesystem etc
  setup_cluster.sh   : sets up clustered Samba and other NAS services
  setup_tsm_server.sh: run this on the TSM node to set up the TSM server
  setup_tsm_client.sh: run this on the GPFS nodes to set up HSM
  setup_ad_server.sh : run this on a node to set up a Samba4 AD
To set up a clustered NAS system you will normally need to run
setup_gpfs.sh and setup_cluster.sh on one of the nodes.
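For example, a sketch (the node hostname is hypothetical - it depends
on your cluster name and DNS setup):

  ssh root@c1n1 /root/scripts/setup_gpfs.sh
  ssh root@c1n1 /root/scripts/setup_cluster.sh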
AUTOMATED CLUSTER CREATION
==========================
The last 2 steps can be automated. An example script for doing this
can be found in examples/create_cluster.sh.
CONFIGURATION
=============

Autocluster uses configuration files containing Unix shell style
variables. For example,

  FIRSTIP=30

indicates that the last octet of the first IP address in the cluster
will be 30. If an option contains multiple words then they will be
separated by underscores ('_'), as in ISO_DIR.
All options have an equivalent command-line option, such as
--firstip=30. Command-line options are lowercase, with words
separated by dashes ('-'), as in --iso-dir.
Normally you would use a configuration file with variables so that you
can repeat steps easily. The command-line equivalents are useful for
trying things out without resorting to an editor. You can specify a
configuration file to use on the autocluster command-line using the -c
option. For example:

  autocluster -c config-foo create base
If you don't provide a configuration file then autocluster will
look for a file called "config" in the current directory.
You can also use environment variables to override the default values
of configuration variables. However, both command-line options and
configuration file entries will override environment variables.
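For example, assuming --firstip follows the option naming convention
described above, the precedence can be seen with --dump (values are
illustrative):

  # The environment variable overrides the default...
  FIRSTIP=40 autocluster --dump | grep FIRSTIP
  # ...but a command-line option overrides the environment
  # (a config file entry for FIRSTIP would also win):
  FIRSTIP=40 autocluster --firstip=50 --dump | grep FIRSTIP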
Potentially useful information:
* Use "autocluster --help" to list all available command-line options
  - all the items listed under "configuration options:" are the
  equivalents of the settings for config files. This output also
  shows descriptions of the options.
* You can use the --dump option to check the current value of
  configuration variables. This is most useful when used in
  combination with grep:

    autocluster --dump | grep ISO_DIR
  In the past we recommended using --dump to create an initial
  configuration file. Don't do this - it is a bad idea! There are a
  lot of options and you'll create a huge file that you don't
  understand and can't debug!
* Configuration options are defined in config.d/*.defconf. You
  shouldn't need to look in these files... but sometimes they contain
  comments about options that are too long to fit into help strings.
* I recommend that you aim for the smallest possible configuration
  file and move on from there.
* The NODES configuration variable controls the types of nodes that
  are created. At the time of writing, the default value is:

    NODES="sofs_front:0-3 rhel_base:4"
  This means that you get 4 clustered NAS nodes, at IP offsets 0, 1,
  2, & 3 from FIRSTIP, all part of the CTDB cluster. You also get an
  additional utility node at IP offset 4 that can be used, for
  example, as a test client. Since sofs_* nodes are present, the base
  node will not be part of the CTDB cluster - it is just extra.
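  For example, a sketch of a smaller cluster - 2 clustered NAS nodes
  and no extra node - using the same syntax (illustrative value):

    NODES="sofs_front:0-1"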
  For many standard use cases the nodes specified by NODES can be
  modified by setting NUMNODES, WITH_SOFS_GUI and WITH_TSM_NODE.
  However, these options can't be used to create nodes without
  specifying IP offsets - except WITH_TSM_NODE, which checks to see if
  IP offset 0 is vacant. Therefore, for many uses you can ignore the
  NODES variable.
  However, NODES is the recommended mechanism for specifying the nodes
  that you want in your cluster. It is powerful, easy to read and
  centralises the information in a single line of your configuration
  file.
ISCSI SHARED DISKS
==================

The RHEL5 version of KVM does not support SCSI block device
emulation. Therefore, you can use either virtio or iSCSI shared
disks. Unfortunately, in RHEL5.4 and early versions of RHEL5.5,
virtio block devices are not supported by the version of multipath in
RHEL5. So this leaves iSCSI as the only choice.
The main configuration options you need for iSCSI disks are:

  SHARED_DISK_TYPE=iscsi
  NICMODEL=virtio        # Recommended for performance
  add_extra_package iscsi-initiator-utils
Note that SHARED_DISK_PREFIX and SHARED_DISK_CACHE are ignored for
iSCSI shared disks because KVM doesn't (need to) know about them.
You will need to install the scsi-target-utils package on the host
system. After creating a cluster, autocluster will print a message
that points you to a file tmp/iscsi.$CLUSTER - you need to run the
commands in this file (probably via: sh tmp/iscsi.$CLUSTER) before
booting your cluster. This will remove any old target with the same
ID, and create the new target, LUNs and ACLs.
You can use the following command to list information about the
iSCSI target:

  tgtadm --lld iscsi --mode target --op show
If you need multiple clusters using iSCSI on the same host then each
cluster will need to have a different setting for ISCSI_TID.
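For example (illustrative values), the two cluster configuration
files might contain:

  ISCSI_TID=1    # in the first cluster's config file
  ISCSI_TID=2    # in the second cluster's config file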
RAW IDE SYSTEM DISKS
====================

RHEL versions of KVM do not support SCSI block device emulation,
so autocluster now defaults to using an IDE system disk instead of a
SCSI one. Therefore, you can use virtio or IDE system disks.
However, writeback caching, qcow2 and virtio are incompatible and
result in I/O corruption. So, you can use either virtio system disks
without any caching, accepting reduced performance, or you can use IDE
system disks with writeback caching, with nice performance.
For IDE disks, here are the required settings:

  SYSTEM_DISK_TYPE=ide
  SYSTEM_DISK_PREFIX=hd
  SYSTEM_DISK_CACHE=writeback
The next problem is that RHEL5's KVM does not include qemu-nbd. The
best solution is to build your own qemu-nbd and stop reading this
section.
If, for whatever reason, you're unable to build your own qemu-nbd,
then you can use raw, rather than qcow2, system disks. If you do this
then you need significantly more disk space (since the system disks
will be *copies* of the base image) and cluster creation time will no
longer be pleasantly snappy (due to the copying time - the images are
large and a single copy can take several minutes). So, having tried
to warn you off this option, if you really want to do this then you'll
need this setting:

  SYSTEM_DISK_FORMAT=raw
Note that if you're testing cluster creation with iSCSI shared disks
then you should find a way of switching off raw disks. This avoids
every iSCSI glitch costing you a lot of time while raw disks are
copied.
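For example, given the option naming convention described in the
CONFIGURATION section, a sketch of a one-off override (assuming
--system-disk-format is the command-line equivalent of
SYSTEM_DISK_FORMAT) would be:

  autocluster -c config --system-disk-format=qcow2 create cluster c1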
The -e option provides support for executing arbitrary bash code.
This is useful for testing and debugging.
One good use of this option is to test template substitution using the
function substitute_vars(). For example:

  ./autocluster -c example.autocluster -e 'CLUSTER=foo; DISK=foo.qcow2; UUID=abcdef; NAME=foon1; set_macaddrs; substitute_vars templates/node.xml'
This prints templates/node.xml with all appropriate substitutions
done. Some internal variables (e.g. CLUSTER, DISK, UUID, NAME) are
given fairly arbitrary values but the various MAC address strings are
set using the function set_macaddrs().
The -e option is also useful when writing scripts that use
autocluster. Given the complexities of the configuration system you
probably don't want to parse configuration files yourself to determine
the current settings. Instead, you can ask autocluster to tell you
useful pieces of information. For example, say you want to script
creating a base disk image and you want to ensure the image is
marked immutable afterwards:
  base_image=$(autocluster -c $CONFIG -e 'echo $VIRTBASE/$BASENAME.img')
  chattr -V -i "$base_image"

  if autocluster -c $CONFIG create base ; then
      chattr -V +i "$base_image"
  fi
Note that the command that autocluster should run is enclosed in
single quotes. This means that $VIRTBASE and $BASENAME will be
expanded within autocluster after the configuration file has been
loaded.
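For contrast, here is a sketch of what each quoting style does:

  # Single quotes: $VIRTBASE is expanded inside autocluster, after
  # the configuration file has been loaded - this is what you want.
  autocluster -c $CONFIG -e 'echo $VIRTBASE'

  # Double quotes: the outer shell expands $VIRTBASE before
  # autocluster runs, almost certainly to an empty string.
  autocluster -c $CONFIG -e "echo $VIRTBASE"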