README

INTRODUCTION
============

Autocluster is a set of scripts for building virtual clusters to test
clustered Samba.  It uses Linux's libvirt and KVM virtualisation
engine.

Autocluster is a collection of scripts, templates and configuration
files that allow you to create a cluster of virtual nodes very
quickly.  You can create a cluster from scratch in less than 30
minutes.  Once you have a base image you can then recreate a cluster
or create new virtual clusters in minutes.

Autocluster has recently been tested to create virtual clusters of
RHEL 6/7 nodes.  Older versions were tested with RHEL 5 and some
versions of CentOS.


CONTENTS
========

* INSTALLING AUTOCLUSTER

* HOST MACHINE SETUP

* CREATING A CLUSTER

* BOOTING/DESTROYING A CLUSTER

* POST-BOOT SETUP

* CONFIGURATION

* DEVELOPMENT HINTS


INSTALLING AUTOCLUSTER
======================

Before you start, make sure you have the latest version of
autocluster.  To download autocluster do this:

  git clone git://git.samba.org/autocluster.git

Or to update it, run "git pull" in the autocluster directory.

You probably want to add the directory where autocluster is installed
to your PATH, otherwise things may quickly become tedious.


HOST MACHINE SETUP
==================

This section explains how to set up a host machine to run virtual
clusters generated by autocluster.


 1) Install and configure required software.

 a) Install kvm, libvirt and expect.

    Autocluster creates virtual machines that use libvirt to run under
    KVM.  This means that you will need to install both KVM and
    libvirt on your host machine.  Expect is used by the waitfor()
    function and should be available for installation from your
    distribution.

    For various distros:

    * RHEL/CentOS

      Autocluster should work with the standard RHEL qemu-kvm and
      libvirt packages.  It will try to find the qemu-kvm binary.  If
      you've done something unusual then you'll need to set the KVM
      configuration variable.

      For RHEL5/CentOS5, useful packages for both kvm and libvirt used
      to be found here:

        http://www.lfarkas.org/linux/packages/centos/5/x86_64/

      However, since recent versions of RHEL5 ship with KVM, 3rd party
      KVM RPMs for RHEL5 are now scarce.

      RHEL5.4's KVM also has problems when autocluster uses virtio
      shared disks, since multipath doesn't notice virtio disks.  This
      is fixed in RHEL5.6 and in a recent RHEL5.5 update - you should
      be able to use the settings recommended above for RHEL6.

      If you're still running RHEL5.4, you have lots of time, you have
      lots of disk space, and you like complexity, then see the
      sections below on "iSCSI shared disks" and "Raw IDE system
      disks".  :-)

    * Fedora

      Useful packages ship with Fedora Core 10 (Cambridge) and later.
      Some of the above notes on RHEL might apply to Fedora's KVM.

    * Ubuntu

      Useful packages ship with Ubuntu 8.10 (Intrepid Ibex) and later.
      In recent Ubuntu versions (e.g. 10.10 Maverick Meerkat) the KVM
      package is called "qemu-kvm".  Older versions have a package
      called "kvm".

    For other distributions you'll have to backport distro sources or
    compile from upstream source as described below.

    * For KVM see the "Downloads" and "Code" sections at:

        http://www.linux-kvm.org/

    * For libvirt see:

        http://libvirt.org/

 b) Install guestfish or qemu-nbd and nbd-client.

    Autocluster needs a method of updating files in the disk image for
    each node.

    Recent Linux distributions, including RHEL since 6.0, contain
    guestfish.  Guestfish (see http://libguestfs.org/ - there are
    binary packages for several distros there) is a CLI for
    manipulating KVM/QEMU disk images.  Autocluster supports
    guestfish, so if guestfish is available then you should use it.
    It should be more reliable than NBD.

    Autocluster attempts to use the best available method (guestmount
    -> guestfish -> loopback) for accessing disk images.  If it
    chooses a suboptimal method (e.g. nodes created with guestmount
    sometimes won't boot), you can force the method:

      SYSTEM_DISK_ACCESS_METHOD=guestfish

    If you can't use guestfish then you'll have to use NBD.  For this
    you will need the qemu-nbd and nbd-client programs, which
    autocluster uses to loopback-nbd-mount the disk images when
    configuring each node.

    NBD for various distros:

    * RHEL/CentOS

      qemu-nbd is only available in the old packages from lfarkas.org.
      Recompiling the RHEL5 kvm package to support NBD is quite
      straightforward.  RHEL6 doesn't have an NBD kernel module, so is
      harder to retrofit for NBD support - use guestfish instead.

      Unless you can find an RPM for nbd-client, you will need to
      download the source from:

        http://sourceforge.net/projects/nbd/

      and build it.

    * Fedora Core

      qemu-nbd is in the qemu-kvm or kvm package.

      nbd-client is in the nbd package.

    * Ubuntu

      qemu-nbd is in the qemu-kvm or kvm package.  In older releases
      it is called kvm-nbd, so you need to set the QEMU_NBD
      configuration variable.

      nbd-client is in the nbd-client package.

    * As mentioned above, nbd can be found at:

        http://sourceforge.net/projects/nbd/

 c) Environment and libvirt virtual networks

    You will need to add the autocluster directory to your PATH.

    You will need to configure the right libvirt networking setup. To
    do this, run:

      host_setup/setup_networks.sh [ <myconfig> ]

    If you're using a network setup different to the default then pass
    your autocluster configuration filename, which should set the
    NETWORKS variable.  If you're using a variety of networks for
    different clusters then you can probably run this script multiple
    times.

    You might also need to set:

      VIRSH_DEFAULT_CONNECT_URI=qemu:///system

    in your environment so that virsh does KVM/QEMU things by default.

 2) Configure a local web/install server to provide required YUM
    repositories.

    If your install server is far away then you may need a caching web
    proxy on your local network.

    If you don't have one, then you can install a squid proxy on your
    host and set:

      WEBPROXY="http://10.0.0.1:3128/"

    See host_setup/etc/squid/squid.conf for a sample config suitable
    for a virtual cluster.  Make sure it caches large objects and has
    plenty of space.  This will be needed to make downloading all the
    RPMs to each client sane.

    To test your squid setup, run a command like this:

      http_proxy=http://10.0.0.1:3128/ wget <some-url>

    Check your firewall setup.  If you have problems accessing the
    proxy from your nodes (including from kickstart postinstall) then
    check it again!  Some distributions install nice "convenient"
    firewalls by default that might block access to the squid port
    from the nodes.  On a current version of Fedora Core you may be
    able to run system-config-firewall-tui to reconfigure the
    firewall.

 3) Set up a DNS server on your host.  See host_setup/etc/bind/ for a
    sample config that is suitable.  It needs to redirect DNS queries
    for your virtual domain to your Windows domain controller.

 4) Download a RHEL (or CentOS) install ISO.
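
The software requirements from step 1 can be checked before going any
further.  The following is a minimal sketch, not part of autocluster:
the function names are made up for illustration, and it only looks in
your PATH for the commands named above (virsh, expect, guestfish,
qemu-nbd, nbd-client).

```shell
#!/bin/sh
# Hypothetical helpers, not part of autocluster.

# Print the name of each given command that is not found in PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check for the commands named in step 1.  Disk image access needs
# either guestfish, or both qemu-nbd and nbd-client.  (On older
# Ubuntu releases qemu-nbd is called kvm-nbd, which this sketch does
# not look for.)
check_host_tools() {
    missing=$(missing_tools virsh expect)
    if [ -n "$(missing_tools guestfish)" ]; then
        nbd_missing=$(missing_tools qemu-nbd nbd-client)
        if [ -n "$nbd_missing" ]; then
            missing="$missing guestfish-or-nbd"
        fi
    fi
    if [ -n "$missing" ]; then
        # Unquoted on purpose: joins the names onto one line.
        echo "missing:" $missing
        return 1
    fi
    echo "all required commands found"
}
```

Source this file and run check_host_tools.  It returns non-zero if
something appears to be missing, so it can be used as a guard at the
top of your own setup scripts.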


CREATING A CLUSTER
==================

A cluster comprises a single base disk image, a copy-on-write disk
image for each node and some XML files that tell libvirt about each
node's virtual hardware configuration.  The copy-on-write disk images
save a lot of disk space on the host machine because they each use the
base disk image - without them the disk image for each cluster node
would need to contain the entire RHEL install.

The cluster creation process can be broken down into several main
steps:

 1) Create a base disk image.

 2) Create per-node disk images and corresponding XML files.

 3) Update /etc/hosts to include cluster nodes.

 4) Boot virtual machines for the nodes.

 5) Post-boot configuration.

However, before you do this you will need to create a configuration
file.  See the "CONFIGURATION" section below for more details.

Here are more details on the "create cluster" process.  Note that
unless you have done something extra special then you'll need to run
all of this as root.

 1) Create the base disk image using:

      ./autocluster base create

    The first thing this step does is to check that it can connect to
    the YUM server.  If this fails make sure that there are no
    firewalls blocking your access to the server.

    The install will take about 10 to 15 minutes and you will see the
    packages installing in your terminal.

    The installation process uses kickstart.  The choice of
    postinstall script is set using the POSTINSTALL_TEMPLATE variable.
    This can be used to install packages that will be common to all
    nodes into the base image.  This saves time later when you're
    setting up the cluster nodes.  However, current usage (given that
    we test many versions of CTDB) is to default POSTINSTALL_TEMPLATE
    to "" and install packages post-boot.  This seems to be a
    reasonable compromise between flexibility (the base image can be,
    for example, a pristine RHEL7.0-base.qcow2, CTDB/Samba packages
    are selected post-base creation) and speed of cluster creation.

    When that has finished you should mark that base image immutable
    like this:

      chattr +i /virtual/ac-base.img

    That will ensure it won't change.  This is a precaution as the
    image will be used as the basis for the per-node images, and if
    it changes your cluster will become corrupt.

 2-5)
    Now run "autocluster cluster build", specifying a configuration
    file.  For example:

      autocluster -c m1.autocluster cluster build

    This will create and install the XML node descriptions and the
    disk images for your cluster nodes, and any other nodes you have
    configured.  Each disk image is initially created as an "empty"
    copy-on-write image, which is linked to the base image.  Those
    images are then accessed using guestfish or
    loopback-nbd-mounted, and populated with system configuration
    files and other potentially useful things (such as scripts).
    /etc/hosts is updated, the cluster is booted and post-boot
    setup is done.

    Instead of doing all of steps 2-5 using one command you can do:

    2) autocluster -c m1.autocluster cluster create

    3) autocluster -c m1.autocluster cluster update_hosts

    4) autocluster -c m1.autocluster cluster boot

    5) autocluster -c m1.autocluster cluster setup
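
If you run steps 2-5 separately, it can help to stop at the first
step that fails.  Here is a minimal sketch of a wrapper that does
this.  The function name and the AUTOCLUSTER variable are made up for
illustration and are not part of autocluster; only the four "cluster"
commands listed above are taken from this README.

```shell
#!/bin/sh
# Run steps 2-5 one at a time for the given configuration file,
# stopping at the first step that fails.
#
# AUTOCLUSTER is a hypothetical override (not an autocluster setting)
# for the command to run; it defaults to "autocluster" in PATH.
build_cluster_stepwise() {
    config=$1
    for step in create update_hosts boot setup; do
        echo "== cluster $step"
        "${AUTOCLUSTER:-autocluster}" -c "$config" cluster "$step" || {
            echo "step '$step' failed" >&2
            return 1
        }
    done
}
```

For example, "build_cluster_stepwise m1.autocluster" should be
roughly equivalent to "autocluster -c m1.autocluster cluster build",
except that you can see which step was running when something went
wrong.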

BOOTING/DESTROYING A CLUSTER
============================

Autocluster provides a command called "vircmd", which is a thin
wrapper around libvirt's virsh command.  vircmd takes a cluster name
instead of a node/domain name and runs the requested command on all
nodes in the cluster.

The most useful vircmd commands are:

  start    : boot a cluster
  shutdown : graceful shutdown of a cluster
  destroy  : power off a cluster immediately

You can watch boot progress like this (here for a cluster named c1):

  tail -f /var/log/kvm/serial.c1*

All the nodes have serial consoles, making it easier to capture
kernel panic messages and watch the nodes via ssh.


POST-BOOT SETUP
===============

Autocluster copies some scripts to cluster nodes to enable post-boot
configuration.  These are used to configure specialised subsystems
like GPFS or Samba, and are installed in /root/scripts/ on each node.
The main entry point is cluster_setup.sh, which invokes specialised
scripts depending on the cluster filesystem type or the node type.
cluster_setup.sh is invoked by the cluster_setup() function in
autocluster.

See cluster_setup() if you want to do things manually or if you want
to add support for other node types and/or cluster filesystems.

There are also some older scripts that haven't been used for a while
and have probably bit-rotted, such as setup_tsm_client.sh and
setup_tsm_server.sh.  However, they are still provided as examples.

CONFIGURATION
=============

Basics
======

Autocluster uses configuration files containing Unix shell style
variables.  For example,

  FIRSTIP=30

indicates that the last octet of the first IP address in the cluster
will be 30.  If a variable name contains multiple words then they will
be separated by underscores ('_'), as in:

  ISO_DIR=/data/ISOs

All configuration variables have an equivalent command-line option,
such as:

  --firstip=30

Command-line options are lowercase.  Words are separated by dashes
('-'), as in:

  --iso-dir=/data/ISOs
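
The naming rule above is mechanical, so it can be written down as a
small shell sketch.  These helper functions are made up for
illustration and are not part of autocluster; they only apply the
lowercase and underscore-to-dash rule described above.

```shell
#!/bin/sh
# Hypothetical helpers, not part of autocluster.

# Convert a configuration variable name to its command-line option
# name: uppercase becomes lowercase and '_' becomes '-'.
var_to_option() {
    printf '%s\n' "--$(printf '%s' "$1" | tr 'A-Z_' 'a-z-')"
}

# Convert a whole NAME=value setting to the equivalent option.
# Only the name is converted; the value is passed through unchanged.
assignment_to_option() {
    printf '%s=%s\n' "$(var_to_option "${1%%=*}")" "${1#*=}"
}
```

For example, "assignment_to_option ISO_DIR=/data/ISOs" prints
--iso-dir=/data/ISOs, matching the example above.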

Normally you would use a configuration file with variables so that you
can repeat steps easily.  The command-line equivalents are useful for
trying things out without resorting to an editor.  You can specify a
configuration file to use on the autocluster command-line using the -c
option.  For example:

  autocluster -c config-foo base create

If you don't specify a configuration file then autocluster will
look for a file called "config" in the current directory.

You can also use environment variables to override the default values
of configuration variables.  However, both command-line options and
configuration file entries will override environment variables.

Potentially useful information:

* Use "autocluster --help" to list all available command-line options
  - all the items listed under "configuration options:" are the
  equivalents of the settings for config files.  This output also
  shows descriptions of the options.

* You can use the --dump option to check the current value of
  configuration variables.  This is most useful when used in
  combination with grep:

    autocluster --dump | grep ISO_DIR

  In the past we recommended using --dump to create an initial
  configuration file.  Don't do this - it is a bad idea!  There are a
  lot of options and you'll create a huge file that you don't
  understand and can't debug!

* Configuration options are defined in config.d/*.defconf.  You
  shouldn't need to look in these files... but sometimes they contain
  comments about options that are too long to fit into help strings.

Keep it simple
==============

* I recommend that you aim for the smallest possible configuration file.
  Perhaps start with:

    FIRSTIP=<whatever>

  and move on from there.

* The NODES configuration variable controls the types of nodes that
  are created.  At the time of writing, the default value is:

    NODES="nas:0-3 rhel_base:4"

  This means that you get 4 clustered NAS nodes, at IP offsets 0, 1,
  2, & 3 from FIRSTIP, all part of the CTDB cluster.  You also get an
  additional utility node at IP offset 4.  This base node is not part
  of the CTDB cluster.  It is just an extra node that can be used as
  a test client or similar.
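
To make the offsets concrete, here is a minimal sketch that expands a
NODES-style value into one line per node, giving the node type and the
last octet of its IP address.  This is not autocluster's own parser.
It assumes that each word is "type:offsets" and that offsets is either
a single number or a "first-last" range, as in the default value
above; the real syntax may allow more.

```shell
#!/bin/sh
# Hypothetical helper, not part of autocluster.
#
# Usage: expand_nodes FIRSTIP NODES
# Prints "type last-octet" for each node, where the last octet is
# FIRSTIP plus the node's offset.
expand_nodes() {
    firstip=$1
    nodes=$2
    for spec in $nodes; do
        ntype=${spec%%:*}       # text before the ':'
        offsets=${spec#*:}      # text after the ':'
        first=${offsets%%-*}    # start of range (or the single number)
        last=${offsets##*-}     # end of range (or the single number)
        i=$first
        while [ "$i" -le "$last" ]; do
            echo "$ntype $((firstip + i))"
            i=$((i + 1))
        done
    done
}
```

With FIRSTIP=30 and the default NODES, running
expand_nodes 30 "nas:0-3 rhel_base:4" prints the four nas nodes with
last octets 30 to 33, followed by the rhel_base node at 34.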

Corrupt system disks
====================

Recent versions of KVM seem to have fixed problems where the
combination of qcow2 file format, virtio block devices and writeback
caching would result in a corrupt system disk.  This means the default
system disk bus type (a.k.a. SYSTEM_DISK_TYPE) is now virtio.

If using an older version of KVM or if you experience corruption of
the system disk, try using IDE system disks:

  SYSTEM_DISK_TYPE=ide

iSCSI shared disks
==================

The RHEL5 version of KVM does not support SCSI block device
emulation.  Therefore, you can use either virtio or iSCSI shared
disks.  Unfortunately, in RHEL5.4 and early versions of RHEL5.5,
virtio block devices are not supported by the version of multipath in
RHEL5.  So this leaves iSCSI as the only choice.

The main configuration options you need for iSCSI disks are:

  SHARED_DISK_TYPE=iscsi
  NICMODEL=virtio        # Recommended for performance
  add_extra_package iscsi-initiator-utils

Note that SHARED_DISK_PREFIX and SHARED_DISK_CACHE are ignored for
iSCSI shared disks because KVM doesn't (need to) know about them.

You will need to install the scsi-target-utils package on the host
system.  After creating a cluster, autocluster will print a message
that points you to a file tmp/iscsi.$CLUSTER - you need to run the
commands in this file (probably via: sh tmp/iscsi.$CLUSTER) before
booting your cluster.  This will remove any old target with the same
ID, and create the new target, LUNs and ACLs.

You can use the following command to list information about the
target:

  tgtadm --lld iscsi --mode target --op show

If you need multiple clusters using iSCSI on the same host then each
cluster will need to have a different setting for ISCSI_TID.
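
For example, two clusters sharing a host might differ only in their
target ID.  This is a sketch rather than a tested configuration: the
file names and the ID values are made up, and it assumes that
ISCSI_TID takes a small integer.

```shell
# c1.autocluster (hypothetical file name): first cluster
SHARED_DISK_TYPE=iscsi
ISCSI_TID=1

# c2.autocluster (hypothetical file name): second cluster, same host
SHARED_DISK_TYPE=iscsi
ISCSI_TID=2
```

The two groups of settings belong in separate configuration files,
one per cluster.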

Raw IDE system disks
====================

Older RHEL versions of KVM did not support SCSI block device
emulation, and produced corruption when virtio disks were used with
qcow2 disk images and writeback caching.  In this case, you can use
either virtio system disks without any caching, accepting reduced
performance, or you can use IDE system disks with writeback caching,
with good performance.

For IDE disks, here are the required settings:

  SYSTEM_DISK_TYPE=ide
  SYSTEM_DISK_PREFIX=hd
  SYSTEM_DISK_CACHE=writeback

The next problem is that RHEL5's KVM does not include qemu-nbd.  The
best solution is to build your own qemu-nbd and stop reading this
section.

If, for whatever reason, you're unable to build your own qemu-nbd,
then you can use raw, rather than qcow2, system disks.  If you do this
then you need significantly more disk space (since the system disks
will be *copies* of the base image) and cluster creation time will no
longer be pleasantly snappy (due to the copying time - the images are
large and a single copy can take several minutes).  So, having tried
to warn you off this option, if you really want to do this then you'll
need these settings:

  SYSTEM_DISK_FORMAT=raw
  BASE_FORMAT=raw

Note that if you're testing cluster creation with iSCSI shared disks
then you should find a way of switching off raw disks.  This avoids
every iSCSI glitch costing you a lot of time while raw disks are
copied.

DEVELOPMENT HINTS
=================

The -e option provides support for executing arbitrary bash code.
This is useful for testing and debugging.

One good use of this option is to test template substitution using the
function substitute_vars().  For example:

  ./autocluster -c example.autocluster -e 'CLUSTER=foo; DISK=foo.qcow2; UUID=abcdef; NAME=foon1; set_macaddrs; substitute_vars templates/node.xml'

This prints templates/node.xml with all appropriate substitutions
done.  Some internal variables (e.g. CLUSTER, DISK, UUID, NAME) are
given fairly arbitrary values but the various MAC address strings are
set using the function set_macaddrs().

The -e option is also useful when writing scripts that use
autocluster.  Given the complexities of the configuration system you
probably don't want to parse configuration files yourself to determine
the current settings.  Instead, you can ask autocluster to tell you
useful pieces of information.  For example, say you want to script
creating a base disk image and you want to ensure the image is
marked immutable:

  base_image=$(autocluster -c "$CONFIG" -e 'echo $VIRTBASE/$BASENAME.img')
  chattr -V -i "$base_image"

  if autocluster -c "$CONFIG" base create ; then
    chattr -V +i "$base_image"
    ...

Note that the command that autocluster should run is enclosed in
single quotes.  This means that $VIRTBASE and $BASENAME will be
expanded within autocluster after the configuration file has been
loaded.