Autocluster is a set of scripts for building virtual clusters to test
clustered Samba. It uses Linux's libvirt and KVM virtualisation.

Autocluster is a collection of scripts, templates and configuration
files that allow you to create a cluster of virtual nodes very
quickly. You can create a cluster from scratch in less than 30
minutes. Once you have a base image you can then recreate a cluster
or create new virtual clusters in minutes.

Autocluster has recently been tested to create virtual clusters of
RHEL 6/7 nodes. Older versions were tested with RHEL 5.

INSTALLING AUTOCLUSTER
======================

Before you start, make sure you have the latest version of
autocluster. To download autocluster do this:

  git clone git://git.samba.org/autocluster.git

Or to update it, run "git pull" in the autocluster directory.

You probably want to add the directory where autocluster is installed
to your PATH, otherwise things may quickly become tedious.

HOST MACHINE SETUP
==================

This section explains how to set up a host machine to run virtual
clusters generated by autocluster.

1) Install and configure required software.

a) Install kvm, libvirt and expect.

Autocluster creates virtual machines that use libvirt to run under
KVM. This means that you will need to install both KVM and libvirt
on your host machine. Expect is used by the waitfor() function and
should be available for installation from your distribution.

Autocluster should work with the standard RHEL qemu-kvm and
libvirt packages. It will try to find the qemu-kvm binary. If
you've done something unusual then you'll need to set the KVM
configuration variable.

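For example, if your qemu-kvm binary lives somewhere non-standard, a
config entry along these lines (the path shown is only illustrative)
points autocluster at it:

  # illustrative path; adjust to wherever your qemu-kvm binary lives
  KVM=/usr/libexec/qemu-kvm
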
For RHEL5/CentOS5, useful packages for both kvm and libvirt used to
be available at:

  http://www.lfarkas.org/linux/packages/centos/5/x86_64/

However, since recent versions of RHEL5 ship with KVM, 3rd party
KVM RPMs for RHEL5 are now scarce.

RHEL5.4's KVM also has problems when autocluster uses virtio
shared disks, since multipath doesn't notice virtio disks. This
is fixed in RHEL5.6 and in a recent RHEL5.5 update - you should
be able to use the settings recommended above for RHEL6.

If you're still running RHEL5.4, you have lots of time, you have
lots of disk space, and you like complexity, then see the
sections below on "iSCSI shared disks" and "Raw IDE system disks".

Useful packages ship with Fedora Core 10 (Cambridge) and later.
Some of the above notes on RHEL might apply to Fedora's KVM.

Useful packages ship with Ubuntu 8.10 (Intrepid Ibex) and later.
In recent Ubuntu versions (e.g. 10.10 Maverick Meerkat) the KVM
package is called "qemu-kvm". Older versions have a package
simply called "kvm".

For other distributions you'll have to backport distro sources or
compile from upstream source as described below.

* For KVM see the "Downloads" and "Code" sections at:

  http://www.linux-kvm.org/

b) Install guestfish or qemu-nbd and nbd-client.

Autocluster needs a method of updating files in the disk image for
each node.

Recent Linux distributions, including RHEL since 6.0, contain
guestfish. Guestfish (see http://libguestfs.org/ - there are
binary packages for several distros here) is a CLI for
manipulating KVM/QEMU disk images. Autocluster supports
guestfish, so if guestfish is available then you should use it.
It should be more reliable than NBD.

Autocluster attempts to use the best available method (guestmount
-> guestfish -> loopback) for accessing disk images. If it chooses
a suboptimal method (e.g. nodes created with guestmount sometimes
won't boot), you can force the method:

  SYSTEM_DISK_ACCESS_METHOD=guestfish

If you can't use guestfish then you'll have to use NBD. For this
you will need the qemu-nbd and nbd-client programs, which
autocluster uses to loopback-nbd-mount the disk images when
configuring each node.

NBD for various distros:

qemu-nbd is only available in the old packages from lfarkas.org.
Recompiling the RHEL5 kvm package to support NBD is quite
straightforward. RHEL6 doesn't have an NBD kernel module, so is
harder to retrofit for NBD support - use guestfish instead.

Unless you can find an RPM for nbd-client, you will need to
download the source from:

  http://sourceforge.net/projects/nbd/

qemu-nbd is in the qemu-kvm or kvm package.

nbd-client is in the nbd package.

qemu-nbd is in the qemu-kvm or kvm package. In older releases
it is called kvm-nbd, so you need to set the QEMU_NBD
configuration variable.

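For instance, a config entry along these lines should do it (the
value is an assumption; use the binary's full path if it is not on
your PATH):

  # assumes the older Ubuntu binary name; adjust or use a full path
  QEMU_NBD=kvm-nbd
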
nbd-client is in the nbd-client package.

* As mentioned above, nbd can be found at:

  http://sourceforge.net/projects/nbd/

c) Environment and libvirt virtual networks

You will need to add the autocluster directory to your PATH.

You will need to configure the right libvirt networking setup. To
do this, run:

  host_setup/setup_networks.sh [ <myconfig> ]

If you're using a network setup different to the default then pass
your autocluster configuration filename, which should set the
NETWORKS variable. If you're using a variety of networks for
different clusters then you can probably run this script multiple
times.

You might also need to set:

  VIRSH_DEFAULT_CONNECT_URI=qemu:///system

in your environment so that virsh does KVM/QEMU things by default.

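One way to make this stick, as a sketch, is to export it from your
shell profile (which file you use is up to you):

  # e.g. in ~/.bashrc, so virsh defaults to the system QEMU/KVM instance
  export VIRSH_DEFAULT_CONNECT_URI=qemu:///system
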
2) Configure a local web/install server to provide the required YUM
repositories.

If your install server is far away then you may need a caching web
proxy on your local network.

If you don't have one, then you can install a squid proxy on your
host and set something like:

  WEBPROXY="http://10.0.0.1:3128/"

See host_setup/etc/squid/squid.conf for a sample config suitable
for a virtual cluster. Make sure it caches large objects and has
plenty of space. This will be needed to make downloading all the
RPMs to each client sane.

To test your squid setup, run a command like this:

  http_proxy=http://10.0.0.1:3128/ wget <some-url>

Check your firewall setup. If you have problems accessing the
proxy from your nodes (including from kickstart postinstall) then
check it again! Some distributions install nice "convenient"
firewalls by default that might block access to the squid port
from the nodes. On a current version of Fedora Core you may be
able to run system-config-firewall-tui to reconfigure the
firewall.

3) Set up a DNS server on your host. See host_setup/etc/bind/ for a
sample config that is suitable. It needs to redirect DNS queries
for your virtual domain to your Windows domain controller.

4) Download a RHEL (or CentOS) install ISO.

CREATING A CLUSTER
==================

A cluster comprises a single base disk image, a copy-on-write disk
image for each node and some XML files that tell libvirt about each
node's virtual hardware configuration. The copy-on-write disk images
save a lot of disk space on the host machine because they each use the
base disk image - without them the disk image for each cluster node
would need to contain the entire RHEL install.

The cluster creation process can be broken down into several main
steps:

1) Create a base disk image.

2) Create per-node disk images and corresponding XML files.

3) Update /etc/hosts to include cluster nodes.

4) Boot virtual machines for the nodes.

5) Post-boot configuration.

However, before you do this you will need to create a configuration
file. See the "CONFIGURATION" section below for more details.

Here are more details on the "create cluster" process. Note that
unless you have done something extra special then you'll need to run
all of this as root.

1) Create the base disk image using:

  ./autocluster base create

The first thing this step does is to check that it can connect to
the YUM server. If this fails make sure that there are no
firewalls blocking your access to the server.

The install will take about 10 to 15 minutes and you will see the
packages installing in your terminal.

The installation process uses kickstart. The choice of
postinstall script is set using the POSTINSTALL_TEMPLATE variable.
This can be used to install packages that will be common to all
nodes into the base image. This saves time later when you're
setting up the cluster nodes. However, current usage (given that
we test many versions of CTDB) is to default POSTINSTALL_TEMPLATE
to "" and install packages post-boot. This seems to be a
reasonable compromise between flexibility (the base image can be,
for example, a pristine RHEL7.0-base.qcow2, CTDB/Samba packages
are selected post-base creation) and speed of cluster creation.

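If you do want to bake common packages into the base image, the
setting would look something like this in your config file (the
template path is purely hypothetical):

  # hypothetical postinstall template; the default is "" (install post-boot)
  POSTINSTALL_TEMPLATE=templates/my-postinstall.sh
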
When that has finished you should mark that base image immutable,
like this:

  chattr +i /virtual/ac-base.img

That will ensure it won't change. This is a precaution, as the
image will be used as a basis file for the per-node images, and if
it changes your cluster will become corrupt.

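If you later need to rebuild or replace the base image, remove the
immutable flag first:

  chattr -i /virtual/ac-base.img
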
2) Now run "autocluster cluster build", specifying a configuration
file. For example:

  autocluster -c m1.autocluster cluster build

This will create and install the XML node descriptions and the
disk images for your cluster nodes, and any other nodes you have
configured. Each disk image is initially created as an "empty"
copy-on-write image, which is linked to the base image. Those
images are then accessed using guestfish or loopback-nbd-mounted,
and populated with system configuration files and other
potentially useful things (such as scripts). /etc/hosts is
updated, the cluster is booted and post-boot configuration is
done.

Instead of doing all of steps 2-5 with a single command you can do:

2) autocluster -c m1.autocluster cluster create

3) autocluster -c m1.autocluster cluster update_hosts

4) autocluster -c m1.autocluster cluster boot

5) autocluster -c m1.autocluster cluster configure

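Putting the pieces together, a complete run from nothing to a booted,
configured cluster is roughly the following sketch (the config name
and base image path are the illustrative ones used above):

  # one-off: build the base image and freeze it
  autocluster -c m1.autocluster base create
  chattr +i /virtual/ac-base.img

  # then build (create + update_hosts + boot + configure) the cluster
  autocluster -c m1.autocluster cluster build
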
BOOTING/DESTROYING A CLUSTER
============================

Autocluster provides a command called "vircmd", which is a thin
wrapper around libvirt's virsh command. vircmd takes a cluster name
instead of a node/domain name and runs the requested command on all
nodes in the cluster.

The most useful vircmd commands are:

  shutdown : graceful shutdown of a node
  destroy  : power off a node immediately

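Since vircmd simply passes the verb through to virsh for every node in
the cluster, an invocation is presumably along these lines (the
cluster name is illustrative):

  # gracefully shut down every node in cluster "m1"
  vircmd shutdown m1
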
You can watch boot progress like this:

  tail -f /var/log/kvm/serial.c1*

All the nodes have serial consoles, making it easier to capture
kernel panic messages and watch the nodes via ssh.

Autocluster copies some scripts to cluster nodes to enable post-boot
configuration. These are used to configure specialised subsystems
like GPFS or Samba and are installed in /root/scripts/ on each node.
The 2 main entry points are install_packages.sh and setup_cluster.sh.
To set up a clustered NAS system you will normally need to run
setup_gpfs.sh and setup_cluster.sh on one of the nodes. If you want
to run these manually, see autocluster's cluster_configure() function
for details.

There are also some older scripts that haven't been used for a while
and have probably bit-rotted, such as setup_tsm_client.sh and
setup_tsm_server.sh. However, they are still provided as examples.

CONFIGURATION
=============

Autocluster uses configuration files containing Unix shell style
variables. For example,

  FIRSTIP=30

indicates that the last octet of the first IP address in the cluster
will be 30. If an option contains multiple words then they will be
separated by underscores ('_'), as in ISO_DIR.

All options have an equivalent command-line option. Command-line
options are lowercase, with words separated by dashes ('-').

Normally you would use a configuration file with variables so that
you can repeat steps easily. The command-line equivalents are useful
for trying things out without resorting to an editor. You can
specify a configuration file to use on the autocluster command-line
using the -c option. For example:

  autocluster -c config-foo create base

If you don't provide a configuration file then autocluster will
look for a file called "config" in the current directory.

You can also use environment variables to override the default values
of configuration variables. However, both command-line options and
configuration file entries will override environment variables.

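As a quick way to see which value wins (FIRSTIP, --dump and the
config-foo file name all appear elsewhere in this document; 40 is an
arbitrary value), run something like:

  # if config-foo sets FIRSTIP, the environment's 40 is ignored
  FIRSTIP=40 autocluster -c config-foo --dump | grep FIRSTIP
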
Potentially useful information:

* Use "autocluster --help" to list all available command-line options
  - all the items listed under "configuration options:" are the
  equivalents of the settings for config files. This output also
  shows descriptions of the options.

* You can use the --dump option to check the current value of
  configuration variables. This is most useful when used in
  combination with grep:

    autocluster --dump | grep ISO_DIR

  In the past we recommended using --dump to create an initial
  configuration file. Don't do this - it is a bad idea! There are a
  lot of options and you'll create a huge file that you don't
  understand and can't debug!

* Configuration options are defined in config.d/*.defconf. You
  shouldn't need to look in these files... but sometimes they contain
  comments about options that are too long to fit into help strings.

* I recommend that you aim for the smallest possible configuration
  file. Start with just the settings you need (see the sketch after
  this list) and move on from there.

* The NODES configuration variable controls the types of nodes that
  are created. At the time of writing, the default value is:

    NODES="nas:0-3 rhel_base:4"

  This means that you get 4 clustered NAS nodes, at IP offsets 0, 1,
  2 & 3 from FIRSTIP, all part of the CTDB cluster. You also get an
  additional utility node at IP offset 4 that can be used, for
  example, as a test client. The base node will not be part of the
  CTDB cluster. It is just an extra node that can be used as a test
  client.

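As suggested above, a minimal configuration file might look something
like the sketch below. Every variable shown here is described
elsewhere in this document, but the values (particularly ISO_DIR's
path) are only illustrative:

  FIRSTIP=30
  NODES="nas:0-3 rhel_base:4"
  ISO_DIR=/data/ISOs
  WEBPROXY="http://10.0.0.1:3128/"
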
ISCSI SHARED DISKS
==================

The RHEL5 version of KVM does not support SCSI block device
emulation, so you can use either virtio or iSCSI shared disks.
Unfortunately, in RHEL5.4 and early versions of RHEL5.5, virtio block
devices are not supported by the version of multipath in RHEL5. So
this leaves iSCSI as the only choice.

The main configuration options you need for iSCSI disks are:

  SHARED_DISK_TYPE=iscsi
  NICMODEL=virtio   # Recommended for performance
  add_extra_package iscsi-initiator-utils

Note that SHARED_DISK_PREFIX and SHARED_DISK_CACHE are ignored for
iSCSI shared disks because KVM doesn't (need to) know about them.

You will need to install the scsi-target-utils package on the host
system. After creating a cluster, autocluster will print a message
that points you to a file tmp/iscsi.$CLUSTER - you need to run the
commands in this file (probably via: sh tmp/iscsi.$CLUSTER) before
booting your cluster. This will remove any old target with the same
ID, and create the new target, LUNs and ACLs.

You can use the following command to list information about the
iSCSI target:

  tgtadm --lld iscsi --mode target --op show

If you need multiple clusters using iSCSI on the same host then each
cluster will need to have a different setting for ISCSI_TID.

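For example, a second cluster's config file could simply carry a
different target ID (the value 2 is arbitrary):

  # give this cluster its own iSCSI target ID
  ISCSI_TID=2
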
RAW IDE SYSTEM DISKS
====================

RHEL versions of KVM do not support SCSI block device emulation, so
autocluster now defaults to using an IDE system disk instead of a
SCSI one. Therefore, you can use virtio or IDE system disks.
However, writeback caching, qcow2 and virtio are incompatible and
result in I/O corruption. So, you can use either virtio system disks
without any caching, accepting reduced performance, or you can use
IDE system disks with writeback caching, with nice performance.

For IDE disks, here are the required settings:

  SYSTEM_DISK_PREFIX=hd
  SYSTEM_DISK_CACHE=writeback

The next problem is that RHEL5's KVM does not include qemu-nbd. The
best solution is to build your own qemu-nbd and stop reading this
section.

If, for whatever reason, you're unable to build your own qemu-nbd,
then you can use raw, rather than qcow2, system disks. If you do
this then you need significantly more disk space (since the system
disks will be *copies* of the base image) and cluster creation time
will no longer be pleasantly snappy (due to the copying time - the
images are large and a single copy can take several minutes). So,
having tried to warn you off this option, if you really want to do
this then you'll need this setting:

  SYSTEM_DISK_FORMAT=raw

Note that if you're testing cluster creation with iSCSI shared disks
then you should find a way of switching off raw disks. This avoids
every iSCSI glitch costing you a lot of time while raw disks are
copied.

The -e option provides support for executing arbitrary bash code.
This is useful for testing and debugging.

One good use of this option is to test template substitution using
the function substitute_vars(). For example:

  ./autocluster -c example.autocluster -e 'CLUSTER=foo; DISK=foo.qcow2; UUID=abcdef; NAME=foon1; set_macaddrs; substitute_vars templates/node.xml'

This prints templates/node.xml with all appropriate substitutions
done. Some internal variables (e.g. CLUSTER, DISK, UUID, NAME) are
given fairly arbitrary values but the various MAC address strings are
set using the function set_macaddrs().

The -e option is also useful when writing scripts that use
autocluster. Given the complexities of the configuration system you
probably don't want to parse configuration files yourself to
determine the current settings. Instead, you can ask autocluster to
tell you useful pieces of information. For example, say you want to
script creating a base disk image and you want to ensure the image
is marked immutable:

  base_image=$(autocluster -c $CONFIG -e 'echo $VIRTBASE/$BASENAME.img')
  chattr -V -i "$base_image"

  if autocluster -c $CONFIG create base ; then
      chattr -V +i "$base_image"
  fi

Note that the command that autocluster should run is enclosed in
single quotes. This means that $VIRTBASE and $BASENAME will be
expanded within autocluster after the configuration file has been
loaded.

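By contrast, with double quotes your own shell would expand those
variables (almost certainly to empty strings) before autocluster ever
sees the command, which is not what you want:

  # Wrong: $VIRTBASE and $BASENAME expand in the calling shell, not autocluster
  autocluster -c $CONFIG -e "echo $VIRTBASE/$BASENAME.img"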