Autocluster is a set of scripts for building virtual clusters to test
clustered Samba. It uses Linux's libvirt and KVM virtualisation
technology.
Autocluster is a collection of scripts, templates and configuration
files that allow you to create a cluster of virtual nodes very
quickly. You can create a cluster from scratch in less than 30
minutes. Once you have a base image you can then recreate a cluster
or create new virtual clusters in minutes.
The current implementation creates virtual clusters of RHEL5/6 nodes.
INSTALLING AUTOCLUSTER
======================
Before you start, make sure you have the latest version of
autocluster. To download autocluster do this:
  git clone git://git.samba.org/tridge/autocluster.git autocluster
Or to update it, run "git pull" in the autocluster directory.
You probably want to add the directory where autocluster is installed
to your PATH, otherwise things may quickly become tedious.
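For example, a minimal sketch, assuming you cloned autocluster into
your home directory (adjust the path to wherever you installed it):

  # Add to ~/.bashrc or equivalent to make it permanent
  export PATH="$HOME/autocluster:$PATH"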
HOST MACHINE SETUP
==================

This section explains how to set up a host machine to run virtual
clusters generated by autocluster.
1) Install and configure required software.
a) Install kvm, libvirt and expect.
Autocluster creates virtual machines that use libvirt to run under
KVM. This means that you will need to install both KVM and libvirt
on your host machine. Expect is used by the "waitfor" script and
should be available for installation from your distribution.
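For example, on RHEL6 a sketch of the installation is (package names
are an assumption and vary between distributions and releases):

  yum install qemu-kvm libvirt expect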
Autocluster should work with the standard RHEL6 qemu-kvm and
libvirt packages. However, you'll need to tell autocluster
where the KVM executable is:

  KVM=/usr/libexec/qemu-kvm
For RHEL5/CentOS5, useful packages for both kvm and libvirt used
to be available at:

  http://www.lfarkas.org/linux/packages/centos/5/x86_64/
However, since recent versions of RHEL5 ship with KVM, 3rd party
KVM RPMs for RHEL5 are now scarce.
RHEL5.4's KVM also has problems when autocluster uses virtio
shared disks, since multipath doesn't notice virtio disks. This
is fixed in RHEL5.6 and in a recent RHEL5.5 update - you should
be able to use the settings recommended above for RHEL6.
If you're still running RHEL5.4, have lots of time and disk
space, and like complexity, then see the sections below on "iSCSI
shared disks" and "Raw IDE system disks".
Useful packages ship with Fedora Core 10 (Cambridge) and later.
Some of the above notes on RHEL might apply to Fedora's KVM.
Useful packages ship with Ubuntu 8.10 (Intrepid Ibex) and later.
In recent Ubuntu versions (e.g. 10.10 Maverick Meerkat) the KVM
package is called "qemu-kvm". Older versions have a package
called "kvm".
For other distributions you'll have to backport distro sources or
compile from upstream source as described below.
* For KVM see the "Downloads" and "Code" sections at:

    http://www.linux-kvm.org/
b) Install guestfish or qemu-nbd and nbd-client.
Recent Linux distributions, including RHEL since 6.0, contain
guestfish. Guestfish (see http://libguestfs.org/ - binary packages
are available there for several distros) is a CLI for manipulating
KVM/QEMU disk images. Autocluster supports guestfish, so if
guestfish is available then you should use it. It should be more
reliable than NBD.
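If you need to install it, a sketch for RHEL6/Fedora (the package
name is an assumption - check your distribution's package lists):

  yum install libguestfs-tools
  guestfish --version    # confirm it is installed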
Autocluster attempts to use the best available method (guestmount
-> guestfish -> loopback) for accessing disk images. If it chooses
a suboptimal method, you can force the method:

  SYSTEM_DISK_ACCESS_METHOD=guestfish
If you can't use guestfish then you'll have to use NBD. For this
you will need the qemu-nbd and nbd-client programs, which
autocluster uses to loopback-nbd-mount the disk images when
configuring each node.
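For background, here is a rough sketch of what a loopback NBD mount
involves (illustrative only - autocluster drives this for you, and
the port, image path and device node here are arbitrary):

  modprobe nbd                                 # provide /dev/nbd0
  qemu-nbd --port 1025 /virtual/c1n1.qcow2 &   # export the image over NBD
  nbd-client localhost 1025 /dev/nbd0          # attach it to a device node
  mount /dev/nbd0 /mnt                         # mount the image's filesystem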
NBD for various distros:
qemu-nbd is only available in the old packages from lfarkas.org.
Recompiling the RHEL5 kvm package to support NBD is quite
straightforward. RHEL6 doesn't have an NBD kernel module, so it is
harder to retrofit for NBD support - use guestfish instead.
Unless you can find an RPM for nbd-client, you will need to
download the source from:

  http://sourceforge.net/projects/nbd/
qemu-nbd is in the qemu-kvm or kvm package.

nbd-client is in the nbd package.
qemu-nbd is in the qemu-kvm or kvm package. In older releases it
is called kvm-nbd, so you need to set the QEMU_NBD configuration
variable.

nbd-client is in the nbd-client package.
* As mentioned above, nbd can be found at:

    http://sourceforge.net/projects/nbd/
c) Environment and libvirt virtual networks
You will need to add the autocluster directory to your PATH.
You will need to configure the right kvm networking setup. The
files in host_setup/etc/libvirt/qemu/networks/ should help. This
command will install the right networks for kvm:

  rsync -av --delete host_setup/etc/libvirt/qemu/networks/ /etc/libvirt/qemu/networks/
Note that you'll need to edit the installed files to reflect any
changes to IPBASE, IPNET0, IPNET1, IPNET2 away from the defaults.
This is also true for named.conf.local and squid.conf (see below).
After this you might need to reload libvirt:

  /etc/init.d/libvirtd reload
You might also need to set:

  VIRSH_DEFAULT_CONNECT_URI=qemu:///system

in your environment so that virsh does KVM/QEMU things by default.
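For example, a sketch for making this persistent (assuming bash):

  # In ~/.bashrc or equivalent
  export VIRSH_DEFAULT_CONNECT_URI=qemu:///system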
2) If your install server is far away then you may need a caching web
proxy on your local network.
If you don't have one, then you can install a squid proxy on your
host and then set:

  WEBPROXY="http://10.0.0.1:3128/"
See host_setup/etc/squid/squid.conf for a sample config suitable
for a virtual cluster. Make sure it caches large objects and has
plenty of space. This will be needed to make downloading all the
RPMs to each client sane.
To test your squid setup, run a command like this:

  http_proxy=http://10.0.0.1:3128/ wget <some-url>
Check your firewall setup. If you have problems accessing the
proxy from your nodes (including from kickstart postinstall) then
check it again! Some distributions install nice "convenient"
firewalls by default that might block access to the squid port
from the nodes. On a current version of Fedora Core you may be
able to run system-config-firewall-tui to reconfigure the
firewall.
3) Set up a DNS server on your host. See host_setup/etc/bind/ for a
sample config that is suitable. It needs to redirect DNS queries
for your virtual domain to your Windows domain controller.
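For example, mirroring the rsync used for the libvirt networks above,
a sketch (the destination path and service name are assumptions -
bind's layout varies between distributions, and you must edit the
files for your setup first):

  rsync -av host_setup/etc/bind/ /etc/bind/
  /etc/init.d/named restart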
4) Download a RHEL install ISO.
CREATING A CLUSTER
==================

A cluster comprises a single base disk image, a copy-on-write disk
image for each node and some XML files that tell libvirt about each
node's virtual hardware configuration. The copy-on-write disk images
save a lot of disk space on the host machine because they each use the
base disk image - without them the disk image for each cluster node
would need to contain the entire RHEL install.
The cluster creation process can be broken down into 2 main steps:

1) Creating the base disk image.

2) Creating the per-node disk images and corresponding XML files.
However, before you do this you will need to create a configuration
file. See the "CONFIGURATION" section below for more details.
Here are more details on the "create cluster" process. Note that
unless you have done something extra special then you'll need to run
these commands as root.
1) Create the base disk image using:

  ./autocluster create base
The first thing this step does is to check that it can connect to
the YUM server. If this fails make sure that there are no
firewalls blocking your access to the server.
The install will take about 10 to 15 minutes and you will see the
packages installing in your terminal.
The installation process uses kickstart. The choice of postinstall
script is set using the POSTINSTALL_TEMPLATE variable. An example
is provided in base/all/root/scripts/gpfs-nas-postinstall.sh.
It makes sense to install packages that will be common to all
nodes into the base image. This saves time later when you're
setting up the cluster nodes. However, you don't have to do this
- you can set POSTINSTALL_TEMPLATE to "" instead - but then you
will lose the quick cluster creation/setup that is a major feature
of autocluster.
When that has finished you should mark that base image immutable,
like this:

  chattr +i /virtual/ac-base.img
That will ensure it won't change. This is a precaution as the
image will be used as a basis file for the per-node images, and if
it changes your cluster will become corrupt.
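You can verify the flag with lsattr:

  lsattr /virtual/ac-base.img    # an 'i' should appear in the flags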
2) Now run "autocluster create cluster", specifying a cluster name.
For example:

  autocluster create cluster c1
This will create and install the XML node descriptions and the
disk images for your cluster nodes, and any other nodes you have
configured. Each disk image is initially created as an "empty"
copy-on-write image, which is linked to the base image. Those
images are then accessed using guestfish or a loopback NBD mount,
and populated with system configuration files and other potentially
useful things (such as scripts).
BOOTING A CLUSTER
=================

At this point the cluster has been created but isn't yet running.
Autocluster provides a command called "vircmd", which is a thin
wrapper around libvirt's virsh command. vircmd takes a cluster name
instead of a node/domain name and runs the requested command on all
nodes in the cluster.
1) Now boot your cluster nodes like this:

  vircmd start c1
The most useful vircmd commands are:

  start    : boot a node
  shutdown : graceful shutdown of a node
  destroy  : power off a node immediately
2) You can watch boot progress like this:

  tail -f /var/log/kvm/serial.c1*
All the nodes have serial consoles, making it easier to capture
kernel panic messages and watch the nodes via ssh.
POST-CREATION SETUP
===================

Now you have a cluster of nodes, which might have a variety of
packages installed and configured in a common way. Now that the
cluster is up and running you might need to configure specialised
subsystems like GPFS or Samba. You can do this by hand or use the
sample scripts/configurations that are provided.
Now you can ssh into your nodes. You may like to look at the small
set of scripts in /root/scripts on the nodes. In particular:
  mknsd.sh           : sets up the local shared disks as GPFS NSDs
  setup_gpfs.sh      : sets up GPFS, creates a filesystem etc
  setup_cluster.sh   : sets up clustered Samba and other NAS services
  setup_tsm_server.sh: run this on the TSM node to set up the TSM server
  setup_tsm_client.sh: run this on the GPFS nodes to set up HSM
  setup_ad_server.sh : run this on a node to set up a Samba4 AD
To set up a clustered NAS system you will normally need to run
setup_gpfs.sh and setup_cluster.sh on one of the nodes.
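For example, a sketch (the node hostname is hypothetical - it depends
on your cluster name and DNS setup):

  ssh root@c1n1 /root/scripts/setup_gpfs.sh
  ssh root@c1n1 /root/scripts/setup_cluster.sh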
AUTOMATED CLUSTER CREATION
==========================
The last 2 steps can be automated. An example script for doing this
can be found in examples/create_cluster.sh.
CONFIGURATION
=============

Autocluster uses configuration files containing Unix shell style
variables. For example,

  FIRSTIP=30

indicates that the last octet of the first IP address in the cluster
will be 30. If an option contains multiple words then they will be
separated by underscores ('_'), as in ISO_DIR.
All options have an equivalent command-line option, such as
--firstip=30. Command-line options are lowercase, with words
separated by dashes ('-'), as in --iso-dir.
Normally you would use a configuration file with variables so that you
can repeat steps easily. The command-line equivalents are useful for
trying things out without resorting to an editor. You can specify a
configuration file to use on the autocluster command-line using the -c
option. For example:

  autocluster -c config-foo create base
If you don't provide a configuration file then autocluster will
look for a file called "config" in the current directory.
You can also use environment variables to override the default values
of configuration variables. However, both command-line options and
configuration file entries will override environment variables.
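For example, assuming --firstip follows the option naming convention
described above, the precedence can be seen with --dump (values are
illustrative):

  # The environment variable overrides the default...
  FIRSTIP=40 autocluster --dump | grep FIRSTIP
  # ...but a command-line option overrides the environment
  # (a config file entry for FIRSTIP would also win):
  FIRSTIP=40 autocluster --firstip=50 --dump | grep FIRSTIP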
Potentially useful information:
* Use "autocluster --help" to list all available command-line options
  - all the items listed under "configuration options:" are the
  equivalents of the settings for config files. This output also
  shows descriptions of the options.
* You can use the --dump option to check the current value of
  configuration variables. This is most useful when used in
  combination with grep:

    autocluster --dump | grep ISO_DIR
  In the past we recommended using --dump to create an initial
  configuration file. Don't do this - it is a bad idea! There are a
  lot of options and you'll create a huge file that you don't
  understand and can't debug!
* Configuration options are defined in config.d/*.defconf. You
  shouldn't need to look in these files... but sometimes they contain
  comments about options that are too long to fit into help strings.
* I recommend that you aim for the smallest possible configuration
  file and move on from there.
* The NODES configuration variable controls the types of nodes that
  are created. At the time of writing, the default value is:

    NODES="sofs_front:0-3 rhel_base:4"
  This means that you get 4 clustered NAS nodes, at IP offsets 0, 1,
  2, & 3 from FIRSTIP, all part of the CTDB cluster. You also get an
  additional utility node at IP offset 4 that can be used, for
  example, as a test client. Since sofs_* nodes are present, the base
  node will not be part of the CTDB cluster - it is just extra.
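  For example, a sketch of a smaller cluster - 2 clustered NAS nodes
  and no extra node - using the same syntax (illustrative value):

    NODES="sofs_front:0-1"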
  For many standard use cases the nodes specified by NODES can be
  modified by setting NUMNODES, WITH_SOFS_GUI and WITH_TSM_NODE.
  However, these options can't be used to create nodes without
  specifying IP offsets - except WITH_TSM_NODE, which checks to see if
  IP offset 0 is vacant. Therefore, for many uses you can ignore the
  NODES variable.
  However, NODES is the recommended mechanism for specifying the nodes
  that you want in your cluster. It is powerful, easy to read and
  centralises the information in a single line of your configuration
  file.
ISCSI SHARED DISKS
==================

The RHEL5 version of KVM does not support SCSI block device
emulation. Therefore, you can use either virtio or iSCSI shared
disks. Unfortunately, in RHEL5.4 and early versions of RHEL5.5,
virtio block devices are not supported by the version of multipath in
RHEL5. So this leaves iSCSI as the only choice.
The main configuration options you need for iSCSI disks are:

  SHARED_DISK_TYPE=iscsi
  NICMODEL=virtio        # Recommended for performance
  add_extra_package iscsi-initiator-utils
Note that SHARED_DISK_PREFIX and SHARED_DISK_CACHE are ignored for
iSCSI shared disks because KVM doesn't (need to) know about them.
You will need to install the scsi-target-utils package on the host
system. After creating a cluster, autocluster will print a message
that points you to a file tmp/iscsi.$CLUSTER - you need to run the
commands in this file (probably via: sh tmp/iscsi.$CLUSTER) before
booting your cluster. This will remove any old target with the same
ID, and create the new target, LUNs and ACLs.
You can use the following command to list information about the
iSCSI target:

  tgtadm --lld iscsi --mode target --op show
If you need multiple clusters using iSCSI on the same host then each
cluster will need to have a different setting for ISCSI_TID.
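For example (illustrative values), the two cluster configuration
files might contain:

  ISCSI_TID=1    # in the first cluster's config file
  ISCSI_TID=2    # in the second cluster's config file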
RAW IDE SYSTEM DISKS
====================

RHEL versions of KVM do not support SCSI block device emulation,
so autocluster now defaults to using an IDE system disk instead of a
SCSI one. Therefore, you can use virtio or IDE system disks.
However, writeback caching, qcow2 and virtio are incompatible and
result in I/O corruption. So, you can use either virtio system disks
without any caching, accepting reduced performance, or you can use IDE
system disks with writeback caching, with nice performance.
For IDE disks, here are the required settings:

  SYSTEM_DISK_TYPE=ide
  SYSTEM_DISK_PREFIX=hd
  SYSTEM_DISK_CACHE=writeback
The next problem is that RHEL5's KVM does not include qemu-nbd. The
best solution is to build your own qemu-nbd and stop reading this
section.
If, for whatever reason, you're unable to build your own qemu-nbd,
then you can use raw, rather than qcow2, system disks. If you do this
then you need significantly more disk space (since the system disks
will be *copies* of the base image) and cluster creation time will no
longer be pleasantly snappy (due to the copying time - the images are
large and a single copy can take several minutes). So, having tried
to warn you off this option, if you really want to do this then you'll
need this setting:

  SYSTEM_DISK_FORMAT=raw
Note that if you're testing cluster creation with iSCSI shared disks
then you should find a way of switching off raw disks. This avoids
every iSCSI glitch costing you a lot of time while raw disks are
copied.
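For example, given the option naming convention described in the
CONFIGURATION section, a sketch of a one-off override (assuming
--system-disk-format is the command-line equivalent of
SYSTEM_DISK_FORMAT) would be:

  autocluster -c config --system-disk-format=qcow2 create cluster c1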
The -e option provides support for executing arbitrary bash code.
This is useful for testing and debugging.
One good use of this option is to test template substitution using the
function substitute_vars(). For example:

  ./autocluster -c example.autocluster -e 'CLUSTER=foo; DISK=foo.qcow2; UUID=abcdef; NAME=foon1; set_macaddrs; substitute_vars templates/node.xml'
This prints templates/node.xml with all appropriate substitutions
done. Some internal variables (e.g. CLUSTER, DISK, UUID, NAME) are
given fairly arbitrary values but the various MAC address strings are
set using the function set_macaddrs().
The -e option is also useful when writing scripts that use
autocluster. Given the complexities of the configuration system you
probably don't want to parse configuration files yourself to determine
the current settings. Instead, you can ask autocluster to tell you
useful pieces of information. For example, say you want to script
creating a base disk image and you want to ensure the image is
marked immutable afterwards:
  base_image=$(autocluster -c $CONFIG -e 'echo $VIRTBASE/$BASENAME.img')
  chattr -V -i "$base_image"

  if autocluster -c $CONFIG create base ; then
      chattr -V +i "$base_image"
  fi
Note that the command that autocluster should run is enclosed in
single quotes. This means that $VIRTBASE and $BASENAME will be
expanded within autocluster after the configuration file has been
loaded.
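For contrast, here is a sketch of what each quoting style does:

  # Single quotes: $VIRTBASE is expanded inside autocluster, after
  # the configuration file has been loaded - this is what you want.
  autocluster -c $CONFIG -e 'echo $VIRTBASE'

  # Double quotes: the outer shell expands $VIRTBASE before
  # autocluster runs, almost certainly to an empty string.
  autocluster -c $CONFIG -e "echo $VIRTBASE"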