INTRODUCTION
============

Autocluster is a set of scripts for building virtual clusters to test
clustered Samba.  It uses Linux's libvirt and KVM virtualisation
engine.

Autocluster is a collection of scripts, templates and configuration
files that allow you to create a cluster of virtual nodes very
quickly.  You can create a cluster from scratch in less than 30
minutes.  Once you have a base image you can then recreate a cluster
or create new virtual clusters in minutes.

Autocluster has recently been tested to create virtual clusters of
RHEL 6/7 nodes.  Older versions were tested with RHEL 5 and some
versions of CentOS.

CONTENTS
========

* INSTALLING AUTOCLUSTER
* HOST MACHINE SETUP
* CREATING A CLUSTER
* BOOTING/DESTROYING A CLUSTER
* POST-BOOT SETUP
* CONFIGURATION
* DEVELOPMENT HINTS

INSTALLING AUTOCLUSTER
======================

Before you start, make sure you have the latest version of
autocluster.  To download autocluster do this:

  git clone git://git.samba.org/autocluster.git

Or to update it, run "git pull" in the autocluster directory.

You probably want to add the directory where autocluster is installed
to your PATH, otherwise things may quickly become tedious.

HOST MACHINE SETUP
==================

This section explains how to set up a host machine to run virtual
clusters generated by autocluster.

1) Install and configure required software.

 a) Install kvm, libvirt and expect.

    Autocluster creates virtual machines that use libvirt to run
    under KVM.  This means that you will need to install both KVM and
    libvirt on your host machine.  Expect is used by the waitfor()
    function and should be available for installation from your
    distribution.

    For various distros:

    * RHEL/CentOS

      Autocluster should work with the standard RHEL qemu-kvm and
      libvirt packages.  It will try to find the qemu-kvm binary.  If
      you've done something unusual then you'll need to set the KVM
      configuration variable.

      For RHEL5/CentOS5, useful packages for both kvm and libvirt
      used to be found here:

        http://www.lfarkas.org/linux/packages/centos/5/x86_64/

      However, since recent versions of RHEL5 ship with KVM, 3rd
      party KVM RPMs for RHEL5 are now scarce.

      RHEL5.4's KVM also has problems when autocluster uses virtio
      shared disks, since multipath doesn't notice virtio disks.
      This is fixed in RHEL5.6 and in a recent RHEL5.5 update - you
      should be able to use the settings recommended above for RHEL6.
      If you're still running RHEL5.4, you have lots of time, you
      have lots of disk space, and you like complexity, then see the
      sections below on "iSCSI shared disks" and "Raw IDE system
      disks". :-)

    * Fedora

      Useful packages ship with Fedora Core 10 (Cambridge) and later.
      Some of the above notes on RHEL might apply to Fedora's KVM.

    * Ubuntu

      Useful packages ship with Ubuntu 8.10 (Intrepid Ibex) and
      later.  In recent Ubuntu versions (e.g. 10.10 Maverick Meerkat)
      the KVM package is called "qemu-kvm".  Older versions have a
      package called "kvm".

    For other distributions you'll have to backport distro sources or
    compile from upstream source as described below.

    * For KVM see the "Downloads" and "Code" sections at:

        http://www.linux-kvm.org/

    * For libvirt see:

        http://libvirt.org/

 b) Install guestfish or qemu-nbd and nbd-client.

    Autocluster needs a method of updating files in the disk image
    for each node.

    Recent Linux distributions, including RHEL since 6.0, contain
    guestfish.  Guestfish (see http://libguestfs.org/ - there are
    binary packages for several distros here) is a CLI for
    manipulating KVM/QEMU disk images.
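
    For example, once guestfish is installed you can use it to read
    files straight out of a (shut down) disk image, which is a handy
    way to check that it works on your host.  The image path and file
    below are only examples:

      guestfish --ro -a /virtual/ac-base.img -i cat /etc/redhat-release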

    Autocluster supports guestfish, so if guestfish is available then
    you should use it.  It should be more reliable than NBD.

    Autocluster attempts to use the best available method (guestmount
    -> guestfish -> loopback) for accessing disk images.  If it
    chooses a suboptimal method (e.g. nodes created with guestmount
    sometimes won't boot), you can force the method:

      SYSTEM_DISK_ACCESS_METHOD=guestfish

    If you can't use guestfish then you'll have to use NBD.  For this
    you will need the qemu-nbd and nbd-client programs, which
    autocluster uses to loopback-nbd-mount the disk images when
    configuring each node.

    NBD for various distros:

    * RHEL/CentOS

      qemu-nbd is only available in the old packages from
      lfarkas.org.  Recompiling the RHEL5 kvm package to support NBD
      is quite straightforward.  RHEL6 doesn't have an NBD kernel
      module, so is harder to retrofit for NBD support - use
      guestfish instead.

      Unless you can find an RPM for nbd-client, you will need to
      download the source from:

        http://sourceforge.net/projects/nbd/

      and build it.

    * Fedora Core

      qemu-nbd is in the qemu-kvm or kvm package.

      nbd-client is in the nbd package.

    * Ubuntu

      qemu-nbd is in the qemu-kvm or kvm package.  In older releases
      it is called kvm-nbd, so you need to set the QEMU_NBD
      configuration variable.

      nbd-client is in the nbd-client package.

    * As mentioned above, nbd can be found at:

        http://sourceforge.net/projects/nbd/

 c) Environment and libvirt virtual networks

    You will need to add the autocluster directory to your PATH.

    You will need to configure the right libvirt networking setup.
    To do this, run:

      host_setup/setup_networks.sh [ <config_file> ]

    If you're using a network setup different to the default then
    pass your autocluster configuration filename, which should set
    the NETWORKS variable.  If you're using a variety of networks for
    different clusters then you can probably run this script multiple
    times.

    You might also need to set:

      VIRSH_DEFAULT_CONNECT_URI=qemu:///system

    in your environment so that virsh does KVM/QEMU things by
    default.

2) Configure a local web/install server to provide the required YUM
   repositories.

   If your install server is far away then you may need a caching web
   proxy on your local network.

   If you don't have one, then you can install a squid proxy on your
   host and set:

     WEBPROXY="http://10.0.0.1:3128/"

   See host_setup/etc/squid/squid.conf for a sample config suitable
   for a virtual cluster.  Make sure it caches large objects and has
   plenty of space.  This will be needed to make downloading all the
   RPMs to each client sane.

   To test your squid setup, run a command like this:

     http_proxy=http://10.0.0.1:3128/ wget <url>

   Check your firewall setup.  If you have problems accessing the
   proxy from your nodes (including from kickstart postinstall) then
   check it again!  Some distributions install nice "convenient"
   firewalls by default that might block access to the squid port
   from the nodes.  On a current version of Fedora Core you may be
   able to run system-config-firewall-tui to reconfigure the
   firewall.

3) Setup a DNS server on your host.

   See host_setup/etc/bind/ for a sample config that is suitable.  It
   needs to redirect DNS queries for your virtual domain to your
   Windows domain controller.

4) Download a RHEL (or CentOS) install ISO.

CREATING A CLUSTER
==================

A cluster comprises a single base disk image, a copy-on-write disk
image for each node and some XML files that tell libvirt about each
node's virtual hardware configuration.
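
For background, a copy-on-write image is simply a qcow2 file that
uses the base image as its backing file.  Autocluster creates these
for you, but the idea is roughly equivalent to the following (the
paths here are only examples):

  qemu-img create -f qcow2 -b /virtual/ac-base.img /virtual/testn1.qcow2

Only blocks that differ from the base image end up in the per-node
file.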

The copy-on-write disk images save a lot of disk space on the host
machine because they each use the base disk image - without them the
disk image for each cluster node would need to contain the entire
RHEL install.

The cluster creation process can be broken down into several main
steps:

1) Create a base disk image.

2) Create per-node disk images and corresponding XML files.

3) Update /etc/hosts to include cluster nodes.

4) Boot virtual machines for the nodes.

5) Post-boot configuration.

However, before you do this you will need to create a configuration
file.  See the "CONFIGURATION" section below for more details.

Here are more details on the "create cluster" process.  Note that
unless you have done something extra special then you'll need to run
all of this as root.

1) Create the base disk image using:

     ./autocluster base create

   The first thing this step does is to check that it can connect to
   the YUM server.  If this fails make sure that there are no
   firewalls blocking your access to the server.

   The install will take about 10 to 15 minutes and you will see the
   packages installing in your terminal.

   The installation process uses kickstart.  The choice of
   postinstall script is set using the POSTINSTALL_TEMPLATE variable.
   This can be used to install packages that will be common to all
   nodes into the base image.  This saves time later when you're
   setting up the cluster nodes.  However, current usage (given that
   we test many versions of CTDB) is to default POSTINSTALL_TEMPLATE
   to "" and install packages post-boot.  This seems to be a
   reasonable compromise between flexibility (the base image can be,
   for example, a pristine RHEL7.0-base.qcow2, with CTDB/Samba
   packages selected after base creation) and speed of cluster
   creation.

   When that has finished you should mark that base image immutable
   like this:

     chattr +i /virtual/ac-base.img

   That will ensure it won't change.  This is a precaution, as the
   image will be used as the basis file for the per-node images, and
   if it changes your cluster will become corrupt.

2-5) Now run "autocluster cluster build", specifying a configuration
   file.  For example:

     autocluster -c m1.autocluster cluster build

   This will create and install the XML node descriptions and the
   disk images for your cluster nodes, and any other nodes you have
   configured.  Each disk image is initially created as an "empty"
   copy-on-write image, which is linked to the base image.  Those
   images are then accessed using guestfish or loopback-NBD-mounted,
   and populated with system configuration files and other
   potentially useful things (such as scripts).  /etc/hosts is
   updated, the cluster is booted and post-boot setup is done.

   Instead of doing all of steps 2-5 with a single command, you can
   do:

   2) autocluster -c m1.autocluster cluster create

   3) autocluster -c m1.autocluster cluster update_hosts

   4) autocluster -c m1.autocluster cluster boot

   5) autocluster -c m1.autocluster cluster setup

BOOTING/DESTROYING A CLUSTER
============================

Autocluster provides a command called "vircmd", which is a thin
wrapper around libvirt's virsh command.  vircmd takes a cluster name
instead of a node/domain name and runs the requested command on all
nodes in the cluster.
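
For example, assuming your cluster is called "m1" (the name comes
from your configuration; "m1" here just matches the earlier example),
you can boot it or power it off like this:

  vircmd start m1
  vircmd destroy m1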

The most useful vircmd commands are:

  start    : boot a cluster
  shutdown : graceful shutdown of a cluster
  destroy  : power off a cluster immediately

You can watch boot progress like this:

  tail -f /var/log/kvm/serial.c1*

All the nodes have serial consoles, making it easier to capture
kernel panic messages and watch the nodes via ssh.

POST-BOOT SETUP
===============

Autocluster copies some scripts to cluster nodes to enable post-boot
configuration.  These are used to configure specialised subsystems
like GPFS or Samba, and are installed in /root/scripts/ on each node.

The main entry point is cluster_setup.sh, which invokes specialised
scripts depending on the cluster filesystem type or the node type.
cluster_setup.sh is invoked by the cluster_setup() function in
autocluster.  See cluster_setup() if you want to do things manually
or if you want to add support for other node types and/or cluster
filesystems.

There are also some older scripts that haven't been used for a while
and have probably bit-rotted, such as setup_tsm_client.sh and
setup_tsm_server.sh.  However, they are still provided as examples.

CONFIGURATION
=============

Basics
======

Autocluster uses configuration files containing Unix shell style
variables.  For example,

  FIRSTIP=30

indicates that the last octet of the first IP address in the cluster
will be 30.  If an option contains multiple words then they will be
separated by underscores ('_'), as in:

  ISO_DIR=/data/ISOs

All options have an equivalent command-line option, such as:

  --firstip=30

Command-line options are lowercase.  Words are separated by dashes
('-'), as in:

  --iso-dir=/data/ISOs

Normally you would use a configuration file with variables so that
you can repeat steps easily.  The command-line equivalents are useful
for trying things out without resorting to an editor.  You can
specify a configuration file to use on the autocluster command-line
using the -c option.  For example:

  autocluster -c config-foo create base

If you don't provide a configuration file then autocluster will look
for a file called "config" in the current directory.

You can also use environment variables to override the default values
of configuration variables.  However, both command-line options and
configuration file entries will override environment variables.

Potentially useful information:

* Use "autocluster --help" to list all available command-line
  options - all the items listed under "configuration options:" are
  the equivalents of the settings for config files.  This output also
  shows descriptions of the options.

* You can use the --dump option to check the current value of
  configuration variables.  This is most useful when used in
  combination with grep:

    autocluster --dump | grep ISO_DIR

  In the past we recommended using --dump to create an initial
  configuration file.  Don't do this - it is a bad idea!  There are a
  lot of options and you'll create a huge file that you don't
  understand and can't debug!

* Configuration options are defined in config.d/*.defconf.  You
  shouldn't need to look in these files... but sometimes they contain
  comments about options that are too long to fit into help strings.

Keep it simple
==============

* I recommend that you aim for the smallest possible configuration
  file.  Perhaps start with:

    FIRSTIP=<n>

  and move on from there.

* The NODES configuration variable controls the types of nodes that
  are created.
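
  Each entry in NODES has the form <node type>:<IP offsets>, with the
  offsets relative to FIRSTIP.  So, as a sketch (node counts and
  offsets chosen arbitrarily), a smaller cluster with 2 NAS nodes and
  1 extra node would look like:

    NODES="nas:0-1 rhel_base:2"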

  At the time of writing, the default value is:

    NODES="nas:0-3 rhel_base:4"

  This means that you get 4 clustered NAS nodes, at IP offsets 0, 1,
  2, & 3 from FIRSTIP, all part of the CTDB cluster.  You also get an
  additional utility node at IP offset 4 that can be used, for
  example, as a test client.  The base node will not be part of the
  CTDB cluster.  It is just an extra node that can be used as a test
  client or similar.

Corrupt system disks
====================

Recent versions of KVM seem to have fixed problems where the
combination of qcow2 file format, virtio block devices and writeback
caching would result in corruption.

This means the default system disk bus type (a.k.a.
SYSTEM_DISK_TYPE) is now virtio.  If you are using an older version
of KVM or if you experience corruption of the system disk, try using
IDE system disks:

  SYSTEM_DISK_TYPE=ide

iSCSI shared disks
==================

The RHEL5 version of KVM does not support the SCSI block device
emulation.  Therefore, you can use either virtio or iSCSI shared
disks.  Unfortunately, in RHEL5.4 and early versions of RHEL5.5,
virtio block devices are not supported by the version of multipath in
RHEL5.  So this leaves iSCSI as the only choice.

The main configuration options you need for iSCSI disks are:

  SHARED_DISK_TYPE=iscsi
  NICMODEL=virtio        # Recommended for performance
  add_extra_package iscsi-initiator-utils

Note that SHARED_DISK_PREFIX and SHARED_DISK_CACHE are ignored for
iSCSI shared disks because KVM doesn't (need to) know about them.

You will need to install the scsi-target-utils package on the host
system.  After creating a cluster, autocluster will print a message
that points you to a file tmp/iscsi.$CLUSTER - you need to run the
commands in this file (probably via: sh tmp/iscsi.$CLUSTER) before
booting your cluster.  This will remove any old target with the same
ID, and create the new target, LUNs and ACLs.

You can use the following command to list information about the
target:

  tgtadm --lld iscsi --mode target --op show

If you need multiple clusters using iSCSI on the same host then each
cluster will need to have a different setting for ISCSI_TID.

Raw IDE system disks
====================

Older RHEL versions of KVM did not support the SCSI block device
emulation, and produced corruption when virtio disks were used with
qcow2 disk images and writeback caching.  In this case, you can use
either virtio system disks without any caching, accepting reduced
performance, or you can use IDE system disks with writeback caching,
with nice performance.

For IDE disks, here are the required settings:

  SYSTEM_DISK_TYPE=ide
  SYSTEM_DISK_PREFIX=hd
  SYSTEM_DISK_CACHE=writeback

The next problem is that RHEL5's KVM does not include qemu-nbd.  The
best solution is to build your own qemu-nbd and stop reading this
section.

If, for whatever reason, you're unable to build your own qemu-nbd,
then you can use raw, rather than qcow2, system disks.  If you do
this then you need significantly more disk space (since the system
disks will be *copies* of the base image) and cluster creation time
will no longer be pleasantly snappy (due to the copying time - the
images are large and a single copy can take several minutes).  So,
having tried to warn you off this option, if you really want to do
this then you'll need these settings:

  SYSTEM_DISK_FORMAT=raw
  BASE_FORMAT=raw

Note that if you're testing cluster creation with iSCSI shared disks
then you should find a way of switching off raw disks.  This avoids
every iSCSI glitch costing you a lot of time while raw disks are
copied.
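
One way to do that (just a sketch - adjust names and values to your
setup) is to keep iSCSI testing in a separate configuration file that
sets only the iSCSI-related options and leaves the system disk format
variables at their qcow2 defaults:

  # iscsi-test.autocluster (example name and values)
  FIRSTIP=40
  SHARED_DISK_TYPE=iscsi
  NICMODEL=virtio
  ISCSI_TID=2
  add_extra_package iscsi-initiator-utils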

DEVELOPMENT HINTS
=================

The -e option provides support for executing arbitrary bash code.
This is useful for testing and debugging.

One good use of this option is to test template substitution using
the function substitute_vars().  For example:

  ./autocluster -c example.autocluster -e 'CLUSTER=foo; DISK=foo.qcow2; UUID=abcdef; NAME=foon1; set_macaddrs; substitute_vars templates/node.xml'

This prints templates/node.xml with all appropriate substitutions
done.  Some internal variables (e.g. CLUSTER, DISK, UUID, NAME) are
given fairly arbitrary values but the various MAC address strings are
set using the function set_macaddrs().

The -e option is also useful when writing scripts that use
autocluster.  Given the complexities of the configuration system you
probably don't want to parse configuration files yourself to
determine the current settings.  Instead, you can ask autocluster to
tell you useful pieces of information.  For example, say you want to
script creating a base disk image and you want to ensure the image is
marked immutable:

  base_image=$(autocluster -c $CONFIG -e 'echo $VIRTBASE/$BASENAME.img')

  chattr -V -i "$base_image"

  if autocluster -c $CONFIG create base ; then
      chattr -V +i "$base_image"
      ...

Note that the command that autocluster should run is enclosed in
single quotes.  This means that $VIRTBASE and $BASENAME will be
expanded within autocluster after the configuration file has been
loaded.
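
The same approach works for other configuration variables mentioned
in this document, for example:

  iso_dir=$(autocluster -c $CONFIG -e 'echo $ISO_DIR')
  firstip=$(autocluster -c $CONFIG -e 'echo $FIRSTIP')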