-BASIC SETUP
-===========
+INTRODUCTION
+============
+
+Autocluster is a set of scripts for building virtual clusters to test
+clustered Samba. It uses libvirt and Linux's KVM virtualisation
+engine.
+
+Autocluster is a collection of scripts, templates and configuration
+files that allow you to create a cluster of virtual nodes very
+quickly. You can create a cluster from scratch in less than 30
+minutes. Once you have a base image you can then recreate a cluster
+or create new virtual clusters in minutes.
+
+Autocluster has recently been tested to create virtual clusters of
+RHEL 6/7 nodes. Older versions were tested with RHEL 5 and some
+versions of CentOS.
+
+
+CONTENTS
+========
+
+* INSTALLING AUTOCLUSTER
+
+* HOST MACHINE SETUP
+
+* CREATING A CLUSTER
+
+* BOOTING A CLUSTER
+
+* POST-BOOT SETUP
+
+* CONFIGURATION
+
+* DEVELOPMENT HINTS
+
+
+INSTALLING AUTOCLUSTER
+======================
+
Before you start, make sure you have the latest version of
autocluster. To download autocluster do this:
- git clone git://git.samba.org/tridge/autocluster.git autocluster
+
+ git clone git://git.samba.org/autocluster.git
+
Or to update it, run "git pull" in the autocluster directory.
-To setup a virtual cluster for SoFS with autocluster follow these steps:
+
+You probably want to add the directory where autocluster is installed
+to your PATH, otherwise things may quickly become tedious.
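+
+For example, assuming autocluster was cloned to ~/autocluster, a line
+like this in your shell profile would do the job:
+
+  export PATH="$HOME/autocluster:$PATH"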
+
+
+HOST MACHINE SETUP
+==================
+
+This section explains how to setup a host machine to run virtual
+clusters generated by autocluster.
+
+
+ 1) Install and configure required software.
+
+ a) Install kvm, libvirt and expect.
+
+ Autocluster creates virtual machines that use libvirt to run under
+ KVM. This means that you will need to install both KVM and
+ libvirt on your host machine. Expect is used by the waitfor()
+ function and should be available for installation from your
+ distribution.
+
+ For various distros:
+
+ * RHEL/CentOS
+
+ Autocluster should work with the standard RHEL qemu-kvm and
+ libvirt packages. It will try to find the qemu-kvm binary. If
+ you've done something unusual then you'll need to set the KVM
+ configuration variable.
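+
+ For example, a minimal install might look like the following. The
+ yum line uses the standard package names mentioned above; the KVM
+ path is an assumption, so check where your distribution installs
+ the qemu-kvm binary:
+
+   yum install qemu-kvm libvirt expect
+
+   # In your autocluster configuration, only if the binary isn't
+   # found automatically:
+   KVM=/usr/libexec/qemu-kvm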
+
+ For RHEL5/CentOS5, useful packages for both kvm and libvirt used
+ to be found here:
+
+ http://www.lfarkas.org/linux/packages/centos/5/x86_64/
+
+ However, since recent versions of RHEL5 ship with KVM, 3rd party
+ KVM RPMs for RHEL5 are now scarce.
+
+ RHEL5.4's KVM also has problems when autocluster uses virtio
+ shared disks, since multipath doesn't notice virtio disks. This
+ is fixed in RHEL5.6 and in a recent RHEL5.5 update - you should
+ be able to use the settings recommended above for RHEL6.
+
+ If you're still running RHEL5.4, you have lots of time, you have
+ lots of disk space, and you like complexity, then see the
+ sections below on "iSCSI shared disks" and "Raw IDE system
+ disks". :-)
+
+ * Fedora
+
+ Useful packages ship with Fedora Core 10 (Cambridge) and later.
+ Some of the above notes on RHEL might apply to Fedora's KVM.
+
+ * Ubuntu
+
+ Useful packages ship with Ubuntu 8.10 (Intrepid Ibex) and later.
+ In recent Ubuntu versions (e.g. 10.10 Maverick Meerkat) the KVM
+ package is called "qemu-kvm". Older versions have a package
+ called "kvm".
+
+ For other distributions you'll have to backport distro sources or
+ compile from upstream source as described below.
+
+ * For KVM see the "Downloads" and "Code" sections at:
+
+ http://www.linux-kvm.org/
+
+ * For libvirt see:
+
+ http://libvirt.org/
+
+ b) Install guestfish or qemu-nbd and nbd-client.
+
+ Autocluster needs a method of updating files in the disk image for
+ each node.
+
+ Recent Linux distributions, including RHEL since 6.0, contain
+ guestfish. Guestfish (see http://libguestfs.org/ - there are
+ binary packages for several distros here) is a CLI for
+ manipulating KVM/QEMU disk images. Autocluster supports
+ guestfish, so if guestfish is available then you should use it.
+ It should be more reliable than NBD.
+
+ Autocluster attempts to use the best available method (guestmount
+ -> guestfish -> loopback) for accessing disk images. If it chooses
+ a suboptimal method (e.g. nodes created with guestmount sometimes
+ won't boot), you can force the method:
+
+ SYSTEM_DISK_ACCESS_METHOD=guestfish
+
+ If you can't use guestfish then you'll have to use NBD. For this
+ you will need the qemu-nbd and nbd-client programs, which
+ autocluster uses to loopback-nbd-mount the disk images when
+ configuring each node.
+
+ NBD for various distros:
+
+ * RHEL/CentOS
+
+ qemu-nbd is only available in the old packages from lfarkas.org.
+ Recompiling the RHEL5 kvm package to support NBD is quite
+ straightforward. RHEL6 doesn't have an NBD kernel module, so is
+ harder to retrofit for NBD support - use guestfish instead.
+
+ Unless you can find an RPM for nbd-client then you need to
+ download source from:
+
+ http://sourceforge.net/projects/nbd/
+
+ and build it.
+
+ * Fedora Core
+
+ qemu-nbd is in the qemu-kvm or kvm package.
+
+ nbd-client is in the nbd package.
+
+ * Ubuntu
+
+ qemu-nbd is in the qemu-kvm or kvm package. In older releases
+ it is called kvm-nbd, so you need to set the QEMU_NBD
+ configuration variable.
+
+ nbd-client is in the nbd-client package.
+
+ * As mentioned above, nbd can be found at:
+
+ http://sourceforge.net/projects/nbd/
+
+ c) Environment and libvirt virtual networks
+
+ You will need to add the autocluster directory to your PATH.
+
+ You will need to configure the right libvirt networking setup. To
+ do this, run:

  host_setup/setup_networks.sh [ <myconfig> ]

- 1) download and install the latest kvm-userspace and kvm tools
- from http://kvm.qumranet.com/kvmwiki/Code
+ If you're using a network setup different to the default then pass
+ your autocluster configuration filename, which should set the
+ NETWORKS variable. If you're using a variety of networks for
+ different clusters then you can probably run this script multiple
+ times.
- You need a x86_64 Linux box to run this on. I use a Ubuntu Hardy
- system. It also needs plenty of memory - at least 3G to run a SoFS
- cluster.
+ You might also need to set:
- You may also find you need a newer version of libvirt. If you get
- an error when running create_base.sh about not handling a device
- named 'sda' then you need a newer libvirt. Get it like this:
+ VIRSH_DEFAULT_CONNECT_URI=qemu:///system
- git clone git://git.et.redhat.com/libvirt.git
+ in your environment so that virsh does KVM/QEMU things by default.
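+
+ Afterwards you can check that the expected networks were created
+ and are active, using a standard virsh command:
+
+   virsh net-list --all
+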
- When building it, you probably want to configure it like this:
+ 2) Configure a local web/install server to provide required YUM
+ repositories.
- ./configure --without-xen --prefix=/usr
+ If your install server is far away then you may need a caching web
+ proxy on your local network.
- You will need to configure the right kvm networking setup. The
- files in host_setup/etc/libvirt/qemu/networks/ should help. This
- command will install the right networks for kvm:
+ If you don't have one, then you can install a squid proxy on your
+ host and set:
- rsync -av --delete host_setup/etc/libvirt/qemu/networks/ /etc/libvirt/qemu/networks/
+ WEBPROXY="http://10.0.0.1:3128/"
- 2) You need a cacheing web proxy on your local network. If you don't
- have one, then install a squid proxy on your host. See
- host_setup/etc/squid/squid.conf for a sample config suitable for a
- virtual cluster. Make sure it caches large objects and has plenty
- of space. This will be needed to make downloading all the RPMs to
- each client sane
+ See host_setup/etc/squid/squid.conf for a sample config suitable
+ for a virtual cluster. Make sure it caches large objects and has
+ plenty of space. This will be needed to make downloading all the
+ RPMs to each client sane.
To test your squid setup, run a command like this:
- http_proxy=http://10.0.0.1:3128/ wget http://9.155.61.11/mediasets/SoFS-daily/
+
+ http_proxy=http://10.0.0.1:3128/ wget <some-url>
+
+ Check your firewall setup. If you have problems accessing the
+ proxy from your nodes (including from kickstart postinstall) then
+ check it again! Some distributions install nice "convenient"
+ firewalls by default that might block access to the squid port
+ from the nodes. On a current version of Fedora Core you may be
+ able to run system-config-firewall-tui to reconfigure the
+ firewall.
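+
+ For example, on a host using plain iptables, a rule like this would
+ let nodes on an assumed 10.0.0.0/24 network reach the squid port:
+
+   iptables -I INPUT -p tcp --dport 3128 -s 10.0.0.0/24 -j ACCEPT
+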
- 3) setup a DNS server on your host. See host_setup/etc/bind/ for a
+ 3) Setup a DNS server on your host. See host_setup/etc/bind/ for a
sample config that is suitable. It needs to redirect DNS queries
- for your SOFS virtual domain to your windows domain controller
+ for your virtual domain to your Windows domain controller.
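+
+ For example, a forward zone in named.conf might look something like
+ this (the domain and the domain controller's address are
+ illustrative):
+
+   zone "ad.example.com" {
+       type forward;
+       forward only;
+       forwarders { 10.0.0.10; };
+   };
+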
+ 4) Download a RHEL (or CentOS) install ISO.
- 4) download a RHEL-5.2 install ISO. You can get it from the install
- server in Mainz. See the FSCC wiki page on autocluster for
- details.
- 5) create a 'config' file in the autocluster directory. See the
- "CONFIGURATION" section below for more details.
+
+
+CREATING A CLUSTER
+==================
+
+A cluster comprises a single base disk image, a copy-on-write disk
+image for each node and some XML files that tell libvirt about each
+node's virtual hardware configuration. The copy-on-write disk images
+save a lot of disk space on the host machine because they all share
+the base disk image - without them the disk image for each cluster
+node would need to contain the entire RHEL install.
+
+The cluster creation process can be broken down into several main
+steps:
+
+ 1) Create a base disk image.
+
+ 2) Create per-node disk images and corresponding XML files.
+
+ 3) Update /etc/hosts to include cluster nodes.
+
+ 4) Boot virtual machines for the nodes.
+
+ 5) Post-boot configuration.
+
+However, before you do this you will need to create a configuration
+file. See the "CONFIGURATION" section below for more details.
+
+Here are more details on the "create cluster" process. Note that
+unless you have done something extra special then you'll need to run
+all of this as root.
+
+ 1) Create the base disk image using:
+
+ ./autocluster base create
+
+ The first thing this step does is to check that it can connect to
+ the YUM server. If this fails make sure that there are no
+ firewalls blocking your access to the server.
- 6) use "./autocluster create base" to create the base install image.
The install will take about 10 to 15 minutes and you will see the
packages installing in your terminal
- Before you start create base make sure your web proxy cache is
- authenticated with the Mainz BSO (eg. connect to
- https://9.155.61.11 with a web browser)
+ The installation process uses kickstart. The choice of
+ postinstall script is set using the POSTINSTALL_TEMPLATE variable.
+ This can be used to install packages that will be common to all
+ nodes into the base image. This saves time later when you're
+ setting up the cluster nodes. However, current usage (given that
+ we test many versions of CTDB) is to default POSTINSTALL_TEMPLATE
+ to "" and install packages post-boot. This seems to be a
+ reasonable compromise between flexibility (the base image can be,
+ for example, a pristine RHEL7.0-base.qcow2, CTDB/Samba packages
+ are selected post-base creation) and speed of cluster creation.
+ When that has finished you should mark that base image immutable
+ like this:
- 7) when that has finished I recommend you mark that base image
- immutable like this:
-
- chattr +i /virtual/SoFS-1.5-base.img
+ chattr +i /virtual/ac-base.img
That will ensure it won't change. This is a precaution as the
image will be used as a basis file for the per-node images, and if
it changes your cluster will become corrupt.
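+
+ You can confirm that the flag is set with lsattr:
+
+   lsattr /virtual/ac-base.img
+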
+ 2-5)
+ Now run "autocluster cluster build", specifying a configuration
+ file. For example:
- 8) now run "./autocluster create cluster" specifying a cluster
- name. For example:
+ autocluster -c m1.autocluster cluster build
- ./autocluster create cluster c1
+ This will create and install the XML node descriptions and the
+ disk images for your cluster nodes, and any other nodes you have
+ configured. Each disk image is initially created as an "empty"
+ copy-on-write image, which is linked to the base image. Those
+ images are then accessed, using either guestfish or a loopback NBD
+ mount, and populated with system configuration files and other
+ potentially useful things (such as scripts).
+ /etc/hosts is updated, the cluster is booted and post-boot
+ setup is done.
- That will create your cluster nodes and the TSM server node
+ Instead of doing all of steps 2-5 with a single command, you can do
+ them individually:
+ 2) autocluster -c m1.autocluster cluster create
+
+ 3) autocluster -c m1.autocluster cluster update_hosts
- 9) now boot your cluster nodes like this:
+ 4) autocluster -c m1.autocluster cluster boot
- ./vircmd start c1
+ 5) autocluster -c m1.autocluster cluster setup
- The most useful vircmd commands are:
-
- start : boot a node
- shutdown : graceful shutdown of a node
- destroy : power off a node immediately
+
+
+BOOTING A CLUSTER
+=================
+
+Autocluster provides a command called "vircmd", which is a thin
+wrapper around libvirt's virsh command. vircmd takes a cluster name
+instead of a node/domain name and runs the requested command on all
+nodes in the cluster.
- 10) you can watch boot progress like this:
+The most useful vircmd commands are:
+
+  start    : boot a cluster
+  shutdown : graceful shutdown of a cluster
+  destroy  : power off a cluster immediately
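+
+For example, to boot all of the nodes in cluster "c1" and later power
+the whole cluster off:
+
+  vircmd start c1
+  vircmd destroy c1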
+
+You can watch boot progress like this:
+
  tail -f /var/log/kvm/serial.c1*

All the nodes have serial consoles, making it easier to capture
kernel panic messages and watch the nodes via ssh.
- 11) now you can ssh into your nodes. You may like to look at the
- small set of scripts in /root/scripts on the nodes for
- some scripts. In particular:
+
+
+POST-BOOT SETUP
+===============
+
- setup_tsm_server.sh: run this on the TSM node to setup the TSM server
- setup_tsm_client.sh: run this on the GPFS nodes to setup HSM
- mknsd.sh : this sets up the local shared disks as GPFS NSDs
- setup_gpfs.sh : this sets GPFS, creates a filesystem etc,
- byppassing the SoFS GUI. Useful for quick tests.
+Autocluster copies some scripts to cluster nodes to enable post-boot
+configuration. These are used to configure specialised subsystems
+like GPFS or Samba, and are installed in /root/scripts/ on each node.
+The main entry point is cluster_setup.sh, which invokes specialised
+scripts depending on the cluster filesystem type or the node type.
+cluster_setup.sh is invoked by the cluster_setup() function in
+autocluster.
+See cluster_setup() if you want to do things manually or if you want
+to add support for other node types and/or cluster filesystems.
- 12) If using the SoFS GUI, then you may want to lower the memory it
- uses so that it fits easily on the first node. Just edit this
- file on the first node:
+There are also some older scripts that haven't been used for a while
+and have probably bit-rotted, such as setup_tsm_client.sh and
+setup_tsm_server.sh. However, they are still provided as examples.
- /opt/IBM/sofs/conf/overrides/sofs.javaopt
+
+
+CONFIGURATION
+=============
+
+Basics
+======
+
- 13) For automating the SoFS GUI, you may wish to install the iMacros
- extension to firefox, and look at some sample macros I have put
- in the imacros/ directory of autocluster. They will need editing
- for your environment, but they should give you some hints on how
- to automate the final GUI stage of the installation of a SoFS
- cluster.
+Autocluster uses configuration files containing Unix shell style
+variables. For example,
+
-CONFIGURATION
-=============
+ FIRSTIP=30
+
+indicates that the last octet of the first IP address in the cluster
+will be 30. If an option contains multiple words then they will be
+separated by underscores ('_'), as in:
+
+ ISO_DIR=/data/ISOs
+
+All options have an equivalent command-line option, such
+as:
-* See config.sample for an example of a configuration file. Note that
- all items in the sample file are commented out by default
+
+ --firstip=30
+
-* Configuration options are defined in config.d/*.defconf. All
- configuration options have an equivalent command-line option.
+Command-line options are lowercase. Words are separated by dashes
+('-'), as in:
+
+ --iso-dir=/data/ISOs
+
+Normally you would use a configuration file with variables so that you
+can repeat steps easily. The command-line equivalents are useful for
+trying things out without resorting to an editor. You can specify a
+configuration file to use on the autocluster command-line using the -c
+option. For example:
+
+ autocluster -c config-foo base create
+
+If you don't provide a configuration file then autocluster will
+look for a file called "config" in the current directory.
+
+You can also use environment variables to override the default values
+of configuration variables. However, both command-line options and
+configuration file entries will override environment variables.
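+
+For example, assuming ISO_DIR is not set in your configuration file
+or on the command line, this would temporarily override it for a
+single run (the --dump option is explained below):
+
+  ISO_DIR=/tmp/isos autocluster --dump | grep ISO_DIR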
+
+Potentially useful information:
* Use "autocluster --help" to list all available command-line options
- all the items listed under "configuration options:" are the
- equivalents of the settings for config files.
+ - all the items listed under "configuration options:" are the
+ equivalents of the settings for config files. This output also
+ shows descriptions of the options.
+
+* You can use the --dump option to check the current value of
+ configuration variables. This is most useful when used in
+ combination with grep:
+
+ autocluster --dump | grep ISO_DIR
+
+ In the past we recommended using --dump to create an initial
+ configuration file. Don't do this - it is a bad idea! There are a
+ lot of options and you'll create a huge file that you don't
+ understand and can't debug!
+
+* Configuration options are defined in config.d/*.defconf. You
+ shouldn't need to look in these files... but sometimes they contain
+ comments about options that are too long to fit into help strings.
+
+Keep it simple
+==============
+
+* I recommend that you aim for the smallest possible configuration file.
+ Perhaps start with:
+
+ FIRSTIP=<whatever>
+
+ and move on from there.
+
+* The NODES configuration variable controls the types of nodes that
+ are created. At the time of writing, the default value is:
+
+ NODES="nas:0-3 rhel_base:4"
+
+ This means that you get 4 clustered NAS nodes, at IP offsets 0, 1,
+ 2, & 3 from FIRSTIP, all part of the CTDB cluster. You also get an
+ additional utility node at IP offset 4. The base node is not part
+ of the CTDB cluster - it is just an extra node that can be used,
+ for example, as a test client.
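+
+ For example, a sketch of a smaller cluster - 2 clustered NAS nodes
+ and no extra utility node - using the same syntax as the default
+ value:
+
+   NODES="nas:0-1"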
+
+Corrupt system disks
+====================
+
+Recent versions of KVM seem to have fixed problems where the
+combination of qcow2 file format, virtio block devices and writeback
+caching would result in corruption. This means that the default
+system disk bus type (a.k.a. SYSTEM_DISK_TYPE) is now virtio.
+
+If using an older version of KVM or if you experience corruption of
+the system disk, try using IDE system disks:
+
+ SYSTEM_DISK_TYPE=ide
+
+iSCSI shared disks
+==================
+
+The RHEL5 version of KVM does not support SCSI block device
+emulation. Therefore, you can use either virtio or iSCSI shared
+disks. Unfortunately, in RHEL5.4 and early versions of RHEL5.5,
+virtio block devices are not supported by the version of multipath in
+RHEL5. So this leaves iSCSI as the only choice.
+
+The main configuration options you need for iSCSI disks are:
+
+ SHARED_DISK_TYPE=iscsi
+ NICMODEL=virtio # Recommended for performance
+ add_extra_package iscsi-initiator-utils
+
+Note that SHARED_DISK_PREFIX and SHARED_DISK_CACHE are ignored for
+iSCSI shared disks because KVM doesn't (need to) know about them.
+
+You will need to install the scsi-target-utils package on the host
+system. After creating a cluster, autocluster will print a message
+that points you to a file tmp/iscsi.$CLUSTER - you need to run the
+commands in this file (probably via: sh tmp/iscsi.$CLUSTER) before
+booting your cluster. This will remove any old target with the same
+ID, and create the new target, LUNs and ACLs.
+
+You can use the following command to list information about the
+target:
+
+ tgtadm --lld iscsi --mode target --op show
+
+If you need multiple clusters using iSCSI on the same host then each
+cluster will need to have a different setting for ISCSI_TID.
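+
+For example, a second cluster's configuration file might contain the
+following - the value is arbitrary, as long as each cluster on the
+host uses a different one:
+
+  ISCSI_TID=2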
+
+Raw IDE system disks
+====================
+
+Older RHEL versions of KVM did not support SCSI block device
+emulation, and produced corruption when virtio disks were used with
+qcow2 disk images and writeback caching. In this case, you can use
+either virtio system disks without any caching, accepting reduced
+performance, or you can use IDE system disks with writeback caching,
+with nice performance.
+
+For IDE disks, here are the required settings:
+
+ SYSTEM_DISK_TYPE=ide
+ SYSTEM_DISK_PREFIX=hd
+ SYSTEM_DISK_CACHE=writeback
+
+The next problem is that RHEL5's KVM does not include qemu-nbd. The
+best solution is to build your own qemu-nbd and stop reading this
+section.
+
+If, for whatever reason, you're unable to build your own qemu-nbd,
+then you can use raw, rather than qcow2, system disks. If you do this
+then you need significantly more disk space (since the system disks
+will be *copies* of the base image) and cluster creation time will no
+longer be pleasantly snappy (due to the copying time - the images are
+large and a single copy can take several minutes). So, having tried
+to warn you off this option, if you really want to do this then you'll
+need these settings:
+
+ SYSTEM_DISK_FORMAT=raw
+ BASE_FORMAT=raw
-* Run "autocluster --dump > config.foo" (or similar) to create a
- config file containing the default values for all options that you
- can set. You can then delete all options for which you wish to keep
- the default values and then modify the remaining ones, resulting in
- a relatively small config file.
+Note that if you're testing cluster creation with iSCSI shared disks
+then you should find a way of switching off raw disks. This avoids
+every iSCSI glitch costing you a lot of time while raw disks are
+copied.
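+
+Since every configuration option has a command-line equivalent, one
+way to do that is to leave the raw disk settings out of your
+configuration file and add them on the command line only when you
+really want them (assuming the usual option-naming convention):
+
+  autocluster -c config --system-disk-format=raw --base-format=raw ...
+
+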
-* Use the --with-release option on the command-line or the
- with_release function in a configuration file to get default values
- for building virtual clusters for releases of particular "products".
- Currently there are only release definitions for SoFS.
+DEVELOPMENT HINTS
+=================
+
- For example, you can setup default values for SoFS-1.5.3 by running:
+The -e option provides support for executing arbitrary bash code.
+This is useful for testing and debugging.
- autocluster --with-release=SoFS-1.5.3 ...
+One good use of this option is to test template substitution using the
+function substitute_vars(). For example:
- Equivalently you can use the following syntax in a configuration
- file:
+ ./autocluster -c example.autocluster -e 'CLUSTER=foo; DISK=foo.qcow2; UUID=abcdef; NAME=foon1; set_macaddrs; substitute_vars templates/node.xml'
+
+This prints templates/node.xml with all appropriate substitutions
+done. Some internal variables (e.g. CLUSTER, DISK, UUID, NAME) are
+given fairly arbitrary values but the various MAC address strings are
+set using the function set_macaddrs().
- with_release "SoFS-1.5.3"
+The -e option is also useful when writing scripts that use
+autocluster. Given the complexities of the configuration system you
+probably don't want to parse configuration files yourself to determine
+the current settings. Instead, you can ask autocluster to tell you
+useful pieces of information. For example, say you want to script
+creating a base disk image and you want to ensure the image is
+marked immutable:
- The release definitions are stored in releases/*.release. The
- available releases are listed in the output of "autocluster --help".
+ base_image=$(autocluster -c $CONFIG -e 'echo $VIRTBASE/$BASENAME.img')
+ chattr -V -i "$base_image"
- NOTE: Occasionally you will need to consider the position of
- with_release in your configuration. If you want to override options
- handled by a release definition then you will obviously need to set
- them later in your configuration. Some options will need to appear
- before with_release so that they can be used within a release
- definition - the most obvious one is the (rarely used) RHEL_ARCH
- option, which is used in the default ISO setting for each release.
+ if autocluster -c $CONFIG base create ; then
+ chattr -V +i "$base_image"
+ ...
+
+Note that the command that autocluster should run is enclosed in
+single quotes. This means that $VIRTBASE and $BASENAME will be
+expanded within autocluster after the configuration file has been
+loaded.