INTRODUCTION
============

Autocluster is a set of scripts for building virtual clusters to test
clustered Samba. It uses Linux's libvirt and the KVM virtualisation
engine.

Autocluster is a collection of scripts, templates and configuration files
that allow you to create a cluster of virtual nodes very quickly. You can
create a cluster from scratch in less than 30 minutes. Once you have a
base image you can recreate a cluster, or create new virtual clusters, in
minutes.

The current implementation creates virtual clusters of RHEL5 nodes.


CONTENTS
========

* INSTALLING AUTOCLUSTER
* HOST MACHINE SETUP
* CREATING A CLUSTER
* BOOTING A CLUSTER
* POST-CREATION SETUP
* CONFIGURATION
* DEVELOPMENT HINTS


INSTALLING AUTOCLUSTER
======================

Before you start, make sure you have the latest version of autocluster.
To download autocluster do this:

  git clone git://git.samba.org/tridge/autocluster.git autocluster

Or to update it, run "git pull" in the autocluster directory.

You probably want to add the directory where autocluster is installed to
your PATH, otherwise things may quickly become tedious.


HOST MACHINE SETUP
==================

This section explains how to set up a host machine to run virtual
clusters generated by autocluster.

1) Install kvm, libvirt, qemu-nbd, nbd-client and expect.

   Autocluster creates virtual machines that use libvirt to run under
   KVM. This means that you will need to install both KVM and libvirt on
   your host machine. You will also need the qemu-nbd and nbd-client
   programs, which autocluster uses to loopback-nbd-mount the disk images
   when configuring each node. Expect is used by the "waitfor" script and
   should be available for installation from your distribution.

   For various distros:

   * RHEL/CentOS

     For RHEL5/CentOS5, useful packages for both kvm and libvirt used to
     be found here:

       http://www.lfarkas.org/linux/packages/centos/5/x86_64/

     However, since recent versions of RHEL5 ship with KVM, 3rd party KVM
     RPMs for RHEL5 are now scarce.

     RHEL5.4 ships with KVM but it doesn't have the SCSI disk emulation
     that autocluster uses by default. There are also problems when
     autocluster uses virtio on RHEL5.4's KVM. See the sections below on
     "iSCSI shared disks" and "Raw IDE system disks". Also, to use the
     RHEL5 version of KVM you will need to set:

       KVM=/usr/libexec/qemu-kvm

     in your configuration file.

     Unless you can find an RPM for nbd-client you will need to download
     the source from:

       http://sourceforge.net/projects/nbd/

     and build it.

   * Fedora Core

     Useful packages ship with Fedora Core 10 (Cambridge) and later.

     qemu-nbd is in the kvm package. nbd-client is in the nbd package.

   * Ubuntu

     Useful packages ship with Ubuntu 8.10 (Intrepid Ibex) and later.

     qemu-nbd is in the kvm package but is called kvm-nbd, so you need to
     set the QEMU_NBD configuration variable. nbd-client is in the
     nbd-client package.

   For other distributions you'll have to backport distro sources or
   compile from upstream source as described below.

   * For KVM see the "Downloads" and "Code" sections at:

       http://www.linux-kvm.org/

   * For libvirt see:

       http://libvirt.org/

   * As mentioned above, nbd can be found at:

       http://sourceforge.net/projects/nbd/

   You will need to add the autocluster directory to your PATH.

   You will need to configure the right kvm networking setup. The files
   in host_setup/etc/libvirt/qemu/networks/ should help.
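   As a quick sanity check you can ask libvirt which networks it already
   knows about, and you can also load a single definition by hand if you
   prefer. This is only a sketch: the file name and network name below
   are placeholders, so substitute one of the shipped definitions.

     virsh net-list --all
     # "example.xml" and the network name "example" are placeholders:
     virsh net-define host_setup/etc/libvirt/qemu/networks/example.xml
     virsh net-start example
     virsh net-autostart example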
   The following command will install the right networks for kvm:

     rsync -av --delete host_setup/etc/libvirt/qemu/networks/ \
       /etc/libvirt/qemu/networks/

   After this you might need to reload libvirt:

     /etc/init.d/libvirt reload

   or similar.

   You might also need to set:

     VIRSH_DEFAULT_CONNECT_URI=qemu:///system

   in your environment so that virsh does KVM/QEMU things by default.

2) You need a caching web proxy on your local network.

   If you don't have one, then install a squid proxy on your host. See
   host_setup/etc/squid/squid.conf for a sample config suitable for a
   virtual cluster. Make sure it caches large objects and has plenty of
   space. This is needed to make downloading all the RPMs to each client
   sane.

   To test your squid setup, run a command like this:

     http_proxy=http://10.0.0.1:3128/ wget <some-url>

   Check your firewall setup. If you have problems accessing the proxy
   from your nodes (including from kickstart postinstall) then check it
   again! Some distributions install nice "convenient" firewalls by
   default that might block access to the squid port from the nodes. On a
   current version of Fedora Core you may be able to run
   system-config-firewall-tui to reconfigure the firewall.

3) Set up a DNS server on your host.

   See host_setup/etc/bind/ for a sample config that is suitable. It
   needs to redirect DNS queries for your virtual domain to your Windows
   domain controller.

4) Download a RHEL install ISO.


CREATING A CLUSTER
==================

A cluster comprises a single base disk image, a copy-on-write disk image
for each node and some XML files that tell libvirt about each node's
virtual hardware configuration. The copy-on-write disk images save a lot
of disk space on the host machine because they each use the base disk
image - without them the disk image for each cluster node would need to
contain the entire RHEL install.

The cluster creation process can be broken down into 2 main steps:

1) Creating the base disk image.

2) Creating the per-node disk images and corresponding XML files.

However, before you do either of these you will need to create a
configuration file. See the "CONFIGURATION" section below for more
details.

Here are more details on the "create cluster" process. Note that unless
you have done something extra special then you'll need to run all of this
as root.

1) Create the base disk image using:

     ./autocluster create base

   The first thing this step does is to check that it can connect to the
   YUM server. If this fails make sure that there are no firewalls
   blocking your access to the server.

   The install will take about 10 to 15 minutes and you will see the
   packages installing in your terminal.

   The installation process uses kickstart. If your configuration uses a
   SoFS release then the last stage of the kickstart configuration will
   be a postinstall script that installs and configures packages related
   to SoFS. The choice of postinstall script is set using the
   POSTINSTALL_TEMPLATE variable, allowing you to adapt the installation
   process for different types of clusters.

   It makes sense to install packages that will be common to all nodes
   into the base image. This saves time later when you're setting up the
   cluster nodes. However, you don't have to do this - you can set
   POSTINSTALL_TEMPLATE to "" instead - but then you will lose the quick
   cluster creation/setup that is a major feature of autocluster.

   When that has finished you should mark the base image immutable like
   this:

     chattr +i /virtual/ac-base.img

   That will ensure it won't change.
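   If you want to double-check that the immutable flag really is set, you
   can list the file attributes. The path here is just the example used
   above - substitute the base image path from your own configuration.

     lsattr /virtual/ac-base.img

   An "i" in the attribute column confirms that the image is immutable.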
   Marking the image immutable is a precaution: the image will be used as
   the basis file for the per-node images, and if it changes your cluster
   will become corrupt.

2) Now run "autocluster create cluster", specifying a cluster name. For
   example:

     autocluster create cluster c1

   This will create and install the XML node descriptions and the disk
   images for your cluster nodes, and any other nodes you have
   configured. Each disk image is initially created as an "empty"
   copy-on-write image, which is linked to the base image. Those images
   are then loopback-nbd-mounted and populated with system configuration
   files and other potentially useful things (such as scripts).


BOOTING A CLUSTER
=================

At this point the cluster has been created but isn't yet running.
Autocluster provides a command called "vircmd", which is a thin wrapper
around libvirt's virsh command. vircmd takes a cluster name instead of a
node/domain name and runs the requested command on all nodes in the
cluster.

1) Now boot your cluster nodes like this:

     vircmd start c1

   The most useful vircmd commands are:

     start    : boot a node
     shutdown : graceful shutdown of a node
     destroy  : power off a node immediately

2) You can watch boot progress like this:

     tail -f /var/log/kvm/serial.c1*

   All the nodes have serial consoles, making it easier to capture kernel
   panic messages and to watch the nodes before ssh is available.


POST-CREATION SETUP
===================

Now you have a cluster of nodes, which might have a variety of packages
installed and configured in a common way. Now that the cluster is up and
running you might need to configure specialised subsystems like GPFS or
Samba. You can do this by hand or use the sample scripts/configurations
that are provided.

1) Now you can ssh into your nodes. You may like to look at the small set
   of scripts in /root/scripts on the nodes. In particular:

     mknsd.sh            : sets up the local shared disks as GPFS NSDs
     setup_gpfs.sh       : sets up GPFS, creates a filesystem etc
     setup_samba.sh      : sets up Samba and many other system components
     setup_tsm_server.sh : run this on the TSM node to set up the TSM server
     setup_tsm_client.sh : run this on the GPFS nodes to set up HSM

   To set up a SoFS system you will normally need to run setup_gpfs.sh
   and setup_samba.sh.

2) If using the SoFS GUI, then you may want to lower the memory it uses
   so that it fits easily on the first node. Just edit this file on the
   first node:

     /opt/IBM/sofs/conf/overrides/sofs.javaopt

3) For automating the SoFS GUI, you may wish to install the iMacros
   extension to Firefox, and look at some sample macros I have put in the
   imacros/ directory of autocluster. They will need editing for your
   environment, but they should give you some hints on how to automate
   the final GUI stage of the installation of a SoFS cluster.


CONFIGURATION
=============

Basics
======

Autocluster uses configuration files containing Unix shell style
variables. For example,

  FIRSTIP=30

indicates that the last octet of the first IP address in the cluster will
be 30. If an option contains multiple words then they will be separated
by underscores ('_'), as in:

  ISO_DIR=/data/ISOs

All options have an equivalent command-line option, such as:

  --firstip=30

Command-line options are lowercase. Words are separated by dashes ('-'),
as in:

  --iso-dir=/data/ISOs

Normally you would use a configuration file with variables so that you
can repeat steps easily. The command-line equivalents are useful for
trying things out without resorting to an editor.
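To make the correspondence concrete, here is the same pair of settings
expressed both ways (the values are the illustrative ones used above):

  # A minimal configuration file might contain just:
  FIRSTIP=30
  ISO_DIR=/data/ISOs

  # The same settings given as one-off command-line options:
  autocluster --firstip=30 --iso-dir=/data/ISOs create base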
You can specify a configuration file to use on the autocluster
command-line using the -c option. For example:

  autocluster -c config-foo create base

If you don't specify a configuration file then autocluster will look for
a file called "config" in the current directory.

You can also use environment variables to override the default values of
configuration variables. However, both command-line options and
configuration file entries will override environment variables.

Potentially useful information:

* Use "autocluster --help" to list all available command-line options -
  all the items listed under "configuration options:" are the equivalents
  of the settings for config files. This output also shows descriptions
  of the options.

* You can use the --dump option to check the current value of
  configuration variables. This is most useful when used in combination
  with grep:

    autocluster --dump | grep ISO_DIR

  In the past we recommended using --dump to create an initial
  configuration file. Don't do this - it is a bad idea! There are a lot
  of options and you'll create a huge file that you don't understand and
  can't debug!

* Configuration options are defined in config.d/*.defconf. You shouldn't
  need to look in these files... but sometimes they contain comments
  about options that are too long to fit into help strings.

Keep it simple
==============

* I recommend that you aim for the smallest possible configuration file.
  Perhaps start with:

    FIRSTIP=<n>

  and move on from there.

* Use the --with-release option on the command-line or the with_release
  function in a configuration file to get default values for building
  virtual clusters for releases of particular "products". Currently there
  are only release definitions for SoFS.

  For example, you can set up default values for SoFS-1.5.3 by running:

    autocluster --with-release=SoFS-1.5.3 ...

  Equivalently you can use the following syntax in a configuration file:

    with_release "SoFS-1.5.3"

  So the smallest possible config file would have something like this as
  the first line and would then set FIRSTIP:

    with_release "SoFS-1.5.3"
    FIRSTIP=<n>

  Add other options as you need them.

  The release definitions are stored in releases/*.release. The available
  releases are listed in the output of "autocluster --help".

  NOTE: Occasionally you will need to consider the position of
  with_release in your configuration. If you want to override options
  handled by a release definition then you will obviously need to set
  them later in your configuration. This will be the case for most
  options you will want to set. However, some options will need to appear
  before with_release so that they can be used within a release
  definition - the most obvious one is the (rarely used) RHEL_ARCH
  option, which is used in the default ISO setting for each release. If
  things don't work as expected use --dump to confirm that configuration
  variables have the values that you expect.

* The NODES configuration variable controls the types of nodes that are
  created. At the time of writing, the default value is:

    NODES="rhel_base:0-3"

  This means that you get 4 nodes, at IP offsets 0, 1, 2 and 3 from
  FIRSTIP, all part of the CTDB cluster. That is, with standard settings
  and FIRSTIP=35, 4 nodes will be created in the IP range 10.0.0.35 to
  10.0.0.38 (see the sketch of the offset arithmetic below).

  The SoFS releases use a default of:

    NODES="tsm_server:0 sofs_gui:1 sofs_front:2-4"

  which should produce a set of nodes the same as the old SoFS default.
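  If you want to double-check the offset arithmetic, it is simple enough
  to reproduce in a few lines of shell. This is an illustrative sketch
  only, not autocluster code, and it assumes the 10.0.0.x network used in
  the examples:

    FIRSTIP=35
    for offset in 0 1 2 3; do
        echo "node at IP offset $offset -> 10.0.0.$((FIRSTIP + offset))"
    done

  This prints the addresses 10.0.0.35 through 10.0.0.38, matching the
  rhel_base:0-3 example above.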
  You can add extra rhel_base nodes if you need them for test clients or
  some other purpose:

    NODES="$NODES rhel_base:7,8"

  This produces an additional 2 base RHEL nodes at IP offsets 7 and 8
  from FIRSTIP. Since sofs_* nodes are present, these base nodes will not
  be part of the CTDB cluster - they're just extra.

  For many standard use cases the nodes specified by NODES can be
  modified by setting NUMNODES, WITH_SOFS_GUI and WITH_TSM_NODE. However,
  these options can't be used to create nodes without specifying IP
  offsets - except WITH_TSM_NODE, which checks to see if IP offset 0 is
  vacant. Therefore, for many uses you can ignore the NODES variable.

  However, NODES is the recommended mechanism for specifying the nodes
  that you want in your cluster. It is powerful, easy to read and
  centralises the information in a single line of your configuration
  file.

iSCSI shared disks
==================

The RHEL5 version of KVM does not support SCSI block device emulation, so
shared disks must use either virtio or iSCSI. Unfortunately, at the time
of writing, virtio block devices are not supported by the version of
multipath in RHEL5. So this leaves iSCSI as the only choice.

The main configuration options you need for iSCSI disks are:

  SHARED_DISK_TYPE=iscsi
  NICMODEL=virtio        # Recommended for performance
  add_extra_package iscsi-initiator-utils

Note that SHARED_DISK_PREFIX and SHARED_DISK_CACHE are ignored for iSCSI
shared disks because KVM doesn't (need to) know about them.

You will need to install the scsi-target-utils package on the host
system. After creating a cluster, autocluster will print a message that
points you to a file tmp/iscsi.$CLUSTER - you need to run the commands in
this file (probably via: sh tmp/iscsi.$CLUSTER) before booting your
cluster. This will remove any old target with the same ID, and create the
new target, LUNs and ACLs.

You can use the following command to list information about the target:

  tgtadm --lld iscsi --mode target --op show

If you need multiple clusters using iSCSI on the same host then each
cluster will need to have a different setting for ISCSI_TID.

Raw IDE system disks
====================

The RHEL5 version of KVM does not support SCSI block device emulation, so
system disks must be either virtio or IDE. However, writeback caching,
qcow2 and virtio are incompatible and result in I/O corruption. So you
can either use virtio system disks without any caching, accepting reduced
performance, or use IDE system disks with writeback caching, which
performs nicely.

For IDE disks, here are the required settings:

  SYSTEM_DISK_TYPE=ide
  SYSTEM_DISK_PREFIX=hd
  SYSTEM_DISK_CACHE=writeback

The next problem is that RHEL5's KVM does not include qemu-nbd. The best
solution is to build your own qemu-nbd and stop reading this section.

If, for whatever reason, you're unable to build your own qemu-nbd, then
you can use raw, rather than qcow2, system disks. If you do this then you
need significantly more disk space (since the system disks will be
*copies* of the base image) and cluster creation time will no longer be
pleasantly snappy (due to the copying time - the images are large and a
single copy can take several minutes). So, having tried to warn you off
this option, if you really want to do this then you'll need these
settings:

  SYSTEM_DISK_FORMAT=raw
  BASE_FORMAT=raw

Note that if you're testing cluster creation with iSCSI shared disks then
you should find a way of switching off raw disks. This avoids every iSCSI
glitch costing you a lot of time while raw disks are copied.
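Putting the RHEL5 workarounds together, a configuration fragment for a
RHEL5 host might look something like this. It is only a sketch assembled
from the settings discussed above - adjust it for your own setup:

  # RHEL5's KVM lives in a non-default location:
  KVM=/usr/libexec/qemu-kvm

  # Shared disks over iSCSI, since there is no SCSI emulation:
  SHARED_DISK_TYPE=iscsi
  NICMODEL=virtio
  add_extra_package iscsi-initiator-utils

  # IDE system disks with writeback caching (virtio + qcow2 + writeback
  # is unsafe on this KVM):
  SYSTEM_DISK_TYPE=ide
  SYSTEM_DISK_PREFIX=hd
  SYSTEM_DISK_CACHE=writeback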
DEVELOPMENT HINTS
=================

The -e option provides support for executing arbitrary bash code. This is
useful for testing and debugging.

One good use of this option is to test template substitution using the
function substitute_vars(). For example:

  ./autocluster --with-release=SoFS-1.5.3 \
    -e 'CLUSTER=foo; DISK=foo.qcow2; UUID=abcdef; NAME=foon1; set_macaddrs; substitute_vars templates/node.xml'

This prints templates/node.xml with all appropriate substitutions done.
Some internal variables (e.g. CLUSTER, DISK, UUID, NAME) are given fairly
arbitrary values but the various MAC address strings are set using the
function set_macaddrs().

The -e option is also useful when writing scripts that use autocluster.
Given the complexities of the configuration system you probably don't
want to parse configuration files yourself to determine the current
settings. Instead, you can ask autocluster to tell you useful pieces of
information. For example, say you want to script creating a base disk
image and you want to ensure the image is marked immutable:

  base_image=$(autocluster -c $CONFIG -e 'echo $VIRTBASE/$BASENAME.img')
  chattr -V -i "$base_image"
  if autocluster -c $CONFIG create base ; then
      chattr -V +i "$base_image"
      ...

Note that the command that autocluster should run is enclosed in single
quotes. This means that $VIRTBASE and $BASENAME will be expanded within
autocluster after the configuration file has been loaded.
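The same pattern works for other configuration variables. For instance, a
wrapper script might want to know where the install ISOs live or what the
first IP address will be. This is a sketch only; it uses the ISO_DIR and
FIRSTIP variables described above and assumes the 10.0.0.x network used
in the examples:

  iso_dir=$(autocluster -c $CONFIG -e 'echo $ISO_DIR')
  firstip=$(autocluster -c $CONFIG -e 'echo $FIRSTIP')
  echo "ISOs are read from $iso_dir; the first node will be 10.0.0.$firstip"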