Read-only file system

If you get "Read-only file system" as an error when sending to an rsync daemon, then you probably forgot to set "read only = no" for that module.
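
A minimal rsyncd.conf module that accepts uploads might look like the fragment below. The module name "upload" and the path are placeholders. The daemon must also be able to write to that directory as the user it runs as for the module (see the "uid" setting in the rsyncd.conf manpage).

```
[upload]
    path = /srv/upload
    read only = no
```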


copies every file

Some people occasionally report that rsync copies every file when they expect it to copy only a small subset. In most cases the explanation is that you forgot to include the --times (-t) option in the original copy. Without it the destination files get new modification times, so rsync's quick check (which compares each file's size and modification time) fails for every file, and rsync is forced to check every file to see if it has changed.
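
The effect of a missing -t can be reproduced with plain cp. The sketch below mimics the size-and-modification-time comparison that rsync's quick check performs; it is an illustration, not rsync's own code, and quick_check_differs is a hypothetical helper that assumes GNU stat (the -c option).

```sh
#!/bin/sh
# Mimic the idea behind rsync's quick check: a file counts as unchanged
# when its size and modification time (in seconds) match.
# quick_check_differs is hypothetical and relies on GNU stat -c.
quick_check_differs() {
    [ "$(stat -c '%s %Y' "$1")" != "$(stat -c '%s %Y' "$2")" ]
}

d=$(mktemp -d)
printf 'hello\n' > "$d/src"
touch -t 202001010000 "$d/src"    # give the source an old mtime

cp "$d/src" "$d/plain"            # like a copy made without --times
cp -p "$d/src" "$d/kept"          # like a copy made with --times

quick_check_differs "$d/src" "$d/plain" && echo "plain: quick check fails, file would be examined"
quick_check_differs "$d/src" "$d/kept" || echo "kept: quick check passes, file would be skipped"
```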

If you think that rsync is erroneously copying every file, then look at the stats produced with -v (or the fuller report produced with --stats) and see if rsync is really sending all the data.


is your shell clean

The "is your shell clean" message and the "protocol mismatch" message are usually caused by having some sort of program in your .cshrc, .profile, .bashrc or equivalent file that writes a message every time you connect using a remote-shell program (such as ssh or rsh). Data written in this way corrupts the rsync data stream. rsync detects this at startup and produces those error messages. However, if you are using rsync-daemon syntax (host::path or rsync://) without using a remote-shell program (no --rsh or -e option), there is no remote-shell program involved, and the problem is probably caused by an error on the daemon side (so check the daemon logs).

A good way to test if your remote-shell connection is clean is to try something like this (use ssh or rsh, as appropriate):

ssh remotesystem /bin/true > test.dat

That should create a file called test.dat with nothing in it. If test.dat is not of zero length then your shell is not clean. Look at the contents of test.dat to see what was sent. Look at all the startup files on remotesystem to try and find the problem.
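
The same test can be wrapped in a small function that reports what, if anything, the remote shell sent. This is a sketch: check_clean is a hypothetical helper that takes the remote-shell command as its arguments, so it can also be exercised locally with a stand-in for ssh.

```sh
#!/bin/sh
# Check whether a remote-shell connection is "clean": running a command
# that prints nothing must produce zero bytes of output.
# check_clean is hypothetical; pass the remote-shell command, e.g.
#   check_clean ssh remotesystem
check_clean() {
    tmp=$(mktemp) || return 2
    "$@" /bin/true > "$tmp"
    if [ -s "$tmp" ]; then            # non-empty file: something was sent
        echo "shell is NOT clean; it sent:" >&2
        cat "$tmp" >&2
        rm -f "$tmp"
        return 1
    fi
    rm -f "$tmp"
    echo "shell is clean"
}
```

Writing the output to a file and testing its size (rather than capturing it with $(...)) matters here, because command substitution strips trailing newlines and would hide a startup file that prints only a blank line.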


memory usage

Yes, rsync uses a lot of memory. The majority of the memory is used to hold the list of files being transferred. This takes about 100 bytes per file, so if you are transferring 800,000 files then rsync will consume about 80M of memory. It will be higher if you use -H or --delete.
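
The rule of thumb is easy to apply before a large transfer. The sketch below counts the entries under a directory and multiplies by 100 bytes; estimate_flist_bytes is a hypothetical helper, and its result is only as good as the rule of thumb it encodes.

```sh
#!/bin/sh
# Rough estimate of rsync's file-list memory at ~100 bytes per file.
# estimate_flist_bytes is hypothetical: it counts every entry find
# reports (directories included), miscounts names containing newlines,
# and ignores the extra memory -H or --delete need (a lower bound).
estimate_flist_bytes() {
    n=$(find "$1" | wc -l | tr -d ' ')
    echo $(( n * 100 ))
}

# The worked example from the text: 800,000 files at 100 bytes each.
echo $(( 800000 * 100 ))    # prints 80000000 (about 80 MB)
```

For example, "estimate_flist_bytes /src" prints an estimate in bytes for the tree under /src (a placeholder path).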

To fix this requires a major rewrite of rsync, which may or may not happen.


out of memory

The usual reason for "out of memory" when running rsync is that you are transferring a _very_ large number of files. The size of the files doesn't matter, only the total number of files.

As a rule of thumb you should expect rsync to consume about 100 bytes per file in the file list. This happens because rsync builds an internal file list structure containing all the vital details of each file. rsync needs to hold this structure in memory because it is constantly traversed during the transfer.

A future version of rsync could be built with an improved protocol that transfers files in a more incremental fashion, which would require a lot less memory. Unfortunately, such an rsync does not yet exist.


rsync through a firewall

If you have a setup where there is no way to directly connect two systems for an rsync transfer, there are several ways to use the firewall system to act as an intermediary in the transfer.

Method 1

Use your remote shell (e.g. ssh) to access the middle system and have it use a remote shell to hop over to the actual target system.

To effect this extra hop, you'll need to make sure that the remote-shell connection from the middle system to the target system does not involve any tty-based user interaction (such as prompting for a password) because there is no way for the middle system to access the local user's tty.

One way that works for both rsh and ssh is to enable host-based authentication, which would allow all connections from the middle system to the target system to succeed (when the username remains the same). However, this may not be a desirable setup.

Another method that works with ssh (and is also very safe) is to set up an ssh key (see the ssh-keygen manpage) and ensure that ssh-agent forwarding is turned on (e.g. "ForwardAgent yes"). You would put the public version of your key onto the middle and target systems, and the private key on your local system (which I recommend you encrypt with a pass-phrase). With this setup, a series of ssh connections that starts from the system where your private key is available will auto-authorize (after the pass-phrase prompt on the first system).
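
As a sketch of the agent-forwarding part, assuming OpenSSH and the placeholder hostname "middle", the entry in your local ~/.ssh/config could look like this. Load your key into the agent (for example with ssh-add) before connecting; for the "ssh middle ssh target" chain used here, only the first hop needs forwarding.

```
# ~/.ssh/config on the local system
Host middle
  ForwardAgent yes
```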

You should then test that a series of ssh connections works without multiple prompts by running a command like this (put in the real "middle" and "target" hostnames, of course):

ssh middle ssh target uptime

If you get a password/passphrase prompt to get into the middle system that's fine, but the extra hop needs to occur without any extra user interaction.

Once that's done, you can do an rsync copy like this:

rsync -av -e "ssh middle ssh" target:/source/ /dest/

Method 2

Assuming you're using ssh as your remote shell, you can configure ssh to use a proxy command to get to the remote host you're interested in reaching. Doing this will allow the multi-hop connection to work with rsync even if both hosts prompt for a password. This works because both ssh connections originate from the local host, so both instances of ssh have access to the local console to use for an out-of-band password prompt.

Here is an example config for your ~/.ssh/config file (substitute "target", "target_user", and "middle" as appropriate):

Host target
  ProxyCommand nohup ssh middle nc -w1 %h %p
  User target_user

This proxy setup uses ssh to log in to the firewall system ("middle") and uses nc (netcat) to connect to the target host (%h) using the target port number (%p). The use of "nohup" silences a warning at the end of the run, and the "-w1" option tells nc to shut down when the connection closes.
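
If the middle system has no nc, newer OpenSSH clients can do this forwarding themselves: OpenSSH 5.4 and later support "ssh -W host:port", and OpenSSH 7.3 and later also accept "ProxyJump middle" as a shorthand. A sketch of the -W variant of the entry above, using the same placeholder names:

```
Host target
  ProxyCommand ssh middle -W %h:%p
  User target_user
```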

With this done, you could run a normal-looking rsync command to "target" that would run the proxy command to get through the firewall system:

rsync -av /src/ target:/dest/

Method 3

Assuming you're using ssh as your remote shell, you can configure ssh to forward a local port through your middle system to the ssh port (22) on the target system. This method does not require the use of "nc" (it uses only ssh to effect the extra hop), but otherwise it is similar to, but slightly less convenient than, method 2.

The first thing we need is an ssh configuration that will allow us to connect to the forwarded port as if we were connecting to the target system, and we need ssh to know what we're doing so that it doesn't complain about the host keys being wrong. We can do this by adding this section to your ~/.ssh/config file (substitute "target" and "target_user" as appropriate):

Host target
  HostName localhost
  Port 2222
  HostKeyAlias target
  User target_user

Next, we need to enable the port forwarding:

ssh -fN -l middle_user -L 2222:target:22 middle

What this does is cause a connection to port 2222 on the local system to get tunneled to the middle system and then turn into a connection to the target system's port 22. The -N option tells ssh not to start a shell on the remote system, which works with modern ssh versions (you can run a sleep command if -N doesn't work). The -f option tells ssh to put the command in the background after any password/passphrase prompts.
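
If you start this tunnel often, the same forwarding can be kept in ~/.ssh/config. This sketch assumes a separate, hypothetical host alias "middle-tunnel", so that ordinary logins to the middle system do not also try to claim port 2222; with it in place, "ssh -fN middle-tunnel" starts the tunnel. As with -L, the name "target" is resolved on the middle system, and ExitOnForwardFailure makes ssh give up instead of backgrounding itself if port 2222 cannot be forwarded.

```
Host middle-tunnel
  HostName middle
  User middle_user
  LocalForward 2222 target:22
  ExitOnForwardFailure yes
```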

With this done, you could run a normal-looking rsync command to "target" that would use a connection to port 2222 on localhost automatically:

rsync -av target:/src/ /dest/

Note: starting an ssh tunnel allows anyone on the source system to connect to the localhost port 2222, not just you, but they'd still need to be able to log in to the target system using their own credentials.

Method 4

Install and configure an rsync daemon on the target and use an ssh tunnel to reach the rsync server. This is similar to method 3, but it tunnels the daemon port for those that prefer to use an rsync daemon.

Installing the rsync daemon is beyond the scope of this document, but see the rsyncd.conf manpage for more information. Keep in mind that you don't need to be root to run an rsync daemon as long as you don't use a privileged port (one below 1024).

Once your rsync daemon is up and running, you build an ssh tunnel through your middle system like this:

ssh -fN -l middle_user -L 8873:target:873 middle

What this does is cause a connection to port 8873 on the local system to turn into a connection from the middle system to the target system on port 873. (Port 873 is the normal port for an rsync daemon.) The -N option tells ssh not to start a shell on the remote system, which works with modern ssh versions (you can run a sleep command if -N doesn't work). The -f option tells ssh to put the command in the background after any password/passphrase prompts.

Now when an rsync command is executed with a daemon-mode command-line syntax to the local system, the conversation is directed to the target system. For example:

rsync -av --port 8873 localhost::module/source dest/
rsync -av rsync://localhost:8873/module/source dest/

Note: starting an ssh tunnel allows anyone on the source system to connect to the localhost port 8873, not just you, so you may want to enable username/password restrictions on your rsync daemon.
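
One way to add such restrictions is with the "auth users" and "secrets file" settings in rsyncd.conf. In this sketch the module name, path, user name, and secrets-file location are placeholders. The secrets file holds lines of the form "name:password" and, with the default "strict modes" setting, must not be readable by other users.

```
[module]
    path = /srv/module
    auth users = target_user
    secrets file = /etc/rsyncd.secrets
```

The client then names the user in the URL, for example "rsync -av rsync://target_user@localhost:8873/module/source dest/", and supplies the password at the prompt, through the --password-file option, or in the RSYNC_PASSWORD environment variable.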


rsync and cron

On some systems (notably SunOS4) cron supplies what looks like a socket to rsync, so rsync thinks that stdin is a socket. This means that if you start rsync with the --daemon switch from a cron job, you end up with rsync thinking it has been started from inetd. The fix is simple: just redirect stdin from /dev/null in your cron job.
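
For example, a crontab entry that starts the daemon might look like the line below. The schedule and the /usr/local/bin path are placeholders; the essential part is the "< /dev/null" redirection.

```
0 * * * * /usr/local/bin/rsync --daemon < /dev/null
```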


rsync: Command not found

This error is produced when the remote shell is unable to locate the rsync binary in your path. There are 3 possible solutions:

  1. install rsync in a "standard" location that is in your remote path.
  2. modify your .cshrc, .bashrc, etc. on the remote system to include the path that rsync is in.
  3. use the --rsync-path option to explicitly specify the path on the remote system where rsync is installed.

You can use this command to determine what your remote path is:

ssh host 'echo $PATH'


spaces in filenames

Can rsync copy files with spaces in them?

Short answer: Yes, rsync can handle filenames with spaces.

Long answer:

Rsync handles spaces just like any other unix command line application. Within the code, spaces are treated just like any other character, so a filename with a space is no different from a filename with any other character in it.

The problem of spaces is in the argv processing done to interpret the command line. As with any other unix application you have to escape spaces in some way on the command line or they will be used to separate arguments.

It is slightly trickier in rsync (and other remote-copy programs like scp) because rsync sends a command line to the remote system to launch the peer copy of rsync (this assumes that we're not talking about daemon mode, which is not affected by this problem because no remote shell is involved in the reception of the filenames). The command line is interpreted by the remote shell and thus the spaces need to arrive on the remote system escaped so that the shell doesn't split such filenames into multiple arguments.

For example:

rsync -av host:'a long filename' /tmp/

This is usually a request for rsync to copy 3 files from the remote system, "a", "long", and "filename" (the only exception to this is for a system running a shell that does not word-split arguments in its commands, and that is exceedingly rare). If you want to request a single file with spaces, you need to get some kind of space-quoting characters to the remote shell that is running the remote rsync command. The following commands should all work:

rsync -av host:'"a long filename"' /tmp/
rsync -av host:'a\ long\ filename' /tmp/
rsync -av host:a\\\ long\\\ filename /tmp/
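
You can see the difference without a remote system by letting a local "sh -c" play the part of the remote shell, which roughly mirrors what happens when ssh hands the joined command line to the shell on the other side. This is a sketch: count_args is a hypothetical helper, and a real remote shell (csh, for example) may differ in details. The third command above ends up identical to the second, because the local shell turns a\\\ long\\\ filename into a\ long\ filename before rsync sees it.

```sh
#!/bin/sh
# Simulate the remote side: the command line arrives as one string and
# the remote shell splits it into arguments again. "sh -c" stands in for
# the remote shell; count_args (hypothetical) prints how many arguments
# the remote command would receive.
count_args() {
    sh -c "set -- $1; echo \$#"
}

count_args 'a long filename'      # split into "a", "long", "filename"
count_args '"a long filename"'    # double quotes reach the shell: one argument
count_args 'a\ long\ filename'    # backslashes reach the shell: one argument
```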

You might also like to use a '?' in place of a space, as long as no other filename matches the resulting pattern (since '?' matches any single character):

rsync -av host:a?long?filename /tmp/

As long as you know that the remote filenames on the command line are interpreted by the remote shell then it all works fine.


ignore "vanished files" warning

Some folks would like to ignore the "vanished files" warning, which rsync reports with exit code 24. The easiest way to do this is to create a shell script wrapper. For instance, name this something like "rsync-no24":

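#!/bin/sh
# Run rsync with all the given arguments, then map exit code 24
# ("partial transfer due to vanished source files") to success.
rsync "$@"
e=$?
if [ "$e" -eq 24 ]; then
    exit 0
fi
exit "$e"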
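
If a larger script calls rsync several times, the same logic can live in a shell function instead of a separate wrapper file. This is a minimal sketch: ignore_vanished is a hypothetical name, it wraps any command, and it captures the exit status in a way that still works when the calling script uses "set -e".

```sh
#!/bin/sh
# Run a command and treat exit code 24 (rsync's "partial transfer due
# to vanished source files") as success; any other status is passed
# through unchanged. ignore_vanished is a hypothetical name.
ignore_vanished() {
    e=0
    "$@" || e=$?      # capture the status without tripping "set -e"
    if [ "$e" -eq 24 ]; then
        e=0
    fi
    return "$e"
}

# Usage (paths are placeholders):
#   ignore_vanished rsync -av /src/ /dest/
```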