   1 <table><tr><td><ul>
   2 <li><a href="#1">Read-only file system</a><br>
   3 <li><a href="#2">copies every file</a><br>
   4 <li><a href="#3">is your shell clean</a><br>
   5 <li><a href="#4">memory usage</a><br>
   6 <li><a href="#5">out of memory</a><br>
   7 </ul></td><td>&nbsp;&nbsp;&nbsp;&nbsp;</td><td><ul>
   8 <li><a href="#6">rsync through a firewall</a><br>
   9 <li><a href="#7">rsync and cron</a><br>
  10 <li><a href="#8">rsync: Command not found</a><br>
  11 <li><a href="#9">spaces in filenames</a><br>
  12 <li><a href="#10">ignore "vanished files" warning</a><br>
  13 </ul></td></tr></table>
  14
  15 <h3><a name=1>Read-only file system</a></h3>
  16
<p>If you get "Read-only file system" as an error when sending to an rsync
daemon, then you probably forgot to set "read only = no" for that module in
the daemon's rsyncd.conf file (modules are read-only by default).
  19
  20 <hr>
  21 <h3><a name=2>copies every file</a></h3>
  22
<p>Some people occasionally report that rsync copies every file when they
expect it to copy only a small subset. In most cases the explanation is
that you forgot to include the --times (-t) option in the original copy.
Without it, the destination files get the time of the transfer rather than
the source's modified time, so on the next run the modified times do not
match and rsync is forced to check every file to see if it has changed.
  28
  29 <p>If you think that rsync is erroneously copying every file then look at
  30 the stats produced with -v and see if rsync is really sending all the data.
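
<p>As a rough illustration of why the missing times matter, the sketch
below imitates a size-and-modified-time comparison for a single pair of
files. The function name quick_check_same is made up for this example, and
the code is not taken from rsync itself; it also assumes a shell whose test
command supports -nt (bash, dash, and busybox do).

```shell
#!/bin/sh
# quick_check_same FILE1 FILE2
# Succeeds only if the two files have the same size and the same modified
# time. This only imitates the comparison described above; it is not
# rsync's own code.
quick_check_same() {
    size1=$(wc -c < "$1") || return 2
    size2=$(wc -c < "$2") || return 2
    [ "$((size1))" -eq "$((size2))" ] || return 1
    # -nt means "newer than"; neither file may be newer than the other
    [ ! "$1" -nt "$2" ] && [ ! "$2" -nt "$1" ]
}

# Example use (the paths are placeholders):
#   quick_check_same /src/file /dest/file || echo "times or sizes differ"
```

<p>If the check fails only because of the times, one more run with -t (or
-a, which includes -t) should bring the destination times in line with the
source.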
  31
  32 <hr>
  33 <h3><a name=3>is your shell clean</a></h3>
  34
  35 <p>The "is your shell clean" message and the "protocol mismatch" message
  36 are usually caused by having some sort of program in your .cshrc, .profile,
  37 .bashrc or equivalent file that writes a message every time you connect
  38 using a remote-shell program (such as ssh or rsh).  Data written in this
  39 way corrupts the rsync data stream. rsync detects this at startup and
produces those error messages.  However, if you are using rsync-daemon
syntax (host::path or rsync://) without using a remote-shell program (no
--rsh or -e option), there is no remote-shell program involved, and the
problem is probably caused by an error on the daemon side (so check the
daemon logs).
  45
  46 <p>A good way to test if your remote-shell connection is clean is to try
  47 something like this (use ssh or rsh, as appropriate):
  48
  49 <blockquote><pre>ssh remotesystem /bin/true &gt; test.dat</pre></blockquote>
  50
  51 <p>That should create a file called test.dat with nothing in it. If
  52 test.dat is not of zero length then your shell is not clean.  Look at the
  53 contents of test.dat to see what was sent. Look at all the startup files on
  54 remotesystem to try and find the problem.
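
<p>The same test can be wrapped in a small helper so that it is easy to
repeat while you edit the startup files. This is only a sketch: check_clean
is a made-up name, and it simply reports whether the given command wrote
anything to its standard output.

```shell
#!/bin/sh
# check_clean COMMAND [ARG...]
# Runs COMMAND, captures its standard output in a temporary file, and
# reports whether anything was written. check_clean is a hypothetical
# helper name, not part of rsync.
check_clean() {
    out=$(mktemp) || return 2
    "$@" > "$out"
    if [ -s "$out" ]; then
        echo "not clean: unexpected output follows"
        cat "$out"
        status=1
    else
        echo "clean"
        status=0
    fi
    rm -f "$out"
    return "$status"
}

# Example use (remotesystem is a placeholder hostname):
#   check_clean ssh remotesystem /bin/true
```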
  55
  56 <hr>
  57 <h3><a name=4>memory usage</a></h3>
  58
  59 <p>Yes, rsync uses a lot of memory. The majority of the memory is used to
  60 hold the list of files being transferred. This takes about 100 bytes per
  61 file, so if you are transferring 800,000 files then rsync will consume
  62 about 80M of memory. It will be higher if you use -H or --delete.
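
<p>The arithmetic is easy to redo for your own file count. The helper below
is a sketch (estimate_file_list_mb is a made-up name) that applies the
100-bytes-per-file rule of thumb; treat the result as a lower bound, since
-H and --delete add to it.

```shell
#!/bin/sh
# estimate_file_list_mb FILES [BYTES_PER_FILE]
# Prints the approximate size of the file list in megabytes (10^6 bytes),
# using the rule of thumb of about 100 bytes per file quoted above.
estimate_file_list_mb() {
    files=$1
    per_file=${2:-100}
    echo $(( files * per_file / 1000000 ))
}

estimate_file_list_mb 800000    # prints 80
```

<p>To get a file count for a real tree, something like "find /src | wc -l"
(with your own path) gives a usable number.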
  63
<p>Fixing this requires a major rewrite of rsync, which may or may not
happen.
  66
  67 <hr>
  68 <h3><a name=5>out of memory</a></h3>
  69
<p>The usual reason for "out of memory" when running rsync is that you are
transferring a <em>very</em> large number of files.  The size of the files
doesn't matter, only the total number of files.
  73
<p>As a rule of thumb you should expect rsync to consume about 100 bytes
per file in the file list. This happens because rsync builds an internal
file list structure containing all the vital details of each file.  rsync
needs to hold this structure in memory because it is being constantly
traversed.
  78
  79 <p>A future version of rsync could be built with an improved protocol that
  80 transfers files in a more incremental fashion, which would require a lot
  81 less memory.  Unfortunately, such an rsync does not yet exist.
  82
  83 <hr>
  84 <h3><a name=6>rsync through a firewall</a></h3>
  85
  86 <p>If you have a setup where there is no way to directly connect two
  87 systems for an rsync transfer, there are several ways to use the firewall
  88 system to act as an intermediary in the transfer.
  89
  90 <h4>Method 1</h4>
  91
  92 <p>Use your remote shell (e.g. ssh) to access the middle system and have it
  93 use a remote shell to hop over to the actual target system.
  94
  95 <p>To effect this extra hop, you'll need to make sure that the remote-shell
  96 connection from the middle system to the target system does not involve any
  97 tty-based user interaction (such as prompting for a password) because there
  98 is no way for the middle system to access the local user's tty.
  99
 100 <p>One way that works for both rsh and ssh is to enable host-based
 101 authentication, which would allow all connections from the middle system to
 102 the target system to succeed (when the username remains the same).
 103 However, this may not be a desirable setup.
 104
<p>Another method that works with ssh (and is also very safe) is to set up
an ssh key (see the ssh-keygen manpage) and ensure that ssh-agent forwarding
is turned on (e.g. "ForwardAgent&nbsp;yes").  You would put the public
version of your key onto the middle and target systems, and the private key
on your local system (which I recommend you encrypt).  With this setup, a
series of ssh connections that starts from the system where your private
key is available will auto-authorize (after the pass-phrase prompt on the
first system).
 113
 114 <p>You should then test that a series of ssh connections works without
 115 multiple prompts by running a command like this (put in the real "middle"
 116 and "target" hostnames, of course):
 117
 118 <blockquote><pre>ssh middle ssh target uptime</pre></blockquote>
 119
 120 <p>If you get a password/passphrase prompt to get into the middle system
 121 that's fine, but the extra hop needs to occur without any extra user
 122 interaction.
 123
 124 <p>Once that's done, you can do an rsync copy like this:
 125
 126 <blockquote><pre>rsync -av -e "ssh middle ssh" target:/source/ /dest/</pre></blockquote>
 127
 128 <h4>Method 2</h4>
 129
 130 <p>Assuming you're using ssh as your remote shell, you can configure ssh to
 131 use a proxy command to get to the remote host you're interested in reaching.
 132
 133 <p>Here is an example config for your ~/.ssh/config file (substitute "target",
 134 "target_user", and "middle" as appropriate):
 135
<blockquote><pre>Host target
  ProxyCommand nohup ssh middle nc -w1 %h %p
  User target_user
</pre></blockquote>
 140
<p>This proxy setup uses ssh to log in to the firewall system ("middle") and
uses nc (netcat) to connect to the target host (%h) using the target port
number (%p).  The use of "nohup" silences a warning at the end of the run,
and the "-w1" option sets a one-second timeout so that nc shuts down when
the connection closes.
 145
 146 <p>With this done, you could run a normal-looking rsync command to "target"
 147 that would run the proxy command to get through the firewall system:
 148
 149 <blockquote><pre>rsync -av /src/ target:/dest/</pre></blockquote>
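
<p>If nc is not available on the middle system, newer OpenSSH releases can
make the same hop on their own. The sketch below assumes OpenSSH 7.3 or
later on the local system, which provides the -J (jump host) option. The
names rsync_via_jump and RSYNC are made up for this example; RSYNC only
exists so the rsync binary can be swapped out.

```shell
#!/bin/sh
# rsync_via_jump MIDDLE RSYNC_ARGS...
# Runs rsync with ssh told to hop through MIDDLE using -J (ProxyJump).
# rsync_via_jump and the RSYNC override variable are hypothetical names
# used only for this illustration.
rsync_via_jump() {
    middle=$1
    shift
    "${RSYNC:-rsync}" -e "ssh -J $middle" "$@"
}

# Example use (hostnames and paths are placeholders):
#   rsync_via_jump middle -av /src/ target_user@target:/dest/
```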
 150
 151 <h4>Method 3</h4>
 152
 153 <p>Assuming you're using ssh as your remote shell, you can configure ssh to
 154 forward a local port through your middle system to the ssh port (22) on the
 155 target system.  This method does not require the use of "nc" (it uses only
 156 ssh to effect the extra hop), but otherwise it is similar to, but slightly
 157 less convenient than, method 2.
 158
 159 <p>The first thing we need is an ssh configuration that will allow us to
 160 connect to the forwarded port as if we were connecting to the target
 161 system, and we need ssh to know what we're doing so that it doesn't
 162 complain about the host keys being wrong.  We can do this by adding this
 163 section to your ~/.ssh/config file (substitute "target" and "target_user"
 164 as appropriate):
 165
<blockquote><pre>Host target
  HostName localhost
  Port 2222
  HostKeyAlias target
  User target_user
</pre></blockquote>
 172
 173 <p>Next, we need to enable the port forwarding:
 174
 175 <blockquote><pre>ssh -fN -l middle_user -L 2222:target:22 middle</pre></blockquote>
 176
 177 <p>What this does is cause a connection to port 2222 on the local system to
 178 get tunneled to the middle system and then turn into a connection to the
 179 target system's port 22.  The -N option tells ssh not to run a command on
 180 the remote system, which works with modern ssh versions (you can run a
 181 sleep command if -N doesn't work).  The -f option tells ssh to put the
 182 command in the background after any password/passphrase prompts.
 183
 184 <p>With this done, you could run a normal-looking rsync command to "target"
 185 that would use a connection to port 2222 on localhost automatically:
 186
 187 <blockquote><pre>rsync -av target:/src/ /dest/</pre></blockquote>
 188
<p><b>Note:</b> starting an ssh tunnel allows anyone on the local system
to connect to the localhost port 2222, not just you, but they'd still need
to be able to log in to the target system using their own credentials.
 192
 193 <h4>Method 4</h4>
 194
<p>Install and configure an rsync daemon on the target and use an ssh
tunnel to reach the rsync server.  This is similar to method 3, but it
tunnels the daemon port for those that prefer to use an rsync daemon.
 198
 199 <p>Installing the rsync daemon is beyond the scope of this document, but
 200 see the rsyncd.conf manpage for more information.  Keep in mind that you
 201 don't need to be root to run an rsync daemon as long as you don't use a
 202 protected port.
 203
 204 <p>Once your rsync daemon is up and running, you build an ssh tunnel
 205 through your middle system like this:
 206
 207 <blockquote><pre>ssh -fN -l middle_user -L 8873:target:873 middle</pre></blockquote>
 208
 209 <p>What this does is cause a connection to port 8873 on the local system to
 210 turn into a connection from the middle system to the target system on port
 211 873.  (Port 873 is the normal port for an rsync daemon.) The -N option
 212 tells ssh not to run a command on the remote system, which works with
 213 modern ssh versions (you can run a sleep command if -N doesn't work).  The
 214 -f option tells ssh to put the command in the background after any
 215 password/passphrase prompts.
 216
 217 <p>Now when an rsync command is executed with a daemon-mode command-line
 218 syntax to the local system, the conversation is directed to the target
 219 system.  For example:
 220
<blockquote><pre>rsync -av --port 8873 localhost::module/source dest/
rsync -av rsync://localhost:8873/module/source dest/</pre></blockquote>
 223
<p><b>Note:</b> starting an ssh tunnel allows anyone on the local system
to connect to the localhost port 8873, not just you, so you may want to
enable username/password restrictions on your rsync daemon.
 227
 228 <hr>
 229 <h3><a name=7>rsync and cron</a></h3>
 230
<p>On some systems (notably SunOS4) cron supplies what looks like a socket
to rsync, so rsync thinks that stdin is a socket. This means that if you
start rsync with the --daemon switch from a cron job you end up with rsync
thinking it has been started from inetd. The fix is simple: just
redirect stdin from /dev/null in your cron job.
 236
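<p>One way to apply the fix is to have cron call a small wrapper instead of
rsync itself. This is a sketch only: start_daemon and the RSYNC override
variable are made-up names, and any extra daemon options you pass (such as
--config) are up to you.

```shell
#!/bin/sh
# start_daemon [RSYNC_DAEMON_OPTIONS...]
# Launches rsync in daemon mode with stdin taken from /dev/null, so that
# whatever cron supplies as stdin cannot be mistaken for an inetd socket.
# start_daemon and the RSYNC override variable are hypothetical names.
start_daemon() {
    "${RSYNC:-rsync}" --daemon "$@" < /dev/null
}

# Example use from a script that cron runs:
#   start_daemon --config=/etc/rsyncd.conf
```
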
 237 <hr>
 238 <h3><a name=8>rsync: Command not found</a></h3>
 239
 240 <p>This error is produced when the remote shell is unable to locate the rsync
 241 binary in your path. There are 3 possible solutions:
 242
 243 <ol>
 244
 245 <li>install rsync in a "standard" location that is in your remote path.
 246
 247 <li>modify your .cshrc, .bashrc etc on the remote system to include the path
 248 that rsync is in
 249
 250 <li>use the --rsync-path option to explicitly specify the path on the
 251 remote system where rsync is installed
 252
 253 </ol>
 254
<p>You may find this command useful for determining what your remote path is:

<blockquote><pre>ssh host 'echo $PATH'</pre></blockquote>
 260
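<p>A related check is to ask the remote shell directly whether it can find
rsync. The sketch below assumes the remote login shell is POSIX-style, so
that "command -v" works (csh users may need "which rsync" instead).
remote_rsync_path and the RSH override variable are made-up names for this
example.

```shell
#!/bin/sh
# remote_rsync_path HOST
# Asks the remote shell where rsync lives; prints the path, or prints
# nothing and fails if rsync is not in the remote PATH.
# remote_rsync_path and the RSH override variable are hypothetical names.
remote_rsync_path() {
    "${RSH:-ssh}" "$1" 'command -v rsync'
}

# Example use (host and the install path are placeholders):
#   remote_rsync_path host || echo "rsync is not in the remote PATH"
#   rsync -av --rsync-path=/usr/local/bin/rsync /src/ host:/dest/
```
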
 261 <hr>
 262 <h3><a name=9>spaces in filenames</a></h3>
 263
 264 <p>Can rsync copy files with spaces in them?
 265
 266 <p>Short answer: Yes, rsync can handle filenames with spaces.
 267
 268 <p>Long answer:
 269
 270 <p>Rsync handles spaces just like any other unix command line application.
 271 Within the code spaces are treated just like any other character so a
 272 filename with a space is no different from a filename with any other
 273 character in it.
 274
 275 <p>The problem of spaces is in the argv processing done to interpret the
 276 command line.  As with any other unix application you have to escape spaces
 277 in some way on the command line or they will be used to separate arguments.
 278
 279 <p>It is slightly trickier in rsync (and other remote-copy programs like
 280 scp) because rsync sends a command line to the remote system to launch the
 281 peer copy of rsync (this assumes that we're not talking about daemon mode,
 282 which is not affected by this problem because no remote shell is involved
 283 in the reception of the filenames).  The command line is interpreted by the
 284 remote shell and thus the spaces need to arrive on the remote system
 285 escaped so that the shell doesn't split such filenames into multiple
 286 arguments.
 287
 288 <p>For example:
 289
 290 <blockquote><pre>rsync -av host:'a long filename' /tmp/</pre></blockquote>
 291
 292 <p>This is usually a request for rsync to copy 3 files from the remote
 293 system, "a", "long", and "filename" (the only exception to this is for a
 294 system running a shell that does not word-split arguments in its commands,
 295 and that is exceedingly rare).  If you wanted to request a single file with
 296 spaces, you need to get some kind of space-quoting characters to the remote
 297 shell that is running the remote rsync command.  The following commands
 298 should all work:
 299
<blockquote><pre>rsync -av host:'"a long filename"' /tmp/
rsync -av host:'a\ long\ filename' /tmp/
rsync -av host:a\\\ long\\\ filename /tmp/</pre></blockquote>
 303
 304 <p>You might also like to use a '?' in place of a space as long as there
 305 are no other matching filenames than the one with spaces (since '?' matches
 306 any character):
 307
 308 <blockquote><pre>rsync -av host:a?long?filename /tmp/</pre></blockquote>
 309
 310 <p>As long as you know that the remote filenames on the command line
 311 are interpreted by the remote shell then it all works fine.
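
<p>You can see the word-splitting for yourself without a remote system. The
sketch below hands a string to a local sh, much as the filename text is
handed to the remote shell, and counts the arguments that result.
remote_argc is a made-up name for this example, and because the string is
evaluated by a shell it should only be given text you trust.

```shell
#!/bin/sh
# remote_argc STRING
# Prints how many arguments a shell makes out of STRING, mimicking what the
# remote shell does with the filename text that arrives from the local
# rsync. remote_argc is a hypothetical helper name for this illustration.
remote_argc() {
    sh -c "set -- $1; echo \$#"
}

remote_argc 'a long filename'      # prints 3
remote_argc '"a long filename"'    # prints 1
remote_argc 'a\ long\ filename'    # prints 1
```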
 312
 313 <hr>
 314 <h3><a name=10>ignore "vanished files" warning</a></h3>
 315
 316 <p>Some folks would like to ignore the "vanished files" warning, which
 317 manifests as an exit-code 24.  The easiest way to do this is to create
 318 a shell script wrapper.  For instance, name this something like
 319 "rsync-no24":
 320
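<blockquote><pre>#!/bin/sh
# Run rsync with the arguments given to this wrapper.
rsync "$@"
e=$?
# Exit code 24 means some source files vanished during the transfer;
# treat that as success and pass every other exit code through.
if test $e = 24; then
    exit 0
fi
exit $e</pre></blockquote>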
 328
 329 <hr>
 330