From 35f7e8295b4dfca7f25ab283e9cbe369466d87f0 Mon Sep 17 00:00:00 2001 From: Wayne Davison Date: Sun, 11 Sep 2005 18:42:38 +0000 Subject: [PATCH] Updated to remove out-of-date info, add a new entry (on firewall forwarding), improve several entries, and get rid of the horrific
 formatting of most of the entries.

---
 faqbody.html | 388 ++++++++++++++++++++++++++++-----------------------
 1 file changed, 212 insertions(+), 176 deletions(-)

diff --git a/faqbody.html b/faqbody.html
index 228fc22..7aa4a92 100644
--- a/faqbody.html
+++ b/faqbody.html
@@ -1,216 +1,252 @@
-
-HP compile
-Read-only file system
-copies every file
-is your shell clean
-memory usage
-out of memory
-
    -rsync 2.4.3 with rsh
    -rsync and cron
    -rsync: Command not found
    -spaces in filenames
    -stderr & stdout
    -subscribe
    -
-

HP compile

-

-For HPUX apparently you need to add the option -Ae to the CFLAGS. Edit
-the Makefile and change CFLAGS to:
-
- CFLAGS=-Ae -O
-

-

Read-only file system

-

-if you get "Read-only file system" as an error when sending to a rsync
-server then you probably forgot to set "read only = no" for that
-module.
-

-

copies every file

-

-Some people occasionally report that rsync copies every file when they 
-expect it to copy only a small subset. In most cases the explanation
-is that rsync is not in fact copying every file it is just trying
-to update file permissions or ownership and this is failing for some
-reason. rsync lists files with the -v option if it makes any change
-to the file, including minor changes such as group changes.
-
-If you think that rsync is erroneously copying every file then look
-at the stats produced with -v and see if rsync is really sending all
-the data. 
-
-
-

-

is your shell clean

-

-The "is your shell clean" message and the "protocol mismatch"
-message are usually caused by having some sort of program
-in your .cshrc, .profile, .bashrc or equivalent file that
-writes a message every time you connect. Data written
-in this way corrupts the rsync data stream. rsync detects this
-at startup and produces those error messages.
-
-A good way to test this is something like:
-
-	rsh remotemachine /bin/true > test.dat
-
-you should get a file called test.dat created of 0 length. If
-test.dat is not of zero length then your shell is not clean.
-Look at the contents of test.dat to see what was sent. Look
-at all the startup files on remotemachine to try and find the
-problem.
-
-
-

-

memory usage

-

-yes, rsync uses a lot of memory. The majority of the memory is used to
-hole the list of files being transferred. This takes about 100 bytes
-per file, so if you are transferring 800,000 files then rsync will consume
+
    
+ +

Read-only file system

+ +

If you get "Read-only file system" as an error when sending to an rsync +daemon then you probably forgot to set "read only = no" for that module. + +


+

copies every file

+ +

Some people occasionally report that rsync copies every file when they +expect it to copy only a small subset. In most cases the explanation is +that you forgot to include the --times (-t) option in the original copy, +so rsync is forced to check every file to see if it has changed (because +the modified time and size do not match). + +

If you think that rsync is erroneously copying every file then look at +the stats produced with -v and see if rsync is really sending all the data. + +


+

is your shell clean

+ +

The "is your shell clean" message and the "protocol mismatch" message +are usually caused by having some sort of program in your .cshrc, .profile, +.bashrc or equivalent file that writes a message every time you connect +using a remote-shell program (such as ssh or rsh). Data written in this +way corrupts the rsync data stream. rsync detects this at startup and +produces those error messages. However, if you are using rsync-daemon +syntax (host::path or rsync://) without using a remote-shell program (no +--rsh or -e option), there is no remote-shell program involved, and the +problem is probably caused by an error on the daemon side (so check the +daemon logs). + +

A good way to test if your remote-shell connection is clean is to try +something like this (use ssh or rsh, as appropriate): + +

ssh remotemachine /bin/true > test.dat
+ +

That should create a file called test.dat with nothing in it. If +test.dat is not of zero length then your shell is not clean. Look at the +contents of test.dat to see what was sent. Look at all the startup files on +remotemachine to try and find the problem. + +
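The check above can be wrapped in a small shell function. This is only a sketch: `check_clean` is a hypothetical helper name, not something rsync provides, and it assumes a POSIX shell with `mktemp` available. It runs whatever remote-shell command you pass it and reports whether anything arrived on stdout.

```shell
# check_clean is a hypothetical helper, not part of rsync.
# Intended usage: check_clean ssh remotemachine /bin/true
check_clean() {
    out=$(mktemp) || return 2
    # Capture stdout only, as in the test.dat example above.
    "$@" > "$out"
    if [ -s "$out" ]; then
        # -s is true when the file exists and is not empty.
        echo "shell is NOT clean: $(wc -c < "$out" | tr -d ' ') unexpected bytes"
        rm -f "$out"
        return 1
    fi
    rm -f "$out"
    echo "shell is clean"
}
```

A non-zero exit status means the startup files on the remote machine wrote something, so the function can also be used as a guard in a script before starting a transfer.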


+

memory usage

+ +

Yes, rsync uses a lot of memory. The majority of the memory is used to +hold the list of files being transferred. This takes about 100 bytes per +file, so if you are transferring 800,000 files then rsync will consume about 80M of memory. It will be higher if you use -H or --delete. -To fix this requires a major rewrite of rsync. I do plan on doing that, but -I don't know when I'll get to it. -


-

out of memory

-

-The usual reason for "out of memory" when running rsync is that you
-are transferring a _very_ large number of files.  The size of the
-files doesn't matter, only the total number of files.
-
-As a rule of thumb you should expect rsync to consume about 100 bytes per
-file in the file list. This happens because rsync builds a internal
-file list structure containing all the vital details of each file. 
-rsync needs to hold structure in memory because it is being constantly
-traversed.
-
-A future version of rsync could be built with an improved protocol that
+

To fix this requires a major rewrite of rsync, which may or may not +happen. + +


+

out of memory

+ +

The usual reason for "out of memory" when running rsync is that you are +transferring a _very_ large number of files. The size of the files doesn't +matter, only the total number of files. + +

As a rule of thumb you should expect rsync to consume about 100 bytes +per file in the file list. This happens because rsync builds an internal +file list structure containing all the vital details of each file. rsync +needs to hold this structure in memory because it is being constantly traversed. + +

A future version of rsync could be built with an improved protocol that transfers files in a more incremental fashion, which would require a lot less memory. Unfortunately, such an rsync does not yet exist. +
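The rule of thumb can be turned into a quick estimate before starting a large transfer. The sketch below assumes the figure of about 100 bytes per file given above; `estimate_mb` is a hypothetical helper name, and the result is a lower bound, since options such as -H or --delete raise the real usage.

```shell
# estimate_mb is a hypothetical helper, not part of rsync.
# $1 = number of files; $2 = bytes per file (defaults to the
# FAQ's rule-of-thumb value of 100).
estimate_mb() {
    files=$1
    bytes_per_file=${2:-100}
    # Integer arithmetic: total bytes divided by one million.
    echo $(( files * bytes_per_file / 1000000 ))
}

estimate_mb 800000    # prints 80, matching the 80M figure for 800,000 files
```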


+

rsync through a firewall

-

-

rsync 2.4.3 with rsh

-

-rsync 2.4.3 has a problem with some versions of rsh. The versions of rsh (such as the
-one on Solaris) that don't handle non-blocking IO will cause all sorts of errors,
-including "unexpected tag" "multiplexing overflow" etc.
+

If you have a setup where there is no way to directly connect two +machines for an rsync transfer, there are several ways to use the firewall +machine to act as an intermediary in the transfer. -The fix is to either use an earlier version of rsync or use ssh instead of rsh -or wait for rsync 2.4.4 +

Method 1

-

-

rsync and cron

-

-On some systems (notably SunOS4) cron supplies what looks like a
-socket to rsync, so rsync thinks that stdin is a socket. This means
-that if you start rsync with the --daemon switch from a cron job you
-end up rsync thiking it has been started from inetd. The fix is simple
-- just redirect stdin from /dev/null in your cron job.
+

Use ssh to access the intermediary system and have it ssh into the +actual target machine. -


-

rsync: Command not found

-

-> rsync: Command not found
+

To effect this extra ssh hop, you'll need to configure an authorization +method that does not involve any user interaction (such as prompting for a +password). The easiest way to do this is to set up an ssh key (see the +ssh-keygen manpage). You can encrypt this key (which requires a passphrase to +unlock it) as long as you have ssh-agent forwarding enabled -- this allows +the ssh connection between the intermediary system and the target machine +to authorize without a passphrase prompt because the authorization +information is coming from your local machine via the ssh protocol (which +has the benefit of not making intra-system logins password-less in +general). Another solution is to configure host-based authentication, +which makes all logins between authorized machines automatically authorized +(which may or may not be something that you are comfortable with). -This error is produced when the remote shell is unable to locate the rsync -binary in your path. There are 3 possible solutions: +

You should then test that the forwarded ssh connection works without a +prompt by running a command like this: -1) install rsync in a "standard" location that is in your remote path. +

ssh inter ssh target uptime
-2) modify your .cshrc, .bashrc etc on the remote machine to include the path -that rsync is in +

If you get a password/passphrase prompt to get into the intermediary +system that's fine, but the extra hop needs to occur without any extra user +interaction. -3) use the --rsync-path option to explicitly specify the path on the -remote machine where rsync is installed +

Once that's done, you can do an rsync copy like this (one pull, one +push): + +

rsync -av --rsync-path="ssh target rsync" inter:/source/ /dest/
+rsync -av --rsync-path="ssh target rsync" /source/ inter:/dest/
-You may echo find the command: +

These commands look like they are copying to/from the "inter" host, but the +remote-rsync command that we tell it to run performs the extra hop to the real +target system and runs the rsync command there. - rsh samba 'echo $PATH' +

Method 2

-for determining what your remote path is. +

Install and configure an rsync daemon on the target and use an ssh +tunnel to reach the rsync server. +

Installing the rsync daemon is beyond the scope of this document, but +see the rsyncd.conf manpage for more information. Keep in mind that you +don't need to be root to run an rsync daemon as long as you don't use a +protected port. -


-

spaces in filenames

-

-Jim wrote:
-> This seems to imply rsync can't copy files with names containing
-> spaces.  A couple quick greps through the man page suggests that
-> this limitation isn't mentioned.
+

Once your rsync daemon is up and running, you build an ssh tunnel +through your intermediary system like this: -Short answer: rsync can handle filenames with spaces +

ssh -fN -l userid_on_inter -L 8873:target:8873 inter
-Long answer: +

What this does is cause a connection to port 8873 on the local system to +turn into a connection from the intermediary system to the target machine +on port 8873. (Port 8873 was chosen instead of the normal 873 port number +because it does not require root privileges--use whatever port number you +like.) The -N option tells ssh not to run a command on the remote system, +which works with modern ssh versions (you can run a sleep command if -N +doesn't work). The -f option tells ssh to put the command in the +background after any password/passphrase prompts. -rsync handles spaces just like any other unix command line application. -Within the code spaces are treated just like any other character so -a filename with a space is no different from a filename with any -other character in it. +

Now when an rsync command is executed with a daemon-mode command-line +syntax to the local machine, the conversation is directed to the target +system. For example: -The problem of spaces is in the argv processing done to interpret the -command line. As with any other unix application you have to escape -spaces in some way on the command line or they will be used to -separate arguments. +

rsync -av --port 8873 localhost::module/source dest/
+rsync -av rsync://localhost:8873/module/source dest/
-It is slightly trickier in rsync because rsync sends a command line -to the remote system to launch the peer copy of rsync. The command -line is interpreted by the remote shell and thus the spaces need -to arrive on the remote system escaped so that the shell doesn't -split such filenames into multiple arguments. +
+

rsync and cron

-For example: +

On some systems (notably SunOS4) cron supplies what looks like a socket +to rsync, so rsync thinks that stdin is a socket. This means that if you +start rsync with the --daemon switch from a cron job you end up with rsync +thinking it has been started from inetd. The fix is simple: just +redirect stdin from /dev/null in your cron job. - rsync -av fjall:'a long filename' /tmp/ +


+

rsync: Command not found

-won't work because the remote shell gets an unquoted filename. Instead -you have to use: +

This error is produced when the remote shell is unable to locate the rsync +binary in your path. There are 3 possible solutions: - rsync -av fjall:'"a long filename"' /tmp/ +

    -or a similar construct (there are lots of varients that work). +
  1. install rsync in a "standard" location that is in your remote path. -As long as you know that the remote filenames on the command line -are interpreted by the remote shell then it all works fine. +
  2. modify your .cshrc, .bashrc etc on the remote machine to include the path +that rsync is in + +
  3. use the --rsync-path option to explicitly specify the path on the +remote machine where rsync is installed -I should probably provide the above examples in the docs :-) +
-Cheers, Andrew +

You may find the command: +

ssh host 'echo $PATH'
-

-

stderr & stdout

- -> Why does rsync produce some things on stdout and some on stderr? +

for determining what your remote path is. -

All messages which originate from the remote computer are sent to stderr. -All informational messages from the local computer are sent to stdout. -All error messages from the local computer are sent to stderr. +


+

spaces in filenames

-

-There is a reason to this system, and it would be quite difficult to change. -The reason is that rsync uses a remote shell for execution. The remote -shell provides stderr/stdout. The stdout stream is used for the rsync -protocol. Mixing error messages into this stdout stream would involve -lots of extra overhead and complexity in the protocol because each message -would need to be escaped, which means non-messages would need to be encoded -in some way. Instead rsync always sends remote messages to stderr. This means -they appear on stderr at the local computer. rsync can't intercept them. +

Can rsync copy files with spaces in them? -

-If you have a problem with scripts or cron jobs that produce stderr then I -suggest you use your shell to redirect stderr and stdout. For example you -could do a cron line like this: +

Short answer: Yes, rsync can handle filenames with spaces. -

-

-0 0 * * * /usr/local/bin/rsync -avz /foobar /foo > logfile 2>&1 
-
+

Long answer: + +

Rsync handles spaces just like any other unix command line application. +Within the code spaces are treated just like any other character so a +filename with a space is no different from a filename with any other +character in it. + +

The problem of spaces is in the argv processing done to interpret the +command line. As with any other unix application you have to escape spaces +in some way on the command line or they will be used to separate arguments. + +

It is slightly trickier in rsync (and other remote-copy programs like +scp) because rsync sends a command line to the remote system to launch the +peer copy of rsync (this assumes that we're not talking about daemon mode, +which is not affected by this problem because no remote shell is involved +in the reception of the filenames). The command line is interpreted by the +remote shell and thus the spaces need to arrive on the remote system +escaped so that the shell doesn't split such filenames into multiple +arguments. + +

For example: + +

rsync -av host:'a long filename' /tmp/
+ +

This is usually a request for rsync to copy 3 files from the remote +system, "a", "long", and "filename" (the only exception to this is for a +system running a shell that does not word-split arguments in its commands, +and that is exceedingly rare). If you wanted to request a single file with +spaces, you need to get some kind of space-quoting characters to the remote +shell that is running the remote rsync command. The following commands +should all work: + +

rsync -av host:'"a long filename"' /tmp/
+rsync -av host:'a\ long\ filename' /tmp/
+rsync -av host:a\\\ long\\\ filename /tmp/
+ +

You might also like to use a '?' in place of a space as long as there +are no other matching filenames than the one with spaces (since '?' matches +any character): + +

rsync -av host:a?long?filename /tmp/
+ +

As long as you know that the remote filenames on the command line +are interpreted by the remote shell then it all works fine. + +
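The word-splitting described above can be seen locally, without a remote host. In this sketch `sh -c` stands in for the remote shell and `remote_argc` is a hypothetical helper name; it prints how many arguments the "remote" side would see after re-parsing the string it was sent.

```shell
# remote_argc is a hypothetical helper, not part of rsync.
# $1 is the text that reaches the remote shell (i.e. what is left
# after the local shell has removed its own layer of quoting).
remote_argc() {
    # The inner sh re-parses $1 exactly as a remote shell would;
    # \$# is escaped so that the inner shell expands it.
    sh -c "set -- $1; echo \$#"
}

remote_argc 'a long filename'      # prints 3: split into a, long, filename
remote_argc '"a long filename"'    # prints 1: inner double quotes survive
remote_argc 'a\ long\ filename'    # prints 1: backslashes escape the spaces
```

The unquoted form a\\\ long\\\ filename reaches the remote side as the same text as the third call, because the local shell consumes one level of backslashes first.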


+

HP compile

+ +

For HPUX apparently you need to add the option -Ae to the CFLAGS. Edit +the Makefile and change CFLAGS to: -

-this would send both stderr and stdout to "logfile". The magic bit is the -"2>&1" which says to redirect stderr to to the same descriptor to which -stdout is currently directed. +

CFLAGS=-Ae -O
+
-

-- 2.34.1