Saturday, April 18, 2009

Mirror Your Web Site With rsync

Version 1.0
Author: Falko Timme
Last edited 04/20/2006


This tutorial shows how you can mirror your web site from your main web server to a backup server that can take over if the main server fails. We use the tool rsync for this, and we make it run through a cron job that checks every x minutes if there is something to update on the mirror. Thus your backup server should usually be up to date if it has to take over.

rsync updates only files that have changed, so you do not need to transfer 5 GB of data whenever you run rsync. It only mirrors new/changed files, and it can also delete files from the mirror that have been deleted on the main server. In addition to that it can preserve permissions and ownerships of mirrored files and directories; to preserve the ownerships, we need to run rsync as root which is what we do here. If permissions and/or ownerships change on the main server, rsync will also change them on the backup server.

In this tutorial we tunnel rsync through SSH, which is more secure. It also means you do not have to open another port in your firewall for rsync, because port 22 (SSH) is enough. The problem is that SSH asks for a password when logging in. That is not good if you want to run rsync as a cron job, because a password prompt requires human interaction, which is exactly what we want to avoid.

But fortunately there is a solution: public key authentication. We create a key pair on our backup server mirror.example.com and save the public key in a file on the remote system (server1.example.com). Afterwards we are no longer prompted for a password when we run rsync. That includes cron jobs, which is exactly what we want.

As you might have guessed already from what I have written so far, the concept is that we initiate the mirroring of server1.example.com directly from mirror.example.com; server1.example.com does not have to do anything to get mirrored.

I will use the following setup here:

Main server: server1.example.com (server1) - IP address: 192.168.0.100
Mirror/backup server: mirror.example.com (mirror) - IP address: 192.168.0.175
The web site that is to be mirrored is in /var/www on server1.example.com.
rsync is for mirroring files and directories only; if you want to mirror your MySQL database, please take a look at these tutorials:

How To Set Up Database Replication In MySQL
How To Set Up A Load-Balanced MySQL Cluster
I want to say first that this is not the only way of setting up such a system. There are many ways of achieving this goal, but this is the way I chose. I do not issue any guarantee that this will work for you!


1 Install rsync
First we have to install rsync on both server1.example.com and mirror.example.com. For Debian systems, this looks like this:

server1/mirror:

(We do this as root!)

apt-get install rsync

On other Linux distributions you would use yum (Fedora/CentOS) or yast (SuSE) to install rsync.
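If you manage servers running more than one distribution, you can wrap the choice of package tool in a small shell function. This is only a sketch. The function name install_rsync_cmd is made up, and the mapping from the ID= field of /etc/os-release to a command is an assumption. It covers the tools named above, plus Ubuntu and RHEL, which use the same tools as Debian and CentOS.

```shell
#!/bin/bash
# install_rsync_cmd ID: print the command that installs rsync on a given
# distribution. ID is the value of ID= in /etc/os-release. The mapping is an
# assumption covering only the distribution families mentioned above.
install_rsync_cmd() {
  case "$1" in
    debian|ubuntu)       echo "apt-get install rsync" ;;
    fedora|centos|rhel)  echo "yum install rsync" ;;
    opensuse*|sles)      echo "yast -i rsync" ;;
    *)                   echo "no mapping for '$1'" >&2; return 1 ;;
  esac
}

# Usage (as root): read this machine's ID, then run the printed command.
#   . /etc/os-release && install_rsync_cmd "$ID"
```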


2 Create An Unprivileged User On server1.example.com
Now we create an unprivileged user called someuser on server1.example.com that will be used by rsync on mirror.example.com to mirror the directory /var/www (of course, someuser must have read permissions on /var/www on server1.example.com).

server1:

(We do this as root!)

useradd -d /home/someuser -m -s /bin/bash someuser

This will create the user someuser with the home directory /home/someuser and the login shell /bin/bash (it is important that someuser has a valid login shell - something like /bin/false does not work!). Now give someuser a password:

passwd someuser
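The login shell matters because of how rsync connects. It runs through SSH, and sshd executes the remote rsync command through the user's login shell (as shell -c "command"). A shell such as /bin/false or nologin therefore makes every transfer fail. A quick check could look like the following sketch. The function name has_login_shell and its list of non-login shells are assumptions, not part of the original setup.

```shell
#!/bin/bash
# has_login_shell ENTRY: succeed if the passwd(5) line ENTRY names a shell
# that can run commands. The list of "no login" shells is an assumption;
# extend it for your system.
has_login_shell() {
  [ -n "$1" ] || return 1                     # empty entry: no such user
  local shell
  shell=$(printf '%s\n' "$1" | cut -d: -f7)   # 7th field is the login shell
  case "$shell" in
    /bin/false|/usr/bin/false|/sbin/nologin|/usr/sbin/nologin|/bin/nologin)
      return 1 ;;
    *)
      return 0 ;;                             # an empty field means /bin/sh
  esac
}

# Usage on server1 (as root):
#   has_login_shell "$(getent passwd someuser)" && echo "someuser can run rsync"
```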


3 Test rsync
Next we test rsync on mirror.example.com. As root we do this:

mirror:

rsync -avz -e ssh someuser@server1.example.com:/var/www/ /var/www/

You should see something like this. Answer with yes:

The authenticity of host 'server1.example.com (192.168.0.100)' can't be established.
RSA key fingerprint is 32:e5:79:8e:5f:5a:25:a9:f1:0d:ef:be:5b:a6:a6:23.
Are you sure you want to continue connecting (yes/no)?

<-- yes

Then enter someuser's password, and you should see that server1.example.com's /var/www directory is mirrored to /var/www on mirror.example.com.

You can check that like this on both servers:

server1/mirror:

ls -la /var/www

You should see that all files and directories have been mirrored to mirror.example.com, and the files and directories should have the same permissions/ownerships as on server1.example.com.
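On a large site, comparing two ls -la listings by eye gets tedious. As a sketch, you can print mode, owner and group for every path and let diff do the comparison. This assumes GNU find, which supports -printf, on both machines, and the function name perm_listing is hypothetical.

```shell
#!/bin/bash
# perm_listing DIR: print "path mode owner group" for every entry below DIR.
# Paths are relative to DIR and sorted, so the listings from server1 and
# mirror can be compared line by line. Requires GNU find (-printf).
perm_listing() {
  (cd "$1" && find . -printf '%p %m %u %g\n' | LC_ALL=C sort)
}

# Usage on mirror, while password logins to server1 still work:
#   diff <(perm_listing /var/www) \
#        <(ssh someuser@server1.example.com \
#            "cd /var/www && find . -printf '%p %m %u %g\n' | LC_ALL=C sort")
```

No output from diff means that paths, permissions and ownerships match. Owners are compared by name, so the same user and group names must exist on both servers.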


4 Create The Keys On mirror.example.com
Now we create the private/public key pair on mirror.example.com:

mirror:

(We do this as root!)

mkdir /root/rsync
ssh-keygen -t dsa -b 1024 -f /root/rsync/mirror-rsync-key

You will see something like this:

Generating public/private dsa key pair.
Enter passphrase (empty for no passphrase): [press enter here]
Enter same passphrase again: [press enter here]
Your identification has been saved in /root/rsync/mirror-rsync-key.
Your public key has been saved in /root/rsync/mirror-rsync-key.pub.
The key fingerprint is:
68:95:35:44:91:f1:45:a4:af:3f:69:2a:ea:c5:4e:d7 root@mirror

It is important that you do not enter a passphrase. Otherwise the mirroring cannot run without human interaction, so simply hit Enter!
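You can skip the two Enter presses by passing ssh-keygen's -N '' option, which sets an empty passphrase. Note also that DSA keys, as generated above, are disabled by default since OpenSSH 7.0. On such systems you can use an Ed25519 key (supported since OpenSSH 6.5) instead. The sketch below makes that substitution. It is an alternative to the original steps, not part of them, and the helper name make_mirror_key is hypothetical.

```shell
#!/bin/bash
# make_mirror_key DIR: create DIR/mirror-rsync-key and DIR/mirror-rsync-key.pub
# without any prompts, then print the path of the public key.
# Ed25519 instead of DSA is a substitution. If the key file already exists,
# ssh-keygen asks before overwriting it.
make_mirror_key() {
  local dir=$1
  mkdir -p "$dir" && chmod 700 "$dir" || return 1
  # -q: quiet, -N '': empty passphrase, -C: key comment, -f: output file
  ssh-keygen -q -t ed25519 -N '' -C "root@${HOSTNAME:-mirror}" \
    -f "$dir/mirror-rsync-key" || return 1
  printf '%s\n' "$dir/mirror-rsync-key.pub"
}

# Usage on mirror (as root):
#   make_mirror_key /root/rsync
```

With an Ed25519 key, the line in authorized_keys starts with ssh-ed25519 instead of ssh-dss. The rest of the setup is the same.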

Next, we copy our public key to server1.example.com:

mirror:

(Still, we do this as root.)

scp /root/rsync/mirror-rsync-key.pub someuser@server1.example.com:/home/someuser/

The public key mirror-rsync-key.pub should now be available in /home/someuser on server1.example.com.



5 Configure server1.example.com
Now log in through SSH on server1.example.com as someuser (not root!) and do this:

server1:

(Please do this as someuser!)

mkdir ~/.ssh
chmod 700 ~/.ssh
mv ~/mirror-rsync-key.pub ~/.ssh/
cd ~/.ssh
touch authorized_keys
chmod 600 authorized_keys
cat mirror-rsync-key.pub >> authorized_keys
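If you are still asked for a password later, file permissions are one possible cause. With StrictModes enabled (the sshd default), sshd ignores authorized_keys if the home directory, ~/.ssh or the file itself is writable by the group or by others. The sketch below checks only those write bits. The function name ssh_perms_ok is hypothetical, and it assumes GNU stat. It does not check file ownership, which sshd also verifies.

```shell
#!/bin/bash
# ssh_perms_ok HOME: succeed if HOME, HOME/.ssh and HOME/.ssh/authorized_keys
# exist and none of them is group- or world-writable. Ownership is not
# checked in this sketch.
ssh_perms_ok() {
  local path mode
  for path in "$1" "$1/.ssh" "$1/.ssh/authorized_keys"; do
    if [ ! -e "$path" ]; then
      echo "missing: $path" >&2
      return 1
    fi
    mode=$(stat -c '%a' "$path")        # octal mode, e.g. 755 (GNU stat)
    if (( 8#$mode & 8#022 )); then      # 020 = group write, 002 = other write
      echo "too permissive: $path ($mode)" >&2
      return 1
    fi
  done
}

# Usage on server1 (as someuser):
#   ssh_perms_ok "$HOME" && echo "permissions look fine"
```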

By doing this, we have appended the contents of mirror-rsync-key.pub to the file /home/someuser/.ssh/authorized_keys. /home/someuser/.ssh/authorized_keys should look similar to this:

server1:

(Still as someuser!)

vi /home/someuser/.ssh/authorized_keys

ssh-dss AAAAB3NzaC1kc3MAAA[...]lSUom root@mirror

Now we want to allow connections only from mirror.example.com, and the connecting user should be allowed to use only rsync, so we add

command="/home/someuser/rsync/checkrsync",from="mirror.example.com",no-port-forwarding,no-X11-forwarding,no-pty

right at the beginning of /home/someuser/.ssh/authorized_keys:

server1:

(Still as someuser!)

vi /home/someuser/.ssh/authorized_keys

command="/home/someuser/rsync/checkrsync",from="mirror.example.com",no-port-forwarding,no-X11-forwarding,no-pty ssh-dss AAAAB3NzaC1kc3MAAA[...]lSUom root@mirror

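It is important that you use a FQDN like mirror.example.com instead of an IP address after from=. In my setup, the automated mirroring did not work otherwise! Note that sshd compares the from= patterns against both the client's host name and its IP address. Since OpenSSH 6.8, however, sshd no longer looks up host names by default (UseDNS no). On such systems the IP address of mirror.example.com (192.168.0.175 here) may be the entry that actually matches.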
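Editing the key line by hand in vi makes it easy to slip in an extra space or a line break (see the comments below). As a sketch, you can prepend the options with a small shell function instead. The name restrict_key is hypothetical. The format it produces is the one shown above: the options as one comma-separated list with no unquoted spaces, then a single space, then the key type (ssh-dss) and the key.

```shell
#!/bin/bash
# restrict_key OPTIONS: read public-key lines on stdin and print them with
# OPTIONS prepended. Blank lines, comments and lines that already start with
# command= are passed through unchanged.
restrict_key() {
  local opts=$1 line
  while IFS= read -r line; do
    case "$line" in
      ''|'#'*|command=*) printf '%s\n' "$line" ;;
      *) printf '%s %s\n' "$opts" "$line" ;;   # exactly one space before the key type
    esac
  done
}

# Usage on server1 (as someuser). This replaces the "cat ... >> authorized_keys"
# step plus the manual vi edit:
#   OPTS='command="/home/someuser/rsync/checkrsync",from="mirror.example.com",no-port-forwarding,no-X11-forwarding,no-pty'
#   restrict_key "$OPTS" < ~/.ssh/mirror-rsync-key.pub >> ~/.ssh/authorized_keys
```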

Now we create the script /home/someuser/rsync/checkrsync that rejects all commands except rsync.

server1:

(We still do this as someuser!)

mkdir ~/rsync
vi ~/rsync/checkrsync

#!/bin/sh

case "$SSH_ORIGINAL_COMMAND" in
  *\&*) echo "Rejected" ;;
  *\(*) echo "Rejected" ;;
  *\{*) echo "Rejected" ;;
  *\;*) echo "Rejected" ;;
  *\<*) echo "Rejected" ;;
  *\`*) echo "Rejected" ;;
  rsync\ --server*) $SSH_ORIGINAL_COMMAND ;;
  *) echo "Rejected" ;;
esac

chmod 700 ~/rsync/checkrsync
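To see what the checkrsync filter lets through without connecting from mirror.example.com, you can copy its case statement into a function that only reports its decision instead of running the command. This is a test aid only. The name check_command is hypothetical. The patterns come from the script above, merged into one branch because they all lead to "Rejected".

```shell
#!/bin/bash
# check_command CMD: dry-run version of the case statement in checkrsync.
# It prints the decision instead of executing CMD. The first matching branch
# wins, just as in the original script.
check_command() {
  case "$1" in
    *\&*|*\(*|*\{*|*\;*|*\<*|*\`*) echo "Rejected" ;;
    rsync\ --server*)              echo "run: $1" ;;
    *)                             echo "Rejected" ;;
  esac
}

# The real script can be tried the same way on server1 by setting the
# variable by hand; this should print "Rejected":
#   SSH_ORIGINAL_COMMAND='ls /' ~/rsync/checkrsync
```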




6 Test rsync On mirror.example.com
Now we must test on mirror.example.com if we can mirror server1.example.com without being prompted for someuser's password. We do this:

mirror:

(We do this as root!)

rsync -avz --delete --exclude=**/stats --exclude=**/error --exclude=**/files/pictures -e "ssh -i /root/rsync/mirror-rsync-key" someuser@server1.example.com:/var/www/ /var/www/

(The --delete option means that files that have been deleted on server1.example.com should also be deleted on mirror.example.com. The --exclude option means that these files/directories should not be mirrored; e.g. --exclude=**/error means "do not mirror /var/www/error". You can use multiple --exclude options. I have listed these options as examples; you can adjust the command to your needs. Have a look at

man rsync

for more information.)
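Two small refinements are shown below as a hypothetical helper (mirror_args is not part of rsync). First, quoting each pattern keeps the shell from treating ** as a glob. Second, rsync's --dry-run (-n) option previews what a run would do, including the deletions caused by --delete, before anything changes on mirror.example.com.

```shell
#!/bin/bash
# mirror_args KEY SRC DEST [PATTERN...]: print, one per line, the rsync
# arguments used in this step, with one --exclude per PATTERN.
mirror_args() {
  local key=$1 src=$2 dest=$3 pattern
  shift 3
  local -a args=(-avz --delete)
  for pattern in "$@"; do
    args+=("--exclude=$pattern")   # quoted, so the shell never expands **
  done
  args+=(-e "ssh -i $key" "$src" "$dest")
  printf '%s\n' "${args[@]}"
}

# Usage on mirror (as root): preview first, then drop --dry-run.
#   mapfile -t ARGS < <(mirror_args /root/rsync/mirror-rsync-key \
#       someuser@server1.example.com:/var/www/ /var/www/ \
#       '**/stats' '**/error' '**/files/pictures')
#   /usr/bin/rsync --dry-run "${ARGS[@]}"
```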

You should now see that the mirroring takes place:

receiving file list ... done

sent 71 bytes received 643 bytes 476.00 bytes/sec
total size is 64657 speedup is 90.56

without being prompted for a password! This is what we wanted.




7 Create A Cron Job
We want to automate the mirroring, so we create a cron job for it on mirror.example.com. Run crontab -e as root:

mirror:

(We do this as root!)

crontab -e

and create a cron job like this:

*/5 * * * * /usr/bin/rsync -azq --delete --exclude=**/stats --exclude=**/error --exclude=**/files/pictures -e "ssh -i /root/rsync/mirror-rsync-key" someuser@server1.example.com:/var/www/ /var/www/

This would run rsync every 5 minutes; adjust it to your needs (see

man 5 crontab

). I use the full path to rsync here (/usr/bin/rsync) to make sure that cron finds rsync, because cron runs jobs with a minimal PATH. Your rsync location might differ. Run

mirror:

(We do this as root!)

which rsync

to find out where yours is.
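With a */5 schedule, a long first transfer may still be running when cron starts the next one. One way to avoid overlapping runs is to take a lock before starting rsync, assuming the flock utility from util-linux is installed. The wrapper below is a sketch, and the name run_locked is hypothetical.

```shell
#!/bin/bash
# run_locked LOCKFILE CMD...: run CMD only if no other run holds LOCKFILE.
# A second run that finds the lock taken skips and exits successfully.
run_locked() {
  local lockfile=$1
  shift
  (
    if ! flock -n 9; then               # -n: give up at once if locked
      echo "skipped: previous run still active" >&2
      exit 0
    fi
    "$@"                                # the lock is held while CMD runs
  ) 9>>"$lockfile"                      # fd 9 opens the lock file for the subshell
}

# Example call from a cron-driven script (the lock path is an assumption):
#   run_locked /var/lock/mirror-rsync.lock /usr/bin/rsync -azq --delete ...
```

flock can also be called directly in the crontab line, for example /usr/bin/flock -n /var/lock/mirror-rsync.lock /usr/bin/rsync -azq ... (check the flock location with which flock, just as for rsync).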




8 Links
rsync: http://samba.anu.edu.au/rsync





Generating key
Submitted by Anonymous (not registered) on Tue, 2008-10-21 12:28.
About:

ssh-keygen -t dsa -b 2048 -f /root/rsync/mirror-rsync-key




DSA keys must be 1024 bits



ssh-keygen -t dsa -b 1024 -f /root/rsync/mirror-rsync-key


parameter numeric-ids
Submitted by Anonymous (not registered) on Wed, 2006-05-24 12:58.
You should also add the parameter --numeric-ids to keep the numeric uid and gid values of the files when they are transferred.
offsite target for rsync backups - rsync.net
Submitted by Anonymous (not registered) on Wed, 2006-06-07 22:55.
I am mentioning rsync.net because I am a customer that wants to see their business thrive. Take a look at their philosophy and their privacy/warrant policy and you'll see why ...

I use rsync (and Unison, and sftp) to automatically back up my most important files to a 4 GB offsite filesystem at rsync.net, which they in turn replicate to their secondary location in Colorado.

It's a great solution. You should check them out.

Not relevant for dynamic websites
Submitted by Anonymous (not registered) on Fri, 2006-06-02 00:10.
As most websites these days are dynamic, you need to use different tools for mirroring.

I use wget to create a mirror of my dynamic pages.

Great Article
Submitted by Michael Potter (not registered) on Mon, 2009-03-02 16:26.
Thanks for the great how-to!
rsync error: error in rsync protocol data stream (code 12)
Submitted by jed (not registered) on Tue, 2009-01-13 16:19.
server_A: Red hat

server_B: Windows 2003 (cygwin)

I was on server_B running this command: ssh -i /root/rsync/mirror-rsync-key someuser@server_A:/var/www/ /var/www/

All files on server_A should be transferred to /var/www/ on server_B.

I got this error: Connection closed by server_A rsync: connection unexpectedly closed (0 bytes received so far) [receiver] rsync error: error in rsync protocol data stream (code 12) at /home/lapo/packaging/rsync-3.0.4-1/src/rsync-3.0.4/io.c(632) [receiver=3.0.4]

I've already setup ssh on cygwin http://ist.uwaterloo.ca/~kscully/CygwinSSHD_W2K3.html.

Thanks in advance.

Re: rsync error: error in rsync protocol data stream (code 12)
Submitted by Anonymous (not registered) on Wed, 2009-01-14 06:38.
After a few hours of tracing, the file mirroring is working now. I found out that the error occurs when rsync encounters folders or files that it has no permission to access and therefore can't transfer.
Automated failover
Submitted by Diego (not registered) on Thu, 2008-09-11 15:34.
Newbie question: How about if the main server goes down, how could I make them switch over automatically to the mirror server?

Re: Automated failover
Submitted by Anonymous (not registered) on Mon, 2008-09-15 06:52.
You can do that easily by creating a high-availability load balancer that uses haproxy/heartbeat. I've set up my web server cluster using these instructions:

http://www.howtoforge.com/high-availability-load-balancer-haproxy-heartbeat-debian-etch

It's pretty straightforward and works like a charm.


missing comma
Submitted by vanthevirus (registered user) on Wed, 2008-04-16 19:14.
I had spent several hours trying to make this thing work automatically, but it kept asking for a password.

But finally I found the reason for my failure. I forgot a COMMA!

"... no-port-forwarding,no-X11-forwarding,no-pty, ssh-dss AAAAB3NzaC1kc3MAAAEBALGZJ34a5QwC2 .... "

please update your tutorial for the sake of other newbies out there...

anyway, this howto is very helpful. thanks

Re: missing comma
Submitted by Anonymous (not registered) on Tue, 2009-03-24 17:26.
I have added the comma and I still have the issue with it requesting a password. What am I doing wrong?

Comma or new-line
Submitted by trigar (registered user) on Mon, 2008-08-04 17:42.
What finally worked for me was a new-line instead of a comma. Also, if you use nano - be careful to switch off long-line wrapping (M-L).


archive mode
Submitted by amcorona (registered user) on Tue, 2007-08-28 15:24.
I am not sure how important this is but you are using archive mode which preserves file ownership.

You should have the same accounts on both servers if some directories on the source server are owned by different accounts. I have a file repository that only one specific user has write access to (not apache), so I have to create that owner on the target machine before running this script. One thing I am not sure about is whether the UID has to be identical.





Re: archive mode
Submitted by Anonymous (not registered) on Thu, 2008-12-04 15:21.
the --numeric-ids parameter to rsync is for this purpose. No need to create accounts.
Looks great and simple, but...
Submitted by Anonymous (not registered) on Mon, 2006-04-24 00:09.
With the suggested setup, if something goes wrong with the contents of server1.example.com:/var/www/, the mess would be propagated to the mirror server too. On the other hand, as you say, you are offering just a plain rsync-over-SSH mirroring solution, not a bulletproof backup solution.


Troubleshooting rsync issues
Submitted by Anonymous (not registered) on Tue, 2006-06-13 02:32.
If things work up until the last rsync using the public/private key pair, and you're having problems, use the ssh -v switch:


rsync -avz --delete --exclude=**/stats --exclude=**/error --exclude=**/files/pictures -e "ssh -v -i /root/rsync/mirror-rsync-key" someuser@server1.example.com:/var/www/ /var/www/

I understand Rsync has a wi
Submitted by Anonymous (not registered) on Mon, 2006-05-22 22:31.
I understand Rsync has a windows plugin. Does anyone know how to rsync a /Inetpub/ directory to a linux server /backup/ folder?
Backup solution
Submitted by Anonymous (not registered) on Thu, 2006-04-27 09:07.
Well, if you run rsync like that, then doing incremental backups isn't all that difficult. This was the basis for my modified script:
http://www.mikerubel.org/computers/rsync_snapshots/
It uses hard links. I run it as root because I want to keep permissions. Here's my backup.sh:
#!/bin/bash
# ----------------------------------------------------------------------
# mikes handy rotating-filesystem-snapshot utility
# ----------------------------------------------------------------------
# this needs to be a lot more general, but the basic idea is it makes
# rotating backup-snapshots of /home whenever called
# ----------------------------------------------------------------------

# Make MySQL backups first: these lines call rm, mysql, grep and mysqldump
# by name, which only works while $PATH is still set.
# Remove old files
rm -f /mysql_backup/*

# Dump new files
USER=root
PASSWORD=************
HOST=localhost

for i in $(echo 'SHOW DATABASES;' | mysql -u$USER -p$PASSWORD -h$HOST|grep -v '^Database$'); do
mysqldump \
-u$USER -p$PASSWORD -h$HOST \
-Q -c -C --add-drop-table --add-locks --quick --lock-tables \
$i > /mysql_backup/$i.sql;
done;

unset PATH

# suggestion from H. Milz: avoid accidental use of $PATH


# ------------- system commands used by this script --------------------
ID=/usr/bin/id;
ECHO=/bin/echo;

RM=/bin/rm;
MV=/bin/mv;
CP=/bin/cp;
TOUCH=/bin/touch;

RSYNC=/usr/bin/rsync;
SSH=/usr/bin/ssh
KEY=/root/.ssh/id_rsa

# ------------- file locations -----------------------------------------

SNAPSHOT_RW=/backup/backup;
EXCLUDES=/backup/backup_exclude;

# ------------- the script itself --------------------------------------

# rotating snapshots of /home (fixme: this should be more general)

# step 1: delete the oldest snapshot, if it exists:
if [ -d $SNAPSHOT_RW/hourly.3 ] ; then\
$RM -Rf $SNAPSHOT_RW//hourly.3 ;\
fi;

# step 2: shift the middle snapshots(s) back by one, if they exist
if [ -d $SNAPSHOT_RW/hourly.2 ] ;then\
$MV $SNAPSHOT_RW/hourly.2 $SNAPSHOT_RW/hourly.3 ;\
fi;

if [ -d $SNAPSHOT_RW/hourly.1 ] ; then\
$MV $SNAPSHOT_RW/hourly.1 $SNAPSHOT_RW/hourly.2 ;\
fi;

# step 3: make a hard-link-only (except for dirs) copy of the latest snapshot,
# if that exists
if [ -d $SNAPSHOT_RW/hourly.0 ] ; then\
$CP -al $SNAPSHOT_RW/hourly.0 $SNAPSHOT_RW/hourly.1 ;\
fi;

# step 4: rsync from the system into the latest snapshot (notice that
# rsync behaves like cp --remove-destination by default, so the destination
# is unlinked first. If it were not so, this would copy over the other
# snapshot(s) too!

$RSYNC\
-avz --delete --delete-excluded \
--exclude-from="$EXCLUDES"\
-e "$SSH -i $KEY" \
root@www.server.com:/ $SNAPSHOT_RW/hourly.0 ;

# step 5: update the mtime of hourly.0 to reflect the snapshot time
$TOUCH $SNAPSHOT_RW/hourly.0 ;

I run this script 4 times daily through cron. Then I have another script which makes daily snapshots for 7 days (backup_daily.sh):

#!/bin/bash
# ----------------------------------------------------------------------
# mikes handy rotating-filesystem-snapshot utility
# ----------------------------------------------------------------------
# this needs to be a lot more general, but the basic idea is it makes
# rotating backup-snapshots of /home whenever called
# ----------------------------------------------------------------------

unset PATH

# suggestion from H. Milz: avoid accidental use of $PATH

# ------------- system commands used by this script --------------------
ID=/usr/bin/id;
ECHO=/bin/echo;

RM=/bin/rm;
MV=/bin/mv;
CP=/bin/cp;
TOUCH=/bin/touch;

RSYNC=/usr/bin/rsync;
SSH=/usr/bin/ssh
KEY=/root/.ssh/id_rsa

# ------------- file locations -----------------------------------------

SNAPSHOT_RW=/backup/backup;
EXCLUDES=/backup/backup_exclude;

# ------------- the script itself --------------------------------------

# rotating snapshots of /home (fixme: this should be more general)

# step 1: delete the oldest snapshot, if it exists:
if [ -d $SNAPSHOT_RW/daily.6 ] ; then\
$RM -Rf $SNAPSHOT_RW/daily.6 ;\
fi;

# step 2: shift the middle snapshots(s) back by one, if they exist
if [ -d $SNAPSHOT_RW/daily.5 ] ; then\
$MV $SNAPSHOT_RW/daily.5 $SNAPSHOT_RW/daily.6 ;\
fi;

if [ -d $SNAPSHOT_RW/daily.4 ] ; then\
$MV $SNAPSHOT_RW/daily.4 $SNAPSHOT_RW/daily.5 ;\
fi;

if [ -d $SNAPSHOT_RW/daily.3 ] ; then\
$MV $SNAPSHOT_RW/daily.3 $SNAPSHOT_RW/daily.4 ;\
fi;

if [ -d $SNAPSHOT_RW/daily.2 ] ; then\
$MV $SNAPSHOT_RW/daily.2 $SNAPSHOT_RW/daily.3 ;\
fi;

if [ -d $SNAPSHOT_RW/daily.1 ] ; then\
$MV $SNAPSHOT_RW/daily.1 $SNAPSHOT_RW/daily.2 ;\
fi;

if [ -d $SNAPSHOT_RW/daily.0 ] ; then\
$MV $SNAPSHOT_RW/daily.0 $SNAPSHOT_RW/daily.1 ;\
fi;


# step 3: make a hard-link-only (except for dirs) copy of the latest snapshot,
# if that exists
if [ -d $SNAPSHOT_RW/hourly.3 ] ; then\
$CP -al $SNAPSHOT_RW/hourly.3 $SNAPSHOT_RW/daily.0 ;\
fi;

# step 4: update the mtime of daily.0 to reflect the snapshot time
$TOUCH $SNAPSHOT_RW/daily.0 ;

And finally I have a weekly script that makes weekly snapshots during a 4-week period (backup_weekly.sh):

#!/bin/bash
# ----------------------------------------------------------------------
# mikes handy rotating-filesystem-snapshot utility
# ----------------------------------------------------------------------
# this needs to be a lot more general, but the basic idea is it makes
# rotating backup-snapshots of /home whenever called
# ----------------------------------------------------------------------

unset PATH

# suggestion from H. Milz: avoid accidental use of $PATH

# ------------- system commands used by this script --------------------
ID=/usr/bin/id;
ECHO=/bin/echo;

RM=/bin/rm;
MV=/bin/mv;
CP=/bin/cp;
TOUCH=/bin/touch;

RSYNC=/usr/bin/rsync;
SSH=/usr/bin/ssh
KEY=/root/.ssh/id_rsa

# ------------- file locations -----------------------------------------

SNAPSHOT_RW=/backup/backup;
EXCLUDES=/backup/backup_exclude;

# ------------- the script itself --------------------------------------

# rotating snapshots of /home (fixme: this should be more general)

# step 1: delete the oldest snapshot, if it exists:
if [ -d $SNAPSHOT_RW/weekly.3 ] ; then\
$RM -Rf $SNAPSHOT_RW/weekly.3 ;\
fi;

# step 2: shift the middle snapshots(s) back by one, if they exist
if [ -d $SNAPSHOT_RW/weekly.2 ] ; then\
$MV $SNAPSHOT_RW/weekly.2 $SNAPSHOT_RW/weekly.3 ;\
fi;

if [ -d $SNAPSHOT_RW/weekly.1 ] ; then\
$MV $SNAPSHOT_RW/weekly.1 $SNAPSHOT_RW/weekly.2 ;\
fi;

if [ -d $SNAPSHOT_RW/weekly.0 ] ; then\
$MV $SNAPSHOT_RW/weekly.0 $SNAPSHOT_RW/weekly.1 ;\
fi;


# step 3: make a hard-link-only (except for dirs) copy of the latest snapshot,
# if that exists
if [ -d $SNAPSHOT_RW/daily.6 ] ; then\
$CP -al $SNAPSHOT_RW/daily.6 $SNAPSHOT_RW/weekly.0 ;\
fi;

# step 4: update the mtime of weekly.0 to reflect the snapshot time
$TOUCH $SNAPSHOT_RW/weekly.0 ;


And of course you also need to run the cron jobs ^^

# Make Backups
0 0 * * Sun sh /backup/backup_weekly.sh
15 0 * * * sh /backup/backup_daily.sh
45 0,6,12,18 * * * sh /backup/backup.sh



With these scripts there is just one small issue: you first need to create the daily.0 and weekly.0 folders manually :)
Some thoughts
Submitted by Anonymous (not registered) on Fri, 2006-04-21 20:03.
Three things:

1.) rsync does not just transfer the changed files; it does not even transfer whole changed files, although it sends somewhat more than just the changes. rsync uses an algorithm that splits a file into blocks, computes a checksum for each block, compares the checksums, and then transmits only the changed parts.
For example, I had a MySQL dump of 270 MB. I deleted it and made a new dump with a few changes. rsync noticed that the file had changed, but it didn't transmit the whole 270 MB again, only 25 MB.

2.) Instead of using --exclude I would rather use --exclude-from="/path/to/file", because I think it's much simpler to add exclusions there. Just add one pattern per line. For example, I have this:

/backup/
/bin/
/dev/
/initrd/
/lib/
/lost+found/
/mnt/
/opt/
/proc/
/sbin/
/sys/
/tmp/
/usr/
/var/log/
/var/cache/
/var/spool/
/var/lib/mysql/

I know, I still need to fine-tune that a bit.
3.) I would also add --delete-excluded, for the simple reason that when you exclude something from being backed up, you don't want to keep older versions of it on the backup server any longer. This switch takes care of that.
rdist
Submitted by Anonymous (not registered) on Mon, 2006-05-22 18:38.
Nice post, and there is another way to do this. I just implemented the same functionality for our main website using rdist. The good thing about it is that it checks timestamps automatically, and if files have been modified it replicates them accordingly. Put it in cron, let it email you which files have changed, and you don't have to worry about anything else.
rsync can replace files
Submitted by lazyman (registered user) on Mon, 2008-04-28 02:46.
If you use the flag -W (--whole-file), rsync will copy the entire file, not just the blocks it thinks are modified. http://www.oreillynet.com/linux/cmd/cmd.csp?path=r/rsync
