Sunday, January 6, 2013

rsync - synchronizing two file tree structures

rsync is an amazing and powerful tool for moving files around. I know of people who use it for file transfers, for keeping DNS server records up to date, and, together with sshd, for remotely restarting services when rsync reports a file change (how they do that, I don't know; I'm just told they do it).

This article describes how you can use rsync to synchronize file trees. In my case, I keep one website box as a backup of another: the backup box must contain the same files as the production box, so that I can put it into production should a failure occur.

Overview

rsync can be used in six different ways, as documented in man rsync:

1. for copying local files. This is invoked when neither source nor destination path contains a : separator
2. for copying from the local machine to a remote machine using a remote shell program as the transport (such as rsh or ssh). This is invoked when the destination path contains a single : separator.
3. for copying from a remote machine to the local machine using a remote shell program. This is invoked when the source contains a : separator.
4. for copying from a remote rsync server to the local machine. This is invoked when the source path contains a :: separator or a rsync:// URL.
5. for copying from the local machine to a remote rsync server. This is invoked when the destination path contains a :: separator.
6. for listing files on a remote machine. This is done the same way as rsync transfers except that you leave off the local destination.
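
To make the separator rules above concrete, here is a minimal sketch of a shell function that classifies a single path argument. The name rsync_path_kind is made up for this article, and the logic is a simplification of what rsync really does (it only models the rules listed above, plus the fact that a colon after a slash belongs to a local file name).

```shell
# Classify one rsync path argument by its separator.
# Hypothetical helper, a simplified model of rsync's own parsing.
rsync_path_kind() {
    case "$1" in
        rsync://*) echo "daemon"; return ;;   # rsync:// URL
        *:*) ;;                               # has a colon, inspect it below
        *) echo "local"; return ;;            # no colon at all
    esac
    host=${1%%:*}                             # text before the first colon
    case "$host" in
        */*) echo "local"; return ;;          # colon is part of a file name
    esac
    case "$1" in
        "$host"::*) echo "daemon" ;;          # double colon: rsync server
        *) echo "remote-shell" ;;             # single colon: rsh or ssh
    esac
}
```

For example, rsync_path_kind rsync_bot1@192.168.1.1::web reports a daemon transfer, while rsync_path_kind my_account@192.168.1.1:/www/rsync_tmp/ reports a remote shell transfer.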

I'll only be looking at copying to and from a remote rsync server (4 and 5) and at copying through a remote shell program (2 and 3).

Installing

This was an easy port to install (aren't they all, for the most part?). Remember, I have the entire ports tree, so I did this:

# cd /usr/ports/net/rsync
# make config-recursive
# make install clean distclean
===> The following configuration options are available for rsync-3.0.9:
     POPT_PORT=off "Use popt from devel/popt instead of bundled one"
     SSH=on "Use SSH instead of RSH"
     FLAGS=off "File system flags support patch, adds --fileflags"
     ATIMES=off "Preserve access times, adds --atimes"
     ACL=off "Add backward-compatibility for the --acls option"
     ICONV=on "Add iconv support"
     TIMELIMIT=on "Time limit patch"
===> Use 'make config' to modify these settings

If you don't have the ports tree installed, you have a bit more work to do. As far as I know, you need rsync installed on both client and server, although you do not need to be running rsyncd unless you are connecting to an rsync server (methods 4 and 5).

Setting up the server

Edit /etc/rc.conf
# vi /etc/rc.conf
### enable rsyncd, using IPv4 instead of the default IPv6.
rsyncd_enable="YES"
rsyncd_flags="-4"

If your server only uses IPv4 and you would rather not rely on rc.conf, you can add the "-4" argument to the command in the rc script instead:
# vi /usr/local/etc/rc.d/rsyncd

Change:
command_args="--daemon"

To:
command_args="-4 --daemon"

Edit /usr/local/etc/rsyncd.conf

In this example, we're going to be using a remote rsync server (4). On the production web server, I created the /usr/local/etc/rsyncd.conf file. Its contents are based on man rsyncd.conf.

# vi /usr/local/etc/rsyncd.conf
address = 192.168.100.78

charset = utf-8

uid = www
gid = www

use chroot = no

max connections = 20

syslog facility = local5

log file = /var/log/rsyncd.log

#pid file=/var/run/rsyncd.pid
#lock file=/var/run/rsyncd.lock

max verbosity = 2

transfer logging = yes

[web]
hosts deny = 0.0.0.0/0.0.0.0
hosts allow = 192.168.1.2, 192.168.1.3

auth users = rsync_bot1, rsync_bot2
secrets file = /usr/local/etc/rsyncd.secrets

path = /www/rsync_tmp
comment = whole www (approx 10gb)

read only = no

[home_ftp]
uid = root
gid = ftp
hosts deny = 0.0.0.0/0.0.0.0
hosts allow = 192.168.1.2

path = /home/ftp
comment = ftp files

auth users = home_ftp_user0
secrets file = /usr/local/etc/rsyncd.secrets

read only = no

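Note: you can put a setting at the global level or at the module (section) level. A setting inside a module overrides the global one for that module, as uid and gid do in [home_ftp] above.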
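
Because a setting can live at either level, it is sometimes handy to ask which value a module actually ends up with. The following is only a sketch: rsyncd_setting is a hypothetical helper, it assumes the simple "name = value" layout used in the file above (spaces, no tabs, no line continuations), and it is not how rsync itself parses the file.

```shell
# Print the value a module ends up with for one setting:
# the module's own value if it has one, otherwise the global value.
# usage: rsyncd_setting FILE MODULE KEY   (hypothetical helper)
rsyncd_setting() {
    awk -v want="$2" -v key="$3" '
        /^ *#/ { next }                          # skip comment lines
        /^ *\[/ {                                # module header, e.g. [web]
            section = $0
            sub(/^ *\[/, "", section)
            sub(/\].*$/, "", section)
            next
        }
        {
            eq = index($0, "=")
            if (eq == 0) next                    # not a "name = value" line
            name = substr($0, 1, eq - 1)
            value = substr($0, eq + 1)
            gsub(/^ +| +$/, "", name)
            gsub(/^ +| +$/, "", value)
            if (name != key) next
            if (section == "") global = value    # before the first module
            else if (section == want) { module_value = value; found = 1 }
        }
        END { print (found ? module_value : global) }
    ' "$1"
}
```

With the configuration above, rsyncd_setting /usr/local/etc/rsyncd.conf web uid should print the global www, while the same query for home_ftp should print root.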

Add any local-net entries to your /etc/hosts file so that rsync's name lookup can use that information.
# vi /etc/hosts
192.168.1.2 test2
192.168.1.3 test3

Create the log file
# touch /var/log/rsyncd.log

Automatically rotate the logs
# vi /etc/newsyslog.conf
### rsync
/var/log/rsyncd.log 600 9 100000 * Z

Create the Secret File
# touch /usr/local/etc/rsyncd.secrets

# vi /usr/local/etc/rsyncd.secrets
rsync_bot1:mypass
rsync_bot2:mypass
home_ftp_user0:mypass

Make /usr/local/etc/rsyncd.secrets non-world-readable:
# chmod 440 /usr/local/etc/rsyncd.secrets
# chown root:wheel /usr/local/etc/rsyncd.secrets
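
If you want to double-check the permissions before starting the daemon, a small test like the one below can help. It is a sketch only: secrets_mode_ok is a made-up name, and it checks just the "other" permission bits as printed by ls, not the ownership.

```shell
# Succeed only if "other" users have no permissions on the file,
# judged from the mode string printed by ls (for example -r--r-----).
# Hypothetical helper; it does not check who owns the file.
secrets_mode_ok() {
    mode=$(ls -ld "$1" | cut -c8-10)    # the three "other" permission characters
    [ "$mode" = "---" ]
}
```

A typical use would be: secrets_mode_ok /usr/local/etc/rsyncd.secrets || echo "secrets file is accessible by others"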

You'll note that I'm running rsync as www:www (or rsync:rsync, depending on your situation).

# cat /etc/master.passwd | grep www
www:*:80:80::0:0:World Wide Web Owner:/nonexistent:/usr/sbin/nologin

# cat /etc/group | grep www
www:*:80:

Then I started the rsync daemon and verified it was running by doing this:
# /usr/local/etc/rc.d/rsyncd start

Monitor rsync log file
# tail -F /var/log/rsyncd.log
rsyncd version 3.0.6 starting, listening on port 873

# ps auxww | grep rsync
root 737 0.0 0.1 3128 1348 ?? Is 10:50AM 0:00.00 /usr/local/bin/rsync -4 --daemon

# sockstat | grep rsync
root rsync 1763 4 tcp4 *:873 *:*

Then I verified that I could connect to the daemon by doing this:

# telnet localhost 873
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
@RSYNCD: 21

I determined the port 873 by looking at man rsyncd.conf.

Open the rsync port in the firewall.
# vi /usr/local/etc/ipfw.rules
### rsync
$IPF 260 allow tcp from 192.168.1.2 to any 873 in

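Rotate the rsync log file. This variant of the newsyslog entry shown earlier compresses with bzip2 (J), creates the log file if it is missing (C), and signals the process listed in /var/run/rsyncd.pid after rotation: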
# vim /etc/newsyslog.conf
/var/log/rsyncd.log 600 7 100000 * JC /var/run/rsyncd.pid

# /etc/rc.d/newsyslog restart

Setting up the client

You may have to install rsync on the client as well. There wasn't much to set up on the client; I merely issued the following commands. The rsync server in question is 192.168.1.1.

Create the password file first
# echo "mypass" > /usr/local/etc/rsyncd.passwd_rsync_bot1

Note: put the password ONLY in this file. Do NOT put the username!
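
rsync refuses to use a password file that other users can access, so it is worth creating the file with tight permissions from the start. Here is a minimal sketch; write_password_file is a hypothetical helper name, not part of rsync.

```shell
# Write a one-line password file that only its owner can read.
# usage: write_password_file FILE PASSWORD   (hypothetical helper)
write_password_file() {
    (
        umask 077                       # files created here get mode 600
        printf '%s\n' "$2" > "$1"
    ) || return 1
    chmod 600 "$1"                      # also tighten a file that already existed
}
```

For the setup in this article that would be: write_password_file /usr/local/etc/rsyncd.passwd_rsync_bot1 mypass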

Perform a dry run (trial run):
# rsync -navu --ipv4 --stats --safe-links --password-file=/usr/local/etc/rsyncd.passwd_rsync_bot1 rsync_bot1@192.168.1.1::web /www/rsync_tmp/

Note: the -n parameter makes rsync perform a trial run that doesn't make any changes (and produces mostly the same output as a real run).

Pulling remote files from the remote rsync daemon:
# rsync -avu --ipv4 --stats --safe-links --password-file=/usr/local/etc/rsyncd.passwd_rsync_bot1 rsync_bot1@192.168.1.1::web /www/rsync_tmp/

Note: the -a parameter turns on archive mode. Basically, this causes rsync to recurse into directories, copying all the files and directories and preserving things like symbolic links, permissions, modification times, and ownership on the target. (Note: ownership may not be preserved if you are not logged in as the root user.)

Note: the -v parameter turns on verbose mode.

Note: I added the --safe-links parameter, which makes rsync ignore symbolic links that point outside the copied tree, because without it the symbolic links get messed up.
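
Since the dry run and the real pull differ only in the -n flag, the two commands can be wrapped in one small function. This is a sketch under assumptions: pull_web and the RSYNC variable are names invented here, and the host, module, and paths are simply the ones used in this article.

```shell
RSYNC="${RSYNC:-rsync}"     # can be overridden to inspect the command line

# usage: pull_web dry   -> trial run, nothing is changed
#        pull_web       -> real transfer
# Hypothetical wrapper around the pull command shown above.
pull_web() {
    flags="-avu"
    if [ "${1:-}" = "dry" ]; then
        flags="-navu"                   # -n turns the run into a preview
    fi
    "$RSYNC" "$flags" --ipv4 --stats --safe-links \
        --password-file=/usr/local/etc/rsyncd.passwd_rsync_bot1 \
        rsync_bot1@192.168.1.1::web /www/rsync_tmp/
}
```

Running pull_web dry first and reading its output before running pull_web keeps surprises to a minimum.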

Pushing local files to the remote rsync daemon:
# rsync -avu --ipv4 --stats --safe-links --password-file=/usr/local/etc/rsyncd.passwd_rsync_bot1 /www/rsync_tmp/ rsync_bot1@192.168.1.1::web

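Note: do not forget the trailing slash on the source directory path. With the slash, rsync copies the contents of the directory; without it, rsync creates the directory itself inside the destination.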
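
The effect of the trailing slash can be modelled in a few lines of shell. This is only a rough illustration for ordinary directory paths (rsync_landing_dir is a made-up helper, and it does not call rsync): it prints the directory under which the source's files would end up.

```shell
# Rough model of where the files of SRC end up below DEST.
# usage: rsync_landing_dir SRC DEST   (hypothetical helper, does not run rsync)
rsync_landing_dir() {
    dest=${2%/}                              # drop one trailing slash from DEST
    case "$1" in
        */) echo "$dest" ;;                  # trailing slash: copy the contents
        *)  echo "$dest/$(basename "$1")" ;; # no slash: copy the directory itself
    esac
}
```

So rsync_landing_dir /www/rsync_tmp/ /backup prints /backup, while rsync_landing_dir /www/rsync_tmp /backup prints /backup/rsync_tmp.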

Rsync between two local directories:
# rsync -avu --ipv4 --stats --safe-links --iconv=CP950,utf-8 --exclude='*.svn' --exclude='*.log' /source/path/ /destination/path/

Note: if the filenames on the source server contain Traditional Chinese characters, make sure you include the --iconv option.

To mount a remote Microsoft SMB/CIFS (Samba) shared folder:
# mkdir /path/to/local/mnt
# mount_smbfs -f 400 -d 500 -I 1.2.3.4 //Username@NetBIOS-Server-Name/SharedFolder /path/to/local/mnt

Note: -I 1.2.3.4 tells mount_smbfs not to use the NetBIOS name resolver and to connect directly to the given host, which can be either a valid DNS name or an IP address.

Avoid password prompt:

Use smbutil to generate encrypted password:
# smbutil crypt MyPassword
$$14144762c293a0314e6e1

You need to create a ~/.nsmbrc file as follows:
# vim ~/.nsmbrc

Set username and password as follows:

[NetBIOS-Server-Name:Username]
password=$$14144762c293a0314e6e1

Now mount the directory as follows:
# mount_smbfs -f 400 -d 500 -N -I 10.1.2.3 //Username@NetBIOS-Server-Name/SharedFolder /path/to/local/mnt

The -N option forces mount_smbfs to read the password from the ~/.nsmbrc file. At run time, mount_smbfs reads the ~/.nsmbrc file for additional configuration parameters and a password. If no password is found, mount_smbfs prompts for it. You need to use the -N option when writing a shell script.

Note: ~/.nsmbrc keeps static parameters for connections and other information. See /usr/share/examples/smbfs/dot.nsmbrc for details.

Note: see man mount_smbfs for more options.

Mount the shared folder on system startup

mount_smbfs does not make the mount permanent. If the FreeBSD system is rebooted, you will have to mount the share again. To make the mount occur each time you start the FreeBSD system, you can put an entry in your /etc/fstab file. An example file would look like this:

//myUser@serverName/mySharedFolder /mnt/mySharedFolder smbfs rw,-N,-I192.168.1.1 0 0

If the share is password protected, don't forget to create ~/.nsmbrc with your username and password.

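Example of a symbolic link target munged by the rsync daemon (the symbolic link problem mentioned in the --safe-links note above):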

From:
/some_path/test_link/

To:
/rsyncd-munged//some_path/test_link/

Other resources
# man rsync

# man rsyncd.conf

Connecting to remote rsync server via SSH

ServerA 192.168.1.1 // the remote server; for this method it only needs sshd and rsync installed, not a running rsync daemon.

ServerB 192.168.1.2 // the rsync client used for pulling files from and pushing files to ServerA.

Pulling remote files from the remote rsync server:
ServerB # rsync -e ssh -avu --ipv4 --stats my_account@192.168.1.1:/www/rsync_tmp/ /www/rsync_tmp/

Pushing local files to the remote rsync server:
ServerB # rsync -e ssh -avu --ipv4 --stats /www/rsync_tmp/ my_account@192.168.1.1:/www/rsync_tmp/

Note: do not forget the trailing slash.

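To log in without a password prompt, set up SSH key authentication. Generate a key pair on ServerB:
ServerB # ssh-keygen -t dsa

Copy the public key to the home directory of the account on ServerA:
ServerB # scp ~/.ssh/id_dsa.pub my_account@192.168.1.1:

On ServerA, logged in as my_account, append it to the list of authorized keys:
ServerA # cat ~/id_dsa.pub >> ~/.ssh/authorized_keys2

Verify that ServerB can now log in without a password:
ServerB # ssh my_account@192.168.1.1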
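
Key logins tend to fail silently when the .ssh directory or the key file is writable by others, because sshd's StrictModes check rejects them. The sketch below appends a key and sets the permissions in one go. Note the assumptions: add_authorized_key is a hypothetical helper, and it writes to authorized_keys (the file name current OpenSSH reads by default) rather than the authorized_keys2 name used above.

```shell
# Append a public key to an authorized-keys file and set the
# permissions sshd's StrictModes check expects.
# Run it as the account that will be logged in to.
# usage: add_authorized_key PUBKEY_FILE [SSH_DIR]   (hypothetical helper)
add_authorized_key() {
    ssh_dir=${2:-"$HOME/.ssh"}
    mkdir -p "$ssh_dir" || return 1
    chmod 700 "$ssh_dir" || return 1                 # directory: owner only
    cat "$1" >> "$ssh_dir/authorized_keys" || return 1
    chmod 600 "$ssh_dir/authorized_keys"             # key list: owner only
}
```

On ServerA this would be called as add_authorized_key ~/id_dsa.pub after the scp step.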

=====================================================================

FAQ

Problem: name lookup failed for 192.168.100.157: hostname nor servname provided, or not known

Solution:
> Is there a way to prevent rsyncd from doing reverse IP lookups on
> connecting clients? I didn't find any config option for this.

There is no such option in the rsync code. I'd suggest adding any
local-net entries to your /etc/hosts file so that rsync's name lookup
uses that information.

Problem: secrets file must be owned by root when running as root (see strict modes)

Solution: Make sure that the secrets file on the server and the password file on the client are both owned by root.

Problem: ERROR: module is read only

Solution: add the following line to the /usr/local/etc/rsyncd.conf file.
read only = no
=====================================================================

> > @ERROR: access denied to home from localhost (127.0.0.1)
>
> This is the important bit. This means that you got through
> to the rsync daemon and it rejected your access. The log
> file for the daemon will have more explicit information
> (which is hidden from the client on purpose), but I'd imagine
> that you need to add localhost to the list of acceptable IPs
> that are authorized to connect.

rsyncd.conf
-------------------------------------------------
use chroot = no
strict modes = yes
auth users = backup
secrets file = /etc/rsyncd.secrets
hosts allow = *, localhost, 127.0.0.1, 192.168.180.53
log file = /var/log/rsyncd.log
max verbosity = 2
transfer logging = yes

[home]
path = /cygdrive/d/home/
exclude = .ssh/ .ssh/** TEST/ TEST/**
read only = no
timeout = 600
--------cut-other-modul-configuration---
========================================================================
> I notice that the performance is pretty slow ranges between 2 and 6
> MB/s.
>
> The command that I use from a remote machine to this, the archiving
> host, is:
>
> rsync -avuz -e ssh ./dir archsrv:/archive/DATA
>
>
> An investigation with top shows that the system is cpu bound rather than
> IO bound and that the sshd process is consuming 75% of the CPU compared
> to the rsync process which uses about 25%.
>
> Why is ssh using so much CPU? It seems wrong to me. I would expect rsync
> to be using most as it has to do compression.

ssh has to do encryption, which is pretty CPU-intensive stuff.
You can tell ssh to use an encryption method that is less CPU-intensive,
such as arcfour; your command would look like this:

rsync -avuz -e 'ssh -c arcfour' ./dir archsrv:/archive/DATA

Alternatively, if security permits, use an rsync daemon.
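
Which ciphers are available depends on the OpenSSH version, and newer releases no longer ship arcfour at all. As a hedged sketch, the helper below (pick_cipher is a name invented here) reads the list printed by ssh -Q cipher on standard input and prints the first of your preferred ciphers that the local client supports. Which cipher is actually fastest depends on the hardware, and the server has to support it too.

```shell
# Read supported cipher names on stdin (one per line, as printed by
# "ssh -Q cipher") and print the first preferred name that is supported.
# usage: ssh -Q cipher | pick_cipher PREFERRED...   (hypothetical helper)
pick_cipher() {
    supported=$(cat)
    for want in "$@"; do
        if printf '%s\n' "$supported" | grep -qxF -- "$want"; then
            echo "$want"
            return 0
        fi
    done
    return 1                            # none of the preferred ciphers found
}
```

It could be used like this: cipher=$(ssh -Q cipher | pick_cipher arcfour aes128-ctr) && rsync -avuz -e "ssh -c $cipher" ./dir archsrv:/archive/DATA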
======================================

Reference:
http://gala4th.blogspot.com/2010/02/rsync-synchronizing-two-file-trees_19.html
Passwordless SSH connections - ssh-keygen
http://www.cyberciti.biz/faq/mounting-a-nas-with-freebsd-mount_smbfs/
http://blog.up-link.ro/freebsd-how-to-mount-smb-cifs-shares-under-freebsd/
http://www.freebsddiary.org/rsync.php
http://www.freebsddiary.org/secure-file-copy.php
http://www.freebsddiary.org/ssh-authorized-keys.php
http://blog.weithenn.org/2009/05/freebsdrsync.html
http://www.sanitarium.net/golug/rsync_backups_2010.html
http://slv922.pixnet.net/blog/post/26419814
http://lists.samba.org/archive/rsync/2005-October/013649.html
