Backups done right
Since this is a blog, I've decided to start using it as one, so this is my first "real" blog post. Today I want to talk about backups on Linux.
When you are doing backups on Linux, rsync is often the tool of choice, so I'm using it here, too.
Some Background
So well, I've bought a new router recently. It's an ASUS RT-AC66U, a pretty nice device which offers 802.11ac wifi and has two USB ports for external storage.
What I only found out after the purchase: this router doesn't just come with a locked-down little system; you can easily get ipkg - a package manager for embedded Linux installations - on there. To do so, you only need to attach an external drive and then install "DownloadMaster" from the web interface. You can uninstall it again if you don't need it, but installing it once will bring you the whole ipkg system - and that will stay, even if you uninstall DownloadMaster.
This makes it possible to easily install rsync, cron and ssh. These are the tools I'm using for my backups.
However, the script below is not limited to remote backups over SSH. It is completely generic and just as useful for local backups or any other backup that rsync can do.
The Backup
I've set up a cronjob for all machines on my network to back them up over ssh. To do so, I've written a neat little generic backup script:
from datetime import timedelta
import datetime
import subprocess
import os
import shutil
import sys
from syslog import syslog

if len(sys.argv) < 2:
    print 'usage: ' + sys.argv[0] + ' configfile'
    sys.exit(1)

syslog("Started incremental backup with configuration " + sys.argv[1])
# the config file is plain Python and defines the upper-case settings used below
execfile(sys.argv[1])

today = datetime.date.today()
fntoday = BACKUP_DIR + '/' + PREFIX + '-' + today.isoformat()

cmd = 'rsync --rsh="' + SSH + '" -avhuEiH --delete '
for e in EXCLUDES:
    cmd += '"--exclude=' + e + '" '

# we add the link-dest parameter for all existing previous backups so we always do incremental backups,
# even if "yesterday" no backup could be created (or only partially)
for i in os.listdir(BACKUP_DIR):
    # check the entry inside BACKUP_DIR, not relative to the current working directory
    if os.path.isdir(os.path.join(BACKUP_DIR, i)) and str(i).startswith(PREFIX + '-') and str(i) != PREFIX + '-' + today.isoformat():
        cmd += '--link-dest=' + BACKUP_DIR + '/' + i + ' '

cmd += SOURCE + ' ' + fntoday
cmd += ' > ' + PREFIX + '.log 2>&1'

if PREBACKUP_CMD != '':
    syslog("Executing: " + PREBACKUP_CMD)
    subprocess.call(PREBACKUP_CMD, shell=True)

syslog("Executing: " + cmd)
ret = subprocess.call(cmd, shell=True)
syslog("rsync return code: " + str(ret))
if ret == 24:
    ret = 0  # ignore vanished files

if ret != 0:
    if FAILBACKUP_CMD != '':
        syslog("Executing: " + FAILBACKUP_CMD)
        subprocess.call(FAILBACKUP_CMD, shell=True)
    syslog("Backup failed - exiting without deleting anything")
    sys.exit(0)

if AFTERBACKUP_CMD != '':
    syslog("Executing: " + AFTERBACKUP_CMD)
    subprocess.call(AFTERBACKUP_CMD, shell=True)

syslog("Backup done for " + sys.argv[1])

# names of the snapshots from the last OLD_AGE days (including today)
toKeep = []
for i in range(OLD_AGE):
    toKeep.append(PREFIX + '-' + (today - timedelta(days=i)).isoformat())

# delete all directories starting with PREFIX- which are not in toKeep
for f in os.listdir(BACKUP_DIR):
    if not str(f).startswith(PREFIX + '-'):
        continue
    if str(f) in toKeep:
        syslog("Keeping backup: " + str(f))
        continue
    syslog("Deleting: " + str(f))
    shutil.rmtree(BACKUP_DIR + '/' + str(f))

syslog("Backup of " + sys.argv[1] + " finished")
So what is this script doing?
- It will read a config file (an example can be found below) that specifies how the backup should be done
- It creates a directory for today's backup
- It constructs an rsync command according to the parameters given in the configuration
- It executes the command
- It deletes old backups
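The command-building step in the list above can be sketched in Python 3 roughly like this. It is only a sketch under a few assumptions: `build_rsync_argv` is a hypothetical helper name that does not appear in the script, and it returns an argument list instead of the shell string the script assembles, so the result could be passed to `subprocess.run` without `shell=True`. The rsync flags are the ones the script already uses.

```python
import os
import tempfile
from datetime import date


def build_rsync_argv(source, backup_dir, prefix, today, excludes=(), ssh=None):
    """Return the rsync argument list for today's snapshot (hypothetical helper)."""
    target_name = f"{prefix}-{today.isoformat()}"
    argv = ["rsync", "-avhuEiH", "--delete"]
    if ssh:
        argv.append(f"--rsh={ssh}")
    argv += [f"--exclude={pattern}" for pattern in excludes]

    # Every earlier snapshot directory becomes a --link-dest candidate, so an
    # incremental backup still works if yesterday's run failed or was partial.
    for name in sorted(os.listdir(backup_dir)):
        path = os.path.join(backup_dir, name)
        if (os.path.isdir(path)
                and name.startswith(prefix + "-")
                and name != target_name):
            argv.append(f"--link-dest={path}")

    argv += [source, os.path.join(backup_dir, target_name)]
    return argv


# Small demonstration in a throwaway directory; nothing is copied here,
# the argument list is only printed.
with tempfile.TemporaryDirectory() as demo_dir:
    os.mkdir(os.path.join(demo_dir, "univac-2013-08-11"))
    for arg in build_rsync_argv("root@192.168.1.2:/", demo_dir, "univac",
                                date(2013, 8, 12), excludes=["/proc/*"]):
        print(arg)
```

Building a list rather than a string avoids quoting problems with exclude patterns; the log redirection that the script does in the shell would then have to be done with the `stdout` argument of `subprocess.run`.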
Why is this script cool?
The best part about this script is that it combines the best parts of incremental backups and snapshot-style backups. It will not back up all your data every day (that might be a bit too much for a network/wifi connection), but only copies files that have changed.
At the same time, all files that have not changed will still be available in the daily snapshots. Rsync will hard-link them from older backups. So what you will get are snapshots for each day without using too much space or network bandwidth.
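To see why hard-linked snapshots cost almost no extra space, here is a small Python 3 sketch. It does not call rsync at all: `os.link` merely stands in for what rsync does with an unchanged file when `--link-dest` is given, and the directory and file names are made up for the example. It assumes a filesystem that supports hard links.

```python
import os
import tempfile


def hard_link_demo(root):
    """Link one file into two snapshot directories and report what the filesystem sees."""
    older = os.path.join(root, "univac-2013-08-11")
    newer = os.path.join(root, "univac-2013-08-12")
    os.mkdir(older)
    os.mkdir(newer)

    original = os.path.join(older, "notes.txt")
    with open(original, "w") as handle:
        handle.write("unchanged\n")

    # Stand-in for rsync hard-linking an unchanged file from an older snapshot.
    linked = os.path.join(newer, "notes.txt")
    os.link(original, linked)

    # Both names point at the same inode, so the data is stored only once.
    a, b = os.stat(original), os.stat(linked)
    same_file = (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
    link_count = b.st_nlink

    # Removing the older snapshot's entry does not destroy the newer one.
    os.remove(original)
    with open(linked) as handle:
        survived = handle.read()
    return same_file, link_count, survived


with tempfile.TemporaryDirectory() as root:
    print(hard_link_demo(root))
```

This is also why deleting old snapshot directories is safe: the data of a file only disappears once the last snapshot that links to it has been removed.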
Example configuration
So this is what a configuration file looks like:
# All backups older than that will be deleted when a new backup was successfully done (in days)
OLD_AGE = 7

# just the directory name for backups, will result in e.g. univac-2013-08-12
PREFIX = 'univac'

# the directory where the backups should be stored
BACKUP_DIR = '/mnt/sda1/backups/'

# the source of the backup. might as well be a local directory
SOURCE = 'root@192.168.1.2:/'

# some special ssh-parameters (will be passed as the rsh=... parameter to rsync) required for passwordless authentication
SSH = 'ssh -i /ssh_data/id_rsa -o UserKnownHostsFile=/ssh_data/known_hosts'

# exclude these from the backup
EXCLUDES = ['/run/*', '/var/run/*', '/media/*', '/dev/*', '/proc/*', '/sys/*', '/tmp/*', '/home/ben/.gvfs', '/cdrom/*', '/mnt/*']

# some commands to execute on special events. I'm sending a notification to the machine that's being backed up.
PREBACKUP_CMD = SSH + ' root@192.168.1.2 -C \'DISPLAY=:0.0 sudo -u ben notify-send "Backing up system to router..."\''
AFTERBACKUP_CMD = SSH + ' root@192.168.1.2 -C \'DISPLAY=:0.0 sudo -u ben notify-send "Root-backups done"\''
FAILBACKUP_CMD = ''
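Since the configuration is just Python, it could be loaded under Python 3 (where `execfile` no longer exists) roughly as sketched below. `load_config` and `snapshots_to_keep` are hypothetical helper names, not part of the script; the second one only mirrors how the script turns `OLD_AGE` into the list of snapshot names to keep. Note that the config file is executed as code, so it should only ever come from a trusted place.

```python
from datetime import date, timedelta

# Settings the backup script reads from its configuration file.
REQUIRED = ("OLD_AGE", "PREFIX", "BACKUP_DIR", "SOURCE", "SSH", "EXCLUDES",
            "PREBACKUP_CMD", "AFTERBACKUP_CMD", "FAILBACKUP_CMD")


def load_config(path):
    """Execute a Python-syntax config file and return its settings as a dict."""
    namespace = {}
    with open(path) as handle:
        exec(compile(handle.read(), path, "exec"), namespace)
    missing = [key for key in REQUIRED if key not in namespace]
    if missing:
        raise KeyError("missing settings: " + ", ".join(missing))
    return {key: namespace[key] for key in REQUIRED}


def snapshots_to_keep(prefix, old_age, today):
    """Names of the snapshot directories for the last old_age days, newest first."""
    return [f"{prefix}-{(today - timedelta(days=offset)).isoformat()}"
            for offset in range(old_age)]


# With OLD_AGE = 7 and PREFIX = 'univac', a run on 2013-08-12 keeps the
# snapshots of that day and the six days before it.
for name in snapshots_to_keep("univac", 7, date(2013, 8, 12)):
    print(name)
```

Returning a dict instead of injecting names into the global namespace makes it obvious which settings the script depends on, and a missing one fails early with a readable message.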
About the Blog
I will be trying to post a bit more interesting stuff on this blog - at least if I find the time to do so.
I'm currently working on several pet projects. Some with more, some with less motivation. However, I will try to post about them from time to time (right now I'm also writing my Bachelor's thesis).
So prepare for some information about
- Solving the travelling salesman (optimization) problem with Ant Colony Optimization.
I already have a basic version that solves symmetric or asymmetric TSP instances, including an implementation that runs on the GPU using OpenCL. Optimal or close-to-optimal solutions for up to about 50 cities can be found in only a few seconds. However, what I really want to do is extend this to be able to solve more complex problems such as CVRP, MDVRP, etc.
- I'm writing another automation tool. This one is similar to Apple's Automator and is therefore context-aware. Instead of recording macros, one can combine several plugins and define how data will be piped across them. It is similar to executing shell commands and piping the output of one command to another command. However, it offers a nice GUI with high-level features.
The basic framework already exists and one can create scripts. However, there are only test plugins for now and nothing really useful.
- A general-purpose ownCloud client app for Android that includes synchronization of files, calendars, contacts, apps, passwords, settings, music etc.
It should additionally allow controlling ownCloud instances remotely from the Android phone (e.g. changing settings, etc.).
This is my latest project and has not come very far yet. I'm currently working on the backend/communication with ownCloud. This will probably still take a long time.