In this article, we will set up automated backups using Duplicity and Amazon S3.
Installation
Run the following commands to install the latest duplicity and the python-boto library (duplicity needs it for Amazon S3).
apt-add-repository ppa:duplicity-team/ppa
apt-get update
apt-get install duplicity
apt-get install python-pip
pip install boto
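Before moving on, you can quickly confirm that both duplicity and boto are available:
duplicity --version
python -c "import boto; print boto.__version__"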
Amazon S3
S3 Bucket
Create an Amazon S3 bucket to store backup data. If you have multiple servers, you may set the bucket name to the server's FQDN.
For this tutorial, we created Amazon S3 buckets in the US Standard region only, as most of our servers are located on the US East Coast. You may need to change the Amazon S3 endpoint URL if you pick some other region. This might help.
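For example, the S3 endpoint portion of the backup destination URL used later in this article changes with the region (example.com below stands for your bucket name):
# US Standard bucket (used in this tutorial)
DEST="s3://s3.amazonaws.com/example.com/"
# Bucket in another region, e.g. EU (Ireland)
DEST="s3://s3-eu-west-1.amazonaws.com/example.com/"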
API Access
Next, create an IAM user with API credentials. You will need the Access Key ID and Secret Access Key later. Setting up a console password is not recommended, as we will be using these credentials in the backup script later on (in plaintext).
As we back up many client servers, we create one IAM user per server, with access to only one S3 bucket. You can generate access policies for an IAM user using this Amazon article.
In our case, we use the following policy (credit to Mike Ferrier):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:ListAllMyBuckets",
            "Resource": "arn:aws:s3:::*"
        },
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::BUCKET_NAME",
                "arn:aws:s3:::BUCKET_NAME/*"
            ]
        }
    ]
}
Make sure you replace BUCKET_NAME with your actual S3 bucket name.
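Since boto is already installed, a quick way to sanity-check the new IAM credentials against the bucket is a short snippet like the one below (the key values and bucket name are placeholders):
python - <<'EOF'
import boto
# Connect with the IAM user's credentials and try to open the bucket;
# this raises S3ResponseError if the keys or the policy are wrong
conn = boto.connect_s3("ACCESS_KEY_ID", "SECRET_ACCESS_KEY")
print conn.get_bucket("BUCKET_NAME")
EOF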
GPG Key (optional)
By default, duplicity uses symmetric encryption for backups. This means using a simple password, which is fine in most cases.
Still, if you are paranoid about security, you can use asymmetric i.e. public-private key encryption instead.
We have a complete tutorial which you can follow to generate GPG keys. Make sure you remember the passphrase.
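If you only need a key ID for the backup script, generating and listing keys looks roughly like this (the key ID shown is a placeholder):
gpg --gen-key
gpg --list-keys
# pub   2048R/A1B2C3D4 2015-01-01
# Here A1B2C3D4 is the key ID you would use as GPG_KEY below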
Backup Script
Create Backup Script
Duplicity works with environment variables, so we need to create a script to set the correct parameters. This script can also be used with cron for automated backups.
Create backup script file:
vim /usr/local/sbin/backup.sh
Add the following code to it:
#!/bin/bash
# Export some ENV variables so you don't have to type anything
export AWS_ACCESS_KEY_ID=""
export AWS_SECRET_ACCESS_KEY=""
export PASSPHRASE=""
# Your GPG key ID (leave empty to use plain symmetric encryption with PASSPHRASE)
GPG_KEY=""

# Use GPG (asymmetric) encryption and signing only when a key ID is set
ENCRYPTION=""
if [ -n "${GPG_KEY}" ]; then
    ENCRYPTION="--encrypt-key=${GPG_KEY} --sign-key=${GPG_KEY}"
fi

# The S3 destination followed by bucket name
DEST="s3://s3.amazonaws.com/example.com/"
# Set up some variables for logging
LOGFILE="/var/log/duplicity/backup.log"
DAILYLOGFILE="/var/log/duplicity/backup.daily.log"
FULLBACKLOGFILE="/var/log/duplicity/backup.full.log"
HOST=`hostname`
DATE=`date +%Y-%m-%d`
MAILADDR="admin@example.com"
TODAY=$(date +%d%m%Y)
is_running=$(ps -ef | grep duplicity | grep python | wc -l)
if [ ! -d /var/log/duplicity ]; then
    mkdir -p /var/log/duplicity
fi

if [ ! -f $FULLBACKLOGFILE ]; then
    touch $FULLBACKLOGFILE
fi
if [ $is_running -eq 0 ]; then
    # Clear the old daily log file
    cat /dev/null > ${DAILYLOGFILE}

    # Trace function for logging, don't change this
    trace () {
        stamp=`date +%Y-%m-%d_%H:%M:%S`
        echo "$stamp: $*" >> ${DAILYLOGFILE}
    }

    # How long to keep backups for
    OLDER_THAN="1M"

    # The source of your backup
    SOURCE=/

    # Run a full backup on the first day of the month, unless one was already logged today
    FULL=
    tail -1 ${FULLBACKLOGFILE} | grep ${TODAY} > /dev/null
    if [ $? -ne 0 -a $(date +%d) -eq 1 ]; then
        FULL=full
    fi

    trace "Backup for local filesystem started"

    trace "... removing old backups"
    # --force is needed so remove-older-than actually deletes old backup sets
    duplicity remove-older-than ${OLDER_THAN} --force ${DEST} >> ${DAILYLOGFILE} 2>&1

    trace "... backing up filesystem"
    duplicity \
        ${FULL} \
        ${ENCRYPTION} \
        --include=/var/rsnap-mysql \
        --include=/var/www \
        --include=/etc \
        --exclude=/** \
        ${SOURCE} ${DEST} >> ${DAILYLOGFILE} 2>&1

    trace "Backup for local filesystem complete"
    trace "------------------------------------"

    # Send the daily log file by email
    #cat "$DAILYLOGFILE" | mail -s "Duplicity Backup Log for $HOST - $DATE" $MAILADDR
    BACKUPSTATUS=`cat "$DAILYLOGFILE" | grep Errors | awk '{ print $2 }'`
    if [ "$BACKUPSTATUS" != "0" ]; then
        cat "$DAILYLOGFILE" | mail -s "Duplicity Backup Log for $HOST - $DATE" $MAILADDR
    elif [ "$FULL" = "full" ]; then
        echo "$(date +%d%m%Y_%T) Full Backup Done" >> $FULLBACKLOGFILE
    fi

    # Append the daily log file to the main log file
    cat "$DAILYLOGFILE" >> $LOGFILE

    # Reset the ENV variables. Don't need them sitting around
    unset AWS_ACCESS_KEY_ID
    unset AWS_SECRET_ACCESS_KEY
    unset PASSPHRASE
fi
Make sure you substitute correct values for:
- AWS_ACCESS_KEY_ID – IAM user's access key ID
- AWS_SECRET_ACCESS_KEY – IAM user's secret access key
- PASSPHRASE – the symmetric encryption password, or the GPG key passphrase if asymmetric encryption is used
- DEST="s3://s3.amazonaws.com/example.com/" – Amazon S3 endpoint and bucket. example.com is the bucket name here.
If you are using GPG, then make sure:
- PASSPHRASE is your GPG key passphrase
- GPG_KEY is set to your actual GPG key ID
When GPG_KEY is left empty, the --encrypt-key and --sign-key options are skipped and the script falls back to plain symmetric encryption using PASSPHRASE.
Set the backup script's permissions and owner:
chown root:root /usr/local/sbin/backup.sh
chmod 0700 /usr/local/sbin/backup.sh
Run Backup
Run the backup script to verify that it is actually backing up:
bash /usr/local/sbin/backup.sh
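The initial full backup can take a while. You can watch progress in the daily log that the script writes:
tail -f /var/log/duplicity/backup.daily.log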
Setup Cron
After the first full backup, duplicity uploads only incremental changes, so frequent runs stay lightweight.
You can add the following line to cron (crontab -e):
0 * * * * /usr/local/sbin/backup.sh
The above cron entry will run our backup script hourly.
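To see what duplicity has stored so far, you can query the backup collection directly, using the same credentials and destination as in backup.sh (values below are placeholders):
export AWS_ACCESS_KEY_ID="IAM_USER_ACCESS_KEY_ID"
export AWS_SECRET_ACCESS_KEY="IAM_USER_SECRET_ACCESS_KEY"
export PASSPHRASE="GPG_OR_SOME_OTHER_PASSPHRASE"
duplicity collection-status s3://s3.amazonaws.com/example.com/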
Restore Script
A backup is useless without easy restore support.
Create restore script:
vim /usr/local/sbin/restore.sh
Add the following code to it:
#!/bin/bash
# Export some ENV variables so you don't have to type anything
export AWS_ACCESS_KEY_ID="IAM_USER_ACCESS_KEY_ID"
export AWS_SECRET_ACCESS_KEY="IAM_USER_SECRET_ACCESS_KEY"
export PASSPHRASE="GPG_OR_SOME_OTHER_PASSPHRASE"
# The S3 destination followed by bucket name
DEST="s3://s3.amazonaws.com/example.com/"
# Your GPG key
#GPG_KEY=YOUR_GPG_KEY
if [ $# -lt 3 ]; then
    echo "Usage: $0 <date> <file> <restore-to>"
    exit 1
fi

duplicity \
    --restore-time $1 \
    --file-to-restore $2 \
    ${DEST} $3
# Reset the ENV variables. Don't need them sitting around
unset AWS_ACCESS_KEY_ID
unset AWS_SECRET_ACCESS_KEY
unset PASSPHRASE
Set the restore script's permissions and owner:
chown root:root /usr/local/sbin/restore.sh
chmod 0700 /usr/local/sbin/restore.sh
Usage:
restore.sh <date> <file> <restore-to>
It is strongly recommended that you restore into a new folder, different from the actual file/folder you are trying to restore.
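For example, to restore the copy of a file as it existed on a given date into a scratch directory (the date and paths below are hypothetical; note that the file path is relative to the backup root, without a leading slash):
mkdir -p /tmp/restore
restore.sh 2015-01-01 var/www/example.com/index.php /tmp/restore/index.php
The date can be anything duplicity's --restore-time accepts, such as 2015-01-01 or an interval like 3D (three days ago).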
Verify Script
Ideally, there should be a mechanism to check backup integrity periodically. Duplicity's verify command does this: it compares the data in the backup against the local source and reports any files that differ.
Create verify script:
vim /usr/local/sbin/verify.sh
Add the following code to it:
#!/bin/bash
# Export some ENV variables so you don't have to type anything
export AWS_ACCESS_KEY_ID="IAM_USER_ACCESS_KEY_ID"
export AWS_SECRET_ACCESS_KEY="IAM_USER_SECRET_ACCESS_KEY"
export PASSPHRASE="GPG_OR_SOME_OTHER_PASSPHRASE"
# The S3 destination followed by bucket name
DEST="s3://s3.amazonaws.com/example.com/"
# The source of your backup
SOURCE=/
# Your GPG key
#GPG_KEY=YOUR_GPG_KEY
# Note: selection options are matched in order, so the /root/.cache exclude
# must come before the /root include to take effect
duplicity verify -v4 \
    --exclude=/root/.cache \
    --include=/var/www \
    --include=/etc \
    --include=/home \
    --include=/root \
    --exclude=/** \
    ${DEST} ${SOURCE}
# Reset the ENV variables. Don't need them sitting around
unset AWS_ACCESS_KEY_ID
unset AWS_SECRET_ACCESS_KEY
unset PASSPHRASE
Set the verify script's permissions and owner:
chown root:root /usr/local/sbin/verify.sh
chmod 0700 /usr/local/sbin/verify.sh
Usage:
verify.sh
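Until a proper cronjob is added (see the TODO list below), one interim option is a weekly cron entry that mails the verify output to the sysadmin (the schedule and address are just examples):
0 3 * * 0 /usr/local/sbin/verify.sh 2>&1 | mail -s "Duplicity verify report" admin@example.com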
TODO
- Publish mysqldump script and maybe call it from the backup script (mysqldump followed by backup); a rough sketch is given below.
- Add support for running restore.sh without a date. That should restore the most recent version of a file.
- Add a verify script with a cronjob. Not sure if it can be used for testing backup integrity. It would be good if it emailed the sysadmin about possible backup corruption.
- Tweak file selection to exclude backups and hidden files/folders. Useful reading.
- Move setting and unsetting of credentials to a common file, OR create a single script file with different functions for backup, restore, verify, etc. I guess this will be done in EasyEngine.
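For the first TODO item, a minimal sketch of such a mysqldump script might look like the following; it dumps all databases into /var/rsnap-mysql (already included in backup.sh) and assumes MySQL credentials are available via /root/.my.cnf:
#!/bin/bash
# Hypothetical pre-backup dump script; adjust paths and credentials to your setup
DUMP_DIR=/var/rsnap-mysql
mkdir -p ${DUMP_DIR}
mysqldump --all-databases --single-transaction | gzip > ${DUMP_DIR}/all-databases.sql.gz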