Automatic S3 backups
posted 2019.07.02 by Clark Wilkins, Simplexable

I want to share a (relatively basic) shell script that solved a really important problem for us. As you may know, we provide warehouse/order management services via mystockd.com and job management via myservicd.com. The service package for both sites includes an independent database for each client and nightly backups.

In times past, I backed up the relatively small client list for INSIGHT to a remote 3TB drive. That's no longer tenable with these new services. I needed something easily extensible and completely cloud-based, where our Amazon EC2 servers could back up to our Amazon S3 repositories.

A bit of research into the AWS Command Line Interface (CLI) tools led to the answer.

#!/bin/bash

# very important to define the home directory and
# where to find the IAM credentials for access

export HOME=/root
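# (a quoted ~ is not expanded, so use $HOME, and export the
# variable so the aws command can see it)
export AWS_CONFIG_FILE="$HOME/.aws/config"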

# add today's source code

now=$(date +"%H.%M.%S")
echo "$now: adding current servicd source code before creating backups"
rm -rf /root/backups/servicd/*
cd /root/backups/servicd/ || exit 1
mkdir source
cp -R /var/www/html/servicd/ source/

# strip out files not specific to the SaaS part of servicd

rm -rf ~/backups/servicd/source/servicd/[long list of files that are on the public side of the site]

# prepare to dump

now=$(date +"%H.%M.%S")
echo "$now: starting backup of servicd clients"

# this is the list of active servicd clients

input="/root/backups/servicd_list"

# run through the list, getting one name at a time

while read -r line

do

now=$(date +"%H.%M.%S")
echo "$now: backup of $line"

# remove prior archives and dumps

rm -f ./*.gz
rm -f ./*.dmp

# dump databases (add to servicd_list as needed)

pg_dump "$line" > "$line.dmp"

# compress the archive

now=$(date +"%H.%M.%S")
echo "$now: compressing backup for $line"
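# (exclude the archive itself, since it is written inside the directory being archived)
tar -zcvf "$HOME/backups/servicd/$line.gz" --exclude="$line.gz" "$HOME/backups/servicd/"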

now=$(date +"%H.%M.%S")
echo "$now: ready to back up to S3"

# dump to S3 archive

date=$(date +"%Y.%m.%d")
/root/bin/aws s3 cp "/root/backups/servicd/$line.gz" "s3://backups.myservicd.com/$line/$date.gz"
now=$(date +"%H.%M.%S")
echo "$now: $line backup is done"

done < "$input"

now=$(date +"%H.%M.%S")
echo "$now: all backups are done"
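
The script repeats the same two-line timestamp pattern before every message. One possible tidy-up, as a minimal sketch only: a small helper that prints the time prefix itself. The function name log is hypothetical and not part of the script above; the time format is the one the script already uses.

```shell
#!/bin/bash

# Hypothetical helper (not in the original script): print a message
# prefixed with the current time in the same HH.MM.SS format.
log() {
    printf '%s: %s\n' "$(date +"%H.%M.%S")" "$*"
}

# Each now=... / echo pair then collapses to a single call:
log "starting backup of servicd clients"
log "all backups are done"
```

Inside the loop, a call such as log "backup of $line" would replace the two lines that announce each client.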

Here are the main points:

  • The list of active clients is a single text file of their database names. Very easy to manage.
  • When the script runs, it first grabs a copy of the source directory and removes all public-facing (non-SaaS) files.
  • It iterates through the active clients, dumping each client's database first.
  • A gzip-compressed tar archive is made with their database dump and the current source code.
  • The compressed archive is backed up to an S3 repository in a sub-directory tied to their database name.
  • To add a new backup, all we have to do is add the database name to the list file.
  • To stop backing up a client (now inactive), we remove their database name from the list file.
  • The script runs as a crontab job every night.
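
Because the client list is just a text file read one line at a time, a stray blank line would be treated as a database name. The sketch below shows one way the read loop could skip blank lines and lines starting with #, which would also allow a client to be paused by commenting the name out instead of deleting it. This is an assumption about how the list might be handled, not what the script above does; the function name active_clients and the client names are made up for illustration.

```shell
#!/bin/bash

# Hypothetical helper: print each database name from a list file,
# skipping blank lines and lines that start with '#'.
active_clients() {
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            ''|'#'*) continue ;;
        esac
        printf '%s\n' "$line"
    done < "$1"
}

# Demo with a throwaway list file (client names are made up).
list=$(mktemp)
printf '%s\n' 'client_a' '' '# client_b is inactive' 'client_c' > "$list"
active_clients "$list"
rm -f "$list"
```

The backup loop could then read from the filtered names rather than from the raw file.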

That's it. We hope this is helpful to others. Regards!