HowTo Backup Your Website Files

I can never stress enough how important it is to back up your files. This goes for your website as well. In the “good old days”, it wasn’t a problem to back up your website because, in a sense, your website was the backup. You kept all of the original files on your system at work or home. Boy, how things have changed!

Today, we have web systems that allow you to do all of your web authoring right from within the website itself. This means no local copy, and so no backup! Luckily, there is a way to back up your website files that is not only easy but automatic.

Wonderful wget

For those of you in the *nix world, this is going to be old news. There is an amazing program called wget that allows you to copy all or selected contents of a website, all from the command line!

“Command line!” you exclaim, “This is Windows! We don’t use the command line!” Actually there are a lot of really good programs out there that do not have a pretty graphical user interface (GUI). You use the command line or configuration files to pass information to them.

Wget is one such program. The entire program file is only 252 KB, but it sure is powerful. It has a very extensive list of options. I have listed them here so that you can use this as a future reference or to tweak your backup options:
GNU Wget 1.10.2, a non-interactive network retriever.
Usage: wget [OPTION]... [URL]...

Mandatory arguments to long options are mandatory for short options too.

Startup:
-V, --version display the version of Wget and exit.
-h, --help print this help.
-b, --background go to background after startup.
-e, --execute=COMMAND execute a `.wgetrc'-style command.

Logging and input file:
-o, --output-file=FILE log messages to FILE.
-a, --append-output=FILE append messages to FILE.
-d, --debug print lots of debugging information.
-q, --quiet quiet (no output).
-v, --verbose be verbose (this is the default).
-nv, --no-verbose turn off verboseness, without being quiet.
-i, --input-file=FILE download URLs found in FILE.
-F, --force-html treat input file as HTML.
-B, --base=URL prepends URL to relative links in -F -i file.

Download:
-t, --tries=NUMBER set number of retries to NUMBER (0 unlimits).
--retry-connrefused retry even if connection is refused.
-O, --output-document=FILE write documents to FILE.
-nc, --no-clobber skip downloads that would download to existing files.
-c, --continue resume getting a partially-downloaded file.
--progress=TYPE select progress gauge type.
-N, --timestamping don't re-retrieve files unless newer than local.
-S, --server-response print server response.
--spider don't download anything.
-T, --timeout=SECONDS set all timeout values to SECONDS.
--dns-timeout=SECS set the DNS lookup timeout to SECS.
--connect-timeout=SECS set the connect timeout to SECS.
--read-timeout=SECS set the read timeout to SECS.
-w, --wait=SECONDS wait SECONDS between retrievals.
--waitretry=SECONDS wait 1..SECONDS between retries of a retrieval.
--random-wait wait from 0...2*WAIT secs between retrievals.
-Y, --proxy explicitly turn on proxy.
--no-proxy explicitly turn off proxy.
-Q, --quota=NUMBER set retrieval quota to NUMBER.
--bind-address=ADDRESS bind to ADDRESS (hostname or IP) on local host.
--limit-rate=RATE limit download rate to RATE.
--no-dns-cache disable caching DNS lookups.
--restrict-file-names=OS restrict chars in file names to ones OS allows.
--user=USER set both ftp and http user to USER.
--password=PASS set both ftp and http password to PASS.

Directories:
-nd, --no-directories don't create directories.
-x, --force-directories force creation of directories.
-nH, --no-host-directories don't create host directories.
--protocol-directories use protocol name in directories.
-P, --directory-prefix=PREFIX save files to PREFIX/...
--cut-dirs=NUMBER ignore NUMBER remote directory components.

HTTP options:
--http-user=USER set http user to USER.
--http-password=PASS set http password to PASS.
--no-cache disallow server-cached data.
-E, --html-extension save HTML documents with `.html' extension.
--ignore-length ignore `Content-Length' header field.
--header=STRING insert STRING among the headers.
--proxy-user=USER set USER as proxy username.
--proxy-password=PASS set PASS as proxy password.
--referer=URL include `Referer: URL' header in HTTP request.
--save-headers save the HTTP headers to file.
-U, --user-agent=AGENT identify as AGENT instead of Wget/VERSION.
--no-http-keep-alive disable HTTP keep-alive (persistent connections).
--no-cookies don't use cookies.
--load-cookies=FILE load cookies from FILE before session.
--save-cookies=FILE save cookies to FILE after session.
--keep-session-cookies load and save session (non-permanent) cookies.
--post-data=STRING use the POST method; send STRING as the data.
--post-file=FILE use the POST method; send contents of FILE.

HTTPS (SSL/TLS) options:
--secure-protocol=PR choose secure protocol, one of auto, SSLv2, SSLv3, and TLSv1.
--no-check-certificate don't validate the server's certificate.
--certificate=FILE client certificate file.
--certificate-type=TYPE client certificate type, PEM or DER.
--private-key=FILE private key file.
--private-key-type=TYPE private key type, PEM or DER.
--ca-certificate=FILE file with the bundle of CA's.
--ca-directory=DIR directory where hash list of CA's is stored.
--random-file=FILE file with random data for seeding the SSL PRNG.
--egd-file=FILE file naming the EGD socket with random data.

FTP options:
--ftp-user=USER set ftp user to USER.
--ftp-password=PASS set ftp password to PASS.
--no-remove-listing don't remove `.listing' files.
--no-glob turn off FTP file name globbing.
--no-passive-ftp disable the "passive" transfer mode.
--retr-symlinks when recursing, get linked-to files (not dir).
--preserve-permissions preserve remote file permissions.

Recursive download:
-r, --recursive specify recursive download.
-l, --level=NUMBER maximum recursion depth (inf or 0 for infinite).
--delete-after delete files locally after downloading them.
-k, --convert-links make links in downloaded HTML point to local files.
-K, --backup-converted before converting file X, back up as X.orig.
-m, --mirror shortcut for -N -r -l inf --no-remove-listing.
-p, --page-requisites get all images, etc. needed to display HTML page.
--strict-comments turn on strict (SGML) handling of HTML comments.

Recursive accept/reject:
-A, --accept=LIST comma-separated list of accepted extensions.
-R, --reject=LIST comma-separated list of rejected extensions.
-D, --domains=LIST comma-separated list of accepted domains.
--exclude-domains=LIST comma-separated list of rejected domains.
--follow-ftp follow FTP links from HTML documents.
--follow-tags=LIST comma-separated list of followed HTML tags.
--ignore-tags=LIST comma-separated list of ignored HTML tags.
-H, --span-hosts go to foreign hosts when recursive.
-L, --relative follow relative links only.
-I, --include-directories=LIST list of allowed directories.
-X, --exclude-directories=LIST list of excluded directories.
-np, --no-parent don't ascend to the parent directory.

We are not going to go through all of the different wget options because that would fill an entire book. We just want to focus on a few of them. But first, where do you get wget?

Download wget

Wget is an open source application which means that you can download it directly from the Internet and use the program.

Wget download: http://www.christopherlewis.com/WGet/WGetFiles.htm

As of the writing of this article, the recommended version of wget was 1.10.2 stable. The file I downloaded and use in all of these examples is wget-1.10.1b.zip.

Create Download Folder and Extract wget

You now need to make a place somewhere on your hard drive to store both your backup and the wget files. (It is possible to put the wget files in a separate location from the backup files, but it requires more configuration and I want to make this as simple as possible.) A good location might be C:\WebBackup, and I will use it for my example.

Once you have created your backup folder, double click on the wget-1.10.1b.zip file you downloaded earlier and copy its contents into your backup folder. We now have all of the tools needed to create a backup batch script.
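The folder setup and extraction step can also be scripted. What follows is a minimal Python sketch rather than part of the original procedure. It assumes Python is installed, and the function name prepare_backup_folder and the download location are made up for illustration. It creates the backup folder if it is missing and unpacks the downloaded zip file into it.

```python
import os
import zipfile


def prepare_backup_folder(zip_path, backup_dir):
    """Create backup_dir if it is missing and extract the wget zip into it.

    Returns the names of the files found in the archive.
    """
    os.makedirs(backup_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(backup_dir)
        return archive.namelist()


if __name__ == "__main__":
    # The backup folder follows the article's example.
    # The download location is hypothetical; adjust it to your own system.
    zip_file = r"C:\Downloads\wget-1.10.1b.zip"
    if os.path.exists(zip_file):
        print(prepare_backup_folder(zip_file, r"C:\WebBackup"))
```

Doing the same thing by hand in Windows Explorer works just as well; the sketch is only handy if you set up more than one machine.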

Create Backup Batch Script

“Batch scripting? I thought batch scripting went out with leg warmers and Hammer pants!” Sorry, but batch scripting is alive and well. Any administrator worth his salt is proficient in one or more scripting languages. (For a quick overview of scripting languages, see Infrastructure Automation Primer.)

Open notepad or any other text editor of your choice. Copy the following lines of code into your text editor:

@echo off
wget --output-file=logfile.log --tries=5 --passive-ftp --mirror --ftp-user=username --ftp-password=password ftp://ftp.yourftpsite.com

Note: there are only two lines in the batch file. If there is some line wrapping occurring in your web browser, you may see the wget line as two lines. Everything after the word wget should be on one line.
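Web pages often display a double hyphen as a single typographic dash, and wget does not accept that character as the start of an option. If a pasted command fails for that reason, the dashes have to be retyped. The following Python sketch, which is an addition and not part of the original script, shows one way to repair a pasted line. It assumes every option is a long option, as in the script above, and the function name fix_dashes is made up for illustration.

```python
EN_DASH = "\u2013"
EM_DASH = "\u2014"


def fix_dashes(command):
    """Replace a leading en or em dash on each word with two ASCII hyphens."""
    fixed = []
    for word in command.split(" "):
        if word.startswith((EN_DASH, EM_DASH)):
            # Drop the single typographic dash and put "--" in its place.
            word = "--" + word[1:]
        fixed.append(word)
    return " ".join(fixed)


if __name__ == "__main__":
    pasted = "wget --output-file=logfile.log \u2013tries=5 \u2013passive-ftp \u2013mirror"
    print(fix_dashes(pasted))
```

Words that already start with ASCII hyphens are left alone, so running the function on a correct command line changes nothing.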

Save the file in your backup folder as WebBU.bat (or some other file name that makes sense to you).

Customize Your Batch Script

You now have the basic framework to back up your website. But, it will not work correctly until you customize some parameters. Locate the following parts of the wget line and edit them to match the settings you want.

--output-file=logfile.log
This option creates a log file for you. In this case, the log file generated is called logfile.log. If you do not want a log file generated, you can leave this entire option out. Or you can change the log file name to something else by changing the logfile.log portion of the option.

--tries=5
If wget cannot get a file for some reason (e.g. the Internet connection has gone down), it can be told to retry getting the file a specified number of times. In this instance, it will retry 5 times. To change this, change 5 to any number you want.

There is one exception. If you use 0, wget will keep attempting to copy the file indefinitely until it gets it.

--ftp-user=username
This is your username for your ftp account. Your hosting provider will provide this to you. Simply replace username with your ftp username.

--ftp-password=password
This is your password for your ftp account. Your hosting provider will provide this to you or you will set it up yourself. Simply replace password with your ftp password.

ftp://ftp.yourftpsite.com
This is the actual ftp site that you will be backing up. Replace ftp://ftp.yourftpsite.com with your actual ftp site name.
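The --tries rule described above, a fixed limit with 0 meaning "no limit", can be illustrated with a short Python sketch. This is not how wget is implemented; it only models the rule. It treats the number as the total attempts allowed, which is an assumption of the sketch, and the names fetch_with_tries and make_flaky are made up for illustration.

```python
import itertools


def fetch_with_tries(fetch, tries):
    """Call fetch() until it returns True.

    tries is the number of attempts allowed; 0 means keep going with no limit.
    Returns the attempt number that succeeded, or None if the limit ran out.
    """
    attempts = itertools.count(1) if tries == 0 else range(1, tries + 1)
    for attempt in attempts:
        if fetch():
            return attempt
    return None


def make_flaky(fail_times):
    """Build a stand-in download that fails fail_times times, then succeeds."""
    calls = {"n": 0}

    def fetch():
        calls["n"] += 1
        return calls["n"] > fail_times

    return fetch


if __name__ == "__main__":
    print(fetch_with_tries(make_flaky(2), 5))  # succeeds within the limit
    print(fetch_with_tries(make_flaky(7), 5))  # gives up after 5 attempts
    print(fetch_with_tries(make_flaky(7), 0))  # 0 keeps trying until success
```

The third call shows why 0 deserves care: if the download can never succeed, nothing stops the loop.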

Once you have made these changes, make sure that you save the file again and close your text editor.

Security Alert! Note that all of your ftp account information is now in a single file in clear text. Anyone who gets their hands on this file will have complete access to your website account. Guard it closely.
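One way to keep the account details out of the script file is to read them from environment variables when the backup starts. The Python sketch below is an addition to the article, not the author's method. The variable names WEBBU_FTP_USER and WEBBU_FTP_PASSWORD and the function name build_wget_command are made up for illustration; the wget options are the ones used in the batch script.

```python
import os


def build_wget_command(site, env=None):
    """Return the article's wget command as an argument list.

    The FTP username and password are read from environment variables
    instead of being typed into the script.
    """
    if env is None:
        env = os.environ
    user = env.get("WEBBU_FTP_USER")          # hypothetical variable name
    password = env.get("WEBBU_FTP_PASSWORD")  # hypothetical variable name
    if not user or not password:
        raise RuntimeError("Set WEBBU_FTP_USER and WEBBU_FTP_PASSWORD first.")
    return [
        "wget",
        "--output-file=logfile.log",
        "--tries=5",
        "--passive-ftp",
        "--mirror",
        "--ftp-user=" + user,
        "--ftp-password=" + password,
        site,
    ]


if __name__ == "__main__":
    # Placeholder values, matching the placeholders in the batch script.
    demo_env = {"WEBBU_FTP_USER": "username", "WEBBU_FTP_PASSWORD": "password"}
    print(build_wget_command("ftp://ftp.yourftpsite.com", demo_env))
```

The returned list can be handed to subprocess.run to start wget. The password still appears on wget's command line while it runs, so this only keeps it out of the saved file; it is not a complete fix.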

Make It Happen

Once you have completed your backup script, you’re ready to run it. Do this by double clicking on your batch script. An empty black window should open. This window will be open for the duration of the backup process. The title on the window will display the file that is presently being copied to your system and its progress. It will automatically close when the copy is complete.

You should now have all of your files copied to your computer from your website. Be certain to run this at least once a week, and more often if you make a lot of changes.
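A quick way to notice a missed weekly run is to look at the age of the log file, since wget writes a fresh logfile.log each time the script runs with the --output-file option. The Python sketch below is an addition, not part of the original procedure. It assumes the log sits in the backup folder used in this article, and the function name backup_is_stale is made up for illustration.

```python
import os
import time

WEEK_SECONDS = 7 * 24 * 60 * 60


def backup_is_stale(log_path, now=None, max_age=WEEK_SECONDS):
    """Return True if the log file is missing or older than max_age seconds."""
    if now is None:
        now = time.time()
    if not os.path.exists(log_path):
        # No log at all means the backup has never run from this folder.
        return True
    return now - os.path.getmtime(log_path) > max_age


if __name__ == "__main__":
    # Path follows the article's example folder; adjust it to your own setup.
    print(backup_is_stale(r"C:\WebBackup\logfile.log"))
```

A result of True means it is time to run the batch script again. The check says nothing about whether the last run succeeded, only when it happened.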

Other Considerations

Now, please remember that this only backs up the files stored on your web host. It does not back up things like databases. That, my friends, is another article.

Tim Fehlman

5 Responses to “HowTo Backup Your Website Files”

  1. Geoffrey Smith Says:

    This is very nice, but it gets you only a copy of the website files. What if the website uses a MySQL database? How can that be integrated into this approach?

  2. mikado Says:

    Just set up a cron script that dumps the MySQL database to a file before the wget command runs.

  3. andrea Says:

    Hi! I tried your script but I get wget errors, as it does not seem to support the ftp-user and ftp-password options.
    In fact, typing “wget --help | grep ftp” I don’t see those options.

    I have “GNU Wget 1.9.1”. Should I ask my sysadmin to upgrade?

    Thanks, Andrae

  5. Derek Ralston Says:

    Thanks! Note: When I pasted the batch script code into Notepad, it didn’t work right. I had to retype the dashes by hand because the ones I pasted turned into something weird and gave an error when I tried running the script.
