Backup & Recovery Strategies

Introduction

Backup strategies refer to the way an organization plans to backup and restore databases. The process of backing up and restoring a database is different than deploying a web application, for instance.

The purpose of backing up a database is to minimize the loss of data in cases of a catastrophe or other incidents where the database is affected. It also helps to maintain backups in order to recover from data corruption or data losses due to manual errors, application problems, hacking, etc.

The Backup Process

In order to prepare a strategy for backing up a database (or databases), it is important to understand how often data changes, how critical the information is to operations, and how the database is used within the organization - or by customers and others.

For example, consider a database that supports an online magazine. Articles might only be added or updated once a week. If there are no tables that track reader comments or other activity, it might be practical to only backup the database on a weekly basis. (Remember that advertisements and other types of content may come from other sites - not the primary database).

On the other hand, a database that tracks local events might be updated hourly. In that case, it might be beneficial to backup the database several times a day.

For SQL Server, backing up the database produces one or more backup files. These backup files are NOT the same as the database. They are a specialized, compressed version of the database file(s). In other words, a backup is not just an exact copy of the database files. The backup is a copy of the schema and data, but in a different format.

There are multiple ways to backup a database. Some methods produce a .BAK file. For Azure, the backup process produces a .BACPAC file (Azure does not product .BAK files). The restoration process also differs based on what type of backup file was produced.

Depending on the size of the database, and the configuration of the database server, the backup process may take some time (seconds, to many minutes) and may slow performance of the database. For SQL Server, the backup process is normally quick, although very large databases (10GB+) might take several minutes.

Assessing Backup Needs

To determine what backup strategy to pursue, the following information is helpful:
  1. How large is the database?
  2. How critical is the information for operations?
  3. How often does core/critical information change?
  4. What are the affects to the organization or customers if information is lost?
  5. What are the patterns or cycles of database usage? (daytime only, all day/night, mostly weekdays, 24/7?)
  6. What is an acceptable downtime for restoring the database?
  7. Are there infrastructure issues (server, network, storage, etc.) that might impact backups or restores?
  8. Is the database server on-premise or in the cloud?
Consider two examples:

Online Jewelry Store Example

Based on the questions above, the web site owner might determine that:

  • The database is very small
  • It is very important that the online store products are available
  • Products only change monthly. There are 3-4 sales per week.
  • There is minimal impact if the database is down for part of a day
  • Almost all site visitors use the application in the afternoon and evening, and only sometimes during the weekend.
  • The site could be down for several hours, if needed
  • The database is stored in the cloud. There are only occassional issues with Internet access to the database.

This information suggests that the database can probably be backed up once a day (or night) - which would capture and backup any sales for that day. The backup and restore should be very quick (seconds). Performing the backup at night or early morning would impact users the least. Restoring a database backup could be done at almost any time with an acceptable impact on the organization and users.

Restaurant Example

Based on the questions above, the web site owner might determine that:

  • The database is medium size. There are several hundred items (and images) available for online ordering
  • It is "mission critical" that the food items are available for ordering, especially during peak hours (1:00pm - 11:00pm, every day)
  • Items change daily. Menus and specials are altered every day. There are over 200 orders placed daily - and more on weekends
  • There is substantial impact if the database is down during peak hours
  • All visitors use the application during peak hours. Only managers use the database on off hours to manage menus and specials.
  • The site cannot be down during peak hours
  • The database is stored in the cloud. There are never issues with Internet access from the restaurant.

This information suggests that that database should be backed up multiple times day. The backup and restore may take several minutes. A good strategy might be to perform a backup before the peak period (11:00am), and every half-hour from 1:00pm to 10:00pm, and then a final daily backup after 11:00pm. Restoring a database backup could be done during off-peak hours with little impact, but any restore during peak hours (although necessary) might be problematic. This strategy would mean no more than 30 minutes of data would be lost if there was a catastrophic issue with the database.

Backup Types

There are three(3) basic backup types

  • Full/Master: backs up the entire database
  • Differential: backs up the changes since the last full backup
  • Log File: backs up changes since the last differential OR full backup

Full/Master Backup

This backups the entire database at a point in time. For example, if the backup is made Monday at midnight, and the server crashes and corrupts the database on Tuesday afternoon, the database could be restored to exactly how it existed on Monday at midnight.

Most organizations perform a full backup at least once a week. If practical, it is also common to make a full backup every night (or at a convenient time that does not impact anything).

Differential Backup

This backs up all changes since the last full backup. For example, and organization that makes a full backup on Sunday night, might make a differential backup on Tuesday night - to capture any changes that occur on Monday and Tuesday. Differential backups are faster and smaller, which makes them ideal for very large databases.

A differential backup is restored in conjunction with a full backup. It is not restored by itself - meaning the backup process is faster, but the recovery process is slower. In the example above, if a differential backup is made on Tuesday night and the server crashes on Thursday, the master backup from Sunday is first restored, then the differential is restored to recover the database to how it looked on Tuesday night. Notice that activity since Tuesday night would not be restored.

Log File Backup

This backs up all changes since the last backup (full, differential or log file backup). Log file backups are very fast. SQL Server maintains a transaction log that records every change made to the schema or data. For example, and organization that makes a full backup on Sunday night, might make log file backups every four hours. They might also make a differential backup on Tuesday night.

A log file backup is restored in conjunction with a full backup and any differential backup. In the example above, if the server crashes on Thursday, the master backup from Sunday is first restored, then the differential is restored from Tuesday night, then each log file backup after the Tuesday night differential is restored. The log file backups from Monday and Tuesday are not needed because there was a differential made on Tuesday night. This strategy would allow the organization to lose no more than four hours of data changes.

The Tail Log

If a database is completly corrupted or lost, it can only be restored based on the most current backup.

If the database is still available, but needs to be backed up or moved, for example, the current transaction log (the "tail log") can be backed up immediately to the current point in time. This allows the database to be completely restored to it's current state when it is recreated or moved.

Recovery Planning

There is more to a backup strategy than deciding when and how to schedule backups. The process to recover the data must also be considered. The recovery process is often much more time-consuming. It might involve deploying a new server and recreating the database, as well as applying all the backups.

Consider an organization that makes a single Full backup on Sunday at midnight and then Log file backups every six hours, all week long. If the database is lost on Friday afternoon, there are probably 20 backups to apply: one full backup from Sunday night, and 19 backups covering every four hours from Monday to Friday noon. That process might take several hours, even if the server and database are already prepared.

Mission Critical Data and Operations

A database backup strategy is a critical part of maintaing operations in case of hardware failure, hacking, or other events. However, a backup strategy should not be the only process in place for many organizations.

For organizations that rely on up-to-the-second data, backups are not sufficient. If a server crashes and/or a database is corrupted, recovery takes time. Consider the situation of an emergency call center, a hospital, or a large 24/7 factory. There isn't time to stop and restore a database.

For situations where it is not practical to stop operations to restore a database, organizations must deploy server clusters, server farms, shared storage systems, and other strategies to make sure that if any part of the infrastructure fails (network, server, HVAC, database, etc.) that the entire system can failover to backup systems.

In the scenario where backup systems will immediately kick in, the recovery process can then be performed on the companion system that is offline.

A clustered server arrangement involved multiple servers that share data storage. For example, two identically configured servers both share a networked disk array that contains the database. If server A is currently active and fails, server B (which is also running and connected) kicks in and immediately begins handling all database requests. Server A can then be repaired, replaced, or updated, as needed. In such a scenario, the database itself is striped across multiple physical hard drives, so a hard drive failure does not affect the database file(s).

Backup Files

Backup files are useful only if they are stored properly, are available when needed, and not damaged or corrupted. Many organizations keep multiple backup files on a rotating basis. For example, if a full backup is made every night, it would be common for the organization to retain and least seven days of backup files. If the most current file is corrupt for some reason, the previous version would hopefully be good. Some organizations might also retain the last full backup file made for each of the last twelve months.

Backups are also maintained for an extended period so that data was corrupted in the near term could be recovered. For example, if all sales for the prior month were lost or overwritten, an older backup file could be used to recover the lost data.

Lastly, if backups are normally stored on a local system, it is important that backups copies be moved to an external location, or stored in the cloud. If backup files are normally made and stored on a cloud system, it is important to also make local copies - in case the cloud provider fails or encounters a catastophic failure. In general, backup files should never be retained on the same system (cloud or premise) on which they were created.