From natural disasters to ransomware, every organization faces the potential for unexpected downtime of its web-based systems. Preparation is key to ensuring your business can withstand a significant disruption, and determining your Recovery Time Objectives is where it all starts. Recovery Time Objective (RTO) and Recovery Point Objective (RPO) calculations are integral to a successful Disaster Recovery Plan and to maintaining business continuity. But what exactly are they? RTO is the maximum amount of time a computer, system, network, or application can take to recover after an outage or data loss without detrimental effect on business operations and service-level agreements (SLAs). RPO is similar, but it measures data rather than downtime: it is the maximum amount of data loss, expressed as a span of time, that your organization can handle without significant effect on operations.
While RTO has significant implications for the continuity of business operations, the RTO values for different systems and applications also directly affect the timing of backups and your recovery strategy. Because RTO is the amount of time between an outage or data loss event and the point at which systems come back online, it needs to be factored into your data backup plans. Keep reading for a deep dive into key aspects of Recovery Time Objectives.
The objective? A return of business systems and data to pre-disruption status. The timeline? It depends. Your RTOs will differ across applications and data, depending on how critical each system or data set is, but those RTOs are the foundation upon which your Recovery Plan is built. It’s important to balance the expense of shortening an RTO against the importance of timely restoration. Calculating RTO includes determining how much each hour (or minute) of downtime is going to cost the organization, as well as the cost and steps necessary to restore the system or recover the data. What’s most important is making sure you get it right! This will help determine your strategy, but you will also want to know the recovery times you can realistically achieve given your volume of data and your available capabilities. If your mission-critical system is down, you want to know how quickly you can bring it back up.
It would seem that the obvious answer for the perfect RTO is “near-zero.” But that’s not necessarily the case. You may spend excessive time and money upfront on the steps and processes to ensure an RTO that isn’t necessary for your business. A company that can easily revert to paper invoices and filing for a time doesn’t need to spend as much money to ensure a short RTO as a company that is entirely web-based. On the other hand, if you rely more heavily on online systems and data or have strict SLAs you need to meet, near-zero RTO may be worth the additional effort and cost. If your RTO is too long, even if you “successfully” meet it after a disruption, you may discover the outage has done irreparable damage in time, money, and reputation.
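To make that trade-off concrete, here is a minimal Python sketch that compares recovery options by adding the cost of each solution to the cost of being down for its full RTO. The option names, RTO values, and dollar figures are all hypothetical, invented purely for illustration, and a real Business Impact Assessment would weigh far more than these two numbers.

```python
# Hypothetical recovery options: (achievable RTO in hours, cost of the solution in dollars).
# Every figure here is invented for illustration only.
OPTIONS = {
    "offsite tape restore": (48.0, 5_000),
    "disk-based backup": (8.0, 40_000),
    "instant recovery": (0.25, 150_000),
}


def total_cost(rto_hours, solution_cost, downtime_cost_per_hour):
    """Solution cost plus the cost of being down for the full RTO."""
    return solution_cost + rto_hours * downtime_cost_per_hour


def cheapest_option(options, downtime_cost_per_hour):
    """Return the name of the option with the lowest combined cost."""
    return min(
        options,
        key=lambda name: total_cost(*options[name], downtime_cost_per_hour),
    )


# A fully web-based company that loses $10,000 per hour of downtime
print(cheapest_option(OPTIONS, 10_000))  # disk-based backup
# A company that can fall back on paper invoices and loses $500 per hour
print(cheapest_option(OPTIONS, 500))     # offsite tape restore
```

With these made-up numbers, the shortest RTO is not the cheapest choice for either company, which is the point of the paragraph above: the right RTO depends on what an hour of downtime actually costs you.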
Finding the balance between the cost of recovering data and restoring systems and the cost of having those systems down requires a thoughtful and thorough Business Impact Assessment and Disaster Recovery Plan. Whatever the length, ensuring your RTOs are achievable and fit for purpose is key to any company’s recovery from disruption. If you create an RTO that’s impossible to meet, you could end up with unrecoverable damage to your business. Your RTOs must also factor in your RPOs, because backup and recovery plans are based on those. If you consider the timeline of a data loss event, the RPO timer works backward from the time of the incident, and the RTO timer works forward. Remember also that these are maximum acceptable values. If you have a workload with an RPO of 6 hours, you must have a recoverable backup taken at least every 6 hours. Backups often involve transmitting significant volumes of data from primary to backup storage, so the time the backup itself takes must be considered too. If your workload demands a backup every hour, but your actual backup process takes 80 minutes, you cannot possibly hit your RPO.
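The backup arithmetic in that paragraph can be sketched as a quick feasibility check. This is a deliberately simplified model, assuming only two conditions: backups are scheduled at least once per RPO window, and each backup finishes before the next one is due. The function name and the 30-minute duration in the first example are assumptions, not figures from any real system.

```python
from datetime import timedelta


def rpo_is_achievable(rpo, backup_interval, backup_duration):
    """Simplified check: backups must be scheduled at least once per RPO
    window, and each backup must finish before the next one is due."""
    return backup_interval <= rpo and backup_duration <= backup_interval


# RPO of 6 hours, backups every 6 hours, each taking 30 minutes (assumed duration)
print(rpo_is_achievable(timedelta(hours=6), timedelta(hours=6), timedelta(minutes=30)))  # True
# Hourly backups that each take 80 minutes can never keep up
print(rpo_is_achievable(timedelta(hours=1), timedelta(hours=1), timedelta(minutes=80)))  # False
```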
That’s a lot of verbiage. An example might help!
The HR department of Zaffre Fashion Group have determined that the maximum amount of data they could lose in the event of a disaster for their annual leave booking system is 6 hours; this is their RPO. Because of this, they back their application data up at midnight, 6am, noon, and 6pm every day. They determine that in the event of a disaster, they need to restore service within 2 hours; this is their RTO. If they discover a data loss event at 9am, the RTO clock starts then, so they can recover from the 6am backup and must be back up and running by 11am.
Recovery Point Objective: 6 hours
Recovery Time Objective: 2 hours
Last Scheduled Data Backup Complete: 6:00am
Next Scheduled Data Backup: 12:00pm
Data Loss Incident: 9:00am
Service Restored: 11:00am at the latest
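The same timeline can be worked through in a few lines of Python. This is only a sketch of the arithmetic in the Zaffre example: the function name is hypothetical and the calendar date is arbitrary, since the example only gives times of day.

```python
from datetime import datetime, timedelta


def recovery_timeline(last_backup, incident, rpo, rto):
    """Work backward from the incident for RPO and forward for RTO."""
    data_lost = incident - last_backup
    return {
        "data_lost": data_lost,              # changes made since the last backup
        "within_rpo": data_lost <= rpo,      # did the backup schedule hold up?
        "restore_deadline": incident + rto,  # latest acceptable restore time
    }


day = datetime(2024, 1, 1)  # arbitrary date; only the times of day matter
timeline = recovery_timeline(
    last_backup=day.replace(hour=6),
    incident=day.replace(hour=9),
    rpo=timedelta(hours=6),
    rto=timedelta(hours=2),
)
print(timeline["data_lost"])                           # 3:00:00
print(timeline["within_rpo"])                          # True
print(timeline["restore_deadline"].strftime("%H:%M"))  # 11:00
```

Three hours of data are lost, which is inside the 6-hour RPO, and service must be restored by 11:00am to stay inside the 2-hour RTO.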
Another term you may come across when researching RPO and RTO is RTA, or Recovery Time Actual. This is the amount of time that actually elapses between the detection of a data loss incident and the restoration of full service. In many cases, this will be lower than the RTO, and it should never exceed the RTO. In the above example, Zaffre Fashion Group’s IT department are on point, and were able to restore service at 9:12am. In this case, the Recovery Time Actual is 12 minutes. Impressive!
Imagine now that same scenario, but the recovery time actually ended up being 3 hours instead of twelve minutes, and you’re now an hour over your RTO. What might the consequences be? To avoid that type of calamity, it’s critical to prioritize your systems and data, to consider the cost per hour of outage (in dollars as well as reputation, customer service, employee safety, SLAs, and legal consequences) and the cost/benefit of recovery solutions, and to have a clear plan for the steps necessary to achieve your set RTOs.
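Comparing RTA against RTO is simple subtraction, sketched below with the two outcomes from the example: a 12-minute recovery and a 3-hour recovery against a 2-hour RTO. The function names are hypothetical and the date is arbitrary.

```python
from datetime import datetime, timedelta


def recovery_time_actual(detected, restored):
    """RTA: elapsed time from detecting the incident to full service."""
    return restored - detected


def rto_overrun(rta, rto):
    """How far the actual recovery ran past the RTO (zero if it was met)."""
    return max(rta - rto, timedelta(0))


rto = timedelta(hours=2)
detected = datetime(2024, 1, 1, 9, 0)  # arbitrary date, incident found at 9:00am

fast = recovery_time_actual(detected, datetime(2024, 1, 1, 9, 12))
slow = recovery_time_actual(detected, datetime(2024, 1, 1, 12, 0))

print(fast, rto_overrun(fast, rto))  # 0:12:00 0:00:00
print(slow, rto_overrun(slow, rto))  # 3:00:00 1:00:00
```

Tracking the overrun after every recovery test or real incident is one way to learn whether a stated RTO is achievable in practice.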
That’s a lot of information! Let’s take a look at some examples of RTOs and solutions.
There are virtually unending ways that your business may be disrupted—from a server outage to a natural disaster; from an employee accidentally deleting critical data to ransomware encrypting that same key data. There are also many options for achieving the appropriate RTOs for each category—from granular item recovery to a full system recovery from immutable backups. Knowing how critical your systems and data are is a first step to knowing what processes and recovery systems you need to have in place. Also key is knowing the best way to recover those systems: while a full system recovery will likely restore service to the same state as a database log rollback, the latter will potentially have you back up and running much quicker, allowing you to hit those RTO targets.
Organizations begin by assessing the criticality of systems and data to their overall business. It’s important to always keep in mind that RTOs are time-based. There may be systems and data that are absolutely critical to your company’s overall goals and strategy that don’t necessarily need a short RTO. For instance, if a hospital highly prizes employee training and its Learning Management System (LMS) goes offline, possibly with the loss of staff training records, the time-based consequences of that outage are not nearly as critical as the consequences of a ransomware attack on its HIPAA-protected data. For some systems, like that LMS, an RTO can be weeks or even months. Some systems and data may need an RTO of minutes or hours. And some may be so critical that instant recovery is an absolute must. For instance, consider disruption of the life-saving equipment and Electronic Medical Record (EMR) data necessary to make split-second healthcare decisions at that same hospital. The consequences of that downtime are dire, with literal lives on the line, and they require a near-zero RTO.
Many companies rely on regular data backups and feel secure that they’re protected, and companies employing automatic backups feel even more protected. However, as cybercriminals become more advanced and ransomware more complex, simple backup solutions are no longer as effective for data that requires a low or near-zero RTO. Many cyberattacks now target the backups themselves, sometimes infiltrating a system over time. By the time the attack is recognized by IT, weeks’ worth of data backups may already be corrupted. While storing tape backups offsite was once standard practice, the time needed to find, transport, and restore tapes has made them unrealistic for companies whose critical business needs cannot tolerate that wait.
For your valuable data, immutability ensures your backups cannot be altered or encrypted by an attacker, so you still have them when you need them. But on top of having backup data, you need capabilities that provide instant recovery to bring it back. Rubrik offers capabilities such as Live Mount and Instant Recovery for this purpose.
Take a look at how Rubrik helped Kern Medical Center defend and recover from a ransomware attack, then read on for more RTO information.
At the end of the day, your organization cannot afford not to be prepared. Ransomware is on the rise, and the attacks are becoming more complex every day. Setting your company up for success is critical to surviving a ransomware attack (or any other incident or natural disaster that brings down your systems). A data breach can be costly—in actual dollars, as well as having reputational and even legal consequences. Calculating the right Recovery Time Objectives and Recovery Point Objectives is key to surviving a data breach crisis and ensuring business continuity.
Remember: RTOs and RPOs are both about time and are interconnected. RTO is the maximum amount of time to restore systems or applications to a pre-incident state without serious consequences to your business. RTOs will be different for every company, every vertical, and every system. A thorough Business Impact Assessment will help you determine your categories of RTOs based on their criticality to your business. Whether it’s a hacker, a natural disaster, or even basic human error, ensuring you put the right systems in place to meet your RTOs will mitigate data loss and brand damage as well as the loss of time, money, and customer loyalty.