The terms Recovery Point Objective (RPO) and Recovery Time Objective (RTO) come up frequently in discussions of backup strategy, but in the Salesforce ecosystem they are rarely used with the same definitions as in the general IT world. For example, many solutions claim to provide point-in-time recovery for Salesforce. These same solutions run a Salesforce backup only once a day, which leaves up to 24 hours of potential data loss, meaning an effective Salesforce RPO of 24 hours. Is this truly a point-in-time backup?
This post does not debate the definition of point-in-time or the advantages and disadvantages of that approach. Instead, it defines RPO and RTO and provides a basis for deciding what level of risk is acceptable for your business.
What are RPO & RTO?
RPO is the maximum amount of data loss, measured as a span of time, that a system can tolerate before there is a significant impact on business operations. In plainer terms, this is the amount of data you are willing to lose before the loss really hurts your company.
The higher the RPO, the more data can be lost and the bigger the impact on the business.
The shaded area is the amount of data loss. Data loss increases significantly with a higher RPO.
Why should I care?
The RPO is a goal: a statement of the amount of acceptable data loss. Though every company’s goal is to operate with zero data loss, this is not the reality of cloud operations. Data loss is inevitable, as businesses are made up of humans, and humans make mistakes. Even worse, humans have emotions, which can lead to malicious actions with intentional data corruption and loss. A realistic recovery point objective should be part of every Salesforce backup strategy.
The RPO is used to create the RTO, or recovery time objective. It is important to set a tolerance for the amount of data you are willing to lose, but it is equally critical to determine the acceptable period of time it will take to recover that data.
Recovery time objective (RTO) is the target for the maximum amount of time it should take to recover from a data loss. The shorter the RTO, the more rapidly the business can get back to normal operations. In many Salesforce data corruption scenarios, a short RTO also minimizes revenue loss and enterprise risk.
A longer recovery time increases the amount of lost revenue and increases the risk to the business.
For some organizations, a data loss scenario can mean an interruption in the supply chain or customer operations until the data loss has been recovered.
Why should I care?
The impact of Salesforce data loss can stretch far beyond the actual CRM environment. Technology like MuleSoft drives data syncs between Salesforce and many other business systems, leading to a chain reaction when Salesforce data is corrupted. For many organizations, a data loss scenario can mean an interruption in the supply chain or customer operations until the lost data has been restored.
How do RPO and RTO work together?
RPO and RTO are a part of a Salesforce business continuity strategy, but they are also just metrics. It is easy to say that a business plans for a 15-minute RTO; it is much harder to prove that this RTO is actually achievable. Let’s take a look at what RPO and RTO mean in practice, along with the business trade-offs that are made to get as close as possible to zero system downtime.
RPO in Practice
Fortunately, achieving a short recovery point objective is simple. Remembering that RPO is an indication of the amount of acceptable data loss, the baseline for RPO is the frequency of your Salesforce backup. Short gaps between backups naturally result in a short RPO, because you can only recover to the state captured by the most recent good backup. An automated backup that meets your business’s risk acceptance criteria is key here.
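As a rough illustration of how backup frequency sets the baseline, the sketch below computes the longest gap between consecutive backups in a repeating daily schedule and compares it with an RPO target. The function names, the sample schedule, and the 6-hour target are hypothetical and chosen only for illustration; the sketch also assumes that each backup completes instantly.

```python
from datetime import time, timedelta

MINUTES_PER_DAY = 24 * 60


def worst_case_gap(backup_times: list[time]) -> timedelta:
    """Longest stretch between consecutive backups in a repeating daily schedule."""
    if not backup_times:
        raise ValueError("at least one backup time is required")
    # Convert to minutes after midnight; the set drops duplicate times.
    minutes = sorted({t.hour * 60 + t.minute for t in backup_times})
    gaps = []
    for i, start in enumerate(minutes):
        nxt = minutes[(i + 1) % len(minutes)]
        # Modulo handles the overnight wrap; a single daily backup gives 24 hours.
        gap = (nxt - start) % MINUTES_PER_DAY or MINUTES_PER_DAY
        gaps.append(gap)
    return timedelta(minutes=max(gaps))


def meets_rpo(backup_times: list[time], rpo: timedelta) -> bool:
    """True if no gap in the schedule exceeds the RPO target."""
    return worst_case_gap(backup_times) <= rpo


# Illustrative schedule (an assumption): backups at 7 am, noon, and 5 pm.
schedule = [time(7, 0), time(12, 0), time(17, 0)]
print(worst_case_gap(schedule))                  # 14:00:00
print(meets_rpo(schedule, timedelta(hours=6)))   # False
```

In this sample schedule the daytime gaps are only five hours, but the overnight gap from 5 pm to 7 am is 14 hours, so the schedule as a whole cannot support an RPO shorter than that.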
For a concrete example:
ACME takes a backup of Salesforce three times per day, at 7 am, noon, and 5 pm. A data loss occurs at 11 am when a developer mistake corrupts the ‘phone’ field on every single contact record. The data loss is observed at 1:30 pm when a sales rep realizes that the country codes are missing from all of their international contact records.
What is the RPO?
In this example, the closest recovery point is the 7 am backup, the last backup taken prior to the data corruption. The noon backup occurred after the data corruption, and thus this backup replicated corrupted data. Because the last clean backup is four hours older than the 11 am incident, up to four hours of legitimate changes are at risk. A recovery initiative would start by identifying all records changed after 11 am, then repairing those records with the last known good values from the 7 am backup.
RTO in Practice
The recovery time objective in the scenario above would be the amount of time it takes to recover the corrupted contact records and get the sales team back on the phone.
What is the RTO?
Unless this scenario is actually tested, there is no way to know what the actual recovery time will be until the production data issue occurs. It’s easy to set a goal, but, without practice, it is impossible to determine if the goal is actually achievable.
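One way to turn a restore drill into a measurement is to time it and compare the result with the RTO target. The sketch below assumes the restore procedure can be wrapped in a Python callable; `timed_drill` and `practice_restore` are hypothetical names, and the stand-in function only sleeps briefly in place of a real sandbox restore.

```python
import time
from datetime import timedelta
from typing import Callable


def timed_drill(restore: Callable[[], None], target: timedelta) -> tuple[timedelta, bool]:
    """Run a restore procedure and report the elapsed time and whether it met the target."""
    start = time.monotonic()  # monotonic clock is unaffected by system clock changes
    restore()
    elapsed = timedelta(seconds=time.monotonic() - start)
    return elapsed, elapsed <= target


def practice_restore() -> None:
    """Stand-in for a real restore into a sandbox (assumption: replace with your own steps)."""
    time.sleep(0.05)


elapsed, within_target = timed_drill(practice_restore, target=timedelta(minutes=15))
print(f"Drill took {elapsed.total_seconds():.2f}s, within RTO target: {within_target}")
```

Recording the elapsed time from each drill gives evidence for whether the stated RTO is achievable, rather than relying on the goal alone.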
RPO and RTO Tradeoffs
A short RPO can be achieved with frequent backups, and a short RTO can be accomplished through Salesforce disaster recovery testing. Like any business decision, choosing short recovery point and recovery time objectives comes with tradeoffs, as you can see below:
A shorter recovery point objective simply requires more frequent backups. Backing up Salesforce data every 5 minutes limits potential data loss to roughly 5 minutes at most. Longer RPOs increase the amount of potential data loss, but also decrease the amount of storage required to hold the backups. Most SaaS vendors provide a backup of Salesforce once every 24 hours.
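The trade-off between backup frequency and potential data loss can be put into numbers. The sketch below assumes that each backup completes instantly and that the worst case is an incident occurring just before the next backup runs; the intervals listed are illustrative, and backup runs per day is used only as a rough proxy for storage and processing overhead.

```python
from datetime import timedelta


def backups_per_day(interval: timedelta) -> int:
    """Number of backup runs in 24 hours at a fixed interval."""
    return timedelta(days=1) // interval


def worst_case_loss(interval: timedelta) -> timedelta:
    """Upper bound on data loss: an incident just before the next backup loses one interval."""
    return interval


intervals = [
    timedelta(minutes=5),
    timedelta(minutes=15),
    timedelta(hours=1),
    timedelta(hours=24),
]

for interval in intervals:
    print(
        f"interval={interval}  "
        f"worst-case loss={worst_case_loss(interval)}  "
        f"runs/day={backups_per_day(interval)}"
    )
```

A 5-minute interval means 288 backup runs per day, compared with a single run for a 24-hour interval. Incremental backups keep each run small, but the run count still shows where the extra storage and processing come from.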
This chart outlines the raw personnel cost of a Salesforce data disaster. The short 5-minute RTO comes with a higher cost due to the amount of practice it takes to maintain a robust testing regime. A longer RTO allows the team to be less experienced in the restore solution; however, there is a point where the people cost begins to increase. This sharp increase in cost ties directly to the revenue loss and increased risk resulting from the data loss, where more senior personnel will need to be brought in to resolve the problem.
How to Decide On Proper RTO and RPO Times
The best way to start with a Salesforce disaster recovery strategy is to determine the acceptable RPO for your business. This determines the intervals at which backups must occur, and you can build on this to determine RTO targets. The RTO goals must be practiced at least once a month to validate that the recovery time in the plan matches the amount of time it actually takes to recover Salesforce data or metadata.
Once you determine RPO and set RTO targets, you will need to find a vendor to meet these targets. A marketing claim is not enough. These targets should be substantiated with real backup and recovery testing in your own environment. A full-copy Salesforce sandbox is a good practice environment, as the amount of data will mirror the current state of Salesforce production.
CapStorm solutions provide self-hosted Salesforce backup and recovery, allowing customers to set their own RPO and RTO. For a short recovery point objective, simply schedule incremental Salesforce backups to run every 5 to 15 minutes. For a short recovery time objective, practice restore fire drills with Salesforce sandboxes and create templates that can be used to mimic expected production issues. With CapStorm, you can achieve Salesforce data protection against accidental data loss or corruption.