The Day Before I Broke Prod
Before we get to the alarming cause of the data corruption, we need to take a step back and look at the events that led to this catastrophe:
The marketing team had just added a new third-party tool to improve communication with customers and prospects. The app included a web portal and a Salesforce integration designed to update the app whenever new contacts are added to Salesforce. The tool was also designed to sync communications back to Salesforce so that account teams would be alerted to recent customer activity.
The integration was only half set up when I was asked to look into why a large number of Salesforce opportunities and accounts were not syncing to the app.
Like any overconfident Admin, I dove right in and found a few simple field mappings that were missing. I also discovered that Accounts had not been set up to pull into the marketing tool automatically. Once I had resolved these minor items, I foolishly skimmed over the rest of the integrated fields and moved on to another project. What I did not realize was that I had unintentionally set the stage for massive data corruption.
The Day I Broke It
Around 8 am the next day, I received a group email from our accountant, alarmed that a Salesforce opportunity marked Closed Won that morning was not connected to an account. I quickly took a look in Salesforce and made my second fatal mistake: assuming this was an isolated issue! Just as I was congratulating myself on having made only a minor mistake, I received a second email from the accountant…
It took only a few minutes to determine that this was not an isolated issue: the overnight integration sync had caused mass data corruption.
Every opportunity in Production was now orphaned, entirely disconnected from its related Account.
20 Minutes Later
Twenty minutes. That is the total amount of time it took to recover from this mass data corruption. How?
5 Minutes: To start, we back up Salesforce multiple times a day. Each backup is a mirror copy of our Salesforce data, and the backup process also automatically snapshots data changes. Those snapshots helped us pinpoint exactly when the corruption occurred: 2 am. The backups are kept in a PostgreSQL database on AWS, so the team can access the data at any time without vendor support.
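A minimal sketch of how change snapshots can pinpoint the moment of corruption, assuming a hypothetical `opportunity_snapshot` table with one row per Opportunity per backup run. The table name, columns, and sample IDs are illustrative, not the actual backup schema. Python's built-in `sqlite3` stands in for PostgreSQL so the example runs anywhere; the query uses only standard SQL (a self-join plus a correlated subquery).

```python
import sqlite3

# Hypothetical schema (assumption): one row per Opportunity per backup snapshot.
SCHEMA = """
CREATE TABLE opportunity_snapshot (
    opportunity_id TEXT NOT NULL,
    account_id     TEXT,          -- NULL once the Account relationship is broken
    snapshot_at    TEXT NOT NULL  -- ISO-8601 timestamp of the backup run
);
"""

# Earliest snapshot in which an Opportunity lost an AccountId that it
# still had in its immediately preceding snapshot.
FIRST_BREAK_SQL = """
SELECT MIN(cur.snapshot_at)
FROM opportunity_snapshot AS cur
JOIN opportunity_snapshot AS prev
  ON prev.opportunity_id = cur.opportunity_id
WHERE cur.account_id IS NULL
  AND prev.account_id IS NOT NULL
  AND prev.snapshot_at = (
        SELECT MAX(p.snapshot_at)
        FROM opportunity_snapshot AS p
        WHERE p.opportunity_id = cur.opportunity_id
          AND p.snapshot_at < cur.snapshot_at)
"""


def first_broken_snapshot(conn):
    """Return the timestamp of the first broken snapshot, or None if none broke."""
    # MIN() over no matching rows yields a single row containing NULL (None).
    return conn.execute(FIRST_BREAK_SQL).fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Made-up sample data: two Opportunities lose their Account in the 02:00 run.
    conn.executemany(
        "INSERT INTO opportunity_snapshot VALUES (?, ?, ?)",
        [
            ("006A", "001X", "2024-01-01T20:00"),
            ("006A", "001X", "2024-01-01T23:00"),
            ("006A", None, "2024-01-02T02:00"),
            ("006B", "001Y", "2024-01-01T23:00"),
            ("006B", None, "2024-01-02T02:00"),
        ],
    )
    print(first_broken_snapshot(conn))  # 2024-01-02T02:00
```

Requiring the previous snapshot to have a non-null `account_id` keeps Opportunities that never had an Account from being mistaken for the start of the corruption.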
5 Minutes: Next, we re-created the problem in a sandbox so we had a safe testing environment. This did not require a Salesforce sandbox refresh; instead, we seeded a developer org with intentionally corrupted data from the most recent production backup.
10 Minutes: The sandbox gave us a place to test the recovery process. A restore job patched the broken Account-to-Opportunity relationships, and I quickly confirmed that the restore worked by creating a simple list view.
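The restore step can be sketched the same way. This assumes the last good snapshot and the current state of the org have already been pulled into plain dictionaries mapping Opportunity `Id` to `AccountId`; fetching that data and sending the updates to Salesforce are out of scope, and `build_restore_patch` and `still_orphaned` are hypothetical names. Each patch row uses the `Id`-plus-field shape of a Salesforce record update.

```python
from typing import Dict, List, Optional


def build_restore_patch(
    last_good: Dict[str, Optional[str]],
    current: Dict[str, Optional[str]],
) -> List[Dict[str, str]]:
    """One update per Opportunity whose AccountId is blank now but was
    populated in the last good backup snapshot."""
    patch = []
    for opp_id, account_id in current.items():
        restored = last_good.get(opp_id)
        # Only touch records that are currently orphaned and recoverable.
        if account_id is None and restored is not None:
            patch.append({"Id": opp_id, "AccountId": restored})
    return patch


def still_orphaned(
    current: Dict[str, Optional[str]],
    patch: List[Dict[str, str]],
) -> List[str]:
    """Stand-in for the validation list view: Opportunities still without an Account."""
    fixed = {row["Id"] for row in patch}
    return sorted(
        opp_id for opp_id, acct in current.items() if acct is None and opp_id not in fixed
    )


if __name__ == "__main__":
    # Made-up sample data (assumption), not real record IDs.
    last_good = {"006A": "001X", "006B": "001Y", "006C": None}
    current = {"006A": None, "006B": None, "006C": None, "006D": "001Z"}
    patch = build_restore_patch(last_good, current)
    print(patch)  # [{'Id': '006A', 'AccountId': '001X'}, {'Id': '006B', 'AccountId': '001Y'}]
    print(still_orphaned(current, patch))  # ['006C']
```

Patching only Opportunities whose AccountId is currently blank is a deliberate choice: it leaves alone any record that was legitimately re-parented after the last good backup. `still_orphaned` plays the role of the list view, roughly what a SOQL query such as `SELECT Id, Name FROM Opportunity WHERE AccountId = null` would show once the restore has run.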
5 Minutes: Finally, we connected the same recovery configuration we had tested in the sandbox to Production.
What Did I Learn After Breaking Prod?
- A Salesforce backup is hyper-critical.
- A recovery process that you can test is simply a requirement.
- Don’t mess with integrations if you don’t know what they do!