There’s a strong chance that you, a colleague, or a peer at another company has been hit by a ransomware attack. This means that someone penetrated your perimeter defense, likely through phishing or insecure external access (such as RDP), and landed malicious code within a permissive zone of your production environment. The outcome of these attacks is encrypted content (files, folders, operating systems, etc.) that requires cryptocurrency payment(s) to become accessible once more.
This pain can hit especially hard when you are:
- Identifying where the malicious code exists to remove or neuter it.
- Scoping out the damage and either paying the “ransom” or restoring data from backup.
- Determining how to prevent the intrusion from repeating, if possible.
Fortunately, we at Rubrik understand this pain all too well. One of our earliest customers, Langs Building Supplies, had their production environment hit by a ransomware snag back in 2016. Their team acted quickly and used the immutable nature of Rubrik’s backups to recover the encrypted data without paying the ransom. Huzzah!
Since then, we’ve taken the state of the art to a new frontier with the release of Radar, an application on our Rubrik Polaris SaaS platform that detects anomalies, analyzes the threat, and accelerates recovery with a few clicks. This is the same application that received two Best of VMworld Europe 2018 Awards, recognizing the exceptional impact of Rubrik’s Radar ransomware threat protection for ASL Airlines. Their feedback on the pain points we are working together to remediate has been glowing:
“We experience a minimum of 1 ransomware attack per month. Before Radar, the team spent 15 hours to recover from a minor ransomware attack. If we had been hit with a major attack, I fear recovery could’ve taken weeks.”
In this post, I’m going to touch on the more technical pieces of how Radar works, and then dive into the real-world training and testing we’ve done to ensure our solution is able to detect the file encryption and filesystem metadata encryption that typically accompany a ransomware attack.
How Rubrik Polaris Radar Works
Consider the fact that every backup “snapshot” you take is a linear data point. Each snapshot contains a wealth of metadata about the source that you’ve protected, such as a VM, its characteristics, and all of the content that lives within the virtual disk(s). This metadata can be securely collected from an on-premises or cloud environment and sent back to Radar for inspection via a Metadata Sync (MDS) service that can retrieve information from numerous Rubrik clusters.
I’ll go a bit deeper into the analysis pipeline and showcase what metadata is being sent to Radar below. If that’s not your cup of tea, skip to the next section. 🙂
Filesystem Behavior Analysis Pipeline
We begin with the local Rubrik cluster that is protecting one or more workloads. Assuming that Radar is associated with the Rubrik cluster, extremely small Metadata Files (MDFs) are generated that contain a payload of metadata “diffs” related to filesystem changes since the last snapshot was taken. These files are then sent up to the Rubrik Polaris platform where a poller will detect new content to crack open and begin the process of behavior analysis.
The initial task is a preliminary analysis for ransomware: examining the metadata in each MDF that describes the changes that have occurred in the filesystem.
This includes, but is not limited to:
- Path
- Size
- ACL details
- UIDs
- GIDs
- Attributes
These MDFs tend to be quite small, ranging from 1 to 2 kilobytes in size when represented in a binary encoding. The next step is to scan for any anomalous patterns that can be observed in the metadata file when compared to historical data points. For example, a large number of add-file or move-file operations (outside of standard usage norms) might be indicative of a malware infection. This stage is deliberately lightweight so that its compute requirements stay minimal.
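To make the idea of comparing a snapshot against historical data points concrete, here is a minimal sketch in Python. It is not Radar’s implementation: the `SnapshotDiff` record, its field names, and the simple z-score threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class SnapshotDiff:
    """Hypothetical per-snapshot summary of filesystem changes (not the real MDF layout)."""
    files_added: int
    files_moved: int
    files_deleted: int


def change_volume(diff: SnapshotDiff) -> int:
    """Total number of file operations recorded between two snapshots."""
    return diff.files_added + diff.files_moved + diff.files_deleted


def is_anomalous(history: list[SnapshotDiff], latest: SnapshotDiff,
                 threshold: float = 3.0) -> bool:
    """Flag the latest diff if its change volume sits far above the historical norm.

    Assumption: a plain z-score stands in for whatever model is used in practice.
    """
    volumes = [change_volume(d) for d in history]
    if len(volumes) < 2:
        return False  # not enough history to form a baseline
    baseline = mean(volumes)
    spread = pstdev(volumes)
    if spread == 0:
        # Perfectly flat history: any increase is out of the ordinary.
        return change_volume(latest) > baseline
    return (change_volume(latest) - baseline) / spread > threshold


history = [SnapshotDiff(10, 2, 1), SnapshotDiff(12, 1, 2),
           SnapshotDiff(9, 3, 1), SnapshotDiff(11, 2, 2)]
print(is_anomalous(history, SnapshotDiff(5000, 4000, 5000)))  # mass add/move/delete
print(is_anomalous(history, SnapshotDiff(11, 2, 1)))          # ordinary day
```

A check like this only needs a handful of integers per snapshot, which is why this first stage can stay cheap.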
Applying Machine Learning Models
As each snapshot’s metadata is collected by Rubrik Polaris, we leverage a deep neural network (DNN) to build out a full perspective of what is going on with the workload. The network is trained to identify trends that exist across all samples and classify new data by its similarities without requiring human input. The analysis is largely based on filesystem behavior and content analysis.
The detection pipeline has two parts:
- File System Analysis: Performs behavioral analysis on the file system metadata information by looking at items like number of files added, number of files deleted, and so forth.
- File Content Analysis: Once outlier behavior is detected, further analysis can be performed on that snapshot.
Overall, this pipeline excels at creating a historical baseline that gets refined over time through machine learning (ML) algorithms. This baseline is used to detect anomalous behavior in future scans and to check for encryption performed by ransomware. It is worth noting that the analysis is lightweight and designed to use CPU cycles efficiently, keeping compute overhead low.
If an anomaly alert is generated, Radar is able to go deeper into the content of the files to look for signs of encryption and compute an encryption probability using a statistical model. This allows the analysis pipeline to compute entropy features to measure the level of encryption in the filesystem without the wastefulness of a “brute force” workflow.
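For readers who want to see what an entropy feature looks like, here is a small Python sketch of Shannon entropy over a file’s bytes. It illustrates the general technique only; the `looks_encrypted` helper and its 7.5 bits-per-byte threshold are assumptions, not Radar’s statistical model.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the entropy of the byte distribution in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_encrypted(data: bytes, threshold: float = 7.5) -> bool:
    """Hypothetical check: near-uniform byte distributions suggest encrypted content."""
    return shannon_entropy(data) >= threshold


plain = b"hello world " * 100          # repetitive text, low entropy
uniform = bytes(range(256)) * 16       # every byte value equally likely
print(round(shannon_entropy(plain), 2), looks_encrypted(plain))
print(round(shannon_entropy(uniform), 2), looks_encrypted(uniform))
```

Keep in mind that compressed formats also score high on entropy, which is one reason a single threshold like this would not be enough on its own and a probability from a broader set of features is more useful.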
Testing Known Live Ransomware Samples
While all of the conceptual design behind Radar may sound great, how do we know it works? After all, it’s not like you are going to let loose various ransomware variants into your production environment just to see if Radar sends over an alert, right? 🙂
Fortunately, the data scientists at Rubrik have your back. In order to ensure that Radar’s detection model adequately defends customer environments from cyber threats like ransomware, extensive testing was required.
Training the Model
Radar’s detection model was trained, validated, and tested against a large amount of labeled data containing a diverse mix of snapshots from real-world usage, simulated usage, and snapshot changes caused by various ransomware and malicious activities.
We followed a standard ML practice of segmenting the labeled data into 3 categories: training, validation, and testing. This guards against reporting results from an overfit model: the training and validation sets are used to tune the model, while the testing data is held out to evaluate the model on unseen data.
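As a quick illustration of that three-way split, here is a minimal Python sketch. The 70/15/15 proportions, the fixed seed, and the `three_way_split` name are assumptions for the example; the actual proportions used for Radar are not stated here.

```python
import random


def three_way_split(samples, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle labeled samples and cut them into training, validation, and test sets.

    Whatever remains after the training and validation cuts becomes the test set.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


train, val, test = three_way_split(range(100))
print(len(train), len(val), len(test))
```

The important property is that the three sets never overlap, so the test set stays untouched until the final evaluation.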
The result was a test accuracy of 99.90%, a false positive rate of 0.073%, and a true positive rate of 99.84%.
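If these three metrics are unfamiliar, the sketch below shows how they are conventionally derived from confusion-matrix counts. The counts in the example are made up to show the arithmetic and are not Rubrik’s test data.

```python
def classification_rates(tp: int, fp: int, tn: int, fn: int):
    """Return (accuracy, false positive rate, true positive rate).

    Assumes at least one positive and one negative sample, so no denominator is zero.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    false_positive_rate = fp / (fp + tn)  # clean snapshots wrongly flagged
    true_positive_rate = tp / (tp + fn)   # infected snapshots correctly flagged
    return accuracy, false_positive_rate, true_positive_rate


# Illustrative counts only: 100 infected snapshots and 900 clean ones.
acc, fpr, tpr = classification_rates(tp=90, fp=5, tn=895, fn=10)
print(f"accuracy={acc:.3%} fpr={fpr:.3%} tpr={tpr:.3%}")
```

A low false positive rate matters as much as a high true positive rate here, since every false alert costs an operator time.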
Production Resources Tested
We took a sample of 100 resources that consist of:
- Rubrik filesets of file servers
- Linux application servers
- Linux database servers
- Windows application servers
- Windows database servers
- Windows desktops
- Windows file servers
These objects have varying filesystem sizes ranging from gigabytes to terabytes. This range of applications and sizes served as a sample representative of our broader customer base.
Ransomware Strains Tested
We started with the top ransomware by frequency of outbreak in order to prioritize the most likely types of infection. Unfortunately, some of these variants required communication with a functioning command & control center in order to initially activate, so the dormant samples were excluded from our testing.
The samples we obtained were:
- CryptoLocker
- WannaCry
- Locky
- CryptXXX
- Cerber
- Petya
We observed two general types of ransomware behavior across these samples:
- File Encryption: The vast majority of the ransomware exhibited behavior where it targeted specific files with a known set of extensions. Once a file match was found, the file was copied, encrypted, renamed with a new extension, and the original good copy was deleted.
- Filesystem Metadata Encryption: Petya overwrote the bootloader and encrypted the NTFS filesystem metadata table, causing the filesystem to become unavailable.
Testing File Encryption Ransomware
Radar correctly flagged every ransomware sample matching the first type of behavior outlined above, detecting 100% of the file encryption variants it was tested against.
Testing Filesystem Metadata Encryption Ransomware
Petya was unique among our samples in that it overwrote the bootloader and encrypted the NTFS filesystem metadata table. This led to filesystem corruption of the source VM, which prevented indexing of the machine. Rubrik enables full volume recovery of both VMs and physical systems that have been infected with Petya ransomware.
For more on decoupling data from physical servers and providing an image level, or Bare Metal Recovery (BMR), restore process, see this excellent post from Mike Preston.
Conclusions
Three years ago, we set out on a mission to solve customer pain around threats related to encryption and ransom. As our lead security engineer said, “With an effective backup solution, ransomware can ideally be reduced to a minor inconvenience.” Today, we can see that many of our customers are finally able to get a clear picture of the anomalies that impact their environment on a regular basis.
As ransomware becomes increasingly sophisticated, successful attacks are more prevalent. To respond quickly, enterprises are adopting a holistic ransomware response strategy. The introduction of Rubrik Polaris Radar to our SaaS platform has expanded upon that idea to accelerate recovery from ransomware with minimal business disruption and data loss.
Interested in learning more? See how ASL Airlines France is building a multi-leveled defense strategy with Radar.