Product Quality at Rubrik - Part 1

At Rubrik, we are on a mission to Secure the World’s Data, and we consider product quality a top priority. In this blog post, we describe the automated test strategy we follow at Rubrik to ensure the best-quality products for our customers.

Before we dive deep into our test strategy and process, let’s quickly look at what product quality means and why it matters to our organization and our customers.

Importance of Product Quality

What is Product Quality?

Product quality refers to how well a product satisfies customer needs, serves its purpose, and meets industry standards. Everyone has their own perspective when assessing a product’s quality, but ultimately it boils down to the value the product brings to the customer, given its cost. That value can be determined by the product’s stability, reliability, durability, performance, and security of the feature set it offers.

Why is Product Quality important?

The quality of a product can make or break an organization, and we understand that quality products help establish a good reputation for Rubrik in the marketplace. Our products are responsible for managing and securing highly valuable customer data that powers our customers’ businesses.

In the data security industry, our solution needs to be on active duty at the exact moment our customers experience trouble within the systems we protect. Because these problems are complex, providing an extremely secure yet simple user experience can be challenging. At Rubrik, we strive to build high-quality products that help you recover the data you need, whenever you need it, in a simple, secure, and fast manner.

Strategy to Ensure Product Quality

We believe quality is more than making a good product. Quality has to be part of every step in our Engineering process. Let’s understand what goes into our Engineering Design Process.

[Diagram 1: Engineering Design Process]

Throughout our Engineering process, when evaluating product quality, we ask ourselves a few questions to make sure we are building the right product and doing the right thing.

Is it solving a customer problem?
Is it simple and easy to use?
Is it secure?
Is it robust and reliable?
Is it efficient?

To iterate fast and deliver a high-quality product that satisfies customer needs, we must be able to gauge product quality as frequently and as quickly as possible. This is where automation helps us immensely: we automate the steps that need to be repeated frequently across our entire software development life cycle.

Why automation?

Manually repeating tests is expensive and time-consuming. Automated testing saves time and money by letting us rerun sets of tests efficiently and effectively.

Automated testing of our product components and feature set is one of the most important things we do day in and day out, since it allows us to give faster feedback to our developer community. To test efficiently, we adopted the strategy shown in the test pyramid below.

[Diagram 2: Test Pyramid]

Here’s the list of the different kinds of testing we rely on at Rubrik.

Unit Testing
Component Testing
Integration Testing
E2E Testing
- Functional Testing
- Non-Functional Testing
  - Performance, Stress, Scale, Longevity, Security, Upgrade, Platform, etc.

Unit Testing

As the test pyramid above shows, unit tests (UTs) form the base of our testing at Rubrik, and we rely heavily on them because they are fast and inexpensive. They are our first line of defense in the developer workflow. In a unit test, the code-under-test (CUT) is an individual unit of source code: a small piece of code, typically a class or library. These tests run in an isolated environment without relying on any external resources. This is where we test the different paths, such as the happy path, error handling, and fault injection.

We also measure the code coverage these UTs achieve for each component to understand where it stands. This helps our teams uncover gaps, add the missing UTs, and drive towards better product quality. To give early feedback, we also provide code coverage stats for every diff our developers author.
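
To make the three kinds of paths concrete, here is a minimal sketch in Python using the standard `unittest` and `unittest.mock` modules. The `BlockReader` class, its `store` dependency, and the toy checksum rule are hypothetical names invented for this illustration; they are not Rubrik code. The point is only that the unit gets its dependency injected, so the test can replace it with a fake and run with no external resources.

```python
import unittest
from unittest import mock


class ChecksumError(Exception):
    """Raised when a block's data does not match its recorded checksum."""


class BlockReader:
    """Hypothetical unit under test: reads a block and verifies its checksum."""

    def __init__(self, store):
        # The store is injected, so a test can substitute a fake for it.
        self._store = store

    def read(self, block_id):
        data, expected = self._store.fetch(block_id)
        # Toy checksum for illustration only: byte sum modulo 256.
        if sum(data) % 256 != expected:
            raise ChecksumError(block_id)
        return data


class BlockReaderTest(unittest.TestCase):
    def test_happy_path(self):
        store = mock.Mock()
        store.fetch.return_value = (b"\x01\x02", 3)
        self.assertEqual(BlockReader(store).read("b1"), b"\x01\x02")

    def test_error_handling_rejects_corrupt_block(self):
        store = mock.Mock()
        store.fetch.return_value = (b"\x01\x02", 9)  # wrong checksum
        with self.assertRaises(ChecksumError):
            BlockReader(store).read("b1")

    def test_fault_injection_store_failure_propagates(self):
        # The fake store fails the way a real disk or network call might.
        store = mock.Mock()
        store.fetch.side_effect = IOError("injected fault")
        with self.assertRaises(IOError):
            BlockReader(store).read("b1")
```

Saved as a module, tests like these can be run with `python -m unittest`. Because nothing touches a disk or network, they finish in milliseconds, which is what makes this layer cheap enough to run on every change.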

Component Testing

Component tests validate components; here, the CUT is a component. We define a component as a collection of software that is deployed within a single process in production. Components are typically assembled into a binary and deployed as a service, e.g. a Thrift or gRPC server. A component generally consists of multiple units, each of which is unit tested separately before a component test is written.
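
As a rough illustration of the idea, the sketch below starts one small service and exercises it only through its network interface. It uses Python's built-in HTTP server as a stand-in for a Thrift or gRPC server, and the `/health` endpoint, `HealthHandler`, and `run_component_check` are hypothetical names chosen for this example.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Hypothetical component: a tiny service with a single /health endpoint."""

    def do_GET(self):
        if self.path == "/health":
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def run_component_check():
    """Start the service, call it as a client would, and return what it said."""
    # Port 0 asks the OS for any free port, so parallel runs do not collide.
    server = HTTPServer(("127.0.0.1", 0), HealthHandler)
    port = server.server_address[1]
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", "/health")
        resp = conn.getresponse()
        result = (resp.status, json.loads(resp.read()))
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()
```

Unlike a unit test, this check goes through request parsing, routing, and serialization together, which is the kind of wiring that unit tests of the individual pieces would not cover.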

Integration Testing

Integration tests are our next layer of tests for identifying anomalies in our product. Unit and component tests verify the quality and integrity of a single component or service. Our complex system, however, runs multiple components or services that interact with each other. To understand how a component or service under test interacts with the others, we use integration tests. Hence, the CUT for an integration test is the set of components whose integration is being tested. In contrast to component tests, the CUT comprises code that is deployed in different processes in production.
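
The sketch below illustrates the "different processes" aspect under simplifying assumptions: a hypothetical worker service runs in its own Python process, and a client function talks to it over pipes rather than a real RPC channel. The worker's behavior (upper-casing names) and the names `WORKER_SOURCE`, `client_request`, and `run_integration_check` are made up for this example.

```python
import subprocess
import sys

# Hypothetical worker service: reads one name per line on stdin and replies
# with the upper-cased name. It runs as a separate process.
WORKER_SOURCE = """
import sys
for line in sys.stdin:
    print(line.strip().upper(), flush=True)
"""


def client_request(proc, name):
    """Hypothetical client component: send one request and read one reply."""
    proc.stdin.write(name + "\n")
    proc.stdin.flush()
    return proc.stdout.readline().strip()


def run_integration_check():
    """Launch the worker process and drive it through the client."""
    proc = subprocess.Popen(
        [sys.executable, "-c", WORKER_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        return [client_request(proc, name) for name in ("a.txt", "b.txt")]
    finally:
        # Closing stdin signals end of input, so the worker exits cleanly.
        proc.stdin.close()
        proc.wait(timeout=5)
        proc.stdout.close()
```

A test at this level can catch problems that neither side's own tests would see, such as a mismatch in message framing or a reply that is never flushed.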

End-to-End Testing

The next level of testing is End-to-End (E2E) testing. E2E tests exercise the entire product by running user workflows end to end on systems deployed with all the services. The CUT for an E2E test is the entire system.

Since these tests have a lot of external dependencies, they can be fragile. Failures found by these tests are expensive, taking a lot of time and resources to debug and rerun, because:

A large set of dependencies is being exercised (all services are deployed), and the issue could lie somewhere deep in the stack.
A large set of commits lands between two runs, since E2E tests are run less frequently than Unit, Component, and Integration tests.

As part of E2E testing, we cover a lot of ground by testing Upgrade, Performance, Security, Stress, Scale, Longevity, and more. Although our UI testing is automated, we also verify the UI manually to make sure it looks as expected to our customers’ eyes.

Here’s a real-world analogy for how we should test our products.

[Diagram 3: Real-world testing analogy]

In this case, the right way is to test each unit and component and build the door completely before it is installed in a home. We follow a similar strategy when we test our products, as explained above.

Conclusion

Without automation, it’s very hard to do things repeatedly, which makes it hard to gauge and maintain product quality over a long period. A good test strategy is key to ensuring, in a systematic way, that the product meets the desired quality standards and customer expectations.

We have had our fair share of unique challenges in delivering exceptional-quality products at a fast pace. We had to ensure that our infrastructure is stable, reliable, secure, and resilient, as we have different kinds of hardware and software that help validate our products at the system level. We will talk more about these challenges in upcoming blog posts.

As part of the Product Quality at Rubrik blog series, we will next talk about Automated Testing: Iterate and Deliver Faster.
