
Using Apono and PagerDuty for effective incident response at Labelbox

Session overview

In this webinar, you will learn how Labelbox used PagerDuty and Apono to build a new solution for faster, safer resolution of critical incidents.

Meet the participants

Sharon Kisluk, Product Director at Apono

“I’m Sharon, the Product Lead at Apono. Today we want to talk about incident response and how to handle this issue. Every company inevitably experiences product downtime, and we want to ensure incident response is lightning fast to avoid SLA violations and minimize downtime for users and customers. The challenge is to balance the need for unlimited access, which enables faster incident response, with the best practice of restricting access to prevent potential damage to production.”

Mandi Walls, DevOps Advocate, PagerDuty

“I’m Mandi from PagerDuty. We started as an incident response platform to help responders get where they need to be on their platforms when incidents occur. Managing security and access to all components of your infrastructure is critical. We’ve seen many customers struggle with this, and it’s great to see a product like Apono streamline these processes and provide the access needed when incidents occur.”

Aaron Bacchi, Sr. DevOps Engineer at Labelbox

“Hey, I’m Aaron from Labelbox. We provide a SaaS platform for training AI models, and the data and databases are critical. I’m on the security team and focus on cloud configuration and software security. Initially, our response to Cloud SQL database incidents was to give all developers access to static service accounts, which was not secure. We needed a better solution.”

Importance of a break-glass solution

The challenge

Dealing with break-glass situations was a major challenge for the Labelbox team. They had to balance security concerns with the need to maintain productivity. The team found that using shared service accounts for database access was not secure, so a more robust solution was required.

Aaron’s insights

“We knew we needed a break-glass system because we once had a critical incident where a key responder was denied access. This incident underscored the importance of having a flexible and reliable break-glass solution.”

Mandi’s insights

“From PagerDuty’s perspective, customers often face challenges in architecting microservices, determining access requirements, and ensuring compliance reporting. Having an auditable access trail is essential in the event of an incident, and flexibility in access management is critical to resolving production issues.”

Sharon’s insights

“Managing access to sensitive resources is about balancing risk with speed of response. Customer data is highly sensitive and even minimal access can be risky. When responding to incidents, you need to ensure rapid access while preventing overuse. Apono’s policies enable this balance by providing rapid access during incidents and allowing responders to call in additional help when needed without compromising security.”

Towards a solution

When it comes to incident response, a robust and flexible access management system is critical. The use of Apono and PagerDuty effectively meets this need by leveraging a combination of tools and processes to ensure engineers have the access they need when they need it, while maintaining strict security controls. Let’s take a closer look at how this innovative approach was implemented and the key components that make it successful.

Integration with Apono and PagerDuty

The integration between Apono and PagerDuty is a critical aspect of the solution. By leveraging these tools, Aaron was able to create a seamless workflow that improves incident response efficiency:

  1. Apono’s role: Apono allows the configuration of access flows integrated with PagerDuty. Access can be approved and revoked automatically based on predefined criteria and schedules.
  2. The role of PagerDuty: PagerDuty manages the incident response process, including shift changes and incident notifications, ensuring the right personnel are notified and can take immediate action.
  3. Combined workflow: By integrating these tools, Aaron has created a workflow that dynamically manages access requests. On-call engineers in the PagerDuty schedule can approve access requests directly through Apono, ensuring that only the necessary permissions are granted when they are needed.
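
The routing decision in this workflow can be illustrated with a minimal Python sketch. It does not use the Apono or PagerDuty APIs; `AccessRequest`, `route_request`, and the rule that on-call requesters are auto-approved while everyone else is routed to an on-call approver are assumptions made for illustration, not the configuration Labelbox actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    """Hypothetical access request; not an Apono data type."""
    requester: str
    resource: str


def route_request(request: AccessRequest, on_call: set[str], approver: str) -> str:
    """Decide how a request is handled.

    Assumed policy: a requester who is currently on call is approved
    automatically; anyone else needs a decision from the on-call approver.
    """
    if request.requester in on_call:
        return "auto-approve"
    return f"needs-approval:{approver}"


# Example: one engineer is on call, another is not.
on_call_now = {"alice@example.com"}
print(route_request(AccessRequest("alice@example.com", "prod-db"), on_call_now, "dba@example.com"))
print(route_request(AccessRequest("bob@example.com", "prod-db"), on_call_now, "dba@example.com"))
```

In a real deployment the `on_call` set would come from the PagerDuty schedule rather than a hard-coded value.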

The “Break Glass” Google Group

One of the core components of the solution is the “Break Glass” Google group. This group serves as a temporary access point for engineers who are not part of the regular database administration team but need immediate access in the event of an incident. Here’s how it works:

  1. Incident identification: When an incident occurs, the engineer responsible for resolving the issue can request access to the Break Glass Google group.
  2. Approval process: This request is forwarded via PagerDuty to the on-duty database administrator who has the authority to approve access. This step ensures that only authorized personnel can grant access.
  3. Temporary access: Once approved, the engineer is added to the Google group, which grants them the necessary permissions to interact with the production database. This access is time-limited, usually two hours, to minimize security risks.
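
The three steps above can be sketched as a small in-memory model. This is an illustration only: `BreakGlassGroup` and its methods are hypothetical names, the group is a plain dictionary rather than a real Google group, and the two-hour window is taken from the "usually two hours" figure in the text.

```python
from datetime import datetime, timedelta, timezone

ACCESS_WINDOW = timedelta(hours=2)  # assumed default, per "usually two hours"


class BreakGlassGroup:
    """Toy stand-in for the break-glass group: member -> expiry time."""

    def __init__(self) -> None:
        self._expiry: dict[str, datetime] = {}

    def grant(self, member: str, approved_by: str, on_call_dba: str, now: datetime) -> datetime:
        """Add a member, but only if the approver is the on-call DBA."""
        if approved_by != on_call_dba:
            raise PermissionError(f"{approved_by} is not the on-call DBA")
        expires = now + ACCESS_WINDOW
        self._expiry[member] = expires
        return expires

    def is_member(self, member: str, now: datetime) -> bool:
        """Membership holds only until the expiry time."""
        expiry = self._expiry.get(member)
        return expiry is not None and now < expiry

    def revoke_expired(self, now: datetime) -> list[str]:
        """Remove and return every member whose window has passed."""
        expired = [m for m, t in self._expiry.items() if t <= now]
        for m in expired:
            del self._expiry[m]
        return expired


start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
group = BreakGlassGroup()
group.grant("engineer@example.com", approved_by="dba@example.com",
            on_call_dba="dba@example.com", now=start)
print(group.is_member("engineer@example.com", start + timedelta(hours=1)))
print(group.revoke_expired(start + timedelta(hours=2)))
```

Passing `now` explicitly instead of reading the clock inside the class keeps the expiry logic easy to test.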

Employee training and implementation

Implementing a new system requires appropriate training and team buy-in. Aaron addressed this by holding bi-weekly technical meetings and creating comprehensive documentation:

  1. Technical discussions: These sessions provided an opportunity to present the new system to the entire engineering team and demonstrate its benefits and how it works in practice.
  2. Documentation: A detailed Confluence page has been created to document the steps and procedures for using the new system. It is an invaluable reference for engineers who need to refresh their knowledge or learn the system for the first time.

Consideration of compliance and security concerns

One of the key benefits of this system is its ability to effectively address compliance and security concerns:

  1. Auditing: Apono provides a complete audit trail of all access requests, including who made the request, who approved it, and the exact times of access and revocation. This detailed logging is critical for compliance reporting and internal audits.
  2. Compliance: Implementing this system allows Aaron’s team to meet strict compliance requirements. The detailed control and audit features ensure that access to confidential data is strictly controlled and documented.
  3. Security: Time-limited access and integration with PagerDuty ensure that permissions are only granted when absolutely necessary and are automatically revoked once the incident is resolved.
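
To make the audit fields above concrete, here is a minimal Python sketch of one audit entry holding the requester, the approver, and the grant and revocation times. The `AuditRecord` class and its field names are assumptions for illustration and do not reflect Apono's actual log format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class AuditRecord:
    """Hypothetical audit entry; field names are illustrative only."""
    requester: str
    approver: str
    resource: str
    granted_at: str                    # ISO 8601 timestamp
    revoked_at: Optional[str] = None   # filled in when access is removed


def to_json_line(record: AuditRecord) -> str:
    """Serialize one record as a single JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)


record = AuditRecord(
    requester="engineer@example.com",
    approver="dba@example.com",
    resource="prod-db",
    granted_at="2024-01-01T12:00:00+00:00",
)
record.revoked_at = "2024-01-01T14:00:00+00:00"
print(to_json_line(record))
```

One JSON object per line keeps such a log easy to append to and to filter when preparing a compliance report.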

Aaron’s approach to incident response and access management demonstrates how combining the right tools and processes can create a secure, efficient, and compliant system. By using the “Break Glass” Google group, integrating Apono with PagerDuty, and ensuring thorough training and documentation, Aaron has improved his team’s ability to respond to incidents quickly and safely. This solution not only improves operational efficiency, but also meets the high standards required for auditing and compliance in today’s complex IT environments.

***This is a syndicated blog from Apono of the Security Bloggers Network, written by Ofir Stein. Read the original post at: https://www.apono.io/blog/leveraging-apono-and-pagerduty-for-effective-incident-response-at-labelbox/