Site Reliability Engineering (SRE) Tech Lead Job at Obsidian Security, Palo Alto, CA

  • Obsidian Security
  • Palo Alto, CA

Job Description

Founded in 2017, Obsidian Security was created to close a critical gap: securing the SaaS applications where modern business happens, platforms like Microsoft 365, Salesforce, and hundreds more.

Backed by top investors including Greylock, Norwest Venture Partners, and IVP, we've built a complete SaaS security platform to reduce risk, detect and respond to threats, and prevent breaches at the source. Our team includes leaders who helped define the categories of endpoint and identity security at CrowdStrike, Okta, Cylance, and Carbon Black.

Now we're transforming how SaaS is secured in the era of agentic AI.

Today, Obsidian is trusted by global enterprises like Snowflake, T-Mobile, and Pure Storage. We protect more than 200 organizations across North America, Europe, the Middle East, Southeast Asia, Australia, and New Zealand, including many of the world's largest Fortune 1000 and Global 2000 companies.

With strong global momentum, a growing partner ecosystem including SentinelOne, Databricks, and Google Cloud, and a major fundraise on the horizon, we're scaling quickly toward long-term growth and IPO readiness. Join us as we define the future of SaaS security!

Site Reliability Engineering (SRE) Tech Lead

Role Overview

As the SRE Tech Lead at Obsidian, you will define and build the reliability foundation for a complex, multi-tenant SaaS platform serving enterprise and financial customers. You will operate as a peer to the DevOps and Platform Engineering leads, driving a unified reliability strategy across the organization.

Your core mandate: ensure Obsidian detects every system failure before customers do, and communicates proactively when issues arise.

This is a hands-on technical leadership role with high ownership and visibility, reporting directly to the CTO. You will architect and implement systems that handle real-world complexity: upstream SaaS dependencies, sparse and noisy data, and mission-critical enterprise workloads.

Key Responsibilities:

  • Map and instrument critical system paths for top-tier enterprise customers
  • Build connector health models to classify issues:
    • Internal defects (our bug)
    • Upstream SaaS outages
    • Expected sparse/low-signal scenarios
  • Establish tiered incident communication:
    • Public status page for all customers
    • Direct outreach for high-priority accounts
  • Define and begin rollout of SLI/SLO standards across microservices
  • Develop self-service instrumentation tooling enabling engineering teams to own observability
  • Implement baseline-aware anomaly detection across all connectors (beyond static thresholds)
  • Mature incident response processes, including:
    • Structured post-mortems
    • Continuous reliability improvements

Required Qualifications

  • 7 years in SRE, production engineering, or similar roles
  • 2 years operating as a technical lead
  • Deep expertise with:
    • AWS and/or GCP
    • Kubernetes, Helm
    • Observability stack (Prometheus, Grafana)
    • CI/CD systems (GitLab CI/CD, ArgoCD)
  • Proven experience building monitoring for multi-tenant SaaS systems with complex data pipelines
  • Strong debugging skills across distributed microservices and legacy systems
  • Hands-on engineering mindset: able to instrument services directly, not just configure tooling
  • Track record of building or significantly improving incident detection and response systems

Preferred Qualifications

  • Experience in B2B SaaS serving enterprise or financial customers
  • Familiarity with third-party SaaS connector ingestion patterns
  • Experience building anomaly detection systems or baseline-aware alerting
  • Experience implementing customer-facing status pages and incident communication frameworks

Why This Role

  • Direct impact: Work closely with the CTO and shape company-wide reliability strategy
  • Greenfield opportunity: Build a detection and reliability platform from the ground up
  • Technically challenging: Solve for multi-tenant systems with upstream dependencies and sparse data
  • High stakes: Protect systems relied upon by major financial institutions

What Success Looks Like

  • Obsidian consistently detects and diagnoses issues before customers are impacted
  • Clear, proactive communication builds customer trust during incidents
  • Engineering teams independently own observability through scalable tooling
  • Reliability becomes a measurable, continuously improving capability across the platform

If you're excited about building systems that make failure predictable (and invisible to customers), this role offers both the challenge and the ownership to do it right.

Employee Benefits

Our competitive benefits packages are designed to support our employees' well-being, both at work and at home. Our US-based employees enjoy:

  • Competitive compensation with equity and 401k
  • Comprehensive healthcare with dental and vision coverage
  • Flexible paid time off and paid holiday time off
  • 12 weeks of new parent or family leave
  • Personal and professional development resources

For more details on our US benefits, or for information on our international benefits, please see here.

Pay Transparency

Please note that the base pay range is a guideline, and for candidates who receive an offer, the base pay will vary based on factors such as work location as well as the knowledge, skills, and experience of the candidate. In addition to a competitive base salary, this position is eligible for equity awards and may be eligible for sales commission or incentive compensation based on the role or function within the company.

At Obsidian, we are proud to be an equal-opportunity employer. We value diversity and hire for talent and passion. In compliance with federal law, all persons hired will be required to submit satisfactory proof of identity and legal authorization. If you have a need that requires accommodation, please contact

Information collected and processed as part of any job applications you choose to submit is subject to Obsidian's Applicant Privacy Policy.

Base Salary Range

$250,000 - $280,000 USD

Required Experience:

Staff IC

Job Tags

Full time, Work from home, Flexible hours
