Senior SRE - Chaos Engineering Practitioner

Location: Dhanbad

Job Type: Full-time

Experience: Skilled work

Job Description

Job Summary

Global MNC Tech is seeking an experienced and highly motivated Senior Site Reliability Engineer (SRE) – Chaos Engineering Practitioner to join our advanced reliability engineering team. This role is designed for professionals who are passionate about building resilient, scalable, and fault-tolerant distributed systems.

As a Senior SRE specializing in Chaos Engineering, you will lead initiatives to proactively identify system weaknesses, improve platform reliability, and embed resilience into our core digital infrastructure. You will work closely with software engineers, cloud architects, DevOps teams, and business stakeholders to ensure our systems can withstand failures and continue to deliver exceptional user experiences at scale.

This position offers a strategic and hands-on opportunity to shape reliability practices across enterprise-grade platforms used by millions of users worldwide.


Key Responsibilities

  • Design, implement, and lead Chaos Engineering experiments across cloud-native and distributed systems.

  • Develop and maintain reliability frameworks, SLOs, SLIs, and error budgets.

  • Identify systemic risks, single points of failure, and performance bottlenecks.

  • Collaborate with engineering teams to embed resilience into CI/CD pipelines.

  • Automate reliability testing, fault injection, and recovery processes.

  • Analyze incident data and conduct post-incident reviews (blameless postmortems).

  • Improve observability using monitoring, logging, and tracing tools.

  • Lead resilience workshops and mentor junior SREs and engineers.

  • Influence architectural decisions with a reliability-first mindset.

  • Partner with security and infrastructure teams to enhance system robustness.


Required Skills and Qualifications

  • Strong experience in Site Reliability Engineering (SRE) or DevOps roles.

  • Deep knowledge of Chaos Engineering principles and tools (e.g., Gremlin, Chaos Mesh, Litmus, Chaos Monkey).

  • Proficiency in cloud platforms (AWS, Azure, or GCP).

  • Hands-on experience with Kubernetes, Docker, and microservices architectures.

  • Strong programming skills in Python, Go, Java, or similar languages.

  • Expertise in monitoring and observability tools (Prometheus, Grafana, Datadog, New Relic, Splunk).

  • Solid understanding of CI/CD pipelines and infrastructure as code (Terraform, CloudFormation).

  • Experience with distributed systems, networking, and performance engineering.

  • Excellent communication and stakeholder management skills.


Experience

  • 6+ years of experience in SRE, DevOps, or Platform Engineering.

  • 3+ years of hands-on experience implementing Chaos Engineering practices.

  • Proven track record of working with large-scale, high-availability systems.

  • Experience in enterprise or high-growth technology environments preferred.


Working Hours

  • Full-time position (40 hours per week).

  • Flexible working hours to support collaboration with global teams.

  • Remote or hybrid work model depending on location.


Knowledge, Skills and Abilities

  • Advanced knowledge of distributed systems and cloud infrastructure.

  • Strong problem-solving and analytical thinking abilities.

  • Ability to design experiments and translate findings into actionable insights.

  • Leadership skills with the ability to influence engineering culture.

  • High level of ownership, accountability, and attention to detail.

  • Ability to thrive in fast-paced, complex technical environments.

  • Strong documentation and knowledge-sharing skills.


Benefits

  • Competitive salary and performance-based bonuses.

  • Work-from-home or hybrid flexibility.

  • Comprehensive health and wellness benefits.

  • Learning and development budget for certifications and training.

  • Career progression into Staff/Principal SRE roles.

  • Paid time off, holidays, and flexible leave policies.

  • Access to cutting-edge tools and technologies.

  • Inclusive and diverse global work culture.


Why Join Global MNC Tech?

At Global MNC Tech, we believe reliability is not an afterthought; it is a core business strategy. You will be part of a forward-thinking organization that invests in engineering excellence and innovation.

This role offers:

  • The opportunity to shape enterprise reliability practices.

  • High-impact projects affecting global systems.

  • Collaboration with world-class engineers and architects.

  • A culture that values experimentation, learning, and growth.

  • Long-term career stability with a global technology leader.


How to Apply

Interested candidates are invited to submit their updated resume along with a brief cover letter highlighting their experience in Site Reliability Engineering and Chaos Engineering practices.

Shortlisted candidates will be contacted for a multi-stage interview process, including technical discussions and system design assessments.
