Site Reliability Engineer - SaaS Product Monitoring

Location

Cochin

Job Type

Full-time

Experience

Skilled work

Job Description

Job Summary

Global MNC Tech is seeking a highly skilled and proactive Site Reliability Engineer (SRE) to join our growing SaaS engineering team. In this role, you will be responsible for ensuring the reliability, availability, scalability, and performance of our cloud-based SaaS platforms. You will work at the intersection of software engineering and systems operations, focusing on monitoring, automation, incident response, and continuous improvement of system health.

As an SRE, you will play a critical role in building resilient systems, implementing advanced monitoring solutions, and driving a culture of reliability across engineering teams. This is an exciting opportunity to work on mission-critical products used by global customers and contribute directly to business success.


Key Responsibilities

  • Design, implement, and maintain end-to-end monitoring, alerting, and observability solutions for SaaS products.

  • Ensure high availability and reliability of production systems through proactive monitoring and automation.

  • Manage incident response processes, including root cause analysis (RCA) and post-incident reviews.

  • Define and maintain service level indicators (SLIs), objectives (SLOs), and agreements (SLAs) to measure and improve system performance.

  • Automate operational tasks using scripting and infrastructure-as-code principles.

  • Collaborate with development teams to improve system architecture, scalability, and fault tolerance.

  • Perform capacity planning and performance tuning for cloud-based services.

  • Implement best practices for logging, tracing, and metrics collection.

  • Continuously improve CI/CD pipelines to support reliable deployments.

  • Document operational procedures and create runbooks for system support.


Required Skills and Qualifications

  • Strong experience in Site Reliability Engineering, DevOps, or Production Engineering roles.

  • Proficiency in cloud platforms such as AWS, Azure, or Google Cloud Platform.

  • Hands-on experience with monitoring and observability tools (Prometheus, Grafana, Datadog, New Relic, ELK, Splunk).

  • Solid understanding of Linux/Unix systems and networking concepts.

  • Experience with containerization and orchestration tools (Docker, Kubernetes).

  • Knowledge of scripting languages such as Python, Bash, or Go.

  • Familiarity with Infrastructure as Code tools (Terraform, CloudFormation, Ansible).

  • Understanding of CI/CD tools (Jenkins, GitHub Actions, GitLab CI, Azure DevOps).

  • Strong problem-solving, debugging, and analytical skills.

  • Excellent communication and collaboration abilities.


Experience

  • 3–7 years of experience in Site Reliability Engineering, DevOps, or similar roles.

  • Experience supporting large-scale SaaS or cloud-native applications.

  • Proven track record in incident management and system reliability improvements.

  • Experience working in Agile/Scrum environments is preferred.


Working Hours

  • Full-time position, 40 hours per week.

  • Flexible working hours, with some overlap with global teams.

  • On-call rotation may be required for critical production support.

  • Remote or hybrid work options available depending on location.


Knowledge, Skills and Abilities

  • Deep understanding of system architecture, distributed systems, and microservices.

  • Ability to design highly available and fault-tolerant systems.

  • Strong troubleshooting and root cause analysis skills.

  • Ability to work under pressure in incident scenarios.

  • Proactive mindset with a focus on automation and continuous improvement.

  • Strong documentation and knowledge-sharing abilities.

  • Business awareness to align reliability goals with customer needs.


Benefits

  • Competitive salary and performance-based incentives.

  • Comprehensive health, life, and accident insurance.

  • Flexible work arrangements (remote/hybrid options).

  • Learning and development programs, certifications, and training support.

  • Paid time off, holidays, and wellness programs.

  • Access to global projects and cutting-edge technologies.

  • Employee assistance and mental wellness support.


Why Join Global MNC Tech?

At Global MNC Tech, we believe reliability is the foundation of customer trust. You will be part of a world-class engineering organization that values innovation, collaboration, and continuous learning. We provide a culture where your ideas matter, your skills are recognized, and your career growth is actively supported. You will work on impactful SaaS products used by customers worldwide and play a key role in shaping the future of cloud reliability.


How to Apply

Interested candidates are invited to submit their updated resume along with a brief cover letter highlighting their experience in Site Reliability Engineering and SaaS operations. Shortlisted candidates will be contacted for technical interviews and further assessment.
