Johannesburg
FULL_TIME
Skilled work
Global MNC Tech is seeking an experienced SRE Observability Engineer to join our remote operations team. This role is crucial for ensuring the reliability, performance, and scalability of our monitoring systems across global infrastructure. You will design, implement, and maintain observability frameworks that provide real-time insights into system health, enabling proactive identification and resolution of performance issues. The ideal candidate combines expertise in site reliability engineering with a deep understanding of observability tools, cloud environments, and large-scale distributed systems.
Develop and maintain observability pipelines, including metrics, logs, traces, and alerts for complex distributed systems.
Implement monitoring solutions using modern observability platforms such as Prometheus, Grafana, OpenTelemetry, or Datadog.
Collaborate with development and operations teams to design scalable and reliable monitoring architectures.
Proactively identify system bottlenecks, anomalies, and performance issues, and drive incident resolution processes.
Automate alerting, reporting, and dashboards to support both technical and business stakeholders.
Participate in the on-call rotation to ensure high availability and rapid incident response.
Conduct post-incident analysis, document findings, and recommend improvements for system reliability and performance.
Contribute to the adoption of best practices for monitoring, observability, and SRE within the organization.
Strong knowledge of Site Reliability Engineering (SRE) principles and practices.
Expertise with observability tools and frameworks such as Prometheus, Grafana, Elasticsearch, OpenTelemetry, Datadog, or Splunk.
Hands-on experience with cloud platforms (AWS, Azure, GCP) and containerized environments (Kubernetes, Docker).
Proficiency in scripting and automation (Python, Go, Bash, or similar).
Solid understanding of distributed systems, microservices, and networking concepts.
Experience with CI/CD pipelines and infrastructure-as-code tools (Terraform, Ansible, etc.).
Strong analytical and problem-solving skills with attention to detail.
Excellent communication skills for collaborating with cross-functional teams and reporting system metrics effectively.
A minimum of 3 to 5 years of experience in Site Reliability Engineering, DevOps, or observability roles.
Proven track record of implementing and maintaining monitoring systems in large-scale production environments.
Experience in incident management, root cause analysis, and reliability engineering.
Full-time, remote role with flexible hours.
May require participation in the on-call rotation and occasional support outside regular business hours for critical incidents.
Ability to analyze complex systems and identify potential reliability risks.
Strong collaboration skills to work effectively with development, operations, and product teams.
Ability to write clear technical documentation and build dashboards for both technical and non-technical stakeholders.
High adaptability to fast-paced, evolving environments and new technologies.
Strong commitment to continuous learning and applying best practices in SRE and observability.
Competitive salary and performance-based bonuses.
Comprehensive health, dental, and vision insurance.
Flexible remote work arrangements and generous paid time off.
Professional development support, including training and certification reimbursement.
Access to cutting-edge technologies and large-scale, global infrastructure.
At Global MNC Tech, you will be part of a forward-thinking, global team that values innovation, collaboration, and operational excellence. You will work on highly scalable systems that impact millions of users worldwide while enjoying the flexibility and autonomy of a remote-first work culture. This role offers a unique opportunity to shape the observability practices of a global technology leader while advancing your career in Site Reliability Engineering.
Interested candidates are invited to submit their resume and a cover letter outlining their relevant experience and achievements. Please include "SRE Observability Engineer – Remote Monitoring Systems" in the subject line.