Seabra
Full-time
Skilled work
Global MNC Tech is seeking an experienced and highly motivated SRE Observability Engineer to join our Platform Operations team. In this role, you will design, implement, and maintain observability solutions that ensure the reliability, performance, and scalability of our global technology platforms. You will work closely with engineering teams to detect, analyze, and resolve issues proactively, using modern monitoring, logging, and tracing technologies. This is a fully remote position with the opportunity to influence the stability and resilience of mission-critical systems at a global scale.
Develop and maintain observability strategies for distributed systems, including metrics, logging, tracing, and alerting solutions.
Collaborate with development and operations teams to improve system reliability and performance.
Implement monitoring tools, dashboards, and automated alerts to proactively detect system anomalies.
Analyze incidents and outages, perform root cause analysis, and contribute to postmortem reports.
Optimize and scale observability pipelines and infrastructure to handle large volumes of data.
Continuously evaluate and integrate new observability technologies and best practices.
Mentor and guide other team members in observability practices and SRE principles.
Drive reliability initiatives through capacity planning, performance tuning, and fault-tolerant system design.
Strong experience with observability tools and frameworks (e.g., Prometheus, Grafana, ELK Stack, OpenTelemetry, Jaeger).
Proficiency in cloud environments (AWS, Azure, GCP) and in containerization and container orchestration (Docker, Kubernetes).
Solid understanding of SRE principles, reliability engineering, and incident management.
Experience with programming or scripting languages (Python, Go, Bash, or similar).
Strong analytical, problem-solving, and troubleshooting skills.
Familiarity with CI/CD pipelines and automation practices.
Excellent communication skills, with the ability to convey technical concepts to diverse teams.
Minimum of 5 years of experience in Site Reliability Engineering, DevOps, or related roles.
Proven track record in implementing observability solutions for large-scale distributed systems.
Hands-on experience with incident response and production system troubleshooting.
Full-time, remote role with flexibility to support global operations across multiple time zones.
May require occasional participation in on-call rotations for incident response.
Deep understanding of distributed systems architecture and operational challenges.
Ability to design scalable and resilient monitoring and alerting frameworks.
Strong organizational and project management skills.
Ability to work independently while collaborating effectively with remote cross-functional teams.
Passion for continuous improvement and learning emerging observability technologies.
Competitive salary and performance-based bonuses.
Comprehensive health, dental, and vision insurance.
Flexible remote work arrangements and paid time off.
Professional development opportunities, including certifications and training.
Access to cutting-edge tools and technologies in a global engineering environment.
At Global MNC Tech, we are committed to innovation, excellence, and employee growth. You will join a highly skilled, diverse team dedicated to building reliable and scalable systems that impact millions of users worldwide. If you are passionate about reliability, observability, and system performance, this is your opportunity to make a tangible impact while advancing your career in a dynamic, global setting.
Interested candidates are invited to submit their resume and cover letter through our careers portal.
Please include relevant experience with observability tools, SRE practices, and distributed systems in your application.