Smartwork IT Services
Observability Engineer - Monitoring/Log Management Tools
Job Location
Hyderabad, India
Job Description
Job Title: Observability Engineer
Location: Hyderabad
Experience: 5-10 Years

Job Description:
We are looking for a highly skilled Observability Engineer to design, develop, and maintain observability solutions that provide deep visibility into our infrastructure, applications, and services. You will be responsible for implementing monitoring, logging, and tracing solutions to ensure the reliability, performance, and availability of our systems. Working closely with development, infrastructure, DevOps, and SRE teams, you will play a critical role in optimizing system observability and improving incident response.

Key Responsibilities:
- Design and implement observability solutions for monitoring, logging, and tracing across cloud and on-premises environments.
- Develop and maintain monitoring tools such as Prometheus, Grafana, Datadog, New Relic, and AppDynamics.
- Implement distributed tracing using OpenTelemetry, Jaeger, Zipkin, or similar tools to improve application performance and troubleshooting.
- Optimize log management and analysis with tools like Elasticsearch, Splunk, Loki, or Fluentd.
- Create alerting and anomaly detection strategies to proactively identify system issues and reduce mean time to resolution (MTTR).
- Collaborate with development and SRE teams to enhance observability in CI/CD pipelines and microservices architectures.
- Automate observability processes using scripting languages like Python, Bash, or Golang.
- Ensure scalability and efficiency of monitoring solutions to handle large-scale distributed systems.
- Support incident response and root cause analysis by providing actionable insights through observability data.
- Stay up to date with industry trends in observability and site reliability engineering (SRE).

Required Qualifications:
- 5 years of experience in observability, SRE, DevOps, or a related field.
- Proficiency in observability tools such as Prometheus, Grafana, Datadog, New Relic, or AppDynamics.
- Experience with logging platforms like Elasticsearch, Splunk, Loki, or Fluentd.
- Strong knowledge of distributed tracing (OpenTelemetry, Jaeger, Zipkin).
- Hands-on experience with the Azure cloud platform and Kubernetes.
- Proficiency in scripting languages (Python, Bash, PowerShell) and infrastructure as code (Terraform, Ansible).
- Solid understanding of system performance, networking, and troubleshooting.
- Strong problem-solving and analytical skills.
- Excellent communication and collaboration abilities.

Preferred Qualifications:
- Experience with AI-driven observability and anomaly detection.
- Familiarity with microservices, serverless architectures, and event-driven systems.
- Experience working with on-call rotations and incident management workflows.
- Relevant certifications in observability tools, cloud platforms, or SRE practices.

(ref:hirist.tech)
Location: Hyderabad, IN
Posted Date: 5/9/2025
Contact Information
Contact: Human Resources, Smartwork IT Services