Exploro Solutions

Site Reliability Engineer - Prometheus/Grafana

Job Location

Bangalore, India

Job Description

Job Role: Site Reliability Engineer
YOE: 4 to 7 yrs

Key Responsibilities:

Payment Monitoring and Alert Triage:
- Monitor payment-flow alerts across multiple applications in rotational 24x7 shifts and identify issues proactively.
- Triage alerts by analysing trends across the affected dimensions of the payment flow, correlate them with other services' metrics, logs, and traces to find the root cause, and document the triage.
- Ensure timely escalation and closure of reported issues while working with the engineering teams of payment services.

Observability Development:
- Design and implement alerting frameworks using tools like Datadog, Grafana, Kibana, Splunk, and Prometheus.
- Set up custom dashboards and streamline alerting to reduce noise while ensuring critical issues are addressed.
- Drive the adoption of SLO-based alerting, burn rate metrics, and anomaly detection techniques.

Incident Management:
- Lead incident management efforts from identification to resolution.
- Conduct post-incident reviews and implement preventive measures to avoid recurring issues.
- Maintain detailed documentation and performance reports on incident trends and team efficiency.

Automation and Optimization:
- Automate repetitive processes using programming languages like Python or Java.
- Develop and refine scripts to manage and fine-tune alerts.
- Collaborate with engineering teams to implement solutions that reduce manual effort and operational toil.

Required Skills and Qualifications:
- Proven expertise in SRE observability concepts and monitoring architecture design.
- Extensive experience with alerting frameworks like Prometheus, Grafana, Kibana, Splunk, and Datadog.
- Hands-on experience with alert noise reduction and advanced alerting techniques such as anomaly detection and burn rate alerting.
- Strong proficiency in incident management, including analysis, root cause identification, and preventive measures.
- Familiarity with payment monitoring systems and operational requirements.
- Proficiency in automation tools and scripting languages like Python or Java.
- Excellent collaboration and communication skills to interact with cross-functional teams.
- Flexibility to work in rotational 24x7 shifts from the office.

Notice Period: Immediate to 20 days (ref:hirist.tech)

Location: Bangalore, IN

Posted Date: 5/1/2025

Contact Information

Contact Human Resources
Exploro Solutions

Posted

May 1, 2025
UID: 5114729661
