EPAM Systems

Lead Operational Intelligence Developer

Job Location

Brazil

Job Description

We are looking for a highly experienced and dynamic Lead Operational Intelligence Developer to join our team. In this role, you will take ownership of leading the development, maintenance, and enhancement of our Elastic & Observability Platform deployed across GCP and Elastic Cloud. You will drive strategic initiatives, guide a high-performing technical team, and ensure platform reliability while fostering innovation and enabling self-service capabilities for platform consumers. This position also involves participating in an on-call rotation to oversee platform health and functionality.

Responsibilities

- Oversee the availability, functionality, performance, and security of observability and search platforms to exceed business SLAs
- Provide technical leadership during complex incidents and escalate resolutions promptly during on-call periods
- Develop and maintain comprehensive platform documentation, standard operating procedures, and knowledge-sharing resources
- Collaborate with cross-functional teams, stakeholders, and vendors to oversee operational requirements, drive strategic initiatives, and manage installations, troubleshooting, and upgrades
- Lead the enhancement of platform features and self-service capabilities, including advanced Elastic Synthetics and chargeback automation
- Architect and implement proof-of-concepts for platform innovation, such as AI-driven observability, advanced data processing models, or Kubernetes-based platform migration
- Supervise the building, deployment, and maintenance of Elastic clusters using Infrastructure-as-Code (IaC) tools like Terraform and Ansible, while mentoring team members on best practices
- Oversee platform lifecycle management activities, including component upgrades, capacity planning, cost optimization, and evolving compliance requirements
- Continuously assess and fine-tune ELK stack performance, including ingestion, indexing, and query optimization for large-scale environments
- Establish and enhance comprehensive alerting and incident management workflows, integrating sophisticated monitoring tools such as Kibana Rules, Watchers, and PagerDuty
- Supervise the ingestion, enrichment, backup, and restoration of large-scale platform data while optimizing data workflows
- Lead and plan critical operational events such as SSL certificate rotations, cluster migrations, or scalability optimization projects

Requirements

- 5 years of experience in Operational Intelligence, with a proven track record of leadership and technical expertise in managing large-scale observability platforms
- Demonstrated ability to architect and manage Elastic clusters in complex, multi-cloud environments
- In-depth knowledge of Elastic Stack components, including advanced configurations of Elasticsearch, Kibana, and Logstash
- Advanced proficiency in Infrastructure-as-Code (IaC) tools like Terraform and Ansible, with demonstrated flexibility in adapting other tools like Jenkins CI or GitOps frameworks
- Advanced Python scripting skills for automation, data processing, and extending platform interoperability
- Deep understanding of incident management frameworks and workflows with tools like PagerDuty, Uptrends, and other enterprise monitoring solutions
- Proven expertise in troubleshooting and resolving complex platform challenges under tight SLAs
- Strong capability in managing and scaling fault-tolerant platforms while ensuring performance, security, and compliance across large distributed systems
- Demonstrated ability to mentor and grow team members, manage priorities, and act as a bridge between technical and non-technical teams
- Excellent command of English (B2 level), both written and spoken, with a strong emphasis on technical communication skills

Nice to have

- Expertise in scripting with Groovy or experience in advanced Linux administration to optimize platform processes
- Track record of optimizing observability workflows with additional integrations or customizations in tools like Uptrends, PagerDuty, or Elastic features
- Hands-on experience with advanced Elastic Synthetics setups for robust monitoring and custom synthetic testing frameworks
- Experience driving strategic initiatives such as modernization through AI tooling, cloud-native transitions, or cost-saving observability optimizations

We offer

- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000 courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn

Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Information Technology, Engineering, and Business Development
Industries: Software Development, IT Services and IT Consulting, and Venture Capital and Private Equity Principals

Location: Brazil, BR

Posted Date: 11/3/2025

Contact Information

Contact Human Resources
EPAM Systems

Posted

November 3, 2025
UID: 5398981305
