Role :- Site Reliability Engineer (SRE) Lead
Location :- Houston, TX (Onsite)
Primary skillset – New Relic developer, Automation/Programming (Python or Java), Linux
Job Summary:
We seek an experienced SRE Lead to guide our team in ensuring system reliability, performance, and scalability. The candidate will drive infrastructure automation, optimize performance, and lead incident management while fostering a culture of continuous improvement.
Key Responsibilities:
Technical Leadership: Build and mentor a team of SREs; set goals, conduct reviews, and drive SRE best practices.
System Reliability: Oversee the design and maintenance of high-availability systems; lead performance monitoring and issue resolution.
Automation & CI/CD: Lead the development of automation scripts and enhance CI/CD pipelines using tools such as Terraform and Ansible.
Observability: Deploy and manage tools (e.g., New Relic) for system monitoring; develop dashboards and alerts.
Incident Management: Lead Root Cause Analysis (RCA) and refine incident response processes.
Performance Optimization: Provide strategic insights to enhance application and database performance (Java, Kafka, SQL).
Qualifications:
Proven experience managing SRE or related teams in an eCommerce or highly distributed systems environment.
Strong skills in automation tools (Terraform, Ansible) and observability solutions (New Relic), with an emphasis on managing large-scale distributed systems.
Experience working with SAP modules in conjunction with custom applications or microservices architectures.
Good understanding of storage technologies (SAN/NAS), network infrastructure (load balancers, firewalls), and their impact on system performance in high-throughput environments.
Background in optimizing performance for Java-based applications, Spring Boot services, Kafka message brokers, SQL/NoSQL databases, and middleware components.
Familiarity with middleware technologies such as Kafka in distributed environments.
Excellent leadership, problem-solving, and communication skills, with experience working cross-functionally with development teams, infrastructure teams, and business stakeholders.