Hi, I hope you are doing well.
Please find the job description below. If you are interested, please send me your updated resume or call me back at 512-399-0788.
Position: Senior Site Reliability Engineer (SRE)
Location: Atlanta, GA (F2F interview – locals only)
Duration: 12+ months
No H1B
Job Overview:
As a Senior Site Reliability Engineer (SRE) with our Retail Technology team, you will be at the forefront of Cloud and Big Data technology. You'll play a key role in ensuring the reliability and performance of our critical applications and services. This position offers the opportunity to work with industry-leading technologies and establish yourself as a technical leader.
Key Responsibilities:
- Implement, improve, and maintain monitoring, alerting, and logging solutions to detect and respond to incidents.
- Collaborate closely with the development team to deploy applications and services, ensuring they meet reliability and performance standards.
- Automate deployment, configuration management, and troubleshooting processes to streamline operations.
- Participate in on-call rotation, triage production incidents, lead Root Cause Analysis (RCA) efforts, and implement preventive actions.
- Serve as the escalation point for complex issues in both on-premises and AWS environments.
Qualifications:
- Deep understanding of AWS services (Lambda, S3, SQS, IAM, Route 53, etc.) and proficiency in infrastructure as code (e.g., Terraform, CloudFormation).
- Hands-on experience with monitoring tools (e.g., CloudWatch, Sumo Logic, Dynatrace, Grafana) for application performance monitoring and alerting.
- Proficiency in scripting and automation (e.g., Python, Bash) to build and maintain deployment pipelines and infrastructure.
- Strong analytical and troubleshooting skills to diagnose and resolve complex infrastructure, application, and data issues.
- Experience with containerization (Docker, Kubernetes) and serverless architecture (AWS Lambda).
Core Responsibilities:
- Manage and optimize data streaming and API components in on-premises OpenShift and AWS.
- Proactively review application APIs and processes to identify opportunities for optimizing response times.
- Automate various types of testing, including data quality checks, delivery to production, and deployment processes.
- Develop integrations between on-premises applications, AWS, and third-party tools (ServiceNow, VersionOne, Sumo Logic).
- Collaborate with teams to define SLIs and SLOs.
- Monitor and lead troubleshooting of performance issues for platform applications, develop solutions, and document artifacts from root cause analysis.
- Evolve the cloud infrastructure ecosystem by experimenting with emerging technologies.
- Design and develop CI/CD pipelines for deploying application artifacts, APIs, and data processing jobs.
- Maintain data integrity and access control using AWS security tools (e.g., HSM, IAM).
- Develop tools to monitor AWS billing, generate cost-related reports, and implement cost optimization strategies.
- Design and implement data security tools in collaboration with enterprise security architects.
- Monitor and analyze platform capacity and performance, implementing elastic infrastructure as needed.
- Contribute to backup strategies and disaster recovery solutions.
- Provide continuous improvement input on design, performance, and security enhancements.
Desired Skillset:
- Deep understanding of AWS cloud platforms.
- Proficiency in automation, scripting, and monitoring using tools like OpenShift, CloudFormation, Terraform, Ansible, Shell, Python.
- Strong technical knowledge of infrastructure layers (Linux OS, virtualization platforms, networking, storage, backup strategies).
- Experience in end-to-end operations for enterprise systems and applications, including issue resolution for mission-critical systems.
- Familiarity with CI/CD tools (GitLab, GitHub, Jenkins, Maven, Gradle, Nexus).
- Experience with Software Release Management.
Required Qualifications:
- Education: BS in Computer Science or related technical field (or equivalent practical experience).
- Experience:
  - 3+ years of DevOps/SysOps engineering experience focusing on major cloud platforms (AWS preferred).
  - 2+ years of application development experience, including data streaming and deploying/monitoring high-availability critical application components.
  - 1+ years of experience in Site Reliability Engineering is preferred.
Thanks & Regards,
Nishant Aggarwal | Mobile: 512-399-0788
E-mail 📩 [email protected]
linkedin.com/in/nishant-aggarwal-b7a5b39a
5900 Balcones Drive, Suite #100, Austin, TX 78731