Senior Site Reliability Engineer (SRE) for Atlanta, GA (F2F interview – locals only)

 
 
 

Hi,
Hope you are doing well.

Please find the attached job description. If you are interested, please send me your updated resume or call me back at 512-399-0788.

           

 

Position:             Senior Site Reliability Engineer (SRE)

Location:            Atlanta, GA (F2F interview – locals only)

Duration:            12+ months

 

No H-1B

 

Job Overview:

As a Senior Site Reliability Engineer (SRE) with our Retail Technology team, you will be at the forefront of Cloud and Big Data technology. You'll play a key role in ensuring the reliability and performance of our critical applications and services. This position offers the opportunity to work with industry-leading technologies and establish yourself as a technical leader.

Key Responsibilities:

  • Implement, improve, and maintain monitoring, alerting, and logging solutions to detect and respond to incidents.
  • Collaborate closely with the development team to deploy applications and services, ensuring they meet reliability and performance standards.
  • Automate deployment, configuration management, and troubleshooting processes to streamline operations.
  • Participate in the on-call rotation, triage production incidents, lead Root Cause Analysis (RCA) efforts, and implement preventive actions.
  • Serve as the escalation point for complex issues in both on-premises and AWS environments.

Qualifications:

  • Deep understanding of AWS services (Lambda, S3, SQS, IAM, Route 53, etc.) and proficiency in infrastructure as code (e.g., Terraform, CloudFormation).
  • Hands-on experience with monitoring tools (e.g., CloudWatch, Sumo Logic, Dynatrace, Grafana) for application performance monitoring and alerting.
  • Proficiency in scripting and automation (e.g., Python, Bash) to build and maintain deployment pipelines and infrastructure.
  • Strong analytical and troubleshooting skills to diagnose and resolve complex infrastructure, application, and data issues.
  • Experience with containerization (Docker, Kubernetes) and serverless architecture (AWS Lambda).

Core Responsibilities:

  • Manage and optimize data streaming and API components in on-premises OpenShift and AWS.
  • Proactively review application APIs and processes to identify opportunities for optimizing response times.
  • Automate various types of testing, including data quality checks, delivery to production, and deployment processes.
  • Develop integrations between on-premises applications, AWS, and third-party tools (ServiceNow, VersionOne, Sumo Logic).
  • Collaborate with teams to define SLIs and SLOs.
  • Monitor and lead troubleshooting of performance issues for platform applications, develop solutions, and document artifacts from root cause analysis.
  • Evolve the cloud infrastructure ecosystem by experimenting with emerging technologies.
  • Design and develop CI/CD pipelines for deploying application artifacts, APIs, and Data Process Jobs.
  • Maintain data integrity and access control using AWS security tools (e.g., HSM, IAM).
  • Develop tools to monitor AWS billing, generate cost-related reports, and implement cost optimization strategies.
  • Design and implement data security tools in collaboration with enterprise security architects.
  • Monitor and analyze platform capacity and performance, implementing elastic infrastructure as needed.
  • Contribute to backup strategies and disaster recovery solutions.
  • Provide continuous improvement input on design, performance, and security enhancements.

Desired Skillset:

  • Deep understanding of AWS cloud platforms.
  • Proficiency in automation, scripting, and monitoring using tools such as OpenShift, CloudFormation, Terraform, Ansible, Shell, and Python.
  • Strong technical knowledge of infrastructure layers (Linux OS, virtualization platforms, networking, storage, backup strategies).
  • Experience in end-to-end operations for enterprise systems and applications, including issue resolution for mission-critical systems.
  • Familiarity with CI/CD tools (GitLab, GitHub, Jenkins, Maven, Gradle, Nexus).
  • Experience with Software Release Management.

Required Qualifications:

  • Education: BS in Computer Science or related technical field (or equivalent practical experience).
  • Experience:
    • 3+ years of DevOps/SysOps engineering experience focusing on major cloud platforms (AWS preferred).
    • 2+ years of application development experience, including data streaming and deploying/monitoring high-availability critical application components.
    • 1+ years of experience in Site Reliability Engineering is preferred.

 

 

 

Thanks & Regards,

Nishant Aggarwal || Mob:- 512-399-0788

E-mail 📩 [email protected]

linkedin.com/in/nishant-aggarwal-b7a5b39a

5900 Balcones Drive, Suite #100, Austin, TX 78731

 

 

 

 

 
