Sr. SRE (not DevOps) || Midtown, NY (Hybrid) || No H-1B/OPT
End-client: iCapital || Working with a Prime Vendor
We need a Sr. (10+ years) Site Reliability Engineer/SRE (NOT PURE DEVOPS) with strong AWS experience (certifications preferred).
Candidates must have experience architecting, implementing, and managing monitoring tools such as Prometheus/Grafana, CloudWatch, Splunk, New Relic, and ELK in the cloud.
Candidates also need strong Linux OS-level and command-line/scripting knowledge, a grasp of configuration management principles, and experience with compute provisioning on a cloud-based platform using Terraform and/or CloudFormation.
Job Description:
The Platform Infrastructure team at iCapital plays a critical role in keeping the production and development environments running smoothly and securely. This role will use advanced cloud capabilities to support the Platform Infrastructure strategy of market agility and lean operating principles, with a strict focus on quality, to meet the ever-growing demands of our clients. iCapital is seeking a highly collaborative, creative, and intellectually curious Platform Infrastructure Engineer who is passionate about designing and implementing cutting-edge cloud computing capabilities. This Platform Infrastructure Engineer will wear multiple hats in a highly visible role; interacting with all aspects of the business is essential.
Responsibilities:
Build highly available solutions across the entire SDLC stack with a primary focus on an internet-facing fintech site.
Develop and maintain tools to support the development environment on macOS and Linux, focusing on improving developer productivity.
Maintain site reliability with a focus on building highly scalable systems, integrating resiliency and high availability at all levels.
Develop software and tooling to secure and automate cloud infrastructure, building software delivery capabilities with fully automated workflows.
Design and operate a Kubernetes environment for container management and orchestration.
Participate in on-call rotations to deepen understanding of the system while helping build automation tools.
Qualifications:
10+ years of DevOps, TechOps, or SRE experience with 5+ years of AWS experience
Microservices (Docker, Kubernetes) experience in a production environment strongly desired
Strong Linux OS-level and command-line/scripting knowledge and configuration management principles
Working knowledge of databases such as MongoDB, Postgres, and DynamoDB
Experience in architecting, implementing, and managing monitoring tools such as Prometheus/Grafana, CloudWatch, Splunk, New Relic, and ELK in the cloud
Coding beyond simple scripting with strong opinions on maintainable/reusable code in Python, Ruby, or Java desired
Experience with compute provisioning on a cloud-based platform using Terraform and/or CloudFormation
Experience with distributed systems design, maintenance, and hands-on troubleshooting/debugging skills
Exceptional analytical skills, able to apply knowledge and experience in decision-making to arrive at creative and commercial solutions
Experience building a microservice-based architecture
Excellent written and verbal communication skills
Experience in updating runbooks, tools, and documentation that help the team respond to incidents proactively
Able to design and implement complex, but easily managed, automated infrastructure
A desire to share, teach, and learn as part of a team