Site Reliability Engineer (SRE) – Dallas, TX / Plano, TX (Onsite)

Post author:recruitmatlog
Post published:January 21, 2025

Greetings,

We have an open requirement that can be filled immediately. To speed up the interview process, please reply with your resume, contact details, and current location.

Role: Site Reliability Engineer (SRE)

Location: Dallas, TX / Plano, TX (Onsite)

Duration: 12-month contract

Client: TCS

Job Description:

This SRE role will primarily involve learning GPU clusters, assisting in bringing up these systems, and developing automation to keep them operational, as well as working with various other DC GPU teams to incorporate requirements and address any issues on the systems.

Specific responsibilities:

o Working with the platform engineering team to develop and automate management of an infrastructure control plane and deployment system for GPU clusters.
o Working with the release engineering team to automate the application of updates and system configuration management tools.
o Resolving problem tickets reported by internal and external customers for GPU cluster systems.
o Developing and enhancing internal and 3rd party network and cluster management tools, applications, and processes that enable internal teams and clusters to build, test, and optimize high-performance networks supporting large-scale GPU clusters.
o Assisting in developing the software ecosystem needed for at-scale cluster operations, providing cluster as a service for internal and customer access systems. This responsibility includes some involvement with rack-and-stack data center operations, at-scale software install and configuration management, and at-scale system provisioning, helping to build and operate an on-prem cloud service for internal stakeholders that forms a model for customer adoption.
o Helping to create an enterprise-class operational model for internal cluster systems that provides reliable, secure, automated infrastructure for rapid response to changing requirements, efficient use of assets, and a reference template for customer adoption.
o Participating in a strong customer-centric culture focused on meeting commitments.

Key Skills: Site Reliability Engineering, Python, Admin, Puppet, Chef, KVM, Docker, Podman, OpenShift, Kubernetes, Git, etc.

Education: A bachelor’s degree or higher (or equivalent experience) in Computer Science, Software/Electronics Engineering, Information Systems, or a closely related field is required.

Thanks & Regards,

Pavan Yantrapati,

PH: +1 904-242-1993 Ext 352

Email: [email protected]
