Hadoop Platform Engineer | Dallas, TX | (Onsite)

Hi,

Hope you are doing well.

I have a position with one of our clients. The job description is below; let me know if you are interested.

Job title: Hadoop Platform Engineer

Location: Dallas, TX (Onsite)

End Client: (Working with Implementation Partner)

Candidates must be local to Dallas, TX.

Job Description:

A banking background is preferred.

 Required Skills: 

Platform Engineering: 

Cluster Management: 

  • Expertise in designing, implementing, and maintaining large-scale Hadoop clusters, including components such as HDFS, YARN, and MapReduce. 
  • Collaborate with data engineers and data scientists to understand data requirements and optimize data pipelines. 

Administration and Monitoring: 

  • Experience in administering and monitoring Hadoop clusters to ensure high availability, reliability, and performance. 
  • Experience in troubleshooting and resolving issues related to Hadoop infrastructure, data ingestion, data processing, and data storage. 
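
As a purely illustrative flavor of the monitoring described above, the sketch below polls the standard YARN ResourceManager REST endpoint (/ws/v1/cluster/metrics) and flags lost or unhealthy NodeManagers. The hostname is a placeholder, not the client's environment.

    import requests

    # Hypothetical ResourceManager address; replace with your cluster's RM host.
    RM_URL = "http://resourcemanager.example.com:8088"

    def check_cluster_health():
        # /ws/v1/cluster/metrics is the standard YARN RM REST metrics endpoint.
        resp = requests.get(f"{RM_URL}/ws/v1/cluster/metrics", timeout=10)
        metrics = resp.json()["clusterMetrics"]
        if metrics["unhealthyNodes"] or metrics["lostNodes"]:
            print(f"ALERT: {metrics['unhealthyNodes']} unhealthy, "
                  f"{metrics['lostNodes']} lost NodeManagers")
        print(f"Active nodes: {metrics['activeNodes']}, "
              f"apps running: {metrics['appsRunning']}")

    if __name__ == "__main__":
        check_cluster_health()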

Security Implementation: 

  • Experience in implementing and managing security measures within Hadoop clusters, including authentication, authorization, and encryption. 

Backup and Disaster Recovery: 

  • Collaborate with cross-functional teams to define and implement backup and disaster recovery strategies for Hadoop clusters. 

Performance Optimization: 

  • Experience in optimizing Hadoop performance through fine-tuning configurations, capacity planning, and implementing performance monitoring and tuning techniques. 

Automation and DevOps Collaboration: 

  • Work with DevOps teams to automate Hadoop infrastructure provisioning, deployment, and management processes. 

Technology Adoption and Recommendations: 

  • Stay up to date with the latest developments in the Hadoop ecosystem. 
  • Recommend and implement new technologies and tools that enhance the platform. 

Documentation: 

  • Experience in documenting Hadoop infrastructure configurations, processes, and best practices. 

Technical Support and Guidance: 

  • Provide technical guidance and support to other team members and stakeholders. 

Administration (Self-Service Capabilities): 

User Interface Design: 

  • Relevant for designing interfaces for tools within the Hadoop ecosystem that provide self-service capabilities, such as Hadoop cluster management interfaces or job scheduling dashboards. 

Role-Based Access Control (RBAC): 

  • Important for controlling access to Hadoop clusters, ensuring that users have appropriate permissions to perform self-service tasks. 

Cluster Configuration Templates: 

  • Useful for maintaining consistent configurations across Hadoop clusters, ensuring that users follow best practices and guidelines. 

Resource Management: 

  • Important for optimizing resource utilization within Hadoop clusters, allowing users to manage resources dynamically based on their needs. 

Self-Service Provisioning: 

  • Pertinent for features that enable users to provision and manage nodes within Hadoop clusters independently. 

Monitoring and Alerts: 

  • Essential for monitoring the health and performance of Hadoop clusters, providing users with insights into their cluster's status. 

Automated Scaling: 

  • Relevant for automatically adjusting the size of Hadoop clusters based on workload demands. 

Job Scheduling and Prioritization: 

  • Important for managing data processing jobs within Hadoop clusters efficiently. 

Self-Service Data Ingestion: 

  • Applicable to features that enable users to ingest data into Hadoop clusters independently. 

Query Optimization and Tuning Assistance: 

  • Relevant for providing users with tools or guidance to optimize and tune their queries when interacting with Hadoop-based data. 

Documentation and Training: 

  • Important for creating resources that help users understand how to use self-service features within the Hadoop ecosystem effectively. 

Data Access Control: 

  • Pertinent for controlling access to data stored within Hadoop clusters, ensuring proper data governance. 

Backup and Restore Functionality: 

  • Applicable to features that allow users to perform backup and restore operations for data stored within Hadoop clusters. 

Containerization and Orchestration: 

  • Relevant for deploying and managing applications within Hadoop clusters using containerization and orchestration tools. 

User Feedback Mechanism: 

  • Important for continuously improving self-service features based on user input and experience within the Hadoop ecosystem. 

Cost Monitoring and Optimization: 

  • Applicable to tools or features that help users monitor and optimize costs associated with their usage of Hadoop clusters. 

Compliance and Auditing: 

  • Relevant for ensuring compliance with organizational policies and auditing user activities within the Hadoop ecosystem. 

 Data Engineering: 

ETL (Extract, Transform, Load) Processes: 

  • Proficiency in designing and implementing ETL processes for ingesting, transforming, and loading data into Hadoop clusters. 
  • Experience with tools like Apache NiFi. 
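
To make the ETL expectation concrete, here is a minimal sketch of an extract-transform-load step using pandas; the paths and column names are hypothetical, and at production scale a tool such as NiFi or Spark would do this work.

    import pandas as pd

    # Hypothetical paths and columns, purely for illustration.
    SOURCE_CSV = "/data/incoming/transactions.csv"
    TARGET_PARQUET = "/data/staging/transactions.parquet"

    def run_etl():
        df = pd.read_csv(SOURCE_CSV)                   # extract
        df = df.dropna(subset=["account_id"])          # transform: drop incomplete rows
        df["amount"] = df["amount"].astype("float64")  # normalize types
        df.to_parquet(TARGET_PARQUET, index=False)     # load: columnar format for HDFS/Hive

    run_etl()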

Data Modeling and Database Design: 

  • Understanding of data modeling principles and database design concepts. 
  • Ability to design and implement effective data storage structures in Hadoop. 

SQL and Query Optimization: 

  • Strong SQL skills for data extraction and analysis from Hadoop-based data stores. 
  • Experience in optimizing SQL queries for efficient data retrieval. 
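
As a small illustration of query tuning against Hive, the sketch below inspects the plan of a partition-pruned query through PyHive; the host, table, and partition column are assumed names, not the client's schema.

    from pyhive import hive

    # Hypothetical HiveServer2 endpoint and table.
    conn = hive.Connection(host="hiveserver.example.com", port=10000, username="analyst")
    cur = conn.cursor()

    # Filtering on the partition column (event_date) lets Hive prune partitions
    # instead of scanning the full table; EXPLAIN shows the resulting plan.
    cur.execute("""
        EXPLAIN
        SELECT account_id, SUM(amount) AS total
        FROM transactions
        WHERE event_date = '2024-01-01'
        GROUP BY account_id
    """)
    for (line,) in cur.fetchall():
        print(line)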

Streaming Data Processing: 

  • Familiarity with real-time data processing and streaming technologies, such as Apache Kafka and Spark Streaming. 
  • Experience in designing and implementing streaming data pipelines. 
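
One possible shape of such a pipeline, sketched with Spark Structured Streaming reading from Kafka (the broker, topic, and paths are placeholders, and the job assumes the spark-sql-kafka package is available):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("kafka-ingest-sketch").getOrCreate()

    # Hypothetical broker and topic names.
    stream = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")
              .option("subscribe", "events")
              .load())

    # Kafka delivers values as bytes; cast to string before downstream parsing.
    events = stream.select(col("value").cast("string").alias("payload"))

    query = (events.writeStream.format("parquet")
             .option("path", "/data/streams/events")
             .option("checkpointLocation", "/data/checkpoints/events")
             .start())
    query.awaitTermination()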

Data Quality and Governance: 

  • Knowledge of data quality assurance and governance practices. 
  • Implementing measures to ensure data accuracy, consistency, and integrity. 

Workflow Orchestration: 

  • Experience with workflow orchestration tools (e.g., Apache Airflow) to manage and schedule data processing workflows. 
  • Automating and orchestrating data pipelines. 
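
For illustration, a minimal Airflow DAG of the kind this implies might look like the sketch below; the task commands and schedule are hypothetical (Airflow 2.x API assumed).

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # Hypothetical daily pipeline: ingest raw files, then transform them.
    with DAG(dag_id="daily_ingest_sketch",
             start_date=datetime(2024, 1, 1),
             schedule="@daily",
             catchup=False) as dag:
        ingest = BashOperator(task_id="ingest",
                              bash_command="hdfs dfs -put /data/incoming/*.csv /raw/")
        transform = BashOperator(task_id="transform",
                                 bash_command="spark-submit /jobs/transform.py")
        ingest >> transform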

Data Warehousing Concepts: 

  • Understanding of data warehousing concepts and best practices. 
  • Integrating Hadoop-based solutions with traditional data warehousing systems. 

Version Control: 

  • Proficiency in version control systems (e.g., Git) for managing and tracking changes in code and configurations. 

Collaboration with Data Scientists: 

  • Collaborate effectively with data scientists to understand analytical requirements and support the deployment of machine learning models. 

Data Security and Compliance: 

  • Implementing security measures within data pipelines to protect sensitive information. 
  • Ensuring compliance with data security and privacy regulations. 

Data Catalog and Metadata Management: 

  • Implementing data catalog solutions to manage metadata and enhance data discovery. 
  • Enabling metadata-driven data governance. 

Big Data Technologies Beyond Hadoop: 

  • Familiarity with other big data technologies beyond Hadoop, such as Apache Flink or Apache Beam. 

Data Transformation and Serialization: 

  • Expertise in data serialization formats (e.g., Avro, Parquet) and transforming data between formats. 
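
As a small example of format conversion, the sketch below reads Avro records with the fastavro library and rewrites them as Parquet via pandas; the file paths are placeholders.

    import pandas as pd
    from fastavro import reader

    # Hypothetical input/output paths.
    with open("/data/raw/events.avro", "rb") as fo:
        records = list(reader(fo))  # fastavro yields each Avro record as a dict

    df = pd.DataFrame.from_records(records)
    df.to_parquet("/data/curated/events.parquet", index=False)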

Data Storage Optimization: 

  • Optimizing data storage strategies for cost-effectiveness and performance. 

 Desired Skills: 

Problem-Solving and Analytical Thinking: 

  • Strong analytical and problem-solving skills to troubleshoot complex issues in Hadoop clusters. 
  • Ability to analyze data requirements and optimize data processing workflows. 

Collaboration and Teamwork: 

  • Collaborative mindset to work effectively with cross-functional teams, including data engineers, data scientists, and DevOps teams. 
  • Ability to provide technical guidance and support to team members. 

Adaptability and Continuous Learning: 

  • Ability to adapt to changes in technology and industry trends within the Hadoop ecosystem and willingness to continuously learn and upgrade skills to stay current. 

Performance Monitoring and Tuning: 

  • Proactive approach to performance monitoring and tuning, ensuring optimal performance of Hadoop clusters. 
  • Ability to analyze and address performance bottlenecks. 

Security Best Practices: 

  • Knowledge of security best practices within the Hadoop ecosystem. 

Capacity Planning: 

  • Skill in capacity planning to anticipate and scale Hadoop clusters according to data processing needs. 

Automation and Scripting: 

  • Strong scripting skills for automation (e.g., Python, Ansible) beyond shell scripting. Familiarity with configuration management tools for infrastructure automation. 
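
As a hedged example of scripting beyond the shell, the sketch below pushes a configuration file to a list of cluster nodes over SSH with Python's standard library; the hostnames, paths, and service name are hypothetical, and Ansible would normally express this declaratively.

    import subprocess

    # Hypothetical node list, config file, and service name.
    NODES = ["datanode01", "datanode02", "datanode03"]
    CONFIG = "/etc/hadoop/conf/hdfs-site.xml"

    def push_config():
        for node in NODES:
            # Copy the file, then restart the DataNode so it picks up the change.
            subprocess.run(["scp", CONFIG, f"{node}:{CONFIG}"], check=True)
            subprocess.run(["ssh", node, "sudo systemctl restart hadoop-hdfs-datanode"],
                           check=True)
            print(f"updated {node}")

    if __name__ == "__main__":
        push_config()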

Monitoring and Observability: 

  • Experience in setting up comprehensive monitoring and observability tools for Hadoop clusters. Ability to proactively identify and address potential issues. 

Networking Skills: 

  • Understanding of networking concepts relevant to Hadoop clusters. 

Technical Skills: 

Technical Proficiency: 

  • Experience with Hadoop and Big Data technologies, including Cloudera CDH/CDP, Databricks, HDInsight, etc. 
  • Strong understanding of core Hadoop services such as HDFS, MapReduce, Kafka, Spark, Hive, Impala, HBase, Kudu, Sqoop, and Oozie. 
  • Proficiency in RHEL Linux operating systems, databases, and hardware administration. 

Operations and Design: 

  • Operations, design, capacity planning, cluster setup, security, and performance tuning in large-scale Enterprise Hadoop environments. 

Scripting and Automation: 

  • Proficient in shell scripting (e.g., Bash, KSH) for automation. 

Security Implementation: 

  • Experience in setting up, configuring, and managing security for Hadoop clusters using Kerberos with integration with LDAP/AD. 
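
For context, a kerberized workflow typically looks like the sketch below: obtain a ticket non-interactively from a keytab, then run Hadoop CLI commands as that principal. The principal and keytab path are placeholders; LDAP/AD integration itself lives in KDC and cluster configuration rather than in code.

    import subprocess

    # Hypothetical service principal and keytab.
    PRINCIPAL = "hdfs-admin@EXAMPLE.COM"
    KEYTAB = "/etc/security/keytabs/hdfs-admin.keytab"

    def kerberized_hdfs_ls(path="/user"):
        # kinit -kt obtains a ticket from the keytab without prompting.
        subprocess.run(["kinit", "-kt", KEYTAB, PRINCIPAL], check=True)
        # Subsequent Hadoop CLI calls authenticate via the ticket cache.
        subprocess.run(["hdfs", "dfs", "-ls", path], check=True)

    kerberized_hdfs_ls()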

Problem Solving and Troubleshooting: 

  • Expertise in system administration and programming skills for storage capacity management, debugging, and performance tuning. 

Collaboration and Communication: 

  • Collaborate with cross-functional teams, including data engineers, data scientists, and DevOps teams. 
  • Provide technical guidance and support to team members and stakeholders. 

Additional Skills: 

  • Experience with on-premises Hadoop deployments 
  • Hadoop configuration, performance, and tuning 
  • Ability to manage very large clusters and understand scalability 
  • Interfacing with multiple teams 
  • Many teams have self-service capabilities, so experience managing self-service across multiple teams and large clusters is required. 
  • Hands-on experience with, and a strong understanding of, Hadoop architecture 
  • Experience with Hadoop ecosystem components (HDFS, YARN, MapReduce), cluster management tools such as Ambari or Cloudera Manager, and related technologies 
  • Proficiency in scripting, Linux system administration, networking, and troubleshooting 
 
 
