Hi,
Hope you are doing well.
I have a position with one of our clients. Below is the job description; let me know if you are interested.
Job Title: Hadoop Platform Engineer
Location: Dallas, TX (Onsite)
End Client: (Working with Implementation Partner)
Role: Hadoop Administrator or Hadoop Developer
Local to Texas only
Work Authorization: USC, GC, GC-EAD, H4-EAD
Job Description:
A banking background is preferred.
Required Skills:
Platform Engineering:
Cluster Management:
- Expertise in designing, implementing, and maintaining large-scale Hadoop clusters, including components such as HDFS, YARN, and MapReduce.
- Collaborate with data engineers and data scientists to understand data requirements and optimize data pipelines.
Administration and Monitoring:
- Experience in administering and monitoring Hadoop clusters to ensure high availability, reliability, and performance.
- Experience in troubleshooting and resolving issues related to Hadoop infrastructure, data ingestion, data processing, and data storage.
Security Implementation:
- Experience in implementing and managing security measures within Hadoop clusters, including authentication, authorization, and encryption.
Backup and Disaster Recovery:
- Collaborate with cross-functional teams to define and implement backup and disaster recovery strategies for Hadoop clusters.
Performance Optimization:
- Experience in optimizing Hadoop performance through fine-tuning configurations, capacity planning, and implementing performance monitoring and tuning techniques.
Automation and DevOps Collaboration:
- Work with DevOps teams to automate Hadoop infrastructure provisioning, deployment, and management processes.
Technology Adoption and Recommendations:
- Stay up to date with the latest developments in the Hadoop ecosystem.
- Recommend and implement new technologies and tools that enhance the platform.
Documentation:
- Experience in documenting Hadoop infrastructure configurations, processes, and best practices.
Technical Support and Guidance:
- Provide technical guidance and support to other team members and stakeholders.
Admin (Self-Service Capabilities):
User Interface Design:
- Relevant for designing interfaces for tools within the Hadoop ecosystem that provide self-service capabilities, such as Hadoop cluster management interfaces or job scheduling dashboards.
Role-Based Access Control (RBAC):
- Important for controlling access to Hadoop clusters, ensuring that users have appropriate permissions to perform self-service tasks.
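Purely as an illustration (not part of the job description), the role-based access control described above can be sketched as a minimal permission lookup. The role names and actions below are hypothetical, not taken from any real Hadoop tool:

```python
# Minimal RBAC sketch for a hypothetical Hadoop self-service portal.
# Role names and actions are illustrative assumptions, not a real API.
ROLE_PERMISSIONS = {
    "cluster_admin": {"restart_service", "modify_config", "submit_job", "read_metrics"},
    "data_engineer": {"submit_job", "read_metrics"},
    "analyst": {"read_metrics"},
}

def is_allowed(role: str, action: str) -> bool:
    """Return True if the given role may perform the action."""
    return action in ROLE_PERMISSIONS.get(role, set())
```

In practice, Hadoop deployments implement this kind of policy through tools such as Apache Ranger rather than hand-rolled code; the sketch only shows the access-check idea.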
Cluster Configuration Templates:
- Useful for maintaining consistent configurations across Hadoop clusters, ensuring that users follow best practices and guidelines.
Resource Management:
- Important for optimizing resource utilization within Hadoop clusters, allowing users to manage resources dynamically based on their needs.
Self-Service Provisioning:
- Pertinent for features that enable users to provision and manage nodes within Hadoop clusters independently.
Monitoring and Alerts:
- Essential for monitoring the health and performance of Hadoop clusters, providing users with insights into their cluster's status.
Automated Scaling:
- Relevant for automatically adjusting the size of Hadoop clusters based on workload demands.
Job Scheduling and Prioritization:
- Important for managing data processing jobs within Hadoop clusters efficiently.
Self-Service Data Ingestion:
- Applicable to features that facilitate users in ingesting data into Hadoop clusters independently.
Query Optimization and Tuning Assistance:
- Relevant for providing users with tools or guidance to optimize and tune their queries when interacting with Hadoop-based data.
Documentation and Training:
- Important for creating resources that help users understand how to use self-service features within the Hadoop ecosystem effectively.
Data Access Control:
- Pertinent for controlling access to data stored within Hadoop clusters, ensuring proper data governance.
Backup and Restore Functionality:
- Applicable to features that allow users to perform backup and restore operations for data stored within Hadoop clusters.
Containerization and Orchestration:
- Relevant for deploying and managing applications within Hadoop clusters using containerization and orchestration tools.
User Feedback Mechanism:
- Important for continuously improving self-service features based on user input and experience within the Hadoop ecosystem.
Cost Monitoring and Optimization:
- Applicable to tools or features that help users monitor and optimize costs associated with their usage of Hadoop clusters.
Compliance and Auditing:
- Relevant for ensuring compliance with organizational policies and auditing user activities within the Hadoop ecosystem.
Data Engineering:
ETL (Extract, Transform, Load) Processes:
- Proficiency in designing and implementing ETL processes for ingesting, transforming, and loading data into Hadoop clusters.
- Experience with tools like Apache NiFi.
Data Modeling and Database Design:
- Understanding of data modeling principles and database design concepts.
- Ability to design and implement effective data storage structures in Hadoop.
SQL and Query Optimization:
- Strong SQL skills for data extraction and analysis from Hadoop-based data stores.
- Experience in optimizing SQL queries for efficient data retrieval.
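As a generic illustration of the query-tuning skill above (using SQLite for a self-contained demo; Hive/Impala tuning differs in detail), adding an index changes a query plan from a full scan to an index lookup:

```python
import sqlite3

# Illustrative only: show how an index changes a query plan.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, ts TEXT, payload TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)",
                 [(i % 100, f"2024-01-{i % 28 + 1:02d}", "x") for i in range(1000)])

def plan(sql: str) -> str:
    """Return the EXPLAIN QUERY PLAN detail text for a statement."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

q = "SELECT COUNT(*) FROM events WHERE user_id = 42"
before = plan(q)  # full table scan
conn.execute("CREATE INDEX idx_events_user ON events(user_id)")
after = plan(q)   # index lookup
```

The same principle carries over to Hadoop SQL engines, where partitioning, file formats, and statistics play the role that indexes play here.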
Streaming Data Processing:
- Familiarity with real-time data processing and streaming technologies, such as Apache Kafka and Spark Streaming.
- Experience in designing and implementing streaming data pipelines.
Data Quality and Governance:
- Knowledge of data quality assurance and governance practices.
- Implementing measures to ensure data accuracy, consistency, and integrity.
Workflow Orchestration:
- Experience with workflow orchestration tools (e.g., Apache Airflow) to manage and schedule data processing workflows.
- Automating and orchestrating data pipelines.
Data Warehousing Concepts:
- Understanding of data warehousing concepts and best practices.
- Integrating Hadoop-based solutions with traditional data warehousing systems.
Version Control:
- Proficiency in version control systems (e.g., Git) for managing and tracking changes in code and configurations.
Collaboration with Data Scientists:
- Collaborate effectively with data scientists to understand analytical requirements and support the deployment of machine learning models.
Data Security and Compliance:
- Implementing security measures within data pipelines to protect sensitive information.
- Ensuring compliance with data security and privacy regulations.
Data Catalog and Metadata Management:
- Implementing data catalog solutions to manage metadata and enhance data discovery.
- Enabling metadata-driven data governance.
Big Data Technologies Beyond Hadoop:
- Familiarity with other big data technologies beyond Hadoop, such as Apache Flink or Apache Beam.
Data Transformation and Serialization:
- Expertise in data serialization formats (e.g., Avro, Parquet) and transforming data between formats.
Data Storage Optimization:
- Optimizing data storage strategies for cost-effectiveness and performance.
Desired Skills:
Problem-Solving and Analytical Thinking:
- Strong analytical and problem-solving skills to troubleshoot complex issues in Hadoop clusters.
- Ability to analyze data requirements and optimize data processing workflows.
Collaboration and Teamwork:
- Collaborative mindset to work effectively with cross-functional teams, including data engineers, data scientists, and DevOps teams.
- Ability to provide technical guidance and support to team members.
Adaptability and Continuous Learning:
- Ability to adapt to changes in technology and industry trends within the Hadoop ecosystem and willingness to continuously learn and upgrade skills to stay current.
Performance Monitoring and Tuning:
- Proactive approach to performance monitoring and tuning, ensuring optimal performance of Hadoop clusters.
- Ability to analyze and address performance bottlenecks.
Security Best Practices:
- Knowledge of security best practices within the Hadoop ecosystem.
Capacity Planning:
- Skill in capacity planning to anticipate and scale Hadoop clusters according to data processing needs.
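As an illustration of the capacity-planning arithmetic involved (the replication factor, headroom, and per-node capacity below are assumed example values, not figures from the job description):

```python
import math

def nodes_needed(raw_data_tb: float,
                 replication: int = 3,
                 overhead_factor: float = 1.25,
                 usable_tb_per_node: float = 40.0) -> int:
    """Estimate the DataNode count for a given raw data volume.

    Assumptions (hypothetical): HDFS replication factor of 3, 25% headroom
    for intermediate/temp data, and 40 TB of usable disk per node. Real
    capacity planning should use your cluster's actual figures.
    """
    required_tb = raw_data_tb * replication * overhead_factor
    return math.ceil(required_tb / usable_tb_per_node)
```

For example, 500 TB of raw data at these assumptions requires 500 × 3 × 1.25 = 1875 TB of disk, or 47 nodes.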
Automation and Scripting:
- Strong scripting skills for automation (e.g., Python, Ansible) beyond shell scripting. Familiarity with configuration management tools for infrastructure automation.
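A typical Python automation task of the kind described above is parsing `hdfs dfsadmin -report`-style output. The sample text below mimics the report's summary lines; exact wording can vary by Hadoop version, so treat the field names as assumptions:

```python
import re

# Hedged sketch: parse capacity figures from dfsadmin-report-style text.
# The sample mimics typical output; real reports vary by Hadoop version.
SAMPLE_REPORT = """\
Configured Capacity: 1099511627776 (1 TB)
DFS Used: 879609302221 (819.2 GB)
DFS Remaining: 219902325555 (204.8 GB)
"""

def parse_report(text: str) -> dict:
    """Extract byte counts from each 'Field: <bytes> (...)' line."""
    fields = {}
    for m in re.finditer(r"^([A-Za-z ]+):\s+(\d+)", text, re.MULTILINE):
        fields[m.group(1)] = int(m.group(2))
    return fields

def used_percent(fields: dict) -> float:
    """Percentage of configured capacity currently used."""
    return 100.0 * fields["DFS Used"] / fields["Configured Capacity"]
```

In a real automation job, the text would come from running the command (e.g., via `subprocess`) and the result would feed an alert threshold or a dashboard.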
Monitoring and Observability:
- Experience in setting up comprehensive monitoring and observability tools for Hadoop clusters. Ability to proactively identify and address potential issues.
Networking Skills:
- Understanding of networking concepts relevant to Hadoop clusters.
Skills:
Technical Proficiency:
- Experience with Hadoop and Big Data technologies, including Cloudera CDH/CDP, Databricks, HDInsight, etc.
- Strong understanding of core Hadoop services such as HDFS, MapReduce, Kafka, Spark, Hive, Impala, HBase, Kudu, Sqoop, and Oozie.
- Proficiency in RHEL Linux operating systems, databases, and hardware administration.
Operations and Design:
- Operations, design, capacity planning, cluster setup, security, and performance tuning in large-scale Enterprise Hadoop environments.
Scripting and Automation:
- Proficient in shell scripting (e.g., Bash, KSH) for automation.
Security Implementation:
- Experience in setting up, configuring, and managing security for Hadoop clusters using Kerberos with integration with LDAP/AD.
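For context only (not part of the job description), enabling Kerberos on a Hadoop cluster typically starts with `core-site.xml` settings like the following sketch; principals, keytabs, and realm configuration vary per site and distribution:

```xml
<!-- core-site.xml fragment: a hedged sketch assuming a standard
     Kerberos-enabled Hadoop deployment. Realm, principals, and keytab
     locations are site-specific and omitted here. -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
```

LDAP/AD integration is usually layered on top of this, e.g. via SSSD or group-mapping configuration, rather than in this file alone.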
Problem Solving and Troubleshooting:
- Expertise in system administration and programming skills for storage capacity management, debugging, and performance tuning.
Collaboration and Communication:
- Collaborate with cross-functional teams, including data engineers, data scientists, and DevOps teams.
- Provide technical guidance and support to team members and stakeholders.
Additional Skills:
- Experience with on-prem instances
- Hadoop configuration, performance, and tuning
- Ability to manage very large clusters and understand scalability
- Interfacing with multiple teams
- Many teams have self-service capabilities, so the candidate should have experience managing this across large clusters with multiple teams
- Hands-on experience with, and a strong understanding of, Hadoop architecture
- Experience with Hadoop ecosystem components – HDFS, YARN, MapReduce & cluster management tools like Ambari or Cloudera Manager and related technologies.
- Proficiency in scripting, Linux system administration, networking, and troubleshooting skills