Data Engineer / Big Data Engineer

Post author: recruitmatlog
Post published: January 27, 2025
Post category: JOBS

Role: Data Engineer / Big Data Engineer
Location: Remote
Client: HCL
Job description:
Job Overview:
We’re seeking a highly skilled Data Engineer / Big Data Engineer to build scalable data pipelines, develop ML models, and integrate big data systems. You’ll work with structured, semi-structured, and unstructured data, focusing on optimizing data systems, building ETL pipelines, and deploying AI models in cloud environments.
Key Responsibilities:
Data Ingestion: Build scalable ETL pipelines using Apache Spark, Talend, AWS Glue, Google Dataflow, and Apache NiFi. Ingest data from APIs, file systems, and databases.
Data Transformation & Validation: Use Pandas, Apache Beam, and Dask for data cleaning, transformation, and validation. Automate data quality checks with pytest and unittest.
Big Data Systems: Process large datasets with Hadoop, Kafka, Apache Flink, and Apache Hive. Stream real-time data using Kafka and Google Cloud Pub/Sub.
Task Queues: Manage asynchronous processing with Celery, RQ, RabbitMQ, or Kafka. Implement retry mechanisms and track task status.
Scalability: Optimize for performance with distributed processing (Spark, Flink), parallelization (joblib), and data partitioning.
Cloud & Storage: Work with AWS, Azure, GCP, and Databricks. Store and manage data with S3, BigQuery, Redshift, Synapse Analytics, and HDFS.

Required Skills:
ETL & Data Processing: Expertise in Apache Spark, AWS Glue, Google Dataflow, Talend.
Big Data Tools: Proficient with Hadoop, Kafka, Apache Flink, Hive, Presto.
Databases: Strong experience with MySQL, PostgreSQL, MongoDB, Cassandra.
Machine Learning: Hands-on with TensorFlow, PyTorch, Scikit-learn, XGBoost.
Cloud Platforms: Experience with AWS, Azure, GCP, Databricks.
Task Management: Familiar with Celery, RQ, RabbitMQ, Kafka.
Version Control: Git for source code management.
Desirable Skills:
Real-time Data Processing: Experience with Apache Pulsar, Google Cloud Pub/Sub.
Data Warehousing: Familiarity with Redshift, BigQuery, Synapse Analytics.
Scalability & Optimization: Knowledge of load balancing (NGINX, HAProxy) and parallel processing.
Data Governance: Use of MLflow, DVC, or other tools for model and data versioning.

Tools & Technologies:
ETL: Apache Spark, Talend, AWS Glue, Google Dataflow.
Big Data: Hadoop, Kafka, Apache Flink, Presto.
Databases: MySQL, PostgreSQL, MongoDB, Cassandra.
Cloud: AWS, GCP, Azure, Databricks.
Storage: S3, BigQuery, Redshift, Synapse Analytics, HDFS.
Version Control: Git.
