Data Engineer

Role: Data Engineer  
Location: Remote
Rate: $65/h – $68/h
Visa: OPT, H1B, GC EAD, GC, USC

Note: Please share your updated resume along with your work authorization.

Must-have skills
Python
PySpark
SQL
Data Engineering

Job Description
We’re seeking a Data Engineer to take the lead in implementing and scaling data collection, storage, processing, and filtering for fine-tuning large language models (LLMs) within Conversational Engineering. These data pipelines are crucial for powering our cutting-edge research, safety systems, and product development. If you’re passionate about working with data and eager to create solutions that directly impact the advancement of LLMs, we’d love to hear from you. This role offers the exciting opportunity to collaborate closely with the applied ML engineers, software engineers, and data scientists who build our AI systems today.

In this role, you will:
• Design, build, and manage scalable data pipelines for collecting, storing, processing, and filtering large volumes of text data for fine-tuning LLMs (see the PySpark sketch after this list).
• Develop and optimize data storage architectures to handle the massive scale of data required for training state-of-the-art language models.
• Implement efficient data preprocessing, cleaning, and feature extraction techniques to ensure high-quality data for model training.
• Collaborate with machine learning engineers and researchers to understand their data requirements and provide tailored solutions for LLM fine-tuning.
• Design and implement robust, fault-tolerant systems for data ingestion, processing, and delivery.
• Optimize data pipelines for performance, scalability, and cost-efficiency, leveraging distributed computing frameworks and cloud platforms.
• Ensure the security, privacy, and compliance of data according to industry best practices and regulatory requirements.
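
To make the first responsibility concrete, here is a minimal PySpark sketch of a text filtering and deduplication step of the kind described above. The input/output paths, column names, and length threshold are hypothetical placeholders, not details from this posting.

# Minimal PySpark sketch: filter and deduplicate raw text for LLM fine-tuning.
# All paths, column names, and thresholds below are hypothetical examples.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("llm-finetune-filtering").getOrCreate()

# Read raw documents (assumed schema: id, text) from a distributed store.
raw = spark.read.parquet("s3://example-bucket/raw_text/")

cleaned = (
    raw
    # Normalize whitespace and drop very short documents.
    .withColumn("text", F.regexp_replace(F.trim(F.col("text")), r"\s+", " "))
    .filter(F.length("text") > 200)
    # Exact deduplication via a content hash; near-duplicate detection
    # (e.g., MinHash LSH) would be a natural next step at scale.
    .withColumn("text_hash", F.sha2(F.col("text"), 256))
    .dropDuplicates(["text_hash"])
    .drop("text_hash")
)

# Write the filtered corpus back out for downstream fine-tuning jobs.
cleaned.write.mode("overwrite").parquet("s3://example-bucket/filtered_text/")

In practice, quality classifiers and near-duplicate detection would typically follow this exact-dedup step before the data reaches model training.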

You might thrive in this role if you:
• Have 7+ years of experience as a data engineer, with a strong background in designing and building large-scale data pipelines.
• Possess deep expertise in distributed computing frameworks such as Apache Spark, Hadoop, or Flink, and have hands-on experience optimizing data processing at scale.
• Are proficient in programming languages commonly used in data engineering, such as Python, and have a solid understanding of data structures and algorithms.
• Have extensive experience with cloud platforms like AWS, Google Cloud, or Azure for data storage, processing, and management.
• Are well-versed in various data storage technologies, including distributed file systems (e.g., HDFS, S3), databases (e.g., Cassandra, HBase), and data warehouses (e.g., Redshift, BigQuery).
• Have hands-on experience with ETL orchestration tools such as Apache Airflow, Dagster, or Prefect for managing complex data workflows (see the Airflow sketch after this list).
• Possess knowledge of natural language processing (NLP) techniques and have worked with text data preprocessing, normalization, and feature extraction.
• Are passionate about staying up to date with the latest advancements in data engineering and NLP, and are eager to apply innovative techniques to solve challenging problems.
• Have strong problem-solving skills, are detail-oriented, and can effectively communicate technical concepts to both technical and non-technical stakeholders.
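
As an illustration of the orchestration bullet above, here is a minimal Airflow DAG sketch (assuming Airflow 2.4+) wiring together ingest, filter, and publish steps. The DAG id, schedule, and task bodies are hypothetical placeholders, not details from this posting.

# Minimal Airflow sketch: orchestrating ingest -> filter -> publish steps
# of a text-data pipeline. DAG id, schedule, and task callables are
# hypothetical examples.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest():
    print("pull raw text into object storage")

def filter_text():
    print("run the Spark filtering job")

def publish():
    print("register the filtered dataset for fine-tuning")

with DAG(
    dag_id="llm_text_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    t_ingest = PythonOperator(task_id="ingest", python_callable=ingest)
    t_filter = PythonOperator(task_id="filter", python_callable=filter_text)
    t_publish = PythonOperator(task_id="publish", python_callable=publish)

    # Each step runs only after the previous one succeeds.
    t_ingest >> t_filter >> t_publish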

Thanks
Andrew Lima

[email protected]
