Hi All,
Please share your updated profile with [email protected].
Role: Principal Data Engineer – Azure
Location: Bellevue, WA, USA (UST)
Job Overview:
We are seeking a Principal Data Engineer to join our team, operating within an Azure + Databricks Lakehouse environment. The ideal candidate will have deep expertise in data engineering, with a strong focus on Azure services, Databricks, and CI/CD practices. You will be responsible for designing, building, and maintaining data pipelines, ensuring performance optimization, and contributing to the overall data strategy.
Key Responsibilities:
- Design, develop, and maintain ETL processes within Databricks, leveraging PySpark and Spark SQL for data transformation and processing.
- Manage data orchestration using Azure Data Factory (ADF) and ensure data storage efficiency with Azure Data Lake Storage (ADLS).
- Implement CI/CD pipelines using Azure DevOps, ensuring seamless integration and deployment of data solutions.
- Write and optimize SQL queries (T-SQL, PostgreSQL) for data manipulation, extraction, and reporting.
- Use PowerShell for scripting and automation of data engineering tasks.
- Optimize performance by understanding and managing indexes, partitioning, and resource allocation in Databricks.
- Analyze and optimize DataFrame API executions, interpreting Directed Acyclic Graphs (DAGs) to improve compute efficiency.
- Ensure data solutions adhere to the Software Development Life Cycle (SDLC) with peer-reviewed code and best DevOps practices.
- Develop resilient, repeatable code that maintains consistent outputs across environments, adhering to a custom SQL Deployment framework.
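To illustrate the "consistent outputs across environments" requirement above, here is a minimal sketch in plain Python (the production framework described in this role is PySpark/SQL-based; the record shape and helper names here are hypothetical): the transformation sorts on a stable key and hashes a canonical serialization of its output, so dev, test, and prod runs can be compared byte-for-byte.

```python
import hashlib
import json

def transform(records):
    """Deduplicate and normalize records, then sort on a stable key so the
    output is identical regardless of the order the input arrives in."""
    seen = {}
    for rec in records:
        key = rec["id"]
        # Last write wins; normalize the name field on the way through.
        seen[key] = {"id": key, "name": rec["name"].strip().lower()}
    return sorted(seen.values(), key=lambda r: r["id"])

def output_fingerprint(rows):
    """Hash the canonical JSON form of the output so separate runs can be
    verified to have produced exactly the same result."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# The same records in a different arrival order yield the same fingerprint.
batch_a = [{"id": 2, "name": " Ada "}, {"id": 1, "name": "Grace"}]
batch_b = [{"id": 1, "name": "grace"}, {"id": 2, "name": "ada"}]
assert output_fingerprint(transform(batch_a)) == output_fingerprint(transform(batch_b))
```

The same idea carries over to Spark pipelines: impose an explicit ordering and a canonical schema before any checksum or comparison step, since distributed execution does not guarantee row order on its own.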
Required Skills and Experience:
- Azure Expertise: Proficient with Azure Data Factory (ADF), Azure Data Lake Storage (ADLS), and Azure DevOps for continuous integration and deployment.
- Databricks Proficiency: Strong experience with Databricks, including compute management and ETL development using PySpark and Spark SQL.
- SQL Proficiency: Advanced knowledge of SQL, including T-SQL and PostgreSQL, with a strong focus on performance optimization.
- PowerShell: Experience in using PowerShell for scripting and automation within data engineering workflows.
- Parquet and Delta Formats: Familiarity with these data formats, including their use in large-scale data processing and storage.
- SDLC + CI/CD Practices: Experience in standard deployment processes, including peer-reviewed code and environment management (dev, test, prod).
- Performance Optimization: Expertise in optimizing code execution, understanding the impact of indexes and partitioning, and making data pipelines more efficient.
- Data Consistency: Ability to write code that produces consistent, repeatable results, supporting a custom SQL Deployment framework.
Additional Qualifications:
- Strong problem-solving skills with the ability to troubleshoot and optimize complex data workflows.
- Excellent communication skills, with the ability to collaborate effectively with cross-functional teams.
- A proactive attitude towards learning new technologies and improving existing data processes.
Regards,
[email protected]
linkedin.com/in/raje-sh-56696321b