WHAT’S THE ROLE ABOUT?
We’re seeking an experienced, pipeline-centric data engineer. The ideal candidate combines strong mathematical and statistical expertise with rare curiosity and creativity.
RESPONSIBILITIES
- Design and develop robust, scalable, and efficient data pipelines to support the extraction, transformation, and loading (ETL) processes from various data sources into data warehouses and data lakes.
- Collaborate closely with cross-functional teams, including data scientists, analysts, and software engineers, to understand data requirements and design optimal solutions.
- Build and manage data warehouses and data lakes to store and organize large volumes of structured and unstructured data efficiently.
- Implement data governance processes and best practices to ensure data quality, integrity, and security throughout the data lifecycle.
- Identify and address performance bottlenecks, data inconsistencies, and data quality issues in data pipelines, warehouses, and lakes.
- Develop and maintain monitoring and alerting systems to proactively identify and resolve data-related issues.
KEY COMPETENCIES
- 5+ years of proven experience in building and managing data pipelines, data warehouses, and data lakes in a production environment.
- Proficiency in programming languages such as Python and SQL, and experience with data processing frameworks like Apache Spark or Apache Beam.
- Experience with data modeling, data catalog concepts, data formats, and ETL/data pipeline design, implementation, and maintenance.
- Experience with ETL/ELT frameworks and tools such as AWS Glue, dbt, Airflow, and Airbyte.
- In-depth knowledge of relational databases (e.g., MySQL, PostgreSQL) and experience with columnar storage technologies (e.g., Redshift, Snowflake), including query analysis and performance optimization.
- Strong understanding of distributed systems, data modeling, and database design principles.
- Familiarity with the AWS cloud platform and services such as S3, Lambda, Step Functions, Glue, and Athena.
- Experience with data visualization tools and platforms such as Looker, Power BI, or Tableau.
- Experience with development practices such as Agile, CI/CD, and TDD.
- Experience with Infrastructure as Code practices such as Terraform.