WHAT’S THIS ROLE ABOUT?

We’re seeking an experienced, pipeline-centric data engineer to put our data to good use. The ideal candidate combines strong mathematical and statistical expertise with rare curiosity and creativity.

RESPONSIBILITIES

  • Design and develop robust, scalable, and efficient data pipelines to support the extraction, transformation, and loading (ETL) processes from various data sources into data warehouses and data lakes.
  • Collaborate closely with cross-functional teams, including data scientists, analysts, and software engineers, to understand data requirements and design optimal solutions.
  • Build and manage data warehouses and data lakes to store and organize large volumes of structured and unstructured data efficiently.
  • Implement data governance processes and best practices to ensure data quality, integrity, and security throughout the data lifecycle.
  • Identify and address performance bottlenecks, data inconsistencies, and data quality issues in data pipelines, warehouses, and lakes.
  • Develop and maintain monitoring and alerting systems to proactively identify and resolve data-related issues.

KEY COMPETENCIES

  • 5+ years of proven experience in building and managing data pipelines, data warehouses, and data lakes in a production environment.
  • Proficiency in programming languages such as Python and SQL, and experience with data processing frameworks like Apache Spark or Apache Beam.
  • Experience with data modeling, data catalog concepts, data formats, and the design, implementation, and maintenance of data pipelines/ETL.
  • Experience with ETL/ELT frameworks and tools such as AWS Glue, dbt, Airflow, and Airbyte.
  • In-depth knowledge of relational databases (e.g., MySQL, PostgreSQL) and experience with columnar storage technologies (e.g., Redshift, Snowflake), including query analysis and performance optimization.
  • Strong understanding of distributed systems, data modeling, and database design principles.
  • Familiarity with the AWS cloud platform and services such as S3, Lambda, Step Functions, Glue, and Athena.
  • Experience with data visualization tools and platforms such as Looker, Power BI, or Tableau.
  • Experience with development practices such as Agile, CI/CD, and TDD.
  • Experience with Infrastructure as Code practices, particularly Terraform.