Lead Developer - Snowflake + DBT

OmegaHires · Raritan, New Jersey, United States · Information Technology

About this position

Job Title: Lead Developer - Snowflake + DBT
Location: New Jersey, US
Experience: 10–15 Years
Employment Type: Full-time

Job Summary

We are seeking a highly experienced Lead Data Engineer (10+ years) with deep expertise in Snowflake, DBT, Apache Airflow, and StreamSets, and strong hands-on experience designing enterprise-grade ETL/ELT pipelines, data migration programs, and multi-source ingestion frameworks within the Life Sciences domain.

This role will lead large-scale data platform modernization initiatives, including legacy-to-cloud migrations, cross-system integrations, and enterprise data harmonization in regulated environments.

Key Responsibilities

1. Snowflake Architecture & Enterprise Data Platform Design

  • Lead architecture and implementation of scalable Snowflake data platforms:
    • Multi-layered architecture (Landing → Raw → Staging → Curated → Data Marts)
    • Separation of compute and storage for independent scaling and cost optimization
    • Multi-cluster warehouses and workload isolation
  • Design secure cross-account data sharing strategies.
  • Implement:
    • Snowpipe for automated ingestion
    • Streams & Tasks for CDC-based incremental processing
    • Time Travel & Zero-copy cloning for environment management
  • Implement data masking, row-level security, and RBAC frameworks.
  • Optimize storage, partitioning (micro-partition pruning), and query performance.
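
To give a flavor of the role-based masking and RBAC work described above, here is a minimal illustrative sketch in Python (role names and the masking rule are hypothetical; in practice this is done with Snowflake masking policies, not application code):

```python
# Illustrative role-based column masking, analogous in spirit to a
# Snowflake masking policy: privileged roles see the raw value,
# everyone else sees a masked form. Role names are hypothetical.
PRIVILEGED_ROLES = {"PII_ADMIN", "COMPLIANCE_AUDITOR"}

def mask_email(value: str, current_role: str) -> str:
    """Return the raw email for privileged roles, a masked form otherwise."""
    if current_role in PRIVILEGED_ROLES:
        return value
    local, _, domain = value.partition("@")
    # Keep the first character and the domain so support staff can still triage.
    return local[0] + "***@" + domain

print(mask_email("jane.doe@example.com", "ANALYST"))    # → j***@example.com
print(mask_email("jane.doe@example.com", "PII_ADMIN"))  # → jane.doe@example.com
```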

2. Data Migration & Modernization

  • Lead end-to-end data migration initiatives including:
    • Legacy data warehouse (Teradata, Oracle, SQL Server, Netezza) to Snowflake
    • On-prem to cloud modernization programs
  • Conduct:
    • Source system analysis and profiling
    • Data quality assessment and remediation planning
    • Schema conversion and transformation mapping
  • Design migration frameworks:
    • Bulk historical data loads
    • Incremental migration strategies
    • Parallel-run validation strategies
  • Perform reconciliation and data validation between legacy and target systems.
  • Develop automated validation scripts using SQL and DBT tests.
  • Support cutover planning and production readiness.
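
The reconciliation and validation work above can be sketched as follows; this is an illustrative stand-in using in-memory SQLite for both the legacy source and the Snowflake target (table and column names are hypothetical):

```python
import hashlib
import sqlite3

def table_fingerprint(conn, table: str) -> tuple[int, str]:
    """Row count plus an order-independent checksum of all rows,
    used to reconcile a legacy table against its migrated copy."""
    rows = conn.execute(f"SELECT * FROM {table}").fetchall()
    digest = hashlib.sha256()
    for row in sorted(repr(r) for r in rows):  # sort so row order never matters
        digest.update(row.encode())
    return len(rows), digest.hexdigest()

# Hypothetical stand-ins for the legacy warehouse and the Snowflake target.
legacy = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for conn in (legacy, target):
    conn.execute("CREATE TABLE patients (id INTEGER, site TEXT)")
    conn.executemany("INSERT INTO patients VALUES (?, ?)",
                     [(1, "NJ-01"), (2, "NJ-02")])

legacy_fp = table_fingerprint(legacy, "patients")
target_fp = table_fingerprint(target, "patients")
print("match" if legacy_fp == target_fp else "mismatch")  # → match
```

In a real migration the same idea is typically expressed as SQL checksums on both platforms plus DBT tests, rather than pulling rows into Python.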

3. Data Ingestion & Multi-Source Integration

Design and implement ingestion frameworks for structured, semi-structured, and unstructured data from multiple enterprise systems:

Structured Sources

  • Oracle, SQL Server, SAP, PostgreSQL
  • Clinical systems (EDC, CDMS, CTMS)
  • Regulatory systems (RIM)
  • Commercial systems (CRM, ERP)

Semi-Structured Sources

  • JSON, XML, Avro files
  • API responses
  • External vendor feeds

Unstructured Sources (where applicable)

  • Document metadata ingestion
  • Log and audit trail ingestion

Ingestion Responsibilities

  • Build ingestion pipelines using:
    • StreamSets for batch and streaming ingestion
    • Snowpipe with cloud storage integration (S3/Azure Blob/GCS)
    • API-driven ingestion frameworks
  • Implement CDC mechanisms using:
    • Database log-based CDC
    • Timestamp-based incremental extraction
    • Snowflake Streams
  • Develop metadata-driven ingestion frameworks.
  • Design resilient pipelines with error handling, retry logic, and monitoring.
  • Ensure schema evolution handling and version control.
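
As a minimal sketch of the timestamp-based incremental extraction mentioned above (the rows and watermark below are hypothetical; in production the watermark would be persisted in a metadata table):

```python
from datetime import datetime

# Hypothetical source rows with a last-modified timestamp, standing in
# for a relational table read via StreamSets or JDBC.
ROWS = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, 9, 0)},
    {"id": 2, "updated_at": datetime(2024, 1, 2, 9, 0)},
    {"id": 3, "updated_at": datetime(2024, 1, 3, 9, 0)},
]

def extract_incremental(rows, watermark):
    """Take only rows modified after the stored watermark, then advance it."""
    batch = [r for r in rows if r["updated_at"] > watermark]
    new_watermark = max((r["updated_at"] for r in batch), default=watermark)
    return batch, new_watermark

watermark = datetime(2024, 1, 1, 12, 0)   # persisted from the previous run
batch, watermark = extract_incremental(ROWS, watermark)
print([r["id"] for r in batch])           # → [2, 3]
```

Log-based CDC and Snowflake Streams replace this pattern when the source exposes a change log, since timestamp extraction can miss hard deletes.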

4. DBT – Enterprise Transformation Framework

  • Architect and govern DBT transformation layers:
    • Staging models
    • Intermediate models
    • Data marts
  • Implement:
    • Incremental models
    • Snapshot strategies for historical tracking
    • Surrogate key management
  • Develop custom macros and reusable transformation components.
  • Implement comprehensive DBT testing framework:
    • Source freshness tests
    • Schema validation tests
    • Business rule validation
  • Generate lineage documentation for audit and regulatory needs.
  • Optimize DBT models specifically for Snowflake compute efficiency.
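
The surrogate key management mentioned above can be illustrated with a short sketch in the spirit of `dbt_utils.generate_surrogate_key` (column values are hypothetical):

```python
import hashlib

def surrogate_key(*parts) -> str:
    """Deterministic surrogate key from natural-key columns: concatenate
    with a delimiter that cannot appear in the data, then hash, so the
    same business key always maps to the same warehouse key."""
    joined = "||".join("<NULL>" if p is None else str(p) for p in parts)
    return hashlib.md5(joined.encode()).hexdigest()

k1 = surrogate_key("STUDY-001", "SITE-12")
k2 = surrogate_key("STUDY-001", "SITE-12")
k3 = surrogate_key("STUDY-001", None)
print(k1 == k2, k1 == k3)   # → True False
```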

5. ETL / ELT Orchestration & Automation

  • Design ELT-first architecture leveraging Snowflake processing power.
  • Orchestrate complex workflows using Apache Airflow:
    • DAG dependency management
    • SLA monitoring
    • Automated recovery workflows
  • Implement CI/CD for:
    • DBT deployments
    • Airflow pipelines
    • Snowflake objects
  • Build data observability frameworks (pipeline monitoring, anomaly detection).
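
The DAG dependency management above reduces to topological ordering of tasks; a minimal sketch using the standard library (task names are hypothetical, and Airflow expresses the same graph with operators and `>>`):

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline tasks mapped to their upstream dependencies:
# ingest -> stage -> (dbt_run, dq_checks) -> publish.
DEPS = {
    "stage":     {"ingest"},
    "dbt_run":   {"stage"},
    "dq_checks": {"stage"},
    "publish":   {"dbt_run", "dq_checks"},
}

# static_order() yields tasks so every task runs after its upstreams.
order = list(TopologicalSorter(DEPS).static_order())
print(order[0], order[-1])   # → ingest publish
```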

6. Enterprise Data Modelling

  • Design scalable data models:
    • Dimensional (Star/Snowflake schemas)
    • Data Vault 2.0 (for auditability and traceability)
    • Canonical data models
  • Align models with Life Sciences business domains:
    • Clinical trial lifecycle
    • Regulatory submissions
    • Pharmacovigilance
    • Commercial analytics
  • Support cross-domain data harmonization.

7. Life Sciences Domain Expertise

Experience delivering data platforms supporting:

  • Clinical trial data (EDC, CDMS, CTMS)
  • Regulatory and submission systems
  • Pharmacovigilance & safety systems
  • Commercial & sales analytics
  • Real-World Evidence (RWE)

Ensure compliance with:

  • GxP validation standards
  • 21 CFR Part 11
  • HIPAA / GDPR
  • ALCOA+ principles

Support audit readiness and regulatory traceability.

Required Qualifications

  • 10+ years of experience in Data Engineering and Enterprise Data Platforms.
  • 4–6+ years of hands-on Snowflake implementation experience.
  • Strong experience in:
    • Large-scale data migration programs
    • Multi-source data ingestion frameworks
    • DBT advanced transformation design
    • Apache Airflow orchestration
    • StreamSets ingestion pipelines
  • Advanced SQL expertise.
  • Experience in Life Sciences domain projects.
  • Cloud platform experience (AWS/Azure/GCP).

Powered by JazzHR