Job Overview
We are seeking an experienced Data Engineer with expertise in Dataiku to join our data team. In this role, you will design, build, and maintain data pipelines, data integration processes, and data infrastructure. You will collaborate closely with data scientists, analysts, and other stakeholders to ensure efficient data flow and to support data-driven decision making across the organization.
Job Duties
- Design and implement robust data pipelines that ingest, process, and store unstructured data formats at scale within Snowflake and GCP
- Leverage Snowflake's unstructured data capabilities (Directory Tables, Scoped URLs, Snowpark) to make "dark data" queryable and actionable
- Build and maintain cloud-native ETL/ELT processes using BigQuery, Cloud Storage, and Dataflow, ensuring seamless integration between GCP and Snowflake
- Go beyond general-purpose LLMs by integrating AI tooling (OCR, NLP entity extraction, Document AI) into the engineering workflow to transform unstructured blobs into structured insights
- Tune complex SQL queries and Python-based processing jobs to run efficiently in petabyte-scale environments
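To illustrate the kind of unstructured-to-structured transformation the role involves, here is a minimal Python sketch. It is purely illustrative: the `extract_text` function is a hypothetical stand-in for an OCR or Document AI call (in production this would invoke a service such as Google Document AI), and the regex-based entity extraction stands in for an NLP entity model. The field names `invoice_id` and `total_usd` are assumptions for the example.

```python
import re

def extract_text(blob: bytes) -> str:
    """Hypothetical stand-in for an OCR / Document AI step.
    For this sketch, assume the blob is already UTF-8 text."""
    return blob.decode("utf-8")

# Minimal rule-based patterns standing in for an NLP entity model.
INVOICE_ID = re.compile(r"Invoice\s+#(\d+)")
TOTAL = re.compile(r"Total:\s*\$([\d.]+)")

def to_structured_record(blob: bytes) -> dict:
    """Transform an unstructured document blob into a structured row
    suitable for loading into a warehouse table."""
    text = extract_text(blob)
    invoice = INVOICE_ID.search(text)
    total = TOTAL.search(text)
    return {
        "invoice_id": invoice.group(1) if invoice else None,
        "total_usd": float(total.group(1)) if total else None,
    }

record = to_structured_record(b"Invoice #1042 ... Total: $99.50")
print(record)  # {'invoice_id': '1042', 'total_usd': 99.5}
```

In a real pipeline, the resulting records would be written to a warehouse table (e.g. in Snowflake or BigQuery) rather than printed.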
Desired Skills and Experience
- Snowflake
- GCP
- ETL/ELT
- BigQuery
- Gen AI
- LLMs
- Directory Tables, Scoped URLs, Snowpark
- Cloud Storage
- Dataflow
- OCR, NLP entities, Document AI
