Leverage batch computation frameworks and our workflow management platform (Airflow) to help build out data pipelines
Continue to lower latency and bridge the gap between our production systems and our data warehouse by rethinking and optimizing our core data pipeline jobs
Work with clients to create and optimize critical batch processing jobs in Spark
Strong engineering background and an interest in data.
Good understanding of data analysis using SQL queries.
Strong command of Python or Scala as a programming language on Azure Databricks.
Will be writing production Scala/Spark and Python/Spark code on Azure Databricks.
Experience developing and maintaining distributed systems built with Azure Databricks or native Apache Spark.
Experience building libraries and tooling that provide abstractions to users for accessing data.
Experience writing and debugging ETL jobs using a distributed data framework (Spark, Hadoop MapReduce, etc.) on Azure Databricks
Experience optimizing the end-to-end performance of distributed systems.
Ability to recommend and implement ways to improve data reliability, efficiency, and quality.
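The SQL data-analysis requirement above can be illustrated with a minimal, self-contained sketch; it uses Python's standard-library sqlite3 as a stand-in for a warehouse engine, and the table and column names are hypothetical:

```python
import sqlite3

# In-memory database as a stand-in for a real warehouse (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EU", 20.0), (2, "EU", 30.0), (3, "US", 70.0)],
)

# Typical analysis query: revenue per region, largest first.
rows = conn.execute(
    """
    SELECT region, SUM(amount) AS revenue
    FROM orders
    GROUP BY region
    ORDER BY revenue DESC
    """
).fetchall()
conn.close()
# rows == [("US", 70.0), ("EU", 50.0)]
```

The same aggregate-and-rank pattern carries over directly to Spark SQL against warehouse tables.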
Experience with Scala or Python
Experience with Hadoop/Spark on Azure Databricks
Experience with Airflow or similar scheduling tools
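For the scheduling requirement, an Airflow pipeline is declared as a DAG file; the sketch below is a configuration-style fragment (not run here) with a hypothetical DAG id, schedule, and job script:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical daily pipeline that submits the Spark batch job.
with DAG(
    dag_id="daily_spark_rollup",      # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    run_job = BashOperator(
        task_id="run_spark_job",
        bash_command="spark-submit daily_event_rollup.py",  # hypothetical script
    )
```

Placed in the Airflow DAGs folder, this would appear in the UI as `daily_spark_rollup` and run once per day.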
This job was posted by Aparna Khemka from ColorTokens.