Delta Lake is a fantastic technology for quickly querying massive data sets, but first you need those massive data sets! In this talk from Data and AI Summit 2025, I dive into the cloud-native architecture Scribd has adopted to ingest data from AWS Aurora, SQS, Kinesis Data Firehose, and more!

By using off-the-shelf open source tools like kafka-delta-ingest, oxbow, and Airbyte, Scribd has redefined its ingestion architecture to be more event-driven, more reliable and, most importantly, cheaper. No jobs needed! Attendees will learn how to use third-party tools in concert with a Databricks and Unity Catalog environment to provide a highly efficient and available data platform.

This architecture is presented in the context of AWS, but it can be adapted for Azure, Google Cloud Platform, or even on-premises environments. The slides are also available on Scribd!