Let’s save tons of money with cloud-native data ingestion!

Author
R Tyler Croy

Published
August 1, 2025

Team
Infrastructure Engineering, Core Platform

Delta Lake is a fantastic technology for quickly querying massive data sets, but first you need those massive data sets! In this talk from Data and AI Summit 2025, I dive into the cloud-native architecture Scribd has adopted to ingest data from AWS Aurora, SQS, Kinesis Data Firehose, and more!

By using off-the-shelf open source tools like kafka-delta-ingest, oxbow, and Airbyte, Scribd has redefined its ingestion architecture to be more event-driven, more reliable, and, most importantly, cheaper. No jobs needed! Attendees will learn how to use third-party tools in concert with a Databricks and Unity Catalog environment to provide a highly efficient and available data platform.

This architecture is presented in the context of AWS but can be adapted for Azure, Google Cloud Platform, or even on-premises environments. The slides are also available on Scribd!

