Merge pull request #69 from scribd/spark-aisummit

Add a blog post about my Spark and AI summit session
2020-09-09 20:44:57 -07:00 · 2020-09-09 20:44:57 -07:00 · 979a1ede27
parent f8e9c4a23d e47b5dc67c
commit 979a1ede27
1 changed files with 42 additions and 0 deletions
--- a/_posts/2020-09-09-spark-ai-summit-delta-lake.md
+++ b/_posts/2020-09-09-spark-ai-summit-delta-lake.md
@ -0,0 +1,42 @@
+---
+layout: post
+title: "Spark and AI Summit 2020: The revolution will be streamed"
+tags:
+- featured
+- databricks
+- spark
+- deltalake
+team: Core Platform
+author: rtyler
+---
+
+Earlier this summer I was able to present at Spark and AI Summit about some of
+the work we have been doing in our efforts to build the [Real-time Data
+Platform](/blog/2019/real-time-data-platform.html).  At a high level,
+what I had branded the "Real-time Data Platform" is really: [Apache
+Kafka](https://kafka.apache.org), [Apache Airflow](https://airflow.apache.org),
+[Structured streaming with Apache Spark](https://spark.apache.org), and a
+smattering of microservices to help shuffle data around. All sitting on top of
+[Delta Lake](https://delta.io) which acts as an incredibly versatile and useful
+storage layer for the platform.
+
+In the presentation I outline how we tie together Kafka,
+Databricks, and Delta Lake.
+
+<center>
+<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/YmyCOr9Mr9Y" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
+</center>
+
+The presentation also complements some of our
+blog posts:
+
+* [Streaming data in and out of Delta Lake](/blog/2020/streaming-with-delta-lake.html)
+* [Streaming development work with Kafka](/blog/2020/introducing-kafka-player.html)
+* [Ingesting production logs with Rust](/blog/2020/shipping-rust-to-production.html)
+* [Migrating Kafka to the cloud](/blog/2019/migrating-kafka-to-aws.html)
+
+
+I am incredibly proud of the work the Platform Engineering organization has
+done at Scribd to make real-time data a reality. I also cannot recommend Kafka +
+Spark + Delta Lake highly enough for those with similar requirements.
+