Add a link to Kuntal's great post

This commit is contained in:
R Tyler Croy 2021-05-04 14:36:25 -07:00
parent c0eb060d0a
commit 56dfd393a6
GPG Key ID: E5C92681BEF6CEA2
1 changed files with 8 additions and 3 deletions


@ -13,7 +13,7 @@ consider is: "how much damage could one accidentally cause with our existing
policies and controls?" At [Scribd](https://tech.scribd.com) we have made
[Delta Lake](https://delta.io) a cornerstone of our data platform, and as such
I've spent a lot of time thinking about what could go wrong and how we would
defend against it.
To start I recommend reading this recent post from Databricks: [Attack of the
@ -41,12 +41,12 @@ For my disaster recovery needs, the clone-based approach is insufficient as I de
> Our requirements are basically to prevent catastrophic loss of business critical data via:
>
> * Erroneous rewriting of data by an automated job
> * Inadvertent table drops through metastore automation
> * Over-aggressive use of the VACUUM command
> * Failed manual sync/cleanup operations by Data Engineering staff
>
> It's important to consider whether you're worried about the transaction log
> getting corrupted, files in storage (e.g. ADLS) disappearing, or both.
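One of the risks above, an over-aggressive VACUUM, can be blunted with a pre-flight policy check that refuses retention windows below a safe floor. This is a minimal, hypothetical pure-Python sketch (the function name is mine, not a Delta Lake API); Delta Lake itself enforces a similar safeguard, defaulting to a seven-day retention check before VACUUM will delete unreferenced files.

```python
from datetime import timedelta

# Minimum retention before VACUUM may delete unreferenced files.
# Delta Lake's built-in safety check defaults to 7 days for the same
# reason: shorter windows risk deleting files still needed by
# concurrent readers or by time travel.
MINIMUM_RETENTION = timedelta(days=7)

def validate_vacuum_retention(requested: timedelta) -> timedelta:
    """Reject a VACUUM retention window shorter than the safe minimum.

    Hypothetical guard, not a Delta Lake API: the point is that a
    policy layer should refuse dangerous retention values before the
    command ever runs against the table.
    """
    if requested < MINIMUM_RETENTION:
        raise ValueError(
            f"VACUUM retention {requested} is below the safe minimum "
            f"of {MINIMUM_RETENTION}; refusing to run"
        )
    return requested

# A 30-day window passes the check; a 1-hour window would raise.
validate_vacuum_retention(timedelta(days=30))
```

The same idea applies to the other failure modes: put an automated gate between humans (or jobs) and the destructive operation, rather than relying on everyone remembering the safe values.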
@ -85,4 +85,9 @@ this week so make sure you follow us on Twitter
feed](https://tech.scribd.com/feed.xml)!
---
**Update**: my colleague Kuntal wrote [this blog post on backing up Delta Lake with AWS S3 Batch Operations](https://tech.scribd.com/blog/2021/backing-up-data-warehouse.html), which describes what we're doing here at [Scribd](https://tech.scribd.com).
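For readers curious what the S3 Batch Operations approach involves mechanically: a batch copy job is driven by a CSV manifest listing `bucket,key` per row for every object to copy. The helper below is a hypothetical illustration of building such a manifest; the bucket and key names are made up, and submitting the job itself would go through the real AWS APIs, which are out of scope here.

```python
import csv
import io

def build_s3_batch_manifest(bucket: str, keys: list[str]) -> str:
    """Build a CSV manifest in the bucket,key format that
    S3 Batch Operations expects for a copy job.

    Hypothetical helper for illustration; in practice you would
    enumerate the table's data files and _delta_log entries, then
    upload this manifest and reference it when creating the job.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    for key in keys:
        writer.writerow([bucket, key])
    return buf.getvalue()

# Example: a manifest covering a (made-up) table's log and data files.
manifest = build_s3_batch_manifest(
    "my-delta-bucket",
    [
        "tables/events/_delta_log/00000000000000000000.json",
        "tables/events/part-00000.snappy.parquet",
    ],
)
```

Backing up both the `_delta_log/` entries and the Parquet data files matters because, as noted above, either the transaction log or the underlying storage can be the thing that gets corrupted or lost.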