Viewing Table History (Time Travel Basics)
Swipe um das Menü anzuzeigen
Time Travel is a Delta Lake feature that allows you to query previous versions of a table. By using the Transaction Log, Databricks can reconstruct the state of your data from any point in its recorded history.
One of the most powerful "safety nets" in Databricks is the ability to see exactly who changed a table, when they changed it, and what the data looked like before that change.
For our diamonds table at workspace.workshop.diamonds, every update or deletion is tracked automatically.
1. The DESCRIBE HISTORY Command
To see the full timeline of your data, use the DESCRIBE HISTORY command — it provides a detailed ledger of every version of the table.
DESCRIBE HISTORY workspace.workshop.diamonds
When you run this, you will see a table with the following key columns:
- version: an integer starting from 0 (the first time the table was created);
- timestamp: the exact date and time the change occurred;
- userName: the person who performed the action;
- operation: what happened (e.g.,
WRITE,UPDATE,DELETE, orMERGE); - operationParameters: technical details about the change.
2. Querying a Specific Version
If a recent update messed up the "Price" column, you can query a specific older version to see the original values using the VERSION AS OF syntax.
-- Look at the diamonds table exactly as it was in Version 0
SELECT * FROM workspace.workshop.diamonds VERSION AS OF 0
3. Querying by Timestamp
If you don't know the version number but you know the data was correct at a specific time, you can "travel" back using TIMESTAMP AS OF.
-- Look at the data as it existed at a specific time
SELECT * FROM workspace.workshop.diamonds TIMESTAMP AS OF '2026-02-11 09:00:00'
4. Why This Matters
In traditional databases, if you accidentally run a DELETE command without a WHERE clause, you have to call a Database Administrator to restore a backup — a process that can take hours.
With Delta Lake Time Travel, you can fix the mistake yourself in seconds by selecting the data from a previous version and overwriting the current broken table with the healthy historical data.
5. Managing History
By default, Delta Lake keeps 30 days of history. While incredibly helpful, storing history does require keeping older data files, but the underlying data files of deleted/overwritten versions are only kept for 7 days (delta.deletedFileRetentionDuration) before VACUUM can clean them up.
In professional environments, data engineers use the VACUUM command to clean up very old files and reduce storage costs. For this course, your history is safe and ready for exploration.
VACUUM is a cleanup command that permanently deletes the old data files that Delta Lake no longer needs — the ones left behind after updates, deletes, or overwrites.
VACUUM workspace.workshop.diamonds
By default, Delta keeps 30 days of file history for Time Travel. VACUUM removes files older than that threshold to save storage space.
You can also set a custom retention window:
VACUUM workspace.workshop.diamonds RETAIN 7 HOURS
1. Which SQL command allows you to see the list of all previous versions and operations performed on a table?
2. You accidentally updated the diamonds table and want to see the data from 10 minutes ago. Which syntax should you use?
Danke für Ihr Feedback!
Fragen Sie AI
Fragen Sie AI
Fragen Sie alles oder probieren Sie eine der vorgeschlagenen Fragen, um unser Gespräch zu beginnen