Basic Data Updates and Deletes
Swipe um das Menü anzuzeigen
In standard Data Lakes, updating or deleting a single row is nearly impossible without rewriting the entire dataset. Because Delta Lake uses a transaction log, it supports standard SQL UPDATE and DELETE commands, allowing for precise modifications to your tables.
In the previous section, you focused on reading and summarizing data. However, data is rarely static — you may discover that a batch of diamond records has incorrect pricing, or that a specific "cut" type needs to be removed from your reporting.
Because the diamonds table is a Delta table, you can modify it just like a traditional database.
1. Updating Records
The UPDATE command allows you to change values in specific rows that meet a condition. For example, if all "Fair" cut diamonds received a 10% price increase:
UPDATE workspace.workshop.diamonds
SET price = price * 1.1
WHERE cut = 'Fair'
- SET — Defines the new value for the column
- WHERE — Critical. Without a
WHEREclause, you would update the price for every single diamond in the entire table
2. Deleting Records
The DELETE command removes rows from the table permanently. Use this when data is no longer relevant or was ingested by mistake.
DELETE FROM workspace.workshop.diamonds
WHERE carat > 5.0
This command identifies all diamonds larger than 5 carats and removes them from the active version of the table.
3. Verification After Modification
After running an update or delete, you should always verify the change with a simple SELECT count to ensure the operation behaved as expected.
-- Verify how many 'Fair' diamonds are left (count should be the same, but prices changed)
SELECT count(*) FROM workspace.workshop.diamonds WHERE cut = 'Fair'
4. The Safety of Delta Updates
When you run DELETE or UPDATE, Delta Lake doesn't erase data from the old files directly. Instead, it follows a three-step process:
- It marks the old files as "invalid"
- It writes new files containing the updated data
- It records the change in the Transaction Log
This is why Time Travel still works even after a deletion — the old data continues to exist in previous versions of the table files until the history is manually purged.
5. Best Practice: Testing with SELECT
Before running a DELETE or UPDATE, always test your filter with a SELECT first.
- Instead of:
DELETE FROM diamonds WHERE color = 'J' - Try first:
SELECT * FROM diamonds WHERE color = 'J'
If the SELECT results show exactly what you intended to remove, you can safely proceed with the DELETE command.
1. What happens if you run an UPDATE command without a WHERE clause?
2. Does running a DELETE command on a Delta table immediately erase the data from the underlying cloud storage forever?
Danke für Ihr Feedback!
Fragen Sie AI
Fragen Sie AI
Fragen Sie alles oder probieren Sie eine der vorgeschlagenen Fragen, um unser Gespräch zu beginnen