Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Learn Basic Data Updates and Deletes | Core Databricks Concepts
Databricks Fundamentals: A Beginner's Guide

bookBasic Data Updates and Deletes

Swipe to show menu

Note
Definition

In standard Data Lakes, updating or deleting a single row is nearly impossible without rewriting the entire dataset. Because Delta Lake uses a transaction log, it supports standard SQL UPDATE and DELETE commands, allowing for precise modifications to your tables.

In the previous section, you focused on reading and summarizing data. However, data is rarely static — you may discover that a batch of diamond records has incorrect pricing, or that a specific "cut" type needs to be removed from your reporting.

Because the diamonds table is a Delta table, you can modify it just like a traditional database.

1. Updating Records

The UPDATE command allows you to change values in specific rows that meet a condition. For example, if all "Fair" cut diamonds received a 10% price increase:

UPDATE workspace.workshop.diamonds
SET price = price * 1.1
WHERE cut = 'Fair'
  • SET — Defines the new value for the column
  • WHERE — Critical. Without a WHERE clause, you would update the price for every single diamond in the entire table

2. Deleting Records

The DELETE command removes rows from the table permanently. Use this when data is no longer relevant or was ingested by mistake.

DELETE FROM workspace.workshop.diamonds
WHERE carat > 5.0

This command identifies all diamonds larger than 5 carats and removes them from the active version of the table.

3. Verification After Modification

After running an update or delete, you should always verify the change with a simple SELECT count to ensure the operation behaved as expected.

-- Verify how many 'Fair' diamonds are left (count should be the same, but prices changed)
SELECT count(*) FROM workspace.workshop.diamonds WHERE cut = 'Fair'

4. The Safety of Delta Updates

When you run DELETE or UPDATE, Delta Lake doesn't erase data from the old files directly. Instead, it follows a three-step process:

  1. It marks the old files as "invalid"
  2. It writes new files containing the updated data
  3. It records the change in the Transaction Log

This is why Time Travel still works even after a deletion — the old data continues to exist in previous versions of the table files until the history is manually purged.

5. Best Practice: Testing with SELECT

Before running a DELETE or UPDATE, always test your filter with a SELECT first.

  • Instead of: DELETE FROM diamonds WHERE color = 'J'
  • Try first: SELECT * FROM diamonds WHERE color = 'J'

If the SELECT results show exactly what you intended to remove, you can safely proceed with the DELETE command.

1. What happens if you run an UPDATE command without a WHERE clause?

2. Does running a DELETE command on a Delta table immediately erase the data from the underlying cloud storage forever?

question mark

What happens if you run an UPDATE command without a WHERE clause?

Select the correct answer

question mark

Does running a DELETE command on a Delta table immediately erase the data from the underlying cloud storage forever?

Select the correct answer

Everything was clear?

How can we improve it?

Thanks for your feedback!

Section 5. Chapter 4

Ask AI

expand

Ask AI

ChatGPT

Ask anything or try one of the suggested questions to begin our chat

Section 5. Chapter 4
some-alt