Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Impara Basic Data Updates and Deletes | Core Databricks Concepts
Databricks Fundamentals: A Beginner's Guide

bookBasic Data Updates and Deletes

Scorri per mostrare il menu

Note
Definition

In standard Data Lakes, updating or deleting a single row is nearly impossible without rewriting the entire dataset. Because Delta Lake uses a transaction log, it supports standard SQL UPDATE and DELETE commands, allowing for precise modifications to your tables.

In the previous section, you focused on reading and summarizing data. However, data is rarely static — you may discover that a batch of diamond records has incorrect pricing, or that a specific "cut" type needs to be removed from your reporting.

Because the diamonds table is a Delta table, you can modify it just like a traditional database.

1. Updating Records

The UPDATE command allows you to change values in specific rows that meet a condition. For example, if all "Fair" cut diamonds received a 10% price increase:

UPDATE workspace.workshop.diamonds
SET price = price * 1.1
WHERE cut = 'Fair'
  • SET — Defines the new value for the column
  • WHERE — Critical. Without a WHERE clause, you would update the price for every single diamond in the entire table

2. Deleting Records

The DELETE command removes rows from the table permanently. Use this when data is no longer relevant or was ingested by mistake.

DELETE FROM workspace.workshop.diamonds
WHERE carat > 5.0

This command identifies all diamonds larger than 5 carats and removes them from the active version of the table.

3. Verification After Modification

After running an update or delete, you should always verify the change with a simple SELECT count to ensure the operation behaved as expected.

-- Verify how many 'Fair' diamonds are left (count should be the same, but prices changed)
SELECT count(*) FROM workspace.workshop.diamonds WHERE cut = 'Fair'

4. The Safety of Delta Updates

When you run DELETE or UPDATE, Delta Lake doesn't erase data from the old files directly. Instead, it follows a three-step process:

  1. It marks the old files as "invalid"
  2. It writes new files containing the updated data
  3. It records the change in the Transaction Log

This is why Time Travel still works even after a deletion — the old data continues to exist in previous versions of the table files until the history is manually purged.

5. Best Practice: Testing with SELECT

Before running a DELETE or UPDATE, always test your filter with a SELECT first.

  • Instead of: DELETE FROM diamonds WHERE color = 'J'
  • Try first: SELECT * FROM diamonds WHERE color = 'J'

If the SELECT results show exactly what you intended to remove, you can safely proceed with the DELETE command.

1. What happens if you run an UPDATE command without a WHERE clause?

2. Does running a DELETE command on a Delta table immediately erase the data from the underlying cloud storage forever?

question mark

What happens if you run an UPDATE command without a WHERE clause?

Seleziona la risposta corretta

question mark

Does running a DELETE command on a Delta table immediately erase the data from the underlying cloud storage forever?

Seleziona la risposta corretta

Tutto è chiaro?

Come possiamo migliorarlo?

Grazie per i tuoi commenti!

Sezione 5. Capitolo 4

Chieda ad AI

expand

Chieda ad AI

ChatGPT

Chieda pure quello che desidera o provi una delle domande suggerite per iniziare la nostra conversazione

Sezione 5. Capitolo 4
some-alt