Writing the Processed Data to a Table
Writing data is the process of moving a DataFrame from the cluster's temporary memory into permanent storage in the Catalog. By using the saveAsTable() method, you ensure that your cleaned and aggregated results are preserved and accessible to other users and tools.
Everything you have done so far has been "in-memory." If you were to turn off your cluster right now, your transformed DataFrames would disappear. To make your work permanent, you must write the data back to the Lakehouse. In Databricks, the standard way to do this is by saving your DataFrame as a Delta Table.
The saveAsTable() Syntax
To save your work, you chain the write method to your DataFrame. The most direct approach is:
# Save the 'summary_df' we created earlier as a permanent table
summary_df.write.mode("overwrite").saveAsTable("workspace.default.diamonds_summary")
- write: accesses the DataFrame writer interface;
- mode("overwrite"): This tells Databricks what to do if a table with that name already exists. "Overwrite" replaces the old data with the new data. Other options include "append" (to add new rows to the end of the existing table);
- saveAsTable: specifies the three-part name (
catalog.schema.table) where the data will be stored.
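For example, if a daily job should add new rows rather than replace the whole table, the call might look like this (a minimal sketch; daily_summary_df is a hypothetical DataFrame standing in for that day's results):
# Hypothetical daily job: add new rows to the existing table instead of replacing it
daily_summary_df.write.mode("append").saveAsTable("workspace.default.diamonds_summary")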
Delta Lake: The Default Format
When you use saveAsTable, Databricks automatically saves the data in the Delta format. As we discussed in Section 1, Delta Lake provides reliability. It ensures that even if the cluster crashes in the middle of a "write" operation, your table won't be corrupted. It also allows for "Time Travel," meaning you can look back at previous versions of the table if you make a mistake.
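To give a rough idea of Time Travel, the sketch below inspects the table's history and reads an earlier snapshot by version number (the table name and version are assumptions based on the example above):
# Inspect the table's change history (one row per write)
spark.sql("DESCRIBE HISTORY workspace.default.diamonds_summary").show()

# Read the table as it looked at version 0 (the first write)
old_df = spark.read.option("versionAsOf", 0).table("workspace.default.diamonds_summary")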
Verifying the Write in the Catalog
Once the command finishes running, you should verify that the data has landed correctly:
- Navigate to the Catalog tab in the left-hand sidebar;
- Drill down into the workspace catalog and the default schema;
- Look for your new table name (e.g., diamonds_summary);
- You can click on the table to see its schema, sample data, and metadata, such as when it was created and who created it.
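You can also confirm the write programmatically. A small sketch, assuming the table name used earlier in this section:
# Check that the table is now registered in the Catalog
print(spark.catalog.tableExists("workspace.default.diamonds_summary"))

# Peek at the first few rows of the saved table
spark.table("workspace.default.diamonds_summary").show(5)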
Reading Your Saved Table
Once a table is in the Catalog, any authorized user can access it without needing your notebook. They can simply run a SQL query or use spark.table() to load it into their own environment:
# In a new notebook, anyone can now access your processed data
new_df = spark.table("workspace.default.diamonds_summary")
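The SQL route works just as well. As a quick sketch (same assumed table name), the query can be run from Python with spark.sql():
# Equivalent access through SQL
result_df = spark.sql("SELECT * FROM workspace.default.diamonds_summary LIMIT 10")
result_df.show()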
Best Practice: Clean Up
After saving your final results to a permanent table, it is a professional habit to terminate your cluster or at least "Clear State." Since your data is now safely stored in the Catalog, you no longer need to keep the temporary DataFrames taking up space in the cluster's RAM.
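If you want to free memory without terminating the cluster, the calls below are a minimal sketch (assuming summary_df was cached earlier; adjust to your own DataFrames):
# Drop all cached tables/DataFrames for this Spark session
spark.catalog.clearCache()

# Or unpersist a specific DataFrame you no longer need
summary_df.unpersist()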
1. Which "mode" should you use if you want to replace an existing table with brand-new data from your DataFrame?
2. What is the primary benefit of saving a DataFrame using saveAsTable()?