Filtering and Selecting Data with SQL
Swipe um das Menü anzuzeigen
A Temporary View is a session-scoped alias for a DataFrame that allows you to interact with your data using standard SQL syntax. It does not move or duplicate the data; it simply provides a SQL-friendly "window" into the DataFrame sitting in the cluster's memory.
While Python is excellent for building complex pipelines, many data professionals prefer the readability and speed of SQL for filtering and selecting data. In Databricks, you don't have to choose one or the other. You can transform a Python DataFrame into a temporary SQL table (a "view") and immediately start querying it.
Creating a Temporary View
Before you can run a SQL query against a DataFrame, you must give it a name that the SQL engine can recognize. You can do this using the createOrReplaceTempView method.
# Create a SQL view named 'diamond_view' from our existing DataFrame
df.createOrReplaceTempView("diamond_view")
The "Replace" part of this command is important: it ensures that if you run the cell again, it will simply update the view rather than causing an error. This view will exist only for the duration of your current notebook session.
Querying with the %sql Magic
Now that the view is registered, you can switch languages. By using the %sql magic command at the top of a new cell, you can perform the exact same selections and filters you did in Python, but using standard SQL syntax.
%sql
SELECT carat, cut, depth
FROM diamond_view
WHERE price > 500
The Power of SQL Filtering
SQL is particularly intuitive for complex filtering. You can use the IN operator to filter for multiple values or the LIKE operator for pattern matching - tasks that are often more verbose in Python.
%sql
SELECT * FROM diamond_view
WHERE cut IN ('Good', 'Premium')
AND clarity LIKE '%V%'
Selecting and Aliasing Columns
In SQL, selecting and renaming columns is straightforward using the AS keyword. This is a common practice to make your final report headers more professional.
%sql
SELECT
carat,
cut AS Diamond_Quality,
price AS Selling_Price
FROM diamond_view
Why use a Temporary View?
You might ask: "Why not just query the table directly from the Catalog?" The advantage of a Temporary View is that it allows you to query a DataFrame that you have already partially cleaned or modified in Python. It provides a bridge, allowing you to use Python for the "heavy lifting" of data engineering and SQL for the final analysis and visualization.
1. Which Python method must you run before you can use %sql to query a DataFrame?
2. What happens to a "Temporary View" once you detach your notebook from the cluster or end your session?
Danke für Ihr Feedback!
Fragen Sie AI
Fragen Sie AI
Fragen Sie alles oder probieren Sie eine der vorgeschlagenen Fragen, um unser Gespräch zu beginnen