Aggregation and Grouping in SQL
Veeg om het menu te tonen
SQL Aggregation uses the GROUP BY clause to arrange raw data into summary rows. By using functions like SUM(), AVG(), and COUNT(), you can transform millions of rows into actionable business metrics directly within a SQL cell.
For many, SQL is the most natural language for summarizing data. The syntax is highly readable and follows a logical flow: pick your categories, choose your math, and define your groups. In this chapter, you will use the temporary view we created earlier to perform the same aggregations we did in Python, but with the simplicity of SQL.
The GROUP BY Clause
In SQL, any column that is not being "mathematically summarized" must be included in the GROUP BY statement at the end of the query. This tells the SQL engine exactly how to slice the data.
SELECT cut, COUNT(*)
FROM diamond_view
GROUP BY cut
Using Built-in Aggregate Functions
SQL provides several standard functions to perform calculations on your numeric columns:
- SUM(): adds up all values in the column;
- AVG(): calculates the mean/average;
- MIN() / MAX(): finds the lowest or highest value;
- COUNT(): Returns the number of rows.
%sql
SELECT
cut,
SUM(x),
AVG(y)
FROM diamond_view
GROUP BY cut
Aliasing for Professional Reports
By default, SQL might label your result columns as sum(Total_Profit). To make your output ready for a business dashboard, use the AS keyword to rename these columns to something more descriptive.
%sql
SELECT
cut,
SUM(x) AS Total_Length,
MAX(y) AS Highest_Diamond_Value
FROM diamond_view
GROUP BY cut
Filtering Aggregated Data: The HAVING Clause
A common mistake is trying to filter an aggregated result using WHERE. In SQL, WHERE filters individual rows before they are grouped. If you want to filter the results after they have been summed up (for example, to only show regions with more than $1,000,000 in profit), you must use the HAVING clause.
%sql
SELECT cut, SUM(x) AS Total_Length
FROM diamond_view
GROUP BY cut
HAVING Total_Length > 4000
Sorting the Summary
To make your insights stand out, always use ORDER BY. This allows you to rank your results, such as showing your most profitable regions at the top of the table.
%sql
SELECT cut, SUM(x) AS Total_Length
FROM diamond_view
GROUP BY cut
ORDER BY Total_Length DESC
Bedankt voor je feedback!
Vraag AI
Vraag AI
Vraag wat u wilt of probeer een van de voorgestelde vragen om onze chat te starten.