Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Lære Aggregation and Grouping in SQL | Working with Data
Databricks Fundamentals: A Beginner's Guide

bookAggregation and Grouping in SQL

Stryg for at vise menuen

Note
Definition

SQL Aggregation uses the GROUP BY clause to arrange raw data into summary rows. By using functions like SUM(), AVG(), and COUNT(), you can transform millions of rows into actionable business metrics directly within a SQL cell.

For many, SQL is the most natural language for summarizing data. The syntax is highly readable and follows a logical flow: pick your categories, choose your math, and define your groups. In this chapter, you will use the temporary view we created earlier to perform the same aggregations we did in Python, but with the simplicity of SQL.

The GROUP BY Clause

In SQL, any column that is not being "mathematically summarized" must be included in the GROUP BY statement at the end of the query. This tells the SQL engine exactly how to slice the data.

SELECT cut, COUNT(*) 
FROM diamond_view
GROUP BY cut

Using Built-in Aggregate Functions

SQL provides several standard functions to perform calculations on your numeric columns:

  • SUM(): adds up all values in the column;
  • AVG(): calculates the mean/average;
  • MIN() / MAX(): finds the lowest or highest value;
  • COUNT(): Returns the number of rows.
%sql
SELECT 
  cut, 
  SUM(x), 
  AVG(y)
FROM diamond_view
GROUP BY cut

Aliasing for Professional Reports

By default, SQL might label your result columns as sum(Total_Profit). To make your output ready for a business dashboard, use the AS keyword to rename these columns to something more descriptive.

%sql
SELECT 
  cut, 
  SUM(x) AS Total_Length,
  MAX(y) AS Highest_Diamond_Value
FROM diamond_view
GROUP BY cut

Filtering Aggregated Data: The HAVING Clause

A common mistake is trying to filter an aggregated result using WHERE. In SQL, WHERE filters individual rows before they are grouped. If you want to filter the results after they have been summed up (for example, to only show regions with more than $1,000,000 in profit), you must use the HAVING clause.

%sql
SELECT cut, SUM(x) AS Total_Length
FROM diamond_view
GROUP BY cut
HAVING Total_Length > 4000

Sorting the Summary

To make your insights stand out, always use ORDER BY. This allows you to rank your results, such as showing your most profitable regions at the top of the table.

%sql
SELECT cut, SUM(x) AS Total_Length
FROM diamond_view
GROUP BY cut
ORDER BY Total_Length DESC
question mark

Which SQL clause is required to create summary rows when using aggregate functions like SUM() or AVG()?

Vælg det korrekte svar

Var alt klart?

Hvordan kan vi forbedre det?

Tak for dine kommentarer!

Sektion 4. Kapitel 7

Spørg AI

expand

Spørg AI

ChatGPT

Spørg om hvad som helst eller prøv et af de foreslåede spørgsmål for at starte vores chat

Sektion 4. Kapitel 7
some-alt