What Is SQL GROUP BY and How to Use It?

What Is SQL GROUP BY and How to Use It


The article discusses the seven most typical uses of SQL GROUP BY. We’ll go through its syntax and coding examples to give a taste of real-life SQL coding.

The GROUP BY clause is one of the most ubiquitous clauses in SQL. It’s an interesting one, as its use is a basic one but also extends to advanced SQL knowledge. It seems like you have to learn it very early, and you just can’t get rid of it, no matter how complex your SQL queries become. On the contrary – the more complex they are, the more you’ll use GROUP BY.

This article will teach you what SQL GROUP BY is and show you its syntax. But the main part will be showing you how GROUP BY works. We’ll use the interview questions from our platform to show you as many common and relatable examples of GROUP BY in an SQL query.

These examples include SQL GROUP BY with:

  1. Aggregate functions
  2. JOINs
  3. Multiple Columns
  4. WHERE
  5. HAVING
  6. ORDER BY
  7. Window functions

Once you come to the end of the article, we’re sure you’ll find even more day-to-day situations where you’ll want to use SQL GROUP BY.

What Is GROUP BY Clause in SQL?

The SQL GROUP BY definition is simple: It’s an SQL clause that groups all the rows with the same column value.

Its main purpose is to help with data aggregation, working on that together with the SQL aggregate functions. You know them – COUNT(), SUM(), AVG(), MIN(), MAX(), to mention those most commonly used.

SQL GROUP BY Syntax

When data is grouped in SQL, it is done so by one or several columns. The GROUP BY syntax reflects that, as you need to specify which column(s) you want to group by.

The SQL GROUP BY syntax is, therefore, rather straightforward. However, you have to keep in mind where GROUP BY comes in relation to other SQL clauses. That’s why we’ll use them, too, in the GROUP BY syntax to show where GROUP BY stands.

SELECT column_1,
       column_2,
FROM table
WHEREGROUP BY column_1, column_2, …
HAVINGORDER BY column_1, column_2, …;

The above syntax shows that GROUP BY always comes after FROM and WHERE but before HAVING and ORDER BY.

The best way to internalize these rules is to write SQL code that requires GROUP BY, and that’s what we’ll do in the following section.

Practical Examples for Mastering SQL GROUP BY

Using SQL GROUP BY With Aggregate Functions

Using SQL GROUP BY With Aggregate Functions

When using it with aggregate functions, GROUP BY categorizes aggregated data into groups and gives it labels. Labels are distinct values from the column you group by.

The aggregate functions mustn’t be used in GROUP BY because they are executed after GROUP BY.

Question: Total AdWords Earnings

Take a look at this question by Google.


Table: google_adwords_earnings

Link to the question: https://platform.stratascratch.com/coding/10164-total-adwords-earnings


It gives you the google_adwords_earnings table to work with.

Table: google_adwords_earnings
business_typen_employeesyearadwords_earnings
handyman2201881
handyman102018110
media100002018123456789
handyman500020181001001
handyman22018150

Code

To solve this problem, you need to sum the adwords earnings and then group them by the business type. That will give you the total earnings by each business type.

It translates to this in SQL.

SELECT business_type, 
       SUM(adwords_earnings) AS earnings
FROM google_adwords_earnings
GROUP BY business_type;

There are two columns in SELECT: business_type and earnings. This second column is calculated using the SUM() aggregate function with the column adwords_earnings.

The FROM clause refers to the only table we have.Then GROUP BY: it groups data by the business_type column, which is what the question asks.

Combined with SUM(), this will output the earnings by each business type.

All required columns and the first 5 rows of the solution are shown

business_typeearnings
media247914579
transport132323280
handyman6042187

Lookit!: Don’t Use the Aggregate Function in GROUP BY

It’s like kids trying to stick a screwdriver into a socket – we don’t know why you would do that, but you might be tempted.

We’ll rid you of the temptation: the aggregate functions are not allowed in GROUP BY. Why? Will you be electrocuted if you put aggregate functions there?

No, but you’ll get an error. Run this code and see for yourself.

SELECT business_type, 
       SUM(adwords_earnings) AS earnings
FROM google_adwords_earnings
GROUP BY business_type, SUM(adwords_earnings);

See!? And you wouldn’t believe us when we told you!

The aggregate functions are not allowed in GROUP BY

Using SQL GROUP BY With JOINs

When using JOINs in your query, it usually means you’ll have more than one table in the FROM clause. This doesn’t change anything regarding the use of GROUP BY: you simply state the columns by which you want to group your output.

The only change is that joining tables allows you to group by any columns from any table.

Question: Popularity of Hack

Here’s a good example by Meta.


Tables: facebook_employees, facebook_hack_survey

Link to the question: https://platform.stratascratch.com/coding/10061-popularity-of-hack

You have two tables at your disposal. The first one is facebook_employees.

Table: facebook_employees
idlocationagegenderis_senior
0USA24MFALSE
1USA31FTRUE
2USA29FFALSE
3USA33MFALSE
4USA36FTRUE

The second table is facebook_hack_survey.

Table: facebook_hack_survey
employee_idagegenderpopularity
024M6
131F4
229F0
333M7
436F6

Code

What you need to do here is to find the average popularity and group by office location. Compared to the previous code’s logic, the only addition is that there will be two tables.

Have a look.

SELECT e.location,
       AVG(s.popularity) AS avg_popularity
FROM facebook_employees e
JOIN facebook_hack_survey s ON e.id = s.employee_id
GROUP BY e.location;

We select the location and calculate the hack popularity by applying the AVG() function to the popularity column.

Since these two columns are from one table each, we have to join the tables.  We join the tables on the columns id and employee_id.

Finally, we group the output by the column that is required to solve the problem. In this case, it’s the column location from the table facebook_employees.

All required columns and the first 5 rows of the solution are shown

locationavg_popularity
UK4.33
USA4.6
India7.5
Switzerland1

Lookit!: Careful When Choosing the JOIN

This isn’t in direct relation to GROUP BY in SQL. But you need to be careful when choosing which join type to use, as they may return different results.

For instance, we used (INNER) JOIN for the previous question. The result would have been the same if we had used LEFT JOIN.

However, if we had chosen to use RIGHT JOIN, we wouldn’t have gotten the right (pun intended) result.

Run this code to see what it’ll return.

SELECT e.location,
       AVG(s.popularity) AS avg_popularity
FROM facebook_employees e
RIGHT JOIN facebook_hack_survey s ON e.id = s.employee_id
GROUP BY e.location;

All required columns and the first 5 rows of the solution are shown

locationavg_popularity
6.67
UK4.33
USA4.6
India7.5
Switzerland1

There’s one additional row in the output with the calculated average popularity but without the location. These are all the employees appearing in the table facebook_hack_survey, but missing from facebook_employees.

To be sure which JOIN to choose, read our article about types of SQL JOINs.

Using SQL GROUP BY With Multiple Columns

Using SQL GROUP BY With Multiple Columns

Yes, of course, you can group your output by more than one column. You simply list all the required columns in GROUP BY and separate them by a comma.

Question: Number Of Custom Email Labels

Let’s practice this in the Google interview question.


Tables: google_gmail_emails, google_gmail_labels

Link to the question: https://platform.stratascratch.com/coding/10120-number-of-custom-email-labels?code_type=1

It gives us two tables: google_gmail_emails,

Table: google_gmail_emails
idfrom_userto_userday
06edf0be4b2267df1fa75d295377a46f8323610
16edf0be4b2267df1fa32ded68d89443e8086
26edf0be4b2267df1fa55e60cfcc9dc49c17e10
36edf0be4b2267df1fae0e0defbb9ec47f6f76
46edf0be4b2267df1fa47be2887786891367e1

and google_gmail_labels.

Table: google_gmail_labels
email_idlabel
0Shopping
1Custom_3
2Social
3Promotion
4Social

Code

This question will require you to group by email labels and the user receiving an email. Here’s how to do this.

SELECT to_user AS user_id,
       label,
       COUNT(*) AS n_occurences
FROM google_gmail_emails e 
INNER JOIN google_gmail_labels l
ON e.id = l.email_id
AND l.label ILIKE 'custom%'
GROUP BY to_user, label;

The question asks us to output the receiver user ID, label, and the number of occurrences. For this third column, we simply use COUNT(*) to count the number of rows, which equals the number of occurrences.

Then we INNER JOIN the two available tables. We join them on the ID and email ID, and where the email label is custom.

To get the number of occurrences by user and email label, we list the columns to_user and label in the GROUP BY clause.

All required columns and the first 5 rows of the solution are shown

user_idlabeln_occurences
8bba390b53976da0cdCustom_32
5dc768b2f067c56f77Custom_24
114bafadff2d882864Custom_31
32ded68d89443e808Custom_24
a84065b7933ad01019Custom_11

Lookit!: Group What You Select

When using GROUP BY with the aggregate functions, it’s important to remember one simple rule, which is especially easy to forget if you need to group by several columns.

And the rule is: the columns you include in SELECT must also be included in GROUP BY. (Except, of course, the columns with the aggregate functions. We learned they don’t go in GROUP BY.)

Run the modified code above, where we included only one column in GROUP BY.

SELECT to_user AS user_id,
       label,
       COUNT(*) AS n_occurences
FROM google_gmail_emails e 
INNER JOIN google_gmail_labels l
ON e.id = l.email_id
AND l.label ILIKE 'custom%'
GROUP BY to_user;

Not only that the answer would be wrong, but the only output you’ll see will be an error.

Using SQL GROUP BY With ORDER BY

The use of ORDER BY doesn’t change anything in how you use GROUP BY. The only thing you need to remember is that ORDER BY comes after GROUP BY in SQL.

As ORDER BY is used for sorting your output, you can specify how you want it sorted. If you want it sorted ascendingly, write the keyword ASC after the columns in ORDER BY. If you omit the ASC keyword, the result will still be sorted ascendingly, as it is the default mode.

If you want descending sorting, use the keyword DESC.

Question: Expensive Projects

Let’s see how this works in a Microsoft interview question.


Tables: ms_projects, ms_emp_projects

Link to the question: https://platform.stratascratch.com/coding/10301-expensive-projects?code_type=1

It gives us two tables. The first one is ms_projects.

Table: ms_projects
idtitlebudget
1Project129498
2Project232487
3Project343909
4Project415776
5Project536268

The second table is ms_emp_projects.

Table: ms_emp_projects
emp_idproject_id
105921
105932
105943
105954
105965

Code

To answer this question, you will need to group the result by the project title and its budget, then order it by the budget/employee ratio.

Let’s analyze the code more closely.

SELECT title AS project,
       ROUND((budget/COUNT(emp_id)::FLOAT)::NUMERIC, 0) AS budget_emp_ratio
FROM ms_projects a
INNER JOIN ms_emp_projects b 
ON a.id = b.project_id
GROUP BY title, budget
ORDER BY budget_emp_ratio DESC;

First, we select the project title. Then we calculate the amount of the project’s budget per employee working on that project. We do that by dividing the budget by the number of employees, which we get by using the COUNT() function.

As the required columns are from both tables, we need to join them in the FROM clause. We do that on the project ID columns.

Then we group the output by title and budget, as both columns appear in the SELECT statement.

The question wants you to show the output with the projects with the highest budget per employee on top. In other words, sort the result descendingly using ORDER BY.

All required columns and the first 5 rows of the solution are shown

projectbudget_emp_ratio
Project824642
Project4924387
Project1524058
Project1023794
Project1922493

Lookit!: ORDER BY Must Come After GROUP BY

This one has to do with the order (pun intended) in which the SQL clauses are executed. ORDER BY is executed after the data is grouped, so if you put it before GROUP BY, the code won’t work.

Run this modified code to see that. It’s the same code as above. Only we put ORDER BY before GROUP BY.

SELECT title AS project,
       ROUND((budget/COUNT(emp_id)::FLOAT)::NUMERIC, 0) AS budget_emp_ratio
FROM ms_projects a
INNER JOIN ms_emp_projects b 
ON a.id = b.project_id
ORDER BY budget_emp_ratio
GROUP BY title, budget;

It throws an error.

Using SQL GROUP BY With WHERE Clause

The WHERE clause filters the data. It takes the logical condition state in WHERE, checks the data, and returns only the values that satisfy the condition. The conditions are set using logical operators or commands such as BETWEEN, LIKE, or IN.

If you don’t use SQL GROUP BY, you’re not bothered with the specific use of WHERE. However, using GROUP BY puts an important feature of WHERE to the front: it filters data before grouping. That’s why WHERE always comes before GROUP BY.

Again, knowing the execution order of the SQL clauses shows to be a valuable asset.

Question: Reviews of Hotel Arena

Let’s see how to use GROUP BY and WHERE in the question by Airbnb.


Table: hotel_reviews

Link to the question: https://platform.stratascratch.com/coding/10166-reviews-of-hotel-arena

There’s only one table to work with: hotel_reviews.

Table: hotel_reviews
hotel_addressadditional_number_of_scoringreview_dateaverage_scorehotel_namereviewer_nationalitynegative_reviewreview_total_negative_word_countstotal_number_of_reviewspositive_reviewreview_total_positive_word_countstotal_number_of_reviews_reviewer_has_givenreviewer_scoretagsdays_since_reviewlatlng
7 Western Gateway Royal Victoria Dock Newham London E16 1AA United Kingdom3592017-07-058.5Novotel London ExcelUnited Kingdomcoffee and tea at breakfast were not particularly hot Otherwise everything else was fine161158we were allocated the newly refurbished rooms and so everything was fresh and the bed was very comfortable the hotel is ideally situated near City Airport although eventually we travelled by train34210[' Leisure trip ', ' Family with young children ', ' Standard Double Room with Two Single Beds ', ' Stayed 2 nights ', ' Submitted from a mobile device ']29 days51.510.02
35 Charles Street Mayfair Westminster Borough London W1J 5EB United Kingdom2522015-08-299.1The Chesterfield MayfairIsraelNo Negative01166We liked everything The hotel is simply a boutique the staff were all polite and helpfull The room was clean and been serviced daily Wifi was completely free Breakfast was simply great I so much want to get back41810[' Leisure trip ', ' Couple ', ' Classic Double Room ', ' Stayed 4 nights ', ' Submitted from a mobile device ']705 day51.51-0.15
14 Rue Stanislas 6th arr 75006 Paris France402017-05-239.1Hotel Le SixUnited States of AmericaThere is currently utility construction taking place on the street in front of the hotel so a little noisy at times and barriers in place27177Neat boutique hotel Some of the most comfortable hotel beds I have ever come across Staff was wonderful Loved the location Not too touristy Luxembourg gardens close by and a great place for a morning run walk3939.2[' Leisure trip ', ' Family with young children ', ' Deluxe Double Room ', ' Stayed 4 nights ', ' Submitted from a mobile device ']72 days48.842.33
Gran V a De Les Corts Catalanes 570 Eixample 08011 Barcelona Spain3252016-08-258.2Sunotel CentralUnited KingdomCoffee at breakfast could be better When you spend this amount in a hotel I expect better coffee in the morning223870Great bed nice to have a coffee machine in the room love the air conditioning and basically loved the attitude of the staff Really great2629.2[' Leisure trip ', ' Group ', ' Comfort Double or Twin Room ', ' Stayed 1 night ', ' Submitted from a mobile device ']343 day41.382.16
Rathausstra e 17 01 Innere Stadt 1010 Vienna Austria1952015-09-178.5Austria Trend Hotel Rathauspark WienUnited KingdomA bit out of the way location wise91884Clean modern rooms and bathroom well equipped927.5[' Leisure trip ', ' Couple ', ' Comfort Room ', ' Stayed 2 nights ', ' Submitted from a mobile device ']686 day48.2116.36

Code

This question requires you to count the score occurrences for Hotel Arena only and show the hotel name and score.

SELECT hotel_name,
       reviewer_score,
       COUNT(*) AS score_occurrences
FROM hotel_reviews
WHERE hotel_name = 'Hotel Arena'
GROUP BY hotel_name, reviewer_score;

The code selects the hotel name and the review score. It then counts the number of each score occurrences using COUNT(*).

The condition in WHERE means the calculation is done only on the Hotel Arena hotels.

The output is grouped by hotel name and review score. You already learned how to group by multiple columns.

All required columns and the first 5 rows of the solution are shown

hotel_namereviewer_scorecount
Hotel Arena6.31
Hotel Arena5.41
Hotel Arena7.11
Hotel Arena4.21
Hotel Arena5.82

Lookit!: WHERE Must Come Before GROUP BY

Let’s see what happens if we use WHERE after GROUP BY.

SELECT hotel_name,
       reviewer_score,
       COUNT(*) AS score_occurrences
FROM hotel_reviews
GROUP BY hotel_name, reviewer_score
WHERE hotel_name = 'Hotel Arena';

The query again returns an error.

Using SQL GROUP BY With HAVING Clause

Using SQL GROUP BY With HAVING Clause

The HAVING clause is the same as WHERE, in that both clauses are used for filtering data. But there’s one significant difference: HAVING filters data after grouping.

The conditions in HAVING are set the same way as in WHERE. The difference is that HAVING accepts aggregate functions. And, of course, it always comes after GROUP BY.

Question: Cities With the Most Expensive Homes

Here’s a question by Zillow that asks you to use SQL GROUP BY with HAVING.


Table: zillow_transactions

Link to the question: https://platform.stratascratch.com/coding/10315-cities-with-the-most-expensive-homes

We have to work with the table zillow_transactions.

Table: zillow_transactions
idstatecitystreet_addressmkt_price
1NYNew York City66 Trout Drive449761
2NYNew York CityAtwater277527
3NYNew York City58 Gates Street268394
4NYNew York CityNorcross279929
5NYNew York City337 Shore Ave.151592

Code

To give a correct answer, you need to group data by city and filter those cities with average home prices higher than the national average using the HAVING clause.

Here’s how to do it.

SELECT city
FROM zillow_transactions a
GROUP BY city
HAVING AVG(a.mkt_price) >
  (SELECT AVG(mkt_price)
   FROM zillow_transactions)
ORDER BY city ASC;

The easy first step is to select the cities from the table and group by the same column.

OK, we now have the list of cities. This list should contain only cities with average house prices higher than the national average.

Filtering is done in HAVING. The average housing price by city is calculated using the AVG() function. This is then compared with the subquery that, again, uses AVG() to return the total national average price of homes.

Finally, the output is sorted by city alphabetically.

All required columns and the first 5 rows of the solution are shown

city
Mountain View
San Francisco
Santa Clara

Lookit!: HAVING Must Come After GROUP BY

Let’s write the same code, but put HAVING before GROUP BY.

Run the code below to see what happens.

SELECT city
FROM zillow_transactions a
HAVING AVG(a.mkt_price) >
  (SELECT AVG(mkt_price)
   FROM zillow_transactions)
GROUP BY city
ORDER BY city ASC;

An error, as you probably expected!

Using SQL GROUP BY With the Window Functions

The SQL window functions are the SQL functions that perform a calculation over a set of rows (window) that are related to the current row.

One of their most common uses is data aggregation. In that way, they are similar to aggregate functions. The significant difference is that the window functions don’t collapse the individual rows. In other words, they allow for the aggregated and non-aggregated values to be shown side by side.

The other common window functions use is ranking data.

When using the window functions with GROUP BY, it’s important to remember that the window functions can’t be included in GROUP BY, the same as the aggregate functions.

Question: Activity Rank

Let’s see how GROUP BY works alongside the window functions in this Google interview question.


Table: google_gmail_emails

Link to the question: https://platform.stratascratch.com/coding/10351-activity-rank

The question gives the table google_gmail_emails.

Table: google_gmail_emails
idfrom_userto_userday
06edf0be4b2267df1fa75d295377a46f8323610
16edf0be4b2267df1fa32ded68d89443e8086
26edf0be4b2267df1fa55e60cfcc9dc49c17e10
36edf0be4b2267df1fae0e0defbb9ec47f6f76
46edf0be4b2267df1fa47be2887786891367e1

Code

To solve the question, you have to use the aggregate and window functions with GROUP BY.

SELECT  from_user, 
        COUNT(*) as total_emails, 
        ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC, from_user ASC) AS ranking
FROM google_gmail_emails 
GROUP BY from_user
ORDER BY 2 DESC, 1;

First, select the sender. Then use COUNT(*) to count the number of times this sender appears. This will equal the number of emails sent by the user.

We use the ROW_NUMBER() window function to rank the users by the number of sent emails.

The output is grouped by the sender. Finally, it is sorted by the number of sent emails descendingly and by the ranking ascendingly.

All required columns and the first 5 rows of the solution are shown

from_usertotal_emailsrow_number
32ded68d89443e808191
ef5fe98c6b9f313075192
5b8754928306a18b68183
55e60cfcc9dc49c17e164
91f59516cb9dee1e88165

Lookit!: Don’t Include the Window Functions in GROUP BY

You learned that the aggregate functions don’t go in GROUP BY. So, you might be tempted to think that the window functions aren’t the aggregate functions (correct!) and that they can go in GROUP BY (not correct!)

Let’s still try it.

SELECT  from_user, 
        COUNT(*) as total_emails, 
        ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC, from_user ASC) AS ranking
FROM google_gmail_emails 
GROUP BY from_user, ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC, from_user ASC)
ORDER BY 2 DESC, 1;

Run the code to make sure it returns an error.

Why the window functions aren’t allowed in GROUP BY? The answer: they are executed after GROUP BY. So grouping by the window functions would mean grouping by something that isn’t yet calculated. In other words: impossible!

Summary

In this article, we discussed the seven most common uses of SQL GROUP BY. As you’ve seen, GROUP BY is a clause that offers various ways of grouping and aggregating data.

Each use case was supported by a practical example, which allowed you to see and write the code. Hope you took that opportunity, as coding is the best way for the new SQL knowledge to sink in.

There are even more coding challenges on our platform. You’re welcome to browse through and solve as many coding interview questions as possible. The questions are categorized into different difficulty levels. That way, you can smoothly learn SQL GROUP BY or any other SQL concept from the basic to an advanced level.

What Is SQL GROUP BY and How to Use It


Become a data expert. Subscribe to our newsletter.