How to Join 3 or More Tables in SQL
Categories
Here we’re going to talk about using a commonly required SQL concept - how to join 3 or more tables in SQL.
The SQL experts often disagree on the SQL JOINs. One camp says the JOINs are the basic SQL concept. Others claim the JOINs are one of the advanced SQL topics. We won’t side with anyone because it won’t change the fact that SQL JOINs are one of the most required concepts at the SQL job interviews.
You have probably, as most people did, learned SQL JOINs by joining two tables. At some point, you probably wondered if it’s possible to join 3 tables in SQL or even more tables using the JOIN keyword.
The short answer is, yes, it’s possible!
The longer answer is, yes, it’s possible, and we will show you how to join 3 or more tables in SQL on a concrete job interview question.
The longest answer is, yes, it’s possible, and we will show you a practical example, but we first have to make sure that you understand how joining two tables works.
Joining Two Tables in SQL
If you know how to join two tables, you for sure know how to join 3 tables in SQL or even more than 3. Maybe you just don’t know that you know that.
For example, let’s take a look at this Airbnb interview question:
Find the total number of available beds per hosts' nationality. Output the nationality along with the corresponding total number of available beds. Sort records by the total available beds in descending order.
Answer:
SELECT
nationality,
SUM(n_beds) AS total_beds_available
FROM
airbnb_hosts h
INNER JOIN
airbnb_apartments a
ON
h.host_id = a.host_id
GROUP BY
nationality
ORDER BY
total_beds_available DESC;
Joining table consists of the next parts:
- The first table
- The selected join keyword
- The second table
- The ON keyword
- The column from the first table used to connect it with the second table
- The equal sign (=)
- The column from the second table used to connect it with the first table
In this example, the first table is airbnb_hosts, with alias h. Then comes the join keyword, which is INNER JOIN in this case. The second table is airbnb_apartments, with alias a. After the ON keyword comes the column host_id from the table airbnb_hosts. Whenever this column is equal to the column host_id from the second table, the data from two tables will be joined.
That’s it! You follow that structure whenever you use any of the common join types:
- (INNER) JOIN
- LEFT (OUTER) JOIN
- RIGHT (OUTER) JOIN
- FULL OUTER JOIN
- CROSS JOIN
Now, how do you apply this to joining 3 tables in SQL?
Joining 3 Tables in SQL
You’ll soon see why, once you know how to join 2 tables, joining 3 tables in SQL is easy. Here’s an interview question from LinkedIn:
Interview Question Date: November 2020
Identify projects that are overbudget. A project is overbudget if the prorated cost of all employees assigned to it exceeds the project’s budget.
To determine this, prorate each employee's annual salary to match the project's duration. For example, if a project with a six-month duration has a budget of $10,000.
The total prorated annual employee salary should not exceed $10,000.
Output a list of overbudget projects with the following details: project name, project budget, and prorated total employee expenses (rounded up to the nearest dollar).
Hint: Assume all years have 365 days and disregard leap years.
Answer:
SELECT title,
budget,
ceiling(prorated_expenses) AS prorated_employee_expense
FROM
(SELECT title,
budget,
(end_date::date - start_date::date) * (sum(salary)/365)
AS prorated_expenses
FROM linkedin_projects a
INNER JOIN linkedin_emp_projects b ON a.id = b.project_id
INNER JOIN linkedin_employees c ON b.emp_id=c.id
GROUP BY title,
budget,
end_date,
start_date) a
WHERE prorated_expenses > budget
ORDER BY title ASC;
We’re not going to explain the whole query, but only the JOIN part. In this case, INNER JOIN is showing up in the subquery, but this doesn’t change a thing in how 3 tables should be joined.
After you join 2 tables, the principle is that you repeat steps 2-7 mentioned above. That is, you only omit the first step, i.e., writing the name of the first table.
See how this looks like in the above code. The first table is linkedin_projects, with the alias a. Then comes the INNER JOIN keyword. After that, there’s a second table linkedin_emp_projects, with the alias b. These two tables are joined on the column id from the first table and the column project_id from the second table. Nice, you’ve joined 2 tables.
Now, you need to join the third table, which is linkedin_employees. To make SQL realize what you want to do, you need to state which join you want to use. In this case, it’s INNER JOIN again. Then you state the third table’s name, which is linkedin_employees, with the alias c. This table is joined using the column emp_id from the second table (linkedin_emp_projects) and the column id from the third table (linkedin_employees).
There’s nothing more to it. You’ve joined 3 tables in SQL! You can think about this in terms of forming a kind of chain of joins:
Note that this doesn’t mean that, by adding a third table, you can only join it with the second table. No, by adding the third table into the join chain, you’re just stating your intention to join this table with any of the previously joined tables. You can just as well join the third table with the first table if that makes sense in a particular query.
This is the general principle that works in every situation. However, what causes a little bit of conflict here and complicates things a bit is what happens if you use the LEFT JOIN instead of INNER JOIN.
INNER JOIN v LEFT JOIN
The logic of the INNER JOIN is that it returns all the matching rows from the first and the second table. If you join the third or any other number of tables, it doesn’t change a thing. The INNER JOIN will always return the matching rows from any number of tables you join.
But what if you use the LEFT JOIN? Here we want to introduce a concept of the reference table.
LEFT JOIN Reference Table
If you use the LEFT JOIN, the order in which you join the tables is extremely important. The LEFT JOIN will return all the rows from the first table, also called a reference table. Then it will look for the values in the second table. Wherever there are matching rows in the reference and the second table, you’ll get data from both tables.
However, if there are no rows in the second table that match the rows from the reference table, you’ll get the NULL values.
How does that work in practice and how do you then join 3 tables in SQL? Let’s have a look at an example question from LinkedIn:
Interview Question Date: August 2021
Write a query that will return all cities with more customers than the average number of customers of all cities that have at least one customer. For each such city, return the country name, the city name, and the number of customers
Before analyzing the interview answer, we’d like to show you first how the LEFT JOIN works itself and what we mean when saying it will return all the rows from the reference table and only the matching rows from the second table.
Suppose you write a code below:
SELECT *
FROM linkedin_city city
LEFT JOIN linkedin_customers customers
ON city.id = customers.city_id;
If you run it, it will return this output:
id | city_name | country_id | id | business_name | city_id |
---|---|---|---|---|---|
1 | London | 1 | 1 | Hair Studio | 1 |
1 | London | 1 | 2 | Kosmetik Plus | 1 |
1 | London | 1 | 3 | Kosmetik Plus | 1 |
2 | Berlin | 2 | 4 | Natural Skin | 2 |
2 | Berlin | 2 | 5 | Kosmetik Plus | 2 |
2 | Berlin | 2 | 6 | Natural Skin | 2 |
2 | Berlin | 2 | 7 | Kosmetik Plus | 2 |
3 | Manchester | 1 | 8 | Kosmetik Plus | 3 |
3 | Manchester | 1 | 9 | Hair Studio | 3 |
4 | New York | 3 |
Have a look at the last row. There is data in the columns id, city_name, and country_id. All these columns are from the table linkedin_city. However, there are no data in the following three columns. Why is that? It’s because the query returns the NULL values wherever there’s data from the master table, but there are no matching rows in the second table we joined. The result above tells us there are no customers located in New York.
How about adding the third table? Here it is:
SELECT *
FROM linkedin_city city
LEFT JOIN linkedin_customers customers
ON city.id = customers.city_id
LEFT JOIN linkedin_country as country
ON city.country_id = country.id;
The output of this code is below:
id | city_name | country_id | id | business_name | city_id | id | country_name |
---|---|---|---|---|---|---|---|
1 | London | 1 | 1 | Hair Studio | 1 | 1 | UK |
1 | London | 1 | 2 | Kosmetik Plus | 1 | 1 | UK |
3 | Manchester | 1 | 8 | Kosmetik Plus | 3 | 1 | UK |
3 | Manchester | 1 | 9 | Hair Studio | 3 | 1 | UK |
1 | London | 1 | 3 | Kosmetik Plus | 1 | 1 | UK |
2 | Berlin | 2 | 6 | Natural Skin | 2 | 2 | Germany |
2 | Berlin | 2 | 4 | Natural Skin | 2 | 2 | Germany |
2 | Berlin | 2 | 5 | Kosmetik Plus | 2 | 2 | Germany |
2 | Berlin | 2 | 7 | Kosmetik Plus | 2 | 2 | Germany |
4 | New York | 3 | 3 | USA |
Now there are two additional columns in the result. Even though there are NULL values, the columns id and country_name still have values in the last row. This shows New York has a matching row in the table linkedin_country, so data shows New York is in the USA, but there are no customers in that city and country.
Now we can get to the solution code of this interview question.
Answer:
WITH cities_customers AS (
SELECT
country.country_name AS country,
city.city_name AS city,
count(customer.id) AS total_customers
FROM linkedin_country country
INNER JOIN linkedin_city city
ON city.country_id = country.id
INNER JOIN linkedin_customers customer
ON city.id = customer.city_id
GROUP BY
country.country_name,
city.city_name
),
avg_customers AS (
SELECT
count(id)::float / count(DISTINCT city_id) AS avg_cus_per_city
FROM linkedin_customers
)
SELECT
country,
city,
total_customers
FROM cities_customers
WHERE
total_customers > (SELECT avg_cus_per_city FROM avg_customers);
Again, we will concentrate only on the join part, which can be found in the CTE. It joins table linkedin_country with the table linkedin_city. Then this table is joined with the table linkedin_customers. All 3 tables are joined using the INNER JOIN.
If you run only the first SELECT within the CTE, you’ll get this table:
country | city | total_customers |
---|---|---|
Germany | Berlin | 4 |
However, what if you replace the INNER JOINs with the LEFT JOINs? This new code looks like this:
SELECT
country.country_name AS country,
city.city_name AS city,
count(customer.id) AS total_customers
FROM linkedin_country country
LEFT JOIN linkedin_city city
ON city.country_id = country.id
LEFT JOIN linkedin_customers customer
ON city.id = customer.city_id
GROUP BY
country.country_name,
city.city_name;
How is this different? Your first table is again linkedin_country. When using LEFT JOIN to join it with the table linkedin_city, this will get you all the rows from the first table and only the matching rows from the second table. The second join is the LEFT JOIN between tables linkedin_city and linkedin_customers. The same logic applies here. This way, you will get all the rows from the table linkedin_country, only the matching rows from the table linkedin_city, and then again matching rows from the table linkedin_customers.
Here’s the result:
You see, there’s one additional row compared to the previous output. You’ve got the row that says there is the USA in the table country, there’s New York in the table city, but there are no customers from this city. This is exactly what we got above when we got the NULL values. The COUNT() function counted the number of customers from New York and returned 0.
It is knowing what you want as output is that will make you decide whether to use INNER JOIN or the LEFT JOIN. However, when you use the LEFT JOIN, be aware that the order of the tables does matter, unlike with the INNER JOIN. You should always have in mind which table is your left table because that’s the table you’ll get all rows from.
If you use one LEFT JOIN, it’s generally advisable that the second join should also be the LEFT JOIN. Thinking of multiple joins as a chain of joins becomes even more important with this type of join.
Joining N Tables
Now that you’ve learned how to join 3 tables in SQL, maybe you want to know how to join more than 3 tables in SQL?
No need for us to explain that; you already know! Joining more than three tables works exactly the same way as joining three tables. Simply select the type of join you need, follow the steps of joining three tables, and there you have it: you can join as many tables and add them to the chain of joins. There’s no limit to how many.
Conclusion
You saw that joining 3 or more tables in SQL is not that hard, especially if you use the INNER JOIN. The prerequisite is to be very comfortable with joins in general. That means you know how to join 2 tables easily, but you also know all the theory behind joins and, most importantly, what each join will return if you use it in your query. Speaking of which, you should be much more careful when using the LEFT JOIN because how you use it and which order of tables you choose can have a significant impact on the output you get.
Make sure that you have strong foundational knowledge. To help you with that, we’ve also written about the different types of SQL JOINs that you must know. And here if you want to practice more SQL JOIN interview questions.