Joining Multiple Tables in SQL - Examples and Walkthrough
Categories
We’ll show you how to use JOINs to join multiple tables in SQL and how choosing a specific type of JOIN can change the outcome of the query.
Tables in relational databases describe a specific entity - a person, organization, or any other object in the real world. Usually, they have a few columns that contain essential details about these objects. Tables are sometimes related, similar to how real-world entities like ‘student’ and ‘school’ might be related to one another.
SQL JOINs allow us to combine tables and add details about each object. For example, add school details to the ‘student’ table. JOINs are commonly used to combine two tables, but you can also JOIN 3 or more tables in SQL.
In this article, we’ll walk you through using JOINs to combine multiple tables in SQL and how chaining different types of JOINs can affect the outcome of the query.
SQL JOIN Statement
JOINs are useful when working with values from two different tables. Before writing any code, determine how tables might be related and identify columns containing the necessary information. This will help you choose the right type of JOIN for the task, especially if you’re going to chain multiple JOINs.
Sometimes when multiple tables you’re trying to JOIN in SQL do not have a shared dimension. In this case, you may need a junction table to align values from two other tables properly.
Needless to say, you should master JOINs on two tables before trying to join three or more in SQL.
For example, here’s how to use JOIN to combine two tables in SQL:
SELECT *
FROM students
INNER JOIN schools ON students.school_id = schools.id
JOINs combine tables on a shared dimension - student’s school. We use the ON clause to specify a JOIN condition. This is necessary because we want to combine information about a student and the school they belong to, not a random school.
In this example, we assume that the students table has a school_id column that specifies the school to which the student belongs. We try to match this value with the id of rows in the schools table. As a result, we can match records from two tables and gain access to additional information about a student's school, like its address.
This syntax applies to all common types of JOINs - INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN. The main difference between these is the output. INNER JOIN only returns rows that meet the criteria from both tables.
In this case, INNER JOIN only returns students whose school_id can be found in the schools table. Other JOINs might return student information even if there is no information about the corresponding school.
Relationships Between Tables in SQL
Understanding relationships between tables is essential for working with relational databases.
If values in a table reference records in another table, two tables are considered related. For example, the students table has school_id values referring to a specific school in the schools table.
Understanding foreign and primary keys is important for understanding relationships between tables in SQL. We already gave an example of a foreign key - a value that refers to a record in another table. In our previous example, we match foreign key (school_id) values with a value that uniquely identifies each school. Each record in the schools table must have its own distinct id.
Values that uniquely identify records are called primary keys.
Tables in a database usually describe real-life objects: a person, place, or something else. The relationships between tables are often similar to logical relationships between objects in the real world. A student record has a school_id value that refers to a school that the student goes to.
However, even in real life, various objects can relate to one another in different ways. For example, many students go to one school. In the context of a database, many student records can have the same school_id. This type of relationship is called many-to-one.
Let’s explore various types of relationships:
- one-to-one relationship
In this type of relationship, records in one table can relate to a maximum of one record in another table.
For example, the relationship between employees and offices. Under normal circumstances, one employee will be assigned only one office, and one office can hold only one employee.
- one-to-many relationship
The most common type of relationship in relational databases as well as the real world. In this case, a single entity is related to many other entities.
For example, one city can have many related zip_code values.
- many-to-one kind of relationship
Many-to-one is basically the same as a mirrored one-to-many relationship.
We discussed an example of a city to zip code as an example of one-to-many relationships. The reverse of that - zip codes to a city - is a ‘many to one’ relationship.
- many-to-many kind of relationship
In this type of relationship, both tables have entities related to multiple records.
For example, a student can take multiple classes, and classes can have multiple students in them.
In relational databases, every column must have a single value, so many-to-many relationships are difficult to represent.
A student takes multiple courses, but the course_id column can only store a single value. The inefficient approach is to have multiple rows describing the same student. All information, like the student's name, age, and school will be the same, but they’ll have different course_id values.
A better approach is to create a new table with the sole purpose of tracking relationships between students and courses. They are a common utility in SQL called junction tables.
Can you join multiple tables in SQL?
You can! SQL makes it really easy to join multiple tables. Obviously, you can JOIN two tables, but also three, four, and even more if necessary.
When you perform a JOIN, tables are horizontally combined. Columns from all combined tables are merged into one table.
JOINs essentially take columns from all combined tables and merge them into one. Two different tables might have columns with the same name, which is fine as long as the tables are separate. When you combine them, you end up with a table with two columns of the same name.
To solve this problem, use the AS keyword in the SELECT statement to give these columns a unique name.
How to Join multiple tables in SQL using INNER JOIN
Using INNER JOINs to join multiple tables in SQL is really simple. The first INNER JOIN combines the first two tables, and the second INNER JOIN is applied to the result of the first INNER JOIN, and so on.
Average Customers Per City - Joining multiple tables using INNER JOIN in Practice
This is a very specific interview question from LinkedIn. We’ll have to INNER JOIN three different tables to find the answer. This is just one of many SQL JOIN Interview Questions.
Interview Question Date: August 2021
Write a query that will return all cities with more customers than the average number of customers of all cities that have at least one customer. For each such city, return the country name, the city name, and the number of customers
Understand the question
Phrasing of this question is a little difficult, so make sure to read it multiple times.
We are asked to find cities where the number of customers is higher than average of all cities with at least one customer. So we have to calculate two values: the number of customers in each city, and the average number of customers in cities with at least one customer.
Analyze data
To answer this question, we need to work with data from three different tables.
Let’s try to understand these tables and possible relationships between them. This will help us JOIN them if needed.
id | business_name | city_id |
---|---|---|
1 | Hair Studio | 1 |
2 | Kosmetik Plus | 1 |
3 | Kosmetik Plus | 1 |
4 | Natural Skin | 2 |
5 | Kosmetik Plus | 2 |
- The id column contains unique primary key values that identify customers in the linkedin_customers table.
- business_name column contains a text value.
- city_id values refer to the city in which customers are located. It is a foreign key - a unique identifier that refers to records in another table.
No surprises with data here. One thing to note is that there are different instances of customers with the same name (Kosmetik Plus, for example) but different IDs. These rows probably describe different locations of the same business.
id | city_name | country_id |
---|---|---|
1 | London | 1 |
2 | Berlin | 2 |
3 | Manchester | 1 |
4 | New York | 3 |
- Because the table is named linkedin_city, we can assume that each record describes one city. id values are primary keys - they uniquely identify each record.
- city_name column probably contains city names in a text format.
- country_id values refer to the country where the city is located. It is a foreign key - a unique identifier that refers to records in another table.
The table lists city_id values, their corresponding cities, and the countries in which they are located.
id | country_name |
---|---|
1 | UK |
2 | Germany |
3 | USA |
- Because the table is named linkedin_country, we can assume that id values identify country records. They are also primary key values.
- country_name column contains the country name in a text format.
This table lists country id values and respective countries.
These three tables are definitely related to one another. The linkedin_customers table has foreign key values (city_id) that refer to cities from the linkedin_city table. Similarly, this table has the country_id column with foreign key values that refer to countries in the linkedin_country table.
Logically, many customers can be related to one city, and many cities can be related to one country.
In the next step, we’ll lay out our logical approach and try to choose the type of INNER JOIN best suited for the task.
Logical Approach
The question asks us to return the following values: city name, country name and number of customers. Currently this information is separated into three tables. Let’s combine them and align related customers, cities and countries to start.
Next, we should count the number of customers in each city. Once tables are combined, we need to group rows (customers) by country and city to which they belong. Then we can use the COUNT() aggregate function to find the number of records in each group.
In the following steps, we’ll compare the number of customers per city against the average number of customers in cities with at least one customer.
To calculate the average, we need to find and divide the following two numbers:
- Total number of customers
- The number of cities with at least one customer.
We can use COUNT() to find the total number of customers (rows) in the linkedin_customers table.
To find the number of cities with at least one customer, we can apply COUNT() to city_id values. The mere fact that city_id values are found in the linkedin_customers table means that these cities have at least one customer.
There may be multiple customers related to the same city. In this case, the city should be counted only once, as it fits the criteria of ‘city with at least one customer’. We can use the DISTINCT clause with the COUNT() function to get the number of unique city_id values.
To get the final answer, we need to compare all cities with the average. We should use the WITH clause to save the average value for later reference.
To answer the question, we should SELECT three values - city_name, country_name, and the number of customers per city. Finally, we should add a condition to only return cities where the number of customers is higher than the average.
Write the Code
1. JOIN three tables and find customers per city
The question explicitly tells us to return city_name, country_name and the number of customers per city. These bits of information are stored in three different tables that need to be combined into one.
SELECT country.country_name AS country,
city.city_name AS city,
COUNT(customer.id) AS total_customers
FROM linkedin_country country
INNER JOIN linkedin_city city ON city.country_id = country.id
INNER JOIN linkedin_customers customer ON city.id = customer.city_id
GROUP BY country.country_name,
city.city_name
We use INNER JOINs to return interconnected records. It will remove instances when a customer's city_id value does not refer to any city from the linkedin_city list or country_id doesn’t refer to any country in the linkedin_country table.
It’s also a good idea to give tables descriptive names - customer for customers table, city for cities, country for countries.
Once tables are combined, we create groups for each unique pair of country_name and city_name values. In the SELECT statement, we use the COUNT() aggregate function to find the number of cities (records) in each group.
Let’s use the WITH clause to save the result of this subquery, which looks like this:
country | city | total_customers |
---|---|---|
UK | Manchester | 2 |
UK | London | 3 |
Germany | Berlin | 4 |
2. Find the average of cities with at least one customer
We need to calculate the average number of customers for these cities. As we’ve already mentioned, we need to divide the total number of customers by the number of cities with at least one customer.
SELECT COUNT(id)::float / COUNT(DISTINCT city_id) AS avg_cus_per_city
FROM linkedin_customers
To find the total number of customers, we can simply count id values.
Next, we should find the number of cities with at least one customer. In other words, we need to count city_id values that appear at least once. If ten customers have the same city_id value, that city still counts as one. So we need to use the DISTINCT clause.
To make sure two numbers are divisible, we should convert one of them to a float. In this case, we cast count(id) (total number of customers) to a float type.
Let’s see the output of this query:
avg_cus_per_city |
---|
3 |
It’s a good idea to save the average as a common table expression so we can use it in the comparison later.
3. Return cities with higher than the average number of customers
Finally, we have both - the list of cities with the corresponding number of customers (cities_customers CTE) and the average number of customers (avg_customers CTE)
Finally, we can SELECT cities from the cities_customers common table expression, and use the WHERE clause to filter cities.
The condition for filtering is clear - the number of customers in the city must be greater than the previously calculated average value.
WITH cities_customers AS
(SELECT country.country_name AS country,
city.city_name AS city,
COUNT(customer.id) AS total_customers
FROM linkedin_country country
INNER JOIN linkedin_city city ON city.country_id = country.id
INNER JOIN linkedin_customers customer ON city.id = customer.city_id
GROUP BY country.country_name,
city.city_name),
avg_customers AS
(SELECT COUNT(id)::float / COUNT(DISTINCT city_id) AS avg_cus_per_city
FROM linkedin_customers)
SELECT country,
city,
total_customers
FROM cities_customers
WHERE total_customers >
(SELECT avg_cus_per_city
FROM avg_customers)
Output
Final answer should be a list of cities that satisfy the criteria - have a higher than average number of customers.
country | city | total_customers |
---|---|---|
Germany | Berlin | 4 |
How to Join multiple tables in SQL using LEFT JOIN
This time, let’s discuss another commonly used type of JOIN. They may lead to a different outcome, but LEFT JOINs are otherwise similar to INNER JOINs.
The Cheapest Airline Connection - Joining multiple tables using LEFT JOIN in Practice
This is a difficult but excellent question from Delta Airlines. A great thing about it is that the rules for finding an answer are very clear. The question specifies limitations, conditions, and assumptions you can make.
Answering this question will allow you to practice using LEFT JOINs to join multiple tables in SQL. Also, your ability to use various other SQL features to find an answer.
Interview Question Date: January 2021
COMPANY X employees are trying to find the cheapest flights to upcoming conferences. When people fly long distances, a direct city-to-city flight is often more expensive than taking two flights with a stop in a hub city. Travelers might save even more money by breaking the trip into three flights with two stops. But for the purposes of this challenge, let's assume that no one is willing to stop three times! You have a table with individual airport-to-airport flights, which contains the following columns:
• id - the unique ID of the flight; • origin - the origin city of the current flight; • destination - the destination city of the current flight; • cost - the cost of current flight.
Your task is to produce a trips table that lists all the cheapest possible trips that can be done in two or fewer stops. This table should have the columns origin, destination and total_cost (cheapest one). Sort the output table by origin, then by destination. The cities are all represented by an abbreviation composed of three uppercase English letters. Note: A flight from SFO to JFK is considered to be different than a flight from JFK to SFO.
Example of the output: origin | destination | total_cost DFW | JFK | 200
Understand the question
For this question, we are given a list of direct flights. We have to mix and match these direct flights to find cheap connections through hub airports. We can match up to three flights.
The question description itself is very readable and clear on the desired outcome. Still, because of many variables, it’s difficult to wrap your head around it all at once.
Interviewers give us specific instructions and even an example of what the output should look like. Rows should be arranged in a certain order by origin and destination columns.
This is a difficult question, so it’s a good idea to come up with a step-by-step logical approach before writing any code.
Analyze data
To answer this question, we have to work with a single da_flights table.
It’s safe to assume that da stands for Delta Airlines, and each row describes a direct flight.
Let’s look at values in the da_flights table:
id | origin | destination | cost |
---|---|---|---|
1 | SFO | JFK | 500 |
2 | SFO | DFW | 200 |
3 | SFO | MCO | 400 |
4 | DFW | MCO | 100 |
5 | DFW | JFK | 200 |
- id values are primary keys and identify each direct route.
- origin values refer to the starting point of the flight.
- destination values represent the destination of the flight.
- cost refers to the cost of the trip.
In this case, we have only one table, so we don’t have a foreign key. However, we’ll use origin and destination values to JOIN the table with itself.
Logical Approach
To answer this question, we need to find every possible way to get from one city to another. We can use a maximum of three flights (two stops) to make a route. Then we need to find the cheapest route for each destination.
For example, there may be a direct flight between city A and city B, but if a connection through the hub airport is cheaper, the final answer should include details (like the cost) of the cheaper route.
We need to consider all possible routes, so we'll probably have to generate every possible flight combination.
1. Generate every possible combination of flights
We start with a list of all available direct flights. We need all possible combinations of flights. The simplest approach is to do it gradually - find all possible combinations of two flights and then try to find a third matching flight.
We need records of all routes - three connecting flights, two connecting flights, and even direct flights that don’t have a pair.
Let’s think about how flights can be connected. Every flight in the table has an origin and destination. To find a connection, we’re looking for flights that take off from the same airport where a certain flight is going. For example, NYC to Chicago and Chicago to LA flights could be connected, with Chicago acting as a transfer airport.
So flights are connected if a traveler can take the first flight, land at the airport, and take a second flight from that airport to go somewhere else.
We can follow the same principle to find the third connecting flight. The origin of the third flight should be the same as the destination of the second flight.
We are given one table of direct flights but have to generate a table with up to three connected flights. We can simply JOIN the table with itself three times and even define conditions to find connecting flights. In this case, we’re looking for flights where the destination is equal to another flight’s origin.
The next step is deciding what type of JOIN to use. The question asks us to return every possible route - direct flights as well as combinations of two or three flights.
As you may know, LEFT JOIN returns all records (flights) from the first table and records that satisfy a condition (connected with the first flight) from the second table. We can add a second LEFT JOIN, which will be performed based on the first LEFT JOIN result. In other words, it will return all direct flights, two connected flights, and third flights that satisfy the criteria (connect with second flight).
Here’s the output of the first LEFT JOIN performed on two tables:
origin0 | destination0 | cost0 | origin1 | destination1 | cost1 |
---|---|---|---|---|---|
SFO | JFK | 500 | JFK | LHR | 1000 |
SFO | DFW | 200 | DFW | JFK | 200 |
SFO | DFW | 200 | DFW | MCO | 100 |
SFO | MCO | 400 | |||
DFW | MCO | 100 | |||
DFW | JFK | 200 | JFK | LHR | 1000 |
JFK | LHR | 1000 |
When you chain multiple JOINs, the second one is performed on the result of the first JOIN. The output of the second LEFT JOIN should look like this:
origin0 | destination0 | cost0 | origin1 | destination1 | cost1 | origin2 | destination2 | cost2 |
---|---|---|---|---|---|---|---|---|
SFO | DFW | 200 | DFW | JFK | 200 | JFK | LHR | 1000 |
DFW | JFK | 200 | JFK | LHR | 1000 | |||
SFO | JFK | 500 | JFK | LHR | 1000 | |||
SFO | DFW | 200 | DFW | MCO | 100 | |||
SFO | MCO | 400 | ||||||
DFW | MCO | 100 | ||||||
JFK | LHR | 1000 |
We use LEFT JOIN instead of INNER JOIN because we want to keep even flights that don’t connect. We need all three: direct flights (don’t connect with any other flight), two connected flights (connected with only one flight, but not the third), and three connected flights. INNER JOIN would remove all rows where all three flights didn’t connect.
2. Create a list of all possible routes
In the previous step, we horizontally combined three copies of the same table. The combined table has nine columns to describe up to three flights. We need to analyze every possible way to get from one city to another, calculate the cost of connecting flights, and return to the cheapest route.
So far, we only have a list of all possible connections. We need to create a new dataset containing each route's cost, origin, and final destination.
For routes that consist of one direct flight, the origin, destination, and cost are the same as the origin, destination, and cost of the flight itself.
For connected flights, the origin is the origin of the very first flight, but the destination is where the traveler actually lands - the destination of the last flight. To find their total cost, we need to add individual costs of two or three individual flights.
So we need to create three separate datasets for routes made up of one, two, or three flights and determine their costs and destination accordingly. Then we can use UNION ALL to combine three separate datasets into one. This gives us all possible ways to get from one city to another with up to three connecting flights.
3. Remove routes with empty destination and cost columns
In previous steps, some direct flights did not have any connecting flights. Because of how LEFT JOIN works, a lot of columns in the combined table were empty.
In step 2, we created three different datasets to describe the origin, destination, and cost of direct and connecting flights. Because we worked with the result of two LEFT JOINs, some routes could have an empty destination or cost columns. That’s why we need a WHERE clause to remove records where either destination or cost columns are empty.
4. Find the cheapest routes
Finally, we have a list of all routes. Most of the time, there will be multiple routes connecting the same two cities. For example, there may be a direct flight from city NYC to city Los Angeles, but also two connected flights - flying from city NYC to Chicago and from Chicago to Los Angeles. So route and destination values will be the same, but the cost will differ.
To find the cheapest route, we need to group rows with the same origin and destination values. Each group will contain all available connections between the two cities. We can use the min() aggregate function to find the cheapest route in each group.
Write the Code
1. Chain two LEFT JOINs to get a list of connections
We are given one table, which contains information about all direct flights.
We need to perform two LEFT JOINs and provide a condition that the second flight should take off from the airport where the first flight is going. The first LEFT JOIN is intended to connect two direct flights. We perform the second LEFT JOIN to connect the third flight.
SELECT f0.origin origin0,
f0.destination destination0,
f0.cost cost0,
f1.origin origin1,
f1.destination destination1,
f1.cost cost1
FROM da_flights f0
LEFT JOIN da_flights f1 ON f0.destination = f1.origin
We are JOINing the table with itself, so column names are going to be the same. We need to give each table a unique shorthand name to avoid any SQL errors.
The result of the first LEFT JOIN will look like this:
origin0 | destination0 | cost0 | origin1 | destination1 | cost1 |
---|---|---|---|---|---|
SFO | JFK | 500 | JFK | LHR | 1000 |
SFO | DFW | 200 | DFW | JFK | 200 |
SFO | DFW | 200 | DFW | MCO | 100 |
SFO | MCO | 400 | |||
DFW | MCO | 100 | |||
DFW | JFK | 200 | JFK | LHR | 1000 |
JFK | LHR | 1000 |
But the question specifies that we can combine up to three flights.
We can chain another LEFT JOIN on the result of the first LEFT JOIN.
In this case, we’re looking for a third flight that takes off from the same airport where the second flight lands. So the origin of the third flight should be equal to the destination of the second flight.
SELECT f0.origin origin0,
f0.destination destination0,
f0.cost cost0,
f1.origin origin1,
f1.destination destination1,
f1.cost cost1,
f2.origin origin2,
f2.destination destination2,
f2.cost cost2
FROM da_flights f0
LEFT JOIN da_flights f1 ON f0.destination = f1.origin
LEFT JOIN da_flights f2 ON f1.destination = f2.origin
The result of the second LEFT JOIN:
origin0 | destination0 | cost0 | origin1 | destination1 | cost1 | origin2 | destination2 | cost2 |
---|---|---|---|---|---|---|---|---|
SFO | DFW | 200 | DFW | JFK | 200 | JFK | LHR | 1000 |
DFW | JFK | 200 | JFK | LHR | 1000 | |||
SFO | JFK | 500 | JFK | LHR | 1000 | |||
SFO | DFW | 200 | DFW | MCO | 100 | |||
SFO | MCO | 400 | ||||||
DFW | MCO | 100 | ||||||
JFK | LHR | 1000 |
We’ll use the WITH clause to save this result as a CTE and reference it later.
WITH connections AS
(SELECT f0.origin origin0,
f0.destination destination0,
f0.cost cost0,
f1.origin origin1,
f1.destination destination1,
f1.cost cost1,
f2.origin origin2,
f2.destination destination2,
f2.cost cost2
FROM da_flights f0
LEFT JOIN da_flights f1 ON f0.destination = f1.origin
LEFT JOIN da_flights f2 ON f1.destination = f2.origin)
2. Create a new list of flights, including connections
In the previous step, we laid out all possible combinations of three flights. Now we’ll work with the result of the previous step to create a list of routes.
We can use this information to create a new table to describe routes with up to three connecting flights. Rows will have three columns: origin, destination, and total cost of getting there.
Calculating costs
If a route contains only a direct flight, its total cost is the same as that of that direct flight. For two connected flights, we add individual costs of the first and second flights. For three connected flights, we add individual costs of three different flights.
Choosing the right origin and destination values
Once again, routes made up of direct flights are easy - simply select the origin and destination values of the flight itself.
For two connected flights, we should SELECT the origin of the first flight and the destination of the final (second) flight. For three connected flights, we SELECT the destination of the third (final) flight.
Because of the reasons outlined in our logical approach, we need to create three separate datasets and store information like the origin, destination, and total cost of direct and connected flights.
Then we’ll combine them into one.
SELECT origin0 origin,
destination0 destination,
cost0 AS cost
FROM connections
UNION ALL
SELECT origin0 origin,
destination1 destination,
cost0+cost1 AS cost
FROM connections
UNION ALL
SELECT origin0 origin,
destination2 destination,
cost0+cost1+cost2 AS cost
FROM connections
We need to use UNION ALL to vertically stack them on top of one another to store all routes in one table.
Steps 1 and 2 are part of a larger subquery in the FORM clause. We need these two steps to generate a list of all possible routes consisting of up to three direct flights. The subquery will be named ‘list’ and referenced as such in the next step.
3. Remove routes with empty destination and cost columns
Output of the previous step shows that not all routes have destination and cost values.
The majority of routes are made up of single or two flights and don’t have corresponding values for the third flight.
Let’s set a condition to remove rows where destination or cost columns are empty.
SELECT origin,
destination,
cost
FROM list
WHERE destination IS NOT NULL
AND cost IS NOT NULL
list is the reference to the result of the subquery.
4. Find the cheapest connection between two cities
To find the cheapest route, we should have a group for every unique pair of origin and destination values. Then we can apply the min() aggregate function to find the record (route) with the lowest cost value in each group.
Finally, our code should look something like this:
SELECT origin,
destination,
min(cost)
FROM
(WITH connections AS
(SELECT f0.origin origin0,
f0.destination destination0,
f0.cost cost0,
f1.origin origin1,
f1.destination destination1,
f1.cost cost1,
f2.origin origin2,
f2.destination destination2,
f2.cost cost2
FROM da_flights f0
LEFT JOIN da_flights f1 ON f0.destination = f1.origin
LEFT JOIN da_flights f2 ON f1.destination = f2.origin) SELECT origin0 origin,
destination0 destination,
cost0 AS cost
FROM connections
UNION ALL SELECT origin0 origin,
destination1 destination,
cost0+cost1 AS cost
FROM connections
UNION ALL SELECT origin0 origin,
destination2 destination,
cost0+cost1+cost2 AS cost
FROM connections) list
WHERE destination IS NOT NULL
AND cost IS NOT NULL
GROUP BY origin,
destination
Output
Query should return a table with cheapest routes between two cities.
origin | destination | min |
---|---|---|
DFW | MCO | 100 |
SFO | DFW | 200 |
SFO | LHR | 1400 |
DFW | LHR | 1200 |
SFO | MCO | 300 |
How to Join multiple tables in SQL using both INNER JOIN & LEFT JOIN
Sometimes, we need to use different types of JOINs to join multiple tables in SQL. Different types of SQL JOINs have different effects, so the order (which one you use first) can affect the final outcome.
Joining multiple tables in SQL using INNER JOIN & LEFT JOIN
In the previous question, we used two LEFT JOINs to generate flight combinations.
JOINs in the previous example looked something like this:
SELECT * FROM da_flights f0
LEFT JOIN da_flights f1 ON f0.destination = f1.origin
LEFT JOIN da_flights f2 ON f1.destination = f2.origin
First LEFT JOIN connects two direct flights, and the second one finds an additional direct flight.
Now let’s see what happens when we use one INNER JOIN and one LEFT JOIN instead.
As we mentioned before, the order of JOINs can affect the outcome.
Example 1: INNER JOIN first, LEFT JOIN second
Remember when we chain multiple JOINs, each additional JOIN is applied to the result of previously performed JOINs. Anticipating the result of the first JOIN can help you predict the outcome of multiple JOINs as well.
In this query, we use LEFT JOIN to join the first two flight tables:
SELECT f0.origin origin0,
f0.destination destination0,
f0.cost cost0,
f1.origin origin1,
f1.destination destination1,
f1.cost cost1
FROM da_flights f0
LEFT JOIN da_flights f1 ON f0.destination = f1.origin
The output of this query looks like this:
origin0 | destination0 | cost0 | origin1 | destination1 | cost1 |
---|---|---|---|---|---|
SFO | JFK | 500 | JFK | LHR | 1000 |
SFO | DFW | 200 | DFW | JFK | 200 |
SFO | DFW | 200 | DFW | MCO | 100 |
SFO | MCO | 400 | |||
DFW | MCO | 100 | |||
DFW | JFK | 200 | JFK | LHR | 1000 |
JFK | LHR | 1000 |
Let’s use INNER JOIN instead - many of the records will be filtered out.
SELECT f0.origin origin0,
f0.destination destination0,
f0.cost cost0,
f1.origin origin1,
f1.destination destination1,
f1.cost cost1
FROM da_flights f0
INNER JOIN da_flights f1 ON f0.destination = f1.origin
origin0 | destination0 | cost0 | origin1 | destination1 | cost1 |
---|---|---|---|---|---|
SFO | JFK | 500 | JFK | LHR | 1000 |
SFO | DFW | 200 | DFW | JFK | 200 |
SFO | DFW | 200 | DFW | MCO | 100 |
DFW | JFK | 200 | JFK | LHR | 1000 |
This might’ve been useful if we wanted to remove direct flights that did not have a match. This question is different. We want to consider the cost of direct flights as well as connecting flights and return the cheapest. We should not use INNER JOIN because it removes direct flights.
Previously we chained a second LEFT JOIN to match two connected flights with a third direct flight.
SELECT f0.origin origin0,
f0.destination destination0,
f0.cost cost0,
f1.origin origin1,
f1.destination destination1,
f1.cost cost1,
f2.origin origin2,
f2.destination destination2,
f2.cost cost2
FROM da_flights f0
LEFT JOIN da_flights f1 ON f0.destination = f1.origin
LEFT JOIN da_flights f2 ON f1.destination = f2.origin
When there is a match, LEFT JOIN will show the details of the third flight. If there is no match, it will leave the result of the first LEFT JOIN unchanged and leave out details of the third flight.
origin0 | destination0 | cost0 | origin1 | destination1 | cost1 | origin2 | destination2 | cost2 |
---|---|---|---|---|---|---|---|---|
SFO | DFW | 200 | DFW | JFK | 200 | JFK | LHR | 1000 |
DFW | JFK | 200 | JFK | LHR | 1000 | |||
SFO | JFK | 500 | JFK | LHR | 1000 | |||
SFO | DFW | 200 | DFW | MCO | 100 | |||
SFO | MCO | 400 | ||||||
DFW | MCO | 100 | ||||||
JFK | LHR | 1000 |
To answer the original question, we want to consider all three - one, two, and three flights.
If the question was asking to consider only connected flights (two and three connections) then we could replace the first LEFT JOIN with an INNER JOIN and then use LEFT JOIN again.
SELECT f0.origin origin0,
f0.destination destination0,
f0.cost cost0,
f1.origin origin1,
f1.destination destination1,
f1.cost cost1,
f2.origin origin2,
f2.destination destination2,
f2.cost cost2
FROM da_flights f0
INNER JOIN da_flights f1 ON f0.destination = f1.origin
LEFT JOIN da_flights f2 ON f1.destination = f2.origin
The first INNER JOIN would remove direct flights without a connection. The second LEFT JOIN would include details of the third flight when it can be connected.
origin0 | destination0 | cost0 | origin1 | destination1 | cost1 | origin2 | destination2 | cost2 |
---|---|---|---|---|---|---|---|---|
SFO | DFW | 200 | DFW | JFK | 200 | JFK | LHR | 1000 |
SFO | JFK | 500 | JFK | LHR | 1000 | |||
DFW | JFK | 200 | JFK | LHR | 1000 | |||
SFO | DFW | 200 | DFW | MCO | 100 |
In the example above, we used INNER JOIN first and LEFT JOIN second.
Example 2: LEFT JOIN first, INNER JOIN second
Now let’s change the order and use LEFT JOIN first and INNER JOIN as a second join.
SELECT f0.origin origin0,
f0.destination destination0,
f0.cost cost0,
f1.origin origin1,
f1.destination destination1,
f1.cost cost1
FROM da_flights f0
LEFT JOIN da_flights f1 ON f0.destination = f1.origin
The result of the first LEFT JOIN will be the same as before - it will include both single and connected flights.
origin0 | destination0 | cost0 | origin1 | destination1 | cost1 |
---|---|---|---|---|---|
SFO | JFK | 500 | JFK | LHR | 1000 |
SFO | DFW | 200 | DFW | JFK | 200 |
SFO | DFW | 200 | DFW | MCO | 100 |
SFO | MCO | 400 | |||
DFW | MCO | 100 | |||
DFW | JFK | 200 | JFK | LHR | 1000 |
JFK | LHR | 1000 |
Using an additional INNER JOIN will return one record with three connecting flights. Because of how INNER JOIN works, it will remove all flights that don’t have a third matching flight.
SELECT f0.origin origin0,
f0.destination destination0,
f0.cost cost0,
f1.origin origin1,
f1.destination destination1,
f1.cost cost1,
f2.origin origin2,
f2.destination destination2,
f2.cost cost2
FROM da_flights f0
LEFT JOIN da_flights f1 ON f0.destination = f1.origin
INNER JOIN da_flights f2 ON f1.destination = f2.origin
origin0 | destination0 | cost0 | origin1 | destination1 | cost1 | origin2 | destination2 | cost2 |
---|---|---|---|---|---|---|---|---|
SFO | DFW | 200 | DFW | JFK | 200 | JFK | LHR | 1000 |
Previously, we used INNER JOIN first and LEFT JOIN second, which led to this result:
origin0 | destination0 | cost0 | origin1 | destination1 | cost1 | origin2 | destination2 | cost2 |
---|---|---|---|---|---|---|---|---|
SFO | DFW | 200 | DFW | JFK | 200 | JFK | LHR | 1000 |
SFO | JFK | 500 | JFK | LHR | 1000 | |||
DFW | JFK | 200 | JFK | LHR | 1000 | |||
SFO | DFW | 200 | DFW | MCO | 100 |
Then we changed the order (LEFT JOIN first, INNER JOIN second) and ended up with only one record.
As you can see, changing the order of JOINs can lead to a totally different outcome.
If you chain multiple JOINs, remember that the second JOIN will be performed on the result of the first one. The third JOIN will be performed on the result of the second JOIN, and so on.
To effectively combine various types of JOINs, decide how you want to filter tables at each stage. Let’s say you want to join 5 tables in SQL. Maybe you want the first two to be combined in a very specific and strict way - only return rows that meet the defined criteria. In that case, use INNER JOIN to join the first two tables.
You might want to combine a third table more leniently - keep the result of the first INNER JOIN intact, even if conditions for combining the third table aren’t met. In this case, you should use LEFT JOIN to join the third table. For the fourth table, you might want to use INNER JOIN again, and so on.
The trick is to anticipate the outcome of each type of JOIN and understand that subsequent JOINs are performed on the result of previous JOINs.
How to Join multiple tables in SQL using WHERE Clause
You can also use the WHERE clause to join tables in SQL.
Simply SELECT values you need from multiple tables, and use the WHERE clause to set multiple conditions.
Let’s proceed with our previous example of direct flights. In this case, we use SELECT to create three copies of initial flight data:
SELECT da_flights.origin origin0,
da_flights.origin destination0,
da_flights.origin cost0,
da_flights.origin origin1,
da_flights.origin destination1,
da_flights.origin cost1,
da_flights.origin origin2,
da_flights.origin destination2,
da_flights.origin cost2
FROM da_flights
The output will look something like this:
origin0 | destination0 | cost0 | origin1 | destination1 | cost1 | origin2 | destination2 | cost2 |
---|---|---|---|---|---|---|---|---|
SFO | SFO | SFO | SFO | SFO | SFO | SFO | SFO | SFO |
SFO | SFO | SFO | SFO | SFO | SFO | SFO | SFO | SFO |
SFO | SFO | SFO | SFO | SFO | SFO | SFO | SFO | SFO |
DFW | DFW | DFW | DFW | DFW | DFW | DFW | DFW | DFW |
DFW | DFW | DFW | DFW | DFW | DFW | DFW | DFW | DFW |
JFK | JFK | JFK | JFK | JFK | JFK | JFK | JFK | JFK |
We could use the WHERE clause to find flight connections by setting the following condition: return rows where the destination of the first flight is the origin of the second flight, and the destination of the second flight is the origin of the third flight.
SELECT da_flights.origin origin0,
da_flights.origin destination0,
da_flights.origin cost0,
da_flights.origin origin1,
da_flights.origin destination1,
da_flights.origin cost1,
da_flights.origin origin2,
da_flights.origin destination2,
da_flights.origin cost2
FROM da_flights
WHERE 'origin1' = 'destination0'
AND 'origin2' = 'destination1'
However, this does not return any rows.
origin0 | destination0 | cost0 | origin1 | destination1 | cost1 | origin2 | destination2 | cost2 |
---|
We could find a match by applying the WHERE clause to various different combinations of flights, but the SQL code would become too lengthy and inefficient.
Another limit is that WHERE can only replace INNER JOIN. It can not substitute LEFT JOIN or any of OUTER JOINs. You can learn the differences between Left JOIN vs LEFT OUTER JOIN on the StrataScratch blog.
It is not a good idea to replace an INNER JOIN with a WHERE clause, especially if you’re working on multiple tables. The syntax is simply too complicated.
It’s best to use WHERE only to filter records in a table. Use various JOINs to combine tables horizontally and WHERE to filter the output. This will result in clear and more readable code.
How many joining conditions do you need for more than two tables?
Every JOIN (except for CROSS JOIN) requires at least one condition specified after the ON clause. You’ll need one JOIN to combine two tables and an additional JOIN for every additional table. For example, combining three tables would require 2 JOINs and four tables - 3 JOINs.
Every JOIN has one condition, so the number of joining conditions will be the same as the number of JOINs.
Remember that one condition can be a chain of two conditions. For example, you can perform a single INNER JOIN on multiple columns.
SELECT *
FROM products
INNER JOIN services ON products.price = services.price
AND products.inStock = services.inStock
In this case, we have one INNER JOIN, but records must satisfy two conditions - have the same price and inStock values. The thing is, in the context of the INNER JOIN, two conditions are still considered one joining condition because they are chained with the AND keyword.
Which join is most efficient in SQL?
All types of JOINs are equally efficient, but query execution time might be affected by factors like output data volume. This is especially evident when you chain multiple types of JOINs together.
Compared to other types, INNER JOIN is the least expensive operation simply because it tends to return a smaller volume of data.
LEFT JOINs, RIGHT JOINs, and OUTER JOINs return more rows than the INNER JOIN. A chain of multiple LEFT JOINs might take longer to execute only because it returns a larger volume of data. Otherwise, underlying algorithms of all types of SQL JOINs are equally efficient.
It’s important to note that when choosing the type of JOIN, efficiency should not be your main concern. More important is the task itself and which JOIN is needed to get the desired results.
Summary
In this article, we walked you through using various JOINs to join multiple tables in SQL. Hopefully, by now, you have an in-depth understanding of different types of JOINs, how they work, and how to use them in practice. To reinforce your knowledge, you should practice writing actual queries where you JOIN multiple tables in SQL.
The StrataScratch platform has hundreds of interview questions of various types and difficulties. You can find interview questions where you have to JOIN multiple tables and try to answer them. Each question has a code widget where you can write MySQL, PostgreSQL, and Python code and immediately see the output.