A Detailed Comparison Between Inner Join vs. Outer Joins
Categories
In this article, we compare inner join vs. outer join. We will talk about their similarities and differences and show you how to use them in practice.
Fully exploiting SQL’s ability to manage and manipulate relational databases is possible only if you know how to join tables.
This will allow you to write complex queries across multiple datasets.
It goes without saying that knowing SQL JOINs also means knowing the differences between inner and outer joins.
So, in this article, you’ll find a detailed comparison between those two distinct types of joins in SQL.
What is a Join in SQL?
JOIN is an SQL clause that combines columns from two or more tables based on a related column between them.
Without it, you’re basically limited to querying only one table, which is completely useless in most real-life situations.
There are several join types in SQL:
- JOIN
- LEFT JOIN
- RIGHT JOIN
- FULL JOIN
- CROSS JOIN
These are all different types of SQL JOINs you must know.
Some of those are inner and some are outer joins, which we’ll see in the following sections. But, first, let’s talk about the join syntax.
Inner Join vs. Outer Join: The Syntax
A general syntax of joins in SQL is
SELECT …
FROM table_1
JOIN table_2
ON table_1.column = table_2.column;
The syntax doesn’t change whether you’re using inner or outer joins. It always has one table in the FROM clause followed by the join type keyword and the second table.
The tables are always joined using the ON clause. This clause is used for stating the joining condition, which is usually where one column from the first table equals another column from the second table.
Inner Join vs. Outer Join: Definition
What is Inner Join?
An inner join is a type of join that returns rows from both tables only when there is at least one match in the columns being joined. In other words, it outputs data whose keys are in the tables’ intersection. Therefore the name inner join.
JOIN is an inner type of join, so you can also write it as INNER JOIN in SQL.
Here’s how this type of join works.
What is Outer Join?
Outer joins extend the functionality of an inner join. They do that by returning not only matching rows but also the non-matched rows. More specifically, they return all the rows from the dominant (outer) table and only the matching rows from the subordinate (inner) tables.
This means they will include the records outside the tables' intersection. That’s why they’re called outer joins.
All three SQL outer joins work on the same principle, but they still will show different outputs.
- LEFT JOIN: Returns all rows from the left (first) table and the matched rows from the right (second) table. If there is no match, the result set will include NULLs on the side of the right table.
- RIGHT JOIN: Returns all rows from the right (second) table and the matched rows from the left (first) table. If there is no match, the result set will include NULLs on the side of the left table.
- FULL JOIN: Returns all the rows from the joined tables. It combines the results of both left and right outer joins and the unmatched rows from both tables.
These all belong to the outer join family, so they can also be written in SQL as LEFT OUTER JOIN, RIGHT OUTER JOIN, and FULL OUTER JOIN.
Here’s a more detailed explanation on the LEFT JOIN vs. LEFT OUTER JOIN example.
Comparison: Inner Join vs. Outer Join
Here’s a comparison between the two types of joins based on their main features.
Inner Join vs. Outer Join: Use Cases and Examples
Inner Join
As I already mentioned, INNER JOIN is used when you need only matching rows from the joined tables.
Here’s one example.
INNER JOIN Example: Hosts' Abroad Apartments
The question by Airbnb asks you to find the number of hosts that have accommodations in countries of which they are not citizens.
Find the number of hosts that have accommodations in countries of which they are not citizens.
Link to the question: https://platform.stratascratch.com/coding/10071-hosts-abroad-apartments
The dataset’s first table is airbnb_hosts. It’s a list of hosts and some details about them.
host_id | nationality | gender | age |
---|---|---|---|
0 | USA | M | 28 |
1 | USA | F | 29 |
2 | China | F | 31 |
3 | China | M | 24 |
4 | Mali | M | 30 |
The second table is airbnb_apartments. It’s a list of apartments connected with the first table via the column host_id.
host_id | apartment_id | apartment_type | n_beds | n_bedrooms | country | city |
---|---|---|---|---|---|---|
0 | A1 | Room | 1 | 1 | USA | New York |
0 | A2 | Room | 1 | 1 | USA | New Jersey |
0 | A3 | Room | 1 | 1 | USA | New Jersey |
1 | A4 | Apartment | 2 | 1 | USA | Houston |
1 | A5 | Apartment | 2 | 1 | USA | Las Vegas |
This problem should be solved by joining the tables with INNER JOIN. The tables are joined on two conditions. One is that the host ID is the same in both tables. The second is that the host’s nationality is different from the country where the accommodation is.
To get the number of such hosts, I use the COUNT() aggregate function and DISTINCT to count every host ID only once.
SELECT COUNT(DISTINCT h.host_id) AS number_of_hosts
FROM airbnb_hosts h
INNER JOIN airbnb_apartments a
ON h.host_id = a.host_id AND h.nationality <> a.country;
The output there are three hosts that satisfy the criteria.
count |
---|
3 |
INNER JOIN was necessary in solving this problem. Why not one of the outer joins? It’s because LEFT JOIN, for example, would include all the hosts, meaning even those who don’t meet the joining criteria, i.e., they don’t have accommodations in countries of which they’re not citizens.
We can check this by running the code that uses LEFT JOIN.
SELECT COUNT(DISTINCT h.host_id) AS number_of_hosts
FROM airbnb_hosts h
LEFT JOIN airbnb_apartments a
ON h.host_id = a.host_id AND h.nationality <> a.country;
This code shows there are 12 hosts with accommodations in countries in which they’re not citizens. You already know that’s not the correct answer, but feel free to click the ‘Check Solution’ button for yourself.
number_of_hosts |
---|
12 |
Outer Joins
The outer joins are used when you need the matching rows from both joined tables but also the non-matching rows from at least one table, if not both.
Let’s see examples for each of the outer joins in SQL.
LEFT OUTER JOIN Example: Find the Number of Employees Who Received the Bonus and Who Didn’t
The question by Dell & Microsoft asks you to solve the problem of finding the number of employees who received the bonus and who didn't. There are also some caveats here. You should use values from the bonus table. Also, one employee can receive more than one bonus.
Find the number of employees who received the bonus and who didn't. Bonus values in employee table are corrupted so you should use values from the bonus table. Be aware of the fact that employee can receive more than one bonus. Output value inside has_bonus column (1 if they had bonus, 0 if not) along with the corresponding number of employees for each.
Link to the question: https://platform.stratascratch.com/coding/10081-find-the-number-of-employees-who-received-the-bonus-and-who-didnt
The first table in the dataset is a detailed list of employees named adequately, employee.
id | first_name | last_name | age | sex | employee_title | department | salary | target | bonus | city | address | manager_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5 | Max | George | 26 | M | Sales | Sales | 1300 | 200 | 150 | Max@company.com | California | 2638 Richards Avenue | 1 |
13 | Katty | Bond | 56 | F | Manager | Management | 150000 | 0 | 300 | Katty@company.com | Arizona | 1 | |
11 | Richerd | Gear | 57 | M | Manager | Management | 250000 | 0 | 300 | Richerd@company.com | Alabama | 1 | |
10 | Jennifer | Dion | 34 | F | Sales | Sales | 1000 | 200 | 150 | Jennifer@company.com | Alabama | 13 | |
19 | George | Joe | 50 | M | Manager | Management | 100000 | 0 | 300 | George@company.com | Florida | 1003 Wyatt Street | 1 |
The second table is bonus.
worker_ref_id | bonus_amount | bonus_date |
---|---|---|
1 | 5000 | 2020-02-16 |
2 | 3000 | 2011-06-16 |
3 | 4000 | 2020-02-16 |
1 | 4500 | 2020-02-16 |
2 | 3500 | 2011-06-16 |
Solving this question requires using LEFT JOIN. Why? Along with the number of employees who received bonuses, you also need to show the number of those who didn’t. As those employees won’t be found in the table bonus, the question virtually asks you to show all employees from the table employee.
The solution involves writing a subquery, so let me explain it first.
The subquery LEFT JOINs the table employee with the table bonus on the employee ID.
It then uses the CASE statement to make a distinction between the employees who got the bonus and those who didn’t. It does that by checking whether the bonus date is NULL. Remember, if the employee from the first table isn’t found in the second table, the value in the column will be NULL, i.e., he or she didn’t receive a bonus.
Knowing that (and reading the question’s instructions) will make CASE statement look like a nice little trick for solving the interview question.
SELECT e.id,
CASE
WHEN bonus_date IS NULL THEN 0
ELSE 1
END AS has_bonus
FROM employee e
LEFT JOIN bonus b
ON e.id = b.worker_ref_id;
Here’s this soon-to-be subquery’s output snapshot.
id | has_bonus |
---|---|
5 | 0 |
13 | 0 |
11 | 0 |
10 | 0 |
19 | 0 |
Now, let’s turn this SELECT into a subquery and write the main query. It selects the column that labels data with 0 or 1, depending on the fact the bonus isn’t or is received.
The COUNT() function with DISTINCT will return the number of employees in each category.
SELECT has_bonus,
COUNT(DISTINCT base.id) AS n_employees
FROM
(SELECT e.id,
CASE
WHEN bonus_date IS NULL THEN 0
ELSE 1
END AS has_bonus
FROM employee e
LEFT JOIN bonus b
ON e.id = b.worker_ref_id) base
GROUP BY has_bonus;
The query shows there are 27 employees without a bonus and only three with.
has_bonus | n_employees |
---|---|
0 | 27 |
1 | 3 |
If you don’t trust me that you shouldn’t use INNER JOIN here, maybe you can try it yourself. Run the code below, rewritten with INNER JOIN. You’ll see it will return only the employees who received a bonus.
SELECT has_bonus,
COUNT(DISTINCT base.id) AS n_employees
FROM
(SELECT e.id,
CASE
WHEN bonus_date IS NULL THEN 0
ELSE 1
END AS has_bonus
FROM employee e
INNER JOIN bonus b
ON e.id = b.worker_ref_id) base
GROUP BY has_bonus;
has_bonus | n_employees |
---|---|
1 | 3 |
RIGHT OUTER JOIN Example: Products With No Sales
Here’s a question that wants you to use RIGHT JOIN. It’s a Salesforce and Amazon question, where you have to find products that have had no sales.
Interview Question Date: May 2022
Write a query to get a list of products that have not had any sales. Output the ID and market name of these products.
Link to the question: https://platform.stratascratch.com/coding/2109-products-with-no-sales
The first table in the dataset is fct_customer_sales.
cust_id | prod_sku_id | order_date | order_value | order_id |
---|---|---|---|---|
C274 | P474 | 2021-06-28 | 1500 | O110 |
C285 | P472 | 2021-06-28 | 899 | O118 |
C282 | P487 | 2021-06-30 | 500 | O125 |
C282 | P476 | 2021-07-02 | 999 | O146 |
C284 | P487 | 2021-07-07 | 500 | O149 |
The second one is dim_product.
prod_sku_id | prod_sku_name | prod_brand | market_name |
---|---|---|---|
P472 | iphone-13 | Apple | Apple IPhone 13 |
P473 | iphone-13-promax | Apple | Apply IPhone 13 Pro Max |
P474 | macbook-pro-13 | Apple | Apple Macbook Pro 13'' |
P475 | macbook-air-13 | Apple | Apple Makbook Air 13'' |
P476 | ipad | Apple | Apple IPad |
The solution RIGHT JOINs the fct_customer_sales table with the dim_product table. The tables are joined on the column prod_sku_id.
This way of joining tables will get you the list of all products from the table dim_products. In case some products can’t be found in the fct_customer_sales, they will be shown as NULL.
And this is exactly why the condition in WHERE includes only NULL values, as they represent products without sales.
SELECT p.prod_sku_id,
p.market_name
FROM fct_customer_sales s
RIGHT JOIN dim_product p
ON s.prod_sku_id = p.prod_sku_id
WHERE s.prod_sku_id IS NULL;
The code shows four products without sales.
prod_sku_id | market_name |
---|---|
P473 | Apply IPhone 13 Pro Max |
P481 | Samsung Galaxy Tab A |
P483 | Dell XPS13 |
P488 | JBL Charge 5 |
Now, you might ask if it is possible to get the same result using LEFT JOIN. It is, as LEFT JOIN and RIGHT JOIN are mirror images of each other.
The above code can be amended this way. I’ve only switched the order of the tables and changed the join type from RIGHT JOIN to LEFT JOIN.
SELECT p.prod_sku_id,
p.market_name
FROM dim_product p
LEFT JOIN fct_customer_sales s
ON s.prod_sku_id = p.prod_sku_id
WHERE s.prod_sku_id IS NULL;
As you can see, the output is exactly the same as earlier.
prod_sku_id | market_name |
---|---|
P473 | Apply IPhone 13 Pro Max |
P481 | Samsung Galaxy Tab A |
P483 | Dell XPS13 |
P488 | JBL Charge 5 |
FULL OUTER JOIN Example: New Products
In this question by Salesforce and Tesla, you need to count the difference between the number of products companies launched in 2020 compared to the same number from the previous year.
Interview Question Date: December 2020
You are given a table of product launches by company by year. Write a query to count the net difference between the number of products companies launched in 2020 with the number of products companies launched in the previous year. Output the name of the companies and a net difference of net products released for 2020 compared to the previous year.
Link to the question: https://platform.stratascratch.com/coding/10318-new-products
The only table you get is car_launches.
year | company_name | product_name |
---|---|---|
2019 | Toyota | Avalon |
2019 | Toyota | Camry |
2020 | Toyota | Corolla |
2019 | Honda | Accord |
2019 | Honda | Passport |
There are two subqueries in the solution. The first one outputs the company and product names that were launched in 2020.
SELECT company_name,
product_name AS product_2020
FROM car_launches
WHERE year = 2020;
The second subquery does the same but for the year 2019.
SELECT company_name,
product_name AS product_2019
FROM car_launches
WHERE year = 2019;
I have to join these two subqueries somehow so that I can use COUNT() to find the difference between the two years. The right choice is to use FULL OUTER JOIN. Why? Remember, it’s a join that outputs all the rows from both tables. I need exactly that!
Now I can use COUNT() and DISTINCT to count products launched in 2020 and 2019 separately and subtract two results.
I group the output by the company name because the difference between the products launched must be shown on a company level; as required by the question.
To make the output more readable I sort it by the company name alphabetically.
SELECT a.company_name,
(COUNT(DISTINCT a.product_2020)-COUNT(DISTINCT b.product_2019)) AS net_products
FROM
(SELECT company_name,
product_name AS product_2020
FROM car_launches
WHERE year = 2020) a
FULL OUTER JOIN
(SELECT company_name,
product_name AS product_2019
FROM car_launches
WHERE year = 2019) b ON a.company_name = b.company_name
GROUP BY a.company_name
ORDER BY company_name;
Here’s the net difference by company.
company_name | net_products |
---|---|
Chevrolet | 2 |
Ford | -1 |
Honda | -3 |
Jeep | 1 |
Toyota | -1 |
Inner Join vs. Outer Join: Advantages and Disadvantages
When talking about inner join vs. outer join advantages and disadvantages, there really aren’t any per se.
Which join you will choose really depends on what you need. So, you must be very clear about what you want to achieve. If you choose the wrong join type then its characteristics will, obviously, become disadvantages for you.
Tips for Optimizing Queries With Joins
There are some ways of optimizing queries where you use joins. The most obvious and important one is to choose the join type according to your needs. In other words, don’t use an outer join when an inner join suffices. That way, you’ll avoid outputting unnecessary data and slowing down your query.
Other tips for query optimization are:
1. Use Specific Column Names: In SELECT, specify only the columns you need instead of using SELECT *. Of course, the query is faster when selecting fewer columns.
2. Filter Early: Apply the WHERE clause before joining. That way, you can reduce the size of the dataset being joined.
3. Indexing: If the columns used for joining are indexed, this can significantly reduce the query’s running time.
4. Analyze Query Plans: Use your database’s features to analyze a query’s execution plan and optimize it accordingly.
Conclusion
Inner and outer join are two of the most used join families in SQL. Knowing their differences and distinct uses separates an SQL noob from an expert. Choose the wrong join type, and you’ll end up with completely skewed results and your boss jumping down your throat.
To avoid this uncomfortable situation, make sure you’re fluent in SQL joins. For that, use StrataScratch’s coding questions for practice and blog articles for theoretical explanations.