Introduction to IFNULL() Function in SQL
Categories
Learn all about the SQL IFNULL() function in MySQL and PostgreSQL with the example of a hard-level Interview Question asked by Google.
We are often faced with NULL values in datasets and calculations in SQL. In most cases, we will need to handle such NULL values with some operation or by feeding alternate values. In this article, we will introduce you to the SQL IFNULL function that will help you do just that.
What is SQL IFNULL()?
The first thing you need to know is that this IFNULL function is used in MySQL. Don’t worry! We will also discuss an equivalent function in PostgreSQL later in the article.
This SQL IFNULL function is critical in handling NULL values. It is considered to be a MySQL Flow Control function.
In any Flow Control function, there is an expression to be checked. The flow of data can be controlled by determining what the output should look like. In general, it can take the form of a boolean i,e, true or false, a particular column, or even a static value.
The SQL IFNULL() function can be used in the data manipulation statements such as SELECT, WHERE, GROUP BY, or ORDER BY. In this article, we will demonstrate the use of the function with the SELECT statement.
Syntax of SQL IFNULL() Function
Let’s look into the syntax of the SQL IFNULL function and try to understand how it works.
IFNULL(expression, alternate_value)
The SQL IFNULL() function takes two arguments. The first one is the expression, which checks if the value of the expression is NULL. If so, the function returns the second argument i.e. the alternate_value.
On the other hand, if the value of the expression is NOT NULL, then the same value is returned.
Let’s see this in action using simple examples first.
SELECT IFNULL(10, 0);
In this example, the first expression is NOT NULL, therefore, the output is 10 itself.
SELECT IFNULL(NULL, 5);
Here, the first expression is NULL, which means we need to substitute it with the alternate value mentioned in the second expression. The output, therefore, is 5.
The second expression doesn’t necessarily have to be an integer. It can be a string as well. Look at the example below:
SELECT IFNULL(NULL, ‘test’);
The output of this statement is ‘test’. The application of this functionality can be diverse depending on the situation. The main goal is to replace the NULL values in the dataset with an alternate value so that any operation done on those values later on, do not end in an error.
PostgreSQL Statement Equivalent to MySQL IFNULL() Function
The IFNULL() function does not exist in PostgreSQL. Knowing that PostgreSQL is also often used by data scientists, it is good to know the equivalent function that fits the language you’re working on.
PostgreSQL offers the COALESCE() function that retrieves the first NON-NULL value. In other words, COALESCE() function evaluates the arguments from left to right, till it finds the first NON-NULL value and returns it. The remaining arguments are not evaluated.
The syntax of the COALESCE() function is as follows:
COALESCE(arg1, arg2, …);
For instance, the below statement will return 5 as it is the first NON-NULL argument.
SELECT COALESCE(NULL, 5, 10);
The functionality is primarily the same as the MySQL IFNULL() function. It evaluates if the first expression is NULL, just like the IFNULL() function in MySQL. If NULL, the second expression is evaluated, and so on, until a NON-NULL value is found.
Using SQL IFNULL() to Solve the Advanced Interview Question by Google
In this example, we apply the SQL IFNULL() function to a hard-level interview question asked by Google. Let’s take a look at the question.
Interview Question Date: April 2022
For each unique user in the dataset, find the latest date when their flags got reviewed. Then, find total number of distinct videos that were removed on that date (by any user). Output the the first and last name of the user (in two columns), the date and the number of removed videos. Only include these users who had at least one of their flags reviewed by Youtube. If no videos got removed on a certain date, output 0.
Link to the question: https://platform.stratascratch.com/coding/2105-videos-removed-on-latest-date
Watch the video tutorial for this question on our YouTube channel.
The problem asks you to find the latest date when the flags got reviewed for each unique user in the dataset. And then we are asked to find the total number of distinct videos that were removed on that date.
The output is expected to include the first and last names of each unique user, the latest date when the flags got reviewed, as well as the number of videos removed. However, only the users who have had at least one video reviewed by YouTube can be included in the output. If no videos were removed on a certain date, the output must be 0.
1. Exploring the dataset
Google has provided us with two datasets, namely, user_flags, and flag_review. Let us look into these datasets, one at a time.
Here is a preview of the user_flags table.
user_firstname | user_lastname | video_id | flag_id |
---|---|---|---|
Richard | Hasson | y6120QOlsfU | 0cazx3 |
Mark | May | Ct6BUPvE2sM | 1cn76u |
Gina | Korman | dQw4w9WgXcQ | 1i43zk |
Mark | May | Ct6BUPvE2sM | 1n0vef |
Mark | May | jNQXAC9IVRw | 1sv6ib |
This table contains the first and last names of the users, the ID of the video, and the flag ID corresponding to that video.
This table contains NULL values in each of its columns, which means that these might be a cause for concern when we begin to operate on them.
Now, let us look at the schema of the flag_review table.
Here’s a preview of the table:
flag_id | reviewed_by_yt | reviewed_date | reviewed_outcome |
---|---|---|---|
0cazx3 | FALSE | ||
1cn76u | TRUE | 2022-03-15 | REMOVED |
1i43zk | TRUE | 2022-03-15 | REMOVED |
1n0vef | TRUE | 2022-03-15 | REMOVED |
1sv6ib | TRUE | 2022-03-15 | APPROVED |
This table contains the flag_id column as well. It also includes ‘reviewed_by_yt’ column which seems to contain binary values. The value is 1 if the video was reviewed by YouTube and 0 if it wasn’t. There is also a ‘reviewed_date’ column and a ‘reviewed_outcome’ column that shows whether the video was in fact, removed or approved as a result of the review by YouTube.
2. Writing out the approach
It is a good practice to write down your approach before solving any coding question. This allows you to think of any possible scenarios that might need to be covered. So, we will begin solving the question by first laying down the steps.
Step 1: Merge the two tables on flag_id
To get all of the information in one table, we can merge the two tables on the common column, flag_id. This way, we can scrutinize whether the videos flagged for review were reviewed by YouTube or not, and also, find out the outcome and the date of the review.
There is another condition we need to fulfill to continue working on the merged dataset. That is to only include the users who have had at least one of their flagged videos reviewed by YouTube i.e. the reviewed_by_yt should have a value of 1 for at least one of the videos pertaining to each unique user.
Step 2: Calculate the total number of distinct videos removed on the latest reviewed date
In the next step, we will need to count the number of distinct videos from the user_flags table. Since there are NULL values in the user_flags table, we will ignore the video_ids that contain a NULL value. This is one place where we can use the SQL IFNULL() function. We will count only the video_ids that do NOT have NULL values.
And since we only need the videos that were removed as a result of the YouTube review, we will add a condition in the WHERE clause. We will perform this while we turn our query from the last step into a CTE (Common Table Expression).
Step 3: Assign 0 to the videos that were NOT removed after the YouTube review
Since the question asks us to evaluate each unique user, we will also have videos that were approved with NULL values displayed in the count column. We are expected to handle this in a way that the output of these users is displayed as 0. So, again we will employ the IFNULL() functionality in our problem. Again, to begin another subquery we will turn the subquery of step 2 into another CTE.
In this subquery, we are tying all our operations together. We will select the first and the last names of each unique user as well as the latest reviewed date from the first CTE, and the count of the videos of each user, if NOT NULL and 0, if they are NULL.
3. Coding the solution
With our approach laid out, let us begin coding the solution.
Step 1: Merge the two tables on flag_id
SELECT user_firstname,
user_lastname,
MAX(reviewed_date) AS latest_date
FROM user_flags u
JOIN flag_review r USING (flag_id)
WHERE reviewed_by_yt = 1
GROUP BY user_firstname,
user_lastname
Here, we have used the MAX() function to find the latest of the ‘reviewed_date’ column. And we’ve grouped the merged table by the first and last names of each of the users who’ve had at least one of the videos reviewed by YouTube.
Let us run the query and find the output:
Step 2: Calculate the total number of distinct videos removed on the latest reviewed date
WITH a AS
(SELECT user_firstname,
user_lastname,
MAX(reviewed_date) AS latest_date
FROM user_flags u
JOIN flag_review r USING (flag_id)
WHERE reviewed_by_yt = 1
GROUP BY user_firstname,
user_lastname)
SELECT reviewed_date,
IFNULL(COUNT(DISTINCT video_id), 0) AS num_video
FROM flag_review r
JOIN user_flags f USING (flag_id)
WHERE reviewed_outcome = 'REMOVED'
GROUP BY reviewed_date
So, here we are placing the subquery into a CTE named ‘a’.
And then, performing an operation in the IFNULL() function’s first expression. The COUNT(DISTINCT video_id) evaluates if the count of the number of distinct video_ids is NULL or NOT NULL. If it is NOT NULL, then the value of the COUNT() function is returned. If it is NULL, then we display an alternate value, which in our case is 0.
This function, of course, only applies to dataset of filtered videos where the ‘review_outcome’ is ‘REMOVED’. This step is crucial in solving the problem as this tackles the main part of the problem, i.e. to display the date and number of removed videos.
Let us run the query to see the output:
Step 3: Assign 0 to the videos that were NOT removed after the YouTube review
Let us now turn the second subquery into a CTE as well, named ‘b’.
Now we will select the desired columns from the two CTEs and also perform only final check before we validate the solution.
WITH a AS
(SELECT user_firstname,
user_lastname,
MAX(reviewed_date) AS latest_date
FROM user_flags u
JOIN flag_review r USING (flag_id)
WHERE reviewed_by_yt = 1
GROUP BY user_firstname,
user_lastname),
b AS
(SELECT reviewed_date,
IFNULL(COUNT(DISTINCT video_id), 0) AS num_video
FROM flag_review r
JOIN user_flags f USING (flag_id)
WHERE reviewed_outcome = 'REMOVED'
GROUP BY reviewed_date)
SELECT a.user_firstname,
a.user_lastname,
a.latest_date,
IFNULL(b.num_video, 0) AS num_video
FROM a
LEFT JOIN b ON a.latest_date = b.reviewed_date
So, we have included the user’s first and last names from the first CTE ‘a’ and also the latest reviewed date from the same CTE.
From the second CTE, we are selecting the number of distinct videos. However, we are performing yet another SQL IFNULL() function on this column.
If the number of distinct videos does not contain a value, that is it is NULL, then we are presenting 0 as the substitute value to be displayed in the output table. If it is NOT NULL, we will display the value of ‘num_video’ found in the second CTE, ‘b’.
Finally, we are performing a left join on the latest reviewed date columns of the two CTEs because we have the name of each unique user in the first CTE, and the number of distinct videos is calculated in the second for the removed videos. So, it only makes sense to perform a LEFT JOIN instead of an INNER or CROSS JOIN.
If you’re interested in learning the similarities and differences between Left Join and Left Outer Join, check out our post “Left Join vs. Left Outer Join”.
Now, let us run the query and see the final solution:
user_firstname | user_lastname | latest_date | n_removed |
---|---|---|---|
William | Kwan | 2022-03-14 | 1 |
Daniel | Bell | 2022-03-16 | 1 |
Gina | Korman | 2022-03-17 | 2 |
Richard | Hasson | 2022-03-18 | 0 |
Evelyn | Johnson | 2022-03-17 | 2 |
Conclusion
Wasn’t that an interesting problem? We surely think so. We hope you learned all about the SQL IFNULL() function and how it can be used anywhere from the easiest of scenarios to a complex problem like the one we solved above. Whether you are using IFNULL() in MySQL or COALESCE() function in PostgreSQL, the core concept is the same and can be used to handle NULL values anywhere. For more interesting concepts, keep visiting us here at Stratascratch!