SQL Interview Questions for the Data Analyst Position
Categories
Let's go over several examples which highlight some of the most important SQL concepts you’ll see in the SQL interview questions for Data Analyst position.
The interview process for data analysts can be intimidating for those new to the position. The data analyst interviews typically cover a wide range of skills involving SQL, relational databases, statistics, probabilities, and practical problem-solving. While the skills which companies require differs from industry to industry and company to company, being proficient in SQL is typically mandatory. As such, a strong SQL ability will always position you well as a candidate for data analyst positions. One of the best ways to improve your SQL skills for interview questions is to leverage the interactive StrataScratch platform to practice real SQL interview questions and engage with a community of other data analysts.
In this article, we’ll cover some of the most important SQL concepts by working through several SQL interview questions for data analyst position.
SQL Skills a Data Analyst Must Know
During the data analyst interview process, you’ll more than likely encounter technical problems where you will be expected to answer SQL interview questions. It’s crucial to prepare for your interviews by practicing similar data analyst interview questions in advance. For context, you may be asked questions related to:
- Using WHERE vs HAVING()
- Performing JOINS and UNIONS
- Finding Unique Data Values
- Aggregation Functions
- Ordering Results
- And Other General SQL Skills
As such, we’re going to cover three data analyst SQL interview questions which clearly display these concepts, so you can internalize and better implement them.
Three SQL Interview Questions a Data Analyst Should Be Able to Solve
Data Analyst SQL Interview Question #1: Filtering Data Using WHERE or HAVING()
The data analyst interview question we’re going to look at to understand why to use WHERE or HAVING() comes from a Zillow interview.
Interview Question Date: December 2020
Write a query that identifies cities with higher than average home prices when compared to the national average. Output the city names.
Link to this data analyst SQL interview question: https://platform.stratascratch.com/coding/10315-cities-with-the-most-expensive-homes
It asks to find all cities with higher average home prices than the national average home price. For this problem, we only have one table zillow_transactions to examine.
id | state | city | street_address | mkt_price |
---|---|---|---|---|
1 | NY | New York City | 66 Trout Drive | 449761 |
2 | NY | New York City | Atwater | 277527 |
3 | NY | New York City | 58 Gates Street | 268394 |
4 | NY | New York City | Norcross | 279929 |
5 | NY | New York City | 337 Shore Ave. | 151592 |
We’ll break this down into two steps - first let’s find the average home price by city. For this, we’ll use one of the SQL Aggregate Functions - "avg()", and GROUP BY cities.
SELECT city,
avg(mkt_price)
FROM zillow_transactions
GROUP BY city
The next logical step is to limit this data to only the cities with average home prices larger than the national average. It’s at this point we must understand the distinction between the WHERE and HAVING() clause. WHERE applies before the calculation of city averages in our query above, so, if we compared our city averages to national averages in a WHERE clause, it’s removing individual home data before the city average calculation.
Whereas, if we use a HAVING() clause, we filter data after our city average calculation and can perform a proper comparison with the national average. Additionally, the HAVING() clause lets us include comparisons, subqueries, and other SQL code, so we can perform our national average calculation and comparison inside of it.
SELECT city
FROM zillow_transactions
GROUP BY city HAVING(avg(mkt_price) >
(SELECT avg(mkt_price)
FROM zillow_transactions))
city |
---|
Mountain View |
San Francisco |
Santa Clara |
As a data analyst, you must understand the difference between the WHERE and HAVING() clauses since you’ll in many cases need to filter out data and want to avoid discarding relevant data.
Data Analyst SQL Interview Question #2: How to Aggregate Distinct Data Values
The question we’re going to look at to understand why we must use unique values comes from a Postmates interview.
Interview Question Date: February 2021
How many customers placed an order and what is the average order amount?
Link to the question: https://platform.stratascratch.com/coding/2013-customer-average-orders
This question is relatively easy as it asks us to find how many customers placed an order as well as the average order amount. We only have one table postmates_orders to examine.
id | customer_id | courier_id | seller_id | order_timestamp_utc | amount | city_id |
---|---|---|---|---|---|---|
1 | 102 | 224 | 79 | 2019-03-11 23:27:00 | 155.73 | 47 |
2 | 104 | 224 | 75 | 2019-04-11 04:24:00 | 216.6 | 44 |
3 | 100 | 239 | 79 | 2019-03-11 21:17:00 | 168.69 | 47 |
4 | 101 | 205 | 79 | 2019-03-11 02:34:00 | 210.84 | 43 |
5 | 103 | 218 | 71 | 2019-04-11 00:15:00 | 212.6 | 47 |
Looking at the example table, we can see how we have the same customer_id ‘102’ applying to more than one order. As a result, finding how many customers placed an order isn’t as simple as counting all the rows. Rather, we must use the DISTINCT clause in our selection to count customer_ids only once.
As for finding the average order values, we can do this easily with the average aggregation function. Unlike the previous question, we don’t have to GROUP BY any column here since we want the average for the whole table (a single value).
SELECT count(DISTINCT customer_id),
avg(amount)
FROM postmates_orders
count | avg |
---|---|
5 | 139.22 |
It’s important to understand the proper use of DISTINCT. DISTINCT is looking for unique combinations of row data for the columns you specify. If DISTINCT is within the count function with only the customer_id following it, it’s going to disregard customer_ids which appear more than once before the aggregation. If it’s outside the count function after SELECT instead, it’s going to look for unique combinations of the count and average after the aggregation instead of before, so you end up with incorrect results.
Being able to understand how the DISTINCT clause works in conjunction with aggregation functions is critical for a data analyst since you’ll often have to run calculations on data featuring duplicates or multiple rows for a single field.
Data Analyst SQL Interview Question #3: How to Join and Aggregate Data then Order Results
The last data analyst SQL interview question we’ll look at shows how we can join and aggregate data then order results and comes from an Airbnb interview question.
Find the total number of available beds per hosts' nationality. Output the nationality along with the corresponding total number of available beds. Sort records by the total available beds in descending order.
Link to this data analyst SQL interview question: https://platform.stratascratch.com/coding/10187-find-the-total-number-of-available-beds-per-hosts-nationality
This data analyst SQL interview question is asking us to return the total comments received by a user in the last 30 days while excluding users who didn’t receive comments. There are two tables to analyze here:
host_id | apartment_id | apartment_type | n_beds | n_bedrooms | country | city |
---|---|---|---|---|---|---|
0 | A1 | Room | 1 | 1 | USA | New York |
0 | A2 | Room | 1 | 1 | USA | New Jersey |
0 | A3 | Room | 1 | 1 | USA | New Jersey |
1 | A4 | Apartment | 2 | 1 | USA | Houston |
1 | A5 | Apartment | 2 | 1 | USA | Las Vegas |
host_id | nationality | gender | age |
---|---|---|---|
0 | USA | M | 28 |
1 | USA | F | 29 |
2 | China | F | 31 |
3 | China | M | 24 |
4 | Mali | M | 30 |
Given we have to output the nationality along with the total number of available beds, we already know we’ll need to join the tables. In this case, since there is a match in the data for all the host_ids for the two tables, we can use an INNER JOIN. OUTER JOINs are typically used when there is some dissimilar data which we don’t have in this problem. If, for example, the airbnb_hosts table was missing some host_ids, then we would use an OUTER JOIN.
SELECT n_beds,
nationality
FROM airbnb_hosts h
INNER JOIN airbnb_apartments a ON h.host_id = a.host_id
The INNER JOIN is extremely simple - only requiring us to provide aliases for the two tables and then join them ON their matching field host_id. The final step in this data analyst SQL interview question is to aggregate the number of beds and order them. We’ll use the SUM() aggregation function and GROUP BY nationality to sum by nationality. Finally, we’ll ORDER BY a descending count of the sum of beds.
SELECT nationality,
SUM(n_beds) AS total_beds_available
FROM airbnb_hosts h
INNER JOIN airbnb_apartments a ON h.host_id = a.host_id
GROUP BY nationality
ORDER BY total_beds_available DESC
nationality | total_beds_available |
---|---|
Luxembourg | 320 |
USA | 248 |
Brazil | 72 |
China | 44 |
Mali | 28 |
Knowing how to JOIN tables in SQL is essential for a data analyst as you might often be working with several different datasets you need to combine before analysis. Keep in mind there are several types of SQL JOINs and which one you must use depends on how well your tables match up! Also, for best presenting and ranking your results, you’ll need to understand how to use the ORDER BY clause.
To practice more such JOIN interview questions, check out our post "SQL JOIN Interview Questions".
Conclusion
In this article, we went over several examples which highlight some of the most important SQL concepts you’ll see in Data Analyst SQL Interview Questions. Ultimately, there are almost endless SQL skills you’d need to master the language, but a good grasp of the concepts explained in this article will prepare you for a large amount of SQL interview questions you might have to solve during an interview.
Keep in mind SQL isn’t the only concept you’ll need to know for the Data Analyst interview, but it is one of the most important! On the StrataScratch platform, you can practice answering more data analyst interview questions. There you can build out your solutions independently or by using one of our articles as a guide. Beyond this, you’ll have access to a large pool of other members who post their solutions for feedback and can critique yours.