Airbnb Data Scientist Interview Questions
Categories
This article will teach you how to solve one of the hard Airbnb data scientist interview questions.
Airbnb is one of the most popular companies for data scientists. It’s in the lodging and the hospitality industry. While traditionally, you wouldn’t expect this to be an industry of high interest for data scientists, Airbnb is different. And this difference, of course, reflects in the fact they operate as an online marketplace for accommodation that connects hosts and guests. To do that, they depend on data and data scientists.
And you depend on impressing at the job interview. If you’re interested in working at Airbnb, that is. One of the things that makes it easier for you to do well in the interview is knowing what type of questions they ask at Airbnb.
The other thing is answering the concrete question, like the one we’ll talk about right now.
Airbnb Data Scientist Interview Question
One of the hard data scientist interview questions from the Airbnb interview is this one:
Interview Question Date: February 2018
You’re given a table of rental property searches by users. The table consists of search results and outputs host information for searchers. Find the minimum, average, maximum rental prices for each host’s popularity rating. The host’s popularity rating is defined as below: 0 reviews: New 1 to 5 reviews: Rising 6 to 15 reviews: Trending Up 16 to 40 reviews: Popular more than 40 reviews: Hot
Tip: The id
column in the table refers to the search ID. You'll need to create your own host_id by concating price, room_type, host_since, zipcode, and number_of_reviews.
Output host popularity rating and their minimum, average and maximum rental prices.
Link to the question: https://platform.stratascratch.com/coding/9632-host-popularity-rental-prices
Note that the question on the platform has been slightly modified. Compared to the video, the main difference is the names of the popularity rating categories and the table. Because of that, the solution here will differ slightly, but the approach will be the same.
Data
You’ll have the table airbnb_host_searches at your disposal to solve this problem.
id | price | property_type | room_type | amenities | accommodates | bathrooms | bed_type | cancellation_policy | cleaning_fee | city | host_identity_verified | host_response_rate | host_since | neighbourhood | number_of_reviews | review_scores_rating | zipcode | bedrooms | beds |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
8284881 | 621.46 | House | Entire home/apt | {TV,"Cable TV",Internet,"Wireless Internet","Air conditioning",Pool,Kitchen,"Free parking on premises",Gym,"Hot tub","Indoor fireplace",Heating,"Family/kid friendly",Washer,Dryer,"Smoke detector","Carbon monoxide detector","First aid kit","Safety card","Fire extinguisher",Essentials,Shampoo,"24-hour check-in",Hangers,"Hair dryer",Iron,"Laptop friendly workspace"} | 8 | 3 | Real Bed | strict | TRUE | LA | f | 100% | 2016-11-01 | Pacific Palisades | 1 | 90272 | 4 | 6 | |
8284882 | 621.46 | House | Entire home/apt | {TV,"Cable TV",Internet,"Wireless Internet","Air conditioning",Pool,Kitchen,"Free parking on premises",Gym,"Hot tub","Indoor fireplace",Heating,"Family/kid friendly",Washer,Dryer,"Smoke detector","Carbon monoxide detector","First aid kit","Safety card","Fire extinguisher",Essentials,Shampoo,"24-hour check-in",Hangers,"Hair dryer",Iron,"Laptop friendly workspace"} | 8 | 3 | Real Bed | strict | TRUE | LA | f | 100% | 2016-11-01 | Pacific Palisades | 1 | 90272 | 4 | 6 | |
9479348 | 598.9 | Apartment | Entire home/apt | {"Wireless Internet","Air conditioning",Kitchen,Heating,"Smoke detector","Carbon monoxide detector",Essentials,Shampoo,Hangers,Iron,"translation missing: en.hosting_amenity_49","translation missing: en.hosting_amenity_50"} | 7 | 2 | Real Bed | strict | FALSE | NYC | f | 100% | 2017-07-03 | Hell's Kitchen | 1 | 60 | 10036 | 3 | 4 |
8596057 | 420.47 | House | Private room | {"Wireless Internet","Air conditioning",Pool,Kitchen,"Free parking on premises",Breakfast,"Family/kid friendly",Washer,Dryer,Essentials,Shampoo,Hangers,"Hair dryer","Self Check-In","Doorman Entry"} | 1 | 2 | Real Bed | flexible | FALSE | LA | f | 100% | 2016-04-20 | 0 | 91748 | 1 | 1 | ||
11525500 | 478.75 | Apartment | Entire home/apt | {"Wireless Internet","Air conditioning",Heating,Washer,Dryer,Essentials,"Laptop friendly workspace",Microwave,Refrigerator,Dishwasher,"Dishes and silverware","Cooking basics",Oven,Stove,"Host greets you"} | 2 | 1 | Real Bed | flexible | TRUE | NYC | f | 100% | 2017-10-07 | Williamsburg | 2 | 100 | 11206 | 1 | 1 |
Try your hand here:
The Solution Approach
The approach to writing a solution code can be broken down into these steps:
- Find the distinct hosts
- Create popularity rating categories using the CASE WHEN statement
- Find the minimum, average, and the maximum rental price for each category using the aggregate functions
- GROUP BY category
The Solution Assumptions
When you’re at the data science interview, it’s always advisable to state your assumptions. That way, the interviewer can understand your point of view and evaluate your solution against it. If you do that, you’ll avoid the possibility that your solution gets marked down just because you didn’t explain the assumptions involved in your solution.
Also, by stating the assumption, you’ll get a clearer idea of how to solve the interview question. Our one and only assumption for this Airbnb data scientist interview question is:
- all the values in the number_of_reviews column are integer and there’ll be no NULL values
The Airbnb Interview Question Solution
We should follow the steps outlined above.
The first thing to do here is to find the unique hosts. The tip from the question advises us to create our own host_id by concatenating several columns.
SELECT DISTINCT CONCAT(price, room_type, host_since, zipcode, number_of_reviews) AS host_id,
number_of_reviews,
price
FROM airbnb_host_searches
We’re using the CONCAT() function to create the host ID and to write the query as a CTE. We’ll also need information about the number of reviews and the price for every host.
This query is written as a CTE with the name hosts. It will output the result with the following first few rows:
The next step is to create the rating categories. To do that, you need to use the CASE WHEN statement. The most practical way is to write this part of a query as a CTE.
SELECT CASE
WHEN number_of_reviews = 0 THEN 'New'
WHEN number_of_reviews BETWEEN 1 AND 5 THEN 'Rising'
WHEN number_of_reviews BETWEEN 6 AND 15 THEN 'Trending Up'
WHEN number_of_reviews BETWEEN 16 AND 40 THEN 'Popular'
WHEN number_of_reviews > 40 THEN 'Hot'
END AS host_popularity,
price
FROM hosts;
If you run the first and the second step together, the code will return the following result (showing you only the first few rows):
Now, we have to calculate the minimum, average, and maximum prices and group by the host’s popularity. We’ll do that by writing the SELECT statement with the aggregate function and using the previous part of the code as a subquery in the following way:
SELECT host_popularity AS host_pop_rating,
MIN(price) AS min_price,
AVG(price) AS avg_price,
MAX(price) AS max_price
FROM
(SELECT CASE
WHEN number_of_reviews = 0 THEN 'New'
WHEN number_of_reviews BETWEEN 1 AND 5 THEN 'Rising'
WHEN number_of_reviews BETWEEN 6 AND 15 THEN 'Trending Up'
WHEN number_of_reviews BETWEEN 16 AND 40 THEN 'Popular'
WHEN number_of_reviews > 40 THEN 'Hot'
END AS host_popularity,
price
FROM hosts) a
GROUP BY host_pop_rating;
All three SQL aggregate functions are applied to the column price and the data is taken from the subquery. On the other hand, the subquery takes the data from the CTE.
If you put all this together, the final solution will be:
WITH hosts AS
(SELECT DISTINCT CONCAT(price, room_type, host_since, zipcode, number_of_reviews) AS host_id,
number_of_reviews,
price
FROM airbnb_host_searches)
SELECT host_popularity AS host_pop_rating,
MIN(price) AS min_price,
AVG(price) AS avg_price,
MAX(price) AS max_price
FROM
(SELECT CASE
WHEN number_of_reviews = 0 THEN 'New'
WHEN number_of_reviews BETWEEN 1 AND 5 THEN 'Rising'
WHEN number_of_reviews BETWEEN 6 AND 15 THEN 'Trending Up'
WHEN number_of_reviews BETWEEN 16 AND 40 THEN 'Popular'
WHEN number_of_reviews > 40 THEN 'Hot'
END AS host_popularity,
price
FROM hosts) a
GROUP BY host_pop_rating;
This query returns the correct answer.
Conclusion
By asking this question during the data science job interview, Airbnb tests several important SQL concepts. For example, finding the distinct values and manipulating data using the CONCAT() function.
The second important concept is the SQL aggregate functions. Also, this Airbnb data scientist interview question tested your skills in using the CASE WHEN statement and grouping data.
Also, to make all these concepts work in a quite neat query, you’d have to show proficiency in writing CTEs and subqueries.Along with your technical knowledge, how well you understand the question and approach the solution will also factor in writing a correct answer. Following best practices in structuring and writing the SQL code is always a good idea!