Facebook Data Science Interview Question and Solution in SQL - Friend Acceptance Rate
Categories
This data science interview question was asked by Facebook during a data science interview.
This article will cover both the solution to the question and a detailed explanation of the approach I take to answer this Facebook data science interview question. I walk through each step of my solution design, talk about business context and impact, and explain every line of code I write.
Question
Our suggestion is to try to solve the data science interview question first before reading the solution. You can try to solve this Facebook Data science interview question.
What is the overall friend acceptance rate by date?
Order by the latest friend request date to the earliest date.
The question asks the interviewee to calculate the friend acceptance rate over time, which means you need to calculate a rate and group it by date. Technically, this question leverages SQL concepts like LEFT JOIN, GROUP BY, SUBQUERIES, and data type conversions from INT to FLOAT or DECIMAL.
Link to the question: https://platform.stratascratch.com/coding/10285-acceptance-rate-by-date
user_id_sender | user_id_receiver | date | action |
---|---|---|---|
ad4943sdz | 948ksx123d | 2020-01-04 | sent |
ad4943sdz | 948ksx123d | 2020-01-06 | accepted |
dfdfxf9483 | 9djjjd9283 | 2020-01-04 | sent |
dfdfxf9483 | 9djjjd9283 | 2020-01-15 | accepted |
ffdfff4234234 | lpjzjdi4949 | 2020-01-06 | sent |
You're getting this one of the interesting SQL interview questions during a Facebook interview and you are given the data set schema where we have the sender, the receiver, the date and then the action. If we preview this table, we have again the sender, the receiver, the date and then the action sent.
Assumptions
Just starting off with this question, I am going to assume that every time a sender sends a friend request to the receiver that's always logged in this table as a sent action. We know that the denominator of the acceptance rate that we want to try to find there is sent. We also know that the numerator for that rate should be accepted in the action column.
Approach
Defining acceptance rate
What we're going to do first is define acceptance rate as the number of accepted divided by the number of sent requests. This becomes
acceptance rate = # accepted / # sents
Splitting rows by accepted and sent records
The next thing I want to do is be able to split the number of sent records and then be able to split the number of accepted records and then group them by date. In order to do that I'm going to use two subqueries to split this one table and make it into two tables. What I like to do when I create my solution especially in SQL is actually work starting at from the tables and the JOIN clause and work myself out.
In order to do that I can write SELECT and then user_id_sender, user_id_receiver, date and then action. Basically, the columns of the table FROM fb_friend_requests, WHERE action equals ‘sent’. That's the first subquery and if I run this code, I get exactly what I'm expecting - all of the sent records.
SELECT
user_id_sender,
user_id_receiver,
date,
action
FROM fb_friend_requests
WHERE action='sent'
Then what I'm going to do is I am going to copy this subquery and create a new subquery. And now after I preview it, instead of sent, I have accepted.
SELECT
user_id_sender,
user_id_receiver,
date,
action
FROM fb_friend_requests
WHERE action='accepted'
user_id_sender | user_id_receiver | date | action |
---|---|---|---|
ad4943sdz | 948ksx123d | 2020-01-06 | accepted |
dfdfxf9483 | 9djjjd9283 | 2020-01-15 | accepted |
fffkfld9499 | 993lsldidif | 2020-01-10 | accepted |
fg503kdsdd | ofp049dkd | 2020-01-10 | accepted |
r4gfgf2344 | 234ddr4545 | 2020-01-11 | accepted |
Creating my subqueries and creating the outer SELECT
These are my subqueries. I will now work from the outer top SELECT and use my subqueries in the FROM clause.
SELECT
*
FROM (
SELECT
user_id_sender,
user_id_receiver,
date,
action
FROM fb_friend_requests
WHERE action='sent') a
SELECT
user_id_sender,
user_id_receiver,
date,
action
FROM fb_friend_requests
WHERE action='accepted'
Adding the JOIN
Now what I will try to figure out next is what sort of JOIN I want to implement. I could use an INNER JOIN, LEFT JOIN, RIGHT JOIN, or an OUTER JOIN. In this case, we know that we are creating a ratio or a percentage. So, we're going to have a denominator and a numerator and what I'm doing with these subqueries is I'm actually defining the numerator and denominator and then joining them together.
What I really want to do is implement a LEFT JOIN to preserve all of the friend requests that were not accepted yet. This will be more obvious once I finish this query and show you the underlying data.
SELECT
*
FROM (
SELECT
user_id_sender,
user_id_receiver,
date,
action
FROM fb_friend_requests
WHERE action='sent') a
LEFT JOIN (
SELECT
user_id_sender,
user_id_receiver,
date,
action
FROM fb_friend_requests
WHERE action='accepted') b
Now on the keys, all I'm going to do is just line up the sender and the receivers for both tables. We're going to use two keys because we definitely want the sender and the receiver to be matched together since that's the primary key combination of these two users.
SELECT
*
FROM (
SELECT
user_id_sender,
user_id_receiver,
date,
action
FROM fb_friend_requests
WHERE action='sent') a
LEFT JOIN (
SELECT
user_id_sender,
user_id_receiver,
date,
action
FROM fb_friend_requests
WHERE action='accepted') b on b.user_id_sender = a.user_id_sender
and b.user_id_receiver = a.user_id_receiver
On the left, I have all of the friend requests and on the right side, I have all the accepted. You can see that there are some NULL records where we did a LEFT JOIN and the receiver did not accept the friend request so that's why you're seeing some blank rows.
Now what we want to do is count the number of accepted and then count up the number of sents and then group that by the date. In order to do that I will put the date column first and then define the number of accepted and the number of sent.
SELECT
date,
count(b.user_id_receiver)/count(a.user_id_sender)
FROM (
SELECT
user_id_sender,
user_id_receiver,
date,
action
FROM fb_friend_requests
WHERE action='sent') a
LEFT JOIN (
SELECT
user_id_sender,
user_id_receiver,
date,
action
FROM fb_friend_requests
WHERE action='accepted') b on b.user_id_sender = a.user_id_sender
and b.user_id_receiver = a.user_id_receiver
Data type conversion to output a decimal
The next thing I want to do is turn this entire operation here into a decimal. So, we're going to actually turn that into a FLOAT.
SELECT
date,
count(b.user_id_receiver)/count(a.user_id_sender)::float
FROM (
SELECT
user_id_sender,
user_id_receiver,
date,
action
FROM fb_friend_requests
WHERE action='sent') a
LEFT JOIN (
SELECT
user_id_sender,
user_id_receiver,
date,
action
FROM fb_friend_requests
WHERE action='accepted') b on b.user_id_sender = a.user_id_sender
and b.user_id_receiver = a.user_id_receiver
The reason why we have this casting of this count into a FLOAT is that we basically have integers with this count of receiver and an integer where we're counting the senders. So, an integer divided by an integer is going to remain an integer and it's just going to give us a 1 if we don't change this integer into a decimal or into a float. That's what we're doing with this casting operation.
Once we change the data type to a FLOAT, we can rename the column. Now we have the date and the acceptance rate and all we need now is just a GROUP BY. We'll GROUP BY the date. Lastly, we have the ORDER BY because this Facebook data science interview question is asking us to order by date from the latest date to the earliest date. What I'm going to do now is just type in date descending.
Solution
If I run this query, I get the date and the acceptance rate as an actual decimal or as an actual float.
SELECT
a.date,
count(b.user_id_receiver)/count(a.user_id_sender)::float
as acceptance_rate
FROM (
SELECT
user_id_sender,
user_id_receiver,
date,
action
FROM fb_friend_requests
WHERE action='sent') a
LEFT JOIN (
SELECT
user_id_sender,
user_id_receiver,
date,
action
FROM fb_friend_requests
WHERE action='accepted') b on b.user_id_sender = a.user_id_sender
and b.user_id_receiver = a.user_id_receiver
GROUP BY a.date
ORDER BY a.date DESC
Expected Output:
date | percentage_acceptance |
---|---|
2020-01-04 | 0.75 |
2020-01-06 | 0.67 |
Now I have acceptance rate by date . I took from the one table given and split it into a table where we captured friend requests being sent and friend requests being accepted. And then we joined those two together preserving the number of sent requests. That allowed us to basically form a rate calculation where we are counting the number of acceptances and the number of senders. That is how we can get the acceptance rate.
Check out "Facebook data science interview questions" or "data science interview questions" to find more questions to practice.