Probability Interview Questions For Data Scientists
In this article, we will look at a few probability questions from Data Science Interviews at top companies and outline various solving techniques.
Probability and Statistics are the foundation of Machine Learning and Data Science. While esoteric algorithms might be the latest shiny thing in town, most aspiring data scientists pay scant attention to the basics. Probability interview questions can appear simple and challenging at the same time. Yet most of them eventually reduce to a simple relationship: the number of favorable cases divided by the number of all possible cases.
Probability questions are very popular in Data Science interviews. Favorite topics of interviewers include:
- Games of Chance
- Combinatorics
- Probability Distributions
- Expected Values
- Bayes' Theorem
Probability Interview Questions
Let us start with a simple problem. This problem came up in a Meta / Facebook Data Science Interview.
Two Cards Same Suit
What is the probability of drawing two cards from the same suit? (Assume that the cards are not replaced after drawing from the deck)
You can solve this problem on the StrataScratch platform here: https://platform.stratascratch.com/technical/2003-two-cards-same-suite
One way of solving this is by using combinatorics.
All possible cases can be found by counting the number of ways of drawing any two cards from 52 cards. This can be done in 52C2 = (52 x 51) / 2 = 1,326 ways.
Favorable cases can then be found similarly. First, we pick one of the four suits that we want to pair. This is possible in four ways. Then we proceed in a similar manner as above and choose two cards from the 13 cards in that suit. This is 13C2 = (13 x 12) / 2 = 78 ways. Overall, the number of ways of selecting any two cards from the same suit will be 4 x 78 = 312 ways. We can now find the probability simply as 312 / 1,326 = 4 / 17, or about 0.235.
Note: you can also use permutations (selection with arrangements) here instead of combinations. Just make sure that you keep the process consistent for both favorable and all possible cases.
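The counting argument above can be checked with a few lines of Python. This is a minimal sketch, not part of the original solution: the function names and the `suits` / `cards_per_suit` parameters are introduced here for illustration. It computes the probability once with combinations and once with permutations to show that both give the same answer as long as the choice is applied consistently to the favorable and the possible cases.

```python
from fractions import Fraction
from math import comb, perm


def same_suit_probability(suits=4, cards_per_suit=13):
    """Unordered counting: pick a suit, then 2 of its cards, over all 2-card hands."""
    deck_size = suits * cards_per_suit
    favorable = suits * comb(cards_per_suit, 2)   # 4 x 78 = 312
    total = comb(deck_size, 2)                    # 1,326
    return Fraction(favorable, total)


def same_suit_probability_ordered(suits=4, cards_per_suit=13):
    """Ordered counting: the same ratio, using permutations in both places."""
    deck_size = suits * cards_per_suit
    favorable = suits * perm(cards_per_suit, 2)   # 4 x 156 = 624
    total = perm(deck_size, 2)                    # 2,652
    return Fraction(favorable, total)


print(same_suit_probability())          # 4/17
print(same_suit_probability_ordered())  # 4/17
```

Using `Fraction` keeps the result exact, which makes it easy to compare against a hand-derived answer.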
Let us try another way of solving this problem. To do this, let us break the problem down. We are required to find two cards from the same suit. It is the second card that determines whether we have matched the suit or not. After the first card has been drawn, there are 51 cards left in the deck.
Say, for instance, we drew the Queen of Hearts. Of the remaining 51 cards, only 12 are Hearts (recall that we have already drawn the Queen of Hearts). Hence, the probability that the second card matches the suit of the first is 12 / 51 = 4 / 17, the same as earlier.
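As a sanity check on the second-card argument, the sketch below enumerates every ordered pair of cards drawn without replacement and counts the pairs where the second card's suit matches the first. The deck representation (rank and suit tuples) and the function name are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import permutations

SUITS = ["Hearts", "Diamonds", "Clubs", "Spades"]
RANKS = range(1, 14)  # 1..13 stand in for Ace through King

# A card is a (rank, suit) tuple; the deck has 52 of them.
deck = [(rank, suit) for suit in SUITS for rank in RANKS]


def second_card_matches_probability(cards):
    """Enumerate all ordered two-card draws without replacement."""
    draws = list(permutations(cards, 2))  # 52 x 51 ordered pairs
    matches = sum(1 for first, second in draws if first[1] == second[1])
    return Fraction(matches, len(draws))


print(second_card_matches_probability(deck))  # 4/17
```

Brute-force enumeration is only practical for small sample spaces like this one, but it is a quick way to confirm an answer reached by reasoning.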
This way of breaking down probability questions is very helpful in solving complex-looking probability problems. Let us increase the difficulty a bit. This problem came up in a Yammer Data Science Interview.
Where are the Birthday People?
Find the probability that in a room of k people, at least two will have the same birthday. (Assume that there are 365 days in a year and k ≤ 365.)
You can answer the problem here: https://platform.stratascratch.com/technical/2028-where-are-the-birthday-people
Let us break this problem down. All possible cases are easy to find. Each of the k persons’ birthdays can fall on any of the 365 days of the year. Therefore, the number of all possible cases will be:
365 x 365 x 365 x 365 x … (k times)
or 365^k.
Finding the number of favorable cases is a bit complex. We can have two of k people sharing the same birthday, or three of k people having the same birthday and so on till all k people have the same birthday. To resolve this, we can split all possible cases into scenarios in the following manner. We can have:
Scenario | Description |
A | None of the k people share a birthday |
B | 2 of the k people share a birthday |
C | 3 of the k people share a birthday |
… | … |
D | All k people share a birthday |
If we observe carefully, except for the first scenario (scenario A), all other scenarios constitute a favorable case. Therefore, only the first scenario will be the unfavorable case. Ergo, we can solve this relatively easily by finding the number of unfavorable cases and then subtracting these cases from all possible cases to find the favorable cases.
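The unfavorable case is that none of the k people share a birthday. This is the same as assigning k birthdays out of 365 days without repetition: the first person can have any of the 365 days, the second any of the remaining 364, and so on. This can be accomplished in the following number of ways:
365 x 364 x 363 x … x (365 − k + 1) = 365! / (365 − k)!
Therefore, the number of favorable cases is all possible cases (365 multiplied by itself k times, or 365^k) less the unfavorable cases:
365^k − 365! / (365 − k)!
The required probability is, therefore,
P(at least two people share a birthday) = (365^k − 365! / (365 − k)!) / 365^k = 1 − 365! / ((365 − k)! x 365^k)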
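The complement approach translates directly into code. The following is a minimal sketch under the problem's stated assumptions (365 equally likely days, k ≤ 365); the function name and the `days` parameter are introduced here for illustration. Rather than computing large factorials, it multiplies the chance that each successive person avoids all earlier birthdays, which is the same product written term by term.

```python
from fractions import Fraction


def shared_birthday_probability(k, days=365):
    """Probability that at least two of k people share a birthday."""
    if not 1 <= k <= days:
        raise ValueError("k must be between 1 and the number of days")
    # P(no shared birthday) = (365/365) x (364/365) x ... x ((365 - k + 1)/365)
    p_all_distinct = Fraction(1)
    for i in range(k):
        p_all_distinct *= Fraction(days - i, days)
    # Favorable probability is the complement of the unfavorable one.
    return 1 - p_all_distinct


for k in (2, 10, 23, 50):
    print(k, round(float(shared_birthday_probability(k)), 4))
```

For k = 23 the result is a little over 0.5, which is the well-known "birthday paradox" threshold.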
Let us finish our session with a problem involving conditional probability.
Two out of three tails
What is the probability of getting exactly three tails when flipping four fair coins simultaneously? It is known that at least two tails show up.
You can answer the problem here: https://platform.stratascratch.com/technical/2285-two-out-of-three-tails
While this problem can be solved using the conditional probability relationship, we will reserve that for slightly more complex cases. Let us solve this by breaking the problem down. If we disregard the additional information (at least two of the four coin flips are tails), then we can find the favorable cases and all possible cases easily.
All possible cases are Heads or Tails (two outcomes) for each of the four flips, or 2^4 = 16 scenarios.
Favorable cases are those with exactly three Tails out of the four flips. These can be counted by choosing which one of the four flips comes up as Heads, giving 4 possibilities. We can also enumerate the cases easily: TTTH, TTHT, THTT, HTTT.
What makes this problem different and potentially tricky is that it is given that there are at least two tails that show up. This reduces the number of possible cases without affecting the favorable cases which are still only four. To help illustrate this let us list out all possible cases.
Scenario | Possibilities | # Cases |
4 Heads | HHHH | 1 |
3 Heads, 1 Tails | HHHT, HHTH, HTHH, THHH | 4 |
2 Heads, 2 Tails | HHTT, HTHT, THHT, HTTH, THTH, TTHH | 6 |
1 Heads, 3 Tails | TTTH, TTHT, THTT, HTTT | 4 |
4 Tails | TTTT | 1 |
So, we can simply subtract the cases that are no longer possible, viz. those where no tails come up (the "4 Heads" row above) and those where only one tail shows up (the "3 Heads, 1 Tails" row). Let us calculate these.
No tails show up: There is only one possible scenario – All four coins turn up as heads.
Only one tail shows up: This is possible in four ways (the flip side of our favorable case where only one head shows up).
All possible cases are now reduced to 16 − 5 = 11.
The required probability is, therefore: 4 / 11
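The same answer follows from the conditional probability relationship P(A | B) = P(A and B) / P(B), which here is (4 / 16) / (11 / 16) = 4 / 11. The short Python sketch below, with variable names chosen for illustration, enumerates all sixteen outcomes, restricts them to those with at least two tails, and counts how many of those have exactly three tails.

```python
from fractions import Fraction
from itertools import product

# All 2^4 outcomes of four coin flips, e.g. ('H', 'T', 'T', 'H').
outcomes = list(product("HT", repeat=4))

# Condition: at least two tails show up (the reduced sample space).
at_least_two_tails = [o for o in outcomes if o.count("T") >= 2]

# Favorable cases within the reduced sample space: exactly three tails.
exactly_three_tails = [o for o in at_least_two_tails if o.count("T") == 3]

probability = Fraction(len(exactly_three_tails), len(at_least_two_tails))

print(len(outcomes), len(at_least_two_tails), len(exactly_three_tails))  # 16 11 4
print(probability)  # 4/11
```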
Conclusion
In this article, we looked at examples of the types of probability questions asked in Data Science interviews. Probability problems involve math, but they can also be solved by breaking the problem down to its basics: count the favorable cases, count all possible cases, and account for any given information. As with other skills, you can improve your proficiency in this area with persistence and practice. Also, check out our post “30 Probability and Statistics Interview Questions for Data Scientists” to find more such questions that can help you sharpen your skills to ace your data science interview.