Probability Interview Questions For Data Scientists



In this article, we will look at a few probability questions from Data Science Interviews at top companies and outline various solving techniques.

Probability and Statistics are the basis of Machine Learning and Data Science. While esoteric algorithms might be the latest shiny thing in town, many aspiring data scientists pay scant attention to the basics. Probability interview questions can look simple and still be challenging. Yet when all outcomes are equally likely, each problem eventually reduces to a simple relationship:

$$\text{Probability} = \frac{\text{Number of Favorable Outcomes}}{\text{Number of All Possible Outcomes}}$$

Probability questions are very popular in Data Science interviews. Favorite topics of interviewers include:

  • Games of Chance
  • Combinatorics
  • Probability Distributions
  • Expected Values
  • Bayes' Theorem


Probability Interview Questions


Let us start with a simple problem. This problem came up in a Meta / Facebook Data Science Interview.

Two Cards Same Suit

What is the probability of drawing two cards from the same suit? (Assume that the cards are not replaced after drawing from the deck)


You can solve this problem on the StrataScratch platform here:  https://platform.stratascratch.com/technical/2003-two-cards-same-suite

One way of solving this is by using combinatorics.

All possible cases can be found by counting the number of ways of drawing any two cards from 52 cards. This can be done in 52C2 = (52 × 51) / 2 = 1,326 ways.

Favorable cases can then be found similarly. First, we pick one of the four suits that we want to pair. This is possible in four ways. Then we proceed in a similar manner as above and choose two cards from the 13 cards in that suit. This is possible in 13C2 = (13 × 12) / 2 = 78 ways. Overall, the number of ways of selecting any two cards from the same suit is 4 × 78 = 312. We can now find the probability simply as

$$\frac{312}{1326} = \frac{4}{17}$$

Note: you can also use permutations (selection with arrangements) here instead of combinations. Just make sure that you keep the process consistent for both favorable and all possible cases.
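The counting above can be checked with a short Python sketch. It assumes a standard 52-card deck with 4 suits of 13 cards each; the variable names are illustrative only.

```python
from fractions import Fraction
from math import comb

# All ways to choose 2 cards from a 52-card deck (order ignored).
total_pairs = comb(52, 2)

# Pick one of the 4 suits, then 2 of the 13 cards in that suit.
same_suit_pairs = 4 * comb(13, 2)

# Fraction keeps the result exact and reduces it to lowest terms.
probability = Fraction(same_suit_pairs, total_pairs)
print(total_pairs, same_suit_pairs, probability)
```

Using `Fraction` instead of floating-point division makes it easy to compare the result with the hand-derived answer.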

Let us try another way of solving this problem. To do this, let us break the problem down. We are required to find two cards from the same suit. It is the second card that determines whether we have matched the suit or not. After the first card has been drawn, there are 51 cards left in the deck.

Say, for instance, we drew the Queen of Hearts. Of the remaining 51 cards, only 12 are Hearts (recall that we have already drawn the Queen of Hearts). Since the same count holds whichever card is drawn first, the probability is the same as earlier:

$$\frac{12}{51} = \frac{4}{17}$$
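As a cross-check, the same answer can be reached by brute force: list every ordered pair of cards drawn without replacement and count the pairs whose suits match. This is a minimal sketch; the rank and suit encodings are arbitrary choices made for illustration.

```python
from fractions import Fraction
from itertools import permutations, product

ranks = range(13)   # illustrative encoding: 0..12 stand for the 13 ranks
suits = "CDHS"      # clubs, diamonds, hearts, spades
deck = list(product(ranks, suits))

# Every ordered (first card, second card) draw without replacement.
ordered_draws = list(permutations(deck, 2))

# A draw is favorable when both cards carry the same suit.
matching = sum(1 for first, second in ordered_draws if first[1] == second[1])

probability = Fraction(matching, len(ordered_draws))
print(matching, len(ordered_draws), probability)
```

Because this counts ordered draws, both the favorable and the total counts are twice the combination counts, and the ratio is unchanged.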

This way of breaking down probability questions is very helpful in solving complex-looking probability problems. Let us increase the difficulty a bit. This problem came up in a Yammer Data Science Interview.

Where are the Birthday People?

Find the probability that in a room of k people, at least two will have the same birthday. (Assume that there are 365 days in a year and k ≤ 365.)

Probability Interview Question from Yammer

You can answer the problem here: https://platform.stratascratch.com/technical/2028-where-are-the-birthday-people

Let us break this problem down. All possible cases are easy to find. Each of the k persons’ birthdays can fall on any of the 365 days of the year. Therefore, the number of all possible cases will be:

$$\underbrace{365 \times 365 \times \cdots \times 365}_{k \text{ times}} = 365^{k}$$

Finding the number of favorable cases is a bit complex. We can have two of k people sharing the same birthday, or three of k people having the same birthday and so on till all k people have the same birthday. To resolve this, we can split all possible cases into scenarios in the following manner. We can have:

Scenario A: None of the k people sharing any birthday
Scenario B: 2 of k people sharing their birthday
Scenario C: 3 of k people sharing their birthday
…
Scenario D: All k of k people sharing their birthday

If we observe carefully, except for the first scenario (scenario A), all other scenarios constitute a favorable case. Therefore, only the first scenario will be the unfavorable case. Ergo, we can solve this relatively easily by finding the number of unfavorable cases and then subtracting these cases from all possible cases to find the favorable cases.

The unfavorable case is that none of the k people share birthdays. This is the same as assigning k distinct birthdays out of 365 days, without repetition. The number of ways to do this is:

$$^{365}P_{k}$$

Therefore, the favorable cases will be:

$$365^{k} - {}^{365}P_{k}$$

The required probability is, therefore,

$$\frac{365^{k} - {}^{365}P_{k}}{365^{k}}$$
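The formula can be evaluated for specific values of k with a small Python sketch. The function name `shared_birthday_probability` is hypothetical, and the code relies on the same assumptions as the problem: 365 equally likely birthdays and k ≤ 365.

```python
from fractions import Fraction
from math import perm

DAYS = 365

def shared_birthday_probability(k: int) -> Fraction:
    """Probability that at least two of k people share a birthday."""
    if not 0 <= k <= DAYS:
        raise ValueError("k must be between 0 and 365")
    all_cases = DAYS ** k        # each person can have any of the 365 days
    no_shared = perm(DAYS, k)    # k distinct days, order matters (365Pk)
    return Fraction(all_cases - no_shared, all_cases)

for k in (2, 10, 23, 50):
    print(k, round(float(shared_birthday_probability(k)), 4))
```

A well-known consequence of this formula is that the probability first exceeds one half at k = 23.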

Let us finish our session with a problem involving conditional probability.

Two out of three tails

What is the probability of getting exactly three tails when flipping four fair coins simultaneously? It is known that at least two tails show up.

Probability Interview Question from Jane Street

You can answer the problem here: https://platform.stratascratch.com/technical/2285-two-out-of-three-tails

While this problem can be solved using the conditional probability relationship, we will reserve that for slightly more complex cases. Let us solve this by breaking the problem down. If we disregard the additional information (at least two of the four coin flips are tails), then we can find the favorable cases and all possible cases easily.

All possible cases are Heads or Tails (two outcomes) for each of the four flips, or 2^4 = 16 scenarios.

Favorable cases are exactly three Tails out of the four flips. These can be counted simply by choosing which one of the four flips comes up Heads, which gives 4 possibilities. We can also enumerate the cases easily: TTTH, TTHT, THTT, HTTT.

What makes this problem different and potentially tricky is that it is given that there are at least two tails that show up. This reduces the number of possible cases without affecting the favorable cases which are still only four. To help illustrate this let us list out all possible cases.

Scenario | Possibilities | # Cases
4 Heads | HHHH | 1
3 Heads, 1 Tails | HHHT, HHTH, HTHH, THHH | 4
2 Heads, 2 Tails | HHTT, HTHT, THHT, HTTH, THTH, TTHH | 6
1 Heads, 3 Tails | TTTH, TTHT, THTT, HTTT | 4
4 Tails | TTTT | 1
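The case counts in the table can be reproduced by enumerating all sequences of four flips and grouping them by the number of tails. This is a minimal sketch using only the standard library.

```python
from collections import Counter
from itertools import product

# All sequences of four flips, e.g. "HHHH", "HHHT", ...
outcomes = ["".join(flips) for flips in product("HT", repeat=4)]

# Count how many sequences contain 0, 1, 2, 3 or 4 tails.
tails_counts = Counter(outcome.count("T") for outcome in outcomes)

for tails in range(5):
    print(f"{tails} tails: {tails_counts[tails]} cases")
```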

So, we can simply subtract the cases that are no longer possible, namely those where no tails come up and those where only one tail shows up (the first two rows of the table above). Let us count these.

No tails show up: There is only one possible scenario – All four coins turn up as heads.

Only one tail shows up: This is possible in four ways (the flip side of our favorable case where only one head shows up).

All possible cases are now reduced to 16 − 5 = 11.

The required probability is therefore 4/11.
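The same result follows from restricting the sample space in code: keep only the outcomes consistent with the given information, then count the favorable ones among them. This is a sketch of that idea, assuming fair and independent coins so that every sequence is equally likely.

```python
from fractions import Fraction
from itertools import product

# All 16 equally likely sequences of four flips, as tuples like ('H', 'T', 'T', 'T').
outcomes = list(product("HT", repeat=4))

# Condition: at least two tails are known to have shown up.
at_least_two_tails = [o for o in outcomes if o.count("T") >= 2]

# Favorable: exactly three tails, counted within the reduced sample space.
exactly_three_tails = [o for o in at_least_two_tails if o.count("T") == 3]

probability = Fraction(len(exactly_three_tails), len(at_least_two_tails))
print(len(exactly_three_tails), len(at_least_two_tails), probability)
```

This is the conditional probability relationship P(A | B) = P(A and B) / P(B) in disguise: dividing both counts by 16 gives (4/16) / (11/16).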

Conclusion

In this article, we looked at examples of the types of probability questions asked in Data Science interviews. These problems involve math, but they can also be solved by breaking the problem down to its basics. As with other skills, you can improve your proficiency in this area with persistence and practice. Also, check out our post “30 Probability and Statistics Interview Questions for Data Scientists” to find more such questions that can help you sharpen your skills to ace your data science interview.
