Microsoft Data Analyst Interview Questions
Categories
The article focused exclusively on SQL perspective to solve Microsoft data analyst interview questions as this is the language most widely used among companies and the language you will need to know in order to significantly improve your chances of landing a job as a data scientist or data analyst.
When it comes to getting an interview and landing a perfect job somewhere in tech, only a few places might come to our minds before Microsoft pops up as one of the industry leaders. If we are talking strictly about data scientist or data analyst jobs, then Microsoft has even greater importance due to its activities and services provided. With the emergence of Azure as the go-to cloud computing service and their increased activities in the fields of Machine Learning and AI, Microsoft is slowly exerting its dominance in the field of Data Science.
For the better part of the last decade, Microsoft has been expanding its data science or data analysis teams and they are constantly hiring for new roles to support their growing operations in this field. With that in mind, we believe it’s important to understand what types of data analyst interview questions are being asked in Microsoft interviews for their open roles in data science. In this article, we have gathered 6 real-world data analyst interview questions that have been asked at Microsoft interviews in 2021 that will help you prepare for landing that perfect job in the future. We will analyze the questions and guide you towards solutions and we will cover all the technical concepts being tested to help you prepare for countless other questions with similar requirements.
Technical Concepts Tested in Microsoft Data Analyst Interview Questions
Here we will generally cover all the technical concepts being tested in the questions discussed, and we will talk about the nature of questions and how they relate to the technical concepts themselves.
The table below shows the concepts being tested in Microsoft data analyst interview questions we are covering today:
Sub-queries | Min/Max/Average |
Joins | Count |
Grouping by / Ordering by | Distinct |
Case/Else | Rank |
As we can see, all technical concepts being covered are used to manipulate data and get insights based on certain criteria; these insights might have certain related data that also needs to be produced as a result. For example, you can be asked to calculate the share of users based on certain criteria and output certain data points as a result; to do this, you will need to know technical concepts such as sub-queries, joins, group by, count, min/max etc.
Questions asked will have other nature as well and they might relate to different technical concepts. You will be asked to get unique users based on certain criteria. Furthermore, you will be tasked with finding top/bottom values, ordering the results and producing the output based on certain criteria. Finally, you would need to output the best performing category based on certain criteria from other columns to complete some of the tasks. Now let’s get into specific data analyst interview questions that have been asked at Microsoft interviews and examine how we can approach and implement the solution.
Microsoft Interview Question #1: New and Existing Users
Interview Question Date: March 2021
Calculate the share of new and existing users for each month in the table. Output the month, share of new users, and share of existing users as a ratio. New users are defined as users who started using services in the current month (there is no usage history in previous months). Existing users are users who used services in current month, but they also used services in any previous month. Assume that the dates are all from the year 2020. HINT: Users are contained in user_id column
Question Link: https://platform.stratascratch.com/coding/2028-new-and-existing-users
We are being asked to calculate the share of new and existing users and we need to output the month, share of new users and share of existing users as a ratio. We know that new users are defined as users who started using services in the current month. Consequently, existing users are users who started using services in the current month and used services in any previous month. This question is considered to be of hard difficulty.
Here is a sample of what the data looks like:
id | time_id | user_id | customer_id | client_id | event_type | event_id |
---|---|---|---|---|---|---|
1 | 2020-02-28 | 3668-QPYBK | Sendit | desktop | message sent | 3 |
2 | 2020-02-28 | 7892-POOKP | Connectix | mobile | file received | 2 |
3 | 2020-04-03 | 9763-GRSKD | Zoomit | desktop | video call received | 7 |
4 | 2020-04-02 | 9763-GRSKD | Connectix | desktop | video call received | 7 |
5 | 2020-02-06 | 9237-HQITU | Sendit | desktop | video call received | 7 |
As we can see, there are 7 columns that we can use to manipulate the data and produce the expected output. The columns themselves are id, time_id, user_id, customer_id, client_id, event_type and event_id.
Here is the solution:
SELECT s1.month,
new_users::decimal/all_users share_of_new_users,
1-new_users::decimal/all_users share_of_all_user
FROM
(SELECT date_part('month', time_id::date) AS month,
count(DISTINCT user_id) as all_users
FROM fact_events
GROUP BY date_part('month', time_id::date)) s1
JOIN
(SELECT date_part('month', min_time::date) AS month,
count(DISTINCT user_id) as new_users
FROM
(SELECT user_id,
min(time_id) as min_time
FROM fact_events
GROUP BY user_id) sq
GROUP BY date_part('month', min_time::date)) s2 ON s1.month = s2.month
If this is what you got, congratulations! If not, there is no reason to worry as the question is considered to be hard and this is only one of the ways you can get to the solution.
As you can see from the code, there is a lot of areas you will need to know in order to answer this question. A lot of the technical concepts we previously covered are used here; for example, you will need to know sub-queries, grouping by, joins, count and minimum values functions in order to answer this Microsoft data analyst interview question adequately. The reason this question gets asked is to check whether you can do a query that pulls only certain data from certain columns (based on the tasked criteria) and then show the solution through new variables that conform to the task at hand (see output criteria in the question itself). The part where most people get confused is usually not in knowing these concepts, but being able to organize them in a logical flow and remembering all the steps while being in a high-pressure situation such as an interview. However, with enough practice in similar situations, you will be able to master this skill and excel in your future interview.
More Microsoft Data Analyst Interview Questions
We will examine a few more questions that have appeared on recent Microsoft interviews for data science or data analysis roles. However, we will let you try to implement the solution on your own while we give you some guidance on how best to approach the task at hand.
Microsoft Interview Question #2: Unique Users per Client per Month
Interview Question Date: March 2021
Write a query that returns the number of unique users per client per month
Question Link: https://platform.stratascratch.com/coding/2024-unique-users-per-client-per-month
Dataset:
id | time_id | user_id | customer_id | client_id | event_type | event_id |
---|---|---|---|---|---|---|
1 | 2020-02-28 | 3668-QPYBK | Sendit | desktop | message sent | 3 |
2 | 2020-02-28 | 7892-POOKP | Connectix | mobile | file received | 2 |
3 | 2020-04-03 | 9763-GRSKD | Zoomit | desktop | video call received | 7 |
4 | 2020-04-02 | 9763-GRSKD | Connectix | desktop | video call received | 7 |
5 | 2020-02-06 | 9237-HQITU | Sendit | desktop | video call received | 7 |
What you are asked to do here is to write a query that returns the number of unique users per client per month. The difficulty level for this question is easy. The reason this question is asked on interviews is to determine whether you can categorize date/time data types to perform mathematical operations and pull only certain data that conforms to criteria from the date/time data type. Technical concepts being tested here are sub-queries, grouping by, data extraction, distinct function and data counting.
Approach:
The steps you can follow to solve the problem:
- Start a query that gives client id, extracts the data per month from time_id column, and counts distinct user id values as outputs;
- Group the output by client id and by the extracted monthly data. For the extracted monthly data, you can use the same criteria as at the start of the query.
Microsoft Interview Question #3: Users Exclusive per Client
Interview Question Date: March 2021
Considering a dataset that tracks user interactions with different clients, identify which clients have users who are exclusively loyal to them (i.e., they don't interact with any other clients).
For each of these clients, calculate the number of such exclusive users. The output should include the client_id and the corresponding count of exclusive users.
Question Link: https://platform.stratascratch.com/coding/2025-users-exclusive-per-client
Dataset:
id | time_id | user_id | customer_id | client_id | event_type | event_id |
---|---|---|---|---|---|---|
1 | 2020-02-28 | 3668-QPYBK | Sendit | desktop | message sent | 3 |
2 | 2020-02-28 | 7892-POOKP | Connectix | mobile | file received | 2 |
3 | 2020-04-03 | 9763-GRSKD | Zoomit | desktop | video call received | 7 |
4 | 2020-04-02 | 9763-GRSKD | Connectix | desktop | video call received | 7 |
5 | 2020-02-06 | 9237-HQITU | Sendit | desktop | video call received | 7 |
The task at hand is to write a query that returns a list of users who are exclusive to one client. You need to output client_id and the number of exclusive users. This question is considered to be of medium difficulty. What the interviewers are trying to understand here is whether you can search for unique values from a certain column and arrange them according to the task. The technical concepts being tested here are sub-queries, grouping by, distinct function and data counting.
Approach:
Here is one of the ways you can approach the solution:
- Start a query that gives client id and counts distinct user id values as outputs;
- Start a sub-query for user id column that is grouped by user id and has to count distinct client id values as outputs;
- Group the query by client id.
Microsoft Interview Question #4: Bottom 2 Companies by Mobile Usage in July
Interview Question Date: March 2021
Write a query that returns a list of the bottom 2 companies by mobile usage. Company is defined in the customer_id column. Mobile usage is defined as the number of events registered on a client_id == 'mobile'. Order the result by the number of events ascending. In the case where there are multiple companies tied for the bottom ranks (rank 1 or 2), return all the companies. Output the customer_id and number of events.
Question link: https://platform.stratascratch.com/coding/2026-bottom-2-companies-by-mobile-usage
Dataset:
id | time_id | user_id | customer_id | client_id | event_type | event_id |
---|---|---|---|---|---|---|
1 | 2020-02-28 | 3668-QPYBK | Sendit | desktop | message sent | 3 |
2 | 2020-02-28 | 7892-POOKP | Connectix | mobile | file received | 2 |
3 | 2020-04-03 | 9763-GRSKD | Zoomit | desktop | video call received | 7 |
4 | 2020-04-02 | 9763-GRSKD | Connectix | desktop | video call received | 7 |
5 | 2020-02-06 | 9237-HQITU | Sendit | desktop | video call received | 7 |
What we are being asked to do here is to write a query that returns a list of the bottom two companies by mobile usage. We know from the task that mobile usage is defined as the number of events registered on a mobile client_id. We need to order the result by the number of events ascending. If there are multiple companies tied for rank 1 or 2, we need to return all of them. The difficulty level for this data analyst interview question is medium.
The reason this question gets asked on interviews is to determine whether you can find the lowest values, order the results in a certain way and produce the appropriate output. Several technical concepts are being tested here as well, such as sub-queries, grouping by/ordering by, ranking and data counting.
Approach:
The steps you can follow to solve the problem:
- Start a query that returns customer id and a variable that will register mobile usage (as defined in the question;
- Start a sub-query that gives 3 outputs: customer id, counts the number of times mobile usage has been registered (from your identified variable in the original query) and consecutively ranks and orders previously counted values (you can identify a new variable here to make it easier).
- Add a criterion to your sub-query so that it only pulls mobile values from the client id column.
- Add a criterion to your query that the ranked and counted values (from your sub-query) should be less than or equal to 2.
- Order your mobile usage variable (from the original query) in ascending order.
Microsoft Interview Question #5: Top Company Where Users Use Desktop Only
Interview Question Date: March 2021
Write a query that returns the company (customer id column) with highest number of users that use desktop only.
Question Link: https://platform.stratascratch.com/coding/2027-top-company-where-users-use-desktop-only
Dataset:
id | time_id | user_id | customer_id | client_id | event_type | event_id |
---|---|---|---|---|---|---|
1 | 2020-02-28 | 3668-QPYBK | Sendit | desktop | message sent | 3 |
2 | 2020-02-28 | 7892-POOKP | Connectix | mobile | file received | 2 |
3 | 2020-04-03 | 9763-GRSKD | Zoomit | desktop | video call received | 7 |
4 | 2020-04-02 | 9763-GRSKD | Connectix | desktop | video call received | 7 |
5 | 2020-02-06 | 9237-HQITU | Sendit | desktop | video call received | 7 |
What we need to do here is to write a query that returns the top company in terms of events where users use desktop only. The difficulty level for this question is also medium and the same dataset is being used as with previous questions. Interviewers ask this question to find out if the candidate can find the highest value that conforms to one (or several) criteria from multiple columns in the dataset. The technical concepts candidates should know in order to answer this question adequately are sub-queries, grouping by/ordering by, distinct function, ranking and data counting.
Approach:
Here is a way you can get to the solution:
- Start a query that returns customer_id as output;
- Start a sub-query that gives two outputs: customer id and ranking and ordering of counted distinct user_id values in the descending order;
- Start another sub-query (part of your first sub-query) that gives user id as output and has a criterion that it only pulls desktop values from the client id column;
- Add additional criteria in your sub-query to group the output by user id and to have counted distinct client id values that are equal to 1;
- Group your first sub-query by customer_id;
- Set a final criterion to your original query that the rank should be equal to 1.
Microsoft Interview Question #6: The Most Popular Client ID Among Users Using Video and Voice Calls
Interview Question Date: March 2021
Select the most popular client_id based on a count of the number of users who have at least 50% of their events from the following list: 'video call received', 'video call sent', 'voice call received', 'voice call sent'.
Question Link: https://platform.stratascratch.com/coding/2029-the-most-popular-client_id-among-users-using-video-and-voice-calls
Dataset:
id | time_id | user_id | customer_id | client_id | event_type | event_id |
---|---|---|---|---|---|---|
1 | 2020-02-28 | 3668-QPYBK | Sendit | desktop | message sent | 3 |
2 | 2020-02-28 | 7892-POOKP | Connectix | mobile | file received | 2 |
3 | 2020-04-03 | 9763-GRSKD | Zoomit | desktop | video call received | 7 |
4 | 2020-04-02 | 9763-GRSKD | Connectix | desktop | video call received | 7 |
5 | 2020-02-06 | 9237-HQITU | Sendit | desktop | video call received | 7 |
The task to be completed is to select the most popular client_id based on a count of the number of users who have at least 50% of their events from the following list: ‘video call received’, ‘video call sent’, ‘voice call received’, ‘voice call sent’. The difficulty level for this question is hard, so not to worry if the task seems too challenging at first.
The reason this question gets asked is to find out whether you can identify the best performers in your data based on several criteria from your dataset. Technical concepts that you would need to complete this task are sub-queries, grouping by/ ordering by, case/else function, data counting and data averaging.
Approach:
Here is how you can try to solve the problem:
- Start a query that returns client_id as output;
- Start a sub-query in order to manipulate user id data in the original query; this sub-query should return user_id as output;
- In your sub-query, group the data by user id and prepare the case/else statement that pulls average values from it;
- In the case/else statement, specify that if the event_type data registers as one of the above-mentioned values (‘video call received’, ‘video call sent’, ‘voice call received’, ‘voice call sent’), it should return a value of 1 and otherwise it should return a value of 0;
- Make sure that your sub-queried case/else data only pulls values equal or greater than 0.5 since we are looking for at least 50% usage from users;
- Group your query by client_id;
- Order your query by count in the descending order;
- Limit the result to 1 (only outputs the top performer).
Conclusion
Hopefully, this article has been helpful in showing how the real interview at Microsoft might look like and that you managed to resolve some (if not all) of the interview questions we presented here. If you were struggling with the question, feel free to use our approach section to help you identify the requirements and the workflow of your task. The article focused exclusively on solutions from a SQL perspective as this is the language most widely used among companies (even if it is at a limited capacity for some of them) and the language you will need to know in order to significantly improve your chances of landing a data science job.
These have been only some of the real-world interview questions that Microsoft is asking when they conduct interviews for positions in the field of data analytics. To find more, and to explore the solutions for more questions, visit Microsoft Data Scientist Interview Questions and Microsoft SQL Interview Questions.