What Skills Do You Need as a Data Scientist?
What makes you keep your job and advance as a data scientist is the sweet spot between technical and non-technical skills.
Data scientist has been one of the hottest jobs of the last few years, which means there are plenty of job opportunities. With this comes strong competition from other applicants. You can’t control the number of people who apply for a certain job, so the popularity of data science means you need a different strategy. Instead of counting on quantity (being one of only three applicants), you should concentrate on quality: your quality as a data science job candidate.
Your quality in these situations is a function of the range of skills you have. Not just any skills, of course. Data scientists need very specific skills. If you know what they are, you can assess how good a candidate you are. Based on that, you can apply for certain jobs or decide you need to improve, get more education, or get more experience working on a certain aspect of a certain skill.
The starting point for all that is knowing which data scientist skills are required. When we talk about data scientists, we mean the term in the broadest sense: all the jobs with data scientist titles and all other jobs that belong to the data science family but focus on only one particular aspect of data science.
Generally speaking, skills you’ll need as a data scientist can be divided into technical and non-technical (soft) skills.
Data Scientist Technical Skills
The data scientist technical skills relate strictly to the expert knowledge you need for completing your day-to-day tasks. In the strictest sense, those are skills critical to the quality and efficiency of your work. They answer the question of how you do something.
What technical skills do you need as a data scientist? Generally, there are seven of them. Along with each skill, we list the interview question types that test it, so you can use them to improve that particular skill:
- Databases and database design
  • Coding Questions (both SQL and Python)
  • System Design Questions
- Working with data
  • Coding (both SQL and Python) Questions
  • Technical Questions
  • Product Questions
  • Business Case Questions
- Coding
  • Coding (both SQL and Python) Questions
- Statistical analysis
  • Statistics Questions
  • Probability Questions
  • Modeling Questions
- Mathematics
  • Probability Questions
- Model building
  • Modeling Questions
- Model validation and deployment
  • System Design Questions
We’re going to explain each of these data scientist skills and how they’re used in data science.
1. Databases and Database Design
Given the name data science, one can safely say that data science has something to do with data, right? And if it has something to do with data, it certainly has something to do with databases.
Databases, as collections of structured and organized data, are the basis for anyone wanting a career in data science. Designing a database means going through steps that result in a logical and a physical data model, then implementing them and, later, maintaining them.
In their work, data scientists are usually required to use databases. Even for that, they need to understand what databases are, which data the databases they use contain, how the data is organized, and what it looks like. Data scientists need this skill so they can be independent in getting the data they need for the further steps required by their jobs.
Not only that, but data scientists will often need to create their own databases (or, at least, participate in their design). That way, they’ll be not only the database’s end users but also its creators.
When we talk about databases, we mean relational databases and tools made for handling them. However, that is usually not enough. Aside from relational databases, your data scientist skills should include working with NoSQL databases too.
Companies are increasingly moving to cloud computing, and data science is moving with them. Because of that, skill in working with cloud databases and cloud data warehouses is in high demand.
Tools you need to know to improve and showcase this data scientist skill are:
- Database design tools (e.g., Lucidchart, Vertabelo, Visual Paradigm ERD Tools, Erwin Data Modeler)
- Relational databases (e.g., MS SQL Server, PostgreSQL, MySQL, Oracle, Hive, Snowflake, etc.)
- NoSQL databases (e.g., MongoDB, Cassandra, Couchbase)
- Cloud databases (e.g., Amazon Web Services, Microsoft Azure, Google Cloud, etc.)
- Cloud data warehouses (e.g., Snowflake or Hive)
Employers test these skills with questions along these lines. For example, a data science question by Facebook:
“How would you compare the relative performance of two different backend engines for automated generation of Facebook "Friend" suggestions?”
Or something like the coding question from Salesforce:
“Compare each employee's salary with the average salary of the corresponding department.
Output the department, first name, and salary of employees along with the average salary of that department.”
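One way to approach the Salesforce question is to join each employee to the average salary of their department. The sketch below is only an illustration: the `employee` table, its columns, and its rows are hypothetical, and it uses Python’s built-in `sqlite3` module rather than whatever database the interviewer has in mind.

```python
import sqlite3

# Hypothetical table and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (first_name TEXT, department TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("Ana", "Sales", 1000),
        ("Ben", "Sales", 3000),
        ("Cleo", "IT", 5000),
    ],
)

# Join every employee to the average salary of their own department.
query = """
SELECT e.department, e.first_name, e.salary, d.avg_salary
FROM employee AS e
JOIN (
    SELECT department, AVG(salary) AS avg_salary
    FROM employee
    GROUP BY department
) AS d ON d.department = e.department
ORDER BY e.department, e.first_name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The same result could also be produced with a window function (`AVG(salary) OVER (PARTITION BY department)`) in databases that support it; the join is used here only because it works on older engines too.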
2. Working With Data
Working with data is the next data scientist skill, and it builds on databases. You will first have to know how to collect data from databases or other sources, where you can get the data, and in which format. Once you get the data, you’ll probably need to clean it, manipulate it, and adapt it to your needs. Again, you’ll need to know databases because you’ll most probably have to save this data in a database.
Once you get the data you need, you’ll also need to analyze it and, potentially, visualize your findings.
So along with using database tools, it’s very important that your data science skills also include working with these tools:
- Data collection tools (e.g., Xplenty, BrightData, GoSpotCheck, Repsly Mobile CRM, Fulcrum, etc.)
- Data cleaning tools (e.g., IBM Infosphere Quality Stage, Drake, TIBCO Clarity, OpenRefine, Trifacta Wrangler, etc.)
- BI tools (e.g., Tableau, Power BI, Looker, QlikSense, etc.)
Working with data is often tested by coding questions like this one by Yelp on reviews of categories:
“Find the top business categories based on the total number of reviews. Output the category along with the total number of reviews. Order by total reviews in descending order.”
Some more theoretical questions about data are in the Technical category, for example a question by Walmart on data structures in Python:
“What are the data structures in Python?”
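A short answer to the Walmart question could start from Python’s four main built-in data structures. The snippet below is a minimal sketch; all variable names and values are made up for illustration.

```python
# Built-in Python data structures, with hypothetical example values.
ratings = [4, 5, 3, 5]                    # list: ordered, mutable, allows duplicates
point = (40.7, -74.0)                     # tuple: ordered, immutable
unique_ratings = set(ratings)             # set: unordered, unique elements only
review_counts = {"cafe": 120, "bar": 85}  # dict: key-value mapping

ratings.append(2)             # lists can grow in place
review_counts["gym"] = 40     # dicts accept new keys

print(len(ratings), sorted(unique_ratings), review_counts["gym"])
```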
Even Product interview questions generally test your feel for data. Here’s an example of a question by Instagram on Ad success metrics:
“What metrics would you use to measure the success of an Instagram ad?”
The Business Case questions also can be used to test how you would work with data. For example, this question by Airbnb on measuring effectiveness:
“How would you measure the effectiveness of our operations team?”
3. Coding
This data scientist skill leans on the previous skills we talked about. Not only that, but it’s also necessary for getting all the other skills we’ll talk about.
You’ll need coding skills if you want to work with databases, get the data from databases, and store data. Even though there are tools for collecting and cleaning data, coding is very helpful in these areas too. But when you get to building models, deploying them, and creating software, that’s where you won’t be able to manage without coding skills.
It doesn’t mean you need to know everything about every programming language. However, there are some programming languages that are very often used in data science.
- Mandatory programming language (learn this first!): SQL
  - SQL - required because it’s the most fitting and the most popular programming language for querying databases
- Very often required (choose one for learning!): R or Python
  - Usually, one of those languages is enough, especially if it’s used for statistical analysis only; your choice should depend on your needs and on whether you need a language for broader uses like the ones Python offers
  - R - the most popular programming language for statistical analysis and data modeling
  - Python - used in statistical analysis, but also in querying databases, cleaning data, analyzing and visualizing data, and machine learning
- Optional (but required for some roles!): Java and the C family
  - These are not required, except for some specific roles such as Data Engineer, ML Engineer, or Software Engineer, because these positions concentrate on data cleaning, machine learning, and deploying into production
  - Java/JavaScript - these are also used for data cleaning, analysis, and visualization
  - C/C++/C# - typically used in machine learning and implementing machine learning algorithms
Check out our post on Python vs R for Data Science to find which language is better for you.
The coding skills are mainly covered by coding questions such as the one they ask at Google on letter occurrences:
“Find the top 3 most common letters across all the words from both the tables. Output the letter along with the number of occurrences and order records in descending order based on the number of occurrences.”
Or the question asked by Lyft during the interview on distance traveled:
“Find the top 10 users that have traveled the greatest distance. Output their id, name and a total distance traveled.”
You can solve these questions both in PostgreSQL and Python, so this covers two of the most popular programming languages in data science.
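As an illustration of the Python route, the Google letter-count question could be sketched with `collections.Counter`. The two word lists below are hypothetical stand-ins for the word columns of the two tables, which the question doesn’t show.

```python
from collections import Counter

# Hypothetical stand-ins for the word columns of the two tables.
words_table_1 = ["banana", "nab"]
words_table_2 = ["cabana"]


def top_letters(words, n=3):
    """Return the n most common letters across all words, ignoring case."""
    counts = Counter(
        ch for word in words for ch in word.lower() if ch.isalpha()
    )
    # most_common sorts by count in descending order.
    return counts.most_common(n)


result = top_letters(words_table_1 + words_table_2)
print(result)
```

Letters with equal counts come back in the order they were first seen, so a real solution may need an explicit tie-breaking rule.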
4. Statistical Analysis
For data scientists, statistical analysis is also one of the crucial skills. Statistical analysis means you have to state your statistical hypothesis, that is, the relationship between variables you want to test. After that, you need data (see why all the previous data scientist skills are needed?) to validate your hypothesis.
You’ll need to choose the data sample and calculate statistical measures such as median, standard deviation, variance, etc. After that, you’ll test your hypothesis and interpret the results of your statistical analysis.
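The statistical measures mentioned above can be computed with Python’s standard `statistics` module. This is a minimal sketch on a hypothetical sample invented for illustration; in practice the sample would come from your data.

```python
import statistics

# Hypothetical sample, invented for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

median = statistics.median(sample)
mean = statistics.mean(sample)
variance = statistics.variance(sample)  # sample variance (n - 1 in the denominator)
std_dev = statistics.stdev(sample)      # sample standard deviation

print(median, mean, round(variance, 3), round(std_dev, 3))
```

Note that `variance` and `stdev` treat the data as a sample; `pvariance` and `pstdev` are the population versions.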
To perform statistical analysis, you need to work with tools designed for that:
- Statistical analysis tools (e.g., SPSS, R, MS Excel, MatLab, SAS, etc.)
There are, of course, various statistical analysis types you could use as a data scientist. Some main types are:
- Descriptive analysis
- Diagnostic analysis
- Predictive analysis
- Prescriptive analysis
The descriptive analysis uses historical data and describes what happened in the past. It involves techniques such as data aggregation, calculating measures of central tendency (mean, median, mode) and variance, and time-series analysis.
One of the example interview questions regarding the descriptive statistical analysis could be a question by Facebook on expectation of variance:
“What is the expectation of the variance?”
The other three statistical analyses try to give more insight based on data. The diagnostic analysis attempts to answer why something happened. The predictive analysis also analyzes past trends, but it wants to answer what will happen in the future. Finally, the prescriptive analysis considers all the previous analyses and tries to say what should be done based on what happened, why, and what will happen.
All three of these analysis types have some statistical concepts in common, such as regression analysis, probability, machine learning, data modeling, etc.
An example of testing these concepts is a question by IBM on logistic and linear regression:
“When you are doing logistic regression, how do you assess your model? What is the difference compared to simple linear regression?”
5. Mathematics
Data science involves having a rather wide set of skills. One of those data scientist skills is mathematics.
For starters, if you need statistics and you need to calculate statistical measures, you for sure need mathematics. But mathematics is also necessary for other data science skills used in the day-to-day job. For example, data scientists build machine learning models. Mathematics is needed here, too, so you can understand how machine learning algorithms work. Not only that, but mathematics is used for training algorithms to make predictions.
The mathematics fields you can’t survive without in data science are:
- Linear algebra (e.g., matrices, vectors)
- Calculus (e.g., derivatives, differential, integral, etc.)
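To see where calculus enters model training, here is a minimal sketch of gradient descent fitting a one-parameter model `y = w * x`. The data, the learning rate, and the number of steps are all assumptions chosen for illustration, not a recommended setup.

```python
# Fit y = w * x by gradient descent on the mean squared error.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # hypothetical data generated from y = 2x


def mse_gradient(w, xs, ys):
    """Derivative of mean((w * x - y) ** 2) with respect to w."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n


w = 0.0               # initial guess
learning_rate = 0.01  # assumed step size
for _ in range(500):
    # Step against the gradient to reduce the error.
    w -= learning_rate * mse_gradient(w, xs, ys)

print(round(w, 4))
```

The derivative tells the algorithm in which direction, and roughly how far, to move `w`; the same idea, extended to vectors and matrices, is where linear algebra comes in.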
Probability is another mathematical concept wanted in data science, so you can expect some questions about it, such as this one by DRW:
“You have 2 envelopes; one with 100% chance of having 5000 dollars and the other with 50% chance of either a 10k dollars or 1000 dollars, which one do you choose?”
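A natural first step in answering the DRW question is to compare the expected value and the spread of each envelope. The sketch below uses only the numbers given in the question; the function names are illustrative.

```python
# Each envelope is a list of (probability, payout) pairs from the question.
envelope_a = [(1.0, 5000)]
envelope_b = [(0.5, 10000), (0.5, 1000)]


def expected_value(outcomes):
    """Probability-weighted average payout."""
    return sum(p * payout for p, payout in outcomes)


def variance(outcomes):
    """Probability-weighted squared deviation from the expected value."""
    mu = expected_value(outcomes)
    return sum(p * (payout - mu) ** 2 for p, payout in outcomes)


for name, env in [("A", envelope_a), ("B", envelope_b)]:
    print(name, expected_value(env), variance(env) ** 0.5)
```

The second envelope has the higher expected value (5,500 dollars versus 5,000) but also a standard deviation of 4,500 dollars, so a good answer would probably also discuss risk tolerance rather than the expected value alone.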
6. Model Building
Model building also leans on all the previous data science skills. Here you will decide on a suitable model you want to build. Then you will build it through the implementation of algorithms. After that, you need to choose the learning procedure, then train and evaluate the model.
Along with all previous skills, here you’ll also need ML & AI skills. This means working with the following tools:
- Data science and machine learning platforms (e.g., Jupyter Notebooks, MATLAB, KNIME, Azure Machine Learning Studio, IBM Watson Machine Learning, etc.)
Companies will test your model building skills with questions such as this one from Square about credit risk:
“How do you test whether a new credit risk scoring model works? What data would you look at?”
Or maybe a question by Instacart on random forest:
“How would you tune a random forest?”
7. Model Validation and Deployment
As a data scientist, you will also have to validate and deploy the model you built.
The model validation means you have to check how accurately it predicts what will happen. This also includes identifying the model’s limitations and risks of using it.
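One simple way to check how accurately a classification model predicts is to compare its predictions on held-out data with the true labels. The sketch below assumes a binary classifier; the labels and predictions are hypothetical, and accuracy, precision, and recall are only three of many possible validation metrics.

```python
# Hypothetical held-out labels and model predictions (1 = positive class).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, and recall from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


metrics = classification_metrics(y_true, y_pred)
print(metrics)
```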
After the model validation comes the model deployment. It means you need to have skills that will make the predictions of the model you created available to other users, where they can again use these insights the way they want. This usually means you need to develop some kind of software or application that will have a model working in the background. You can also deploy your model in the cloud or create an API or a simple dashboard.
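To make the idea of an API with a model working in the background more concrete, here is a minimal sketch using only Python’s standard library. Everything in it is an assumption for illustration: the “model” is a fixed linear rule rather than a trained model, and the names `predict`, `handle_payload`, and `PredictionHandler` are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler

# Hypothetical "model": a fixed linear rule standing in for a trained model.
WEIGHT, BIAS = 2.0, 1.0


def predict(x):
    """Return the stand-in model's prediction for a single number."""
    return WEIGHT * x + BIAS


def handle_payload(body):
    """Turn a JSON request body like {"x": 3} into a JSON response string."""
    data = json.loads(body)
    return json.dumps({"prediction": predict(float(data["x"]))})


class PredictionHandler(BaseHTTPRequestHandler):
    """POST handler that exposes handle_payload over HTTP."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        response = handle_payload(self.rfile.read(length)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(response)
```

To try it locally, the handler could be passed to `http.server.HTTPServer(("localhost", 8000), PredictionHandler)` and started with `serve_forever()`. A production deployment would more likely use a web framework or a managed cloud service, with input validation and error handling added.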
Because of all of that, you’ll need to use the following tools on top of all others:
- Model validation tools (e.g., SAS, Arize, Neptune, Qualdo, Fiddler, Amazon SageMaker Model Monitor, etc.)
- Model deployment tools (e.g., Kubeflow, Amazon SageMaker Model Monitor, SAS, Google AI, Azure Machine Learning, etc.)
- API development tools (e.g., Amazon AWS, IBM Cloud, etc.)
- Web application frameworks (e.g., Django, Ruby on Rails)
System Design questions cover this skill in a way. For example, a System Design question by Workday:
“Given a huge collection of books, how would you tag each book based on genre?”
Now that we’ve covered technical skills, it’s time to see what soft data scientist skills you need.
Data Scientist Soft Skills
While technical data scientist skills answer the question of how you do something, soft skills describe how you think about what you do and how you present it.
The soft skills in high demand in data science are:
- Curiosity and desire to learn
- Critical thinking
- Business acumen
- Communication skills
- Teamwork
- Cross-sectionality
1. Curiosity and Desire to Learn
Curiosity can be defined as a constant need to ask the question “why?”. If you’re curious, you’ll detect problems much more easily. That means that you don’t accept things as they are; you find holes in explanations of why something is (or isn’t) done exactly that way or at all. Being curious will turn you into a problem-maker. Well, maybe not a problem-maker, but a problem-uncoverer for sure. And that’s exactly the point of data science.
You need to be curious about things because finding a problem is the first step to solving it. Curiosity goes hand in hand with the desire to learn, which means you’re not satisfied with what you know and what you’ve been taught. You want to grow, learn new things, and by learning new things, you know it’ll be easier for you to find where the improvements are needed in the company you work for.
2. Critical Thinking
Once you find a problem, you engage your critical thinking, one of the most highly regarded data scientist skills. Critical thinking enables you to use logic, apply deductive and inductive reasoning, and avoid biases, with the sole purpose of solving the problems you found through your curiosity.
3. Business Acumen
Your problem-solving is especially valuable if it solves real, practical problems. Working in business means that, most of the time, you’ll be solving practical business problems. This means you need to understand how the industry operates, how your company works, how its different segments are interconnected, what its products are, how they are produced, what its market is, how certain market trends impact the company, and so on.
4. Communication Skills
Unfortunately or not, you won’t only be sitting behind your monitor and playing with statistics and programming languages. Your data science job is part of a wider picture. What you do is important, and maybe you did a great job. But how you communicate what you accomplished, its importance, and its benefits for other people is extremely important. It will make a huge difference in how your work is accepted and how purposeful the long hours you put into your models were.
That’s why you must be able to communicate in a clear, concise, and simple way. You have to adapt to various levels of technical expertise. It’s important that you understand what you do. But being able to explain what you do in plain language shows a whole other level of understanding.
5. Teamwork
Communication skills will also be used while working in teams. It’s important how you communicate the problems you come across, changes in the scope of your work, deadlines, and requirements from other teammates, and how you handle their requirements.
If you do that the right way, it will make you comfortable to work with. Make sure it does, because being dependable, open to suggestions, not stress-inducing, and simply pleasant company makes a huge difference. The thing is, you’ll be working in teams with other data scientists or on projects with people from other departments. Even if you’re technically competent, you’ll get more project opportunities if you contribute to a good working environment.
6. Cross-sectionality
Being an expert in your field is great. It will get you to a certain point in your career. But if you show some knowledge in different areas (curiosity and desire to learn, remember?), it will make you a much more valuable team member or expert in general.
Imagine if, besides being great at computer science, you have leadership skills (and education). It significantly increases your job opportunities because you can apply for management positions too. Or having a degree (or even a completely amateur interest) in human psychology, philosophy, economics, sociology, ethics, medicine, music, or literature can help you better connect with other people and increase the number of fields or projects you can participate in. Don’t underestimate that, because people usually offer only dry technical expertise, which becomes much more interesting and beneficial for everybody if it’s paired with other disciplines.
If you fully understand what data scientists do, you will see how these soft skills fit the job description. Maybe you can find some other soft skills that you think are important. But to show your (soft) skills, you need a job. So maybe one of the “preliminary” soft skills you should have is the ability to understand how the interviewer’s mind works.
Conclusion
Becoming a data scientist requires a special set of skills, and quite a wide range of them. That is the nature of data science, being at the crossroads of various disciplines. However, all these data scientist skills can be put into two categories: technical and non-technical skills.
Both categories work together in making you a good data scientist. A high level of technical skills is, of course, very important. That’s most extensively tested in the interviews. Maybe having all the most in-demand technical skills alone will get you a job, but what makes you keep your job and advance (both in terms of knowledge and hierarchy) is this sweet spot between technical and non-technical skills.
Technical skills impact how you do your data science job and include:
- Databases and database design
- Working with data
- Coding
- Statistical analysis
- Mathematics
- Model building
- Model validation and deployment
Non-technical or soft skills will improve how you think and how you present what you think. They are:
- Curiosity and desire to learn
- Critical thinking
- Business acumen
- Communication skills
- Teamwork
- Cross-sectionality
Being solid and balanced in both skill types is what will make you stand out.