What employers look for in a portfolio


In this post, we’ll focus on your portfolio at a high level. We’ll discuss:
What employers look for and skills employers want to see a candidate demonstrate
How to demonstrate and showcase skills effectively
Examples of what each project in your portfolio should look like
After reading this post, you should feel confident about how to move forward and be ready to start your first project.
What Employers Look For
When employers hire, they’re looking for someone who can add value to their business. Often, this means someone who has skills that can generate revenue and opportunities for the business. As a data professional, you add value to a business in one of 4 main ways:
Extracting insights from raw data, and presenting those insights to others. An example would be analyzing ad click rates, and discovering that it’s much more cost-effective to advertise to people who are 18 to 21 than to people who are 21 to 25. This adds business value by allowing the business to optimize its ad spend.
Building systems that offer direct value to the customer. An example would be a data scientist at Facebook optimizing the news feed to show better results to users. This generates direct revenue for Facebook because more news feed engagement means more ad engagement.
Building systems that offer direct value to others in the organization. An example would be building a script that automatically aggregates data from 3 databases and generates a clean dataset for others to analyze. This adds value by making it faster for others to do their work.
Sharing your expertise with others in the organization. An example is chatting with a product manager about how to build a feature that requires machine learning algorithms. This adds value by preventing unrealistic timelines, or a semi-functional product.
Unsurprisingly, when employers evaluate candidates, they look for people who can do one or more of the four things above (which ones matter most depends on the company and role). To show that you can help a business in these four areas, you need to demonstrate some combination of these skills:
Ability to communicate
Ability to collaborate with others
Technical competence
Ability to reason about data
Motivation and ability to take initiative
A well-rounded portfolio should show off your skills in each of the above areas, and be relatively easy for someone to scan — each portfolio item should be well documented and explained, so a hiring manager is able to quickly evaluate your portfolio.

What to Put in Your Data Science Portfolio
We’ll walk through a few types of projects that should be in your portfolio. Ultimately, you’ll need these projects to highlight the skills needed for the role you are seeking. If you’re applying to positions that require a lot of machine learning, building more end-to-end projects that use machine learning could be useful. On the other hand, if you’re applying for analyst positions, data cleaning and storytelling projects are more critical.
Remember, quality over quantity. A few in-depth projects will beat a long list of shallow ones.

Project 1: Data Cleaning
A data cleaning project shows a hiring manager that you can take disparate datasets and make sense of them. Data cleaning makes up much of the work a data professional does, so it is a critical skill to demonstrate. The project involves taking messy data from several sources, consolidating it into a single dataset, and then doing some analysis, which also demonstrates that you can reason about data. Showing that you’ve done this before will give you a leg up.
You’ll want to go from raw data to a version that’s easy to do analysis with. In order to do this, you’ll need to:
Find a messy dataset
  Try using data.gov, /r/datasets, or Kaggle Datasets to find something.
  Avoid picking anything that is already clean. You want there to be multiple data files and some nuance to the data.
  Find any supplemental datasets if you can. For example, if you downloaded a dataset on flights, are there any datasets you can find via Google that you can combine with it?
  Try to pick something that interests you personally. You’ll produce a much better final project if you do.
Pick a question to answer using the data
  Explore the data
  Identify an interesting angle to explore
Clean up the data
  Unify multiple data files if you have them
  Ensure that exploring the angle you want to is possible with the data
Do some basic analysis
  Try to answer the question you picked initially
Present your results
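As a rough illustration of the "unify multiple data files" step, here is a minimal Python sketch that uses only the standard library. The two inline CSV snippets, their column names, and the helper functions (clean_rows, unify) are all hypothetical and invented for this example. A real project would read actual downloaded files and would probably need many more cleaning rules.

```python
import csv
import io

# Hypothetical raw inputs: two files describing flights with inconsistent
# column names, stray whitespace, mixed casing, and a missing value.
FLIGHTS_2015 = """flight_id,Origin,delay_minutes
1, JFK ,12
2,lax,
3,JFK,45
"""

FLIGHTS_2016 = """id,origin_airport,delay
4,LAX,7
5, jfk,30
"""

# Map each source's column names onto one shared schema.
COLUMN_MAPS = [
    {"flight_id": "flight_id", "Origin": "origin", "delay_minutes": "delay"},
    {"id": "flight_id", "origin_airport": "origin", "delay": "delay"},
]


def clean_rows(raw_text, column_map):
    """Yield rows with unified column names and normalized values."""
    for row in csv.DictReader(io.StringIO(raw_text)):
        renamed = {column_map[key]: (value or "").strip() for key, value in row.items()}
        if not renamed["delay"]:
            continue  # drop rows with a missing delay; imputing is another option
        yield {
            "flight_id": int(renamed["flight_id"]),
            "origin": renamed["origin"].upper(),
            "delay": int(renamed["delay"]),
        }


def unify(sources, column_maps):
    """Combine every cleaned source into one list of rows."""
    unified = []
    for raw_text, column_map in zip(sources, column_maps):
        unified.extend(clean_rows(raw_text, column_map))
    return unified


rows = unify([FLIGHTS_2015, FLIGHTS_2016], COLUMN_MAPS)
for row in rows:
    print(row)
```

Dropping the row with the missing delay is only one possible choice. Whatever you decide, explain the reasoning in your write-up, since that reasoning is what a hiring manager is evaluating.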
If you need some inspiration, here are some examples of good data cleaning projects:
Analyzing Twitter data
Cleaning Airbnb data

Project 2: Data Storytelling Project
A data storytelling project demonstrates your ability to extract insights from data and persuade others. This has a large impact on the business value you can deliver and is an important piece of your portfolio. This project involves taking a set of data and telling a compelling narrative with it. For example, you could use data on flights to show that there are significant delays at certain airports, which could be fixed by changing the routing.
A good storytelling project will make heavy use of visualizations and will take the reader on a path that lets them see each step of the analysis. Here are the steps you’ll need to follow to build a good data storytelling project:
Find an interesting dataset
  Pick something that is related to the field you are most interested in working in
Explore a few angles in the data
  Explore the data
  Identify interesting correlations in the data
  Create charts and display your findings step-by-step
Write up a compelling narrative
  Pick the most interesting angle from your explorations
  Write up a story around getting from the raw data to the findings you made
  Create compelling charts that enhance the story
  Write extensive explanations about what you were thinking at each step, and about what the code is doing
  Write an analysis of the results of each step, and what they tell a reader
  Teach the reader something as you go through the analysis
Present your results
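To make the "explore a few angles" step concrete, here is a small sketch built on the flight delay example. The observations are made up for illustration, and the text bar chart is only a stand-in. In a real storytelling project you would most likely use a plotting library to produce proper charts.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (airport, delay in minutes) observations, invented for illustration.
FLIGHTS = [
    ("JFK", 45), ("JFK", 30), ("JFK", 60),
    ("LAX", 10), ("LAX", 20),
    ("ORD", 5), ("ORD", 15), ("ORD", 10),
]


def mean_delay_by_airport(flights):
    """Group delays by airport and return {airport: mean delay}."""
    delays = defaultdict(list)
    for airport, delay in flights:
        delays[airport].append(delay)
    return {airport: mean(values) for airport, values in delays.items()}


def text_bar_chart(averages, minutes_per_mark=5):
    """Render one line per airport, longest average delay first."""
    lines = []
    for airport, average in sorted(averages.items(), key=lambda item: -item[1]):
        bar = "#" * round(average / minutes_per_mark)
        lines.append(f"{airport} {bar} {average:.1f} min")
    return "\n".join(lines)


averages = mean_delay_by_airport(FLIGHTS)
print(text_bar_chart(averages))
```

Sorting the airports by average delay puts the most striking finding first, which is the kind of choice that helps a chart support the narrative.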
If you need some inspiration, here are some examples of good data storytelling posts:
Analyzing NYC taxi and Uber data
Tracking NBA player movements

Project 3: End-to-end Project
So far, we’ve covered projects that involve exploratory data cleaning and analysis. This helps a hiring manager who’s concerned with how well you can extract insights and present them to others. However, it doesn’t show that you’re capable of building systems that are customer-facing. Customer-facing systems involve high-performance code that can be run multiple times with different pieces of data to generate different outputs. An example is a system that predicts the stock market — it will download new market data every morning, then predict which stocks will do well during the day.
In order to show we can build operational systems, we’ll need to build an end-to-end project. An end-to-end project takes in and processes data, then generates some output. Often, this is the result of a machine learning algorithm, but it can also be another output, like the total number of rows matching certain criteria.
The key here is to make the system flexible enough to work with new data (as in our stock market example) and to make it perform well. It’s also important to make the code easy to set up and run. Here are the steps you’ll need to follow to build a good end-to-end project:
Find an interesting topic
  We won’t be working with a single static dataset, so you’ll need to find a topic instead
  The topic should have publicly accessible data that is updated regularly
  Some examples:
    The weather
    Electricity pricing
Import and parse multiple datasets
  Download as much available data as you’re comfortable working with
  Read in the data
  Figure out what you want to predict
Create predictions
  Calculate any needed features
  Assemble training and test data
  Make predictions
Clean up and document your code
  Split your code into multiple files
  Add a README file to explain how to install and run the project
  Add inline documentation
  Make the code easy to run from the command line
Upload your project to GitHub
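The steps above can be sketched as a single small script. This is a minimal illustration, not a recommended architecture: the sample temperatures, the --input and --window options, and the "recent mean plus learned bias" model are all assumptions made up for this example. A real project would download fresh data, use a proper model, and split the code across several files.

```python
import argparse
import csv
import io

# Hypothetical sample of daily temperatures; a real project would download
# fresh data from a regularly updated public source instead.
SAMPLE_CSV = """day,temperature
1,10.0
2,12.0
3,11.0
4,13.0
5,15.0
6,14.0
"""


def read_temperatures(text):
    """Parse CSV text into a list of floats, oldest first."""
    return [float(row["temperature"]) for row in csv.DictReader(io.StringIO(text))]


def make_examples(values, window):
    """Build (features, target) pairs: the previous `window` values predict the next."""
    return [(values[i - window:i], values[i]) for i in range(window, len(values))]


def fit_bias(train):
    """Learn how far the recent mean tends to fall short of the next value."""
    residuals = [target - sum(features) / len(features) for features, target in train]
    return sum(residuals) / len(residuals)


def predict(features, bias):
    """Toy model: mean of the recent values plus the learned bias."""
    return sum(features) / len(features) + bias


def main(argv=None):
    parser = argparse.ArgumentParser(description="Predict the next temperature reading.")
    parser.add_argument("--input", help="path to a CSV file; defaults to built-in sample data")
    parser.add_argument("--window", type=int, default=3, help="number of past days used as features")
    args = parser.parse_args(argv)

    if args.input:
        with open(args.input, newline="") as handle:
            text = handle.read()
    else:
        text = SAMPLE_CSV

    values = read_temperatures(text)
    examples = make_examples(values, args.window)
    # Hold out the most recent example so the test mimics predicting new data.
    train, test = examples[:-1], examples[-1:]
    bias = fit_bias(train)
    error = sum(abs(predict(f, bias) - t) for f, t in test) / len(test)
    forecast = predict(values[-args.window:], bias)
    print(f"test MAE: {error:.2f}, next forecast: {forecast:.2f}")
    return forecast


if __name__ == "__main__":
    main()
```

Because main accepts an argument list, the same code can be run from the command line or called from a test, which makes the project easier for a reviewer to set up and try.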
If you need some inspiration, here are some examples of good end-to-end projects:
Stock price prediction
Automatic music generation

Project 4: Explanatory Post
It’s important to be able to understand and explain complex data science concepts, such as machine learning algorithms. This helps a hiring manager understand how good you’d be at communicating complex concepts to other team members and customers. This is a critical piece of a data science portfolio, as it covers a good portion of real-world data science work. This also shows that you understand concepts and how things work at a deep level, not just at a syntax level. This deep understanding is important in being able to justify your choices and walk others through your work.
In order to build an explanatory post, we’ll need to pick a data science topic to explain, then write up a blog post taking someone from the very ground level all the way up to having a working example of the concept. The key here is to use plain, simple language. The more academic you get, the harder it is for a hiring manager to tell if you actually understand the concept.
The important steps are to pick a topic you understand well, walk a reader through the concept, then do something interesting with the final concept. Here are the steps you’ll need to follow:
Find a concept you know well or can learn
  Machine learning algorithms like k-nearest neighbors are good concepts to pick.
  Statistical concepts are also good to pick.
  Make sure that the concept has enough nuance to spend some time explaining.
  Make sure you fully understand the concept, and it’s not too complex to explain well.
Pick a dataset or “scaffold” to help you explain the concept.
  For instance, if you pick k-nearest neighbors, you could explain k-nearest neighbors by using NBA data (finding similar players).
Create an outline of your post
  Assume that the reader has no knowledge of the topic you’re explaining
  Break the concept into small steps
    For k-nearest neighbors, this might be:
      Predicting using similarity
      Measures of similarity
      Euclidean distance
      Finding a match using k=1
      Finding a match with k > 1
Write up your post
  Explain everything in clear and straightforward language
  Make sure to tie everything back to the “scaffold” you picked when possible
  Try having someone non-technical read it, and gauge their reaction
Share your post
  Preferably post on your own blog
  If not, upload to GitHub
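An explanatory post usually ends with a working example. For the k-nearest neighbors outline above, that example might look something like the sketch below. The player statistics are invented for illustration and are not real NBA data, and the function names are hypothetical. The code follows the outline: a similarity measure (Euclidean distance), a match with k=1, then a majority vote with k > 1.

```python
import math
from collections import Counter

# Hypothetical players as (points per game, assists per game) with a position label.
# These numbers are invented for illustration and are not real NBA statistics.
PLAYERS = [
    ((25.0, 8.0), "guard"),
    ((22.0, 9.0), "guard"),
    ((18.0, 7.0), "guard"),
    ((12.0, 2.0), "center"),
    ((10.0, 1.0), "center"),
    ((14.0, 3.0), "center"),
]


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(query, players, k):
    """Return the k players whose features are closest to the query."""
    return sorted(players, key=lambda player: euclidean_distance(query, player[0]))[:k]


def predict_position(query, players, k):
    """Predict a label by majority vote among the k nearest neighbors."""
    votes = Counter(label for _, label in nearest_neighbors(query, players, k))
    return votes.most_common(1)[0][0]


query = (16.0, 4.0)
closest = nearest_neighbors(query, PLAYERS, k=1)[0]   # the single most similar player
prediction = predict_position(query, PLAYERS, k=3)    # vote among the three most similar
print(closest, prediction)
```

Writing the distance function by hand, rather than calling a library, keeps every step visible to a reader who has never seen the algorithm before.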
If you need some inspiration, here are some examples of good explanatory blog posts:
Linear regression
Natural language processing