What employers look for in a portfolio
- Last updated on July 7, 2023 at 11:56 PM
In this post, we’ll focus on your portfolio at a high level. We’ll discuss:
- What employers look for, and which skills they want to see a candidate demonstrate
- How to demonstrate and showcase skills effectively
- Examples of what each project in your portfolio should look like
After reading this post, you should feel confident about how to move forward and be ready to start your first project.
What Employers Look For
When employers hire, they’re looking for someone who can add value to their business. Often, this means someone whose skills can generate revenue and opportunities for the business. As a data professional, you add value to a business in four main ways:
- Extracting insights from raw data, and presenting those insights to others.
  - An example would be analyzing ad click rates and discovering that it’s much more cost-effective to advertise to people aged 18 to 21 than to people aged 22 to 25. This adds business value by allowing the business to optimize its ad spend.
- Building systems that offer direct value to the customer.
  - An example would be a data scientist at Facebook optimizing the news feed to show better results to users. This generates direct revenue for Facebook, because more news feed engagement means more ad engagement.
- Building systems that offer direct value to others in the organization.
  - An example would be building a script that automatically aggregates data from 3 databases and generates a clean dataset for others to analyze. This adds value by making it faster for others to do their work.
- Sharing your expertise with others in the organization.
  - An example is chatting with a product manager about how to build a feature that requires machine learning algorithms. This adds value by preventing unrealistic timelines or a half-working product.
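To make the first kind of value concrete, the ad-click insight mostly comes down to arithmetic: comparing what each age bracket costs per click. The Python sketch below assumes cost per click is the measure of cost-effectiveness, and the `campaigns` dictionary, its spend and click figures, and the bracket labels are all made up for illustration. In practice, the figures would come from the ad platform's reports.

```python
# Hypothetical ad-campaign numbers for two age brackets; in a real analysis
# these would come from the ad platform's reporting export.
campaigns = {
    "18-21": {"spend": 1200.0, "clicks": 800},
    "22-25": {"spend": 1500.0, "clicks": 500},
}


def cost_per_click(spend: float, clicks: int) -> float:
    """Return spend divided by clicks (infinite if there were no clicks)."""
    return spend / clicks if clicks else float("inf")


# Assumption: a lower cost per click means a more cost-effective bracket.
cpc = {group: cost_per_click(**stats) for group, stats in campaigns.items()}
cheapest = min(cpc, key=cpc.get)

for group, value in sorted(cpc.items()):
    print(f"{group}: ${value:.2f} per click")
print(f"Most cost-effective bracket: {cheapest}")
```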
Unsurprisingly, when employers evaluate candidates, they look for people who can do one or more of the four things above (which ones matter most depends on the company and role). To show that you can help a business in these four areas, you need to demonstrate some combination of these skills:
- Ability to communicate
- Ability to collaborate with others
- Technical competence
- Ability to reason about data
- Motivation and ability to take initiative
A well-rounded portfolio should show off your skills in each of the above areas, and be relatively easy for someone to scan — each portfolio item should be well documented and explained, so a hiring manager is able to quickly evaluate your portfolio.
What to Put in Your Data Science Portfolio
We’ll walk through a few types of projects that should be in your portfolio. Ultimately, you'll need these projects to highlight the skills needed for the role you are seeking. If you’re applying to positions that require a lot of machine learning, building more end-to-end projects that use machine learning could be useful. On the other hand, if you’re applying for analyst positions, data cleaning and storytelling projects are more critical.
Remember, quality over quantity. A few in-depth projects will beat out a long list of shallow ones.
Project 1: Data Cleaning
A data cleaning project shows a hiring manager that you can take disparate datasets and make sense of them. This project involves taking messy data, cleaning it up, and doing some analysis. It demonstrates that you can reason about data, and that you can take data from many sources and consolidate it into a single dataset. Data cleaning makes up most of the work a data professional does, so showing that you’ve done it before will give you a leg up.
You’ll want to go from raw data to a version that’s easy to do analysis with. In order to do this, you’ll need to:
- Find a messy dataset
  - Try using data.gov, /r/datasets, or Kaggle Datasets to find something.
  - Avoid picking anything that is already clean. You want there to be multiple data files and some nuance to the data.
  - Find any supplemental datasets if you can. For example, if you downloaded a dataset on flights, are there any datasets you can find via Google that you can combine with it?
  - Try to pick something that interests you personally. You’ll produce a much better final project if you do.
- Pick a question to answer using the data
- Explore the data
  - Identify an interesting angle to explore
- Clean up the data
  - Unify multiple data files if you have them
  - Make sure the data actually lets you explore the angle you picked
- Do some basic analysis
  - Try to answer the question you picked initially
- Present your results
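As a rough illustration of these steps, here is a minimal Python sketch using only the standard library. The two inline CSV files (flight records and an airport lookup), their column names, and the question being answered (which airport has the longest average delay?) are all hypothetical; in a real project you’d load the files you downloaded, and pandas would likely be the more convenient tool.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw flight records: inconsistent casing, stray spaces,
# and missing or non-numeric delays, the kind of mess worth practicing on.
FLIGHTS_CSV = """flight,origin,delay_minutes
AA1, jfk ,15
AA2,LAX,
DL3,jfk,25
UA4,ord,n/a
UA5,ORD,30
"""

# Hypothetical supplemental dataset that maps airport codes to names.
AIRPORTS_CSV = """code,name
JFK,John F. Kennedy International
LAX,Los Angeles International
ORD,O'Hare International
"""


def parse_delay(raw):
    """Return the delay as an int, or None if it is missing or not a number."""
    try:
        return int(raw.strip())
    except (AttributeError, ValueError):
        return None


def load_flights(text):
    """Normalize airport codes and drop rows whose delay can't be analyzed."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        delay = parse_delay(row["delay_minutes"])
        if delay is None:
            continue
        rows.append({"flight": row["flight"].strip(),
                     "origin": row["origin"].strip().upper(),
                     "delay": delay})
    return rows


def load_airports(text):
    """Build a code -> name lookup from the supplemental file."""
    return {row["code"].strip().upper(): row["name"].strip()
            for row in csv.DictReader(io.StringIO(text))}


def average_delay_by_airport(flights, airports):
    """Unify both sources and answer: what is the average delay per airport?"""
    totals = defaultdict(lambda: [0, 0])  # code -> [total minutes, flight count]
    for flight in flights:
        totals[flight["origin"]][0] += flight["delay"]
        totals[flight["origin"]][1] += 1
    return {airports.get(code, code): total / count
            for code, (total, count) in totals.items()}


flights = load_flights(FLIGHTS_CSV)
averages = average_delay_by_airport(flights, load_airports(AIRPORTS_CSV))
worst = max(averages, key=averages.get)
print(averages)
print("Longest average delay:", worst)
```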
If you need some inspiration, here are some examples of good data cleaning projects:
Project 2: Data Storytelling Project
A data storytelling project demonstrates your ability to extract insights from data and persuade others. This has a large impact on the business value you can deliver and is an important piece of your portfolio. This project involves taking a set of data and telling a compelling narrative with it. For example, you could use data on flights to show that there are significant delays at certain airports, which could be fixed by changing the routing.
A good storytelling project will make heavy use of visualizations and will take the reader on a path that lets them see each step of the analysis. Here are the steps you’ll need to follow to build a good data storytelling project:
- Find an interesting dataset
  - Pick something that is related to the field you are most interested in working in
- Explore a few angles in the data
  - Explore the data
  - Identify interesting correlations in the data
  - Create charts and display your findings step-by-step
- Write up a compelling narrative
  - Pick the most interesting angle from your explorations
  - Write up a story around getting from the raw data to the findings you made
  - Create compelling charts that enhance the story
  - Write extensive explanations about what you were thinking at each step, and about what the code is doing
  - Write an analysis of the results of each step, and what they tell a reader
  - Teach the reader something as you go through the analysis
- Present your results
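The “identify interesting correlations” step can start with something as simple as scoring every pair of columns and looking at the strongest relationships first. This Python sketch hand-rolls the Pearson correlation coefficient on a small, hypothetical flights table; the column names and values are invented. The charts themselves would come from a plotting library such as matplotlib.

```python
import math
from itertools import combinations

# Hypothetical per-flight measurements; in a real project these columns
# would come from your cleaned dataset.
data = {
    "departure_hour": [6, 9, 12, 15, 18, 21],
    "delay_minutes": [2, 5, 11, 18, 26, 31],
    "distance_km": [800, 300, 1200, 500, 900, 400],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Score every pair of columns, strongest relationships (by |r|) first.
pairs = sorted(
    ((a, b, pearson(data[a], data[b])) for a, b in combinations(data, 2)),
    key=lambda item: abs(item[2]),
    reverse=True,
)
for a, b, r in pairs:
    print(f"{a} vs {b}: r = {r:+.2f}")
```

A strong correlation is a lead worth charting and explaining, not a finding on its own; the narrative is where you show the reader why it matters.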
If you need some inspiration, here are some examples of good data storytelling posts:
Project 3: End-to-end Project
So far, we’ve covered projects that involve exploratory data cleaning and analysis. This helps a hiring manager who’s concerned with how well you can extract insights and present them to others. However, it doesn’t show that you’re capable of building systems that are customer-facing. Customer-facing systems involve high-performance code that can be run multiple times with different pieces of data to generate different outputs. An example is a system that predicts the stock market — it will download new market data every morning, then predict which stocks will do well during the day.
In order to show we can build operational systems, we’ll need to build an end-to-end project. An end-to-end project takes in and processes data, then generates some output. Often, this is the result of a machine learning algorithm, but it can also be another output, like the total number of rows matching certain criteria.
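Here is a minimal sketch of that shape in Python, assuming CSV input and using the simplest output mentioned above: a count of rows matching some criteria. The ticker data, column names, and function names are hypothetical; the point is that the pipeline accepts any file-like source, so tomorrow’s data runs through the same code unchanged.

```python
import csv
import io
from typing import Callable, Iterable, TextIO

Row = dict[str, str]


def read_rows(source: TextIO) -> Iterable[Row]:
    """Ingest: stream rows from any CSV file-like object."""
    return csv.DictReader(source)


def count_matching(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> int:
    """Process: count the rows that satisfy the given criteria."""
    return sum(1 for row in rows if predicate(row))


def run_pipeline(source: TextIO, predicate: Callable[[Row], bool]) -> int:
    """Ingest -> process -> output; nothing is tied to one specific file."""
    return count_matching(read_rows(source), predicate)


# Hypothetical daily market snapshot; the next day's file would go
# through exactly the same function.
todays_data = io.StringIO("ticker,change_pct\nAAA,1.2\nBBB,-0.4\nCCC,3.1\n")
gainers = run_pipeline(todays_data, lambda row: float(row["change_pct"]) > 0)
print(gainers)
```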
The key here is to make the system flexible enough to work with new data (like the daily market data in the stock market example) and to make it perform well. It’s also important to make the code easy to set up and run. Here are the steps you’ll need to follow to build a good end-to-end project:
- Find an interesting topic
  - We won’t be working with a single static dataset, so you’ll need to find a topic instead
  - The topic should have publicly accessible data that is updated regularly
  - Some examples:
- Import and parse multiple datasets
  - Download as much available data as you’re comfortable working with
  - Read in the data
- Figure out what you want to predict
- Create predictions
  - Calculate any needed features
  - Assemble training and test data
  - Make predictions
- Clean up and document your code
  - Split your code into multiple files
  - Add a README file to explain how to install and run the project
  - Add inline documentation
  - Make the code easy to run from the command line
- Upload your project to GitHub
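Pulling the prediction and cleanup steps together, the following is a toy Python sketch of a command-line project in the spirit of the stock market example. Everything specific here is an assumption made for illustration: the `date,close` CSV layout, the sample prices, the single moving-average-of-returns feature, and the majority-vote "model". It is not a realistic trading strategy, but it shows features built only from past data, a time-ordered train/test split, and an `argparse` entry point. In a real project, these functions would be split across modules and documented in a README.

```python
import argparse
import csv
import io

# Hypothetical daily closing prices, used when no CSV path is passed in.
SAMPLE_CSV = """date,close
2023-01-03,100.0
2023-01-04,101.5
2023-01-05,100.8
2023-01-06,102.2
2023-01-09,103.0
2023-01-10,102.4
2023-01-11,103.9
2023-01-12,104.5
2023-01-13,103.7
2023-01-17,105.1
2023-01-18,106.0
2023-01-19,105.2
"""


def read_closes(source):
    """Read date,close rows from a CSV file object; return closes, oldest first."""
    rows = sorted(csv.DictReader(source), key=lambda row: row["date"])
    return [float(row["close"]) for row in rows]


def build_examples(closes, window=3):
    """Feature: mean of the previous `window` daily returns.
    Label: 1 if the next close is higher than the current one, else 0."""
    returns = [(today - yesterday) / yesterday
               for yesterday, today in zip(closes, closes[1:])]
    examples = []
    for i in range(window, len(returns)):
        feature = sum(returns[i - window:i]) / window  # uses past returns only
        label = 1 if returns[i] > 0 else 0
        examples.append((feature, label))
    return examples


def train(examples):
    """Toy model: for each sign of the feature, remember the majority label.
    Ties (and signs never seen in training) default to predicting 'up' (1)."""
    votes = {True: [0, 0], False: [0, 0]}  # feature > 0 -> [count of 0s, count of 1s]
    for feature, label in examples:
        votes[feature > 0][label] += 1
    return {positive: int(counts[1] >= counts[0]) for positive, counts in votes.items()}


def predict(model, feature):
    return model[feature > 0]


def main(argv=None):
    parser = argparse.ArgumentParser(description="Predict next-day price direction.")
    parser.add_argument("csv_path", nargs="?",
                        help="CSV file with date,close columns (default: built-in sample)")
    parser.add_argument("--window", type=int, default=3,
                        help="number of past daily returns to average")
    args = parser.parse_args(argv)

    if args.csv_path:
        with open(args.csv_path, newline="") as handle:
            closes = read_closes(handle)
    else:
        closes = read_closes(io.StringIO(SAMPLE_CSV))

    # Split by time rather than randomly, so testing uses only later days.
    examples = build_examples(closes, args.window)
    split = int(len(examples) * 0.7)
    train_set, test_set = examples[:split], examples[split:]

    model = train(train_set)
    correct = sum(predict(model, feature) == label for feature, label in test_set)
    accuracy = correct / len(test_set) if test_set else float("nan")
    print(f"train={len(train_set)} test={len(test_set)} accuracy={accuracy:.2f}")
    return accuracy


if __name__ == "__main__":
    main()
```

Saved as, say, `predict.py` (a hypothetical name), it could be run as `python predict.py prices.csv --window 5`, or with no arguments to use the built-in sample.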
If you need some inspiration, here are some examples of good end-to-end projects:
Project 4: Explanatory Post
It’s important to be able to understand and explain complex data science concepts, such as machine learning algorithms. This helps a hiring manager understand how good you’d be at communicating complex concepts to other team members and customers. This is a critical piece of a data science portfolio, as it covers a good portion of real-world data science work. This also shows that you understand concepts and how things work at a deep level, not just at a syntax level. This deep understanding is important in being able to justify your choices and walk others through your work.
In order to build an explanatory post, we’ll need to pick a data science topic to explain, then write up a blog post taking someone from the very ground level all the way up to having a working example of the concept. The key here is to use plain, simple language. The more academic you get, the harder it is for a hiring manager to tell if you actually understand the concept.
The important steps are to pick a topic you understand well, walk a reader through the concept, then do something interesting with the final concept. Here are the steps you’ll need to follow:
- Find a concept you know well or can learn
  - Machine learning algorithms like k-nearest neighbors are good concepts to pick.
  - Statistical concepts are also good to pick.
  - Make sure that the concept has enough nuance to spend some time explaining.
  - Make sure you fully understand the concept, and that it’s not too complex to explain well.
- Pick a dataset or “scaffold” to help you explain the concept.
  - For instance, if you pick k-nearest neighbors, you could explain it using NBA data (finding similar players).
- Create an outline of your post
  - Assume that the reader has no knowledge of the topic you’re explaining
  - Break the concept into small steps
    - For k-nearest neighbors, this might be:
      - Predicting using similarity
      - Measures of similarity
        - Euclidean distance
      - Finding a match using k=1
      - Finding a match with k > 1
- Write up your post
  - Explain everything in clear and straightforward language
  - Make sure to tie everything back to the “scaffold” you picked when possible
  - Try having someone non-technical read it, and gauge their reaction
- Share your post
  - Preferably post on your own blog
  - If not, upload to GitHub
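To show how the k-nearest neighbors outline might translate into a post’s code, here is a minimal Python sketch that follows it step by step: Euclidean distance as the measure of similarity, then predictions with k=1 and k=3. The player names and stats are made up rather than real NBA data, and predicting points per game from assists and rebounds is just one hypothetical framing of “finding similar players”.

```python
import math

# Hypothetical per-game stats: (assists, rebounds, points).
players = {
    "Player A": (8.0, 4.0, 22.0),
    "Player B": (2.0, 11.0, 15.0),
    "Player C": (7.5, 5.0, 25.0),
    "Player D": (3.0, 9.5, 12.0),
    "Player E": (6.0, 6.5, 18.0),
}


def euclidean(a, b):
    """Measure of similarity: straight-line distance between two stat vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, k):
    """Names of the k players whose (assists, rebounds) are closest to query."""
    ranked = sorted(players, key=lambda name: euclidean(query, players[name][:2]))
    return ranked[:k]


def predict_points(query, k):
    """Predict points per game as the mean over the k most similar players."""
    neighbors = nearest(query, k)
    return sum(players[name][2] for name in neighbors) / len(neighbors)


new_player = (7.0, 5.0)  # hypothetical player whose scoring we want to estimate
print("k=1:", nearest(new_player, 1), predict_points(new_player, 1))
print("k=3:", nearest(new_player, 3), round(predict_points(new_player, 3), 1))
```

Both features happen to be on similar scales here; with real data, you would usually normalize each column first so that one stat doesn’t dominate the distance.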
If you need some inspiration, here are some examples of good explanatory blog posts: