
Series and DataFrame Methods You Need to Know
- Last updated on February 7, 2025 at 10:55 PM
If you ever decide to work with a dataset that doesn't come with an instruction manual, you’re going to have questions. What does the data look like? Is it messy? What data is missing? Is it mostly numbers or text, and what kinds of statistics can we extract from it? Thankfully, pandas can help answer these questions in seconds with just a few commands. Think of it as your data detective toolkit, helping you make sense of everything before getting into deeper analysis.
Loading the Dataset
We’ll be working with data from the 2017 Fortune Global 500 list, which ranks the world’s largest companies by revenue. Once you’ve downloaded the dataset, here’s how to load it into pandas:
import pandas as pd
f500 = pd.read_csv("f500.csv", index_col=0)
f500.index.name = None
This dataset includes columns like:
- rank: Global rank of the company
- revenues: Revenue for the year (in millions)
- industry: Industry in which the company operates
- ceo: Name of the CEO
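If you don't have the CSV on hand yet, here is a minimal sketch of the same loading pattern on a tiny in-memory stand-in. The csv_text variable, the "company" header, and the choice of three rows are assumptions made for illustration; only read_csv, index_col, and index.name come from the snippet above.

```python
import io

import pandas as pd

# Hypothetical stand-in for f500.csv: the first column holds company names.
csv_text = """company,rank,revenues,country
Walmart,1,485873,USA
State Grid,2,315199,China
Toyota Motor,5,254694,Japan
"""

# index_col=0 turns the first column into the row labels.
sample = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(sample.index.name)  # the index keeps that column's header as its name

# Clearing the name removes the extra header line when the frame is printed.
sample.index.name = None
print(sample)
```

Using company names as the index is what lets the outputs later in this post show a name at the start of each row.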
Now that the data is loaded, let’s look at the methods that will help us explore and get to know it.
Exploring DataFrames: Your First Look at the Data
While some of these methods might also work on Series objects, this section focuses on DataFrame methods to help you get an overview of your entire dataset. We'll take a look at Series methods after going over these useful DataFrame techniques.
1. .head(): Take a Peek
The first step is often just seeing what the data looks like. The DataFrame.head() method returns the first few rows (5 by default), giving you an immediate sense of what’s inside. If you want to see more (or fewer) rows, just pass a number as an argument to the method call, like f500.head(10) or f500.head(3).
f500.head()
Output:
rank revenues revenue_change profits ...
Walmart 1 485873 0.8 13643.0 ...
State Grid 2 315199 -4.4 9571.3 ...
Sinopec Group 3 267518 -9.1 1257.9 ...
China National 4 262573 -12.3 1867.5 ...
Toyota Motor 5 254694 7.7 16899.3 ...
This is like peeking into a box of chocolates to see what flavors you’ve got before you start ripping into them. If you notice anything odd, you can investigate further.
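To try the row-count argument without the real dataset, here is a small sketch on a made-up ten-row frame. The toy DataFrame and its column names are hypothetical; tail() is not covered above, but it is the standard counterpart that reads from the end.

```python
import pandas as pd

# Hypothetical toy frame with ten rows, standing in for f500.
toy = pd.DataFrame({"rank": range(1, 11), "score": [x * 1.5 for x in range(10)]})

first_five = toy.head()    # default: first 5 rows
first_three = toy.head(3)  # explicit row count
last_two = toy.tail(2)     # tail() mirrors head(), reading from the end

print(first_three)
print(last_two)
```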
2. .info(): Structure and Data Health
What’s the overall structure of the data? Which columns have missing values? What data types are you working with? The DataFrame.info() method is your one-stop shop for this information.
f500.info()
Output:
<class 'pandas.core.frame.DataFrame'>
Index: 500 entries, Walmart to LG Electronics
Data columns (total 16 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 rank 500 non-null int64
1 revenues 500 non-null int64
2 revenue_change 498 non-null float64
3 profits 499 non-null float64
...
This quick overview helps you spot potential problems, like missing values or incorrect data types (e.g., a numeric column mistakenly stored as text).
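Once .info() has flagged a problem, you'll usually want the same facts as values you can act on. The sketch below is one possible follow-up on a made-up frame: the toy data, the missing profit value, and the text-typed revenues column are all assumptions for illustration, not taken from f500.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical toy frame: one gap in "profits", and "revenues" stored as text.
toy = pd.DataFrame(
    {
        "revenues": ["485873", "315199", "267518"],
        "profits": [13643.0, np.nan, 1257.9],
    },
    index=["Walmart", "State Grid", "Sinopec Group"],
)

# info() prints to stdout by default; buf= captures the report as a string.
buffer = io.StringIO()
toy.info(buf=buffer)
print(buffer.getvalue())

# Missing values per column, as numbers rather than a printed report.
missing = toy.isna().sum()
print(missing)

# Convert the text column to a numeric dtype.
toy["revenues"] = pd.to_numeric(toy["revenues"])
print(toy.dtypes)
```

Counting with isna().sum() gives the complement of the Non-Null Count column that .info() prints.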
3. .describe(): Numeric Summaries
The DataFrame.describe() method summarizes numeric columns by default, providing statistics like mean, min, max, and standard deviation.
f500.describe()
Output:
rank revenues revenue_change profits ...
count 500.000000 500.000000 498.000000 499.000000 ...
mean 250.500000 55416.35800 4.538353 3055.2032 ...
std 144.481833 45725.478963 28.549067 5171.981071 ...
min 1.000000 21609.000000 -67.300000 -13038.0000 ...
25% 125.750000 29003.000000 -5.900000 556.950000 ...
50% 250.500000 40236.000000 0.550000 1761.60000 ...
75% 375.250000 63926.750000 6.975000 3954.00000 ...
max 500.000000 485873.00000 442.300000 45687.00000 ...
This is incredibly useful for identifying outliers or understanding the general scale of your data. For example, if the standard deviation for revenues is very high, it suggests that company sizes vary widely.
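Because .describe() returns a DataFrame, you can pull out individual statistics by label instead of reading them off the printout. Here is a rough sketch using only the five revenue figures shown in the .head() output earlier; comparing the standard deviation to the mean is just one informal way to judge whether the spread is "high", not an official rule.

```python
import pandas as pd

# Revenues of the first five companies from the head() output shown earlier.
toy = pd.DataFrame({"revenues": [485873, 315199, 267518, 262573, 254694]})

summary = toy.describe()
print(summary)

# Look up single statistics by row label and column name.
mean_rev = summary.loc["mean", "revenues"]
std_rev = summary.loc["std", "revenues"]

# Spread relative to the mean: a rough gauge of how widely values vary.
print(std_rev / mean_rev)
```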
Exploring Series: Zooming In on Columns
Sometimes you don’t need to explore the whole dataset—you just want to focus on a single column. That’s where Series exploration comes in handy.
4. .describe() for Object Columns
Did you know that .describe() can summarize text-based (or object) columns as well? It provides stats like the number of unique values, the most frequently occurring value (top), and its count (freq).
f500["industry"].describe()
Output:
count 500
unique 58
top Banks: Commercial and Savings
freq 51
Name: industry, dtype: object
In this example, you can see that the most common industry among the top 500 companies is Banking, with 51 companies in that category. This quick summary gives you a sense of the distribution without needing to write complex queries.
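The summary of a text column is itself a Series, so each statistic can be read by name. The sketch below uses six made-up industry labels (not the real f500 values) to show the idea.

```python
import pandas as pd

# Hypothetical industry labels for six companies.
industries = pd.Series(
    ["Banks", "Banks", "Banks", "Retail", "Energy", "Energy"],
    name="industry",
)

summary = industries.describe()
print(summary)

# The result is indexed by statistic name: count, unique, top, freq.
print(summary["top"], summary["freq"])
```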
Example: Exploring the country Column
Let’s say you want to explore which countries have the most companies on the list:
f500["country"].value_counts().head()
Output:
USA 132
China 73
Japan 52
Germany 30
France 29
Name: country, dtype: int64
As expected, the United States tops the list, followed by China and Japan. This insight could prompt further investigation into regional patterns in company revenue or profits.
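Raw counts are handy, but shares are often easier to compare. As a small sketch on made-up country labels (the eight values below are hypothetical), value_counts accepts normalize=True to return fractions instead of counts.

```python
import pandas as pd

# Hypothetical country labels for eight companies.
countries = pd.Series(
    ["USA", "USA", "USA", "China", "China", "Japan", "USA", "China"],
    name="country",
)

counts = countries.value_counts()                # sorted, largest first
shares = countries.value_counts(normalize=True)  # fractions that sum to 1

print(counts)
print(shares)
```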
Why pandas Is So Fast
Here’s a fun fact: pandas is built on top of NumPy, which stores data in typed arrays and runs operations on them in compiled code. This lets pandas apply vectorized operations: computations on an entire Series or DataFrame at once, with no explicit Python loop.
For example, if we wanted to calculate the change in company ranks:
f500["previous_rank"] - f500["rank"]
This subtraction happens all at once, thanks to vectorized operations. This is a big reason why pandas is so powerful when working with large datasets.
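To see what "all at once" replaces, here is a sketch that computes the same rank change two ways on a made-up three-row frame. The company labels and rank values are hypothetical, and the loop version is shown only for comparison.

```python
import pandas as pd

# Hypothetical ranks for three companies.
toy = pd.DataFrame(
    {"previous_rank": [1, 2, 8], "rank": [1, 2, 5]},
    index=["A", "B", "C"],
)

# Vectorized: one expression, matched up by index label, no Python-level loop.
vectorized = toy["previous_rank"] - toy["rank"]

# Loop equivalent: walks the values one pair at a time.
looped = pd.Series(
    [prev - cur for prev, cur in zip(toy["previous_rank"], toy["rank"])],
    index=toy.index,
)

print(vectorized)
print(vectorized.equals(looped))
```

With this subtraction order, a positive result means the company moved up the list since the previous ranking.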
Understanding Axis Behavior
The axis parameter is a key option in many pandas methods, specifying the direction in which operations are applied:
- axis=0: Apply the operation down the rows, producing one result per column (column-wise)
- axis=1: Apply the operation across the columns, producing one result per row (row-wise)
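For example, to sum the values in each numeric column (f500 also contains text columns such as ceo, so we restrict the sum to numeric ones):
f500.sum(axis=0, numeric_only=True)
To sum the numeric values across each row:
f500.sum(axis=1, numeric_only=True)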
Keep this distinction in mind—it’ll save you from head-scratching moments when debugging your code!
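If the two directions still blur together, a tiny all-numeric frame makes the difference easy to check by hand. The toy frame below is hypothetical; the values are chosen so each total is obvious at a glance.

```python
import pandas as pd

# Hypothetical 2x3 numeric frame.
toy = pd.DataFrame(
    {"a": [1, 2], "b": [10, 20], "c": [100, 200]},
    index=["row1", "row2"],
)

col_totals = toy.sum(axis=0)  # collapses the rows: one value per column
row_totals = toy.sum(axis=1)  # collapses the columns: one value per row

print(col_totals)
print(row_totals)
```

One way to remember it: the axis you pass is the one that disappears from the result.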
Final Thoughts: Practice Makes Perfect
These exploration methods are just the tip of the iceberg, but they’re enough to help you quickly understand the shape and structure of your data. Try them out on other datasets and see what insights you can uncover!
Want to learn more about pandas? Check out our full pandas fundamentals lesson or enroll in the Junior Data Analyst path to build on your skills.
Happy coding, and keep experimenting!