Series and DataFrame Methods You Need to Know

  • Last updated on February 7, 2025 at 10:55 PM

If you ever decide to work with a dataset that doesn't come with an instruction manual, you’re going to have questions. What does the data look like? Is it messy? What data is missing? Is it mostly numbers or text, and what kinds of statistics can we extract from it? Thankfully, pandas can help answer these questions in seconds with just a few commands. Think of it as your data detective toolkit, helping you make sense of everything before getting into deeper analysis.


Loading the Dataset

We’ll be working with data from the 2017 Fortune Global 500 list, which ranks the world’s largest companies by revenue. Once you’ve downloaded the dataset, here’s how to load it into pandas:

import pandas as pd

f500 = pd.read_csv("f500.csv", index_col=0)
f500.index.name = None

This dataset includes columns like:

  • rank: Global rank of the company
  • revenues: Revenue for the year (in millions)
  • industry: Industry in which the company operates
  • ceo: Name of the CEO

Now that the data is loaded, let’s look at the methods that will help us explore and get to know it.


Exploring DataFrames: Your First Look at the Data

While some of these methods might also work on Series objects, this section focuses on DataFrame methods to help you get an overview of your entire dataset. We'll take a look at Series methods after going over these useful DataFrame techniques.

1. .head(): Take a Peek

The first step is often just seeing what the data looks like. The DataFrame.head() method returns the first few rows (5 by default), giving you an immediate sense of what’s inside. If you want to see more (or fewer) rows, just pass a number as an argument to the method call, like f500.head(10) or f500.head(3).

f500.head()

Output:

              rank  revenues  revenue_change  profits    ...
Walmart          1    485873             0.8  13643.0    ...
State Grid       2    315199            -4.4   9571.3    ...
Sinopec Group    3    267518            -9.1   1257.9    ...
China National   4    262573           -12.3   1867.5    ...
Toyota Motor     5    254694             7.7  16899.3    ...

This is like peeking into a box of chocolates to see what flavors you’ve got before you start ripping into them. If you notice anything odd, you can investigate further.
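If you'd like to try .head() without downloading anything, here's a minimal sketch. The toy DataFrame and its values are made up for illustration and are not part of the Fortune 500 data.

```python
import pandas as pd

# Hypothetical stand-in for f500: seven made-up companies.
toy = pd.DataFrame(
    {
        "rank": [1, 2, 3, 4, 5, 6, 7],
        "revenues": [700, 600, 500, 400, 300, 200, 100],
    },
    index=["A", "B", "C", "D", "E", "F", "G"],
)

first_five = toy.head()           # default: the first 5 rows
first_two = toy.head(2)           # explicit row count
all_but_last_two = toy.head(-2)   # negative n: every row except the last 2

print(first_two)
```

Passing a negative number is a handy shortcut when you want "everything but the tail end" of a DataFrame.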


2. .info(): Structure and Data Health

What’s the overall structure of the data? Which columns have missing values? What data types are you working with? The DataFrame.info() method is your one-stop shop for this information.

f500.info()

Output:

<class 'pandas.core.frame.DataFrame'>
Index: 500 entries, Walmart to LG Electronics
Data columns (total 16 columns):
 #   Column                   Non-Null Count  Dtype
---  ------                   --------------  -----
 0   rank                     500 non-null    int64
 1   revenues                 500 non-null    int64
 2   revenue_change           498 non-null    float64
 3   profits                  499 non-null    float64
    ...

This quick overview helps you spot potential problems, like missing values or incorrect data types (e.g., a numeric column mistakenly stored as text).
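To see how you might act on what .info() reveals, here's a rough sketch on a made-up DataFrame. The column values are hypothetical, and isna() and pd.to_numeric() are just one common way to follow up, not the only one.

```python
import pandas as pd

# Hypothetical toy data: "revenues" was read in as text, and "profits" has a gap.
toy = pd.DataFrame(
    {
        "rank": [1, 2, 3],
        "revenues": ["485873", "315199", "267518"],  # numbers stored as strings
        "profits": [13643.0, None, 1257.9],          # one missing value
    }
)

toy.info()  # prints the non-null counts and dtypes for each column

# Count the missing values in each column.
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Convert the text column to a numeric dtype so math works on it.
toy["revenues"] = pd.to_numeric(toy["revenues"])
print(toy.dtypes)
```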


3. .describe(): Numeric Summaries

The DataFrame.describe() method summarizes numeric columns by default, providing statistics like mean, min, max, and standard deviation.

f500.describe()

Output:

             rank      revenues  revenue_change       profits    ...
count  500.000000    500.000000      498.000000    499.000000    ...
mean   250.500000   55416.35800        4.538353     3055.2032    ...
std    144.481833  45725.478963       28.549067   5171.981071    ...
min      1.000000  21609.000000      -67.300000   -13038.0000    ...
25%    125.750000  29003.000000       -5.900000    556.950000    ...
50%    250.500000  40236.000000        0.550000    1761.60000    ...
75%    375.250000  63926.750000        6.975000    3954.00000    ...
max    500.000000  485873.00000      442.300000   45687.00000    ...

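This is incredibly useful for identifying outliers or understanding the general scale of your data. For example, the standard deviation of revenues (about 45,725) is nearly as large as the mean (about 55,416), and the maximum (485,873) sits far above the 75th percentile (about 63,927). Together, those numbers tell you that company sizes vary widely and that a few giants tower over the rest.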
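Here's a small sketch of reading a .describe() summary for signs of an outlier. The numbers are invented for illustration, and comparing the mean to the median is only one informal check.

```python
import pandas as pd

# Hypothetical toy data: four similar values and one very large one.
toy = pd.DataFrame(
    {
        "revenues": [100, 120, 110, 130, 2000],
        "name": ["a", "b", "c", "d", "e"],  # text column, skipped by default
    }
)

summary = toy.describe()            # only numeric columns are summarized
revenue_stats = summary["revenues"]

# The mean (492.0) is far above the median (120.0), which hints that
# a large value is pulling the average up.
mean_above_median = revenue_stats["mean"] > revenue_stats["50%"]
print(summary)
```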


Exploring Series: Zooming In on Columns

Sometimes you don’t need to explore the whole dataset—you just want to focus on a single column. That’s where Series exploration comes in handy.

4. .describe() for Object Columns

Did you know that .describe() can summarize text-based (or object) columns as well? It provides stats like the number of unique values, the most frequently occurring value (top), and its count (freq).

f500["industry"].describe()

Output:

count                                500
unique                                58
top        Banks: Commercial and Savings
freq                                  51
Name: industry, dtype: object

In this example, you can see that the list spans 58 distinct industries and that the most common one is "Banks: Commercial and Savings," with 51 of the 500 companies in that category. This quick summary gives you a sense of the distribution without needing to write complex queries.
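The same summary is easy to reproduce on a tiny Series. The industry labels below are made up for illustration.

```python
import pandas as pd

# Hypothetical industry labels for five companies.
industry = pd.Series(
    ["Banks", "Retail", "Banks", "Energy", "Banks"], name="industry"
)

summary = industry.describe()

# Each statistic can be pulled out by its label.
print(summary["count"])   # number of non-missing values
print(summary["unique"])  # number of distinct values
print(summary["top"])     # most frequent value
print(summary["freq"])    # how often the top value appears
```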


Example: Exploring the country Column

Let’s say you want to explore which countries have the most companies on the list:

f500["country"].value_counts().head()

Output:

USA       132
China      73
Japan      52
Germany    30
France     29
Name: country, dtype: int64

As expected, the United States tops the list, followed by China and Japan. This insight could prompt further investigation into regional patterns in company revenue or profits.
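If you'd rather see shares than raw counts, .value_counts() accepts normalize=True. Here's a quick sketch on a made-up country column.

```python
import pandas as pd

# Hypothetical country labels for six companies.
country = pd.Series(
    ["USA", "China", "USA", "Japan", "USA", "China"], name="country"
)

counts = country.value_counts()                # sorted from most to least common
shares = country.value_counts(normalize=True)  # proportions that sum to 1

print(counts)
print(shares)
```

In this toy data, "USA" appears in 3 of 6 rows, so its share is 0.5.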


Why pandas Is So Fast

Here’s a fun fact: pandas is built on top of NumPy, which is designed to handle large datasets efficiently. This means pandas can apply vectorized operations—performing computations on entire Series or DataFrames without needing loops.

For example, if we wanted to calculate the change in company ranks:

f500["previous_rank"] - f500["rank"]

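This subtraction is applied to every row in a single operation, with no explicit Python loop: pandas matches the values by index label and runs the arithmetic in optimized compiled code. This is a big reason why pandas is so powerful when working with large datasets.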
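To make the idea concrete, here's a sketch that compares the vectorized subtraction with an equivalent hand-written loop. The ranks are made up, and the loop is shown only for contrast.

```python
import pandas as pd

# Hypothetical ranks for three companies.
toy = pd.DataFrame(
    {"rank": [1, 2, 3], "previous_rank": [1, 4, 2]},
    index=["A", "B", "C"],
)

# Vectorized: one expression covers every row.
rank_change = toy["previous_rank"] - toy["rank"]

# Equivalent explicit loop: same result, more code, slower on large data.
looped = pd.Series(
    [prev - cur for prev, cur in zip(toy["previous_rank"], toy["rank"])],
    index=toy.index,
)

print(rank_change.tolist())  # [0, 2, -1]
print(looped.tolist())       # [0, 2, -1]
```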


Understanding Axis Behavior

The axis parameter is a key option in many pandas methods, specifying the direction in which operations are applied. Here’s a quick summary:

  • axis=0: Apply the operation down the rows (column-wise)
  • axis=1: Apply the operation across the columns (row-wise)

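For example, to sum the values in each column (numeric_only=True skips text columns such as industry and ceo, which can't be added meaningfully):

f500.sum(axis=0, numeric_only=True)

To sum across the columns of each row:

f500.sum(axis=1, numeric_only=True)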

Keep this distinction in mind—it’ll save you from head-scratching moments when debugging your code!
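A tiny all-numeric example makes the two directions easy to check by hand. The column names q1 and q2 and their values are hypothetical.

```python
import pandas as pd

# Hypothetical quarterly figures for three companies.
toy = pd.DataFrame(
    {"q1": [1, 2, 3], "q2": [10, 20, 30]},
    index=["A", "B", "C"],
)

column_totals = toy.sum(axis=0)  # one total per column: q1=6, q2=60
row_totals = toy.sum(axis=1)     # one total per row: A=11, B=22, C=33

print(column_totals)
print(row_totals)
```

Notice that axis=0 returns a result labeled by column names, while axis=1 returns one labeled by the row index.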


Final Thoughts: Practice Makes Perfect

These exploration methods are just the tip of the iceberg, but they’re enough to help you quickly understand the shape and structure of your data. Try them out on other datasets and see what insights you can uncover!

Happy coding, and keep experimenting!