Learn Working with Missing Data | Data Cleaning
Introduction to Pandas with AI

Working with Missing Data

AI in Action

import pandas as pd

df = pd.read_csv("passengers.csv")

# Count missing values per column
print(df.isna().sum())
# Fill missing ages with the median age
df["Age"] = df["Age"].fillna(df["Age"].median())

Detecting Missing Data

In pandas, missing values are usually represented as NaN ("Not a Number"); None and, for datetime columns, NaT are treated as missing as well. To detect them, use the .isna() and .notna() methods, which return boolean masks of the same shape as the data. You can also count how many values are missing in each column and filter the rows where a specific column is missing.

import pandas as pd

df = pd.read_csv("https://staging-content-media-cdn.codefinity.com/courses/64641555-cae4-4cd0-8d29-807aeb6bc0c4/datasets/passengers.csv")

# Detect missing values
print(df.isna())

# Count missing values per column
print(df.isna().sum())

# Rows with missing age
print(df[df["Age"].isna()])
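The text mentions .notna() but only .isna() appears in the lesson code. The sketch below shows both on a small hand-made DataFrame, so it runs without downloading anything. The data is hypothetical; only the column names mirror the passengers dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the real passengers dataset)
df = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, np.nan],
    "Cabin": ["C85", None, None, "E46"],
})

missing_mask = df.isna()    # True where a value is missing
present_mask = df.notna()   # True where a value is present (inverse of isna)

# Summing a boolean mask counts the True values per column
missing_per_column = df.isna().sum()

# Boolean masks can be used to filter rows
rows_with_age = df[df["Age"].notna()]
rows_missing_age = df[df["Age"].isna()]

print(missing_per_column)
print(rows_with_age)
```

Filtering with .notna() keeps the rows you can work with, while filtering with .isna() isolates the rows that still need attention.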

Removing Missing Data

The simplest way to deal with missing values is to remove the rows or columns that contain them. pandas provides the .dropna() method for this:

import pandas as pd

df = pd.read_csv("https://staging-content-media-cdn.codefinity.com/courses/64641555-cae4-4cd0-8d29-807aeb6bc0c4/datasets/passengers.csv")

# Drop rows with any missing values
print(df.dropna())

# Drop columns that contain missing values
print(df.dropna(axis=1))

The .dropna() method returns a new DataFrame and leaves the original unchanged. To keep the result, assign it back to a variable:

df = df.dropna()
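Dropping every row that has any missing value can remove far more data than intended. As a rough sketch, the subset, how, and thresh parameters of .dropna() let you narrow what gets removed. The toy DataFrame below is hypothetical and only borrows column names from the passengers dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: Cabin is missing in every row
df = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0],
    "Fare": [7.25, 71.28, np.nan],
    "Cabin": [None, None, None],
})

# Default behavior: drop rows with any missing value (here, every row)
all_dropped = df.dropna()

# Drop rows only when Age is missing
age_known = df.dropna(subset=["Age"])

# Drop only the columns in which every value is missing
no_empty_columns = df.dropna(axis=1, how="all")

# Keep rows that have at least 2 non-missing values
mostly_complete = df.dropna(thresh=2)

print(len(all_dropped), len(age_known), len(mostly_complete))
print(list(no_empty_columns.columns))
```

None of these calls modifies df itself; each returns a new DataFrame, so df still has its original three rows and three columns afterwards.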

Filling Missing Data

Deleting rows or columns also discards the valid values they contain, which can mean losing useful information. To avoid this, you can fill the missing values with the .fillna() method instead:

import pandas as pd

df = pd.read_csv("https://staging-content-media-cdn.codefinity.com/courses/64641555-cae4-4cd0-8d29-807aeb6bc0c4/datasets/passengers.csv")

# Fill with a statistical value
df["Fare"] = df["Fare"].fillna(df["Fare"].mean())
# mode() returns a Series, so take its first element to get a single value
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])
df["Age"] = df["Age"].fillna(df["Age"].median())

# Fill with a fixed value
df["Cabin"] = df["Cabin"].fillna("Unknown")
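One detail is worth checking when filling with the most frequent value. Unlike .mean() and .median(), Series.mode() returns a Series (there can be several modes), and .fillna() aligns a Series argument by index rather than broadcasting it. The sketch below uses a hypothetical toy Series to show the difference between passing the Series and passing its first element.

```python
import pandas as pd

# Hypothetical toy column with two missing values
embarked = pd.Series(["S", None, "C", "S", None], name="Embarked")

most_frequent = embarked.mode()  # a Series whose label 0 holds "S"

# Passing the Series aligns by index: only a missing value at label 0
# could be filled, so the gaps at labels 1 and 4 stay missing
filled_by_series = embarked.fillna(embarked.mode())

# Passing a scalar fills every missing value
filled_by_scalar = embarked.fillna(embarked.mode()[0])

print(filled_by_series.isna().sum())
print(filled_by_scalar.tolist())
```

When a column has several equally frequent values, .mode()[0] picks the first of them in sorted order, which is an arbitrary but deterministic choice.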
Note

Before filling missing values, always review how many there are and in which columns they appear. Unchecked filling can introduce incorrect information into your dataset.
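One possible way to do this review is to put the count and the share of missing values side by side for each column. The sketch below uses a hypothetical toy DataFrame, and the 50% cutoff (max_share) is an arbitrary illustration, not a recommended rule.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the real passengers dataset)
df = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, 58.0],
    "Cabin": [None, None, None, "E46"],
    "Fare": [7.25, 71.28, 8.05, 26.55],
})

# The mean of a boolean mask is the fraction of True values
summary = pd.DataFrame({
    "missing": df.isna().sum(),
    "share": df.isna().mean(),
})
print(summary)

# Assumed cutoff for illustration: only consider filling columns
# that have some missing values but no more than half
max_share = 0.5
has_gaps = summary["share"] > 0
below_cutoff = summary["share"] <= max_share
fillable = summary.index[has_gaps & below_cutoff].tolist()
print(fillable)
```

Here Cabin is mostly missing, so filling it with a single value would make most of the column artificial; a column like that may be better dropped or kept with an explicit placeholder such as "Unknown".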

1. Which method returns a boolean mask showing where data is missing?

2. How do you drop columns that contain any missing values?

3. What does this code do?

df["Age"] = df["Age"].fillna(df["Age"].median())

Section 2. Chapter 3

