Working with Missing Data
AI in Action
import pandas as pd
df = pd.read_csv("passengers.csv")
print(df.isna().sum())
df["Age"] = df["Age"].fillna(df["Age"].median())
Detecting Missing Data
In pandas, missing values are usually represented as NaN ("Not a Number"); None and NaT (for datetimes) are treated as missing as well. To detect them, use the .isna() and .notna() methods, which return a boolean mask with the same shape as the data. From that mask you can count how many values are missing in each column and filter the rows where a specific value is missing.
import pandas as pd
df = pd.read_csv("https://staging-content-media-cdn.codefinity.com/courses/64641555-cae4-4cd0-8d29-807aeb6bc0c4/datasets/passengers.csv")
# Detect missing values
print(df.isna())
# Count missing values per column
print(df.isna().sum())
# Rows with missing age
print(df[df["Age"].isna()])
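The same detection steps can be tried without downloading anything. The sketch below uses a small, made-up DataFrame (the name toy and its values are assumptions for illustration, not part of the passengers dataset) and also shows that .notna() is the inverse of .isna().

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the passengers dataset
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, np.nan],
    "Cabin": ["C85", None, None, "E46"],
})

# Boolean mask: True where a value is missing
missing_mask = toy.isna()

# notna() marks the values that are present; it is the inverse of isna()
present_mask = toy.notna()

# Count missing values per column (True counts as 1 when summed)
missing_per_column = toy.isna().sum()

# Keep only the rows where Age is missing
rows_missing_age = toy[toy["Age"].isna()]

print(missing_per_column)
print(rows_missing_age)
```

Both np.nan and None are reported as missing here, so the mask does not depend on how the gap was originally written.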
Removing Missing Data
A simple way to deal with missing values is to remove them from a dataset. For this, pandas has the .dropna() method:
import pandas as pd
df = pd.read_csv("https://staging-content-media-cdn.codefinity.com/courses/64641555-cae4-4cd0-8d29-807aeb6bc0c4/datasets/passengers.csv")
# Drop rows with any missing values
print(df.dropna())
# Drop columns that contain missing values
print(df.dropna(axis=1))
The .dropna() method returns a new DataFrame and leaves the original unchanged. To keep the result, assign it back to a variable:
df = df.dropna()
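Dropping every row that has any gap can remove more data than intended. As a rough sketch on made-up data (toy and its values are assumptions for illustration), the subset, how, and thresh parameters of .dropna() narrow down which rows are removed:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the last row is missing everywhere
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, np.nan],
    "Fare": [7.25, 71.28, np.nan, np.nan],
    "Cabin": ["C85", "B28", None, None],
})

# Drop rows only when Age is missing; gaps in other columns are tolerated
rows_with_age = toy.dropna(subset=["Age"])

# Drop rows only when every value in the row is missing
not_fully_empty = toy.dropna(how="all")

# Keep rows that have at least 2 non-missing values
at_least_two = toy.dropna(thresh=2)

print(rows_with_age)
print(not_fully_empty)
print(at_least_two)
```

None of these calls modifies toy itself; each one returns a new DataFrame.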
Filling Missing Data
Deleting rows or columns also discards the valid values they contain. To avoid this, you can fill the missing values using the .fillna() method instead:
import pandas as pd
df = pd.read_csv("https://staging-content-media-cdn.codefinity.com/courses/64641555-cae4-4cd0-8d29-807aeb6bc0c4/datasets/passengers.csv")
# Fill with statistical value
df["Fare"] = df["Fare"].fillna(df["Fare"].mean())
# mode() returns a Series, so take its first element to get a single value
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])
df["Age"] = df["Age"].fillna(df["Age"].median())
# Fill with fixed value
df["Cabin"] = df["Cabin"].fillna("Unknown")
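One detail is worth checking when filling with the most frequent value: unlike .mean() and .median(), .mode() returns a Series, because several values can tie for most frequent. The sketch below uses a made-up Series (ports and its values are assumptions for illustration) to show the difference between passing that Series to .fillna() and passing its first element.

```python
import pandas as pd

# Hypothetical embarkation ports with two gaps
ports = pd.Series(["S", "C", None, "S", None], name="Embarked")

# mode() returns a Series, even when there is a single most frequent value
most_common = ports.mode()

# Passing a Series to fillna() fills by matching index labels,
# so gaps at labels the mode Series does not have stay missing
filled_by_series = ports.fillna(most_common)

# Taking the first element gives a scalar that fills every gap
filled_by_scalar = ports.fillna(most_common[0])

print(filled_by_series.isna().sum())
print(filled_by_scalar.isna().sum())
```

If several values tie, most_common[0] simply picks the first one in sorted order, which is a convention rather than a statistically motivated choice.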
Before filling missing values, always review how many there are and in which columns they appear. Unchecked filling can introduce incorrect information into your dataset.
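A quick way to do that review is to put the count and the share of missing values side by side. This is a minimal sketch on made-up data (toy, summary, and the column values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: Age has one gap, Cabin is mostly empty
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, 41.0],
    "Cabin": [None, None, None, "E46"],
})

# Count and percentage of missing values per column
summary = pd.DataFrame({
    "missing": toy.isna().sum(),
    "percent": toy.isna().mean() * 100,
})

print(summary)
```

A column with only a few gaps, like Age here, is a reasonable candidate for a statistical fill. A column that is mostly empty, like Cabin here, may be better served by a fixed placeholder such as "Unknown", since a statistic computed from very few values would be written into most of the rows.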
1. Which method returns a boolean mask showing where data is missing?
2. How do you drop columns that contain any missing values?
3. What does this code do?
df["Age"] = df["Age"].fillna(df["Age"].median())