Working with Duplicates
AI in Action
import pandas as pd
df = pd.read_csv("passengers.csv")
# Count rows that repeat an earlier row
print(df.duplicated().sum())
# Keep only the first occurrence of each row
df = df.drop_duplicates()
Detecting Duplicates
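You can check for duplicates in a DataFrame using the .duplicated() method. It returns a boolean Series that is True for every row repeating an earlier row. By default, the first occurrence is not flagged. Summing that Series counts the duplicate rows.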
import pandas as pd

df = pd.read_csv("https://staging-content-media-cdn.codefinity.com/courses/64641555-cae4-4cd0-8d29-807aeb6bc0c4/datasets/passengers.csv")

# Check which rows are duplicates
print(df.duplicated())

# Count duplicate rows
print(df.duplicated().sum())
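To see exactly which rows get flagged, here is a minimal sketch on a small in-memory DataFrame. The data and column names are hypothetical stand-ins for passengers.csv. It also shows the keep parameter of .duplicated(), which controls which occurrence counts as the "original".

```python
import pandas as pd

# Hypothetical stand-in for passengers.csv (values are made up)
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Ann", "Cid", "Bob"],
    "Ticket": ["A1", "B2", "A1", "C3", "B2"],
})

# Default keep="first": first occurrences are False, later repeats are True
first_flags = df.duplicated()
print(first_flags.tolist())  # [False, False, True, False, True]

# keep="last": the last occurrence is kept, earlier ones are flagged
last_flags = df.duplicated(keep="last")
print(last_flags.tolist())  # [True, True, False, False, False]

# keep=False: every row that has a twin is flagged
all_flags = df.duplicated(keep=False)
print(all_flags.tolist())  # [True, True, True, False, True]

# Summing a boolean Series counts its True values
print(int(first_flags.sum()))  # 2
```

keep=False is handy when you want to inspect every copy of a duplicated row before deciding which one to drop.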
By default, pandas compares all columns when identifying duplicates. To check for duplicates within specific columns only, pass them to the subset parameter:
import pandas as pd

df = pd.read_csv("https://staging-content-media-cdn.codefinity.com/courses/64641555-cae4-4cd0-8d29-807aeb6bc0c4/datasets/passengers.csv")

print(df.duplicated(subset=["Ticket"]).sum())
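A subset check can find repeats that a full-row check misses. In this sketch (hypothetical data), two different passengers share one ticket: no row is an exact duplicate, yet the Ticket column alone contains a repeated value.

```python
import pandas as pd

# Hypothetical data: Ann and Bob share ticket "A1"
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid"],
    "Ticket": ["A1", "A1", "C3"],
})

# No row repeats across all columns...
full_count = int(df.duplicated().sum())
# ...but the Ticket column has one repeated value
ticket_count = int(df.duplicated(subset=["Ticket"]).sum())

print(full_count)    # 0
print(ticket_count)  # 1

# Show every row involved in a Ticket clash, not just the later repeats
clashes = df[df.duplicated(subset=["Ticket"], keep=False)]
print(clashes)
```

Whether a shared ticket is an error or legitimate (for example, a group booking) depends on the dataset, so inspect the clashing rows before removing anything.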
Removing Duplicates
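After you confirm that the duplicate rows shouldn't remain, remove them with the .drop_duplicates() method. It returns a new DataFrame and leaves the original unchanged, so reassign the result (df = df.drop_duplicates()) if you want to keep it: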
import pandas as pd

df = pd.read_csv("https://staging-content-media-cdn.codefinity.com/courses/64641555-cae4-4cd0-8d29-807aeb6bc0c4/datasets/passengers.csv")

# Remove duplicate rows
print(df.drop_duplicates())

# Remove duplicates based only on values in a subset
print(df.drop_duplicates(subset=["Ticket"]))
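The sketch below uses a tiny hypothetical DataFrame to show that .drop_duplicates() does not modify the original, how keep chooses which copy survives, and how ignore_index renumbers the result. ignore_index requires pandas 1.0 or later.

```python
import pandas as pd

# Hypothetical data: the row at index 2 repeats the row at index 0
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Ann"],
    "Ticket": ["A1", "B2", "A1"],
})

# drop_duplicates() returns a new DataFrame; df itself still has 3 rows
deduped = df.drop_duplicates()
print(len(df), len(deduped))  # 3 2

# keep="last" keeps the later copy, so index 0 is dropped instead of index 2
deduped_last = df.drop_duplicates(keep="last")
print(deduped_last.index.tolist())  # [1, 2]

# ignore_index=True renumbers the surviving rows 0..n-1
renumbered = df.drop_duplicates(keep="last", ignore_index=True)
print(renumbered.index.tolist())  # [0, 1]

# Reassign to keep the cleaned data
df = df.drop_duplicates()
```

Reassigning is usually clearer than passing inplace=True, because the data flow stays visible in the code.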
Counting Unique Values
To check how many distinct values each column has, use the .nunique() method:
import pandas as pd

df = pd.read_csv("https://staging-content-media-cdn.codefinity.com/courses/64641555-cae4-4cd0-8d29-807aeb6bc0c4/datasets/passengers.csv")

# Count unique values for each column
print(df.nunique())

# Count unique values for a single column
print(df["Embarked"].nunique())
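This helps you spot columns with only a few categories, or verify whether an ID column is truly unique (its count of distinct values equals its number of rows). Missing values are not counted by default.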
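Here is a minimal sketch of both checks on hypothetical data. The PassengerId column name is an assumption, not necessarily a column in passengers.csv. The sketch also shows how dropna=False changes the count when a column has missing values.

```python
import pandas as pd

# Hypothetical passenger table; "PassengerId" and all values are assumptions
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 3, 5],
    "Embarked": ["S", "C", "S", None, "Q"],
})

ids = df["PassengerId"]

# An ID column is unique when its distinct count equals its row count
print(ids.nunique() == len(ids))  # False

# Series.is_unique performs the same check directly
print(ids.is_unique)  # False

# List every row whose ID appears more than once
print(df[ids.duplicated(keep=False)])

# Missing values are skipped by default; dropna=False counts them as a value
embarked_distinct = df["Embarked"].nunique()
embarked_with_na = df["Embarked"].nunique(dropna=False)
print(embarked_distinct, embarked_with_na)  # 3 4

# value_counts() lists each category with its frequency
print(df["Embarked"].value_counts())
```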
1. What does df.duplicated() return?
2. How can you remove all duplicate rows from a DataFrame?
3. How would you count the number of unique elements in the "Class" column?