Working with Duplicates
AI in Action
import pandas as pd
df = pd.read_csv("passengers.csv")
print(df.duplicated().sum())
df = df.drop_duplicates()
Detecting Duplicates
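Use the .duplicated() method to check for duplicate rows in a DataFrame. It returns a boolean Series that marks each row repeating an earlier row as True, while the first occurrence is marked False by default. Because True counts as 1, calling .sum() on that Series gives the number of duplicate rows.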
import pandas as pd
df = pd.read_csv("https://staging-content-media-cdn.codefinity.com/courses/64641555-cae4-4cd0-8d29-807aeb6bc0c4/datasets/passengers.csv")
# Check which rows are duplicates
print(df.duplicated())
# Count duplicate rows
print(df.duplicated().sum())
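Which occurrence counts as "the duplicate" is controlled by the keep parameter of .duplicated(). The sketch below uses a small hypothetical DataFrame in place of passengers.csv (the names and tickets are made up) so that the three keep options can be compared side by side.

```python
import pandas as pd

# Hypothetical stand-in for passengers.csv: rows 2 and 4 repeat rows 0 and 1
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Ann", "Cid", "Bob"],
    "Ticket": ["A1", "B2", "A1", "C3", "B2"],
})

# keep="first" (the default): repeats after the first occurrence are True
first = df.duplicated()
# keep="last": every occurrence except the last one is True
last = df.duplicated(keep="last")
# keep=False: every row that has an identical twin anywhere is True
all_copies = df.duplicated(keep=False)

print(first.tolist())       # [False, False, True, False, True]
print(last.tolist())        # [True, True, False, False, False]
print(all_copies.tolist())  # [True, True, True, False, True]
print(first.sum())          # 2
```

keep=False is handy for inspection: df[df.duplicated(keep=False)] shows every copy of each duplicated row, not just the extra ones.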
By default, pandas compares all columns when identifying duplicates. To treat rows as duplicates when only certain columns match, pass a list of those column names to the subset parameter:
import pandas as pd
df = pd.read_csv("https://staging-content-media-cdn.codefinity.com/courses/64641555-cae4-4cd0-8d29-807aeb6bc0c4/datasets/passengers.csv")
# Count rows whose Ticket value already appeared in an earlier row
print(df.duplicated(subset=["Ticket"]).sum())
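A subset check usually flags more rows than a full-row check, because the other columns no longer have to match. The sketch below illustrates this on hypothetical data (made-up names and tickets), where two different passengers share ticket "T1".

```python
import pandas as pd

# Hypothetical rows: Ann and Ben share ticket T1; the two Cal rows are identical
df = pd.DataFrame({
    "Name": ["Ann", "Ben", "Cal", "Cal"],
    "Ticket": ["T1", "T1", "T2", "T2"],
})

full_row_dups = df.duplicated().sum()                 # all columns must match
ticket_dups = df.duplicated(subset=["Ticket"]).sum()  # only Ticket must match
print(full_row_dups, ticket_dups)  # 1 2

# Show every row that shares its ticket with at least one other row
print(df[df.duplicated(subset=["Ticket"], keep=False)])
```

A repeated value in a subset is not automatically an error; a shared ticket, for example, may be legitimate, so inspect such rows before removing them.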
Removing Duplicates
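Once you have confirmed that the duplicate rows should not be kept, remove them with the .drop_duplicates() method. By default it keeps the first occurrence of each row and returns a new DataFrame, so assign the result (for example, df = df.drop_duplicates()) if you want to keep the change: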
import pandas as pd
df = pd.read_csv("https://staging-content-media-cdn.codefinity.com/courses/64641555-cae4-4cd0-8d29-807aeb6bc0c4/datasets/passengers.csv")
# Remove duplicate rows
print(df.drop_duplicates())
# Remove duplicates based only on values in a subset
print(df.drop_duplicates(subset=["Ticket"]))
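The sketch below, on a small hypothetical DataFrame, shows that .drop_duplicates() leaves the original untouched, and how keep="last" and ignore_index=True change which rows survive and how they are labelled.

```python
import pandas as pd

# Hypothetical data: row 2 is an exact copy of row 0
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Ann"],
    "Ticket": ["A1", "B2", "A1"],
})

# Returns a new DataFrame; df itself still has 3 rows
deduped = df.drop_duplicates()
print(len(df), len(deduped))  # 3 2

# keep="last" retains the final occurrence; ignore_index=True renumbers 0..n-1
latest = df.drop_duplicates(keep="last", ignore_index=True)
print(latest)
```

Without ignore_index=True, the surviving rows keep their original index labels, which can leave gaps such as 0, 1, 3.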
Counting Unique Values
To check how many distinct values each column has, use the .nunique() method:
import pandas as pd
df = pd.read_csv("https://staging-content-media-cdn.codefinity.com/courses/64641555-cae4-4cd0-8d29-807aeb6bc0c4/datasets/passengers.csv")
# Count unique values for each column
print(df.nunique())
# Count unique values for a single column
print(df["Embarked"].nunique())
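This helps you spot columns with only a few categories and verify whether an ID column is truly unique: it is unique only if its count equals the number of rows. Note that .nunique() ignores missing values unless you pass dropna=False.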
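Both checks are illustrated in the sketch below. The "Id" and "Port" columns are hypothetical, chosen so that the ID column contains a repeat and the categorical column contains a missing value.

```python
import pandas as pd

# Hypothetical columns: "Id" should be unique, "Port" is categorical with a gap
df = pd.DataFrame({
    "Id": [101, 102, 103, 103],
    "Port": ["S", "C", None, "S"],
})

# Missing values are skipped by default; dropna=False counts them as a value
print(df["Port"].nunique())              # 2
print(df["Port"].nunique(dropna=False))  # 3

# An ID column is unique only if its distinct count equals the row count
print(df["Id"].nunique() == len(df))  # False
print(df["Id"].is_unique)             # False
```

Series.is_unique is a shorthand for the same check and reads more clearly in assertions.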
1. What does df.duplicated() return?
2. How can you remove all duplicate rows from a DataFrame?
3. How would you count the number of unique values in the "Class" column?