Working with Duplicates
AI in Action
import pandas as pd
df = pd.read_csv("passengers.csv")
print(df.duplicated().sum())
df = df.drop_duplicates()
Detecting Duplicates
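You can check for duplicates in a DataFrame using the .duplicated() method. It returns a boolean Series with one value per row, where True marks a row that repeats an earlier one. Summing that Series counts how many rows are duplicates.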
import pandas as pd

df = pd.read_csv("https://staging-content-media-cdn.codefinity.com/courses/64641555-cae4-4cd0-8d29-807aeb6bc0c4/datasets/passengers.csv")

# Check which rows are duplicates
print(df.duplicated())

# Count duplicate rows
print(df.duplicated().sum())
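To make the returned values easier to picture, here is a minimal sketch on a small hand-made DataFrame. The data and the name `toy` are invented for illustration and are not taken from passengers.csv. With the default settings, the first occurrence of a repeated row is marked False and each later repeat is marked True.

```python
import pandas as pd

# Hypothetical toy data: the second and fourth rows repeat the first one
toy = pd.DataFrame({
    "Name": ["Ann", "Ann", "Bob", "Ann"],
    "Ticket": ["A1", "A1", "B2", "A1"],
})

mask = toy.duplicated()  # boolean Series, one value per row
print(mask.tolist())     # [False, True, False, True]
print(mask.sum())        # 2
```

Because True counts as 1 when summed, the total is the number of rows that would be dropped, not the number of distinct repeated values.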
By default, pandas checks all columns when identifying duplicates. You can also check duplicates within a specific subset of columns:
import pandas as pd

df = pd.read_csv("https://staging-content-media-cdn.codefinity.com/courses/64641555-cae4-4cd0-8d29-807aeb6bc0c4/datasets/passengers.csv")

print(df.duplicated(subset=["Ticket"]).sum())
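When inspecting duplicates before deciding what to do with them, it can help to see every row in a repeated group rather than only the later repeats. The sketch below assumes a small invented DataFrame (`toy` and its values are hypothetical) and uses the keep=False option of .duplicated(), which marks all members of a duplicated group as True.

```python
import pandas as pd

# Hypothetical toy data: two passengers share a ticket but have different names
toy = pd.DataFrame({
    "Name": ["Ann", "Anna", "Bob"],
    "Ticket": ["A1", "A1", "B2"],
})

# Comparing all columns finds no duplicates, since the names differ
full_row = toy.duplicated().sum()

# keep=False marks every row whose Ticket appears more than once
shared_ticket = toy[toy.duplicated(subset=["Ticket"], keep=False)]

print(full_row)
print(shared_ticket)
```

This shows why the subset matters: rows that differ as a whole can still be duplicates with respect to one column.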
Removing Duplicates
After you confirm that the duplicate rows shouldn't remain, remove them using the .drop_duplicates() method:
import pandas as pd

df = pd.read_csv("https://staging-content-media-cdn.codefinity.com/courses/64641555-cae4-4cd0-8d29-807aeb6bc0c4/datasets/passengers.csv")

# Remove duplicate rows
print(df.drop_duplicates())

# Remove duplicates based only on values in a subset
print(df.drop_duplicates(subset=["Ticket"]))
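Two details are easy to miss here: .drop_duplicates() returns a new DataFrame and leaves the original untouched, and by default it keeps the first row of each duplicated group. The following sketch uses invented data (`toy`, `deduped`, and the fare values are hypothetical) to show the keep and ignore_index options. It assumes a pandas version that supports ignore_index (1.0 or later).

```python
import pandas as pd

# Hypothetical toy data: ticket A1 appears twice with different fares
toy = pd.DataFrame({
    "Ticket": ["A1", "A1", "B2"],
    "Fare": [10.0, 12.5, 8.0],
})

# keep="last" retains the final row for each Ticket;
# ignore_index=True renumbers the resulting rows from 0
deduped = toy.drop_duplicates(subset=["Ticket"], keep="last", ignore_index=True)

print(len(toy))  # the original DataFrame still has all 3 rows
print(deduped)
```

To keep the result, assign it back to a variable, as in df = df.drop_duplicates().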
Counting Unique Values
To check how many distinct values each column has, use the .nunique() method:
import pandas as pd

df = pd.read_csv("https://staging-content-media-cdn.codefinity.com/courses/64641555-cae4-4cd0-8d29-807aeb6bc0c4/datasets/passengers.csv")

# Count unique values for each column
print(df.nunique())

# Count unique values for a single column
print(df["Embarked"].nunique())
This helps you identify columns with limited categories or verify whether an ID column is truly unique.
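As a rough sketch of the ID check described above, the example below compares the number of distinct values with the number of rows. The DataFrame and its column values are invented for illustration. Note that .nunique() skips missing values by default, so the comparison is only meaningful for a column without gaps.

```python
import pandas as pd

# Hypothetical toy data: PassengerId 2 is repeated, one Embarked value is missing
toy = pd.DataFrame({
    "PassengerId": [1, 2, 2, 4],
    "Embarked": ["S", "C", "S", None],
})

# An ID column is unique when it has as many distinct values as rows
id_is_unique = toy["PassengerId"].nunique() == len(toy)
print(id_is_unique)                   # False
print(toy["PassengerId"].is_unique)   # same check via the Series attribute

# Missing values are excluded unless dropna=False is passed
print(toy["Embarked"].nunique())              # 2
print(toy["Embarked"].nunique(dropna=False))  # 3
```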
1. What does df.duplicated() return?
2. How can you remove all duplicate rows from a DataFrame?
3. How would you count the number of unique elements in the "Class" column?