Removing Rows

Let's display these rows to see which differences caused these issues.


# Importing the library
import pandas as pd

# Reading the file
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/f2947b09-5f0d-4ad9-992f-ec0b87cd4b3f/data2.csv')
# Select rows with discrepancies
ind = df.iloc[:,2:15].sum(axis = 1) != df.hhsize
print(df.loc[ind, df.columns[1:15]])
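Before deciding what to do with the flagged rows, it helps to know how many there are. The sketch below is not part of the course dataset: it uses a small hypothetical dataframe (the columns `men` and `women` are invented for illustration) to show how a boolean mask like `ind` can be counted and turned into a share of all rows.

```python
import pandas as pd

# Hypothetical toy data standing in for the course dataset:
# 'hhsize' is supposed to equal 'men' + 'women'
toy = pd.DataFrame({
    "hhsize": [3, 2, 4, 1, 5],
    "men": [1, 1, 2, 1, 2],
    "women": [2, 2, 2, 0, 3],
})

# True where the member counts do not add up to the household size
flagged = toy[["men", "women"]].sum(axis=1) != toy["hhsize"]

# True counts as 1, so sum() gives the number of flagged rows
# and mean() gives their share of all rows
n_flagged = int(flagged.sum())
share = flagged.mean()

print(f"{n_flagged} of {len(toy)} rows flagged ({share:.1%})")
```

A small share of scattered rows is usually a reasonable candidate for deletion, while a large share would suggest looking for a systematic cause first.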

These observations are not consecutive, and they make up only half a percent of all observations, so we can safely delete them. To delete rows or columns, apply the .drop() method to the dataframe. To keep the changes, either reassign the result of the method to the dataframe or set the inplace = True parameter.

To drop rows, pass the index labels of the rows you want to remove to the index parameter; to drop columns, pass a list of column names to the columns parameter. For instance, to delete the first and third rows, apply .drop(index = [0, 2]). To delete the third through fifth columns, take their names from the .columns attribute and apply .drop(columns = df.columns[2:5]). Feel free to experiment!


# Importing the library
import pandas as pd

# Reading the file
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/f2947b09-5f0d-4ad9-992f-ec0b87cd4b3f/data2.csv')
# Dropping rows (the first and the third)
print(df.drop(index = [0, 2]))
# Dropping columns (the third through the fifth)
print(df.drop(columns = df.columns[2:5]))
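The same method can remove the rows flagged by a boolean mask instead of a hand-written list of labels. The following is a minimal sketch under stated assumptions: the dataframe and its `adults` and `children` columns are hypothetical stand-ins for the course data. It also contrasts the two ways of keeping the result, reassignment and inplace = True.

```python
import pandas as pd

# Hypothetical toy data: 'hhsize' should equal 'adults' + 'children'
households = pd.DataFrame({
    "hhsize": [3, 2, 4, 1],
    "adults": [2, 2, 2, 1],
    "children": [1, 1, 2, 0],
})

# True where the member counts do not add up to the household size
mismatch = households[["adults", "children"]].sum(axis=1) != households["hhsize"]

# Index labels of the inconsistent rows
bad_labels = households[mismatch].index

# Option 1: drop() returns a new dataframe, so store the result
cleaned = households.drop(index=bad_labels)

# Option 2: modify a copy in place; with inplace=True, drop() returns None
in_place = households.copy()
in_place.drop(index=bad_labels, inplace=True)

print(cleaned)
```

Note that the remaining rows keep their original index labels; if consecutive labels are needed afterwards, .reset_index(drop=True) can be applied to the result.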
