Data Manipulation using pandas
Missing Values
The last issue you may run into while working with data is missing data. As you can see, missing data can be represented in different ways (for example, as dots in our dataset).
There are several ways of dealing with missing values: you can either delete the rows that contain them or replace them with a constant. As mentioned before, check first that you will not delete a large share of the data. What values can be used for replacement? One of the most popular options is the mean of the available data.
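As a small illustration of the dot representation mentioned above, the sketch below reads a tiny in-memory CSV in which a dot marks a missing value. The column names and values are made up for this example and are not taken from the course dataset; `na_values` is the `read_csv` parameter that lists extra strings to treat as NA.

```python
import io
import pandas as pd

# Hypothetical data: a dot marks a missing value
raw = io.StringIO("price,rooms\n100,3\n.,2\n250,.\n")

# na_values tells read_csv which extra strings should be read as NA
df = pd.read_csv(raw, na_values=".")

# Count the missing values in each column
na_counts = df.isna().sum()
print(na_counts)
```

Without `na_values`, the dots would be read as ordinary strings and the columns would not be numeric.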
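Both ideas from the paragraph above can be sketched on a toy dataframe: first measure what share of rows contains at least one NA, then replace the missing values with the column means. The dataframe here is hypothetical and only serves as an illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe with a few missing values
df = pd.DataFrame({
    "price": [100.0, np.nan, 250.0, 150.0],
    "rooms": [3, 2, np.nan, 4],
})

# Share of rows that contain at least one NA,
# i.e. the share that dropping rows would remove
share_with_na = df.isna().any(axis=1).mean()
print(f"Share of rows with NA: {share_with_na:.0%}")

# Replace each missing value with the mean of its column
filled = df.fillna(df.mean())
print(filled)
```

If the share of affected rows is large, replacement is usually the safer choice than deletion.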
If you want to drop rows with NA values, apply the dropna() method. Let's consider what parameters this method has.
dropna(axis = 0, how = 'any', thresh, subset, inplace = False)
Parameter | Description |
axis = 0/1 | Determines whether rows (0 - default) or columns (1) that contain missing values are removed |
how = 'any'/'all' | Determines whether a row/column is removed when it has at least one NA ('any' - default) or only when all of its values are NA ('all') |
thresh = int | Optional; the minimum number of non-NA values a row/column must have to be kept. Cannot be combined with the how parameter |
subset = 'column'/['column1', 'column2'] | Optional; the columns (when dropping rows) or rows (when dropping columns) that are checked for NA values |
inplace = True/False | Whether the changes modify the dataframe itself (True) or return a new one (False - default) |
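To see how these parameters behave, here is a minimal sketch on a toy dataframe. The data and variable names are hypothetical and chosen only so that each call gives a different result.

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe: row 0 is complete, row 2 is entirely NA
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0],
    "b": [1.0, 2.0, np.nan, np.nan],
    "c": [1.0, 2.0, np.nan, np.nan],
})

# how='any' (default): drop rows with at least one NA
any_rows = df.dropna()

# how='all': drop only rows in which every value is NA
all_rows = df.dropna(how="all")

# thresh=2: keep rows that have at least two non-NA values
thresh_rows = df.dropna(thresh=2)

# subset: look for NA only in column 'a'
subset_rows = df.dropna(subset=["a"])

# axis=1: drop columns; here subset names the rows to check
cols = df.dropna(axis=1, subset=[0, 1])

for name, result in [("any", any_rows), ("all", all_rows),
                     ("thresh", thresh_rows), ("subset", subset_rows)]:
    print(name, "keeps rows", list(result.index))
print("axis=1 keeps columns", list(cols.columns))

# inplace=False is the default, so df itself is unchanged
print("Original shape:", df.shape)
```

Since none of the calls uses inplace = True, each one returns a new dataframe and the original keeps all of its rows and columns.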
For instance, let's remove rows containing NA in at least one of the columns 'morgh' and 'valueh'.
# Importing the library
import pandas as pd

# Reading the file
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/f2947b09-5f0d-4ad9-992f-ec0b87cd4b3f/data4.csv')

# Dimensionality of dataframe before deleting
print("Before deleting:", df.shape)

# Deleting rows with NA in 'morgh' or 'valueh'
df.dropna(subset = ['morgh', 'valueh'], inplace = True)

# Dimensionality of dataframe after deleting
print("After deleting:", df.shape)
As you can see, 248 rows were removed since there were NA values in at least one of the 'morgh' and 'valueh' columns.