Missing Values | Preprocessing Data: Part II
Data Manipulation using pandas

Missing Values

The last issue you can run into while working with data is missing values. As you can see, missing data can be represented in different ways (for example, as dots in our dataset).
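As a rough illustration of how such placeholders might be turned into proper missing values, here is a minimal sketch. It assumes a tiny made-up CSV in which a dot marks a missing entry; the data is hypothetical and is not the course dataset. The `na_values` parameter of `read_csv` tells pandas which strings to treat as NA.

```python
import io

import pandas as pd

# Hypothetical CSV text in which a dot marks a missing entry
raw = "morgh,valueh\n100,5000\n.,7000\n250,.\n"

# na_values tells read_csv which strings should be read as missing (NaN)
df = pd.read_csv(io.StringIO(raw), na_values=".")

# Count the missing values in each column
missing_per_column = df.isna().sum()
print(missing_per_column)
```

Once the dots are read as NaN, methods such as isna() and dropna() can recognize them.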

There are several ways to deal with missing values: you can either delete the rows that contain them or replace them with some constant. As mentioned before, first check that you will not delete a large share of the data. What values can be used for replacement? One of the most popular options is the mean of the available data.
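Both ideas can be sketched on a small example. The dataframe below is hypothetical toy data, not the course dataset: the first step estimates what share of rows would be lost by deletion, and the second replaces each missing value with the mean of its column using fillna().

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; NaN marks a missing value
df = pd.DataFrame({
    "morgh": [100.0, np.nan, 250.0, 400.0],
    "valueh": [5000.0, 7000.0, np.nan, 9000.0],
})

# Share of rows that contain at least one missing value
share_with_na = df.isna().any(axis=1).mean()
print("Share of rows with NA:", share_with_na)

# Replace each missing value with the mean of the available values in its column
filled = df.fillna(df.mean())
print(filled)
```

If the share of affected rows turns out to be large, replacing values may be preferable to deleting rows.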

If you want to drop rows with NA values, apply the dropna() method. Let's consider what parameters this method has: dropna(axis = 0, how = 'any', thresh, subset, inplace = False)

Parameter | Description
axis = 0/1 | Determines whether rows (0, the default) or columns (1) that contain missing values are removed
how = 'any'/'all' | Determines whether a row/column is removed when it has at least one NA ('any', the default) or only when all of its values are NA ('all')
thresh = int | Optional; the minimum number of non-NA values a row/column must have to be kept. Cannot be combined with the how parameter
subset = 'column'/['column1', 'column2'] | Optional; the columns (or rows, when axis = 1) in which to look for NA values
inplace = True/False | Whether to modify the dataframe itself rather than create a new one (default: False, a new dataframe is returned)
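To see how these parameters behave, here is a minimal sketch on a hypothetical four-row dataframe (the column names a, b, c and the values are made up for illustration). Each call returns a new dataframe, since inplace is left at its default.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: row 2 is entirely missing, row 3 is complete
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0],
    "b": [1.0, 2.0, np.nan, 4.0],
    "c": [np.nan, np.nan, np.nan, 4.0],
})

# Default (how='any'): drop rows with at least one NA
any_rows = df.dropna()

# how='all': drop only rows in which every value is NA
all_rows = df.dropna(how="all")

# thresh=2: keep rows that have at least two non-NA values
thresh_rows = df.dropna(thresh=2)

# subset: look for NA only in columns 'a' and 'b'
subset_rows = df.dropna(subset=["a", "b"])

# axis=1 with thresh=3: keep columns that have at least three non-NA values
kept_columns = df.dropna(axis=1, thresh=3)

print(len(any_rows), len(all_rows), len(thresh_rows), len(subset_rows))
print(list(kept_columns.columns))
```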

For instance, let's remove the rows that contain NA in at least one of the columns 'morgh' and 'valueh'.

# Importing the library
import pandas as pd

# Reading the file
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/f2947b09-5f0d-4ad9-992f-ec0b87cd4b3f/data4.csv')

# Dimensionality of dataframe before deleting
print("Before deleting:", df.shape)

# After deleting
df.dropna(subset = ['morgh', 'valueh'], inplace = True)
print("After deleting:", df.shape)

As you can see, 248 rows were removed since there were NA values in at least one of the 'morgh', 'valueh' columns.


Section 2. Chapter 6