**Removing missing values** from a dataset is an important step in keeping data analysis and modeling accurate. It helps avoid skewed results, poor model performance, and data integrity problems caused by incomplete records. Removal is not always the right choice, however: depending on the situation, it may be better to impute the missing values than to drop them.

To remove missing values in Python, you can use the `.dropna()` method of a `pandas` DataFrame. By default, this method removes every row that contains at least one missing value; with `axis=1` it removes columns instead.

Here's an example:


```python
import pandas as pd
import numpy as np

# Load dataset
dataset = pd.DataFrame(np.array([[10, 2, np.nan], [5, 0.3, 9], [np.nan, 12, 8], [11, 12, 8]]))
print('Dataset is:\n', dataset)

# Drop rows with missing values
dataset = dataset.dropna()
print('Cleaned dataset is:\n', dataset)
```
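Dropping every row with a missing value is only the default behavior. The sketch below shows a few other `.dropna()` options on the same values. The column names `a`, `b`, and `c` are hypothetical and are added only for readability.

```python
import numpy as np
import pandas as pd

# Same values as the example above; column names are hypothetical
df = pd.DataFrame(
    [[10, 2, np.nan], [5, 0.3, 9], [np.nan, 12, 8], [11, 12, 8]],
    columns=["a", "b", "c"],
)

# Drop columns (instead of rows) that contain any missing value
cols_dropped = df.dropna(axis=1)

# Drop rows only when column "a" is missing
subset_dropped = df.dropna(subset=["a"])

# Keep only rows that have at least 3 non-missing values
thresh_dropped = df.dropna(thresh=3)

# Drop rows only when every value in the row is missing
all_dropped = df.dropna(how="all")

print(cols_dropped)
print(subset_dropped)
print(thresh_dropped)
print(all_dropped)
```

Which option fits depends on the dataset: `subset` is useful when only some columns are required downstream, while `thresh` keeps rows that are mostly complete.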

Keep in mind that removing missing values discards information: every dropped row also loses its valid values. Consider how much data you would lose before dropping anything. In some cases, it may be more appropriate to impute missing values instead of removing them.
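One way to judge the cost of removal is to count the missing values before dropping anything. This is a minimal sketch using the same values as the example, again with hypothetical column names `a`, `b`, and `c`.

```python
import numpy as np
import pandas as pd

# Same values as the example above; column names are hypothetical
df = pd.DataFrame(
    [[10, 2, np.nan], [5, 0.3, 9], [np.nan, 12, 8], [11, 12, 8]],
    columns=["a", "b", "c"],
)

# Number of missing values in each column
missing_per_column = df.isna().sum()

# Number of rows that contain at least one missing value
rows_with_missing = int(df.isna().any(axis=1).sum())

# Share of the dataset that a plain dropna() would remove
fraction_lost = rows_with_missing / len(df)

print(missing_per_column)
print(f"Rows that dropna() would remove: {rows_with_missing} of {len(df)} ({fraction_lost:.0%})")
```

If a large share of the rows would be lost, that is a signal to consider imputation or to drop only selected columns.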

As a reminder, replacing missing values with the mean of the column is another way to handle missing data in Python. It is typically used when the data is missing at random (MAR), which means that whether a value is missing does not depend on the missing value itself.
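Mean imputation can be sketched with `.fillna()` and `.mean()` from `pandas`. The data is the same as in the example above; the column names `a`, `b`, and `c` are hypothetical.

```python
import numpy as np
import pandas as pd

# Same values as the example above; column names are hypothetical
df = pd.DataFrame(
    [[10, 2, np.nan], [5, 0.3, 9], [np.nan, 12, 8], [11, 12, 8]],
    columns=["a", "b", "c"],
)

# df.mean() computes each column's mean, skipping missing values by default
column_means = df.mean()

# Fill every missing value with the mean of its own column
df_imputed = df.fillna(column_means)

print('Column means:\n', column_means)
print('Imputed dataset is:\n', df_imputed)
```

Unlike `.dropna()`, this keeps all four rows, at the cost of reducing the variance of the imputed columns.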


