Filling In the Missing Values
Deleting rows or columns is not the only way to deal with missing values. You can also replace every NaN with a defined value, for instance the mean of the column or zero. This keeps the rest of the data in those rows available for analysis. The course Learning Statistics with Python covers this in more detail.
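As a minimal sketch of the two options mentioned above, the snippet below fills a small Series once with zeros and once with its mean. The Series `ages` and its values are made up for illustration and are not taken from the lesson's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the Titanic dataset used in this lesson
ages = pd.Series([22.0, np.nan, 30.0, np.nan, 38.0])

# Option 1: replace every NaN with a constant
filled_with_zero = ages.fillna(0)

# Option 2: replace every NaN with the mean of the non-missing values
filled_with_mean = ages.fillna(ages.mean())

print(filled_with_zero.tolist())  # [22.0, 0.0, 30.0, 0.0, 38.0]
print(filled_with_mean.tolist())  # [22.0, 30.0, 30.0, 30.0, 38.0]
```

Note that .mean() skips NaN by default, so the fill value here is the average of 22, 30 and 38.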
Look at the example of filling missing values in the column 'Age' with the median value of this column:
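```python
import pandas as pd

# Load the dataset, using the first column as the index
data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/4bf24830-59ba-4418-969b-aaf8117d522e/titanic_2', index_col=0)
# Replace missing ages with the median age
data['Age'].fillna(value=data['Age'].median(), inplace=True)
# Count the missing values that remain in the column
print(data['Age'].isna().sum())
```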
Explanation:
.fillna(value=data['Age'].median(), inplace=True)
- value=data['Age'].median(): using the argument value, we tell the .fillna() method what to put in place of the NaN values. In this case, we applied the .fillna() method to the column 'Age' and replaced all missing values with the median of the column;
- inplace=True: the argument that applies the change to the existing object instead of returning a new one.
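An alternative to inplace=True is to assign the result of .fillna() back to the column. This may be the safer form in recent pandas versions, where calling an in-place method on a selected column can emit a warning and, with Copy-on-Write enabled, may leave the DataFrame unchanged. The sketch below uses a hypothetical toy DataFrame named `toy`, not the lesson's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the lesson's dataset
toy = pd.DataFrame({'Age': [22.0, np.nan, 35.0, np.nan, 54.0]})

# Assign the filled column back instead of relying on inplace=True
toy['Age'] = toy['Age'].fillna(toy['Age'].median())

# Count the missing values that remain in the column
print(toy['Age'].isna().sum())  # 0
```

Here the median of the non-missing values (22, 35, 54) is 35, so both gaps are filled with 35.0.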
Missing values can cause problems when analyzing data. One of the most common ways to handle them is by replacing missing values with the mean of the column.
Your task is to:
- Replace all NaN values in the column 'Age' with the mean of that column.
  - Use the .fillna() method with the arguments value=data['Age'].mean() and inplace=True.
- Calculate and print the number of remaining missing values in the 'Age' column.