Work with NaNs

To check whether a value is NaN, use the isna() method. You can apply it to a whole dataframe or to a single column, and you'll get True wherever the value is NaN and False otherwise. For a single cell, use the top-level pd.isna() function.


print(data.isna())
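
As a minimal sketch of the three levels described above (full dataframe, one column, one cell), the snippet below uses a small hypothetical dataframe named toy rather than the course dataset. The column names and values are assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0],
    "Cabin": [np.nan, "C85", np.nan],
})

frame_mask = toy.isna()                   # dataframe of booleans, same shape as toy
column_mask = toy["Age"].isna()           # series of booleans for one column
cell_is_nan = pd.isna(toy.loc[1, "Age"])  # single boolean for one cell

print(frame_mask)
print(column_mask.tolist())  # [False, True, False]
print(cell_is_nan)           # True
```

A single cell is a plain scalar, so it has no isna() method of its own; that is why the sketch calls pd.isna() for it.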

It is usually more informative to count the NaNs in each column. Chaining sum() after isna() adds up the True values column by column:


import pandas as pd

# Load the Titanic dataset
data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/10db3746-c8ff-4c55-9ac3-4affa0b65c16/titanic.csv')

# Count the missing values in each column
print(data.isna().sum())

If you run the code above (reset the editor and paste the code into it), you should see the following output:


PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64

You can see that the Embarked column has only 2 NaNs, which is negligible for almost 900 records, but look at Cabin: more than 75% of its entries are missing. We need to deal with that in some way.
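
To turn counts like these into shares, one option is to take the mean of the boolean mask instead of its sum, since True counts as 1 and False as 0. The sketch below does this on a small hypothetical dataframe (toy is an assumption, not the course data). For the output above, and assuming the dataset has the standard 891 rows, the same arithmetic gives 687 / 891 ≈ 0.77 for Cabin.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 3 of the 4 Cabin values are missing
toy = pd.DataFrame({
    "Fare": [7.25, 71.28, 8.05, 53.10],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
})

# Mean of a boolean mask = fraction of True values in each column
missing_share = toy.isna().mean()

print(missing_share)
```

Here Fare has a share of 0.0 and Cabin a share of 0.75.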

Drop NaNs

The easiest way to deal with missing data is to drop the records that contain it, using the dropna() method. Note that by default it does not change the current dataframe; it returns a new one. To modify the current dataframe instead, pass inplace=True:


clean_data = data.dropna() # data is not modified, but clean_data now contains no NaNs
data.dropna(inplace=True) # data is modified
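
The difference between the two calls can be seen by comparing shapes before and after. This is a minimal sketch on a hypothetical three-row dataframe (toy and its columns are assumptions for illustration), not the course solution.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: only the first row has no missing values
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0],
    "Embarked": ["S", "C", np.nan],
})

original_shape = toy.shape

# Default call: returns a new dataframe and leaves toy untouched
clean = toy.dropna()
shape_after_copy = toy.shape

# In-place call: toy itself loses the rows that contain NaNs
toy.dropna(inplace=True)

print(original_shape)    # (3, 2)
print(clean.shape)       # (1, 2)
print(shape_after_copy)  # (3, 2)
print(toy.shape)         # (1, 2)
```

By default dropna() removes every row that has at least one NaN in any column, so a single sparse column can eliminate most of the records.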

Task

Apply dropna() to the dataframe data. Then check the shape of the dataframe after the modification and compare it with the original shape (before the modification).

Solution

We expect the shape (183, 12) for the new dataframe.
