What Will We Do With the NaN Values? | Preprocessing Data
Advanced Techniques in pandas

In the previous chapter, you got the following number of missing values in each column:

PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age             86
SibSp            0
Parch            0
Ticket           0
Fare             1
Cabin          327
Embarked         0

The dataset has 418 rows, and 327 of them (about 78%) have no value in the Cabin column. There is no point in filling these gaps in, because the few remaining values give us too little information to base the filling on. In this case, the best solution is to delete the whole column, since it is of no use to us. The alternative, deleting only the rows that contain missing values, is not an option here: we can't throw away 327 of 418 rows. So, let's figure out how to delete a column.
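To see how these numbers drive the decision, here is a minimal sketch. It assumes a small hypothetical DataFrame named df that stands in for the real dataset; the values are made up for illustration. It counts the missing values per column, computes their share, finds the worst column, and checks how many rows would survive if every row with a NaN were dropped instead.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real dataset (values are made up)
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Age": [22.0, np.nan, 35.0, 41.0, np.nan, 19.0],
    "Fare": [7.25, 71.28, np.nan, 8.05, 13.0, 30.0],
    "Cabin": [np.nan, "C85", np.nan, np.nan, np.nan, np.nan],
})

# Number of missing values in each column
missing = df.isna().sum()
print(missing)

# Share of missing values in each column (True counts as 1 in the mean)
print(df.isna().mean())

# Name of the column with the most missing values
print(missing.idxmax())  # Cabin

# How many rows would remain if every row containing a NaN were dropped
print(len(df.dropna()), "of", len(df))  # 0 of 6
```

In this toy frame, dropping rows would wipe out the whole table, while dropping the single Cabin column keeps every row.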

To delete a column, apply the .drop() method to the dataset. Here is what each part of the call does:

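  • .drop() - a method that removes rows or columns by label; here we use it to delete columns;
  • columns = 'column_name' or columns = ['column_1', 'column_2'] - the argument where you specify the name, or a list of names, of the columns you want to delete;
  • inplace = True - applies the change to the original DataFrame instead of returning a modified copy (by default, inplace = False, and .drop() returns a new DataFrame while leaving the original unchanged). Many other pandas methods accept this argument too; we will meet some of them later on.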
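The sketch below illustrates these arguments on a tiny hypothetical DataFrame (the column names A, B and C are made up). It contrasts the default behavior, where .drop() returns a new DataFrame, with inplace = True, where the original is modified and the method returns None.

```python
import pandas as pd

# Hypothetical DataFrame used only to illustrate the syntax
df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

# Without inplace: drop() returns a new DataFrame; df stays unchanged
smaller = df.drop(columns="A")
print(list(smaller.columns))  # ['B', 'C']
print(list(df.columns))       # ['A', 'B', 'C']

# Several columns at once, passed as a list
print(list(df.drop(columns=["A", "B"]).columns))  # ['C']

# With inplace=True: df itself is modified and drop() returns None
result = df.drop(columns="C", inplace=True)
print(result)            # None
print(list(df.columns))  # ['A', 'B']
```

Because the in-place call returns None, writing df = df.drop(columns='C', inplace=True) would replace df with None. Reassigning the result of a call without inplace (df = df.drop(columns='C')) is an equivalent way to keep the change.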

Task

Your task is to delete the column with the greatest number of NaN values. Follow these steps:

  1. Drop the 'Cabin' column using the inplace = True argument.
  2. Output 5 random rows of the dataset.
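One possible way to carry out these two steps is sketched below. It assumes the data is already loaded into a DataFrame called df; the toy frame here is hypothetical and only stands in for the real 418-row dataset. The .sample() method picks rows at random, so the output changes from run to run unless you pass random_state.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the course dataset (the real one has 418 rows)
df = pd.DataFrame({
    "PassengerId": range(1, 9),
    "Cabin": [np.nan, "B45", np.nan, np.nan, "C85", np.nan, np.nan, np.nan],
    "Embarked": ["S", "C", "Q", "S", "S", "C", "S", "Q"],
})

# Step 1: delete the 'Cabin' column in place
df.drop(columns="Cabin", inplace=True)

# Step 2: show 5 randomly chosen rows
print(df.sample(5))
```

For a repeatable result, df.sample(n=5, random_state=0) returns the same five rows on every run.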

Section 5. Chapter 3