Preprocessing Data
Replace Categorical Missing Data with Values
There are several ways to deal with missing values in categorical data:
- replace them with a constant or with the most frequent value;
- create a new category for the missing values;
- process the data after converting it to numerical form. We'll use this approach later.
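As a rough illustration of the first two options, here is a minimal sketch using pandas. The toy `port` column and the `"Unknown"` label are assumptions made up for this example; they are not part of the course dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the course dataset.
df = pd.DataFrame({"port": ["S", "C", np.nan, "S", "Q", np.nan]})

# Option 1a: replace NaNs with a fixed constant chosen by hand.
filled_constant = df["port"].fillna("C")

# Option 1b: replace NaNs with the most frequent value (the mode).
# mode() ignores NaNs by default and returns a Series, so take the first entry.
most_common = df["port"].mode()[0]
filled_mode = df["port"].fillna(most_common)

# Option 2: treat "missing" as a category of its own.
filled_new = df["port"].fillna("Unknown")

print(filled_constant.tolist())
print(filled_mode.tolist())
print(filled_new.tolist())
```

Filling with the mode keeps the set of categories unchanged, while a separate label preserves the information that the value was originally missing.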
Let's explore the Cabin and Embarked columns (both contain NaNs) and figure out how to handle the NaNs in each.
Task
- Explore the share of NaNs in each of the given columns. Print these values.
- For the Embarked column, simply drop the rows with missing values, since only 2 rows contain them.
- For the Cabin column, about 77% of the data is missing (if everything is done correctly). That's why we'll replace the NaNs with a new value. To do that:
  - print all the unique values of the Cabin column;
  - choose any value that is not already present in the Cabin column and replace all NaNs with it (for example, 'Z' or 'X').
Check some data samples to see the modified dataframe.
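The task steps could be sketched roughly as follows. This is one possible approach, not the course's reference solution. The small `data` frame is a hypothetical stand-in for the real dataset, so the NaN shares it prints differ from the 77% mentioned in the task.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the course dataset.
data = pd.DataFrame({
    "Cabin": ["C85", np.nan, np.nan, "E46", np.nan],
    "Embarked": ["S", "C", np.nan, "S", "Q"],
})

# Step 1: share of NaNs per column (the mean of a boolean mask is a fraction).
nan_share = data[["Cabin", "Embarked"]].isna().mean()
print(nan_share)

# Step 2: drop the rows where Embarked is missing.
# The explicit copy gives an independent frame to modify in the next steps.
data = data.dropna(subset=["Embarked"]).copy()

# Step 3: inspect the unique Cabin values (NaN shows up among them).
print(data["Cabin"].unique())

# Step 4: pick a placeholder that is not already used, then fill the NaNs.
placeholder = "Z"
existing = set(data["Cabin"].dropna().unique())
if placeholder in existing:
    raise ValueError("placeholder already occurs in the Cabin column")
data["Cabin"] = data["Cabin"].fillna(placeholder)

# Check a few rows of the modified dataframe.
print(data.head())
```

Checking the placeholder against the existing values first keeps the new category from colliding with a real cabin code.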