Preprocessing Data
Data Types
Let's talk about the types of data that a dataframe may contain.
Numerical
Numerical data is represented by int or float values. In a dataframe, it should be stored with the int64 or float64 data type. Use data.info() to check the data type of each column.
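As a minimal sketch of that check, the snippet below builds a small stand-in dataframe and inspects its column types. The column names and values are made up for illustration and are not the course dataset.

```python
import pandas as pd

# Hypothetical toy dataframe (illustrative values, not the course data)
data = pd.DataFrame({
    "Age": [22, 38, 26],           # whole numbers
    "Fare": [7.25, 71.28, 7.92],   # decimal numbers
    "Sex": ["male", "female", "female"],
})

# Prints a per-column summary: non-null counts and data types
data.info()

# The same type information as a Series, handy for programmatic checks
age_dtype = str(data["Age"].dtype)
fare_dtype = str(data["Fare"].dtype)
print(age_dtype, fare_dtype)
```

Here data.dtypes gives the same type information as data.info() but in a form you can index and compare in code.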
Note that some columns in a dataframe may contain numerical values but be stored with another data type (object or str). You have to convert them to int64 or float64, and we'll explore how to do that later.
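To make the problem concrete, here is a small sketch of a numeric column that was read in as text. It uses pd.to_numeric as one possible way to convert it; this is an assumption for illustration, and the course may use a different method. The column name and values are hypothetical.

```python
import pandas as pd

# Hypothetical column: numbers stored as text
raw = pd.DataFrame({"Fare": ["7.25", "71.28", "8.05"]})

# Arithmetic on text columns does not behave numerically,
# so convert the column to a numeric dtype first
raw["Fare"] = pd.to_numeric(raw["Fare"])

fare_dtype = str(raw["Fare"].dtype)
print(fare_dtype)
```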
Categorical
Categorical data has no inherent numerical representation; each value is an item from a fixed list of groups or categories. For example, the column Sex has the values Male or Female, and a column Season has the values Spring, Summer, Fall, and Winter. Such data requires special conversion and preprocessing. It is typically stored with the object, bool, or str data type.
Fortunately, the dataset titanic already contains numerical data as int64 and float64.
Task
Let's divide the columns into numerical and categorical. Create num_cols as a numpy array containing the names of the columns with int and float types. Let cat_cols be all the other features, that is, every column not in num_cols.
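One possible way to approach this split is sketched below on a toy dataframe. The dataframe contents are hypothetical, and select_dtypes is just one option among several; treat it as a sketch rather than the expected solution.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the titanic dataframe
data = pd.DataFrame({
    "Age": [22, 38, 26],
    "Sex": ["male", "female", "female"],
    "Fare": [7.25, 71.28, 7.92],
    "Embarked": ["S", "C", "S"],
})

# Names of the columns stored as int64 or float64
num_cols = np.array(data.select_dtypes(include=["int64", "float64"]).columns)

# Every remaining column is treated as categorical
num_set = set(num_cols)
cat_cols = np.array([col for col in data.columns if col not in num_set])

print(num_cols)
print(cat_cols)
```

Both arrays keep the original column order, so they can be used directly to index the dataframe, for example data[num_cols].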