Data Types | Data Exploration
Preprocessing Data

1. Data Exploration
2. Data Cleaning
3. Data Validation
4. Normalization & Standardization
5. Data Encoding

Data Types

Let's look at the types of data that a DataFrame may contain.

Numerical

Numerical data is represented by integer or floating-point values. In a DataFrame, such columns should be stored with the int64 or float64 data type. Use data.info() to check the data type of each column.
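As a minimal sketch of that check: the small DataFrame below is a hypothetical stand-in for the course dataset (its column names and values are assumptions made for illustration), and data.dtypes is shown alongside data.info() because it returns the same type information as a Series.

```python
import pandas as pd

# Hypothetical toy data; the real course dataset is not reproduced here.
data = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0],            # floating-point values
    "Pclass": [3, 1, 3],                  # integer values
    "Sex": ["male", "female", "female"],  # text values
})

# Prints a summary: column names, non-null counts, and data types.
data.info()

# The same type information, returned as a Series indexed by column name.
dtypes = data.dtypes
print(dtypes)
```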

Note that some columns may contain numerical values but be stored under another data type (object or str). Such columns have to be converted to int64 or float64, and we’ll explore how to do it later.
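Until then, here is a rough preview of one common way to do such a conversion, using pandas' pd.to_numeric. This is not necessarily the method the course uses later; the Fare column and its values are made up for the example.

```python
import pandas as pd

# Hypothetical column: numbers stored as text, plus one unparseable entry.
df = pd.DataFrame({"Fare": ["7.25", "71.28", "n/a"]})
print(df["Fare"].dtype)  # a text dtype, not a numeric one

# errors="coerce" turns values that cannot be parsed into NaN
# instead of raising an error.
df["Fare"] = pd.to_numeric(df["Fare"], errors="coerce")
print(df["Fare"].dtype)
```

Because the coerced column contains a missing value, pandas stores it as float64 rather than int64.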

Categorical

Categorical data has no inherent numerical representation; each value is one item from a fixed set of groups or categories. For example, the column Sex has the values Male or Female, and the column Season has the values Spring, Summer, Fall, and Winter. Such data requires special conversion and preprocessing. It is typically stored with the data types object, bool, or str.
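To see which categories a column actually holds, one option is to list its distinct values or count how often each occurs. The sketch below uses a hypothetical DataFrame built from the Sex and Season examples above; the rows are invented for illustration.

```python
import pandas as pd

# Hypothetical rows using the example categories from the text.
df = pd.DataFrame({
    "Sex": ["Male", "Female", "Female", "Male", "Female"],
    "Season": ["Spring", "Summer", "Fall", "Winter", "Summer"],
})

# How many rows fall into each category.
sex_counts = df["Sex"].value_counts()
print(sex_counts)

# The distinct categories present in the column, sorted alphabetically.
season_values = sorted(df["Season"].unique())
print(season_values)
```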

Fortunately, in the titanic dataset the numerical data is already stored as int64 and float64.

Task

Let's divide the columns into numerical and categorical. Create num_cols as a NumPy array containing the names of the columns with int and float types. Let cat_cols be all the remaining features, that is, every column not in num_cols.
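One possible way to approach this split is sketched below. It runs on a hypothetical toy DataFrame rather than the real titanic data, and it assumes the numerical columns are stored as int64 or float64, as described above; select_dtypes and Index.isin are standard pandas calls, but the course's expected solution may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the dataset; column names are illustrative.
data = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0],
    "SibSp": [1, 1, 0],
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", "S"],
})

# Names of columns whose dtype is int64 or float64, as a NumPy array.
num_cols = data.select_dtypes(include=["int64", "float64"]).columns.to_numpy()

# Every remaining column is treated as categorical.
cat_cols = data.columns[~data.columns.isin(num_cols)].to_numpy()

print(num_cols)
print(cat_cols)
```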

Section 1. Chapter 3