Preprocessing Data
Course Content


1. Data Exploration
2. Data Cleaning
3. Data Validation
4. Normalization & Standardization
5. Data Encoding

Data Extracting

Welcome to the course! Data cleaning and preprocessing is the very first step in any data analysis or model-building process. Raw data that has just been stored in a database (a table, a file, etc.) usually cannot be worked with as is. Data analysts and data scientists typically handle huge volumes of data, big tables with thousands of records. Collecting, storing, and keeping data in databases requires a lot of resources, and the more data there is, the higher the chance that it contains mismatches, wrong records, unexpected values, and so on. Some values may also need to be converted; for example, '890.0' may have to be converted from a string to a numeric data type. The original data should be cleaned, converted, and scaled, and this whole process is called Data Preprocessing.
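As a small illustration of the type conversion mentioned above, here is a minimal sketch (the sample values are made up) of turning strings like '890.0' into numbers with pandas:

import pandas as pd

# Hypothetical column that was read in as strings
fares = pd.Series(['890.0', '120.5', '305.0'])
print(fares.dtype)           # object (i.e. strings)

# Convert the strings to a numeric data type
fares_numeric = pd.to_numeric(fares)
print(fares_numeric.dtype)   # float64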

Before you start, let's load the data to see what you will be working with.

Data Loading

You can load data from different sources using pandas; a short sketch of these options follows the list:

  • From a .csv file
  • From a direct link (URL)
  • From an existing DataFrame, list, or other data structure
  • From built-in Python datasets
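A rough sketch of these options is shown below (the local path is a placeholder, and the list of dictionaries is made-up sample data):

import pandas as pd

# From a .csv file on disk (replace the placeholder with a real path)
df_file = pd.read_csv('path/to/titanic.csv')

# Directly from a link to a .csv file
url = 'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/10db3746-c8ff-4c55-9ac3-4affa0b65c16/titanic.csv'
df_link = pd.read_csv(url)

# From an existing data structure, e.g. a list of dictionaries
records = [{'Name': 'Smith, Mr. John', 'Age': 22},
           {'Name': 'Doe, Miss. Jane', 'Age': 26}]
df_records = pd.DataFrame(records)

# From a dataset bundled with a library (assuming seaborn is installed)
# import seaborn as sns
# df_builtin = sns.load_dataset('titanic')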

We'll use the open-source 'Titanic' dataset, which can be downloaded as a .csv file via the link below. To load the data, use a statement like this:

import pandas as pd
data = pd.read_csv('path_to_file')

Task

Read the dataset as a .csv file using the link:

https://codefinity-content-media.s3.eu-west-1.amazonaws.com/10db3746-c8ff-4c55-9ac3-4affa0b65c16/titanic.csv

Store the dataset in the variable 'data'.
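Below is a minimal sketch of one way to complete the task, reading the .csv file straight from the link and checking the first rows:

import pandas as pd

# Read the Titanic dataset directly from the link and store it in 'data'
data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/10db3746-c8ff-4c55-9ac3-4affa0b65c16/titanic.csv')

# Optional: preview the first few rows to confirm the data loaded
print(data.head())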
