Missing and Wrong Data | Data Cleaning
Preprocessing Data
1. Data Exploration
2. Data Cleaning
3. Data Validation
4. Normalization & Standardization
5. Data Encoding

Missing and Wrong Data

As you already know, raw data can contain dirty data. Common kinds include:

  • NaN: undefined or missing values.
  • Empty strings.
  • Infinite values: inf or -inf, for example the result of a floating-point overflow or a division by zero.
  • Incorrect data: for example, 'Female' in the Price column, which should contain only numeric data (the value may have been entered into the wrong cell by accident). You may also find impossible or implausible values, such as a user's age of -1, 0, or 110 when users type it in manually.
  • Outliers: extremely small or large values (for example, 250 cm in the Height column or 112 years in the Age column), usually few in number. They can distort your analysis results or model weights, so it sometimes makes sense to remove them.
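
Each kind of dirty data listed above can be detected with a few lines of pandas. The following is a minimal sketch, not a definitive recipe: the toy DataFrame, its column names, the valid age range of 1 to 100, and the 1.5 × IQR outlier rule are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "Name": ["Ann", "", "Bob", "Eve", "Dan"],
    "Price": [10.5, "Female", 12.0, np.inf, np.nan],
    "Age": [34, -1, 110, 0, 27],
    "Height": [170.0, 165.0, 250.0, 180.0, 175.0],
})

# NaN: count missing values in every column.
nan_counts = df.isna().sum()

# Empty strings: compare the text column with "".
empty_name = df["Name"] == ""

# Incorrect data: entries in a numeric column that cannot be parsed as numbers.
# errors="coerce" turns unparseable entries into NaN instead of raising.
price_numeric = pd.to_numeric(df["Price"], errors="coerce")
not_numeric = price_numeric.isna() & df["Price"].notna()

# Infinite values: +inf or -inf (isna() does not flag these by default).
is_infinite = np.isinf(price_numeric)

# Impossible values: ages outside an assumed valid range of 1 to 100.
bad_age = ~df["Age"].between(1, 100)

# Outliers: values more than 1.5 * IQR beyond the quartiles (a common heuristic).
q1, q3 = df["Height"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["Height"] < q1 - 1.5 * iqr) | (df["Height"] > q3 + 1.5 * iqr)

print(nan_counts)
print(df[empty_name | not_numeric | is_infinite | bad_age | is_outlier])
```

Each check yields a boolean mask (or a count), so you can inspect the flagged rows before deciding what to do with them.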

Let's learn how to clean your data without losing useful information.
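
As a first taste, one way to keep useful information is to mark only the bad cells as missing instead of dropping whole rows. This is a hedged sketch under assumed conditions: the toy data, the column names, and the valid age range of 1 to 100 are hypothetical, and other strategies may suit your data better.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "Price": [10.5, "Female", 12.0, np.inf],
    "Age": [34, -1, 27, 45],
})

cleaned = df.copy()

# Convert Price to numbers; unparseable entries such as 'Female' become NaN.
cleaned["Price"] = pd.to_numeric(cleaned["Price"], errors="coerce")

# Treat infinite prices as missing as well.
cleaned["Price"] = cleaned["Price"].replace([np.inf, -np.inf], np.nan)

# Keep ages inside the assumed valid range; everything else becomes NaN.
cleaned["Age"] = cleaned["Age"].where(cleaned["Age"].between(1, 100))

# Every row is still present; only the individual bad cells are now NaN.
print(cleaned)
```

Because the rows are kept, a record with a wrong Price still contributes its valid Age, and the missing cells can be filled or dropped later.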
