Missing and Wrong Data
As you already know, raw data may contain dirty values, such as:
- NaN: undefined or missing values.
- empty strings.
- infinite values (inf): numbers too large to be represented as finite values.
- incorrect data: for example, 'Female' in the Price column, which should contain only numbers (the value may have been entered into the wrong cell by accident). Impossible values are also common when users type data in manually, for example an age of -1, 0, or 110.
- outliers: extremely small or large values (for example, 250 cm in the Height column or 112 years in the Age column), usually few in number. They can distort the results of an analysis or a model's weights, so it sometimes makes sense to remove them.
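Here is a minimal sketch of how each kind of dirty value listed above could be detected. It uses only the Python standard library. The function names classify_numeric_cell and iqr_outliers, the label strings, and the 1 to 100 age range are hypothetical choices made for illustration. Outliers are flagged with Tukey's interquartile-range (IQR) fences. This is one common rule among several, and the text above does not prescribe a specific method.

```python
import math
import statistics

# Hypothetical plausible range for a manually entered age (an assumption; adjust to your data).
AGE_MIN, AGE_MAX = 1, 100


def classify_numeric_cell(value, valid_range=None):
    """Label one raw cell of a numeric column:
    'nan', 'empty', 'inf', 'wrong_type', 'impossible' or 'ok'."""
    if value is None:
        return "nan"  # missing value
    if isinstance(value, str):
        # A string in a numeric column is either empty or misplaced data (e.g. 'Female').
        return "empty" if value.strip() == "" else "wrong_type"
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return "wrong_type"
    if math.isnan(value):
        return "nan"
    if math.isinf(value):
        return "inf"
    if valid_range is not None and not (valid_range[0] <= value <= valid_range[1]):
        return "impossible"  # e.g. an age of -1 or 110
    return "ok"


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


raw_prices = [10.5, None, "", float("inf"), "Female", 12.0, float("nan")]
print([classify_numeric_cell(p) for p in raw_prices])
# ['ok', 'nan', 'empty', 'inf', 'wrong_type', 'ok', 'nan']

print([classify_numeric_cell(a, (AGE_MIN, AGE_MAX)) for a in [-1, 0, 35, 110]])
# ['impossible', 'impossible', 'ok', 'impossible']

print(iqr_outliers([165, 172, 180, 158, 175, 169, 250]))
# [250]
```

Cleaning a column then comes down to deciding what to do with each label: drop the row, replace the value, or keep it.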
Let's learn how to clean your data without losing useful information.