
Getting Familiar with Dataset


Begin preprocessing by exploring the dataset. Throughout this course, the penguin dataset will be used, with the task of predicting the species of a penguin.

There are three possible species, often referred to as classes in machine learning.

The features are: 'island', 'culmen_depth_mm', 'flipper_length_mm', 'body_mass_g', and 'sex'.

The dataset is stored in the penguins.csv file. It can be loaded from a link with the pd.read_csv() function to examine its contents:

    import pandas as pd

    # Load the CSV file directly from its URL
    df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins.csv')
    # Show the first 10 rows
    print(df.head(10))
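Beyond the first rows, a few summary calls help with getting familiar with a dataset. The sketch below is a minimal illustration that runs without network access: it reads a tiny inline CSV whose rows are made up for this example (only the feature names come from the lesson; the 'species' column name and all values are assumptions).

```python
import io

import pandas as pd

# Made-up rows for illustration; the real penguins.csv has many more rows.
# The 'species' column name and all values here are assumptions.
csv_text = """species,island,culmen_depth_mm,flipper_length_mm,body_mass_g,sex
Adelie,Torgersen,18.7,181,3750,MALE
Adelie,Torgersen,,,,
Gentoo,Biscoe,13.2,211,4500,FEMALE
"""

df = pd.read_csv(io.StringIO(csv_text))

# Number of rows and columns
print(df.shape)

# Column names, data types, and non-null counts
df.info()

# Summary statistics (count, mean, min, max, ...) for the numeric columns
print(df.describe())
```

The same three calls can be applied to the DataFrame loaded from the course link.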

This dataset presents several issues that need to be addressed:

  • Missing data;
  • Categorical variables;
  • Different feature scales.

Missing Data

Most ML algorithms cannot process missing values directly, so these must be addressed before training. Missing values can either be removed or imputed (replaced with substitute values).

In pandas, empty cells are represented as NaN. Many ML models will raise an error if the dataset contains even a single NaN.
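The two options, removal and imputation, can be sketched as follows. This is a minimal illustration on a small made-up DataFrame, not the course's own procedure. Filling numeric columns with the mean and the categorical column with its most frequent value is just one common choice, assumed here for the example.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration; only the column names come from the lesson.
df = pd.DataFrame({
    "culmen_depth_mm": [18.7, np.nan, 13.2, 17.4],
    "body_mass_g": [3750.0, 3800.0, np.nan, 4500.0],
    "sex": ["MALE", None, "FEMALE", "FEMALE"],
})

# Count the missing cells in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Option 1: remove every row that contains at least one missing value.
dropped = df.dropna()

# Option 2: impute. Numeric columns get the column mean (an assumed strategy),
# and the categorical column gets its most frequent value.
imputed = df.copy()
for col in ["culmen_depth_mm", "body_mass_g"]:
    imputed[col] = imputed[col].fillna(imputed[col].mean())
imputed["sex"] = imputed["sex"].fillna(imputed["sex"].mode()[0])

print(dropped)
print(imputed)
```

Removal is simpler but discards rows; imputation keeps every row at the cost of introducing estimated values.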

Categorical Data

The dataset includes categorical variables, which machine learning models are unable to process directly.

Categorical data must be encoded into numerical form.
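As a rough preview of what encoding looks like, the sketch below converts two categorical columns to numbers with plain pandas. The category values are made up for illustration, and the choice of one-hot encoding for 'island' and a 0/1 mapping for 'sex' is an assumption for this example rather than the course's prescribed method.

```python
import pandas as pd

# Made-up category values for illustration.
df = pd.DataFrame({
    "island": ["Torgersen", "Biscoe", "Dream", "Biscoe"],
    "sex": ["MALE", "FEMALE", "FEMALE", "MALE"],
})

# One-hot encoding: one 0/1 column per distinct value of 'island'.
island_encoded = pd.get_dummies(df["island"], prefix="island", dtype=int)

# A column with two categories can be mapped to a single 0/1 column.
sex_encoded = df["sex"].map({"FEMALE": 0, "MALE": 1})

# Combine the encoded columns into one all-numeric DataFrame.
encoded = pd.concat([island_encoded, sex_encoded], axis=1)
print(encoded)
```

Each row of the one-hot block contains exactly one 1, marking the island that row belongs to.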

Different Scales

'culmen_depth_mm' values range from 13.1 to 21.5, while 'body_mass_g' values range from 2700 to 6300. Because of this difference in magnitude, some ML models may treat the 'body_mass_g' feature as much more important than 'culmen_depth_mm'.

Scaling solves this problem. It will be covered in later chapters.
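To make the idea concrete before those chapters, here is a minimal sketch of two common scaling approaches written with plain pandas arithmetic. The three rows are made up, using the range endpoints quoted above plus one assumed middle value. The course may use a dedicated scaler class instead.

```python
import pandas as pd

# Made-up rows: the minimum and maximum match the ranges quoted in the text,
# and the middle row is an assumed value.
df = pd.DataFrame({
    "culmen_depth_mm": [13.1, 17.0, 21.5],
    "body_mass_g": [2700.0, 4200.0, 6300.0],
})

# Standardization: subtract the column mean and divide by the column
# standard deviation (pandas uses the sample standard deviation by default).
standardized = (df - df.mean()) / df.std()

# Min-max scaling: map each column onto the range [0, 1].
min_max = (df - df.min()) / (df.max() - df.min())

print(standardized)
print(min_max)
```

After either transformation, both features live on comparable scales, so neither dominates purely because of its units.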


Match the problem with a way to solve it.

Missing values – removal or imputation
Categorical data – encoding into numerical form
Different scales – scaling
