Learn Data Preprocessing | Basics of Keras
Neural Networks with TensorFlow

Data Preprocessing

Now let's set up a model and apply it to a practical scenario. We'll aim to predict house prices using the well-known Boston Housing Price Regression Dataset.

Data Overview

First, we need to examine the data before using it.

Missing Values

We need to verify whether the dataset contains any missing values. This requires first loading the dataset from Keras and then checking for missing values.

    from tensorflow import keras
    import pandas as pd

    # Loading the dataset
    (X_train, y_train), (X_test, y_test) = keras.datasets.boston_housing.load_data()

    # Converting each subset to DataFrame
    X_train = pd.DataFrame(X_train)
    y_train = pd.DataFrame(y_train)

    # Summing up number of empty values of each set
    print('Null values in X_train:', X_train.isnull().sum().sum())
    print('Null values in y_train:', y_train.isnull().sum().sum())

As it turns out, there are no empty values, so we don't need to address this issue.

Data Preprocessing

  • Outliers: Although Keras datasets are typically free of outliers, we will demonstrate outlier removal using IsolationForest, eliminating 5% of the data as outliers.

  • Note

    • IsolationForest's predict method returns an array that marks each sample as valid (1) or as an outlier (-1).

    • To set the contamination rate, pass the contamination parameter to the IsolationForest constructor.

    • Outliers should be removed only from the training set.

  • Rescaling: To put all features on a comparable scale and keep the data compatible with our model, the data needs to be rescaled.
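The notes on IsolationForest can be illustrated with a minimal sketch. It runs on a synthetic array (`X_demo`, a hypothetical stand-in with the same shape as the training features), not on the real dataset, and `random_state=0` is only an assumption to make the run repeatable.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic stand-in for the training features (404 samples, 13 features);
# this is NOT the real housing data.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(404, 13))

# contamination=0.05 asks the forest to flag roughly 5% of samples as outliers.
iso = IsolationForest(contamination=0.05, random_state=0)

# fit_predict fits the forest and returns a NumPy array of 1 (valid) and -1 (outlier).
labels = iso.fit_predict(X_demo)

# Boolean mask selecting the samples to keep.
mask = labels == 1
X_clean = X_demo[mask]

print("Unique labels:", np.unique(labels))
print("Flagged as outliers:", int((labels == -1).sum()), "of", len(labels))
```

The same mask has to be applied to the targets as well, so that features and labels stay aligned after the outliers are dropped.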

Task


Throughout this course, you'll handle data on your own extensively, so let's revisit how to preprocess data with the scikit-learn library to make it ready for a neural network built with Keras.

  1. Initialize an Isolation Forest with a 5% contamination rate.
  2. Apply the Isolation Forest to the training set and discard outliers.
  3. Rescale the data using MinMaxScaler.
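One possible way to carry out the three steps is sketched below. This is not the course's official solution. The arrays are synthetic stand-ins shaped like the Keras splits so that the block runs without downloading anything, and `random_state=42` is an assumption added for repeatability. Fitting the scaler on the training set only and reusing it for the test set is a common convention, assumed here rather than stated in the task.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-ins with the same layout as the Keras splits
# (2-D feature arrays, 1-D target array); NOT the real data.
rng = np.random.default_rng(42)
X_train = rng.normal(loc=10.0, scale=3.0, size=(404, 13))
y_train = rng.normal(loc=22.0, scale=9.0, size=404)
X_test = rng.normal(loc=10.0, scale=3.0, size=(102, 13))

# 1. Initialize an Isolation Forest with a 5% contamination rate.
iso = IsolationForest(contamination=0.05, random_state=42)

# 2. Fit on the training set only and keep the samples labelled 1 (valid).
inliers = iso.fit_predict(X_train) == 1
X_train, y_train = X_train[inliers], y_train[inliers]

# 3. Fit MinMaxScaler on the cleaned training set, then reuse it for the test set.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```

Because the scaler only sees the training data, test values may fall slightly outside the [0, 1] range, which is expected.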
