Neural Networks with TensorFlow
Data Preprocessing
Now let's set up a model and apply it to a practical scenario: predicting house prices with the well-known Boston Housing price regression dataset.
Data Overview
First, we need to examine the data before loading it.
Note
For an in-depth description of the dataset, you can visit this link: Boston Housing price regression dataset.
Features
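Below is a list of all the columns in the dataset:
- CRIM: per capita crime rate by town
- ZN: proportion of residential land zoned for lots over 25,000 sq. ft.
- INDUS: proportion of non-retail business acres per town
- CHAS: Charles River dummy variable (1 if the tract bounds the river, 0 otherwise)
- NOX: nitric oxides concentration (parts per 10 million)
- RM: average number of rooms per dwelling
- AGE: proportion of owner-occupied units built before 1940
- DIS: weighted distances to five Boston employment centres
- RAD: index of accessibility to radial highways
- TAX: full-value property-tax rate per $10,000
- PTRATIO: pupil-teacher ratio by town
- B: 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town
- LSTAT: percentage of lower-status population
- MEDV (target): median value of owner-occupied homes in $1000s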
Ethical Consideration: The dataset includes a variable "B", which is derived from the proportion of Black residents in each town and therefore ties race to house prices. We will exclude this column in the preprocessing stage to avoid racial bias in our model.
Note
To understand more about ethical considerations, check out our Ethical Considerations in Deep Learning article.
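As a minimal sketch of how the "B" column could be excluded, the helper below removes a column by name from a NumPy feature matrix. The `drop_feature` function and the `FEATURE_NAMES` list are illustrative names, not part of the lesson; the column order is an assumption based on the usual ordering of this dataset, and stand-in data is used instead of the real download.

```python
import numpy as np

# Assumed column order of the Boston Housing feature matrix (illustrative).
FEATURE_NAMES = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                 "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]


def drop_feature(X, name, feature_names=FEATURE_NAMES):
    """Return a copy of X without the named column, plus the remaining names."""
    idx = feature_names.index(name)
    kept = [n for n in feature_names if n != name]
    return np.delete(X, idx, axis=1), kept


# Stand-in data with 13 columns, matching the assumed layout above.
X_demo = np.arange(26, dtype=float).reshape(2, 13)
X_no_b, kept_names = drop_feature(X_demo, "B")
print(X_no_b.shape, kept_names)
```

The same call would need to be applied to both the training and the test features so that the two matrices keep matching columns.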
Missing Values
We need to verify if there are any missing values in the dataset. This requires first loading the dataset from Keras and then evaluating for missing values.
```python
from tensorflow import keras
import pandas as pd

# Load the dataset
(X_train, y_train), (X_test, y_test) = keras.datasets.boston_housing.load_data()

# Convert each subset to a Pandas DataFrame
X_train = pd.DataFrame(X_train)
y_train = pd.DataFrame(y_train)

# Sum up the number of missing values in each set
print('Null values in X_train:', X_train.isnull().sum().sum())
print('Null values in y_train:', y_train.isnull().sum().sum())
```
As it turns out, there are no empty values, so we don't need to address this issue.
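The check above only covers the training arrays. If you also want to confirm the test split, one possible approach is a small helper that counts NaN entries in any number of arrays. This is a sketch: `count_missing` is a hypothetical helper, and the arrays below are stand-ins rather than the real dataset.

```python
import numpy as np


def count_missing(**arrays):
    """Count NaN entries in each named array."""
    return {name: int(np.isnan(np.asarray(arr, dtype=float)).sum())
            for name, arr in arrays.items()}


# Stand-in arrays; in the lesson these would be the arrays from load_data().
X_demo = np.array([[1.0, np.nan], [3.0, 4.0]])
y_demo = np.array([10.0, 20.0])

missing = count_missing(X_demo=X_demo, y_demo=y_demo)
print(missing)
```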
Data Preprocessing
- Outliers: Although Keras datasets are typically free of outliers, we will demonstrate outlier removal using `IsolationForest`, eliminating 5% of the data as outliers.

  Note
  - `IsolationForest`'s `predict` method returns an array marking each sample as valid (`1`) or as an outlier (`-1`).
  - To set the contamination rate, pass the `contamination` parameter to the `IsolationForest` constructor.
  - Outliers should be removed only from the training set.

- Rescaling: The features span very different ranges, so the data needs to be rescaled to a common range to ensure consistency and compatibility with our model.
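To make the `1` / `-1` labels concrete, here is a small sketch on synthetic data (not the housing data). The random data, the shifted rows, and the variable names are assumptions made for illustration; `fit_predict` fits the forest and returns the labels in one call.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic stand-in data: 200 samples, 3 features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
X_demo[:5] += 10.0  # shift a few rows far away so there is something to find

# contamination=0.05 asks the forest to flag roughly 5% of the samples.
forest = IsolationForest(contamination=0.05, random_state=0)
labels = forest.fit_predict(X_demo)  # 1 = valid sample, -1 = outlier

# Keep only the rows labelled as valid.
inlier_mask = labels == 1
X_clean = X_demo[inlier_mask]
n_outliers = int((labels == -1).sum())
print(n_outliers, X_clean.shape)
```

A boolean mask is used here so that the same selection can be applied to the matching target array.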
Throughout this course, you'll handle much of the data preparation yourself. Let's therefore revisit how to preprocess data with the scikit-learn library so that it is ready for use in a neural network built with Keras.
- Initialize an Isolation Forest with a 5% contamination rate.
- Apply the Isolation Forest to the training set and discard outliers.
- Rescale the data using MinMaxScaler.
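The three steps above can be sketched as a single function. This is one possible arrangement rather than the course's reference solution: the `preprocess` name, its parameters, and the synthetic demo data are assumptions. It removes outliers from the training set only and fits the scaler on the training data before applying it to the test data.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import MinMaxScaler


def preprocess(X_train, y_train, X_test, contamination=0.05, seed=0):
    """Drop training outliers, then min-max scale both feature sets."""
    # Steps 1 and 2: flag outliers in the training set and discard them.
    forest = IsolationForest(contamination=contamination, random_state=seed)
    keep = forest.fit_predict(X_train) == 1
    X_train, y_train = X_train[keep], y_train[keep]

    # Step 3: fit the scaler on the training data only, then reuse it
    # for the test data so both share the same scaling.
    scaler = MinMaxScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    return X_train, y_train, X_test


# Synthetic stand-in data with 13 feature columns.
rng = np.random.default_rng(42)
X_tr_raw = rng.normal(loc=50.0, scale=10.0, size=(300, 13))
y_tr_raw = rng.normal(loc=22.0, scale=5.0, size=300)
X_te_raw = rng.normal(loc=50.0, scale=10.0, size=(100, 13))

X_tr, y_tr, X_te = preprocess(X_tr_raw, y_tr_raw, X_te_raw)
print(X_tr.shape, y_tr.shape, X_te.shape)
```

Because the scaler is fitted on the training data, scaled test values can fall slightly outside the [0, 1] range; that is expected with this design.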