Predictive Modeling with Tidymodels in R
Section 1. Chapter 2

Data Preprocessing with Recipes

One of the most powerful tools for preprocessing data in a tidy modeling workflow is the recipes package. It lets you define a sequence of preprocessing steps, such as normalization (centering and scaling), categorical encoding, and missing-value imputation, using a consistent, readable syntax. Each step is added to a "recipe", which can then be applied to your data in a reproducible way. Bundling all data preparation steps together ensures that transformations run in the correct order and can be easily reproduced or shared. Recipes are especially useful when you want to keep preprocessing separate from modeling, or when you need to apply the same transformations to new data (such as test or validation sets).

```r
options(crayon.enabled = FALSE)
library(recipes)

# Sample data
data <- data.frame(
  age = c(25, 30, NA, 40),
  income = c(50000, 60000, 55000, NA),
  gender = c("male", "female", "female", "male")
)

# Create a recipe for normalization and missing value imputation
rec <- recipe(~ ., data = data) %>%
  step_impute_mean(all_numeric_predictors()) %>%
  step_normalize(all_numeric_predictors())

# Prep the recipe (estimate parameters)
rec_prep <- prep(rec, training = data)

# Apply the recipe to the data
data_processed <- bake(rec_prep, new_data = data)
print(data_processed)
```

When working with the recipes package, you build a recipe by chaining together a series of steps. Each step specifies one transformation, such as imputing missing values or normalizing numeric variables. You start by creating a recipe object with the recipe() function and then add steps such as step_impute_mean() or step_normalize() with the pipe operator (%>%). Once all steps are added, the prep() function estimates any required parameters (such as means or standard deviations) from your training data. The bake() function then applies the prepped recipe to any dataset, so the same transformations, with the same estimated parameters, are used consistently. This workflow keeps your preprocessing organized, reproducible, and separate from your modeling code, which makes complex data transformations easier to manage.
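
To make the split between prep() and bake() concrete, here is a minimal sketch that preps a recipe on a small training set and bakes a separate test set. The `train` and `test` data frames are hypothetical and exist only for illustration. The missing `age` in the test set is filled with the training mean (30), not with a value computed from the test data. The normalization likewise uses the training mean and standard deviation.

```r
library(recipes)

# Hypothetical training and test sets (illustration only)
train <- data.frame(
  age    = c(20, 30, 40, NA),
  income = c(40000, 50000, 60000, 70000)
)
test <- data.frame(
  age    = c(NA, 50),
  income = c(65000, NA)
)

rec <- recipe(~ ., data = train) %>%
  step_impute_mean(all_numeric_predictors()) %>%
  step_normalize(all_numeric_predictors())

# prep() learns the imputation means and the normalization
# means/standard deviations from the training data only
rec_prep <- prep(rec, training = train)

# Inspect the means stored by step 1 (mean imputation)
print(tidy(rec_prep, number = 1))

# bake() reuses those stored estimates on unseen data:
# the NA age becomes 30 (training mean), which normalizes to 0
test_processed <- bake(rec_prep, new_data = test)
print(test_processed)
```

Calling `tidy()` on a prepped recipe with a step `number` is a quick way to check which values prep() actually stored before you bake new data.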

Task

Create a recipe that standardizes all numeric variables and encodes all categorical variables in the provided training data.

  • Load the recipes package.
  • Initialize a recipe with recipe(), using the formula ~ . and train_data.
  • Add a step to center all numeric predictors using step_center() and all_numeric_predictors().
  • Add a step to scale all numeric predictors using step_scale() and all_numeric_predictors().
  • Add a step to convert all nominal (categorical) predictors to dummy variables using step_dummy() and all_nominal_predictors().
  • Prepare the recipe by calling prep() on the training data.
  • Apply the prepared recipe to the training data with bake().
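
One possible way to complete these steps is sketched below. The exercise supplies its own `train_data`. The small data frame here, with its `height`, `weight` and `color` columns, is a hypothetical stand-in so that the code runs on its own. Centering followed by scaling gives the same result as the single step_normalize() call used in the earlier example.

```r
library(recipes)

# Hypothetical stand-in for the exercise's train_data
train_data <- data.frame(
  height = c(160, 172, 181, 168),
  weight = c(55, 70, 85, 62),
  color  = factor(c("red", "blue", "green", "blue"))
)

# Center, then scale, the numeric predictors; then dummy-encode nominal ones
rec <- recipe(~ ., data = train_data) %>%
  step_center(all_numeric_predictors()) %>%
  step_scale(all_numeric_predictors()) %>%
  step_dummy(all_nominal_predictors())

# Estimate means, standard deviations and factor levels from train_data
rec_prep <- prep(rec, training = train_data)

# Apply the prepped recipe back to the training data
baked <- bake(rec_prep, new_data = train_data)
print(baked)
```

The step order matters here. Because step_dummy() comes after the centering and scaling steps, the new indicator columns are not standardized themselves. With default settings, `color` (levels blue, green, red) is replaced by `color_green` and `color_red`, and `blue` serves as the reference level.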
