Data Preprocessing
Changing the Data Type
You already know how to change a data type, for example from string to number. Let's take a closer look at this small but important task.
Let's start by changing the data type from string to datetime. Most often, you will need this to work with time series. You can perform this operation with the pd.to_datetime() function, passing it the column you want to convert.
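As a minimal sketch of the string-to-datetime conversion described above: the dataset, the column name Date, and the date format are assumptions made up for illustration, not part of the course data.

```python
import pandas as pd

# Hypothetical sample data; the 'Date' column name is an assumption
df = pd.DataFrame({'Date': ['2021-01-15', '2021-02-20', '2021-03-05']})

# pd.to_datetime parses the strings and returns a datetime64 column.
# The explicit format is optional, but it makes the parsing unambiguous.
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')

print(df['Date'].dtype)
```

Once the column holds datetimes, the .dt accessor (for example df['Date'].dt.month) gives access to the individual date components, which is usually what time series work needs.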
To convert a string to a bool, use the .map() method on the column whose values you want to change, passing it a dictionary that maps each string to True or False.
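The mapping approach could look roughly like this. The sample values 'Y' and 'N' and the column name Active are assumptions chosen for illustration; your data may encode the flags differently.

```python
import pandas as pd

# Hypothetical sample data; the 'Y'/'N' encoding is an assumption
df = pd.DataFrame({'Active': ['Y', 'N', 'Y']})

# Map each string to its boolean counterpart
df['Active'] = df['Active'].map({'Y': True, 'N': False})

print(df['Active'].tolist())
```

Keep in mind that any value missing from the dictionary becomes NaN, so it is worth checking the distinct values in the column before mapping.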
For example, if you have a price column with values like "$198,800" and you want to turn it into a float, you should create a custom transformation function:
import pandas as pd
import re

# Create a simple dataset
df = pd.DataFrame(data={'Price': ['$4,122.94', '$1,002.3']})

# Custom function to transform the data
# x - a value from the column
def price2float(x):
    # Strip the dollar sign and thousands separators, then convert to float
    return float(re.sub(r'[$,]', '', x))

# Apply the custom transformation to the column
df['Price'] = df['Price'].apply(price2float)
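For a simple pattern like this one, the same cleanup can also be sketched without a custom function, using pandas string methods. This is an alternative to the approach above, not a replacement for it, and it assumes every value in the column is a string in the same "$1,234.56" style.

```python
import pandas as pd

# Same sample data as in the custom-function example
df = pd.DataFrame({'Price': ['$4,122.94', '$1,002.3']})

# Remove dollar signs and commas from every value, then cast the column to float
df['Price'] = df['Price'].str.replace(r'[$,]', '', regex=True).astype(float)

print(df['Price'].dtype)
```

A custom function with .apply() remains the more flexible choice when the cleanup needs branching logic that a single regular expression cannot express.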
Read the sales_data_types.csv dataset and change the data type of the Active column from str to bool.