Converting Data Types
AI in Action
```python
import pandas as pd

df = pd.read_csv("passengers.csv")
print(df.dtypes)

df["TicketDate"] = pd.to_datetime(df["TicketDate"], errors="coerce")
```
Converting Columns to Another Type
You can convert a column's data type using the .astype() method:
```python
import pandas as pd

df = pd.read_csv("https://staging-content-media-cdn.codefinity.com/courses/64641555-cae4-4cd0-8d29-807aeb6bc0c4/datasets/passengers.csv", dtype=str)
print(df.dtypes)

# Convert Pclass to int
df["Pclass"] = df["Pclass"].astype(int)

# Convert Age to float
df["Age"] = df["Age"].astype(float)

print(df.dtypes)
```
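As a small illustration of .astype(), the sketch below converts two columns in one call by passing a column-to-type mapping, and shows that .astype() raises an error when a value cannot be converted. The sample values and the names `converted`, `bad`, and `failed` are made up for this example and are not taken from the passengers dataset.

```python
import pandas as pd

# Hypothetical sample data standing in for a few text-typed rows
df = pd.DataFrame({
    "Pclass": ["1", "3", "2"],
    "Age": ["22", "38.5", "26"],
})

# Convert several columns at once with a column-to-dtype mapping
converted = df.astype({"Pclass": int, "Age": float})
print(converted.dtypes)

# astype() is strict: a value that cannot be converted raises an error
bad = pd.Series(["1", "2", "unknown"])
failed = False
try:
    bad.astype(int)
except ValueError as exc:
    failed = True
    print("Conversion failed:", exc)
```

Because .astype() stops at the first value it cannot convert, it is best suited to columns you already know are clean.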
Converting Strings to Numeric or Datetime
If numbers or dates were loaded as text, you can use pd.to_numeric() and pd.to_datetime() to safely convert them to the correct data type:
```python
import pandas as pd

df = pd.read_csv("https://staging-content-media-cdn.codefinity.com/courses/64641555-cae4-4cd0-8d29-807aeb6bc0c4/datasets/passengers.csv", dtype=str)
print(df.dtypes)

# Convert a text column to numeric
df["Fare"] = pd.to_numeric(df["Fare"], errors="coerce")

# Convert a text column to datetime
df["TicketDate"] = pd.to_datetime(df["TicketDate"], errors="coerce")

print(df.dtypes)
```
The errors="coerce" argument replaces entries that cannot be converted with NaN (or NaT for datetimes) instead of raising an error.
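To see what errors="coerce" does in practice, here is a minimal sketch using made-up values rather than the real dataset. The names `fares`, `dates`, and `newly_invalid` are assumptions introduced only for this illustration.

```python
import pandas as pd

# Hypothetical values, including entries that cannot be parsed
fares = pd.Series(["7.25", "71.28", "n/a"])
dates = pd.Series(["1912-04-10", "not a date"])

fare_numeric = pd.to_numeric(fares, errors="coerce")
ticket_dates = pd.to_datetime(dates, errors="coerce")

print(fare_numeric)   # the unparseable entry becomes NaN
print(ticket_dates)   # the unparseable entry becomes NaT

# Flag entries that were present as text but failed to convert
newly_invalid = fare_numeric.isna() & fares.notna()
print(fares[newly_invalid])
```

Checking which entries were coerced is worth doing before further analysis, since the invalid values are replaced silently.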
Converting to Categorical Type
If a column contains only a few distinct values that repeat across many rows, you can convert it to the categorical type. This usually reduces memory usage and can speed up comparisons.
```python
import pandas as pd

df = pd.read_csv("https://staging-content-media-cdn.codefinity.com/courses/64641555-cae4-4cd0-8d29-807aeb6bc0c4/datasets/passengers.csv", dtype=str)
print(df["Embarked"].dtype)

# Convert column to categorical
df["Embarked"] = df["Embarked"].astype("category")

print(df["Embarked"].dtype)
```
This is especially useful for columns like passenger class, gender, or embarkation port.
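One way to check the memory effect is to compare memory_usage(deep=True) before and after the conversion. The sketch below uses a made-up column of three repeated port codes, not the real dataset, so the exact byte counts on your machine and data will differ.

```python
import pandas as pd

# Hypothetical column: three port codes repeated many times
embarked = pd.Series(["S", "C", "Q"] * 1000)
as_category = embarked.astype("category")

# deep=True also counts the memory used by the string values themselves
text_bytes = embarked.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)
print(text_bytes, category_bytes)

# Each distinct value is stored once; every row holds a small integer code
print(as_category.cat.categories.tolist())
print(as_category.cat.codes.head().tolist())
```

The saving comes from storing each distinct value only once, so the fewer distinct values a column has relative to its length, the larger the benefit tends to be.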
1. Which method converts a column's data type?
2. What happens when you use errors="coerce" in pd.to_numeric()?
3. Why would you convert a column to the category type?