merge() basics
When working with tabular data in pandas, you often need to combine information from multiple sources. This is where merging DataFrames becomes essential. Merging allows you to bring together rows from different DataFrames based on shared columns, making it possible to analyze related data in a unified view. Whether you are combining customer records with order details or joining survey results with demographic information, understanding how to merge DataFrames is a foundational skill in data analysis.
    import pandas as pd

    # Create two simple DataFrames with a common column 'id'
    df_left = pd.DataFrame({
        "id": [1, 2, 3],
        "name": ["Alice", "Bob", "Charlie"]
    })

    df_right = pd.DataFrame({
        "id": [2, 3, 4],
        "score": [85, 92, 78]
    })

    # Merge the DataFrames on the 'id' column
    merged_df = pd.merge(df_left, df_right, on="id")

    print(merged_df)
The merge() function in pandas combines two DataFrames by aligning rows on one or more key columns. The most common way to use it is to name the column(s) present in both DataFrames with the on parameter. In the example above, both DataFrames have an "id" column, so pd.merge(df_left, df_right, on="id") matches rows with the same "id" value. By default, merge() performs an inner join, meaning only rows whose key appears in both DataFrames are included in the result. Here, only the ids 2 and 3 occur on both sides, so the result has two rows: Bob with a score of 85 and Charlie with a score of 92. Alice (id 1) and the score for id 4 are dropped. The resulting DataFrame contains the shared key column once, along with the other columns from both sources. This is similar to joining tables in SQL and is fundamental when combining datasets that share a common attribute.
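To make the default inner-join behavior concrete, the sketch below rebuilds the same two DataFrames and compares the default call with an explicit how="inner", then contrasts both with how="outer". The how and indicator parameters are part of pandas but are not covered in the text above, so treat this as an illustrative aside rather than part of the lesson; the variable names inner, explicit, and outer are chosen here for the example.

```python
import pandas as pd

# Same sample data as in the lesson
df_left = pd.DataFrame({"id": [1, 2, 3], "name": ["Alice", "Bob", "Charlie"]})
df_right = pd.DataFrame({"id": [2, 3, 4], "score": [85, 92, 78]})

# Default call: an inner join, so only ids found in both frames survive
inner = pd.merge(df_left, df_right, on="id")

# Spelling out how="inner" should give an identical result
explicit = pd.merge(df_left, df_right, on="id", how="inner")

# For contrast: how="outer" keeps every id from either side and fills
# the gaps with NaN; indicator=True adds a "_merge" column showing
# which side each row came from
outer = pd.merge(df_left, df_right, on="id", how="outer", indicator=True)

print(inner)
print(outer)
```

Comparing the two printed frames shows what the inner join discards: the outer result has four rows, and the rows for ids 1 and 4 are exactly the ones missing from the inner result.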