join()
The join() method in pandas allows you to combine two DataFrame objects based on their indexes. Unlike the merge() method, which typically matches rows using one or more columns (keys), join() aligns rows by their indexes, making it ideal when your data is already indexed in a compatible way. This method is especially convenient for adding columns from one DataFrame to another when their indexes represent the same entities, such as dates, IDs, or categories.
```python
import pandas as pd

# Create two DataFrames with indexes representing employee IDs
df_left = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "salary": [70000, 80000, 90000]
}, index=[101, 102, 103])

df_right = pd.DataFrame({
    "department": ["HR", "Engineering", "Marketing"]
}, index=[101, 102, 104])

# Join df_right to df_left using their indexes (default is left join)
result = df_left.join(df_right)
print(result)
```
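To see what the default left join does with index values that do not match, a small check like the one below can help. It is a minimal sketch that rebuilds the same two DataFrame objects and inspects the result; the variable names are reused from the example only for illustration.

```python
import pandas as pd

# Same employee data as in the example: indexes are employee IDs
df_left = pd.DataFrame(
    {"name": ["Alice", "Bob", "Charlie"], "salary": [70000, 80000, 90000]},
    index=[101, 102, 103],
)
df_right = pd.DataFrame(
    {"department": ["HR", "Engineering", "Marketing"]},
    index=[101, 102, 104],
)

result = df_left.join(df_right)

# A left join keeps the left index; 104 exists only on the right, so it is dropped
print(result.index.tolist())  # [101, 102, 103]

# 103 has no match on the right, so its department is missing (NaN)
print(result["department"].isna().tolist())  # [False, False, True]
```

In other words, every row of df_left survives, and columns coming from df_right are filled only where the index values line up.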
The join() method offers several parameters to control its behavior. The how parameter sets the type of join: "left" (default), "right", "outer", or "inner". It determines which index values appear in the result. The lsuffix and rsuffix parameters append suffixes to overlapping column names from the left and right DataFrame, respectively. If both DataFrame objects share a column name and neither suffix is given, join() raises a ValueError, so you need to supply at least one of them in that case.
```python
import pandas as pd

# Two DataFrames with overlapping column names
df_left = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "age": [25, 30]
}, index=[1, 2])

df_right = pd.DataFrame({
    "age": [28, 35],
    "city": ["New York", "Chicago"]
}, index=[1, 3])

# Join with suffixes to distinguish overlapping 'age' columns
result = df_left.join(df_right, lsuffix="_left", rsuffix="_right", how="outer")
print(result)
```
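To compare the join types side by side, you could loop over the values of how and look at which index values each one keeps. This is only a sketch using the same sample data; index_by_how is a hypothetical helper dictionary introduced here for illustration. The last part checks what happens when the overlapping age column is joined without any suffix.

```python
import pandas as pd

df_left = pd.DataFrame({"name": ["Alice", "Bob"], "age": [25, 30]}, index=[1, 2])
df_right = pd.DataFrame({"age": [28, 35], "city": ["New York", "Chicago"]}, index=[1, 3])

# index_by_how is a hypothetical helper that records the resulting index per join type
index_by_how = {}
for how in ["left", "right", "inner", "outer"]:
    joined = df_left.join(df_right, how=how, lsuffix="_left", rsuffix="_right")
    index_by_how[how] = joined.index.tolist()
    print(how, index_by_how[how])
# left [1, 2]
# right [1, 3]
# inner [1]
# outer [1, 2, 3]

# Without suffixes, the shared 'age' column makes join() raise a ValueError
try:
    df_left.join(df_right)
    overlap_error = None
except ValueError as exc:
    overlap_error = exc
    print("ValueError:", exc)
```

Here "left" and "right" keep the index of one side, "inner" keeps only the shared index values, and "outer" keeps the union of both.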