Challenge: Link Employee Records
You are given two employee datasets originating from different internal systems. Each dataset contains employee attributes, but names and cities may differ slightly in formatting or spelling.
Your goal is to link matching employees across both datasets using fuzzy similarity.
Follow these steps:
- Create a composite matching key for each record using the fields first_name, last_name, and city.
- Convert all composite keys to lowercase strings with spaces separating the parts.
- Use the SequenceMatcher class from the difflib library to compute similarity scores between composite keys in the two datasets.
- For every employee in the first dataset, find all employees in the second dataset whose similarity score is 0.80 or higher.
- Store all matching pairs in a list named linked_records. Each element must be a dictionary containing:
  - "index_df1": index of the record in the first dataset;
  - "index_df2": index of the record in the second dataset;
  - "similarity": computed similarity score.
Make sure the variable linked_records is declared and contains the correct linked employee pairs.
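The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the reference solution: the sample records, the names records_1, records_2, composite_key, and THRESHOLD are invented here, and the datasets are represented as plain lists of dictionaries so that the example runs with the standard library alone.

```python
from difflib import SequenceMatcher

# Hypothetical sample data; the real challenge supplies its own datasets.
records_1 = [
    {"first_name": "John", "last_name": "Smith", "city": "New York"},
    {"first_name": "Maria", "last_name": "Garcia", "city": "Chicago"},
]
records_2 = [
    {"first_name": "Jon", "last_name": "Smith", "city": "New York"},
    {"first_name": "Maria", "last_name": "Garcia", "city": "chicago"},
    {"first_name": "Peter", "last_name": "Novak", "city": "Boston"},
]

THRESHOLD = 0.80  # minimum similarity for two records to count as linked


def composite_key(record):
    """Join first_name, last_name and city with spaces, then lowercase."""
    parts = (record["first_name"], record["last_name"], record["city"])
    return " ".join(str(part) for part in parts).lower()


# Build the keys once so they are not recomputed inside the nested loop.
keys_1 = [composite_key(record) for record in records_1]
keys_2 = [composite_key(record) for record in records_2]

linked_records = []
for i, key_1 in enumerate(keys_1):
    for j, key_2 in enumerate(keys_2):
        # ratio() returns a similarity score between 0.0 and 1.0.
        score = SequenceMatcher(None, key_1, key_2).ratio()
        if score >= THRESHOLD:
            linked_records.append(
                {"index_df1": i, "index_df2": j, "similarity": score}
            )

for pair in linked_records:
    print(pair)
```

If the datasets are pandas DataFrames (which the key names index_df1 and index_df2 suggest, though that is an assumption), the positions from enumerate would be replaced by the DataFrame index values. The nested loop compares every pair of records, which is fine for small datasets but grows with the product of the two dataset sizes.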