Challenge: Link Employee Records
You are given two employee datasets originating from different internal systems. Each dataset contains employee attributes, but names and cities may differ slightly in formatting or spelling.
Your goal is to link matching employees across both datasets using fuzzy similarity.
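The snippet below shows what a fuzzy similarity score looks like in practice. `SequenceMatcher(None, a, b).ratio()` returns a float between 0 and 1, defined as 2*M/T. Here M is the number of matched characters and T is the total number of characters in both strings. The keys are hypothetical examples and do not come from the challenge datasets.

```python
from difflib import SequenceMatcher

# Hypothetical composite keys: a spelling variant of the same person,
# and a clearly different person.
key_a = "jon smith boston"
key_b = "john smith boston"
key_c = "mary jones denver"

# ratio() = 2*M / T, where M is the number of matched characters and
# T is the total number of characters in both strings.
same_person = SequenceMatcher(None, key_a, key_b).ratio()
different_person = SequenceMatcher(None, key_a, key_c).ratio()

print(round(same_person, 3))      # → 0.97
print(different_person >= 0.80)   # → False
```

A one-letter spelling difference still scores well above a 0.80 cutoff. Unrelated keys fall far below it.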
Follow these steps:
- Create a composite matching key for each record using the fields `first_name`, `last_name`, and `city`.
- Convert all composite keys to lowercase strings, with spaces separating the parts.
- Use the `SequenceMatcher` class from the `difflib` library to compute similarity scores between the composite keys of the two datasets.
- For every employee in the first dataset, find all employees in the second dataset whose similarity score is 0.80 or higher.
- Store all matching pairs in a list named `linked_records`. Each element must be a dictionary containing:
  - `"index_df1"`: index of the record in the first dataset;
  - `"index_df2"`: index of the record in the second dataset;
  - `"similarity"`: the computed similarity score.
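The steps above can be sketched as follows. This is a minimal illustration, not the official solution. It assumes each dataset is a plain list of records whose list position serves as the index. The sample records and the helper `composite_key` are hypothetical.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.80

# Hypothetical sample data; a record's position in its list is its index.
dataset1 = [
    {"first_name": "Jon", "last_name": "Smith", "city": "Boston"},
    {"first_name": "Maria", "last_name": "Garcia", "city": "Denver"},
]
dataset2 = [
    {"first_name": "John", "last_name": "Smith", "city": "Boston"},
    {"first_name": "Peter", "last_name": "Brown", "city": "Seattle"},
    {"first_name": "Marie", "last_name": "Garcia", "city": "Denver"},
]


def composite_key(record):
    # Hypothetical helper: join first name, last name and city with
    # single spaces, then lowercase the whole string.
    parts = (record["first_name"], record["last_name"], record["city"])
    return " ".join(str(part) for part in parts).lower()


keys1 = [composite_key(r) for r in dataset1]
keys2 = [composite_key(r) for r in dataset2]

# Compare every key in the first dataset with every key in the second
# and keep all pairs that reach the threshold.
linked_records = []
for i, key1 in enumerate(keys1):
    for j, key2 in enumerate(keys2):
        score = SequenceMatcher(None, key1, key2).ratio()
        if score >= THRESHOLD:
            linked_records.append(
                {"index_df1": i, "index_df2": j, "similarity": score}
            )

for pair in linked_records:
    print(pair["index_df1"], pair["index_df2"], round(pair["similarity"], 3))
# → 0 0 0.97
# → 1 2 0.947
```

The `index_df1`/`index_df2` names suggest the challenge supplies pandas DataFrames, but that is an assumption. If it does, swap `enumerate(...)` for `df.iterrows()`. The stored indices then become the DataFrame's index labels, and `composite_key` works unchanged on each row. The nested loop compares every pair, so its cost grows as n·m. That is fine for small datasets, but larger ones would need a blocking step to limit the candidate pairs.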
Make sure the variable `linked_records` is declared and contains the correct linked employee pairs.