Challenge: Link Employee Records
You are given two employee datasets originating from different internal systems. Each dataset contains employee attributes, but names and cities may differ slightly in formatting or spelling.
Your goal is to link matching employees across both datasets using fuzzy similarity.
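To make "fuzzy similarity" concrete, here is a minimal sketch of how difflib's SequenceMatcher scores two strings. The helper name similarity and the sample strings are made up for illustration and are not part of the exercise data.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity score between 0.0 and 1.0 for two strings."""
    # The first argument is the "junk" filter; None means no characters are ignored.
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical keys: an exact match, a one-letter typo, and an unrelated person.
same = similarity("anna schmidt berlin", "anna schmidt berlin")
typo = similarity("anna schmidt berlin", "anna schmit berlin")
other = similarity("anna schmidt berlin", "john doe paris")

print(same, typo, other)
```

Identical strings score 1.0, a small spelling difference stays close to 1.0, and unrelated strings score much lower, which is why a cutoff such as 0.80 can separate likely matches from non-matches.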
Follow these steps:
- Create a composite matching key for each record using the fields first_name, last_name, and city.
- Convert all composite keys to lowercase strings with spaces separating the parts.
- Use the SequenceMatcher class from the difflib library to compute similarity scores between composite keys in the two datasets.
- For every employee in the first dataset, find all employees in the second dataset whose similarity score is 0.80 or higher.
- Store all matching pairs in a list named linked_records. Each element must be a dictionary containing:
  - "index_df1": index of the record in the first dataset;
  - "index_df2": index of the record in the second dataset;
  - "similarity": computed similarity score.
Make sure the variable linked_records is declared and contains the correct linked employee pairs.
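The steps above can be sketched as follows. This is one possible approach and not the official solution. It assumes each dataset is a plain list of dictionaries, with the list position standing in for the record index. The names records_1, records_2, composite_key, and THRESHOLD, as well as the sample rows, are hypothetical. If the exercise supplies pandas DataFrames, the same loop would run over their rows and index labels instead.

```python
from difflib import SequenceMatcher

# Hypothetical sample data; the real datasets are provided by the exercise.
records_1 = [
    {"first_name": "Anna", "last_name": "Schmidt", "city": "Berlin"},
    {"first_name": "John", "last_name": "Doe", "city": "Paris"},
]
records_2 = [
    {"first_name": "anna", "last_name": "Schmit", "city": "berlin"},
    {"first_name": "Maria", "last_name": "Lopez", "city": "Madrid"},
]

THRESHOLD = 0.80  # minimum similarity required to link two records


def composite_key(record: dict) -> str:
    """Join first_name, last_name, and city into one lowercase, space-separated key."""
    parts = (record["first_name"], record["last_name"], record["city"])
    return " ".join(str(part).lower() for part in parts)


# Build the keys once so they are not recomputed inside the nested loop.
keys_1 = [composite_key(r) for r in records_1]
keys_2 = [composite_key(r) for r in records_2]

linked_records = []
for i, key_1 in enumerate(keys_1):
    for j, key_2 in enumerate(keys_2):
        score = SequenceMatcher(None, key_1, key_2).ratio()
        if score >= THRESHOLD:
            linked_records.append(
                {"index_df1": i, "index_df2": j, "similarity": score}
            )

print(linked_records)
```

Every record in the first dataset is compared with every record in the second, so the work grows with the product of the two dataset sizes. That is fine for small inputs like those in an exercise, but larger datasets would usually need a way to narrow the candidate pairs first.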