Learn Challenge: Link Employee Records | Record Linkage Techniques
Data Cleaning Techniques in Python

Challenge: Link Employee Records

Task

You are given two employee datasets originating from different internal systems. Each dataset contains employee attributes, but names and cities may differ slightly in formatting or spelling.

Your goal is to link matching employees across both datasets using fuzzy similarity.
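
To make "fuzzy similarity" concrete, here is a minimal sketch comparing two near-identical strings with difflib's SequenceMatcher. The two sample values are hypothetical and are not taken from the challenge datasets.

```python
from difflib import SequenceMatcher

# Hypothetical values, chosen only to illustrate a near-match.
name_a = "john smith boston"
name_b = "jon smith boston"

# An exact comparison treats the two strings as different records.
print(name_a == name_b)

# ratio() returns a float in [0, 1]; 1.0 means the strings are identical.
score = SequenceMatcher(None, name_a, name_b).ratio()
print(round(score, 2))
```

An exact comparison rejects this pair, while the similarity score stays close to 1, which is why a threshold on the score can recover matches that differ only by small spelling variations.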

Follow these steps:

  1. Create a composite matching key for each record using the fields first_name, last_name, and city.
  2. Convert all composite keys to lowercase strings with spaces separating the parts.
  3. Use the SequenceMatcher class from the difflib library to compute similarity scores between composite keys in the two datasets.
  4. For every employee in the first dataset, find all employees in the second dataset whose similarity score is 0.80 or higher.
  5. Store all matching pairs in a list named linked_records. Each element must be a dictionary containing:
    • "index_df1" — index of the record in the first dataset;
    • "index_df2" — index of the record in the second dataset;
    • "similarity" — computed similarity score.

Make sure the variable linked_records is declared and contains the correct linked employee pairs.

Solution
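
One possible way to implement the steps above is sketched below. It is not the course's reference solution. The sample records and the helper names composite_key and link_records are assumptions made for illustration; only the field names, the 0.80 threshold, the linked_records variable, and the dictionary keys come from the task.

```python
from difflib import SequenceMatcher

# Hypothetical sample data; the challenge's real datasets are not shown here.
employees_1 = [
    {"first_name": "John", "last_name": "Smith", "city": "Boston"},
    {"first_name": "Maria", "last_name": "Garcia", "city": "Chicago"},
]
employees_2 = [
    {"first_name": "Jon", "last_name": "Smith", "city": "boston"},
    {"first_name": "Peter", "last_name": "Novak", "city": "Denver"},
]


def composite_key(record):
    """Join first_name, last_name and city into one lowercase, space-separated string."""
    parts = (record["first_name"], record["last_name"], record["city"])
    return " ".join(str(part) for part in parts).lower()


def link_records(rows_1, rows_2, threshold=0.80):
    """Return every cross-dataset pair whose key similarity is >= threshold.

    rows_1 and rows_2 are iterables of (index, record) pairs.
    """
    # Build the second dataset's keys once so they are not recomputed per row.
    keys_2 = [(idx, composite_key(rec)) for idx, rec in rows_2]
    linked = []
    for idx_1, rec_1 in rows_1:
        key_1 = composite_key(rec_1)
        for idx_2, key_2 in keys_2:
            score = SequenceMatcher(None, key_1, key_2).ratio()
            if score >= threshold:
                linked.append(
                    {"index_df1": idx_1, "index_df2": idx_2, "similarity": score}
                )
    return linked


linked_records = link_records(enumerate(employees_1), enumerate(employees_2))
print(linked_records)
```

The sketch takes (index, record) pairs so that it runs on plain lists via enumerate. If the datasets are pandas DataFrames, df1.iterrows() and df2.iterrows() yield pairs of the same shape and could be passed in instead. Comparing every record with every other record is quadratic in the dataset sizes, which is acceptable for small exercises but slow for large tables.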

Section 3. Chapter 3