Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Challenge: Conducting Transaction Analysis | Practical Problem Solving with ADF
Introduction to Data Engineering with Azure
course content

Course Content

Introduction to Data Engineering with Azure

Introduction to Data Engineering with Azure

1. Getting Started with Azure and Core Tools
2. Foundations of Azure Data Factory
3. Data Flows and Transformations in ADF
4. Practical Problem Solving with ADF

book
Challenge: Conducting Transaction Analysis

Financial institutions frequently assess the creditworthiness of their customers to enhance risk management and make better-informed decisions. By analyzing debt levels and financial behavior through user transactions, institutions can categorize customers into risk groups.

For example, individuals with a high debt-to-income ratio, indicating they owe significantly more than they earn, might be flagged as Risk Users due to their potential financial instability. In contrast, those with a healthier balance between their income and debt are classified as Normal Users, reflecting a lower likelihood of financial risk.

In the previous challenge, we loaded the cards data, and now the goal is to classify credit card users as either Risk Users or Normal Users based on their custom debt-to-income ratio.

To complete this task, you will need additional data:

  • Transactions data to calculate key metrics such as the average transaction amount, the sum of all transactions, and the chip usage rate (the ratio of transactions made using chips compared to the total number of transactions);
  • Users data to calculate the custom debt-to-income ratio, which is determined by dividing the total debt by the yearly income.

By using these datasets, you can evaluate users' financial behavior and classify them into the appropriate risk category.

Users Table

Transactions Table

As the result of this task you will have two tables - one for risk users and one for normal users. They will look like this.

Risk Users

Normal Users

Algorithm Description

To solve this task, you can use the materials from the third section. Here's a step-by-step guide on how to do the task:

  1. First, load the raw data into the database. This involves reading the CSV files for users, transactions, and cards and populating the respective tables. To do it you should use Script and Copy activities just like in the previous challenge;
  2. After loading, ensure the correct data types are applied by using another Script activity;
  3. Filter the users table to include only users with credit cards. This can be done by joining the users and cards tables on client_id. You should use that cards table that stores only credit cards data (you have created this table in the previous challenge);
  4. Calculate the Credit Score Ratio for the filtered users by dividing total_debt by yearly_income. Based on this ratio:
    • Classify users with a ratio > 50% as Risk Users;
    • Classify users with a ratio ≤ 50% as Normal Users;
  5. Aggregate the transactions data for each user group (Risk Users and Normal Users) to compute the following metrics:
    • Total Transaction Amount: Sum of all transactions;
    • Average Transaction Amount;
    • Chip Usage Ratio: Proportion of chip transactions to total transactions;
  6. Create separate tables for Risk Users and Normal Users in the database and populate them with the aggregated metrics, ensuring all required fields (e.g., client_id, metrics) are included.

By following these steps, you can efficiently process the data and create the required outputs using only dataflow activities. Good luck!

Everything was clear?

How can we improve it?

Thanks for your feedback!

Section 4. Chapter 3
We're sorry to hear that something went wrong. What happened?
some-alt