Keys, IDs, and Surrogate Keys
Svep för att visa menyn
A key column does one job: it links a row in one table to the correct row in another.
When the Data Model sees CustomerID C003 in a Sales row, it follows the link to the Customers table and finds the one row where CustomerID is also C003. From there it reads the name, region, and segment. This lookup happens automatically — but only if the key column is reliable.
Two conditions must both be true:
- Unique on the dimension side: every CustomerID in Customers must appear exactly once;
- Matching values: the ID in Sales must exist in Customers with the same value and the same data type.
If either condition fails, the relationship will error or produce silently wrong results.
Primary Keys and Foreign Keys
Always check the dimension side for uniqueness. If the primary key is not unique, Power Pivot will refuse to create the relationship.
Surrogate keys
A surrogate key is a new ID column you create when the source data has no reliable natural key. You invent the values — they do not come from the source system.
When to create one:
- No ID column exists in the source data;
- The only way to uniquely identify a row is by combining two or more columns;
- The natural key is a label (a name or description) rather than a stable identifier.
How to create one in Excel:
- Add a new column to the left of the table. Name it clearly:
ProductID,CustomerID; - In the first data row, enter:
="P"&TEXT(ROW()-1,"000")— produces P001, P002, P003 automatically; - Copy the formula down the full column;
- Paste Special → Values to freeze the IDs as fixed text. Surrogate keys must not stay as formulas — adding or removing rows would recalculate and break them.
The main goal in this chapter is to get comfortable with recognizing, understanding and creating Primary, Foreign and Surrogate Keys.
Your Task is to first identify the main issues with the keys in both of the Sheets in the Workbook and consider solutions.
For the 'Customers_Raw' Sheet, answer the following questions:
- Is this a fact or dimension table? Why?
- Which column(s), if any, could potentially function as either a primary or foreign key? Why?
- If there are none, what surrogate key would you suggest for this table?
For the 'SalesLines_Raw' Sheet, answer the following questions:
- Is this a fact or dimension table? Why?
- If it's neither, what steps would you undertake to fix that?
1. Which of the following is a good characteristic of a primary key?
2. When should you create a surrogate key?
3. In a relationship, what must match between the foreign key and primary key?
Tack för dina kommentarer!
Fråga AI
Fråga AI
Fråga vad du vill eller prova någon av de föreslagna frågorna för att starta vårt samtal