Correlation Analysis
Correlation analysis is a statistical technique used to measure the strength and direction of a relationship between two numeric variables. It helps us understand how changes in one variable are associated with changes in another.
What is Correlation?
A correlation coefficient (usually represented as r) ranges between -1 and +1:
- +1: perfect positive correlation;
- 0: no linear correlation;
- -1: perfect negative correlation.
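The three reference values can be reproduced with small synthetic vectors. This is only an illustrative sketch: the vectors below are made up and are not part of the lesson's dataset.

```r
# Synthetic vectors, invented purely for illustration
x <- -5:5

y_up   <- 2 * x + 3      # rises exactly linearly with x
y_down <- -0.5 * x + 7   # falls exactly linearly with x
y_sq   <- x^2            # depends on x, but not linearly

r_up   <- cor(x, y_up)    # perfect positive correlation
r_down <- cor(x, y_down)  # perfect negative correlation
r_sq   <- cor(x, y_sq)    # no linear correlation

print(c(r_up, r_down, r_sq))
```

The last case shows why a coefficient of 0 only rules out a linear relationship: y_sq is fully determined by x, yet r is 0 because the pattern is symmetric rather than linear.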
There are several correlation methods, but Pearson correlation is the one most commonly used for continuous numeric data, and it is the default method of the cor() function in R.
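As a rough sketch of how the method is chosen, cor() accepts a method argument with the values "pearson", "spearman", and "kendall". The vectors here are synthetic and only meant to show how the methods can disagree on a relationship that is monotonic but not linear.

```r
# Synthetic data: y always increases with x, but not along a straight line
x <- c(1, 2, 3, 4, 5, 6)
y <- x^3

pearson  <- cor(x, y, method = "pearson")   # measures linear association
spearman <- cor(x, y, method = "spearman")  # rank-based
kendall  <- cor(x, y, method = "kendall")   # rank-based (concordant pairs)

print(c(pearson = pearson, spearman = spearman, kendall = kendall))
```

The rank-based methods report a perfect relationship because the ordering of y matches the ordering of x exactly, while Pearson stays below 1 because the points do not fall on a line.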
Correlation Between Two Variables
cor(df$selling_price, df$km_driven) # Selling price vs kilometers driven
cor(df$mileage, df$max_power) # Mileage vs power
Each call returns a single value between -1 and 1, indicating the strength and direction of the linear relationship.
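One thing to watch for with two-variable calls is missing data: with its default settings, cor() returns NA if either vector contains an NA. The sketch below uses hypothetical price and km vectors (not the lesson's df) to show the effect of the use argument.

```r
# Hypothetical values, invented for illustration
price <- c(450, 370, NA, 520, 300)
km    <- c(30, 60, 45, 20, 90)

r_default  <- cor(price, km)                        # NA, because price has a missing value
r_complete <- cor(price, km, use = "complete.obs")  # drops the incomplete pair first

print(r_default)
print(r_complete)
```

Here use = "complete.obs" gives the same result as removing the third observation from both vectors by hand.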
Correlation Matrix (Multiple Variables)
You can also examine relationships among several variables using a correlation matrix:
# Select only numeric columns
numeric_df <- df[, c("selling_price", "km_driven", "max_power", "mileage", "engine", "seats")]
# Compute correlation matrix
cor_matrix <- cor(numeric_df, use = "complete.obs") # Ignores any rows with missing data
View(cor_matrix)
The matrix shows pairwise correlation values between all selected numeric variables. This helps in identifying which variables are strongly related.
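One possible way to pull the strongly related pairs out of a matrix is sketched below. It uses R's built-in mtcars dataset as a stand-in for df, and the 0.7 cutoff is an arbitrary assumption rather than a rule from the lesson.

```r
# mtcars ships with R; it stands in for the lesson's df here
cor_matrix <- cor(mtcars[, c("mpg", "hp", "wt", "disp")])

# Keep each pair once by blanking the diagonal and the lower triangle
upper <- cor_matrix
upper[lower.tri(upper, diag = TRUE)] <- NA

# Flatten the matrix into a long table with one row per variable pair
pair_table <- as.data.frame(as.table(upper), stringsAsFactors = FALSE)
names(pair_table) <- c("var1", "var2", "r")
pair_table <- pair_table[!is.na(pair_table$r), ]

# Keep pairs above the assumed cutoff, strongest first
cutoff <- 0.7
strong <- pair_table[abs(pair_table$r) > cutoff, ]
strong <- strong[order(-abs(strong$r)), ]

print(strong)
```

Filtering on the absolute value keeps strong negative relationships as well as strong positive ones.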
Summary
- Use cor() to measure relationship strength and direction between variables;
- Use a correlation matrix to analyze relationships between several numeric variables simultaneously;
- Always clean and prepare your data before running correlation analysis.
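A minimal preparation check might look like the sketch below. The cars data frame and its columns are hypothetical, chosen only to resemble the kind of data used in the lesson. cor() expects numeric input, so the idea is to keep the numeric columns and see how many values are missing before computing anything.

```r
# Hypothetical data frame standing in for df
cars <- data.frame(
  selling_price = c(450, 370, 520, 300),
  km_driven     = c(30, 60, NA, 90),
  fuel          = c("Petrol", "Diesel", "Petrol", "Diesel"),
  stringsAsFactors = FALSE
)

# Keep only the numeric columns
is_num <- sapply(cars, is.numeric)
numeric_cars <- cars[, is_num, drop = FALSE]

# Count missing values per column to decide how to handle them
na_counts <- colSums(is.na(numeric_cars))
print(na_counts)

# Compute the matrix on complete rows only
cor_matrix <- cor(numeric_cars, use = "complete.obs")
print(cor_matrix)
```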