Correlation Analysis
Correlation analysis is a statistical technique used to measure the strength and direction of a relationship between two numeric variables. It helps us understand how changes in one variable are associated with changes in another.
What is Correlation?
A correlation coefficient (usually represented as r) ranges between -1 and +1:
- +1: perfect positive correlation;
- 0: no linear correlation;
- −1: perfect negative correlation.
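The three reference values can be reproduced with a few toy vectors. This is a minimal sketch on made-up data (the names x, y_up, y_down and x_sym are illustrative and not part of the lesson's dataset). The last case suggests why 0 is best read as "no linear relationship": the two variables are clearly related, yet r comes out at zero.

```r
# Toy vectors (hypothetical data, not taken from the lesson's df)
x <- 1:10
y_up <- 2 * x + 3         # rises exactly linearly with x
y_down <- -0.5 * x + 20   # falls exactly linearly with x

r_up <- cor(x, y_up)
r_down <- cor(x, y_down)

# A symmetric, purely quadratic relationship: strongly related, but not linearly
x_sym <- -3:3
r_zero <- cor(x_sym, x_sym^2)

# Compare with a tolerance, since floating-point results can differ
# from exactly 1, -1 or 0 by a tiny amount
print(isTRUE(all.equal(r_up, 1)))
print(isTRUE(all.equal(r_down, -1)))
print(abs(r_zero) < 1e-12)
```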
There are several correlation methods, but Pearson correlation is the most commonly used for continuous numeric data, and it is the default method of cor() in R.
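To make the difference between methods concrete, here is a small sketch using the method argument of cor(), which accepts "pearson", "spearman" and "kendall". The data are made up for illustration: y increases with x, but not along a straight line, so the rank-based methods should report a perfect monotonic relationship while Pearson stays below 1.

```r
# Hypothetical toy data: y grows monotonically but not linearly with x
x <- 1:10
y <- x^3

pearson_r  <- cor(x, y, method = "pearson")   # the default method
spearman_r <- cor(x, y, method = "spearman")  # based on ranks
kendall_r  <- cor(x, y, method = "kendall")   # based on concordant/discordant pairs

print(c(pearson = pearson_r, spearman = spearman_r, kendall = kendall_r))
```

If the relationship in your data looks curved or contains strong outliers, comparing Pearson with a rank-based method like this is one quick sanity check.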
Correlation Between Two Variables
cor(df$selling_price, df$km_driven) # Selling price vs kilometers driven
cor(df$mileage, df$max_power) # Mileage vs power
Each call returns a single value between -1 and 1 that indicates the strength and direction of the linear relationship.
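One practical caveat with two-variable calls: with its default setting (use = "everything"), cor() returns NA as soon as either vector contains a missing value. The sketch below uses two short made-up vectors (price and km are hypothetical stand-ins, not the lesson's columns) to show the effect and one way around it.

```r
# Hypothetical toy vectors with one missing value
price <- c(10, 12, NA, 15, 18)
km    <- c(90, 80, 70, 60, 40)

r_default <- cor(price, km)                           # NA because of the missing value
r_complete <- cor(price, km, use = "complete.obs")    # drops the incomplete pair first

print(r_default)
print(r_complete)
```

Dropping incomplete pairs is only one option. Whether it is appropriate depends on why the values are missing in your data.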
Correlation Matrix (Multiple Variables)
You can also examine relationships among several variables using a correlation matrix:
# Select only numeric columns
numeric_df <- df[, c("selling_price", "km_driven", "max_power", "mileage", "engine", "seats")]
# Compute correlation matrix
cor_matrix <- cor(numeric_df, use = "complete.obs") # Ignores any rows with missing data
View(cor_matrix)
The matrix shows pairwise correlation values between all selected numeric variables. This helps in identifying which variables are strongly related.
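Scanning a large matrix by eye gets tedious, so one possible follow-up is to list only the strongly correlated pairs. The sketch below is an illustration under assumptions: it uses R's built-in mtcars dataset as a stand-in for the lesson's data, and the 0.7 cutoff is an arbitrary choice, not a rule.

```r
# mtcars ships with R; it stands in for the lesson's df here
demo_matrix <- cor(mtcars[, c("mpg", "hp", "wt", "disp")], use = "complete.obs")

# Keep each pair once: mask the diagonal and the lower triangle
strong <- abs(demo_matrix) > 0.7 & upper.tri(demo_matrix)
idx <- which(strong, arr.ind = TRUE)

# One row per strongly correlated pair (0.7 is an arbitrary threshold)
strong_pairs <- data.frame(
  var1 = rownames(demo_matrix)[idx[, "row"]],
  var2 = colnames(demo_matrix)[idx[, "col"]],
  r    = round(demo_matrix[idx], 2)
)
print(strong_pairs)
```

The same pattern should work on cor_matrix from the lesson, assuming its columns are all numeric.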
Summary
- Use cor() to measure relationship strength and direction between variables;
- Use a correlation matrix to analyze relationships between several numeric variables simultaneously;
- Always clean and prepare your data before running correlation analysis.