Learn Correlation Analysis | Basic Statistical Analysis
Data Analysis with R

Correlation Analysis

Correlation analysis is a statistical technique used to measure the strength and direction of a relationship between two numeric variables. It helps us understand how changes in one variable are associated with changes in another.

What is Correlation?

A correlation coefficient (usually represented as r) ranges between -1 and +1:

  • +1: perfect positive correlation;
  • 0: no linear correlation;
  • −1: perfect negative correlation.
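
To make the three reference values concrete, here is a minimal sketch using small made-up vectors (the names x, z and the r_* variables are illustrative and do not come from the lesson's dataset). It also shows that a coefficient of 0 only rules out a linear relationship, not every kind of dependence.

```r
x <- c(1, 2, 3, 4, 5)

# y rises by a fixed amount for every step in x: perfect positive correlation
r_pos <- cor(x, 2 * x + 1)

# y falls by a fixed amount for every step in x: perfect negative correlation
r_neg <- cor(x, -3 * x + 10)

# A symmetric curve: z^2 depends entirely on z, yet the linear correlation is 0
z <- c(-2, -1, 0, 1, 2)
r_zero <- cor(z, z^2)

print(c(positive = r_pos, negative = r_neg, zero = r_zero))
```

Real data rarely reaches these extremes; values in between describe progressively weaker or stronger linear association.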

There are several types of correlation methods, but Pearson correlation is the most commonly used for numeric continuous data in R.
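
As a rough illustration of how the methods differ, cor() accepts a method argument with the values "pearson" (the default), "spearman" and "kendall". The sketch below uses made-up vectors where y always increases with x, but not along a straight line.

```r
x <- c(1, 2, 3, 4, 5, 6)
y <- x^3  # always increasing, but curved rather than linear

pearson  <- cor(x, y, method = "pearson")   # measures linear association
spearman <- cor(x, y, method = "spearman")  # rank-based, measures monotonic association
kendall  <- cor(x, y, method = "kendall")   # rank-based, built on concordant pairs

print(c(pearson = pearson, spearman = spearman, kendall = kendall))
```

Because the ordering of y matches the ordering of x exactly, the two rank-based methods report a perfect relationship, while Pearson's coefficient stays below 1 since the points do not fall on a line.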

Correlation Between Two Variables

cor(df$selling_price, df$km_driven)  # Selling price vs kilometers driven
cor(df$mileage, df$max_power)        # Mileage vs power

Each cor() call returns a single value between -1 and 1: the sign gives the direction of the relationship and the magnitude gives its strength.
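
The calls above assume a data frame df is already loaded. For a self-contained run, the sketch below builds a tiny stand-in with the same column names; the values are invented for illustration and are not the lesson's data. It also shows one thing to watch for: with its default settings, cor() returns NA when either vector contains a missing value.

```r
# Hypothetical stand-in for the lesson's df (values are made up)
df <- data.frame(
  selling_price = c(450000, 370000, 158000, 225000, 130000, 440000),
  km_driven     = c(145500, 120000, 140000, 127000, 120000, 45000),
  mileage       = c(23.4, 21.1, 17.7, 23.0, NA, 20.1),
  max_power     = c(74, 103.5, 78, 90, 88.2, 81.9)
)

r_price_km <- cor(df$selling_price, df$km_driven)

# mileage contains an NA, so the default call cannot produce a number
r_with_na <- cor(df$mileage, df$max_power)

# use = "complete.obs" computes the coefficient from the complete pairs only
r_complete <- cor(df$mileage, df$max_power, use = "complete.obs")

print(c(price_vs_km = r_price_km, default = r_with_na, complete = r_complete))
```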

Correlation Matrix (Multiple Variables)

You can also examine relationships among several variables using a correlation matrix:

# Select only numeric columns
numeric_df <- df[, c("selling_price", "km_driven", "max_power", "mileage", "engine", "seats")]
# Compute correlation matrix
cor_matrix <- cor(numeric_df, use = "complete.obs")  # Ignores any rows with missing data
View(cor_matrix)

The matrix shows pairwise correlation values between all selected numeric variables. This helps in identifying which variables are strongly related.
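
One possible way to pull the strongly related pairs out of a correlation matrix is sketched below. The toy data frame, the 0.7 cutoff and the name strong_pairs are all assumptions made for this example; the right cutoff depends on the analysis.

```r
set.seed(42)
n <- 50
engine <- rnorm(n, mean = 1500, sd = 300)

# Simulated data: max_power is built from engine, seats is unrelated
toy <- data.frame(
  engine    = engine,
  max_power = 0.06 * engine + rnorm(n, sd = 5),
  seats     = sample(4:8, n, replace = TRUE)
)

cor_matrix <- cor(toy, use = "complete.obs")
print(round(cor_matrix, 2))

# The matrix is symmetric with 1s on the diagonal, so look only at the
# upper triangle and keep pairs whose absolute correlation exceeds the cutoff
idx <- which(upper.tri(cor_matrix) & abs(cor_matrix) > 0.7, arr.ind = TRUE)

strong_pairs <- data.frame(
  var1 = rownames(cor_matrix)[idx[, "row"]],
  var2 = colnames(cor_matrix)[idx[, "col"]],
  r    = cor_matrix[idx]
)
print(strong_pairs)
```

Restricting the search to the upper triangle avoids listing each pair twice and skips each variable's trivial correlation with itself.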

Summary

  • Use cor() to measure relationship strength and direction between variables;

  • Use a correlation matrix to analyze relationships between several numeric variables simultaneously;

  • Always clean and prepare your data before running correlation analysis.
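
What cleaning involves depends on the dataset. As one hedged example, suppose numeric columns arrive as text with unit suffixes, an assumption made here for illustration and not something stated in this lesson. They would need converting before cor() can use them; the helper to_number below is hypothetical.

```r
# Hypothetical raw columns stored as text with units (values are made up)
raw <- data.frame(
  mileage   = c("23.4 kmpl", "21.14 kmpl", NA, "17.7 kmpl"),
  max_power = c("74 bhp", "103.52 bhp", "78 bhp", ""),
  stringsAsFactors = FALSE
)

# Keep only digits and the decimal point, then convert to numeric;
# missing and empty entries become NA
to_number <- function(x) as.numeric(gsub("[^0-9.]", "", x))

clean <- data.frame(lapply(raw, to_number))

# Check types and count missing values before computing correlations
print(sapply(clean, is.numeric))
print(colSums(is.na(clean)))
```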
