Analyzing and Visualizing Real-World Data
Finding Relation
Based on the graph, the peak in sales right before the end of the year does not appear to be random: all of the top 5 stores show the same pattern. Now, let's explore a parameter that we haven't examined yet and check whether there is a correlation between outside temperature and store sales. Our hypothesis is that colder weather leads to fewer visitors and therefore lower sales. To test it, we will create a scatter plot of temperature vs. weekly sales. Since the dataset contains more than 6,000 records, we will only use the data for the 5 best-selling stores.
# Loading the libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Reading the data
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/72be5dde-f3e6-4c40-8881-e1d97ae31287/shops_data3.csv')
df['Date'] = pd.to_datetime(df['Date'], dayfirst = True)

# Preparing data
top_stores = [20, 4, 14, 13, 2]
data = df.loc[df['Store'].isin(top_stores)]

# Initializing a scatter plot
sns.scatterplot(x = 'Temperature', y = 'Weekly_Sales', data = data, alpha = 0.5)

# Displaying the plot
plt.show()
However, there appears to be no clear correlation between temperature and weekly sales: most of the data points are concentrated in the lower part of the plot, and only a few points with significantly higher sales appear in the upper part. What are these points? Let's find out.
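Before looking at those points, the visual impression from the scatter plot can also be checked numerically. The following is a minimal sketch, not part of the original lesson: the helper name `temperature_sales_correlation` is hypothetical, and it assumes a DataFrame with the `Store`, `Temperature`, and `Weekly_Sales` columns used above. It computes the Pearson correlation coefficient for all rows together and for each store separately; values close to 0 would be consistent with the absence of a linear relationship.

```python
import pandas as pd


def temperature_sales_correlation(frame):
    """Return (overall, per_store) Pearson correlations.

    Hypothetical helper for illustration. Assumes `frame` has the columns
    'Store', 'Temperature', and 'Weekly_Sales', as in the lesson's dataset.
    """
    # Correlation computed over all rows at once
    overall = frame['Temperature'].corr(frame['Weekly_Sales'])

    # Correlation computed separately within each store
    per_store = (
        frame.groupby('Store')[['Temperature', 'Weekly_Sales']]
        .apply(lambda g: g['Temperature'].corr(g['Weekly_Sales']))
    )
    return overall, per_store
```

With the `data` frame prepared earlier, this could be called as `overall, per_store = temperature_sales_correlation(data)`. Computing the coefficient per store is a design choice: stores differ in their baseline sales, so pooling them can hide or distort a relationship that exists within a single store.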