Hands-On Data Visualization with ggplot2 in R
Section 1. Chapter 4

Histograms, Density Plots, and Boxplots

Visualizing the distribution of your data is a key step in exploratory data analysis. In R with ggplot2, you can make three kinds of simple charts to look at your data: histograms, density plots, and boxplots. Each chart shows your numbers in a different way and helps you spot patterns or unusual values.

A histogram looks like a bar chart. It splits the range of your numbers into equal-width intervals called bins and draws one bar per bin, with the bar's height showing how many values fall into that bin. This helps you see where most of your values are and whether any ranges contain very few values. To make a histogram in ggplot2, you use the geom_histogram() layer.

Note

Parameters explained:

  • binwidth = 2: sets the width of each bar (the "range" of data it covers); in this case, each bar groups together cars that fall within a 2-mile-per-gallon range;

  • fill = "steelblue": determines the inside color of the bars;

  • color = "white": determines the color of the outline (the borders) of the bars, which helps separate them visually.

library(ggplot2)

ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2, fill = "steelblue", color = "white") +
  labs(title = "Histogram of Miles Per Gallon", x = "Miles Per Gallon (mpg)", y = "Count")
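To see the grouping behind the bars, you can ask ggplot2 for the values it computed. The sketch below rebuilds the same mtcars histogram, stores it in a variable (the names p and bins are chosen here only for illustration), and uses layer_data() to list each bin's boundaries and count.

```r
library(ggplot2)

# Same histogram as in the lesson, stored in a variable so it can be inspected.
# The names `p` and `bins` are illustrative only.
p <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2, fill = "steelblue", color = "white")

# layer_data() returns the data ggplot2 computed for a layer:
# one row per bin, with its boundaries and the number of cars in it.
bins <- layer_data(p, 1)
print(bins[, c("xmin", "xmax", "count")])

# Every car falls into exactly one bin, so the counts add up to the row count.
print(sum(bins$count) == nrow(mtcars))
```

Changing binwidth and rerunning the sketch shows how wider bins merge neighboring counts and narrower bins split them.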

Density plots help you see the overall shape of your data in a smooth way, without the blocky look of a histogram. Instead of showing bars, a density plot draws a curve that shows where your values are most common. The higher the curve, the more data points are in that area. You can use the geom_density() function in ggplot2 to make a density plot and easily spot patterns or groups in your data.

Note

The alpha parameter controls the transparency of the fill color, where 0 is completely see-through and 1 is fully opaque.

library(ggplot2)
ggplot(mtcars, aes(x = mpg)) +
  geom_density(fill = "lightgreen", alpha = 0.6) +
  labs(title = "Density Plot of Miles Per Gallon", x = "Miles Per Gallon (mpg)", y = "Density")
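Transparency becomes especially useful when several curves overlap. As a minimal sketch, the example below maps the fill color to the number of cylinders in mtcars, so ggplot2 draws one density curve per group. The grouping by cyl and the variable name density_by_cyl are choices made for this illustration, not part of the lesson's original code. The adjust argument scales the smoothing bandwidth: values below 1 give a wigglier curve, values above 1 a smoother one.

```r
library(ggplot2)

# factor(cyl) turns the numeric cylinder count into a grouping variable,
# so ggplot2 draws one density curve per cylinder group.
density_by_cyl <- ggplot(mtcars, aes(x = mpg, fill = factor(cyl))) +
  geom_density(alpha = 0.6, adjust = 1) +  # adjust = 1 keeps the default bandwidth
  labs(title = "Density of Miles Per Gallon by Cylinders",
       x = "Miles Per Gallon (mpg)",
       y = "Density",
       fill = "Cylinders")

density_by_cyl
```

With alpha = 0.6, the regions where the groups overlap stay visible instead of being hidden by the curve drawn last.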

To summarize the distribution and highlight outliers, boxplots are especially effective. A boxplot displays your data using several simple parts:

  • The box shows where the middle half of your data falls, stretching from the first quartile (25th percentile) to the third quartile (75th percentile);
  • The line inside the box marks the median, or the middle value of your data;
  • The whiskers extend from each end of the box to the most extreme value that lies within 1.5 times the box length (the interquartile range, IQR) of that end;
  • Any dots beyond the whiskers represent outliers: values that are much higher or lower than the rest.

This makes it easy to compare distributions across groups and quickly spot unusual values. In ggplot2, you can use geom_boxplot() to create a boxplot.

library(ggplot2)
ggplot(mtcars, aes(y = mpg)) +
  geom_boxplot(fill = "orange", color = "brown") +
  labs(title = "Boxplot of Miles Per Gallon", y = "Miles Per Gallon (mpg)")
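The parts of a boxplot can also be worked out by hand. The sketch below computes the quartiles, the IQR, and the 1.5 × IQR limits for mpg using base R's quantile() and IQR(), then draws one box per cylinder group to compare distributions. The variable names and the grouping by cyl are assumptions made for this illustration. Quartile definitions differ slightly between tools, so treat the numbers as an approximation of what a given plot draws.

```r
library(ggplot2)

# Quartiles of mpg: the edges of the box and the median line.
q <- unname(quantile(mtcars$mpg, c(0.25, 0.5, 0.75)))
iqr <- IQR(mtcars$mpg)  # box length: third quartile minus first quartile

# Whiskers stop at the most extreme values inside these limits.
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[3] + 1.5 * iqr

# Values beyond the limits would be drawn as individual outlier dots.
outliers <- mtcars$mpg[mtcars$mpg < lower_fence | mtcars$mpg > upper_fence]
print(c(q1 = q[1], median = q[2], q3 = q[3], iqr = iqr))
print(outliers)

# One box per cylinder group makes the distributions easy to compare.
box_by_cyl <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(fill = "orange", color = "brown") +
  labs(title = "Miles Per Gallon by Cylinders",
       x = "Cylinders", y = "Miles Per Gallon (mpg)")

box_by_cyl
```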

Each plot type answers a different question about your data, so choose the one that matches what you want to see:

  • Choose a histogram if you want to see how many values fall into different ranges. It shows where most of your data points are and if there are any empty spots;
  • Use a density plot if you want a smooth curve that helps you spot bumps or dips in your data. This makes it easier to see if your data has one group, two groups, or more;
  • Pick a boxplot if you want a simple picture that shows where most of your data sits, what the middle value is, and if there are any unusual points far away from the rest.

Try these different plots to get a clear, simple view of your numbers and to notice anything interesting or unexpected.

Task


Use the provided custom dataset of heights to create three types of distribution plots using ggplot2. Assign each plot to the specified variable name.

  • Build a histogram of the heights and assign it to the variable hist_plot;
  • Build a density plot of the heights and assign it to the variable density_plot;
  • Build a boxplot of the heights and assign it to the variable box_plot;
  • Each plot must use the height column from the heights data frame as the variable to plot.
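The provided heights dataset is not shown on this page, so the sketch below simulates a stand-in data frame with a single height column. The simulated values, the bin width, and the colors are assumptions. Only the variable names hist_plot, density_plot, and box_plot and the height column come from the task itself.

```r
library(ggplot2)

# Hypothetical stand-in for the provided dataset: 200 simulated heights in cm.
# Skip this step when the real `heights` data frame is already loaded.
set.seed(42)
heights <- data.frame(height = rnorm(200, mean = 170, sd = 10))

# Histogram: counts of heights per 5 cm bin (bin width is an assumed choice).
hist_plot <- ggplot(heights, aes(x = height)) +
  geom_histogram(binwidth = 5, fill = "steelblue", color = "white")

# Density plot: smooth curve of the same distribution.
density_plot <- ggplot(heights, aes(x = height)) +
  geom_density(fill = "lightgreen", alpha = 0.6)

# Boxplot: median, quartiles, whiskers, and any outliers.
box_plot <- ggplot(heights, aes(y = height)) +
  geom_boxplot(fill = "orange", color = "brown")

# Printing a plot object renders it.
hist_plot
density_plot
box_plot
```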

