Summary  
This chapter explains how to create histograms in R with ggplot2 to explore the distribution of a single numeric variable: dividing the data into bins, adjusting the bin width, and customizing the plot's appearance.

General domain of usage  
Sleep pattern analysis

Histograms are a fundamental tool in data visualization for exploring the distribution of a single numeric variable. By dividing the range of data into intervals, or **"bins,"** and counting how many observations fall into each bin, histograms provide a clear picture of where values are concentrated, where there are gaps, and whether the distribution is skewed or symmetric. This makes them especially useful for understanding the **spread**, **central tendency**, and **shape** of data, allowing you to quickly spot patterns such as clusters, outliers, or unusual peaks.
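
To make the binning step concrete, the sketch below does by hand what a histogram computes before any bars are drawn. It splits the range of `iris$Sepal.Length` into equal-width intervals with base R's `cut()` and counts the observations in each interval with `table()`. The bin width of 0.3 mirrors the ggplot2 example that follows; the names `binwidth`, `breaks`, `bins`, and `counts` are illustrative. ggplot2 positions its bin edges by its own rules, so these counts may not match its bars exactly.

```r
# By-hand binning: what a histogram computes before drawing any bars
x <- iris$Sepal.Length
binwidth <- 0.3

# Bin edges aligned to multiples of binwidth, covering the full range of x
breaks <- seq(from = floor(min(x) / binwidth) * binwidth,
              to   = ceiling(max(x) / binwidth) * binwidth,
              by   = binwidth)

# Assign each value to a left-closed interval [a, b); with right = FALSE,
# include.lowest = TRUE also closes the last interval so max(x) is kept
bins <- cut(x, breaks = breaks, include.lowest = TRUE, right = FALSE)

# Count the observations in each bin: these counts are the bar heights
counts <- table(bins)
print(counts)
```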

```r
library(ggplot2)

# Using the iris dataset to visualize the distribution of Sepal.Length
ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(binwidth = 0.3, fill = "steelblue", color = "black") +
  labs(title = "Distribution of Sepal Length in Iris Dataset",
       x = "Sepal Length (cm)",
       y = "Count")
```

The ggplot2 code works as follows:

- The `ggplot()` function initializes the plot and specifies the data to use. Here, `iris` is the dataset, and `aes(x = Sepal.Length)` sets the variable to plot on the x-axis;
- The `geom_histogram()` function creates the histogram. The `binwidth` argument controls the width of the bins; in this example, each bin covers 0.3 units of Sepal.Length. Adjusting `binwidth` can reveal more or less detail in the distribution;
- The `fill` argument sets the color that fills the bars, and `color` sets the outline color for each bar;
- The `labs()` function adds labels for the plot's title and axes, making the visualization easier to interpret.
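
To see the bins that `geom_histogram()` actually computed, you can store the plot in an object and pass it to ggplot2's `layer_data()`, which returns the computed data for a layer: for a histogram, one row per bin with its edges (`xmin`, `xmax`) and its `count`. The object name `p_iris` is a hypothetical choice for this sketch.

```r
library(ggplot2)

# Store the histogram from the example in an object (name is illustrative)
p_iris <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(binwidth = 0.3, fill = "steelblue", color = "black") +
  labs(title = "Distribution of Sepal Length in Iris Dataset",
       x = "Sepal Length (cm)",
       y = "Count")

# layer_data() builds the plot and returns the computed data for layer 1:
# one row per bin, with bin edges (xmin, xmax) and the number of observations
bins_df <- layer_data(p_iris, i = 1L)
print(bins_df[, c("xmin", "xmax", "count")])
```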

Choosing an appropriate `binwidth` is important: a smaller binwidth shows more detail but can be noisy, while a larger binwidth smooths the distribution but may hide important features. Experimenting with different binwidth values helps you find a balance that best represents your data.
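
One way to explore this trade-off is to compare a few candidate widths side by side. The sketch below also computes a starting value with the Freedman-Diaconis rule, binwidth = 2 × IQR / n^(1/3), which is one common heuristic rather than something this chapter prescribes; the other candidate widths are arbitrary. Note that `geom_histogram()` also accepts `bins` (a number of bins) in place of `binwidth`.

```r
library(ggplot2)

x <- iris$Sepal.Length

# Freedman-Diaconis rule (a heuristic starting point, not a definitive choice)
fd_width <- 2 * IQR(x) / length(x)^(1/3)
print(fd_width)

# Count how many bins ggplot2 creates for each candidate width
candidate_widths <- c(0.1, 0.3, fd_width, 0.8)
n_bins <- sapply(candidate_widths, function(w) {
  p_w <- ggplot(iris, aes(x = Sepal.Length)) + geom_histogram(binwidth = w)
  nrow(layer_data(p_w))
})

print(data.frame(binwidth = round(candidate_widths, 3), n_bins = n_bins))
```

Plotting the same data with each width and comparing the shapes usually tells you more than any single rule.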

When you examine the resulting histogram, you can learn a lot about the distribution of Sepal.Length in the iris dataset. For instance, you might notice that most values are concentrated between 5 and 7 centimeters, with fewer flowers having very short or very long sepals. A roughly symmetric histogram means the values are balanced around the center; a skewed histogram has one longer tail, meaning more extreme values lie in that direction. Peaks in the histogram may indicate common measurements or distinct groups within the data. By interpreting these features, you gain insight into typical sepal lengths and their overall variability; because all three iris species are pooled in one histogram, separate peaks can also hint at differences between species.
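
The visual impressions described above can be cross-checked with a few numbers. This sketch computes the share of sepal lengths between 5 and 7 cm, compares the mean with the median (a mean well above the median often hints at right skew, well below at left skew), and computes per-species means, since groups with different centers can produce separate peaks. It complements the histogram rather than replacing it; the variable names are illustrative.

```r
x <- iris$Sepal.Length

# Share of flowers with a sepal length between 5 and 7 cm (inclusive)
share_5_to_7 <- mean(x >= 5 & x <= 7)

# Mean vs. median: a large gap between them suggests skew
center <- c(mean = mean(x), median = median(x))

# Per-species means: well-separated centers can show up as separate peaks
species_means <- tapply(x, iris$Species, mean)

print(share_5_to_7)
print(center)
print(species_means)
```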

```r
library(testthat)
library(ggplot2)

# Load the student's code
source("user_code.R")

# Standard helper that reports each check as a single pass/fail result
check_logic <- function(test_name, condition, error_msg) {
  test_that(test_name, {
    if (isTRUE(condition)) {
      expect_true(TRUE)
    } else {
      fail(error_msg)
    }
  })
}

# --- OBJECT UNDER TEST ---
plt <- if (exists("p")) p else last_plot()

# --- TESTS ---

# 1. Check that the plot object exists and has a histogram layer
ok1 <- !is.null(plt) && inherits(plt, "ggplot")
ok1_geom <- FALSE
if (ok1) {
  # In ggplot2, a histogram is GeomBar combined with StatBin;
  # vapply() returns logical(0) for a plot without layers, so any() gives FALSE
  ok1_geom <- any(vapply(plt$layers, function(l) {
    inherits(l$stat, "StatBin")
  }, logical(1)))
}

check_logic(
  "1. Object p is a ggplot histogram",
  ok1 && ok1_geom,
  "Ensure you created an object named 'p' and used geom_histogram()."
)

# 2. Check that binwidth (or bins) was set explicitly
ok2 <- FALSE
if (ok1 && ok1_geom) {
  # Find the histogram layer and check that binwidth is not left at its default (NULL).
  # If neither binwidth nor bins is given, ggplot2 falls back to 30 bins and prints a message.
  for (l in plt$layers) {
    if (inherits(l$stat, "StatBin")) {
      params <- l$stat_params
      # Accept either binwidth OR bins
      ok2 <- !is.null(params$binwidth) || !is.null(params$bins)
    }
  }
}

check_logic(
  "2. Binwidth or bins parameter is specified",
  ok2,
  "You must specify a 'binwidth' (or 'bins') inside geom_histogram() to define the bar ranges."
)

# 3. Check the title and axis labels
ok3 <- FALSE
if (ok1) {
  l <- plt$labels
  # identical() yields FALSE (not a zero-length value that breaks &&) when a label is missing
  ok3 <- identical(l$title, "Distribution of Total Sleep Time in Mammals") &&
         identical(l$x, "Total Sleep Time (hours)") &&
         identical(l$y, "Count")
}

check_logic(
  "3. Title and axis labels are correct",
  ok3,
  "Check your labs(): title, x, and y labels must match the instructions exactly."
)
```

test_main.R

A hands-on project-based course guiding learners through the process of exploring, analyzing, and visualizing the msleep dataset using R and ggplot2. Learners will develop practical skills in data visualization, from basic plots to advanced customization and storytelling with data.

This section contains all hands-on tasks for the Data Visualization Project with R and ggplot2, focusing on the msleep dataset.

Visualizing Sleep Patterns with Histograms

Solution
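
A minimal sketch of a solution, assuming the task asks for an object named `p` that plots `sleep_total` from the `msleep` dataset (bundled with ggplot2) and uses the title and axis labels checked in `test_main.R`. The 1-hour `binwidth` and the colors are illustrative choices, not taken from the task.

```r
library(ggplot2)

# msleep ships with ggplot2; sleep_total is total daily sleep in hours
p <- ggplot(msleep, aes(x = sleep_total)) +
  # 1-hour bins: an illustrative binwidth, adjust to taste
  geom_histogram(binwidth = 1, fill = "steelblue", color = "black") +
  labs(title = "Distribution of Total Sleep Time in Mammals",
       x = "Total Sleep Time (hours)",
       y = "Count")
```

Printing `p` (for example, typing `p` at the console) displays the histogram.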