---
title: "Creating Box Plots with viz_boxplot()"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Creating Box Plots with viz_boxplot()}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5,
  warning = FALSE,
  message = FALSE
)
```

## 📖 Introduction

The `viz_boxplot()` function creates interactive box plots (also known as box-and-whisker plots) using highcharter. A box plot summarizes a distribution with its five-number summary: minimum, first quartile (Q1), median, third quartile (Q3), and maximum. The box spans Q1 to Q3 with a line at the median, the whiskers extend toward the minimum and maximum, and unusually extreme values are drawn as individual outlier points.

Box plots are particularly useful for:

- Comparing distributions across groups
- Identifying outliers
- Visualizing the spread and skewness of data

```{r setup}
library(dashboardr)
library(dplyr)
library(gssr)

# Load GSS data and keep the most recent survey year with non-missing age
data(gss_all)
gss <- gss_all %>%
  select(year, age, sex, race, degree) %>%
  filter(year == max(year, na.rm = TRUE), !is.na(age))
```

## 📊 Basic Box Plot

Create a simple box plot showing the overall distribution of age:

```{r}
plot <- viz_boxplot(
  data = gss,
  y_var = "age",
  title = "Age Distribution",
  y_label = "Age (years)"
)
plot
```

## 📊 Grouped Box Plots

Compare distributions across categories by adding an `x_var`:

```{r}
# Convert the labelled sex codes into readable character labels
gss_sex <- gss %>%
  filter(!is.na(sex)) %>%
  mutate(sex = as.character(haven::as_factor(sex)))

plot <- viz_boxplot(
  data = gss_sex,
  y_var = "age",
  x_var = "sex",
  title = "Age Distribution by Sex",
  x_label = "Sex",
  y_label = "Age (years)"
)
plot
```

## 🎓 Box Plot by Education Level

Examine how age varies across education levels:

```{r}
# Convert the labelled degree codes into readable character labels
gss_degree <- gss %>%
  filter(!is.na(degree)) %>%
  mutate(degree = as.character(haven::as_factor(degree)))

plot <- viz_boxplot(
  data = gss_degree,
  y_var = "age",
  x_var = "degree",
  title = "Age Distribution by Education",
  x_label = "Highest Degree",
  y_label = "Age (years)"
)
plot
```

## ⚙️ Controlling Outlier Display

By default, outliers are shown as individual points. Use `show_outliers = FALSE` to hide them:

```{r}
plot <- viz_boxplot(
  data = gss_sex,
  y_var = "age",
  x_var = "sex",
  title = "Age by Sex (No Outliers)",
  show_outliers = FALSE
)
plot
```

## ↔️ Horizontal Box Plots

Set `horizontal = TRUE` to flip the orientation, which makes long or numerous category labels easier to read:

```{r}
plot <- viz_boxplot(
  data = gss_degree,
  y_var = "age",
  x_var = "degree",
  title = "Age by Education (Horizontal)",
  horizontal = TRUE
)
plot
```

## 🏷️ Custom Category Labels

Use `x_map_values` to rename category labels. Here `sex` is left as the raw GSS codes (1 = male, 2 = female), which are mapped to readable labels:

```{r}
# Keep the raw numeric codes instead of converting them to labels
gss_sex_raw <- gss %>%
  filter(!is.na(sex))

plot <- viz_boxplot(
  data = gss_sex_raw,
  y_var = "age",
  x_var = "sex",
  title = "Age by Sex",
  x_map_values = list("1" = "Male", "2" = "Female")
)
plot
```

## 🔢 Custom Category Order

Control the order of categories with `x_order`:

```{r}
# These values should match the category labels in `degree` exactly
education_order <- c("graduate", "bachelor", "junior college",
                     "high school", "lt high school")

plot <- viz_boxplot(
  data = gss_degree,
  y_var = "age",
  x_var = "degree",
  title = "Age by Education (Ordered)",
  x_order = education_order
)
plot
```

## 🎨 Custom Color Palette

Apply custom colors to the boxes:

```{r}
plot <- viz_boxplot(
  data = gss_sex,
  y_var = "age",
  x_var = "sex",
  title = "Age by Sex",
  color_palette = c("#3498DB", "#E74C3C")
)
plot
```

## 🔍 Handling Missing Values

Set `include_na = TRUE` to show observations with a missing `x_var` as their own category, labeled with `na_label`. The example below marks every tenth respondent's sex as missing purely for illustration:

```{r}
# For illustration only: set every 10th respondent's sex to NA
gss_with_na <- gss %>%
  mutate(sex_with_na = if_else(
    row_number() %% 10 == 0,
    NA_character_,
    as.character(haven::as_factor(sex))
  ))

plot <- viz_boxplot(
  data = gss_with_na,
  y_var = "age",
  x_var = "sex_with_na",
  title = "Age by Sex (Including Missing)",
  include_na = TRUE,
  na_label = "Not Reported"
)
plot
```

## 📊 Comparing Multiple Groups

Box plots excel at comparing distributions across many groups:

```{r}
gss_race <- gss %>%
  filter(!is.na(race)) %>%
  mutate(race = as.character(haven::as_factor(race)))

plot <- viz_boxplot(
  data = gss_race,
  y_var = "age",
  x_var = "race",
  title = "Age Distribution by Race",
  x_label = "Race",
  y_label = "Age (years)"
)
plot
```

## 📚 Summary

The `viz_boxplot()` function covers the most common box plot tasks:

- **Basic box plot**: Just specify `data` and `y_var`
- **Grouped comparison**: Add `x_var` to compare across categories
- **Outliers**: Control display with `show_outliers`
- **Orientation**: Use `horizontal = TRUE` for horizontal boxes
- **Labels**: Customize with `x_map_values` and `x_order`
- **Missing values**: Handle with `include_na` and `na_label`
- **Styling**: Apply custom colors with `color_palette`
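
Each box is built from a handful of summary statistics, and these can be cross-checked directly with base R's `fivenum()` and `boxplot.stats()`. The chunk below is a minimal sketch on hypothetical toy values, not GSS data. It assumes Tukey's 1.5 × IQR whisker rule, which is what `boxplot.stats()` applies by default; whether `viz_boxplot()` computes its whiskers and outliers in exactly the same way is an assumption, not something this vignette documents.

```{r}
# Hypothetical toy values for illustration only (not GSS data)
ages <- c(21, 25, 28, 30, 33, 35, 37, 40, 42, 45, 48, 90)

# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum
fivenum(ages)

# boxplot.stats() applies the 1.5 * IQR rule (assumed, not confirmed, to match
# viz_boxplot()): whiskers stop at the most extreme values inside the fences,
# and anything beyond them is reported as an outlier
bp <- boxplot.stats(ages, coef = 1.5)
bp$stats  # lower whisker, Q1, median, Q3, upper whisker
bp$out    # values that would be drawn as individual outlier points

# quantile()'s default method (type 7) can give slightly different quartiles
# than the hinges used by fivenum() and boxplot.stats()
quantile(ages, c(0.25, 0.5, 0.75))
```

The same check can be run per group on the vignette's data, for example with `lapply(split(gss_sex$age, gss_sex$sex), function(a) boxplot.stats(a)$stats)`.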