---
title: "Creating Interactive Stacked Bar Charts with `viz_stackedbar`"
output: rmarkdown::html_vignette
author: "Alexandra Pafford"
date: "2025-08-03"
vignette: >
  %\VignetteIndexEntry{Creating Interactive Stacked Bar Charts with `viz_stackedbar`}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7, # Default figure width
  fig.height = 5, # Default figure height
  warning = FALSE, # Suppress warnings from highcharter, haven etc. for cleaner output
  message = FALSE  # Suppress messages (e.g., haven conversion notes) for cleaner output
)
```

# 📖 Introduction

This guide covers the `viz_stackedbar()` function from the `dashboardr` package. It creates customizable, interactive stacked bar charts from survey data and supports two modes:

**Mode 1: Grouped/Crosstab Mode** (use `x_var` + `stack_var`)

- Shows how one variable breaks down by another (e.g., education by gender)
- Use when your data is in long/tidy format
- Example: `viz_stackedbar(data, x_var = "education", stack_var = "gender")`

**Mode 2: Multi-Variable/Battery Mode** (use `x_vars`)

- Compares response distributions across multiple survey questions
- Use when you have multiple columns with the same response scale (e.g., Likert items)
- Example: `viz_stackedbar(data, x_vars = c("q1", "q2", "q3"))`

The function handles many common data preparation tasks automatically, including:

* Converting `haven_labelled` columns (from SPSS imports) to R factors
* Mapping raw values to descriptive labels
* Binning continuous variables into meaningful categories
* Handling missing values explicitly or implicitly
* Creating both count-based and percentage-based visualizations
* Customizing colors, ordering, and interactive tooltips

This vignette demonstrates both modes using the General Social Survey (GSS) Panel 2020 dataset.

# ⚙️ Getting Started

First, let's load the necessary libraries and the dataset.

```{r libraries}
library(gssr)
library(dplyr)
library(highcharter)
library(tidyr)
library(dashboardr)

# Load GSS Panel 2020 data
data(gss_panel20)

```

# 📋 Data Preparation

Let's prepare our working dataset using the `_1a` variables, which hold the first-wave (2016) responses in the panel.

```{r data-prep}
# Create a working dataset with key _1a (first-wave, 2016) variables
gss_clean <- gss_panel20 %>%
  select(
    # Demographics
    age_1a, sex_1a, race_1a, degree_1a, region_1a,
    # Attitudes and behaviors
    happy_1a, trust_1a, fair_1a, helpful_1a,
    polviews_1a, partyid_1a, attend_1a,
    # Economic
    income_1a, class_1a
  ) %>%
  # Remove completely empty rows
  filter(if_any(everything(), ~ !is.na(.)))

# Check the data structure
glimpse(gss_clean)

# Examine some key variables
table(gss_clean$degree_1a, useNA = "always")
table(gss_clean$happy_1a, useNA = "always")
```
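
The `filter(if_any(everything(), ~ !is.na(.)))` step keeps a row as long as at least one of the selected columns is non-missing. As a minimal illustration, the same rule can be written in base R on a small made-up data frame (the `toy` object and its columns are hypothetical, not GSS data):

```r
# Hypothetical toy data: row 2 is missing on every column
toy <- data.frame(
  age   = c(34, NA, 51),
  happy = c("very happy", NA, NA),
  stringsAsFactors = FALSE
)

# Count the non-missing values per row and keep rows with at least one
keep <- rowSums(!is.na(toy)) > 0
toy_nonempty <- toy[keep, , drop = FALSE]

nrow(toy_nonempty)
```

Row 3 survives even though `happy` is missing, because `age` is present; only the fully empty row 2 is dropped.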

# 📊 Basic Stacked Bar Charts

## Example 1: Education by Gender (Count-based)

Let's start with a basic stacked bar chart showing educational attainment by gender.

```{r stackedbar-basic}
# Create basic stacked bar chart
plot1 <- viz_stackedbar(
  data = gss_clean,
  x_var = "degree_1a",
  stack_var = "sex_1a",
  title = "Educational Attainment by Gender",
  subtitle = "GSS Panel 2016 - Raw counts",
  x_label = "Highest Degree Completed",
  y_label = "Number of Respondents",
  stack_label = "Gender",
  stacked_type = "counts"
)

plot1
```

## Example 2: Happiness Distribution (Percentage-based)

Now let's create a percentage-based stacked bar chart to show happiness distribution across education levels.

```{r stackedbar-percentage}
# Define education order for logical display
education_order <- c("less than high school", "high school", "associate/junior college", "bachelor's", "graduate")

# Create percentage stacked bar chart
plot2 <- viz_stackedbar(
  data = gss_clean,
  x_var = "degree_1a",
  stack_var = "happy_1a",
  title = "Happiness Distribution Across Education Levels",
  subtitle = "Percentage breakdown within each education category",
  x_label = "Education Level",
  y_label = "Percentage of Respondents",
  stack_label = "Happiness Level",
  stacked_type = "percent",
  x_order = education_order,
  stack_order = c("very happy", "pretty happy", "not too happy"),
  tooltip_suffix = "%",
  color_palette = c("#2E86AB", "#A23B72", "#F18F01")
)

plot2
```
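
To make explicit what a percentage-based stacked bar displays, the sketch below computes within-category percentages in base R. The `toy` data frame is made up for illustration, and the code shows the arithmetic only; it is not necessarily how `viz_stackedbar()` computes its values internally.

```r
# Hypothetical toy data (not GSS): education level and happiness
toy <- data.frame(
  degree = c("high school", "high school", "high school", "high school",
             "graduate", "graduate"),
  happy  = c("very happy", "pretty happy", "pretty happy", "not too happy",
             "very happy", "pretty happy")
)

# Raw counts: what a count-based stacked bar displays
counts <- table(toy$degree, toy$happy)

# Row-wise percentages: each education category sums to 100
pct <- prop.table(counts, margin = 1) * 100

pct["high school", "pretty happy"]  # 2 of 4 respondents
rowSums(pct)                        # every category totals 100
```

Because each bar is normalized separately, percentage charts compare the composition of groups and hide differences in group size.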

# ⚡ Advanced Features

## Example 3: Age Binning with Political Views

Let's demonstrate binning continuous variables by creating age groups and examining political views.

```{r stackedbar-binning}
# First, let's clean and prepare the age variable
gss_clean_age <- gss_clean %>%
  # Ensure age is numeric and remove missing values for this analysis
  filter(!is.na(age_1a), !is.na(polviews_1a)) %>%
  mutate(
    # Convert age to numeric if it isn't already
    age_numeric = as.numeric(age_1a)
  )

# Check the cleaned data
cat("Cleaned age summary:\n")
summary(gss_clean_age$age_numeric)

# Check the observed age range before choosing bin breaks
age_range <- range(gss_clean_age$age_numeric, na.rm = TRUE)
cat("Age range in data:", age_range[1], "to", age_range[2], "\n")
```


```{r stackedbar-binning2}
# Adjust breaks to match actual data range
age_breaks <- c(18, 30, 45, 60, 75, Inf)
age_labels <- c("18-29", "30-44", "45-59", "60-74", "75+")

# Map political views to shorter labels
polviews_map <- list(
  "extremely liberal" = "Ext. Liberal",
  "liberal" = "Liberal", 
  "slightly liberal" = "Sl. Liberal",
  "moderate, middle of the road" = "Moderate",
  "slightly conservative" = "Sl. Conservative",
  "conservative" = "Conservative",
  "extremely conservative" = "Ext. Conservative"
)

polviews_order <- c("Ext. Liberal", "Liberal", "Sl. Liberal",
                    "Moderate", "Sl. Conservative", "Conservative",
                    "Ext. Conservative")


# Create chart with age binning and value mapping using the numeric age
plot3 <- viz_stackedbar(
  data = gss_clean_age,
  x_var = "age_numeric",  # Use the numeric version
  stack_var = "polviews_1a",
  title = "Political Views by Age Group",
  subtitle = "Distribution of political ideology across age cohorts",
  x_label = "Age Group",
  stack_label = "Political Views",
  x_breaks = age_breaks,
  x_bin_labels = age_labels,
  stack_map_values = polviews_map,
  stacked_type = "percent",
  tooltip_suffix = "%",
  x_tooltip_suffix = " years",
  color_palette = c("#d7191c", "#fdae61", "#fee08b", "#e6f598", "#abdda4", "#66c2a5", "#2b83ba"),
  stack_order = polviews_order
)

plot3
```
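
The breaks and labels above can be checked outside the chart with base R's `cut()`. This is a sketch under one assumption: that the bins are left-closed (`[18, 30)`, `[30, 45)`, and so on), which is what labels such as `"18-29"` and `"30-44"` suggest. The `ages` vector is made up.

```r
age_breaks <- c(18, 30, 45, 60, 75, Inf)
age_labels <- c("18-29", "30-44", "45-59", "60-74", "75+")

# Hypothetical ages, including values that sit exactly on a break
ages <- c(18, 29, 30, 44, 45, 74, 75, 89)

# right = FALSE makes intervals left-closed: [18, 30), [30, 45), ...
age_group <- cut(ages, breaks = age_breaks, labels = age_labels, right = FALSE)

data.frame(ages, age_group)
```

Testing the boundary values (30, 45, 75) is a quick way to confirm that each label matches the interval it describes.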

## Example 4: Including Missing Values

Let's create a chart that explicitly shows missing data patterns.

```{r stackedbar-missing}
# Create chart including NA values (using default "(Missing)" labels)
plot4 <- viz_stackedbar(
  data = gss_clean,
  x_var = "race_1a",
  stack_var = "attend_1a",
  title = "Religious Attendance by Race/Ethnicity",
  subtitle = "Including non-responses as explicit categories",
  x_label = "Race/Ethnicity",
  stack_label = "Religious Attendance Frequency",
  include_na = TRUE,
  stacked_type = "percent",
  tooltip_suffix = "%",
  color_palette = c("#8e0152", "#c51b7d", "#de77ae", "#f1b6da", "#fde0ef", 
                   "#e6f5d0", "#b8e186", "#7fbc41", "#4d9221", "#276419")
)

plot4
```
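
Conceptually, showing missing values as their own category means turning `NA` into an ordinary factor level. A minimal base R sketch with made-up responses follows; the `"(Missing)"` label mirrors the default mentioned in the chunk above, and the code is an illustration rather than the package's implementation.

```r
# Hypothetical responses with two non-responses
attend <- factor(c("weekly", NA, "never", "weekly", NA))

# Default tabulation silently drops NA
table(attend)

# addNA() turns NA into a real factor level, which can then be renamed
attend_explicit <- addNA(attend)
levels(attend_explicit)[is.na(levels(attend_explicit))] <- "(Missing)"

table(attend_explicit)
```

With the explicit level, the two non-responses are counted and would appear as their own segment in a stacked bar.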

## Example 5: Custom Value Mapping

Let's demonstrate comprehensive value mapping for cleaner labels.

```{r stackedbar-mapping}

# Create mappings for cleaner display
sex_map <- list("male" = "Men", "female" = "Women")
class_map <- list(
  "lower class" = "Lower",
  "working class" = "Working", 
  "middle class" = "Middle",
  "upper class" = "Upper"
)

# Create chart with custom mappings
plot5 <- viz_stackedbar(
  data = gss_clean,
  x_var = "class_1a",
  stack_var = "sex_1a",
  title = "Gender Distribution Across Social Classes",
  subtitle = "With custom labels and ordering",
  x_label = "Self-Reported Social Class",
  stack_label = "Gender",
  x_map_values = class_map,
  stack_map_values = sex_map,
  x_order = c("Lower", "Working", "Middle", "Upper"),
  stack_order = c("Women", "Men"),
  stacked_type = "counts",
  tooltip_prefix = "Count: ",
  color_palette = c("#E07A5F", "#3D5A80")
)

plot5
```
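
A named list such as `class_map` works as a lookup table: the names are raw values and the elements are display labels. The base R sketch below applies such a map to a made-up vector and then orders the result. As in Example 5, the ordering lists the mapped labels, not the raw values. This illustrates the idea and is not the package's internal code.

```r
class_map <- list(
  "lower class"   = "Lower",
  "working class" = "Working",
  "middle class"  = "Middle",
  "upper class"   = "Upper"
)

# Hypothetical raw responses
raw <- c("middle class", "working class", "upper class", "middle class")

# Look up each raw value by name; unname() drops the lookup names
mapped <- unname(unlist(class_map)[raw])

# The ordering refers to the mapped labels, since those are what is displayed
mapped <- factor(mapped, levels = c("Lower", "Working", "Middle", "Upper"))

table(mapped)
```

A raw value that is absent from the map would come back as `NA` in this sketch, so it is worth comparing the map's names against `unique()` of the column first.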

# 🔬 Complex Analysis Examples

## Example 6: Regional Patterns in Trust

Let's examine how trust levels vary across US regions.

```{r stackedbar-regional}

# Recode the raw labels; the source label "can't trust" is treated here as a mislabel of "can trust"
trust_map <- list(
  "can't trust" = "Can Trust",
  "can't be too careful" = "Can't Be Too Careful",
  "depends" = "It Depends"
)

# Create regional trust analysis
plot6 <- viz_stackedbar(
  data = gss_clean,
  x_var = "region_1a",
  stack_var = "trust_1a",
  stack_map_values = trust_map,
  title = "Do You Trust Strangers?",
  subtitle = "Regional variation in interpersonal trust",
  x_label = "US Region",
  stack_label = "Trust Level",
  stack_order = c("Can Trust", "Can't Be Too Careful", "It Depends"),
  stacked_type = "percent",
  tooltip_suffix = "%",
  color_palette = c("#2E8B57", "#CD5C5C", "#DAA520")
)

plot6
```

# 📊 Multi-Variable Mode: Comparing Survey Questions

The `viz_stackedbar` function also supports comparing multiple survey questions side-by-side. This is particularly useful for visualizing survey batteries (sets of questions with the same response scale).

## Example 7: Basic Multi-Variable Comparison

When you have multiple columns representing different questions with the same response categories, use `x_vars` to compare them:

```{r multi-variable-basic}
# Define the questions to compare
social_questions <- c("trust_1a", "fair_1a", "helpful_1a")
social_labels <- c(
  "Interpersonal Trust",
  "Fairness of Others",
  "Helpfulness of Others"
)

# Create multi-variable comparison chart
plot7 <- viz_stackedbar(
  data = gss_clean,
  x_vars = social_questions,
  x_var_labels = social_labels,
  title = "Social Attitudes and Trust",
  subtitle = "Distribution of responses across social attitude questions",
  x_label = "Social Attitude Dimension",
  stack_label = "Response Level",
  stacked_type = "percent",
  tooltip_suffix = "%"
)

plot7
```
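
Multi-variable mode amounts to tabulating each question against a shared response scale and placing the results side by side. The sketch below does this in base R for a made-up three-item battery (`toy`, `q1` to `q3`, and `scale_levels` are hypothetical); it illustrates the idea and is not the package's implementation.

```r
# Hypothetical battery: three questions on the same yes/depends/no scale
toy <- data.frame(
  q1 = c("yes", "no", "yes", "depends"),
  q2 = c("no", "no", "yes", "no"),
  q3 = c("yes", "yes", "yes", "depends")
)
scale_levels <- c("yes", "depends", "no")

# Tabulate each question against the shared scale, as percentages.
# Fixing the levels keeps categories with zero responses in the table.
pct_by_question <- sapply(toy, function(col) {
  100 * prop.table(table(factor(col, levels = scale_levels)))
})

pct_by_question
```

Each column of the resulting matrix corresponds to one bar, and each row to one stack segment.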

## Example 8: Multi-Variable with Response Mapping

You can standardize response labels across questions and customize the display. It's helpful to first check what the actual response values are:

```{r check-responses}
# First, examine what the actual response values are
cat("Unique trust responses:\n")
print(unique(as.character(gss_clean$trust_1a)))

cat("\nUnique fair responses:\n")
print(unique(as.character(gss_clean$fair_1a)))
```

Now create a mapping to standardize the labels:

```{r multi-variable-mapping}
# Create response mapping for cleaner labels
response_map <- list(
  "can't trust" = "High Trust/Positive",
  "can't be too careful" = "Low Trust/Negative",
  "depends" = "Situational/Neutral",
  "would try to be fair" = "High Trust/Positive",
  "would take advantage of you" = "Low Trust/Negative",
  "try to be helpful" = "High Trust/Positive",
  "looking out for themselves" = "Low Trust/Negative"
)

# Define response order (from negative to positive)
response_order <- c("Low Trust/Negative", "Situational/Neutral", "High Trust/Positive")

# Create chart with custom mapping and ordering
plot8 <- viz_stackedbar(
  data = gss_clean,
  x_vars = social_questions,
  x_var_labels = social_labels,
  title = "Social Trust Dimensions with Standardized Responses",
  subtitle = "Responses mapped to consistent positive/negative categories",
  x_label = "Trust Dimension",
  stack_label = "Trust Level",
  stack_map_values = response_map,
  stack_order = response_order,
  stacked_type = "percent",
  tooltip_suffix = "%",
  color_palette = c("#d62728", "#ffbb78", "#2ca02c"),
  include_na = TRUE,
  na_label_stack = "No Answer"
)

plot8
```

## Example 9a: Single Variable with `x_vars` (Compact Display)

The `x_vars` parameter also works with a single variable. This is useful when you want the compact styling of multi-variable mode for a single question:

```{r single-variable-xvars}
# Single variable with x_vars - great for compact horizontal displays
plot9a <- viz_stackedbar(
  data = gss_clean,
  x_vars = "happy_1a",
  x_var_labels = "General Happiness",
  title = "Happiness Distribution",
  x_label = "Well-being Measure",
  stack_label = "Happiness Level",
  stacked_type = "percent",
  horizontal = TRUE,
  tooltip_suffix = "%",
  color_palette = c("#2E8B57", "#FFD700", "#CD5C5C", "grey")
)

plot9a
```

## Example 9b: Horizontal Multi-Variable Chart

For better readability with long labels, use horizontal orientation:

```{r multi-variable-horizontal}
# Horizontal chart for survey battery
plot9b <- viz_stackedbar(
  data = gss_clean,
  x_vars = c("trust_1a", "fair_1a", "helpful_1a"),
  x_var_labels = c(
    "Can people be trusted?",
    "Are people generally fair?",
    "Are people generally helpful?"
  ),
  title = "Social Capital Dimensions",
  subtitle = "GSS Panel 2016",
  stacked_type = "percent",
  horizontal = TRUE,
  tooltip_suffix = "%",
  color_palette = c("#8c510a", "#d8b365", "#f6e8c3", "grey"),
  include_na = TRUE,
  na_label_stack = "No response"
)

plot9b
```

## Example 10: Survey Battery Analysis

Survey batteries are sets of related questions with the same response scale. Here's how to create a comprehensive battery analysis:

```{r survey-battery}
# Create a social trust battery
trust_battery <- c("trust_1a", "fair_1a", "helpful_1a")
trust_battery_labels <- c(
  "Interpersonal Trust",
  "Perceived Fairness",
  "Perceived Helpfulness"
)

# Create a comprehensive battery analysis
plot10 <- viz_stackedbar(
  data = gss_clean,
  x_vars = trust_battery,
  x_var_labels = trust_battery_labels,
  title = "Social Trust Battery - Complete Analysis",
  subtitle = "Comprehensive view of social trust dimensions with enhanced tooltips",
  x_label = "Trust Dimension",
  stack_label = "Response Category",
  stacked_type = "percent",
  tooltip_prefix = "Percentage: ",
  tooltip_suffix = "% of respondents",
  show_var_tooltip = TRUE,
  include_na = TRUE,
  na_label_stack = "No answer",
  color_palette = c("#8c510a", "#d8b365", "#f6e8c3", "darkgrey")
)

plot10
```

## Example 11: Publication-Ready Chart

Let's create a fully customized, publication-ready chart:

```{r publication-ready}
# Create the most polished example
plot11 <- viz_stackedbar(
  data = gss_clean,
  x_vars = social_questions,
  x_var_labels = c(
    "Interpersonal Trust\n('Can most people be trusted?')",
    "Perceived Fairness\n('Do people try to be fair?')",
    "Perceived Helpfulness\n('Are people helpful?')"
  ),
  title = "Social Capital Dimensions in American Society",
  subtitle = "General Social Survey Panel 2016 (N = 2,867 respondents)\nPercentage distribution of responses across social trust measures",
  x_label = "Social Trust Dimension",
  stack_label = "Response Category",
  stacked_type = "percent",
  tooltip_prefix = "",
  tooltip_suffix = "% of respondents",
  x_tooltip_suffix = "",
  include_na = TRUE,
  na_label_stack = "No response",
  color_palette = c("#b2182b", "#ef8a62", "#fddbc7", "darkgrey"),
  show_var_tooltip = TRUE
)

plot11
```
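
The subtitle above hard-codes the sample size. One way to keep such a figure in sync with the data is to build the string from the row count. In this sketch, `toy` is a hypothetical stand-in for the analysis data (in the vignette it would be `gss_clean`, whose row count is not verified here).

```r
# Hypothetical stand-in for the analysis data; only the row count matters here
toy <- data.frame(id = seq_len(2867))

# format() with big.mark inserts the thousands separator
n_label <- format(nrow(toy), big.mark = ",")

subtitle_text <- sprintf(
  "General Social Survey Panel 2016 (N = %s respondents)",
  n_label
)

subtitle_text
```

The resulting string can be passed to the `subtitle` argument in place of the literal text.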

# 🏷️ Labels and Tooltips Reference

## Summary of Label and Tooltip Options

The `viz_stackedbar()` function offers extensive customization for labels and tooltips:

| Parameter | Description | Example |
|-----------|-------------|---------|
| `x_label` | X-axis title | `"Question"` |
| `y_label` | Y-axis title (auto-set based on `stacked_type`) | `"Percentage"` |
| `stack_label` | Legend title | `"Response Category"` |
| `x_var_labels` | Custom labels for each question (multi-variable mode) | `c("Trust", "Fairness")` |
| `tooltip_prefix` | Text before value in tooltip | `"Score: "` |
| `tooltip_suffix` | Text after value in tooltip | `"%"`, `" respondents"` |
| `x_tooltip_suffix` | Text after category name in tooltip | `" question"` |
| `show_var_tooltip` | Show question name in tooltip (multi-variable mode) | `TRUE` |

```{r labels-reference}
# Example combining the main label and tooltip options
viz_stackedbar(
  data = gss_clean,
  x_vars = c("trust_1a", "fair_1a"),
  x_var_labels = c("Trust", "Fairness"),
  title = "Social Attitudes",
  x_label = "Attitude Measure",
  y_label = "Percent of Respondents",
  stack_label = "Response Level",
  stacked_type = "percent",
  tooltip_prefix = "",
  tooltip_suffix = "% responded",
  show_var_tooltip = TRUE
)
```
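
As a rough mental model of how a prefix and suffix wrap the displayed value, consider the hypothetical helper below. `format_tooltip()` and its rounding to one decimal are assumptions made for illustration; the actual tooltip is rendered by the chart and may format numbers differently.

```r
# Hypothetical helper: how a prefix, a value, and a suffix combine into text
format_tooltip <- function(value, prefix = "", suffix = "", digits = 1) {
  paste0(prefix, round(value, digits), suffix)
}

format_tooltip(42.36, prefix = "Percentage: ", suffix = "% of respondents")
```

Nothing is inserted between the pieces, so any spacing (as in `"Percentage: "`) has to be part of the prefix or suffix itself.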

## When to Use Each Mode

| Mode | Use Case | Parameters |
|------|----------|------------|
| **Grouped/Crosstab** | One variable broken down by another | `x_var` + `stack_var` |
| **Multi-Variable** | Compare multiple questions side-by-side | `x_vars` |

**Use Grouped Mode when:**

- You want to show how education levels differ by gender
- You're creating a cross-tabulation visualization
- Your data is already in long/tidy format

**Use Multi-Variable Mode when:**

- You're comparing multiple survey questions
- Your questions share the same response categories
- You want to visualize a Likert scale battery
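
Whether a set of questions shares the same response categories can be checked before choosing multi-variable mode. The helper below is a hypothetical sketch in base R (`same_response_scale()` is not part of `dashboardr`), shown on made-up data:

```r
# Hypothetical helper: TRUE when every column uses the same set of responses
same_response_scale <- function(data, vars) {
  scales <- lapply(data[vars], function(col) {
    values <- as.character(col)
    sort(unique(values[!is.na(values)]))  # ignore missing values
  })
  length(unique(scales)) == 1
}

toy <- data.frame(
  q1  = c("agree", "disagree", NA),
  q2  = c("disagree", "agree", "agree"),
  age = c("18-29", "30-44", "18-29")
)

same_response_scale(toy, c("q1", "q2"))   # TRUE
same_response_scale(toy, c("q1", "age"))  # FALSE
```

A `FALSE` result suggests either mapping the responses to common categories first or using grouped mode instead.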

# 💡 Summary and Best Practices

## ✅ Key Features Demonstrated

1. **Two flexible modes**: Grouped/crosstab (`x_var` + `stack_var`) and multi-variable (`x_vars`)
2. **Basic stacked bars** with both count and percentage displays
3. **Age binning** for continuous variables
4. **Value mapping** for cleaner, more descriptive labels
5. **Custom ordering** for logical presentation of categories
6. **Missing value handling** with explicit NA categories
7. **Multi-variable comparisons** for survey batteries and Likert scales
8. **Custom color palettes** for different data types and branding
9. **Comprehensive tooltips** with prefixes, suffixes, and formatting
10. **Horizontal orientation** for better readability with long labels

## 🎯 Best Practices for Stacked Bar Charts

### General Guidelines

1. Choose an appropriate stacking type
    - Use `"normal"` or `"counts"` for comparing absolute counts across groups
    - Use `"percent"` for comparing proportions within groups

2. Order categories logically
    - When remapping values, refer to the values exactly as they appear in the data frame
    - Use natural ordering for ordinal variables (e.g., Likert scales)
    - Consider frequency-based ordering for nominal categories
    - Place "Other" or "Missing" categories at the end

### Multi-Variable Mode Best Practices

1. Choose questions with similar response scales
    - Use questions that have the same or compatible response categories
    - Consider mapping different scales to common categories when appropriate

2. Order questions logically
    - Group related concepts together
    - Consider ordering by typical response patterns (most positive to least positive)
    - Place the most important questions first

3. Use an appropriate stacking type
    - Use `"percent"` for comparing response patterns across questions
    - Use `"counts"` when absolute counts matter more than proportions

4. Handle missing data thoughtfully
    - Decide whether to include or exclude missing categories
    - Use `include_na = TRUE` when missing patterns are meaningful
    - Provide clear labels for missing categories

5. Use appropriate colors
    - Use diverging palettes for scales with meaningful center points
    - Use qualitative palettes for nominal categories
    - Ensure sufficient contrast between adjacent categories
    - Consider colorblind accessibility

6. Customize tooltips for clarity
    - Include units and context in tooltips
    - Use prefixes/suffixes to clarify meaning
    - Format numbers appropriately for your audience

7. Consider your audience
    - Use descriptive labels rather than codes
    - Provide clear titles and subtitles
    - Include sample sizes in subtitles when relevant
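
For the color guidance above, base R (4.0 or later) can generate palettes as hex vectors without extra packages. The sketch below builds one diverging and one qualitative palette. The palette names are standard `grDevices` options, but passing these vectors to `color_palette` is an assumption based on the hex vectors used in the examples.

```r
# Diverging palette for a 7-point scale with a meaningful midpoint
diverging <- hcl.colors(7, palette = "Blue-Red")

# Qualitative, colorblind-friendly palette for nominal categories:
# take four Okabe-Ito colors and drop the first (black)
qualitative <- unname(palette.colors(4, palette = "Okabe-Ito")[-1])

length(diverging)
length(qualitative)
```

Generating the palette from the number of categories avoids running out of colors when a "Missing" segment is added.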


## 🌍 Common Use Cases

The `viz_stackedbar` function is particularly useful for:

- **Survey response analysis**: Displaying Likert scale responses across demographics
- **Demographic breakdowns**: Showing composition of groups by various characteristics  
- **Attitude research**: Comparing opinions across different populations
- **Market research**: Analyzing customer segments and preferences
- **Educational research**: Examining outcomes across different groups
- **Health surveys**: Displaying health behaviors or outcomes by demographics

## 📚 Conclusion

The `viz_stackedbar()` function provides a unified, comprehensive solution for creating publication-ready stacked bar charts from survey data. Its two flexible modes handle the most common visualization needs:

- **Grouped/Crosstab Mode** (`x_var` + `stack_var`): Show how one variable breaks down by another
- **Multi-Variable Mode** (`x_vars`): Compare response distributions across multiple survey questions

Key advantages include:

- **Unified interface** - one function for both crosstabs and survey batteries
- **Automatic data preparation** for common survey data formats
- **Smart mode detection** based on the parameters you provide
- **Flexible binning and mapping** for continuous and coded variables
- **Comprehensive missing data handling** options
- **Interactive tooltips** for enhanced data exploration
- **Publication-ready styling** with extensive customization options

**Note:** If you were previously using `viz_stackedbars()` for multi-variable comparisons,
you can now use `viz_stackedbar()` with the same parameters. The old function still works,
but `viz_stackedbar()` is now the recommended function for all stacked bar charts.