Presentation Exercise

I got my graph from the Pew Research Center: https://www.pewresearch.org/short-reads/2025/12/15/our-favorite-data-visualizations-of-2025/sr_25-12-15_data-visualizations_1/ It looks at voters in 2020 and 2024 and shows how many switched their votes. I picked it because my lab has used Sankey plots to visualize some of our data, but I’ve never made one myself. I didn’t see a CSV to download for this graph, but the percentages are labeled on it. So I made a spreadsheet of 100 hypothetical voters and used the labeled percentages to fill it in.

First I need to load packages

library(here)
here() starts at /Users/rebeccabasta/Desktop/EstherPalmer-portfolio
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(skimr)
library(tidyr) #I think ggalluvial needs this
library(ggplot2)
library(ggalluvial) #for sankey plots, ggplot2 extension
library(knitr)
library(kableExtra)

Attaching package: 'kableExtra'
The following object is masked from 'package:dplyr':

    group_rows

Next I need to load my data

data_location <- here::here("presentation-exercise", "2024_vote_data.csv")
vd <- read.csv(data_location)

Now for a quick look at my data

summary(vd)
  Vote_2020          Vote_2024              Freq      
 Length:10          Length:10          Min.   : 1.00  
 Class :character   Class :character   1st Qu.: 2.25  
 Mode  :character   Mode  :character   Median : 5.00  
                                       Mean   :10.00  
                                       3rd Qu.:20.00  
                                       Max.   :28.00  
glimpse(vd)
Rows: 10
Columns: 3
$ Vote_2020 <chr> "Biden", "Biden", "Biden", "Trump", "Trump", "Trump", "Did n…
$ Vote_2024 <chr> "Harris", "Trump", "Did not vote", "Harris", "Trump", "Did n…
$ Freq      <int> 25, 2, 5, 1, 25, 3, 5, 5, 28, 1
skim(vd)
Data summary
Name vd
Number of rows 10
Number of columns 3
_______________________
Column type frequency:
character 2
numeric 1
________________________
Group variables None

Variable type: character

skim_variable  n_missing  complete_rate  min  max  empty  n_unique  whitespace
Vote_2020              0              1    5   12      0         4           0
Vote_2024              0              1    5   12      0         4           0

Variable type: numeric

skim_variable  n_missing  complete_rate  mean     sd  p0   p25  p50  p75  p100  hist
Freq                   0              1    10  11.18   1  2.25    5   20    28  ▇▁▁▁▃

My data looks correct
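
One quick way to back this up is to check that the frequencies add up to the 100 hypothetical voters. This is only a minimal sketch: it copies the Freq values printed by glimpse() above instead of re-reading the file, and the names freq and total are just for illustration.

```r
# Freq values copied from the glimpse() output above
freq <- c(25, 2, 5, 1, 25, 3, 5, 5, 28, 1)

# The spreadsheet was built from 100 hypothetical voters,
# so the frequencies should sum to exactly 100
total <- sum(freq)
total

# Stops with an error if the total is anything other than 100
stopifnot(total == 100)
```

The same check could be run on vd$Freq directly once the CSV is loaded.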

Now let’s see if I can make this plot!

is_alluvia_form(vd)
Missing alluvia for some stratum combinations.
[1] TRUE
#This is a check

ggplot(vd, aes(y = Freq, axis1 = Vote_2020, axis2 = Vote_2024)) + geom_alluvium(aes(fill = Vote_2020))
Warning in to_lodes_form(data = data, axes = axis_ind, discern =
params$discern): Some strata appear at multiple axes.
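
As far as I can tell, the warning above appears because some labels (for example "Trump") are used in both the 2020 and the 2024 column. Here is a small sketch of how to see which labels are shared. The two vectors are an illustrative subset typed in by hand, not the full data, and the names vote_2020, vote_2024, and shared are made up for this example.

```r
# Illustrative subset of the labels on each axis (not the full data)
vote_2020 <- c("Biden", "Trump", "Did not vote")
vote_2024 <- c("Harris", "Trump", "Did not vote")

# Labels that show up on both axes are the ones the warning is about
shared <- intersect(vote_2020, vote_2024)
shared
```

With the real data this would be intersect(unique(vd$Vote_2020), unique(vd$Vote_2024)). The plot still draws with the warning, so I am treating it as informational.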

This is a good first plot! I want to see if I can add the blocks in and maybe the correct colors

ggplot(vd, aes(y = Freq, axis1 = Vote_2020, axis2 = Vote_2024)) + 
    geom_alluvium(aes(fill = Vote_2020)) + 
    geom_stratum(width = 1/12) + 
    geom_label(stat = "stratum", aes(label = after_stat(stratum))) + 
    coord_flip()
Warning in to_lodes_form(data = data, axes = axis_ind, discern =
params$discern): Some strata appear at multiple axes.
Warning in to_lodes_form(data = data, axes = axis_ind, discern =
params$discern): Some strata appear at multiple axes.
Warning in to_lodes_form(data = data, axes = axis_ind, discern =
params$discern): Some strata appear at multiple axes.

I have managed to flip it on its side, although it’s the wrong side. I still haven’t gotten the colors to work.

ggplot(vd, aes(y = Freq, axis1 = Vote_2020, axis2 = Vote_2024)) + 
    geom_alluvium(aes(fill = Vote_2020), decreasing = FALSE) + 
    geom_stratum(width = 1/12) + 
    geom_label(stat = "stratum", aes(label = after_stat(stratum))) +
    scale_x_continuous(breaks = 1:2, labels = c("2020 Vote", "2024 Vote")) 
Warning in to_lodes_form(data = data, axes = axis_ind, discern =
params$discern): Some strata appear at multiple axes.
Warning in to_lodes_form(data = data, axes = axis_ind, discern =
params$discern): Some strata appear at multiple axes.
Warning in to_lodes_form(data = data, axes = axis_ind, discern =
params$discern): Some strata appear at multiple axes.

These axis labels do not match the data

ggplot(vd, aes(y = Freq, axis1 = Vote_2024, axis2 = Vote_2020)) + 
    geom_alluvium(aes(fill = Vote_2020)) + 
    geom_label(stat = "stratum", aes(label = after_stat(stratum))) +
    scale_x_continuous(breaks = 1:2, labels = c("2024 Vote", "2020 Vote")) +
    coord_flip() +
    ggtitle("The flow of voters and non-voters from 2020 to 2024")
Warning in to_lodes_form(data = data, axes = axis_ind, discern =
params$discern): Some strata appear at multiple axes.
Warning in to_lodes_form(data = data, axes = axis_ind, discern =
params$discern): Some strata appear at multiple axes.

Removing the geom_stratum line keeps the data labels correct, which is fine, though it moves the plot a bit further from the original graph. I fixed the odd flipped axis by swapping the positions of my two variables.

ggplot(vd, aes(y = Freq, axis1 = Vote_2024, axis2 = Vote_2020)) + 
    geom_alluvium(aes(fill = Vote_2020)) + 
    geom_label(stat = "stratum", aes(label = after_stat(stratum))) +
    scale_x_continuous(breaks = 1:2, labels = c("2024 Vote", "2020 Vote")) +
    scale_fill_manual(values = c("#9AC0CD", "#e7e380ff", "#d9e2e2ff", "#fe4f1eff")) +
    coord_flip() +
    ggtitle("The flow of voters and non-voters from 2020 to 2024")
Warning in to_lodes_form(data = data, axes = axis_ind, discern =
params$discern): Some strata appear at multiple axes.
Warning in to_lodes_form(data = data, axes = axis_ind, discern =
params$discern): Some strata appear at multiple axes.

I figured out the colors. For some reason it really didn’t like it when I put the variables in there.
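
An unnamed color vector is matched to the fill levels in order, so the colors depend on how the levels happen to be sorted. One option that might make this less fragile is a named vector, where each color is tied to a level by name. This is a sketch on hypothetical toy data using plain ggplot2: the groups "A", "B", "C" and the names toy and group_colors are placeholders, not my vote data.

```r
library(ggplot2)

# Hypothetical toy data; the group names are placeholders
toy <- data.frame(
  group = c("A", "B", "C"),
  n     = c(3, 5, 2)
)

# Naming each color ties it to a level, whatever order they are listed in
group_colors <- c(C = "#fe4f1e", A = "#9AC0CD", B = "#e7e380")

p <- ggplot(toy, aes(x = group, y = n, fill = group)) +
  geom_col() +
  scale_fill_manual(values = group_colors)

# Colors actually assigned to the bars, read left to right
built <- ggplot_build(p)$data[[1]]
bar_fills <- as.character(built$fill[order(as.numeric(built$x))])
bar_fills
```

Even though "C" is listed first in group_colors, bar A still gets the color named for A.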

Here is the original graph: Original Graph

Next I read in the data in a form that is usable for a table rather than a Sankey plot

table_data_location <- here("presentation-exercise", "Table_data.csv")
td <- read.csv(table_data_location)

I can insert charts into my table, but first I need to make them

data_location2 <- here("presentation-exercise", "for_pie_chart.csv")
pi <- read.csv(data_location2)

Tpi <- ggplot(pi, aes(x = "", y = Trump, fill = Group)) + geom_bar(stat = "identity", width = 1) + coord_polar("y", start=0)
print(Tpi) #print the plot object itself, not the string "Tpi"
BHpi <- ggplot(pi, aes(x = "", y = Biden.Harris, fill = Group)) + geom_bar(stat = "identity", width = 1) + coord_polar("y", start=0)
print(BHpi)
DNVpi <- ggplot(pi, aes(x = "", y = Did.not.vote , fill = Group)) + geom_bar(stat = "identity", width = 1) + coord_polar("y", start=0)
print(DNVpi)
charts <- c("Trump_pie.png", "BHpie.png", "DNVPI.png")
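
The charts vector points at PNG files, so each plot has to exist as a file on disk before the table can use it. One way to write a ggplot object to a PNG is ggsave(). This is only a sketch under my own assumptions: the toy data, the names toy_pie and out_file, the temporary folder, and the image size are all placeholders, not the files listed above.

```r
library(ggplot2)

# Hypothetical stand-in for the pie chart data
toy <- data.frame(Group = c("Same", "Switched"), Share = c(80, 20))

toy_pie <- ggplot(toy, aes(x = "", y = Share, fill = Group)) +
  geom_col(width = 1) +
  coord_polar("y", start = 0) +
  theme_void()

# Written to a temporary folder here; the real files would need to go
# wherever the table code looks for them
out_file <- file.path(tempdir(), "toy_pie.png")
ggsave(out_file, plot = toy_pie, width = 2, height = 2, dpi = 150)

# TRUE if the image was written
file.exists(out_file)
```

The file name passed to ggsave() has to match the name used in the charts vector exactly, including capitalization.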

Now to put it in table form

rename_columns <- c("Candidate", "2020 Vote", "2024 Vote", "% reoccurring voters", "% did not vote", "% voted for opponent", "Chart")
kable(td, 
    col.names = rename_columns, 
    caption = "Percent of voters for each candidate in 2020/2024 and how many voted for the same candidate in 2024"
    ) %>% 
    column_spec(1, bold = TRUE) %>% 
    kable_styling(bootstrap_options = "striped", fixed_thead = TRUE) %>% #bootstrap_options applies to HTML tables
    column_spec(7, image = charts)
Percent of voters for each candidate in 2020/2024 and how many voted for the same candidate in 2024
Candidate     2020 Vote  2024 Vote  % reoccurring voters  % did not vote  % voted for opponent  Chart
Trump                29         32                    85              11                     4  NA
Biden/Harris         32         31                    79              15                     6  NA
Did not vote         38         36                    73              73                    27  NA

This is a short table, but I made the header scroll with you because I always get annoyed when I have to scroll back to the top of a table to read the column names.