3.1 Part 1: Study which segments of the population exhibited greatest COVID vaccine hesitancy following the deployment of mass vaccinations in the US in early 2021.
The national average COVID hesitance rate reported during the first quarter of the data collection period (April - July 2021) was ~18%. In other words, 18% of those surveyed after the mass deployment of vaccinations responded that they would probably or definitely not get vaccinated. In Part 1, we will break down this percentage across U.S. geographies, identifying the states with the highest and lowest hesitancy rates. We will then analyze the types of population demographics (e.g., age, vaccination history, ethnicity, etc.) that had a strong association with COVID vaccine hesitancy.
Our analysis of variations in vaccine hesitance begins by comparing hesitancy across states during the April-July 21’ timeframe. As illustrated in the map below, vaccine hesitance varied significantly by geography. States such as Hawaii, New York, and Vermont exhibited the lowest hesitance, while other states like Wyoming, North Dakota, and West Virginia exhibited the highest hesitance. It is worth noting that there are clusters of high hesitance, namely the South East and Northern Mountain West regions, whereas low hesitance was more prominent in the North East and South West regions.
Code
# Import libraries# install.packages("choroplethr")# install.packages("RColorBrewer")# install.packages("GGally")# install.packages("devtools")# install.packages("remotes")# install.packages("choroplethrMaps")# devtools::install_github("timelyportfolio/parcoords")# install.packages("ggpmisc")library(ggpmisc)library(tidyverse)library(data.table)library(vcd)library(choroplethrMaps)library(GGally)library(parcoords)library(RColorBrewer)library(choroplethr)library(lubridate)library(stringr)# Import starting dataset and filter for vaccine hesitance indicator and monthly samples# Load vaccine hesitancy datacovid <-fread("../covid_dataset/covid.csv")# Filter for Monthly sample period and vaccine hesitance responsescovid_hes <- covid |>filter(`Time Type`=="Monthly",`Indicator Category`=="Probably or definitely will not get vaccinated")# Prep data for analysis. The following chunk truncates the names of population segments for data visualization purposes# Create vector of shortened group namesgroup_name_shortening <-c("Age"="Age","Age by race/ethnicity"="Age.Race","All adults 18+"="18+ Adults","Born in the U.S."="US.Born","Concern about getting COVID-19 disease"="Covid.Concern","Confidence in COVID-19 vaccine safety"="Vaccine.Confiden","Confidence that COVID-19 vaccine is important"="Vaccine.Imprtnt","COVID-19 vaccination status of friends and family"="Fr.Fam.Vaccd","COVID-19 vaccine information seeking frequency in past month"="Vacc.Info.Seek","Disability status (any)"="Disability","Essential worker group"="Ess.Wrkr.grp","Ever had COVID-19 disease (self-report)"="Past.Covid","Gender identity"="G.Identity","Have a regular physician or provider for primary care"="Prim.Care.Providr.Status","Health condition associated with higher risk for COVID-19 (any)"="Comorbidity","Healthcare provider recommended I get a COVID-19 booster"="Dr.Recommend.Boostr","Healthcare provider recommended I get a COVID-19 vaccine"="Dr.Recommend.Vacc","Insurance status"="Insurance","Language of interview"="Language","Mask-wearing frequency"="Mask.Use","Mental health status"="Mental.Health","Metropolitan statistical area"="Rural.Urban","Political leaning of county of residence"="Political","Poverty status"="Pov.Status","Pregnancy status (females age 18 – 49 years)"="Pregnancy","Race/Ethnicity (4 level)"="Race.4lvl","Race/Ethnicity (7 level)"="Race.7lvl","Received a flu vaccination"="Flu.Vacc","Received non-COVID-19 vaccine(s) in past two years"="Past.Vaccin","Self-perceived equity in seeking healthcare compared to other races/ethnicities"="Racial.Equity","Sex"="Sex","Sexual orientation"="S.Orientation","Social Vulnerability Index (SVI) of county of residence"="SVI","Speaks a language other than English at home"="Non.Englsh.Language","Work or school requires COVID-19 vaccination"="Work.School")# Construct dataframe from vectorgroup_name_shortened <-data.frame(`Group Name`=names(group_name_shortening), shortened_group = group_name_shortening, stringsAsFactors =FALSE)|> dplyr::rename(`Group Name`= Group.Name)# Create vector of shortened group category namesgroup_cat_shortening <-c("18 – 49 years"="18 – 49","50 – 64 years"="50 – 64","65+ years"="65+","18 – 49 years, Black, non-Hispanic"="18 – 49, Black","18 – 49 years, Hispanic"="18 – 49, Hispanic","18 – 49 years, Other or multiple races, non-Hispanic"="18 – 49, Other","18 – 49 years, White, non-Hispanic"="18 – 49, White","50 – 64 years, Black, non-Hispanic"="50 – 64, Black","50 – 64 years, Hispanic"="50 – 64, Hispanic","50 – 64 years, Other or multiple races, non-Hispanic"="50 – 64, Other","50 – 64 years, White, non-Hispanic"="50 – 64, White","65+ years, Black, non-Hispanic"="65+, Black","65+ years, Hispanic"="65+, Hispanic","65+ years, Other or multiple races, non-Hispanic"="65+, Other","65+ years, White, non-Hispanic"="65+, White","Born in the U.S."="US Born","Not born in the U.S."="Not US Born","Many or almost all family or friends have received vaccine"="Many","Some or no family or friends have been vaccinated"="No_some","A little or not at all concerned"="A little_not","Very or moderately concerned"="Very_Moderately","Somewhat or not at all safe for me"="Not at all_Somewhat","Very or completely safe for me"="Very","A little or not at all important to protect me from COVID-19"="A little_not at all","Somewhat or very important to protect me from COVID-19"="Somewhat_very","No"="No","Yes"="Yes","Cisgender"="Cisgender","Don't know/refused"="Dont know","Transgender/Non-binary"="TG/Non bin","Insured"="Insured","Not insured"="Not insured","English"="English","Other"="Other","Spanish"="Spanish","Rural"="Rural","Suburban"="Suburban","Urban"="Urban","Democrat-leaning"="Democrat","Not Democrat-leaning or Republican-leaning"="None","Republican-leaning"="Republican","Above poverty, income <$75k"="<$75k","Above poverty, income >=$75k"=">=$75k","Below poverty"="Below poverty","Unknown income"="Unknown","Breastfeeding"="Breastfeeding","Not pregnant, trying to get pregnant, or breastfeeding"="Not pregnant","Pregnant"="Pregnant","Trying to get pregnant"="Trying to get pregnant","American Indian/Alaska Native, non-Hispanic"="Native","Asian, non-Hispanic"="Asian","Black, non-Hispanic"="Black","Hispanic"="Hispanic","Native Hawaiian/Pacific Islander, non-Hispanic"="Pacific","Other or multiple races, non-Hispanic"="Other","White, non-Hispanic"="White","Female"="Female","Male"="Male","Gay/Lesbian/Bisexual/Other"="LGB","Heterosexual/Straight"="Hetero","High SVI"="High","Low SVI"="Low","Moderate SVI"="Moderate","SVI unknown"="Unknown","Not applicable/unemployed"="Unemployed","All adults age 18+ years"="18+ Adults","Essential healthcare"="Ess.Healthcare","Other essential worker"="Ess.Other","Other frontline worker"="Frontline.Other","Persons who are not essential workers"="Not.Ess.Wrkr","School and childcare"="School.Childcare","Better than other races or ethnicities"="Better","Same as other races or ethnicities"="Equal","Worse than other races or ethnicities"="Worse","Did not receive flu vaccination"="No","Received a flu vaccination"="Yes","Political leaning unknown"="Unknown","Often or sometimes"="Often.Sometimes","Rarely or never"="Rarely.Never","Always or often wears mask"="Always.Often","Sometimes, rarely, or never wears mask"="Occasional.Never","Excellent, very good, or good"="Excellent.Good","Fair or poor"="Fair.Poor")# Construct dataframe from vectorgroup_cat_shortened <-data.frame(`Group Category`=names(group_cat_shortening), shortened_cat = group_cat_shortening, stringsAsFactors =FALSE) |> dplyr::rename(`Group Category`= Group.Category)# Join shortened group names and categories to master tablecovid_hes_prepped <-left_join(left_join(covid_hes, group_name_shortened,by ="Group Name"), group_cat_shortened, by ="Group Category")# Create combinations of group names and categories via paste function# Include string wrapped version of text for visualization purposescovid_hes_prepped <- covid_hes_prepped |>mutate(concat =paste(shortened_group, shortened_cat, sep =" - "),concat_wrap =str_wrap(concat, width =10))# Store values of months that will be combined to analyze hesitancy when vaccines were initially released in 2021periods_to_combine_2021 <-c("April 22 - May 29","May 30 - June 26","June 27 - July 31")# Calculate overall nationwide hesitancynational_hesitance_rate <- covid_hes_prepped |>filter(`Year`==2021,`Time Period`%in% periods_to_combine_2021,`Group Category`=="All adults age 18+ years", Geography =="National") |>mutate(n_hesitant =round(`Estimate (%)`/100*`Sample Size`,0)) |>group_by(Geography) |>summarise(`Sample Size`=sum(`Sample Size`), n_hesitant =sum(n_hesitant)) |>mutate(`Estimate (%)`= n_hesitant /`Sample Size`*100,`Time Period`="2021 analysis period",Year =2021) # Calculate early 2021 hesitance by statecovid_2021_analysis_allstates <- covid_hes_prepped |>filter(`Year`==2021,`Time Period`%in% periods_to_combine_2021,`Group Category`=="All adults age 18+ years",`Geography Type`%in%c("Jurisdictional Estimates"),!grepl("-", Geography)) |>mutate(n_hesitant =round(`Estimate (%)`/100*`Sample Size`,0)) |>group_by(Geography) |>summarise(`Sample Size`=sum(`Sample Size`),n_hesitant =sum(n_hesitant)) |>mutate(`Estimate (%)`= n_hesitant /`Sample Size`*100,`Time Period`="2021 analysis period",Year =2021)# Prep dataframe to be used in the state_choropleth functioncovid_2021_analysis_allstates_map <- covid_2021_analysis_allstates |>select(Geography, `Estimate (%)`) |>transmute(region =tolower(Geography), value =`Estimate (%)`)# Display state heatmap of hesitance ratesstate_choropleth(covid_2021_analysis_allstates_map,title =" April-July 2021 COVID-19 Vaccine Hesitance Rates by State",legend ="Hesitancy Rates (%) ")
While hesitance rates by state appear to be normally distributed, with no clear outliers per the histogram and boxplot below, the aforementioned variation in hesitance between states like Wyoming and New York can still be used as a starting point for where to prioritize vaccination campaign efforts in the event of a future COVID-19 outbreak.
Code
# Plot histogram of state hesitance ratesggplot(covid_2021_analysis_allstates, aes(x =`Estimate (%)`)) +geom_histogram(bins =8, fill ="blue", color ="black", alpha =0.7) +geom_boxplot() +labs(title ="Distribution of Statewide Hesitance Rates", x ="Hesitance %", y ="Number of states") +theme_bw()
This finding brings us to our next layer of analysis: do population segments within each state have similar vaccine hesitance rates? Before diving into each state further, we first identified the population segments with relatively higher nationwide vaccine hesitance rates from April-July 2021. Across every state, it can be seen that the 10 segments displayed below demonstrated the highest hesitance.
Code
# Create vector of all group names as represented in the datagroup_name_toloop <-c("Sex","Age","Race/Ethnicity (7 level)","Age by race/ethnicity","Sexual orientation","Gender identity","Metropolitan statistical area","Born in the U.S.", "Language of interview","Poverty status", "Insurance status","Social Vulnerability Index (SVI) of county of residence", "Political leaning of county of residence", "Received non-COVID-19 vaccine(s) in past two years", "Health condition associated with higher risk for COVID-19 (any)","Disability status (any)", "Pregnancy status (females age 18 – 49 years)", "Ever had COVID-19 disease (self-report)","Concern about getting COVID-19 disease", "Confidence in COVID-19 vaccine safety", "Confidence that COVID-19 vaccine is important", "Healthcare provider recommended I get a COVID-19 vaccine", "COVID-19 vaccination status of friends and family","Work or school requires COVID-19 vaccination" )group_name_toloop_short <-lapply(group_name_toloop, function(x) ifelse(x %in%names(group_name_shortening), group_name_shortening[[x]], x))# Calculate vaccine hesitance by population segmentcovid_2021_analysis <- covid |>filter(`Year`==2021, `Time Period`%in% periods_to_combine_2021,`Group Name`%in% group_name_toloop,`Geography Type`%in%c("Jurisdictional Estimates","National Estimates"),`Indicator Category`=="Probably or definitely will not get vaccinated",!grepl("-", Geography)) |>mutate(n_hesitant =round(`Estimate (%)`/100*`Sample Size`,0)) |>group_by(pick(c(1:6))) |>summarise(`Sample Size`=sum(`Sample Size`),n_hesitant =sum(n_hesitant)) |>filter(Geography =="National") |>mutate(`Estimate (%)`= n_hesitant /`Sample Size`*100,`Time Period`="2021 analysis period",Year =2021)# Join shortened group names and categories to master tablecovid_2021_analysis <-left_join(left_join(covid_2021_analysis, group_name_shortened,by ="Group Name"), group_cat_shortened, by ="Group Category")# Create combinations of group names and categories via paste function# Include string wrapped version of text for visualization purposescovid_2021_analysis <- covid_2021_analysis |>mutate(concat =paste(shortened_group, shortened_cat, sep =" - "),concat_wrap =str_wrap(concat, width =10),`Group Name`= shortened_group,`Group Category`= shortened_cat) |>select(-c(shortened_group, shortened_cat))# Visualize the 10 categories with the highest hesitance ratescovid_2021_analysis |>ungroup() |>top_n(10, `Estimate (%)`) |>ggplot(aes(x =fct_reorder(concat_wrap, -(`Estimate (%)`)),y =`Estimate (%)`)) +geom_bar(stat ="identity", fill ="darkgreen", color ="black") +labs(title ="Top 10 categories with the highest hesitance in 2021",x ="Group categories", y ="2021 hesitance (%)") +theme_bw() +theme(axis.text.x =element_text(angle =90, vjust =0.5, hjust=1,size=6),axis.text.y =element_text(size =8),axis.title =element_text(size =8),plot.title =element_text(size =12))
Code
# Store the categories with the highest hesitance rates in a single-column dataframetop_10_highest_hesitance_cat_list <- covid_2021_analysis |>ungroup() |>top_n(10, `Estimate (%)`) |>select(concat)
Each segment represents a group of individuals who participated in the survey and who aligns to a certain type of demographic, such as an ethnicity or a group individuals who share a certain opinion about vaccines. For instance, “Vaccine.Imprtnt - A little_not at all” represents individuals who do not believe vaccines are essential to healthcare, whereas “Fr.Fam.Vaccd - No_some” corresponds to individuals who responded “No or some” to whether their friends and/or family are vaccinated. A noteworthy observation is that political views may have influenced vaccine hesitancy, evidenced by a high hesitance rate among Republican survey participants.
To investigate how population-specific hesitance varies by state, we focused on the 10 most and 10 least hesitant states and generated two corresponding parallel coordinate plots, where elements on the x-axis represent some of the most hesitant population segments.
Code
# Calculate hesitance over three months in 2021 by state and population segment# Clean state columnstate_group_analysis <- covid |>filter(`Year`==2021,`Time Period`%in% periods_to_combine_2021,`Geography Type`=="Jurisdictional Estimates",`Indicator Category`=="Probably or definitely will not get vaccinated",!grepl("-", Geography)) |>mutate(n_hesitant =round(`Estimate (%)`/100*`Sample Size`, 0)) |>group_by(Geography, `Group Name`, `Group Category`) |>summarise(`Sample Size`=sum(`Sample Size`),n_hesitant =sum(n_hesitant)) |>mutate(`Estimate (%)`= n_hesitant /`Sample Size`*100,`Time Period`="2021 analysis period",Year =2021) |>ungroup()# Shorten group namesstate_group_analysis <- state_group_analysis |>mutate_at(vars(`Group Name`), ~recode(., !!!group_name_shortening))state_group_analysis <- state_group_analysis |>mutate_at(vars(`Group Category`), ~recode(., !!!group_cat_shortening))state_group_analysis <- state_group_analysis |>mutate(concat =paste(`Group Name`, `Group Category`, sep =" - "))state_group_analysis$concat_wrap <-str_wrap(state_group_analysis$concat, width =10)# Filter groups for top most hesitant# Pivot wider to convert row-level categories to columns containing hesitancy ratesstate_group_analysis_pivot <- state_group_analysis |>filter(concat %in% top_10_highest_hesitance_cat_list$concat) |>select(Geography, `Estimate (%)`, concat_wrap) |>pivot_wider(id_cols = Geography, names_from ="concat_wrap",values_from ="Estimate (%)")# Pregnancy - pregnant is not available at the state level during the filtered timeframestate_not_in_top_10 <- top_10_highest_hesitance_cat_list |>filter(!concat %in% state_group_analysis$concat)pregnancy_check <- covid |>filter(`Group Category`=="Pregnant") |>distinct(`Geography Type`, Geography, `Group Name`, `Group Category`,`Time Period`, Year)# Identify top most hesitant statestop_10_hesistant_states <- covid_2021_analysis_allstates |>arrange(desc(`Estimate (%)`)) |>filter(!Geography %in%c("Guam","Puerto Rico")) |>slice(1:10)# Identify top least hesitant statesbottom_10_hesistant_states <- covid_2021_analysis_allstates |>arrange(`Estimate (%)`)|>filter(!Geography %in%c("Guam","Puerto Rico")) |>slice(1:10)# Filter pivoted data on most hesitant statestop_10_parallel_analaysis <- state_group_analysis_pivot |>filter(Geography %in% top_10_hesistant_states$Geography)# Display parallel coordinates plot for most hesitant statesggparcoord(top_10_parallel_analaysis |>select(where(~!any(is.na(.)))), columns =c(2:8),scale ="globalminmax",alpha =0.4) +labs(title ="2021 Hesitance Rates by Population Segment",subtitle ="10 Highest Hesitance States",x ="Population Segments", y ="2021 Hesitance (%)") +theme(axis.text.x =element_text(angle =45, hjust =1))
Code
# Filter pivoted data on least hesitant statesbottom_10_parallel_analaysis <- state_group_analysis_pivot |>filter(Geography %in% bottom_10_hesistant_states$Geography)# Display parallel coordinates plot for least hesitant statesggparcoord(bottom_10_parallel_analaysis |>select(where(~!any(is.na(.)))), columns =c(2:7), scale ="globalminmax",alpha =0.4) +labs(title ="2021 Hesitance Rates by Population Segment",subtitle ="10 Lowest Hesitance States",x ="Population Segments", y ="2021 Hesitance (%)") +theme(axis.text.x =element_text(angle =45, hjust =1))
In both cases, the hesitance rates within states were not uniform across all population segments. In particular, the group of individuals who responded “A little or not at all” to whether vaccination is important to them showed remarkably higher hesitance rates at the state level, which held true for every state.
This observation suggests that vaccine hesitance may not differ solely based on geography and ties in closely with the key question we posed in our research plan: are certain types of demographics more strongly associated with COVID-19 vaccine hesitance than others? To address this question, we proceeded to conduct Chi-Squared tests at the national level for every demographic specified in the Data section of this report. Based on the statistical significance of each test, we then ranked the demographics to identify those that had a stronger association with COVID-19 vaccine hesitance.
Code
# Define function to run Chi Square test for a given demographicget_chi_sq <-function(df, geography, time_period, year, group_name, hesitant_group) { temp <- df |>filter(Geography == geography,`Indicator Name`=="Vaccination uptake and intention",`Time Period`== time_period, Year == year,`Group Name`== group_name) |>mutate(hesitant =ifelse(`Indicator Category`%in% hesitant_group,1,0)) |>group_by(`Group Category`,hesitant) |>summarise(hesitance_rate =sum(`Estimate (%)`),sample_size =first(`Sample Size`)) |>mutate(n_hesitance =round(hesitance_rate/100* sample_size,0)) |>filter(hesitant ==1) |>select(`Group Category`,n_hesitance,sample_size)if(nrow(temp) >0) { expected_proportion <-sum(temp$n_hesitance)/sum(temp$sample_size) temp <- temp |>mutate(expected = sample_size*expected_proportion) res =chisq.test(x = temp$n_hesitance, p = temp$expected/sum(temp$expected)) output <-list(round(res$statistic,digits =0), res$p.value) }else { output <-list(0,0) }return(output)}group_name_toloop <-c("Sex", "Age", "Race/Ethnicity (7 level)","Age by race/ethnicity", "Sexual orientation","Gender identity", "Metropolitan statistical area","Born in the U.S.", "Language of interview","Poverty status", "Insurance status","Social Vulnerability Index (SVI) of county of residence","Political leaning of county of residence","Received non-COVID-19 vaccine(s) in past two years","Health condition associated with higher risk for COVID-19 (any)","Disability status (any)","Pregnancy status (females age 18 – 49 years)","Ever had COVID-19 disease (self-report)","Concern about getting COVID-19 disease","Confidence in COVID-19 vaccine safety","Confidence that COVID-19 vaccine is important","Healthcare provider recommended I get a COVID-19 vaccine","COVID-19 vaccination status of friends and family","Work or school requires COVID-19 vaccination" )group_name_toloop_short <-lapply(group_name_toloop, function(x) ifelse(x %in%names(group_name_shortening), group_name_shortening[[x]], x))result_df2 <-data.frame(matrix(ncol =3, nrow =0))colnames(result_df2) <-c("group_name", "X2_test_stat", "p_value")for (x in group_name_toloop_short) { res <-get_chi_sq(covid_2021_analysis,geography ="National",time_period ="2021 analysis period",year =2021,group_name = x,hesitant_group =c("Probably or definitely will not get vaccinated")) result_df2[nrow(result_df2) +1,] =c(x,res[1],res[2])}result_df2 <-arrange(result_df2, desc(X2_test_stat))result_df2 |>ungroup() |>top_n(10,X2_test_stat) |> dplyr::rename(`Demographic Type`= group_name)
The top 10 most statistically significant types of demographics are displayed above. These results provide key insights into understanding COVID-19 vaccine hesitance, such as:
Whether an individual has been vaccinated in the past may determine their hesitance toward the COVID-19 vaccine (see Chi-Squared for “Past.Vaccin”).
One’s age and race may also influence hesitance (see “Age.Race” above). We previously saw that white individuals between the ages of 18-49 were more hesitant than others.
Moreover, whether an individual associates themselves with the Republican party could also impact willingness to receive the COVID-19 vaccine (see Chi-Squared “Political”).
Two illustrative mosaic plots are shown below to confirm the Chi-Square test results above. The first plot helps visualize the relationship between COVID-19 vaccine hesitance vs the self-perceived importance of vaccines in general (the group with the highest Chi-Square score). The second plot visualizes the association between hesitance and gender identity (the group with the lowest Chi-Square score).
It can be seen in the first plot that those who consider vaccines as important had significantly lower hesitance than those who did not. On the other hand, gender identity yielded a plot that exactly matches a mosaic plot of expected values, thus confirming its statistically insignificant Chi-Square test result.
While mosaic plots can be analyzed for every type of demographic, having the list of most statistically significant groups is beneficial on its own for furthering our exploration of vaccine hesitance. For instance, the significant results we obtained may spark an interest in researching how race affected hesitance in 2021. We can also begin to research the influence of being U.S. born or living in a rural community on hesitance, as both these categories of demographics had a statistically significant Chi-Squared.
Now that we have a better understanding of the relationships between geography, demographics, and 2021 COVID-19 vaccine hesitance, we will move onto the second part of our research, which looks at how vaccine hesitance has changed over time.
3.2 Part 2: Analysis of segments that showed the greatest reduction in COVID vaccine hesitancy between 2021 and 2023
Across the nation, vaccine hesitance declined from around 18% in April/May 2021 to 11% in May/June 2023, as shown in the previously-seen graph below.
Code
# Filter for the vaccination uptake and intent indicator# Filter for responses corresponding to vaccine hesitancy# Include all adults 18+# Set time period to monthly (excluding weekly samples) and filter for nationwide samplesvacc_intention_time <- covid |>filter(`Indicator Name`=='Vaccination uptake and intention') |>filter(`Indicator Category`=='Probably or definitely will not get vaccinated') |>filter(`Group Name`=='All adults 18+') |>filter(`Time Type`=='Monthly') |>filter(Geography =='National') |>arrange(`Geography Type`, Geography, `Time Period`)# Derive month of data collectionvacc_intention_time <- vacc_intention_time |>mutate(month_split =paste0(str_split(`Time Period`, "-",simplify =TRUE)[, 2],", ", Year),month =mdy(month_split))# Plot hesitancy over timeggplot(vacc_intention_time, aes(x = month, y =`Estimate (%)`,color =`Indicator Category`)) +geom_line() +labs(title ="U.S. Vaccination Response Rates Over Time",x ="Month",y ="Response Rate (%)") +theme(legend.position ="bottom") +guides(color =guide_legend(title ="Indicator Category:"))
However, some segments of the population demonstrated a greater decline in hesitance than others. To better understand population-specific changes in hesitance, we developed the following cleveland dot plot, which breaks down vaccine hesitance for 20 population segments that had the highest initial hesitance 2021.
Code
annual_monthly_snapshots =c("April 22 - May 29","May 1 - May 28","April 30 - May 27")snapshots <- covid |>filter(`Time Period`%in% annual_monthly_snapshots,`Group Name`%in% group_name_toloop, Geography =="National",`Indicator Category`=="Probably or definitely will not get vaccinated" ) |>mutate(Year =as.factor(Year)) |>mutate_at(vars(`Group Name`), ~recode(., !!!group_name_shortening)) |>mutate_at(vars(`Group Category`), ~recode(., !!!group_cat_shortening)) |>mutate(rows =paste(`Group Name`, `Group Category`, sep =" - ")) |>select(rows,Year,`Estimate (%)`) |>arrange(rows, Year) row_order <- snapshots |>pivot_wider(names_from = Year, values_from =`Estimate (%)`) |>drop_na(`2021`) |>drop_na(`2023`) |>mutate(decline_21_23 =`2021`-`2023`) |>arrange((decline_21_23))row_order_list_chg_21_23 <-c(row_order$rows)row_order_list_2021 <- row_order |>arrange(`2021`) |>top_n(20,`2021`) |>select(rows)row_order_list_2021 <- row_order_list_2021$rows row_order_list_2023 <-c(row_order |>arrange((`2023`)) |>select(rows))snapshots <- snapshots[snapshots$rows %in% row_order_list_chg_21_23, ]snapshots |>filter(rows %in% row_order_list_2021) |>ggplot(aes(x =`Estimate (%)`, y =fct_relevel(rows,(row_order_list_2021)),color = Year)) +geom_point(size =1) +labs(title ="20 Most Hesitant Segments in 2021, and Changes in 2022-23",x ="Vaccine hesitance (%)", y ="Population Segments") +theme(legend.position ="bottom",axis.text.x =element_text(size =10),axis.text.y =element_text(size =6), plot.title =element_text(size =10))
It can be seen from the cleveland dot plot that the change in hesitance of the top 20 most hesitant segments in 2021 varies by population segment. To visualize this change better, we also created the bar graph below, which shows the 10 segments that demonstrated the largest decreases in hesitance between 2021 and 2023.
Code
# Bar chart of changesrow_order$rows <-str_wrap(row_order$rows, width =10)row_order |>top_n(10, decline_21_23) |>ggplot(aes(x =fct_reorder(rows,rev(decline_21_23)), y = decline_21_23)) +geom_bar(stat ="identity", fill ="navy", color ="black") +labs(title ="Top 10 Segments with the Largest 2021-23 Decreases",x ="Population Segments", y ="2021-23 % Decrease in Hesitance") +theme_bw() +theme(axis.text.x =element_text(angle =0, vjust =1, hjust=0.5,size=6),axis.text.y =element_text(size=8),axis.title =element_text(size =8),plot.title =element_text(size =8))
The bar chart shows that individuals who did not perceive vaccines as essential had the sharpest decline in hesitance. Other groups who also had large decreases are those who had low confidence in vaccine safety/efficacy and those who contracted COVID.
While more research is required to understand why some of these segments had more substantial decreases in hesitance, we can analyze whether initial hesitance in 2021 is an associated factor. The scatterplot below shows the relationship between hesitance rates in 2021 and the change in hesitance from 2021-23 across population segments.
Code
row_order %>%ggplot(aes(x =`2021`, y = decline_21_23)) +geom_point() +geom_smooth(method ="lm", se =FALSE, color ="blue") +labs(title ="Scatterplot of 2021 hesitance vs. 2021-23 decline by population segment",x ="2021 Hesitance (%)",y ="Decline in hesitance (2021-23) (%)")+stat_poly_eq(formula = y ~ x, # Add R-squared valueaes(label =paste( ..rr.label.., sep ="~~~~")),parse =TRUE,size =4)
Segments with the highest initial hesitance generally showed the greatest reduction in hesitance, with a high R2 of 0.75. This could be because those with the highest hesitance received targeted public healthcare interventions, or simply because they had the greatest scope for reduction in the first place. Determining the exact reason requires further investigation that is beyond the scope of this project.
Similar to Part 1 of our analysis, we are also interested in understanding changes in hesitance by geography. To answer this question, we constructed the following heatmap of changes in hesitance by state.
Code
comparison_baseline_2021 <- covid |>filter(`Year`==2021, `Time Type`=="Monthly",`Time Period`=="April 22 - May 29",`Group Category`=="All adults age 18+ years",`Geography Type`%in%c("Jurisdictional Estimates"),`Indicator Category`=="Probably or definitely will not get vaccinated",!grepl("-", Geography)) |>mutate(n_hesitant =round(`Estimate (%)`/100*`Sample Size`, 0)) |>group_by(Geography) |>summarise(`Sample Size`=sum(`Sample Size`),n_hesitant =sum(n_hesitant)) |>mutate(`Estimate (%)`= n_hesitant /`Sample Size`*100,`Time Period`="2021 most recent snapshot",Year =2021)covid_2023_analysis_allstates <- covid |>filter(`Year`==2023, `Time Type`=="Monthly",`Time Period`=="May 28 - June 30",`Group Category`=="All adults age 18+ years",`Geography Type`%in%c("Jurisdictional Estimates"),`Indicator Category`=="Probably or definitely will not get vaccinated",!grepl("-", Geography)) |>mutate(n_hesitant =round(`Estimate (%)`/100*`Sample Size`, 0)) |>group_by(Geography) |>summarise(`Sample Size`=sum(`Sample Size`),n_hesitant =sum(n_hesitant)) |>mutate(`Estimate (%)`= n_hesitant /`Sample Size`*100,`Time Period`="2023 most recent snapshot",Year =2023)state_delta <-left_join(covid_2023_analysis_allstates, comparison_baseline_2021 |>select(Geography, estimate_2021 =`Estimate (%)`),by ="Geography")state_delta <- state_delta |>mutate(`Change in Hesitancy Rates`=`Estimate (%)`- estimate_2021)state_delta_map <- state_delta |>select(Geography, `Change in Hesitancy Rates`) |>transmute(region =tolower(Geography),value =as.numeric(`Change in Hesitancy Rates`))# red_green_palette <- brewer.pal(n = 7, name = "RdYlGn")state_choropleth(state_delta_map,title =" 2021-2023 Changes in Vaccine Hesitance by State",legend ="Change in \nHesitancy Rates ",num_colors =0)
The heatmap indicates that West Virginia exhibited the greatest overall change in hesitance (-23.3%). Other geographies that had higher hesitance rates in 2021, such as the South East and Northern Mountain West, also had more notable decreases in hesitance. Conversely, some states had almost no change in hesitance. New Mexico, in particular, experienced a slight increase in hesitance of 0.3%.
The scatterplot below shows the relationship between initial hesitance in 2021 and the change in hesitance between 2021-23 by state.
Code
state_delta %>%mutate(`Decline in Hesitancy Rates`=-1*`Change in Hesitancy Rates`) %>%ggplot(aes(x = estimate_2021, y =`Decline in Hesitancy Rates`)) +geom_point() +geom_smooth(method ="lm", se =FALSE, color ="blue") +labs(title ="Scatterplot of 2021 Hesitance vs. 2021-23 Decline by State",x ="2021 Hesitance (%)",y ="2021-23 Decline in Hesitance (%)")+stat_poly_eq(formula = y ~ x, # Add R-squared valueaes(label =paste( ..rr.label.., sep ="~~~~")),parse =TRUE,size =4)
As seen in the heatmap, states with the highest initial hesitance generally showed the greatest decline in hesitance; although, the R-Squared statistic is low (0.15). This relationship is significantly weaker than a similar relationship seen for population segments where the R-Squared was 0.75.
To view the latest distribution of vaccine hesitance by state, we produced one additional heatmap of statewide hesitance rates as of June 30th, 2023.
Code
# Prep dataframe to be used in the state_choropleth functioncovid_2023_analysis_allstates_map <- covid_2023_analysis_allstates |>select(Geography, `Estimate (%)`) |>transmute(region =tolower(Geography), value =`Estimate (%)`)# Display state heatmap of hesitance ratesstate_choropleth(covid_2023_analysis_allstates_map,title =" June 2023 COVID-19 Vaccine Hesitance Rates by State",legend ="Hesitancy Rates (%) ")
This map shows vaccine hesitance is still prevalent in some states, namely in southern states, as well as Wyoming, Wisconsin, Indiana, and Ohio.
In the following section, we summarize our takeaways from Part 1 as well as the time series analysis from Part 2.
3.3 Part 3: High-level recommendations for segments of the population that need further public health interventions.
Our analyses highlighted several key findings:
In 2021, COVID-19 vaccine hesitance was present in all states and many segments of the population. However, some regions and groups were more susceptible to hesitance, as seen by the disparities in hesitance rates between states and population segments.
Some differences between population demographics may have been drivers of vaccine hesitance in 2021. For example, population characteristics such as one’s perception of the importance, safety, and efficacy of vaccines or having unvaccinated friends and family appear to be linked with hesitancy. We recommend further analysis into how someone’s willingness to receive the COVID vaccine may be impacted by the following characteristics:
General perception of vaccination importance
Past history of vaccination
Race/ethnicity
Political views
Living in a rural vs urban community
Being U.S. Born or not
Three years after the beginning of the pandemic, vaccine hesitance persists in most states and some segments of the populations. However, hesitance can be reduced. There was a history of successful reductions in hesitance between 2021-23 nationally (-7%), state-wise (up to -23%), and by population segment (up to -32%). This was especially visible in states like West Virginia, and segments such as those who had reservations about the importance or the safety and efficacy of vaccines. This indicates potential to further reduce hesitance in these segments; although, further study into the causes of these declines in hesitance is required. As of June 2023, the geographies and population segments listed below exhibit the largest hesitance. In the event that future COVID outbreaks were to occur, we recommend prioritizing outreach to these groups to reduce hesitance and mitigate further spread of the disease.
Residents of states with high hesitance rates: Texas, Oklahoma, Arkansas, Wyoming, Wisconsin, Indiana, and Ohio
Individuals who have not been previously vaccinated
Individuals who are not insured
Women who are pregnant and have not been vaccinated
Individuals who are white and between the age of 18 and 49
Individuals living in rural communities
Additional trends in vaccination hesitance over time may exist. In the following section of this report, we provide a tool that visualizes month-by-month hesitance by state and population segment.