With a current COVID-19 pandemia we have a lot of discussion in media about vaccination effect. Does it help or not? We have polarized views on different sides of political spectrum. I decided to check the data which I can access on the moment.
Sources, Data Download
The data on mortality is taken from here:
https://github.com/GeoDaCenter/covid/tree/master/public/csv
The date at which the set was retrieved is printed below. I included it in the code because the information is updated regularly. I kept only numbers for the last 2 months.
library(ggplot2)
library(lubridate, warn.conflicts = FALSE)
today()
## [1] "2021-08-12"
url <- "https://raw.githubusercontent.com/GeoDaCenter/covid/
master/public/csv/covid_deaths_usafacts_state.csv"
covid_death_full = read.csv(url, stringsAsFactors = FALSE)
dim(covid_death_full)
## [1] 51 569
names(covid_death_full)[1:5]
## [1] "State" "StateFIPS" "X2020.01.22" "X2020.01.23" "X2020.01.24"
N <- ncol(covid_death_full)
covid = covid_death_full[, c(1, (N-61):N)]
rm(covid_death_full)
N <- ncol(covid)
dim(covid)
## [1] 51 63
What do we have as our covid-19 data?
head(covid[, c(1, 56:61)])
## State X2021.08.03 X2021.08.04 X2021.08.05 X2021.08.06 X2021.08.07 X2021.08.08
## 1 AL 11542 11561 11574 11600 11600 11600
## 2 AK 382 382 390 390 390 390
## 3 AZ 18282 18289 18300 18342 18342 18342
## 4 AR 6215 6230 6247 6269 6269 6269
## 5 CA 63950 64001 64055 64091 64091 64091
## 6 CO 6956 6963 6970 6970 6970 6970
Here we have numbers of accumulated daily death counts. I will use these numbers, hoping that diagnostics approaches are consistent across states. State population counts are obtained from here as a csv file: https://worldpopulationreview.com/states
census_data=read.csv("US-State-population-2020.csv")
str(census_data)
## 'data.frame': 52 obs. of 9 variables:
## $ ï..rank : int 1 2 3 4 5 6 7 8 9 10 ...
## $ State : chr "California" "Texas" "Florida" "New York" ...
## $ Pop : int 39613493 29730311 21944577 19299981 12804123 12569321 11714618 10830007 10701022 9992427 ...
## $ Growth : num 0.0038 0.0385 0.033 -0.0118 0.0003 -0.0121 0.0033 0.0303 0.0308 0.0008 ...
## $ Pop2018 : int 39461588 28628666 21244317 19530351 12800922 12723071 11676341 10511131 10381615 9984072 ...
## $ Pop2010 : int 37319502 25241971 18845537 19399878 12711160 12840503 11539336 9711881 9574323 9877510 ...
## $ growthSince2010: num 0.0615 0.1778 0.1644 -0.0051 0.0073 ...
## $ Percent : num 0.1184 0.0889 0.0656 0.0577 0.0383 ...
## $ density : num 254 114 409 410 286 ...
Note that I have 52 rows and not 51, as in a previous data set. I need to drop “Puerto Rico”.
popltns= census_data[census_data$State !="Puerto Rico", c("State", "Pop", "density")]
Now for vaccination information. I worked with text information from here:
https://www.beckershospitalreview.com/public-health/
I got it on July, 18, 2021. I considered only fully vaccinated. I added delay between vaccination statistics and mortality counts because a person does not get full immunity for the first 2 weeks, plus the disease itself takes time. I wrangled the data into a csv file, but I skipped the code for this because the process was simple, and I do not want my post to be too long.
vaccinated=read.csv("vaccinated.2021.07.18.csv")
I would prefer for my files to contain full state names and abbreviations. I will combine my data in one data frame.
StateAbbrev <- read.csv("state-abbreviations.csv")
popltns <- merge(popltns, StateAbbrev)
df <- merge(popltns,vaccinated)
Comparing with last month data.
Let us compute mortality rates for the last month.
last_month <- data.frame(st = covid$State,
last_month = covid[,N] - covid[,(N-31)])
df <- merge(df, last_month, by.x="st_abbr", by.y= "st", all =TRUE)
df$mortality_last_month <- df$last_month*1e05/df$Pop
For a plot I modified sizes of points to be proportional to state population densities. Our outliers appear mostly not densely populated or with very high density.
ggplot(data=df, aes(x=Vaccinated_percentage, y =mortality_last_month, size = density)) +
geom_point(color ="green") + ylim(-1, 7.6) +
geom_smooth(method='loess', level=.95, span = 0.75, color='darkgreen') +
geom_text(aes(label=st_abbr), hjust=-.3, vjust=0, size=3
#, check_overlap =TRUE
) +
xlab("Vaccinated percentage") + ylab("Last month mortality per 100,000")+
theme(legend.position = "none")
As we see when percentages of vaccination are lower, it appears to be more helpful. One of possible reasons is that senior citizens got vaccines first and foremost, and they benefited greatly.
I would like to add state colors: reds, blues and purples. I tried to define colors by the last 20 years of presidential elections. If you believe that I made mistakes, please comment.
df$color[df$st_abbr=="DC"] = "blue"
rep.st <- c('Alabama', 'Alaska', 'Arkansas', 'Georgia', 'Idaho', 'Kansas',
'Kentucky', 'Louisiana', 'Mississippi', 'Missouri', 'Montana', 'Nebraska',
'North Dakota', 'Oklahoma', 'South Carolina', 'South Dakota', 'Tennessee', 'Texas',
'Utah', 'West Virginia', 'Wyoming')
df$col="blue"
df$col[df$State %in% rep.st] <- "red"
df$col[df$st_abbr %in% c('AR', 'AZ', 'GA', 'FL', 'NC', 'OH', 'VA')] <- "purple4"
Let us check for the last month.
ggplot(data=df, aes(x=Vaccinated_percentage, y =mortality_last_month
, colour=col)) +
ylim(-1, 7.6) +
geom_point( size=2) +
scale_color_manual(breaks = c("red", "purple4", "blue"),
values=c("red", "purple4", "blue"))+
geom_smooth(method='lm', level=.95, aes(fill= col)) +
scale_fill_manual(breaks = c("red", "purple4", "blue"),
values=c("red", "darkmagenta", "deepskyblue"))+
geom_text(aes(label=st_abbr), hjust=-.3, vjust=0, size=3
#, check_overlap =TRUE
) +
xlab("Vaccinated percentage") + ylab("Last month mortality per 100,000")+
theme(legend.position = "none")
If you are curious where the blue line intersects 0, I estimated it around 95%. Although 3 week ago it was about 71%.