Mathematician in Data Science: August 2021

Vaccinated-VS-Mortality-Rate.knit

With a current COVID-19 pandemia we have a lot of discussion in media about vaccination effect. Does it help or not? We have polarized views on different sides of political spectrum. I decided to check the data which I can access on the moment.

Sources, Data Download

The data on mortality is taken from here:

https://github.com/GeoDaCenter/covid/tree/master/public/csv

The date at which the set was retrieved is printed below. I included it in the code because the information is updated regularly. I kept only numbers for the last 2 months.

library(ggplot2)
library(lubridate, warn.conflicts = FALSE)
today()

## [1] "2021-08-12"

url <- "https://raw.githubusercontent.com/GeoDaCenter/covid/
master/public/csv/covid_deaths_usafacts_state.csv"
covid_death_full = read.csv(url, stringsAsFactors = FALSE)
dim(covid_death_full)

## [1]  51 569

names(covid_death_full)[1:5]

## [1] "State"       "StateFIPS"   "X2020.01.22" "X2020.01.23" "X2020.01.24"

N <- ncol(covid_death_full)
covid = covid_death_full[, c(1, (N-61):N)]
rm(covid_death_full)
N <- ncol(covid)
dim(covid)

## [1] 51 63

What do we have as our covid-19 data?

head(covid[, c(1, 56:61)])

##   State X2021.08.03 X2021.08.04 X2021.08.05 X2021.08.06 X2021.08.07 X2021.08.08
## 1    AL       11542       11561       11574       11600       11600       11600
## 2    AK         382         382         390         390         390         390
## 3    AZ       18282       18289       18300       18342       18342       18342
## 4    AR        6215        6230        6247        6269        6269        6269
## 5    CA       63950       64001       64055       64091       64091       64091
## 6    CO        6956        6963        6970        6970        6970        6970

Here we have numbers of accumulated daily death counts. I will use these numbers, hoping that diagnostics approaches are consistent across states. State population counts are obtained from here as a csv file: https://worldpopulationreview.com/states

census_data=read.csv("US-State-population-2020.csv")
str(census_data)

## 'data.frame':    52 obs. of  9 variables:
##  $ ï..rank        : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ State          : chr  "California" "Texas" "Florida" "New York" ...
##  $ Pop            : int  39613493 29730311 21944577 19299981 12804123 12569321 11714618 10830007 10701022 9992427 ...
##  $ Growth         : num  0.0038 0.0385 0.033 -0.0118 0.0003 -0.0121 0.0033 0.0303 0.0308 0.0008 ...
##  $ Pop2018        : int  39461588 28628666 21244317 19530351 12800922 12723071 11676341 10511131 10381615 9984072 ...
##  $ Pop2010        : int  37319502 25241971 18845537 19399878 12711160 12840503 11539336 9711881 9574323 9877510 ...
##  $ growthSince2010: num  0.0615 0.1778 0.1644 -0.0051 0.0073 ...
##  $ Percent        : num  0.1184 0.0889 0.0656 0.0577 0.0383 ...
##  $ density        : num  254 114 409 410 286 ...

Note that I have 52 rows and not 51, as in a previous data set. I need to drop “Puerto Rico”.

popltns= census_data[census_data$State !="Puerto Rico", c("State", "Pop", "density")]

Now for vaccination information. I worked with text information from here:

https://www.beckershospitalreview.com/public-health/

I got it on July, 18, 2021. I considered only fully vaccinated. I added delay between vaccination statistics and mortality counts because a person does not get full immunity for the first 2 weeks, plus the disease itself takes time. I wrangled the data into a csv file, but I skipped the code for this because the process was simple, and I do not want my post to be too long.

vaccinated=read.csv("vaccinated.2021.07.18.csv")

I would prefer for my files to contain full state names and abbreviations. I will combine my data in one data frame.

StateAbbrev <- read.csv("state-abbreviations.csv")
popltns <-  merge(popltns, StateAbbrev)
df <- merge(popltns,vaccinated)

Comparing with last month data.

Let us compute mortality rates for the last month.

last_month <- data.frame(st = covid$State,  
                            last_month = covid[,N] - covid[,(N-31)])
df <-  merge(df, last_month, by.x="st_abbr", by.y= "st", all =TRUE)
df$mortality_last_month <- df$last_month*1e05/df$Pop

For a plot I modified sizes of points to be proportional to state population densities. Our outliers appear mostly not densely populated or with very high density.

ggplot(data=df, aes(x=Vaccinated_percentage, y =mortality_last_month, size = density)) +
    geom_point(color ="green") + ylim(-1, 7.6) +
    geom_smooth(method='loess', level=.95, span = 0.75, color='darkgreen') +
    geom_text(aes(label=st_abbr), hjust=-.3, vjust=0, size=3
              #, check_overlap =TRUE
              ) + 
    xlab("Vaccinated percentage") + ylab("Last month mortality per 100,000")+
    theme(legend.position = "none")

As we see when percentages of vaccination are lower, it appears to be more helpful. One of possible reasons is that senior citizens got vaccines first and foremost, and they benefited greatly.

I would like to add state colors: reds, blues and purples. I tried to define colors by the last 20 years of presidential elections. If you believe that I made mistakes, please comment.

df$color[df$st_abbr=="DC"] = "blue"

rep.st <- c('Alabama', 'Alaska', 'Arkansas', 'Georgia', 'Idaho', 'Kansas', 
'Kentucky', 'Louisiana', 'Mississippi', 'Missouri', 'Montana', 'Nebraska', 
'North Dakota', 'Oklahoma', 'South Carolina', 'South Dakota', 'Tennessee', 'Texas', 
'Utah', 'West Virginia', 'Wyoming')

df$col="blue"
df$col[df$State %in% rep.st] <- "red"
df$col[df$st_abbr %in% c('AR', 'AZ', 'GA', 'FL', 'NC', 'OH', 'VA')] <- "purple4"

Let us check for the last month.

ggplot(data=df, aes(x=Vaccinated_percentage, y =mortality_last_month
                    , colour=col)) + 
    ylim(-1, 7.6) + 
    geom_point( size=2) + 
    scale_color_manual(breaks = c("red", "purple4", "blue"),
                        values=c("red", "purple4", "blue"))+
    geom_smooth(method='lm', level=.95, aes(fill= col)) +
    scale_fill_manual(breaks = c("red", "purple4", "blue"),
                        values=c("red", "darkmagenta", "deepskyblue"))+
    geom_text(aes(label=st_abbr), hjust=-.3, vjust=0, size=3
              #, check_overlap =TRUE
              ) + 
    xlab("Vaccinated percentage") + ylab("Last month mortality per 100,000")+
    theme(legend.position = "none")

If you are curious where the blue line intersects 0, I estimated it around 95%. Although 3 week ago it was about 71%.

Mathematician in Data Science

Thursday, August 12, 2021

Covid_19: Vaccination Rate VS Mortality Rate

Sources, Data Download

Comparing with last month data.