Homework 1

Due by 11:59 PM on Tuesday, February 1, 2022

Instructions

This assignment is just about GitHub and data management. The goal is to give you a chance to practice wrangling and tidying data. We do this very early in the class because soon (next week) we will start doing some empirical analysis using real data. The sooner you are comfortable with the datasets, the better.

Please turn in your empirical work as a PDF in Canvas. I recommend creating this PDF using R Markdown. If you are not familiar with R Markdown, take a look at the following resources:

If you’re new to R Markdown, then there will be some growing pains. Please be patient with yourself and stick to it. One of the goals of the course is to develop good workflow habits. This means doing work in a way that minimizes mistakes and maximizes reproducability. R Markdown exists exactly for these reasons. It allows you to keep your data analysis and the description of that analysis together in a single document or group of documents. It’s really easy to introduce mistakes when copying and pasting empirical results into a text document, and it’s even easier to waste hours (days/weeks?) re-pasting results into your text document and re-writing the results accordingly. Doing everything in R Markdown avoids all of these issues. In the long run, it’s much more efficient.

Note that, for your submission, I do not need to see all of your code. You can include snippets of code files to highlight the most important parts of your work, but please don’t include all of your code or large printouts of data in the final PDF submission.

Building the data

The purpose of this part of the assignment is essentially to practice database management. Most of your professional lives will likely involve managing data. It can be tedious but also extremely rewarding when you finally get to find out what’s going on in the analysis stage. Anyway, let’s get to work! All of these questions require you to use the Medicare Advantage GitHub Repo.

Enrollment Data

Run the R code to organize the Monthly Plan Enrollment Data. Once you’ve created your final dataset (it’s called full_ma_data in my code), answer the following:

How many observations exist in your current dataset?
How many different plan_types exist in the data?
Provide a table of the count of plans under each plan type in each year. Your table should look something like Table 1.

knitr::kable(test.data, col.names=c("2010","2011","2012","2013","2014","2015"),
             type="html", caption = "Plan Count by Year", booktabs = TRUE)

Table 1: Plan Count by Year
	2010	2011	2012	2013	2014	2015
Type 1	21	32	14	47	35	21
Type 2	42	33	13	24	14	42
Type 3	21	27	40	24	32	21

Remove all special needs plans (SNP), employer group plans (eghp), and all “800-series” plans. Provide an updated version of Table 1 after making these exclusions.
Merge the contract service area data to the enrollment data, and restrict the data only to contracts that are approved in their respective counties. The R script to create the service area dataset is here: Contract Service Area. And you can follow the _BuildFinalData.r script to see where/how I join the datasets. The join takes place in the following section of the code:

final.data <- full.ma.data %>%
  inner_join(contract.service.area %>% 
               select(contractid, fips, year), 
             by=c("contractid", "fips", "year")) %>%
  filter(!state %in% c("VI","PR","MP","GU","AS","") &
           snp == "No" &
           (planid < 800 | planid >= 900) &
           !is.na(planid) & !is.na(fips))

Finally, limit your dataset only to plans with non-missing enrollment data. Provide a graph showing the average number of Medicare Advantage enrollees per county from 2008 to 2015. Be sure to format your graph in a meaningful way as we did in class.

Premium Data

Now we’re going to incorporate the plan premium information. This is part of the “Plan Characteristics” data, and the underlying R scripts for these files can be found here: Plan Characteristics.

Merge the plan characteristics data to the dataset you created in Step 6 above. Note that you’ll need to join the Market Penetration Data in order to get the information you need to merge the plan characteristics. This is because the plan characteristics data only have state name and county (not FIPS codes). The penetration files have both FIPS codes and state/county names, so that dataset serves as a good crosswalk file.
Provide a graph showing the average premium over time. Don’t forget about formatting!
Provide a graph showing the percentage of $0 premium plans over time. Also…remember to format things.

Summary Questions

With all of this data work and these great summaries, let’s take a step back and think about what all this means.

Why did we drop the “800-series” plans?
Why do so many plans charge a $0 premium? What does that really mean to a beneficiary?
Briefly describe your experience working with these data (just a few sentences). Tell me one thing you learned and one thing that really aggravated you.