July 23, 2015
Medicare Advantage Stars: Are the Grades Fair?
Medicare Advantage (MA) offers seniors a one-stop option for hospital care, outpatient physician visits, and prescription drug coverage. MA is popular; enrollment has increased every year since 2004 and reached 16 million individuals in 2014, which represents 30 percent of the Medicare population. Since 2008 MA plan performance has been rated on a 5-star scale to inform beneficiaries of the quality of plan options, and since 2012 plans with higher ratings receive bonuses that are in part returned to beneficiaries.
A disproportionately high number of enrollees are lower-income and minority beneficiaries. Among minority beneficiaries, Hispanics are twice as likely and African–Americans are 10 percent more likely to enroll in MA. Concern has been raised that the Stars rating system penalizes plans that have large enrollments of Low-Income Subsidy (LIS) and dual-eligible beneficiaries.
In this short paper, we investigate this concern. Our primary findings are that:
· On average, MA contracts with over 30 percent LIS enrollment have a lower rating than other plans by 0.5 Stars;
· Using a more refined statistical technique does not eliminate this finding; we continue to find a statistically and quantitatively significant penalty to observed Stars rating from the presence of a significant LIS enrollment;
· These findings imply a significant financial penalty to those MA plans with a significant LIS enrollment, which in turn reduces the resources available to those vulnerable populations; and
· At the upper end of the spectrum, we estimate a total of more than $470 million in bonus payments lost due to reduced ratings. For individuals, our estimates range from a reduction of $380 to $410 per senior.
Medicare Advantage (MA) offers seniors a one-stop option for hospital care (Part A in traditional Medicare), outpatient physician visits (Part B), and prescription drug coverage (Part D). MA is popular; enrollment has increased every year since 2004 and reached 16 million individuals in 2014, which represents 30 percent of the Medicare population.
Among those who enroll, a disproportionately high number of lower-income and minority beneficiaries opt for MA. This may be in part because MA plans tend to have lower co-pays and deductibles than traditional Medicare. Also, low-income enrollees are less likely to have supplemental coverage (employer plans or Medigap plans) that covers these costs. Among minority beneficiaries, Hispanics are twice as likely and African–Americans are 10 percent more likely to enroll in MA.
As an attempt to identify and reward quality care, since 2008 the Department of Health and Human Services (HHS) has rated Medicare Advantage Organizations’ (MAOs’) performance on a 5-star scale. Beginning in 2012, payment adjustments have been made to plans based on their star rating, with higher rated plans receiving bonuses. Over time the rating needed to receive bonus payments has risen. This year is the first in which an MA plan has to meet 4 Stars in order to receive the bonus (a 5 percent increase in the benchmark payment).
In the MA Stars program, ratings are assigned at the “contract” level. An MA contract contains one or more MA plans and covers all areas of the country in which each of those plans is available. As a result, star ratings for a contract may include quality measures of beneficiaries served by many different hospital systems and providers with varying levels of per-beneficiary MA funding.
A concern has arisen that Stars ratings may reflect features other than the quality of care provided to beneficiaries. In particular, plan carriers suggest that plans with significant numbers of low-income beneficiaries (those who receive the Low-Income Subsidy or LIS) receive lower ratings than their peer plans with fewer low-income enrollees. On this view, the lower ratings stem not from differences in plan quality but from the socio-economic barriers low-income enrollees face in achieving good health outcomes. There is good reason to believe that LIS enrollment could be the cause of the rating deficit: individuals with low income have higher mortality rates, higher incidences of disease, and poorer outcomes from health care.
As detailed below, the data support this claim. In our data, the average Star rating for plans with low LIS enrollment is 3.81, and the average for plans with high LIS enrollment is 3.28 – a difference of over one-half Star. (For purposes of our investigation, we define “significant number of low-income beneficiaries” as at least 30 percent of a contract’s total enrollment.)
A rating deficit that results from low-income enrollment would have important public policy implications. The Affordable Care Act (ACA) includes substantial, phased cuts to MA. As those cuts continue, MA contracts that do not meet the criteria for a bonus payment will continue to be squeezed financially, further restricting the contract’s ability to provide the benefits and care management necessary to care for low-income seniors and improve quality measures. In the extreme, plans may simply withdraw from MA. Alternatively, CMS has the authority to terminate at the end of 2016 any MA contracts that score consistently below 3 Stars, which may disproportionately impact low-income beneficiaries. There is a strong case for extending bonus payments to contracts for which the current targets are unfairly out-of-reach as a result of low-income enrollment.
At the same time, there are a number of reasons why this rating deficit may not solely be the result of low-income beneficiaries, some of which are still unrelated to plan quality. Plans with low ratings may be the lowest cost plans and attract low-income enrollees. It could also be the case that the areas in which low-income seniors seek care are generally poor performing areas. These problems should also be public policy concerns.
The Centers for Medicare & Medicaid Services (CMS) examined the relationship between LIS enrollment and low Star Ratings. (In addition to LIS, they studied dual-eligibility and Special Needs Plan (SNP) enrollment as possible sources of the rating deficit.) CMS reported that LIS and non-LIS beneficiaries within poor-performing plans did not have significantly different outcomes at the individual level. CMS claims that if a high-LIS contract were rated solely on its non-LIS beneficiaries, the ratings would not substantially improve. Still, CMS identified 7 out of 46 total measures that might plausibly be affected by LIS enrollment. As an interim step, CMS proposed to reduce the weighting on these measures in the Star calculation, thereby increasing the importance of the other 39 measures. The proposal was not implemented and CMS continues to study the issue.
Data and Analysis
We obtained data on 692 contracts for the 2015 plan year. CMS makes publicly available detailed information on contract Star Rating scores for each individual measure and overall results of the CMS Star Rating calculation. We combined this information with other CMS data sets that describe where a given contract’s beneficiaries live, the number of low-income enrollees in a given contract, and the county-level benchmarks used to determine per-beneficiary payment to MA contracts. We also used information published by the Health Resources and Services Administration on Health Professional Shortage Areas (HPSA) to control for regional health care disparities. Our analysis focuses on MA plans that offer a drug benefit and have at least 1,000 beneficiaries. Our data do not provide information on dual-eligibility and we are unable to investigate their impact on Star ratings.
To begin, we define “significant number of low-income beneficiaries” as at least 30 percent of a contract’s total enrollment. As noted above, there is a clear difference in the Star ratings between contracts without and contracts with a significant number of LIS enrollees. In our data, the average Star rating for plans with low LIS enrollment is 3.81, and the average for plans with high LIS enrollment is 3.28 – a difference of over one-half Star.
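The comparison above can be sketched in a few lines of Python. This is only an illustration: the contracts below are made up, the function names are hypothetical, and we assume a simple unweighted average across contracts, which the text does not specify.

```python
from statistics import mean

# The paper's cutoff: a contract is "high-LIS" when LIS enrollees make up
# at least 30 percent of total enrollment.
HIGH_LIS_THRESHOLD = 0.30


def is_high_lis(lis_enrollees, total_enrollees):
    """Return True when the LIS share of enrollment meets the 30 percent cutoff."""
    return lis_enrollees / total_enrollees >= HIGH_LIS_THRESHOLD


def mean_rating_by_group(contracts):
    """Average Star rating for low-LIS and high-LIS contracts.

    contracts: iterable of (lis_enrollees, total_enrollees, star_rating).
    Assumption: each contract counts equally (no enrollment weighting).
    """
    low = [s for l, t, s in contracts if not is_high_lis(l, t)]
    high = [s for l, t, s in contracts if is_high_lis(l, t)]
    return mean(low), mean(high)


# Hypothetical contracts, NOT the CMS data used in the paper.
example_contracts = [
    (100, 2000, 4.0),   # 5 percent LIS
    (300, 3000, 3.5),   # 10 percent LIS
    (900, 2000, 3.0),   # 45 percent LIS
    (1500, 5000, 3.5),  # 30 percent LIS, exactly at the cutoff
]

low_mean, high_mean = mean_rating_by_group(example_contracts)
gap = low_mean - high_mean
print(f"low-LIS mean: {low_mean}, high-LIS mean: {high_mean}, gap: {gap}")
```

With the actual CMS files, the same grouping would be applied to the contracts retained in the analysis.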
The remaining question is whether this difference is a statistical artifact or a durable difference that merits policy attention. To address this, our analysis divides the 46 rating measures into 23 measures that are unaffected by beneficiary behavior and 23 measures that could plausibly be affected by beneficiary behavior. We classified each measure according to whether its criteria required proactive action by the beneficiary, such as coming into the office for a flu shot or adhering to medication prescriptions, or were mostly in the control of hospitals, such as prescribing the correct medication in the event of a certain heart condition. We also assume that case-mix adjusted measures are unaffected by beneficiary behavior. This is not the only division of the measures that one could choose, but explorations of other approaches suggest that our results are relatively robust to these decisions.
Our strategy is to assume that MA contracts have inherent quality that can be measured by the weighted average of a contract’s ratings on the set of measures identified as unaffected by beneficiary behavior. That is, we use this metric as a good prediction of the “true” quality of an MA contract. We find that contracts with high LIS enrollment perform worse, on average, than low LIS enrollment plans in this measure of quality.
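A minimal sketch of such a quality index follows. The measure names are taken from our classification, but the star values and weights are purely illustrative, and the function name and the handling of unreported measures are assumptions rather than a description of the CMS formula.

```python
def quality_index(measure_stars, weights, unaffected):
    """Weighted average of measure-level stars over the 'unaffected' measures.

    measure_stars: dict mapping measure name -> star value (or None if the
                   contract has no rating on that measure).
    weights:       dict mapping measure name -> weight.
    unaffected:    names of measures treated as unaffected by beneficiary behavior.
    Assumption: unreported measures are skipped and the weights renormalized.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for name in unaffected:
        stars = measure_stars.get(name)
        if stars is None:
            continue
        weighted_sum += weights[name] * stars
        weight_total += weights[name]
    if weight_total == 0:
        raise ValueError("contract has no rated measures in the unaffected set")
    return weighted_sum / weight_total


# Illustrative values only; these are not taken from the CMS data.
stars = {
    "Adult BMI Assessment": 4,
    "Customer Service": 3,
    "Plan All-Cause Readmissions": 5,
    "Annual Flu Vaccine": 2,  # classified as affected, so it is excluded below
}
weights = {
    "Adult BMI Assessment": 1.0,
    "Customer Service": 1.5,
    "Plan All-Cause Readmissions": 3.0,
    "Annual Flu Vaccine": 1.0,
}
unaffected = ["Adult BMI Assessment", "Customer Service", "Plan All-Cause Readmissions"]

index = quality_index(stars, weights, unaffected)
print(round(index, 3))
```

Restricting the average to the unaffected set is what lets the index stand in for plan quality independent of enrollee behavior.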
Our analysis then employs a linear regression to see whether – conditional on our measure of inherent quality – there is a statistically significant reduction in the observed Stars ratings. In addition to plan quality, we also control for the number of beneficiaries that live in an HPSA. Since the overall Star Rating provided by CMS is rounded to the nearest half star, we reconstruct an un-rounded version of the overall star rating according to the CMS formula. Our reconstruction is not perfect. We correctly estimate 91 percent of the overall star ratings, and for the remaining plans our estimate is within 0.5 Stars of the published rating. We also performed our analysis using the CMS rounded star ratings and found similar results.
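The two steps described above, rounding to half stars and regressing the un-rounded rating on quality, a high-LIS indicator, and HPSA exposure, can be sketched as follows. Everything here is an assumption made for illustration: the data are synthetic and generated from chosen coefficients so the fit can be checked, the HPSA control is expressed as a share of beneficiaries, ties are rounded up, and the small solver is a stand-in for whatever statistical package one prefers. It reports point estimates only, not standard errors.

```python
import math


def round_to_half_star(rating):
    """Round to the nearest half star (assumption: .25 and .75 ties round up)."""
    return math.floor(rating * 2 + 0.5) / 2


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of rows (each row a list of regressors, including a leading 1
    for the intercept); y is a list of outcomes. Solved by Gauss-Jordan
    elimination with partial pivoting.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Synthetic contracts: (quality index, high-LIS indicator, share living in an HPSA).
rows = [
    (4.0, 0, 0.10),
    (3.5, 1, 0.30),
    (3.0, 1, 0.20),
    (4.5, 0, 0.00),
    (3.8, 0, 0.25),
    (3.2, 1, 0.05),
]
# Coefficients used to generate the synthetic outcome. The -0.18 mirrors the
# paper's high-LIS estimate purely for illustration; the others are made up.
TRUE_COEFS = [1.0, 0.7, -0.18, -0.6]

X = [[1.0, q, d, h] for q, d, h in rows]
y = [sum(b * x for b, x in zip(TRUE_COEFS, row)) for row in X]

coefs = ols(X, y)
print([round(b, 4) for b in coefs])

# Share of contracts whose rounded rating matches a (hypothetical) published one.
published = [round_to_half_star(v) for v in y]
match_rate = sum(round_to_half_star(v) == p for v, p in zip(y, published)) / len(y)
```

The coefficient on the high-LIS indicator is the quantity of interest: it measures the rating gap that remains after holding the quality index and HPSA exposure fixed.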
Our most important finding is that in the presence of our other controls LIS enrollment has a negative, statistically significant impact on the overall Stars Rating. That is, the more sophisticated statistical approach confirms the finding in the raw data that contracts serving this demographic are at a disadvantage in receiving bonuses. 
The fundamental concern is that the bias in Stars ratings will translate into fewer resources for LIS beneficiaries and reduced access to quality care. To get a sense of the magnitudes, we estimate that there are $7.3 billion in total possible payments if every MA plan received a bonus payment. Of this, roughly $2.3 billion in payments are not made because plans achieve less than 4 Stars, while 107 out of 128 high-LIS plans receive less than 4 stars. Focusing more closely on those plans that might not receive bonuses strictly because of the LIS-induced bias, it appears that $470 million in payments are forfeited by high-LIS enrollment plans by only achieving 3.5 stars.
How large is the financial impact? Our analysis does not provide much insight into the remaining portion of the half-star gap between plans with high- and low-LIS enrollment. There are many variables that could influence a contract’s rating that we cannot observe. We do not have a perfect measure for plan quality, and we also lack detailed statistics on the demographics of contract enrollment, such as ethnicity, gender, age, disability, and detailed income information. In light of this uncertainty, we focus on a range of possible outcomes rather than a single number.
At the upper end of the spectrum, we estimate that in 2015, 44 MA contracts with high-LIS enrollment—covering nearly 1 million beneficiaries—will lose a total of more than $470 million in payment due to missing the bonus payment cutoff by less than a half-star. That payment reduction will correspond to benefit reduction of roughly $380 for each senior.
In our regression framework, moving from low-LIS to high-LIS enrollment reduces the Star rating by an estimated 0.18 Stars. In addition, we find that having 20 percent of a contract’s beneficiaries live in HPSAs could lower its Star rating by as much as 0.12 Stars. Under our regression-based estimate, the penalty translates to an average of roughly $410 per senior in forgone benefits.
These findings suggest that there is some merit to adjusting the Star Rating calculation. The simplest change, and the easiest to implement as a temporary legislative solution, would be an adjustment to the overall rating calculation. A straightforward approach would be to increase the Star Rating of any MA contract on the basis of the fraction of LIS enrollment, ranging from a minimum of 0.18 Stars to a maximum adjustment of 0.5 Stars.
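One way such an adjustment could work is sketched below. The text fixes only the endpoints (0.18 and 0.5 Stars); the rest is assumed for illustration, namely that the adjustment applies only at or above the 30 percent LIS threshold, that it rises linearly with the LIS share, and that the adjusted rating is capped at 5 Stars. The function names are hypothetical.

```python
MIN_ADJUSTMENT = 0.18  # regression-based estimate of the high-LIS penalty
MAX_ADJUSTMENT = 0.50  # raw gap between low- and high-LIS contracts
LIS_THRESHOLD = 0.30   # cutoff for "significant" LIS enrollment


def lis_adjustment(lis_share):
    """Star adjustment for a contract with the given LIS share (0.0 to 1.0).

    Assumption: no adjustment below the threshold, then a linear ramp from
    MIN_ADJUSTMENT at the threshold to MAX_ADJUSTMENT at 100 percent LIS.
    """
    if not 0.0 <= lis_share <= 1.0:
        raise ValueError("lis_share must be between 0 and 1")
    if lis_share < LIS_THRESHOLD:
        return 0.0
    ramp = (lis_share - LIS_THRESHOLD) / (1.0 - LIS_THRESHOLD)
    return MIN_ADJUSTMENT + ramp * (MAX_ADJUSTMENT - MIN_ADJUSTMENT)


def adjusted_rating(raw_rating, lis_share):
    """Apply the adjustment to an un-rounded rating, capped at 5 Stars."""
    return min(5.0, raw_rating + lis_adjustment(lis_share))


# A hypothetical contract with a 3.6 un-rounded rating and 65 percent LIS enrollment.
print(round(adjusted_rating(3.6, 0.65), 3))
```

Other functional forms, such as a step function or a ramp starting at zero LIS enrollment, fit the same endpoints; the choice would need to be tested against the data.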
A better solution could establish a method for case-mix adjusting specific measures, rather than reducing weights as suggested by CMS. We identify six measures that are candidates for such an adjustment. (The set of measures that we identify has some overlap with the set of measures identified by CMS.) Adjusting specific measures for high-risk populations comes with a risk of reducing the incentive for hospitals and insurance companies to improve the care for those populations. Accordingly, these efforts should be carefully researched and tested.
 As a practical matter, the Stars bonus payments have served in part to offset ongoing reductions in the benchmark payments to MA plans under the Affordable Care Act.
 Centers for Disease Control and Prevention, “REACH 2010 Surveillance for Health Status in Minority Communities — United States, 2001—2002,” August 24, 2004, available at: http://www.ncbi.nlm.nih.gov/pubmed/15329648; Smith GD et al, “Socioeconomic differentials in mortality risk among men screened for the Multiple Risk Factor Intervention Trial: I. White men,” April 1996, available at: http://www.ncbi.nlm.nih.gov/pubmed/8604778
 Out of 394 MA contracts that include a drug benefit, we drop 10 contracts for low enrollment.
 The 23 measures unaffected by beneficiary behavior are Monitoring Physical Activity, Adult BMI Assessment, Care for Older Adults – Medication Review, Osteoporosis Management in Women who had a Fracture, Rheumatoid Arthritis Management, Improving Bladder Control, Reducing the Risk of Falling, Plan All-Cause Readmissions, Getting Needed Care, Getting Appointments and Care Quickly, Customer Service, Rating of Health Care Quality, Rating of Health Plan, Care Coordination, Plan Makes Timely Decisions about Appeals, Reviewing Appeals Decisions, Appeals Auto–Forward, Appeals Upheld, Rating of Drug Plan, Getting Needed Prescription Drugs, MPF Price Accuracy, High Risk Medication, and Diabetes Treatment. The remainder are Colorectal Cancer Screening, Care – Cholesterol Screening, Diabetes Care – Cholesterol Screening, Annual Flu Vaccine, Improving or Maintaining Physical Health, Improving or Maintaining Mental Health, Special Needs Plan (SNP) Care Management, Care for Older Adults – Functional Status Assessment, Care for Older Adults – Pain Assessment, Diabetes Care – Eye Exam, Diabetes Care – Kidney Disease Monitoring, Diabetes Care – Blood Sugar Controlled, Diabetes Care – Cholesterol Controlled, Controlling Blood Pressure, Complaints about the Health Plan, Choosing to Leave the Plan, Plan Quality Improvement, Complaints about the Drug Plan, Choosing to Leave the Plan, Plan Quality Improvement, Medication Adherence for Diabetes Medications, Medication Adherence for Hypertension (RAS antagonists), and Medication Adherence for Cholesterol (Statins).
 We tested the sensitivity of our results to classifying Plan All-Cause Readmissions as unaffected by beneficiary behavior. The results do not change based on this classification.
 Throughout the analysis, we weight particular measures according to the CMS Star Rating formula.
 Our results are not sensitive to the inclusion or exclusion of a measure of special needs or the HPSA variable.
 We focus on LIS status. CMS has the ability to identify dual-eligibles in contracts, allowing a broader investigation of the impacts of socioeconomic factors.
 Our estimate is somewhat smaller than one in a McKinsey study that indicates plans with less than 4 stars forfeit $3.7 billion. http://healthcare.mckinsey.com/sites/default/files/2015%20MA%20Stars%20Intel%20Brief%20-%20McKinsey%20Reform%20Center%20-%20110514B.pdf
 We tested whether there was an extra sensitivity of the Stars rating to very high (over 50 percent) LIS enrollment. The data show no difference in the impact on the Stars rating for those plans.
 We believe that the 6 candidate measures for adjustment are Colorectal Cancer Screening, Annual Flu Vaccine, Diabetes Care – Eye Exam, Diabetes Care – Blood Sugar Controlled, Diabetes Care – Cholesterol Controlled, and Medication Adherence for Hypertension.