The COVID-19 pandemic has had a previously unimaginable impact, both in the United States and globally. As we enter the fifth month of the pandemic with rising caseloads and deaths, it is increasingly clear that the United States will need to continue to address COVID-19 for the foreseeable future. Each day, we learn more about the virus clinically, about the people and communities hardest hit by it, and about the social and economic toll the pandemic is expected to take. While the pandemic has exposed the fragility of the US public health surveillance infrastructure, there remain valuable, albeit often disconnected, data resources around the country that can contribute to our collective knowledge of the pandemic's impact. Using data made available by the COVID-19 Research Database, a cross-industry collaborative contributing real-world, de-identified data to researchers wishing to study issues related to COVID-19, we analyze COVID-19 test and antibody positivity rates from a sample of electronic health records in office and clinic settings.
Our findings show that, for our sample population, the disparity in infections affecting Black and Hispanic communities is substantially larger than most current assumptions suggest. Additionally, we observed that patients who test positive in an office or clinic setting tend to be younger and are less likely to be 65 or older. The findings, based on data from March through June 2020, also showed that patients in New York, New Jersey, and Connecticut were more likely to have tested positive than patients in other regions of the United States.
Preliminary research at the national level indicates that people of color are more likely to test positive for COVID-19 than white individuals. Two large studies found that Black and Hispanic individuals were up to two and a half times more likely than non-Hispanic white individuals to test positive for COVID-19. In one study, these findings held true "even after accounting for underlying health conditions, other demographics and geographic locations."
These national findings are echoed in smaller, more locally focused studies. As of July 2020 in Montgomery County, Maryland, Hispanic residents accounted for more than two-thirds of new infections. A Washington Post analysis of data through May 2020 found that Latinos made up about one third of COVID-19 cases in the District of Columbia, Virginia, and Maryland region, even though they account for only about 10 percent of the population. In Northern Virginia's Fairfax County, the same analysis found that Latinos accounted for 64 percent of COVID-19 cases even though they account for only 16.8 percent of the population.
This study used data from an ambulatory electronic medical record data platform covering over two hundred community health centers, primary care, immediate/urgent care, and specialty care providers, and is based on 76,969 COVID tests and 10,998 antibody tests. In our sample of tests administered to patients in these settings, nine percent of COVID tests and seven percent of antibody tests were recorded as positive. For more details on our data and methods, please refer to the Methods section.
Our analysis supports previous reporting on the disproportionate impact of COVID-19 infections on racial and ethnic minorities. In our sample, Hispanic and Black patients were more likely than non-Hispanic white patients to test positive for COVID and for COVID antibodies. Figure 1 shows how much more likely Hispanic and Black patients were to test positive compared to white patients. Each dot represents the estimate of how much more or less likely one group of patients is to test positive compared to the reference group (in this figure, white patients are the reference group). Estimates to the left of the vertical dashed line mean the group is less likely to be positive than the reference group, and estimates to the right of the line mean the group is more likely to be positive. The width of the horizontal bar indicates the range of uncertainty around the estimate; if the bar crosses the dashed line, the group did not have a significantly different likelihood of testing positive compared to the reference group. This method of analysis allows us to measure the relative strength of the association between testing positive and a patient's race/ethnicity.
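To make the reading of such a figure concrete, the following is a minimal sketch of one common way to produce an estimate and its uncertainty bar: an unadjusted odds ratio with a 95 percent interval computed from a two-by-two table. This is an illustration under stated assumptions, not necessarily the method behind Figure 1. It assumes the dashed line sits at a ratio of 1, and the counts and the function names odds_ratio_ci and describe are hypothetical, invented for this example.

```python
import math


def odds_ratio_ci(pos_group, neg_group, pos_ref, neg_ref, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for group vs. reference.

    Uses the standard log-scale (Woolf) interval; z=1.96 gives roughly 95%.
    """
    odds_ratio = (pos_group / neg_group) / (pos_ref / neg_ref)
    # Standard error of log(odds ratio) from the four cell counts.
    se_log_or = math.sqrt(
        1 / pos_group + 1 / neg_group + 1 / pos_ref + 1 / neg_ref
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def describe(odds_ratio, lower, upper):
    """Read an estimate the way the figure is read: does the bar cross 1?"""
    if lower > 1.0:
        return "more likely than reference"
    if upper < 1.0:
        return "less likely than reference"
    return "no significant difference"


# Hypothetical counts for illustration only; these are NOT the study's data.
estimate = odds_ratio_ci(pos_group=30, neg_group=70, pos_ref=10, neg_ref=90)
print([round(value, 2) for value in estimate], describe(*estimate))
```

An interval that lies entirely to the right of 1 corresponds to a bar that does not touch the dashed line, which is the case the text describes as a significantly higher likelihood of testing positive.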
Figure 1 shows that Hispanic patients were over four times more likely to test positive for COVID and over six times more likely to test positive for antibodies compared to white patients. Moreover, Hispanic patients accounted for 16 percent of COVID tests but almost half of positive COVID tests. Similarly, they accounted for 11 percent of antibody tests but 51 percent of positive antibody tests. Black patients were over twice as likely to test positive for COVID and over three times more likely to test positive for antibodies compared to non-Hispanic white patients.
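The gap between a group's share of tests and its share of positive results can be turned into a simple ratio. The sketch below does this for the antibody figures quoted above (11 percent of tests, 51 percent of positives). The function names are hypothetical, and the second calculation compares Hispanic patients with all other patients combined, without any adjustment, so it is not expected to match the estimates in Figure 1, which use white patients as the reference group.

```python
def representation_ratio(share_of_positives, share_of_tests):
    """How over-represented a group is among positives relative to tests.

    This equals the group's positivity rate divided by the overall rate,
    because the total counts of tests and positives cancel out.
    """
    return share_of_positives / share_of_tests


def rate_ratio_vs_everyone_else(share_of_positives, share_of_tests):
    """Group positivity rate divided by the rate among all other patients."""
    group_rate = share_of_positives / share_of_tests
    other_rate = (1 - share_of_positives) / (1 - share_of_tests)
    return group_rate / other_rate


# Shares reported in the text for antibody tests among Hispanic patients.
antibody_tests_share = 0.11
antibody_positives_share = 0.51

vs_overall = representation_ratio(antibody_positives_share, antibody_tests_share)
vs_others = rate_ratio_vs_everyone_else(
    antibody_positives_share, antibody_tests_share
)
print(round(vs_overall, 2), round(vs_others, 2))
```

Both ratios are crude descriptive summaries; they do not account for age, sex, or region.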
We aggregated our location data into 10 U.S. regions and assigned each patient to a region based on their state of residence. A list of states by region can be found in the Methods section at the bottom of the page. Figure 2 shows the distribution of both COVID and antibody testing by region. The data in our sample are concentrated in the New York "Tri-State" area (New York, New Jersey, and Connecticut), the Mountain states (Utah, Colorado, Idaho, Wyoming, Montana), and the Deep South (Arkansas, Mississippi, Louisiana, Alabama).
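The assignment of patients to regions can be sketched as a simple lookup. The example below covers only the four regions whose member states are spelled out in this article (the Mid-Atlantic list appears with the regional results), and it assumes those parenthetical lists are complete; the remaining regions are defined in the Methods section and are not reproduced here. The names REGION_STATES and assign_region, and the sample patient states, are hypothetical.

```python
from collections import Counter

# Only the regions whose states are listed in the article text.
REGION_STATES = {
    "Tri-State": ["New York", "New Jersey", "Connecticut"],
    "Mountain": ["Utah", "Colorado", "Idaho", "Wyoming", "Montana"],
    "Deep South": ["Arkansas", "Mississippi", "Louisiana", "Alabama"],
    "Mid-Atlantic": [
        "Pennsylvania", "Maryland", "Delaware", "Virginia",
        "District of Columbia",
    ],
}

# Invert the mapping so each state points to its region.
STATE_TO_REGION = {
    state: region
    for region, states in REGION_STATES.items()
    for state in states
}


def assign_region(state_of_residence):
    """Return the region for a state, or None if it is not in this sketch."""
    return STATE_TO_REGION.get(state_of_residence.strip())


# Hypothetical patient states, for illustration only.
patient_states = ["New York", "Utah", "Alabama", "New Jersey", "Ohio"]
tests_by_region = Counter(assign_region(state) for state in patient_states)
print(tests_by_region)
```

States outside the four listed regions map to None here, which keeps the sketch from guessing at region definitions the article does not give.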
Figure 3 shows how likely COVID tests and antibody tests in each region were to be positive compared to tests in the Deep South. Note that there were insufficient data in the New England and Pacific/South West regions to report results for those areas.
Among the areas included in our dataset, patients in the "Tri-State" region (New York, New Jersey, and Connecticut) were about three times more likely to have a positive COVID test and more than five times more likely to have a positive antibody test than residents of the Deep South; this is consistent with reporting that COVID was more prevalent in this region during the time period covered by our data. Patients residing in the Mid-Atlantic region (Pennsylvania, Maryland, Delaware, Virginia, District of Columbia) were twice as likely to have a positive COVID test and forty-seven percent more likely to have a positive antibody test than patients in the Deep South. As more recent data become available, the geographic patterns we observed here may change.
Finally, Figure 4 shows how likely male patients were to test positive compared to female patients and how likely younger and older age groups were to test positive compared to our reference group of patients 35-54 years old.
Males were twenty-one percent more likely than females to test positive for COVID and eighteen percent more likely to test positive for antibodies. Compared to our reference group of patients 35 to 54 years old, patients 21 to 34 years old were twelve percent more likely to test positive for COVID, and patients 65 years or older were forty-two percent less likely to test positive for COVID.
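Statements such as "twenty-one percent more likely" and "forty-two percent less likely" correspond to ratios relative to the reference group, which is how they would appear as dots on a figure like Figure 4. The sketch below shows that conversion. It assumes the reference group sits at a ratio of 1; whether the underlying estimates are odds ratios or risk ratios is not stated in this section, and the helper name percent_to_ratio is hypothetical.

```python
def percent_to_ratio(percent_difference):
    """Convert 'X percent more (or less) likely' into a ratio vs. reference.

    Positive values mean more likely; negative values mean less likely.
    """
    return 1 + percent_difference / 100


# Percent differences as reported in the text.
reported = {
    "male vs. female, COVID test": 21,
    "male vs. female, antibody test": 18,
    "age 21-34 vs. 35-54, COVID test": 12,
    "age 65+ vs. 35-54, COVID test": -42,
}

ratios = {label: percent_to_ratio(pct) for label, pct in reported.items()}
for label, ratio in ratios.items():
    side = "right of" if ratio > 1 else "left of"
    print(f"{label}: ratio {ratio:.2f}, plotted {side} the reference line")
```

A ratio below 1, as for patients 65 or older, falls to the left of the reference line, while ratios above 1 fall to the right.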
This analysis uses data from a convenience sample of electronic medical records to explore reports that Hispanic and Black communities are bearing an undue burden of the COVID-19 pandemic. We find that the experiences of patients in our sample are consistent with those earlier reports. In particular, the disproportionate positivity rate among Hispanic and Black patients within our sample underscores previous reporting that minority communities have been bearing the brunt of the pandemic. Prevailing theories as to why rates of positive COVID-19 cases are higher in Hispanic communities include a higher likelihood of working essential jobs, a higher likelihood of living in densely populated housing, and limited access to healthcare and other public support networks. This analysis is not equipped to assess these other factors that may be associated with an increased likelihood of COVID infection. HCCI will continue to use available data to support the understanding of how the pandemic is affecting people, communities, and the health care system, with the goal of informing stakeholders so they can make decisions that mitigate the harm caused by the pandemic.