
Analysis of Electronic Medical Record Data Shows Significantly Higher Rates of COVID-19 Infection among Hispanic and Black Patients

The COVID-19 pandemic has resulted in a previously unimaginable impact both in the United States and globally. It is increasingly clear as we enter the fifth month of the pandemic, with rising caseloads and deaths, that the United States will need to continue to address the COVID-19 pandemic for the foreseeable future. Each day, we learn more – about the virus clinically, about the people and communities hardest hit by the virus, and about the social and economic toll the pandemic is expected to take. While the pandemic has exposed the fragility of the US public health surveillance infrastructure, there remain valuable, albeit often disconnected, data resources around the country that can contribute to our collective knowledge of the pandemic's impact. Using data made available by the COVID-19 Research Database, a cross-industry collaborative contributing real world, de-identified data to researchers wishing to study issues related to COVID-19, we analyze COVID-19 test and antibody positivity rates from a sample of electronic health records in office and clinic settings.

Our findings show that, for our sample population, the disparity in infections among Black and Hispanic communities is significantly higher than most current assumptions. Additionally, we observed that patients presenting in an office or clinic setting who test positive are more likely to be younger and less likely to be older than 65. The findings, based on data from March through June 2020, also showed that patients in New York, New Jersey, and Connecticut were most likely to have tested positive compared to other regions in the United States. 

What we know from the literature 

Preliminary research at the national level indicates that people of color are more likely to test positive for COVID-19 than white individuals. Two large studies found that Black and Hispanic individuals were up to two and a half times more likely than non-Hispanic white individuals to test positive for COVID-19. In one study, these findings held true "even after accounting for underlying health conditions, other demographics and geographic locations."

These national findings persist among smaller, more locally focused studies. As of July 2020 in Montgomery County, Maryland, Hispanic residents accounted for more than two-thirds of new infections. A Washington Post analysis of data through May 2020 found that Latinos made up about one-third of COVID-19 cases in the District of Columbia, Virginia, and Maryland region, even though they account for only about 10 percent of the population. In Northern Virginia's Fairfax County, the Washington Post analysis found that Latinos accounted for 64 percent of COVID-19 cases even though they account for only 16.8 percent of the population.

What we found in our analysis: percent of positive tests 

This study used data from an ambulatory electronic medical record data platform with data for over two hundred community health centers, primary care, immediate/urgent care, and specialty care providers, and is based on 76,969 COVID tests and 10,998 antibody tests. In our sample of tests administered to patients in these settings, nine percent of COVID tests and seven percent of antibody tests were recorded as positive. For more details on our data and methods please refer to the Methods section.  

What we found in our analysis: race and ethnicity 

Our analysis supports previous reporting on the disproportionate impact of COVID-19 infections on racial and ethnic minorities. In our sample, Hispanic and Black patients were more likely to test positive for COVID and for COVID antibodies than non-Hispanic white patients were. Figure 1 shows how much more likely Hispanic and Black patients were to test positive compared to white patients. Each dot represents the estimate of how much more or less likely one group of patients is to test positive compared to the reference group of patients (in this figure, white patients are the reference group). Estimates to the left of the vertical dashed line mean the group is less likely to be positive compared to the reference group and estimates to the right of the vertical dashed line mean the group is more likely to be positive compared to the reference group. The width of the horizontal bar indicates the range in certainty of the estimate; if the bar crosses the dashed line, the group did not have a significantly different likelihood of testing positive compared to the reference group. Using this method of analysis allows us to measure the relative strength of the association between testing positive and a patient's race/ethnicity.  
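The reading rule described above can be sketched in a few lines of Python. This is only an illustration: the `Estimate` class and `describe` function are hypothetical names, and the two example estimates are the COVID-model values reported in Table 3 of the Methods section.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    """One dot and its horizontal bar from the figure (hypothetical structure)."""
    group: str
    odds_ratio: float  # the dot
    ci_low: float      # left end of the bar
    ci_high: float     # right end of the bar


def describe(est: Estimate, reference_line: float = 1.0) -> str:
    """Classify an estimate relative to the vertical dashed line."""
    # A bar that crosses the dashed line means no significant difference.
    if est.ci_low <= reference_line <= est.ci_high:
        return "not significantly different"
    # Otherwise the side of the line gives the direction.
    if est.odds_ratio > reference_line:
        return "more likely to test positive"
    return "less likely to test positive"


# Values taken from the COVID model in Table 3.
hispanic = Estimate("Hispanic", 4.06, 3.72, 4.42)
age_55_64 = Estimate("55 to 64", 0.99, 0.91, 1.07)
age_65_plus = Estimate("65 and older", 0.58, 0.53, 0.64)

for est in (hispanic, age_55_64, age_65_plus):
    print(f"{est.group}: {describe(est)}")
```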

The figure above shows that Hispanic patients were over four times more likely to test positive for COVID and over six times more likely to test positive for antibodies compared to white patients. Moreover, Hispanic patients accounted for 16 percent of COVID tests, but represented almost half of positive COVID tests. Similarly, they accounted for 11 percent of antibody tests but 51 percent of positive antibody tests. Black patients were also over twice as likely to test positive for COVID and over three times more likely to test positive for antibodies compared to tests for non-Hispanic white patients.  

What we found in our analysis: geography

We aggregated our location data into 10 U.S. regions and assigned each patient to a region based on their state of residence. A list of states by region can be found in the Methods section at the bottom of the page. Figure 2 shows the distribution of both COVID and antibody testing by region. The data in our sample are concentrated in the New York "Tri-State" area (New York, New Jersey, and Connecticut), the Mountain states (Utah, Colorado, Idaho, Wyoming, Montana), and the Deep South (Arkansas, Mississippi, Louisiana, Alabama).

Figure 3 shows how likely COVID tests in each region were to be positive and how likely antibody tests in each region were to be positive compared to states in the Deep South. Note that there was an insufficient amount of data in New England and the Pacific/South West regions to report results in those areas.

Among the areas included in our dataset, patients in the "Tri-State" region (New York, New Jersey, and Connecticut) were about three times more likely to have a positive COVID test and more than five times more likely to have a positive antibody test than residents of the Deep South; this is consistent with reporting that COVID was more prevalent in this region during the time period covered by our data. Patients residing in the Mid-Atlantic region (Pennsylvania, Maryland, Delaware, Virginia, District of Columbia) were twice as likely to have a positive COVID test and forty-seven percent more likely to have a positive antibody test compared to patients in the Deep South. As more recent data become available, the geographic patterns we observed here may change.

What we found in our analysis: age and sex

Finally, Figure 4 shows how likely male patients were to test positive compared to female patients and how likely younger and older age groups were to test positive compared to our reference group of patients 35-54 years old.  

Males were twenty-one percent more likely to test positive for COVID compared to females and eighteen percent more likely to test positive for antibodies compared to females. Patients 21 to 34 years old were twelve percent more likely to test positive for COVID and patients 65 years or older were forty-two percent less likely to test positive for COVID compared to our reference group of patients 35-54 years old. 
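The percentages in this section follow directly from the odds ratios in Table 3. A minimal sketch of that conversion is below; `percent_difference` is a hypothetical helper name. Note that an odds ratio compares odds rather than probabilities, so "more likely" here is the article's shorthand.

```python
def percent_difference(odds_ratio: float) -> int:
    """Express an odds ratio as a percent above (+) or below (-) the reference group."""
    return round((odds_ratio - 1.0) * 100)


# Odds ratios as reported in Table 3.
male_covid = percent_difference(1.21)          # males vs. females, COVID test
male_antibody = percent_difference(1.18)       # males vs. females, antibody test
age_21_34_covid = percent_difference(1.12)     # 21 to 34 vs. 35 to 54, COVID test
age_65_plus_covid = percent_difference(0.58)   # 65 and older vs. 35 to 54, COVID test

print(male_covid, male_antibody, age_21_34_covid, age_65_plus_covid)
```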

What this analysis offers: a discussion

This analysis uses data from a convenience sample of electronic medical records to explore reports that Hispanic and Black communities are bearing an undue burden of the COVID-19 pandemic. We find that the experiences of patients in our sample are consistent with those earlier reports. In particular, the disproportionate positivity rate among Hispanic and Black patients within our sample underscores previous reporting that minority communities have been bearing the brunt of the impact of the pandemic. Prevailing theories as to why rates of COVID-19 positive cases are higher in Hispanic communities include higher likelihood of working essential jobs, higher likelihood of living in densely populated housing, or limited access to healthcare and other public support networks. While this analysis is not equipped to assess these other factors that may be associated with increased likelihood of COVID infection, HCCI will continue to use available data to support the understanding of how the pandemic is affecting people, communities, and the health care system with the goal of informing stakeholders to make decisions that mitigate the harm caused by the pandemic.

Methods

We examined laboratory orders with order dates between March 1 and June 30, 2020, from a convenience sample of electronic medical records¹ provided by a data management platform vendor. Our unit of analysis is a unique “patient laboratory order” + “order date”, which means a patient who is tested more than once on different days may appear in the data more than once. The data cover orders from 3.7 million patients seen by over 200 group practices or clinics, and include all patients seen by the practice or clinic regardless of payer type. The majority of encounters associated with the laboratory orders were classified as office visits.
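To make the unit of analysis concrete, the sketch below keeps one record per unique combination of patient, laboratory order, and order date. The field names (`patient_id`, `lab_order`, `order_date`) and the sample rows are assumptions made for illustration; they are not taken from the underlying data.

```python
def unique_test_records(orders):
    """Keep one record per (patient, laboratory order, order date)."""
    seen = set()
    unique = []
    for order in orders:
        key = (order["patient_id"], order["lab_order"], order["order_date"])
        if key not in seen:
            seen.add(key)
            unique.append(order)
    return unique


# Hypothetical rows: patient A is tested on two different days, and one
# of those orders appears twice in the raw extract.
raw_orders = [
    {"patient_id": "A", "lab_order": "COVID test", "order_date": "2020-04-01"},
    {"patient_id": "A", "lab_order": "COVID test", "order_date": "2020-04-01"},
    {"patient_id": "A", "lab_order": "COVID test", "order_date": "2020-05-15"},
    {"patient_id": "B", "lab_order": "antibody test", "order_date": "2020-06-02"},
]

records = unique_test_records(raw_orders)
print(len(records))
```

Under this rule patient A contributes two records, one for each test date, which is why the same patient can appear in the data more than once.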

We divided the data into 10 U.S. regions. Table 1 shows the states included in each region and the proportion of data contained in each region. Our sample did not include sufficient data from states in New England and states in the Pacific/South West, so we excluded those regions from the analysis. Figure 5 shows the distribution of COVID and antibody tests by region.

Table 1. Distribution of All Tests Administered (COVID and Antibody) by U.S. Region

| Region | States | Percent Data Distribution |
|---|---|---|
| Deep South | Arkansas, Mississippi, Louisiana, Alabama | 40 |
| Tri-State | New York, New Jersey, and Connecticut | 18 |
| Mountain | Utah, Colorado, Idaho, Wyoming, Montana | 14 |
| Mid-Atlantic | Pennsylvania, Maryland, Delaware, Virginia, District of Columbia | 11 |
| Coastal South | North Carolina, South Carolina, Georgia, Florida | 5 |
| Appalachian | Kentucky, Tennessee, West Virginia | 4 |
| Great Lakes | Michigan, Ohio, Illinois, Minnesota, Indiana, Wisconsin | 3 |
| Plains | North Dakota, South Dakota, Kansas, Oklahoma, Missouri, Nebraska, Iowa | 3 |
| Pacific/South West | California, Oregon, Washington, Hawaii, Alaska, Nevada, Arizona, New Mexico, Texas | 1 |
| New England | Massachusetts, Maine, New Hampshire, Vermont, Rhode Island | <1 |
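The region assignment in Table 1 amounts to a lookup from state of residence to region. A minimal sketch follows, using the state lists from Table 1; the names `REGIONS`, `EXCLUDED_REGIONS`, `region_for_state`, and `included_in_analysis` are hypothetical.

```python
# State lists copied from Table 1.
REGIONS = {
    "Deep South": ["Arkansas", "Mississippi", "Louisiana", "Alabama"],
    "Tri-State": ["New York", "New Jersey", "Connecticut"],
    "Mountain": ["Utah", "Colorado", "Idaho", "Wyoming", "Montana"],
    "Mid-Atlantic": ["Pennsylvania", "Maryland", "Delaware", "Virginia",
                     "District of Columbia"],
    "Coastal South": ["North Carolina", "South Carolina", "Georgia", "Florida"],
    "Appalachian": ["Kentucky", "Tennessee", "West Virginia"],
    "Great Lakes": ["Michigan", "Ohio", "Illinois", "Minnesota", "Indiana",
                    "Wisconsin"],
    "Plains": ["North Dakota", "South Dakota", "Kansas", "Oklahoma", "Missouri",
               "Nebraska", "Iowa"],
    "Pacific/South West": ["California", "Oregon", "Washington", "Hawaii",
                           "Alaska", "Nevada", "Arizona", "New Mexico", "Texas"],
    "New England": ["Massachusetts", "Maine", "New Hampshire", "Vermont",
                    "Rhode Island"],
}

# Regions dropped from the models for insufficient data.
EXCLUDED_REGIONS = {"New England", "Pacific/South West"}

# Invert the table into a state -> region lookup.
STATE_TO_REGION = {
    state: region for region, states in REGIONS.items() for state in states
}


def region_for_state(state):
    """Return the Table 1 region for a state of residence, or None if unlisted."""
    return STATE_TO_REGION.get(state)


def included_in_analysis(state):
    """True when the state's region has enough data to be modeled."""
    region = region_for_state(state)
    return region is not None and region not in EXCLUDED_REGIONS


print(region_for_state("Connecticut"), included_in_analysis("Texas"))
```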

[Figure 5. Distribution of COVID and antibody tests by region]

Using a series of regular expression queries, we searched for two types of tests that are used to determine COVID-19 infection. We identified 93,039 nasopharyngeal tests (“COVID test”) that are used to determine whether a patient currently has the virus that causes COVID-19, and we identified 14,576 blood serum antibody tests (“antibody test”) that are used to help determine if a patient has previously been infected with the virus that causes COVID-19. Key search terms were compiled from a list of COVID-19 PCR and COVID-19 antibody test descriptions at the FDA, and can be found here. After excluding the unknown test results, we retained 76,969 COVID tests and 10,998 antibody tests. This represents a positive rate of 11 percent and 9 percent for COVID and antibody tests, respectively. Figure 6 shows the distribution of all tests with a known result, by demographic characteristic. Table 2 shows the percent of positive results by demographic characteristic. Note, the percent positive is based on the number of positive tests among all tests with a known result.
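As a rough illustration of a regular-expression search over test descriptions, the sketch below sorts free-text descriptions into the two test types. The patterns and the sample descriptions are invented for this example; they are not the search terms used in the analysis, which were compiled from the FDA's test descriptions.

```python
import re

# Hypothetical patterns, for illustration only.
VIRUS = r"(sars[- ]?cov[- ]?2|covid[- ]?19)"
ANTIBODY_PATTERN = re.compile(VIRUS + r".*(antibod|igg|igm|serolog)", re.IGNORECASE)
COVID_PATTERN = re.compile(VIRUS + r".*(pcr|rna|naa)", re.IGNORECASE)


def classify(description):
    """Label a laboratory order description, or return None if it is neither test."""
    # Check the antibody pattern first so serology orders are not
    # picked up by the more general viral-test pattern.
    if ANTIBODY_PATTERN.search(description):
        return "antibody test"
    if COVID_PATTERN.search(description):
        return "COVID test"
    return None


samples = [
    "SARS-CoV-2 RNA, PCR, nasopharyngeal",
    "COVID-19 IgG Antibody",
    "Hemoglobin A1c",
]
labels = [classify(s) for s in samples]
print(labels)
```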

[Figure 6. Distribution of all tests with a known result, by demographic characteristic]

Table 2. COVID and Antibody Test Results with Known Results by Demographic Characteristic

| Characteristic | N | COVID Test (N=76,969) Percent Positive | Antibody Test (N=10,998) Percent Positive |
|---|---|---|---|
| Total | 87,967 | 11 | 9 |
| Sex | | | |
| Female | 52,822 | 11 | 9 |
| Male | 35,145 | 12 | 9 |
| Age | | | |
| Less than 21 | 22,616 | 13 | 12 |
| 21 to 34 | 15,328 | 13 | 11 |
| 35 to 54 | 15,691 | 13 | 9 |
| 55 to 64 | 16,539 | 11 | 7 |
| 65 and older | 17,793 | 5 | 6 |
| Race and Ethnicity | | | |
| Non-Hispanic, white | 40,255 | 5 | 4 |
| Non-Hispanic, Black | 13,069 | 11 | 13 |
| Hispanic | 13,603 | 34 | 41 |
| Unknown, other | 21,040 | 9 | 5 |
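The percent positive in Table 2 uses only tests with a known result as the denominator. The sketch below shows that calculation on a toy list of results, then uses the counts reported above to show what share of identified tests had a known result. The function names and the toy list are hypothetical.

```python
def percent_positive(results):
    """Percent of positive results among tests with a known result."""
    known = [r for r in results if r in ("positive", "negative")]
    if not known:
        return None
    return 100.0 * known.count("positive") / len(known)


def share_retained(known_count, identified_count):
    """Percent of identified tests that had a known result."""
    return 100.0 * known_count / identified_count


# Toy example: the unknown result is dropped before dividing.
toy_results = ["positive", "negative", "negative", "unknown", "negative"]
toy_rate = percent_positive(toy_results)

# Counts reported in the text: identified tests vs. tests with a known result.
covid_retained = round(share_retained(76_969, 93_039), 1)
antibody_retained = round(share_retained(10_998, 14_576), 1)

print(toy_rate, covid_retained, antibody_retained)
```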

We used a pair of logistic regressions to model the likelihood of a COVID test being positive and the likelihood of an antibody test being positive, adjusting for age, sex, race and ethnicity, and region of the country. Table 3 shows the results from the models. The odds ratios below represent the exponentiated coefficients of the models; we include the 95% confidence interval in parentheses. Models were run using the logit model procedure from SAS (SAS Institute, Cary, NC). The concordance statistics from the COVID model and the antibody model were 0.785 and 0.824, respectively, indicating a good to strong model fit.
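For readers less familiar with odds ratios, the sketch below shows how a logistic regression coefficient and its standard error turn into an odds ratio with a 95% confidence interval. The models' coefficients and standard errors are not published here, so the example back-calculates illustrative values from one reported estimate (Hispanic patients in the COVID model: 4.06, 3.72 – 4.42) and assumes a Wald-type interval. The function name is hypothetical.

```python
import math


def odds_ratio_with_ci(coef, std_err, z=1.96):
    """Exponentiate a logistic coefficient and its Wald 95% interval."""
    return (
        math.exp(coef),
        math.exp(coef - z * std_err),
        math.exp(coef + z * std_err),
    )


# Illustrative inputs back-calculated from the reported estimate; these
# are NOT the actual model coefficient or standard error.
coef = math.log(4.06)
std_err = (math.log(4.42) - math.log(3.72)) / (2 * 1.96)

odds_ratio, ci_low, ci_high = odds_ratio_with_ci(coef, std_err)
print(f"{odds_ratio:.2f} ({ci_low:.2f}, {ci_high:.2f})")
```

Because the reported bounds are rounded to two decimals, the round trip recovers them only approximately.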

Table 3. Results from Logistic Regression Models

| Characteristic | COVID Model OR (95% CI) | Antibody Model OR (95% CI) |
|---|---|---|
| Sex | | |
| Female | Ref | Ref |
| Male | **1.21 (1.15 – 1.27)** | **1.18 (1.02 – 1.38)** |
| Age | | |
| Less than 21 | 1.00 (0.93 – 1.07) | 1.02 (0.82 – 1.28) |
| 21 to 34 | **1.12 (1.04 – 1.21)** | 1.01 (0.77 – 1.31) |
| 35 to 54 | Ref | Ref |
| 55 to 64 | 0.99 (0.91 – 1.07) | 0.99 (0.79 – 1.25) |
| 65 and older | **0.58 (0.53 – 0.64)** | 0.98 (0.77 – 1.28) |
| Race and Ethnicity | | |
| Non-Hispanic, white | Ref | Ref |
| Non-Hispanic, Black | **2.35 (2.17 – 2.55)** | **3.30 (2.46 – 4.43)** |
| Hispanic | **4.06 (3.72 – 4.42)** | **6.31 (5.16 – 7.71)** |
| Unknown, other | **2.43 (2.25 – 2.62)** | **1.49 (1.19 – 1.86)** |
| Region | | |
| Deep South | Ref | Ref |
| Appalachian | **0.70 (0.60 – 0.81)** | 0.89 (0.40 – 2.02) |
| Coastal South | **0.65 (0.55 – 0.78)** | 1.23 (0.79 – 1.90) |
| Great Lakes | **0.22 (0.16 – 0.31)** | **2.18 (1.16 – 4.08)** |
| Mid-Atlantic | **2.06 (1.88 – 2.25)** | **1.47 (1.02 – 2.11)** |
| Mountain | **0.34 (0.30 – 0.34)** | **0.49 (0.31 – 0.77)** |
| New England | NA | NA |
| Plains | **0.73 (0.55 – 0.96)** | 1.12 (0.34 – 3.74) |
| Tri-State | **3.06 (2.83 – 3.30)** | **5.38 (3.86 – 7.52)** |
| Pacific/South West | NA | NA |
| Model Fit Statistics | | |
| c-statistic | 0.785 | 0.824 |

* ORs in bold are significant
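The c-statistic reported in Table 3 is the concordance statistic: across all pairs of one positive and one negative test, it is the share of pairs in which the model assigns the higher predicted probability to the positive test, with ties counted as half. A small sketch on made-up predictions follows; the scores and labels are invented for illustration and have nothing to do with the study data.

```python
def c_statistic(scores, labels):
    """Concordance over all (positive, negative) pairs; ties count as 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


# Made-up predicted probabilities and outcomes (1 = positive test).
toy_scores = [0.9, 0.8, 0.3, 0.2]
toy_labels = [1, 0, 1, 0]
toy_c = c_statistic(toy_scores, toy_labels)
print(toy_c)
```

A value of 0.5 corresponds to a model that ranks pairs no better than chance, and 1.0 to a model that ranks every positive test above every negative test.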

Finally, below we include a detailed list of the limitations of our analysis:

  • The use of a convenience sample and the lack of a sufficient amount of data in certain regions of the country limits the generalizability of the findings and may bias the results.
  • This sample only includes people accessing regular medical care in an office or clinic setting. Because testing was also available outside a typical office setting (e.g., drive-in testing), the results may be biased toward those with certain forms of insurance, particularly Medicaid, Medicare, or employer-sponsored insurance.
  • There are several important factors that we were unable to control for, including urbanicity of the provider, patient language preference, income, safety net benefits, living situation, insurance status, and employment status.
  • The limited availability of testing in the U.S. during our study period may bias the results, since not everyone seeking testing was able to get a test and tests were reserved for the most probable cases at the time.
  • This analysis likely includes patients who present (at least initially) with mild to moderate symptoms; it likely does not include information on severe cases, who are most likely to show up in the inpatient setting.
  • Race and ethnicity data are difficult to collect and may be subjectively entered into the record; additionally, our race/ethnicity data have a high proportion of unknowns (23 percent among COVID tests and 30 percent among antibody tests), which may bias our results.
  • Our analysis includes a discrete four-month period; we don’t use more sophisticated methods to account for variation over time.

¹ Electronic medical record (EMR) data including diagnosis, procedures, labs, vitals, medications and histories sourced from a data management platform vendor with EMR data on 40 million unique patients, updated daily. The data, technology, and services used in the generation of these research findings were generously supplied pro bono by the COVID-19 Research Database partners, who are acknowledged at https://covid19researchdatabase.org/.
