HCCI’s original reports powered by #HCCIdata

Analysis of Electronic Medical Record Data Shows Significantly Higher Rates of COVID-19 Infection among Hispanic and Black Patients

The COVID-19 pandemic has had a previously unimaginable impact, both in the United States and globally. As we enter the fifth month of the pandemic, with caseloads and deaths rising, it is increasingly clear that the United States will need to address COVID-19 for the foreseeable future. Each day, we learn more about the virus clinically, about the people and communities hardest hit by it, and about the social and economic toll the pandemic is expected to take. While the pandemic has exposed the fragility of the US public health surveillance infrastructure, there remain valuable, albeit often disconnected, data resources around the country that can contribute to our collective knowledge of the pandemic's impact. Using data made available by the COVID-19 Research Database, a cross-industry collaborative that contributes real-world, de-identified data to researchers studying COVID-19, we analyze COVID-19 test and antibody positivity rates from a sample of electronic medical records in office and clinic settings.

Our findings show that, in our sample, the disparity in infections among Black and Hispanic patients is substantially larger than most current assumptions suggest. Additionally, among patients presenting in an office or clinic setting, positive tests were more common among younger patients and less common among patients 65 and older. The findings, based on data from March through June 2020, also show that patients in New York, New Jersey, and Connecticut were more likely to test positive than patients in any other region we studied.

What we know from the literature 

Preliminary research at the national level indicates that people of color are more likely to test positive for COVID-19 than white individuals. Two large studies found that Black and Hispanic individuals were up to two and a half times more likely than non-Hispanic white individuals to test positive for COVID-19. In one study, these findings held true "even after accounting for underlying health conditions, other demographics and geographic locations."

These national findings are echoed in smaller, more locally focused studies. As of July 2020 in Montgomery County, Maryland, Hispanic residents accounted for more than two-thirds of new infections. A Washington Post analysis of data through May 2020 found that Latinos made up about one-third of COVID-19 cases in the District of Columbia, Virginia, and Maryland region, even though they account for only about 10 percent of the population. In Northern Virginia's Fairfax County, the same analysis found that Latinos accounted for 64 percent of COVID-19 cases even though they account for only 16.8 percent of the population.

What we found in our analysis: percent of positive tests 

This study used data from an ambulatory electronic medical record platform covering more than two hundred community health centers and primary care, immediate/urgent care, and specialty care providers, and is based on 76,969 COVID tests and 10,998 antibody tests with a known result. Counting all tests ordered, including those with an unknown result, about nine percent of COVID tests and seven percent of antibody tests were recorded as positive; among tests with a known result, the positive rates were 11 percent and 9 percent, respectively. For more details on our data and methods, please refer to the Methods section.
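The nine and seven percent positive rates and the 11 and 9 percent rates reported in the Methods and Table 2 are consistent with each other if they use different denominators. The back-of-the-envelope check below is a sketch, not the authors' calculation: it assumes the lower figures divide positive tests by all tests ordered (93,039 COVID and 14,576 antibody tests, per the Methods), and it back-calculates positive counts from the rounded 11 and 9 percent rates, so the results are approximate.

```python
# Test counts reported in the Methods section.
covid_all, covid_known = 93_039, 76_969        # all COVID tests vs. those with a known result
antibody_all, antibody_known = 14_576, 10_998  # all antibody tests vs. those with a known result

# Approximate positive counts implied by the rounded Table 2 totals (11% and 9%).
covid_pos = 0.11 * covid_known        # roughly 8,467 positive COVID tests
antibody_pos = 0.09 * antibody_known  # roughly 990 positive antibody tests

# Assumption: the lower figures divide the same positives by all tests ordered,
# including tests whose result was unknown.
print(f"COVID: {covid_pos / covid_all:.1%} of all tests")          # COVID: 9.1% of all tests
print(f"Antibody: {antibody_pos / antibody_all:.1%} of all tests")  # Antibody: 6.8% of all tests
```

Both results round to the nine and seven percent figures, which suggests the two sets of rates differ only in whether tests with unknown results are counted.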

What we found in our analysis: race and ethnicity 

Our analysis supports previous reporting on the disproportionate impact of COVID-19 infections on racial and ethnic minorities. In our sample, Hispanic and Black patients were more likely than non-Hispanic white patients to test positive for COVID and for COVID antibodies. Figure 1 shows how much more likely Hispanic and Black patients were to test positive compared to white patients. Each dot represents an odds ratio: the estimate of how much more or less likely one group of patients is to test positive compared to a reference group (in this figure, white patients). Estimates to the left of the vertical dashed line mean the group is less likely to test positive than the reference group; estimates to the right mean the group is more likely to test positive. The horizontal bar shows the 95% confidence interval around the estimate; if the bar crosses the dashed line, the group's likelihood of testing positive was not significantly different from that of the reference group. This approach lets us measure the relative strength of the association between testing positive and a patient's race or ethnicity while accounting for the other patient characteristics in the model.
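The dots and bars in the figures correspond to the odds ratios and 95% confidence intervals in Table 3, which the Methods describe as exponentiated logistic-regression coefficients. A minimal sketch of that relationship, assuming Wald-type intervals (the report does not say which interval method was used); the function names are hypothetical. It also shows how to recover an approximate coefficient and standard error from a published odds ratio and interval, and how "crossing the dashed line" translates into a significance check.

```python
import math
from statistics import NormalDist

# Two-sided 95% critical value of the standard normal (about 1.96).
Z_95 = NormalDist().inv_cdf(0.975)

def odds_ratio_ci(beta: float, se: float) -> tuple[float, float, float]:
    """Exponentiate a logistic-regression coefficient and its Wald 95% interval."""
    return math.exp(beta), math.exp(beta - Z_95 * se), math.exp(beta + Z_95 * se)

def coef_from_reported(odds_ratio: float, lower: float, upper: float) -> tuple[float, float]:
    """Recover the log-odds coefficient and an approximate SE from a reported OR and CI."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    return beta, se

def is_significant(lower: float, upper: float) -> bool:
    """True when the 95% interval excludes 1, i.e. the bar does not cross the dashed line."""
    return lower > 1.0 or upper < 1.0

# Hispanic vs. non-Hispanic white patients, COVID model (Table 3): 4.06 (3.72 to 4.42)
beta, se = coef_from_reported(4.06, 3.72, 4.42)
print(odds_ratio_ci(beta, se))     # reproduces the reported interval to within rounding
print(is_significant(3.72, 4.42))  # True
# Appalachian vs. Deep South, antibody model: 0.89 (0.40 to 2.02) crosses 1
print(is_significant(0.40, 2.02))  # False
```

Because the interval is symmetric on the log-odds scale, it looks lopsided around the odds ratio on the original scale, which is why the reported bounds sit unevenly around each estimate.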

Figure 1 shows that Hispanic patients were more than four times as likely to test positive for COVID and more than six times as likely to test positive for antibodies as white patients. Moreover, Hispanic patients accounted for 16 percent of COVID tests but almost half of positive COVID tests. Similarly, they accounted for 11 percent of antibody tests but 51 percent of positive antibody tests. Black patients were also more than twice as likely to test positive for COVID and more than three times as likely to test positive for antibodies as non-Hispanic white patients.
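The gap between a group's share of tests and its share of positive tests follows directly from how the group's positive rate compares with the overall rate. A quick check using the rounded figures from the text and Table 2 (the function name is illustrative); small differences from the reported shares reflect rounding of the inputs.

```python
def share_of_positives(share_of_tests: float, group_rate: float, overall_rate: float) -> float:
    """Fraction of all positive tests contributed by one group.

    positives_in_group / all_positives
      = (share_of_tests * N * group_rate) / (N * overall_rate)
    """
    return share_of_tests * group_rate / overall_rate

# Hispanic patients: 16% of COVID tests at 34% positive (overall 11%),
# and 11% of antibody tests at 41% positive (overall 9%).
covid = share_of_positives(0.16, 0.34, 0.11)
antibody = share_of_positives(0.11, 0.41, 0.09)
print(f"{covid:.0%} of positive COVID tests, {antibody:.0%} of positive antibody tests")
# 49% of positive COVID tests, 50% of positive antibody tests
```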

What we found in our analysis: geography

We aggregated our location data into 10 U.S. regions and assigned each patient to a region based on their state of residence. A list of states by region can be found in the Methods section below. Figure 2 shows the distribution of both COVID and antibody testing by region. The data in our sample are concentrated in the New York "Tri-State" area (New York, New Jersey, and Connecticut), the Mountain states (Utah, Colorado, Idaho, Wyoming, Montana), and the Deep South (Arkansas, Mississippi, Louisiana, Alabama).

Figure 3 shows how likely COVID tests and antibody tests in each region were to be positive compared to tests from states in the Deep South. Note that New England and the Pacific/South West regions had too little data to report results for those areas.

Among the areas included in our dataset, patients in the "Tri-State" region (New York, New Jersey, and Connecticut) were about three times as likely to have a positive COVID test and more than five times as likely to have a positive antibody test as residents of the Deep South; this is consistent with reporting that COVID was more prevalent in this region during the period covered by our data. Patients residing in the Mid-Atlantic region (Pennsylvania, Maryland, Delaware, Virginia, District of Columbia) were twice as likely to have a positive COVID test and forty-seven percent more likely to have a positive antibody test compared to those in the Deep South. As more recent data become available, the geographic patterns we observed here may change.

What we found in our analysis: age and sex

Finally, Figure 4 shows how likely male patients were to test positive compared to female patients and how likely younger and older age groups were to test positive compared to our reference group of patients 35-54 years old.  

Males were twenty-one percent more likely to test positive for COVID compared to females and eighteen percent more likely to test positive for antibodies compared to females. Patients 21 to 34 years old were twelve percent more likely to test positive for COVID and patients 65 years or older were forty-two percent less likely to test positive for COVID compared to our reference group of patients 35-54 years old. 
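The percentages quoted in this and the preceding sections map directly onto the odds ratios in Table 3: an odds ratio of 1.21 is read as "twenty-one percent more likely," and 0.58 as "forty-two percent less likely." Strictly speaking, odds ratios compare odds rather than probabilities. A small sketch of that phrasing rule; the function name and the cutoff of 2 for switching to "times" wording are arbitrary illustrative choices, not the report's.

```python
def describe_odds_ratio(odds_ratio: float) -> str:
    """Phrase an odds ratio relative to the reference group."""
    if odds_ratio == 1:
        return "no difference in odds"
    if odds_ratio >= 2:
        # Large effects read more naturally as multiples.
        return f"{odds_ratio:.2f} times the odds"
    pct = round(abs(odds_ratio - 1) * 100)
    direction = "higher" if odds_ratio > 1 else "lower"
    return f"{pct} percent {direction} odds"

# Odds ratios quoted in the text (Table 3).
for label, value in [("Male, COVID", 1.21), ("Male, antibody", 1.18),
                     ("21 to 34, COVID", 1.12), ("65 and older, COVID", 0.58),
                     ("Mid-Atlantic, antibody", 1.47), ("Tri-State, COVID", 3.06)]:
    print(f"{label}: {describe_odds_ratio(value)}")
```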

What this analysis offers: a discussion

This analysis uses data from a convenience sample of electronic medical records to explore reports that Hispanic and Black communities are bearing an undue burden of the COVID-19 pandemic. We find that the experiences of patients in our sample are consistent with those earlier reports. In particular, the disproportionate positivity rates among Hispanic and Black patients in our sample underscore previous reporting that minority communities have borne the brunt of the pandemic. Prevailing theories for why COVID-19 positivity rates are higher in Hispanic communities include a higher likelihood of working in essential jobs, a higher likelihood of living in densely populated housing, and limited access to health care and other public support networks. While this analysis is not equipped to assess these factors, HCCI will continue to use available data to improve understanding of how the pandemic is affecting people, communities, and the health care system, with the goal of helping stakeholders make decisions that mitigate the harm caused by the pandemic.


    Methods

    We examined laboratory orders from a convenience sample of electronic medical records¹ provided by a data management platform vendor, with order dates between March 1 and June 30, 2020. Our unit of analysis is a unique combination of patient laboratory order and order date, which means a patient who is tested on more than one day may appear in the data more than once. The data cover orders for 3.7 million patients seen by more than 200 group practices or clinics, and include all patients seen by each practice or clinic regardless of payer type. Most encounters associated with the laboratory orders were classified as office visits.
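The unit of analysis described above can be illustrated with a toy deduplication. The record layout below (patient ID, order code, order date) is a hypothetical simplification of the vendor data, and the records are made up; the point is only that a repeat order on the same day collapses into one unit while a retest on a later day counts again.

```python
from datetime import date

# Hypothetical order records: (patient_id, lab_order_code, order_date).
orders = [
    ("p1", "covid_pcr", date(2020, 4, 2)),
    ("p1", "covid_pcr", date(2020, 4, 2)),   # same patient, order, and day: one unit
    ("p1", "covid_pcr", date(2020, 4, 20)),  # retested on a later day: a second unit
    ("p2", "antibody", date(2020, 5, 11)),
]

# One unit of analysis per unique (patient, laboratory order, order date).
units = {(patient, order, day) for patient, order, day in orders}
print(len(units))  # 3
```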

    We divided the data into 10 U.S. regions. Table 1 shows the states included in each region and each region's share of the data. Our sample did not include sufficient data from states in New England and the Pacific/South West, so we excluded those regions from the analysis. Figure 5 shows the distribution of COVID and antibody tests by region.

    Table 1. Distribution of All Tests Administered (COVID and Antibody) by U.S. Region

    | Region | States | Percent of All Tests |
    |---|---|---|
    | Deep South | Arkansas, Mississippi, Louisiana, Alabama | 40 |
    | Tri-State | New York, New Jersey, Connecticut | 18 |
    | Mountain | Utah, Colorado, Idaho, Wyoming, Montana | 14 |
    | Mid-Atlantic | Pennsylvania, Maryland, Delaware, Virginia, District of Columbia | 11 |
    | Coastal South | North Carolina, South Carolina, Georgia, Florida | 5 |
    | Appalachian | Kentucky, Tennessee, West Virginia | 4 |
    | Great Lakes | Michigan, Ohio, Illinois, Minnesota, Indiana, Wisconsin | 3 |
    | Plains | North Dakota, South Dakota, Kansas, Oklahoma, Missouri, Nebraska, Iowa | 3 |
    | Pacific/South West | California, Oregon, Washington, Hawaii, Alaska, Nevada, Arizona, New Mexico, Texas | 1 |
    | New England | Massachusetts, Maine, New Hampshire, Vermont, Rhode Island | <1 |
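A minimal sketch of the region assignment, using the state lists from Table 1 and dropping the two regions excluded for lack of data. It assumes each patient's state of residence is recorded as a full state name, which is a hypothetical format; the function and constant names are illustrative.

```python
from typing import Optional

# Regions and member states, transcribed from Table 1.
REGIONS = {
    "Deep South": ["Arkansas", "Mississippi", "Louisiana", "Alabama"],
    "Tri-State": ["New York", "New Jersey", "Connecticut"],
    "Mountain": ["Utah", "Colorado", "Idaho", "Wyoming", "Montana"],
    "Mid-Atlantic": ["Pennsylvania", "Maryland", "Delaware", "Virginia", "District of Columbia"],
    "Coastal South": ["North Carolina", "South Carolina", "Georgia", "Florida"],
    "Appalachian": ["Kentucky", "Tennessee", "West Virginia"],
    "Great Lakes": ["Michigan", "Ohio", "Illinois", "Minnesota", "Indiana", "Wisconsin"],
    "Plains": ["North Dakota", "South Dakota", "Kansas", "Oklahoma", "Missouri",
               "Nebraska", "Iowa"],
    "Pacific/South West": ["California", "Oregon", "Washington", "Hawaii", "Alaska",
                           "Nevada", "Arizona", "New Mexico", "Texas"],
    "New England": ["Massachusetts", "Maine", "New Hampshire", "Vermont", "Rhode Island"],
}

# Regions with too little data to report results.
EXCLUDED = {"New England", "Pacific/South West"}

# Invert to a state -> region lookup.
STATE_TO_REGION = {state: region for region, states in REGIONS.items() for state in states}

def assign_region(state_of_residence: str) -> Optional[str]:
    """Return the analysis region for a patient's state, or None if excluded or unrecognized."""
    region = STATE_TO_REGION.get(state_of_residence)
    return None if region in EXCLUDED else region

print(assign_region("Louisiana"))  # Deep South
print(assign_region("Texas"))      # None (Pacific/South West was excluded)
```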

    [Figure 5: Distribution of COVID and antibody tests by region]

    Using a series of regular expression queries, we searched for two types of tests used to determine COVID-19 infection. We identified 93,039 nasopharyngeal tests (“COVID tests”), which are used to determine whether a patient currently has the virus that causes COVID-19, and 14,576 blood serum antibody tests (“antibody tests”), which are used to help determine whether a patient has previously been infected with the virus. Key search terms were compiled from a list of COVID-19 PCR and COVID-19 antibody test descriptions from the FDA, and can be found here. After excluding tests with unknown results, we retained 76,969 COVID tests and 10,998 antibody tests. Of these, 11 percent of COVID tests and 9 percent of antibody tests were positive. Figure 6 shows the distribution of all tests with a known result by demographic characteristic, and Table 2 shows the percent of positive results by demographic characteristic. Note that the percent positive is the number of positive tests divided by all tests with a known result.
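To illustrate the kind of regular-expression screening described above, here is a minimal sketch that labels free-text lab order descriptions. The patterns and sample descriptions are hypothetical placeholders, not the report's search terms (which were compiled from FDA test descriptions and are not reproduced here), and this sketch separates tests by antibody keywords rather than by specimen type.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration only.
COVID19_TERMS = re.compile(r"sars[- ]?cov[- ]?2|covid[- ]?19|2019[- ]?ncov", re.IGNORECASE)
ANTIBODY_TERMS = re.compile(r"antibod|\big[gma]\b|serolog", re.IGNORECASE)

def classify_test(description: str) -> Optional[str]:
    """Label a lab order description as 'antibody', 'covid', or None (not a COVID-19 test)."""
    if not COVID19_TERMS.search(description):
        return None
    # Check antibody terms first so serology orders are not counted as COVID tests.
    if ANTIBODY_TERMS.search(description):
        return "antibody"
    return "covid"

print(classify_test("SARS-CoV-2 RNA, NAA, nasopharyngeal"))  # covid
print(classify_test("SARS-CoV-2 IgG antibody, serum"))       # antibody
print(classify_test("Influenza A/B antigen"))                # None
```

Ordering the checks matters: an antibody order also mentions the virus, so testing for antibody terms first keeps serology tests out of the current-infection group.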

    [Figure 6: Distribution of all tests with a known result, by demographic characteristic]

    Table 2. COVID and Antibody Tests with a Known Result, by Demographic Characteristic

    | Characteristic | N (COVID and antibody tests combined) | COVID Test (N=76,969), Percent Positive | Antibody Test (N=10,998), Percent Positive |
    |---|---|---|---|
    | Total | 87,967 | 11 | 9 |
    | Sex | | | |
    | Female | 52,822 | 11 | 9 |
    | Male | 35,145 | 12 | 9 |
    | Age | | | |
    | Less than 21 | 22,616 | 13 | 12 |
    | 21 to 34 | 15,328 | 13 | 11 |
    | 35 to 54 | 15,691 | 13 | 9 |
    | 55 to 64 | 16,539 | 11 | 7 |
    | 65 and older | 17,793 | 5 | 6 |
    | Race and Ethnicity | | | |
    | Non-Hispanic, white | 40,255 | 5 | 4 |
    | Non-Hispanic, Black | 13,069 | 11 | 13 |
    | Hispanic | 13,603 | 34 | 41 |
    | Unknown, other | 21,040 | 9 | 5 |

    We used a pair of logistic regressions to model the likelihood of a COVID test being positive and the likelihood of an antibody test being positive, adjusting for age, sex, race and ethnicity, and region of the country. Table 3 shows the results from the models. The odds ratios below are the exponentiated coefficients of the models, with 95% confidence intervals in parentheses. Models were run using the logit model procedure from SAS (SAS Institute, Cary, NC). The concordance statistics (c-statistics) for the COVID model and the antibody model were 0.785 and 0.824, respectively, indicating good to strong discrimination between positive and negative tests.
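The c-statistic measures how well a model ranks tests: it is the share of all (positive, negative) test pairs in which the positive test received the higher predicted probability, with ties counted as half. A pure-Python sketch of that pairwise definition on made-up toy data (not study data); the function name is illustrative. The pairwise loop is O(positives × negatives), so for a dataset the size of this study a rank-based formula would be faster.

```python
from itertools import product

def c_statistic(y_true: list[int], y_score: list[float]) -> float:
    """Concordance (c) statistic for a binary outcome, equivalent to ROC AUC.

    Counts the share of (positive, negative) pairs in which the positive case has
    the higher predicted probability; tied predictions count as half-concordant.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative outcome")
    score = 0.0
    for p, n in product(pos, neg):
        if p > n:
            score += 1.0
        elif p == n:
            score += 0.5
    return score / (len(pos) * len(neg))

# Toy example with made-up predicted probabilities.
outcomes = [1, 1, 0, 0, 0]
predicted = [0.80, 0.35, 0.40, 0.20, 0.35]
print(c_statistic(outcomes, predicted))  # 0.75
```

A value of 0.5 means the model ranks no better than chance, and 1.0 means every positive test is ranked above every negative one.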

    Table 3. Results from Logistic Regression Models

    | Characteristic | COVID Model, OR (95% CI) | Antibody Model, OR (95% CI) |
    |---|---|---|
    | Sex | | |
    | Female | Ref | Ref |
    | Male | 1.21 (1.15 – 1.27)* | 1.18 (1.02 – 1.38)* |
    | Age | | |
    | Less than 21 | 1.00 (0.93 – 1.07) | 1.02 (0.82 – 1.28) |
    | 21 to 34 | 1.12 (1.04 – 1.21)* | 1.01 (0.77 – 1.31) |
    | 35 to 54 | Ref | Ref |
    | 55 to 64 | 0.99 (0.91 – 1.07) | 0.99 (0.79 – 1.25) |
    | 65 and older | 0.58 (0.53 – 0.64)* | 0.98 (0.77 – 1.28) |
    | Race and Ethnicity | | |
    | Non-Hispanic, white | Ref | Ref |
    | Non-Hispanic, Black | 2.35 (2.17 – 2.55)* | 3.30 (2.46 – 4.43)* |
    | Hispanic | 4.06 (3.72 – 4.42)* | 6.31 (5.16 – 7.71)* |
    | Unknown, other | 2.43 (2.25 – 2.62)* | 1.49 (1.19 – 1.86)* |
    | Region | | |
    | Deep South | Ref | Ref |
    | Appalachian | 0.70 (0.60 – 0.81)* | 0.89 (0.40 – 2.02) |
    | Coastal South | 0.65 (0.55 – 0.78)* | 1.23 (0.79 – 1.90) |
    | Great Lakes | 0.22 (0.16 – 0.31)* | 2.18 (1.16 – 4.08)* |
    | Mid-Atlantic | 2.06 (1.88 – 2.25)* | 1.47 (1.02 – 2.11)* |
    | Mountain | 0.34 (0.30 – 0.34)* | 0.49 (0.31 – 0.77)* |
    | New England | NA | NA |
    | Plains | 0.73 (0.55 – 0.96)* | 1.12 (0.34 – 3.74) |
    | Tri-State | 3.06 (2.83 – 3.30)* | 5.38 (3.86 – 7.52)* |
    | Pacific/South West | NA | NA |
    | Model Fit Statistics | | |
    | c-statistic | 0.785 | 0.824 |

    * Statistically significant (95% confidence interval excludes 1).

    Finally, our analysis has the following limitations:

    • The use of a convenience sample and the lack of sufficient data in certain regions of the country limit the generalizability of the findings and may bias the results.
    • This sample includes only people receiving regular medical care in an office or clinic setting. Because testing was also available outside typical office settings (e.g., drive-through testing), our results may be biased toward people with certain forms of insurance, particularly Medicaid, Medicare, or insurance through their employer.
    • There are several important factors that we were unable to control for, including urbanicity of the provider, patient language preference, income, safety net benefits, living situation, insurance status, and employment status.
    • Limited availability of testing in the U.S. during our study period may bias the results, since not everyone seeking a test was able to get one and tests were reserved for the most probable cases at the time.
    • This analysis likely includes patients who presented (at least initially) with mild to moderate symptoms; it likely does not include severe cases, which are most likely to appear in the inpatient setting.
    • Race and ethnicity data are difficult to collect and may be entered into the record subjectively; additionally, our race and ethnicity data have a high proportion of unknowns (23 percent among COVID tests and 30 percent among antibody tests), which may bias our results.
    • Our analysis covers a discrete four-month period, and we do not use more sophisticated methods to account for variation over time.

    ¹ Electronic medical record (EMR) data, including diagnoses, procedures, labs, vitals, medications, and histories, sourced from a data management platform vendor with EMR data on 40 million unique patients, updated daily. The data, technology, and services used in the generation of these research findings were generously supplied pro bono by the COVID-19 Research Database partners, who are acknowledged at https://covid19researchdatabase.org/.
