## Abstract

Estimates of the reproductive number for novel pathogens such as SARS-CoV-2 are essential for understanding the potential trajectory of the epidemic and the level of intervention that is needed to bring the epidemic under control. However, most methods for estimating the basic reproductive number (*R*_{0}) and time-varying effective reproductive number (*R*_{t}) assume that the fraction of cases detected and reported is constant through time. We explore the impact of secular changes in diagnostic testing and reporting on estimates of *R*_{0} and *R*_{t} using simulated data. We then compare these patterns to data on reported cases of COVID-19 and testing practices from different United States (US) states. We find that changes in testing practices and delays in reporting can result in biased estimates of *R*_{0} and *R*_{t}. Examination of changes in the daily number of tests conducted and the percent of patients testing positive may be helpful for identifying the potential direction of bias. Changes in diagnostic testing and reporting processes should be monitored and taken into consideration when interpreting estimates of the reproductive number of COVID-19.

## Introduction

The initial stages of the COVID-19 epidemic in the United States (US) were characterized by difficulties in delivering and administering diagnostic tests (1). First, the real-time quantitative PCR assay developed and distributed by the US Centers for Disease Control and Prevention (CDC) suffered from performance issues (2, 3). As a result, all initial and confirmatory testing needed to be carried out by the CDC, which led to reporting delays and capacity issues early in the epidemic (4, 5). Initially, tests were only administered to individuals with a history of travel to certain countries or known contact with a positive case. By the time testing capacity increased, state and local health departments were faced with heavy demand for COVID-19 testing. Only individuals meeting specific criteria could receive a test, and these criteria have varied from state to state and over time (see Supporting Information (SI) Dataset).

Concurrently, mathematical modelers have been analyzing data on reported COVID-19 cases in order to develop forecasts of future incidence and evaluate the potential impact of social distancing and other control measures, often at the behest of policymakers and public health officials. These models typically rely on estimates of the reproductive number of the virus. The basic reproductive number (*R*_{0}) is defined as the expected number of secondary infections produced by an infectious individual in a fully susceptible population; this can be used to derive the expected fraction of the population that will become infected in the absence of interventions and the level of control and/or immunity that is needed to eliminate the pathogen from circulation (6). The time-varying effective reproductive number (*R*_{t}) measures the average number of secondary infections per case at each time-point in the epidemic (6), and can be used for real-time monitoring of the impact of control measures (7–9). As control measures are implemented and immunity increases in the population, *R*_{t} will decrease (6, 10). The value of *R*_{0} can be estimated from the growth rate of the number of cases early on in the epidemic and estimates of the distribution of the generation time (i.e. the time between infection events of successive cases in a transmission chain) (11), whereas instantaneous values of *R*_{t} can be estimated based on the time series of case notifications and the distribution of the generation time (7). These methods have been shown to be robust to under-detection and underreporting, so long as the probability that a true case is detected and reported remains constant through time.

Here, we use simulations to explore the potential magnitude and direction of biases introduced by changes in diagnostic testing and reporting practices similar to those occurring during the early stages of the COVID-19 epidemic in the US. We then compute preliminary estimates of *R*_{0} and *R*_{t} for different states, based on publicly available data from The COVID Track Project (12). We examine changes in testing practices and trends of the cumulative number of tests reported to evaluate the potential that these estimates are biased.

## Results

Based on our simulations (Table S1, Fig. S1), the likelihood and degree to which *R*_{0} and *R*_{t} are biased depends on the manner in which diagnostic testing practices and reporting changes over time. When the fraction of incident cases detected and reported is constant over time and testing capacity scales with the number of “true” cases, the number of confirmed positive cases provides an unbiased estimate of *R*_{0}, despite possible delays in the reporting process (Fig. 1A, Fig. S2). In this instance, the percent of individuals testing positive is expected to be stable over time. Estimates of *R*_{t} are also expected to be unbiased after the first 5-6 days (approximately equal to the mean reporting delay), but lag 2-4 days behind in detecting a decrease in *R*_{t} below the threshold value of 1 (i.e. when the epidemic is receding). If the fraction of true cases detected and reported is increasing or decreasing linearly over time, estimates of *R*_{0} based on the growth rate of confirmed cases will be over- or under-estimates, respectively, of the true *R*_{0} (Fig. 1B-C). The time-varying reproductive number, *R*_{t}, will also be slightly over- or underestimated, especially early on when there is a reporting delay. A gradual increase or decrease in the percent of individuals testing positive is a potential indicator of such bias. However, the percent positive is also expected to decrease or increase over time if the testing capacity expands more or less quickly than the number of true cases, respectively (Fig. 1D-E). In this instance, estimates of *R*_{0} and *R*_{t} based on the number of confirmed cases are unbiased (Fig. S2).

Abrupt changes to testing criteria, affecting the fraction of true cases detected and reported, are also expected to bias estimates of *R*_{0} and lead to a large (up to two-fold) but temporary bias in estimates of *R*_{t} (Fig. 2A-B). The potential for such bias may be indicated by a sudden change in the percent of individuals testing positive, as well as a temporary change to the slope of the log of cumulative cases. A similar change in the percent positive may also occur with an abrupt change to the testing capacity (Fig. 2C-D). However, in this case, it is accompanied by a change to the slope of the log of cumulative individuals tested, and estimates of *R*_{0} and *R*_{t} based on fitting to the number of positive cases are not expected to be biased. The most difficult bias to detect may be due to a change in the reporting delay distribution (Fig. 2E-F). In this case, the percent positive is likely to remain roughly constant through time, but estimates of *R*_{0} and *R*_{t} will be biased, especially when the reporting delay increases.

When the bias is due to a change in the testing criteria, affecting the fraction of true cases detected and reported, fitting models to the total number of individuals tested over time can recover unbiased estimates of the reproductive numbers (Fig. S2-S3). Thus, when the percent of positive tests is changing through time and there is uncertainty over whether it is due to changes in the testing probability or the testing ratio, fitting models to both the number of positive cases and the total number of tests may provide bounds on the uncertainty in estimates of *R*_{0} and *R*_{t}. However, this approach cannot correct for bias due to changes in the reporting delay distribution.

Based on the number of confirmed COVID-19 cases across the US through March 24, 2020 (before any observable impact of social distancing measures, Fig. 3), and assuming a fixed serial interval of 6.5 days, we estimate that *R*_{0} for the US is 3.45 (95% confidence interval (CI): 3.44-3.46, accounting *only* for uncertainty in the growth rate). Estimates of *R*_{0} for all 50 states and the District of Columbia vary from 1.92 (95% CI: 1.61-2.27, South Dakota) to 5.17 (95% CI: 4.75-5.65, Missouri) (Table S2). Estimates of *R*_{0} based on the growth rate of the number of tests performed are similar for the entire US, but are slightly larger on average for the individual states (Table S2). Nationally, the percent of individuals testing positive for COVID-19 decreased from 20-25% in early March to around 15% in mid-March. This early decline in the percent positive is likely attributable to an increase in testing capacity, rather than a decrease in the fraction of true cases detected and reported; therefore, estimates of *R*_{0} based on the number of confirmed cases should be unbiased. However, trends in the percent of tests positive vary by state (Fig. 4).

Estimates of *R*_{t} for the US and individual states were generally high (*R*_{t}>4) initially but decreased over time (Fig. S4). This may be due to a low probability of detecting cases at the start of the epidemic in late February/early March. In Washington and California, where COVID-19 cases in the US were first recognized, estimates of *R*_{t} for early March were slightly lower (Fig. 4). Nationally, *R*_{t} had decreased to less than 2 by March 24, 2020. In some states, *R*_{t} had already declined substantially before stay-at-home orders were issued. As of April 5, 2020, which is the last reliable day for which we could generate estimates, *R*_{t} was less than 1 (meaning the epidemic was starting to decline) in 24 of 51 states, but was hovering around 1 in a number of states (e.g. California) (Fig. 4, Fig. S4).

Nationally (and in most states), the percent of individuals testing positive for COVID-19 has been gradually increasing since mid-March (Fig. 3, Fig. 4). This could be occurring either because the fraction of individuals infected with SARS-CoV-2 detected and reported is increasing, or because states are reaching their testing capacity and individuals more likely to be infected are being prioritized for testing. If it is the former, our analysis suggests that the more recent estimates of *R*_{t} may be upwardly biased. Moreover, temporary increases in *R*_{t} around mid-March in states like New York and California might be due to early restrictions to testing criteria. However, abrupt decreases in the percent of tests positive were generally due to large increases in the total number of tests reported, and therefore should not reflect a bias in estimates of *R*_{t} based on the number of confirmed cases.

## Discussion

Over the first month and a half of the COVID-19 epidemic in the US, testing practices have varied dramatically over time and from state to state (SI Dataset) (13). As of April 17, 2020, the total number of tests reported per capita has varied from 5.7 tests per 1,000 people in Virginia to 29.5 tests per 1,000 people in New York (12). Due to the limited availability of tests early on, most states recommended that only those with a history of travel to affected countries or known contact with a confirmed case be tested (4, 5). Since the disease has become more widespread throughout the US and testing capacity has increased, testing guidelines have been relaxed, but there are still considerable differences from state to state (SI Dataset). For example, as of April 6, 2020, Washington state had no restrictions on who can be tested for COVID-19, but prioritized hospitalized individuals and essential service providers exhibiting symptoms (14). As of April 15, New York still recommended restricting testing to those with a known positive contact or travel history, as well as symptomatic individuals who had tested negative for other infections (15). As testing practices change over time, we have demonstrated that these changes may introduce bias into estimates of *R*_{0} and *R*_{t}, affecting inference about how much control is needed and when control measures have reduced transmission below the critical threshold necessary to sustain the epidemic.

Our estimate of the mean *R*_{0} in the US of 3.45 is reasonably consistent with published estimates. Early analyses from China report a range for *R*_{0} of 2.24-3.58, assuming a mean serial interval of 8 days and a 2- to 8-fold increase in the reporting rate (16). However, more recent analyses estimate a median *R*_{0} of 5.7 (95% CI 3.8, 8.9) (17). Analysis of data from Europe and the US present a similarly high *R*_{0}, with median values ranging from 4.0 to 7.1 when assuming a serial interval of 6-9 days (18). Our estimated mean *R*_{0} is notably lower, but this was based on a similar estimated growth rate of 0.275, suggesting the difference can be attributed to different assumptions about the serial interval. Estimates of *R*_{t} have not been widely reported for the US, but other studies have found that *R*_{t} decreased from 2-5 at the start of the epidemic to ∼1 following travel restrictions in Wuhan and various European countries (19, 20), which is in line with our findings.

Monitoring the number of tests performed and the percent of tests that are positive over time can help to indicate the potential for bias in estimates of reproductive numbers. However, the reporting of test results, particularly for negative tests, has been inconsistent in many states. In California, for example, there was a more than eight-fold increase in the number of negative tests reported on March 13, and another four-fold increase on April 4, 2020 (12). These large increases in the number of negative tests were not accompanied by a corresponding increase in the number of confirmed cases. Thus, while estimates of *R*_{0} and *R*_{t} based on the number of confirmed cases are not expected to be influenced by these abrupt changes in the number of reported tests, it becomes difficult to interpret the intervening gradual increases in the percent of tests positive.

It is more difficult to detect whether the time between onset of infectiousness and the reporting of test results (i.e., the reporting delay) has changed over time. Our simulations suggest that such changes could bias estimates of *R*_{0} and *R*_{t}, but would not be reflected in the percent of individuals testing positive over time. Individual-level data on the date of symptom onset, date of testing, and date of reporting are needed to resolve this potential bias. Estimates of *R*_{t} are also expected to lag behind true changes in the transmission rate due to reporting delays. Now-casting approaches may be useful for resolving this by inferring the number of infections occurring on each day based on the observed cases, hospitalizations and deaths, and known reporting delays (21).

Imminent decisions regarding the lifting of stay-at-home orders and loosening social distancing requirements, and when such measures may need to be reinstated, depend on having a good understanding of current levels of transmission. Reliable estimates of the reproductive number are essential for quantifying the impact of control measures on transmission and making informed decisions about future interventions, e.g. (7, 19, 20, 22, 23). However, changes in testing policies and practices, as well as delays in the reporting process, can lead to bias in estimates of the reproductive number, as we have demonstrated. It important to carefully document and track such changes in testing and reporting practices in order to make correct inferences.

## Methods

### Examining the impact of changes in testing using simulated data

We simulated a stochastic SEIR (Susceptible-Exposed-Infected-Recovered) model to explore the potential impact of changes in testing practices on estimates of *R*_{0} and *R*_{t}. We modeled a population of 1 million individuals and initialized the epidemic with 10 infectious individuals to minimize the chances of early epidemic fadeout. We assumed everyone else was susceptible at the start of the epidemic. New infections were assumed to arise according to a Poisson process (approximate method); the state transitions and rates are described in Table 1, and model parameters are given in Table 2. We simulated the model to day 70 using a time-step of Δ*t*=0.05 days, and assumed a decrease in the transmission rate occurring on day 50, consistent with the impact of social distancing interventions. Models were run using MATLAB v9.3 (MathWorks, Natick, MA); code is available from https://github.com/vepitzer/COVIDtestingbias.

We tracked the number of “true cases” on day *d* (*Y*_{d}) as the number of individuals entering the infectious period each day. We assumed that each true case occurring on day *d* had a probability *p*_{test}(*d*) of being tested, and that for every true case that occurred on day *d*, there were *n*_{test}(*d*) individuals with similar symptoms who were tested. Furthermore, we assumed that testing and reporting of positive tests occurred with some delay *ρ*_{d}(*t*), which followed a gamma distribution with parameters *a*_{d} and *b*_{d}. Thus, we calculated the number of individuals tested (*T*_{d}) and the number of positive cases (*C*_{d}) on day *d* as follows:

We rounded the value of *T*_{d} to the nearest integer and sampled *C*_{d} as binomial random variable. We allowed for the observation of cases occurring up to day 95 (even though no new cases were modeled after day 70) to examine the impact of the reporting delay on estimates of *R*_{t}.

As our base case, we assumed that the probability of a “true case” being tested was *p*_{test}=0.1 and the number of individuals tested for each true case was *n*_{test}=0.5; we assumed a mean reporting delay between onset of infectiousness and testing results of 6.6 days (24). We then modelled scenarios in which the fraction of true cases detected and reported (as indicated by *p*_{test}) and the testing capacity (i.e. number of individuals tested for every true cases, *n*_{test}) either increased or decreased linearly between days 20 and 60 of the epidemic. To examine the impact of sudden changes to the probability of a true case being tested (e.g. associated with changes in testing criteria) and testing capacity (e.g. associated with new companies entering the market), we also explored scenarios in which *p*_{test} and *n*_{test} increased or decreased abruptly on day 40 (Table S1). Finally, we examined the effect of a two-fold increase or decrease in the average reporting delay.

### Estimation of R_{0} from simulated data

We estimated *R*_{0} from the growth rate of the epidemic, as described by Lipsitch et al (11):
where *r* is the growth rate, *V* is the serial interval (also known at the generation interval), and *f* is the proportion of the serial interval spent in the latent period. The growth rate (*r*) was determined by fitting Poisson regression models to the cumulative number of “true cases” (*Y*_{d}) and reported cases (*C*_{d}) on days *d*=21 to 40. Thus, we implicitly assumed that cases occurring over the first 20 days of the epidemic are unlikely to have been recognized. We assumed that the serial interval is equal to the sum of the average latent period and the average infectious period (*V*=1/*ν* + 1/*γ*), and that the proportion of the serial interval spent in the latent period is *f*=1/(*Vν*). We estimated 95% confidence intervals (CIs) for our estimate of *R*_{0} by incorporating uncertainty in the estimated growth rate, but did not incorporate uncertainty in *f* or *V*.

### Estimation of R_{t} from simulated data

We estimated *R*_{t} using the Wallinga-Teunis method (7). This is a likelihood-based framework that infers the estimated number of secondary cases per case at each point in time based on information about the generation interval. First, the relative likelihood that case *i* was infected by case *j* is estimated based on the probability distribution of the difference in the time of symptom onset (*t*_{i} *– t*_{j}), over the probability that case *i* was infected by any other case *k*:
where *g*(*x*) is the probability distribution function for the generation interval. We assume all cases are equivalent in terms of infectiousness. Thus, to estimate *R*_{t} based on the number of “true” cases, this can be rewritten as:

We do not allow for values of the generation interval less than or equal to zero; thus, we assume all cases were infected by a case that occurred on an earlier day. While this is not necessarily true when accounting for possible reporting delays, we make the same assumption when estimating *R*_{t} based on the number of reported cases. We assumed that the generation interval was gamma distributed with a mean of 6.5 days and coefficient of variation of 0.62, consistent with data from Flaxman et al (20):

Second, the reproductive number for case *j* is estimated by calculated the weighted sum of *p*_{ij} over all cases *i*:

Again, assuming all cases are equally infectious, this becomes:

To estimate *R*_{t} based on the number of true cases, we set *t*_{max}=70 and estimate *R*_{t} up to day 60. We correct for right-censoring of the case data by dividing our estimates of *R*_{t} by the cumulative distribution function of the generation interval (*G*(*x*)) for *t*_{max} *– t*:

To explore the potential influence of the reporting delays, we estimate *R*_{t} based on the reported positive cases for *t*_{max}=70 and *t*_{max}=95.

### COVID-19 testing data for the United States

Daily data on the reported number of positive and negative tests for COVID-19 in the US and by state were downloaded from www.covidtracking.com on April 18, 2020 (12). The COVID Tracking Project data comes from state/district/territory public health authorities, and occasionally, from trusted news sources, official press conferences, or (rarely) social media updates from state public health authorities or governors [12]. All data sources are documented in a spreadsheet. We analyzed the data from March 4, 2020, onward, as this is the first date that negative tests for COVID-19 were consistently reported for the entire US.

Testing practices vary considerably between states, and some states changed their guidelines a few weeks into the outbreak, depending on test availability, outbreak intensity, and health systems infrastructure. We extracted information on COVID-19 testing criteria from each state’s website during the week of March 23 and again during the week of April 13. Data sources are documented in the SI Dataset.

For example, as of March 13, Louisiana recommended testing for people with fever, respiratory symptoms, and a negative influenza test, with priority given to the following categories:

Hospitalized patients with a severe respiratory illness with no other known cause.

Suspected outbreak of COVID-19 among associated individuals with recent onset of similar fever and lower respiratory symptoms.

Recent fever and lower respiratory symptoms in a healthcare worker with direct contact to a laboratory-confirmed COVID-19 case.

Suspected COVID-19 in a patient associated with a high-risk exposure setting such as a long-term care facility or a correctional facility.

Suspected COVID-19 in a homeless patient.

However, as of April 15, anyone with fever and respiratory symptoms was eligible to be tested, regardless of influenza status, and pharmacists were given permission to order and administer COVID-19 tests for the duration of the public health emergency.

### Estimation of R_{0} from state-level testing data

To estimate *R*_{0} from data on the number of reported positive COVID-19 cases in the US and different states, we fitted Poisson regression models to the first three weeks of data (March 4 through March 24, 2020) to estimate the growth rate. For states that did not report any cases early on, the Poisson regression models were fitted to the data beginning the first day a positive case was reported through March 24. This is approximately one week after the national “15 Days to Slow the Spread” guidelines were announced (on March 16, 2020) (25). We assumed that the mean serial interval was 6.5 days and that the average latent period was 4.6 days, based on (20). We calculated 95% CIs for the *R*_{0} estimates based on uncertainty in the estimated growth rate, but did not account for uncertainty in the serial interval or latent period.

### Estimation of R_{t} from state-level testing data

Estimates of the time-varying reproductive number in each state were generated using the Wallinga-Teunis method described above. We again assumed that the generation interval was gamma distributed with a mean of 6.5 days and coefficient of variation of 0.62 (20). Estimates after April 5, 2020, are increasingly uncertain because <95% of secondary cases are expected to have occurred by the end of the available data (based on the cumulative distribution function of the generation interval). For later dates, we present both the uncorrected estimates and estimates corrected for right-censoring, as described above.

## Data Availability

All data used in this analysis are publicly available from https://covidtracking.com/

## Supporting Information

## Acknowledgements

This work was supported by the following grants from the National Institutes of Health/National Institute of Allergy and Infectious Diseases (NIH/NIAID): R01AI137093 (VEP, JLW, DMW), R01AI112970 (VEP), R01AI123208 (DMW), and R01AI146555 (NAM, TC).