The Basic Method of calculating Person Risk is the regional average

As we know, prevalence varies widely across different geographic locations. For example, at the time of writing, Sydney has much lower rates of COVID than San Francisco. So the Person Risk from your friend in Sydney will be much lower than the risk from your friend in San Francisco.

The Basic Method is to just assume that a person is “average” for their region. The chance your friend has COVID is the chance that anyone in their geographic area has COVID.

How we estimate the regional average

To estimate the chance that a random resident in your area has COVID, you need to figure out the number of new infections last week in your area. This is because a typical person is infectious for about a one-week period.^[1]

We will give an overview of the steps, then explain the steps in more detail.

Start with the number of new reported cases in your region last week. The calculator does this automatically for you, or you can look up these numbers manually by Googling.

However, this is just a start. You cannot use this number directly because it underestimates how many people are actually sick. You need to take into account two important factors.

- The first factor is underreporting. Many people with COVID won’t ever get counted in the official statistics. They might not think their symptoms are anything unusual, so they don’t get tested. Or they might not be able to access testing.^[2]
- The second factor is delay. There’s a delay of 1-2 weeks between when someone becomes infected and when their positive test result comes back. The true number of confirmed cases who were sick last week isn’t known yet, and won’t be known until those tests come back next week. If cases are rising, last week’s statistics will be too low.

The calculator can look up new reported cases automatically, and takes these adjustments into account as well.

The chance someone has COVID is very different in different geographic regions.

While we were working on this writeup, in July 2020, we calculated the Person Risk (Basic Method) in San Francisco as about 5107-in-a-million, and about 84-in-a-million in Sydney.

This means that the risk of doing a specific activity in San Francisco that month was about 60 times higher than doing the same activity in Sydney.

Inviting one random person over for coffee (indoors, unmasked, undistanced) in San Francisco would’ve been about as risky as inviting 60 random Sydney residents to your home!

There is not just one answer for “How risky is it to invite one person over for coffee?” It depends on where they live and how widespread COVID is there.

Detailed steps for Basic Method

To learn how to do these steps manually, or to understand how the calculator does them, read the rest of this page.

Step one: Look up reported cases

To estimate the prevalence of COVID where you live, start by looking up the number of reported cases last week in your region.

- Make sure to look up new cases, not total cases.
- Make sure to get statistics for a week, not a day.

You decide how to define your region. This might be based on the county where you live, or you might want to include multiple counties if you live in a major metropolitan area. If data is limited, you might have to use your entire state.

If you live in the US, you can use the CovidActNow website. This gives daily new reported cases per 100,000 people. To get a week’s worth of cases, you’ll need to calculate: daily new reported cases per 100,000 people * 7 days.^[3] You will then use 100,000 as the population.
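As a minimal sketch of this conversion, assuming the daily figure is already a 7-day smoothed rate per 100,000 people (the function name and the sample input of 12.5 are hypothetical, chosen only for illustration):

```python
def weekly_cases_per_100k(daily_cases_per_100k: float) -> float:
    """Turn a smoothed daily rate per 100,000 people into a weekly count per 100,000."""
    return daily_cases_per_100k * 7


# Hypothetical input: 12.5 daily new reported cases per 100,000 people.
reported_cases = weekly_cases_per_100k(12.5)
population = 100_000  # the rate is per 100,000, so that becomes the population

print(f"{reported_cases} reported cases last week per {population:,} people")
```

The two values can then be carried through the remaining steps exactly as if the region had 100,000 residents.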

Step two: Underreporting factor

Many people with COVID won’t ever get counted in the official statistics. The official statistics are underreporting the real number of new infections.

You can use the positive test rate (the percentage of tests that come back COVID-positive) as some evidence about how many infections are being caught by testing. Ideally, the positive test rate should be very low, indicating that contact tracing is working to find all contacts of an infected person, and that testing is available for each contact. If a high percentage of tests are coming back positive, then there are probably a lot more infected people out there than the testing data shows.

If you live in the US, you can look up the positive test rate in your state at CovidActNow.

We use the correction factor proposed by COVID-19 Projections:

prevalence_ratio = 1250 / (day_i + 25) * positive_test_rate ** 0.5 + 2
true_infections = prevalence_ratio * reported_infections
where day_i = number of days since 2020-02-12
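The correction factor above could be written as a small function along the following lines. This is a sketch, not the calculator's actual code: it assumes that `positive_test_rate` is expressed as a fraction (0.04 rather than 4), which the text does not state, and the function and parameter names are our own.

```python
from datetime import date

# Day zero of the formula, as given in the text.
REFERENCE_DATE = date(2020, 2, 12)


def prevalence_ratio(positive_test_rate: float, on_date: date) -> float:
    """Estimated true infections per reported infection.

    positive_test_rate is assumed to be a fraction between 0 and 1.
    """
    day_i = (on_date - REFERENCE_DATE).days
    return 1250 / (day_i + 25) * positive_test_rate ** 0.5 + 2


def true_infections(reported_infections: float, positive_test_rate: float, on_date: date) -> float:
    """Scale reported infections up by the prevalence ratio."""
    return prevalence_ratio(positive_test_rate, on_date) * reported_infections


# Hypothetical inputs: a 4% positive test rate on the reference date itself.
print(prevalence_ratio(0.04, date(2020, 2, 12)))
```

Two properties are visible from the formula itself: the ratio never drops below 2, and for a fixed positive test rate it shrinks as `day_i` grows.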

More details are available in Research Sources or on the COVID-19 Projections website.

Step three: Delay factor

Since test results take about one week to come back on average, the number of new reported cases in your region last week really represents the number of new infections in your region the week before that. The results are delayed.

If cases are flat or falling, it’s fine to use this number as is.

If cases are rising, then we need to estimate the increase by comparing last week’s reported case numbers to the week before that. For example, if last week there were 120 reported cases, and the previous week there were 80 reported cases, then the weekly increase is 120 / 80 = 1.5. We would use 1.5x as our delay factor. To avoid over-extrapolating from a single superspreader event in an otherwise low-prevalence area, we have capped the delay factor at 2x.

In the calculator this would be displayed as a 50% increase in cases from last week to this week.
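The delay rule can be sketched as follows: use the week-over-week ratio when cases are rising, 1x when they are flat or falling, and never more than 2x. The function name is hypothetical, and raising an error for a week with no baseline cases is our own assumption, since the text does not cover that case.

```python
def delay_factor(cases_last_week: float, cases_week_before: float, cap: float = 2.0) -> float:
    """Week-over-week growth, floored at 1x (flat or falling) and capped at `cap`."""
    if cases_week_before <= 0:
        # Assumption: the text does not say what to do without a baseline week.
        raise ValueError("need a positive case count for the week before")
    ratio = cases_last_week / cases_week_before
    return min(max(ratio, 1.0), cap)


factor = delay_factor(120, 80)
print(factor)                                   # the 1.5x example from the text
print(f"{(factor - 1) * 100:.0f}% increase")    # how the calculator would describe it
print(delay_factor(80, 120))                    # falling cases: no adjustment
print(delay_factor(300, 100))                   # tripled cases: held at the cap
```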

Step four: Estimate number of new infections last week

Use this equation to combine the previous three steps to estimate the regional prevalence of COVID in your area:

New Infections Last Week = Reported Cases ⨉ Underreporting Factor ⨉ Delay Factor

Step five: Divide by population to get a final estimate

From there, calculate the basic Person Risk by comparing the new infections last week with the overall population in your region.

Person Risk (Basic) = New Infections Last Week / Population In Millions
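Steps four and five fit into one short function. The sketch below uses made-up inputs (100 reported cases, a 6x underreporting factor, a 1.5x delay factor, and one million residents) purely to show the arithmetic; the function and parameter names are illustrative rather than taken from the calculator.

```python
def person_risk_basic(
    reported_cases: float,
    underreporting_factor: float,
    delay_factor: float,
    population: float,
) -> float:
    """Basic Person Risk, in chances per million, for an average resident."""
    new_infections_last_week = reported_cases * underreporting_factor * delay_factor
    population_in_millions = population / 1_000_000
    return new_infections_last_week / population_in_millions


# Hypothetical region: 100 reported cases among 1,000,000 residents.
risk = person_risk_basic(100, underreporting_factor=6, delay_factor=1.5, population=1_000_000)
print(f"Person Risk: {risk:.0f}-in-a-million")
```

Dividing by the population in millions, rather than by the raw population, is what makes the result read directly as an "in-a-million" figure.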

Example Sydney and San Francisco calculations

Here are two examples:

Sydney in July 2020 (lower prevalence)

Step 1: As of July 26, 2020, the state of New South Wales in Australia (where Sydney is located) had 81 reported cases in the last week, and a population of around 7.5 million.
Step 2: The percentage of positive COVID tests is extremely low: 81 cases / 135,089 tests = 0.06%, so we’ll use our minimum 6x underreporting factor.^[4]
Step 3: The previous week had 62 reported cases. 81 / 62 = 1.3, so we’ll use a 1.3x delay factor, i.e., a 30% increase in cases from last week to this week.
Step 4: Therefore, 81 reported cases * 6 * 1.3 = 632 new infections last week.
Step 5: So the Person Risk (the chance that a random resident in New South Wales has COVID) is 632 infections / 7,500,000 people = 0.000084 or 0.0084%.
- An easier way to talk about this tiny number is to multiply it by a million: 0.000084 * 1,000,000 = 84.
- This is the same as if we had just divided by 7.5 (the population in millions).

So if all you knew about a person is that they lived in New South Wales in July 2020, their Person Risk at the time would’ve been 84, which means there’s an 84-in-a-million chance that they had COVID (in that particular week).
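The New South Wales arithmetic can be replayed in a few lines. This sketch takes its inputs from the example above and mirrors its rounding (the delay factor to one decimal place, infections to a whole number); the variable names are our own, and the 6x underreporting factor is entered directly as in the example rather than derived.

```python
reported_cases = 81          # week ending July 26, 2020
cases_week_before = 62
tests = 135_089
population = 7_500_000

positive_test_rate = reported_cases / tests          # well under 1%
underreporting_factor = 6                            # value used in the example
delay_factor = round(reported_cases / cases_week_before, 1)

new_infections = round(reported_cases * delay_factor * underreporting_factor)
person_risk = new_infections / (population / 1_000_000)

print(f"positive test rate: {positive_test_rate:.4%}")
print(f"delay factor: {delay_factor}x")
print(f"new infections last week: {new_infections}")
print(f"Person Risk: about {person_risk:.0f}-in-a-million")
```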

San Francisco in July 2020 (higher prevalence)

Compare this with San Francisco County in California, which had 749 new reported cases during that same week, and a population of 0.88 million.^[5] Cases at that time were declining, so we won’t use a delay factor. The positive test rate was 4.3%, so we’ll use a 6x underreporting factor. Therefore, 749 reported cases * 6 = 4494 new infections last week. To get the Person Risk, divide by the population (in millions): 4494 infections / 0.88 million people = 5107.^[6] So a resident of San Francisco had a Person Risk of 5107, or a 5107-in-a-million chance of currently having COVID (for this particular week).

Comparing the above examples

5107-in-a-million (in San Francisco) is about 60 times higher than 84-in-a-million (in Sydney). So the average Person Risk in San Francisco is 60x as high as in Sydney!
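The San Francisco figure and the comparison can be checked the same way. This is a sketch using only the numbers quoted above: no delay factor is applied because cases were declining, and the Sydney value of 84 is taken as given.

```python
sf_reported_cases = 749
sf_underreporting_factor = 6
sf_population_in_millions = 0.88

sf_new_infections = sf_reported_cases * sf_underreporting_factor
sf_person_risk = sf_new_infections / sf_population_in_millions

sydney_person_risk = 84  # from the New South Wales example

ratio = sf_person_risk / sydney_person_risk

print(f"San Francisco new infections last week: {sf_new_infections}")
print(f"San Francisco Person Risk: about {sf_person_risk:.0f}-in-a-million")
print(f"San Francisco vs. Sydney: roughly {ratio:.0f}x")
```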

[1] The most-infectious period starts a couple days after infection, but the day-to-day noise in new case numbers is enough that “0-7 days ago” and “2-9 days ago” are unlikely to be meaningfully different. See Research Sources for more about the infectious period.
[2] As an example, New York City in March–April 2020 was completely overwhelmed by COVID, with widespread reports that even people with obvious and severe symptoms were unable to receive a test. We’ll look specifically at the five boroughs plus Westchester, Nassau, and Suffolk counties, an area containing 12.2 million residents. A survey for COVID antibodies in these counties performed between April 25–May 6 found that 23% of people had previously been infected, but according to the Johns Hopkins dashboard only 263,900 cases (2.2% of the area’s population) had been officially recorded by May 1.
[3] It's not very obvious from their website, but CovidActNow's daily numbers are smoothed by taking the average over the past 7 days. Thus, this calculation will correctly compute the number of cases last week, not just 7 times the number of cases yesterday. You might find that other sources of data do this as well.
[4] With this low of a positive test rate, an even lower underreporting factor is quite plausible, but we don’t have enough data to estimate just how low we should go.
[5] Tip: if your data source lists a “7-day moving average” of cases on a certain day, the number of cases in the preceding week is just 7 times that.
[6] This seems high to us: a 5107-in-a-million chance over a week-long period of getting COVID from being an average SF resident implies the average SF resident has a 23% annualized chance of getting COVID. That seems pretty bad. We really hope we’re wrong somewhere and the real number is lower; perhaps we don’t need as high as a 6x underreporting factor anymore?