Evidence of methodological bias in analyzing contact tracing app efficacy

Paul-Olivier Dehaye
Mar 2, 2021

In this article, we review a preprint which claims essentially that the digital contact tracing app SwissCovid works. That preprint has received some press coverage, certainly in Switzerland but also for instance in Nature. We therefore feel it is urgent to correct the record.

Here we re-analyze the data and show that the preprint's main claim does not hold up. Instead, we show that the data says something different, and more profound. We think this is actually an important lesson, and a timely and hopeful message in this pandemic.

We also briefly discuss the main UK study on the NHS contact tracing app, and the possibility it would suffer from a similar problem.

Digital proximity tracing app notifications lead to faster quarantine in non-household contacts: results from the Zurich SARS-CoV-2 Cohort Study

What the data says

The paper is based on data collected through the cohort study Corona Immunitas in Canton Zürich. Its goal is to analyze SwissCovid, the Swiss Bluetooth contact tracing app, and particularly the impact the app has on the time it takes for secondary contacts to go from exposure into quarantine. In case of an exposure, the theory would be that an app notification would arrive faster than public health authorities are able to contact trace manually, and therefore lead people to put themselves in quarantine earlier.

To achieve its objective, the paper compares two groups, sampled from secondary contacts known to the local health authority and who have agreed to participate in the cohort study.

Group A consists of non-household secondary contacts who received an app notification.

Group B consists of non-household secondary contacts who did not receive an app notification.

The reason to dismiss household contacts is that the app is not expected to play any role in those cases: if you test positive, you tell your household members right away, before public authorities even had a chance to make a move, or before you even received a code triggering the app notification. The data indeed confirms that intuition for household contacts.

Instead, the paper focuses on non-household contacts who will eventually have been reached by manual contact tracers (this excludes the very hard situations for contact tracing, for instance in public transport, which could conceivably be caught by a contact tracing app).

For our case of interest here, non-household contacts traced through MCT, the data shows that “non-household contacts who were notified by the app [Group A] started quarantine at a median of 2 days after exposure, while those not notified started quarantine at a median of 3 days [Group B]”.

This is summarized in the graph below.

Information about the median is captured on the 50% horizontal line. We see that for the household setting, the median time from exposure to quarantine is 0.5 days for app-notified contacts and 1 day for contacts not notified by the app. For non-household contacts, the blue (Group A) and red (Group B) lines indeed intersect the 50% horizontal line at 2 and 3 days respectively.

The paper also offers Supplementary Materials, which we will get back to later.

What the paper claims

Based on the data above, the paper repeatedly makes the same claim in different ways, for instance:

  • “We found evidence for a possible time advantage through the app in non-household setting, with app-notified contacts entering quarantine on average one day earlier than those not notified by the app.”
  • “This finding supports the hypothesis that receiving an app notification may lead to a shorter time between exposure and quarantine.”
  • The title of the paper itself: “Digital proximity tracing app notifications lead to faster quarantine in non-household contacts: results from the Zurich SARS-CoV-2 Cohort Study”

All these statements are very appealing, of course, and the title does not leave much doubt about how the authors interpret the data. As we will see, these statements unfortunately do not reflect the data accurately.

What needs to be understood

Separately, the paper also explains that among the 43 app-notified non-household contacts (Group A), only 8 reported having received the app notification before they were contacted by official Manual Contact Tracing (MCT).

This quite directly shows that the app notifications are slower than manual contact tracing, the exact opposite of the main claim of the paper, as relayed in its title. How is this possible?

Digging into the Supplementary Materials, and particularly Appendix 6, we find the following table:

Appendix 6, “GD” stands for “Gesundheitsdirektion”, i.e. “Health Directorate”. The rows of most interest are the first and second. The columns of most interest are best thought of as B, A+ and A-, as explained in the text. Note the strange “grouping” strategy around the (*) symbol.

Note first the slight discrepancies in the numbers. The first column in Appendix 6 (n=141) corresponds in the main text to Group B (n=138). The second column corresponds to those in Group A who received a notification before being traced (A+, n=8), while the third corresponds to those who received a notification after being traced (A-, n=34). Again, 34+8=42 shows a slight discrepancy compared to the size of Group A announced in the main text (n=43). We will gloss over this and assume it is due to slightly different versions of the data reporting.
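The bookkeeping in the previous paragraph can be checked with a few lines of arithmetic. This is only a sketch: the variable names are ours, and the counts are the ones quoted in this article from the main text and from Appendix 6, not values read from the underlying dataset.

```python
# Group sizes as quoted in this article (not taken from the raw data).
main_text = {"A": 43, "B": 138}             # main text of the preprint
appendix6 = {"B": 141, "A+": 8, "A-": 34}   # columns of Appendix 6

# Group A in Appendix 6 is split into early (A+) and late (A-) notifications.
appendix_a_total = appendix6["A+"] + appendix6["A-"]

# Discrepancies between the two sources.
discrepancy_a = main_text["A"] - appendix_a_total
discrepancy_b = appendix6["B"] - main_text["B"]

# Share of Group A notified by the app before manual contact tracing,
# using the main-text denominator (8 out of 43).
share_notified_first = appendix6["A+"] / main_text["A"]

print(f"Group A in Appendix 6: {appendix_a_total} (main text: {main_text['A']})")
print(f"Discrepancies: A={discrepancy_a}, B={discrepancy_b}")
print(f"Notified by the app before MCT: {share_notified_first:.1%}")
```

Whichever denominator is used (42 or 43), fewer than one in five app-notified contacts reported that the notification came first.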

This supplementary data should now help us understand how this is all possible.

Biased sampling

First, looking back at the plot above, there does seem to be a signal in that data. The two time-to-quarantine curves do look different in Groups A and B. Narrowing down on the first two rows in Appendix 6 provides some good hints as to why this is happening (even though it doesn’t provide time-to-quarantine information).

First off, we need to think about the asterisk (*), which is indicated to mean “Includes a SwissCovid app notification”. Group B was never notified by the app, so the asterisk is not relevant to the first column. Group A- was app-notified after being MCT-contacted. The only case where SwissCovid could have made a difference in Group A- is thus if someone had received a phone call from contact tracing authorities, decided to ignore it, and then received a notification from the app and acted on that notification instead. It would mean that this individual trusted their device more than the contact tracers finally reaching them after a long and deliberate human process. Given what we know of the app’s accuracy, it would be a terrible outcome if people behaved this way. The scenario is so outlandish that we will simply assume it did not happen in the data at hand.

So the only place the asterisk is applicable is in column A+. We thus split the row “Self-Quarantine(*)” into “Non SwissCovid self-quarantine” and “SwissCovid quarantine”, and bunch up all the less relevant rows into one. We consolidate all we know into this table, where the two “?” sum up to 6.

The two “?” in the middle sum up to 6, all the other numbers are certain based on the data we have.

Ignore column A+ for a second. We see that:

  • for non-app notified people (B), 67 have been put in quarantine by health authorities while 43 have non-SC-self-quarantined.
  • for late-app notified people (A-), 15 have been put in quarantine by health authorities while 14 have non-SC-self-quarantined.

This is a statistically significant finding, and definitely worth a serious explanation. Notably, SwissCovid does not impact these numbers in its theorized way of denting the epidemic: all those people were app-notified either never or late, and in any case after the quarantine started! Instead, the only impact of SwissCovid on this part of the study is selection bias through the app, and the exact mechanism gets very subtle. Indeed, presence in column A- is also informative about the index case (they must have had SwissCovid active in order to send the notification, even if that notification arrived late).
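To make the comparison between columns B and A- concrete, the counts from the two bullets above can be turned into shares. This is a minimal sketch with names of our own choosing. It assumes we restrict attention to the two quarantine reasons quoted above (so the denominators are 110 and 29, not the full column sizes of 141 and 34), and it only reproduces the arithmetic: it does not run a significance test.

```python
# Counts quoted in the bullets above, restricted to two quarantine reasons.
counts = {
    "B":  {"authority": 67, "self": 43},   # never app-notified
    "A-": {"authority": 15, "self": 14},   # app-notified after MCT contact
}


def self_quarantine_share(group: str) -> float:
    """Share of non-SwissCovid self-quarantine among the two reasons."""
    c = counts[group]
    return c["self"] / (c["authority"] + c["self"])


def self_quarantine_odds(group: str) -> float:
    """Odds of self-quarantine versus quarantine ordered by authorities."""
    c = counts[group]
    return c["self"] / c["authority"]


share_b = self_quarantine_share("B")
share_a_minus = self_quarantine_share("A-")
odds_ratio = self_quarantine_odds("A-") / self_quarantine_odds("B")

print(f"Self-quarantine share, B:  {share_b:.1%}")
print(f"Self-quarantine share, A-: {share_a_minus:.1%}")
print(f"Odds ratio (A- vs B):      {odds_ratio:.2f}")
```

A formal test would need the complete Appendix 6 table, including the rows we bunched together.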

Main finding: There are statistically significant discrepancies in the reasons for app users to go into quarantine, depending merely on whether they and their index case have installed the app or not.


How could it be that so many late-notified app users would go into self-quarantine?

We allow ourselves to speculate.

We feel the most natural explanation would be that individuals in group A- are more likely to be reached through a form of peer-to-peer/DIY contact tracing: simply put, their index cases, who have themselves installed the app, are more likely to reach out to non-household contacts upon diagnosis, and these contacts (thus members of group A-) are more likely to quarantine themselves as a consequence.

The app is not evenly distributed across the population. This will create biases, worth thinking about on both the index and the contact side:

  • app users are on average wealthier and hence less likely to suffer existential financial harm from quarantine;
  • app users are more likely to trust the government, which might affect for instance the seriousness with which they might evaluate the need for secondary contacts to avoid interacting with others;
  • app users are more likely to adopt prosocial behavior on their own (surely a study has shown this, but none springs to mind at the moment);
  • app users know and care about contact tracing, by definition and at least minimally;
  • app users are more likely to be conscious about the pandemic, which might make them seek a test upon symptoms quicker, which in turn would affect the main metric used in the article.

We feel those characteristics make it more likely that these individuals would, once a person tests positive, themselves engage in peer-to-peer tracing and certainly that they would commit to self-quarantine, at least in anticipation of an official call.

Hopeful lesson

We deduce from all this a hopeful message: individuals can have a real impact by reaching out to friends and coworkers, and telling them about the risk they might have been exposed to, and the data seems to indicate some are actually engaging in this behaviour.

There is a slice of the population that is willing and ready to help. They might need support, but they should absolutely be encouraged to maintain the health of their community.

What about the original study?

Our main finding was: There are statistically significant discrepancies in the reasons for app users to go into quarantine, depending merely on whether they and their index case have installed the app or not.

Corollary: You can’t say anything of significance regarding time to quarantine upon notification if you don’t first acknowledge and understand the reasons for late-notified app users to go into quarantine.

In other words, column A-’s selection bias effect will also be present in column A+’s data, and might significantly affect how the 6 cases are distributed between the two “?”. Certainly you can’t deduce anything about those quickly-app-notified users from the evidence presented.

I can definitely admit it is not easy to wrap one’s brain around those issues. It really doesn’t come naturally to me either.

“Fortunately”, I have found one case which exactly matches the situation described here: a non-household contact going into quarantine because they were informed by the index case directly, and receiving the notification afterwards.

In Von Wyl et al’s study, that contact would be in Group A-, and would be chalked up as a success.

I am currently waiting for this person to get better, before asking them if I can post their story publicly. I believe it would be worth doing, as it would anchor the discussion in a concrete case. [UPDATE: This person has blocked me on Twitter. This happened when I asked them if we could exchange Direct Messages.]

Update (2021.04.23)

It looks like Von Wyl acknowledges now (2021.04.23) the flaws of this study, at least in its present state. At 35:05 below he concedes: “What I also need to mention here is... We do see this time advantage [but] this is also early[?] days. We have not been able to dissect each and every secondary survey. This is still ongoing, but what is kind of confusing is that about 8 of those 43 specifically mentioned that they received the exposure notification before they were called up by manual contact tracing. So there is an accumulation of those people in those groups but then you still have a few [sic, given it’s 35/43!?] where manual contact tracing was first and then exposure notification was second. So the picture is still a bit blurry, but overall I think we are getting closer and we are doing additional analysis”.

The UK app study

In “The epidemiological impact of the NHS COVID-19 App”, Wymant et al. measure the impact of the UK app through two different and fairly indirect means. They arrive at the conclusion that hundreds of thousands of cases have been averted thanks to the UK app.

We feel that the effect (evidenced here) of increased under-the-radar and peer-to-peer non-household contact tracing among app users would be very hard to distinguish from the anticipated mechanism of action of the app. Indeed, where there are apps, there are app users, who themselves behave differently once infected!

Even if the app and the tracing worked better in the UK than Switzerland, this would still make it very hard to pose the correct counterfactuals in assessing the utility of the UK app: presumably app users would be as motivated by civic duty to engage in peer-to-peer non-household tracing if there was no app. This form of tracing could be significantly more impactful than the range of cases that are impossible to address through manual contact tracing, such as Bluetooth tracing in public transport.

We summarize now what is known about the UK study:

  • It cannot directly distinguish the impact of the app from the impact the app users achieve from performing peer-to-peer contact tracing.
  • The latter effect is real, based on what is presented here.
  • The UK study presents a complex placebo regression on top of a natural experiment. Such experiments are very prone to confounding bias, and require explicit handling of confounding effects, which, again, hasn’t been done.
  • The Swiss study is much simpler, and this effect has nevertheless tripped up its authors.

We encourage any statistician to reanalyze this data, including the richer stratification data present in the Supplementary Materials of the Swiss study.

Update (2021.06.24)

In an interview, Marcel Salathé commented (at 4000s) on the study discussed here. He specifically says that not enough data was collected in order to be conclusive. Note that there might not be sufficient data to be conclusive on the delay (only 8 people in one of the two groups, after all), but there is enough to be conclusive on the fact that usage of SwissCovid correlates with earlier quarantine (maybe because of a common cause in higher adherence to prosocial behaviours).

Update (21 Aug 2021)

This paper has now been peer-reviewed and published.

Update (19 Jan 2022)

I finally did get access to the data underlying the study, and wrote about it in a separate blog post.


