Study reveals inaccuracies in databases about new coronavirus

An analysis of the databases of the WHO, the European Centre for Disease Prevention and Control and the Chinese Center for Disease Control and Prevention found “a lot of inconsistencies” in the data on Covid-19.

The story, first reported by Público on Sunday, described “errors and discrepancies” between the platforms of the World Health Organization, the European Centre for Disease Prevention and Control and the Chinese Center for Disease Control and Prevention, which aggregate data from several countries on infections by the new coronavirus. Examples include negative numbers entered in the records and dates that do not match.

Speaking to Lusa, researcher Jorge Bravo, who co-authored the article with Afshin Ashofteh, published in the Statistical Journal of the International Association for Official Statistics, explained that “there were a lot of inaccuracies, a lot of inconsistencies between the three big databases”.

“Some countries, for example, reported negative deaths, which is an impossibility”, he said, adding that in the sample he studied, “which was already significant, there were, in some cases, significant inaccuracies”.

The study covered the period “from the beginning of the pandemic, until mid-April”, but the researchers intend to “follow up on the initial study”, replicating what was done “with more months of observation and with more countries”.

“But what we found was that the errors did not diminish as the pandemic spread; on the contrary. With more countries reporting to the WHO and these agencies, errors have increased. There may have been less preparation in the initial phase, with the agencies preparing and adjusting to needs over time, but what we found was that the more countries reported, the more problems we encountered”, said the professor at Universidade Nova de Lisboa.

Jorge Bravo stressed that, “basically, the epidemiological models that are being used to take various measures, such as lockdown and, later, the easing of lockdown, such as the reopening of shops and schools, various measures that continue to be taken by governments and health authorities (…) are estimated based on incorrect data”.

“These procedures of populating databases, compiling information locally and then aggregating everything and reporting internationally, are processes that involve the human factor”, the expert said, pointing to this as one of the reasons for the errors.

Another problem is that “not all countries were reporting data digitally, with a file that could be aggregated and have a continuous series”.

“There were countries that reported, like the DGS [Directorate-General for Health], only the reports, in PDF (…). Other countries follow the same procedure and, when the figures are transposed to an aggregated database, the process is very susceptible to data-entry errors, typing errors, etc.”, he said.

The solution involves a validation system, which “can be done using specialized people or including the new mechanisms, using artificial intelligence or computational algorithms, which cross-check data”.

“Often, the human factor is important for investigating, for example by calling the country to alert it. It is a normal process, carried out by bodies responsible for compiling statistical information. It is hard to understand how such gross errors in information have arisen here, and continue to do so”, concluded the researcher.
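
As an illustration only, the kind of automated cross-check the researcher describes could look something like the Python sketch below. It is not the method used in the study. The function names, the data layout (cumulative deaths keyed by ISO date) and the sample figures are all invented for this example. It flags two of the problems mentioned in the article: a cumulative total that falls from one day to the next, which implies a negative daily count, and dates on which two sources disagree.

```python
from typing import Dict, List


def negative_daily_counts(cumulative: Dict[str, int]) -> List[str]:
    """Return the dates on which the cumulative total dropped,
    i.e. the implied daily count is negative."""
    dates = sorted(cumulative)  # ISO dates (YYYY-MM-DD) sort chronologically
    flagged = []
    for prev, curr in zip(dates, dates[1:]):
        if cumulative[curr] - cumulative[prev] < 0:
            flagged.append(curr)
    return flagged


def cross_source_mismatches(a: Dict[str, int], b: Dict[str, int]) -> List[str]:
    """Return the dates reported by both sources where the values differ."""
    return sorted(d for d in a.keys() & b.keys() if a[d] != b[d])


# Invented sample data for one hypothetical country (not real figures).
source_a = {"2020-03-01": 10, "2020-03-02": 14, "2020-03-03": 12}
source_b = {"2020-03-01": 10, "2020-03-02": 15, "2020-03-03": 12}

print(negative_daily_counts(source_a))              # dates to query with the country
print(cross_source_mismatches(source_a, source_b))  # dates where the sources disagree
```

Checks like these only flag suspect records; as the researcher notes, someone still has to contact the reporting country to find out which figure is right.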

Even so, Jorge Bravo believes that the institutions are “learning from what is happening” and “are more than ever aware of the importance of official information being transparent, timely and credible”.

