How do you feel about red traffic lights: an instruction to stop immediately, or the basis for a negotiation? Most people stop most of the time, which is why traffic works; we obey the rules and we know that red means danger. We may not have the statistics at our fingertips, but we know that running a red light is a risky business. The US Federal Highway Administration reports that roughly 45% of car collisions take place at road intersections, and the cause is usually ignoring a stop sign or running a red light. In 2018, 846 people were killed and 139 000 injured in the USA in crashes that involved running a red light.1
Many systematic reviews, following a practice initiated in Cochrane reviews, now use a red–amber–green system to assess the risk of bias arising from the methods used in included studies. Green indicates no known risk of bias; amber (used when an item is not mentioned in a study) indicates an unknown level of risk of bias; and red indicates a methodological issue known to be associated with a high risk of bias.
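To make the traffic-light logic concrete, here is a minimal, purely illustrative sketch in Python; the domain names and the 'any red means high risk' aggregation rule are assumptions for illustration only, not any official Cochrane tool.

```python
# Illustrative only: a toy red-amber-green risk of bias summary for one study.
# Domain names and the aggregation rule are assumptions, not an official tool.

def overall_risk(domains: dict[str, str]) -> str:
    """Summarise per-domain traffic-light judgements.

    green = no known risk of bias
    amber = item not reported, so risk unknown
    red   = known high risk of bias
    """
    ratings = set(domains.values())
    if "red" in ratings:
        return "high risk"      # any red light: treat the study as high risk
    if "amber" in ratings:
        return "unclear risk"   # something unreported: risk is unknown
    return "low risk"           # green across the board

example_study = {
    "randomisation": "green",
    "blinding": "red",
    "study size": "amber",
    "handling of dropouts": "amber",
}
print(overall_risk(example_study))  # -> high risk
```

Whatever the exact rule, the point is the same: a single red light changes how much weight a study can carry.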
Assessing risk of bias is good, but how that assessment is used is crucial in determining the overall quality and reliability of the evidence, and the confidence we can have in it. Trustworthy evidence should be based on studies that are green across the board, with no likelihood of bias. That is a rare event. Most systematic reviews rely on small and flawed studies, and a red rating for high risk of bias occurs in almost every included study, often for more than one item.
This high risk of bias in individual studies is all too often ignored when conclusions or recommendations are made, but ignoring the red signals is a dangerous business. For example, experience shows that, in trials of the efficacy of interventions, methodological issues such as lack of randomisation, lack of blinding, small study size, and inappropriate imputation methods for handling data from dropouts can all result in a positive assessment of efficacy where none exists. They give the wrong answer: ignore these red lights at your peril. A systematic review that pronounces on efficacy when every included study carries at least one high risk of bias is like running a succession of red lights.
Ideally, a systematic review will highlight these problems and come to appropriate conclusions. For example, a comprehensive review of cannabinoids for pain included 37 randomised, double blind trials.2 None was green across the board with no risk of bias, and 28 had at least one red light. The appropriate conclusion was that ‘Studies … have unclear or high risk of bias, and outcomes had GRADE rating of low-quality or very low-quality evidence. We have little confidence in the estimates of effect.’
Trials may report their methods inadequately, but so do systematic reviews. For cannabinoids, there were more systematic reviews (57) than randomised trials (37). Confidence in the results of those systematic reviews, assessed using AMSTAR-2 definitions, was critically low (41), low (8), moderate (6) or high (2); 86% were inadequate.3 4 Low or critically low confidence in the results of systematic reviews assessed using AMSTAR-2 is all too common.
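As a quick arithmetic check, assuming that 'inadequate' means the critically low and low confidence categories combined, the quoted figure follows directly from those tallies:

```python
# Check of the AMSTAR-2 tallies quoted above; "inadequate" is assumed to mean
# the critically low and low confidence categories combined.
ratings = {"critically low": 41, "low": 8, "moderate": 6, "high": 2}
total = sum(ratings.values())                    # 57 systematic reviews
inadequate = ratings["critically low"] + ratings["low"]
print(round(100 * inadequate / total))           # 86 (%)
```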
Another growing concern is research misconduct, to the extent that we now recognise that fraudulent studies pollute our literature. About 2% of scientists have admitted to fabricating, falsifying or modifying data or results at least once, and 34% have admitted to other questionable research practices; those figures rise to 14% and 72% when scientists are asked about the behaviour of colleagues.5 Retraction Watch (https://retractionwatch.com/) is a website dedicated to highlighting fraud and research misconduct, now with over 20 000 retracted papers in its database. It came about because of a Japanese anaesthetist who faked, and subsequently had retracted, 183 randomised trials, mainly concerning anti-vomiting medications. This was known about for years before it came to light, and it is an interesting story.6–8 A blog at the journal Science provides a fascinating insight into retractions,9 and better tools to detect fraud are evolving.10
The scandal of poor medical research is well known.11 We are also becoming aware of the poor quality of most systematic reviews. Between 1986 and 2016, annual publication of systematic reviews increased by over 2700%, an increase 18 times greater than that for all studies. The 1400 systematic reviews published in 2000 grew to 29 000 by 2019, about 80 published each day.12 A superb critical analysis confirms a growing opinion that many systematic reviews are redundant and misleading; only 3% may be trustworthy and clinically useful.13 China is the largest producer of meta-analyses and the second largest producer of systematic reviews,14 with considerable concerns about their conclusions.13
These considerations, plus many others, place a huge burden on journal editors and peer reviewers. Plucking the useful 3% from the many systematic review submissions is hard work, but rewarding. One of the most cited articles in EJHP over the past 3 years was an overview review on medication adherence.15 It used evidence from eight good systematic reviews; evidence from a further 24 was excluded because of limited evidence, poor review methods, or both, meaning that 75% of our literature on this topic had to be discarded. But it was cited because quality was valued.
Journals are now beginning to demand more than manuscripts: for clinical trials, that means the raw data for independent analysis. For systematic reviews and meta-analyses, we also need raw data for independent analysis, together with the confidence to reject the inadequate. It is a big ask, with a big bill. The question is who should pay the insurance cost of running red lights.
For readers of systematic reviews, just bear in mind that red means stop.
Footnotes
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Provenance and peer review Commissioned; internally peer reviewed.