Background Interest in real-world evidence (RWE) has grown in recent years. RWE is usually generated from data derived from routine healthcare, such as electronic healthcare records and disease registries. While RWE has many advantages, it is often open to various biases, which may distort results. Appropriate understanding and interpretation are critical to the best use of RWE in healthcare decisions.
Methods On the basis of a literature review and empirical research experience, we summarised the concept and methodological framework of RWE, and discussed in detail methodological issues specific to routinely collected healthcare data and observational studies using such data.
Results RWE is derived from a spectrum of data generated in the real-world setting, using two broad study designs: observational studies and pragmatic clinical trials. Real-world data are usually collected through routine practice, or are sometimes actively collected for a research purpose. Observational studies using routinely collected data (RCD) are the most common type of RWE, although they are prone to biases. When planning and implementing RWE studies, coherent working steps are warranted, including defining a clear and answerable research question, developing a research team, selecting a fit-for-purpose data source, choosing a state-of-the-art study design, establishing a database with transparent data processing, performing multiple statistical analyses to control bias, and reporting results in accordance with established guidelines.
Conclusions The volume of RWE has grown over the years. The appropriate interpretation and use of such evidence often warrant an adequate understanding of the methodology. Researchers and policymakers should be aware of the methodological pitfalls when generating and interpreting RWE.
- evidence-based medicine
- practice guideline
- practice guidelines as topic
- research design
Data availability statement
No data are available. Not applicable.
In recent years, the concept of real-world evidence (RWE) has become widely accepted. In particular, the release of the 21st Century Cures Act in the USA fuelled interest in RWE among researchers and policymakers.1 RWE may have a wide spectrum of applications, such as understanding treatment patterns, informing treatment outcomes in vulnerable populations, and assessing treatment effects in real-world practice.
However, misunderstanding or confusion is still common concerning what RWE is and how one should interpret the evidence. For example, a common misconception about RWE is that it can only be generated using data from routine clinical care and does not involve new data collection under a predefined protocol.2 Another common misconception is that RWE merely refers to evidence generated from observational studies.3
In reality, the methodological framework for RWE is more complex than classical clinical trials. Lack of strong methodological and statistical expertise may sometimes lead to inappropriate handling of data, thus producing unreliable or even incorrect conclusions.3 Therefore, it is important to better understand the concept and methodological issues regarding RWE.
Conceptual framework of real-world evidence
The term ‘real-world evidence’ is often used to refer to clinical evidence about utilisation (eg, treatment pattern or compliance), benefits and harms of medical products in a defined population or a subgroup population. The evidence is typically derived from analyses of healthcare data outside of classical clinical trials.2
Data sources for generating RWE usually come in two main forms: routinely collected healthcare data (RCD) and actively collected healthcare data in routine clinical practice settings.4 While RCD, such as electronic medical records, claims data and health surveillance data, are generated from routine practice for non-research purposes, actively collected healthcare data are often collected for specific research purposes. Both forms are important sources of real-world data, and their common feature is that the data are derived from routine clinical practice.
RWE is derived from the analyses of real-world data, and in many cases is based on observational study designs. However, such studies are susceptible to bias owing to the complexity of the healthcare setting and the data, and to the observational nature of the design. Both researchers and evidence users must be highly cautious about observational studies using real-world data. It is also worth noting that RWE is not synonymous with observational studies. An interventional study can also be used to generate RWE, and one such design is a pragmatic clinical trial.3 5 This study design is often prioritised where evidence on the treatment effect in a heterogeneous population is urgently needed, the optimal treatment is largely unknown in routine practice, or medical needs (typically those related to patient welfare) are insufficiently met.6 In this paper, we specifically discuss issues about observational studies using real-world data, especially routinely collected healthcare data.
Issues about data sources: focusing on routinely collected data
Routinely collected healthcare data represent the most common type of real-world data. Because these data are typically collected in routine healthcare without a priori research purposes, their quality and applicability are often matters of methodological concern.7 8
The quality of RCD may be assessed in two dimensions: completeness and accuracy.9 Completeness refers to the extent to which data needed for the research are missing. For example, while information regarding cigarette smoking is important for many epidemiological studies, this information may often go unrecorded in routine practice.10 Missing data are inevitable in RCD. However, researchers often need to understand the extent to which important variables are missing and the potential reasons for the missingness. Another important dimension is accuracy. Information in electronic medical records, such as disease codes or numerical values, may sometimes be recorded inaccurately, and the underlying reasons may vary.11 Validation of data is often needed when applying RCD for research purposes, and this frequently involves manual checking.12
One should also assess the relevance of data. In the generation of RWE, the choice of data should always be made according to predefined research purposes.13 For example, claims data may be more suitable for studies on health economics and treatment patterns; however, they may not provide sufficient information on patient characteristics, laboratory results or clinical endpoints, which are crucial for studies assessing treatment effects.14 In another example, spontaneous adverse event report databases may often be used for detecting a signal of adverse events or generating hypotheses, but are of limited relevance for testing a hypothesis about an adverse drug reaction. In a third example, electronic health records contain abundant clinical information, such as operation records, imaging and laboratory results. They are useful data sources for answering a wide spectrum of clinical questions, ranging from disease burdens to prognoses, but often lack regular follow-up visits.7
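As a minimal sketch of a completeness check, the share of records in which each key variable is missing could be tabulated before a data source is chosen. The record layout, the variable names (`age`, `smoking_status`) and the helper `missing_fraction` are hypothetical and serve only as an illustration; they are not taken from any specific database.

```python
from typing import Any, Dict, List


def missing_fraction(records: List[Dict[str, Any]], variables: List[str]) -> Dict[str, float]:
    """Return, for each variable, the share of records where it is absent or None."""
    if not records:
        raise ValueError("records must not be empty")
    n = len(records)
    report = {}
    for var in variables:
        # A variable counts as missing if the key is absent or its value is None.
        missing = sum(1 for record in records if record.get(var) is None)
        report[var] = missing / n
    return report


# Hypothetical extract from an electronic medical record system.
records = [
    {"patient_id": 1, "age": 54, "smoking_status": "current"},
    {"patient_id": 2, "age": 61, "smoking_status": None},
    {"patient_id": 3, "age": 47},  # smoking status never recorded
    {"patient_id": 4, "age": None, "smoking_status": "never"},
]

completeness_report = missing_fraction(records, ["age", "smoking_status"])
print(completeness_report)
```

A table of this kind only quantifies how much is missing; the reasons for missingness still need to be investigated separately.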
In order to enhance the use of real-world data, several guidance documents are readily available that discuss the key issues about data sources for pharmacoepidemiology studies.15–17
Observational studies using routinely collected data
Observational studies are the most common approach to using routinely collected data. A common research flow may be used when planning and implementing such studies (figure 1).
Research question, study planning and design
In observational studies using RCD, the initial step is to specify a clear and answerable research question that contains the key components, including population, exposure, comparator (if applicable), outcome and timing. A multidisciplinary team is usually formed to take responsibility for the planning, design, and implementation of a study. In the study planning, the research team needs to identify potential data sources and determine the appropriateness of the data. The appropriateness of the data often varies by research question. However, it may commonly be assessed in dimensions including representativeness, size of data, availability, completeness and accuracy of key research variables, and duration of database coverage.18
In observational studies using RCD, study designs may be highly variable and are typically retrospective in nature. Retrospective cohort studies, case–control studies or nested case–control studies are the most frequently chosen epidemiological designs in assessing effects of drug treatments. However, these designs are usually subject to selection bias and measurement bias, both of which may distort the estimates of drug treatment effects, and even reverse the direction of the effects. Many forms of selection bias have been identified in studies using RCD,19 and indication bias is among the most common selection biases that warrant strong attention.20 Another common bias is time-dependent bias, such as immortal time bias and time-lag bias, which may derive from a wrongly defined timeframe of the exposure group (eg, a waiting time between initiation of follow-up and treatment inappropriately assigned to the exposed group).21 There is an extensive literature discussing the different forms of selection bias,22–24 and interested researchers may find it helpful in designing their studies. In general, a new user design, a treatment-naïve new user design or an active comparator is often a desirable strategy to resolve some of these important biases.25 26 New user designs align the exposure and comparator groups at the same initiation time, while active comparators restrict participants to those with the same indication.
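The immortal time problem described above can be illustrated with a small person-time calculation. The sketch below uses a hypothetical six-patient cohort and hypothetical helper names (`Patient`, `person_time`, `rate_ratio`); it assumes follow-up starts at cohort entry (day 0) and that every death among treated patients occurs after treatment initiation. It compares a misclassified analysis, where the waiting time before treatment is counted as exposed, with an analysis that assigns that waiting time to the unexposed group.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Patient:
    followup_days: int                  # days from cohort entry to death or censoring
    treatment_start_day: Optional[int]  # None means never treated
    died: bool


def person_time(patients: List[Patient], split_at_initiation: bool) -> Tuple[int, int]:
    """Return (exposed_days, unexposed_days).

    With split_at_initiation=False, all follow-up of ever-treated patients is
    counted as exposed, which wrongly includes the waiting time before treatment.
    """
    exposed = unexposed = 0
    for p in patients:
        if p.treatment_start_day is None:
            unexposed += p.followup_days
        elif split_at_initiation:
            unexposed += p.treatment_start_day
            exposed += p.followup_days - p.treatment_start_day
        else:
            exposed += p.followup_days
    return exposed, unexposed


def rate_ratio(patients: List[Patient], split_at_initiation: bool) -> float:
    """Mortality rate ratio, exposed versus unexposed."""
    exposed_days, unexposed_days = person_time(patients, split_at_initiation)
    exposed_events = sum(1 for p in patients if p.died and p.treatment_start_day is not None)
    unexposed_events = sum(1 for p in patients if p.died and p.treatment_start_day is None)
    return (exposed_events * unexposed_days) / (unexposed_events * exposed_days)


# Hypothetical cohort, for illustration only.
cohort = [
    Patient(400, 100, False),
    Patient(300, 200, True),
    Patient(500, 100, False),
    Patient(100, None, True),
    Patient(200, None, False),
    Patient(300, None, True),
]

misclassified = rate_ratio(cohort, split_at_initiation=False)
corrected = rate_ratio(cohort, split_at_initiation=True)
print(misclassified, corrected)
```

In this toy cohort the misclassified analysis makes the treatment look more protective than the corrected one, because 400 days during which treated patients could not, by definition, have died while exposed are credited to the exposed group.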
Developing research dataset from the RCD
On completion of study planning and design, a research dataset should be established. As RCD are collected for administrative purposes, they are not directly usable for observational studies in their original form. It is therefore necessary to transform the data into a uniform, structured format. The transformation of RCD into a research dataset may involve multiple steps, such as data linkage, structuring of free text and variable labelling.
Additional data cleaning is also an essential part of building a research dataset. This process often includes establishing variable dictionaries and processing special data (ie, extreme values, outliers, missing values and contradictory data). Notably, raw data, detailed cleaning rules, and data processing procedures should be retained to ensure the transparency of the study.
A specific question in using RCD is how to frame operational phenotyping algorithms (computer-executable definitions that use diagnosis codes, clinical markers, or demographic characteristics) for identifying research variables, including exposure, outcome and covariates.27 The validity and reliability of these codes or algorithms are critical.
Statistical analysis in observational studies should be mindful of controlling for confounding factors. Confounding is very common in observational studies, and many types of confounding may be present in the use of RCD for assessing drug treatment effects, for example, time-dependent confounding and unmeasured confounding. These issues may often distort the estimated treatment effects.19 20 28–30 Various methods have been developed to address confounding, such as multivariable models, propensity score analysis and instrumental variable analysis.31–33 Guidance is available for the use of sophisticated statistical methods in the analysis of RCD.34
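One way to keep cleaning transparent is to apply named rules to a copy of the data and to log every change, so that the raw records and the rules remain available for audit. The following is a minimal sketch under stated assumptions: the field names (`sbp_mmhg`, `admit_day`, `discharge_day`), the plausibility range of 50 to 300 mm Hg and the helper `clean` are all hypothetical and chosen purely for illustration.

```python
import copy
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
# Each rule: (rule identifier, function returning True if violated, field to blank out)
Rule = Tuple[str, Callable[[Record], bool], str]

RULES: List[Rule] = [
    (
        "sbp_out_of_range",  # hypothetical plausibility limits for systolic blood pressure
        lambda r: r.get("sbp_mmhg") is not None and not (50 <= r["sbp_mmhg"] <= 300),
        "sbp_mmhg",
    ),
    (
        "discharge_before_admission",  # contradictory dates
        lambda r: r.get("admit_day") is not None
        and r.get("discharge_day") is not None
        and r["discharge_day"] < r["admit_day"],
        "discharge_day",
    ),
]


def clean(raw: List[Record], rules: List[Rule]) -> Tuple[List[Record], List[Dict[str, Any]]]:
    """Return a cleaned copy of the data and an audit log; the raw data are not modified."""
    cleaned = copy.deepcopy(raw)
    audit: List[Dict[str, Any]] = []
    for row_index, record in enumerate(cleaned):
        for rule_id, is_violated, field in rules:
            if is_violated(record):
                audit.append(
                    {"row": row_index, "rule": rule_id, "field": field, "original": record[field]}
                )
                record[field] = None  # set to missing rather than guessing a value
    return cleaned, audit


raw_records = [
    {"patient_id": 1, "sbp_mmhg": 128, "admit_day": 3, "discharge_day": 9},
    {"patient_id": 2, "sbp_mmhg": 1280, "admit_day": 5, "discharge_day": 7},
    {"patient_id": 3, "sbp_mmhg": None, "admit_day": 10, "discharge_day": 4},
]

cleaned_records, audit_log = clean(raw_records, RULES)
for entry in audit_log:
    print(entry)
```

Setting implausible values to missing is only one possible policy; whichever policy is used, the point of the sketch is that the rule set and the log document it.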
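To make the idea of a computer-executable definition concrete, the sketch below flags a patient when either a diagnosis code or a laboratory marker meets a criterion, and then checks the rule against a reference standard. The rule itself (an ICD-10 code beginning with E11, or any HbA1c value of at least 6.5%), the record layout, and the chart-review labels are assumptions made for illustration; this is not a validated algorithm.

```python
from typing import Any, Dict, List, Tuple

Patient = Dict[str, Any]


def has_phenotype(patient: Patient, code_prefixes: Tuple[str, ...],
                  lab_name: str, lab_threshold: float) -> bool:
    """Flag the patient if any diagnosis code matches or any lab value meets the threshold."""
    code_hit = any(
        code.startswith(prefix)
        for code in patient.get("diagnosis_codes", [])
        for prefix in code_prefixes
    )
    lab_hit = any(value >= lab_threshold for value in patient.get("labs", {}).get(lab_name, []))
    return code_hit or lab_hit


def validate(patients: List[Patient], flags: List[bool]) -> Tuple[float, float]:
    """Return (positive predictive value, sensitivity) against chart-review labels."""
    true_pos = sum(1 for p, f in zip(patients, flags) if f and p["chart_review"])
    flagged = sum(flags)
    true_cases = sum(1 for p in patients if p["chart_review"])
    return true_pos / flagged, true_pos / true_cases


# Hypothetical patients with a hypothetical chart-review reference standard.
patients = [
    {"diagnosis_codes": ["E11.9"], "labs": {"hba1c": [7.2]}, "chart_review": True},
    {"diagnosis_codes": ["I10"], "labs": {"hba1c": [5.4]}, "chart_review": False},
    {"diagnosis_codes": ["I10"], "labs": {"hba1c": [6.8]}, "chart_review": True},
    {"diagnosis_codes": ["E11.65"], "labs": {}, "chart_review": False},
    {"diagnosis_codes": [], "labs": {}, "chart_review": True},
]

flags = [has_phenotype(p, ("E11",), "hba1c", 6.5) for p in patients]
ppv, sensitivity = validate(patients, flags)
print(flags, ppv, sensitivity)
```

Reporting measures such as positive predictive value and sensitivity alongside the algorithm lets readers judge how much misclassification the derived variable may carry.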
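As a simple illustration of propensity score analysis, the sketch below applies inverse probability of treatment weighting to a hypothetical cohort with one measured binary confounder (disease severity). The propensity score is estimated as the proportion treated within each severity stratum, which is an assumption that suits this toy example; in practice it is usually estimated with a regression model over many covariates. All counts and the helper names (`expand`, `crude_risk_difference`, `ipw_risk_difference`) are made up, and the approach does not address unmeasured confounding.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, int, int]  # (severe, treated, outcome), each coded 0 or 1


def expand(severe: int, treated: int, n: int, events: int) -> List[Row]:
    """Build n individual rows, of which `events` have the outcome."""
    return [(severe, treated, 1)] * events + [(severe, treated, 0)] * (n - events)


def crude_risk_difference(rows: List[Row]) -> float:
    """Unadjusted risk in the treated minus risk in the untreated."""
    risk = {}
    for arm in (0, 1):
        outcomes = [y for _, t, y in rows if t == arm]
        risk[arm] = sum(outcomes) / len(outcomes)
    return risk[1] - risk[0]


def ipw_risk_difference(rows: List[Row]) -> float:
    """Risk difference after inverse probability of treatment weighting."""
    n_by_stratum: Dict[int, int] = defaultdict(int)
    treated_by_stratum: Dict[int, int] = defaultdict(int)
    for severe, treated, _ in rows:
        n_by_stratum[severe] += 1
        treated_by_stratum[severe] += treated
    # Propensity score: probability of treatment given the confounder.
    # Assumes every stratum contains both treated and untreated patients.
    ps = {s: treated_by_stratum[s] / n_by_stratum[s] for s in n_by_stratum}

    weighted_events = {0: 0.0, 1: 0.0}
    weight_total = {0: 0.0, 1: 0.0}
    for severe, treated, outcome in rows:
        weight = 1 / ps[severe] if treated == 1 else 1 / (1 - ps[severe])
        weighted_events[treated] += weight * outcome
        weight_total[treated] += weight
    return weighted_events[1] / weight_total[1] - weighted_events[0] / weight_total[0]


# Hypothetical cohort: severe patients are treated more often and have worse
# outcomes, but within each stratum the treatment makes no difference.
cohort = (
    expand(severe=0, treated=1, n=20, events=2)
    + expand(severe=0, treated=0, n=80, events=8)
    + expand(severe=1, treated=1, n=80, events=40)
    + expand(severe=1, treated=0, n=20, events=10)
)

crude = crude_risk_difference(cohort)
adjusted = ipw_risk_difference(cohort)
print(round(crude, 3), round(adjusted, 3))
```

Here the crude comparison suggests that treatment raises risk, while the weighted comparison recovers the null effect that was built into the strata, which is the kind of distortion by indication that the paragraph above describes.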
Given these methodological challenges in observational studies, both regulatory decision-makers and academic experts are committed to developing methodological guidelines about observational studies using RCD.13 15 25 35–39 It is always recommended that researchers should develop a research protocol for any study.25
Complete and transparent reporting is essential for evaluating the reliability and validity of study findings. However, the reporting quality of observational studies using RCD is often suboptimal,40 especially in the elaborations of research questions, type of data sources, time frames, study designs, and statistical models.40 41 Several guidelines have been developed to enhance reporting, such as Strengthening the Reporting of Observational Studies in Epidemiology (STROBE),42 the Reporting of studies Conducted using Observational Routinely-collected Data (RECORD) statement,43 and its extension for pharmacoepidemiology studies (RECORD-PE).44 Interested researchers should always consult these guidelines for reporting of their studies.
In this paper, we provide a snapshot of the concepts and key methodological issues for RWE. For researchers, real-world data have provided important data sources to address a variety of questions. Nevertheless, important methodological challenges may be present, and careful planning, implementing and reporting of such studies are highly desirable. The users of RWE should also be cautious when interpreting the findings from such studies and should always be aware of the potential methodological pitfalls.
What this paper adds
What is already known on this subject
The release of the 21st Century Cures Act in the USA has accelerated the interest in real-world evidence (RWE), especially among healthcare researchers and policymakers.
Misunderstanding of RWE and a lack of methodological know-how are common.
What this study adds
This paper summarises the conceptual framework of RWE and proposes a research flow to assist in the understanding and implementation of an RWE study.
This paper provides an overview of pitfalls inherent in RWE, especially in observational studies using routinely collected healthcare data, and offers references to guidance documents about reporting.
Patient consent for publication
EAHP Statement 6: Education and Research.
Contributors Conceptualisation: XS. Writing – original draft: ML. Writing – review and editing: XS, ML, YQ, WW. XS is the guarantor who takes responsibility for the overall content.
Funding This research was supported by Sichuan Youth Science and Technology Innovation Research Team (Grant No. 2020JDTD0015), 1.3.5 project for disciplines of excellence, West China Hospital, Sichuan University (Grant No. ZYYC08003), and China Medical Board (Grant No. CMB19-324).
Competing interests None declared.
Provenance and peer review Commissioned; internally peer reviewed.