

Chapter 5: Appraising the evidence
  1. Phil Wiffen1,
  2. Tommy Eriksson2,
  3. Hao Lu3

  1Pain Research Unit, Churchill Hospital, Oxford, UK
  2Department of Clinical Pharmacology, Laboratory Medicine, Lund University, Lund, Sweden
  3Department of Pharmacy, Beijing United Family Hospital, Beijing, China

  Correspondence to Professor Phil Wiffen, Pain Research Unit, Churchill Hospital, Oxford OX3 7LE, UK; phil.wiffen{at}


Evidence-based Pharmacy was first published as a book by Phil Wiffen in 2001. The first chapter was published in Eur J Hosp Pharm 2013;20:308–12.

In the first edition of Evidence-based Pharmacy, Chapter 5 covered issues on accessing evidence. Since that time, accessing information via search engines on the internet has become commonplace such that it no longer warrants a separate chapter.

This chapter now deals with tools that can be used to assess the quality of evidence. While we may think that quality is assured by processes such as peer review, or even by the reputation of a particular journal, in reality that is often not the case. It is important that we make our own judgement before using an article in our practice. In this chapter, tools to help with that appraisal process are presented for a range of study designs. At the end of this article, we present some thoughts on appraising patient-oriented outcomes.

We present an algorithm to help identify study designs (figure 1). This is based on an idea from SIGN.1

Figure 1

Algorithm of study designs.

Observational and experimental research

There are two ways to test a hypothesis, either by observation or by experiment.

Observational research

In observational research, the researcher observes a population or group of patients, or analyses data about these patients. Such data may come from interviews or from existing data sets such as prescription analysis data, or from registries for particular diseases such as cancer or infectious disease. Surveys and case control studies are examples of observational research. A great deal of the research around pharmacy services is, of necessity, observational in style. Observational research is more prone to bias, but it is still necessary and is sometimes the only means of answering certain questions.

Experimental research

In experimental research, an intervention is performed as part of a planned investigation. The most powerful type of experimental study is the randomised controlled trial because the methods used are designed to eliminate a great deal of the observer bias found in observational studies.

Once we have identified potentially useful evidence, it needs to be appraised both in terms of the hierarchy (see Chapter 1) and in terms of the validity of the study itself. So if the evidence is a randomised controlled trial, is the trial of reliable quality? Most evidence contains some clues as to its value, and these clues can help to determine its overall usefulness. Probably the most important question to ask is: does the article address an issue of relevance to my practice? If the answer is ‘No’, then discard it and move on to the next article. The whole point of appraisal is to determine how reliable the evidence is and to identify bias likely to affect the outcome. Many articles are written to prove a theory or point, and it is not difficult to find publications where the conclusions bear little relation to the results actually described!

A number of checklists exist to help evaluate published literature. Some of these originated with, or are used by, the Critical Appraisal Skills Programme in Oxford.2

Systematic reviews and overviews

The following 10 questions are extremely useful in appraising a systematic review. They are designed as an aid to the general reading of reviews, which now appear regularly in mainstream medical journals, but they are also useful for assessing the value of a review for a specific inquiry. Reviews that tick most of the ‘Yes’ boxes are likely to be valuable in practice.

Box 1

Appraisal tool for a systematic review

Ten questions to make sense of a systematic review

For each question answer: Yes, No or Don't know

A. Are the results of the review valid?

1. Did the review address a clearly focused issue?

  For example, the population, intervention and/or outcomes

 2. Did the authors look for the appropriate sort of papers?

  Did they address the issue and use an appropriate study design?

Is it worth continuing?

 If the answer was No to either Q1 or Q2 above, there is little value in proceeding

3. Do you think important relevant studies were included?

 Look for search methods, databases used, reference list use, unpublished studies and conference abstracts

4. Did the authors do enough to assess the quality of included studies?

 Is there evidence of an assessment of potential bias?

5. If the results of studies have been combined, was it reasonable to do so?

 This would normally be seen as a combined summary statistic such as a relative risk (RR) or an odds ratio (OR), and also as a displayed forest plot of a meta-analysis

B. What are the results?

6. What is the overall result of the review?

 Is there a clear numerical expression? In addition to the RR or OR from Q5, there may be a number needed to treat (NNT) or a percentage of responders

7. How precise are the results?

 Have CIs been presented? Usually at 95%

C. Will the results help my practice?

 This section is designed to help assess whether the findings of the review are useful in your practice

8. Can the results be applied locally?

 Is the review relevant to your patient population?

9. Were all important outcomes considered?

 Are there questions that you would consider important that were not addressed in the review?

10. Are the benefits worth the harms and costs?

 Consider the adverse effects associated with the intervention but also costs

Adapted from Oxman et al.3
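Questions 5–7 of the box turn on summary statistics: the RR, OR, NNT and a 95% CI. As a minimal sketch of how these are derived from a simple 2×2 table of responders, using entirely hypothetical counts rather than data from any study cited here:

```python
import math

def summary_stats(events_treated, n_treated, events_control, n_control):
    """RR, OR, NNT and a 95% CI for the RR from a 2x2 table."""
    risk_t = events_treated / n_treated
    risk_c = events_control / n_control
    rr = risk_t / risk_c  # relative risk
    # odds ratio: odds of the event in each group
    or_ = (events_treated / (n_treated - events_treated)) / \
          (events_control / (n_control - events_control))
    # NNT is the reciprocal of the absolute risk difference
    nnt = 1 / abs(risk_t - risk_c)
    # 95% CI for the RR on the log scale (large-sample approximation)
    se = math.sqrt(1/events_treated - 1/n_treated +
                   1/events_control - 1/n_control)
    ci = (math.exp(math.log(rr) - 1.96 * se),
          math.exp(math.log(rr) + 1.96 * se))
    return rr, or_, nnt, ci

# Hypothetical example: 30/100 responders on treatment vs 15/100 on control
rr, or_, nnt, ci = summary_stats(30, 100, 15, 100)
```

With these hypothetical counts the RR is 2.0, the NNT is about 7, and the lower limit of the CI lies above 1.0, which is exactly the kind of precision check that Q7 asks for.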

Therapy and prevention studies

Box 2

Appraisal tool for a clinical trial

Eleven questions to make sense of a controlled trial (either randomised or not)

Randomisation, the random assignment of participants to treatment groups within a trial, is important in reducing bias. Controlled trials without randomisation are frequently found and should be considered less reliable. The term quasi-randomised is sometimes used where the allocation process is not truly random, such as allocation by clinic number.

For each question answer: Yes, No or Don't know

Are the results of the trial valid?

1. Did the trial address a clearly focused issue?

Is there a clear question; can the PICO be identified?

2. Was the assignment of patients to treatments randomised?

Look for the term randomised and, ideally, a description of how randomisation was undertaken (some controlled trials are not randomised).

3. Were all of the patients who entered the trial properly accounted for at its conclusion?

Was follow-up complete? Were patients analysed in the groups to which they were allocated?

Detailed questions

4. Were patients and study personnel ‘blind’ to treatment including any who assessed outcomes?

Look for the terms blinding, double blind or masking

5. Were the groups similar at the start of the trial?

Important issues include age, severity of the condition and possibly gender

6. Aside from the experimental intervention, were the groups treated equally?

What are the results?

7. How large was the treatment effect?

What outcomes were measured?

8. How precise was the estimate of the treatment effect?

Look for CIs, usually 95%

Will the results help locally?

9. Can the results be applied to your population?

Are the participants in the study similar to the population seen in your practice?

10. Were all clinically important outcomes considered?

Were the outcomes the ones you would choose? If not, the trial may be less valuable

11. Are the benefits worth the harms and costs?

This probably will not be in the trial but a rough evaluation should be possible to decide if you want to use this intervention in practice

Questions adapted from Guyatt et al.4,5

PICO, participants, interventions, comparisons, outcomes.

Box 3

Appraisal tool for a case control study

Eleven questions to make sense of a case control study

For each question answer: Yes, No or Don't know

Are the results of the study valid?

1. Did the study address a clearly focused issue?

 Look for the population and the risk factors studied; does the study look for benefit or harm?

2. Was an appropriate method used to answer the question?

 Is a case control method appropriate here? This design is usually used for rare conditions or harmful outcomes

3. Were cases recruited in an appropriate way?

 Is there a clear definition of a case? Did cases represent a defined population? Was there a reliable system for selecting cases? Is the timescale relevant? Was there a sufficient number of cases (look for a calculation to determine how many might be needed)?

4. Were controls selected in an appropriate way?

 Look for any bias in the selection which could compromise the results. Did controls represent a defined population? Are the controls matched? Was there a sufficient number of controls?

5. Was the exposure accurately measured to minimise bias?

 Was exposure measured in the same way for cases and for controls? Measures that rely on recall are more prone to bias

6. What confounding factors have the authors accounted for? Have these been taken into account in the design and/or analysis?

 These should be described in the Methods section. Confounding occurs when the link between exposure and outcome is distorted by another factor. Look for factors that were not considered according to your clinical judgement. A study that does not address confounding should be rejected.

What are the results?

7. What are the results of the study?

 What outcomes were measured?

8. How precise was the estimate of risk?

 Look for CIs (usually 95%) and a p value

9. Do you believe the results?

 A large effect has to be taken seriously. Could the result be due to chance? Have you spotted flaws that make the results unreliable?

Will the results help locally?

10. Can the results be applied to the local population?

 Are the subjects similar to your population? Does your setting differ significantly? Can you gauge benefit and harm for your local situation?

11. Do the results fit with other available evidence?

 Consider evidence from other study designs for consistency

Questions adapted from Guyatt et al.4,5
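The precision question above usually centres on the 95% CI around the OR, which is the natural summary statistic of a case control study. A minimal sketch of the standard log-based (Woolf) calculation, using hypothetical counts:

```python
import math

def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """OR for a case control 2x2 table, with a log-based 95% CI."""
    a, b = exposed_cases, unexposed_cases
    c, d = exposed_controls, unexposed_controls
    or_ = (a * d) / (b * c)
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)  # SE of log(OR)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, (lower, upper)

# Hypothetical: 40/100 cases exposed vs 20/100 controls exposed
or_, (lo, hi) = odds_ratio_ci(40, 60, 20, 80)
```

Here the OR is about 2.7 with a CI of roughly 1.4 to 5.0; because the interval excludes 1.0, the association would be judged statistically significant at the 5% level.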

Economic analyses

Economic analyses are increasingly demanded, yet they can be very difficult to interpret. Often the arguments lack credibility or rest on meaningless financial concepts. This set of questions goes some way towards enabling an appraisal to be made.

Box 4

Appraisal tool for an economic evaluation

Twelve questions to make sense of an economic evaluation

Screening questions (Is the economic evaluation likely to be useable?)

1. Was a well-defined question posed in an answerable form?

2. Was a comprehensive description of the competing alternatives given (ie, can you tell who did what to whom, where and how often)?

How were consequences and costs assessed and compared?

3. Was there evidence that the programme's effectiveness had been established?

4. Were all important and relevant consequences and costs for each alternative identified?

5. Were consequences and costs measured accurately in appropriate units (eg, hours of nursing time, number of physician visits, years-of-life gained) prior to valuation?

6. Were consequences and costs valued credibly?

7. Were consequences and costs adjusted for differential timings (discounting)?

8. Was an incremental analysis of the consequences and costs of alternatives performed?

9. Was a sensitivity analysis performed?

Will the results help in purchasing for local people?

10. Did the presentation and discussion of the results include all of the issues that are of concern to purchasers?

11. Were the conclusions of the evaluation justified by the evidence presented?

12. Can the results be applied to the local population?

Questions adapted from Drummond et al.9
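The discounting asked about in Q7 adjusts costs and consequences that occur at different times to a common present value, so that alternatives with different time profiles can be compared fairly. A minimal sketch; the 3.5% rate is purely illustrative, as the appropriate rate depends on national guidance:

```python
def present_value(amount, years, rate=0.035):
    """Discount a cost or consequence occurring `years` in the future.

    rate=0.035 is an illustrative annual discount rate, not a recommendation.
    """
    return amount / (1 + rate) ** years

# A cost of 1000 incurred 5 years from now is worth less in today's terms
pv = present_value(1000, 5)
```

An evaluation that simply sums undiscounted future costs will overstate the burden of interventions whose costs fall late, which is why the checklist asks whether discounting was applied.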

Clinical guidelines

Guidelines enjoy popularity yet are often not evidence-based. This guide is useful both to the potential users of guidelines and also to those who write guidelines.

Box 5

Appraisal tool for guidelines

Ten questions to make sense of clinical guidelines

Are the recommendations valid?

I. Primary guides

1. Were the options for management and the projected outcomes of care clearly specified?

2. Was an explicit and sensible process used to identify, select and combine evidence?

II. Secondary guides

3. Was an explicit and sensible process used to consider the relative values of different outcomes associated with alternative practices?

4. Was the guideline subjected to a credible external review process?

5. Is the guideline likely to account for important recent developments?

What are the recommendations?

6. Are clear recommendations made?

7. Are important caveats identified?

Will the recommendations help you in caring for your patients?

8. Is the primary objective of the guideline consistent with your objective?

9. Are the recommendations applicable to your patients?

10. Are the expected benefits of guideline implementation worth the anticipated harms and costs?

Questions are adapted from Hayward et al.10

These skills are fundamental for pharmacists but are rarely taught during basic training, although they are beginning to appear in some postgraduate training programmes. It is essential that pharmacists acquire the skills required to critically appraise the literature in each of the following areas: therapy; screening programmes; treatment guidelines; systematic reviews; randomised controlled trials; case control studies and cohort studies.

Appraising outcomes in evidence

Patient-oriented outcomes

In Chapter 3, we discussed the use of PICO to ask the right questions. ‘Outcomes’ are important to an evidence-based approach for two reasons. First, we must make sure that the outcomes in the evidence are the outcomes of concern to patients. Second, the outcomes need to be measured appropriately for the evidence to be sound.

When we are looking at the evidence for drugs, many outcomes can be used for decision-making. For example, β-blockers, as part of treatment for heart failure, can have many outcomes of relevance. These are illustrated in table 1.

Table 1

Outcomes for patients using β-blockers

In this example, β-blockers were until recent years considered to be contraindicated in heart failure. In the early days of β-blockers (1960s–1970s), pharmacological theory suggested that β-blockers suppress cardiac output and are therefore harmful to patients with heart failure. This was also demonstrated in ‘disease-oriented outcomes’, such as acute reductions in cardiac output. However, in the last two decades it has emerged that β-blockers actually benefit patients with heart failure in the long term if they are started at an appropriate time and the dose is increased gradually.6,7 This benefit is reflected in slowed progression of heart failure, improved symptoms and improved survival.

Some of the outcomes in evidence are presented in the form of ‘surrogate outcomes’. For example, a laboratory value or a particular sign of the disease may not translate directly into a patient-oriented outcome, such as mortality or progression of disease.

There are a few other examples to illustrate the importance of consideration of patient-oriented outcomes (table 2).

Table 2

Patient-oriented outcomes

As illustrated above, when we are appraising evidence with reported outcomes, we need to ensure that all patient-oriented outcomes are considered and reported. If the evidence is presented in the form of surrogate outcomes, we must be cautious and ensure that they translate into patient-oriented outcomes. If some important outcomes have not been considered, then care must be taken to consider other patient-oriented outcomes and to look for other relevant evidence accordingly.

Measurement of outcomes

There are usually many forms of measurement for each outcome. For example, in the field of oncology, mortality is usually described as ‘1-year survival, 3-year survival and progression-free survival’. We must therefore ensure that the outcome is measured in an appropriate way.

Objective outcomes are preferred to subjective outcomes, simply to avoid bias. However, many outcome measurements used are subjective patient-reported outcomes, for example in pain medicine. The most widely used pain assessment is the Visual Analogue Scale, a 0–10 point scale on which patients rate their pain experience. Although it quantifies pain reasonably well, it remains subjective to the individual patient and may be affected by many factors. Where patient-reported outcomes are used, there must be a baseline assessment before treatment as well as assessment during a post-treatment period. There are also objective pain measurements, such as whether the patient has used analgesics or the number of days off work.11 Objective outcomes can be helpful when used in conjunction with patient-reported outcomes, confirming patient improvement and making the evidence more scientifically sound.

To summarise, when considering evidence with patient-reported outcomes, we must ensure that the measurement tools are validated in the particular specialty and used appropriately. At the same time, we should seek objective outcome measurements to complement subjective outcomes and so make the evidence more robust.


  • Phil Wiffen is Editor-in-Chief of EJHP and also teaches methodology for EBM and systematic reviews.

  • Tommy Eriksson is Professor in Clinical Pharmacy and Program Director of the MSc pharmacy programme at Lund University in Sweden.

  • Hao Lu is a clinical pharmacist based at the Beijing United Family Hospital in China.




  • Competing interests None.

  • Provenance and peer review Commissioned; internally peer reviewed.
