Original article
Assessing the clinical significance of change scores recorded on subjective outcome measures

https://doi.org/10.1016/j.jmpt.2003.11.003

Abstract

Background

To date, clinical trials have relied almost exclusively on the statistical significance of changes in scores from outcome measures in interpreting the effectiveness of treatment interventions. It is becoming increasingly important, however, to determine the clinical rather than statistical significance of these change scores.

Objective

To determine cutoff values for change scores that distinguish patients who have clinically improved from those who have not.

Method

Data were obtained from 165 back and 100 neck patients undergoing chiropractic treatment. Patients completed the Bournemouth Questionnaire (BQ) before treatment and the BQ and Patient's Global Impression of Change (PGIC) scale after treatment. Three statistical methods were applied to individual change scores on the BQ. These were (1) the Reliable Change Index (RCI); (2) the effect size (ES); and (3) the raw and percentage change scores. The PGIC scale was used as the “gold standard” of clinically significant change.

Results

The RCI, using the cutoff value of >1.96, appropriately identified clinical improvement in back patients but not in neck patients. An individual ES of approximately 0.5 had the highest sensitivity and specificity in distinguishing back and neck patients who had undergone clinically significant improvement from those who had not. In terms of score changes, percentage BQ change scores [(raw change score/baseline score) × 100] of 47% and 34% had the highest sensitivity and specificity in distinguishing clinically significant improvement from nonimprovement in back and neck patients, respectively.

Conclusion

This study provides a methodological framework for identifying clinically significant change in patients. This approach has important implications in providing clinically relevant information about the effect of a treatment intervention in an individual patient.

Introduction

Evidence-based medicine advocates the application of findings from clinical trials in the treatment of individual patients. However, results from research studies are usually given as group mean values and the statistical significance of their differences. Data analyzed in this way give no indication of the proportion of patients in the group achieving a clinically important benefit from the treatment intervention. The information is therefore of limited clinical relevance, since there is no indication of the likelihood of a good response in a single patient. To counteract this, treatments are now being evaluated in terms of numbers needed to treat (NNT). NNT is an easily interpreted statistic informing the clinician of the number of patients that must be treated for a single patient to improve.1, 2 To calculate the NNT statistic, it is necessary to identify those patients in the group who have undergone a clinically important improvement.
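The NNT calculation can be sketched as follows. This is a minimal illustration, assuming the conventional definition of NNT as the reciprocal of the difference between the proportions of improved patients in a treated group and a comparison group. The function name and the patient counts are hypothetical and are not taken from the study.

```python
import math
from fractions import Fraction


def number_needed_to_treat(improved_treated, n_treated,
                           improved_control, n_control):
    """Return the NNT, rounded up to a whole patient.

    Assumes NNT = 1 / (proportion improved with treatment
                       - proportion improved without it).
    Fractions are used so that the rounding is not disturbed by
    floating-point error.
    """
    if n_treated <= 0 or n_control <= 0:
        raise ValueError("group sizes must be positive")
    benefit = (Fraction(improved_treated, n_treated)
               - Fraction(improved_control, n_control))
    if benefit <= 0:
        raise ValueError("treatment shows no benefit; NNT is undefined")
    return math.ceil(1 / benefit)


# Hypothetical counts: 30 of 50 treated patients improved versus 20 of 50
# comparison patients, a difference in proportions of 0.2.
print(number_needed_to_treat(30, 50, 20, 50))
```

Either way, the counts of improved patients must come from some rule that classifies each individual as improved or not, which is the problem the rest of the article addresses.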

Defining the proportion of patients who have clinically improved is problematic, however, when the outcome of interest is subjective and there are no directly measurable end points to indicate that the patient's condition has resolved. An example is in evaluating the effect of treatment in nonspecific back and neck pain where the outcomes of most interest are changes in patients' self-reported levels of pain and disability. In such cases, it is necessary to distinguish those individual change scores on pain and disability scales that represent clinically important change from those that do not.

There are now a number of methods available for identifying clinically important intraindividual change in subjective outcome measures.3, 4 These fall into one of two camps: the statistical or distribution-based methods on the one hand and the global ratings or anchor-based methods on the other. The most common of the statistical methods are the effect size (ES) statistic and the Reliable Change Index (RCI), as well as simple change scores on the outcome measure itself.

The ES statistic is a method whereby mean differences between pretreatment and posttreatment scores can be standardized to quantify an intervention's effect in units of standard deviation (SD). It is therefore independent of measuring units and can be used to compare outcomes.5 ES statistics are widely used to assess the magnitude of treatment-related changes over time and can be applied both to group data and to data recorded from a single patient.4 Using threshold values put forward by Cohen6 and Testa,7 ES values for group mean changes and individual changes, respectively, can be interpreted as small, medium, or large treatment effects. The question remains, however, as to how effect sizes relate to patients' own perceptions of change in their condition and how effect sizes can be interpreted as clinically important effects. For example, thresholds for individual effect sizes in terms of clinically important change would enable patients to be identified as improved or not.
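An individual effect size of this kind might be computed as in the sketch below. It assumes that the standardizing SD is the SD of the group's baseline scores and that lower scores indicate a better state, so that a positive value denotes improvement. Neither assumption is stated in the text above, and the function name and scores are hypothetical.

```python
def individual_effect_size(baseline_score, follow_up_score, baseline_sd):
    """Standardize one patient's change score in units of SD.

    Assumes baseline_sd is the SD of the group's baseline scores and
    that a fall in score is an improvement.
    """
    if baseline_sd <= 0:
        raise ValueError("baseline_sd must be positive")
    return (baseline_score - follow_up_score) / baseline_sd


# Hypothetical patient: score falls from 40 to 30 where the group's
# baseline SD is 20, giving a change of half an SD.
print(individual_effect_size(40, 30, 20))
```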

The RCI, originally proposed by Jacobson et al8 and later modified by Christensen and Mendoza,9 is similar to the ES statistic in that it calculates mean differences between pretreatment and posttreatment scores but divides the difference by a standard error of measurement that includes not only the SD of the measure but also its reliability coefficient. RCI values can be referenced to the normal distribution, and values that exceed 1.96 are unlikely (P < .05) unless an actual and reliable change has occurred.3 Again, the question arises as to how this statistical method of arriving at a clinically important change compares with patients' own perceptions of a real and worthwhile change in their condition following treatment.
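The RCI can be sketched as follows. This sketch assumes a commonly used form in which the standard error of measurement is SD × √(1 − r), with r the reliability coefficient, and the change score is divided by the standard error of the difference, √2 times that value. The exact formula used in the study is not given in this text, so the form, the function name, and the numbers are all assumptions.

```python
import math


def reliable_change_index(baseline_score, follow_up_score, sd, reliability):
    """Return the RCI for one patient's pre/post scores.

    Assumed form: SEM = sd * sqrt(1 - reliability), and the change is
    divided by the standard error of the difference, sqrt(2) * SEM.
    A fall in score is treated as improvement (positive RCI).
    """
    if sd <= 0:
        raise ValueError("sd must be positive")
    if not 0 <= reliability < 1:
        raise ValueError("reliability must be in [0, 1)")
    sem = sd * math.sqrt(1 - reliability)
    s_diff = math.sqrt(2) * sem
    return (baseline_score - follow_up_score) / s_diff


# Hypothetical patient: score falls from 40 to 20, with SD 10 and
# reliability 0.5, so the standard error of the difference is about 10.
rci = reliable_change_index(40, 20, 10, 0.5)
print(round(rci, 2), rci > 1.96)
```

A higher reliability coefficient shrinks the denominator, so the same raw change yields a larger RCI on a more reliable measure.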

To assess patients' own impressions of change, a global scale from “much better” through “no change” to “much worse” is commonly used.5, 10, 11 Since patients themselves make a subjective judgment about the meaning of the change to them following treatment, this scale is often taken as the external criterion or “gold standard” of clinically important change.11 This makes intuitive sense and underlies current debates on statistical versus clinical significance.12 Hence, in clinical trials in which end points cannot be directly measured, for example in pain conditions, assessing patients' experiences and what makes a difference to them in terms of a worthwhile and meaningful improvement is pivotal. Moreover, it is worth noting that statistical significance of change scores is derived from outcome measures that again rely on patients' interpretations and subjective judgments colored by their experiences of their condition.

The study reported in this article uses a patient self-report global change questionnaire based on a 7-point numerical rating scale (NRS) to determine from the patients' own perspective the degree of change (improvement) following treatment. This change was judged for its clinical importance by asking patients just how noticeable the change was. Using this as the “gold standard” of clinically significant improvement, the objectives of the study were to determine the sensitivity and specificity of statistical methods of determining clinically significant improvement, namely: (1) the RCI; (2) the ES statistic; and (3) the outcome measure's raw score and percentage score changes. Deyo and Centor13 highlighted the importance of a measure not only in its ability to detect a clinically important change when it has occurred but equally in its ability to detect when a clinically important change has not occurred. The issue is therefore not merely one of sensitivity to change but also the ability of a measure to distinguish between those patients who do improve and those who do not. All the statistical methods under test in this study were based on individual change scores before and after treatment recorded on the Bournemouth Questionnaire (BQ), a multidimensional outcome measure based on the biopsychosocial model of musculoskeletal pain and validated for use in back14 and neck15 pain patients.
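Evaluating a candidate cutoff against the global change rating might look like the sketch below. It uses percentage change as the statistic under test and treats a global rating of 6 or more on the 7-point scale as improved, following the study's a priori definition. The use of "greater than or equal to" at the cutoff, the function names, and all patient data are hypothetical assumptions for illustration.

```python
def percentage_change(baseline_score, follow_up_score):
    """(raw change score / baseline score) x 100, with a fall as improvement."""
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")
    return (baseline_score - follow_up_score) / baseline_score * 100


def sensitivity_specificity(change_values, global_ratings, cutoff,
                            improved_rating=6):
    """Compare a cutoff on a change statistic with the global rating.

    A patient is 'truly' improved if the global rating is at least
    improved_rating, and is flagged as improved if the change value is
    at least cutoff (an assumed convention).
    """
    tp = fn = fp = tn = 0
    for value, rating in zip(change_values, global_ratings):
        improved = rating >= improved_rating
        flagged = value >= cutoff
        if improved and flagged:
            tp += 1
        elif improved:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one improved and one unimproved patient")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical data for six patients (baseline, follow-up, global rating).
baseline = [40, 50, 30, 20, 25, 30]
follow_up = [16, 25, 24, 18, 13, 21]
ratings = [7, 6, 4, 3, 5, 6]
changes = [percentage_change(b, f) for b, f in zip(baseline, follow_up)]
sens, spec = sensitivity_specificity(changes, ratings, cutoff=47)
print(round(sens, 2), round(spec, 2))
```

Repeating the calculation over a range of cutoffs and keeping the one with the best balance of sensitivity and specificity is one way threshold values of this kind could be located.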

Data collection

Consecutive new patients attending a chiropractic practice in Bristol, England, with an episode of neck or back pain were recruited to the study. Existing patients who had not attended the clinic for 3 months or more and who presented with a new episode of back or neck pain were also recruited consecutively. All patients were over 16 years of age. Eligible patients were asked to complete a pretreatment questionnaire, after which they underwent treatment as usual.

Results

One hundred sixty-five back and 100 neck pain patients were recruited to the study between November 2000 and September 2001. Of the total patient sample, approximately half were males (51%) and the mean age was 40.5 (±13.91 [SD]) years. There was no difference in either the gender ratio or age between back and neck patients. There was an approximately even split between acute and chronic cases being treated. In the back pain group, 58% reported that their current episode of pain had lasted less

Discussion

In this study, 3 statistical methods derived from different computations of change scores on the BQ were investigated for their ability to distinguish patients who had undergone a clinically significant change from those who had not. The a priori definition of clinically significant improvement was a score of 6 or more on a 7-point NRS based on patients' global impression of change in their condition following treatment. This equated to feeling better or much better and a noticeable,

Conclusion

This study presents a number of threshold values on statistical computations from change scores that best identify patients undergoing clinically significant change from those who have not. This work is based, however, on the PGIC as an external criterion of clinically significant change, and while this may be both conceptually reasonable and clinically relevant, it remains to be seen whether or not this is a valid assumption. By identifying proportions of patients who have undergone clinically

Acknowledgements

With thanks to Ms Luci Rowe and Ms Christine Kite.
