
The Hawthorne Effect... & Overestimation of Treatment Effectiveness (2010)

Discussion in 'General ME/CFS News' started by oceanblue, Mar 6, 2012.

  1. oceanblue

    oceanblue Guest

    Evidence that self-report measures often used in clinical trials, including CBT for CFS trials, can overstate the real effectiveness of treatment.

    The Hawthorne effect, sponsored trials, and the overestimation of treatment effectiveness, Wolfe 2010

    This study, on a Rheumatoid Arthritis treatment, provides evidence that patients report better scores on questionnaires while they are in a clinical trial than when they are being treated by their own doctor, leading to an overstatement of the effectiveness of the treatment being trialed.

    Abstract

    Objective. To determine if the results of rheumatoid arthritis (RA) clinical trials are upwardly biased by the Hawthorne effect.

    Methods. We studied 264 patients with RA who completed a commercially sponsored 3-month, open-label, phase 4 trial of a US Food and Drug Administration approved RA treatment. We evaluated changes in the Health Assessment Questionnaire disability index (HAQ) and visual analog scales for pain, patient global, and fatigue during 3 periods: pretreatment in the trial, on treatment at the close of the trial, and by a trial-unrelated survey 8 months after the close of the trial, but while the patients were receiving the same treatment.

    Results. The HAQ score (0-3) improved by 41.3% during the trial, but only by 16.5% when the endpoint was the post-trial result. Similar results for the other variables were patient global (0-10) 51.9% and 34.6%, pain (0-10) 51.7% and 39.7%, fatigue (0-10) 45.6% and 24.6%. Worsening between the trial end and the first survey assessment was HAQ 0.29 units, pain 0.8 units, patient global 0.8 units, and fatigue 1.1 units.

    Conclusion. Almost half the improvement noted in the clinical trial HAQ score disappeared on entry to a non-sponsored followup study, and from 23% to 44% of improvements in pain, patient global, and fatigue also disappeared. These changes can be attributed to the Hawthorne effect. Based on these data, we hypothesize that the absolute values of RA outcome variables in clinical trials are upwardly biased, and that the treatment effect is less than observed.
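
    To make the abstract's percentages concrete, here is a quick sketch (Python, using only the figures quoted above) of the share of reported improvement that disappeared between trial end and the post-trial survey. This roughly reproduces the 23%-44% range in the conclusion (the paper's exact figures presumably differ through rounding), while the HAQ figure comes out nearer 60% than "almost half", presumably because the authors calculated the HAQ loss from raw units (the 0.29-unit worsening) rather than from these rounded percentages.

    # Improvement at trial end vs. at the post-trial survey
    # (% change from pre-treatment), taken from the abstract above.
    results = {
        "HAQ":            (41.3, 16.5),
        "patient global": (51.9, 34.6),
        "pain":           (51.7, 39.7),
        "fatigue":        (45.6, 24.6),
    }

    for measure, (in_trial, post_trial) in results.items():
        lost = (in_trial - post_trial) / in_trial * 100
        print(f"{measure}: {lost:.0f}% of the reported improvement disappeared")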
     
  2. oceanblue

    oceanblue Guest

    The Hawthorne Effect is named after a famous series of studies in the 1920s looking at worker productivity at the Hawthorne Works industrial plant. The Hawthorne effect occurs where subjects improve or modify an aspect of their behavior being experimentally measured simply in response to the fact that they know they are being studied, not in response to any particular experimental manipulation.

    From the paper (all emphasis mine):
    The putative 'type B' Hawthorne Effect, where patient-reported improvements aren't real, is highly relevant to a lot of CBT clinical trials, e.g. PACE. This paper studies the type B effect in an RA drug trial. I'll try to explain the study design, which is a little complex:

    1. The Sponsored Clinical Trial
    Stage 1 involved around 2,000 patients in an FDA-required, manufacturer-run trial of a drug* for Rheumatoid Arthritis. Patients were evaluated pre-treatment and at the conclusion of the trial using a number of measures including the Health Assessment Questionnaire Disability Index (HAQ), as well as scales for pain, fatigue and overall improvement.

    The study found large improvements (effect sizes of around 1.0) across all measures at the end of the trial. Success!

    *the drug was unnamed at the request of the manufacturer, and as a condition of its permission for access to patients in this study
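
    For context on "effect sizes of around 1.0": an effect size (Cohen's d) of 1.0 means the average score shifted by a full standard deviation, conventionally a large effect. A minimal sketch with made-up HAQ-style numbers (nothing here is from the paper, and the paper doesn't say exactly which formula it used):

    import statistics

    def cohens_d(before, after):
        # Standardised mean difference: change in means divided by the
        # pooled standard deviation (one common convention).
        mean_diff = statistics.mean(before) - statistics.mean(after)
        pooled_sd = ((statistics.variance(before) + statistics.variance(after)) / 2) ** 0.5
        return mean_diff / pooled_sd

    # Hypothetical HAQ scores (0-3 scale, lower = less disability)
    pre  = [1.8, 1.5, 2.1, 1.2, 1.9, 1.6]
    post = [1.4, 1.3, 1.8, 0.8, 1.6, 1.2]
    print(f"d = {cohens_d(pre, post):.2f}")  # ~1.0, i.e. a 'large' effect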

    2. The sneaky follow-up
    The researchers in this study, which was independent of the manufacturer, thought that patients might be inflating questionnaire scores just because of the extra attention they received in the trial. So they recruited patients at the end of the trial into the National Data Bank for Rheumatic Diseases (RDB), a large ongoing study of patients in normal clinical practice under the care of their usual physician. Crucially, the patients continued the same treatment they'd been taking in the formal clinical trial. The RDB routinely collects data from patients on a number of outcomes, including HAQ, pain, fatigue and overall status. This allowed the researchers to compare the outcomes from the clinical trial with the outcomes for the same patients on the same medication but in a normal, non-trial environment.

    The Findings
    All of the measures (HAQ disability, fatigue, pain, global health status) still showed improvement over pre-treatment levels in the follow-up, but the improvement was only about half of that recorded at the end of the formal clinical trial. The researchers hypothesised that the difference can be attributed to the Hawthorne Effect, i.e. that the clinical trial reports of improvement were overstated.


    More to follow when I have more energy
     
  3. Sean

    Sean Senior Member

    Which just emphasises the importance of using genuinely objective outcome measures.
     
  4. floydguy

    floydguy Senior Member

    Yes, questionnaires suck and shouldn't be used. There seems to be a false belief that if one tallies up the numbers and does statistical analysis this miraculously becomes objective research.
     
  5. charityfundraiser

    charityfundraiser Senior Member

    There are two things here: the data results, and the explanation, which is actually just a hypothesis that the study wasn't even set up in a way to be able to test. The only thing they can really say is that the reported improvement was higher in the first study and lower in the second. They haven't shown why this is - whether it is due, as they claim, to sponsored clinical trial vs. non-trial survey, or to any other reason one could come up with to "explain" it, such as first study vs. second study. How do they know it is due to the type of trial rather than time? They didn't do the study in a way that could distinguish that.

    They could have compared two groups side by side, one in a sponsored clinical trial and one in a non-trial survey. They could have taken some non-self-report tests that indicate level of improvement, to see if the change in the self-report variables did or did not match the non-self-report variables.

    They didn't show whether self-report variables differed from non-self-report variables. Even if they had shown that, they didn't show whether the difference was due to "being watched", time, natural adjustment of perspective to an improvement, drug effect wearing off the longer one takes it, or anything else. (As far as the abstract says. I haven't seen the full paper.)

    If you have been very unwell, even bedridden like CFS patients, and something makes you improve, initially you might feel like oh my gosh, I can take a shower, walk around, and surf the Web and call that a big improvement because it is compared to where you were before. After a couple months, you get used to the improvement and might realize, well yes that was an improvement but nowhere near normal.

    From reading various forums, this effect seems to exist in Dr. Montoya's Valcyte pilot study as well. 90% of patients initially reported 90% improvement. News article profiles of a participant and anecdotes on other boards suggest that after some time, the self-reported improvement was actually lower.
     
    oceanblue likes this.
  6. Enid

    Enid Senior Member

    Never doubted the Hawthorne effect - I've always responded to Docs/Consultants with "things do seem a bit better" for many reasons - e.g. they are trying to aid, and thinking more positively may improve the situation. Of course nothing done stopped the course of the illness. And I've known times of saying "yep, OK" just to get any more useless questioning out of one's hair and escape.

    Fact is they do not believe in illness - full stop.
     
  7. oceanblue

    oceanblue Guest

    Background
    Evidence before this study:
    Note that in the final example both groups remained in the trial; the only difference was the intensity of the follow-up.

    Discussion
    This study found worse results in patients in the post-trial follow-up than had been recorded in the same patients at the end of the trial, even though they were on the same medication and the trial readings were taken in the final open-label (unblinded) phase. This adds weight to the evidence for self-report bias from questionnaires, but is by no means definitive (as is true of most research in CFS). I thought it was worth summarising the strengths and weaknesses of the study:

    Strengths & Weaknesses:
    This study compared the same patients on the same medication but no longer in a formal clinical trial.

    It's possible that the lower effect found after the trial is due to the effect of the drug wearing off within a year. The authors comment "The alternative interpretation that true improvement occurred in the trial but is lost at the end of the trial seems untenable", and perhaps more persuasively note that "Sponsored extension studies of RA trials usually show maintenance of improvement". So drug effects wearing off is possible but seems unlikely.

    Also, the observation of lower effects outside clinical trials ties in with their earlier evidence from the National Database that drug effects in normal clinical practice are less than those in trials. They then took a new sample from the National Database and found the outcomes in this trial were very similar to those for patients already in the database:
    One significant problem with this study is that the patients from the original trial who volunteered to join the National Database (and so were measured in the 'sneaky follow-up') were not fully representative of all patients in the original trial:
    It's also possible that regression to the mean could play a role, as only the 'best' responders were followed (see the sketch below).
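
    A toy simulation of that last point (purely illustrative; nothing here is from the paper): if each questionnaire score is true state plus random noise, the patients who look like the 'best' responders at trial end are disproportionately those whose noise happened to be favourable, so on re-measurement their average falls back even if nothing real has changed.

    import random

    random.seed(1)
    N = 2000

    # Each patient has a fixed 'true' improvement (arbitrary units)...
    true_change = [random.gauss(1.0, 0.5) for _ in range(N)]

    # ...and every measurement adds independent noise on top of it.
    def measure():
        return [t + random.gauss(0, 0.5) for t in true_change]

    at_trial_end = measure()
    at_follow_up = measure()  # true state unchanged, fresh noise

    # Follow up only the apparent 'best' responders (top quarter at trial end)
    best = sorted(range(N), key=lambda i: at_trial_end[i], reverse=True)[:N // 4]
    print(sum(at_trial_end[i] for i in best) / len(best))   # high
    print(sum(at_follow_up[i] for i in best) / len(best))   # lower, purely from regression to the mean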

    So, not a perfect study, but I think that, taken with all the other studies discussed here, this does provide further evidence that subjective self-reports in clinical trials may overstate the benefits of treatment. Equally, I'm not aware of any good evidence that subjective self-report questionnaires are not subject to bias in clinical trials.
     
  8. oceanblue

    oceanblue Guest

    Hi Charityfundraiser

    You make some excellent points. Unfortunately I ran out of energy to complete my posts on the study before you replied, and I hope some of the new information I posted there, including other relevant studies discussed by the authors, helps put their findings in context.

    Certainly this study does not give a definitive answer. You've suggested some better ways to run a replication study (the authors' hands were tied to some extent by the manufacturer running the original trial, which placed a lot of limitations on how the study could operate). Similarly, comparing self-report to objective measures in trial and non-trial environments is a great idea.

    However, as I mentioned in my previous post, I'm not aware of any good evidence that self-reports are reliable measures of pre/post change in clinical trials, even though the possibility of self-report bias is often acknowledged. In fact, the only evidence I do know of is the often-cited Wiborg CFS/CBT study, which found that improvements in self-rated physical function were not matched by any improvement in objective actometer ratings.

    I very much hope that some researchers will try to get better data on this, not just in the field of CFS research but across all clinical trials. I'm always amazed at how often researchers rely on self-reported improvement data without good evidence that it accurately reflects reality.
     
    Enid likes this.
  9. oceanblue

    oceanblue Guest

    I certainly agree with that - where it is feasible. As yet there are no objective measures of either fatigue or pain, so self-report is the only option. Nonetheless, if it can be convincingly demonstrated that self-report measures which are open to objective verification lead to overestimation of therapeutic effects, the same is likely to be true for purely subjective symptoms like fatigue.


    Comparison with CFS/CBT studies
    Like CFS/CBT studies, the Wolfe RA study is open label, i.e. participants know if they are receiving the 'active' therapy. However, the Hawthorne Effect is based on the extra attention received by patients in a trial, and in the case of face-to-face therapy that attention is far greater than in a drug trial. For instance, in PACE, participants had 15 hours of contact with CBT, GET and Pacing therapists, and the therapeutic relationship that developed was independently rated as very strong. So any effect due to attention could be significantly stronger in such therapy trials than in drug trials.
     
  10. Snow Leopard

    Snow Leopard Hibernating

    There was a CBT paper which compared the results of clinical trials to those of regular practice and found the effect size in clinical trials was much larger. I can't remember it off the top of my head though.
     
  11. charityfundraiser

    charityfundraiser Senior Member

    Thanks for including those quotes from the paper. The study about ginkgo biloba and level of follow-up is interesting. With the others, one other thing that comes to mind with commercially sponsored clinical trials is that the published ones generally have more positive results than non-commercially sponsored research, due to the "file-drawer" problem.

    Personally, I don't like the self-report questionnaires because all of the questions are relative. I'd rather have a list of activities and check off which ones I can do. I've been told by the doctor/researcher whom I asked that this is even more difficult than the vague relative scales but I don't understand why.

    The actometer studies would be more interesting if they hadn't used homegrown software that hasn't been validated. I did some research in the actometer thread. If the research sucks and the actometer sucks, well, what can you make of it...
     
  12. oceanblue

    oceanblue Guest

    Thanks for that, and you're right re publication bias too.

    The SF-36 scale does measure what activities you can do (HAQ too, I think). Try it yourself. NB there are likely to be only a couple of questions for each person where the score is in doubt, e.g. 'limited a little' vs 'not limited at all', but that alone could give a 5-10 point difference in scores (out of 100) - PACE, for instance, only found an 8-point difference. See the scoring sketch below.
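
    To illustrate, a sketch of SF-36 physical function scoring as I understand it: 10 activity items, each scored 1-3, with the total rescaled to 0-100, so each one-step change on a single item moves the final score by 5 points. The responses below are hypothetical.

    # SF-36 physical functioning: 10 activity items, each answered
    # 1 = limited a lot, 2 = limited a little, 3 = not limited at all.
    def sf36_pf(responses):
        assert len(responses) == 10 and all(r in (1, 2, 3) for r in responses)
        raw = sum(responses)               # raw range 10-30
        return (raw - 10) / 20 * 100       # rescaled to 0-100

    answers = [2, 2, 1, 2, 3, 2, 2, 1, 2, 2]   # hypothetical patient
    print(sf36_pf(answers))                    # 45.0

    # Shifting just two borderline items ('limited a little' ->
    # 'not limited at all') moves the score by a full 10 points:
    answers[0], answers[1] = 3, 3
    print(sf36_pf(answers))                    # 55.0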

    Actometers are imperfect devices, whether home-grown or not. However, they are not subject to self-report bias, so where self-report measures show a change and actometers don't, it's still a cause for concern - though not definitive proof that the treatment didn't work.

    re:
    The outcomes don't report recovery but improvement, and that improvement is measured on an absolute scale, e.g. 0-100, not relative to how they felt last time, so a change in perspective like you suggest shouldn't apply. If it did, then all sponsored extension clinical trials would report a diminishing effect, whereas most of them (according to the authors of this study) show the initial gains being maintained.
     
  13. oceanblue

    oceanblue Guest

    I remember the same paper just as vaguely! BACME data published to date is similarly disappointing. CBT proponents have argued that the difference is that outside trials CBT therapists 'don't do it right'. That can't apply to this Wolfe study, as it used a drug, not a face-to-face therapy.
     
  14. Enid

    Enid Senior Member

    Not a scientist, oceanblue, but that "don't do it right" defence sounds questionable - what, not persuasive enough?
     
  15. Snow Leopard

    Snow Leopard Hibernating

    They can imagine up all the reasons they want, but the effect is real.

    See also:

    Cognitive-behaviour therapy for chronic fatigue syndrome: comparison of outcomes within and outside the confines of a randomised controlled trial.
    Quarmby L, Rimes KA, Deale A, Wessely S, Chalder T.
    http://www.ncbi.nlm.nih.gov/pubmed/17074300
     
    oceanblue likes this.
  16. Sean

    Sean Senior Member

    A minimum of 50% of primary outcome measures should be objective, in all clinical trials.

    Otherwise we will spend the rest of our days in endless frustrating arguments about the 'meaning' of subjective terms, and the psychosocial crowd will win that game hands down. Just like they have done for the last quarter century.
     
  17. oceanblue

    oceanblue Guest

    Makes sense then to have a fatigue questionnaire (maybe one of Lenny Jason's, properly validated first) and an objective measure for physical function. But I wouldn't drop the SF-36 altogether until there is good evidence that actometers (or whatever else is chosen) do a better job of measuring genuine change in physical function in clinical trials.
     
  18. oceanblue

    oceanblue Guest

    Good recall and now read - thanks.

    I had hoped this study would be useful for looking at the Hawthorne Effect, as it looks at CBT for CFS in the same clinic both within a clinical trial and in normal clinical practice. Unfortunately the two groups are too different for direct detailed comparison to be meaningful. The clinical trial was carried out in 1993-94 by one therapist, who was also the main researcher (Alicia Deale), using a written manual for the CBT, and on only 30 patients. The clinical data comes from 227 patients, with significantly different characteristics from the trial patients, treated between 1995 and 2000 by a number of different therapists not using the formal manual, and with much lower follow-up rates (and those not completing follow-up had different characteristics from those who did).

    However, it seems reasonable to use the data on this large sample of patients in normal clinical practice to gauge the effectiveness of CBT in normal use. The two outcomes reported are fatigue (Chalder Fatigue Scale) and the Work & Social Adjustment Scale (WSAS), which gives some measure of function.

    Results for CBT for CFS in normal clinical practice
    The paper only gives outcomes on a graph so data is estimated from that.

    WSAS
    Baseline 5.7; six months post-treatment 4.0 (change 1.7). NB scores above 2.0 are associated with significant functional impairment.

    Chalder Fatigue Score, bimodal scoring: 0 best, 11 worst (see the scoring sketch below)
    Baseline: 9.0; six months 5.8 (change 3.2). General population (including those with health problems): approx 3.3.

    These figures are significantly worse than those achieved in the clinical trial (6 months post-treatment WSAS = 3.3; Chalder = 4.0).
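
    For reference, a sketch of the bimodal Chalder scoring used above (11 items, each with four response options collapsed to symptom absent/present; this is my understanding of the standard method, and the responses below are hypothetical):

    # Chalder Fatigue Scale, 11 items, four response options each:
    # 0 = less than usual, 1 = no more than usual,
    # 2 = more than usual, 3 = much more than usual.
    def chalder_bimodal(responses):
        assert len(responses) == 11 and all(r in (0, 1, 2, 3) for r in responses)
        # Bimodal scoring collapses each item to 0/1: symptom present or not.
        return sum(1 for r in responses if r >= 2)   # 0 (best) to 11 (worst)

    patient = [3, 2, 2, 3, 1, 2, 2, 3, 2, 1, 2]      # hypothetical responses
    print(chalder_bimodal(patient))                  # 9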
     
  19. Snow Leopard

    Snow Leopard Hibernating

    Enid likes this.
  20. Enid

    Enid Senior Member

    No it doesn't :victory: (oh just from personal experience)
     
