I thought I would read the paper and try to look at it as a paper in its own right, rather than as the work of particular authors I dislike.
Method
They start with the premise that CBT makes a significant difference. Their trial methodology cannot support this conclusion; at best it can draw a conclusion about delivery methods, since there is no control group.
Their randomisation is simple rather than stratified: they are not matching people with different characteristics across the two groups, just allocating people at random between them.
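To illustrate the distinction, here is a minimal sketch (with invented patients and a made-up "severity" characteristic, nothing from the paper): simple randomisation leaves the balance of a characteristic across arms to chance, while stratified randomisation splits each stratum evenly between arms.

```python
import random

random.seed(0)

# Hypothetical patients: (id, severity stratum) -- 20 "severe", 40 "mild"
patients = [(i, "severe" if i % 3 == 0 else "mild") for i in range(60)]

# Simple randomisation: each patient is assigned by a coin flip,
# so the strata can end up unevenly distributed across the arms.
simple = {"A": [], "B": []}
for p in patients:
    simple[random.choice("AB")].append(p)

# Stratified randomisation: shuffle within each stratum, then split evenly,
# so each arm receives the same number of severe and mild patients.
stratified = {"A": [], "B": []}
for stratum in ("severe", "mild"):
    group = [p for p in patients if p[1] == stratum]
    random.shuffle(group)
    half = len(group) // 2
    stratified["A"].extend(group[:half])
    stratified["B"].extend(group[half:])

def severe_count(arm):
    return sum(1 for p in arm if p[1] == "severe")

print("simple    :", severe_count(simple["A"]), "vs", severe_count(simple["B"]))
print("stratified:", severe_count(stratified["A"]), "vs", severe_count(stratified["B"]))
```

With only a few dozen patients per arm, as in this trial, the chance imbalance from the simple scheme can easily be large enough to matter.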
The CBT they report seems to have a number of different aims. This suggests previous trials are a bit like drug trials that try to test a cocktail of drugs.
"The essence of CBT is to help patients to change behavioural and cognitive factors, focusing specifically on changing avoidance behaviour, unhealthy sleep patterns and unhelpful beliefs in order to improve levels of fatigue and disability." and also "Social factors, for example work, relationship or child care issues, were addressed if they were identified as being important in perpetuating the symptoms and disability associated with their CFS."
It's hard to say which aspects make a difference (e.g. is it the help talking through practical issues such as work, relationships and child care, stresses that must use a lot of energy, or helping people with sleep patterns, or telling people they are not really ill?). It is probably fair that this trial doesn't address these points, since the authors start from the assumption that their technique works, so why look further at the technique itself. However, they are concerned with the delivery of the technique, and presumably different pieces are easier to deliver in different ways, so I feel they should have recorded the percentage of time spent on the different aspects of the treatment.
The 3-hour initial CBT session seems very long for someone with ME. This brings in ordering issues: presumably people will take in the early parts better than the later parts, when concentration will be poor.
They use 8 therapists, but they don't specify whether each patient keeps the same therapist, or whether the same therapists are used for both face-to-face and phone-based therapy. They also don't mention recording which therapist treated which patient, so they cannot judge whether effects are due to delivery methods or to the skills of the individual therapist (for example, in helping people cope with the practical anxieties associated with disability, or in making people feel they should report positive results, a bias to please the interviewer).
They talk of their use of a number of scales, which they say are considered "reliable and valid"; I've not followed up the references. I'm interested in what they mean by this, but they don't specify the properties. I would expect, for example, that an improvement of one unit in one symptom (say concentration or cognitive fatigue) would lead to the same change in the scale as a one-unit improvement in a different symptom (say physical fatigue). From Chalder's original paper it is clear that this is not the case, since the principal component analysis showed the questions are highly correlated: the questions are not independent, and they are not balanced. Another property I would expect of a scale is that a one-unit improvement in a symptom leads to the same movement on the scale wherever the person is on the scale. I've not worked through the numbers, but the non-independent questions and the very coarse scoring suggest that this will not be the case either.
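A toy sketch of the coarse-scoring problem (the item values here are invented for illustration, not taken from the scale itself): when correlated Likert items are collapsed to 0/1 bimodal scores, a small improvement near the threshold flips several items at once, while the same size of improvement elsewhere on the scale moves the score not at all.

```python
# Hypothetical: 4 highly correlated items, each answered 0-3 but scored
# bimodally (0,0,1,1), so any response >= 2 contributes one point.
def bimodal_score(answers):
    return sum(1 for a in answers if a >= 2)

# A patient improves slightly: every correlated item moves from 2 to 1.
before = [2, 2, 2, 2]
after = [1, 1, 1, 1]
print(bimodal_score(before) - bimodal_score(after))  # prints 4

# The same one-unit improvement higher up the scale: 3 -> 2 on every item.
before2 = [3, 3, 3, 3]
after2 = [2, 2, 2, 2]
print(bimodal_score(before2) - bimodal_score(after2))  # prints 0
```

So identical underlying improvements can register as a 4-point drop or as no change at all, depending only on where the patient starts.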
From a psychological perspective, I believe questions should be mixed in order and should suggest a range of intents, that is, the same question should be asked several times in different ways.
The statistical analysis section doesn't make a great deal of sense to me. It suggests they are following some standard methodology, but without referencing it; without equations or a detailed reference I'm not sure exactly what they are doing. I suspect their analysis is fairly standardised, but it may have been done without much thought as to whether the assumptions behind the different techniques are valid for these data sets (e.g. are they assuming normally distributed data; if they are using non-parametric tests, are all the assumptions, such as the distributions having the same shape, valid?).
With their discussion of the use of regression to find which variables can be associated with the outcomes, I am concerned with their use of the term 'predictor'. To me this is a loaded term that suggests causation, when all the regression will show is some form of correlation. Where there are correlations, this may suggest areas to look at to understand further causes, but I think any scientist confusing correlation with causation should be sacked. For example, there may be a correlation between people being members of support groups and having bad outcomes; this doesn't mean membership is a cause, but it may suggest looking for some underlying reason. This could be as simple as: the elements of the treatment that are effective are those helping people deal with the practical problems associated with disability, hence reducing stress (which requires a lot of energy); people who are members of support groups may already have those things covered, hence their improvement is smaller. I'm not suggesting this is the case, just that correlations indicate where to look for mechanisms and hence causation. (A very good and relatively non-technical paper on the subject is http://www.rochester.edu/College/PSC/clarke/204/Freedman91.pdf.)
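The support-group point can be demonstrated with a simulation (all the numbers are invented; the hidden factor and effect sizes are assumptions for illustration only): membership has no causal effect at all, yet a regression-style comparison would flag it as a "predictor" of worse fatigue, purely because a confounder drives both.

```python
import random

random.seed(1)

# Hypothetical simulation: support-group membership has NO causal effect on
# outcome. A hidden factor (unresolved practical problems) raises both the
# chance of joining a group and the level of fatigue.
n = 2000
rows = []
for _ in range(n):
    practical_problems = random.random() < 0.5            # hidden confounder
    member = random.random() < (0.7 if practical_problems else 0.2)
    fatigue = 5.0 + (3.0 if practical_problems else 0.0) + random.gauss(0, 1)
    rows.append((member, fatigue))

def mean(xs):
    return sum(xs) / len(xs)

members = [f for m, f in rows if m]
non_members = [f for m, f in rows if not m]
gap = mean(members) - mean(non_members)
print(f"members' fatigue exceeds non-members' by {gap:.2f} points")
```

The gap comes out clearly positive even though membership does nothing; calling membership a "predictor" of poor outcome would be technically true but causally empty.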
Results
If I were a suspicious person, I would wonder why 7% had been sent for group CBT and 7% referred back to local services. These are what happened, not reasons.
I don't understand why the groups did not contain the same number of participants.
The selected set of people already seemed to have a high recovery rate: 3 out of 80 in the (unspecified) time between recruitment and starting the trial. This seems high; does it reflect the overall selection effects? The trial seemed to have 28 and 30 people in the two groups.
In Fig 1, post-treatment measures are assessed on more people than had treatment (29 vs 28).
Table 2: The mean baseline scores include those who didn't have treatment, which doesn't strike me as very useful; quoted improvements may just reflect the selection of the particular subgroup. The variance increases with treatment, which is interesting in itself, and it would be nice to see the distributions: are they multimodal, or smooth with an increased spread? Given potential non-linearities, at least in the fatigue scale, the step down could also be a function of a slight improvement that manifests itself as changes in several highly linked answers. I'm not sure what the ES column is meant to tell me.
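For what it's worth, an "ES" column in trial tables usually reports a standardised effect size such as Cohen's d: the difference in means divided by the pooled standard deviation. A minimal sketch, with made-up numbers that are not taken from the paper:

```python
import math

# Cohen's d: difference in means over the pooled standard deviation.
def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    pooled_sd = math.sqrt(
        ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    )
    return (mean1 - mean2) / pooled_sd

# e.g. (hypothetical) baseline fatigue 25 (sd 5) vs post-treatment 20 (sd 8),
# with 28 patients at each time point.
d = cohens_d(25, 5, 28, 20, 8, 28)
print(f"d = {d:.2f}")  # prints d = 0.75
```

Note that d shrinks as the pooled standard deviation grows, so the increased variance after treatment noted above would itself pull the reported effect size down, independent of any change in the mean.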
The next bit seems complex. My understanding is that they analyse the trends in the data for those who completed the trial, then use this to fill in missing values for the people who dropped out. I don't feel this analysis is valid: we don't know why people dropped out. If they dropped out at random, maybe the analysis is valid, but if those getting the least value dropped out, it doesn't seem reasonable to treat them as if they were the same as those who remained in the trial. The initial analysis said that the various factors they had recorded didn't predict drop-outs, and therefore drop-outs must be random. But this inference can only be made if sufficient variables are present to characterise the patient, and here they are not; of particular significance is the absence of any symptom measures in the variable set.
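The bias this can produce is easy to show with a simulation (again, all numbers invented): if the patients who improve least are the ones who drop out, imputing their missing scores from the completers' trend overstates the average improvement.

```python
import random

random.seed(2)

# Hypothetical cohort: each patient's true improvement ~ Normal(2, 2).
n = 1000
true_improvements = [random.gauss(2.0, 2.0) for _ in range(n)]

# Informative dropout: patients improving by less than 1 point leave the trial.
completers = [imp for imp in true_improvements if imp >= 1.0]
dropouts = n - len(completers)

def mean(xs):
    return sum(xs) / len(xs)

true_mean = mean(true_improvements)
completer_trend = mean(completers)

# Naive imputation: each dropout's missing value is filled in with the
# completers' mean trend, as if they were like those who stayed.
imputed_mean = (sum(completers) + dropouts * completer_trend) / n

print(f"true mean improvement   : {true_mean:.2f}")
print(f"after naive imputation  : {imputed_mean:.2f}")
```

The imputed figure simply reproduces the completers' trend, so the apparent improvement is inflated by roughly the whole gap between completers and the full cohort. Only if dropout really is random do the two numbers coincide.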
Discussion
I'm pretty sure there is a body of literature on how people interact via phone and video conferencing depending on whether they have previously met (reported in HCI work; I've heard mention of it but not read it). They should look at this literature when assessing the different techniques and the effects of the initial face-to-face contact.
They try to give excuses as to why their trial results are not as good as previous ones, but I'm not convinced the trials are entirely comparable:
"There was only one therapist in the Deale et al. trial whereas eight therapists treated participants in this trial which is more indicative of real life. Therapists may have had diverse treatment outcomes."
This factor should have been analysed in an attempt to control for different therapists treating different proportions of each group. It should also have been a factor in their analysis of drop-outs.
"showed smaller changes in mean scores on our main outcomes than our previous randomized controlled trial"
"Fatigue and social adjustment scores at 6 months follow-up in the face to face group in this study were slightly better than the outcomes from our study of routine clinical practice"
I think the inference from this is that they get better results in trials than in clinical practice.
I think there are far more methodological flaws than the ones discussed.
They are trying to give the impression that this trial backs up CBT. They don't quite say so, but they imply it. The trial was not designed to do that, and no such conclusion can be drawn.
In their conclusion they say that "sessions was highly structured and manuals were adhered to by therapists therefore making the treatment easy to replicate", but there is no evidence for this within the paper; in fact the previous discussion of methodological flaws suggests there is no basis to draw this conclusion.