The discrepancy between what the numbers say and what the authors say continues to impress me. I know that, in the absence of actual released data, this is no more than reading tea leaves, but bear with me while I examine another implication of the argument about "responders".
Bob has argued above that the percentage of responders was actually between 11% and 15%. Percentages give a false impression of precision, so let's convert this to roughly 1 in 7 to 1 in 9 of the participants in each arm of the study. The 640 subjects provide 160 in each of the 4 arms.
If only 1/9 of these responded, we would have 17 or 18 responders on which the results were based. But wait, there's more! Recall that 1/3 simply declined the walk test. If those refusals were spread evenly, this says objective results from GET were based on only about 12 individuals. This is a far cry from the 640 touted in the news.
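For anyone who wants to check the arithmetic above, here is a rough sketch in Python. The only inputs are the figures already quoted (640 subjects, 4 arms, responder rates of 1 in 9 or 15%, 1/3 declining the walk test). The variable names are mine, and the assumption that walk-test refusals were spread evenly across responders is exactly that, an assumption.

```python
from fractions import Fraction

ENROLLED = 640   # subjects randomized, as quoted above
ARMS = 4         # number of study arms

per_arm = ENROLLED // ARMS  # 160 per arm

# Responder rates taken from the discussion above: 1 in 9 (about 11%) and 15%.
low_rate = Fraction(1, 9)
high_rate = Fraction(15, 100)

# ASSUMPTION: the 1/3 who declined the walk test were spread evenly,
# so 2/3 of responders in an arm have an objective walk-test result.
walk_fraction = Fraction(2, 3)

low_responders = per_arm * low_rate     # 160/9, i.e. 17 or 18 people
high_responders = per_arm * high_rate   # 24 people

low_with_walk = low_responders * walk_fraction    # a little under 12
high_with_walk = high_responders * walk_fraction  # 16

print(f"per arm: {per_arm}")
print(f"responders per arm: {float(low_responders):.1f} to {float(high_responders):.1f}")
print(f"with walk-test data: {float(low_with_walk):.1f} to {float(high_with_walk):.1f}")
```

Even at the generous end of Bob's range, that is still only about 16 people per arm with objective data.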
At this point it should be obvious that the effect of a small number of outliers is important. If only 3 individuals who were misdiagnosed were included, and fully recovered, this would be enough to account for a substantial change in mean value. Likewise, a small number of negative outliers could change results substantially, which bears on the question of possible harms. Excluding data from a few negative outliers could also have substantial effects on mean values.
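To put a rough size on the outlier effect: if k people out of a group of n each change by some amount d, and nobody else changes at all, the group mean moves by k*d/n. The sketch below is only that formula. The function name is mine, and the value 100 is a pure placeholder, not a figure from the trial.

```python
def mean_shift(n, k, d):
    """Change in the group mean when k of n people each change by d
    and the remaining n - k people do not change at all."""
    return k * d / n

# Group size of 12 and 3 outliers come from the argument above.
# ASSUMPTION: d = 100.0 is an arbitrary placeholder in arbitrary units,
# chosen only to make the proportion visible.
shift = mean_shift(n=12, k=3, d=100.0)
print(shift)  # a quarter of whatever each outlier changed by
```

With 3 outliers in a group of 12, a quarter of each outlier's change shows up directly in the mean, in either direction.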
Without seeing complete data sets we can't decide if any of the claims made were valid. In any case, it appears that the confidence projected by the authors, who started with 3158 patients referred, then whittled this down, ends up being based on a group which could fit in a modest-sized room. Absent considerable hype we wouldn't even be having this discussion.
Added: I don't want anyone to get into serious arguments defending these specific numbers; there are simply too many assumptions involved. What is important is the extent to which publicized results could depend on such a small group of patients that we could end up arguing about individual cases. This is not a valid basis for setting national policy.
A less obvious inference is that the somewhat mysterious process that reduced the initial intake from 3158 to 640, supplying 160 in each arm, assumes great importance if a handful of patients could skew the results. The assumption that these results apply to all such patients depends on unproven claims of diagnostic homogeneity. I'll say more about the consequences later.