Statistics watch: spotting 'sexy but unreliable' results

Simon

Senior Member
Messages
3,789
Location
Monmouth, UK
You've probably heard that something is amiss with life science research and that many scientists are beginning to question how robust the scientific literature really is, e.g. "Why Most Published Research Findings Are False".

One big reason that findings appear real but are false is the 'gold standard' of assessing research: p values. Usually, if p<0.05, the tested hypothesis is treated as true with 95% confidence. Hurrah!

Except that's not the case. Read on, as this is all based on sound maths, not some weird theory.

This recent blog explains how even with p<0.05, a hypothesis might still be more likely to be wrong than right, especially if that finding is surprising. Given that research journals like to publish unexpected, surprising findings, we should be worried:

Daniel Lakens: Prior probabilities and replicating 'surprising and unexpected' effects

Using Bayesian probability (see below re betting on horses), Lakens shows that if the new, tested hypothesis (your shiny new idea) and the null hypothesis (nothing doing) are equally likely before the study is done, then with a result of p=0.04 (success), the chance of the new hypothesis really being right is still only 73% (not 96%).

Let's say the result is of the surprising sort that journals like to trumpet. If it's surprising it must be unlikely, perhaps very unlikely, but let's say for simplicity that the new hypothesis is 25% likely (before the evidence from the experiment), while the null hypothesis is 75% likely. Obviously you can never compute an exact prior probability of being right, but we know it must be under 50% or the result isn't surprising.

In this scenario, if the new hypothesis is only 25% likely before the study, and the result is p=0.04 as before, then the probability the new hypothesis is really right is still only 49% (at best): slightly more likely to be wrong than right.
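(If you want to check numbers like these yourself, here is a minimal Python sketch of one way to get figures in this ballpark. It uses the Sellke-Bayarri-Berger upper bound on the Bayes factor implied by a p value; I'm assuming that is close to what Lakens does, so treat it as illustrative rather than a reproduction of his exact calculation.)

    from math import e, log

    def max_posterior_prob(p_value, prior_prob):
        """Rough upper bound on P(new hypothesis | data), using the
        Sellke-Bayarri-Berger bound on the Bayes factor:
        BF <= 1 / (-e * p * ln p), valid for p < 1/e."""
        max_bf = 1.0 / (-e * p_value * log(p_value))    # best-case evidence for the new hypothesis
        prior_odds = prior_prob / (1.0 - prior_prob)    # convert prior probability to odds
        posterior_odds = prior_odds * max_bf            # Bayes' rule in odds form
        return posterior_odds / (1.0 + posterior_odds)  # back to a probability

    print(max_posterior_prob(0.04, 0.50))   # ~0.74 - roughly the 73% quoted above
    print(max_posterior_prob(0.04, 0.25))   # ~0.49 - the 'surprising finding' case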

Read the blog to find out more, or read below for a dummy's guide (this dummy) to Bayesian probability.

Bayesian Probability
The good news is that although Bayesian probability sounds scary, it's not that hard to grasp, especially if you like to bet on horses. This is the basic idea:

Take the example of betting on a two-horse race, Dogmeat vs Fleetfoot. How would you bet if you knew that Fleetfoot had won 7 of its 12 previous races? Easy: put your money on Fleetfoot. But what if you knew that Dogmeat won 3 out of 4 races in the rain, and it's raining now? Based on past performance in the rain, Dogmeat has a 75% chance of winning, so if it's raining you should bet on Dogmeat, not Fleetfoot.
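(A toy version of that calculation, using nothing but the made-up race records from the example above.)

    # Made-up race records from the example above.
    fleetfoot_wins, fleetfoot_races = 7, 12       # all conditions
    dogmeat_rain_wins, dogmeat_rain_races = 3, 4  # rainy races only

    p_fleetfoot = fleetfoot_wins / fleetfoot_races                  # unconditional: ~0.58
    p_dogmeat_given_rain = dogmeat_rain_wins / dogmeat_rain_races   # conditional on rain: 0.75

    print(f"P(Fleetfoot wins)      = {p_fleetfoot:.2f}")
    print(f"P(Dogmeat wins | rain) = {p_dogmeat_given_rain:.2f}")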

That is why Bayesian probability matters: it looks at conditional probability, in this case who wins IF it's raining. In research, the question is how likely the finding is to be true, given its likelihood of being true before the experiment.

Of course, the precise probability of it being right before the study is unknown, but if the finding is surprising we know the probability of it being right is less than 50%. And if that's the case, a p=0.04 does NOT mean a 4% chance of being wrong/96% chance of being right.
 

biophile

Places I'd rather be.
Messages
8,977
Statistical Flaw Punctuates Brain Research in Elite Journals

By Gary Stix | March 27, 2014

http://blogs.scientificamerican.com...elite-journals/?WT.mc_id=SA_sharetool_Twitter

Neuroscientists need a statistics refresher. That is the message of a new analysis in Nature Neuroscience that shows that more than half of 314 articles on neuroscience in elite journals during an 18-month period failed to take adequate measures to ensure that statistically significant study results were not, in fact, erroneous. Consequently, at least some of the results from papers in journals like Nature, Science, Nature Neuroscience and Cell were likely to be false positives, even after going through the arduous peer-review gauntlet.
 

anciendaze

Senior Member
Messages
1,841
Hi Simon, this is quite true, as far as it goes. Ignoring Bayesian methods is also of considerable historical importance, as this was how R. A. Fisher managed to delay recognition of the risks of smoking by about 10 years. (How many people died as a result?) While Fisher is regarded as a major force in the development of modern statistics, his role as a hired gun for the tobacco industry is glossed over, as is his strong advocacy of eugenics.

Other methods were available at the time, and were used by Bradford Hill to trace causes of a number of environmental factors in disease which are now firmly accepted. Bayesian methods can be misused to force a desired conclusion, but that can also be said of any statistical procedure, if nobody checks. (One irony in the controversy is that the UK government knew very well that Bayesian methods worked, they had been used in breaking the Enigma codes during WWII, but that was highly classified. You can also find related techniques in physics under the name maximum entropy methods. These also have applications which were highly classified during the 1950s, and some probably remain secret to this day.)

All of the most common statistical measures seem to be tied to the idea of normal distributions. These assumptions are used in applications because they are convenient, not necessarily true. They are not even true for the textbook example of variation in adult heights fitting a bell curve, unless you treat males and females separately. I've tried to explain the pitfalls of assuming that the Central Limit Theorem will save you in practice, if you use a modicum of randomness in sampling, even if the original distribution you start with is far from normal. Mostly, I feel like I've failed.

The CLT is not a trivial piece of mathematics. The easy proofs rely on assumptions which are commonly falsified in practice. Even the difficult proofs have implicit assumptions which are violated more often than those appealing to the CLT want to believe. My main argument has been that we have strong evidence that health problems resulting in major departures below mean values combine multiplicatively to degrade normal function, whereas the CLT assumes additive combination. The underlying distributions should then be considered as Lévy distributions, even if you can make them look sort of normal by cutting off the tail.

The importance of this observation is that a Lévy distribution does not have any bounded standard deviation. Any value you assign will be the result of the bounds you choose and the number of samples. Any deductions based on assumptions about variation in such cases will be almost entirely artifacts of the procedure you use.
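(One quick way to see this numerically: simulate a heavy-tailed distribution and watch the sample standard deviation refuse to settle down as the sample grows. The sketch below uses a Cauchy distribution as a convenient stand-in for the Lévy-type distributions discussed here.)

    import numpy as np

    rng = np.random.default_rng(0)

    # For a normal distribution the sample SD converges as n grows.
    # For a Cauchy (a Levy-stable) distribution it never settles down,
    # because the theoretical standard deviation does not exist.
    for n in (100, 10_000, 1_000_000):
        normal_sd = np.std(rng.standard_normal(n))
        cauchy_sd = np.std(rng.standard_cauchy(n))
        print(f"n={n:>9}: normal SD ~ {normal_sd:6.2f}, Cauchy SD ~ {cauchy_sd:12.2f}")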

I got into this line of inquiry from examining the methods used in the PACE trial. I already knew that psychological research was full of seriously bad examples of misuse of statistical methods. In comparing techniques with those used in medical research I realized similar errors were practically endemic in medical research. If you took a vote, those misusing statistics to back their agendas (and maintain funding) would win. Good examples are rare.

I'm afraid the problem of fixing applications and interpretation of medical statistics resembles cleaning the Augean stables.

Even if an impressive p value is real, the leap from correlation to causation is particularly dangerous. This can be illustrated by correlations between ice cream sales at Coney Island and cholera epidemics in India.
 

Esther12

Senior Member
Messages
13,774
I had to like @anciendaze 's post from what I'd picked up... but half of it went right over my head.

I'm a bit sceptical of claims that we can say how likely particular outcomes are in advance of performing a particular test - in many instances those making money from claiming to be experts will be suffering from a degree of group think.

Most of my medical reading is in dodgy areas where maybe there's more room for problems, but in more rigorous areas shouldn't they then be better able to rely upon non-Bayesian analysis (along with the need for independent replication, etc.)?

I'm just instinctively distrustful of making more room for the judgements of researchers... but totally admit that I'm ill informed here, and that lots of people I respect write as if Bayesian analysis is clearly a good idea.

I was just thinking 'why would Bayesian analysis be good' - is it because people like having a seemingly systematic way to assess how likely it is that a result is accurate? But if it rests upon people's prior beliefs about what is likely to be true (which leaves room for inaccurate opinion), is it really better than just saying 'the results showed this, but I don't think that they're right'?

Also, I was reminded of this:

So they thought less bright children would be more likely to end up with CFS, and found:

We found an association between higher cognitive ability and self-reported CFS/ME, which was in contrast to our hypothesis. Since the association was only found when additionally adjusting for psychological symptoms and since it was rather small, we might consider this as a chance finding.

I wonder if they'd consider a similarly sized finding to be just chance if it had supported their hypothesis? I'm not sure I remember ever having read a paper where a significant finding in support of the researchers' hypothesis was suggested to be a matter of chance.

http://forums.phoenixrising.me/inde...ldhood-cognitive-ability-and-somatic-s.26753/
 

WillowJ

คภภเє ɠรค๓թєl
Messages
4,940
Location
WA, USA
I wish I'd been able to take more math, and that I remembered and could use the math I did take, so I could engage with this issue more comprehensively (I did understand some of it, but not as much as I wanted)...

but I have no trouble believing that many people writing, reviewing, and publishing papers have no idea how to use statistics, and that a lot of "established" thought is highly suspect.
 

anciendaze

Senior Member
Messages
1,841
I'll try to give an example of how the idea of normal distributions can be misused which will not require advanced mathematics. I'll use the textbook example where a normal distribution does turn up, distribution of adult heights.

First of all, the original data for this example came from medical examinations connected with military drafts. This meant each data set applied to healthy young adult males from a particular nation. It took a long time to produce similarly comprehensive data for young adult females because they were not subject to the draft. Even within a single nation there is a substantial difference between mean heights of adult males and females. (Humans exhibit sexual dimorphism.) This means we have two distributions with different means and standard deviations. It takes four numbers to describe this situation.

What happens when you ignore gender and combine these two into a single distribution with one mean and variation? You lose important information about the data. A normal distribution can be completely characterized by two numbers. All that is left is a variable number of sample points, plus random variation consistent with the distribution.

This immediately raises questions of interpretation, before we make further use of the data. We might say the mean height of an adult male or female describes an average male or female. What does the mean height of the combined distribution reflect, the height of a typical hermaphrodite?

When it comes to variation or standard deviation we are in even murkier waters. Variation in height within human males and within females is fairly similar, but the variance of the combined distribution (the mean squared deviation from its single overall mean) is much larger, because the gap between the male and female means gets counted as 'variation' too. Once you get beyond interpretation of mean values, all common statistical tools involve analysis of variance. If you have muddled the data by indiscriminately mixing two different samples, none of this analysis will mean much.
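(A quick numerical sketch of that inflation, using illustrative figures only: roughly 178 cm and 165 cm for mean male and female heights, with a standard deviation of about 7 cm in each group.)

    import numpy as np

    rng = np.random.default_rng(1)

    # Illustrative figures only: two normal distributions with similar spread
    # but different means, then pooled into a single sample.
    men    = rng.normal(loc=178, scale=7, size=50_000)   # heights in cm
    women  = rng.normal(loc=165, scale=7, size=50_000)
    pooled = np.concatenate([men, women])

    print(f"SD men:    {men.std():.1f} cm")      # ~7
    print(f"SD women:  {women.std():.1f} cm")    # ~7
    print(f"SD pooled: {pooled.std():.1f} cm")   # ~9.5 - inflated by the gap between the means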

A second caveat shows up in the restriction of original data to young healthy adults, which is then used as a basis to analyze data from older sick adults. If you are studying, say, osteoporosis, you will find serious departures from your baseline statistics of heights. It is no longer enough to consider variation in heights as random when there are systematic changes apparent. You need to analyze those systematic changes before you can conclude that you are dealing with pure random variation in what remains. Even if you find some randomness in variation, you must still check to see if it conforms to ideas about being normally distributed. Frequently, this is simply glossed over because it is not convenient.

Pathological processes are admittedly complicated. Many are progressive, like the cumulative number of faults in complex machines operating for long periods of time without repair. The existence of one pathology makes the conditional probability of others much higher. (Consider MS and breast cancer.) There is enormous disparity in the way pathologies are distributed between individuals. This does not fit the hypothesis of a typical simple random process producing a normal distribution.

If an illness is far more prevalent in one group, like breast cancer in females, it is probably a bad idea to lump data together. You need to compare females with and without the disease. You may find useful information about males by comparing the few who get the disease with males who do not. What you absolutely, positively must not do is dilute data on the disease by including males who seldom get the disease on the same basis as females. This is how to make a disease disappear, or reinforce the bias that one gender is superior.

(We commonly accept gender differences in sports requiring strength or endurance by having separate competitions for men and women. This understanding has not penetrated to the study of illnesses characterized by patient reports of exercise intolerance. Exhorting women to behave more like men is unlikely to produce changes in gender. )

Perversely, efforts to improve numerical measures of significance frequently take exactly this approach, lumping distinct cohorts. To get better numbers you drop distinctions between different sample groups so you can put them all in one large group. The resulting cohort may not characterize anything more than the subjects who happened to be available to researchers. Honest researchers are at a disadvantage here, because their scrupulous restriction of claims convinces people like me that it is a waste of time even to read how they tested a hypothesis which was meaningless to begin with. Snake oil and spin control can produce more impressive claims.

Implicit assumptions about etiology can confound a study in other ways. For a long time in the study of mental illness it was impossible to consider male hysteria, because males did not have the required organ (Greek hustera = womb). If males had a mental problem at all, it must have been something else. This would obviously introduce a tremendous reporting bias which would make nonsense of statistical conclusions.

Statistics may be numerical facts, but their interpretation is important, and frequently quite slippery. (R. W. Hamming: "The purpose of computing is insight, not numbers.") More studies claim results which are inadequately supported by data than there are studies with solid backing for claims. I tend to regard many studies not as objective science about the subjects, but as psychological projective tests applied to researchers. Give them random data, like a Rorschach test, and see how they interpret it.

Few studies are playing with a deck stacked as thoroughly as the PACE study was, but there is the popular fiction among researchers that a few sprinkles of randomness on top of a tall layer cake of systematic bias will absolve your statistical sins. They might do better to use holy water, but this would require confession.
 

Esther12

Senior Member
Messages
13,774
Okay, I'm only semi-following this, but is any human trait really normally distributed? Given how inter-related many of the things that determine human health are, the Central Limit Theorem wouldn't seem to apply to much. And given that there are likely to be all manner of human subsets that are more normally distributed than the lumped-together whole, is combining sexes really that much worse than combining people of different races (accepting the difficulty of 'races')? Economic classes (in some societies)? Etc.?

Lots of statistics seems to me to be a bit of a botch, but that's okay-ish so long as you remember that it is just a bit of a botch, keep engaging critically with the evidence, and don't start pretending stats are more valuable than they are.
 

anciendaze

Senior Member
Messages
1,841
Esther, we are dealing with models rather than raw reality, and some models are more appropriate for some uses than others. Any time you declare something "random" you are making a statement about your own ability to predict what will happen next. In terms of underlying reality it is important to realize random means "we don't know" what is going on in the kind of detail commonly expected. It may not even be possible to know.

Those models where normal distributions work best are the ones where nobody has any idea how to control the process, like spontaneous nuclear decays. It also helps if you have really large numbers of literally identical particles. Human beings don't approximate this very well, but that is what biomedical statistics must work with.

Done properly statistical inference is a very clever means of teasing out patterns from data that would otherwise tell you nothing. Any particular answer may only be yes or no. You are playing twenty questions with the Universe, and advancing one tiny step at a time, if everything is done right. This kind of honest humility does not bring in funding.
 

alex3619

Senior Member
Messages
13,810
Location
Logan, Queensland, Australia
Models of reality are approximate at best. Even three bodies in space are not accurately accounted for by modelling - the classic three-body problem. There is an old and good statement from General Systems Theory: "The map is not the territory." All models are inherently abstractions - reductionist simplifications.

There are some caveats though. A good scale model using real conditions for testing is something different. Even then though it has limitations.

I am very concerned about the misuse of statistical methods. They are an indicator, but they do not substitute for reason, or testing the conclusions. By testing, I mean testing, not running simplistic confirmatory studies.

Even something like male and female heights has many confounds, including diet, disease, exposure to sunlight, etc. Statistical analysis tells you about possibilities, not certainties.
 

anciendaze

Senior Member
Messages
1,841
Forgive me for taking exception Alex, but the three-body problem in Newtonian physics is a very accurate model. I wouldn't make this statement just to annoy you. There is an important point involved about models and emergent properties. To avoid differential equations and other confusing factors, I would mention Langton's ant as an example where you can't predict emergent behavior even if you have all the rules and initial data down cold. Before you run the simulation you are as uncertain of the outcome as everyone else.
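(For anyone who wants to poke at this themselves, here is a minimal sketch of Langton's ant in Python, using the standard two-colour rules. It doesn't draw anything, but the complete rule set really is just these few lines; from an empty grid the ant wanders chaotically for roughly 10,000 steps before settling into building the repeating 'highway'.)

    from collections import defaultdict

    # Langton's ant: on a white cell turn right, on a black cell turn left,
    # flip the cell's colour, then step forward one cell.
    def langtons_ant(steps):
        grid = defaultdict(bool)       # False = white, True = black
        x = y = 0
        dx, dy = 0, -1                 # start facing 'up'
        for _ in range(steps):
            if grid[(x, y)]:           # black cell: turn left
                dx, dy = dy, -dx
            else:                      # white cell: turn right
                dx, dy = -dy, dx
            grid[(x, y)] = not grid[(x, y)]
            x, y = x + dx, y + dy
        return grid

    grid = langtons_ant(11_000)
    print(f"black cells after 11,000 steps: {sum(grid.values())}")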

While the Newtonian three-body problem is not a perfect description of gravitational interactions it is good enough to exhibit emergent behavior you did not anticipate. This is not a defect in the model, but in the way we expect models to answer questions. It took centuries to show that the question of stability in Newton's clockwork model of the Solar System is best answered by "maybe". Competing teams led by Jack Wisdom and Jacques Laskar have used the best possible computational models available at the time to push predictions of planetary motion far into the future. Go far enough and not only can you not predict which side of the Sun a planet will be on, you can't even predict that it will stay in orbit around the Sun. For Mercury the crisis where it will interact with Venus in ways we don't have the ability to predict may lie "only" 100 million years in the future.

Both predictions I'm talking about above are the result of strict determinism. This illustrates the subjective nature of judgments on "randomness". This is another red herring which appears in arguments about evolution and religion. If you can stand having your mind warped you might look at papers like Is Pi normal? or this one. Even the totally deterministic decimal digits of Pi pass all tests we can devise for randomness, except those using the knowledge that these are digits of Pi. If you could hide the source of your random numbers you could run a string of casinos using them. (Other parts of mathematics which could supply examples of apparent randomness include number theory and ergodic theory.) Just because we are presently unable to predict future behavior is no guarantee there is no systematic underlying process. In fact a great deal of scientific research would be pointless if we assumed there was no such process. Both religious and scientific fundamentalists have some kind of unacknowledged agreement that such underlying processes exist, but they disagree violently about what to call them.

Philosophers often confuse this situation with a breakdown in causality. The problem is not causality itself, but what we expect from it. There are physical behaviors which currently seem completely beyond a causal interpretation, but that is a very different question. Even if you could gather all the initial data down to the scale of atoms, and compute outcomes with unlimited precision, there are situations where moving an atom one micron would cause a planet to collide with another far in the future, or to be thrown out of the Solar System. In the context above it might tell you that Mercury would be captured as a new moon of Venus, or collide with it, or one or both would disappear into the depths of space. It is also possible the system would return to approximately periodic behavior until another crisis developed even further in the future. This kind of answer sounds more like the prediction of an unreliable oracle from mythology. It is not what we want, but what our tools tell us. Some predictions are extremely reliable, robust and easy to understand. This is not always true.

I apologize to other readers who find this confusing. I only brought this into discussion to illustrate that the word "random" is not so much an answer as an admission of ignorance.
 

alex3619

Senior Member
Messages
13,810
Location
Logan, Queensland, Australia
I don't disagree with anything you just said, @anciendaze. Long term predictions require not only absolute precision about every entity involved, but perfect calculation of interactions. Maybe I am missing something in your argument?

I think precise prediction of a complex phenomenon is impossible if it's large enough or taken far enough ahead. That doesn't mean that regularities cannot be found, or that a good understanding of specific processes cannot be attained. It's about the impact of very tiny amounts of uncertainty and imprecision, and the uncertainty just grows over time. It's about the impact of things totally unaccounted for in a model, the unknown. The worst kind of unknown is often the unknown we don't have any clue exists... the unknown unknown.

It's also about the interaction of complex things with complex dynamics. It's not always computationally tractable.

The problem is very often not the model itself. It's the translation of reality to the model, and the translation of the model back to reality. Both of these things are often problematic. This is even the case with formal logic. Indeed, I am coming to the idea that logic itself is fundamentally flawed, at least as it operates in the human brain. Abstract mathematics and real-world messiness are not a perfect mix.

Models are often good within defined boundaries, but it's not usually the case that actions in the real world have such defined boundaries.

Long-term modelling in very complex systems is only a guide. It gives you insight into how things might react. The further off in time, and the more factors involved, the more uncertainty there will be. Further, when a system is modelled there are limits placed on the modelling, a necessity to keep models tractable, especially computationally tractable. Things from outside those limits can completely destroy the results from the model. The chance of this happening grows over time.

I think there is good reason that serious science won't even consider a study using a p value of 0.05.

In any case, any statistical analysis has to be examined, and not just accepted based on the crunching of numbers. Excessive reliance on mathematics, and often poor use of mathematics, is a dumbing down. I suspect most serious scientists would not make the kinds of mistakes found in the PACE trial.
 

anciendaze

Senior Member
Messages
1,841
Alex, there's a really subtle point here. The Langton's ant model has both perfect information and potentially perfect computation to any finite number of steps. Imprecision is not a limitation. We still can't say that it will always settle down to building that "highway" when we start from a configuration we have not previously simulated. Even if it represents a really perfect model of some reality we are faced with our own limitations in assimilating what it tells us. We can't handle a zillion* different answers about a zillion different configurations.

This particular model has been proved "universal", equivalent to a Universal Turing Machine. This means that logical paradoxes known in relation to Turing and Gödel can arise as a result of its operation. I will remind readers that Gödel proved that if any formal mathematical system beyond counting and simple logic is consistent, we cannot prove this, and that there must be true propositions which can be stated within that system which cannot be proved, (incompleteness).

Even in the most rarefied heights of mathematics we must always do some trial and error testing to make sure the tools we are using have what attorneys call "fitness for a particular purpose." This is even more true when mathematics is applied to important practical problems. You should always suspect people who insist this is not necessary, just as you suspect people who get huffy when you ask to cut cards while playing poker.

I could make arguments that any mathematics involving continuous quantities like real numbers opens up even more cans of worms, but will not. What I can show via symbolic dynamics is that continuous mathematics must be at least as subject to inconsistency and paradox as discrete mathematics involving whole numbers and logic. This is because you can build models of finite state machines with unlimited storage within continuous dynamics that are equivalent to Turing machines.

There are any number of mathematical deductions I would not dream of challenging, because I have very good reason to believe they will hold up better than run-of-the-mill facts I encounter. There are any number of applications of mathematical models I will challenge, if they have not been carefully tested against physical reality. (I don't challenge those which have stood up to tests any more than I try skydiving without a parachute.) When you are dealing with matters that may be fundamentally impossible to test this way, like conjectures about mental states, I fall back to a more pragmatic approach based on demonstrated utility.

Utility in maintaining a particular social hierarchy does not count.

----
* technical term meaning a finite number so large I don't even want to name it.
 

Simon

Senior Member
Messages
3,789
Location
Monmouth, UK
I'm a bit sceptical of claims that we can say how likely particular outcomes are in advance of performing a particular test - in many instances those making money from claiming to be experts will be suffering from a degree of group think.

Most of my medical reading is in dodgy areas where maybe there's more room for problems, but in more rigorous areas shouldn't they then be better able to rely upon non-Bayesian analysis (along with the need for independent replication, etc.)?
The short answer is 'no': Bayesian approaches have nothing to do with the specific area being studied.

I completely understand why people are sceptical of a method that relies on estimating 'prior probability', as it introduces a subjective element into what appears to be an objective statistical method of just p values and confidence levels.

Except that the 'objective' statistical method can be wildly misleading while people assume it's robust, which is probably a worse situation.

Btw, the blog I first quoted agrees replication is essential; it just provides a technique to spot where positive findings are particularly likely to be wrong:
Lakens said:
This is why Lakens & Evers (2014, p. 284) stress that “When designing studies that examine an a priori unlikely hypothesis, power is even more important: Studies need large sample sizes, and significant findings should be followed by close replications.”

I really, really am not the right person to try to explain Bayesian probability, but here's another attempt from me, with the help of this:
An Intuitive (and Short) Explanation of Bayes’ Theorem | BetterExplained

This is the central point
False positives skew results. Suppose you are searching for something really rare (1 in a million). Even with a good test, it’s likely that a positive result is really a false positive on somebody in the 999,999.

Normal hypothesis testing tells you:
"the probability of the data, given the (null) hypothesis"
So if p<0.05, then the null hypothesis is rejected at the 95% confidence level.

Except...
This isn't what you want to know. What everyone really wants to know is
"the probability of the hypothesis, given the data"
That's what most people assume the experiment yields, but that isn't correct. And that point is what everything hangs on, so you may need to read the full blog for a better explanation than I can give. However:

The blog goes on to illustrate Bayes with the example of using mammograms to test for breast cancer:
- The test is 80% accurate at picking up cancers, missing 20% of them
- It also gives a false positive rate of 9.6%, i.e. 96 out of 1,000 women without breast cancer will get a positive test
- and only 1% of women have breast cancer.

What's the chance that a positive test for breast cancer actually means you have cancer?
80%?
No, it's actually 7.8%
That's because most positive tests come from the large number of women who don't have cancer. See the blog for the tedious maths.
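(The 'tedious maths' is actually short enough to sketch, using the same numbers as above.)

    # Numbers from the mammogram example above.
    prevalence     = 0.01    # 1% of women have breast cancer
    sensitivity    = 0.80    # the test catches 80% of real cancers
    false_pos_rate = 0.096   # 9.6% of women without cancer still test positive

    true_positives  = prevalence * sensitivity            # 0.008
    false_positives = (1 - prevalence) * false_pos_rate   # 0.09504

    p_cancer_given_positive = true_positives / (true_positives + false_positives)
    print(f"P(cancer | positive test) = {p_cancer_given_positive:.1%}")   # ~7.8%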

Taking this back to the original point about positive experimental results, this example illustrates how prior probability (in this case the 1% prior probability of breast cancer) has a big impact on interpreting the result. Less than 10% of positives are real.

And although we can never know prior probabilities exactly, this shows how big an impact they have. Assuming a surprising/unexpected result has a prior probability of 25% (which seems a fair assumption), only about half of positive results will be true.

Just to show Bayesian maths does check out in the real world, remember the Air France jet that went down in the Atlantic and took years to find? Bayesian analysis found it:

BBC News - MH370 Malaysia plane: How maths helped find an earlier crash:
Statisticians helped locate an Air France plane in 2011 which was missing for two years. Could mathematical techniques inspired by an 18th Century Presbyterian minister be used to locate the mysterious disappearance of Malaysia Airlines Flight MH370?


Esther12 said:
I'm just instinctively distrustful of making more room for the judgements of researchers... but totally admit that I'm ill informed here, and that lots of people I respect write as if Bayesian analysis is clearly a good idea.

I was just thinking 'why would Bayesian analysis be good' - is it because people like having a seemingly systematic way to assess how likely it is that a result is accurate? But if it rests upon people's prior beliefs about what is likely to be true (which leaves room for inaccurate opinion), is it really better than just saying 'the results showed this, but I don't think that they're right'?
As above, I think Bayesian analysis gives you a helpful guide as to how and why something might not be right. It will never be able to give an absolute answer, but it can be a helpful tool in evaluating a result beyond saying "I don't think that they are right".

But that probably wasn't the most convincing explanation of Bayes that the world has ever seen :whistle:
 

Esther12

Senior Member
Messages
13,774
Ta Simon. The cancer example made me realise I don't really know what non-Bayesian analysis is. I think I assumed all statistical approaches to answering that sort of question would take account of probabilities, and that Bayesian analysis came into play more when there was less data on the probabilities (if you see what I mean... I should duck out now).

re the discussion on randomness: I have to admit that I'm not really following the practical importance of some of this stuff... so I might as well ask about quantum mechanics and randomness. I used to have a better understanding of this, but have forgotten most of the reading I did: with randomness in quantum mechanics, researchers say that it is 'real' randomness, and not a result of us not knowing/being able to detect something... does anyone know how they know this/why they think they know this? A bit OT, but it had just been bugging me last week, and it seemed a bit relevant to the discussion here, so I thought I'd ask.

Also, for people a bit interested in the philosophy of mathematics, but without having much of a maths brain, I enjoyed 'The art of the infinite' by Robert Kaplan. I think it was meant to be a 'brief history of time' for maths, and it wasn't too hard to follow (other than a few bits that I never really got). (I have now forgotten most of it though).
 

alex3619

Senior Member
Messages
13,810
Location
Logan, Queensland, Australia
Alex, there's a really subtle point here. The Langton's ant model has both perfect information and potentially perfect computation to any finite number of steps. Imprecision is not a limitation. We still can't say that it will always settle down to building that "highway" when we start from a configuration we have not previously simulated. Even if it represents a really perfect model of some reality we are faced with our own limitations in assimilating what it tells us. We can't handle a zillion* different answers about a zillion different configurations.

This particular model has been proved "universal", equivalent to a Universal Turing Machine. This means that logical paradoxes known in relation to Turing and Gödel can arise as a result of its operation. I will remind readers that Gödel proved that if any formal mathematical system beyond counting and simple logic is consistent, we cannot prove this, and that there must be true propositions which can be stated within that system which cannot be proved, (incompleteness).

Even in the most rarefied heights of mathematics we must always do some trial and error testing to make sure the tools we are using have what attorneys call "fitness for a particular purpose." This is even more true when mathematics is applied to important practical problems. You should always suspect people who insist this is not necessary, just as you suspect people who get huffy when you ask to cut cards while playing poker.

I could make arguments that any mathematics involving continuous quantities like real numbers opens up even more cans of worms, but will not. What I can show via symbolic dynamics is that continuous mathematics must be at least as subject to inconsistency and paradox as discrete mathematics involving whole numbers and logic. This is because you can build models of finite state machines with unlimited storage within continuous dynamics that are equivalent to Turing machines.

Universal Turing Machines would be subject to the same limitations I described. It doesn't matter that one in theory has universal applicability with respect to computation. What matters are the limitations of computation and mathematical systems.

That something is potentially a Universal Turing Machine does not alter the limitations of modelling in the universe.

Some things can be fairly reliably modeled. Some things can't. Only in highly artificial systems can there be certainty. In the real world the outcomes are subject to variation, and the degree of this is likely to increase over time.

It's fair enough to bring a model back to reality and compare its results to results in the real world. This gives you an indication of the model's utility, that is, how reliable it is when used. Sadly it's only an indication.

Imprecision creeps in at the initial data gathering stage. Let us suppose we are talking climate modelling. A system that measures data every square kilometer is obviously more useful than one that measures every ten square kilometers. Yet changes can occur at very small scales that affect the whole. Weather is also not flat. So now we are talking cubic kilometers. Why not cubic centimeters? Measuring at small scale becomes harder and harder to do, and less cost-efficient, and runs into interference from the measuring itself depending on the method.

When imprecision is present, then the eventual outcome after long computation can be very different with tiny changes in the initial settings. This is the butterfly effect.

Similar problems occur with measuring tools. A large old thermometer is less reliable than a digital one designed to measure tiny temperature changes. Yet no matter how precise the method, variations will occur right down to the subatomic level.

Even if you had perfect equations and the greatest computer ever made, such changes can create a divergence of outcomes from minuscule changes in input.
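(A tiny illustration of that kind of divergence, using the logistic map rather than a climate model; it's just the standard textbook example of sensitivity to initial conditions.)

    # Two trajectories of the chaotic logistic map x -> r*x*(1-x),
    # started a hair's breadth apart, end up completely uncorrelated.
    r = 4.0
    x1, x2 = 0.2, 0.2 + 1e-10

    for step in range(1, 61):
        x1 = r * x1 * (1 - x1)
        x2 = r * x2 * (1 - x2)
        if step % 20 == 0:
            print(f"step {step}: x1={x1:.6f}  x2={x2:.6f}  gap={abs(x1 - x2):.6f}")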

Yet let's go back before all this, to the design stage. How is it designed? For what purpose? Even in engineering there can be huge problems, and these cannot be modelled, though tolerance can be built into things to allow for some degree of variation in the real world. Suppose you rely on a critical component in a device. You test that device extensively, using whatever method you like. It meets minimum specifications. Then you do post-production testing on samples to make sure those made meet minimum specifications.

But can you ever be sure? You can reduce risk with these methods, but never eliminate it. Worse, such production is subject to political, economic and (the real kicker) human influence. Some devices may not be even close to perfect. They may be sold. They may be used. People may die. Only then do we go back and say what is wrong, hopefully.

Such limitations are outside of the component testing parameters. Yet they are important.

My point is there are always limitations to modelling. We need to be aware of that, and things can be done to mitigate it. I see no reason that statistics is any different, except in highly artificial situations that may not even exist in the real world. What is measured, and the reliability of its measuring, and the nature of the system being measured, can lead to bias in results. So can the human element, from simple human error to pervasive entrenched cognitive dissonance.
 

anciendaze

Senior Member
Messages
1,841
Simon and Esther: There is a problem with terminology coming in here: objective vs. subjective. The standard "objective" interpretation of probability in terms of frequency gets downright weird, but you may need a graduate course in the foundations of probability to understand how weird. In effect they are talking about experiments run in many different parallel universes which only interact by way of your observation of results, which in no way modifies anything else. This avoids the pesky problem of early runs of an experiment modifying the environment in which later runs take place.

You can't even say the number of parallel universes involved is simply infinite. The mathematical basis for most common distributions requires a larger infinity than the infinity of counting numbers. One universe for every point on a line is more like it. How anyone is able to observe such results and accumulate measures of probability is a problem. It becomes more and more obvious that "objective" probability is only possible within the heads of the people imagining this.

"Subjective" probability depends on the fact that we must all operate on incomplete information. When our access to information changes, so does our estimate of probability. When all information is available we no longer talk about probability. A result is either true or false.
 

alex3619

Senior Member
Messages
13,810
Location
Logan, Queensland, Australia
"Subjective" probability depends on the fact that we must all operate on incomplete information. When our access to information changes, so does our estimate of probability. When all information is available we no longer talk about probability. A result is either true or false.

I agree with this, but would add one caveat. People are really, really bad at estimating probability. We can use math, including statistics, to do so, but our intuition about probability is not good. Even probability and statistics experts fail when tested on basic estimates of probability. With the math they are fine; asked to give an intuitive answer, they often fail.
 

Esther12

Senior Member
Messages
13,774
Thanks for the discussion all... I think it's reached the point where I'm going to have to either do a lot of reading to understand it all, or duck out.
 

anciendaze

Senior Member
Messages
1,841
Don't worry Alex, the "subjective" interpretation of probability is based on a typical 18th-century concept, "the rational man", not actual human beings. (Notice that they had not got around to considering the possibility of rational women.) The idea was that this imaginary creature would look at data presented during an argument and weigh the information to reach a solid conclusion about where to place a bet, and Bayes' original thoughts were about bets involving your immortal soul, as in Pascal's wager.

(Bishop Berkeley also gets his innings here. See the subtitle of The Analyst. That book even has some valid criticisms of mathematical rigor in calculus mixed in with a lot of other material. Berkeley was less successful in dealing with the fact that, imperfect as it was, calculus had quite a track record in providing correct answers to problems previously considered intractable.)

This leaves us perplexed by terminology. The "objective" interpretation based on frequency regularly uses information which can't be derived from actual observation of a single objective Universe. The "subjective" interpretation is based on the opinions of rational creatures very different from anyone we actually know.

(If you disagree with that last statement, try telling your spouse about an amusing encounter with a member of the opposite sex which took place before you were married. When you recover, reflect that this individual may be the one person you thought you knew best.)