Bayesianism in science

Bayesians are so prevalent in Artificial Intelligence (and, to be honest, so strident) that it can sometimes be lonely being a Frequentist.   So it is nice to see a critical review of Nate Silver’s new book on prediction from a frequentist perspective.   The reviewers are Gary Marcus and Ernest Davis from New York University, and here are some paras from their review in The New Yorker:

Silver’s one misstep comes in his advocacy of an approach known as Bayesian inference. According to Silver’s excited introduction,
Bayes’ theorem is nominally a mathematical formula. But it is really much more than that. It implies that we must think differently about our ideas.
Lost until Chapter 8 is the fact that the approach Silver lobbies for is hardly an innovation; instead (as he ultimately acknowledges), it is built around a two-hundred-fifty-year-old theorem that is usually taught in the first weeks of college probability courses. More than that, as valuable as the approach is, most statisticians see it as only a partial solution to a very large problem.
A Bayesian approach is particularly useful when predicting outcome probabilities in cases where one has strong prior knowledge of a situation. Suppose, for instance (borrowing an old example that Silver revives), that a woman in her forties goes for a mammogram and receives bad news: a “positive” mammogram. However, since not every positive result is real, what is the probability that she actually has breast cancer? To calculate this, we need to know four numbers. The fraction of women in their forties who have breast cancer is 0.014, which is about one in seventy. The fraction who do not have breast cancer is therefore 1 – 0.014 = 0.986. These fractions are known as the prior probabilities. The probability that a woman who has breast cancer will get a positive result on a mammogram is 0.75. The probability that a woman who does not have breast cancer will get a false positive on a mammogram is 0.1. These are known as the conditional probabilities. Applying Bayes’s theorem, we can conclude that, among women who get a positive result, the fraction who actually have breast cancer is (0.014 x 0.75) / ((0.014 x 0.75) + (0.986 x 0.1)) = 0.1, approximately. That is, once we have seen the test result, the chance is about ninety per cent that it is a false positive. In this instance, Bayes’s theorem is the perfect tool for the job.
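[To check the reviewers’ arithmetic, here is a minimal sketch in Python; the numbers are theirs, the function name is mine.]

```python
def posterior_given_positive(prior, sensitivity, false_positive_rate):
    """Bayes' theorem: probability of cancer given a positive mammogram."""
    true_pos = prior * sensitivity                    # has cancer AND tests positive
    false_pos = (1.0 - prior) * false_positive_rate   # no cancer AND tests positive
    return true_pos / (true_pos + false_pos)

# The reviewers' numbers for a woman in her forties:
p = posterior_given_positive(prior=0.014, sensitivity=0.75, false_positive_rate=0.1)
print(round(p, 3))        # ~0.096: roughly a one-in-ten chance the cancer is real
print(round(1 - p, 3))    # ~0.904: about a ninety per cent chance of a false alarm
```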
This technique can be extended to all kinds of other applications. In one of the best chapters in the book, Silver gives a step-by-step description of the use of probabilistic reasoning in placing bets while playing a hand of Texas Hold ’em, taking into account the probabilities on the cards that have been dealt and that will be dealt; the information about opponents’ hands that you can glean from the bets they have placed; and your general judgment of what kind of players they are (aggressive, cautious, stupid, etc.).
But the Bayesian approach is much less helpful when there is no consensus about what the prior probabilities should be. For example, in a notorious series of experiments, Stanley Milgram showed that many people would torture a victim if they were told that it was for the good of science. Before these experiments were carried out, should these results have been assigned a low prior (because no one would suppose that they themselves would do this) or a high prior (because we know that people accept authority)? In actual practice, the method of evaluation most scientists use most of the time is a variant of a technique proposed by the statistician Ronald Fisher in the early 1900s. Roughly speaking, in this approach, a hypothesis is considered validated by data only if the data pass a test that would be failed ninety-five or ninety-nine per cent of the time if the data were generated randomly. The advantage of Fisher’s approach (which is by no means perfect) is that to some degree it sidesteps the problem of estimating priors where no sufficient advance information exists. In the vast majority of scientific papers, Fisher’s statistics (and more sophisticated statistics in that tradition) are used.
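[For concreteness, a rough sketch of the kind of test the reviewers describe: a result is taken seriously only if data this extreme would arise less than five (or one) per cent of the time under a chance-only model. The coin-guessing scenario and its numbers below are invented purely for illustration.]

```python
from math import comb

def binomial_tail_p(successes, trials, p_null=0.5):
    """One-sided p-value: chance of at least `successes` hits in `trials`
    attempts if each attempt succeeds with probability p_null (pure chance)."""
    return sum(comb(trials, k) * p_null**k * (1 - p_null)**(trials - k)
               for k in range(successes, trials + 1))

# Invented example: a subject calls 60 of 100 coin flips correctly.
p_value = binomial_tail_p(60, 100)
print(round(p_value, 3))   # ~0.028: rarer than 5% under pure chance, so the result
                           # clears Fisher's 95% hurdle (but not the stricter 99% one)
```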
Unfortunately, Silver’s discussion of alternatives to the Bayesian approach is dismissive, incomplete, and misleading. In some cases, Silver tends to attribute successful reasoning to the use of Bayesian methods without any evidence that those particular analyses were actually performed in Bayesian fashion. For instance, he writes about Bob Voulgaris, a basketball gambler,
Bob’s money is on Bayes too. He does not literally apply Bayes’ theorem every time he makes a prediction. But his practice of testing statistical data in the context of hypotheses and beliefs derived from his basketball knowledge is very Bayesian, as is his comfort with accepting probabilistic answers to his questions. 
But, judging from the description in the previous thirty pages, Voulgaris follows instinct, not fancy Bayesian math. Here, Silver seems to be using “Bayesian” not to mean the use of Bayes’s theorem but, rather, the general strategy of combining many different kinds of information.
To take another example, Silver discusses at length an important and troubling paper by John Ioannidis, “Why Most Published Research Findings Are False,” and leaves the reader with the impression that the problems that Ioannidis raises can be solved if statisticians use a Bayesian approach rather than following Fisher. Silver writes:
[Fisher’s classical] methods discourage the researcher from considering the underlying context or plausibility of his hypothesis, something that the Bayesian method demands in the form of a prior probability. Thus, you will see apparently serious papers published on how toads can predict earthquakes… which apply frequentist tests to produce “statistically significant” but manifestly ridiculous findings. 
But NASA’s 2011 study of toads was actually important and useful, not some “manifestly ridiculous” finding plucked from thin air. It was a thoughtful analysis of groundwater chemistry that began with a combination of naturalistic observation (a group of toads had abandoned a lake in Italy near the epicenter of an earthquake that happened a few days later) and theory (about ionospheric disturbance and water composition).
The real reason that too many published studies are false is not because lots of people are testing ridiculous things, which rarely happens in the top scientific journals; it’s because in any given year, drug companies and medical schools perform thousands of experiments. In any study, there is some small chance of a false positive; if you do a lot of experiments, you will eventually get a lot of false positive results (even putting aside self-deception, biases toward reporting positive results, and outright fraud)—as Silver himself actually explains two pages earlier. Switching to a Bayesian method of evaluating statistics will not fix the underlying problems; cleaning up science requires changes to the way in which scientific research is done and evaluated, not just a new formula.
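[The reviewers’ point about sheer volume is easy to see in a back-of-the-envelope simulation; the figures below are invented for illustration, not a model of any actual literature.]

```python
import random

random.seed(0)
ALPHA = 0.05             # conventional significance threshold
N_EXPERIMENTS = 10_000   # invented: experiments run in a year, all testing false hypotheses

# Under the null hypothesis a p-value is uniform on [0, 1], so each experiment
# has an ALPHA chance of producing a spurious "significant" result.
false_positives = sum(random.random() < ALPHA for _ in range(N_EXPERIMENTS))
print(false_positives)   # roughly 500 publishable-looking findings from nothing at all
```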
It is perfectly reasonable for Silver to prefer the Bayesian approach—the field has remained split for nearly a century, with each side having its own arguments, innovations, and work-arounds—but the case for preferring Bayes to Fisher is far weaker than Silver lets on, and there is no reason whatsoever to think that a Bayesian approach is a “think differently” revolution. “The Signal and the Noise” is a terrific book, with much to admire. But it will take a lot more than Bayes’s very useful theorem to solve the many challenges in the world of applied statistics.” [Links in original]

Also worth adding here that there is a very good reason experimental sciences adopted Frequentist approaches (what the reviewers call Fisher’s methods) in journal publications.  That reason is that science is intended to be a search for objective truth using objective methods.  Experiments are – or should be – replicable  by anyone.   How can subjective methods play any role in such an enterprise?  Why should the  journal Nature or any of its readers care what the prior probabilities of the experimenters were before an experiment?    If these prior probabilities make a difference to the posterior (post-experiment) probabilities, then this is the insertion of a purely subjective element into something that should be objective and replicable. And if the actual numeric values of the prior probabilities don’t matter to the posterior probabilities (as some Bayesian theorems would suggest), then why does the methodology include them?  
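To make this concern concrete, here is a small illustrative sketch (my own example, not anything from the review or the book): two experimenters holding different priors report noticeably different posteriors from the same small data set, and the difference only washes out once the data swamp the priors.

```python
def beta_posterior_mean(prior_a, prior_b, successes, failures):
    """Posterior mean when a Beta(prior_a, prior_b) prior is updated on binomial data."""
    return (prior_a + successes) / (prior_a + prior_b + successes + failures)

# Same small experiment, two different (invented) priors:
small = dict(successes=7, failures=3)
print(round(beta_posterior_mean(1, 1, **small), 2))   # uniform, "know-nothing" prior -> 0.67
print(round(beta_posterior_mean(8, 2, **small), 2))   # optimistic prior              -> 0.75

# With far more data, the same two priors barely matter:
large = dict(successes=700, failures=300)
print(round(beta_posterior_mean(1, 1, **large), 2))   # -> 0.70
print(round(beta_posterior_mean(8, 2, **large), 2))   # -> 0.70
```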
 

Hard choices

Adam Gopnik in the latest New Yorker magazine, writing of his former teacher, McGill University psychologist Albert Bregman:

he also gave me some of the best advice I’ve ever received.  Trying to decide whether to major in psychology or art history, I had gone to his office to see what he thought.   He squinted and lowered his head.  “Is this a hard choice for you?” he demanded.  Yes! I cried. “Oh,” he said, springing back up cheerfully.   “In that case, it doesn’t matter.  If it’s a hard decision, then there’s always lots to be said on both sides, so either choice is likely to be good in its way.  Hard choices are always unimportant. ” (page 35, italics in original)

I don’t agree that hard choices are always unimportant, since different options may have very different consequences, with very different footprints (who is impacted, in what ways, and to what extents).  Perhaps what Bregman meant to say is that whatever option is selected in such cases will prove feasible to some extent or other, and we will usually survive the consequences that result.  Why would this be?  I think it is because, as Bregman says, each decision-option in such cases has multiple pros and cons, and so no one option uniformly dominates the others.  No option is obviously or uniformly better:  there is no “slam-dunk” or “no-brainer” decision-option.
In such cases, whatever we choose will potentially have negative consequences which we may have to live with.  Usually, however, we don’t seek to live with these consequences.  Instead, we try to eliminate them, or ameliorate them, or mitigate them, or divert them, or undermine them, or even ignore them.  Only when all else fails do we live, in full awareness, with the negative consequences of our decisions.  Indeed, attempting to pre-emptively anticipate and eliminate or divert or undermine or ameliorate or mitigate negative consequences is a key part of human decision-making for complex decisions, something I’ve called, following Harald Wohlrapp, retroflexive decision-making.  We try to diminish the negative effects of an option and enhance the positive effects as part of the process of making our decision.
As a second-year undergraduate at university, I was, like Gopnik, faced with a choice of majors; for me it was either Pure Mathematics or English.    Now, with more experience of life, I would simply refuse to make this choice, and seek to do both together.  Then, as a sophomore, I was intimidated by the arguments presented to me by the university administration seeking, for reasons surely only of bureaucratic order, to force me to choose:  this combination is not permitted (to which I would respond now with:  And why not?); there are many timetable clashes (I can work around those);  no one else has ever asked to do both (Why is that relevant to my decision?); and, the skills required are too different (Well, I’ve been accepted onto Honours track in both subjects, so I must have the required skills).   
As an aside:  In making this decision, I asked the advice of poet Alec Hope, whom I knew a little.   He too as an undergraduate had studied both Mathematics and English, and had opted eventually for English.  He told me he chose English because he could understand on his own the poetry and fiction he read, but understanding Mathematics, he said, for him, required the help of others.  Although I thought I could learn and understand mathematical subjects well enough from books on my own, it was, for me, precisely the social nature of Mathematics that attracted me: One wasn’t merely creating some subjective personal interpretations or imaginings as one read, but participating in the joint creation of an objective shared mathematical space, albeit a space located in the collective heads of mathematicians.    What could be more exciting than that!?
More posts on complex decisions here, and here
Reference:
Adam Gopnik [2013]: Music to your ears: The quest for 3D recording and other mysteries of sound.  The New Yorker, 28 January 2013, pp. 32-39.

Listening to music by jointly reading the score

Another quote from Bill Thurston, this with an arresting image of mathematical communication:

We have an inexorable instinct to convey through speech content that is not easily spoken.  Because of this tendency, mathematics takes a highly symbolic, algebraic, and technical form.  Few people listening to a technical discourse are hearing a story. Most readers of mathematics (if they happen not to be totally baffled) register only technical details – which are essentially different from the original thoughts we put into mathematical discourse.  The meaning, the poetry, the music, and the beauty of mathematics are generally lost.  It’s as if an audience were to attend a concert where the musicians, unable to perform in a way the audience could appreciate, just handed out copies of the score.  In mathematics, it happens frequently that both the performers and the audience are oblivious to what went wrong, even though the failure of communication is obvious to all.” (Thurston 2011, page xi)  

Reference:
William P. Thurston [2011]:   Foreword.   The Best Writing on Mathematics: 2010.  Edited by Mircea Pitici.  Princeton, NJ, USA:  Princeton University Press.

Mathematical thinking and software

Further to my post citing Keith Devlin on the difficulties of doing mathematics online, I have heard from one prominent mathematician that he does all his mathematics now using LaTeX, not using paper or whiteboard, and thus disagrees with Devlin’s (and my) views.  Thinking about why this may be, and about my own experiences using LaTeX, it occurred to me that one’s experiences with thinking-support software, whether word-processing packages such as MS-WORD or mark-up languages such as LaTeX, will very much depend on the TYPE of thinking one is doing.
If one is thinking with words and text, or text-like symbols such as algebra, the right-handed folk among us are likely to be using the left hemispheres of our brains.  If one is thinking in diagrams, as in geometry or graph theory or much of engineering including computing, the right-handed among us are more likely to be using the right hemispheres of our brains.  Yet MS-WORD and LaTeX are entirely text-based, and their use requires the heavy involvement of our left hemispheres (for the northpaws among us).  One doesn’t draw an arrow in LaTeX, for example, but instead types a command such as \rightarrow or \uparrow.   If one is already using one’s left hemisphere to do the mathematical thinking, as most algebraists would be, then the cognitive load in using the software will be a lot less than if one is using one’s right hemisphere for the mathematical thinking.  Activities which require both hemispheres are typically very challenging to most of us, since co-ordination between the two hemispheres adds further cognitive overhead.
I find LaTeX immeasurably better than any other word-processor for writing text:  it and I work at the same speed (which is not true of MS-WORD for me, for example), and I am able to do my verbal thinking in it.  In this case, writing is a form of thinking, not merely the subsequent expression of thoughts I’ve already had.  However, I cannot do my mathematical or formal thinking in LaTeX, and the software is at best a tool for subsequent expression of thoughts already done elsewhere – mentally, on paper, or on a whiteboard.  My formal thinking is usually about structure and relationship, and less often about algebraic symbol manipulation.
Bill Thurston, the geometer I recently quoted, said:

I was interested in geometric areas of mathematics, where it is often pretty hard to have a document that reflects well the way people actually think.  In more algebraic or symbolic fields, this is not necessarily so, and I have the impression that in some areas documents are much closer to carrying the life of the field.”  [Thurston 1994, p. 169]

It is interesting that many non-mathematical writers also do their thinking about structure not in the document itself or as they write, but outside it and beforehand, and often using tools such as post-it notes on boards; see the recent  article by John McPhee in The New Yorker for examples from his long writing life.
References:
John McPhee [2013]: Structure: Beyond the picnic-table crisis.  The New Yorker, 14 January 2013, pages 46-55.
William P. Thurston [1994]:  On proof and progress in mathematics.  Bulletin of the American Mathematical Society, 30 (2): 161-177.

Vale: Dave Brubeck

The BBC Radio 3 program Jazz Record Requests had a special edition yesterday in memory of Dave Brubeck.  It is available to listen for another 6 days, here.   
I heard Brubeck and his quartet play a concert in Liverpool about 10 years ago. He was old enough to have to shuffle slowly onto stage, but once at the piano, his playing was alive and energetic. My only disappointment was that he performed a concert in Liverpool and not once made any reference to the music of the city’s most famous musical sons. We could have been in Outer Woop Woop, for all the difference the venue made to his choice of repertoire. Not even an allusion in an improvisation: that was just churlish.
Brubeck’s renown was remarkable.  In a Kiev cafe in the mid-1990s, I once asked a busking middle-aged violinist to play Take Five, and saw his face light up with delight.  As it happened, he also knew The Hot Canary.

Thurston on mathematical proof

The year 2012 saw the death of Bill Thurston, leading geometer and Fields Medalist.   Learning of his death led me to re-read his famous 1994 AMS paper on the social nature of mathematical proof.   In my opinion, Thurston demolished the views of those who thought mathematics is anything other than socially-constructed.  This post is just to present a couple of long quotes from the paper.

Mathematical hands

With MOOCs fast becoming teaching trend-du-jour in western universities, it is easy to imagine that all disciplines and all ways of thinking are equally amenable to information technology.   This is simply not true, and mathematical thinking  in particular requires hand-written drawing and symbolic manipulation.   Nobody ever acquired skill in a mathematical discipline without doing exercises and problems him or herself, writing on paper or a board with his or her own hands.   The physical manipulation by the hand holding the pen or pencil is necessary to gain facility in the mental manipulation of the mathematical concepts and their relationships.
Keith Devlin recounts his recent experience teaching a MOOC course on mathematics, and the deleterious use by students of the word-processing package LaTeX for doing assignments:

We have, it seems, become so accustomed to working on a keyboard, and generating nicely laid out pages, we are rapidly losing, if indeed we have not already lost, the habit—and love—of scribbling with paper and pencil. Our presentation technologies encourage form over substance. But if (free-form) scribbling goes away, then I think mathematics goes with it. You simply cannot do original mathematics at a keyboard. The cognitive load is too great.

Why is this?  A key reason is that current mathematics-producing software is clunky, cumbersome, finicky, and not WYSIWYG (What You See Is What You Get).   The most widely used such software is LaTeX (and its relatives), which is a mark-up and command language; when compiled, these commands generate mathematical symbols.   Using LaTeX does not involve direct manipulation of the symbols, but only their indirect manipulation.   One has first to imagine (or indeed, draw by hand!) the desired symbols or mathematical notation, which one then creates using the appropriate generative LaTeX commands.   Only when these commands are compiled can the user see the effects they intended to produce.   Facility with pen-and-paper, by contrast, enables direct manipulation of symbols, with (eventually) the pen-in-hand being experienced as an extension of the user’s physical body and mind, and not as something other.   Expert musicians, archers, surgeons, jewellers, and craftsmen often have the same experience with their particular instruments, feeling them to be extensions of their own body and not external tools.
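A trivial illustration of the indirection: one does not draw an arrow, one types a command naming it, and only on compiling the file does one see whether the output matches the symbol imagined.  (A minimal sketch in standard LaTeX.)

```latex
% The arrow is never drawn by hand; it is named and then generated by the compiler.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
Let $f \colon A \rightarrow B$ and $g \colon B \rightarrow C$ be functions; their
composite is
\[
  A \xrightarrow{\;f\;} B \xrightarrow{\;g\;} C .
\]
\end{document}
```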
Experienced writers too can feel this way about their use of a keyboard, but language processing software is generally WYSIWYG (or close enough not to matter).  Mathematics-making software is a long way from allowing the user to feel that they are directly manipulating the symbols in their head, as a pen-in-hand mathematician feels.  Without direct manipulation, hand and mind are not doing the same thing at the same time, and thus – a fortiori – keyboard-in-hand is certainly not simultaneously manipulating concept-in-mind, nor simultaneously expressing or evoking concept-in-mind.
I am sure that a major source of the problem here is that too many people – and especially most of the chattering classes – mistakenly believe the only form of thinking is verbal manipulation.  Even worse, some philosophers believe that one can only think by means of words.     Related posts on drawing-as-a-form-of-thinking here, and on music-as-a-form-of-thinking here.
[HT:  Normblog]

Time, gentlemen, please

Much discussion again over at Language Log over a claim of the form “Language L has no word for concept C”.  This time, it was the claim by Wade Davis (whose strange use of past tense indicates he has forgotten or is unaware that many Australian Aboriginal languages are still in use) that:

In not one of the hundreds of Aboriginal dialects and languages was there a word for time.”

The rebuttal of this claim by Mark Liberman was incisive and decisive.   Davis was using this claim to support a more general argument:  that traditional Australian Aboriginal cultures had different notions of and metaphors for time to those we mostly have in the modern Western world.
We in the contemporary educated West typically use a spatial metaphor for time, where the past is in one abstract place, the present in another non-overlapping abstract place, and the future in yet a third non-overlapping abstract place.    In this construal of time, causal influence travels in one direction only:  from the past to the present, and from the present to the future.   Nothing in either the present  or the future may influence the past, which is fixed and unchangeable.   Events in the future may perhaps be considered to influence the present, depending on how much fluidity we allow the present to have.  However, most of us would argue that it is not events in the future that influence events in the present, but our present perceptions of possible future events that influence events and actions in the present.
Modern Western Europeans typically think of the place that represents the past as being behind them, and the future ahead.   People raised in Asian cultures often think of the abstract place that is the past as being below them (or above them), and the future above (or below).   But all consider these abstract places to be non-overlapping, and even non-contiguous.
Traditional Australian Aboriginal cultures, as Davis argues, construe time very differently, and influences may flow in all directions.   A better spatial metaphor for Aboriginal notions of time would be to consider a modern city, where there are many different types of transport and communications, each viewable as a network:  rivers, canals, roads, bus-only road corridors, railways, underground rail tunnels, underground sewage or water drains, cycleways, footpaths, air-transport corridors, electricity networks, fixed-link telecommunications networks, wireless telecommunications networks, etc.    A map of each of these networks could be created (and usually are) for specific audiences.  A map of the city itself could then be formed from combining these separate maps, overlaid upon one another as layers in a stack.   Each layer describes a separate aspect of reality, but the reality of the actual entire city is complex and more than merely the sum of these parts.  Events or perceptions in one layer may influence events or perceptions in other layers, without any limitations on the directions of causality between layers.
Traditional Aboriginal notions of time are similar, with pasts, the present and futures all being construed as separate layers stacked over the same geographic space – in this case actual geographic country, not an abstract spatial representation of time.  Each generation of people who have lived, or who will live, in the specific region (“country” in modern Aboriginal English) will have created a layer in the stack.   Influence travels between the different layers in any and all directions, so events in the distant past or the distant future may influence events in the present, and events in the present may influence events in the past and the future.
Many religions – for example, Roman Catholicism, Hinduism, and African cosmologies – allow for such multi-directional causal influences via a non-material realm of saints or spirits, usually the souls of the dead, who may have power to guide the actions of the living in the light of the spirits’ better knowledge of the future.   Causal influence can thus travel, via such spirit influences, from future to present.  Similarly, the view in modern physics of space-time as a single four-dimensional manifold allows for influences across the dimension of time as well as those of space.
I am reminded of an experience I once witnessed where the only sensible explanation of a colleague’s passionate enthusiasm for a particular future course of action was his foreknowledge of the specific details of the outcome of that course of action.  But these details he did not know and could not have known at the time of his enthusiasm,  prior to the course of action being executed.  In other words, only a causal influence from future to present provided a sensible explanation for this enthusiasm, and this explanation only became evident as the future turned into the present, and the details of the outcome emerged.  Until that point, he could not justify or explain his passionate enthusiasm, which seemed to be a form of madness, even to him.    Contemporary Western cosmology does not provide such time-reversing explanations, but many other cultures do; and current theories of quantum entanglement also seem to.
Contemporary westerners, particularly those trained in western science, have a hard time understanding such alternative cosmologies, in my experience.  I have posted before about the difficulties most westerners have, for instance,  in understanding Taoist/Zen notions of synchronicity of events, which westerners typically mis-construe as random chance.

Snow jobs

Advice from Geoffrey Pullum, when faced with people who tell you that Eskimos have multiple words for “snow”:

Stand up and tell the speaker this:  C.W. Schultz-Lorentzen’s Dictionary of the West Greenlandic Eskimo Language (1927) gives just two possibly relevant roots: qanik, meaning ‘snow in the air’ or ‘snowflake’, and aput, meaning ‘snow on the ground’. Then add that you would be interested to know if the speaker can cite any more.

References:
G. K. Pullum [1989]: The great Eskimo vocabulary hoax.  Natural Language and Linguistic Theory, 7: 275-281.  Available from here.
C. W. Schultz-Lorentzen [1927]:  Dictionary of the West Greenlandic Eskimo Language. Meddelelser om Grønland, 69, Reitzels, Copenhagen, Denmark.