The Skeptic's Toolbox
University of Oregon: August 14-17, 2003
A report by Bob Carroll
CSICOP (The Committee for the Scientific Investigation of Claims of the Paranormal, now CSI, the Committee for Skeptical Inquiry) has been offering this annual workshop since 1992. The topics change but the goal has remained the same: to make the participants better skeptics. Some workshops have focused on critical thinking skills, others on such topics as cold reading or recognizing scams. This year the topic was "Critiquing Research: Determining Adequacy." I enrolled because my background is very weak in statistics, while much of the research in the paranormal or supernatural is being done by people like Dean Radin and Robert Jahn, experts in statistical methods. The work of the parapsychologists is no longer so flawed by poor methodology, incompetence, or gross self-deception that anyone with a solid grounding in logic and the design of controlled experiments can easily criticize it. There are some exceptions--Gary Schwartz's Afterlife Experiments comes to mind, as do most experiments on remote viewing--but most of the leading parapsychologists of our day have answered the criticisms of the skeptics, and they've answered in spades.
You skeptics want well designed experiments with proper controls and sophisticated data analysis? You want studies that have been replicated? You want peer review? You want public availability of methods and results? The parapsychologists have delivered. Not only have they delivered in terms of methodology; they've delivered in terms of results. A prime example of the work of the new parapsychologists appeared in the December 1998 issue of The Western Journal of Medicine: "A Randomized Double-Blind Study of the Effect of Distant Healing in a Population With Advanced AIDS--Report of a Small Scale Study" by Fred Sicher, Elisabeth Targ, Dan Moore II, and Helene S. Smith. The study appears to be well designed, appears to use proper controls and methods of randomization, and is full of statistical gobbledygook that makes it impossible for someone not knowledgeable in statistics to evaluate. But the statistics look good. They support the claim that on several objective criteria there were significant differences in outcomes between the control group and a group of AIDS patients who were prayed for.
It is still possible to evaluate such a paper using some low-tech critical thinking skills, but to do a thorough job of understanding and evaluating this and similar papers, some knowledge of what to make of the statistical analyses is important. It turns out that this paper is very deceptive and the research was not what the authors claimed it to be, but we should not assume that other apparently well designed studies on the paranormal or supernatural that produce impressive statistical anomalies will also turn out to be significantly flawed. Skeptics need better tools to critique the latest batch of paranormal research studies. That's why I enrolled in The Skeptic's Toolbox.
First, let me comment on the value of the workshop. The cost was minimal and the instructors are stars (Ray Hyman, Barry Beyerstein, Wally Sampson, James Alcock, and Loren Pankratz, with Jerry Andrus thrown in for good measure). The group of about 50 was highly motivated and talented.
Ray Hyman put together a workbook with an introductory essay explaining why it is so difficult to properly evaluate the new studies. The task is daunting, and despite his many years of training and experience, he has found it very difficult and time-consuming to detect "subtle, but fatal, misuses of statistical procedures." Ray and the rest of the faculty know that there is no way they could ever impart the kind of expertise they have to a group of non-statisticians in just a few days. But they also realize that
More and more psychical researchers justify their claims with research that has all the trappings of standard scientific practice. The media routinely report results of such research as offering strong support for the efficacy of distant prayer, alternative medicine, memory in water, dowsing, communicating with the dead, and the like. More and more, as a skeptic, you will be confronted by proponents who cite the results of these studies with their double-blind controls, sophisticated apparatus, and highly statistically significant results.
You might think that since Hyman, Sampson, et al. are doing critiques of these studies, all we need to do is keep up with their critiques so that when confronted with a specific study we can refer to the work of other skeptics. Even so, "you will have to understand the issues so that you can defend the basis for their specific critiques. In other words, you have to know something about scientific methodology and statistical inference to speak with some authority on a given critique."
The workbook includes an excellent guide on how to evaluate research. I hope Ray will publish this material somewhere. It is not very long, but too long to cover adequately in a newsletter. Suffice it to say we were given instruction on how to recognize "unwitting exaggeration of statistical significance in parapsychological research," guidelines for evaluating meta-analyses, and pointers on how to recognize when parapsychologists use the same set of data both to generate a hypothesis and to test it. Ray also provided us with a short essay on how to evaluate research, which can be used for constructing our own individual frameworks for making our evaluations. Here is my framework.
1. Clarity. Is the hypothesis being studied clearly stated? Is there a clear boundary and delineation regarding what data would support the hypothesis and what would not?
2. Appropriateness of the methods used to test the hypothesis, including anything that should have been done but was not.
3. What kind of evidence is presented? (Testimonials, anecdotes, experimental results?) Does the evidence support the hypothesis? Would it support other, even contrary hypotheses? Could the data be due to chance? Is it complete? Is it even relevant to the hypothesis? How significant is the evidence to the hypothesis?
5. Contact the statistician. Ask if any end points were added after the study was completed.
I added the last point after reading Po Bronson's piece on the healing prayer study ("A Prayer Before Dying," Wired, Dec. 2002). It is probably not realistic for the average skeptic, but the other four points are.
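Bronson's worry about end points added after a study is completed comes down to multiple comparisons, which a short simulation can illustrate. The sketch below is a hypothetical illustration in plain Python, not a model of the Sicher-Targ study: when a null "study" measures many endpoints and any significant one counts as a hit, the chance of a spurious positive climbs well above 5%.

```python
import random

random.seed(7)

def endpoint_significant(n=30):
    # One null endpoint: treatment and control groups drawn from the
    # SAME distribution, so any "effect" is pure chance.
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    z = (sum(a) / n - sum(b) / n) / (2 / n) ** 0.5
    return abs(z) > 1.96  # roughly p < 0.05, two-tailed

def any_hit(n_endpoints):
    # A null "study" scores if ANY of its endpoints comes out significant.
    return any(endpoint_significant() for _ in range(n_endpoints))

trials = 2_000
rates = {}
for k in (1, 6, 20):
    rates[k] = sum(any_hit(k) for _ in range(trials)) / trials
    print(f"{k:2d} endpoints -> chance of a 'significant' finding: {rates[k]:.0%}")
```

With one endpoint the false-positive rate stays near 5%, but with twenty endpoints a majority of null studies produce at least one "significant" result, which is why asking whether end points were added after the fact matters so much.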
After two days of instruction, we broke up into four groups, each assigned a different study to evaluate. One group, for example, evaluated a study by Gary Schwartz on mediums who claim the dead talk to them; Schwartz claims his experiments provide "strong positive findings" for belief in the afterlife. My group studied the healing prayer study mentioned above.
Barry Beyerstein reminded us of the difficulties we face when trying to disabuse people of comforting beliefs. Our brains have evolved to favor emotionally comforting beliefs that enhance survival and reproduction. (See James Alcock's "The Belief Engine," SI, May/June 1995.) We more easily accept as true things that enhance self-esteem and reinforce survival. In short, magical thinking is the default mode for many people. Logic and critical thinking are "unnatural" and must be taught. Beyerstein emphasized the hidden persuaders, such as confirmation bias, that drive people to error. He recommended several books: Gilovich's How We Know What Isn't So: The Fallibility of Human Reason in Everyday Life and Human Inference: Strategies and Shortcomings of Social Judgment by Richard E. Nisbett and Lee Ross. I'd also recommend Stuart Vyse's Believing in Magic: The Psychology of Superstition.
Ray Hyman talked about the principle of charity. He advised that we assume our "opponent" is acting in good faith until proven otherwise. Encourage your opponent to present the best case for belief. Ask, what most convinces you of your belief? Be fair, honest, and diligent in evaluating the data. Stick to the data; don't attribute motives. In short, no ad hominem attacks.
One of the best pieces of advice came from either Barry or Ray (my notes are unclear): Admit at times that the data do show something very interesting, but we don't know what it is. Not everything can be explained. Suspending judgment is sometimes the best option.
Wally Sampson went over the many difficulties in evaluating studies such as one that purports to have good evidence for the effectiveness of homeopathic remedies (Benveniste). But he first noted that demanding that parapsychologists use a p value of 0.001, which is a standard in physics, is unrealistic for medicine, where a p value of 0.05 is standard. (A p value of 0.05 means that there is a 1 in 20 chance of getting results that are statistically significant even if there is nothing significant going on.) Because of the variability of response of individuals to the same dosage of the same drug, medicine will rarely, if ever, find anything statistically significant at the 0.001 level, where there is a 1 in 1,000 chance of getting statistically significant results by chance. According to Sampson, in medicine large numbers of studies must be used to get a sense of the probability of the effectiveness of a given substance. A single study may show a very significant effect, but other studies may show a negative effect. It is even the case that several studies might show a positive effect, but several others show a negative effect. Given human variability of response to substances, doing controlled studies often requires a systematic review of many studies to get a good sense of what is going on.
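Sampson's 1-in-20 figure can be checked with a short simulation. The sketch below (plain Python, hypothetical two-group trials in which nothing is going on) counts how often a null experiment clears the 0.05 bar versus the 0.001 bar; the z thresholds 1.96 and 3.29 correspond roughly to those two-tailed p-values.

```python
import random

random.seed(2023)

def null_z(n=30):
    """z statistic for one two-group trial where nothing is going on:
    both groups are drawn from the same unit-normal distribution."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    return (sum(a) / n - sum(b) / n) / (2 / n) ** 0.5

trials = 20_000
zs = [abs(null_z()) for _ in range(trials)]
rate_05 = sum(z > 1.96 for z in zs) / trials   # roughly p < 0.05
rate_001 = sum(z > 3.29 for z in zs) / trials  # roughly p < 0.001
print(f"null results 'significant' at 0.05:  {rate_05:.2%}")
print(f"null results 'significant' at 0.001: {rate_001:.2%}")
```

About one null experiment in twenty clears the 0.05 threshold, while only about one in a thousand clears 0.001, which is why a single 0.05-level result in medicine tells you so little on its own.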
What I got from Sampson's talk was that a single study of some substance, whether a homeopathic remedy or a new anti-depressant, should be taken with a grain of salt, no matter how dramatic the results might seem. Also, studies that use human evaluators may not be as reliable as those that use machines. Human researchers (like the rest of us!) sometimes see what they want to see and are very suggestible.
James Alcock talked about Rupert Sheldrake's staring studies and made the comment: "Statistical significance can be totally meaningless and it usually is." Sheldrake says that the statistical data show evidence of "psychic staring." Alcock thinks Sheldrake makes an illegitimate leap in drawing this conclusion. Why not attribute the data to Zeus or to an error in the procedure? What he means is that Sheldrake (and many others in these "alternative" sciences) starts with the false assumption that the study will show either chance results or psychic staring (or the healing power of prayer, messages from the dead, ESP, etc.), when there are other alternatives that can't be ruled out. Alcock brought up an important point: in Sheldrake's studies (and many others in these "alternative" sciences), if the results were negative, it wouldn't prove that psychic staring doesn't exist (or that prayer doesn't heal, that messages don't come from the dead, that ESP doesn't exist, etc.). The point, I suppose, is to raise a question about the falsifiability of theories in the "alternative" sciences. I would add that another reason for questioning the scientific nature of such studies is that you can't really control anything in the supernatural or paranormal realm. You can set up a controlled experiment like the prayer study, but you can't control for whatever variables there might be in the supernatural realm. If you knew there were no other variables except the one you are testing, or that whatever variables there might be would be randomly distributed over your control and experimental groups, such studies could proceed quite reasonably. But there is no way to know this to any reasonable degree of probability.
Sheldrake's study on staring suffered from using a random process that had a distinct pattern (Marks and Colwell, SI, Sept/Oct 2000). It was replicated only when the same random process was used. But when a truly random process was used, it couldn't be replicated. Alcock's advice as to how to respond to all these "scientific" studies in the paranormal is that we should try to persuade the public not to believe everything scientists say. We need to encourage skepticism towards science. I agree but I also think we ought to try to educate the public about the importance of proper randomization. (This may be hopeless. I was interviewed by Karen Peterson of USA Today about Sheldrake's book on the staring effect and I explained to her the randomization problem identified by Marks and Colwell. Her editors cut out all my remarks.)
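Marks and Colwell's point is easy to demonstrate with a simulation. The sketch below is a hypothetical illustration, not Sheldrake's actual protocol: a "randomized" trial sequence that avoids long runs hands an observant subject above-chance hit rates with no psychic staring at all, while a truly random sequence does not.

```python
import random

random.seed(1)

def truly_random(length):
    return [random.randint(0, 1) for _ in range(length)]

def patterned(length):
    """A 'randomized' sequence with hidden structure: it never allows
    three identical trials in a row. (A simplified stand-in for the
    patterning Marks and Colwell found in the sequences.)"""
    seq = [random.randint(0, 1)]
    while len(seq) < length:
        nxt = random.randint(0, 1)
        if len(seq) >= 2 and seq[-1] == seq[-2]:
            nxt = 1 - seq[-1]  # forced switch after any repeat
        seq.append(nxt)
    return seq

def hit_rate(seq):
    """A subject who has picked up on the structure: after two identical
    trials, guess a switch; otherwise guess at random."""
    hits = 0
    for i in range(2, len(seq)):
        if seq[i - 1] == seq[i - 2]:
            guess = 1 - seq[i - 1]
        else:
            guess = random.randint(0, 1)
        hits += guess == seq[i]
    return hits / (len(seq) - 2)

n = 100_000
p_rate = hit_rate(patterned(n))
r_rate = hit_rate(truly_random(n))
print(f"patterned sequence:    {p_rate:.1%}")   # well above chance
print(f"truly random sequence: {r_rate:.1%}")   # about 50%
```

Replicating with the same flawed sequence would reproduce the "effect"; replicating with proper randomization would not, which is exactly the pattern reported for the staring studies.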
The Friday afternoon session was spent mostly discussing statistics. We were reminded that probability is not a trait of anything, but a condition of uncertainty. A p-value of < 0.05 is considered impressive in the social sciences and medicine, but it is an arbitrary standard. "Never be impressed by the size of the p-value," said Alcock. Rejecting the null hypothesis (that nothing is happening) does not prove some other hypothesis, e.g., psychic power. That is, "not due to chance" is not logically equivalent to "my hypothesis is true."
According to Hyman, a p-value is set to say, in effect, "this is the gamble I was willing to take with Nature." He thinks it's unfortunate that the word 'significance' was used to mean not likely due to chance by some criteria. (He attributes the 0.05 standard to R. A. Fisher's Design of Experiments.) Hyman advises that one look for procedural flaws to explain patterns. Don't automatically assume that the pattern is significant just because the statistics are.
Loren Pankratz is a psychologist and expert on Munchausen by Proxy (MBP) and post-traumatic stress disorder (PTSD). He believes that both disorders are real but that most diagnoses are bogus. He focused on the role of a distinguished authority, Sir Roy Meadow, in convincing loads of people to believe they could identify MBP by a checklist of symptoms. Because some people have confused "warning signs" of Munchausen by Proxy with "diagnostic signs," many errors in diagnosis have been made, with devastating, even deadly, consequences at times. Pankratz also talked about a study he was involved with that concerned patients in a veterans hospital who were diagnosed with PTSD (Sparr L, Pankratz L. Factitious posttraumatic stress disorder. American Journal of Psychiatry 1983;140:1016-9). The study showed how easily PTSD can be feigned. Of the five patients investigated, three said they were former prisoners of war. In fact, none had been prisoners of war, four had never been in Vietnam as they claimed, and two had never even been in the military. All five had convinced both civilians and VA doctors that they were suffering from PTSD. The men were simply believed when they told vivid war stories.
Loren believes that the extent of PTSD is greatly exaggerated. There is a lot of money to be made treating people for this disorder. But consider the fact that most victims of trauma are not ruined for life (e.g., victims of violent crime, of WWI, WWII, and the Holocaust).
Loren tried to get us to ask ourselves why we trust anyone. Why do we believe him? The moral of the story is that we can't take claims at face value; we have to be willing to act as historians, do the research, dig up the data, and not just accept claims because "experts" put them forth.
We were reminded by somebody (my notes are unclear) that "Lots of things published in the name of science are nonsense."