reader comments: precognition
15 Nov 2012
Hello Dr. Carroll,
Rob Elliott’s Response to Robert Carroll’s skepdic.com discussion of Dean Radin’s The Conscious Universe, especially with respect to "FUTURE TELLING": A META-ANALYSIS OF FORCED-CHOICE PRECOGNITION EXPERIMENTS, 1935-1987 by Diane Ferrari and Charles Honorton
I engage in skeptical debate in order to learn. If I prove to be wrong in my critique of The Skeptic's Dictionary, then I am a better human being for having been corrected.
In his review of The Conscious Universe by Dean Radin (http://skepdic.com/refuge/radin8.html), Robert Carroll misrepresents scientific findings in the Ferrari and Honorton meta-analysis (http://lfr.org/LFR/csl/library/HonortonFerrari.pdf). Carroll writes, “Only 23 of the 62 investigators (37%) got positive results. He (Radin) doesn’t tell us what percentage of the studies got positive results.” The Conscious Universe may not include that information, but the original paper published on the study does (30% of studies, p. 284).
Carroll proceeds to write, “In analyzing variables, they found that 42.6% of the studies that provided trial-by-trial feedback were successful, while none of the studies that didn’t provide such feedback were successful! (If you are not using true randomization, you might be measuring guesses based on pattern recognition rather than precognition.)” That statement is factually incorrect. Feedback refers to letting participants know how well they’ve done in their previous guesses. Ferrari and Honorton described three different types of feedback: “delayed,” “run-score,” and “trial-by-trial.” Delayed feedback usually entailed “notification by mail” (not likely to affect the results of the study); 19% of studies in which participants were sent feedback by mail were statistically significant (p < 0.05). I couldn’t find an exact definition for run-score feedback, but I take it to mean that study participants are told how well they have done on a given run of trials (say 20 guesses) after the run is over; 33.3% of studies in which participants were given run-score feedback were significant. Trial-by-trial feedback means that the participant is told whether or not their guess was correct after each guess/trial; 42.6% of trial-by-trial feedback studies were significant. Contrary to Carroll’s claim, many studies that did not involve trial-by-trial feedback were in fact significant.
Even if none of the non-trial-by-trial studies had been significant, however, I see little reason to suspect that the studies which provided no trial-by-trial feedback would be any better randomized than those in which such feedback was given. The only case I can foresee in which feedback would affect randomization is if the number of available choices dwindles as the run of trials progresses and trial-by-trial feedback is given (e.g., drawing a second card without thoroughly reshuffling the first card into the deck, as when counting cards in blackjack). Ferrari and Honorton clearly account for such a choice-dwindling effect by distinguishing between “formal” and “informal” randomization. Trial-by-trial and all other types of feedback are irrelevant to formally randomized studies in which random number generators/tables were used to select from the available targets.
As for the file-drawer problem (common in all sciences… not just psi research), Carroll writes, “He (Radin) also eliminates the file-drawer problem as an explanation because he used some sort of statistical formula (not revealed here) to arrive at 14,268 as the number of papers that would have to be in the drawer to tip these odds back to chance.” By “not revealed here”, Carroll must mean that it is not revealed in The Conscious Universe. It is, however, revealed in the original study (p. 284) as the “Fail Safe N” test (Rosenthal, 1979 & 1984). Carroll’s whole argument seems to be against Radin’s discussion of the Ferrari and Honorton study in The Conscious Universe. Radin’s book is meant as an introduction to the evidence for psi for a popular audience. It would be unreasonable to assume that he would include every detail of the various studies that the book introduces. Carroll should have assessed the primary publication itself before writing a critique of the study.
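The “Fail Safe N” mentioned above is straightforward to compute from the individual studies’ z-scores. Here is a minimal sketch of Rosenthal’s (1979) formula in Python, using made-up z-scores for illustration, not the actual Ferrari and Honorton data:

```python
# Rosenthal's fail-safe N: how many unpublished, null-result studies would
# have to be sitting in file drawers to drag a combined (Stouffer) z-score
# back below the one-tailed .05 significance threshold (z_crit = 1.645).
# The z-scores below are illustrative only, not Ferrari and Honorton's data.

def fail_safe_n(z_scores, z_crit=1.645):
    """Rosenthal (1979): N = (sum of z)^2 / z_crit^2 - k."""
    k = len(z_scores)        # number of studies in the meta-analysis
    z_sum = sum(z_scores)    # combined evidence across studies
    return z_sum ** 2 / z_crit ** 2 - k

# Ten hypothetical studies, each with a modest z of 2.0:
print(fail_safe_n([2.0] * 10))  # about 138 hidden null studies needed
```

The point of the statistic is that when the required number of hidden studies dwarfs the plausible size of a field’s file drawers, selective publication alone cannot explain the combined result.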
Robert Carroll also criticizes the statistics in the Ferrari and Honorton meta-analysis. The study had an overall p-value of 6.3 × 10⁻²⁵ (0.00000000000000000000000063). With respect to the extremely small p-value, Carroll writes, “I don’t think ten million billion billion to one is “effectively the same” as a billion to one, except in the sense that they're both absurd.” (http://skepdic.com/precog.html). I believe that Radin is saying that they are “effectively the same” in the sense that a billion to one odds are about 6 times worse than the odds of winning the Powerball Jackpot with a single ticket (http://www.powerball.com/powerball/pb_prizes.asp), and odds of “ten million billion billion” to one are somewhere in the ballpark of my odds of winning the Powerball Jackpot three times on three straight tries. Neither scenario is at all likely to happen, so in that sense “they’re the same.” We’re talking about numbers with many zeros, but that doesn’t necessarily mean that they’re “absurd.”
Let me first say that I am not a statistician, but there may be an argument to be made here. While not with respect to the Ferrari and Honorton study, blutoski (screen-name) writes on the James Randi forum, “If you're doing multiple metrics, the confidence interval for statistical significance changes. eg: if one metric needs p<=.05, then a three metric study may need (say) p<=.000125” (http://forums.randi.org/showthread.php?p=3511976). I am not experienced with “multiple metric” statistical analysis, so I can’t personally validate blutoski’s claim. If true, though, it opens the possibility that even 6.3 × 10⁻²⁵ is not a statistically significant p-value (given that 309 studies were included in the meta-analysis). In this case, I have to assume that Ferrari and Honorton are more competent in statistics than I am and that their p-value is in fact significant. Regardless, the fact that 309 studies were included in the meta-analysis is the reason that the p-value is so small.
Robert Carroll is unjustified in calling the value absurd.
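For what it is worth, blutoski’s multiple-comparisons worry can be checked directly: even the most conservative standard adjustment, a Bonferroni correction that divides α by the number of tests, leaves the reported p-value many orders of magnitude below the corrected threshold. A quick illustrative check in Python (whether Bonferroni is the right adjustment here is itself debatable; the sketch only shows the scale of the numbers involved):

```python
# Bonferroni correction: with 309 tests, the per-test significance threshold
# shrinks from .05 to .05/309 (about 1.6e-4). The reported meta-analytic
# p-value is still roughly 20 orders of magnitude smaller than that.
alpha = 0.05
n_tests = 309                 # studies in the meta-analysis
threshold = alpha / n_tests   # Bonferroni-corrected per-test threshold
reported_p = 6.3e-25          # overall p reported by Ferrari & Honorton

print(threshold)
print(reported_p < threshold)  # True: the corrected threshold is easily met
```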
For those who distrust meta-analyses, the individual studies in the Honorton and Ferrari paper can be assessed on a study-by-study basis (see below). Many of the studies in the Ferrari and Honorton meta-analysis were statistically significant independently of the meta-analysis. I found very few full-text articles available online, but I found many abstracts. Not surprisingly, I was able to locate abstracts for relatively recent studies more often than earlier studies (pre-1960s). If it is true that study methodology improved over time, then the abstracts I was able to locate should have been from “higher-quality” studies than those I was unable to locate.
Most of the abstracts I found referred to statistically-significant results, but some did not. The inclusion of non-significant studies is worth mentioning, given that they were in fact published rather than being stashed away in a file drawer. Here are several non-significant studies that were included in the Ferrari and Honorton meta-analysis.
Here are the study abstracts that I found which included statistically-significant p-values:
The only free full-text online study included in the Ferrari and Honorton meta-analysis can be found here:
With the file-drawer issue ruled out by the Fail-safe N test, the number of significant individual study results that I found within the Ferrari and Honorton meta-analysis could only be explained by: a) consistent abject incompetence and/or gross academic dishonesty on the part of the researchers, or b) the researchers actually obtained the results they claim. Option “a” would no doubt ensure that the study researchers never worked as scientists again, so I am leaning heavily towards option b. Based on my analysis of the Ferrari and Honorton meta-analysis as well as the individual studies included therein, I am forced to conclude that there is indeed solid scientific evidence for precognition.
reply: For the sake of argument, I will grant Mr. Elliott all his points except one. I will not grant his final claim. These studies, taken singly or taken together, do not provide solid scientific evidence for precognition.
I have explained why this is so in my entry on the psi assumption, which I will summarize here.
Briefly, the psi assumption is the assumption that any significant departure from the laws of chance in a test of psychic ability is evidence that something anomalous or paranormal has occurred. Departure from the laws of chance would be consistent with the psi hypothesis, but until all other plausible explanations have been ruled out, it is hasty to conclude that evidence for psi has been found. There are several plausible explanations for the data in psi experiments. Cheating by subjects is commonplace. Fraud by experimenters is rare, but it has happened (e.g., the Soal-Goldney experiment [1941-1943]). Methodological errors and sloppiness have occurred in experiments that have been hailed as slam-dunk proof by parapsychologists like Dean Radin. For example, Susan Blackmore was appalled when she visited the lab of Carl Sargent, whose work played a major role in the ganzfeld studies of Bem and Honorton.
....I went to visit Sargent's laboratory in Cambridge where some of the best ganzfeld results were then being obtained. Note that in Honorton's database nine of the twenty-eight experiments came from Sargent's lab. What I found there had a profound effect on my confidence in the whole field and in published claims of successful experiments.
These experiments, which looked so beautifully designed in print, were in fact open to fraud or error in several ways, and indeed I detected several errors and failures to follow the protocol while I was there. I concluded that the published papers gave an unfair impression of the experiments and that the results could not be relied upon as evidence for psi. (Blackmore 1987)
Other errors, such as sensory leakage and experimenter effects, questionable methodologies such as displacement and psi missing, and misapplication of statistics must all be considered before jumping to the conclusion that a statistic that is unlikely to be due to chance according to some arbitrary formula is proof of anything paranormal.
Furthermore, since there is no way to distinguish direct communication with another mind from communication with a past or future perception by that or some other mind, there is no way to distinguish telepathy from precognition. There is no way to distinguish telepathy, clairvoyance, retrocognition, or precognition from a mind perceiving directly the akashic record. There is no way to distinguish telepathy, clairvoyance, retrocognition, precognition, or perceiving the akashic record from perceiving what is directly placed in the mind by a god (occasionalism). There is no way to distinguish telepathy, clairvoyance, retrocognition, precognition, perceiving the akashic record, or having perceptions directly implanted in our minds by some god from perceiving the hidden record of all perceptions in the eleventh dimension that is vibrating in the intersection between the tenth and twelfth dimensions. I could go on, but it would be too annoying.
18 Nov 2010
1) [Daryl] Bem’s paper ["Feeling the Future: Experimental evidence for anomalous retroactive influences on cognition and affect." ] does not convince me of the existence of precognition (nor do I believe in any other so-called psi phenomena).
2) The concept just does not make sense to me, and I have not heard any plausible possible mechanism behind it.
3) My guess is that attempts to replicate the reported effects by other researchers will fail, and these studies will be pointed out as due to a statistical fluke or some undetermined factor in the experimental apparatus he used (a possibility you also point out).
4) This comment is directed only to the paper in discussion. I am not referring to other writings or talks by Bem or any of his colleagues (such as Radin), where their arguments are not so “constrained”, and where other procedures and data analysis strategies are employed.
5) I am a skeptic, atheist, and long time fan of the Skeptic’s Dictionary (although true, this one is clearly off topic and irrelevant! It is simply a shameless attempt to motivate you to continue reading this long email!)
Having said this, I am clearly disappointed with your recent post regarding Bem’s paper. It is a highly biased and unfair comment.
Knowing that many people believe in these sorts of (supposed) effects, as skeptics we should welcome serious attempts to test them (irrespective of the author's position). If for no other reason, this allows us to show empirical evidence supporting our arguments (even if we feel we should not need to spend resources testing what seem to us absurd ideas). This seems to be exactly one of those cases, with (judging from the paper's descriptions) methodologically well delineated studies.
Moreover, this approach uses quite simple procedures and analyses that are easy for anyone to replicate and report the results of. In fact, it should be stressed that, contrary to what your text leads the reader to assume (by referring to other papers' strategies, without stating that this is not the case here), no “occult statistics” (complex and unfamiliar to most) or procedures are presented. Clearly, this does not necessarily imply that their data analysis is correct, and contesting it is a legitimate course of action (see, for example, Wagenmakers, Wetzels, Borsboom & Maas [Why Psychologists Must Change the Way They Analyze Their Data: The Case of Psi[*]] 2010). However, you do not do that… You simply make a number of quite strange statements on statistics and their meaning.
Do you really agree with the citation by Rutherford?? Imagine (and I am pretty convinced this will not happen) that a small reliable effect is also replicated by the vast majority of other independent labs (lots of them, skeptics included) that attempt to replicate the studies (using whatever you think the appropriate statistical test is). In such a case, would you still deny that it is empirical evidence for precognition? If yes, why??? It clearly would not be evidence for an actual relevant impact on our daily lives, but that is a different matter. Although most people talking about evidence of psi probably believe (maybe even Bem) in its broader and larger prevalence in our behavior and interactions, that is not argued in this paper. If he had made that argument, the paper would most likely not have been accepted without modification. There are lots of interesting results that have been found to be reliable although quite small effects (psychology is full of them). Although they may not be useful for actual behavior prediction “in the real world” (on their own), they are still highly important in theory building and one of the main contributors to the advancement of scientific knowledge.
Although I agree that the differences from the more “extreme” forms of precognition usually found in anecdotal reports should be stressed, if these effects were to replicate, they would still be compelling evidence for precognition. It seems to me that this should receive a lot of attention from all of us, given the extremely high difficulty of accommodating it within our view of ‘things’ (for lack of a better word).
Although the APA has ‘sponsored’ many things (mostly in clinical psychology) that should embarrass those of its members who adopt a scientific perspective (one of the reasons for the creation of APS…), publishing this paper is, in my perspective, not one of them. Although there are some things I would have liked to see done differently (an explanation for not expecting or finding the effect with the negative stimuli in Experiment 1; less time spent dismissing other potential psi explanations; insufficient explanation regarding some of the stimulus materials; etc.), the empirical part of the paper (and although some speculations regarding theory are presented, that is how the paper is presented – as a first effort to test the mere existence of the phenomenon) meets the desired and usual standards.
Clearly, the paper does not ‘definitively prove’ anything, even with 9 studies included (or 15 or 20 for that matter…). However, given its rigorous experimental design and the results he found, publication seems mandatory (even if plausible theoretical accounts are unfortunately not provided), mainly because it promotes diverse attempts to replicate, and that is how science advances. I am honestly perplexed to read your classification of such attempts as ‘ludicrous’ in this case! Why???
If one agrees that the presented statistics are not adequate, I would more easily understand dismissing it. However, that is not what you say… In addition, even if one agrees with Wagenmakers and colleagues, publishing it in JPSP will still have the positive consequence of alerting psychology researchers to inadequate analysis strategies that they regularly use in ‘standard experiments’.
In sum, your tone is simply too negative (the comment on the showing of pornographic stimuli to “college kids”, for example, is just silly, and a result of the predetermined negative assessment), and is definitely not an objective assessment of the data available at this moment. It exemplifies what a skeptic should avoid doing.
We may present objections on the methodology or data analysis procedures.
We may (and should) raise the hypothesis of sloppiness in following the described procedures or cheating when the data is all coming from a specific lab or group of researchers (although it seems simply unreasonable to expect the author or editors of the paper to present that as an alternative account!)
We must demand extensive replication from independent sources before accepting the mere existence of these kinds of effects (in addition to the one you cite, see also the failed replication by Hadlaczky[**]). In fact, this should be done in a public way, as Richard Wiseman is attempting to do by having a website for people to register their intention to pursue replications before knowing the results (http://www.richardwiseman.com/BemReplications.html).
reply: Thank you, Peter, for your thorough whipping with the wet noodle. I'll try to respond to some of your concerns.
1. I admire what Richard Wiseman is doing with respect to Daryl Bem and his paper. He's a practicing scientific researcher with a strong interest in the paranormal. He notes that Bem has provided open access to both his data and the software used in conducting the experiments. This is how all science should work. No secrets. No missing or lost data. No need to personally inspect the lab to detect deception. Etc. This makes it easy for other labs to replicate the work and to detect flaws in methodology.
Wiseman has detected an important flaw in Bem's work. First, he focuses on the two studies that had the most robust findings, i.e., resulted in the largest deviation from chance expectation. These are experiments 8 and 9, 9 being a replication of 8. Those are the studies he has set up a registry for, encouraging researchers to try to replicate the best that Bem has to offer. Second, he found a problem with the method of scoring in these tests that could significantly affect the statistical outcomes. He thus advises those attempting replication to change the way these tests are scored. He also advises rechecking the scoring done in Bem's experiments.
The potential problem is in the scoring. The experimenters used a second piece of software to score participants’ responses. Of course, participants may have misspelled remembered words (e.g., typing ‘CTT’ instead of ‘CAT’) or come up with words that were not on the original list (e.g., typing ‘CAR’ instead of ‘CAT’). To deal with this, the scoring software was designed to automatically go through the participant’s responses and to flag up any words that were not absolutely identical to the words in the original list. The experimenter then had to go through these ‘unknown’ words manually, and either correct the spelling or tell the software to ignore them because they did not appear on the original list. To prevent any possibility of unconscious bias, the experimenter should have been doing this blind to the words in the ‘target’ and ‘control’ lists. Unfortunately, this was not the case.
When you are dealing with very small statistical deviations from chance and relatively small numbers of participants (100 in this case), a small difference of 3 or 4 cases wrongly scored could mean the difference between getting a result that's statistically significant (the goal of those trying to prove a psi effect by these kinds of methods) and one that is likely due to chance.
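The arithmetic behind that point is easy to demonstrate. Under the null hypothesis each of 100 trials is at 50% chance, so the hit count follows a binomial distribution, and shifting just three scores moves the result across the conventional p = .05 line. A sketch with hypothetical counts (not Bem's actual data):

```python
# With n = 100 trials at 50% chance, a handful of misscored cases can flip
# a result between "statistically significant" and "consistent with chance".
# The hit counts are hypothetical, chosen to straddle the p = .05 line.
from math import comb

def binom_p_one_tailed(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): one-tailed exact p-value."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))

print(binom_p_one_tailed(59, 100))  # ~0.044: below .05, "significant"
print(binom_p_one_tailed(56, 100))  # ~0.136: above .05, consistent with chance
```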
I grant Peter that what Wiseman writes is much more elegant and incisive than anything I have written about Bem's work.
2. I agree my tone is negative. Regarding your response to my reference to calling the "erotic stimuli" used by Bem pornographic....another reader said that when he came to that comment he laughed so hard that he shot his morning coffee out of his nose.
Seriously, I am not a psi researcher, but I have read and reviewed two books by Dean Radin, one by Gary Schwartz, and one by Charles Tart. That should qualify me for one free pass out of the Skeptic's Hall in Hell. I've read many other articles purporting to have found evidence for some psi effect.
Yes, I call attempts to replicate Bem's recent work "ludicrous." By implication, that means I'm calling Wiseman's call for replication ludicrous, which it isn't. Wiseman's approach has already paid dividends by exposing a problem with the scoring method used in experiments 8 and 9. It may turn out that there is no statistical significance in Bem's data, and the APA journal will have to recall the paper and admit that its reviewers missed this problem. If this happens, however, don't expect all those in the media who have headlined this story as "precognition proved in the lab" to recant and tell the true story.
My comment refers to the method commonly used in psi research of seeking a statistically significant result and claiming it is evidence for some psi effect. Psi researchers like Bem and Radin call such observations "anomalous" when they aren't. There is nothing anomalous about observing people making guesses about Zener cards or names. Observing one person who consistently guesses correctly would be an anomalous event. Observing many people, some of whom make correct guesses some of the time, is not anomalous. Observing numbers of people making guesses that, taken collectively, deviate from chance expectation according to some arbitrary statistical formula is not anomalous. What I find sidesplittingly ludicrous is doing meta-analyses on numerous psi studies, most of which are too small to be of much value, to seek some statistic one can claim, as Radin often does, has a bazillion to one odds against chance.
What wouldn't be ludicrous is if these psi researchers, using methods similar to Bem's, consistently found deviations from chance on the order of, say, 30 or 40 percent, in the group or in special individuals. But they don't. They find deviations in the 1 or 2 percent range or less (see the PEAR experiments on psychokinesis). This is called pathological science. Langmuir described typical cases as involving such things as barely detectable causal agents observed near the threshold of sensation which are nevertheless asserted to have been detected with great accuracy. I think we can extend this idea to another batch of typical cases that involve such things as slight deviations from chance involving imperceptible perceptions expressed as a statistic. The participants are never aware of feeling any psi. The only evidence that psi has occurred is the statistic, which is always barely across the threshold of statistical improbability.
3. I do agree with the quotation from Rutherford when applied to psi research. Bem and others speak of the "anomalous transfer of information," but the only evidence that any information has been transferred at all, anomalously or otherwise, is a statistic at the threshold of statistical improbability according to an arbitrary statistical formula. Psi researchers are trying to establish extraordinary claims with evidence that, at best, can be said to be consistent with their hypotheses regarding psi phenomena. Correlations with statistical improbability do not imply causality.
4. I may be negative and dismissive because I resent having spent many wasted hours reading and studying the works of psientists. I'm appalled that anyone takes the work of Gary Schwartz seriously. It is pathetic and without any merit. I'm outraged at the way Dean Radin uses meta-analyses to produce exasperatingly ridiculous, breathtakingly inane, statistics. I won't pretend that I am not annoyed by their persistence, their assumptions, and their occasional hoodwinking of respected publications to promote their fantasies. Bem is one of the more respectable psientists and he probably deserves the kind of respect Wiseman gives him rather than the derision that I heap upon him and his kind. What's done is done, though. I stand by what I wrote.
5. If precognition is proved, I will have to rewrite a bit of doggerel I composed some 20 years ago. To wit:
No silliness intended
but the future's been suspended
until the past can catch up
to where the present has just ended.
I've thought about various possibilities. Here's one:
Some silliness intended,
the present's been suspended
until the future can go back
to where the past can be rear ended.
* Wagenmakers et al. contend that Bem's statistical formulas are too liberal, that his work is more exploratory than confirmatory. They write: "one-sided p-values may overstate the statistical evidence against the null hypothesis. We reanalyze Bem’s data using a default Bayesian t-test and show that the evidence for psi is weak to nonexistent." I think their background is different from mine, and so is their audience. I would be more prone to note that for all Bem knows he got the results he got because Zeus willed it. I stole that line about Zeus messing with the minds of psientists from psychologist James Alcock who, I understand, has a response to Bem's paper in preparation. (update: Professor Alcock's scathing review of Bem's work has been posted on the CSI website. update2: Bem's lame response to Alcock's critique has been posted, as has Alcock's restrained response to Bem's vociferous whining.)
** Hadlaczky's paper on the failure to replicate Bem's results is much more respectful of Bem and psi research than I have been in my comments. For example, Hadlaczky writes:
For this design to be able to show precognition, several assumptions have to be made. Firstly, that precognition exists. Secondly, that time reversed effects can be achieved using the mere exposure paradigm with highly affective stimuli. This second assumption has no base however, if it is not possible to achieve regular mere-exposure effects using highly affective stimuli. Unfortunately, as of yet there have been no studies investigating this, and therefore the validity of the second crucial assumption of the PH-design is vulnerable.
Further, the PH design is not an exact copy of a mere-exposure design. Despite the main difference that it is run backwards, there are other dissimilarities like the number of exposures, the time between exposure and selection task etc. These dissimilarities could possibly lead to the conclusion that a PH-study is not a mere-exposure study run backwards at all but simply a new design, which is only slightly similar to a mere-exposure study run backwards.
Furthermore, Hadlaczky points out that even if Bem's experiments were total failures, precognition could still exist. I might add that without a plausible mechanism for how things that haven't happened yet can cause things to happen now, it will be very difficult to design an experiment that could falsify the precognition hypothesis. Psychologists have devised tests of unconscious perception from past and present experience. This is possible because we can control the causal factors we're testing, e.g., tests that establish blindsight. But designing a test that measures unconscious perception affected by the future in such a way that failure would imply that the precognition hypothesis (or any other psi hypothesis for that matter) has been falsified may require a bit more ingenuity than shown so far.
update 23 April 2011. A replication of Bem's study has been attempted by Stuart Ritchie, Chris French, and Richard Wiseman. According to Ben Goldacre, they re-ran three of Bem's backwards experiments, just as Bem ran them, "and found no evidence of precognition. They submitted their negative results to the Journal of Personality and Social Psychology, which published Bem’s paper last year, and the journal rejected their paper out of hand. We never, they explained, publish studies that replicate other work." Classy. Any bets on how long it will take the same media that trumpeted Bem's work to ignore the negative findings?
Last updated 26-Oct-2015