Critical Thinking Fallacy Test
Novices vs. expert (Ann Emerg Med 2013;61:96)
Induction and Deduction
Massimo Pigliucci is certainly correct in saying that it is important for anyone interested in critical thinking and science to understand the difference between deduction and induction ("Elementary, Dear Watson," May/June 2003). However, it has been several decades since logicians defined that difference in terms of going from the general to the particular or vice versa. His own example of deduction shows the problem. It doesn't go from the general to the particular but from one general and one particular statement to another particular statement: "All men are mortal. Socrates is a man. Therefore, Socrates is mortal." General statements aren't needed at all in the premises of some deductive arguments. For example: "Socrates is a stonemason. Socrates is a philosopher. Therefore, at least one stonemason is a philosopher." This is a valid deductive argument. "Rumsfeld is arrogant. Rumsfeld is a Republican. Therefore, all Republicans are arrogant" is also a deductive argument, though an invalid one, going from particulars to the general.
Induction, says Pigliucci, seeks to go from particular facts to general statements. That is true sometimes, but not all the time. "Jones was late yesterday, so he'll probably be late today" is an inductive argument that goes from one particular to another. I admit it is not a cogent argument, but cogency is a different matter.
The general-to-particular relationship isn't rich enough to serve as a good line of demarcation between induction and deduction. Any standard logic text today will make the distinction in terms of arguments that claim their conclusions follow with necessity from their premises (deductive arguments) and those that claim their conclusions follow with some degree of probability from their premises (inductive arguments). This distinction, between premises that imply their conclusions with necessity and premises that support their conclusions to some degree of probability, is not without its problems, however. One virtue of the general/particular distinction is that there is not likely to be any ambiguity about whether a statement is general or particular. But there will be many cases where it won't be clear whether an arguer is claiming that a conclusion follows with necessity. There will also probably be many cases where the arguer should be claiming that a conclusion follows with some degree of probability, but the language indicates that the arguer thinks it follows with necessity. For example, many people might argue that since the sun has always risen in the east, it is necessarily the case that the sun will always rise in the east. Yet it isn't necessarily the case at all. It just happens to be the case, and it is easy to imagine any number of things happening to the earth that could change its relationship with the sun.
By dividing arguments into those whose conclusions follow with necessity and those whose conclusions don't, we end up dividing arguments into those whose conclusions are entailed by their premises and those whose conclusions go beyond the data provided by the premises. A valid deductive argument can't have true premises and a false conclusion, but a cogent inductive argument might. This may sound peculiar, but it's not. Even the best inductive argument cannot claim that the truth of its premises guarantees the truth of its conclusion. Even the worst valid deductive argument, that is, one with premises that are actually false, can still claim that if its premises were true, its conclusion would have to be true. No deductive argument, however valid, can guarantee the truth of its own premises, unless those premises are tautologies. (In logic, a tautology is a statement that cannot possibly be false, e.g., "A rose is a rose," "Either it will rain or it will not rain," or "If Browne is psychic and stupid, then Browne is stupid.")
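For simple statements built from "and," "or," "not," and "if ... then," the definition of a tautology can be checked mechanically: a statement is a tautology if it comes out true under every possible assignment of truth values to its parts. The Python sketch below is an illustration added here, not part of the original lesson. It assumes "if ... then" is read as the material conditional of standard logic texts, and the function names are hypothetical. It covers only two of the examples above; "A rose is a rose" depends on identity rather than on sentence connectives, so it is left out.

```python
from itertools import product


def implies(p, q):
    """Material conditional: 'if p then q' is false only when p is true and q is false."""
    return (not p) or q


def is_tautology(statement, num_atoms):
    """True if `statement` comes out true under every assignment of truth values to its atoms."""
    return all(statement(*values) for values in product([True, False], repeat=num_atoms))


def rain_or_not(r):
    # "Either it will rain or it will not rain"  (R or not-R)
    return r or not r


def browne_is_stupid(p, s):
    # "If Browne is psychic and stupid, then Browne is stupid"  ((P and S) -> S)
    return implies(p and s, s)


def psychic_and_stupid(p, s):
    # "Browne is psychic and stupid": true on some rows, false on others
    return p and s


print(is_tautology(rain_or_not, 1))         # True
print(is_tautology(browne_is_stupid, 2))    # True
print(is_tautology(psychic_and_stupid, 2))  # False
```

The last statement is contingent: it is true only when Browne is both psychic and stupid, so it is not a tautology.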
So, how does knowing the difference between induction and deduction have any bearing on critical thinking? If you understand deduction, then you should be able to understand why scientific experiments are set up the way they are: if a claimed ability is real, then certain observable consequences must follow, and a good test checks for those consequences under conditions that rule out other explanations. For example, if someone claims to be able to feel another person's energy field by moving her hands above the patient's body, as those who practice therapeutic touch claim, then she should be able to detect another person's energy field when that field is beneath one of her hands, even if her vision is blocked so that she can't see which hand is over the alleged energy field. If one can detect energy fields by feel alone, then one must be able to detect energy fields without the assistance of any visual or aural feedback from the patient. Likewise, if one claims to be able to detect metal or oil by dowsing, then one ought to be able to detect metal or oil hidden from sight under controlled conditions. If one claims to be able to facilitate communication from someone who is severely intellectually disabled and physically unable to talk or point, then one should be able to describe correctly objects placed in the visual field of the patient even if those objects cannot be seen by the facilitator.
On the other hand, the nature of induction should, at the very least, make us humble by reminding us that no matter how great the evidence is for a belief, that belief could still be false.
See also Austin Cline, "Deductive & Inductive Arguments: How do they differ?"
The Concept of Validity
Deductive arguments are those whose premises are said to entail their conclusions (see lesson 1). If the premises of a deductive argument do entail their conclusion, the argument is valid; if not, the argument is invalid. (The term "valid" is not used by most logicians when referring to inductive arguments, but that is a topic for another mini-lesson.)
Here’s an example of a valid argument:
Shermer and Randi are skeptics. Shermer and Randi are writers. So, some skeptics are writers.
To say the argument is valid is to say that it is logically impossible for its premises to be true and its conclusion false. So, if the premises of my example are true, then the conclusion must be true also. The premises of this argument happen to be true, so this argument is not only valid, but sound or cogent. A sound or cogent deductive argument is defined as one that is valid and has true premises.
A valid argument may have false premises, however. For example,
All Protestants are bigots. All bigots are Italian. So, all Protestants are Italian.
Being valid is not the same as being sound. Validity is determined by the relationship of premises to conclusion in a deductive argument. This relationship, in a valid argument, is referred to as implication or inference. The premises of a valid argument are said to imply their conclusion. The conclusion of a valid argument may be inferred from its premises.
While many errors in deduction are due to making unjustified inferences from premises, the vast majority of unsound deductive arguments are probably due to premises that are questionable or false. For example, many psi researchers have found statistical anomalies and have inferred from these data that they have found evidence for psi. The error, however, is one of assumption, not inference. The researchers assume that psi is the best explanation for the statistical anomaly. If one makes this assumption, then one's inference from the data is justified. However, the assumption is questionable, and the arguments based on it are unsound. Similar unsound reasoning occurs in the arguments that intercessory prayer heals and that psychics get messages from the dead. Researchers assume that a statistically significant correlation between praying and healing is best explained by assuming prayer is a causative agent, but this assumption is questionable. Researchers also assume that results that are statistically improbable if explained by chance, guessing, or cold reading are best explained by positing communication from the dead, but this assumption is questionable too. These researchers reason well enough; that is, they draw correct inferences from their data. But the premises on which they base their reasoning are faulty because they are questionable.
I am not suggesting by the above comments that the data and methods of these researchers are beyond criticism. In fact, I find it interesting that skeptics seem to divide into two camps when criticizing such things as Gary Schwartz's so-called afterlife experiments. One camp attacks the assumptions. The other camp attacks the data or the methods used to gather the data. The former camp finds errors of assumption and fallacies such as begging the question, argument to ignorance, or false dilemma. The latter finds cheating, sensory leakage, poor use of statistics, inadequate controls, and that sort of thing.
Finally, some deductive arguments are unsound because they are invalid, not because their premises are false or questionable. Here is an unsound deductive argument whose premises may well be true:
If my astrologer is clairvoyant, then she predicted my travel plans correctly. She predicted my travel plans correctly. So, my astrologer is clairvoyant.
This conclusion is not entailed by these premises, so the argument is invalid. It is possible that both these premises are true but the conclusion is false. (She may have predicted my travel plans because she got information from my travel agent, for example.) This argument is said to commit the fallacy of affirming the consequent. Another example of this fallacy would be:
If God created the universe, we should observe order and design in Nature. We do observe order and design in Nature. So, God created the universe.
The premises of this argument may be true, but they do not entail their conclusion. This conclusion could be false even if the premises are true. (We should also observe order and design in Nature if something like Darwin’s theory of natural selection is true.)
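The definition of validity given above ("logically impossible for its premises to be true and its conclusion false") suggests a mechanical test for simple argument forms: look for a row of the truth table in which every premise is true and the conclusion is false. The Python sketch below is an illustration added here, with hypothetical function names, and it assumes "if ... then" is the material conditional. For contrast it also checks the valid form usually called modus ponens.

```python
from itertools import product


def implies(p, q):
    """Material conditional: false only when p is true and q is false."""
    return (not p) or q


def find_counterexample(premises, conclusion, atoms):
    """Return a truth assignment making every premise true and the conclusion false,
    or None if no such assignment exists (i.e., the argument form is valid)."""
    for values in product([True, False], repeat=len(atoms)):
        row = dict(zip(atoms, values))
        if all(premise(row) for premise in premises) and not conclusion(row):
            return row
    return None


# C = "my astrologer is clairvoyant", P = "she predicted my travel plans correctly"
def if_c_then_p(row):
    return implies(row["C"], row["P"])


# Affirming the consequent: If C then P. P. So, C.
affirming = find_counterexample([if_c_then_p, lambda row: row["P"]], lambda row: row["C"], ["C", "P"])
# Modus ponens, for contrast: If C then P. C. So, P.
modus_ponens = find_counterexample([if_c_then_p, lambda row: row["C"]], lambda row: row["P"], ["C", "P"])

print(affirming)     # {'C': False, 'P': True}
print(modus_ponens)  # None
```

The counterexample row is exactly the situation described earlier: the astrologer is not clairvoyant, yet she predicted the travel plans correctly (perhaps with help from the travel agent).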
The Wason Card Problem
One of the nicer features of the James Randi Educational Foundation's Amazing Meeting earlier this year was the time set aside for mini-talks by those responding to a call for papers. One of those talks was given by Dr. Jeff Corey, who teaches experimental psychology at C. W. Post College. His talk was on "The Wason Card Problem" and its role in teaching critical thinking skills. Four cards are presented: A, D, 4, and 7. There is a letter on one side of each card and a number on the other side. Which card(s) must you turn over to determine whether the following statement is false? "If a card has a vowel on one side, then it has an even number on the other side."
(I suggest you spend a few minutes trying to solve the problem before continuing.)
(I hope you have been able to restrain yourself from jumping ahead and have worked out your solution to the problem. Before continuing, try to solve the following alternative version: Let the cards show "beer," "cola," "16 years," and "22 years." On one side of each card is the name of a drink; on the other side is the age of the drinker. What card(s) must be turned over to determine whether the following statement is false? "If a person is drinking beer, then the person is over 19 years old.")
I gave the Wason Card Problem to 100 students last semester and only seven got it right, which was about what was expected. There are various explanations for these results. One of the more common explanations is in terms of confirmation bias. This explanation is based on the fact that the majority of people think you must turn over cards A and 4, the vowel card and the even-number card. It is thought that those who would turn over these cards are thinking "I must turn over A to see if there is an even number on the other side and I must turn over the 4 to see if there is a vowel on the other side." Such thinking supposedly indicates that one is trying to confirm the statement "If a card has a vowel on one side, then it has an even number on the other side." Presumably, one is thinking that if the statement cannot be confirmed, it must be false. This explanation then leads to the question: Why do most people try to confirm a statement, when the task is to determine if it is false? One explanation is that people tend to try to fit individual cases into patterns or rules. The problem with this explanation is that in this case we are instructed to find cases that don't fit the rule. Is there some sort of inherent resistance to such an activity? Are we so driven to fit individual cases to a rule that we can't even follow a simple instruction to find cases that don't fit the rule? Or are we so driven that we tend to think that the best way to determine whether an instance does not fit a rule is to try to confirm it, and only if it can't be confirmed do we consider that the rule might be wrong?
Corey noted that when the problem is changed from abstract items, such as numbers and letters, and put in concrete terms, such as drinks and the age of the drinker, the success rate increases significantly (see the example described above). One would think that confirmation bias would lead most people to say they must turn over the beer card and the 22 card, but they don't. Most people see that the cola and 22 cards are irrelevant to solving the problem and that the beer and 16 cards are the ones to turn over. If I remember correctly, Corey explained the difference in performance between the abstract and concrete versions of the problem in terms of evolutionary psychology: humans are hardwired to solve practical, concrete problems, not abstract ones. To support his point, he said he had simplified the abstract test to include only two cards (showing 1 and 2) with equally poor results.
I had discussed confirmation bias, but not conditional statements, with my classes before giving them the Wason problem. The majority seemed to understand confirmation bias, so if the reason so many do poorly on this problem is confirmation bias, then just knowing about confirmation bias is not much help in overcoming it as a hindrance to critical thinking. This is consistent with what I teach: recognition of a hindrance is a necessary but not a sufficient condition for overcoming that hindrance. However, next semester I'm going to give my students the Wason test after I discuss determining the truth-value of conditional statements. The reason for doing so is that anyone who has studied the logic of conditional statements should know that a conditional statement is false if and only if the antecedent is true and the consequent is false. (The antecedent is the "if" clause; the consequent is the "then" clause.) So, the statement "If a card has a vowel on one side, then it has an even number on the other side" can only be false if the statement "a card has a vowel on one side" is true and the statement "it has an even number on the other side" is false. I must look at the card with the vowel showing to find out what is on the other side, because it could be an odd number and thus would show me that the statement is false. I must also look at the card with the odd number to find out what is on the other side, because it could be a vowel and thus would show me that the statement is false. I don't need to look at the card with the consonant, because the statement I am testing has nothing to do with consonants. Nor do I need to look at the card with the even number showing, because whether the other side has a vowel or a consonant will not help me determine whether the statement is false.
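The reasoning above can be put in a single rule: turn a card over if and only if something that might be on its hidden side would make the antecedent true and the consequent false. The Python sketch below is an illustration added here, not Corey's or Wason's procedure, and its names are hypothetical. It assumes hidden numbers are single digits, hidden ages run from 0 to 99, and the only drinks are beer and cola; any odd number or underage drinker would do just as well.

```python
LETTERS = [chr(code) for code in range(ord("A"), ord("Z") + 1)]
DIGITS = list(range(10))          # assumption: hidden numbers are single digits
DRINKS = ["beer", "cola"]         # assumption: only two drinks are possible
AGES = list(range(100))           # assumption: ages 0 to 99


def cards_to_turn(cards, violates):
    """Return the visible faces whose hidden side could make the rule false.

    `cards` maps each visible face to the values that might be on its hidden side;
    `violates(visible, hidden)` says whether that pair of faces breaks the rule."""
    return [face for face, hidden in cards.items()
            if any(violates(face, value) for value in hidden)]


def breaks_vowel_rule(a, b):
    # "If a card has a vowel on one side, then it has an even number on the other side"
    letter, number = (a, b) if isinstance(a, str) else (b, a)
    return letter in "AEIOU" and number % 2 == 1


def breaks_beer_rule(a, b):
    # "If a person is drinking beer, then the person is over 19 years old"
    drink, age = (a, b) if isinstance(a, str) else (b, a)
    return drink == "beer" and age <= 19


abstract = {"A": DIGITS, "D": DIGITS, 4: LETTERS, 7: LETTERS}
concrete = {"beer": AGES, "cola": AGES, 16: DRINKS, 22: DRINKS}

print(cards_to_turn(abstract, breaks_vowel_rule))  # ['A', 7]
print(cards_to_turn(concrete, breaks_beer_rule))   # ['beer', 16]
```

The same rule selects the vowel and the odd number in the abstract version and the beer and the 16-year-old in the concrete one; logically the two problems are identical, which is what makes the difference in human performance interesting.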
There is a possibility that the reason many think that the even-numbered card must be turned over is that they mistakenly think that the statement they are testing implies that if a card has an even number on one side then it cannot have a consonant on the other. In other words, it is possible that the high error rate is due to misunderstanding logical implication rather than to confirmation bias. In the concrete version of the problem, perhaps it is much easier to see that the statement "If a person is drinking beer, then the person is over 19 years old" does not imply that if a person is over 19 then they cannot be drinking cola. If this is the case, then an explanation in terms of the difference between contextual implication and logical implication might be better than one in terms of confirmation bias. Perhaps it is the context of drinking and the age of the drinker that indicates to many people that a person can be over 19 and not drink beer without falsifying the statement being tested; i.e., that simply because if you're drinking beer you are over 19, it doesn't follow that if you're over 19 you can't be drinking cola. That is, in the concrete case people may not have any better understanding of logical implication than they do in the abstract case, and neither case may have anything to do with confirmation bias.
On the other hand, some might reason that if I turn over the even card and find a vowel, then I have confirmed the statement, which is in effect the same as showing that the statement is not false, but true. This would be classic confirmation bias. Finding an instance that confirms the rule does not prove the rule is true. But, finding one instance that disproves the rule shows that the rule is false.
The Wason Card Problem Revisited
I received several responses to my analysis of the Wason problem. Mathematician and author Jan Willem Nienhuys wrote from the Netherlands:
I don’t think that the card problem as presented is compatible with the beer over 21 problem. What would happen if you said “vowels and odds are forbidden to go together on one card” and ask someone to check whether there are cards that are forbidden. That’s the beer over 21 problem. Another problem with the example is that the beer problem has a known social setting. If you made some kind of funny restriction, like ‘over 22 must drink coke’, it’s much harder, or you can make a restaurant setting, with a completely strange restriction like ‘girls (or people with a polysyllabic name) must order broccoli’, then it’s much more difficult, for the problem solvers must then keep an odd fact in mind while analyzing several cases. The less unfamiliar facts one has to keep at same time ready in the mind, the easier it is. (And it is quite possible that not everybody knows what’s an even number or what’s a vowel, or that people with slightly deficient knowledge know at most one of these concepts, you’d be surprised how deficient people’s knowledge is).
I replied to Jan that, unless I'm mistaken, both problems imply that certain pairs of faces are forbidden on the same card (vowel and odd number; beer and 19 years or under). I think I will try the problem on my classes with Jan's suggested instruction and see if the results vary significantly. (I'll send him the results and he, the mathematician, can tell me whether the difference, if any, is significant!) The social setting would be part of what I'm calling the context that might be why the beer problem is easier to solve for most people. It had not occurred to me that part of the problem might be in understanding the meaning of words like "vowel" and "even," but that is a consideration that should not be taken lightly (unfortunately), and maybe I should try the test with some set-up questions to make sure those taking it understand such terms.
Jan replied:
I will be very interested in what you find. You might try variations like: if there are two primes on one side, the other side must show their product. This means that if a card shows a single number that is the product of two primes, you don't have to turn it around. If it shows two numbers that aren't primes, you also don't have to turn it around. Obviously the difficulty is that lots of people don't know what are primes, and even if they do so theoretically, some know their tables of multiplication so poorly, that they are at loss what to do when the card shows 42 or 49 or 87 or 36 or 39. Or 10.
Yikes! Jan, I teach a general course in logic and critical thinking, not math! My students would lynch me if I posed such a problem to them.
I do think that one of the problems with solving this problem (and many others!) has to do with how one reads or misreads the instructions. (For those who don't recall the exact instructions, here they are again: Four cards are presented: A, D, 4, and 7. There is a letter on one side of each card and a number on the other side. Which card(s) must you turn over to determine whether the following statement is false? "If a card has a vowel on one side, then it has an even number on the other side.")
One reader wrote:
My solution to the problem is to check all cards (or a random sample if there are a large number of them) – Sometimes it’s best to see what rules apply. (Sometimes “if” means if and only if…)
This approach represents a common mistake in problem-solving: self-imposed rules. The instructions do not imply that there are more than four cards, nor does “if” mean “if and only if.” (See James Adams’ Conceptual Blockbusting for a good discussion on common hindrances to problem-solving.)
The reader continues:
A simpler explanation for people choosing A and 4: Given that people tend to satifice, it makes sense that many will just check the cards where they see a vowel or an even number. It’s a quick solution made with the immediate data on hand, requiring no additional thought (about the implications of the statement or anything else). Classic satisficing behavior.
Whether this solution is satisficing or satificing, it’s wrong.
Another reader, Jack Philley, wrote:
Thanks for a great newsletter. I am a safety engineer and incident investigator. I also teach a segment on critical thinking in my incident investigation course, and I have been using the Wason card challenge. I picked it up from Tom Gilovich's book How We Know What Isn't So. About 80% of my students get it wrong and some of them become very angry and embarrassed and defend their logic to an unreasonable degree. I use it to illustrate our natural talent to try to prove a hypothesis and our weakness in thinking about how to disprove a suspected hypothesis. This comes in handy when trying to identify the actual accident scenario from a set of speculated possible cause scenarios.
For those who haven't read Gilovich (or have but don't remember what he said about the Wason problem), he thinks that people turn over card "2," even though it is uninformative and can only confirm the hypothesis, because they are looking for evidence that would be consistent with the hypothesis rather than evidence that would be inconsistent with it. He also finds this behavior most informative because it "makes it abundantly clear that the tendency to seek out information consistent with a hypothesis need not stem from any desire for the hypothesis to be true" (p. 33). Who really cares what is true regarding vowels and numbers? Thus, the notion that we seek confirmatory evidence because we are trying to find support for things we want to be true is not supported by the typical results of the Wason test. People seek confirmatory evidence, according to Gilovich, because they think it is relevant.
As for my suggestion that people do better when the problem is put in terms of drinking beer or soda and age because of the context, Gilovich notes that improved performance is found only in contexts that invoke the notion of permission (p. 34 note). This just shows, he thinks, that there are some situations where "people are not preoccupied with confirmations."
Logical fallacies are errors that occur in arguments. In logic, an argument is the giving of reasons (called premises) to support some claim (called the conclusion). There are many ways to classify logical fallacies. I prefer listing the conditions for a good or cogent argument and then classifying logical fallacies according to the failure to meet these conditions.
Every argument makes some assumptions. A cogent argument makes only warranted assumptions, i.e., its assumptions are not questionable or false. So, fallacies of assumption make up one type of logical fallacy. One of the most common fallacies of assumption is called begging the question. Here the arguer assumes what he should be proving. Most arguments for psi commit this fallacy. For example, many believers in psi point to the ganzfeld experiments as proof of paranormal activity. They note that a .25 success rate is predicted by chance but Honorton had some success rates of .34. One defender of psi claims that the odds of getting 34% correct in these experiments were a million billion to one. That may be true, but one is begging the question to ascribe the amazing success rate to paranormal powers. It could be evidence of psychic activity, but there might be some other explanation as well. The amazing statistic doesn't prove what caused it. The fact that the experiment is trying to find proof of psi isn't relevant. If someone else did the same experiment but claimed to be trying to find proof that angels, dark matter, or aliens were communicating directly to some minds, that would not be relevant to what actually caused the amazing statistic. The experimenters are simply assuming that any amazing statistic they get is due to something paranormal.
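Odds like "a million billion to one" come from asking how likely a hit rate at least this high would be if only chance (a 25% hit rate) were at work. The Python sketch below is an illustration added here; the trial counts are hypothetical and are not figures from Honorton's or anyone else's experiments. It computes that probability exactly from the binomial distribution and shows how heavily it depends on the number of trials behind the same 34% rate.

```python
from fractions import Fraction
from math import comb


def binomial_tail(n, k_min, p):
    """Exact probability of at least k_min hits in n independent trials with hit
    probability p (a Fraction), returned as a float."""
    total = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))
    return float(total)


chance = Fraction(1, 4)  # the 25% hit rate expected by chance
for trials in (50, 100, 1000):      # hypothetical trial counts, not figures from any study
    hits = 34 * trials // 100       # a 34% hit rate
    print(trials, hits, binomial_tail(trials, hits, chance))
```

However small the number that comes out, it measures only how improbable the result is under the chance hypothesis. It says nothing about whether psi, sensory leakage, a flaw in randomization, or something else produced the excess, which is exactly the assumption the argument slips in.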
Another common, and fatal, fallacy of assumption is the false dilemma, in which one restricts consideration of reasonable alternatives.
Not all fallacies of assumption are fatal. Some cogent arguments might make one or two questionable or false assumptions, but still have enough good evidence to support their conclusions. Some, like the gambler’s fallacy, are fatal, however.
Another quality of a cogent argument is that the premises are relevant to supporting their conclusions. Providing irrelevant reasons for your conclusion need not be fatal, either, provided you have sufficient relevant evidence to support your conclusion. However, if all the reasons you give in support of your conclusion are irrelevant, then your reasoning is said to be a non sequitur. The divine fallacy is a type of non sequitur.
One of the more common fallacies of relevance is the ad hominem, an attack on the one making the argument rather than an attack on the argument. One of the most frequent types of ad hominem attack is to attack the person’s motives rather than his evidence. For example, when an opponent refuses to agree with some point that is essential to your argument, you call him an “antitheist” or “obtuse.”
Other examples of irrelevant reasoning are the sunk-cost fallacy and the argument to ignorance.
A third quality of a cogent argument is sometimes called the completeness requirement: A cogent argument should not omit relevant evidence. Selective thinking is the basis for most beliefs in the psychic powers of so-called mind readers and mediums. It is also the basis for many, if not most, occult and pseudoscientific beliefs. Selective thinking is essential to the arguments of defenders of untested and unproven remedies. Suppressing or omitting relevant evidence is obviously not fatal to the persuasiveness of an argument, but it is fatal to its cogency. The regressive fallacy is an example of a fallacy of omission. The false dilemma is also a fallacy of omission.
A fourth quality of a cogent argument is fairness. A cogent argument doesn’t distort evidence nor does it exaggerate or undervalue the strength of specific data. The straw man fallacy violates the principle of fairness.
A fifth quality of cogent reasoning is clarity. Some fallacies are due to ambiguity, such as the fallacy of equivocation: shifting the meaning of a key expression in an argument. For example, the following argument uses ‘accident’ first in the sense of ‘not created’ and then in the sense of ‘chance event.’
Since you don’t believe you were created by God then you must believe you are just an accident. Therefore, all your thoughts and actions are accidents, including your disbelief in God.
Finally, a cogent argument provides a sufficient quantity of evidence to support its conclusion. Failure to provide sufficient evidence is to commit the fallacy of hasty conclusion. One type of hasty conclusion that occurs quite frequently in the production of superstitious beliefs and beliefs in the paranormal is the post hoc fallacy.
Some fallacies may be classified in more than one way, e.g., the pragmatic fallacy, which at times seems to be due to vagueness and at times due to insufficient evidence.
The critical thinker must supplement the study of logical fallacies with lessons from the social sciences on such topics as
James Alcock reminds us that "the true critical thinker accepts what few people ever accept: that one cannot routinely trust perceptions and memories" (The Belief Engine). The unhappy truth is that humans are not truth-seeking missiles. In addition to understanding logical fallacies, we must also understand why we are prone to them.
There are literally hundreds of logical fallacies. For a good general introduction to fallacies I recommend Attacking Faulty Reasoning: A Practical Guide to Fallacy-Free Arguments by T. Edward Damer or Asking the Right Questions: A Guide to Critical Thinking by M. Neil Browne and Stuart M. Keeley.
There are some on-line sites that focus on fallacies. I refer the reader to them without comment:
Replication of Scientific Studies
A student who did very well in my Logic and Critical Reasoning course sent the following news item along with the suggestion that I might need to revise my thinking about lunar effects. I replied that I might need to emphasize more strongly what I teach: Look for what is not mentioned in the study, not just at what is mentioned. And don’t forget how important replication of a study is.
Aug 11, 2003. (Bloomberg) — Car accidents occur 14 percent more often on average during a full moon than a new moon, according to a study of 3 million car policies by the U.K.’s Churchill Insurance Group Plc.
The data show a rise in all types of accidents, involving single vehicles or multiple cars, the company said in an e-mailed press release. The next full moon will be tomorrow night.
"We know that the moon is a strong source of energy, as it affects the tides and weather patterns, but we're surprised by this bizarre trend," Craig Staniland, head of car insurance at Churchill, said in the release.
The company, which Royal Bank of Scotland Group Plc agreed to buy from Credit Suisse Group in June, speculated that eastern philosophy’s concepts of yin and yang may explain the accident rate. It cited a feng shui expert, Simon Brown, saying that the full moon radiates more of the sun’s yang energy onto earth, making people more aggressive and impatient.
The insurer said it won’t change its underwriting criteria to take the full moon into account, the company said.
In addition to yin and yang, there might be other explanations for these data, but before searching for explanations one should make sure there is something that needs to be explained. The study seems to claim that there are 14% more accidents on nights when there is a full moon than on nights when there is a new moon. (When the moon is full, if the weather is clear, it will generally be very bright. When the moon is new, even if the weather is clear, the moon will hardly be visible.)* The results of a single study may be suggestive, but they are not usually considered conclusive. This study may have been well designed, but we are not told anything about how it was conducted or how it was designed, so we can't be sure. The Churchill Insurance Group may have a flawless study, but note that they didn't take the results seriously enough to alter their underwriting criteria. Why not? I don't know. What I would like to know is how the study was done.
The press release mentions a study of 3 million car policies but that’s a bit vague. Did they analyze 3 million policies and separate those who made accident claims from those who didn’t? Then, did they find that claims that involved accidents that happened at night when there was a full moon occurred 14% more frequently than claims that involved accidents that happened at night when there was a new moon? Did they control for weather? That is, did they review their data to make sure that there were about the same number of stormy nights on both full and new moon nights? Otherwise, they might just be measuring an effect of bad weather, not moon phases.
How many accidents are we talking about? Without knowing the numbers we can’t determine whether this study had a sufficient number of cases to analyze. But even if it had many thousands of cases, we don’t know over how long a period of time this study was conducted. If it analyzed data over a very long period of time, that would be more impressive than if it analyzed data over a very short period of time. Why? Over a short period of time they are more likely to get skewed results. For example, maybe the period they evaluated had two full moons in 30 days and both occurred on Saturdays. With smaller numbers it becomes more important to control for factors like the weather or weekends.
We need to know exactly how many accidents were involved in the study, the beginning date and end date of the data collection, the exact number of nights involved, and the exact number of full and new moons during the study. We should also be assured that only accidents that occurred after the rising and before the setting of the full moon were included in the study. If the accidents happened during the day or before the full moon was present, the likelihood that the moon had anything to do with them diminishes significantly.
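Why the raw counts matter can be sketched with a standard approximation: treat accident counts as Poisson and put a rough 95% confidence interval around the ratio of accident rates per night. The counts below are hypothetical, chosen so that both "studies" show the same 14% excess; the log-scale interval is one common textbook method, not necessarily anything the insurer used.

```python
import math

def rate_ratio_ci(a, nights_a, b, nights_b, z=1.96):
    """Ratio of accidents per night (group A vs. group B) with an approximate
    95% confidence interval, treating counts as Poisson and using the
    log-scale standard error sqrt(1/a + 1/b)."""
    rr = (a / nights_a) / (b / nights_b)
    se = math.sqrt(1 / a + 1 / b)
    low = math.exp(math.log(rr) - z * se)
    high = math.exp(math.log(rr) + z * se)
    return rr, low, high

# Hypothetical counts: the same 14% excess, observed with few vs. many accidents.
small = rate_ratio_ci(114, 12, 100, 12)
large = rate_ratio_ci(11400, 12, 10000, 12)
print("small study: RR=%.2f, 95%% CI %.2f to %.2f" % small)
print("large study: RR=%.2f, 95%% CI %.2f to %.2f" % large)
```

With about a hundred accidents per group the interval runs from below 1.0 to well above it, so a 14% difference is indistinguishable from no difference; only with thousands of accidents does the interval exclude 1.0. Even then, the interval says nothing about confounders such as weather or weekends.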
Finally, even if the study was based on a sufficient number of cases over an adequate period of time and included only data it should include (and didn’t include data it shouldn’t include), and even if the data were analyzed properly by professional statisticians, we should still wait until it is replicated before worrying about finding an explanation for the 14% statistic. A single study with statistically impressive results should not be taken as sufficient to base any important decisions on.
Now, trying to prove the statistic is due to yin and yang is another matter altogether. I have no idea how anyone could construct a scientific study to test that hypothesis.
But we can at least correct one misconception put forth in this press release: the moon does not exert a strong gravitational influence on earthlings. George Abell has calculated that the moon’s gravitational pull on a human individual is less than that of a mosquito. Ivan Kelly put it this way: a mother holding her child “will exert 12 million times as much tidal force on her child as the moon.”*
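Why can a nearby person outweigh the moon? Tidal force comes from the difference in gravitational pull across a body, and that difference falls off with the cube of the distance rather than the square. A standard approximation for a body of size r at distance d from a mass M (G is the gravitational constant), and the resulting comparison, is sketched below; the subscripts are illustrative labels, not figures from Kelly or Abell.

```latex
\Delta a \approx \frac{2 G M r}{d^{3}}
\qquad\Longrightarrow\qquad
\frac{\Delta a_{\text{mother}}}{\Delta a_{\text{moon}}}
  = \frac{M_{\text{mother}}}{M_{\text{moon}}}
    \left(\frac{d_{\text{moon}}}{d_{\text{mother}}}\right)^{3}
```

The moon's mass (about 7.35 × 10^22 kg) dwarfs any mother's, but the cubed ratio of distances (about 3.84 × 10^8 m versus a fraction of a meter) is larger still, so the exact multiplier depends on the mass and distance one assumes for the mother.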
Why would anyone cite this study favorably? Confirmation bias. If you already believe in lunar effects, this study confirms your belief. You will be less likely to be critical of it than if it goes against your beliefs. Also, the suburban myth that the moon is a strong source of energy continues to be reported in the media, giving many people the impression that it must be true.
the fallacy of suppressed evidence
One of the basic principles of cogent argumentation is that a cogent argument presents all the relevant evidence. An argument that omits relevant evidence appears stronger and more cogent than it is.
The fallacy of suppressed evidence occurs when an arguer intentionally omits relevant data. This is a difficult fallacy to detect because we often have no way of knowing that we haven’t been told the whole truth.
Many advertisements commit this fallacy. Ads inform us of a product’s dangers only if required to do so by law. Ads never state that a competitor’s product is equally good. The coal, asbestos, nuclear fuel, and tobacco industries have knowingly suppressed evidence regarding the health of their employees or the health hazards of their industries.
Occasionally scientists will suppress evidence, making a study seem more significant than it is. In the December 1998 issue of The Western Journal of Medicine scientists Fred Sicher, Elisabeth Targ, Dan Moore II, and Helene S. Smith published “A Randomized Double-Blind Study of the Effect of Distant Healing in a Population With Advanced AIDS–Report of a Small Scale Study.” (I’ll refer to this as “the Sicher report.”) The authors do not mention, nor has The Western Journal of Medicine ever acknowledged, that the study was originally designed and funded to determine one specific effect: death. The 1998 study was designed to be a follow-up to a 1995 study of 20 patients with AIDS, ten of whom were prayed for by psychic healers. Four of the patients died, a result consistent with chance, but all four were in the control group, a statistic that seemed anomalous enough to these scientists to warrant further study. I don’t know whether evidence was suppressed or whether the scientists doing the study were simply incompetent, but the four patients who died were the four oldest in the study. The 1995 study did not control for age when it assigned the patients to either the control or the healing prayer group. Any controlled study on mortality that does not control for age is by definition not a properly designed study.
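A short calculation shows why four deaths, all in the control group, looked anomalous. Assuming ten patients in each group and that dying has nothing to do with group membership, the chance that all four deaths land in the control group is a hypergeometric probability. The function name is hypothetical.

```python
from math import comb

def prob_all_deaths_in_control(n_total, n_control, deaths):
    """Probability that every death falls in the control group if which
    patients die is unrelated to group assignment (hypergeometric)."""
    return comb(n_control, deaths) / comb(n_total, deaths)

p = prob_all_deaths_in_control(n_total=20, n_control=10, deaths=4)
print(f"P(all 4 deaths in the control group by chance) = {p:.4f}")
```

The result is about 4%, which looks striking. But the calculation assumes nothing else distinguishes the two groups. Since the four who died were the four oldest patients, age is an obvious rival explanation, and no arithmetic on group labels can rule it out once the groups were not balanced for age.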
The follow-up study, however, did suppress evidence, yet it is “widely acknowledged as the most scientifically rigorous attempt ever to discover if prayer can heal” (Bronson 2002). The standard format for scientific reports is to begin with an abstract that summarizes the contents of the report. The Abstract for the Sicher report notes that controls were done for age, number of AIDS-defining illnesses, and cell count. Patients were randomly assigned to the control or healing prayer groups. The study followed the patients for six months. “At 6 months, a blind medical chart review found that treatment subjects acquired significantly fewer new AIDS-defining illnesses (0.1 versus 0.6 per patient, P = 0.04), had lower illness severity (severity score 0.8 versus 2.65, P = 0.03), and required significantly fewer doctor visits (9.2 versus 13.0, P = 0.01), fewer hospitalizations (0.15 versus 0.6, P = 0.04), and fewer days of hospitalization (0.5 versus 3.4, P = 0.04).” These numbers are very impressive. They indicate that the measured differences were not likely due to chance. Whether they were due to healing prayer (HP) is another matter, but the scientists concluded their abstract with the claim: “These data support the possibility of a DH [distant healing] effect in AIDS and suggest the value of further research.” Two years later the team, led by Elisabeth Targ, was granted $1.5 million of our tax dollars from the National Institutes of Health Center for Complementary Medicine to do further research on the healing effects of prayer.
What the Sicher study didn’t reveal was that the original study had not been designed to make any of the measurements they report as significant. Of course, any researcher who didn’t report significant findings just because the original study hadn’t set out to investigate them would be remiss. The standard format of a scientific report allows such findings to be noted in the abstract or in the Discussion section of the report. It would have been appropriate for the Sicher report to have noted in the Discussion section that since only one patient died during their study, it appears that the new drugs being given to AIDS patients as part of their standard therapy (triple-drug anti-retroviral therapy) were having a significant effect on longevity. They might even have suggested that their finding warranted further research into the effectiveness of the new drug therapy. However, the Sicher report Abstract doesn’t even mention that only one of their subjects died during the study, indicating that they didn’t recognize a truly significant research finding. It may also indicate that the scientists didn’t want to call attention to the fact that their original study was designed to study the effect of healing prayer on the mortality rate of AIDS patients. Since only one patient died, perhaps they felt that they had nothing to report.
It was only after they mined the data once the study was completed that they came up with the suggestive and impressive statistics that they present in their published report. The Texas sharpshooter fallacy seems to have been committed here. Under certain conditions, mining the data would be perfectly acceptable. For example, if your original study was designed to study the effectiveness of a drug on blood pressure but you find after the data is in that the experimental group had no significant decrease in blood pressure but did have a significant increase in HDL (the “good” cholesterol), you would be remiss not to mention this. You would be guilty of deception, however, if you wrote your paper as if your original design was to study the effects of the drug on cholesterol and made no mention of blood pressure.
So, it would have been entirely appropriate for the Sicher report to have noted in the Discussion section that they had discovered something interesting in their statistics: Hospital stays and doctor visits were lower for the HP group. It was inappropriate to write the report as if that was one of the effects the study was designed to measure when this effect was neither looked for nor discovered until Moore, the statistician for the study, began crunching numbers looking for something of statistical significance after the study was completed. That was all he could come up with. Again, crunching numbers and data mining after a study is completed is appropriate; not mentioning that you rewrote your paper to make it look like it had been designed to crunch those numbers isn’t.
It would have been appropriate in the Discussion section of their report to have speculated as to the reason for the statistically significant differences in hospitalizations and days of hospitalization. They could have speculated that prayer made all the difference and, if they were competent, they would have also noted that insurance coverage could make all the difference as well. “Patients with health insurance tend to stay in hospitals longer than uninsured ones” (Bronson 2002). The researchers should have checked this out and reported their findings. Instead, they took a list of 23 illnesses associated with AIDS and had Sicher go back over each of the 40 patients’ medical charts and collect data on the 23 illnesses as best he could. This was after Sicher knew which group each patient had been randomly assigned to, prayer or control. The fact that the names were blacked out, so he could not immediately tell whose record he was reading, does not seem sufficient to justify allowing him to review the data. There were only 40 patients in the study and he was familiar with each of them. It would have been better had an independent party, someone not involved in the study, gone over the medical charts. Sicher is “an ardent believer in distant healing” and he had put up $7,500 for the pilot study (ibid.) on prayer and mortality. His impartiality was clearly compromised. So was the double-blind quality of the study.
Thus, there was quite a bit of significant and relevant evidence suppressed in the Sicher study that, had it been revealed, might have diminished its reputation as the best designed study ever on prayer and healing. Instead of being held up as a model of promising research in the field of spiritual science, this study might have ended up in the trash heap where it belongs.
- prayer entry in The Skeptic’s Dictionary
- A Prayer Before Dying by Po Bronson (Wired Dec. 2002)
- Abstract of the Sicher report in the Western Journal of Medicine
One of the traits of a cogent argument is that the evidence be sufficient to warrant accepting the conclusion. In causal arguments, this generally requires, among other things, that a finding of a significant correlation between two variables, such as magnets and pain, be reproducible. Replication of a significant correlation usually indicates that the finding was not a fluke or due to methodological error. Yet, I am often sent copies of articles regarding single studies and advised that it may be about time for me to change my mind on some subject. For example, I recently heard from Jouni Helminen that “It may be time to update the Skepdic website regarding magnet therapy on fibromyalgia patients.” Jouni referred me to an article from the University of Virginia News. I state in my entry on magnet therapy: “There is almost no scientific evidence supporting magnet therapy.” The article about a study done on magnet therapy to reduce fibromyalgia pain did nothing to change my mind. The study, conducted by University of Virginia (UV) researchers, was published in the Journal of Alternative and Complementary Medicine, which asserts that it “includes observational and analytical reports on treatments outside the realm of allopathic medicine….”
The only people who refer to conventional medicine as allopathic are rabid opponents of conventional medicine and may not be the most objective folks in the world when it comes to evaluating anything “alternative.” Be that as it may, the study must stand or fall on its own merits, not on the biases of those who publish it. Furthermore, the study must be distinguished from the press release put out by UV. The headline of the UV article reads “Magnet Therapy Shows Limited Potential for Pain Relief.” The first paragraph states that “the results of the study were inconclusive.” Not very promising. Even so, the researchers claimed that magnet therapy reduced fibromyalgia pain intensity enough in one group of study participants to be “clinically meaningful.” I guess “limited potential” is the middle ground between “inconclusive” and “clinically meaningful.” This is somewhat confusing.
The UV study involved 94 fibromyalgia patients who were randomly assigned to one of four groups. One control group “received sham pads containing magnets that had been demagnetized through heat processing” and the other got nothing special. One treatment group got “whole-body exposure to a low, uniformly static magnetic field of negative polarity. The other…[got]…a low static magnetic field that varied spatially and in polarity. The subjects were treated and tracked for six months.”
“Three measures of pain were used: functional status reported by study participants on a standardized fibromyalgia questionnaire used nationwide, number of tender points on the body, and pain intensity ratings.”
One of the investigators, Ann Gill Taylor, R.N., Ed.D., stated: “When we compared the groups, we did not find significant statistical differences in most of the outcome measures.” Taylor is a professor of nursing and director of the Center for Study of Complementary and Alternative Therapies at UV. “However, we did find a statistically significant difference in pain intensity reduction for one of the active magnet pad groups,” said Taylor. The article doesn’t mention how many outcome measures were used.
The study’s principal investigator was Dr. Alan P. Alfano, assistant professor of physical medicine and rehabilitation and medical director of the UV HealthSouth Rehabilitation Hospital. Alfano claimed that “Finding any positive results in the groups using the magnets was surprising, given how little we know about how magnets work to reduce pain.” Frankly, I find it surprising that Alfano finds that surprising, since it is unlikely he would have conducted the study if he didn’t think there might be some pain relief benefit to using magnets. His statement assumes they work to reduce pain and the task is to figure out how. Alfano is also quoted as saying that “The results tell us maybe this therapy works, and that maybe more research is justified. You can’t draw final conclusions from only one study.” Certainly, his last claim is correct. His double use of the weasel word “maybe” indicates that he realizes that you can’t even make a strong claim that more research ought to be done based on the results of one study, especially if the results aren’t that impressive.
Not knowing how many outcome measures the researchers used makes it difficult to assess the significance of finding one or two outcomes that look promising. Given all the variables that go into “pain” and measuring pain, and the variations in the individuals suffering pain (even those diagnosed as having the same disorder), it should be expected that if you measure enough outcomes you are going to find something statistically significant. Whether that’s meaningful or not is another issue. A competent researcher would not want to make any strong causal claims about magnets and pain on the basis of finding one or two statistically significant outcomes in a study that found that most outcomes showed nothing significant.
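The point about measuring enough outcomes can be made exact under one simplifying assumption: that the outcome measures are independent (real pain measures such as tender points and pain intensity are surely correlated, so this is only a rough guide). If a treatment does nothing and each outcome is tested at the conventional 0.05 level, the chance of at least one spurious "significant" result grows quickly with the number of outcomes. The Bonferroni correction, one standard remedy named here plainly, divides the threshold by the number of outcomes. The UV article doesn't say how many outcomes were measured, so the values of k below are hypothetical.

```python
def prob_any_false_positive(k, alpha=0.05):
    """Family-wise error rate: chance that at least one of k independent
    outcome measures is 'significant' at level alpha when there is no effect."""
    return 1 - (1 - alpha) ** k

def bonferroni_cutoff(k, alpha=0.05):
    """Per-outcome p-value threshold that keeps the chance of any false
    positive across k outcomes at or below alpha."""
    return alpha / k

for k in (1, 3, 10, 20):
    print(f"{k:>2} outcomes: P(at least one spurious hit) = "
          f"{prob_any_false_positive(k):.2f}, "
          f"Bonferroni cutoff = {bonferroni_cutoff(k):.4f}")
```

With 20 independent outcomes, a useless treatment has roughly a 64% chance of "showing" at least one significant result. That is why a single hit among many mostly negative outcomes says little.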
But even if most of the outcomes had been statistically significant in this study of 94 patients, that still would not amount to strong scientific evidence in support of magnet therapy. The experiment would need to be replicated. Given the variables mentioned above, it would not be surprising if this study were replicated but found different outcomes statistically significant. Several studies might find several different outcomes statistically significant and some researcher might then do a meta-study and claim that when one takes all the studies together one gets one large study with very significant results. What you would actually get is one misleading study.
If other researchers repeat the UV study, looking only at the outcome that was statistically significant in the original study, and they duplicate the results of the UV study, then we should conclude that this looks promising. But one replication shouldn’t seal the deal on the causal connection between magnets and pain relief. One lab might duplicate another lab’s results but both might be using faulty equipment manufactured by the same company. Or both might be using the same faulty subjective measures to evaluate their data. Several studies that showed nothing significant for magnets and pain might be followed by several that find significant results, even if all the studies are methodologically sound. Why? Because you are dealing with human beings, very complex organisms who won’t necessarily react the same way to the same treatment. Even the same person won’t necessarily react the same way to the same treatment at different times.
So, a single study on something like magnets and pain relief should rarely be taken by anybody as significant scientific evidence of a causal connection between the two. Likewise, a single study of this issue that finds nothing significant should not be taken as proof that magnets are useless. However, when dozens of studies find little support that magnets are effective in warding off pain, then it seems reasonable to conclude that there is no good reason to believe in magnet therapy. And I would not give up that belief on the basis of what I read in the UV press release about their little study on magnets and fibromyalgia.
The Univ. Virginia study sounds fairly typical of Alternative Medicine studies. Take a large number of indicators that the treatment could be effective. After the study, hunt around for one or two indicators that show effectiveness. In the “normal” curve, you can get a 2-standard-deviation effect by chance about 5% of the time. So if you have 20 indicators, you will likely find one indicator that “shows” that the treatment is effective. If you have 40 indicators, you will likely find two that indicate effectiveness. This procedure is called “data dredging,” and is a definite no-no. Instead, the proper scientific procedure is to use all the data, not just those that support your pet hypothesis. “Inconclusive” is often a euphemism for “didn’t work.”
John W. Farley, Professor of Physics
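Professor Farley's arithmetic can be checked with a small simulation, sketched here under stated assumptions: the treatment has no effect at all, each indicator's test statistic is drawn independently from a standard normal distribution, and an indicator "shows effectiveness" if it lands more than 2 standard deviations from zero in either direction. That two-sided cutoff captures about 4.55% of chance results (his "about 5%"), so the averages should come out near 0.9 hits per study with 20 indicators and near 1.8 with 40. The function name is hypothetical.

```python
import random

def mean_chance_hits(n_indicators, n_trials, seed=0):
    """Simulate many studies of a treatment that does nothing and return the
    average number of indicators per study that land beyond 2 standard
    deviations purely by chance (standard normal test statistics)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trials):
        total += sum(1 for _ in range(n_indicators)
                     if abs(rng.gauss(0.0, 1.0)) > 2.0)
    return total / n_trials

hits_20 = mean_chance_hits(20, 10_000)
hits_40 = mean_chance_hits(40, 10_000, seed=1)
print(f"20 indicators: {hits_20:.2f} chance 'effects' per study")
print(f"40 indicators: {hits_40:.2f} chance 'effects' per study")
```

Every one of those "effects" is noise by construction, which is exactly what a researcher who dredges the data after the fact has no way to distinguish from a real finding.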
straw man fallacy
One of the characteristics of a cogent refutation of an argument is that the argument one is refuting be represented fairly and accurately. To distort or misrepresent an argument one is trying to refute is called the straw man fallacy. It doesn’t matter whether the misrepresentation or distortion is accidental and due to misunderstanding the argument or is intentional and aimed at making it easier to refute. Either way, one commits the straw man fallacy.
In other words, the attacker of a straw man argument is refuting a position of his own creation, not the position of someone else. The refutation may appear to be a good one to someone unfamiliar with the original argument.
To understand the example of the straw man fallacy I will present here, I suggest you first read my entry on the unconscious mind and identify what my arguments and positions are in that essay. The straw man I am going to present was created by Karl Tyler of England in a review of The Skeptic’s Dictionary posted on Amazon.com.
In a world where the flow of information that daily assails us has turned into a veritable tidal wave, the process of debunking myths, snake oil salesmen and the like not only makes fun reading, it also provides a valuable service – BUT ONLY when it is done well.
In this case, the book unfortunately tells us little more than what groups/ideas have earned the author’s vitriolic displeasure. What we DON’T find out is what has really shaped the contents of the book, and this despite the fact that there is solid evidence that in numerous instances the views and claims are based on indirectly obtained, and often wildly off beam, information rather than on solid investigation.
In short, we are offered prejudices posing as objectivity.
Take the rejection of “the unconscious mind”, for example.
The book provides a lengthy, even tedious, pseudo-scientific discussion of how “science” has failed to demonstrate the existence of the “unconscious” mind as described by Freud, Jung and Tart, and then leaps to the unsupported conclusion that therefore there is no such thing as the unconscious mind (emphasis added).
I must admit to Karl and the world that leaping to conclusions is one of my favorite exercises, but I have neither leapt nor crept to the belief that there is no such thing as the unconscious mind. It is not quite accurate to state that I reject Freud’s notion, Jung’s notion, and Tart’s notion because science has failed to demonstrate the existence of any of their notions. It would be more accurate to say that I reject Freud’s notion because the empirical evidence regarding trauma and memory mostly contradicts it. I reject Jung’s and Tart’s notion that the subconscious mind is a reservoir of transcendent truths not because it is metaphysical and thus false, but because I don’t find it useful or convincing as an explanation for anything it supposedly explains. Mr. Tyler might well have criticized me for misleading the reader by claiming that there is no scientific evidence for this metaphysical position. Of course there isn’t. There couldn’t be. Metaphysical claims by their very nature can’t be supported by scientific evidence.
Mr. Tyler states my position as being there is no such thing as the unconscious mind. Yet, in the third paragraph, I say “It would be absurd to reject the notion of the unconscious mind simply because we reject the Freudian notion of the unconscious as a reservoir of repressed memories of traumatic experiences….it seems obvious that much, if not most, of one’s brain’s activity occurs without our awareness or consciousness. Consciousness or self-awareness is obviously the proverbial tip of the iceberg.”
I also state that “there is ample scientific data to establish as a fact that some conscious perception goes on without self-consciousness.” I present four examples to support this point: blindness denial, jargon aphasia, blindsight, and oral/verbal dissociation. Mr. Tyler does not address these claims at all, though they clearly imply a belief in the unconscious, although quite a different kind of unconscious than Freud or Jung envisioned. I refer to this aspect of unconscious processing as “lost memory,” “fragmented memory,” or “implicit memory” and cite the work of Schacter and Tulving, who came up with the latter term. Mr. Tyler, continuing with more straw man argumentation, says that “we clearly have a capacity for mental processing which is something rather more sophisticated than just “lost memory”, as this author suggests.” In other words, he suggests that I have not only rejected the unconscious mind, but I’ve rejected the conscious mind as well! Of course we have a capacity for acts of mental processing beyond those associated with the aforementioned perceptions without self-consciousness.
Mr. Tyler apparently thinks that by proving I reject belief in the unconscious mind (which I don’t) and in the conscious mind beyond those involving conscious perception without awareness (which I don’t), he has shown I am wrong. It doesn’t take much evidence to support his claim that I am wrong since he is refuting a rather moronic position that he has misrepresented as mine. His premise consists of the statement: “you really don’t need to be a rocket scientist to recognise the validity of the notion of an “unconscious”, or “out of conscious” mind.” In other words, the straw man is so obviously false that no refutation is even needed.
I can only guess at why Mr. Tyler misrepresents my position. There really isn’t enough said that is clear and specific to figure out what motivates him. He says things like “the book unfortunately tells us little more than what groups/ideas have earned the author’s vitriolic displeasure.” But this just tells us that I write mostly about ideas I dislike (which is another misrepresentation). He writes that “there is solid evidence that in numerous instances the [author’s] views and claims are based on indirectly obtained, and often wildly off beam, information rather than on solid investigation. In short, we are offered prejudices posing as objectivity.” Unfortunately, Mr. Tyler’s evidence for my “indirectly obtained” views (whatever that might mean) and my “prejudices posing as objectivity” is his claim that I deny the existence of the unconscious mind (which I don’t).
Tyler claims that my book “provides a lengthy, even tedious, pseudo-scientific discussion of how “science” has failed to demonstrate the existence of the “unconscious” mind as described by Freud, Jung and Tart.” He is certainly entitled to claim that the work of Daniel Schacter is pseudoscientific, but he ought at least to try to explain what he means by pseudoscientific and why he considers pseudoscientific what everybody else in the psychological community considers scientific.
Tyler says that my “argument presupposes that nothing is “true” until it has been scientifically validated. Which is a bit like arguing that Australia didn’t exist until the first white explorers discovered it.” The analogy is a distraction. I do presuppose that no empirical claim is true until it has been scientifically validated. That’s why I reject Freud’s claims about the unconscious as a reservoir of repressed memories that cause behavioral and mental disorders. The empirical evidence doesn’t support his claim.
I don’t really discuss why I reject the notion of the unconscious mind as a reservoir of transcendent truths in the entry on the unconscious mind. One has to read my entries on Jung and Tart for that argument. Tyler might find it interesting to remember that Jung also rejected Freud’s notion of the unconscious mind. True, Jung didn’t reject it for lack of scientific evidence, since he seemed to be more interested in intuition and anecdotes than in scientific studies. But he rejected it nonetheless.
In my entries on Jung and Tart, I think the careful reader will find that I don’t reject their metaphysical theories of the unconscious because they’re metaphysical and therefore false. I reject them because they’re fuzzy and I don’t find them very useful. They aren’t very clear and they aren’t needed to explain anything. So again, Mr. Tyler has created a straw man. I leave it to the reader to figure out how this straw man is like arguing that Australia didn’t exist until the first white explorers discovered it.
Tyler claims that my book “studiously ignores the fact that “scientific” knowledge is itself a highly moveable feast – what seemed to be proven/disproven yesterday may well turn out to be disproven/proven tomorrow.” More of the straw man here, though I must grant him that “studiously ignores” is an admirable turn of phrase.
Perhaps the key to what motivated Mr. Tyler to misrepresent my arguments and positions can be found in his curious reference to a professor and an experiment.
Fact: numerous experiments carried by Prof. Robert “Pygmalion in the Classroom” Rosenthal have shown that students can accurately predict a teacher’s perceived effectiveness (as rated at the end of a complete semester) on the basis of just three 2 (TWO) second video clips.
So what process do they use to make that evaluation?
How can they be so accurate?
What yardstick(s) are they using to make the evaluation?
We have no idea, because the processing takes place OUTSIDE of the conscious mind.
If I understand Mr. Tyler correctly, he is saying that Rosenthal has solid evidence that a significant percentage of students can accurately predict on the basis of just a two-second video clip what rating the teacher will get from the students at the end of the term. I would have to agree with Tyler that such processing is unconscious and that it has nothing to do with implicit memory, despite Mr. Tyler’s claim that this must be my position (another straw man distortion). [As an aside, while I am not familiar with this particular study, other studies done by Rosenthal have demonstrated the powerful effect of first-impressions on subsequent judgments. My guess is that he would explain the accuracy of the student evaluations as a result of the snap judgments they made. That is, the students prime themselves to find a teacher effective or not by their initial judgments of what the teacher’s going to be like. In short, the students’ snap judgments are self-fulfilling prophecies.]
Tyler concludes with a couple of general, disparaging comments about my book:
This is the sort of book that greatly appeals to dilettante cynics, offering broad grounds for scepticism regarding numerous topics by way of a host of half-baked “facts” which the reader isn’t expected to check out for him/herself.
He doesn’t mention any other specific “half-baked” facts. I suspect this is because his reading of other facts in the book is analogous to his reading of the unconscious mind entry. If anyone did bother to check his claims against what I actually say, Mr. Tyler would be revealed for what he is: one who misrepresents another’s positions and arguments, which he then proceeds to knock over with slam-dunk refutations.
His final comment is most telling:
One measure of a truly useful critique is that BOTH sides (or ALL sides) of the story are presented and compared so that the listener/reader can reach their own conclusions. But don’t worry – you’ll find nothing that open or constructive in this volume.
More straw man. If Tyler will read the introduction to my book he will find that I specifically advise the reader that The Skeptic’s Dictionary is not a “critique” or critical evaluation of all sides of the issues presented. Mine is a book for skeptics, aimed at providing skeptical arguments and references to the best skeptical literature.
As far as I know, I have never corresponded with Mr. Tyler, so I have no idea why he would so misrepresent my book in order to post his “review.” I put review in quotes because he hasn’t reviewed my book. He hasn’t even reviewed the single entry he focuses on. He’s critiqued positions I don’t hold and goals I don’t have. Why? I think the most charitable explanation is that he hasn’t read the whole book and what he has read, he hasn’t read very carefully.
But Mr. Tyler’s review does serve one good purpose: It is an excellent example of the straw man fallacy.
control group studies
Dr. Alan Hirsch claims to be “The World Expert In Smell & Taste.” He is an M.D. (a psychiatrist, in fact) who has developed some magical crystals that will “help you reduce your appetite and food cravings.” You can read all about his crystals, which he calls SprinkleThin, on his website. On July 25, 2005, I found the following testimonial on that website.
Dateline NBC Investigates SprinkleThin
What Dr. Hirsch discovered might surprise you. [Certain smells] seem to control appetite. Dr. Hirsch studied 2,700 people over six months, like the six people we met. They tried just about every diet imaginable. Dr. Hirsch brought along with him these special, non-caloric, scented crystals and asked the six to sprinkle it on their food.
All the participants kept a video diary for Dateline to prove they were using the product. At the end of three months when we checked in on them, they were all losing weight.
What is wrong with Dateline’s investigation? Among other things, Dateline did not have a control group. Dr. Hirsch says he has been studying eating behavior and weight loss for 25 years. He says he has done many studies, but if his studies were like Dateline’s study they are not of much scientific value.
What is a control group and why is having one important?
A well-designed study on the diet crystals would use two groups. The group getting the crystals would be called the experimental group. The control group would be a group that, ideally, is identical to the experimental group except that its members do not use the crystals. The ideal can never be fully achieved with humans, especially in a study of weight loss, because (a) weight loss is affected by many factors (motivation; eating behavior; amount of activity, especially exercise; overall health; metabolism; stress; and so on) and (b) experimenters can’t lock humans in cages to make sure they do what they’re supposed to do for the study. But, at the very least, a well-designed scientific study should use a control group and try to match its members to those in the experimental group on factors that might have a significant effect on the outcome. For example, if you were testing whether prayer has an effect on the longevity of patients dying of AIDS, you should make sure that the ages of the subjects in both groups match up. It would not be a fair study to have 60-year-olds in one group and twenty-somethings in the other.
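The text does not say how the matching should be done. One common approach, sketched here as an assumption rather than a prescribed method, is a matched-pair design: sort the subjects by the matching variable (age, in the AIDS example), pair up neighbors, and flip a coin within each pair. The function name `matched_pair_assignment`, its parameters, and the sample ages are hypothetical.

```python
import random


def matched_pair_assignment(subjects, match_on, seed=None):
    """Pair subjects with similar values of `match_on`, then randomly split each pair.

    `subjects` is a list of dicts and `match_on` names the matching variable,
    e.g. "age". Returns (experimental, control). With an odd number of
    subjects, the one left over after pairing is not assigned.
    """
    rng = random.Random(seed)
    ordered = sorted(subjects, key=lambda s: s[match_on])  # neighbors have similar ages
    experimental, control = [], []
    for i in range(0, len(ordered) - 1, 2):
        pair = [ordered[i], ordered[i + 1]]
        rng.shuffle(pair)  # coin flip: who in this pair gets the treatment
        experimental.append(pair[0])
        control.append(pair[1])
    return experimental, control


people = [{"id": n, "age": a} for n, a in
          [("A", 24), ("B", 61), ("C", 26), ("D", 59), ("E", 40), ("F", 42)]]
exp_group, ctl_group = matched_pair_assignment(people, "age", seed=1)
print(sorted(p["age"] for p in exp_group), sorted(p["age"] for p in ctl_group))
```

Whatever the coin flips, each group ends up with one person from each age pair, so neither group is stacked with 60-year-olds.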
Having a control group allows the scientist to test a causal hypothesis. In this case, the hypothesis is that SprinkleThin is a significant causal factor in producing weight loss.
Without a control group, a scientist can’t be sure that the diet crystals contributed significantly to the weight loss or, if they did, in what way. The placebo effect may be at work here: dieters may believe these crystals really affect their sense of taste and smell to such a degree that their appetites are suppressed. They may be deceiving themselves, but the crystals help them anyway. However, powdered beetle dung might have had the same effect. The diet scientist doesn’t just want to help people lose weight. If a product works, she wants to know why it works.
Dateline (and Dr. Hirsch) should not just give the crystals to dieters and observe whether they lose weight. They should have a group of similar people who want to lose weight and give them a placebo: a substance that looks like the diet crystals and is ingested in exactly the same way, but is inert. They should decide in advance to study the two groups for a set length of time, long enough for any diet to show results (several weeks, at least). At the end of the study they would compare the weight loss of the two groups. If the experimental group shows a significantly greater weight loss than the control group, then the scientists have good evidence that the crystals might be effective. There are various reasons why the results of a single study should not be taken as proof of one’s causal hypothesis. We’ll return to this issue later.
Having a control group is necessary, but not sufficient, for a well-designed control group study. The study must also use an adequate number of participants. Six people would not be adequate for a control group study; several hundred would be a better number. Why? With only six people, it takes just one participant who does really well to raise the group’s average significantly above the other group’s. But this one person’s success might be a fluke. By using a larger sample, the researcher reduces the chance that a few fluky individuals have skewed the results.
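The arithmetic behind the "one fluke" problem is simple: if everyone else in a group of n loses the same amount, a single outlier moves the group average by (outlier loss - typical loss) / n. The sketch below uses made-up figures in pounds; the function name is hypothetical.

```python
def mean_shift_from_one_outlier(group_size, typical_loss, outlier_loss):
    """How far one unusually successful participant moves the group average.

    Everyone else loses `typical_loss`; one person loses `outlier_loss`.
    """
    losses = [typical_loss] * (group_size - 1) + [outlier_loss]
    return sum(losses) / len(losses) - typical_loss


# Hypothetical figures in pounds: most dieters lose 5, one fluke loses 35.
for n in (6, 300):
    shift = mean_shift_from_one_outlier(n, typical_loss=5, outlier_loss=35)
    print(n, round(shift, 2))
# 6 5.0
# 300 0.1
```

With six participants the one fluke doubles the group's apparent weight loss; with three hundred it barely registers.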
Another way to reduce the chances of fluky results is to randomly assign subjects to the control and experimental groups. Randomization is very important to reduce the chances of biasing the samples. If highly motivated folks are placed in the diet crystal group and a bunch of lazy couch potatoes are in the control group, the results of the study would be biased. It is important that a method of true randomization be used, such as a random number table. You might think that assigning all the dark-haired subjects to one group and the light-haired subjects to the other would be sufficient to avoid having biased groups, but you cannot be sure that there is not something about hair color that is related to a person’s weight. It is unlikely, but a scientist should not go with hunches in matters such as randomization.
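A computer can play the role of the random number table. This is a minimal sketch of complete randomization into two equal groups, assuming Python's standard `random` module is an acceptable stand-in; it is pseudo-random, and `random.SystemRandom` could be substituted where unpredictability matters. The function name and subject IDs are hypothetical.

```python
import random


def randomize(subject_ids, seed=None):
    """Complete randomization into two equal-sized groups.

    A seeded pseudo-random generator stands in for the random number table;
    recording the seed lets someone audit the allocation later.
    """
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)  # every ordering equally likely; hair color plays no part
    half = len(shuffled) // 2
    return {"experimental": shuffled[:half], "control": shuffled[half:]}


groups = randomize(range(1, 101), seed=2005)
print(len(groups["experimental"]), len(groups["control"]))  # 50 50
```

Because the shuffle ignores every property of the subjects, motivation, hair color, and anything else unmeasured should, on average, be spread evenly across both groups.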
It is also important that the subjects in this study not know whether they have been given the magic crystals or the placebo. There is much controversy regarding the ethics of deceiving subjects, but from a scientific point of view it might be better if the subjects didn’t even know that the study is about weight loss. If they think, for example, that the study is testing the effectiveness of a new blood pressure medicine, you would eliminate such things as motivation to lose weight or belief that the crystals are appetite suppressants as possible causes of any weight loss achieved. However, many, if not most, scientists argue that it is unethical to deceive participants in scientific studies. The subjects in a study don’t need to be told which group they are in, but they should be told that they have been randomly assigned to their group and that at the end of the study they will be told which group they were in.
In many studies, not only should the subjects be blind to which group they are in for the duration of the study, but the experimenters should also be blind to which group each subject has been assigned to. Double-blind studies require at least two experimenters: one who assigns the subjects to their groups and one who keeps track of the data. Had Dr. Hirsch done a double-blind study, an assistant might have randomly assigned the subjects to their groups and kept a record of who was in which group. Dr. Hirsch or another assistant might have weighed all the subjects and kept weight records for each participant. After all the data had been collected, Dr. Hirsch would “unblind” the study and the data for the two groups would be compared.
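The division of labor described above can be sketched in code. This is an illustration only, assuming one function plays the assigning assistant and another the unblinding step; the names `blind_allocation` and `unblind`, the code format, and the recorded values are all hypothetical. The key idea is that the person weighing subjects sees only opaque codes.

```python
import random


def blind_allocation(subject_ids, seed=None):
    """Assistant's step: randomize, then hide group membership behind codes.

    Returns (codes, key). `codes` maps subject -> opaque code and is all the
    person doing the weighing ever sees; `key` maps code -> group and stays
    sealed until every weight has been recorded.
    """
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    # Unique random numbers, so a code reveals nothing about the group.
    labels = rng.sample(range(100_000, 1_000_000), len(shuffled))
    codes, key = {}, {}
    for position, (subject, label) in enumerate(zip(shuffled, labels)):
        code = f"S{label}"
        codes[subject] = code
        key[code] = "experimental" if position < half else "control"
    return codes, key


def unblind(change_by_code, key):
    """After data collection: sort the recorded weight changes into groups."""
    groups = {"experimental": [], "control": []}
    for code, change in change_by_code.items():
        groups[key[code]].append(change)
    return groups


codes, key = blind_allocation(["ann", "bob", "cy", "di"], seed=3)
recorded = {codes[s]: -1.5 for s in codes}  # weigher's log, by code only (made-up values)
print(unblind(recorded, key))
```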
The final step in a well-designed study is the analysis of the data. You might think that the scientists should be able to look at the results and see right away whether the crystals did any good. This would only be true if, say, there were hundreds in each group and the experimental group lost 50 pounds each on average, while the control group gained 2 pounds. If the study had been designed properly, such results would be extremely unlikely to be a fluke. But what if the experimental group lost 2% more weight than the control group? Would that be significant? To answer that question, scientists turn to statistical formulae. By some formula, a 2% difference might be statistically significant. If, however, that 2% meant 4 ounces over six weeks, most of us would say that even if it is statistically significant, it is not important and not worth the money or the risk of using these crystals. The crystals might have some wicked side effect that hasn’t yet been discovered.
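The text does not name a particular formula. As one example, here is a sketch of a two-sided permutation test, which asks how often chance alone would produce a difference as large as the one observed. It then checks practical importance separately against a threshold. The threshold, function names, and weight figures are hypothetical and for illustration only.

```python
import random


def average(xs):
    return sum(xs) / len(xs)


def permutation_test(experimental, control, n_shuffles=10_000, seed=0):
    """Two-sided permutation test on the difference in mean weight loss.

    If the crystals did nothing, the group labels would be arbitrary, so we
    reshuffle them many times and count how often chance alone produces a
    difference at least as large as the one observed.
    """
    rng = random.Random(seed)
    observed = average(experimental) - average(control)
    pooled = list(experimental) + list(control)
    n = len(experimental)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = average(pooled[:n]) - average(pooled[n:])
        if abs(diff) >= abs(observed) - 1e-12:  # tolerance for float rounding
            extreme += 1
    p_value = (extreme + 1) / (n_shuffles + 1)
    return observed, p_value


MIN_WORTHWHILE_LOSS = 2.0  # pounds; a hypothetical threshold for "worth paying for"

# Made-up weight losses in pounds, for illustration only.
crystals = [10, 11, 12, 13, 14, 15]
placebo = [0, 1, 2, 3, 4, 5]
diff, p = permutation_test(crystals, placebo)
print(f"difference = {diff:.1f} lb, p = {p:.4f}")
print("statistically significant:", p < 0.05)
print("practically important:", diff >= MIN_WORTHWHILE_LOSS)
```

Keeping the two checks separate mirrors the point above: a tiny difference can pass the first test and still fail the second.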
The moral of this story is that while testimonials of six people who use crystals and lose weight might have a powerful effect on a television audience, a critical thinker should recognize that without a well-designed control group study, such testimonials do not have much scientific value.
A critical thinker also knows that information should be put in the proper context, which requires a certain amount of background knowledge. For example, you should know that many well-designed scientific studies get significant results that cannot be replicated at all, or cannot be replicated consistently. If there is a causal relationship between diet crystals and losing weight, it should show up not sporadically but consistently, unless, of course, so many factors affect body weight that it is nearly impossible to isolate the true effectiveness of a single item. A single study, no matter how well designed or how significant the results, rarely justifies drawing strong conclusions about causal relationships.
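One reason a single significant result can fail to replicate is that, by design, a significance cutoff of 0.05 lets roughly one study in twenty "succeed" even when there is no effect at all. The simulation below is a sketch under stated assumptions: both groups are drawn from the same normal distribution with a known standard deviation of 1, and each study uses a simple two-sample z-test. The function name and parameters are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def null_study_is_significant(rng, n=50, alpha=0.05):
    """Simulate one study in which the crystals truly do nothing.

    Both groups' weight changes are drawn from the same distribution
    (standard deviation 1, assumed known), so any 'significant' result is a
    fluke. The test is a two-sample z-test on the difference in means.
    """
    crystals = [rng.gauss(0, 1) for _ in range(n)]
    placebo = [rng.gauss(0, 1) for _ in range(n)]
    z = (fmean(crystals) - fmean(placebo)) / (2 / n) ** 0.5
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return p < alpha


rng = random.Random(42)
trials = 2000
false_positives = sum(null_study_is_significant(rng) for _ in range(trials))
print(false_positives / trials)  # should land near alpha = 0.05
```

A lone significant study could be one of those flukes; consistent results across independent studies are much harder to explain away.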
Finally, as mentioned above, there might be some deleterious side effect of these crystals that has not yet been discovered. SprinkleThin might help you lose weight but if it kills you in the process, what have you gained?
The kind of control group study described above is known as a parallel group study. However, as Dr. Gerard Dallal writes: “It takes little experience with parallel group studies to recognize the potential for great gains in efficiency if each subject could receive both treatments. The comparison of treatments would no longer be contaminated by the variability between subjects since the comparison is carried out within each individual.” Such studies are known as crossover studies. They are highly recommended.
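The efficiency gain Dallal describes comes from comparing each subject with himself or herself. The sketch below uses simulated, made-up data: people differ widely in how much weight they lose anyway, and the crystals add a hypothetical 1 pound. It ignores period and carry-over effects, which a real crossover study must handle, for example with a washout period between treatments and a randomized order of treatments. The function name is hypothetical.

```python
import random
from statistics import fmean, stdev


def crossover_effect(on_crystals, on_placebo):
    """Within-subject comparison: each person serves as his or her own control.

    Returns the mean of the per-person differences and their spread.
    """
    if len(on_crystals) != len(on_placebo):
        raise ValueError("each subject needs a result under both treatments")
    diffs = [c - p for c, p in zip(on_crystals, on_placebo)]
    return fmean(diffs), stdev(diffs)


# Simulated (made-up) data: people differ a lot in how much they lose anyway
# (spread 5 lb), the crystals add a hypothetical 1 lb, plus small noise.
rng = random.Random(0)
baseline = [rng.gauss(6, 5) for _ in range(20)]
on_placebo = [b + rng.gauss(0, 0.5) for b in baseline]
on_crystals = [b + 1 + rng.gauss(0, 0.5) for b in baseline]

effect, spread = crossover_effect(on_crystals, on_placebo)
print(f"between-subject spread: {stdev(on_placebo):.2f} lb")
print(f"spread of within-subject differences: {spread:.2f} lb")
print(f"estimated effect: {effect:.2f} lb")
```

Because each person's own baseline cancels out of the difference, the large between-subject variability no longer swamps the small treatment effect.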
How Thinking Goes Wrong
Twenty-five Fallacies That Lead Us to Believe Weird Things
by Michael Shermer
from his 1997 book “Why People Believe Weird Things”
(used by kind permission of the author; all rights reserved)
In 1994 NBC began airing a New Age program called The Other Side that explored claims of the paranormal, various mysteries and miracles, and assorted “weird” things. I appeared numerous times as the token skeptic — the “other side” of The Other Side, if you will. On most talk shows, a “balanced” program is a half-dozen to a dozen believers and one lone skeptic as the voice of reason or opposition. The Other Side was no different, even though the executive producer, many of the program producers, and even the host were skeptical of most of the beliefs they were covering. I did one program on werewolves for which they flew in a fellow from England. He actually looked a little like what you see in werewolf movies — big bushy sideburns and rather pointy ears — but when I talked to him, I found that he did not actually remember becoming a werewolf. He recalled the experience under hypnosis. In my opinion, his was a case of false memory, either planted by the hypnotist or fantasized by the man.
Another program was on astrology. The producers brought in a serious, professional astrologer from India who explained how it worked using charts and maps with all the jargon. But, because he was so serious, they ended up featuring a Hollywood astrologer who made all sorts of predictions about the lives of movie stars. He also did some readings for members of the audience. One young lady was told that she was having problems staying in long-term relationships with men. During the break, she told me that she was fourteen years old and was there with her high-school class to see how television programs were produced.
In my opinion, most believers in miracles, monsters, and mysteries are not hoaxers, flimflam artists, or lunatics. Most are normal people whose normal thinking has gone wrong in some way. I would like to … [look] at twenty-five fallacies of thinking that can lead anyone to believe weird things. I have grouped them in four categories, listing specific fallacies and problems in each. But as an affirmation that thinking can go right, I begin with what I call Hume’s Maxim and close with what I call Spinoza’s Dictum.
Skeptics owe a lot to the Scottish philosopher David Hume (1711-1776), whose An Enquiry Concerning Human Understanding is a classic in skeptical analysis. The work was first published anonymously in London in 1739 as A Treatise of Human Nature. In Hume’s words, it “fell dead-born from the press, without reaching such distinction as even to excite a murmur among the zealots.” Hume blamed his own writing style and reworked the manuscript into An Abstract of a Treatise of Human Nature, published in 1740, and then into Philosophical Essays Concerning the Human Understanding, published in 1748. The work still garnered no recognition, so in 1758 he brought out the final version, under the title An Enquiry Concerning Human Understanding, which today we regard as his greatest philosophical work.
Hume distinguished between “antecedent skepticism,” such as René Descartes’ method of doubting everything that has no “antecedent” infallible criterion for belief; and “consequent skepticism,” the method Hume employed, which recognizes the “consequences” of our fallible senses but corrects them through reason: “A wise man proportions his belief to the evidence.” Better words could not be found for a skeptical motto.
Even more important is Hume’s foolproof, when-all-else-fails analysis of miraculous claims. For when one is confronted by a true believer whose apparently supernatural or paranormal claim has no immediately apparent natural explanation, Hume provides an argument that he thought so important that he placed his own words in quotes and called them a maxim:
The plain consequence is (and it is a general maxim worthy of our attention), “That no testimony is sufficient to establish a miracle, unless the testimony be of such a kind, that its falsehood would be more miraculous than the fact which it endeavors to establish.”
When anyone tells me that he saw a dead man restored to life, I immediately consider with myself whether it be more probable, that this person should either deceive or be deceived, or that the fact, which he relates, should really have happened. I weigh the one miracle against the other, and according to the superiority, which I discover, I pronounce my decision, and always reject the greater miracle. If the falsehood of his testimony would be more miraculous than the event which he relates; then, and not till then, can he pretend to command my belief or opinion. (1952, p. 491)
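Hume's weighing of "the one miracle against the other" is often restated in modern probability terms. The following is one such reconstruction, an interpretation rather than Hume's own notation: let $M$ be the event that the miracle occurred and $T$ the event that the witness testifies to it.

```latex
% Bayes' theorem in odds form
\frac{P(M \mid T)}{P(\neg M \mid T)}
  = \frac{P(T \mid M)}{P(T \mid \neg M)} \cdot \frac{P(M)}{P(\neg M)}

% Believe the report only if the posterior odds exceed 1, i.e.
P(T \mid M)\,P(M) \;>\; P(T \mid \neg M)\,P(\neg M)
```

The right-hand side is the probability that the testimony is given although it is false (the witness "should either deceive or be deceived"). Because $P(M)$ is tiny for a miracle, the inequality can hold only if false testimony is even less probable than the miracle itself: its falsehood would have to be "more miraculous than the fact which it endeavors to establish."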
Problems in Scientific Thinking
1. Theory Influences Observations
About the human quest to understand the physical world, physicist and Nobel laureate Werner Heisenberg concluded, “What we observe is not nature itself but nature exposed to our method of questioning.” In quantum mechanics, this notion has been formalized as the “Copenhagen interpretation” of quantum action: “a probability function does not prescribe a certain event but describes a continuum of possible events until a measurement interferes with the isolation of the system and a single event is actualized” (in Weaver 1987, p. 412). The Copenhagen interpretation eliminates the one-to-one correlation between theory and reality. The theory in part constructs the reality. Reality exists independent of the observer, of course, but our perceptions of reality are influenced by the theories framing our examination of it. Thus, philosophers call science theory-laden.
That theory shapes perceptions of reality is true not only for quantum physics but also for all observations of the world. When Columbus arrived in the New World, he had a theory that he was in Asia and proceeded to perceive the New World as such. Cinnamon was a valuable Asian spice, and the first New World shrub that smelled like cinnamon was declared to be it. When he encountered the aromatic gumbo-limbo tree of the West Indies, Columbus concluded it was an Asian species similar to the mastic tree of the Mediterranean. A New World nut was matched with Marco Polo’s description of a coconut. Columbus’s surgeon even declared, based on some Caribbean roots his men uncovered, that he had found Chinese rhubarb. A theory of Asia produced observations of Asia, even though Columbus was half a world away. Such is the power of theory.
2. The Observer Changes the Observed
Physicist John Archibald Wheeler noted, “Even to observe so minuscule an object as an electron, [a physicist] must shatter the glass. He must reach in. He must install his chosen measuring equipment. Moreover, the measurement changes the state of the electron. The universe will never afterward be the same” (in Weaver 1987, p. 427). In other words, the act of studying an event can change it. Social scientists often encounter this phenomenon. Anthropologists know that when they study a tribe, the behavior of the members may be altered by the fact that they are being observed by an outsider. Subjects in a psychology experiment may alter their behavior if they know what experimental hypotheses are being tested. This is why psychologists use blind and double-blind controls. Lack of such controls is often found in tests of paranormal powers and is one of the classic ways that thinking goes wrong in the pseudosciences. Science tries to minimize and acknowledge the effects of the observation on the behavior of the observed; pseudoscience does not.
3. Equipment Constructs Results
The equipment used in an experiment often determines the results. The size of our telescopes, for example, has shaped and reshaped our theories about the size of the universe. In the twentieth century, Edwin Hubble’s 60- and 100-inch telescopes on Mt. Wilson in southern California for the first time provided enough seeing power for astronomers to distinguish individual stars in other galaxies, thus proving that those fuzzy objects called nebulas that we thought were in our own galaxy were actually separate galaxies. In the nineteenth century, craniometry defined intelligence as brain size and instruments were designed that measured it as such; today intelligence is defined by facility with certain developmental tasks and is measured by another instrument, the IQ test. Sir Arthur Stanley Eddington illustrated the problem with this clever analogy:
Let us suppose that an ichthyologist is exploring the life of the ocean. He casts a net into the water and brings up a fishy assortment. Surveying his catch, he proceeds in the usual manner of a scientist to systematize what it reveals. He arrives at two generalizations:
(1) No sea-creature is less than two inches long.
(2) All sea-creatures have gills.
In applying this analogy, the catch stands for the body of knowledge which constitutes physical science, and the net for the sensory and intellectual equipment which we use in obtaining it. The casting of the net corresponds to observations.
An onlooker may object that the first generalization is wrong. “There are plenty of sea-creatures under two inches long, only your net is not adapted to catch them.” The ichthyologist dismisses this objection contemptuously. “Anything uncatchable by my net is ipso facto outside the scope of ichthyological knowledge, and is not part of the kingdom of fishes which has been defined as the theme of ichthyological knowledge. In short, what my net can’t catch isn’t fish.” (1958, p. 16)
Likewise, what my telescope can’t see isn’t there, and what my test can’t measure isn’t intelligence. Obviously, galaxies and intelligence exist, but how we measure and understand them is highly influenced by our equipment.
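Eddington's net is a selection effect, and it can be shown in a few lines. This is a toy sketch: the "ocean" is a hypothetical list of creature lengths in inches, and `cast_net` is a made-up function that keeps only what the mesh can hold.

```python
def cast_net(creature_lengths, mesh_inches=2.0):
    """Keep only the creatures too big to slip through the mesh."""
    return [length for length in creature_lengths if length >= mesh_inches]


# Hypothetical ocean, lengths in inches.
ocean = [0.5, 1.2, 3.0, 8.5, 15.0, 1.9, 24.0]
catch = cast_net(ocean)

print(min(ocean))  # 0.5
print(min(catch))  # 3.0
```

Every generalization drawn from `catch` ("no sea-creature is less than two inches long") is true of the catch and false of the ocean; the instrument, not the world, produced it.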
Problems in Pseudoscientific Thinking
4. Anecdotes Do Not Make a Science
Anecdotes (stories recounted in support of a claim) do not make a science. Without corroborative evidence from other sources, or physical proof of some sort, ten anecdotes are no better than one, and a hundred anecdotes are no better than ten. Anecdotes are told by fallible human storytellers. Farmer Bob in Puckerbrush, Kansas, may be an honest, church-going family man not obviously subject to delusions, but we need physical evidence of an alien spacecraft or alien bodies, not just a story about landings and abductions at 3:00 A.M. on a deserted country road. Likewise with many medical claims. Stories about how your Aunt Mary’s cancer was cured by watching Marx Brothers movies or taking a liver extract from castrated chickens are meaningless. The cancer might have gone into remission on its own, which some cancers do; or it might have been misdiagnosed; or, or, or. What we need are controlled experiments, not anecdotes. We need 100 subjects with cancer, all properly diagnosed and matched. Then we need 25 of the subjects to watch Marx Brothers movies, 25 to watch Alfred Hitchcock movies, 25 to watch the news, and 25 to watch nothing. Then we need to subtract the average rate of remission for this type of cancer and analyze the data for statistically significant differences between the groups. If there are statistically significant differences, we had better get confirmation from other scientists who have conducted their own experiments, independent of ours, before we hold a press conference to announce the cure for cancer.
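The experiment Shermer describes can be sketched as two steps: randomly split 100 patients into four groups of 25, then compare each group's remission rate with the usual rate for this cancer. The function names, patient IDs, and baseline rate here are hypothetical, and the excess rates would still need a significance test before anyone calls a press conference.

```python
import random


def assign_groups(patient_ids, conditions, seed=None):
    """Randomly split patients into equal-sized groups, one per condition."""
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(conditions)
    return {c: shuffled[i * size:(i + 1) * size] for i, c in enumerate(conditions)}


def excess_remission(remitted, groups, baseline_rate):
    """Each group's remission rate minus the usual rate for this cancer."""
    return {
        condition: sum(p in remitted for p in members) / len(members) - baseline_rate
        for condition, members in groups.items()
    }


conditions = ["Marx Brothers", "Hitchcock", "news", "nothing"]
groups = assign_groups(range(100), conditions, seed=1)
print({c: len(m) for c, m in groups.items()})  # 25 per group
```

After follow-up, `excess_remission` would be called with the set of patient IDs that went into remission and the background rate from the medical literature.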
5. Scientific Language Does Not Make a Science
Dressing up a belief system in the trappings of science by using scientific language and jargon, as in “creation-science,” means nothing without evidence, experimental testing, and corroboration. Because science has such a powerful mystique in our society, those who wish to gain respectability but do not have evidence try to do an end run around the missing evidence by looking and sounding “scientific.” Here is a classic example from a New Age column in the Santa Monica News: “This planet has been slumbering for eons and with the inception of higher energy frequencies is about to awaken in terms of consciousness and spirituality. Masters of limitation and masters of divination use the same creative force to manifest their realities, however, one moves in a downward spiral and the latter moves in an upward spiral, each increasing the resonant vibration inherent in them.” How’s that again? I have no idea what this means, but it has the language components of a physics experiment: “higher energy frequencies,” “downward and upward spirals,” and “resonant vibration.” Yet these phrases mean nothing because they have no precise and operational definitions. How do you measure a planet’s higher energy frequencies or the resonant vibration of masters of divination? For that matter, what is a master of divination?
Critical thinking can be extremely useful if applied effectively to the practice of business analysis. A key element of critical thinking is being able to spot logical fallacies in your own thinking and in conversations with stakeholders. Logical fallacies, once recognized, should not be ignored, because decisions built on them can easily be wrong. Identifying these fallacies is a crucial first step to eliminating them from daily interactions with stakeholders.
Here are 5 logical fallacies to be aware of:
Argument from Authority
This fallacy goes like this: “Manager X believes Y, Manager X speaks from a position of authority, therefore Y is true”.
In most cases, stakeholders have preconceived notions of where problems exist in an organization. As such, the analyst may encounter several SMEs or other authorities who are knowledgeable about problems in the domain or organization. But the mere fact that a person is highly placed does not mean that their claims are always true. The analyst should therefore look beyond position, experience, reputation, and formal degrees and investigate all claims. After all, no matter how much experience a person has, he or she can still be wrong.
The same goes for frontline staff; the fact that frontline staff usually have less authority does not mean that they do not have any valuable information to contribute. The key takeaway is to investigate all claims, regardless of the authority (or otherwise) associated with the person.
Hasty Generalization
An analyst says after the first week on a project, “I can tell that this project won’t work, just from observing the attitudes of the end users. They don’t seem to be enthusiastic about this project.”
Hasty generalization occurs when the analyst rushes to a conclusion before gathering all the facts. Such rushed conclusions are usually based on insufficient or biased evidence. To make a fair and reasonable evaluation of a project’s prospects, the analyst must first talk with several stakeholders, immerse themselves in the organization to get a sense of people’s perspectives, and make a real effort to address negative attitudes.
Generalizing before getting all the facts right is a fallacy that should be avoided in all situations.
False Dilemma
A false dilemma occurs when the analyst oversimplifies reality by reducing it to only two choices.
Let's say a company is having problems keeping up with competition. On closer investigation, it is discovered that a system that allows the company to store, manage and analyze customers' buying patterns would be helpful in improving customer satisfaction and staying ahead of competition.