Journal of Experimental Psychology: Learning, Memory, and Cognition, 2015, Vol. 41, No. 4, 949–956

© 2015 American Psychological Association 0278-7393/15/$12.00 http://dx.doi.org/10.1037/xlm0000092

Deductive Updating Is Not Bayesian

Henry Markovits, Janie Brisson, and Pier-Luc de Chantal


Université du Québec à Montréal

One of the major debates concerning the nature of inferential reasoning is between counterexample-based theories such as mental model theory and probabilistic theories. This study looks at conclusion updating after the addition of statistical information to examine the hypothesis that deductive reasoning cannot be explained by probabilistic inferences. In Study 1, participants were given an initial "If P then Q" rule for a phenomenon on a recently discovered planet, told that "Q was true," and asked to make a judgment of either deductive validity or probabilistic likelihood of the putative conclusion that "P is true." They were then told the results of 1,000 observations. In the low-probability problem, 950 times P was false and Q was true, whereas 50 times P was true and Q was true. In the high-probability problem, these proportions were inverted. On the low-probability problem, probabilistic ratings and judgments of logical validity decreased. However, on the high-probability problem, probabilistic ratings remained high whereas judgments of logical validity significantly decreased. Confidence ratings were consistent with this different pattern for probabilistic and for deductive inferences. Study 2 replicated this result with another form of inference, "If P then Q. P is false." These results show that deductive updating is not explicable by Bayesian updating.

Keywords: deduction, probabilistic inference, mental models, updating, logical inference

2005; Oaksford & Chater, 2007). Variability related to content can be explained by the effect of stored knowledge on likelihood estimations. In addition, such models allow for a process in which additional information can be used to modify these estimates by Bayesian updating. Thus, such models can elegantly model the content-related variability of human reasoning and its nonmonotonic character. When asked to make a deductive inference, which requires a judgment of validity, people will transform their estimation of conclusion likelihood into a dichotomous judgment of validity. A second category of model focuses particularly on the use of information to generate potential counterexamples. The most influential of these is mental model theory (Johnson-Laird, 2001). Although there are variants, the basic underlying principle is that people will construct internal models (representations) of the premises. If there are counterexamples to a putative conclusion in these models, then this conclusion will be considered to be invalid. The nonmonotonic character of reasoning can be explained by the incorporation of additional information into this internal representation via pragmatic or semantic factors (Johnson-Laird, & Byrne, 2002; Markovits & Barrouillet, 2002). If such additional information generates a counterexample, then a conclusion that was previously considered to be valid will be considered to be invalid. Content-related variation can be explained by similar processes. Each of these theories claims to provide a unique model of inferential reasoning. One of the difficulties in distinguishing between them is that in many cases these different approaches make parallel predictions. This difficulty is increased by the fact that both theories have open parameters, such as the translation between probabilistic evaluations and deductive judgments or how the search for additional models is set off. Although potentially suggestive, existing empirical results simply do not allow resolution of this debate. For example, it is clear that people modify the

The ability to make deductive inferences is one of the most striking examples of advanced human cognition. In fact, some early theories consider that standard propositional logic is a clear model for the cognitive processes underlying inferential behavior (Inhelder & Piaget, 1958). Such logic produces a single valid conclusion (or not) that depends only on the nature of the logical connector in the major premise and the truth value of the minor premise. Thus, logical reasoning is theoretically monotonic (resulting in the same conclusion irrespective of any additional information that might be added to an original inference) and independent of content. Unfortunately, a wide variety of studies have shown that even educated adults produce very inconsistent patterns when responding to deductive reasoning problems. For example, they make inferences that clearly vary according to the specific content of premises (Cummins, Lubart, Alksnis, & Rist, 1991; Markovits & Vachon, 1990; Thompson, 1994). In particular, they are very willing to change inferences when information is added to an original set of premises (Byrne, 1989; Markovits, 1984). Such variability underlies one of the principal debates about the nature of inferential reasoning. Probabilistic theories consider that people’s inferences generate estimations of the likelihood of a given conclusion, with such estimates reflecting stored statistical knowledge about the premises (e.g., Evans, Over, & Handley,

This article was published Online First January 19, 2015. Henry Markovits, Janie Brisson, and Pier-Luc de Chantal, Université du Québec a` Montréal. This study was financed by a Discovery grant from the Natural Sciences and Engineering Research Council of Canada to H. M. Correspondence concerning this article should be addressed to Henry Markovits, Department of Psychology, Université du Québec a` Montréal, C. P. 8888, Succ A, Montréal QC H3C 3P8, Canada. E-mail: [email protected] 949

MARKOVITS, BRISSON, AND DE CHANTAL

This document is copyrighted by the American Psychological Association or one of its allied publishers. This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.

950

way that they make inferences when explicitly instructed to use deductive or inductive parameters (Heit & Rotello, 2010; Rips, 2001) or as a result of different forms of constraint (Markovits, Forgues, & Brunet, 2010). However, although these results show response variability, they are not sufficient to conclude that the underlying processes are different. In particular, probabilistic theories assume that the process underlying deductive or inductive responses is the same and relies on generating likelihood estimates of conclusion probability. Critically, making a dichotomous judgment of validity requires using some cutoff to translate a likelihood estimate into a validity judgment. It is possible that people might change the cutoff value used to translate a likelihood estimation into a deductive judgment when problem parameters are changed. This could provide a way to explain evidence that appears to show different response patterns under varying instructions. Given the strong parallel between most predictions made by these two theories, resolving this debate requires a different approach. We examine this question by distinguishing between statistical (Bayesian) updating and what we will refer to as “counterexample-based deductive updating.” The former results in changes in conclusion probabilities whereas the latter should produce a change in judgments of deductive validity when new information provides potential counterexamples.
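To make the role of this cutoff concrete, here is a minimal illustrative sketch (our construction, not a model proposed by either theory; the function name and the 0.9 cutoff are assumptions, not fitted parameters) of how a single-process probabilistic account could derive a dichotomous validity judgment from a likelihood estimate:

```python
def validity_judgment(conclusion_likelihood: float, cutoff: float = 0.9) -> bool:
    """Translate a graded likelihood estimate into a dichotomous
    validity judgment, as a single-process probabilistic account might.
    The cutoff value is an illustrative assumption, not a fitted parameter."""
    return conclusion_likelihood >= cutoff

# Under a fixed cutoff, deductive judgments must track conclusion
# likelihood: new information that leaves the estimated likelihood high
# cannot lower the validity judgment. Letting the cutoff itself shift
# between problems is the kind of open parameter discussed above.
print(validity_judgment(0.95))  # True
print(validity_judgment(0.05))  # False
```

The studies below exploit exactly this constraint: if the cutoff is fixed, validity judgments cannot move in the opposite direction from likelihood estimates.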

Study 1

As previously mentioned, a major difficulty in distinguishing between these two models is that they often make similar predictions because likelihood evaluations and counterexample reasoning often produce similar patterns of variation. The key to distinguishing between them is to design a situation in which statistical updating produces a change in conclusion probability that is inconsistent with counterexample updating. To do this, we adapted a method used in previous studies (Geiger & Oberauer, 2007; Markovits, Forgues, & Brunet, 2010) to examine conclusion updating based on additional statistical information.

The basic paradigm is the following. Participants are asked to make a series of inferences on the basis of unfamiliar causal rules. These are either all deductive, in which case participants are asked whether the conclusion "P is true" can be logically derived from the premises, or all probabilistic, in which case they are asked to indicate the probability that "P is true." Participants are given sets of two repeated inferences. In the initial inference, they are first given an unfamiliar causal rule observed on the planet Kronus, of the form "if P then Q." Participants are then told that an observation is made in which Q is found to be true (this is an affirmation of the consequent [AC] inference, for which there is no valid conclusion). They must then make an initial inference about P. Participants are asked to indicate how certain they are about their response. In the updated inference, participants are given the result of 1,000 observations about the conditional rule and asked to make the same AC inference with these new data. Because initial inferences all use a completely unfamiliar rule, with no supporting evidence, these will simply generate baseline inferences under uncertainty (given people's standard levels of AC responding, this should be between 40% and 60% acceptance, and whatever probabilistic evaluation corresponds to this).

The key difference between the problem sets is the nature of the observations used to make the updated inference. In the updated version of the low-probability problem set, participants are then told that there are 1,000 observations in which Q is true. Of these, P is also true for 50, whereas for 950 of them P is not true. In this case, the new information should result in a relative decrease of conclusion probability by Bayesian updating. Because there are potential counterexamples, the probability of the conclusion being accepted after counterexample-based updating should also decrease. Thus, probabilistic and counterexample reasoning will result in parallel tendencies. In fact, this situation mirrors many of the results underlying variability in reasoning and underscores why it is difficult to distinguish between the two theories. The key contrast is provided by the high-probability problem set (a worked example follows below). In the updated version, participants are told that of 1,000 observations in which Q is true, for 950 of these P is true whereas for the other 50 P is not true. This problem is designed so that statistical updating should result in a relatively high estimation of the probability of the conclusion being true. In other words, the updated probability of P being true should remain at least as high as the initial estimate. If Bayesian updating underlies deduction, then the "P is true" conclusion should be judged to be valid at the same rate as was observed on the initial inference. However, because this same information gives access to potential counterexamples, under counterexample updating this information should result in a decrease in acceptance of the conclusion that "P is true."

We also examined a second prediction. Previous results show that people have a metacognitive representation of the efficacy of at least some reasoning processes (Thompson, Prowse Turner, & Pennycook, 2011). In fact, there is recent evidence that metacognitive judgments of confidence correlate with abstract logical reasoning when making deductive inferences (Markovits, Thompson, & Brisson, in press). The idea that people have two inferential systems corresponding to counterexample and statistical reasoning suggests the hypothesis that they should have a metacognitive evaluation of the relative efficacy of statistical reasoning that is distinct from evaluations of counterexample-based reasoning. This allows the following predictions. When making deductive judgments, metacognitive evaluations of counterexample-based reasoning will generate a positive correlation between certainty and rejection of the "P is true" conclusion for both the high-probability and low-probability problems, because in both cases the availability of counterexamples should mostly underlie people's inferences. Evaluation of the efficacy of probabilistic judgments should directly reflect the correspondence between inferences and the nature of the statistical information used in the updated version of each problem. Thus, we predict that when making probabilistic judgments, higher confidence ratings should correspond to lower ratings of conclusion likelihood on the low-probability problem but that the opposite should be found on the high-probability problem. In other words, we predict that the same information will result in different patterns of inferential updating and its relation to confidence depending on whether inferences are made deductively or probabilistically on the high-probability, but not on the low-probability, problem.
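The following worked example is ours (the article reports only the frequencies, not formulas); it makes the two competing predictions explicit using a simple frequency-based conditional probability.

```latex
% Bayesian estimate of the AC conclusion "P is true" given the observation "Q is true"
\[
P(p \mid q) \;=\; \frac{N(p \wedge q)}{N(p \wedge q) + N(\neg p \wedge q)}
\]
% Low-probability set:  P(p | q) = 50  / (50 + 950)  = .05
% High-probability set: P(p | q) = 950 / (950 + 50)  = .95
% A counterexample-based judgment instead asks only whether N(not-p and q) > 0;
% this holds in BOTH sets (950 and 50 cases, respectively), so the AC
% conclusion should be judged invalid in both.
```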

Method

Participants. A total of 158 college (Cégep) students (67 males, 91 females; average age = 20 years, 6 months) took part in this experiment. Students were native French speakers and volunteers.

Material. Four paper-and-pencil booklets were prepared. Half of these booklets used only deductive inferences whereas the other half used only probabilistic inferences. Thus, participants were assigned to either a deductive inference condition or a probabilistic inference condition. On the first page of each booklet, participants were asked to give basic demographic information. After this, they were given the following instructions (translated from the original French):

Imagine that a team of scientists is on an expedition on a recently discovered planet called Kronus. On the following pages, we will ask you to answer questions about phenomena that are particular to this planet. For each problem, you will be given a rule of the form "if . . . then" that is true on Kronus according to the scientists. It is very important that you suppose that each rule that is presented is always true. You will then be given additional information and a conclusion that you must evaluate. For each problem, you must (1) evaluate the proposed conclusion; (2) indicate on a scale between 1 and 7 how you felt at the moment you gave your response (1 = certain that I was right to 7 = guessing). Attention: these two tasks constitute two different questions for each problem. A person could evaluate a conclusion while feeling certain, whereas another person could evaluate the conclusion in the same way while feeling that they were guessing.

In the first booklet, participants were asked to make only deductive inferences. At the top of the next page, the following instructions were given, followed by the initial formulation of the high-probability problem set:

For each of the following problems you must consider the statements presented as true and you must indicate whether the proposed conclusion can be drawn logically from the presented information.

The scientists noted that on Kronus: If it thardonnes, then the ground will become sticky.


Consider the following statements and respond to the question: If it thardonnes, then the ground will become sticky.

Observation: The ground is sticky.

Conclusion: It has thardonned.

Indicate whether this conclusion can be drawn logically or not from the statements.

Participants were given a choice between a NO and a YES response. After that, they were asked to indicate how they felt when they gave their response on a 7-point scale of certainty in which 1 was very certain and 7 was guessing. At the top of the next page, participants received the following information, in which the updated statistical information for the high-probability problem set was presented:

In one of their monthly communiqués, the scientists sent the following supplementary information. They said that they had made 1,000 observations on Kronus. Of these, they found that 950 times it had thardonned and the ground became sticky, whereas 50 times it had not thardonned and the ground became sticky.

After this, participants were given exactly the same inference as had been presented previously, along with the scale of subjective confidence. Two further problem sets were then presented that followed this same pattern; that is, an initial inference with no information followed by an updated inference based on additional observations. The second problem was a control problem. This presented an initial rule, "If a trolyte is heated, then it will generate Philoben," followed by an updated version in which participants were told that of 1,000 observations, 500 times P and Q were both observed to be true whereas 500 times P and Q were both observed to be false. This problem was designed so that it would only indirectly affect the probability that P was true if Q is true, because there was no mention of the possibility that Q might be true whereas P was false. The final problem was the low-probability problem. This presented an initial rule, "If X45 is given to a plant, the plant will change color," followed by an updated version in which participants were told that of 1,000 observations, 50 times P and Q were both observed to be true whereas 950 times it was observed that P was false and Q was true. Note that participants responded to all three problem sets. A second version of the deductive problems was also prepared in which the order of the three problem sets was inverted: the low-probability problem set was presented first, followed by the control problem set, and then the high-probability problem set. Two further booklets using probabilistic inferences were then prepared. These were identical to the deductive condition except that instead of being asked for the deductive validity of a putative conclusion, participants were asked to indicate the probability of the conclusion on an 11-point scale going from 0% to 100% in increments of 10. Each participant was asked to make a total of six inferences comprising the initial inference and the updated inference for the low-probability, high-probability, and control problem sets.

Procedure. Booklets were randomly distributed to entire classes. Students who wished to participate were told to take as much time as they needed to answer the questions.

Results and Discussion

An initial analysis established that there was no effect of the order of the problems. We then calculated the percentage of participants who judged that the presented (P is true) conclusion on the initial problems was logically valid, the percentage who did so on the updated problems for the deductive inferences, and the mean conclusion likelihood for the initial and updated probabilistic inferences (see Table 1). Inspection of this table shows that inferential judgments on the initial problem were very similar for all three problem sets. Initial conclusions were accepted a little over half of the time on the deductive problems. Initial conclusion likelihood was approximately .7 on the probabilistic problems.

We first examined performance on the deductive problems. We performed a Friedman test with conclusion acceptance or not as the dependent variable for each of the initial and updated inferences for all three problem sets. This showed a significant effect of inference, χ²(5, 79) = 65.56, p < .001, Cramer's V = .41. We then compared initial to updated conclusion acceptance for each of the three problem sets using the Wilcoxon signed-ranks test. This showed a significant decrease in conclusion acceptance between initial and updated inferences for the high-probability, z = 3.16, p < .002, and for the low-probability, z = 5.43, p < .0001, problem sets. No difference was observed for the control problem set. We then performed an analysis of variance (ANOVA) with conclusion likelihood as the dependent variable and inference (initial and updated inferences for all three problem sets) as a repeated measure for the probabilistic problems. This showed a significant effect of inference, F(5, 74) = 19.03, p < .001, partial η² = .563. Post hoc comparisons of initial versus updated likelihoods were done using a Tukey's test with p = .05. This showed that the likelihood estimate was significantly lower on the updated than on the initial inference for the low-probability problem set. No significant difference was observed between initial and updated inferences for the high-probability and the control problem sets.

These results are clearly inconsistent with the idea that a single form of probabilistic evaluation underlies both deductive and likelihood inferences. For the low-probability problem set, the same updated statistical information produced a relative decrease between initial and updated probabilistic inferences and a relative decrease in judgments of deductive validity between initial and updated deductive inferences. The critical contrast is with the high-probability problem set. In this case, the updated information, which suggested a very high probability of the "P is true" conclusion being true, produced no difference between initial and updated probabilistic inferences (which were if anything slightly higher). If such probabilistic evaluations really underlie deductive inferences, then the same pattern should have been found on the deductive inferences. Instead, there was a very clear decrease in the judgments of deductive validity between the initial and updated deductive inferences.


Table 1
Proportion of Participants Accepting the AC Conclusion on Each of the Three Problem Sets for Deductive Inferences and Mean Likelihood Ratings for Probabilistic AC Inferences

                  High-Probability      Low-Probability       Control
Inference type    Initial   Updated     Initial   Updated     Initial   Updated
Deductive         .53       .27         .55       .09         .56       .58
Probabilistic     .70       .74         .71       .34         .71       .72

We then examined confidence ratings. For methodological purposes, lower values on this scale corresponded to higher ratings of confidence. To simplify interpretation of the results, we inverted confidence scores so that higher reported values corresponded to higher rates of confidence. For each version of the three problems, we calculated mean confidence ratings (see Table 2). We then performed an ANOVA with confidence rating as the dependent variable, with problem type and updated information as repeated measures and inference type as the independent variable. This showed significant effects of problem type, F(2, 154) = 6.39, p < .01, partial η² = .077, updated information, F(2, 154) = 8.81, p < .01, partial η² = .054, and inference type, F(1, 155) = 6.68, p < .02, partial η² = .041, and an interaction involving Problem Type × Updated Information × Inference Type, F(2, 154) = 5.34, p < .01, partial η² = .065. Although there was a tendency for confidence ratings to be lower in the deductive condition, individual differences were not significant. Confidence ratings were significantly lower in the updated versions of the high-probability problems for deductive and probabilistic reasoning. This was also the case for the low-probability problem with deductive reasoning and the control problem with probabilistic reasoning.

We then examined correlations between confidence ratings and acceptance or not of the invited conclusion (coded as 1 or 0) on the three updated problems (see Table 3). As can be seen from inspection of this table, correlations between confidence ratings and conclusion acceptance or conclusion probability were very similar on the low-probability and the control problems. In contrast, although the correlation between confidence and conclusion acceptance on the high-probability problem was marginally negative on the deductive inferences, this correlation was significantly positive on the probabilistic inferences, with the difference between the two being significant, z(150) = 3.81, p = .001. On the deductive inferences, higher confidence ratings indicated a greater tendency to correctly reject the conclusion on the low- and high-probability problems, unlike the control problem, for which the opposite was true. On the probabilistic inferences, higher confidence ratings were correlated with lower likelihood ratings on the low-probability problem, with the opposite pattern for the high-probability problem. Thus, the relationship between confidence and inferential behavior on the high-probability problem was in opposite directions for deductive and for probabilistic inferences. This is another clear indication that the process underlying the way that the updated information on this problem is treated differs between these two forms of inference.

Table 2
Confidence Ratings of Conclusions on Each of the Three Problems for Deductive and Probabilistic Inferences (Higher Values = More Confident)

                  High-Probability            Low-Probability             Control
Inference type    Initial       Updated       Initial       Updated       Initial       Updated
Deductive         3.86 (1.89)   4.26 (1.93)   3.68 (2.01)   4.10 (2.04)   4.19 (1.81)   4.27 (1.89)
Probabilistic     3.13 (1.96)   3.53 (1.87)   3.34 (2.14)   3.49 (1.95)   3.22 (2.25)   4.25 (1.81)

Note. Numbers in parentheses are standard deviations.


Table 3
Correlations Between Confidence Ratings and Conclusion Acceptance on the Updated Forms of the High-Probability, the Low-Probability, and the Control Problems for Deductive and Probabilistic Inferences

Inference       Problem             Confidence ratings × inferential performance
Deductive       High-probability    -.205*
                Low-probability     -.239**
                Control             .289**
Probabilistic   High-probability    .400***
                Low-probability     -.300***
                Control             .230**

Note. * p < .08. ** p < .05. *** p < .01.

Study 1a

The content of the syllogisms used in the initial study was designed to be very unfamiliar. Thus, it is unlikely that the observed effects were limited to the specific content used. Nonetheless, we decided to examine whether the basic effect observed on the high-probability problems would generalize to the two other contents used. We thus examined a further sample of 88 university students. Each participant received initial and updated inferences identical to those of the high-probability condition, with either deductive or probabilistic inferences. Half of these used the specific content employed in the control problem of Study 1 ("If a trolyte is heated, then it will generate Philoben") whereas the other half used the specific content employed in the low-probability problem of Study 1 ("If X45 is given to a plant, the plant will change color").

Results and Discussion

The pattern of changes observed was similar to that observed in Study 1. We first performed an ANOVA with conclusion likelihood on the initial and the updated problems as the dependent variable, with problem type as the repeated measure and content as a between-subjects variable. This showed no significant effects of either problem type or content. Probabilistic evaluations were very similar for the trolyte problems (initial = 80.0, updated = 77.1) and the X45 problems (initial = 74.7, updated = 75.0).

We then examined performance on the deductive problems. We first performed an ANOVA with deductive conclusions on the initial and the updated problems as the dependent variable, with problem type as the repeated measure and content as a between-subjects variable. This showed a significant effect of problem type, F(1, 42) = 13.69, p < .001. Overall, mean acceptance rates for the initial problems (M = 0.52) were higher than those for the updated problems (M = 0.18). The Problem Type × Content interaction was not significant, F(1, 42) < 1. Mean acceptance rates of the deductive conclusions were very similar for the trolyte problems (initial = 0.52, updated = 0.22) and the X45 problems (initial = 0.53, updated = 0.14). We also compared initial to updated conclusion acceptance for each of the two contents using the Wilcoxon signed-ranks test. This showed a significant decrease in conclusion acceptance between initial and updated inferences for the trolyte content, z = 2.33, p < .02, and for the X45 content, z = 2.31, p < .02.


As would indeed be expected, the critical differences in updating patterns between probabilistic inferences and deductive conclusions found in Study 1 were independent of the specific content used.

Study 2

The results of the first study are certainly consistent with the idea that deductive updating does not rely on an underlying probabilistic evaluation. In the following study, we wished to extend these results. First, we extended the basic paradigm to another inferential form, the denial of the antecedent (DA), which corresponds to "P implies Q, P is false" and does not allow any valid conclusion. In addition, we changed the information available in the initial presentation of the causal rule. The first study simply presented a conditional rule that was to be taken as true according to the scientists. It is possible that people's interpretation of the initial scenario, in the absence of any evidence, might have differed in some qualitative sense from data-based interpretations, leading them to generate an initial evaluation that might somehow be undone by subsequent evidence. Thus, we added a few preliminary observations (four in total) that showed two cases with P and Q and two cases with not-P and not-Q. Although not providing any direct evidence for or against the possibility of not-P and Q, these suggested that the initial conditional relation might tend toward a biconditional rule. In addition, to prevent any possible confounds with additional measures, we eliminated the control condition and the confidence measure. Updated information addressed how strongly not-P and not-Q were associated. Thus, we examined a version of the low-probability problem set (with 980 cases of not-P and Q and 20 cases of not-P and not-Q) and of the high-probability problem set (with 20 cases of not-P and Q and 980 cases of not-P and not-Q).
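As a worked illustration (again ours, using the same frequency-based estimate as for Study 1), the updated DA problems have the following structure.

```latex
% Bayesian estimate of the DA conclusion "Q is false" given "P is false"
\[
P(\neg q \mid \neg p) \;=\; \frac{N(\neg p \wedge \neg q)}{N(\neg p \wedge \neg q) + N(\neg p \wedge q)}
\]
% High-probability set: P(not-q | not-p) = 980 / (980 + 20) = .98
% Low-probability set:  P(not-q | not-p) = 20  / (20 + 980) = .02
% Counterexamples to DA (not-p and q cases) are present in both sets
% (20 and 980, respectively), so counterexample-based updating predicts
% decreased acceptance of "Q is false" in both, whereas Bayesian
% updating predicts a decrease only on the low-probability set.
```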

Method

Participants. A total of 87 university students (50 males, 37 females; average age = 28 years, 4 months) took part in this experiment. Students were native French speakers and volunteers.

Material. Four paper-and-pencil booklets were prepared. Initial instructions were identical to those of Study 1. As before, half of the booklets asked for only deductive inferences whereas the other half asked for probabilistic inferences. Inferences were all DA. As before, the order of the problem sets was inverted on half of the booklets. In addition, only the low- and high-probability sets were presented. The high-probability problem set was presented as follows (in the deductive condition):

The scientists noted that on Kronus: If it thardonnes then the ground will become sticky.

In one of their monthly communiqués, the scientific team stated that they had made four preliminary observations on Kronus. These showed that:

Two times: it thardonned and the ground became sticky.
Two times: it did not thardonne and the ground did not become sticky.


Consider the following statements and respond to the question: If it thardonnes then the ground will become sticky.

Observation: It does not thardonne.

Conclusion: The ground will not become sticky.

Indicate whether this conclusion can be drawn logically or not from the statements.

Participants were given a choice between a NO and a YES response. On the following page, participants received the following:

Later on, the scientific team stated that they had made 1,000 reliable observations on Kronus. These showed that:

980 times: it did not thardonne and the ground did not become sticky.
20 times: it did not thardonne and the ground became sticky.

With this additional information, consider the following statements and respond to the question: If it thardonnes then the ground will become sticky.

Observation: It does not thardonne.

Conclusion: The ground will not become sticky.

Indicate whether this conclusion can be drawn logically or not from the statements.

The low-probability set used the premise "If X45 is given to a plant, the plant will change color." The initial problem presented the following four observations:

Two times: X45 was given to a plant and the plant changed color.
Two times: X45 was not given to a plant and the plant did not change color.

The updated problem added the following 1,000 observations:

20 times: X45 was not given to a plant and the plant did not change color.
980 times: X45 was not given to a plant and the plant changed color.

Thus, all participants responded to all four of the DA inferences.

Results and Discussion

An initial analysis established that there was no effect of the order of the problems. We then calculated the percentage of participants who judged that the presented (Q is false) conclusion on the initial problems was logically valid, the percentage who did so on the updated problems for the deductive inferences, and the mean conclusion likelihood for the initial and updated probabilistic inferences (see Table 4). Inspection of this table shows that the pattern of responses was very similar to that found in the initial study. We first examined performance on the deductive problems. We performed a Friedman test with conclusion acceptance or not as the dependent variable for each of the initial and updated inferences for both problem sets. This showed a significant effect of inference, χ²(3, 43) = 24.57, p < .001, Cramer's V = .44.

Table 4
Proportion of Conclusions Accepted on Each of the Two Problems for Deductive DA Inferences and Mean Likelihood Ratings for Probabilistic DA Inferences

                  High-Probability      Low-Probability
Inference type    Initial   Updated     Initial   Updated
Deductive         .56       .16         .56       .21
Probabilistic     .59       .64         .65       .33

We then compared initial to updated conclusion acceptance for each of the two problem sets using the Wilcoxon signed-ranks test. This showed a significant decrease in conclusion acceptance for the low-probability, z = 2.69, p < .007, and for the high-probability, z = 3.40, p < .0001, problem sets. We then performed an ANOVA with conclusion likelihood as the dependent variable and inference (initial and updated inferences for both problem sets) as a repeated measure for the probabilistic problems. This showed a significant effect of inference, F(3, 40) = 10.08, p < .003, partial η² = .200. Post hoc comparisons of initial versus updated likelihoods were done using a Tukey's test with p = .05. This showed that the likelihood estimate was significantly lower on the updated than on the initial inference for the low-probability problem. The difference was not significant for the high-probability problem.

These results show exactly the same pattern as that observed in Study 1. Additional statistical information strongly suggesting that not-P was only weakly associated with not-Q produced a significant decrease in the estimated probability that not-Q was true and a concomitant decrease in judgments of the deductive validity of the not-Q conclusion. By contrast, statistical information suggesting that not-P was very strongly associated with not-Q produced no difference in the estimated probability that not-Q was true, but it did produce a significant decrease in judgments of deductive validity.

General Discussion

Whether inferential reasoning corresponds to a single underlying form of probabilistic evaluation (e.g., Evans et al., 2005; Oaksford & Chater, 2007) or to counterexample-based models (e.g., Johnson-Laird & Byrne, 2002) is one of the critical debates in reasoning. In these studies, we addressed this question by examining inferential updating. This relies on the idea that when people make an initial inference, they will modify it when given additional information. Our starting point is the idea that, although probabilistic models do not clearly indicate exactly how people might translate a likelihood estimate of a putative conclusion into a judgment of deductive validity, these models do allow the general conclusion that changes in validity must parallel changes in conclusion likelihood (Evans, Over, & Handley, 2003; Oaksford & Chater, 2007). Accordingly, we compared explicitly probabilistic inferences and inferences about deductive validity for unknown causal conditional rules. Each problem started with an initial scenario presenting an if-then conditional rule on a fictitious planet to eliminate any effects of stored knowledge. Participants were given either AC inferences in Study 1 (P then Q, Q is true) or DA inferences in Study 2 (P then Q, P is false).


In Study 1 the initial conditional rule was presented with no additional evidence, whereas in Study 2 there was some preliminary evidence consistent with the rule being a biconditional. Before further discussion, it is useful to note that the initial rates of endorsement of the validity of the AC inference and the DA inference are very similar to what is usually observed. In addition, the initial probabilistic evaluations are very similar. Given that this was found in both conditions, it is reasonable to suppose that this reflects the way that people normally interpret such conditional rules.

We then presented additional statistical information that updated people's knowledge about the probability that the putative conclusion was true. In the low-probability problem set, the information suggested that this probability was very low, whereas in the high-probability set it suggested that the probability of the putative conclusion was very high. Critically, in both problem sets, the updated information contained potential counterexamples. For the low-probability problems, updated information generated both a decreased likelihood evaluation of the conclusion and decreased rates of acceptance of the conclusion. In other words, these results mirrored the clear parallel between probabilistic and counterexample models and illustrate the difficulty in dissociating the two. The critical comparison involved the high-probability problem sets. For these, judgments of validity decreased significantly between the initial and updated inferences, but probability estimates did not decrease (and showed a tendency to increase). In other words, updated estimates of probability were consistent with the presented statistical information and with a process of Bayesian updating. By contrast, deductive updating was inconsistent with the statistical information but instead reflected the presence of potential counterexamples. These results show that the presence of potential counterexamples has a specific effect that is not explicable by parallel changes in probabilistic evaluations of conclusions.

In addition, the results of Study 1, for which participants gave confidence ratings, showed a similar dissociation. On the low-probability problem, participants were more confident when rejecting the deductive conclusion and when giving lower likelihood ratings of the same conclusion. Once again, this pattern differed for the high-probability problem set. In this case, increased confidence ratings corresponded to higher evaluations of conclusion likelihood but to increased rejection of the deductive conclusion. Thus, both studies show a clear dissociation between statistical updating and deductive updating in the specific situation in which updated information indicates a high probability of a conclusion being true but also indicates the presence of at least some potential counterexamples.

These results provide strong evidence that a single underlying form of probabilistic evaluation is not sufficient to explain judgments of deductive validity. Of course, there remains the possibility that some adaptation of probabilistic models might be able to account for these results. For example, a signal detection model has been used to analyze the effects of belief on reasoning with familiar content (Heit & Rotello, 2014; although see also Trippas, Handley, & Verde, 2013).
Although not directly applicable to the present forms of reasoning, this analysis suggests that people have nonlinear receiver operating characteristic (ROC) functions that define their tendency to accept or reject a conclusion with the same form of reasoning process. This argument has been used to reinterpret conclusions based on comparisons between different forms of reasoning, because changes in logical form or conclusion believability could modify ROC functions and thus generate the appearance of differences in reasoning that might in fact be only differences in response tendencies. However, in the present studies, we specifically examined only one single form of reasoning in each study, and unless it is claimed that processing statistical data somehow changes response functions when all else is the same, it is difficult to see how this kind of model could account for the observed results. Likewise, a threshold model that translates probabilities into deductive judgments (Oberauer, 2006) could potentially account for these results. However, for such a model to do so, it would be necessary to suppose that whatever threshold is used changes between initial and updated inferences, with a much higher threshold on the latter. Therefore, similar to a signal detection model, explaining the pattern of results found on the high-probability problem sets with a probabilistic model would require justifying a change in basic model parameters for what are identical forms of inference.

Thus, the results of this study provide clear evidence that deductive updating does not use statistical information in the way involved in straightforward Bayesian updating. They are certainly consistent with the idea that reasoning based on counterexamples is a distinct process from that suggested by probabilistic reasoning models. The fact that counterexamples have been found to underlie even very young children's reasoning reinforces the critical role that counterexamples play in deductive reasoning (Markovits et al., 1996; Markovits & Thompson, 2008). In addition, these results show that metacognitive evaluations of probabilistic inferences differ from those of counterexample-based inferences in ways that parallel the updating results. Thus, these results add more credence to the idea that understanding inferential reasoning requires a dual strategy approach (Markovits, Brunet, Thompson, & Brisson, 2013; Markovits, Forgues, & Brunet, 2012; Verschueren, Schaeken, & d'Ydewalle, 2005), in which both statistical and counterexample-based reasoning are considered as different modalities of inferential reasoning. However, it should be noted that a single counterexample-based model could explain performance on the deductive problems in these studies, although any extension to probabilistic inferences would face the same inconsistencies.

References

Byrne, R. M. (1989). Suppressing valid inferences with conditionals. Cognition, 31, 61–83. http://dx.doi.org/10.1016/0010-0277(89)90018-8
Cummins, D. D., Lubart, T., Alksnis, O., & Rist, R. (1991). Conditional reasoning and causation. Memory & Cognition, 19, 274–282. http://dx.doi.org/10.3758/BF03211151
Evans, J. S. B., Over, D. E., & Handley, S. J. (2005). Suppositions, extensionality, and conditionals: A critique of the mental model theory of Johnson-Laird and Byrne (2002). Psychological Review, 112, 1040–1052. http://dx.doi.org/10.1037/0033-295X.112.4.1040
Geiger, S. M. O., & Oberauer, K. (2007). Reasoning with conditionals: Does every counterexample count? It's frequency that counts. Memory & Cognition, 35, 2060–2074. http://dx.doi.org/10.3758/BF03192938
Heit, E., & Rotello, C. M. (2010). Relations between inductive reasoning and deductive reasoning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36, 805–812. http://dx.doi.org/10.1037/a0018784
Heit, E., & Rotello, C. M. (2014). Traditional difference-score analyses of reasoning are flawed. Cognition, 131, 75–91. http://dx.doi.org/10.1016/j.cognition.2013.12.003
Inhelder, B., & Piaget, J. (1958). The growth of logical thinking from childhood to adolescence. New York, NY: Basic Books. http://dx.doi.org/10.1037/10034-000
Johnson-Laird, P. N. (2001). Mental models and deduction. Trends in Cognitive Sciences, 5, 434–442. http://dx.doi.org/10.1016/S1364-6613(00)01751-4
Johnson-Laird, P. N., & Byrne, R. M. J. (2002). Conditionals: A theory of meaning, pragmatics, and inference. Psychological Review, 109, 646–678. http://dx.doi.org/10.1037/0033-295X.109.4.646
Markovits, H. (1984). Awareness of the "possible" as a mediator of formal thinking in conditional reasoning problems. British Journal of Psychology, 75, 367–376. http://dx.doi.org/10.1111/j.2044-8295.1984.tb01907.x
Markovits, H., & Barrouillet, P. (2002). The development of conditional reasoning: A mental model account. Developmental Review, 22, 5–36. http://dx.doi.org/10.1006/drev.2000.0533
Markovits, H., Brunet, M.-L., Thompson, V., & Brisson, J. (2013). Direct evidence for a dual process model of deductive inference. Journal of Experimental Psychology: Learning, Memory, and Cognition, 39, 1213–1222. http://dx.doi.org/10.1037/a0030906
Markovits, H., Forgues, H. L., & Brunet, M.-L. (2010). Conditional reasoning, frequency of counterexamples, and the effect of response modality. Memory & Cognition, 38, 485–492. http://dx.doi.org/10.3758/MC.38.4.485
Markovits, H., Forgues, H. L., & Brunet, M.-L. (2012). More evidence for a dual-process model of conditional reasoning. Memory & Cognition, 40, 736–747. http://dx.doi.org/10.3758/s13421-012-0186-4
Markovits, H., & Thompson, V. (2008). Different developmental patterns of simple deductive and probabilistic inferential reasoning. Memory & Cognition, 36, 1066–1078. http://dx.doi.org/10.3758/MC.36.6.1066
Markovits, H., Thompson, V. A., & Brisson, J. (2014). Metacognition and abstract inferential reasoning. Memory & Cognition. Advance online publication. http://dx.doi.org/10.3758/s13421-014-0488-9
Markovits, H., & Vachon, R. (1990). Conditional reasoning, representation, and level of abstraction. Developmental Psychology, 26, 942–951. http://dx.doi.org/10.1037/0012-1649.26.6.942
Markovits, H., Venet, M., Janveau-Brennan, G., Malfait, N., Pion, N., & Vadeboncoeur, I. (1996). Reasoning in young children: Fantasy and information retrieval. Child Development, 67, 2857–2872. http://dx.doi.org/10.2307/1131756
Oaksford, M., & Chater, N. (2007). Bayesian rationality. Oxford, United Kingdom: Oxford University Press. http://dx.doi.org/10.1093/acprof:oso/9780198524496.001.0001
Oberauer, K. (2006). Reasoning with conditionals: A test of formal models of four theories. Cognitive Psychology, 53, 238–283. http://dx.doi.org/10.1016/j.cogpsych.2006.04.001
Rips, L. J. (2001). Two kinds of reasoning. Psychological Science, 12, 129–134. http://dx.doi.org/10.1111/1467-9280.00322
Thompson, V. A. (1994). Interpretational factors in conditional reasoning. Memory & Cognition, 22, 742–758. http://dx.doi.org/10.3758/BF03209259
Thompson, V. A., Prowse Turner, J. A., & Pennycook, G. (2011). Intuition, reason, and metacognition. Cognitive Psychology, 63, 107–140. http://dx.doi.org/10.1016/j.cogpsych.2011.06.001
Trippas, D., Handley, S. J., & Verde, M. F. (2013). The SDT model of belief bias: Complexity, time, and cognitive ability mediate the effects of believability. Journal of Experimental Psychology: Learning, Memory, and Cognition, 39, 1393–1402. http://dx.doi.org/10.1037/a0032398
Verschueren, N., Schaeken, W., & d'Ydewalle, G. (2005). Everyday conditional reasoning: A working memory-dependent tradeoff between counterexample and likelihood use. Memory & Cognition, 33, 107–119. http://dx.doi.org/10.3758/BF03195301

Received April 2, 2014
Revision received October 17, 2014
Accepted October 20, 2014
