Clin. exp. Immunol. (1976) 24, 227-229.
The incorrect use of Chi-square analysis for paired data
J. I. E. HOFFMAN
Cardiovascular Research Institute and Department of Pediatrics, University of California, San Francisco, California, U.S.A.
(Received 11 September 1975)
When results are classified in categories (for example, improved and unimproved, reactive and non-reactive) and the same patients are examined by two different treatments or two different tests, then the correct statistical analysis is by McNemar's test. More often a Chi-square analysis is used; this is not only incorrect but also leads to erroneous conclusions.
INTRODUCTION
A recent publication examining the correlation between two immunological tests in a series of people (Catalona et al., 1975) indicates the need for re-examination of the statistical tests used. The authors chose to use a Chi-square analysis, but because this was the wrong statistical test they reached erroneous conclusions. In part of their study, Catalona et al. (1975) examined twenty-eight patients with genitourinary malignancies for their reactivity to DNCB and PHA. The results, given here in Table 1, were analysed as a 2 × 2 contingency or Chi-square table to give χ² = 6.39 with P < 0.02. The authors therefore concluded that there was a significant difference in reactivity to these two assays.
There are two problems with a 2 × 2 Chi-square analysis of this data. The lesser objection is that when the total number of observations is under thirty, or when one or more expected cell frequencies are under five for total numbers under fifty, then Fisher's exact test should be used in place of the Chi-square test (Langley, 1970; Siegel, 1956; Zar, 1974). Failure to do so may lead to an erroneous probability of rejecting the null hypothesis.
The more serious objection is that this data should not be analysed by the Chi-square test at all. The Chi-square tests, including Fisher's exact test, envision complete independence between rows and columns, and this may be expressed more specifically as follows. If members of group A can have one of two characteristics (+ or -), and members of group B can also be classified as + or -, then the Chi-square test examines the null hypothesis that groups A and B could have been drawn randomly from the same population. If the experiment had been to divide the patients into two groups, give one group DNCB and expose lymphocytes from the other group to PHA, and then compare the proportions of reactive and non-reactive patients, then a 2 × 2 Chi-square analysis would have been appropriate.
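The small-sample rule quoted above for independent groups (total under thirty, or an expected cell frequency under five when the total is under fifty) can be sketched in a few lines of Python. This is only an illustrative reading of that rule: the function names `expected_counts` and `prefer_fisher_exact` and the example counts are hypothetical and do not come from Catalona et al. (1975) or from Table 1.

```python
def expected_counts(table):
    """Expected cell frequencies of a 2 x 2 table under independence."""
    (a, b), (c, d) = table
    n = a + b + c + d
    if n == 0:
        raise ValueError("table has no observations")
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    # Expected count = row total * column total / grand total.
    return [[r * k / n for k in col_totals] for r in row_totals]


def prefer_fisher_exact(table):
    """Apply the rule of thumb quoted in the text (an assumption about
    how to read it): use Fisher's exact test if n < 30, or if n < 50
    and any expected cell frequency is below 5."""
    (a, b), (c, d) = table
    n = a + b + c + d
    smallest = min(min(row) for row in expected_counts(table))
    return n < 30 or (n < 50 and smallest < 5)


# Hypothetical counts for two independent groups (not the paper's data).
small_study = ((9, 5), (4, 10))     # 28 observations in total
large_study = ((30, 20), (15, 35))  # 100 observations in total

print(prefer_fisher_exact(small_study))
print(prefer_fisher_exact(large_study))
```

The check only concerns which test suits independent groups; it says nothing about paired observations, for which neither test applies.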
TABLE 1. Reactivity to DNCB and PHA
Correspondence: Dr J. I. E. Hoffman, Department of Pediatrics and Cardiovascular Research Institute, University of California, San Francisco, California 94143, U.S.A.
Once the patients are paired in that each has two tests done then what is wanted is McNemar's test (Siegel, 1956; Tate & Clelland, 1959; Zar, 1974). In this test the 2 × 2 table is set up, but patients with reactivity in each group (+ +) or no reactivity in either (- -) are excluded from analysis. Therefore in Table 1 the comparison is between four who responded to DNCB but not PHA (+ -) and three who responded to
PHA but not DNCB (- +). Clearly there was no significant difference from the null hypothesis that there would be equal numbers with + - and - +. In practice, if the total in these two groups is ten or more, then they are tested for goodness of fit to a 1:1 ratio. If the total is under ten, a simple binomial test is done (Tate & Clelland, 1959; Zar, 1974).
To bring out the difference between the Chi-square and McNemar's test, consider a trial of two skin lotions in patients with a chronic rash. Each patient has lotion A on one arm and lotion B on the other, and after a suitable time they are assessed as being improved (+) or unaltered (-). The hypothetical observed findings are set out in Table 2. If this were analysed as a Chi-square the differences would be highly significant; numerically χ² would be 26.27. However, if this data has any meaning, it is that among those who improve with lotion B a high proportion also respond to lotion A, and similarly most who do not respond to lotion B do not respond to lotion A; in other words, common sense and McNemar's test tell us that there is no difference in effectiveness of the two lotions, yet the Chi-square results imply a significant difference of some kind. The error lies in using a Chi-square analysis for paired values.
Some people become concerned that there is something illegal, or at least undesirable, about ignoring the + + or - - observations. Consider the hypothetical observations in Table 3. By McNemar's test, we compare 10 (+ -) and 40 (- +) with the null hypothesis that each should be 25. The probability is