Psychological Assessment
2015, Vol. 27, No. 1, 184-193

© 2014 American Psychological Association
1040-3590/15/$12.00  http://dx.doi.org/10.1037/pas0000044

Dichotomous Versus Polytomous Response Options in Psychopathology Assessment: Method or Meaningful Variance?

Jacob A. Finn and Yossef S. Ben-Porath
Kent State University

Auke Tellegen
University of Minnesota

In previous studies, researchers have examined the optimal number of response options for psychological questionnaires. Several have reported increased scale score reliabilities, but few have documented improved external validities. In the present investigation, we followed up on Cox (2011) and Cox et al.'s (2012) extensive analyses of a clinical assessment instrument, the Minnesota Multiphasic Personality Inventory-2-Restructured Form (MMPI-2-RF). We compared the dichotomous (true/false) response format of this inventory with a 4-choice format. Our sample consisted of 406 undergraduate students from a large Midwestern university who were largely female (64.3%), predominantly Caucasian (76.4%), and had a mean age of 19.24 years. Internal-structural analyses confirmed that more response options increase reliabilities, but the effects were small. The differences between correlations with external criteria were very rarely statistically significant, and the few that were did not consistently favor either format. We recommend that in future response-format investigations the internal-structural analyses continue to be combined with evaluations of relevant external correlations.

Keywords: response options, questionnaires, Minnesota Multiphasic Personality Inventory-2-RF, psychopathology, personality assessment

This article was published Online First November 3, 2014.
Jacob A. Finn and Yossef S. Ben-Porath, Department of Psychological Sciences, Kent State University; Auke Tellegen, Department of Psychology, University of Minnesota.
Yossef S. Ben-Porath and Auke Tellegen are paid consultants to the MMPI publisher, the University of Minnesota Press, and distributor, Pearson Assessments. They receive royalties on sales of MMPI-2-RF materials and research grants from the MMPI-2-RF publisher.
Correspondence concerning this article should be addressed to Jacob A. Finn, Department of Psychological Sciences, Kent State University, P.O. Box 5190, Kent, OH 44242. E-mail: [email protected]

Personality and psychopathology assessment instruments include various response options, ranging from unstructured, open-ended performance-based measures to more restrictive, forced-choice options in self-report inventories. The latter differ in the number of response options available to test-takers. The number of response options ranges from two in the traditional true/false response format to (typically) four or five response options on instruments that utilize a rating scale format. Studies of the relative utility of varying numbers of response options have commonly used two methods: collecting responses to the same items using varying numbers of options (e.g., Aiken, 1983; Chang, 1994; Komorita & Graham, 1965) or simulation studies in which, along with the number of response options, scale length, interitem correlation, and other properties can be manipulated (e.g., Bandalos & Enders, 1996; Lozano, García-Cueto, & Muñiz, 2008). Reviewing this literature, Lozano et al. (2008) noted that many studies have demonstrated improved internal consistency, often using Cronbach's alpha, and factorial validity, typically using confirmatory factor analysis, when the number of response categories increases. However, other studies have found negligible changes in internal consistency and other psychometric properties with increased response options (e.g., Aiken, 1983; Komorita & Graham, 1965).

When comparing different response format options, reliability and validity are the metrics of interest. Loevinger (1957) emphasized three domains for establishing construct validity for an objective psychological test: a substantive content component, an internal structural component, and an external prediction component. Item pool content typically addresses the first component, and in studies manipulating response format, items are not altered. Internal consistency analyses and confirmatory factor analyses are related to the second component, demonstrating the degree of covariance between item responses and confirming fidelity to the specified scoring method. The third component requires the test to be examined in relation to external criteria; however, this is typically neglected in response option studies. Thus, studies focusing solely on reliability and factorial validity provide incomplete assessments of the impact of alternative response options by neglecting to explore whether alternative response options have differential validity impact.

Several studies have examined the impact of response format on the Minnesota Multiphasic Personality Inventory (MMPI) family of instruments, the most widely used measure of psychopathology (Camara, Nathan, & Puente, 2000). Using an undergraduate student sample, Cox et al. (2012) examined the effects of altering the response options from the traditional true/false format to a 4-point (very true, mainly true, slightly true, and false) format for a subset of 225 items from the second edition of the MMPI, the MMPI-2. The authors administered the MMPI-2 items twice using both response formats, in the same session, along with two scales from the Multidimensional Personality Questionnaire (MPQ; Tellegen, 1982). Cox et al. (2012) found that raw scores on the same MMPI-2 Restructured Clinical (RC) Scale, derived from the alternative response formats, had an average correlation of .83. The authors reported an average internal consistency increase of .051 and an average interitem correlation increase of .041 for the extended response version. Using the information provided, we tested the statistical significance of these differences and calculated effect sizes. The RC Scale alpha increases were statistically significant, except for Antisocial Behaviors (RC4), Ideas of Persecution (RC6), and Aberrant Experiences (RC8). The six significant alpha increases were small effects. Regarding validity, Cox et al. (2012) found an increased convergent correlation between Low Positive Emotions (RC2) and MPQ Well-being (i.e., −.66 to −.69), but a decreased convergent correlation between Ideas of Persecution (RC6) and MPQ Alienation (i.e., .59 to .55). These correlation differences were not statistically significant. Thus, although Cox et al. (2012) found significant increases in internal consistency, they were not paired with improvements in validity coefficients, raising a question as to whether meaningful, construct-valid variance was added by using the four-choice response format.

Cox (2011) followed up on Cox et al. (2012)¹ by administering the entire MMPI-2-RF in traditional and the same 4-point response format in a single session 1 week after administering seven MPQ scales. Regarding administration and participant perception, Cox reported that participants spent significantly less time completing the traditional form compared with the extended response form (completion time: traditional M = 30.4 min, SD = 7.5 min; extended response M = 33.4 min, SD = 7.2 min; Cohen's d = 0.41), and that they perceived the traditional MMPI-2-RF as significantly easier to complete than the extended one (Cohen's d = 0.78). However, participants believed they were better able to describe themselves with the extended response form compared with the traditional one (Cohen's d = 1.08). The same scales calculated from the different response formats correlated, on average, .80 to .85 with their counterparts. For the extended response version, the substantive scale scores had alpha coefficients, on average, .050 to .075 higher than those found for the traditional format. The average interitem correlations for the extended response version substantive scale sets were, on average, .04 to .07 higher than for the traditional MMPI-2-RF. On the basis of the reported results, we tested for statistical significance of these differences and calculated effect sizes. Seven scales did not produce statistically significant alpha differences: Malaise (MLS), Neurological Complaints (NUC), Stress/Worry (STW), Anxiety (AXY), Multiple Specific Fears (MSF), Disaffiliativeness (DSF), and Mechanical-Physical Interests (MEC). Of the 35 significant differences in Cronbach's alpha, only six had small effect sizes, and the remaining differences were less than small (Δ < .20). Using the MPQ scales as criterion measures, convergent validity correlations were slightly stronger for the traditional true/false response format than for the extended response format, with an average absolute correlation difference of .019. None of the correlation differences reported by Cox (2011) were statistically significant. As with Cox et al. (2012), the small increase in internal consistency did not lead to increases in validity coefficients.


The findings of Cox (2011) and Cox et al. (2012) of modestly higher internal consistencies with no concomitant improvement in validity may appear counterintuitive; however, they can be reconciled and understood in the context of Streiner's (2003) discussion of the roles of random and systematic error in scale score reliability and validity. Streiner (2003) noted that in classical test theory, observed test score variance is composed of true score variance, random error variance (i.e., the unreliable component of the observed score), and systematic error variance (i.e., nonrandom scale score variance that is construct-irrelevant). Improved reliability without concomitant improvement in validity can occur if the variance added by the expanded response options contributes systematic error variance to the resulting scale scores. For example, if some respondents tend to systematically provide extreme item responses (response options 1 or 4) or mainly midlevel responses (options 2 or 3), regardless of their standing on the targeted construct, this will contribute systematic (reliable) variance to the scale score without improving its validity.

Cox (2011) and Cox et al. (2012) provided important and clinically meaningful contributions to determining the relative merits of true/false versus extended, rating scale response formats. Nonetheless, further clarification of the response format issue is needed. For example, the impact of including possibly invalid test protocols can be addressed. Because of item pool restrictions, Cox et al. (2012) were unable to exclude cases based on inconsistent responding, which is common in college samples (Berry et al., 1992). Cox (2011) used the sample's means to calculate validity cutoffs for the extended response version. Without a normative reference, it is unclear how effective the validity scale cutoffs they used were in eliminating invalid test protocols.
In addition, fatigue and repetition may have influenced the previous results because the traditional MMPI and the extended response version were administered in the same session. Individuals may have attempted to appear consistent across the single session (e.g., Knowles, 1988) and more readily recalled their responses to prior items of similar (or identical) content. Further expansion of the response format investigation could include a range of external criterion measures for validity analyses. For example, similar to the MMPI, the MPQ uses a dichotomous response format. Use of criteria that rely on either dichotomous or polytomous response formats would allow for a more detailed examination of the validity of MMPI scale scores derived from administration of polytomous responses.

The purpose of the current study was to extend the response format literature by administering the MMPI-2-RF using the traditional and an extended response format in two sessions across a 1-week period, along with several criterion measures that use either dichotomous or polytomous response formats. Consistent with previous studies, we expected to find increased internal consistency coefficients when using an extended response format with the MMPI-2-RF. Our primary focus was on examining whether with a broader set of criteria we too would find no validity differences between MMPI-2-RF scores derived from a dichotomous versus polytomous response format.

¹ Cox (2011) is a follow-up to the Cox et al. (2012) study. Because of publication schedules, the 2012 study was published after the 2011 dissertation defense.

Method

Participants

Four hundred and six undergraduate students from a large Midwestern university were recruited from the psychology department's subject pool for the current study. The sample was largely female (64.3%), predominantly Caucasian (76.4%), and had a mean age of 19.24 years. To be included in the subsequent analyses, participants were required to have completed a traditional (true/false) MMPI-2-RF and an alternative (four-choice) version of the inventory (described next). From the total sample, 336 individuals (82.8%) completed both versions of the instrument.

Participants also were excluded from analyses if they produced an invalid protocol on the basis of validity scale scores for either the traditional, true/false MMPI-2-RF or an extended, 4-point version of the inventory. For the purpose of scoring the MMPI-2-RF Validity Scales of the extended response version, the individual item responses were dichotomized. Any response of "definitely true" or "mostly true" was treated like a "true" response; conversely, any response of "definitely false" or "mostly false" was considered "false." The MMPI-2-RF Validity Scales were calculated in their traditional manner with the dichotomized responses. Protocols with a Cannot Say (CNS) score > 18, a Variable Response Inconsistency (VRIN-r) or True Response Inconsistency (TRIN-r) score ≥ 80, an Infrequent Responses (F-r) score = 120, or an Infrequent Psychopathology Responses (Fp-r) score ≥ 100 were considered invalid. This resulted in the exclusion of 101 participants (30.1% of participants who attended both sessions).

The exclusion procedures resulted in the removal of 42.1% of the 406 participants, an uncommonly large proportion that requires explanation and consideration. Our procedure required participants to attend two research sessions and to produce valid MMPI-2-RF protocols both times.
Of those who attended both sessions, 17.0% produced an invalid MMPI-2-RF protocol in their first session, which is consistent with the single-session invalidity rate reported by Sellbom and Ben-Porath (2005). Twenty-five percent of participants produced invalid protocols when tested a second time. Our subjects may have been less motivated in the later session, resulting in higher invalidity rates. To examine whether our final sample was meaningfully different from other undergraduate student samples used for MMPI research, we calculated uniform T scores for the RC Scales for both response format administrations in our final sample of participants. To calculate RC Scale uniform T scores for the extended response administration, we used the same dichotomization process described earlier for the Validity Scales. The resulting RC Scale scores were similar to ones reported by Osberg, Haseley, and Kamas (2008), indicating that the substantial attrition in our study does not appear to have had a systematic impact on the representativeness of the data from the resulting sample. In addition, it should be noted that although we used dichotomization of the four-choice response format to identify invalid protocols and to compare our final sample to other research samples, the subsequent analyses do not dichotomize any validity or substantive scales.
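The dichotomization rule described above can be sketched as follows. This is a minimal illustration, not the authors' actual scoring code; the function and variable names are hypothetical.

```python
def dichotomize(response: str) -> bool:
    """Collapse a 4-point extended response to a true/false value,
    as described for scoring the Validity Scales: "definitely true"
    and "mostly true" count as true; "definitely false" and
    "mostly false" count as false."""
    if response in ("definitely true", "mostly true"):
        return True
    if response in ("definitely false", "mostly false"):
        return False
    raise ValueError(f"unrecognized response: {response!r}")

# Example: a short run of extended-format responses
responses = ["mostly true", "definitely false", "definitely true", "mostly false"]
print([dichotomize(r) for r in responses])  # [True, False, True, False]
```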

There were no age differences based on protocol validity, but there were gender differences (males more likely invalid than females; φ = −.22, p < .001) and racial differences (non-Caucasians more likely invalid than Caucasians; φ = −.13, p = .016) associated with protocol validity. Although statistically significant, both of these demographic differences represent relatively small effect sizes.

Measures

MMPI-2-RF. The MMPI-2-RF consists of 338 items to which a test-taker responds in a true/false format. The test comprises 51 scales, including 9 Validity Scales, 3 Higher-Order Scales, 9 RC Scales, 23 Specific Problems (SP) Scales, 2 Interest Scales, and 5 Personality Psychopathology Five (PSY-5) Scales. Information about the reliability and validity of MMPI-2-RF scores is provided in the technical manual for the instrument (Tellegen & Ben-Porath, 2008/2011).

MMPI-2-RF, Extended Response Format. The current study's response options were intended to be distinctive to optimize any advantage that a rating scale may offer. Using the traditional MMPI-2-RF booklet, the extended response format altered participants' response options to a 4-point format: Definitely True, Mostly True, Mostly False, and Definitely False. "Definitely" and "mostly" were used to reduce potential ambiguity. For instance, instruments that include both "somewhat true" and "somewhat false" as response options (e.g., Personality Inventory for DSM-5 [PID-5]; Krueger, Derringer, Markon, Watson, & Skodol, 2012) require test-takers to select among alternatives that are not mutually exclusive, which may result in greater complexity and poorer discrimination. In addition, more distinctive middle options may deter responders from engaging in the error of central tendency (i.e., endorsing central ratings to avoid extreme judgments; Guilford, 1954). Along the same lines, the decision to use balanced responses (i.e., two true options and two false options) was made to avoid potential bias introduced by unbalanced response options (e.g., three true options and one false option), such as the one used by Cox (2011) and with the Personality Assessment Inventory (PAI; Morey, 2007), particularly when planning multiple administrations. Indeed, Cox et al. (2012) hypothesized that some of their unanticipated results might have been due to administration order effects and unbalanced response options, with participants taking the four-choice (three true and one false option) administration first possibly being more likely to respond true during the latter dichotomous version administration. In addition, it is unclear how the use of unbalanced response options may differentially influence true-keyed and false-keyed items, offering three gradations for the former but only one option for the latter.

To score the extended response format, true-keyed items were coded by assigning 0 points for "definitely false," 1 point for "mostly false," 2 points for "mostly true," and 3 points for "definitely true." A reverse coding was used for false-keyed items. Scales were scored by summing the assigned points.

Big Five Inventory. The Big Five Inventory (BFI; John, Donahue, & Kentle, 1991) is a 44-item questionnaire that assesses the domains of the Five Factor model, specifically Neuroticism, Extraversion, Openness to Experience, Conscientiousness, and Agreeableness. The BFI uses a 5-point response format, ranging from "disagree strongly" to "agree strongly" with a neutral option.

Internal consistency coefficients ranged from .78 (Agreeableness) to .84 (Extraversion) in the current sample.

Emotionality, Activity, Sociability, and Impulsivity Scale. The Emotionality, Activity, Sociability, and Impulsivity (EASI) scale (Buss & Plomin, 1984) is a 25-item questionnaire that assesses broad personality characteristics, including Activity, Sociability, Impulsivity, Fear, Distress, and Anger scales. An Emotionality scale is also calculable, which combines Fear, Distress, and Anger. The EASI uses a 5-point response format, ranging from "not typical" to "typical." No anchor is given for the middle value. Internal consistency coefficients ranged from .46 (Impulsivity) to .78 (Distress) in the current sample.

Structured Clinical Interview for DSM-IV Axis II (SCID-II) Personality Questionnaire. The SCID-II Personality Questionnaire (SCID-II-PQ; First, Gibbon, Spitzer, Williams, & Benjamin, 1997) is a 119-item questionnaire that assesses symptoms of personality psychopathology, including the 10 personality disorders of the DSM-IV and DSM-5. The SCID-II-PQ uses a dichotomous response option (yes/no). Internal consistency coefficients for the various symptom counts ranged from .40 (Schizoid) to .79 (Borderline).

Sensation Seeking Scale. The Sensation Seeking Scale (SSS-V; Zuckerman, Eysenck, & Eysenck, 1978) is a 40-item questionnaire that assesses various types of impulsive and risk-taking behaviors, including Thrill and Adventure Seeking, Experience Seeking, Disinhibition, and Boredom Susceptibility. The SSS-V uses a dichotomous, forced-choice response format, offering respondents two situations and asking them to indicate which most closely matches their feelings and preferences. Internal consistency coefficients ranged from .53 (Experience Seeking) to .72 (Thrill and Adventure Seeking) in the current sample.

Machiavellianism scale. The Machiavellianism scale (MACH-IV; Christie & Geis, 1970) is a 20-item questionnaire that assesses manipulative personality traits that are associated with narcissistic and psychopathic personality (Paulhus & Williams, 2002). The MACH-IV uses a 6-point response format, ranging from "disagree strongly" to "agree strongly," with no neutral option. In the current sample, the internal consistency coefficient for the total score was .76.
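The extended-format item scoring described under Measures (0 to 3 points for true-keyed items, reverse-coded for false-keyed items) can be sketched as follows. The function names and the example items are hypothetical illustrations, not the authors' code.

```python
# Point values for true-keyed items; false-keyed items are reverse-coded.
TRUE_KEYED_POINTS = {"definitely false": 0, "mostly false": 1,
                     "mostly true": 2, "definitely true": 3}

def item_score(response: str, true_keyed: bool) -> int:
    """Score one extended-format item response."""
    points = TRUE_KEYED_POINTS[response]
    return points if true_keyed else 3 - points

def scale_score(responses, keying) -> int:
    """Sum item points for one scale; keying[i] is True for true-keyed items."""
    return sum(item_score(r, k) for r, k in zip(responses, keying))

# A hypothetical 3-item scale: two true-keyed items, one false-keyed item
responses = ["definitely true", "mostly false", "mostly true"]
keying = [True, True, False]
print(scale_score(responses, keying))  # 3 + 1 + 1 = 5
```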

Procedure

Participants attended two research sessions exactly 1 week apart. In the first session, they were asked to respond to one version of the MMPI-2-RF (either traditional or extended response format) and additional criterion measures. In the second session, the participants were administered the other version of the MMPI-2-RF and other criterion measures. To reduce the influence of order effects, the MMPI-2-RF and collateral questionnaire administrations were counterbalanced across the two sessions. Approximately half (51.23%) of the sample was administered the traditional MMPI-2-RF (i.e., true/false response format) during their first session. Similar percentages were noted for each of the collateral measures, suggesting a relatively even distribution of the various questionnaires. The correlation between the order of MMPI-2-RF administration and the order of administration for a collateral measure ranged from −.00 (BFI) to .10 (EASI), indicating relative independence in administration. Within the sessions, the MMPI-2-RF's order was counterbalanced to be either before or after all criterion measures. The order of the criterion measures was randomized within the packet. All questionnaires were administered in paper-and-pencil format.

Data Analysis

Reliability, estimated with Cronbach's alpha, was compared for the two response forms first. To prevent range restriction, the Validity Scales' internal consistency coefficients were calculated using participants who were administered both versions of the MMPI-2-RF, regardless of the validity of their protocols (n = 336); however, substantive scale comparisons were only conducted with valid protocols. We used Feldt, Woodruff, and Salih's (1987) procedure for testing the statistical significance of the difference between the internal consistency coefficients of the different response formats. Effect sizes for the alpha differences were calculated using Liu and Weng's (2009) delta (Δ), which is interpreted similarly to Cohen's d, with small effects between 0.20 and 0.49, medium effects between 0.50 and 0.79, and large effects greater than or equal to 0.80.

On the basis of previous findings of increased internal consistency absent concomitant improvement in construct validity (i.e., Cox, 2011; Cox et al., 2012), we hypothesized that enhanced reliability estimates may reflect the addition of systematic (i.e., reliable, but not valid) error variance. Validity and substantive scale score variances for the two response options were compared to test this hypothesis. To examine whether reliability estimates are associated with increased scale score variance, delta coefficients, reflecting the degree to which polytomous response options resulted in increased reliability estimates, were correlated with the percentage of increased variance associated with the enhanced response option.

To compare validity across response formats, MMPI-2-RF substantive scales were divided into the major domains assessed by the instrument (internalizing, externalizing, thought dysfunction, and interpersonal functioning) identified in the MMPI-2-RF manual (Ben-Porath & Tellegen, 2008/2011). To examine the validity of internalizing scale scores, BFI Extraversion and Neuroticism; EASI Fear, Distress, Anger, and Emotionality; and SCID-II-PQ Avoidant, Dependent, Depressive, and Borderline Personality Disorder symptom counts were used as criteria. For the thought dysfunction scales, BFI Openness and SCID-II-PQ Paranoid, Schizotypal, and Schizoid Personality Disorder symptom counts were used. For the externalizing scales, BFI Agreeableness and Conscientiousness; EASI Activity and Impulsivity; SCID-II-PQ Narcissistic and Antisocial Personality Disorder symptom counts; the SSS-V scales; and the MACH-IV total score were used as validity criteria. For the interpersonal scales, BFI Extraversion; EASI Sociability; and SCID-II-PQ Avoidant, Dependent, Schizoid, and Histrionic Personality Disorder symptom counts served as validity criteria. Zero-order correlations were calculated using raw scale scores from both versions of the MMPI-2-RF and the external criterion measures. Because we did not have validity criteria for the MMPI-2-RF Somatic/Cognitive and Interest scales, these measures were included in the reliability analyses, but not the validity ones. We used Meng, Rosenthal, and Rubin's (1992) method to test the statistical significance of the difference between the validity coefficients (i.e., differences in correlations between MMPI-2-RF scale scores and relevant criteria as a function of response format). Cohen's (1988) q statistic was used as an estimate of effect size for the statistically significantly different correlations. It is interpreted similarly to r as an effect size, with small effects between .10 and .29, medium effects between .30 and .49, and large effects greater than or equal to .50.

Because of the influence of family-wise error on the interpretation of results, we focus on statistical significance and effect size in our discussion. Although we conducted many statistical tests, increasing the likelihood of Type I error, we did not correct for family-wise error because setting a higher standard for statistical significance would make it easier to support our hypothesis that no validity differences would be found across response formats.
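Cohen's q, used above as the effect size for differences between correlations, is the difference between the Fisher r-to-z transforms of the two coefficients. A minimal sketch (the example correlation values are arbitrary, not from the study):

```python
import math

def fisher_z(r: float) -> float:
    """Fisher r-to-z transformation: z = arctanh(r)."""
    return math.atanh(r)

def cohens_q(r1: float, r2: float) -> float:
    """Cohen's q: absolute difference between Fisher-transformed correlations."""
    return abs(fisher_z(r1) - fisher_z(r2))

# Example: validity coefficients of .50 under one response format
# and .30 under the other
q = cohens_q(0.50, 0.30)
print(round(q, 3))  # 0.24 (a small effect by Cohen's benchmarks)
```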

Results

Reliabilities

Table 1 presents the internal consistency results for the Validity Scales and the substantive scales obtained from the traditional MMPI-2-RF and the extended response version. VRIN-r and TRIN-r were not included because of the complexities of working with item pairs rather than individual items. For the remaining Validity Scales, all alpha coefficients were higher in the extended response version of the MMPI-2-RF. The differences in internal consistency ranged from .02 (Adjustment Validity [K-r]) to .21 (Uncommon Virtues [L-r]). As predicted, most of the alpha differences reached statistical significance; however, the difference for K-r did not. Five of the six significant differences had small effect sizes, but Fp-r did not reach a small effect size.

As for the substantive scales, the alpha coefficients of 40 of the 42 scales were higher with the extended response format than with the traditional format of the MMPI-2-RF. One scale had a higher traditional administration alpha coefficient (Anger Proneness [ANP]), although this difference was small and not statistically significant, and one scale (Self-Doubt [SFD]) had the same alpha coefficient for both versions. When examining the scales with higher extended response format internal consistencies, the alpha differences ranged from .02 (Emotional/Internalizing Dysfunction [EID] and Shyness [SHY]) to .19 (Helplessness/Hopelessness [HLP]), with a median difference of .06. Inefficacy (NFC), STW, Juvenile Conduct Problems (JCP), Activation (ACT), SHY, and Negative Emotionality (NEGE-r) did not show statistically significant alpha improvements when using the four-choice response format. For those 34 scales with statistically significant differences, 17 reached the level of small effect size, with deltas ranging from 0.20 (Behavioral/Externalizing Dysfunction [BXD], NUC, and Family Problems [FML]) to 0.48 (Psychoticism-Revised [PSYC-r]). The effect sizes of the other 17 scales, although statistically significant, were less than the 0.20 threshold for a small effect. Examining relative increments in internal consistency by domain, scales related to negative emotionality (e.g., NEGE-r, STW, SFD, and SHY) had more comparable internal consistencies across response options, whereas scales related to thought dysfunction (e.g., Thought Dysfunction [THD], PSYC-r, and RC6) had more discrepant internal consistencies, with the extended response format yielding higher alpha values.

Table 1 also presents variances for the MMPI-2-RF scales from both response format administrations. As expected, the addition of response options increased scale variances, ranging from a 354.14% variance increase (NFC) to a 1,040.74% variance increase (Suicidal/Death Ideation [SUI]). The variance increase for the 49 MMPI-2-RF scales was significantly positively correlated, r = .79, p < .001, with the internal consistency increase demonstrated by the delta coefficient, indicating that the additional scale variance gained from polytomous response options is largely reliable variance, reflecting either construct-relevant or systematic error variance.
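The alpha comparisons above start from item-level data; Cronbach's alpha can be computed from a respondents-by-items score matrix. A minimal sketch with toy data (not the study's data):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) score matrix:
    alpha = k/(k-1) * (1 - sum(item variances) / variance of total scores)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical dichotomous (0/1) responses: 4 respondents x 3 items
dichotomous = np.array([[1, 1, 1],
                        [0, 0, 0],
                        [1, 1, 0],
                        [0, 1, 1]])
print(round(cronbach_alpha(dichotomous), 3))  # 0.632
```

The same function applies unchanged to 0-3 polytomous item scores, which is what allows the two response formats to be compared on a common reliability metric.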

Validities

Tables 2-7 present the correlations between conceptually relevant collateral measures and the MMPI-2-RF internalizing, thought dysfunction, externalizing, and interpersonal scales. For the internalizing scales (see Tables 2-3), the median absolute correlation difference was .03, ranging from no difference to a .16 difference, with approximately 80% of the correlations having a difference of .05 or less. Fourteen correlation differences were statistically significant: eight favored the true/false response format and six favored the extended response format. Correlations favoring dichotomous responses were observed for SCID-II-PQ Avoidant, Dependent, and Depressive symptom counts, and they produced smaller effect sizes, with Cohen's qs ranging from .083 to .142. The correlations favoring polytomous responses also had small effect sizes, with Cohen's qs ranging from .127 to .183. Most of these correlations were between SUI and an internalizing criterion measure, although none of the criteria were directly related to suicidality.

For the thought dysfunction scales (see Table 4), the median absolute correlation difference was .05, ranging from .01 to .13. Four correlation differences were statistically significant: all favored the traditional true/false response format. These significant differences were found for correlations with SCID-II-PQ Paranoid and Schizotypal symptom counts, and they represented small effect sizes, with Cohen's qs ranging from .125 to .168.

For the externalizing scales (see Tables 5-6), the median absolute correlation difference was .03, ranging from no difference to a .15 difference. Again, approximately 80% of the correlations had a difference of .05 or less. Eight correlation differences were statistically significant: one favored the traditional response format and seven favored the extended response format. The correlation between the SCID-II-PQ Narcissistic symptom count and Aggression (AGG) favored the dichotomous response, with a small effect size (q = .113). The four-choice-favoring correlations had smaller effect sizes, with Cohen's qs ranging from .092 to .155. Most of these correlations were between Hypomanic Activation (RC9) or its associated SP scales (ACT and AGG) and a criterion measure.

For the interpersonal scales (see Table 7), the median absolute correlation difference was .04, ranging from no difference to a .16 difference. Approximately 72% of the correlations had a difference of .05 or less. Four correlation differences were statistically significant: one favored the traditional MMPI-2-RF and three favored the four-choice version. The correlation between Social Avoidance (SAV) and the SCID-II-PQ Schizoid symptom count favored the traditional MMPI-2-RF, but it did not reach a small effect size (q = .083). The polytomous-favoring correlations represented small effect sizes, with Cohen's qs ranging from .112 to .204. Two


Table 1
Internal Consistencies and the Statistical Significance and Effect Size of Their Difference, and Variance and Change in Variance, for Participants With a Valid MMPI-2-RF and a Valid Extended Response Form (n = 235)

Scale              Traditional MMPI-2-RF α    Extended Response α
Validity Scales
  F-r                       .79                      .88
  Fp-r                      .57                      .67
  Fs                        .60                      .76
  FBS-r                     .69                      .77
  RBS                       .56                      .73
  L-r                       .43                      .64
  K-r                       .72                      .74

[Rows for the H-O, RC, Somatic/Cognitive, Internalizing, Externalizing, Interpersonal, Interest, and PSY-5 scales, and the Significance (p), effect size, and variance columns, are not legible in this reproduction.]
