Development of screening inventories for bipolar disorder at workplace: a diagnostic accuracy study.

Journal of Affective Disorders 178 (2015) 32–38


Research report

Kotaro Imamura a,*, Norito Kawakami a, Yoichi Naganuma b, Yoshio Igarashi c

a Department of Mental Health, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
b Department of Social Work, School of Health Sciences, Tokai University, Kanagawa, Japan
c Medical Care Toranomon, Tokyo, Japan

Article info

Article history: Received 24 February 2015; Accepted 26 February 2015; Available online 9 March 2015

Keywords: Bipolar disorder; Screening; Workplace

Abstract

Background: This study aimed to develop a new instrument for bipolar disorder screening, the Workplace Bipolar Inventory (WBI), and to examine its efficiency compared with the Mood Disorder Questionnaire (MDQ) and the Bipolar Spectrum Diagnostic Scale (BSDS) among workers on leave of absence due to mental health problems.
Methods: Participants were recruited at a psychiatric outpatient clinic for return-to-work in Tokyo, Japan, from September to November 2009. Eighty-one outpatients were invited, 55 of whom (68%) agreed to participate in this study. Participants answered questionnaires including the WBI, MDQ, BSDS, and demographic items. Their diagnostic information according to the International Statistical Classification of Diseases and Related Health Problems, 10th revision (ICD-10) was obtained from their attending psychiatrists. The WBI is a new self-rating 39-item questionnaire that was developed with input from occupational mental health specialists and an analysis of WHO Composite International Diagnostic Interview (CIDI) items. The WBI contains three subtype scales: WBI-A (5 items), WBI-AB4 (9 items), and WBI-AB (39 items).
Results: Reliability of these scales was moderate. The BSDS had the largest AUC among the scales (0.83). At the optimal cut-off point of each scale, WBI-AB4 showed good screening efficiency (sensitivity = 0.78, specificity = 0.75). Both the MDQ and the BSDS had high specificity but low sensitivity.
Limitations: A well-validated diagnostic method (i.e., the Structured Clinical Interview for DSM-IV [SCID] or the CIDI) was not applied in this study.
Conclusions: The WBI, especially WBI-AB4, would be a useful workplace screening tool for workers with bipolar disorder.
© 2015 Elsevier B.V. All rights reserved.

Correspondence to: 7-3-1, Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan. Tel.: +81 3 5841 3522; fax: +81 3 5841 3392. E-mail address: [email protected] (K. Imamura).
http://dx.doi.org/10.1016/j.jad.2015.02.034
0165-0327/© 2015 Elsevier B.V. All rights reserved.

1. Introduction

Bipolar disorder is a serious, commonly disabling psychiatric condition (Miller et al., 2014). Although bipolar disorder exacts a high personal and societal toll, with high rates of suicide, interpersonal problems, and a substantial economic burden (Dunner, 2003; Glick, 2004), it is frequently misdiagnosed (Dunner, 2003; Ghaemi et al., 2001; Glick, 2004; Mantere et al., 2004). Previous studies reported that inappropriate treatment caused by misdiagnosis or delayed diagnosis may lead to poor prognosis and increased social problems (Berk and Dodd, 2005; Fagiolini et al., 2013; Skeppar and Adolfsson, 2006). For these reasons, early and correct diagnosis of bipolar disorder is an important issue (Das et al., 2005; Dunner, 2003).

Bipolar disorder also requires attention at the workplace (Stang et al., 2007). A systematic review of bipolar disorder in the workplace revealed that bipolar disorder imposes a significant financial burden on employers due to lost productivity, costing more than twice as much as depression (Laxman et al., 2008). In addition, employees with bipolar disorder were reported to cost $6836 more annually than employees without bipolar disorder in terms of health care insurance, prescription drugs, and sick leave, among others (Gardner et al., 2006). A qualitative study using interviews with people with bipolar disorder reported that the impact of bipolar disorder on work functioning emerged in the following themes: lack of continuity in work history, loss, illness management strategies in the workplace, stigma and disclosure in the workplace, and interpersonal problems at work (Michalak et al., 2007). The presence of stigma in the workplace could lead to delays in accurate diagnosis and effective management of bipolar disorder (Laxman et al., 2008). A previous study suggested that more attention should be paid to the screening and treatment of bipolar disorder at the workplace (Kessler et al., 2006). It is therefore important to include a screen for bipolar disorder in workplace depression screening programs.


Some self-report questionnaires have been developed to screen for bipolar spectrum disorders, but they are mainly intended for use in clinical settings. The Mood Disorder Questionnaire (MDQ) was developed to screen for a lifetime history of a manic or hypomanic syndrome according to the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revision (DSM-IV-TR) criteria and clinical experience (Hirschfeld et al., 2000) and has been widely investigated (Carvalho et al., 2014). The MDQ showed good sensitivity (0.73) and very good specificity (0.90) in clinical settings (Hirschfeld et al., 2000) and higher sensitivity for bipolar I disorder (Gervasoni et al., 2009; Miller et al., 2004; Twiss et al., 2008). The Bipolar Spectrum Diagnostic Scale (BSDS) was developed to target bipolar II and NOS conditions (Ghaemi et al., 2005). The BSDS showed good sensitivity, at 0.76, approximately equal in bipolar I and II/NOS subjects (0.75 and 0.79, respectively), with an optimal cut-off point to detect bipolar disorder of 12/13 (sensitivity = 0.75 and specificity = 0.93; Ghaemi et al., 2005). The Hypomania Check List (HCL-32) self-report questionnaire is a tool designed to screen for hypomanic components in patients with major depressive disorder (MDD) (Angst et al., 2005). The HCL-32 distinguished between bipolar disorder (BP) and MDD with a sensitivity of 0.80 and a specificity of 0.51, but it does not distinguish between BP-I and BP-II disorders (Angst et al., 2005). A comprehensive meta-analysis of accuracy studies of these screening instruments reported that, at the recommended cut-off points in psychiatric services, the summary sensitivities were 0.81, 0.66 and 0.69 and the summary specificities were 0.67, 0.79 and 0.86 for the HCL-32, MDQ, and BSDS, respectively (Carvalho et al., 2014). However, most studies were performed in mental health care settings, and no study has targeted the workplace.

The purpose of this study was to develop new instruments for bipolar disorder screening in the workplace, and to compare their reliability and efficiency (i.e., sensitivity, specificity, AUC, and stratum-specific likelihood ratios) with those of the MDQ and the BSDS in workers who have a mood or anxiety disorder.

2. Methods

2.1. Participants' recruitment

The present survey was conducted in one psychiatric outpatient clinic in Japan that supports the return to work of employees with mental health problems. The inclusion criteria were 1) being on leave of absence due to mental health problems, 2) participating in the daily return-to-work program, and 3) being permitted to participate in this study by the attending psychiatrist.

2.2. Procedure

The procedures of the present study were as follows: 1) participants were asked to complete the self-report questionnaire including the Workplace Bipolar Inventory (WBI), Mood Disorder Questionnaire (MDQ) and Bipolar Spectrum Diagnostic Scale (BSDS); 2) after the questionnaire survey, the diagnostic information of each participant according to the ICD-10 was collected from the attending psychiatrist; and 3) the researcher combined this information to create the dataset for statistical analyses. The attending psychiatrists were blinded to their patients' questionnaire results.

2.3. Screening instruments

2.3.1. Workplace Bipolar Inventory (WBI)

The Workplace Bipolar Inventory is a newly developed self-rating 39-item questionnaire for use at the workplace. It was developed with input from two sources: a panel of specialists and items from an already-established inventory. Eight practitioners of occupational mental health were asked to report the specific symptoms or behaviors they had observed in cases of bipolar disorder type I and II in workplace settings. From their reports, 128 items describing specific symptoms or behaviors were collected as an item pool, and four large categories (including 17 small categories) were created by two occupational mental health specialists and five graduate school students majoring in occupational mental health using the KJ Method (Scupin, 1997). One to three items were selected from each category for the WBI, and provisionally 35 items were included.

The other source was an analysis of World Health Organization Composite International Diagnostic Interview version 3.0 (WHO-CIDI 3.0) items using the data of the World Mental Health Survey Japan (WMH-J) (Kawakami et al., 2008). In these data, 69 items met the criteria for a screening question (elevated, expansive or irritable mood) of bipolar disorder, two met the criteria for bipolar disorder type I (manic episode), five met the criteria for bipolar disorder type II, and 14 met the criteria for hypomanic episode according to the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revision (DSM-IV-TR) (American Psychiatric Association, 2000). The odds ratios (ORs) for predicting bipolar disorder type I, II, mania and hypomania were calculated for each of the 15 WHO-CIDI 3.0 questions about manic and hypomanic episodes. The respondents diagnosed with bipolar disorder were compared with those diagnosed with major depressive disorder or anxiety disorders. The ORs of the questions ranged from 0.75 to 4.80. Four questions with significant ORs were chosen as items of the WBI.
These questions were related to the signs and symptoms of “psychomotor agitation” (OR = 4.05 [95% CI: 1.13–14.57]), “increase in goal-directed activity” (OR = 3.70 [95% CI: 1.08–12.66]), “excessive involvement in pleasurable activities that have a high potential for painful consequences” (OR = 4.38 [95% CI: 1.21–15.78]), and “more talkative than usual or pressure to keep talking” (OR = 4.80 [95% CI: 1.38–16.65]). From the two approaches described above, a provisional set of 39 items for the WBI was developed.

To screen for bipolar disorder using fewer items, the WBI was developed as a two-step inventory according to the diagnostic criteria of manic and hypomanic episodes in DSM-IV. The first step (question A) consisted of five items from the KJ method corresponding to criterion A of DSM-IV (i.e., abnormally and persistently elevated, expansive or irritable mood) and one item asking about the duration of symptoms. The second step (question B) consisted of 34 items, including 30 items from the KJ method and four items from WHO-CIDI 3.0. In addition, the severity of impairment of daily living due to symptoms was asked in one item (question C). Questions A and B are answered “yes” or “no”, except the question about the duration of symptoms, which uses a 3-point scale (less than 3 days, 3–6 days, or more than 7 days). Question C evaluates the level of impairment resulting from the symptoms on a 4-point scale (no, minor, moderate, or serious problems). Respondents who answered “yes” to any item of question A were required to answer question B. In the present study, respondents who answered “no” to all items of question A were treated as having answered “no” to all items of question B.

The WBI contains three subtype scales: WBI-A (5 items), WBI-AB (39 items), and WBI-AB4 (9 items). The present study tested the screening performance of each of the three subtype scales. WBI-A included only question A.
For WBI-A, the scoring algorithm counted the number of symptom items scored “yes” (ranging from 0 to 5). The WBI-AB scale included all the questions, and its score was the number of symptom items scored “yes” (ranging from 0 to 39). The WBI-AB4 scale included question A and the four items taken from WHO-CIDI 3.0, and its score was the number of symptom items scored “yes” (ranging from 0 to 9). The diagnostic criteria for bipolar type I or II according to the ICD-10 do not include the duration of symptoms or the severity of impairment. For this reason, the present study did not use the items on the duration of symptoms (the last item in question A) and the severity of impairment (question C).

2.3.2. Mood Disorder Questionnaire (MDQ)

The MDQ is a brief, easy-to-use self-report inventory that screens for bipolar spectrum disorders according to the DSM-IV criteria for a lifetime history of a manic or hypomanic episode (Hirschfeld et al., 2000). The questionnaire consists of three parts. The first part consists of 13 brief “yes” or “no” statements related to manic and hypomanic symptoms. The second part has one “yes” or “no” question, which asks whether those manic symptoms occurred simultaneously. The third part asks the subject to rate the problems caused by those manic behaviors on a 4-point scale, ranging from “no problem” to “serious problem.” A scoring algorithm counts the number of symptom items scored “yes” (ranging from 0 to 13). To screen positive for bipolar spectrum disorder, the respondent must reach a threshold number of symptom items (7 or more), must check “yes” for the item asking whether the symptoms clustered in the same time period, and must indicate that the symptoms caused either “moderate” or “serious” problems. The original study reported that the sensitivity and the specificity of the MDQ were 0.65 and 0.94, respectively (Hirschfeld et al., 2000). The English version of the MDQ was translated into Japanese by Dr. Naganuma and Dr. Utsumi (private communication) for use in the present study; however, the reliability and validity of the Japanese translation have not been investigated.

2.3.3. Bipolar Spectrum Diagnostic Scale (BSDS)

The BSDS is a self-report narrative-based scale which aims to detect the milder portions of the bipolar spectrum in outpatients (Ghaemi et al., 2005). The BSDS is composed of two parts.
The first part is a paragraph containing 19 positively valenced sentences describing many of the symptoms of bipolar disorder. The score can range from 0 to 19. In the second part, there are four possible answers from which to choose: “This story fits me very well or almost perfectly” (worth 6 points), “This story fits me fairly well” (4 points), “This story fits me to some degree” (2 points), and “This story does not really describe me at all” (0 points). The total score on the BSDS can range from 0 to 25. The original study reported that the overall sensitivity and specificity for bipolar types I, II, and NOS of the BSDS using the cut-off point of 13 or more were 0.75 and 0.93, respectively (Ghaemi et al., 2005). The screening performances for sensitivity and specificity of the Japanese version of BSDS, using the original cut-off point (13 or more), were reported as 0.53 and 0.59, respectively, and using the optimal cut-off point (11 or more) were 0.63 and 0.73, respectively (Tanaka and Koyama, 2011).
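The scoring rules described in Sections 2.3.1 to 2.3.3 can be summarized in a short sketch. The following Python is illustrative only and is not the authors' software: the function names, argument names, and the short answer labels used as dictionary keys are assumptions introduced here, while the item counts, the two-step rule for question B, the MDQ threshold of 7, and the BSDS point values follow the text above.

```python
from typing import Dict, Sequence

# Points for the second part of the BSDS. The short keys are hypothetical
# labels for the four answer options quoted in Section 2.3.3.
BSDS_FIT_POINTS = {"very well": 6, "fairly well": 4,
                   "to some degree": 2, "not at all": 0}


def wbi_scores(question_a: Sequence[bool],
               question_b_kj: Sequence[bool],
               question_b_cidi: Sequence[bool]) -> Dict[str, int]:
    """Count "yes" answers for the three WBI subtype scales."""
    if (len(question_a), len(question_b_kj), len(question_b_cidi)) != (5, 30, 4):
        raise ValueError("expected 5 question A items and 30 + 4 question B items")
    if not any(question_a):
        # Two-step rule: "no" to every question A item means question B
        # is treated as all "no".
        question_b_kj = [False] * 30
        question_b_cidi = [False] * 4
    a = sum(question_a)
    return {
        "WBI-A": a,                                    # range 0 to 5
        "WBI-AB4": a + sum(question_b_cidi),           # range 0 to 9
        "WBI-AB": a + sum(question_b_kj) + sum(question_b_cidi),  # range 0 to 39
    }


def mdq_positive(symptoms: Sequence[bool], co_occurred: bool, impairment: str,
                 threshold: int = 7, symptoms_only: bool = False) -> bool:
    """Apply the MDQ positive-screen rule to the 13 symptom items."""
    if len(symptoms) != 13:
        raise ValueError("expected 13 symptom items")
    enough_symptoms = sum(symptoms) >= threshold
    if symptoms_only:
        # Symptom count alone, ignoring the second and third parts.
        return enough_symptoms
    return enough_symptoms and co_occurred and impairment in ("moderate", "serious")


def bsds_total(sentences: Sequence[bool], fit: str) -> int:
    """Sum the 19 checked sentences and the points for the fit rating (0 to 25)."""
    if len(sentences) != 19:
        raise ValueError("expected 19 sentences")
    return sum(sentences) + BSDS_FIT_POINTS[fit]
```

The `symptoms_only` flag is a hypothetical convenience: the footnote to Table 3 states that this study defined an MDQ positive screen by the symptom count alone, and `symptoms_only=True` mirrors that definition.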

2.3.4. Demographic characteristics

Demographic data, such as age, gender, education, and occupation, were also collected.

2.4. Reference standard

Participants were diagnosed by their attending psychiatrist according to the International Statistical Classification of Diseases and Related Health Problems, 10th revision (ICD-10) criteria, and this diagnosis served as the reference standard (World Health Organization, 2004).

2.5. Statistical analyses

Sensitivity and specificity for each possible WBI (WBI-A, WBI-AB4, WBI-AB), MDQ and BSDS score were plotted using the psychiatrists' diagnoses according to ICD-10 as the standard. Sensitivity and specificity at various symptom threshold cut-off scores were calculated in order to determine the optimal screening threshold. The performance of the three subtype scales of the WBI (WBI-A, WBI-AB4, WBI-AB), the MDQ and the BSDS was analyzed using receiver operating characteristic (ROC) curves. Areas under the ROC curves (AUCs) and their 95% confidence intervals were calculated by the non-parametric method and compared among the screening scales. The optimal cut-off point for each scale was determined according to the Youden index (Fluss et al., 2005), which is calculated as: Youden index = maximum (sensitivity + specificity − 1). The stratum-specific likelihood ratio (SSLR) was also calculated as the proportion of respondents with the disorder whose screening scale scores fell in a given range divided by the proportion of respondents without the disorder whose scores fell in the same range (Peirce and Cornell, 1993). This approach has been proposed as more informative than the single-threshold approach regardless of the prevalence of the target disorder and spectrum bias (Ransohoff and Feinstein, 1978). In general, an SSLR over 10 makes the target disorder highly probable, whereas one smaller than 0.1 usually rules it out. Those between 5 and 10 or between 0.1 and 0.2 are often very informative. Those between 0.5 and 2 would be of little assistance in the diagnosis (Furukawa et al., 2001; Jaeschke et al., 1994). All analyses were performed using SPSS 17.0 for Windows (IBM Corp., USA) and MedCalc for Windows 14 (MedCalc Software, Belgium). Methods and results were reported according to the STARD guideline checklist.

2.6. Ethics

The Research Ethics Review Board of the University of Tokyo, Graduate School of Medicine (no. 2652) approved the study procedures. The aims and procedures of the study were fully explained to participants, including the information that non-participants would not be disadvantaged, and participants voluntarily gave written informed consent.

3. Results

3.1. Participants' characteristics

The present study was conducted from September to November 2009. Eighty-one patients were given an explanation of this study and 55 agreed to participate (response rate = 67.9%). Table 1 shows the participants' characteristics, and the STARD flow chart is provided in Fig. 1. Of the 55 respondents, 27 have been diagnosed with bipolar disorder (F31) and 28 have been diagnosed with other common mental disorders. In the non-bipolar group, 10

Table 1 Demographic characteristics in the bipolar disorder group (N = 27) and the non-bipolar group (N = 28).

| Characteristic | Bipolar disorder group (N = 27) a | Non-bipolar group (N = 28) a | p** |
| --- | --- | --- | --- |
| Age (years), average (SD) | 39.5 (6.3) | 39.9 (7.8) | 0.82 |
| Sex, n (%) | | | 1.00 |
| Male | 24 (88.9) | 25 (89.3) | |
| Female | 3 (11.1) | 3 (10.7) | |
| Education (years), average (SD) | 16.3 (1.3) | 16.0 (1.3) | 0.33 |
| Occupational status, n (%) | | | 0.52 |
| Manager | 4 (14.8) | 3 (10.7) | |
| Profession | 12 (44.4) | 11 (39.3) | |
| Clerk | 7 (25.9) | 11 (39.3) | |
| Service | 0 (0) | 0 (0) | |
| Blue-collar | 2 (7.4) | 0 (0) | |
| Others | 2 (7.4) | 3 (10.7) | |

a Diagnoses were made according to ICD-10. The non-bipolar group included 10 cases of depressive disorder (F32), 6 cases of recurrent depressive disorder (F33), 7 cases of phobic anxiety disorders (F40), 1 case of other anxiety disorders (F41), 1 case of obsessive–compulsive disorder (F42), and 3 cases of reaction to severe stress, and adjustment disorders (F43).
** t-test or chi-square test.



Fig. 1. STARD flow diagram; positive/negative results show the number using the optimal cutoff-point of this study.

have been diagnosed with depressive disorder (F32), six with recurrent depressive disorder (F33), seven with phobic anxiety disorders (F40), one with other anxiety disorders (F41), one with obsessive–compulsive disorder (F42), and three with reaction to severe stress, and adjustment disorders (F43). There were no significant differences in demographic characteristics between the bipolar and non-bipolar groups. In both groups, most participants were males, professionals or clerks, and university graduates. There were no missing data in this survey.

Reliability of each scale and comparison of the average scores of the scales between the two groups

Cronbach's alpha coefficients for the internal consistency reliability of each scale were 0.65 for WBI-A, 0.94 for WBI-AB, 0.78 for WBI-AB4, 0.84 for MDQ, and 0.88 for BSDS. Table 2 shows the average scores of the scales in the two groups. Respondents with bipolar disorder had significantly greater average scores on all scales than the non-bipolar respondents (P < 0.01).

3.2. Comparison of the screening performances of each scale

Fig. 2 shows the ROC curves and the AUCs of each scale. All scales showed moderate accuracy on AUC. The BSDS showed the greatest AUC for screening bipolar disorder, followed by MDQ, WBI-AB4 and WBI-AB. The AUC for WBI-A was the lowest. The pairwise comparisons of the AUCs among scales were not significant. Table 3 shows the sensitivity, specificity, optimal cut-off point, and Youden index for each scale. The Youden index was highest for WBI-AB4 (0.53), whose optimal cut-off point was 3 or more, with a sensitivity of 0.78 and a specificity of 0.75. The sensitivities of WBI-A and WBI-AB4 were slightly greater than those of the other scales. The specificity of the MDQ using the original cut-off point was the highest (1.0). Workplace Bipolar

Table 2 Comparison of average scores of each scale.

| Scale | Bipolar disorder group (N = 27), average (SD) | Non-bipolar group (N = 28), average (SD) | p* |
| --- | --- | --- | --- |
| WBI-A | 2.6 (1.6) | 1.3 (1.3) | < 0.01 |
| WBI-AB | 15.7 (10.7) | 6.7 (6.4) | < 0.01 |
| WBI-AB4 | 4.2 (2.7) | 1.9 (1.8) | < 0.01 |
| MDQ | 6.0 (3.6) | 2.7 (2.4) | < 0.001 |
| BSDS | 10.9 (5.6) | 4.7 (4.5) | < 0.001 |

* t-test.

Inventory-AB and BSDS using the original cut-off point were also high (0.96 and 0.89, respectively). Using the original cut-off point, the MDQ and the BSDS showed lower sensitivity than with the optimal cut-off point derived in the present study. Table 4 presents the SSLRs of each scale for bipolar disorder. WBI-A, WBI-AB4, MDQ and BSDS were categorized into three strata, and WBI-AB into two strata. In the highest stratum, WBI-AB, WBI-AB4, MDQ and BSDS showed SSLRs of 5.0 or more, and WBI-AB showed the highest SSLR (SSLR = 15.56). In the lowest stratum, BSDS showed the lowest SSLR (SSLR = 0.07).
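The Youden index and SSLRs reported in this section can be recomputed from the stratum counts published in Table 4. The sketch below does this for WBI-AB4 only. The function and variable names are assumptions introduced for illustration, and because only grouped counts are published, only cut-offs that fall on stratum boundaries (3 or more, 6 or more) can be checked this way.

```python
# Counts of (bipolar, non-bipolar) respondents per WBI-AB4 score stratum,
# copied from Table 4.
WBI_AB4_STRATA = {"0-2": (6, 21), "3-5": (12, 6), "6-9": (9, 1)}


def stratum_specific_lrs(strata):
    """SSLR = share of bipolar cases in a stratum / share of non-bipolar cases.

    Assumes every stratum contains at least one non-bipolar respondent,
    which holds for all strata in Table 4.
    """
    n_pos = sum(pos for pos, _ in strata.values())
    n_neg = sum(neg for _, neg in strata.values())
    return {label: (pos / n_pos) / (neg / n_neg)
            for label, (pos, neg) in strata.items()}


def sens_spec_youden(strata, positive_labels):
    """Treat the listed strata as screen-positive and return (Se, Sp, Se + Sp - 1)."""
    n_pos = sum(pos for pos, _ in strata.values())
    n_neg = sum(neg for _, neg in strata.values())
    true_pos = sum(strata[label][0] for label in positive_labels)
    false_pos = sum(strata[label][1] for label in positive_labels)
    sensitivity = true_pos / n_pos
    specificity = 1 - false_pos / n_neg
    return sensitivity, specificity, sensitivity + specificity - 1


if __name__ == "__main__":
    for label, lr in stratum_specific_lrs(WBI_AB4_STRATA).items():
        print(label, round(lr, 2))
    # Cut-off "3 or more": strata 3-5 and 6-9 count as screen-positive.
    print([round(v, 2) for v in sens_spec_youden(WBI_AB4_STRATA, ["3-5", "6-9"])])
    # Cut-off "6 or more": only the top stratum counts as screen-positive.
    print([round(v, 2) for v in sens_spec_youden(WBI_AB4_STRATA, ["6-9"])])
```

Under these assumptions the 3-or-more cut-off gives the larger of the two Youden values, which agrees with the optimal cut-off reported for WBI-AB4.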

4. Discussion

The present study examined the performance of a newly developed self-report screening instrument for workers on sick leave due to mental health problems. All the screening instruments showed significantly higher scores in the bipolar disorder group than in the non-bipolar group. On the AUC scores, BSDS showed



AUCs (95% CIs): WBI-A, 0.73 (0.59 to 0.87); WBI-AB, 0.74 (0.61 to 0.88); WBI-AB4, 0.76 (0.62 to 0.89); MDQ, 0.76 (0.63 to 0.89); BSDS, 0.83 (0.72 to 0.94).

Fig. 2. The ROC curves for each scale differentiating bipolar respondents (N = 27) from non-bipolar respondents (N = 28).
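One common non-parametric estimate of the AUC is the Mann-Whitney form: the probability that a randomly chosen bipolar respondent scores higher than a randomly chosen non-bipolar respondent, counting ties as one half. The source does not say which non-parametric estimator its software used, so the sketch below is an assumption about the general approach, with hypothetical function names. Because individual scores are not published, it is applied to the three BSDS strata from Table 4, which coarsens the scores and therefore gives a value somewhat below the 0.83 reported from the raw data.

```python
def auc_mann_whitney(pos_scores, neg_scores):
    """AUC as P(pos > neg) + 0.5 * P(pos == neg) over all case/non-case pairs."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def expand_strata(counts):
    """Turn per-stratum counts into ordinal ranks 0, 1, 2, ... (one per respondent)."""
    ranks = []
    for rank, count in enumerate(counts):
        ranks.extend([rank] * count)
    return ranks


# BSDS strata 0-4, 5-14 and 15-25 from Table 4: bipolar and non-bipolar counts.
bsds_bipolar = expand_strata([1, 18, 8])
bsds_non_bipolar = expand_strata([15, 12, 1])

if __name__ == "__main__":
    # Coarsened AUC from grouped data, rounded to two decimals.
    print(round(auc_mann_whitney(bsds_bipolar, bsds_non_bipolar), 2))
```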

Table 3 Cut-off scores of scales.

| Scale | Optimal cut-off point | Sensitivity (SE) | Specificity (SE) | Youden index (Se + Sp − 1) |
| --- | --- | --- | --- | --- |
| WBI-A | 2+ | 0.78 (0.08) | 0.68 (0.09) | 0.46 |
| WBI-AB | 17+ | 0.56 (0.10) | 0.96 (0.04) | 0.52 |
| WBI-AB4 | 3+ | 0.78 (0.08) | 0.75 (0.08) | 0.53 |
| MDQ a | 5+ | 0.67 (0.09) | 0.75 (0.08) | 0.42 |
| MDQ a | 7+ b | 0.30 (0.09) | 1.00 (0.00) | 0.30 |
| BSDS | 8+ | 0.67 (0.09) | 0.86 (0.07) | 0.52 |
| BSDS | 13+ b | 0.41 (0.09) | 0.89 (0.06) | 0.30 |

a A positive screen is defined as endorsement of any 7 of the 13 questions, excluding the questions about simultaneity of symptoms and impairment (the second and the third parts).
b Standard cut-off score for these scales; otherwise, the best cut-off scores in the present sample.

Table 4 Stratum-specific likelihood ratios (SSLRs) of the scales, comparing bipolar disorder and non-bipolar disorder groups.

| Scale | Score | Bipolar/non-bipolar | SSLR | 95% CI |
| --- | --- | --- | --- | --- |
| WBI-A | 0–1 | 6/19 | 0.33 | 0.16–0.69 |
| WBI-A | 2 | 5/4 | 1.30 | 0.39–4.32 |
| WBI-A | 3–5 | 16/5 | 3.32 | 1.41–7.79 |
| WBI-AB | 0–16 | 12/27 | 0.46 | 0.30–0.71 |
| WBI-AB | 17–39 | 15/1 | 15.56 | 2.20–109.78 |
| WBI-AB4 | 0–2 | 6/21 | 0.30 | 0.14–0.62 |
| WBI-AB4 | 3–5 | 12/6 | 2.07 | 0.91–4.73 |
| WBI-AB4 | 6–9 | 9/1 | 9.33 | 1.27–68.77 |
| MDQ | 0–2 | 6/17 | 0.37 | 0.17–0.79 |
| MDQ | 3–7 | 10/10 | 1.04 | 0.52–2.08 |
| MDQ | 8–13 | 11/1 | 11.41 | 1.58–82.43 |
| BSDS | 0–4 | 1/15 | 0.07 | 0.01–0.49 |
| BSDS | 5–14 | 18/12 | 1.56 | 0.94–2.58 |
| BSDS | 15–25 | 8/1 | 8.30 | 1.11–61.93 |

the highest score (0.83), although the differences in AUC among the scales were not significant. On the Youden index, WBI-AB4 showed the highest score (0.53) among the instruments. In the highest stratum, WBI-AB, WBI-AB4, MDQ, and BSDS showed relatively high positive SSLRs (more than 5). In the lowest stratum, BSDS showed the most informative negative SSLR (0.07).

Among the WBI subscales, WBI-AB4 (9 items) showed the most informative screening performance. WBI-AB4 showed the highest sensitivity (0.78) and a relatively high specificity (0.75) among the instruments in this study. At the optimal cut-off point (3 or more), WBI-AB4 would be useful for occupational mental health staff to screen out bipolarity among workers who have depressive symptoms at the workplace; however, more information about manic/hypomanic episodes in suspected cases is needed from their supervisors, colleagues, or family. The SSLR of WBI-AB4 in the highest stratum was also acceptable (9.33). A previous study reported that 53.2% of psychiatric outpatients who had not been diagnosed with bipolar disorder but had recurrent and current depressive episodes were classified as having bipolar disorder according to the SCID (Kim et al.,

2008). If the WBI-AB4 is applied to currently depressed workers who also have recurrent depressive episodes, the prior probability of bipolar disorder is estimated at about 50%, and the positive predictive value is calculated as 82% at a cut-off point of 6 or more. This would help to avoid misdiagnosing workers with bipolar disorder.

The WBI-AB showed the highest SSLR, with a low sensitivity (0.56) at the optimal cut-off point. Most of the items of the WBI were collected from reports by occupational mental health practitioners on the specific symptoms or behaviors they had observed in cases of bipolar disorder type I and II at the workplace. These items may be too specific, which may lead to low sensitivity (0.56) and high specificity (0.96). In further investigations, more accurate items could be selected for the WBI based on the results of this study.

The sensitivity and specificity of the MDQ and BSDS at the optimal cut-off points in the present study were similar to those in the previous meta-analysis using the developer-recommended cut-off points (Carvalho et al., 2014). In this study, at the recommended cut-off points, the sensitivities of the MDQ and BSDS were lower and the specificities were higher. When the MDQ and the BSDS are used in a Japanese working population, it would be better to apply a lower cut-off point than the one recommended by the developers. The BSDS showed the highest AUC (0.83) and better specificity (0.86), although with low sensitivity (0.67). Both its highest- and lowest-stratum SSLRs were informative. The BSDS may be useful for screening in or screening out workers who have bipolar disorder at the workplace. The MDQ showed the lowest Youden index (0.42) in the present study. Many participants of this study may have had features of bipolar disorder type II (i.e., hypomanic symptoms), which may explain why the MDQ showed the lowest screening performance in this study.
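The link between prior probability and positive predictive value that the discussion relies on follows from Bayes' rule. The sketch below is a generic illustration and not the authors' calculation: it uses the WBI-AB4 sensitivity and specificity at the 3-or-more cut-off from Table 3, the prior probabilities are assumed values chosen only to show the trend, and it does not attempt to reproduce the 82% figure quoted for the 6-or-more cut-off.

```python
def positive_predictive_value(sensitivity, specificity, prior):
    """P(disorder | positive screen) from Bayes' rule."""
    true_positive_rate = sensitivity * prior
    false_positive_rate = (1 - specificity) * (1 - prior)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


if __name__ == "__main__":
    # Sensitivity 0.78 and specificity 0.75: WBI-AB4 at the 3+ cut-off (Table 3).
    # The priors below are illustrative assumptions, not estimates from the study.
    for prior in (0.50, 0.10, 0.02):
        ppv = positive_predictive_value(0.78, 0.75, prior)
        print(f"prior={prior:.2f}  PPV={ppv:.2f}")
```

With fixed sensitivity and specificity, the predictive value of a positive screen falls as the prior probability falls, which is why a result obtained in a return-to-work clinic may not carry over to a general working population.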


5. Limitations

The present study has some limitations. The major limitation was that a well-validated diagnostic method, that is, the Structured Clinical Interview for DSM-IV (SCID) or the CIDI, was not applied. In addition, participants were diagnosed not by an independent research assistant but by their attending psychiatrists, although the psychiatrists were blinded to the results of the screening instruments. The reliability and validity of the diagnoses may be inadequate, even though the diagnoses were made by well-trained psychiatrists. The second limitation was that the present study did not assess bipolar disorder type I or type II separately, because these cannot be distinguished according to the diagnostic criteria of ICD-10. Previous studies reported that the MDQ showed higher sensitivity for bipolar I disorder (Gervasoni et al., 2009; Miller et al., 2004; Twiss et al., 2008). The WBI may also show different screening performance for bipolar disorder type I and type II. Third, the participants of this study were recruited from one psychiatric outpatient clinic for return-to-work in Japan. Most participants were male and university graduates, and all of them participated in the return-to-work program. In this program, they received lectures about their mental illness and how to manage their symptoms. A previous study reported that patients' insight into their mental illness affects screening performance (Miller et al., 2004). The participants in this study had greater insight into their mental illness than the general working population, so generalization of the present findings may be limited. In addition, a previous study reported that screening instruments can rule out bipolarity when patients have insight into their symptoms, but do not effectively rule it in at lower prevalence or prior probabilities, as in the community or primary care setting (Phelps and Ghaemi, 2006). This may also occur in the workplace setting. A further diagnostic accuracy study should be conducted to examine whether the WBI is effective for screening for bipolar disorder in the workplace.

Role of funding source

The present study was supported by the Health and Labour Sciences Research Grants, Research on Psychiatric and Neurological Diseases and Mental Health, from the Japan Ministry of Health, Labour and Welfare. The sponsor of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report. The authors had access to the data in the study and the final responsibility to submit the paper.

Conflict of interest

We have read the journal's policy and have the following conflicts: KI is employed part-time by Chugai Pharmaceutical Company and Medical Care Toranomon as a clinical psychologist. NK has received honoraria for speaking at CME meetings sponsored by GlaxoSmithKline, Eizai and Pfizer. He is on the advisory board for Sekisui Chemicals and Junpukai Health Care Center. He has received royalties from Chuo-Hoki-Shuppan, Igaku-Shoin, Kyobun-do, Life Science, Maruzen, Nanko-do, Nanzan-do, and Fujitsu Software Technologies, Ltd., and research grants from Fujitsu Software Technologies, Ltd., Softbank, Co., Ltd., and Japan Management Association. YN has received royalties from Chuo-Hoki-Shuppan and Minervashobo. YI has received honoraria for speaking at meetings sponsored by GlaxoSmithKline, Eizai, Daiichi-Sankyo Pharmaceutical Company, Meiji-seika Pharma Company, and Ohtsuka Pharmaceutical Company.

Acknowledgment

We appreciate the help of the following persons in completing this project: Dr. Tanaka and Dr. Utsumi.

Appendix A. Supporting information

Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.jad.2015.02.034.

References

References

American Psychiatric Association, 2000. Diagnostic and statistical manual of mental disorders: DSM-IV-TR. American Psychiatric Pub, Washington, DC.
Angst, J., Adolfsson, R., Benazzi, F., Gamma, A., Hantouche, E., Meyer, T.D., Skeppar, P., Vieta, E., Scott, J., 2005. The HCL-32: towards a self-assessment tool for hypomanic symptoms in outpatients. J. Affect. Disord. 88, 217–233.
Berk, M., Dodd, S., 2005. Bipolar II disorder: a review. Bipolar Disord. 7, 11–21.
Carvalho, A.F., Takwoingi, Y., Sales, P.M., Soczynska, J.K., Kohler, C.A., Freitas, T.H., Quevedo, J., Hyphantis, T.N., McIntyre, R.S., Vieta, E., 2014. Screening for bipolar spectrum disorders: a comprehensive meta-analysis of accuracy studies. J. Affect. Disord. 172C, 337–346.
Das, A.K., Olfson, M., Gameroff, M.J., Pilowsky, D.J., Blanco, C., Feder, A., Gross, R., Neria, Y., Lantigua, R., Shea, S., Weissman, M.M., 2005. Screening for bipolar disorder in a primary care practice. J. Am. Med. Assoc. 293, 956–963.
Dunner, D.L., 2003. Clinical consequences of under-recognized bipolar spectrum disorder. Bipolar Disord. 5, 456–463.
Fagiolini, A., Forgione, R., Maccari, M., Cuomo, A., Morana, B., Dell'Osso, M.C., Pellegrini, F., Rossi, A., 2013. Prevalence, chronicity, burden and borders of bipolar disorder. J. Affect. Disord. 148, 161–169.
Fluss, R., Faraggi, D., Reiser, B., 2005. Estimation of the Youden Index and its associated cutoff point. Biom. J. 47, 458–472.
Furukawa, T.A., Goldberg, D.P., Rabe-Hesketh, S., Ustun, T.B., 2001. Stratum-specific likelihood ratios of two versions of the general health questionnaire. Psychol. Med. 31, 519–529.
Gardner, H.H., Kleinman, N.L., Brook, R.A., Rajagopalan, K., Brizee, T.J., Smeeding, J.E., 2006. The economic impact of bipolar disorder in an employed population from an employer perspective. J. Clin. Psychiatry 67, 1209–1218.
Gervasoni, N., Weber Rouget, B., Miguez, M., Dubuis, V., Bizzini, V., Gex-Fabry, M., Bondolfi, G., Aubry, J.M., 2009. Performance of the Mood Disorder Questionnaire (MDQ) according to bipolar subtype and symptom severity. Eur. Psychiatry 24, 341–344.
Ghaemi, S.N., Ko, J.Y., Goodwin, F.K., 2001. The bipolar spectrum and the antidepressant view of the world. J. Psychiatr. Pract. 7, 287–297.
Ghaemi, S.N., Miller, C.J., Berv, D.A., Klugman, J., Rosenquist, K.J., Pies, R.W., 2005. Sensitivity and specificity of a new bipolar spectrum diagnostic scale. J. Affect. Disord. 84, 273–277.
Glick, I.D., 2004. Undiagnosed bipolar disorder: new syndromes and new treatments. Prim. Care Companion J. Clin. Psychiatry 6, 27–33.
Hirschfeld, R.M., Williams, J.B., Spitzer, R.L., Calabrese, J.R., Flynn, L., Keck Jr., P.E., Lewis, L., McElroy, S.L., Post, R.M., Rapport, D.J., Russell, J.M., Sachs, G.S., Zajecka, J., 2000. Development and validation of a screening instrument for bipolar spectrum disorder: the Mood Disorder Questionnaire. Am. J. Psychiatry 157, 1873–1875.
Jaeschke, R., Guyatt, G.H., Sackett, D.L., 1994. Users' guides to the medical literature. III. How to use an article about a diagnostic test. B. What are the results and will they help me in caring for my patients? The Evidence-Based Medicine Working Group. JAMA 271, 703–707.
Kawakami, N., Takeshima, T., Ono, Y., Uda, H., Nakane, Y., Nakamura, Y., Tachimori, H., Iwata, N., Nakane, H., Watanabe, M., 2008. Twelve-month prevalence, severity, and treatment of common mental disorders in communities in Japan: the World Mental Health Japan 2002–2004 survey. The WHO world mental health surveys: global perspectives on the epidemiology of mental disorders, 474–485.
Kessler, R.C., Akiskal, H.S., Ames, M., Birnbaum, H., Greenberg, P., Hirschfeld, R.M., Jin, R., Merikangas, K.R., Simon, G.E., Wang, P.S., 2006. Prevalence and effects of mood disorders on work performance in a nationally representative sample of U.S. workers. Am. J. Psychiatry 163, 1561–1568.
Kim, B., Wang, H.R., Son, J.I., Kim, C.Y., Joo, Y.H., 2008. Bipolarity in depressive patients without histories of diagnosis of bipolar disorder and the use of the Mood Disorder Questionnaire for detecting bipolarity. Compr. Psychiatry 49, 469–475.
Laxman, K.E., Lovibond, K.S., Hassan, M.K., 2008. Impact of bipolar disorder in employed populations. Am. J. Manag. Care 14, 757–764.
Mantere, O., Suominen, K., Leppamaki, S., Valtonen, H., Arvilommi, P., Isometsa, E., 2004. The clinical characteristics of DSM-IV bipolar I and II disorders: baseline findings from the Jorvi Bipolar Study (JoBS). Bipolar Disord. 6, 395–405.

Michalak, E.E., Yatham, L.N., Maxwell, V., Hale, S., Lam, R.W., 2007. The impact of bipolar disorder upon work functioning: a qualitative analysis. Bipolar Disord. 9, 126–143.
Miller, C.J., Klugman, J., Berv, D.A., Rosenquist, K.J., Ghaemi, S.N., 2004. Sensitivity and specificity of the Mood Disorder Questionnaire for detecting bipolar disorder. J. Affect. Disord. 81, 167–171.
Miller, S., Dell'Osso, B., Ketter, T.A., 2014. The prevalence and burden of bipolar depression. J. Affect. Disord. 169S1, S3–S11.
Peirce, J.C., Cornell, R.G., 1993. Integrating stratum-specific likelihood ratios with the analysis of ROC curves. Med. Decis. Mak. 13, 141–151.
Phelps, J.R., Ghaemi, S.N., 2006. Improving the diagnosis of bipolar disorder: predictive value of screening tests. J. Affect. Disord. 92, 141–148.
Ransohoff, D.F., Feinstein, A.R., 1978. Problems of spectrum and bias in evaluating the efficacy of diagnostic tests. N. Engl. J. Med. 299, 926–930.
Scupin, R., 1997. The KJ method: a technique for analyzing data derived from Japanese ethnology. Hum. Organ. 56, 233–237.
Skeppar, P., Adolfsson, R., 2006. Bipolar II and the bipolar spectrum. Nord. J. Psychiatry 60, 7–26.
Stang, P., Frank, C., Ulcickas Yood, M., Wells, K., Burch, S., 2007. Impact of bipolar disorder: results from a screening study. Prim. Care Companion J. Clin. Psychiatry 9, 42–47.
Tanaka, T., Koyama, T., 2011. Rating scales for bipolar disorder: in view of the debate over underdiagnosis and overdiagnosis of bipolar disorder (in Japanese). Rinsho-Seisinigaku 40, 251–259.
Twiss, J., Jones, S., Anderson, I., 2008. Validation of the Mood Disorder Questionnaire for screening for bipolar disorder in a UK sample. J. Affect. Disord. 110, 180–184.
World Health Organization, 2004. International Statistical Classification of Diseases and Related Health Problems. World Health Organization, Geneva.
