Journal of Psychosomatic Research 76 (2014) 374–379


Validity and reliability of the Brief Insomnia Questionnaire in the general population in Hong Kong

Ka-Fai Chung a,⁎, Wing-Fai Yeung a, Fiona Yan-Yee Ho a, Lai-Ming Ho b, Kam-Ping Yung a, Yee-Man Yu a, Chi-Wa Kwok a

a Department of Psychiatry, The University of Hong Kong, Hong Kong, China
b School of Public Health, The University of Hong Kong, Hong Kong, China

Article info

Article history: Received 18 October 2013; received in revised form 12 March 2014; accepted 13 March 2014

Keywords: Diagnosis; DSM-IV-TR; DSM-5; ICD-10; ICSD-2; Insomnia

Abstract

Objectives: The Brief Insomnia Questionnaire (BIQ) was first validated in the U.S. for insomnia disorders according to the Diagnostic and Statistical Manual, Fourth Edition, Text Revision (DSM-IV-TR), International Classification of Diseases, Tenth Edition (ICD-10) and research diagnostic criteria/International Classification of Sleep Disorders, Second Edition (RDC/ICSD-2). We aimed to determine the validity and reliability of a Hong Kong Chinese version of the BIQ for deriving DSM-5 insomnia disorder, in addition to the other insomnia diagnoses, in a general population sample.

Methods: Probability subsamples of respondents to a population-based epidemiological survey (n = 2011) completed test–retest (n = 120) and clinical reappraisal (n = 176) interviews.

Results: Short-term test–retest reliability was moderate for most BIQ items (Pearson r > 0.40), except for the number of nights with problems staying asleep, amount of time awake at night, duration of sleep problems and sleep onset latency. The areas under the receiver operating characteristic curve for DSM-IV-TR, DSM-5, ICD-10 and RDC/ICSD-2 insomnia disorder ranged from 0.76 to 0.86, indicating high individual-level concordance between BIQ and clinical-interview diagnoses. The use of super-normal controls and BIQ symptom-level data further improved diagnostic concordance. Prevalence estimates based on the dichotomous BIQ classification were comparable with estimates based on clinical interviews for DSM-5, RDC/ICSD-2 and any of the DSM-IV-TR, ICD-10 and RDC/ICSD-2 insomnia disorders.

Conclusion: The Hong Kong Chinese version of the BIQ generates accurate prevalence estimates for insomnia disorders in the general population. Modification of the BIQ scoring algorithms and use of trained interviewers may further improve its diagnostic performance.

© 2014 Elsevier Inc. All rights reserved.

⁎ Corresponding author at: Department of Psychiatry, The University of Hong Kong, Pokfulam Road, Hong Kong, China. Tel.: +852 22554487; fax: +852 28551345. E-mail address: [email protected] (K.-F. Chung).

http://dx.doi.org/10.1016/j.jpsychores.2014.03.002
0022-3999/© 2014 Elsevier Inc. All rights reserved.

Introduction

Insomnia is a distressing and disabling condition with significant public health implications. Epidemiological studies have reported insomnia prevalence ranging from 6% to 48% [1]. In Hong Kong, the point prevalence of insomnia in the general population was estimated at 11.9% in one study [2] and 39.4% in another [3]. Estimates of insomnia prevalence vary widely because of 2 major sources of variability [4]. The first is criterion variance, which occurs when different sets of rules are used for diagnosis. Studies have used different definitions of insomnia symptoms, different frequency and duration criteria, and different definitions of significant distress and daytime impairment. Criterion variance can be reduced by using standardized diagnostic criteria. The best known diagnostic criteria for insomnia are the Diagnostic and Statistical Manual (DSM) of the American Psychiatric Association, the International Classification of Diseases (ICD) of the World Health Organization and the International Classification of Sleep Disorders (ICSD) endorsed by various national professional sleep societies, including the American Academy of Sleep Medicine. The other source of variability is information variance, which occurs when different interviewers collect different levels and types of data about an individual. The use of standardized diagnostic instruments can reduce information variance; however, few such instruments exist for insomnia. The only diagnostic instrument that has been used is the Sleep-EVAL interview developed by Ohayon [5]. Although Sleep-EVAL is fully structured and able to derive DSM and ICSD diagnoses, the system is generally not available to the research community. Recently, a standardized questionnaire, the Brief Insomnia Questionnaire (BIQ), was developed for use in the America Insomnia Survey, an epidemiological survey of over 10,000 managed health care plan subscribers [6]. The BIQ is able to detect insomnia disorder according to the DSM, Fourth Edition, Text Revision (DSM-IV-TR) [7], ICD, Tenth Edition (ICD-10) [8] and research diagnostic criteria/ICSD, Second Edition (RDC/ICSD-2) [9,10]. It therefore has an advantage over instruments designed for a single system, such as the Athens Insomnia Scale [11]: insomnia prevalence according to various standardized diagnostic criteria can be obtained with a single, easy-to-use, lay-administered questionnaire. With 2 additional questions and an extra item in the impairment section, the BIQ should also be able to generate insomnia diagnoses according to the recent DSM-5 criteria [12]. A brief outline of the DSM-5 criteria is presented in Appendix A. The purpose of this study is to examine the validity and reliability of a Hong Kong Chinese version of the BIQ in a general population sample.

Method

Sample

The study population consisted of Hong Kong residents older than 18 years who were able to communicate in Cantonese or Mandarin. Randomization had 2 stages: random selection of telephone numbers and random selection of a respondent within each household. Telephone numbers in Hong Kong are listed in telephone directories automatically unless the customer requests that the number be withheld. As of September 2012, the fixed telephone line density in Hong Kong was 102 lines per 100 households, among the highest in the world [13]. We selected telephone numbers randomly, with no stratification, from computerized residential telephone directories and generated some unlisted numbers by adding and subtracting 1 and 2 from the selected numbers [14]. Duplicate numbers were screened out. Within each household, the respondent was selected by asking to speak to the person whose birthday was coming up next. This technique is commonly used to overcome the respondent selection bias associated with interviewing the household member most likely to answer the phone. A recent review detected no significant differences in demographic distribution between "next birthday" and true probability samples [15]. Verbal consent was obtained from all participants, and all study procedures were reviewed and approved by the local institutional review board.
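The two randomization steps above can be illustrated with a short Python sketch. It is not the survey organization's actual procedure: telephone numbers are treated as plain integers, the helpers `expand_with_neighbours` and `next_birthday_member` and the household format are hypothetical, and a member whose birthday falls on the interview date is assumed to count as having the next birthday.

```python
import random
from datetime import date


def expand_with_neighbours(seed_numbers):
    """Add the +/-1 and +/-2 neighbours of each listed number so that some
    unlisted numbers can be reached; the set screens out duplicates."""
    pool = set()
    for number in seed_numbers:
        for delta in (-2, -1, 0, 1, 2):
            pool.add(number + delta)
    return sorted(pool)


def next_birthday_member(members, today):
    """members: list of (name, birth_month, birth_day) tuples (hypothetical format).
    Return the member whose birthday comes next, counting today as upcoming."""
    today_md = (today.month, today.day)

    def key(member):
        month_day = (member[1], member[2])
        # Birthdays earlier in the year than today wrap around to next year,
        # so they sort after every birthday still to come this year.
        return (month_day < today_md, month_day)

    return min(members, key=key)


# Usage with made-up data: draw 2 listed numbers, then add their neighbours.
rng = random.Random(2012)
directory = [21000000 + 37 * i for i in range(1000)]  # hypothetical listed numbers
sample = expand_with_neighbours(rng.sample(directory, 2))
household = [("A", 1, 5), ("B", 8, 1), ("C", 12, 25)]
respondent = next_birthday_member(household, date(2012, 7, 24))  # ("B", 8, 1)
```

Comparing (month, day) tuples avoids constructing dates, so a 29 February birthday needs no special handling in this sketch.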


Procedure

A fully structured, lay-administered telephone interview was conducted by the Public Opinion Programme, The University of Hong Kong. We successfully interviewed 2011 respondents from July 24 to December 6, 2012. The overall response rate was 64.3%. There were 1019 refusals at the household or respondent level and 97 partial responses. The first section of the interview comprised an introduction and verbal consent, followed by the Hong Kong Chinese version of the BIQ and then sociodemographics, including age, gender, occupation and level of education. The last section sought verbal consent to a further telephone interview about the respondent's sleep problems. In most cases, the telephone interview could be completed within 15 min. We initially planned to generate 2 subsamples of 200 subjects each, one for clinical reappraisal and one for test–retest. In line with the original BIQ validation study [6], we planned to oversample BIQ positives, with 80 cases, 65 subthreshold cases and 55 non-cases in each subsample. Initially, we randomly allocated respondents to clinical reappraisal and test–retest in a 1:1 ratio; midway through the survey, we noticed that the numbers of cases and subthreshold cases might not be sufficient for re-interview and therefore allocated the remaining cases and subthreshold cases to clinical reappraisal only. The final clinical reappraisal subsample consisted of 73 cases, 51 subthreshold cases and 52 non-cases, which allowed a Cohen's kappa (κ) of 0.7 to be estimated with a 2-sided 95% confidence interval (CI) of 0.1 [16]; the test–retest subsample consisted of 34 cases, 23 subthreshold cases and 63 non-cases, which yields a power of 0.9 to detect a Pearson correlation of 0.3 [17]. Table 1 presents the sociodemographic characteristics of the total sample and subsamples compared with the population census data.
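The stated power for the test–retest subsample can be checked with the Fisher z approximation for a Pearson correlation. The method behind reference [17] is not stated here, so this Python sketch is a plausibility check under that assumption rather than a reproduction; it also ignores the sampling weights, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def power_pearson(r, n, alpha=0.05):
    """Approximate 2-sided power to detect correlation r with n pairs.
    Fisher z: atanh(r_hat) is roughly normal with SE 1 / sqrt(n - 3)."""
    std_normal = NormalDist()
    shift = math.atanh(r) * math.sqrt(n - 3)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


def n_for_power(r, power, alpha=0.05):
    """Smallest n reaching the requested power under the same approximation."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return math.ceil(((z_crit + z_power) / math.atanh(r)) ** 2 + 3)


print(round(power_pearson(0.3, 120), 2))  # about 0.92
print(n_for_power(0.3, 0.9))              # 113
```

Under this approximation, 120 respondents give slightly more than the 0.9 power quoted above, and about 113 would be the minimum.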
Females were over-represented in the total sample and both subsamples. The clinical reappraisal subsample was very similar to the census population in educational level despite a slightly lower mean age, while the test–retest subsample was similar to the census population in mean age but had slightly more participants with secondary school education or higher. The telephone-based clinical reappraisal interviews were conducted in a blinded manner 2–14 days after the first interview by 2 senior authors (KC and WY), and the test–retest interviews were conducted within the same time period by 4 research team members. This re-interview interval was chosen to minimize memory effects and true changes in insomnia status while allowing some flexibility in contacting respondents and completing the assessments. No respondents were paid for participation.

Table 1
Socio-demographic characteristics of the total sample and subsamples compared with census population data.

| Variables | Hong Kong general population aged ≥ 18 yr (N = 5,999,455)ᵃ | Total sample (N = 2011) | Test–retest subsample (N = 120) | Clinical reappraisal subsample (N = 176) |
|---|---|---|---|---|
| Age in yr, mean (SD) | 46.51 (17.2) | 52.20 (17.9) | 48.69 (16.7) | 42.29 (17.5) |
| Sex, male/female | 1/1.18 | 686/1325 (1/1.93) | 45/75 (1/1.67) | 57/119 (1/2.09) |
| Education, N (%) | | | | |
| Primary | 23.7% | 520 (26.0) | 19 (15.8) | 41 (23.3) |
| Secondary | 48.1% | 993 (49.7) | 62 (51.7) | 84 (47.7) |
| Tertiary | 28.3% | 484 (24.2) | 39 (32.5) | 51 (29.0) |
| Marital status, N (%) | | | | |
| Never married | 28.8% | 415 (20.9) | 32 (26.7) | 39 (22.2) |
| Married | 60.1% | 1432 (72.1) | 77 (64.2) | 127 (72.2) |
| Divorced | 4.1% | 60 (3.0) | 8 (6.7) | 6 (3.4) |
| Cohabited, separated or widowed | 7.0% | 80 (4.0) | 3 (2.5) | 4 (2.3) |
| Occupation, N (%)ᵇ | | | | |
| Professional and associate professional | 22.0% | 313 (15.8) | 29 (24.4) | 35 (19.9) |
| Skilled and semi-skilled worker | 26.1% | 406 (20.4) | 21 (17.6) | 34 (19.3) |
| Unskilled worker | 11.8% | 80 (4.0) | 4 (3.4) | 3 (1.7) |
| Retired | 18.0% | 530 (26.7) | 28 (23.5) | 39 (22.2) |
| Students | 2.5% | 123 (6.2) | 11 (9.2) | 17 (9.7) |
| Homemakers/others | 17.5% | 475 (23.9) | 23 (19.3) | 44 (25.0) |
| Unemployed | 2.1% | 60 (3.0) | 3 (2.5) | 4 (2.3) |
| Income, N (%)ᵇ | | | | |
| No income | 40.3% | 960 (51.1) | 50 (45.0) | 85 (49.4) |
| <$10,000 | 23.2% | 368 (19.6) | 18 (16.2) | 35 (20.3) |
| $10,000–19,999 | 20.0% | 288 (15.3) | 28 (25.2) | 25 (14.5) |
| $20,000–29,999 | 7.3% | 141 (7.5) | 7 (6.3) | 11 (6.4) |
| >$30,000 | 9.2% | 122 (6.5) | 8 (7.2) | 16 (9.3) |

ᵃ Population census 2011; occupation and income data are based on the population aged ≥ 20 yr.
ᵇ Differences from the total N reflect omissions on reporting forms; income in HK$.

Measures

The BIQ was translated into Chinese according to World Health Organization guidelines [18], in steps comprising forward translation, expert review, back-translation, a second expert review, pretesting, and finalization. The expert panel consisted of experienced clinicians and researchers in sleep disorders. The first version of the Hong Kong Chinese BIQ was tested in a pilot sample of 25 friends and relatives of the research team. We edited the language structure and grammar according to their feedback. In addition, the respondents commented that questions SS2, SS3 and SS4–6 were somewhat difficult to answer (question numbers as in the original paper). These questions asked how much time respondents typically spend in bed watching TV, reading or talking to their partner, how much time they spend lying in bed trying to get to sleep but not sleeping, and how long they actually sleep on a week-night and on a weekend night. After reviewing the diagnostic algorithm, we considered that separate questions for weekdays and weekends were not strictly required. In the revised version, we asked subjects about the time spent in bed on these 3 activities on week-nights only. We added 2 questions to meet the DSM-5 diagnostic criteria for insomnia disorder [12]. One is on satisfaction with sleep quality and quantity (SL21a), with possible responses of very satisfied, rather satisfied, half–half, rather dissatisfied, and very dissatisfied.
The other additional question asks how often participants are able to return to sleep after waking up too early (SL10a), with possible responses of most of the time, sometimes, rarely, and never. Impulsivity and aggression associated with sleep complaints were added as an item (SL32a) in the section on impairments in daytime functioning. The second version of the Hong Kong Chinese BIQ was piloted on 15 subjects and then re-edited and approved by the research team for use in this study. The scoring algorithm for DSM-5 insomnia disorder is listed in Appendix A. The clinical reappraisal was conducted using a standardized semi-structured questionnaire developed specifically for BIQ validation [6]. It includes DSM-5, DSM-IV-TR, ICD-10 and RDC/ICSD-2 symptom checklists, with rating categories of definite, probable, possible and no for each symptom, and a classification as case or non-case under each diagnostic system. The first 32 clinical reappraisal interviews were conducted by the first author (KC), an experienced clinician and researcher, and were audiotaped for training purposes. The other interviewer (WY), after adequate training, conducted the remaining 144 clinical reappraisal assessments. Interviewer training included discussion of the diagnostic criteria, interviewing skills training, review of audiotaped interviews, and practice in administering and scoring the structured checklist. The first 56 interviews by WY were audiotaped for review. Anything found unclear during audiotape review was clarified by re-contacting the respondent. Kappa coefficients between KC and WY for all diagnostic categories were 1.0 based on the last 20 audiotaped interviews.
The test–retest interview began with an explanation that its purpose was to test the instrument, and that participants should not try to remember their responses in the initial interview but should instead answer according to their current situation and as accurately as possible.

Data analysis

All statistical analyses were performed with Stata 10.0. The analysis followed that of the original validation study so that results could be compared [6].

The subsamples were weighted to adjust for the oversampling of BIQ positives. Validity of the BIQ was assessed by the concordance between diagnoses based on the BIQ and diagnoses based on clinical reappraisal. Test–retest reliability was evaluated by the stability of responses over time. Validity and reliability were tested at both the aggregate and the individual level. At the aggregate level, we compared prevalence estimates based on the BIQ and on the clinical reappraisal using McNemar χ2 tests. Individual-level diagnostic concordance was evaluated using 2 descriptive measures, the area under the receiver operating characteristic curve (AUC) [19] and κ. We also report sensitivity (SN), specificity (SP), positive predictive value (PPV) and negative predictive value (NPV). The odds ratio (OR), which equals [SN × SP] / [(1 − SN) × (1 − SP)], was also used to assess concordance between diagnoses based on the BIQ and the clinical reappraisal. To determine whether BIQ symptom-level data improved the prediction of clinical diagnoses, we fitted a series of stepwise logistic-regression equations in which clinical diagnoses were treated as dichotomous outcomes and BIQ symptom variables were entered along with BIQ diagnoses as predictors. We compared the AUC based on the dichotomous BIQ classification with that of the continuous predicted probability based on the BIQ symptom-level data. In line with the original BIQ validation study, we excluded borderline cases to create a sample with super-normal controls and re-analyzed the psychometric performance of the BIQ. For test–retest analysis at the aggregate level, we compared test and retest interview means. At the individual level, we calculated Pearson correlations between reports made in the 2 interviews. Statistical significance in all of the above analyses was evaluated using 2-sided tests at the 0.05 level. The Taylor series linearization method was used to adjust standard errors of Pearson's correlation and of estimates in the regression models.
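All of the individual-level concordance measures above derive from the 2 × 2 table crossing BIQ and clinical-interview diagnoses. The Python sketch below computes point estimates from cell totals, which may be weighted sums; it does not reproduce the weighted standard errors, confidence intervals or tests reported in the paper. It relies on two standard identities: the OR formula above reduces to ad/bc, and for a dichotomous classifier the AUC equals (SN + SP)/2. The McNemar function is the uncorrected statistic on raw discordant counts. The function names, the weight definition and the example counts are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Concordance:
    sn: float
    sp: float
    ppv: float
    npv: float
    odds_ratio: float
    kappa: float
    auc: float


def weighted_cells(records):
    """Sum weights into the 2 x 2 cells. Each record is
    (biq_positive, clinical_positive, weight); the weight is assumed to be,
    e.g., the inverse of the respondent's selection probability."""
    cells = {"a": 0.0, "b": 0.0, "c": 0.0, "d": 0.0}
    for biq, clinical, weight in records:
        if biq:
            cells["a" if clinical else "b"] += weight
        else:
            cells["c" if clinical else "d"] += weight
    return cells


def concordance(a, b, c, d):
    """a = BIQ+/clinical+, b = BIQ+/clinical-, c = BIQ-/clinical+, d = BIQ-/clinical-."""
    n = a + b + c + d
    sn = a / (a + c)
    sp = d / (b + d)
    ppv = a / (a + b)
    npv = d / (c + d)
    odds_ratio = (sn * sp) / ((1 - sn) * (1 - sp))  # algebraically a*d / (b*c)
    p_observed = (a + d) / n
    p_chance = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    kappa = (p_observed - p_chance) / (1 - p_chance)
    auc = (sn + sp) / 2  # ROC area of a single cut-point (dichotomous) classifier
    return Concordance(sn, sp, ppv, npv, odds_ratio, kappa, auc)


def mcnemar(b, c):
    """Uncorrected McNemar chi-square (1 df) on raw discordant counts."""
    stat = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square, 1 df
    return stat, p_value


# Hypothetical unweighted counts, not the study's data.
result = concordance(a=40, b=10, c=20, d=130)
stat, p = mcnemar(b=10, c=20)
```

As a check on the AUC identity, the DSM-5 values in Table 3 (SN = 0.552, SP = 0.966) give (0.552 + 0.966)/2 = 0.759, consistent with the reported dichotomous AUC of 0.76. The McNemar p-values in Table 3 were computed on weighted data, so this unweighted formula is not expected to reproduce them.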

Results

Test–retest reliability

Moderate short-term test–retest reliability was found for most of the core BIQ items (Table 2). Over an interval of 2 days to 2 weeks, Pearson correlations for the number of nights in a typical week on which respondents have problems falling asleep, staying asleep, waking up too early, and waking up feeling unrested are 0.60, 0.38, 0.41 and 0.71, respectively. For the number of nights with any of these sleep difficulties, test–retest reliability is moderate (0.46). Correlation is high for reports of how much earlier than wanted respondents wake up (0.72), but moderate for the number of awakenings per night (0.63), severity of nonrestorative sleep (0.53), satisfaction with sleep (0.47) and amount of time awake at night (0.37). However, correlation is weak for reports of how long it takes to fall asleep (−0.06). Correlations for daytime impairment or distress caused by sleep problems are high, ranging from 0.76 to 0.80, but the test–retest correlation for the duration of sleep problems is unsatisfactory (0.22). We sought explanations for the BIQ items with Pearson correlations below 0.70. The test–retest correlation for the severity of nonrestorative sleep improved from 0.53 to 0.74 after the answers were dichotomized into mild/moderate and severe/very severe. We explored whether the weak correlations for sleep onset latency (−0.06), the duration of sleep problems (0.22), the amount of time awake at night (0.37) and the number of awakenings per night (0.63) were partly due to wide response ranges and could be corrected by ranking the responses. Sleep onset latency and the amount of time awake at night were rank-ordered into 0 to <30 min, 30 min to <60 min, 60 min to <120 min, and 120 min or longer. The duration of sleep problems was ranked as 0 to <3 months, 3 months to <1 year, 1 year to <5 years, and 5 years or longer. The number of awakenings was ranked as 0 to 2, 3 to 4, 5 to 7, and 8 or more.
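The rank-ordering just described amounts to binning each continuous response at fixed cut-points before correlating test and retest values. In the Python sketch below, the cut-points are those given in the text; the function names and example responses are hypothetical, and unlike the paper's analysis the correlation is unweighted.

```python
import bisect
import math

# Lower bounds of the upper categories. bisect_right places a value equal to a
# cut-point in the higher category (e.g. 30 min falls in "30 to <60 min").
LATENCY_OR_WAKE_MIN = [30, 60, 120]  # 0-<30, 30-<60, 60-<120, >=120 min
DURATION_MONTHS = [3, 12, 60]        # 0-<3 mo, 3 mo-<1 yr, 1-<5 yr, >=5 yr
AWAKENINGS = [3, 5, 8]               # 0-2, 3-4, 5-7, >=8


def to_rank(value, cut_points):
    """Return the 0-based ordinal category of value."""
    return bisect.bisect_right(cut_points, value)


def pearson(x, y):
    """Unweighted Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical test and retest reports of time awake at night (minutes).
test = [10, 20, 45, 90, 150, 600]
retest = [15, 25, 40, 100, 130, 30]
raw_r = pearson(test, retest)
ranked_r = pearson([to_rank(v, LATENCY_OR_WAKE_MIN) for v in test],
                   [to_rank(v, LATENCY_OR_WAKE_MIN) for v in retest])
```

In this made-up example, the single extreme pair (600 min at test, 30 min at retest) dominates the raw correlation, while binning caps its influence. This is the kind of wide-response-range effect the analysis tested for.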
The categories were chosen for their clinical relevance. After rank-ordering, test–retest correlations became higher for the duration of sleep problems (0.57) and the amount of time awake at night (0.59), but remained low for sleep onset latency (0.11) and the number of awakenings per night (0.58). We also tested whether the weak correlations for the number of nights with problems falling asleep (0.60), problems staying asleep (0.38), waking too early (0.41), and any sleep difficulties (0.46) were influenced by unstable reports in good sleepers and subthreshold cases. The test–retest correlation for the number of nights with problems staying asleep was higher among BIQ cases (0.49) and much lower among non-cases (0.26), while the weak correlations for the number of nights with problems falling asleep, waking too early and any sleep difficulties were influenced by unstable reports in subthreshold cases, whose test–retest correlations were as low as 0.07, 0.07 and 0.19, respectively. Mean values of the BIQ items were generally similar in the test and retest interviews, except for the number of nights with any sleep difficulties and the Sheehan Disability Scale score, which changed significantly over time (Table 2). The number of nights with any sleep difficulties was significantly higher at retest, whereas the Sheehan Disability Scale score was significantly lower.

Table 2
Two-day to two-week test–retest reliability of the Brief Insomnia Questionnaire items in a weighted sample of 120 Chinese adults.

| Item | Time 1, mean (SD) | Time 2, mean (SD) | N | Pearson rᵃ |
|---|---|---|---|---|
| I. Number of nights per week | | | | |
| Problems falling asleep | 0.9 (1.1) | 0.9 (1.1) | 120 | 0.60*** |
| Waking in the night | 1.7 (2.2) | 1.9 (2.2) | 120 | 0.38*** |
| Waking too early in the morning | 2.2 (3.3) | 2.5 (2.2) | 120 | 0.41*** |
| Nonrestorative sleep | 2.0 (2.2) | 1.8 (2.2) | 120 | 0.71*** |
| Any sleep difficultiesᵇ | 2.3 (2.0) | 3.4 (3.0) | 98 | 0.46*** |
| II. Severity on nights of occurrence | | | | |
| Sleep latency, min | 93.7 (53.7) | 92.8 (84.9) | 39 | −0.06 |
| Amount of time awake at night, min | 51.2 (55.5) | 69.9 (75.7) | 52 | 0.37** |
| Number of awakenings per night | 2.1 (1.5) | 2.1 (1.5) | 57 | 0.63*** |
| Earliness of waking in morning, min | 34.0 (26.2) | 32.7 (23.3) | 53 | 0.72*** |
| Severity of nonrestorative sleep | 1.7 (0.8) | 1.6 (0.8) | 72 | 0.53*** |
| Satisfaction with sleep | 2.9 (0.9) | 2.8 (0.9) | 77 | 0.47*** |
| III. Daytime impairment/distress | | | | |
| Difficulties caused by sleep problemᶜ | 1.9 (0.9) | 1.8 (0.9) | 77 | 0.80*** |
| Sheehan Disability Scaleᵇ,ᵈ | 2.5 (1.6) | 2.0 (1.6) | 67 | 0.79*** |
| Concern/worry/distress about sleep | 2.0 (0.9) | 2.0 (0.9) | 80 | 0.76*** |
| IV. Duration of sleep problems | | | | |
| Number of months | 40.0 (60.6) | 48.4 (98.8) | 69 | 0.22 |

Abbreviations: min, minutes; SD, standard deviation. **P < 0.01, ***P < 0.001 (2-sided).
ᵃ Data were weighted to adjust for the oversampling of respondents classified as cases and subthreshold cases by the Brief Insomnia Questionnaire.
ᵇ Significant within-individual difference between test and retest scores based on a 0.05-level 2-sided test.
ᶜ Mean responses to 8 questions about reduced motivation; performance at work, school, or social activities; making errors or having accidents; irritability, nerves or mood disturbance; daytime attention, concentration or memory problems; daytime fatigue; daytime sleepiness; and tension headache or digestive problems because of sleep problems over the past 30 days, with options 1–4 indicating none to severe impairment.
ᵈ Mean responses to 4 questions about interference with home management, ability to work, social life, and close personal relationships because of sleep problems over the past 4 weeks, with options 0–10 indicating none to very severe interference.

Concordance of diagnoses based on the BIQ and clinical interviews

Prevalence estimates based on the BIQ classification were compared with those based on clinical interviews for each diagnostic system. McNemar tests show that prevalence estimates differ significantly for DSM-IV-TR and ICD-10, but there is no significant difference for DSM-5, RDC/ICSD-2 and any of the DSM-IV-TR, ICD-10 and RDC/ICSD-2 diagnoses (Table 3). Judging from the values of SN and PPV, prevalence estimates based on the BIQ are lower than those based on clinical interviews in all diagnostic systems except ICD-10, since a PPV higher than SN means that the BIQ classifies fewer respondents as positive than the clinical interviews do. Individual-level concordance between the BIQ and the clinical interviews is highest for DSM-IV-TR and any diagnosis (κ = 0.72 in both cases; AUC = 0.83 and 0.85, respectively); concordance is moderate for DSM-5 and RDC/ICSD-2 (κ = 0.58 and 0.60, respectively; AUC = 0.76 and 0.78, respectively) and fair for ICD-10 (κ = 0.48, AUC = 0.86). The discrepancy between κ and AUC for ICD-10 is probably due to its low prevalence, as κ, unlike AUC, depends on prevalence. The high ORs for all diagnostic systems indicate that diagnoses based on the BIQ are strongly associated with those derived from clinical interviews. The BIQ is more sensitive in detecting DSM-IV-TR and ICD-10 cases (SN = 68.4% and 77.8%, respectively) than DSM-5 and RDC/ICSD-2 cases (SN = 55.2% and 59.5%, respectively). Most DSM-5, DSM-IV-TR and RDC/ICSD-2 cases identified by the BIQ are confirmed by clinical interviews (PPV = 76.2%–89.7%), but most ICD-10 cases are not (PPV = 38.9%). SP and NPV are high for all diagnostic systems (SP = 93.5%–97.8%; NPV = 89.8%–98.7%), indicating that the vast majority of non-cases are classified accurately by the BIQ.

The use of super-normal controls

We re-analyzed diagnostic concordance after excluding respondents who reported some sleep problems (2 or more days a week for 1 month or longer) but did not meet full criteria for insomnia disorder according to the BIQ scoring algorithms. Results show that concordance estimates are substantially inflated in this way for all diagnostic criteria except ICD-10 (Table 3).
With super-normal controls, κ for DSM-5, DSM-IV-TR and RDC/ICSD-2 ranges from 0.79 to 0.88, AUC from 0.92 to 0.97, SN from 87.4% to 100%, SP from 94.6% to 97.0%, PPV from 77.4% to 89.1% and NPV from 97.8% to 100%. For ICD-10, κ and PPV remain unsatisfactory, at 0.52 and 38.3%, respectively, suggesting that there are some fundamental differences between the BIQ and clinical interviews in making the ICD-10 diagnosis.

Continuous classifications using BIQ symptom data

As in the original validation study, we performed stepwise logistic-regression analysis to select BIQ items that significantly predicted clinical diagnoses after controlling for the dichotomous BIQ diagnoses. Each respondent in the clinical reappraisal sample was then assigned a predicted probability based on the resulting logistic-regression equations (Table 4). For every diagnostic system, the AUC after including BIQ item scores is higher than the AUC using the dichotomous classification. Improvement is most substantial for DSM-5 and RDC/ICSD-2 (AUC from 0.76 to 0.93 and from 0.78 to 0.96, respectively).
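Both improvements follow from how the AUC is computed. A dichotomous classification yields a single ROC point, so its AUC is (SN + SP)/2. Continuous predicted probabilities can rank cases above non-cases more finely, and dropping borderline respondents removes the hardest comparisons. The Python sketch below computes the AUC as the Mann–Whitney probability that a randomly chosen case scores higher than a randomly chosen non-case, with ties counted as one half. All scores, labels and subthreshold flags are hypothetical, and fitting the stepwise logistic regression that would produce real predicted probabilities is not shown.

```python
def auc_mann_whitney(scores, labels):
    """AUC = P(score of a random case > score of a random non-case); ties count 1/2."""
    cases = [s for s, y in zip(scores, labels) if y]
    non_cases = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for case_score in cases:
        for other in non_cases:
            if case_score > other:
                wins += 1.0
            elif case_score == other:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


# Hypothetical clinical diagnoses (1 = case) for 10 respondents.
clinical = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
# Dichotomous BIQ classification: SN = 2/4, SP = 5/6.
biq_binary = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
# Hypothetical predicted probabilities from an item-level logistic model.
predicted = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1, 0.35, 0.05]

auc_binary = auc_mann_whitney(biq_binary, clinical)  # equals (0.5 + 5/6) / 2
auc_continuous = auc_mann_whitney(predicted, clinical)

# Super-normal controls: drop respondents flagged as BIQ subthreshold.
subthreshold = [False, False, True, False, False, True, False, False, True, False]
keep = [i for i, flag in enumerate(subthreshold) if not flag]
auc_supernormal = auc_mann_whitney([biq_binary[i] for i in keep],
                                   [clinical[i] for i in keep])
```

The double loop is quadratic in sample size, which is adequate for a subsample of 176. A sort-based rank computation would be the usual choice for larger data.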

Discussion To our knowledge, this is the first study on the validation of the BIQ in a general population sample and the application of the BIQ to generate insomnia diagnosis according to the DSM-5. Our results show that the BIQ is a valid and reliable instrument for detecting

Table 3 Consistency of diagnoses based on the Brief Insomnia Questionnaire with diagnoses based on clinical interviews in weighted normal control and super-normal control samples.a Criteria

McNemar χ2 test (p-value) OR (95% CI) Cohen's κ (95% CI) AUC (95% CI) Sensitivity (95% CI) Specificity (95% CI) PPV (95% CI) NPV (95% CI) NC (cases/control) SNC (cases/control)

NC NC NC SNC NC SNC NC SNC NC SNC NC SNC NC SNC

DSM-5

DSM-IV-TR

ICD-10

RDC/ICSD-2

Anyb

3.6 (0.0963) 34.7 (9.8, 135.8) 0.58 (0.41, 0.75) 0.79 (0.74, 0.83) 0.76 (0.67, 0.85) 0.92 (0.89, 0.94) 55.2 (35.7, 73.6) 87.4 (82.3, 91.6) 96.6 (92.2, 98.9) 95.6 (94.3, 96.7) 76.2 (52.8, 91.8) 77.4 (71.6, 82.5) 91.6 (86.0, 95.4) 97.8 (96.8, 98.5) 176 (51/125) 96 (51/45)

5.4 (0.04) 96.8 (23.4, 538.6) 0.72 (0.59, 0.85) 0.88 (0.85, 0.91) 0.83 (0.76, 0.91) 0.94 (0.93, 0.96) 68.4 (51.3, 82.5) 91.8 (88.3, 94.5) 97.8 (93.7, 99.5) 97.0 (95.9, 97.9) 89.7 (72.6, 97.8) 89.1 (85.3, 92.2) 91.8 (86.1, 95.7) 97.8 (96.8, 98.5) 176 (73/103) 118 (73/45)

6.2 (0.0225) 50.0 (7.8, 516.1) 0.48 (0.25, 0.72) 0.52 (0.44, 0.59) 0.86 (0.71, 1.00) 0.95 (0.95, 0.96) 77.8 (40.0, 97.2) 100.0 (95.2, 100.0) 93.5 (88.6, 96.7) 91.0 (89.3, 92.4) 38.9 (17.3, 64.3) 38.3 (31.4, 45.5) 98.7 (95.5, 99.8) 100.0 (99.7, 100.0) 176 (42/134) 87 (42/45)

3.9 (0.09) 32.3 (10.3, 109.4) 0.60 (0.45, 0.76) 0.85 (0.82, 0.89) 0.78 (0.69, 0.86) 0.97 (0.97, 0.98) 59.5 (42.1, 75.2) 100.0 (98.6, 100.0) 95.7 (90.8, 98.4) 94.6 (93.2, 95.7) 78.6 (59.0, 91.7) 78.5 (73.6, 82.8) 89.8 (83.7, 94.2) 100.0 (99.7, 100.0) 176 (68/108) 113 (68/45)

0.5 (0.63) 55.7 (17.8, 183.1) 0.72 (0.59, 0.84) 0.83 (0.80, 0.86) 0.85 (0.78, 0.92) 0.93 (0.92, 0.95) 75.0 (58.8, 87.3) 92.7 (89.6, 95.1) 94.9 (89.8, 97.9) 94.1 (92.6, 95.3) 81.1 (64.8, 92.0) 82.1 (78.0, 85.6) 92.9 (87.3, 96.5) 97.8 (96.8, 98.5) 176 (88/88) 133 (88/45)

Abbreviations: AUC, area under the receiver operating characteristic curve; CI, confidence interval; DSM-5, Diagnostic and Statistical Manual, Fifth Edition; DSM-IV-TR, Diagnostic and Statistical Manual, Fourth Edition, Text Revision; ICD-10, International Classification of Diseases-10; NC, normal control sample; NPV, negative predictive value; PPV, positive predictive value; RDC/ICSD-2, Research Diagnostic Criteria and International Classification of Sleep Disorders-2; SNC, super-normal control sample. a Data were weighted to adjust for the oversampling of respondents classified as cases and subthreshold cases by the Brief Insomnia Questionnaire. b Any of DSM-IV-TR, ICD-10 and RDC/ICSD-2 diagnoses.

378

K.-F. Chung et al. / Journal of Psychosomatic Research 76 (2014) 374–379

Table 4 Comparisons of AUC in predicting clinical diagnoses based on the dichotomous Brief Insomnia Questionnaire classification and the continuous predicted probabilities based on item-level data in a weighted sample of 176 adults. AUC Criteria

Dichotomous

Continuous

DSM-5 DSM-IV-TR ICD-10 RDC/ICSD-2 Anya

0.76 (0.67, 0.85) 0.83 (0.76, 0.91) 0.86 (0.71, 1.00) 0.78 (0.69, 0.86) 0.85 (0.78, 0.92)

0.93 (0.88, 0.98) 0.89 (0.83, 0.94) 0.90 (0.84, 0.97) 0.96 (0.93, 0.99) 0.89 (0.83, 0.94)

Abbreviations: AUC, area under the receiver operating characteristic curve; DSM-5, Diagnostic and Statistical Manual, Fifth Edition; DSM-IV-TR, Diagnostic and Statistical Manual, Fourth Edition, Text Revision; ICD-10, International Classification of Diseases-10; RDC/ ICSD-2, Research Diagnostic Criteria and International Classification of Sleep Disorders-2. a Any of DSM-IV-TR, ICD-10 and RDC/ICSD-2 diagnoses.

insomnia disorder in the general population. Moderate short-term test–retest reliability was found for most of the core BIQ items. High individual-level concordance was observed between BIQ diagnoses and diagnoses based on clinical interviews, with AUCs ranging from 0.76 to 0.86. Diagnostic concordance could be further improved with the use of super-normal controls and BIQ symptom-level data. Prevalence estimates based on the BIQ were comparable with estimates based on clinical interviews for the DSM-5, RDC/ICSD-2 and any of the DSM-IV-TR, ICD-10 and RDC/ICSD-2 insomnia diagnoses. However, the BIQ estimate was significantly lower than the clinical-interview estimate for the DSM-IV-TR, and significantly higher for the ICD-10.

A few methodological differences between this study and the original BIQ validation study should be highlighted. The Hong Kong sample was selected from the general population by means of territory-wide telephone directories, whereas the U.S. sample was restricted to fully insured members of a large national commercial health plan. The U.S. sample also received an advance letter, so the level of cooperation may have been higher. No monetary incentive was offered in this study, whereas the U.S. study offered $20 for participation and a further $20 for the retest and clinical reappraisal interviews. With a monetary incentive, participants in the U.S. study may have put more effort into the interviews, yielding better-quality data [20,21]. The test–retest interviews in the Hong Kong study were conducted 2 days to 2 weeks after the first interview, whereas the U.S. retests were carried out 2 days after the first interview. Participants in the U.S. study were therefore more likely to remember their earlier responses, whereas participants in this study were more likely to have experienced true changes in their insomnia condition between interviews. In this study, the retest interviews were conducted by members of our research team with basic knowledge of sleep medicine, whereas the U.S. study repeated the same computer-assisted telephone interview; hence, the Hong Kong study carried a higher risk that the information gathered differed between the first interview and the retest interview. In addition, socio-cultural factors may have contributed to a less stable insomnia condition in the Hong Kong sample. One possible explanation is that sleep in Hong Kong is more affected by the environment than in the U.S., where fewer people share the same flat, neighbors live farther apart, and there is less road traffic. Possibly because of these methodological and socio-cultural differences, the test–retest Pearson correlations in this study are generally lower than those in the U.S. study. Although the correlation is statistically significant for most items in our version, it is below 0.40 for 4 items: the number of nights with problems staying asleep, amount of time awake at night, duration of sleep problems and sleep onset latency. The test–retest correlations of 3 of these 4 items improved after data transformation, whereas the correlation for sleep onset latency remained low. Night-to-night sleep variability is well recognized to be high in people with chronic insomnia [22,23], so the low test–retest correlation may reflect inter-night variability in sleep onset latency among participants with problems falling asleep.
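The effect of a transformation on a test–retest Pearson correlation can be illustrated with a minimal Python sketch. This section does not state which transformation was applied, so the log(1 + x) transform used here is an assumption, and the sleep onset latency values are hypothetical, not data from either study. With right-skewed durations such as sleep onset latency, a single very long night can dominate the raw correlation; compressing the long tail reduces its leverage.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between paired test and retest values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired lists with at least two values")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log1p_transform(values):
    """Assumed transformation: log(1 + v) compresses a long right tail and accepts zeros."""
    return [math.log1p(v) for v in values]


# Hypothetical sleep onset latency (minutes) at the first interview and at retest.
# Illustrative values only; they are NOT data from the Hong Kong or U.S. studies.
test = [10, 15, 20, 30, 45, 60, 90, 180]
retest = [15, 10, 30, 20, 60, 45, 120, 60]

r_raw = pearson_r(test, retest)
r_log = pearson_r(log1p_transform(test), log1p_transform(retest))
print(f"raw r = {r_raw:.2f}; log(1 + x) r = {r_log:.2f}")
```

With these made-up values the transformed correlation is higher than the raw one, mainly because the 180-minute night no longer dominates. A transformation only reduces the influence of extreme values; it cannot remove genuine night-to-night variability, which is one possible reading of why the correlation for sleep onset latency stayed low.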

Psychosocial factors in people with insomnia may also have affected the self-report of sleep–wake parameters, contributing to the low test–retest correlations [24,25]. At the aggregate level, paired t-tests show that participants in this study reported a significantly greater number of nights with any sleep difficulty but significantly lower Sheehan Disability Scale scores in the retest interviews. The lower Sheehan Disability Scale scores at retest, which were also found in the original validation study, may be due to a downward trend on repeated testing. The reasons for the significantly greater number of nights with any sleep difficulty reported at retest are unclear.

The Hong Kong Chinese version of the BIQ has satisfactory power in discriminating between insomnia cases and non-cases, and its performance is comparable to that of the English version [6]. The AUCs for the different diagnostic systems range from 0.76 to 0.86, suggesting that the BIQ can be used to detect insomnia disorder in an unselected general population sample. When super-normal controls are used, the AUC increases to 0.92–0.97. A higher AUC with super-normal controls than with a mixed sample of borderline cases and non-cases has been shown in previous studies [26,27]. We found a similar improvement in the psychometric performance of the BIQ when symptom-level data were used, suggesting that some modification of the dichotomous BIQ classification may enhance its diagnostic accuracy. Further studies exploring modifications of the BIQ scoring algorithms may be needed.

As the BIQ was developed to assist in epidemiological studies, it is important that it generates accurate prevalence estimates.
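The prevalence comparison and the agreement statistics used in this discussion (SN, SP, PPV, κ and the AUC of a dichotomous classification) can all be computed from one 2 × 2 cross-classification of BIQ against clinical-interview diagnoses. The Python sketch below is a minimal, unweighted illustration under stated assumptions: it ignores any sampling weights, the counts are hypothetical, and McNemar's test for paired proportions is an assumed choice, since this section does not name the test used to compare prevalence estimates. For a dichotomous classification the ROC curve has a single interior point, so its AUC equals (SN + SP)/2; a graded symptom-level score traces a fuller curve than a single cut-off. A super-normal-control analysis would, presumably, rebuild the same table after restricting non-cases to respondents without insomnia symptoms.

```python
import math
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Cross-classification of BIQ against clinical-interview diagnoses.

    a: BIQ case,     interview case      (true positive)
    b: BIQ case,     interview non-case  (false positive)
    c: BIQ non-case, interview case      (false negative)
    d: BIQ non-case, interview non-case  (true negative)
    """
    a: int
    b: int
    c: int
    d: int

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d

    def sensitivity(self) -> float:
        return self.a / (self.a + self.c)

    def specificity(self) -> float:
        return self.d / (self.b + self.d)

    def ppv(self) -> float:
        return self.a / (self.a + self.b)

    def auc_dichotomous(self) -> float:
        # A binary test has one ROC point, so the trapezoidal AUC is (SN + SP) / 2.
        return (self.sensitivity() + self.specificity()) / 2

    def kappa(self) -> float:
        # Cohen's kappa: observed agreement corrected for chance agreement.
        n = self.n
        p_obs = (self.a + self.d) / n
        p_exp = ((self.a + self.b) * (self.a + self.c)
                 + (self.c + self.d) * (self.b + self.d)) / n ** 2
        return (p_obs - p_exp) / (1 - p_exp)

    def prevalence_biq(self) -> float:
        return (self.a + self.b) / self.n

    def prevalence_interview(self) -> float:
        return (self.a + self.c) / self.n

    def mcnemar(self, continuity: bool = True) -> tuple[float, float]:
        """McNemar chi-square (1 df) and p-value; only discordant cells b and c matter."""
        b, c = self.b, self.c
        if b + c == 0:
            return 0.0, 1.0
        # With continuity correction, |b - c| is reduced by 1 (floored at 0).
        diff = max(abs(b - c) - (1 if continuity else 0), 0)
        chi2 = diff ** 2 / (b + c)
        # For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)).
        return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts for illustration only (not the study's results).
table = TwoByTwo(a=20, b=6, c=5, d=69)
print(f"SN={table.sensitivity():.2f} SP={table.specificity():.2f} "
      f"PPV={table.ppv():.2f} kappa={table.kappa():.2f} AUC={table.auc_dichotomous():.2f}")
# SN=0.80 SP=0.92 PPV=0.77 kappa=0.71 AUC=0.86
print(f"prevalence: BIQ={table.prevalence_biq():.2f}, interview={table.prevalence_interview():.2f}")
print("McNemar chi2, p =", table.mcnemar())
```

Because McNemar's test uses only the discordant cells, two methods can share similar prevalence estimates while still disagreeing on which individuals are cases. This is why prevalence accuracy and κ/PPV are reported separately.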
With respect to prevalence estimation, the Hong Kong Chinese version of the BIQ is accurate in estimating the prevalence of DSM-5, RDC/ICSD-2 and any of the DSM-IV-TR, ICD-10 and RDC/ICSD-2 insomnia diagnoses, as there is no significant difference between the prevalence estimates obtained by the BIQ and by clinical interviews. The DSM-IV-TR, ICD-10 and RDC/ICSD-2 diagnostic criteria all have a 1-month duration criterion; with the BIQ, it is now possible to detect any of these insomnia diagnoses and estimate their prevalence, allowing early recognition, treatment and service planning. The Hong Kong Chinese version of the BIQ was found to under-estimate the prevalence of the DSM-IV-TR insomnia diagnosis but over-estimate that of the ICD-10 diagnosis. In the U.S. study, the BIQ was shown to under-estimate the prevalence of DSM-IV-TR and RDC/ICSD-2 insomnia diagnoses, but there was no significant difference in the prevalence estimates of ICD-10. The over-estimation of the ICD-10 insomnia diagnosis in this study is probably due to our stringent interpretation of criterion C (preoccupation with the sleeplessness and excessive concern over its consequences at night and during the day), which resulted in a low prevalence of ICD-10 diagnosis based on clinical interviews. Our results show that the ICD-10 is the only diagnostic system for which both κ and PPV remain unsatisfactory despite the use of super-normal controls.

Our study has a number of strengths as well as several methodological limitations. Our sample was selected from the general population. Although the sociodemographic characteristics of the total sample and the subsamples differed from those of the census population, the age and educational level of the subsamples, the most important factors that could affect understanding of the BIQ, were not markedly different from those of the census population; hence, we believe the psychometric properties found with the subsamples are likely to be generalizable to the general population.
Our response rate was only 64.3%, although this is similar to that of the original validation study. As in the U.S. study, the BIQ was administered by lay interviewers in this study; whether the performance of the BIQ can be improved by using trained interviewers is unclear. Systematic bias in the rating of symptoms, and hence in the classification of cases and non-cases, may have been introduced; it was minimized by using a standardized semi-structured clinical reappraisal questionnaire. The accuracy of the clinical diagnoses could have been improved by in-person assessments and the use of sleep diaries and other additional information. Lastly, the test–retest assessments were done between 2 days and 2 weeks after the initial interview, and this large variability in timing makes the results difficult to interpret.

In conclusion, the Hong Kong Chinese version of the BIQ is a valid and reliable instrument for detecting insomnia disorder in the general population. When super-normal controls are used, the sensitivity (SN) and specificity (SP) of the BIQ for the DSM-IV-TR, DSM-5, ICD-10 and RDC/ICSD-2 insomnia diagnoses range from 87.4% to 100%, providing evidence for its validity. In addition, the Hong Kong Chinese version of the BIQ can accurately generate insomnia prevalence estimates, including for the most recent DSM-5 diagnosis. Further studies are needed to evaluate the BIQ in other settings and to test whether the use of trained interviewers and slight modification of the BIQ scoring algorithms can improve its psychometric performance.

Conflict of interest
The authors have no competing interests to report.

Acknowledgments
This project was supported by the Small Project Funding of the University of Hong Kong (Grant No. 104002576). The authors would like to thank Prof. Ronald C. Kessler for his valuable suggestions on study design and statistical analysis.

Appendix A. Supplementary data
Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.jpsychores.2014.03.002.

References
[1] Ohayon MM. Epidemiology of insomnia: what we know and what we still need to learn. Sleep Med Rev 2002;6:97–111.
[2] Li RH, Wing YK, Ho SC, Fong SY. Gender differences in insomnia – a study in the Hong Kong Chinese population. J Psychosom Res 2002;53:601–9.
[3] Wong WS, Fielding R. Prevalence of insomnia among Chinese adults in Hong Kong: a population-based study. J Sleep Res 2011;20:117–26.
[4] Endicott J, Spitzer RL. A diagnostic interview: the schedule for affective disorders and schizophrenia. Arch Gen Psychiatry 1978;35:837–44.
[5] Ohayon MM, Guilleminault C, Zulley J, Palombini L, Raab H. Validation of the SleepEVAL system against clinical assessments of sleep disorders and polysomnographic data. Sleep 1999;22:925–30.
[6] Kessler RC, Coulouvrat C, Hajak G, Lakoma MD, Roth T, Sampson N, et al. Reliability and validity of the brief insomnia questionnaire in the America Insomnia Survey. Sleep 2010;33:1539–49.
[7] American Psychiatric Association. Diagnostic and statistical manual of mental disorders (DSM-IV-TR). 4th ed., text revision. Washington (DC): American Psychiatric Association; 2000.
[8] World Health Organization. International classification of diseases (ICD-10). Geneva, Switzerland: World Health Organization; 1991.
[9] Edinger JD, Bonnet MH, Bootzin RR, Doghramji K, Dorsey CM, Espie CA, et al. Derivation of research diagnostic criteria for insomnia: report of an American Academy of Sleep Medicine Work Group. Sleep 2004;27:1567–96.
[10] American Academy of Sleep Medicine. The International Classification of Sleep Disorders, second edition (ICSD-2): diagnostic and coding manual. 2nd ed. Westchester (IL): American Sleep Disorders Association; 2005.
[11] Soldatos CR, Dikeos DG, Paparrigopoulos TJ. The diagnostic validity of the Athens Insomnia Scale. J Psychosom Res 2003;55:263–7.
[12] American Psychiatric Association. Diagnostic and statistical manual of mental disorders (DSM-5). 5th ed. Arlington (VA): American Psychiatric Publishing; 2013.
[13] Office of the Communications Authority, the Government of the Hong Kong Special Administrative Region. Telecommunications. Available at: http://www.gov.hk/en/about/abouthk/factsheets/docs/telecommunications.pdf [accessed Aug 1, 2013].
[14] Lavrakas PJ. Generating telephone survey sampling pools. In: Telephone survey methods: sampling, selection, and supervision. 2nd ed. California: Sage Publications Inc.; 1993. p. 27–59.
[15] Gaziano C. Comparative analysis of within-household respondent selection techniques. Public Opin Q 2005;69:124–57.
[16] Rotondi MA, Donner A. A confidence interval approach to sample size estimation for interobserver agreement studies with multiple raters and outcomes. J Clin Epidemiol 2012;65:778–84.
[17] Munro BH. Statistical methods for health care research. Philadelphia: Lippincott Williams & Wilkins; 2005.
[18] Harkness J, Pennell BE, Villar A, Gebler N, Aguilar-Gaxiola S, Bilgen I. Translation procedures and translation assessment in the World Mental Health Survey initiative. In: Kessler RC, Üstün TB, editors. The WHO World Mental Health Survey: global perspectives on the epidemiology of mental disorders. Cambridge: Cambridge University Press; 2008. p. 91–113.
[19] Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29–36.
[20] Edwards PJ, Roberts IG, Clarke MJ, DiGuiseppi C, Wentz R, Kwan I, et al. Methods to increase response rates to postal questionnaires. Cochrane Database Syst Rev 2007:MR000008.
[21] Olsen F, Abelsen B, Olsen JA. Improving response rate and quality of survey data with a scratch lottery ticket incentive. BMC Med Res Methodol 2012;12:52.
[22] Sanchez-Ortuno MM, Carney CE, Edinger JD, Wyatt JK, Harris A. Moving beyond average values: assessing the night-to-night instability of sleep and arousal in DSM-IV-TR insomnia subtypes. Sleep 2011;34:531–9.
[23] Sanchez-Ortuno MM, Edinger JD. Internight sleep variability: its clinical significance and responsiveness to treatment in primary and comorbid insomnia. J Sleep Res 2012;21:527–34.
[24] Jackowska M, Dockray S, Hendrickx H, Steptoe A. Psychosocial factors and sleep efficiency: discrepancies between subjective and objective evaluations of sleep. Psychosom Med 2011;73:810–6.
[25] Maglione JE, Ancoli-Israel S, Peters KW, Paudel ML, Yaffe K, Ensrud KE, et al. Depressive symptoms and subjective and objective sleep in community-dwelling older women. J Am Geriatr Soc 2012;60:635–42.
[26] Chung KF, Kan KK, Yeung WF. Assessing insomnia in adolescents: comparison of Insomnia Severity Index, Athens Insomnia Scale and Sleep Quality Index. Sleep Med 2011;12:463–70.
[27] Smith S, Trinder J. Detecting insomnia: comparison of four self-report measures of sleep in a young adult population. J Sleep Res 2001;10:229–35.
