

Getting the Story Straight: Evaluating the Test-Retest Reliability of a University Health History Questionnaire

Charles R. Gilkison RN, MSN (a); Mary V. Fenton RN, DrPH (b); & Jerry W. Lester PhD (b)

(a) Department of Internal Medicine, University of Texas Medical Branch, Galveston, USA
(b) University of Texas School of Nursing, Galveston, USA

Published online: 09 Jul 2010.

To cite this article: Charles R. Gilkison RN, MSN, Mary V. Fenton RN, DrPH & Jerry W. Lester PhD (1992) Getting the Story Straight: Evaluating the Test-Retest Reliability of a University Health History Questionnaire, Journal of American College Health, 40:6, 247-252, DOI: 10.1080/07448481.1992.9936289

To link to this article: http://dx.doi.org/10.1080/07448481.1992.9936289


Getting the Story Straight: Evaluating the Test-Retest Reliability of a University Health History Questionnaire


Charles R. Gilkison, RN, MSN; Mary V. Fenton, RN, DrPH; and Jerry W. Lester, PhD

Abstract. This study was designed to establish the reliability of a health history questionnaire used as a screening tool for incoming university students. The authors used a test-retest design, with a test interval of 6 months, on a sample of medical and nursing students. The analysis focused on overall reliability of the questionnaire and reproducibility of specific items, based on question format. Questionnaire items of specific interest were those with dichotomous yes/no response options versus open-ended format questions, those using the words frequently or recently, or those that asked multiple questions. Demographic characteristics of the subjects were considered in the evaluation of reliability. Overall reliability of the questionnaire (93.6%) was above the anticipated level of 90%, and subject sex or program of study did not show any significant differences in reproducibility of responses. Although wording of questions did not affect item reliability, dichotomous format questions demonstrated a higher degree of reliability (96.4%) than the overall reliability of the questionnaire. Recommendations for enhancing the reliability of the questionnaire are based on item analysis and information gathered from interviews with subjects.

Key Words: health, questionnaires, reliability

In a 1980 report on reliability of the Health Hazard Appraisal (HHA),[1] a relatively well-known instrument used to measure health risk based on lifestyle and personal health history, investigators reported striking contradictions in the retest analysis. This raised questions about the usefulness of the tool as a reliable clinical measure of health status or health-risk profile.

Charles R. Gilkison is a family nurse practitioner in the Department of Internal Medicine at the University of Texas Medical Branch in Galveston; Mary V. Fenton is dean and a professor, and Jerry W. Lester is an associate professor; both are with the University of Texas School of Nursing, Galveston.


Such health history questionnaires as the HHA are widely accepted and used in many healthcare settings today,[2] yet few of these questionnaires have been formally tested for reliability.[3] Although some healthcare professionals contend that problems with the reliability of questionnaires come from patients who are poor personal historians, others have shown that problems arise from the instruments themselves. Collen et al[4] noted, on analyzing the reliability of a 200-item questionnaire, that subjects tended to change their answers more often on retest on questions that contained the words and/or and often or that included unfamiliar words, such as medical terms. These authors concluded that thoughtful construction of questions could reduce conflicting responses on retest. Mellner[5] conducted a similar study, using a questionnaire of comparable length and a test-retest design, and obtained similar results. Questionnaire items containing words that rendered the question vague to the subject (eg, and/or) tended to be less reliable on retest than questions that were more simply worded. Mellner concluded, as did Collen, that some of the problems of unreliability could be attributed to the design of the questionnaire itself. In both of these studies, the test-retest reliability of the questionnaires was about 90%.

The purpose of our present study was to determine the test-retest reliability of a university health history questionnaire used to screen incoming students. We anticipated that reliability of responses would be at least 90%, the level generally found in previous studies. The focus of our study was on the reliability characteristics of the questionnaire, and we anticipated that demographic characteristics of the subjects would not be a significant determinant. We assumed that questions with dichotomous yes/no response options would, in general, have more consistent responses than open-ended questions because of the limited options that these types of questions offered.


To gain better insight into problems with reliability in this particular questionnaire, we planned interviews with any subject who changed responses on more than 10% of the items on the questionnaire. The interviews were designed to identify common problems, either with the questionnaire itself or with environmental factors that might have influenced responses. We hoped that such data would help guide efforts to improve the reliability of the questionnaire.


Research Question

The general question to be answered was, What is the test-retest reliability of the responses to individual items and, overall, on this health history questionnaire? The following hypotheses were formulated from this research question:

1. Test-retest reliability will be influenced by the format and/or content of each question. (a) Questions with forced-choice yes/no answers will have a lower frequency of discordant answers and therefore a higher test-retest reliability than those questions without forced choices. (b) Questions using the word or, or that ask multiple questions at once, will have a higher frequency of discordant answers and therefore a lower test-retest reliability than those questions that do not. (c) Questions containing the words frequently or recently will have a higher frequency of discordant responses than those that do not.
2. The overall test-retest reliability of the questionnaire will be equal to or greater than .90.
3. No significant differences among the reliabilities of the questionnaire items will be based on demographic characteristics of the subjects.

METHOD

The study was approved by the Institutional Review Board of the university. The Personal Health History Questionnaire (PHHQ), the instrument we analyzed in this study, was developed as a screening instrument to identify incoming students who may require a physical examination before beginning their academic programs. This study used a test-retest design to determine the reproducibility of answers on the PHHQ by a volunteer sample of nursing and medical students. The interval between the first and second administrations of the questionnaire was 6 months. Discordant responses were those responses on the PHHQ that clearly disagreed with the response given to the same question during the first administration of the questionnaire. Reproducible responses were those responses that matched or were equivalent on both administrations of the questionnaire. The analyses focused on particular types or styles of questions, on the assumption that a sample of healthcare students would keep the contribution of unreliable-historian effects to a minimum.
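For concreteness, the comparison and classification scheme just described can be sketched in a few lines of code. This is an illustrative reconstruction under the definitions above, not the authors' analysis program; the function names and data layout are assumptions, and an equality test stands in for the 'matched or equivalent' judgment.

```python
def classify(first, second):
    """Classify one item's paired responses from the two administrations.

    Equality is used here as a stand-in for the 'matched or equivalent'
    judgment described in the text.
    """
    if first is None or second is None:
        return "missing"        # excluded from the reliability denominator
    if first == second:
        return "reproducible"
    return "discordant"


def test_retest_reliability(pairs):
    """pairs: iterable of (first_response, second_response) tuples,
    one per questionnaire item per subject."""
    counts = {"missing": 0, "discordant": 0, "reproducible": 0}
    for first, second in pairs:
        counts[classify(first, second)] += 1
    eligible = counts["reproducible"] + counts["discordant"]
    reliability = counts["reproducible"] / eligible if eligible else float("nan")
    return reliability, counts


# Toy example: three items answered on both administrations by one subject.
rel, counts = test_retest_reliability([("yes", "yes"), ("no", "yes"), (None, "no")])
print(rel, counts)  # 0.5 {'missing': 1, 'discordant': 1, 'reproducible': 1}
```

In practice, per-item results from such a comparison can also be tallied by questionnaire section or question format, which is how the analyses described below were organized.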

Subjects

The study sample included medical and nursing undergraduate and graduate students at a southwestern university medical center. Eligible students included those currently enrolled who had entered the university in the previous semester and had completed the PHHQ at that time. We collected data from a convenience sample of 70 eligible student volunteers who were willing to read and sign an informed consent form, complete the questionnaire, and return it to the investigator by the deadline date. In addition, volunteers consented to be interviewed if we requested an interview.

Instrument

The PHHQ contains five discrete sections. The first section, family history, addresses the subject's immediate family members' sex, age, and state of health. Also included are questions about the incidence of certain diseases among blood relatives. The amount of information provided is largely affected by the subject's family size. The second section, personal habits, contains 14 questions regarding smoking, alcohol consumption, and exercise behavior. The third section, medications and immunizations, contains 16 questions that ask about prescription medicines and the dates of immunizations for various diseases. The fourth section, medical history, asks open-ended questions about previous illnesses, hospitalizations, and operations; it also contains a series of questions specific to women's health, such as menstrual and gestational history. The fifth section, signs and symptoms, asks 64 yes/no questions about specific health complaints.

Procedure

Volunteer nursing students and medical students were recruited during their classes. Subjects who agreed to participate were given a packet of materials that included the PHHQ and a consent form, both of which were encoded with the subject's study identification number, and a self-addressed envelope for return of the questionnaire. The consent form gave explicit permission to retrieve and photocopy the original PHHQ that the subjects had completed when they entered school the previous semester. When we received the questionnaires in the mail, we retrieved subjects' original questionnaires, photocopied them, and encoded them with the subject's individual identification number. To ensure confidentiality, we blocked out all identifying information during reproduction. We compared PHHQ forms question by question for consistent responses between the two tests. Questions that did not elicit a response on both occasions were categorized as missing data. These missing responses were treated separately from discordant responses, and we excluded missing data from the reliability analysis. Patterns of missing data were analyzed separately.


Separate analyses were performed on (1) the overall reliability of the questionnaire, (2) the reliability of individual questions with forced-choice yes/no response options, (3) the reliability of individual yes/no response questions containing the word or or asking multiple questions simultaneously, (4) the reliability of individual yes/no response questions containing the words recently or frequently, and (5) the presence of significant differences in the reliability of the questionnaire related to the demographic characteristics of the subjects sampled. After we completed our data analysis, we selected a subject subgroup consisting of 11 persons who gave discordant answers on more than 10% of the questions to which they responded. We interviewed this subgroup to identify reasons for discordant answers and missing data.

RESULTS

Of 260 students eligible, 70 volunteered, yielding an effective response rate of 27%. The sample included 46 medical students (65.7%) and 24 nursing students (34.3%). Thirty-six of the subjects (51.4%) were women and 35 subjects (48.6%) were men. The age of the subjects ranged from 20 to 44 years, with a mean age of 26.6 years. Relative participation of students from classes invited to volunteer for the study was similar: 26.7% were from nursing courses and 27% from the medical school.

The range of the reproducibility, or reliability, of the PHHQ across subjects was from 81.2% to 100%. Across all subjects, 9,080 responses were identified for test-retest reliability analysis. From this total, 944 responses were eliminated because of missing data, leaving 8,136 responses eligible for reliability analysis. Of the 8,136 responses that could be compared, 516 had discordant responses and 7,620 had reproducible answers across both testings. Reproducible, or reliable, responses represented 93.6% of the total eligible questions analyzed. In most cases, the second questionnaire was less complete in total responses than the first questionnaire. The majority of the missing data were in the medications/immunizations section (72%), followed by family history (17%).
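For readers tracing the arithmetic, the overall figure follows directly from these counts; this is a restatement of the numbers above, not an additional analysis. (Table 1, which tallies 8,138 comparable responses, reports the corresponding value as 93.7%.)

```latex
\begin{aligned}
\text{eligible responses}     &= 9{,}080 - 944 = 8{,}136,\\
\text{reproducible responses} &= 8{,}136 - 516 = 7{,}620,\\
\text{overall reliability}    &= \frac{7{,}620}{8{,}136} = 0.9366 \approx 93.6\%.
\end{aligned}
```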


Questions that required dichotomous yes/no responses were primarily those in the final section of the questionnaire, which asked about signs and symptoms of various diseases. Of the 4,400 responses that could be analyzed, 158 were categorized as discordant, yielding a reliability of 96.4% for the yes/no questions, the highest of any section of the questionnaire. This finding supported the speculation that questions with forced-choice yes/no responses would have a lower percentage of discordant answers than those questions without forced choices. For comparison, reproducibility of answers for all health information categories is shown in Table 1; the forced-choice yes/no format questions (96.4%) compared favorably with the other questionnaire categories.

We designed our analysis of items within the section of dichotomous yes/no questions to identify variations in reliability on the basis of the style of the question. We speculated that questions using nonspecific terminology, such as the words frequently or recently, or items that asked multiple questions simultaneously, might be interpreted by the subject in a more ambiguous manner and would be more prone to nonreproducible responses over time. This proved not to be the case. The mean reliability for questions using terms such as frequently was 97.1%; for terms such as recently, 93.8%; and for multiple questions, 95.9%.

One interesting finding related to items in this section was the identification of the questions that had the lowest reliability on retest. Of the 64 questions in this section, 3 demonstrated a markedly higher rate of changed responses than the other items. The questions were: (1) Have you been overly tired? (80.0%); (2) Have you had anxiety/depression/emotional difficulty? (81.2%); and (3) Trouble sleeping? (82.6%). In the case of these questions, most responses were changed from negative in the first testing to positive in the retest 6 months later.
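The wording-based grouping used in this item analysis amounts to a simple keyword check over the item text. The sketch below is a hypothetical illustration only: the category rules, item texts, and reliability values are placeholders, not the actual PHHQ items or results.

```python
def wording_category(question_text):
    """Crude keyword grouping mirroring the wording categories examined above."""
    text = question_text.lower()
    if "frequently" in text:
        return "frequently"
    if "recently" in text:
        return "recently"
    if " or " in text or "/" in text:
        return "multiple/compound"
    return "plain"


def mean_reliability_by_category(items):
    """items: list of (question_text, item_reliability) pairs."""
    groups = {}
    for text, reliability in items:
        groups.setdefault(wording_category(text), []).append(reliability)
    return {category: sum(values) / len(values) for category, values in groups.items()}


# Hypothetical item texts and per-item reliabilities, for illustration only.
items = [
    ("Have you recently lost weight?", 0.94),
    ("Do you frequently have headaches?", 0.97),
    ("Have you had anxiety/depression/emotional difficulty?", 0.81),
]
print(mean_reliability_by_category(items))
```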

TABLE 1
Distribution of Discordant Responses by Health Information Category

Category                      Total responses   Discordant responses   Percentage discordant   Category reliability (%)
Family history                     1,342                 69                     5.1                     94.9
Personal habits                      967                 84                     8.7                     91.3
Medications/immunizations            442                124                    28.0                     72.0
Medical history                      987                 81                     8.2                     91.8
Signs/symptoms                     4,400                158                     3.6                     96.4
Total                              8,138                516                     6.3                     93.7


Demographic variables showed the sample to be relatively homogeneous with respect to age and race. Groups that were evenly distributed, so that meaningful comparisons could be made, were those based on sex and program of study because these groups were more similar in numbers. Age groups were less balanced, but they contained enough variation for comparison. Comparisons of reliability on the basis of sex revealed little variation between the groups of men and of women. All groups performed similarly in terms of reliable responses and maintained a level of consistency above 90%. Comparisons of reproducible responses between the different subgroups of the sample (see Table 2) suggest that demographic characteristics did not influence performance on the reliability of the questionnaire.

Follow-up Interviews

After the data had been analyzed, we conducted interviews with 9 of the 11 subjects who had less than 90% reliability on their test-retest analyses. Subjects in the interview group had reproducible responses on the questionnaire ranging from 81.2% to 89.7%. The average reliability of the interview group was 86.2%. The interviews sought to reconcile differences between discordant responses and to determine why answers changed over time. Subjects were also asked to describe the circumstances under which the questionnaires were initially completed and were invited to offer any other reactions to the questionnaire. Among the 9 subjects interviewed, we reconciled 162 discordant questionnaire items. The most common reason for differing answers between the questionnaires (43.3%) was the occurrence, between the two administrations, of some event that changed the response on the retest. Most often, this was because of an illness that had occurred during school. In other instances, it was the result of subjects' having learned new information about family health history, recent updating of immunizations, or a change in personal health habits.

TABLE 2
Percentage Reproducible Responses by Sex, Program, and Age

Group             n    Reproducible responses (%)
Sex
  Female         36    92.9
  Male           34    94.5
Program
  Nursing        24    92.6
  Medical        46    92.3
Age (years)
  < 25           41    93.9
  25-30          13    94.7
  31-40          12    92.3
  > 40            4    91.5

The second most common reason for changing responses on the retest (20.4%) was uncertainty about specific dates of immunization. The immunization section of the questionnaire contained, as a group, the highest proportion of discordant responses and missing data. Other reasons for having changed answers included interpreting the question differently on retest (13.0%), "I forgot about that" (11.7%), "I don't know why I answered the question that way" (8.6%), and oversight of the question while completing the form (2.4%). One subject indicated that certain information regarding personal habits was withheld on the initial questionnaire because he didn't want the school to know about some of his health behaviors.

The majority of the subjects interviewed had completed the first questionnaire at home, before coming to the university. Most students had their medical records, including immunization records and other information, at home when they completed the questionnaire the first time. It can also be assumed that many were able to ask their parents about questions of which they were unsure. Most of the participants said that, when they filled out the questionnaire initially, they were aware that it was a screening tool and that they might be called in for a physical examination if their responses suggested a health problem. Many subjects stated that a physical examination was viewed as an undesirable outcome of filling out the questionnaire. Virtually all subjects revealed that knowing of the possibility of receiving a physical examination may have influenced their responses on the sections related to health complaints in the first questionnaire.

DISCUSSION

Our findings in this study support the conclusion that the PHHQ is a reliable instrument, as generally defined for such questionnaires. Although this alone supports the questionnaire as an appropriate tool for collecting health data, our study provides information that may be helpful in improving its performance even further, keeping in mind that generalizability may be limited owing to the modest response rate of the students. The recommendations that stem from this investigation address two general areas of the data collection process that can introduce a systematic bias that may have an impact on the reliability: administration methods and format of the questions.

Interviews suggested that data collection is more complete if the questionnaires are mailed to the students' homes before orientation. This appears to be especially true in the case of questions that require reference to medical and immunization records. This type of information is not easily recalled by the students, and they are prone to either leave the questions blank or guess inconsistently. Other problems may be avoided by home administration.


Students indicated that during orientation they were distracted by their new environment and by being required to fill out many forms during the first days of class. Also, a time of change and adjustment may not be the best time to gather baseline health data. The stress of relocation and adjustment may influence responses. Factors related to the first days of enrollment in school, such as administration methods, subject fatigue, environmental conditions, situational contaminants, personal response biases, and transitory personal factors, may also affect reliability. If some of these factors can be minimized by collecting these data earlier, then the systematic bias of the questionnaire can be reduced.

The area of question format raises issues regarding the best way to fashion questionnaire items. Although rigidly structured questions are more reliable, they can be less revealing than open-ended questions when the information sought may be highly individualized. It may be possible to modify certain questions dealing with areas such as family history or personal habits that are directed to specific risks. This may improve the reliability of the questionnaire without sacrificing the information desired.

The issue of missing data touches on several points previously mentioned. The large amount of missing data that can be attributed to the absence of medical records supports the recommendation for having the students complete the questionnaires at home before going to school. The second highest percentage of missing data was in the family history section. For many subjects, these missing data were due to misreading the instructions and not providing requested data. Another source of missing data was the inability of the subject to recall a relative's health problem that was reported in the first questionnaire. These two areas relate to the issues of administration process and clarity of the instructions. Again, access to parents may be a primary factor.

Health is often described as a dynamic quality that is in a state of flux. With this in mind, the concept of reliability that implies stability over time may be undesirable in an instrument of general health status if it means that the measure does not reflect true changes. To be useful, an instrument must be sensitive to, and reflect, true changes. The characteristic of sensitivity would require that, over a 6-month period, some changes should be detected if the instrument is to be clinically relevant to the population being tested. Although the test-retest design is the definitive design for measuring reliability as stability over time, its use must be qualified when one is measuring an unstable characteristic, such as health status. Some of the answers treated as discordant were truthful and correct in their respective time frames. Treating those responses as discordant worked to reduce the estimate of reliability reported for the PHHQ from 96.4% to 93.7%.

The largest group of discordant answers (43%) was attributed to events that transpired between the testing periods. This finding supports the questionnaire's ability to be sensitive to changes in the health status of the subjects. It also suggests that the intervening period of the test design may have been longer than is desirable for this type of study. The next most frequent cause of discordant responses was attributed to the absence of medical records, which required the subject to guess on the immunization dates, but other reasons cited for discordant responses were linked to changes in the perceptions of questions, memory lapses, and nonspecific causes. Again, the presence of medical records and the clarification of questions and instructions may have reduced these numbers significantly. These findings suggest that such factors as administrative methods may be a significant influence on reliability in other settings, including outpatient clinics. The waiting room may not be the most appropriate setting for administering questionnaires for history taking.

In analyzing the data on signs and symptoms, we noted that the responses that changed most were those that dealt with complaints most likely to be related to stress (eg, recent excessive fatigue, emotional difficulties, anxiety, and difficulty in sleeping). Most of the changes in responses over time were from negative to positive, a pattern that pointed to a growing list of complaints not present at the beginning of the school term. Interviews verified the actual changes as being true events that had occurred during the period between the first and second test of the questionnaire. Although these findings were not within the scope of the primary questions addressed by our research, they reflect a health-related phenomenon that may be of interest to university-based healthcare professionals and may merit further investigation.

In conclusion, the reliability of a health screening questionnaire represents a vital, intrinsic quality of this form of data collection instrument. In settings where these tools are used, an assessment of reliability may be of great value to clinicians who rely on questionnaires in making clinical decisions. In doing so, clinicians can identify the strengths and weaknesses of the instrument, thus justifying its continued use and guiding attempts to improve performance.

ACKNOWLEDGMENTS

We would like to express our appreciation to Christine Boodley, RN, PhD, and Linda Rounds, PhD, for their support and suggestions in the development of this project and to Janie Regnier for her expert editorial assistance in the preparation of this manuscript.

REFERENCES

1. Sacks J, Krushat M, Newman J. Reliability of the health hazard appraisal. Am J Public Health. 1980;70(7):730-732.
2. Kern D, Barker LR. Preventive medicine in ambulatory practice. In: Barker LR, Burton JR, Zieve PD, eds. Principles of Ambulatory Medicine. 2nd ed. Baltimore: Williams & Wilkins; 1986:16-29.

3. Pecoraro R, Inui T, Chen M, Plorde D, Heller J. Validity and reliability of a self-administered questionnaire. Public Health Rep. 1979;94(3):231-238.
4. Collen M, Cutler J, Siegelaub A, Cella R. Reliability of a self-administered medical questionnaire. Arch Intern Med. 1969;123:664-681.



5. Mellner C. The self-administered medical questionnaire. Acta Chir Scand Suppl. 1970;406:63-72.
