Ann. Occup. Hyg., 2014, Vol. 58, No. 5, 612–624 doi:10.1093/annhyg/meu012 Advance Access publication 3 March 2014

Systematically Extracting Metal- and Solvent-Related Occupational Information from Free-Text Responses to Lifetime Occupational History Questionnaires

Melissa C. Friesen1*, Sarah J. Locke1, Carina Tornow2, Yu-Cheng Chen1, Dong-Hee Koh1, Patricia A. Stewart1,3, Mark Purdue1 and Joanne S. Colt1

1. Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, 6120 Executive Boulevard, Room 8106, MSC 7240, Bethesda, MD 20892-7240, USA
2. Westat, 1600 Research Boulevard, Rockville, MD 20850, USA
3. Stewart Exposure Assessments, LLC, 6045 N 27th Street, Arlington, VA 22207, USA
*Author to whom correspondence should be addressed. Tel: +1-301-594-7485; fax: +1-301-402-1819; e-mail: [email protected]
Submitted 23 August 2013; revised 13 January 2014; revised version accepted 27 January 2014.

Abstract

Objectives: Lifetime occupational history (OH) questionnaires often use open-ended questions to capture detailed information about study participants’ jobs. Exposure assessors use this information, along with responses to job- and industry-specific questionnaires, to assign exposure estimates on a job-by-job basis. An alternative approach is to use information from the OH responses and the job- and industry-specific questionnaires to develop programmable decision rules for assigning exposures. As a first step in this process, we developed a systematic approach to extract the free-text OH responses and convert them into standardized variables that represented exposure scenarios.

Methods: Our study population comprised 2408 subjects, reporting 11 991 jobs, from a case–control study of renal cell carcinoma. Each subject completed a lifetime OH questionnaire that included verbatim responses, for each job, to open-ended questions including job title, main tasks and activities (task), tools and equipment used (tools), and chemicals and materials handled (chemicals). Based on a review of the literature, we identified exposure scenarios (occupations, industries, tasks/tools/chemicals) expected to involve possible exposure to chlorinated solvents, trichloroethylene (TCE) in particular, lead, and cadmium. We then used a SAS macro to review the information reported by study participants to identify jobs associated with each exposure scenario; this was done using previously coded standardized occupation and industry classification codes, and a priori lists of associated key words and phrases related to possibly exposed tasks, tools, and chemicals. Exposure variables representing the occupation, industry, and task/tool/chemicals exposure scenarios were added to the work history records of the study respondents. Our identification of possibly TCE-exposed scenarios in the OH responses was compared to an expert’s independently assigned probability ratings to evaluate whether we missed identifying possibly exposed jobs.

Results: Our process added exposure variables for 52 occupation groups, 43 industry groups, and 46 task/tool/chemical scenarios to the data set of OH responses. Across all four agents, we identified possibly exposed task/tool/chemical exposure scenarios in 44–51% of the jobs in possibly exposed occupations. Possibly exposed task/tool/chemical exposure scenarios were found in a nontrivial 9–14% of the jobs not in possibly exposed occupations, suggesting that our process identified important information that would not be captured using occupation alone. Our extraction process was sensitive: for jobs where our extraction of OH responses identified no exposure scenarios and for which the sole source of information was the OH responses, only 0.1% were assessed as possibly exposed to TCE by the expert.

Conclusions: Our systematic extraction of OH information found useful information in the task/chemicals/tools responses that was relatively easy to extract and that was not available from the occupational or industry information. The extracted variables can be used as inputs in the development of decision rules, especially for jobs where no additional information, such as job- and industry-specific questionnaires, is available.

Published by Oxford University Press on behalf of the British Occupational Hygiene Society 2014.

Downloaded from http://annhyg.oxfordjournals.org/ at University of Georgia on June 15, 2015

Introduction

Most population-based studies use lifetime occupational history (OH) questionnaires as the starting point for evaluating occupational risk factors. These questionnaires systematically collect information from study subjects for each job held using a series of open-ended questions, including job title, company name, product made or service provided by the company, and the job’s start and stop years. Variants of these questionnaires also ask questions on work frequency patterns, the main tasks and activities performed in the job, the tools and equipment used, and the chemicals and materials handled.

The OH responses serve a dual purpose. First, they may be used to code jobs into standardized occupation and industry classification codes (e.g. SOC and SIC, respectively). These codes can be directly used in epidemiologic analyses of occupation and/or industry or they can be used to link the reported jobs to exposure metrics in population-based job-exposure matrices (JEMs) that can be used in epidemiologic analyses for the relevant agents. However, neither type of analysis captures important differences in tasks and exposure between people with the same job title (Bouyer and Hemon, 1993; Stewart and Stewart, 1994; Kromhout and Vermeulen, 2001; Teschke et al., 2002). Second, OH responses may be used during a job-by-job review by exposure assessors. This expert review can capture more of the within-job variability in exposure than analyses by occupation, industry, or JEMs; however, it is time consuming and the rationale for the exposure decision rules is rarely published.

The utility of the OH responses in the exposure assessment process was recently reported in a study that developed programmable decision rules to assign estimates of occupational diesel exhaust exposure in a population-based case–control study of bladder cancer (Pronk et al., 2012). The resulting estimates of probability, intensity, and frequency developed for diesel exhaust exposure using only the OH responses had moderate to moderately high agreement with estimates based on more detailed occupation- and industry-specific questionnaires that contained diesel exhaust–related questions (2749 jobs, proportion of agreement: 71–84%; weighted kappa: 0.50–0.74), providing evidence for the utility of OH responses in the exposure assessment process. To use the OH responses as inputs in the programmable rules, however, the authors first had to convert the free-text information in the OH responses into standardized variables identifying exposure scenarios (e.g. exposed occupations and industries; exposure sources) where occupational diesel exhaust exposure was likely to have occurred. They derived these variables manually using text-based filters in Microsoft Excel, a process that was neither transparent nor transferable to other studies.

In this study, our aim was to develop a more systematic and transferable approach for extracting free-text OH responses and converting them into standardized exposure variables representing exposure scenarios. This is the first step in the process of developing decision rules that incorporate information from both the occupational histories and the job- and industry-specific modules. The development and application of decision rules for assessing exposures for jobs reported by our study participants will be described separately.

Keywords: cadmium; chlorinated solvents; exposure assessment methodology; lead


Methods

Study population and data collection

The study population comprised 2408 subjects, reporting 11 991 jobs, who were enrolled in a population-based case–control study of renal cell carcinoma that was conducted in Chicago and Detroit (Colt et al., 2011; Purdue et al., 2011). We describe only the study components related to occupation here. Trained interviewers administered to all subjects a structured interview that included a lifetime OH where subjects reported all jobs held for a minimum of 12 months from the age of 16 years. The OH comprised open-ended questions that asked, for each job reported, the job title, employer name, job start and stop year, employer’s activities (product made or service provided), work frequency (i.e. hours per day, days per week, months per year), main activities or duties (hereafter, task question), the tools and equipment used (hereafter, tool question), and the chemicals or materials used (hereafter, chemical question). Each job was assigned a four-digit SOC (US Department of Commerce, 1980) and a four-digit SIC (Office of Management and Budget, 1987). Key words in the responses to the OH questions triggered additional questions (modules) if the given job was linked to 1 of 23 occupations (e.g. welders, painters) or 13 industries of interest (e.g. chemical industry, textile industry) and if the job was held for a minimum of 3500 h. All jobs were asked OH questions; 9309 jobs (78%) had additional information from completed modules.

Deriving exposure variables representing exposure scenarios from the OH responses

We used a multistep process (A–H) shown in Fig. 1 to derive exposure variables identifying exposure scenarios from the OH responses. Each step is described in detail below.

A. Exposure scenarios in the literature: We reviewed the published literature to identify exposure scenarios where exposure to each of four agents, namely chlorinated solvents as a group, trichloroethylene (TCE) in particular, lead, and cadmium, could be expected to have occurred. The agents were chosen based on ongoing exposure assessment needs in this study. For each exposure scenario, we identified related occupations, industries, tasks, tools, and chemicals. Examples of exposure scenarios related to chlorinated solvents included the occupations of mechanics, janitors, dry cleaners, and painters; the automobile manufacturing and the textile industries; tasks and tools associated with degreasing, gluing, and printing (e.g. printing press); and chemicals that indicated solvents, chlorinated solvents (e.g. TCE), or solvent-containing products (e.g. paint remover). Some exposure scenarios overlapped but were kept separate because the definitions varied slightly by agent. For example, chemical paint removal was identified with chlorinated solvent exposure, whereas chemical and mechanical paint removal was identified with lead exposure.

B and C. Occupation- and industry-related exposure scenarios: Our literature review identified 51 occupation groups and 43 industry groups associated with exposure to one or more of the four agents of interest. We added an additional occupation group to identify administrative jobs because these occupations usually indicate the absence of exposure and thus this designation is expected to be a useful input in future exposure decision rules. We assigned one or more relevant four-digit SOC codes to each occupation group, and one or more relevant four-digit SIC codes to each industry group. We created two spreadsheets, one for occupation and one for industry, with three columns each: the first column listed a unique record identifier, the second column contained a four-digit SOC or SIC code, and the third column listed the occupation or industry exposure scenario group associated with that SOC or SIC code. Only SOCs and SICs associated with administrative occupations or with occupation groups related to exposure scenarios to at least one of the four agents were listed in the spreadsheet; all other SOCs and SICs were excluded.

D. Task-, tool-, and chemical-related exposure scenarios: For exposure scenarios related to task, tools, or chemicals, a team of industrial hygienists used their professional judgment to develop a list of character strings that represented words and phrases (hereafter, key words) that may occur in the subjects’ responses. For example, the painting scenario was linked to specific responses, such as spray paint (from the task question), airless sprayer (tool question), and paint remover (chemical question). The team identified additional key words by reviewing lists of comma-parsed free-text OH responses from the occupational histories. Key words included truncated words, complete words, and phrases (e.g. ‘icide’ identified ‘insecticide’, ‘pesticide’, ‘herbicide’). Where truncated words or key words identified activities related to one or more exposure scenarios and/or an unexposed situation (e.g. ‘gas’ identified ‘pumping gas’, ‘read gas meters’, ‘inspect gaskets’), we used phrases to make the link to the appropriate scenario (e.g. ‘pump gas’ and ‘gas fill up’). Misspellings (e.g. aluminum was misspelled as ‘aluminim’, ‘alumiun’, and ‘alumunum’)


Fig. 1. Process for extracting free-text OH responses into exposure scenario–related variables linked to the study subjects’ jobs.


identified in the lists of comma-parsed free-text OH responses and other permutations (e.g. ‘pumping gas’ and ‘pumped gas’) were included as separate key words. Three separate spreadsheets, one each for the task, tool, and chemical questions, were developed by the industrial hygienists to record the key words and their applicable exposure scenarios. Table 1 provides an excerpt for the ‘chemicals’ question. Complete spreadsheets for all variables can be obtained from the corresponding author. Each spreadsheet had multiple columns: the first column listed the unique record identifier, the second column listed the key word, and the subsequent columns listed all possible exposure scenarios relevant to that OH question (e.g. for the ‘chemicals’ spreadsheet shown in Table 1, the chemical-related exposure scenarios were TCE, chlorinated solvents, solvents, chemical degreasers, paint, glues, inks, lubricants, and pesticides). When a key word was associated with an exposure scenario, a ‘1’ was recorded in the corresponding cell; otherwise, the cell remained empty. Key words could be associated with multiple exposure scenarios within each spreadsheet.

E. Linking exposure scenarios to study subjects to derive exposure variables: A customized SAS macro program using SAS 9.1.2 for Windows (SAS Institute Inc., Cary, NC, USA) was developed to read the search terms (i.e. SOC, SIC, key words) in the spreadsheets. The SAS macro converted each line from the spreadsheets into a conditional statement (if…, then…) to search the data set containing the OH responses and assigned the subject–job record a value of ‘1’ for that exposure scenario if the key word was found. The search was not case sensitive; in addition, the program removed spaces between words when searching for a match (e.g. ‘pump gas’ became ‘pumpgas’). For each exposure scenario listed in the spreadsheets, the macro added corresponding exposure variables (with responses ‘1’ or missing) to the subjects’ work history records. For example, a subject reporting his or her tasks for a given job as ‘cleaning parts and gluing them together’ was assigned a value of ‘1’ for both the degreasing and gluing task variables; no value was added to the other task variables for that job. To capture the absence of information in the subject’s responses, we created a variable that indicated whether there was a ‘none’ response to the tool question, another for a ‘none’ response to the chemical question, and a third that identified when both the tool and chemical questions had a response of ‘none’.

F. Exposure variable review: After applying the macro to add exposure scenario–related variables to the data set containing the OH responses, an industrial hygienist reviewed the free-text responses with a ‘1’ in at least one exposure variable, excluding the ‘none’ scenarios, to verify that the reported information was appropriately linked to the assigned exposure scenario. The key word lists were revised based on this review (for example to be more specific to avoid false matches) and the program rerun. After the final application of the macro, an industrial hygienist conducted a second review of the free-text responses linked to each exposure variable. In this review, false positive identifications were revised directly in the data set of OH responses using Stata/SE v.11.2 for Windows (StataCorp LP, College Station, TX, USA). For example, a job record with the job title ‘IV (intravenous) technician’ that had a response of ‘starting IVs, changing fluids’ to the task question had been assigned by the macro a value of ‘1’ for the scenario ‘vehicle repair’ based on the key word ‘changing fluids’; the positive flag for vehicle repair was removed for this record. The second review identified 117 false positive identifications for chlorinated solvent–related variables and 268 false positive identifications for lead- and cadmium-related variables.

G. Merging task, tool, and chemical exposure variables: We reviewed the extracted


Table 1. Excerpt of key words, including observed misspellings, and the structure of the spreadsheet developed to search free-text responses to the OH ‘chemicals used’ question for linkage to chemical-related exposure scenarios
Exposure scenario columns: TCE; Chlorinated solvents; Solvents; Chemical degreasers; Paint; Glues; Inks; Lubricants; Pesticides

1

TAP-MAJIC

1

1

1

2

TRI CHLORO ETHYLENE 1

1

1

3

TRICHLORETHYLENE

1

1

4

CHLOROFORM

1

1

5

DRY CLEANING

1

1

6

FREON

1

1

7

INK CLEANER

1

8

TOLUENE

1

9

DECREASER

1

10 METAL WASHING SOLVENTS

1

11 PARTS CLEANER

1

1

1

12 ENAMELS

1

13 PAINT

1

14 PRIMERS

1

15 GLUE

1

16 BLUEPRINT

1

17 INK FROM NEWSPAPER

1

18 PLOTTER INK

1

19 AIR TOOL OIL

1

20 HYDRALIC

1

21 245T

1

22 agent orange

1

23 INSECTICIDE

1

The full list can be obtained from the corresponding author. a Misspellings are listed intentionally, to reflect the need to include misspellings in the linkage process.
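The key-word linkage described in step E (case-insensitive search with spaces removed, one flag per matched scenario) can be illustrated with a minimal Python sketch. This is not the authors' SAS macro: the function names `normalize` and `flag_scenarios`, the dictionary layout, and the scenario assignments of the key words are all assumptions made for illustration.

```python
import re

# Hypothetical excerpt of a key-word spreadsheet: key word -> scenarios marked '1'.
# The assignments below are illustrative assumptions, not the study's actual lists.
KEYWORDS = {
    "PAINT": ["paint"],
    "GLUE": ["glues"],
    "PLOTTER INK": ["inks"],
    "icide": ["pesticides"],      # truncated key word
    "pump gas": ["gasoline"],     # phrase used instead of the ambiguous 'gas'
    "pumping gas": ["gasoline"],  # permutation listed as a separate key word
}

def normalize(text):
    """Upper-case the text and drop all whitespace ('pump gas' -> 'PUMPGAS')."""
    return re.sub(r"\s+", "", text.upper())

def flag_scenarios(response, keywords=KEYWORDS):
    """Return the set of scenarios whose key words occur in a free-text response."""
    haystack = normalize(response)
    found = set()
    for key_word, scenarios in keywords.items():
        if normalize(key_word) in haystack:
            found.update(scenarios)
    return found
```

In this sketch, `flag_scenarios("Pumping gas, used insecticide")` flags the hypothetical gasoline and pesticides scenarios, while `flag_scenarios("read gas meters")` flags nothing. Because spaces are removed before matching, a response such as ‘replace pump gaskets’ would also match ‘pump gas’, which is the kind of false positive that the review in step F is meant to catch.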

variables to combine, into a single variable, those task, tool, and chemical exposure variables that were related to the same exposure scenario. For example, to create a single variable for the exposure scenario ‘painting’, we merged all relevant information from variables representing painting from the task question (e.g. painting), the tool question (e.g. paint equipment), and the chemical question (e.g. paints) to form a single task/tool/chemical variable. There were two occasions where we did not merge


ID Key worda

variables. First, we kept separate any variable considered too generic to be combined with more specific information from other OH questions. For example, the tool-related variables for construction were frequently identified within nonconstruction-related jobs (e.g. a hammer reported by a jewelry maker; a saw reported by an auto technician); as a result, the final construction task/tool/chemical variable was based solely on the task question. Second, we did not combine a chemical variable with its associated task and tool variables when the mentioned chemical could be used in multiple exposure scenarios. For example, the chemical scenario ‘chlorinated solvents’ was associated with multiple tasks and tools (i.e. painting, gluing, and degreasing) and thus was kept separate.

H. Final data set: At the end of this process, we had a data set linking each subject–job to 52 occupation, 43 industry, and 46 task/tool/chemical exposure variables related to exposure scenarios associated with the four agents. Descriptions for each variable are provided in Supplementary Table S1 (available at Annals of Occupational Hygiene online).

Evaluation of the extracted occupational information

We conducted descriptive and evaluative analyses of the data set using Stata/SE v.11.2 for Windows (StataCorp LP, College Station, TX, USA). Hereafter, we use the term ‘possibly exposed’ to indicate that a job was positively linked to an agent based on an occupation, industry, and/or task/tool/chemical exposure variable. The term does not indicate that an exposure decision was made. Future work, beyond the scope of this paper, will involve developing decision rules to estimate exposure that will consider both the OH responses and the module responses, focusing on the jobs with positive linkages in the current effort.

We began by identifying the occupational variables, industry variables, and task/tool/chemical exposure variables that were associated with the highest number of possibly exposed jobs in our study population. Our next step was to examine each agent separately; we calculated the number of jobs that were linked to each agent by possibly exposed exposure variables based only on occupation, only on industry, only on tasks/tools/chemicals, and finally on all three types of variables combined. To illustrate the variability in the identification of exposure variables within occupations, we then selected 10 possibly exposed occupational groups and examined the proportion of jobs in each group that were identified as possibly exposed to each agent based only on industry variables and only on the task/tool/chemical variables.

We then performed a limited comparison of this approach for TCE exposure, for which an industrial hygienist (who was not involved in the work described here) had previously developed estimates of the probability of exposure for each job. Using the OH and, when available, the job- and industry-specific modules, the industrial hygienist had assessed probability using the following categories. We calculated the number and proportion of jobs assessed by the expert as having a probability of TCE exposure ≥1% for each combination of exposure variable types.

Results

The most frequently identified occupation, industry, and task/tool/chemical exposure variables appearing in the OH responses of this study population are listed in Table 2, along with the agents of interest for each variable. The most common occupational group was administrative/clerk (1206 jobs), a variable that identified jobs that were unlikely to be exposed to any of the agents. Similarly, the most frequent task/tool/chemical variables were a ‘none’ response to the tool question (6600 jobs) and to the chemical question (6903 jobs). Among the occupations with possible exposure to one or more of the agents, the most frequent was assembly workers (321 jobs). The most common industry was automotive manufacturing (1277 jobs), which is highly prevalent in Detroit, one of the study sites. The most common task/tool/chemical variables were ‘driving gas-powered vehicles’ (753 jobs), ‘fabricating or machining tasks that may use solvents’ (504 jobs), and ‘fabricating metal parts’ (408 jobs). The most commonly identified chemicals were gasoline (139 jobs), lubricants (126 jobs), and chlorinated solvent–containing chemicals (109 jobs). Table 3 shows the distribution of identified scenarios by type of exposure variable (job, industry, or


Table 2. Most prevalent occupation-, industry-, and task/tool/chemical-related exposure variables in the study population, and agent of interest for each variable Exposure variable

N jobs identified

Agent of interesta Chlorinated solvents

TCE

Lead

Cadmium

Most common occupations  Administrative/clerk

1206 321

X

X

X

X

  Engineers and engineering technologists

281

X

X

X

X

  Machine operator, not metals or plastic

263

X

X

X

X

  Machine operator, NEC, solvent exposure

260

X

X

X

X

1277

X

X

X

X

  Military, national security

649

X

X

X

  Surgery or hospital care

536

X

X

  On-road vehicle transport

342

  Metal fabrication

340

Most common industries   Automobile manufacturing

X X

X

X

X

Most common task/tool/chemicals   Tool question: ‘none’

6600

  Chemical question: ‘none’

6903

Variables derived from only the task question, or combination of task, tool, and/or chemical questions   Drive gas-powered vehicles

753

  Fabricate or machine, solvent-related processes

504

X

X

X

X

  Fabricate metal parts

408

X

X

X

X

  Use, mix, test, sell paint

339

X

X

X

X

 Painting

338

X

X

X

X

  Solvent-containing chemicals

739

X

X

 Gasoline

139

 Lubricants

126

X

X

  Chlorinated solvent–containing chemicals

109

X

X

Variables derived from the chemical questions

  Lead-containing chemicals

69

X X X

NEC, not elsewhere classified. The full list of exposure scenario–related variables and their definitions are provided in Supplementary Table S1 (available at Annals of Occupational Hygiene online).
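The merging of task, tool, and chemical flags into a single task/tool/chemical variable (step G), including the exception for scenarios whose tool responses were too generic, can be sketched as follows. This is a minimal illustration under assumptions: the variable naming scheme (`task_`, `tool_`, `chem_` prefixes), the function names, and the example job record are hypothetical and are not taken from the study's data set.

```python
def merge_flags(task_flag, tool_flag, chemical_flag):
    """Combine question-specific flags (1 or None) into one flag: 1 if any is 1."""
    return 1 if 1 in (task_flag, tool_flag, chemical_flag) else None

def merge_job(job, scenarios, task_only=()):
    """Build merged task/tool/chemical variables for one subject-job record.

    job       -- dict of hypothetical variables such as 'task_painting': 1
    scenarios -- scenario names to merge
    task_only -- scenarios based solely on the task question (e.g. construction,
                 whose tool responses were judged too generic to merge)
    """
    merged = {}
    for scenario in scenarios:
        task = job.get(f"task_{scenario}")
        if scenario in task_only:
            merged[scenario] = 1 if task == 1 else None
        else:
            merged[scenario] = merge_flags(
                task, job.get(f"tool_{scenario}"), job.get(f"chem_{scenario}")
            )
    return merged

# Hypothetical record: paint equipment reported as a tool, and a hammer
# matched to the construction tool variable, with no construction task reported.
example_job = {"tool_painting": 1, "tool_construction": 1}
example_merged = merge_job(
    example_job, ["painting", "construction"], task_only={"construction"}
)
```

Here the painting variable is set to 1 from the tool response alone, whereas the construction variable stays missing because only the task question is allowed to set it.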


 Assembler


Table 3. Number of jobs linked to each agent by occupation, industry, and task/tool/chemical exposure variables

| Agent | Type of question identifying possibly exposed variables | Scenarios identified with possible exposure to agent, N | % |
|---|---|---|---|
| Chlorinated solvent | Exposed occupation | 3609 | 30.1 |
| Chlorinated solvent | Exposed industry | 5279 | 44.0 |
| Chlorinated solvent | Exposed task/tool/chemical | 2225 | 18.5 |
| Chlorinated solvent | Any of the above | 6385 | 53.2 |
| TCE | Exposed occupation | 3608 | 30.0 |
| TCE | Exposed industry | 5249 | 43.8 |
| TCE | Exposed task/tool/chemical | 2084 | 17.4 |
| TCE | Any of the above | 6388 | 53.1 |
| Lead | Exposed occupation | 3530 | 29.5 |
| Lead | Exposed industry | 4915 | 41.0 |
| Lead | Exposed task/tool/chemical | 2469 | 20.5 |
| Lead | Any of the above | 6242 | 52.1 |
| Cadmium | Exposed occupation | 2650 | 22.1 |
| Cadmium | Exposed industry | 3949 | 32.9 |
| Cadmium | Exposed task/tool/chemical | 1634 | 13.6 |
| Cadmium | Any of the above | 4799 | 40.0 |

N, number; %, proportion of jobs.

task/tool/chemical) for each of the four agents. For example, 30.1% of the 11 991 jobs reported by study participants were linked to occupation variables with potential exposure to chlorinated solvents, while the corresponding percentages for industry variables and task/tool/chemical variables were 44.0 and 18.5%, respectively. For all four agents, industry variables were consistently linked to more possibly exposed jobs than were the occupation and task/tool/chemical-based

exposure variables. Overall, no exposure scenarios related to chlorinated solvents, TCE, lead, or cadmium were identified in 46.8, 46.9, 47.9, and 60.0% of the jobs, respectively. The prevalence of identified exposure scenarios, and the type (industry, or task/tool/chemical) of information identified, varied by occupation group and demonstrates the heterogeneity in the information from the OH responses (Table  4). Most precision metal workers and painters were identified as also being in possibly exposed industries (all agents: 98–99% for precision metal workers; 89–91% for painters) and associated with possibly exposed task/tool/chemicals (all agents: 81–83% for precision metal workers; 91–96% for painters). The majority of construction laborers were also in possibly exposed industries (all agents, 82%), but only 36–45% of the jobs were identified with possibly exposed tasks/tools/chemicals related to the four agents. Many administrative jobs were in possibly exposed industries (28–39%, varying by agent); however, only 1–5% of the administrative jobs had identifiable possibly exposed task/tool/ chemicals. We report the proportion of jobs identified for each occupation, industry, and task/tool/ chemical variable for each 3-digit SOC in the study in Supplementary Table S2 (available at Annals of Occupational Hygiene online). The number of possibly exposed jobs identified by each unique combination of variable types for TCE is shown in the first two columns of Table  5 (other agents not shown). Varying only slightly by agent, only 3–4% of the jobs were identified as possibly exposed based solely on occupation variables (group #5), 14–18% were identified solely by industry variables (group #7), and 2–3% were identified solely by task/tool/chemical variables (group #8). All three variable types combined (occupation, industry, and task/tool/chemical; group #2) identified possibly exposed jobs for 8–12% of the jobs. 
For jobs identified as possibly exposed based on occupation variables (groups #2, 3, 4, 5), 44–51% of the jobs also had possibly exposed task/tool/chemical variables identified [(#2+#4)/(#2+#3+#4+#5)]. In contrast, for jobs not identified as being in a possibly exposed occupation, a nontrivial 9–14% of the jobs had possibly exposed task/tool/chemical variables identified [(#6+#8)/ (#1+#6+#7+#8)].
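The group numbering and the two bracketed ratios above can be made concrete with a short Python sketch. The mapping of flag combinations to groups #1–#8 follows the row labels of Table 5; the function names and the toy job list are assumptions for illustration and do not reproduce the study data.

```python
from collections import Counter

# Group numbers keyed by (occupation, industry, task/tool/chemical) flags,
# following the row labels of Table 5.
GROUPS = {
    (False, False, False): 1,  # no variables identified
    (True, True, True): 2,     # occupation, industry, and task/tool/chemical
    (True, True, False): 3,    # occupation and industry
    (True, False, True): 4,    # occupation and task/tool/chemical
    (True, False, False): 5,   # occupation only
    (False, True, True): 6,    # industry and task/tool/chemical
    (False, True, False): 7,   # industry only
    (False, False, True): 8,   # task/tool/chemical only
}

def combination_group(occupation, industry, ttc):
    """Return the group number for one job's three possibly-exposed flags."""
    return GROUPS[(bool(occupation), bool(industry), bool(ttc))]

def ttc_proportions(counts):
    """Proportion of jobs with a task/tool/chemical flag, by occupation status.

    Returns ((#2+#4)/(#2+#3+#4+#5), (#6+#8)/(#1+#6+#7+#8)).
    """
    in_exposed_occ = (counts[2] + counts[4]) / (
        counts[2] + counts[3] + counts[4] + counts[5]
    )
    not_in_exposed_occ = (counts[6] + counts[8]) / (
        counts[1] + counts[6] + counts[7] + counts[8]
    )
    return in_exposed_occ, not_in_exposed_occ

# Toy data (hypothetical): one (occupation, industry, ttc) tuple per job.
toy_jobs = [
    (1, 1, 1), (1, 1, 0), (1, 0, 1), (1, 0, 0),
    (0, 1, 1), (0, 1, 0), (0, 1, 0), (0, 0, 1),
    (0, 0, 0), (0, 0, 0),
]
toy_counts = Counter(combination_group(*job) for job in toy_jobs)
toy_in_occ, toy_not_in_occ = ttc_proportions(toy_counts)
```

With this toy list, half of the jobs in possibly exposed occupations and one third of the remaining jobs carry a task/tool/chemical flag. The sketch assumes both denominators are nonzero.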


Table 4. For selected possibly exposed occupation groups, proportion of jobs with possibly exposed exposure variables based on industry or task/tool/chemicals

| Occupation group | N jobs | In possibly exposed industry: Chlorinated solvent/TCE (%) | In possibly exposed industry: Lead (%) | In possibly exposed industry: Cadmium (%) | With possibly exposed task/tool/chemical: Chlorinated solvent/TCE (%) | With possibly exposed task/tool/chemical: Lead (%) | With possibly exposed task/tool/chemical: Cadmium (%) |
|---|---|---|---|---|---|---|---|
| Carpenter | 51 | 90 | 90 | 89 | 73 | 69 | 69 |
| Construction laborer | 56 | 82 | 82 | 82 | 36 | 45 | 39 |
| Dry cleaner | 31 | 52 | 3 | 3 | 16 | 3 | 0 |
| Farmer, agricultural worker | 63 | 86 | 86 | 86 | 21 | 40 | 21 |
| Mechanic | 150 | 65 | 70 | 42 | 82 | 79 | 69 |
| Nurse | 140 | 77 | 4 | 1 | 4 | 0 | 0 |
| Painter | 90 | 90 | 91 | 89 | 96 | 92 | 91 |
| Physician, surgical assistant | 36 | 78 | 6 | 3 | 14 | 3 | 3 |
| Plumber | 42 | 76 | 76 | 76 | 38 | 69 | 31 |
| Precision metal workers | 145 | 99 | 99 | 98 | 83 | 83 | 81 |
| Administration | 1206 | 39 | 33 | 28 | 2 | 5 | 1 |

N, number.

Table 5. Comparison of possibly exposed TCE exposure variables to an expert’s assignment of TCE exposure from a one-by-one job review, by type of information (occupation, industry, and/or task/tool/chemical variable) identifying possible exposure

| Type of possibly exposed exposure variable identified in the OH | N jobs (% of all jobs) | N jobs identified as TCE exposed by an expert (% of jobs in stratum) |
|---|---|---|
| 1. No variables identified | 5606 (46.7) | 75 (1.3) |
| 2. Occupation, industry, and task/tool/chemical | 1326 (11.1) | 671 (50.6) |
| 3. Occupation and industry | 1448 (12.1) | 435 (30.0) |
| 4. Occupation and task/tool/chemical | 313 (2.6) | 175 (55.9) |
| 5. Occupation only | 522 (4.4) | 109 (20.9) |
| 6. Industry and task/tool/chemical | 315 (2.6) | 115 (36.5) |
| 7. Industry only | 2190 (18.3) | 184 (8.4) |
| 8. Task/tool/chemical only | 271 (2.3) | 52 (19.2) |
| At least one variable identified (sum of #2–#8) | 6385 (53.2) | 1741 (27.3) |

N, number of jobs.


Discussion

The derived exposure variables from the SOC and SIC codes and the free-text responses to the task, tool, and chemical OH questions can serve several purposes in retrospective exposure assessment efforts. We developed these variables as a first step in deriving decision rules that incorporate both the OH information and job- and industry-specific module information in future exposure assessment efforts. In contrast, with the exception of Pronk et al. (2012), recent approaches to develop decision rules have focused solely on using the module information (Fritschi et al., 2009; Behrens et al., 2012; MacFarlane et al., 2012; Carey et al., 2014). The derived OH variables can also be used to extract decision rules using statistical learning models, as in Wheeler et al. (2013), to improve the transparency of the exposure decision process.

The prevalence of exposure scenarios varied by type of question. Possibly exposed jobs were most frequently identified from occupation and industry variables, reflecting in part the ease of deriving variables from the previously assigned SOC and SIC codes. Although less prevalent than the occupation and industry variables, the tasks/tools/chemical variables also helped identify possibly exposed jobs. Although the prevalence of possibly exposed tasks/tools/chemical variables was three to five times higher in jobs identified in possibly exposed occupations than in jobs in nonidentified occupations, a nontrivial 9–14% of the jobs in nonexposed occupations had at least one identified task/tool/chemical variable. This finding showed that relying only on occupation, as is commonly done when JEMs are used, without considering reported tasks, tools, and chemicals, could result in exposure misclassification. The occupation, industry, and task/tool/chemical variables derived from the OH responses can be used together in future exposure assessments.

Support for deriving exposure variables from the tasks/tools/chemicals free-text responses was also found in our comparisons to the expert-assigned TCE estimates. For instance, for jobs without modules, combining the information from the occupation, industry, and task/tool/chemical variables missed capturing only two expert-assigned TCE-exposed jobs. Both of these jobs were barbers, who were assigned a very low probability of exposure to TCE by the expert. Even when module information was available to the expert, only 1% were missed by our process. Similarly, the prevalence of expert-assigned TCE-exposed jobs was generally higher when there was supportive information from the task, tool, and chemical questions compared to when only SOC or SIC was available. Our findings also show that jobs in possibly exposed industries have a much lower prevalence of exposure when there is no supporting information from the occupation or task/tool/chemical question. For instance, the prevalence of TCE-exposed jobs identified by the expert was only 8.4% when only a possibly exposed industry scenario was identified. This is not surprising since many jobs within an industry, such as administrative jobs, are not exposed.

This process has several limitations.
First, we can reasonably anticipate, based on subject burden and recall (Teschke et al., 2002), that extracting information from the OH free-text responses captured, at best, only the most common and frequently occurring tasks, tools, and chemicals for a job. Rarely reported scenarios may have been missed and would have required reviewing each job/OH response one-by-one to ensure complete capture, but as suggested here few jobs may be missed. Second, although within-job differences in the identification of task/tool/chemical variables reflect the natural within-job variability in activities, they may also reflect potential subject-specific differences in the

Downloaded from http://annhyg.oxfordjournals.org/ at University of Georgia on June 15, 2015

We compared the prevalence of possibly exposed jobs to the prevalence of jobs assessed as TCE exposed by an expert, shown in the last column of Table  5. Overall, expert-assigned TCE ratings (≥1% probability) were rare for jobs for which we found no possibly TCE-exposed variables (group #1, 1.3%) and were more prevalent for jobs with at least one possibly TCE-exposed variable (last row, 27.3%). The highest proportion of nonzero expert-based probability estimates was observed for jobs identified as being possibly exposed from occupation and task/tools/chemical variables (group #4, 55.9%). The lowest proportion of nonzero expert-based probability estimates was observed for possibly exposed jobs identified solely by industry variables (group #7, 8.4%). When restricted to jobs without modules (N  =  2682), for which the expert used only the OH responses to assign exposure, only two jobs (both barbers) for which there were no extracted exposure scenarios had been assigned as possibly TCE exposed by the expert (not shown).

Extracting free-text occupational information  •  623

by the study subjects (e.g. diesel exhaust, metals, and solvents). In contrast, our approach will be of limited use to extract information for rarer exposures or exposures from infrequent tasks whose exposure scenarios are less likely to be mentioned in the free-text information (e.g. polychlorinated biphenyls). In summary, in this study, we used SIC and SOC codes and extracted free-text responses to the task/ tool/chemical questions in the OH questionnaires to derive exposure variables that can be used in future exposure assessment efforts. This approach serves as a first step in our ability to include the OH responses when developing transparent, programmable decision rules to estimate occupational exposure. In addition, the process will help identify, for jobs with only OH responses, important differences in exposure between people with the same job title that would not be captured by a job exposure matrix. The key word list may also assist investigators of other studies in using the OH responses in their exposure assessment efforts. S u p p l e m e n ta ry   Data Supplementary data can be found at http://annhyg. oxfordjournals.org/. Funding Intramural Research Program of the Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health (Z01 CP1012219; Z01 CP010136-19). A c k n o w l e d g e m e n ts We thank industrial hygienists Elizabeth Boyle, Pabitra Josse, and Susan Viet of Westat for developing the lists of key words, phrases, and strings used to extract occupational information from the free-text responses. References Behrens T, Mester B, Fritschi L. (2012) Sharing the knowledge gained from occupational cohort studies: a call for action. Occup Environ Med; 69: 444–8. Bouyer J, Hemon D. (1993) Retrospective evaluation of occupational exposures in population-based case-control studies: general overview with special attention to job exposure matrices. Int J Epidemiol; 22 (Suppl. 2): S57–64. 
Carey RN, Driscoll TR, Peters S et al. (2014) Estimated prevalence of exposure to occupational carcinogens in Australia (2011-2012). Occup Environ Med; 71: 55–62.

Downloaded from http://annhyg.oxfordjournals.org/ at University of Georgia on June 15, 2015

completeness and specificity of the OH responses. For example, the ‘solvent-containing chemical’ variable, which included paint and paint strippers, was identified in only 48% of the painter jobs (not shown). This likely underestimated reporting by painters may have occurred because the respondent felt the response was obvious (e.g. a painter uses paints), repeated information provided in previous questions (e.g. a painter mentioning painting for the task question, paint tools for the tool question, or paint for the chemical question), or because the respondent did not consider ‘paint’ a chemical (perhaps because of its commonness). As a result, one must not interpret a lack of an identified task, tool, or chemical as the absence of exposure for a job without considering underreporting by subjects, potential for lower frequency events that may not have been mentioned, lack of the participants’ understanding of what was wanted, and recall bias, which must be considered in all exposure assessment approaches for population-based studies. Lastly, using the four-digit SOC and SIC codes to identify possibly exposed occupations may result in some misclassification over using the actual job title because of the often heterogeneous nature, with regard to exposure, of the SOCs and SICs. Our approach serves as a methodological framework when valuable occupational information is present in the form of free-text responses to open-ended questions but does not cover all exposure agents, possibly exposed scenarios, or potential key words. Developing the key word lists to extract the free-text responses was a moderately time-consuming task. Improved efficiency comes from the ability to apply these key word lists to other studies to evaluate the same agent. However, future work is needed to evaluate the resources that would be needed to apply these key word lists to other studies. 
We expect that future users of these key word lists may need to add other key words and variables that are relevant to their study population. For example, because this study included two urban centers, including Detroit, the study population overrepresented US automobile industry workers and underrepresented rural or other geographic-specific industries. Similarly, future work is needed to evaluate the usefulness of this approach for other exposures. We expect that it will be most useful to extract exposure scenarios from the task/tool/chemical variables when the exposure is reasonably prevalent and for which related activities will be more likely reported

624  •  Extracting free-text occupational information expert review of individual jobs. Occup Environ Med; 69: 752–8. Purdue MP, Colt JS, Graubard B et al. (2011) A case-control study of reproductive factors and renal cell carcinoma among black and white women in the United States. Cancer Causes Control; 22: 1537–44. Stewart WF, Stewart PA. (1994) Occupational case-control studies: I.  Collecting information on work histories and work-related exposures. Am J Ind Med; 26: 297–312. Teschke K, Olshan AF, Daniels JL et al. (2002) Occupational exposure assessment in case-control studies: opportunities for improvement. Occup Environ Med; 59: 575–93; discussion 594. US Department of Commerce. (1980) Standard occupational classification manual. Washington, DC: Office of Federal Statistical Policy and Standards. Wheeler DC, Burstyn I, Vermeulen R et  al. (2013) Inside the black box: starting to uncover the underlying decision rules used in a one-by-one expert assessment of occupational exposure in case-control studies. Occup Environ Med; 70: 203–10.

Downloaded from http://annhyg.oxfordjournals.org/ at University of Georgia on June 15, 2015

Colt JS, Schwartz K, Graubard BI et al. (2011) Hypertension and risk of renal cell carcinoma among white and black Americans. Epidemiology; 22: 797–804. Fritschi L, Friesen MC, Glass D et al. (2009) OccIDEAS: retrospective occupational exposure assessment in communitybased studies made easier. J Environ Public Health; 2009: 957023. Kromhout H, Vermeulen R. (2001) Application of job-exposure matrices in studies of the general population: some clues to their performance. Eur Respir Rev; 11: 80. Macfarlane E, Benke G, Sim MR et  al. (2012) OccIDEAS: An innovative tool to assess past asbestos exposure in the Australian Mesothelioma Registry. Saf Health Work; 3: 71–6. Office of Management and Budget. (1987) Standard industrial classification manual. Washington, DC: Executive Office of the President. Pronk A, Stewart PA, Coble JB et  al. (2012) Comparison of two expert-based assessments of diesel exhaust exposure in a case-control study: programmable decision rules versus

Systematically extracting metal- and solvent-related occupational information from free-text responses to lifetime occupational history questionnaires.
