Accident Analysis and Prevention 63 (2014) 138–145


External validity of a generic safety climate scale for lone workers across different industries and companies

Jin Lee a,b, Yueng-hsiang Huang a,∗, Michelle M. Robertson a, Lauren A. Murphy a,c, Angela Garabet a, Wen-Ruey Chang a

a Liberty Mutual Research Institute for Safety, Hopkinton, MA, USA
b University of Connecticut, Storrs, CT, USA
c Harvard School of Public Health, Boston, MA, USA

Article info

Article history:
Received 18 February 2013
Received in revised form 10 October 2013
Accepted 10 October 2013

Keywords: Generic safety climate; Lone workers; External validity; Measurement equivalence; Confirmatory factor analysis; Item response theory

Abstract

Purpose: The goal of this study was to examine the external validity of a 12-item generic safety climate scale for lone workers in order to evaluate the appropriateness of generalized use of the scale in the measurement of safety climate across various lone work settings. External validity evidence was established by investigating the measurement equivalence (ME) across different industries and companies.

Method: Confirmatory factor analysis (CFA)-based and item response theory (IRT)-based perspectives were adopted to examine the ME of the generic safety climate scale for lone workers across 11 companies from the trucking, electrical utility, and cable television industries.

Results: Fairly strong evidence of ME was observed for both organization- and group-level generic safety climate sub-scales. Although significant non-invariance was observed in the item intercepts across the different lone work settings, absolute model fit indices remained satisfactory in the most robust step of CFA-based ME testing. IRT-based ME testing identified only one differentially functioning item from the organization-level generic safety climate sub-scale, but its impact was minimal and strong ME was supported.

Implications: The generic safety climate scale for lone workers showed good external validity and supported the presence of a common feature of safety climate among lone workers. The scale can be used as an effective safety evaluation tool in various lone work situations.

© 2013 Elsevier Ltd. All rights reserved.

1. Introduction

1.1. The need for studying safety climate for lone workers

Safety climate refers to employees’ shared perception in regard to their organizations’ or supervisors’ true priority of safety over other competing organizational demands, such as cost reduction and productivity (Zohar, 1980). Safety climate can be seen as an organization’s temporal “state of safety” at a discrete point in time (e.g., Cheyne et al., 1998). Recent meta-analytic studies have consistently shown that safety climate is a strong leading indicator of behavioral safety and occupational injuries (Beus et al., 2010; Christian et al., 2009; Nahrgang et al., 2011). While the number of studies on safety climate has increased dramatically in recent years (Huang et al., 2010), most have focused on traditional work

∗ Corresponding author at: Center for Behavioral Science, Liberty Mutual Research Institute for Safety, 71 Frankland Road, Hopkinton, MA 01748, USA. Tel.: +1 508 497 0208; fax: +1 508 435 0482. E-mail address: [email protected] (Y.-h. Huang). 0001-4575/$ – see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.aap.2013.10.013

environments in which supervisors and workers interact in the same location throughout the day. Little research has been done to examine how a company’s safety climate influences lone workers. Although there is no universal definition of lone workers, they work in isolation from supervisors and colleagues due to the nature of their jobs (e.g., truck drivers, repair technicians, utility workers, teleworkers) and are usually not supervised closely (Ferret and Hughes, 2009). For some lone workers, safety largely depends on their own safety decision-making. For instance, speed limit compliance by truck drivers and proper ladder use by line workers cannot be mandated by remote supervision, but instead depend on the specific safety intentions and actions of the truck drivers or line workers themselves. Moreover, these mobile lone workers can experience greater safety risks under hazardous situations such as vehicle failure, inclement weather conditions, and violence (Health and Safety Executive, 2009). They have to deal with these issues independently with only limited access to timely assistance from co-workers or supervisors, which is more easily obtainable in non-lone working situations. Lone workers may be better prepared for the potential risks they face, and more compliant with organizational safety guidelines in the face of those risks, if they perceive safety is

J. Lee et al. / Accident Analysis and Prevention 63 (2014) 138–145

given a high priority by the company, even though they are away from direct supervision. Safety climate can potentially supplement the weaker impact of remote safety supervision for lone workers.

1.2. Lone workers’ individual/psychological level of safety climate perceptions

Since it has been shown that safety climate is a strong predictor of behavioral safety and occupational injuries, the question arises as to whether safety climate can be found among lone workers, given their sometimes hazardous, but solitary, working conditions. According to Christian et al. (2009), safety climate perceptions can be categorized into two levels of analysis, namely shared upper-level (e.g., unit, organization) safety climate and individual/psychological safety climate. Both levels of safety climate have been shown to be significant predictors of safety outcomes. Climate perceptions refer to the meaning employees attach to management policies, procedures and practices as well as to what is given higher priority in terms of the kinds of behavior likely to be rewarded or supported. Even though lone workers oftentimes work outside direct supervision, they can form perceptions of organizational safety values and attitudes through company directives and verbal interactions with their supervisors. Specifically, if supervisors devalue an organization’s new policies about inclement weather and work schedules for safety and, subsequently, if they are not supportive of their workers’ compliance with the policies, the organization’s safety efforts can be affected. Recently, trucking industry-specific (Huang et al., 2013a) and utility industry-specific (Huang et al., 2013b) safety climate scales for mobile lone workers were developed and validated. Although, in Huang et al. (2013a), it was found that truck drivers’ safety climate perceptions lacked enough between-group variance to aggregate individual safety climate perceptions to create upper-level safety climate variables for multi-level analysis, employees’ individual/psychological safety climate perceptions significantly predicted safe behaviors and injury outcomes for truck drivers. Moreover, it is common to utilize individual/psychological level responses to examine the psychometric properties of a safety climate measure such as construct validity or measurement equivalency (e.g., Cigularov et al., 2013). Thus, the present study utilizes lone workers’ individual/psychological level of safety climate perceptions to investigate the external validity of a generic safety climate scale for lone workers by examining its measurement equivalence.

1.3. Generic safety climate scale for lone workers

The generic safety climate scale, developed by Zohar and Luria (2005), consists of 16 items each for its organization- and group-level sub-scales (32 items total) and was grounded on a single managerial commitment dimension, which is similar to Griffin and Neal’s (2000) global safety climate factor. Organization- and group-level in this context indicate the different referent objects of employees’ safety climate perceptions, rather than levels of analysis. Zohar (2008, 2010) emphasized the need for discrimination between the workers’ perception of safety priorities of top management (i.e., organization-level) and those of work group or team supervisors (i.e., group-level) in understanding safety climate and proposed these two distinct entities of safety climate. The organization-level safety climate sub-scale is intended to measure workers’ perceptions of the instituted company procedures and top-management actions for promotion of safety, and the group-level safety climate sub-scale is intended to measure workers’ perceptions of the direct supervisory and workgroup practices for safety.


A shorter version of the generic safety climate scale, which was incorporated in the above-mentioned trucking (Huang et al., 2013a) and utility (Huang et al., 2013b) industry-specific safety climate scales, also displayed satisfactory predictive validity. This scale consists of 12 items (six items each for organization- and group-level safety climate sub-scales) from Zohar and Luria’s (2005) full 32-item generic safety climate scale. Items were chosen by the research team because they are especially relevant to key safety issues of mobile lone workers, such as truckers and utility workers. Among the trucking industry sample (n = 7466), organization- (six items) and group-level (six items) generic safety climate sub-scales explained a substantial amount of drivers’ safety behavior variance (i.e., 10–14%) and could predict lost days due to injuries significantly (Huang et al., 2013a). Similarly, in Huang et al.’s (2013b) study, the same organization- and group-level generic safety climate sub-scales could explain a significant amount of variance (i.e., 6–12%) in the safety behavior among utility workers (n = 2451). From these findings, it can be inferred that the 12-item generic safety climate scale for lone workers can be used to predict future safety outcomes in certain industries characterized by lone working employees.

1.4. Testing external validity of the generic safety climate scale for lone workers

Although the predictive validity of the 12-item generic safety climate scale for lone workers in the trucking and utility industries is an encouraging finding in support of the scale’s criterion-related validity, the external validity of this generic portion of the safety climate scale is yet to be examined. External validity in this context refers to the scale’s generalizability across different industries or companies. In other words, it is the extent to which the scale has equal measurement implications across different lone work settings. If the safety climate scale’s external validity is poor, then the scale cannot be “generic” because what the scale actually measures might be heterogeneous and what the item scores of the scale imply would be considerably different across multiple industries and companies. For example, when the validity is poor, two industries (or companies) with the same scores in the items of the generic safety climate scale may need different interpretation schemes for the scores because the same items represent distinct notions regarding safety climate across the two industries (or companies). Conversely, if the scale’s external validity is demonstrated, the items would have consistent and interchangeable meanings, and the same or similar approaches are likely to be applicable to both industries (or companies) for safety promotion, which is more efficient. If common and persistent features of safety climate across multiple lone working situations can be identified, they can serve as a starting point for organizational safety policy development and practice. Moreover, they could also be used as an efficient prospective safety evaluation tool because they are generalizable and more concise than industry-specific safety climate measures. Furthermore, testing the presence or absence of common (generic) aspects of safety climate across different industries and companies is theoretically important. Strong evidence for the presence of global features of safety climate among lone workers would provide a good starting point for the investigation of safety climate’s emergence and development among lone workers.

1.5. Using measurement equivalence analysis to examine the external validity of survey scale

Measurement equivalence (also known as measurement invariance) analysis can be used to examine the external validity of the generic safety climate scale for lone workers. Measurement


equivalence (ME) means “equality of item level and sub-scale level true scores for persons with identical latent scores, irrespective of group membership” (Raju et al., 2002, p. 527) and can be tested by confirmatory factor analysis (CFA) in combination with item response theory (IRT) (Raju et al., 2002; Reis et al., 1993). CFA, a linear modeling method based on classical test theory, tests a hypothesized factor structure of observed variables (i.e., item scores). It also assesses magnitudes of relationships between the observed variables and their underlying latent factors, error variances and intercepts of the observations, and means, variances, and inter-relationships among the latent factors. Goodness of fit can be determined by resulting model fit indexes such as chi-square (χ²), comparative fit index (CFI), and root mean square error of approximation (RMSEA). CFA-based ME testing can be performed by comparing a series of models with different numbers of equal-parameter constraints. A scale with poor ME within a CFA paradigm is likely to be understood or interpreted differently by different groups or in different measurement situations (Cigularov et al., 2013). On the other hand, IRT is a non-linear modeling technique based on probability theory and investigates scale items’ functioning in the measurement of a purported ability or trait. One can assume that items of a scale are designed to assess a particular psychological attribute (e.g., mental ability or safety perception) such that higher ratings on the items indicate a stronger psychological attribute (e.g., higher intelligence or stronger safety perception). If so, then it can be concluded that the items are not functioning properly if respondents who have wide variation in terms of the target psychological attribute give similar and undiscriminating endorsements to the scale’s items. In the IRT framework, item functioning is determined by different item parameters (Hambleton et al., 1991).
Specifically, item guessing (for binary response items only), difficulty, and discrimination parameters can be derived based on the observed probabilities that a scale respondent would give a certain response to a particular item. When a scale is administered across multiple groups or situations, items with differential functioning in terms of the item parameters can be examined. This is known as differential item functioning (DIF). Unlike the CFA-based approach, which is based on classical test theory, IRT parameter estimation and DIF detection are not affected by across-group differences such as ability, gender, age, and culture. If a scale has DIF items that influence the overall scale’s functioning across different groups, weakness or absence of ME is implied. This means that the scale would not be a good criterion for the target trait discrimination in certain groups or measurement settings. More details of the two different approaches for testing ME will be presented in Section 2. In sum, the CFA-based ME approach can provide information regarding the consistency in measurement constructs in general and unique response patterns of the constructs in particular measurement contexts. On the other hand, the IRT-based ME approach can present information about the heterogeneity in individual items’ functioning across different measurement contexts independent of the characteristics of the sample.
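As a minimal illustration of how guessing, difficulty, and discrimination parameters jointly determine item functioning, the sketch below evaluates the standard three-parameter logistic model for a binary item. It is not the model fitted in this study (the Likert-type items here call for a polytomous model), and the function name and parameter values are hypothetical.

```python
import math

def item_response_probability(theta, discrimination, difficulty, guessing=0.0):
    """Probability of endorsing a binary item under the three-parameter logistic model."""
    logistic = 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))
    # The guessing parameter sets a lower asymptote; the logistic term fills the rest.
    return guessing + (1.0 - guessing) * logistic

# Hypothetical item: moderate discrimination, average difficulty, 20% guessing floor.
for theta in (-2.0, 0.0, 2.0):
    p = item_response_probability(theta, discrimination=1.5, difficulty=0.0, guessing=0.2)
    print(f"theta = {theta:+.1f}  P(endorse) = {p:.3f}")
```

An item whose parameters differ across groups for respondents with the same trait level would be a candidate for differential item functioning.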

Table 1
Generic safety climate scale items for lone workers derived from Zohar and Luria (2005) and published in Huang et al. (2013a,b).

Organization-level safety climate
Top management at this company –
1. Reacts quickly to solve the problem when told about safety concerns
2. Is strict about working safely when delivery falls behind schedule
3. Uses any available information to improve existing safety rules
4. Invests a lot in safety training for workers
5. Listens carefully to our ideas about improving safety
6. Tries to continually improve safety levels in each department

Group-level safety climate
My direct supervisor –
1. Discusses with us how to improve safety
2. Compliments employees who pay special attention to safety
3. Is strict about working safely even when we are tired or stressed
4. Frequently talks about safety issues throughout the work week
5. Refuses to ignore safety rules when work falls behind schedule
6. Uses explanations (not just compliance) to get us to act safely

2. Method

2.1. Data collection and participants

Data were collected via paper-and-pencil or online questionnaires across three different industries in the United States. Trucking and electrical utility company data were collected for industry-specific safety climate scale development purposes (Huang et al., 2013a,b). From the eight trucking companies, a total of 7474 observations were made (mean n = 934.3, range 234 to 3578). Responses of 2421 electrical workers from two electric power distribution and maintenance companies (mean n = 1210.5, range 861 to 1560) were obtained as well. Additionally, data from the cable television industry were collected. From a large cable television company in the U.S., 540 responses were obtained. In total, 10,435 lone workers from the 11 different companies across the three industries were included in the final sample. In order to protect the confidentiality of the participants, any personally identifying information was removed from the dataset. Response rates ranged from 34% to 74% (mean = 49%) and the mean age of the participants was 42.6 years (standard deviation = 15.1). Thirty-five percent (35%) of the sample reported that they had worked for their current company for more than one but less than five years. Twenty-six percent (26%) reported that they had worked for the company for more than 16 years. Fifteen percent (15%) had worked less than one year, while another 15% had worked for the company between six and ten years. Nine percent (9%) of the employees’ company tenure was between 11 and 15 years. To handle missing responses (3.1%), the full-information maximum likelihood (FIML; Schafer, 1997) and marginal maximum likelihood (MML; Johnson, 2007) algorithms were used for the CFA- and the IRT-based ME testing, respectively.
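The sample composition reported above can be cross-checked with simple arithmetic. The short sketch below only restates the figures given in this section; the variable names are illustrative.

```python
# Sample sizes reported in Section 2.1: industry -> (responses, number of companies).
samples = {
    "trucking": (7474, 8),
    "electrical utility": (2421, 2),
    "cable television": (540, 1),
}

total_responses = sum(n for n, _ in samples.values())
total_companies = sum(k for _, k in samples.values())
mean_per_company = {industry: n / k for industry, (n, k) in samples.items()}

print(total_responses, total_companies)
print(mean_per_company)
```

The totals agree with the 10,435 respondents and 11 companies reported, and the per-company means agree with the reported values of 934.3 and 1210.5 after rounding.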

1.6. Study purpose

The goal of this study was to examine the external validity of the 12-item generic safety climate scale for lone workers in order to evaluate the appropriateness of the generalized use of the scale to detect safety climate across numerous lone work settings. External validity of the generic safety climate scale for lone workers was tested based on ME across different industries and companies, utilizing two different psychometric perspectives, CFA and IRT.

2.2. Safety climate measures

Among the original 32 items of the organization- (16 items) and group-level (16 items) generic safety climate sub-scales (Zohar and Luria, 2005), six items that best describe lone working situations were chosen for the organization-level sub-scale, and another six for the group-level sub-scale. A 5-point Likert scale was used (1 = strongly disagree to 5 = strongly agree). The items are presented in Table 1. Internal consistency statistics for the organization- and group-level generic safety climate sub-scales for lone workers were α = .90 and .82, respectively.
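For readers less familiar with the internal consistency statistic reported above, the following is a minimal sketch of Cronbach's α computed from item-level responses. The response data are hypothetical and do not come from the study; the function name is illustrative.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(responses[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(n_items)]
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1.0 - sum(item_variances) / total_variance)

# Hypothetical 5-point Likert responses from five respondents to six items.
example = [
    [4, 5, 4, 4, 5, 4],
    [2, 2, 3, 2, 2, 3],
    [5, 5, 5, 4, 5, 5],
    [3, 3, 2, 3, 3, 3],
    [1, 2, 1, 2, 1, 2],
]
print(round(cronbach_alpha(example), 3))
```

Values approach 1 when the items rise and fall together across respondents, which is the pattern expected for items that tap a single underlying dimension.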


2.3. Data analysis

2.3.1. Linear method (CFA)-based ME testing

A major goal of CFA-based ME testing is to investigate whether the implications of the factors, relationships between the factors and items, and scaling of the factor items and factors are consistent across different measurement settings. Vandenberg and Lance (2000) proposed a series of steps to test ME based on a multi-group CFA approach. In the current study, 11 different companies across three industries were regarded as different measurement groups.

1. Configural Invariance. The first step was to test structural invariance of the measurement model (e.g., number of scale items and factors). Specification of the factor structure with given items was hypothesized to be invariant across different industries and companies.

2. Metric Invariance. In the second step, factor loadings were constrained to be equal across the multiple groups. The underlying assumption of this metric invariance model is that every item of the scale equally contributes to its factor(s) across the 11 companies.

3. Residual Invariance. Residuals (errors) of the item ratings were constrained to be identical across the 11 companies in this step. Reliability of the items in terms of measurement error was investigated in this residual invariance model. To state it differently, this step examines whether the safety climate factor can explain similar levels of variance in each item’s response.

4. Factor Variance Invariance. Variance of the latent factor was constrained to be equal across the 11 lone working companies with an assumption that calibration of the latent factor was not different across the 11 companies. If factor variance invariance is not supported, this indicates that the range of response options used by the respondents is not identical and comparison of the factor scores across the companies might not be valid.

5. Scalar Invariance. In the last step, intercepts of the item scores were assumed to be equal across the multiple lone work settings in addition to the previous steps’ equal factor structure and factor loading constraints. This step evaluates a systematic response bias or response threshold difference in certain measurement conditions. If intercepts in the ratings from a particular company are higher (or lower) than those from other companies, this indicates lone workers in the particular company show a lenient (or rigorous) rating tendency. Or, lone workers from that particular company may have a different threshold for safety climate scale items such that similar safety practices or policies are viewed as less or more safe across different companies because of the heterogeneity in companies’ general level of hazard and characteristics of work systems. Residuals of items and variance of a measurement model’s latent factors are independent from the item intercepts (Lee, 2013). Therefore, the scalar invariance testing can be performed before or after the previous two steps (residual and factor variance equivalence) of ME testing. Also, the meaning of intercept invariance in the current study is not as critical as in other studies where the primary goal is to examine actual differences across groups of people (e.g., Wicherts and Dolan, 2010). Thus, the scalar invariance was tested at the end of the ME analysis.

6. Generally, two additional steps for testing, factor covariance invariance and factor mean invariance, follow the factor variance invariance model. However, our generic safety climate sub-scales for lone workers have only one latent factor, respectively, for the organization- and group-level sub-scales, and there is no factor covariance to be constrained. Also, the full information maximum likelihood (FIML) method was used in order to manage missing values. Subsequently, the factor mean was fixed to


zero from the beginning of the CFA-based ME testing and factor mean invariance could not be examined in this particular context.

In sum, ME was tested based on CFA across the five steps with different numbers of equal-parameter constraints. If the current step of ME testing reports acceptable model fit indices (e.g., CFI over .95 and RMSEA not greater than .08), the next step of ME testing can be conducted. Also, the acceptable model fit indexes should remain without significant worsening compared to those of the previous step of ME testing, while the degrees of freedom increase as additional equal-parameter constraints are imposed across the five steps. Our criteria for significant model fit deterioration were a CFI decrease greater than .02 (Cheung and Rensvold, 2002) and non-overlapping RMSEA 90% confidence intervals (Wang and Russell, 2005). If the two criteria are met, it would be concluded that there was significant deterioration in ME. A χ² change was not used for model fit comparison due to its sensitivity to large sample sizes.

2.3.2. Non-linear method (IRT)-based ME testing

Graded response modeling (GRM) for polytomous response data (Samejima, 1969) was implemented because the safety climate sub-scales were calibrated on a 5-point Likert scale. Differential item functioning (Choi et al., 2011) was examined by utilizing the R open source package lordif. It identifies DIF items by comparing the three logistic regression models shown below.

$$\operatorname{logit} P(u_i \ge k) = \alpha_k + \beta_1 \times \text{trait} \qquad \text{(Model 1)}$$

$$\operatorname{logit} P(u_i \ge k) = \alpha_k + \beta_1 \times \text{trait} + \beta_2 \times \text{group} \qquad \text{(Model 2)}$$

$$\operatorname{logit} P(u_i \ge k) = \alpha_k + \beta_1 \times \text{trait} + \beta_2 \times \text{group} + \beta_3 \times \text{trait} \times \text{group} \qquad \text{(Model 3)}$$
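To make the role of the group term concrete, the sketch below evaluates the cumulative probability P(u_i ≥ k) implied by these models for a single response threshold. It is an illustration only: the coefficients are hypothetical, and group is simplified to a single 0/1 indicator rather than the 11 companies analyzed here.

```python
import math

def cumulative_probability(alpha_k, beta_trait, trait, beta_group=0.0, group=0,
                           beta_interaction=0.0):
    """P(u_i >= k); zero group and interaction coefficients reduce Model 3 to Model 1."""
    logit = (alpha_k + beta_trait * trait + beta_group * group
             + beta_interaction * trait * group)
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical coefficients for one response threshold k of one item.
ALPHA_K, BETA_TRAIT, BETA_GROUP = -0.5, 1.2, 0.8

for trait in (-1.0, 0.0, 1.0):
    # Model 1: no group effect (reference group).
    p_reference = cumulative_probability(ALPHA_K, BETA_TRAIT, trait)
    # Model 2: same trait level, but membership in the focal group shifts the logit.
    p_focal = cumulative_probability(ALPHA_K, BETA_TRAIT, trait, BETA_GROUP, group=1)
    print(f"trait = {trait:+.1f}  reference = {p_reference:.3f}  focal = {p_focal:.3f}")
```

With a non-zero group coefficient and no interaction, the two groups differ by the same amount on the logit scale at every trait level, which is the pattern referred to as uniform DIF. A non-zero interaction coefficient would make that gap depend on the trait level.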

“Trait” refers to safety climate while “group” refers to 11 different companies from the three different industries in our research context. P(u_i ≥ k) denotes the cumulative probabilities that an item’s particular response u_i falls into the item response category k or higher; α_k indicates an intercept of a regression equation. As our safety climate sub-scales have five response categories, i and k vary from one to five. Uniform DIF, indicating whether there is a significant group effect in the prediction of a particular response, can be assessed by analyzing log likelihood difference values of Models 1 and 2. If Model 2 shows better fit than Model 1, this means that respondents’ industry or company membership can significantly explain respondents’ item ratings. Non-uniform DIF can be investigated by comparing Models 2 and 3. It examines an interaction effect of the trait and group. If there is a significant interaction between the trait and group, this indicates that the extent to which the target trait can explain respondents’ endorsement in an item varies systematically by the respondents’ membership to particular industries or companies. Finally, an overall DIF effect can be evaluated by comparing Models 1 and 3. McFadden’s pseudo R² values were used for model comparison and DIF item identification (Jodoin and Gierl, 2001; Kim et al., 2007) for two reasons. First, the widely used model comparison criterion χ² (Swaminathan and Rogers, 1990) is largely dependent on sample size and increases the chance of Type II error (Cohen, 1988). Second, alternative pseudo R² statistics, such as Cox and Snell’s or Nagelkerke’s, do not provide meaningful interpretation (Choi et al., 2011; Mittlböck and Schemper, 1999). Threshold α was 1% (Crane et al., 2007) and the significant R² difference criterion was greater than .02, indicating a small but non-negligible effect size


Table 2
CFA-based ME testing.

Model                                      χ² (df)          CFI     RMSEA (90% C.I.)

A. Organization-level safety climate sub-scale
Step 1: configural invariance model        909.35 (99)      .995    .028 (.026, .030)
Step 2: metric invariance model            1159.38 (149)    .994    .026 (.024, .027)
Step 3: residual invariance model          1790.55 (209)    .991    .027 (.026, .028)
Step 4: factor variance invariance model   1918.61 (219)    .990    .027 (.026, .028)
Step 5: scalar invariance model            9481.50 (279)    .948    .056 (.055, .057)

B. Group-level safety climate sub-scale
Step 1: configural invariance model        803.88 (99)      .996    .026 (.024, .028)
Step 2: metric invariance model            1027.31 (149)    .995    .024 (.022, .025)
Step 3: residual invariance model          1685.10 (209)    .991    .026 (.025, .027)
Step 4: factor variance invariance model   1872.06 (219)    .990    .027 (.026, .028)
Step 5: scalar invariance model            6178.92 (279)    .964    .045 (.044, .046)

Notes: df: degrees of freedom; CFI: comparative fit index; RMSEA: root mean square error of approximation; C.I.: confidence interval. Boldface: MI criterion exceeded.
Step 1: configural invariance model (equal factor structure).
Step 2: metric invariance model (equal factor structure + equal factor loadings).
Step 3: invariant residual model (equal factor structure + equal factor loadings + equal residuals).
Step 4: invariant factor variance model (equal factor structure + equal factor loadings + equal residuals + equal factor variance).
Step 5: scalar invariance model (equal factor structure + equal factor loadings + equal intercepts + equal residuals + equal factor variance).

change (Cohen, 1988).¹ Additionally, a visual analysis of the test characteristic curve (TCC) can supplement the decision regarding DIF item detection. If the identified DIF items show relatively less variation in TCC across the multiple groups and, if the overall TCC is not largely influenced by the DIF items, the items may be used in different situations, but with caution.

3. Results

3.1. CFA-based ME testing

CFA results for the organization- and group-level safety climate sub-scales for lone workers are presented in Table 2. For both sub-scales, ME was supported to the factor variance invariance (step 4) level. Thus, the measurement structure of the generic safety climate scale was shown to be stable and its items consistently exhibited the underlying dimension (i.e., safety climate) to an equal extent across the 11 companies from the three different lone working industries. Inclusion of item residual invariance constraints (step 3) to the metric invariance model (step 2) and additional constraint on factor variance (step 4) to the residual invariance model (step 3) did not yield significant model fit deterioration in terms of CFI and RMSEA for either the organization- or group-level generic safety climate sub-scales. Also, the model fit indexes, such as CFIs greater than .95 and RMSEAs smaller than .05, supported goodness of fit of the residual and factor variance invariance models. However, once equal-parameter constraints were imposed on intercepts of the items (step 5), CFI decreases were greater than .02 and overlaps of RMSEA confidence intervals disappeared. These findings suggest that lone workers from certain companies or industries may have lenient or rigorous response patterns. Also, different levels of response thresholds across the companies might have played a role.
The mean of intercepts across the six items of the organization-level safety climate sub-scale was 3.43, ranging from 1.79 (company 11; a company from the cable television industry) to 4.11 (company 6; a company from the trucking industry). Similarly, the mean of the intercepts of the group-level safety climate sub-scale’s six items was highest in company 6 with a mean = 3.82 and lowest in company 11 with a mean = 2.08. In order to see whether the higher or lower intercepts actually resulted in

1 These are a few examples of widely accepted DIF item detection criteria. One may choose to adjust the criteria for study purposes (Choi et al., 2011). Thus, a decision about whether to include or exclude the identified DIF items from a scale needs to be based on various theoretical or methodological considerations. The same is true for CFA-based ME testing.

higher or lower sub-scale scores across different industries, the mean scores of the sub-scales were examined (Table 3). Across the trucking, electrical utility, and cable television industries, mean scores of the organization-level safety climate sub-scale were 3.97 (SD = .83), 3.12 (SD = .77), and 1.79 (SD = .71), respectively. Counterparts for the group-level safety climate sub-scale were 3.69 (SD = .89), 3.47 (SD = .75), and 2.08 (SD = .75), respectively. Leniency, or a lower threshold in safety climate rating, was implied in the trucking industry, while rigor or a higher threshold was suggested in the cable television industry. The results may also be the by-product of workers’ general satisfaction with their organizations or supervisors (i.e., halo effect). Also, presence or absence of recent accident events and organizational safety efforts might have influenced the workers’ safety alertness and this could, subsequently, adjust the workers’ psychological anchors for safety climate perception. Future studies are required to investigate the root causes of the difference in safety climate across the different companies. While a statistically significant breakdown of ME was observed in the scalar invariance testing models for organization- and group-level sub-scales, scalar invariance testing models for these sub-scales reported CFIs close to or greater than .95 and RMSEAs smaller than .08. This means that the inclusion of equal item intercept constraints caused significant model fit worsening, but the effect was not so great as to drive the scalar invariance models to be poor in absolute terms. The commensurability of the generic safety climate dimension across the numerous lone worker settings was supported, given that the measurement equivalence of factor variance was established (Vandenberg and Lance, 2000). 
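The step-by-step decision rule from Section 2.3.1 (a CFI decrease greater than .02 together with non-overlapping RMSEA 90% confidence intervals) can be applied mechanically to the values in Table 2. The sketch below is one possible encoding of that rule, not the authors' analysis code; the function and variable names are illustrative, and interval overlap is treated as inclusive of shared endpoints, which is an assumption.

```python
# Fit statistics transcribed from Table 2: (step, CFI, RMSEA 90% CI lower, upper).
ORGANIZATION_LEVEL = [
    ("configural", .995, .026, .030),
    ("metric", .994, .024, .027),
    ("residual", .991, .026, .028),
    ("factor variance", .990, .026, .028),
    ("scalar", .948, .055, .057),
]
GROUP_LEVEL = [
    ("configural", .996, .024, .028),
    ("metric", .995, .022, .025),
    ("residual", .991, .025, .027),
    ("factor variance", .990, .026, .028),
    ("scalar", .964, .044, .046),
]

def deteriorated_steps(models, cfi_drop=.02):
    """Return steps whose fit worsens on both criteria relative to the preceding step."""
    flagged = []
    for previous, current in zip(models, models[1:]):
        _, prev_cfi, prev_lo, prev_hi = previous
        step, cfi, lo, hi = current
        cfi_worse = round(prev_cfi - cfi, 3) > cfi_drop
        # Assumption: intervals that share an endpoint count as overlapping.
        intervals_overlap = lo <= prev_hi and prev_lo <= hi
        if cfi_worse and not intervals_overlap:
            flagged.append(step)
    return flagged

print(deteriorated_steps(ORGANIZATION_LEVEL))
print(deteriorated_steps(GROUP_LEVEL))
```

Under these assumptions only the scalar invariance step is flagged for each sub-scale, in line with the results described above.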
To summarize the findings from the CFA-based ME testing, the generic safety climate scale for lone workers showed fairly strong ME across the 11 companies from three different industries, even though noticeable across-company variations in intercepts of item ratings were detected.

3.2. IRT-based ME testing

Results from the IRT-based ME testing are presented in Tables 4A and 4B. Only one item (item #6: “Tries to continually improve safety levels in each department”) of the organization-level generic safety climate sub-scale showed differential functioning across the 11 different lone worker companies. For this item, McFadden $R^2$ differences were greater than .02 for the uniform (McFadden $R^2_{1-2}$ = .035) and total DIF testing models (McFadden $R^2_{1-3}$ = .038). The McFadden $R^2$ difference was not greater than .02 for the non-uniform DIF testing model (McFadden $R^2_{2-3}$ = .003), indicating a statistically non-significant interaction

J. Lee et al. / Accident Analysis and Prevention 63 (2014) 138–145


Table 3 Means and standard deviations of the organization- and group-level safety climate sub-scales across the 11 companies.

| Industry | Company | Organization-level safety climate: Mean | Organization-level safety climate: S.D. | Group-level safety climate: Mean | Group-level safety climate: S.D. |
|---|---|---|---|---|---|
| Trucking industry | 1 | 4.05 | .75 | 3.78 | .86 |
| Trucking industry | 2 | 3.68 | .83 | 3.41 | .92 |
| Trucking industry | 3 | 2.82 | .96 | 2.74 | .85 |
| Trucking industry | 4 | 3.72 | .94 | 3.32 | 1.00 |
| Trucking industry | 5 | 3.81 | .89 | 3.19 | .93 |
| Trucking industry | 6 | 4.11 | .74 | 3.81 | .81 |
| Trucking industry | 7 | 3.90 | .86 | 3.85 | .88 |
| Trucking industry | 8 | 3.64 | .97 | 3.20 | 1.00 |
| Electrical utility industry | 9 | 3.16 | .77 | 3.51 | .71 |
| Electrical utility industry | 10 | 3.03 | .77 | 3.41 | .81 |
| Cable TV industry | 11 | 1.79 | .71 | 2.08 | .75 |
| Total | | 3.65 | .99 | 3.55 | .93 |

Table 4A IRT-based ME testing of the organization-level generic safety climate sub-scale: DIF by McFadden $R^2$ statistics.

| Item | n of categories | Uniform DIF (McFadden $R^2_{1-2}$) | Total DIF (McFadden $R^2_{1-3}$) | Non-uniform DIF (McFadden $R^2_{2-3}$) |
|---|---|---|---|---|
| 1 | 5 | .008 | .008 | .001 |
| 2 | 5 | .002 | .003 | .002 |
| 3 | 4 | .003 | .004 | .001 |
| 4 | 4 | .017 | .017 | .000 |
| 5 | 5 | .013 | .014 | .001 |
| 6 | 4 | .035 | .038 | .003 |

Note: DIF: differential item functioning; Significant $R^2$ difference ≥ .02 (Cohen, 1988).

between safety climate and company membership. Fig. 1 is a graphical illustration of the impact of the DIF item on the overall organization-level generic safety climate sub-scale. Even though the identified DIF item showed noticeably different functioning

Table 4B IRT-based ME testing of the group-level generic safety climate sub-scale: DIF by McFadden $R^2$ statistics.

| Item | n of categories | Uniform DIF (McFadden $R^2_{1-2}$) | Total DIF (McFadden $R^2_{1-3}$) | Non-uniform DIF (McFadden $R^2_{2-3}$) |
|---|---|---|---|---|
| 1 | 5 | .007 | .010 | .004 |
| 2 | 5 | .012 | .012 | .001 |
| 3 | 5 | .008 | .010 | .001 |
| 4 | 5 | .018 | .019 | .001 |
| 5 | 5 | .003 | .004 | .001 |
| 6 | 5 | .012 | .014 | .001 |

Note: DIF: differential item functioning; Significant $R^2$ difference ≥ .02 (Cohen, 1988).
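The screening rule stated in the notes to Tables 4A and 4B (a McFadden R² difference of .02 or more) can be sketched in a few lines. The values are transcribed from the two tables; the function and variable names are hypothetical and used for illustration only, and this is not the authors' analysis code.

```python
# McFadden R^2 differences transcribed from Tables 4A and 4B.
# Each tuple holds (uniform R^2(1-2), total R^2(1-3), non-uniform R^2(2-3)).
ORG_LEVEL = {
    1: (.008, .008, .001),
    2: (.002, .003, .002),
    3: (.003, .004, .001),
    4: (.017, .017, .000),
    5: (.013, .014, .001),
    6: (.035, .038, .003),
}
GROUP_LEVEL = {
    1: (.007, .010, .004),
    2: (.012, .012, .001),
    3: (.008, .010, .001),
    4: (.018, .019, .001),
    5: (.003, .004, .001),
    6: (.012, .014, .001),
}

DIF_THRESHOLD = 0.02  # criterion given in the table notes
TEST_LABELS = ("uniform", "total", "non-uniform")


def flag_dif(table, threshold=DIF_THRESHOLD):
    """Map each flagged item to the DIF tests whose R^2 difference meets the threshold."""
    flagged = {}
    for item, diffs in table.items():
        hits = [label for label, diff in zip(TEST_LABELS, diffs) if diff >= threshold]
        if hits:
            flagged[item] = hits
    return flagged


print(flag_dif(ORG_LEVEL))    # {6: ['uniform', 'total']}
print(flag_dif(GROUP_LEVEL))  # {}
```

Only item #6 of the organization-level sub-scale is flagged, and only for the uniform and total tests, which matches the reading of the tables given in the text.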

across the 11 different lone working companies (Fig. 1 left), its impact on the entire six-item organization-level generic safety climate sub-scale was minimal. Functioning of the sub-scale showed almost perfect overlap across the 11 companies (Fig. 1 right). Strong

Fig. 1. IRT-based ME testing of the organization-level generic safety climate sub-scale: Test characteristic curves (TCC) with all items and only with the item with significant differential item functioning across the 11 companies. Note: theta = safety climate. 1–11 in the figure legend indicate 11 different companies.


ME was supported for the organization-level generic safety climate sub-scale for lone workers. None of the six items in the group-level generic safety climate sub-scale were identified as having DIF. This means that all of the McFadden R2 differences were below .02 across the uniform, total, and non-uniform DIF testing models and the sub-scale’s functioning in measurement of the group or supervisor referred generic safety climate across the 11 companies was equal. Thus, the findings support ME in the IRT framework.

4. Discussion

The present study aimed to evaluate the external validity of the generic safety climate scale for lone workers across multiple lone work settings. Two different methodological frameworks, CFA and IRT, were adopted to assess the scale’s ME, an indicator of external validity.

The CFA results showed that the ME of the generic safety climate scale for lone workers is compelling. Although intercepts of the scale items varied significantly across the 11 different companies from the trucking, electrical utility, and cable television industries, the measurement model’s fit indexes such as CFI and RMSEA remained good even after including equal factor loadings, item residuals, factor variance, and item intercepts constraints. This means that the equal parameter measurement model of the generic safety climate scale for lone workers provided a good representation of the data; accordingly, the imposed equal parameters across the multiple lone working companies are plausible. Lone workers’ safety climate perception measured by the generic safety climate scale is congruent across different lone worker settings and the scale is commensurable. Additionally, such results support the construct validity of the generic safety climate scale for lone workers.

The IRT analyses also showed strong support for ME of the generic safety climate scale for lone workers.
Only one of the organization-level items was identified as a differentially functioning (DIF) item; however, its impact on the entire six-item sub-scale was minimal (Fig. 1). Further, none of the six items of the group-level generic safety climate sub-scale reported differential functioning across the 11 companies.

To summarize the results from the CFA and IRT analyses, the generic safety climate scale was shown to have robust external validity in terms of measurement construct and item functioning in the measurement of the scale’s target trait, generic safety climate for lone workers. With the generic safety climate scale, a time-efficient evaluation of potential safety risks can be made, and the results from numerous lone working environments can be compared in a valid way. Unlike other safety diagnostic criteria such as the safety diagnostic questionnaire (SDQ; Hoyos and Ruppert, 1995) and safety diagnostic criteria (SDC; Tinmannsvik and Hovden, 2003), which have more than 100 items, the generic safety climate scale is very brief (12 items). Also, even though the generic safety climate scale may not provide as much comprehensive and specific information as industry-specific safety climate scales, its criterion-related validity has been well-supported (Huang et al., 2013a,b). Considered jointly, the generic safety climate scale has clear advantages over industry-specific safety climate scales in terms of conciseness and generalizability when time and logistics are of concern. The scale, which is brief and psychometrically sound, may be administered to lone workers before they are dispatched to their work sites, where direct safety supervision from management is likely to fade. The importance of risk prevention, such as double-checking all safety needs and prioritizing safety over other competing organizational demands, can be communicated to those who have scored these as low company priorities.
The findings of the present study also support the presence of a global safety climate dimension among the lone working
population. Lone workers from different industry sectors and companies share common aspects of safety climate that are primarily regarding managers’ and frontline supervisors’ attitudes and actions for the safety of their employees. This is congruent with the finding of Flin et al. (2000) about a common safety climate feature. Based on 17 published studies (total sample size over 20,000) that attempted to measure safety climate in industrial settings (such as nuclear power plants, gas company depots, construction sites, and oil companies), it was shown that 72% of the safety climate themes were about a “management” factor, thereby making it the most typical safety climate dimension. The importance of managerial roles in influencing safety climate (Simard and Marchand, 1994, 1995) was confirmed once again by the findings of the current study. Also, a developmental trajectory of the generic safety climate and a more sophisticated industry-specific safety climate can now be examined. General safety-friendly management practices can be specialized to a particular work setting of lone workers. Conversely, the generalized safety management factors can be achieved from the industry- or company-specific safety concerns of the management staff. Important implications about causes or catalysts for the emergence of safety climate among lone workers can be obtained from attempts to answer these research questions. Another contribution of the current study is its call for the examination of ME and introduction of relevant research methods. Establishment of ME in terms of both measurement structure and item functioning is critical for in-depth examination of the causes of workers’ low or high safety climate perceptions across multiple organizations because lack of ME can mask (e.g., lenient rating tendency) or exaggerate (e.g., harsh rating tendency) the true safety climate of organizations. 
Also, the legitimacy of safety climate score comparison across numerous organizations for safety inspection or audit can be compromised if ME of the safety climate measurement is weak because the low ME safety climate measurement may reflect something other than safety climate (e.g., safety non-related worker-leader relationship). The CFA and IRT frameworks can be applied to other studies aimed at testing the external validity of many different safety criteria. Results from the two perspectives are complementary and critical for validation of a psychological measure. The CFA approach evaluates the invariance of the conceptual structure of the measurement, while the IRT approach tests the validity of individual items in target trait measurement. Since the combined use of the two psychometric perspectives was introduced to the organizational research domain (Raju et al., 2002; Reis et al., 1993), only a few safety studies have adopted the methods. This study has some limitations. Although a large sample size was utilized (n = 10,435), the types of industries and number of companies were limited. Our data were collected not by study design but by availability of 11 different companies from three different industries (i.e., eight trucking companies, two electrical utility companies, and one cable television company). In order to ensure the most robust external validity of the generic safety climate scale, additional data from different types of lone working industries and companies need to be utilized. Also, the results of the CFA-based ME testing showed that intercepts of item ratings substantially vary across the numerous lone working environments. A simple comparison of raw scores of the generic safety climate scale across different lone work settings may not be accurate, as possibilities of significant response bias or discrepancies in response threshold were suggested. 
In addition, IRT-based ME testing identified one DIF item (item #6: “Tries to continually improve safety levels in each department”) from the organization-level generic safety climate sub-scale. Whether this item is truly functioning differently across different lone working environments needs to be examined after the inclusion of additional data from different industries and companies. Finally, it should be noted that the existence of the global safety climate dimension does not rule out the need for company- or industry-specific safety climate scales for lone workers. Every industry and company has its own unique safety climate features (Zohar, 2010) and they need to be understood thoroughly for an individual organization’s comprehensive safety innovations (e.g., system error identification, system design, development, and implementation of safety training; employee participation encouragement; renewal of safety management policies). The generic safety climate scale is most appropriate for quick safety inspections. Previous studies have shown that industry-specific safety climate scales, which include the 12 items of the generic safety climate scale, could explain more variance in safety performance and outcome variables than the 12-item generic safety climate scale alone (Huang et al., 2013a,b). It is expected that the generic safety climate scale for lone workers can be used across a wide variety of lone work settings and contribute to lone workers’ safety.

Acknowledgements

The authors wish to thank the following team members for their invaluable assistance: Dov Zohar (Technion-Israel Institute of Technology); Mo Wang (University of Florida); Garry Gray (HSPH); Marvin Dainoff, Susan Jeffries, Peg Rothwell, Jacob Banks, Niall O’Brien, and Ryan Powell (LMRIS); Jennifer Rineer (Portland State University) for data collection, analysis and general assistance; and Dave Melton, Dave Money, Jim Houlihan (Liberty Mutual), and Keith Herzig (Herzig Hauling) for technical consulting.

References

Beus, J.M., Payne, S.C., Bergman, M.E., Arthur, W., 2010. Safety climate and injuries: an examination of theoretical and empirical relationships. Journal of Applied Psychology 95, 713–727.
Christian, M.S., Bradley, J.C., Wallace, J.C., Burke, M.J., 2009. Workplace safety: a meta-analysis of the roles of person and situation factors. Journal of Applied Psychology 94, 1103–1127.
Cheung, G.W., Rensvold, R.B., 2002. Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling 9, 233–255.
Cheyne, A., Cox, S., Oliver, A., Tomás, J.M., 1998. Modeling safety climate in the prediction of levels of safety activity. Work & Stress 12, 255–271.
Cigularov, K., Adams, S., Gittleman, J., Haile, E., Chen, P.Y., 2013. Measurement equivalence and mean comparisons of a safety climate measure across construction trades. Accident Analysis and Prevention 51, 68–77.
Choi, S.W., Gibbons, L.E., Crane, P.K., 2011. lordif: an R package for detecting differential item functioning using iterative hybrid ordinal logistic regression/item response theory and Monte Carlo simulations. Journal of Statistical Software 39, 1–30.
Cohen, J., 1988. Statistical Power Analysis for the Behavioral Sciences, 2nd ed. Lawrence Erlbaum, Hillsdale, NJ.
Crane, P.K., Gibbons, L.E., Ocepek-Welikson, K., Cook, K., Cella, D., 2007. A comparison of three sets of criteria for determining the presence of differential item functioning using ordinal logistic regression. Quality of Life Research 16, 69–84.
Ferret, E., Hughes, P., 2009. Introduction to Health and Safety at Work, 4th ed. Butterworth-Heinemann, Oxford, UK.
Flin, R., Mearns, K., O’Connor, P., Bryden, R., 2000. Measuring safety climate: identifying the common features. Safety Science 34, 177–192.
Griffin, M.A., Neal, A., 2000. Perceptions of safety at work: a framework for linking safety climate to safety performance, knowledge, and motivation. Journal of Occupational Health Psychology 5, 347–358.
Hambleton, R.K., Swaminathan, H., Rogers, H.J., 1991. Fundamentals of Item Response Theory. Sage Press, Newbury Park, CA.
Health and Safety Executive, 2009. Working Alone in Safety: Health and Safety Guidance on the Risks of Lone Working. http://www.hse.gov.uk/publish/indg73.pdf (retrieved 02.18.13).
Hoyos, C.G., Ruppert, F., 1995. Safety diagnosis in industrial work settings: the safety diagnosis questionnaire. Journal of Safety Research 26, 107–117.
Huang, Y.H., Chen, P.Y., Grosch, J.W., 2010. Safety climate: new developments in conceptualization, theory, and research. Accident Analysis and Prevention 42, 1421–1422.
Huang, Y.H., Zohar, D., Robertson, M.M., Garabet, A., Lee, J., Murphy, L.A., 2013a. Development and validation of safety climate scales for lone workers using truck drivers as exemplar. Transportation Research Part F: Traffic Psychology and Behavior 17, 5–19.
Huang, Y.H., Zohar, D., Robertson, M.M., Garabet, A., Murphy, L.A., Lee, J., 2013b. Development and validation of safety climate scales for remote workers using utility/electric workers as exemplar. Accident Analysis and Prevention 59, 76–86.
Jodoin, M.G., Gierl, M.J., 2001. Evaluating Type I error and power rates using an effect size measure with the logistic regression procedure for DIF detection. Applied Measurement in Education 14, 329–349.
Johnson, M.S., 2007. Marginal maximum likelihood estimation of item response models in R. Journal of Statistical Software 20, 1–24.
Kim, S.H., Cohen, A.S., Alagoz, C., Kim, S., 2007. DIF detection effect size measures for polytomously scored items. Journal of Educational Measurement 44, 93–116.
Lee, J., 2013. Measurement invariance of assessment center ratings: consistency of dimensional constructs across exercises. Unpublished master’s thesis. University of Connecticut, Storrs, Connecticut, USA.
Mittlböck, M., Schemper, M., 1999. Computing measures of explained variation for logistic regression models. Computer Methods and Programs in Biomedicine 58, 17–24.
Nahrgang, J.D., Morgeson, F.P., Hofmann, D.A., 2011. Safety at work: a meta-analytic investigation of the link between job demands, job resources, burnout, engagement, and safety outcomes. Journal of Applied Psychology 96, 71–94.
Raju, N.S., Laffitte, L.J., Byrne, M.B., 2002. Measurement equivalence: a comparison of methods based on confirmatory factor analysis and item response theory. Journal of Applied Psychology 87, 517–529.
Reis, S.P., Widaman, K.F., Pugh, R.H., 1993. Confirmatory factor analysis and item response theory: two approaches for exploring measurement invariance. Psychological Bulletin 114, 552–566.
Samejima, F., 1969. Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement 34, 100–114.
Schafer, J.L., 1997. Analysis of incomplete multivariate data. Monographs on Statistics and Applied Probability 72, 401–414.
Simard, M., Marchand, A., 1994. The behavior of first-line supervisors in accident prevention and effectiveness in occupational safety. Safety Science 17, 169–185.
Simard, M., Marchand, A., 1995. A multilevel analysis of organisational factors related to the taking of safety initiatives by workgroups. Safety Science 21, 113–129.
Swaminathan, H., Rogers, H.J., 1990. Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement 27, 361–370.
Tinmannsvik, R.K., Hovden, J., 2003. Safety diagnosis criteria – development and testing. Safety Science 41, 575–590.
Vandenberg, R.J., Lance, C.E., 2000. A review and synthesis of the measurement invariance literature: suggestions, practices, and recommendations for organizational research. Organizational Research Methods 3, 4–70.
Wang, M., Russell, S.S., 2005. Measurement equivalence of the job descriptive index across Chinese and American workers: results from confirmatory factor analysis and item response theory. Educational and Psychological Measurement 65, 709–732.
Wicherts, J.M., Dolan, C.V., 2010. Measurement invariance in confirmatory factor analysis; an illustration using IQ test performance of minorities. Educational Measurement: Issues & Practice 29, 39–47.
Zohar, D., 1980. Safety climate in industrial organizations: theoretical and applied implications. Journal of Applied Psychology 65, 96–102.
Zohar, D., 2008. Safety climate and beyond: a multi-level multi-climate framework. Safety Science 46, 376–387.
Zohar, D., 2010. Thirty years of safety climate research: reflections and future directions. Accident Analysis and Prevention 42, 1517–1522.
Zohar, D., Luria, G., 2005. A multilevel model of safety climate: cross-level relationships between organization and group-level climates. Journal of Applied Psychology 90, 616–628.
