Accident Analysis and Prevention 89 (2016) 62–73


Accident Analysis and Prevention journal homepage: www.elsevier.com/locate/aap

A hybrid finite mixture model for exploring heterogeneous ordering patterns of driver injury severity

Lu Ma (a), Guan Wang (a), Xuedong Yan (a, ∗), Jinxian Weng (b)

(a) MOE Key Laboratory for Urban Transportation Complex Systems Theory and Technology, School of Traffic and Transportation, Beijing Jiaotong University, Beijing 100044, PR China
(b) College of Transport and Communications, Shanghai Maritime University, Shanghai 201306, PR China

Article info

Article history: Received 20 June 2015; received in revised form 13 December 2015; accepted 10 January 2016
Keywords: Highway safety; Mixture model; EM algorithm; Injury severity; Heterogeneous ordering pattern

Abstract
Debates on the ordering patterns of crash injury severity are ongoing in the literature. Models without econometric structures that properly accommodate the complex ordering patterns of injury severity can produce biased estimates and misinterpretations of factors. This study proposes a hybrid finite mixture (HFM) model that captures heterogeneous ordering patterns of driver injury severity while enhancing modeling flexibility. The model probabilistically partitions samples into two groups: one represents an unordered (nominal) data-generating process and the other an ordered data-generating process. Conceptually, the new model offers flexible coefficient settings for mining additional information from crash data and, more importantly, allows multiple ordering patterns of the dependent variable to coexist. A thorough comparison of modeling performance is conducted between the HFM model and the multinomial logit (MNL), ordered logit (OL), finite mixture multinomial logit (FMMNL) and finite mixture ordered logit (FMOL) models. The empirical results show that the HFM model has a strong ability to extract information from the data and, more importantly, uncovers heterogeneous ordering relationships between factors and driver injury severity. In addition, the estimated weight of the MNL component in the HFM model is greater than that of the OL component, indicating that the unordered pattern is more likely than the ordered pattern for driver injury severity. © 2016 Elsevier Ltd. All rights reserved.

∗ Corresponding author.
E-mail addresses: [email protected] (L. Ma), [email protected] (G. Wang), [email protected] (X. Yan), [email protected] (J. Weng).
http://dx.doi.org/10.1016/j.aap.2016.01.004
0001-4575/© 2016 Elsevier Ltd. All rights reserved.

1. Introduction
Traffic injury severity is an important safety concern of the transportation system. Although many countries have reported a decrease in road traffic injuries and deaths, the numbers remain unacceptably high (WHO, 2015). This underscores the need to understand how various factors influence the severity of injuries sustained by crash-involved motor vehicle occupants, and to provide insights and suggestions for improving the safety of the transportation system and preventing future accidents. Past injury-severity studies have contributed both to the development of econometric approaches for modeling injury severity and to the understanding of how factors influence it. Specifically, injury-severity models have been enhanced with features such as Bayesian inference (Huang et al., 2008), random effects (Aziz et al., 2013) and multivariate statistics (Abay et al., 2013). Empirical analyses have found that injury severity is affected by accident characteristics (Zhu and Srinivasan, 2011), individual characteristics (Castro et al., 2013), roadway characteristics (Mergia et al., 2013) and atmospheric conditions (Yasmin and Eluru, 2013), among others. However, the continuing debates (Eluru, 2013; Sasidharan and Menéndez, 2014) on the ordering patterns of injury severity reveal a limited understanding of its distributional characteristics and an inadequate flexibility in capturing its complex ordering patterns.

Generally, the crash injury severity of a person is categorized as one of the following: no injury (O), possible injury (C), non-incapacitating injury (B), incapacitating injury (A) and fatal injury (K). Crash data are undeniably ordinal in nature, with severity increasing consistently from level O to K. In practice, however, the boundaries between adjacent injury categories are ambiguous and can lead to misreporting of injury severity (Rosman, 2001; Tsui et al., 2009). For example, a reported possible injury could, under certain circumstances, be a non-incapacitating injury or no injury at all. Misreports can arise from incorrect judgments by police officers as well as from inadequate consideration of post-crash factors (Bigdeli et al., 2010), including emergency rescue actions, prehospital and hospital medical care, and personal physical condition. These factors are difficult to take into account at the time the police record the accident. In addition, some factors, such as seat belt use and airbag deployment, may not unambiguously increase or decrease injury severity (Patil et al., 2012). Therefore, the reported injury-severity level can misrepresent the actual harm suffered by a person in an accident and hence invalidate the ordinal sequence of the severities defined above.

To examine the complexity of these ordering patterns and to enhance the flexibility of econometric structures for injury-severity modeling, this study develops a finite mixture model whose component distributions come from more than one family. It is referred to as the hybrid finite mixture (HFM) model in this paper. In this approach, a multinomial logit (MNL) process and an ordered logit (OL) process are combined to fit the data simultaneously. The main objective is to capture heterogeneous ordering patterns and variable impacts by assuming that each sample is probabilistically generated from either a nominal or an ordinal data process.

2. Literature review
Statistical analyses of crash injury severity have received tremendous attention over the last two decades, as researchers have sought appropriate econometric structures and tried to reveal how factors influence severity. A prevalent concern is the ordering pattern of crash injury severity, which is crucial for developing regression-type models. The MNL model (e.g. Shankar and Mannering, 1996; Çelik and Oktay, 2014; Zhao and Khattak, 2015) and the ordered response model (e.g. Klop and Khattak, 1999; Zhu and Srinivasan, 2011; Zhao and Khattak, 2015) are recognized as the most basic econometric structures, treating crash injury severity as an unordered/nominal variable and an ordinal variable, respectively. Building on ordered or unordered models, specific data features have been taken into account, including individual unobserved heterogeneity (Malyshkina and Mannering, 2009; Xie et al., 2012; Xiong and Mannering, 2013; Xiong et al., 2014; Yasmin et al., 2014; Behnood et al., 2014; Cerwick et al., 2014; Shaheed and Gkritza, 2014) and the effects of outcome-based samples (Hauer and Hakkert, 1989; Elvik and Myssen, 1999; Yamamoto et al., 2008; Ye and Lord, 2011; Patil et al., 2012). Advanced analytical approaches have also been developed, including random parameters (e.g. Yasmin and Eluru, 2013; Ye and Lord, 2014; Islam et al., 2014; Roque et al., 2015; Weiss et al., 2014; Zhao and Khattak, 2015) and multivariate distributions (Lee and Abdel-Aty, 2008; Eluru et al., 2010; Abay et al., 2013; Russo et al., 2014; Chiou et al., 2014). Readers may consult Savolainen et al. (2011) and Mannering and Bhat (2014) for a more extensive review of the evolution of injury-severity models. In the sequence of no injury, possible injury, non-incapacitating injury, incapacitating injury and fatal injury, the harmfulness of an accident seemingly increases.
Therefore, studies may treat crash injury severity as an ordinal variable. However, the traditional ordered response model is problematic because of its strong assumption of constant coefficients across injury severity categories (Eluru et al., 2008) and the constraint that shifts in the thresholds move in the same direction (Savolainen et al., 2011). For example, airbag deployment could decrease the probabilities of both fatal and no-injury outcomes while increasing the probabilities of the other severities (Patil et al., 2012); such behavior is concealed under traditional ordered response models. Generalized ordered outcome models (e.g. Quddus et al., 2010) and partial proportional odds models (Sasidharan and Menéndez, 2014) improve on the traditional ordered response model by allowing coefficients to vary across severity categories. Past studies have also treated injury severity as unordered/nominal data. Although the MNL model greatly enhances the flexibility to capture complicated patterns of influence between factors and injury severity, it ignores the ordinal nature of the dependent variable in certain cases (Abdel-Aty, 2003) and is susceptible to correlation of unobserved effects among injury-severity levels (Savolainen et al., 2011). Nested logit models (e.g. Abdel-Aty and Abdelwahab, 2004; Patil et al., 2012) and ordered generalized extreme value models (e.g. Yasmin and Eluru, 2013) are useful for overcoming this issue. Although both the unordered and ordered approaches have been enhanced with various statistical considerations, the understanding of the ordering patterns of crash injury severity remains limited. As mentioned above, these ordering patterns can be very complicated. Hence, it is worthwhile to model crash injury severity using features of both ordinal and nominal data-generating processes. This study accounts for this heterogeneous data-generating process by developing an HFM model that contains both an unordered/nominal component and an ordered component. Note that the zero-inflated ordered probit model (Jiang et al., 2013), which is a mixture of an ordered probit distribution and a degenerate distribution, can be considered a special case of the proposed model.
As has been emphasized (Malyshkina and Mannering, 2009; Xiong and Mannering, 2013; Zou et al., 2014), the mixture modeling approach can capture unobserved individual heterogeneity. Malyshkina and Mannering (2009) introduced a Markov switching MNL model, in which the severity level of each crash is modeled simultaneously by two MNL models. Several other studies (Xie et al., 2012; Yasmin et al., 2014; Xiong and Mannering, 2013; Behnood et al., 2014; Shaheed and Gkritza, 2014) have also used mixture multinomial models to analyze crash injury severity. Injury severity has also been modeled using two-component ordered probit/logit data-generating processes (Eluru et al., 2012; Xiong et al., 2014; Yasmin et al., 2014). Although these mixture models allow observations to come from multiple data-generating processes, their components are restricted to a single family, which could result in biased estimates and misinterpretations of factors. Overall, the past literature has contributed to injury-severity analysis by accounting for various econometric features of the data. However, debates on the ordering patterns of crash injury severity continue. This study proposes an alternative model that captures unobserved heterogeneous relationships between driver injury severity and explanatory variables. The new approach allows ordinal and nominal econometric considerations to coexist for injury severity data while enhancing the flexibility of the data-fitting structure.

3. Method

3.1. The hybrid finite mixture (HFM) model
A general framework of mixture models with K components is given by Eq. (1). The probability function f(·) is the weighted sum of K different density/mass functions f_k(·) with weight parameters π_k. Under this specification, each observation belongs to one of the K data-generating processes k with probability π_k.

$$ f(y; x, \Theta) = \sum_{k=1}^{K} \pi_k f_k(y; x, \theta_k), \qquad \sum_{k=1}^{K} \pi_k = 1, \quad \pi_k \ge 0 \tag{1} $$

In fact, the component distributions need not belong to a single family, and additional data heterogeneity can be captured if they are allowed to come from different parametric families. This study proposes an HFM model that integrates an MNL model and an OL model as its two components, which is expected to capture both the unordered and the ordered characteristics of crash injury severity. Eq. (2) gives the probability of an observation having injury level s. Z is the missing (latent) dummy variable indicating whether the sample follows the MNL model (Z = 1) or the OL model (Z = 2), and (π_1, π_2) are the corresponding weight parameters.

$$ f_Y(s) = f_Z(1) f_{Y|Z=1}(s) + f_Z(2) f_{Y|Z=2}(s) = \pi_1 f_{Y|Z=1}(s) + \pi_2 f_{Y|Z=2}(s), \qquad \pi_1 + \pi_2 = 1, \quad \pi_1 \ge 0, \quad \pi_2 \ge 0 \tag{2} $$

The conditional probability mass function for the MNL component of the mixture model is given in Eq. (3), where J is the number of injury severity categories and the coefficients of the reference category are fixed at zero for identification.

$$ f_{Y|Z=1}(s) = \frac{\exp(\beta_s^{T} x)}{\sum_{j=1}^{J} \exp(\beta_j^{T} x)} \tag{3} $$

The conditional probability mass function for the OL component of the mixture model is given in Eq. (4), where Φ(·) is the cumulative distribution function of the standard logistic distribution, α_s are the thresholds and β_0 is the coefficient vector of the OL component.

$$ f_{Y|Z=2}(s) = \Phi(\alpha_s - \beta_0^{T} x) - \Phi(\alpha_{s-1} - \beta_0^{T} x), \qquad \alpha_0 = -\infty, \quad \alpha_J = \infty \tag{4} $$

Under the regression form, the marginal log-likelihood function is given in Eq. (5), where n is the sample size and i indexes the observations. Since the derivatives of this function are complicated and may be intractable, the expectation-maximization (EM) algorithm, an effective method for such problems, is adopted for parameter estimation in this study.

$$ LL(\Theta; y, x) = \sum_{i=1}^{n} \log\left( \pi_1 f_{Y|Z=1}(y_i) + \pi_2 f_{Y|Z=2}(y_i) \right) = \sum_{i=1}^{n} \log\left( \pi_1 \frac{\exp(\beta_{y_i}^{T} x_i)}{\sum_{j=1}^{J} \exp(\beta_j^{T} x_i)} + \pi_2 \left[ \Phi(\alpha_{y_i} - \beta_0^{T} x_i) - \Phi(\alpha_{y_i - 1} - \beta_0^{T} x_i) \right] \right) \tag{5} $$

3.2. Parameter estimation algorithm
The EM algorithm (Dempster et al., 1977; Bhat, 1997; Sobhani et al., 2013) is an iterative method designed for finding maximum-likelihood estimates for statistical models with latent variables. In this study, Z is the unobservable latent variable. To apply the EM algorithm, the complete log-likelihood function including the latent variable is defined as in Eq. (6).

$$ LL(\Theta; y, z, x) = \sum_{i=1}^{n} \log f_{YZ}(y_i, z_i) = \sum_{i=1}^{n} \log\left( f_{Y|Z=z_i}(y_i) f_Z(z_i) \right) \tag{6} $$

The EM iteration alternates between an expectation (E) step and a maximization (M) step until a stopping criterion is met. In the “E” step, the expectation of the complete log-likelihood function under the conditional distribution of the latent variable, given the current estimates Θ^(t), is computed as in Eq. (7).

$$ \begin{aligned} Q(\Theta \mid \Theta^{(t)}) &= E_{Z|Y,\Theta^{(t)}}\left( LL(\Theta; y, z, x) \right) = E_{Z|Y,\Theta^{(t)}}\left( \sum_{i=1}^{n} \log\left( f_{Y|Z=z_i}(y_i) f_Z(z_i) \right) \right) \\ &= \sum_{i=1}^{n} \left[ \log\left( \pi_1 f_{Y|Z=1}(y_i) \right) f_{Z|Y_i,\Theta^{(t)}}(1) + \log\left( \pi_2 f_{Y|Z=2}(y_i) \right) f_{Z|Y_i,\Theta^{(t)}}(2) \right] \\ &= \sum_{i=1}^{n} \log\left( f_{Y|Z=1}(y_i) \right) f_{Z|Y_i,\Theta^{(t)}}(1) + \sum_{i=1}^{n} \log\left( f_{Y|Z=2}(y_i) \right) f_{Z|Y_i,\Theta^{(t)}}(2) \\ &\quad + \log(\pi_1) \sum_{i=1}^{n} f_{Z|Y_i,\Theta^{(t)}}(1) + \log(\pi_2) \sum_{i=1}^{n} f_{Z|Y_i,\Theta^{(t)}}(2) \end{aligned} \tag{7} $$

The expectation function thus splits into three parts with disjoint parameters. The first two parts are the weighted log-likelihood functions of the MNL and OL models, respectively, where the weights are the posterior probabilities of the latent variable given Y under the current estimates (Eq. (8)). The third part involves only the weight parameters (π_1, π_2) and is linear in log π_1 and log π_2.

$$ f_{Z|Y_i,\Theta^{(t)}}(1) = \frac{\pi_1^{(t)} f_{Y|Z=1}(y_i; \Theta^{(t)})}{\pi_1^{(t)} f_{Y|Z=1}(y_i; \Theta^{(t)}) + \pi_2^{(t)} f_{Y|Z=2}(y_i; \Theta^{(t)})}, \qquad f_{Z|Y_i,\Theta^{(t)}}(2) = 1 - f_{Z|Y_i,\Theta^{(t)}}(1) \tag{8} $$

In the “M” step, the function Q(·) is maximized to obtain new parameter estimates for the next iteration.

$$ \Theta^{(t+1)} = \arg\max_{\Theta} Q(\Theta \mid \Theta^{(t)}) \tag{9} $$

As noted, Q(·) consists of three parts with disjoint parameters, so it can be maximized by solving three separate optimization problems. The first two can be solved with standard gradient-based search methods for the MNL and OL models, using the posterior probabilities as observation weights. For the third part, the updated weight parameters of the mixture model are given in Eq. (10).

$$ \left( \pi_1^{(t+1)}, \pi_2^{(t+1)} \right) = \arg\max_{(\pi_1, \pi_2)} \left[ \log(\pi_1) \sum_{i=1}^{n} f_{Z|Y_i,\Theta^{(t)}}(1) + \log(\pi_2) \sum_{i=1}^{n} f_{Z|Y_i,\Theta^{(t)}}(2) \right] = \left( \frac{1}{n} \sum_{i=1}^{n} f_{Z|Y_i,\Theta^{(t)}}(1), \; \frac{1}{n} \sum_{i=1}^{n} f_{Z|Y_i,\Theta^{(t)}}(2) \right) \tag{10} $$

3.3. Data description
The General Estimates System (GES) samples of police-reported motor vehicle crashes in 2012, maintained by the National Highway Traffic Safety Administration (NHTSA) of the United States, were used in this study. The research scope is restricted to accidents that occurred on interstate highways. The variables examined cover a wide range, including driver demographic characteristics, vehicle attributes, crash attributes and environmental attributes. After removing samples with missing variables, 4189 observations were retained. Because there were few fatal accidents in the data, crash injury severity was regrouped into four categories: O, C, B and A/K. The most severe injury suffered by any of the drivers in an accident was used as the response variable. Table 1 presents the distribution of driver injury severity within each categorical explanatory variable.

Further, Fig. 1 shows two spineplots of driver injury severity against explanatory variables, taking the manner of collision and air bag deployment as examples. The width of each column is proportional to the number of observations in the corresponding category, so Fig. 1 allows a visual inspection of how the injury levels are concentrated within each category of manner of collision and air bag deployment. Collisions that were not with a motor vehicle in transport exhibit a higher proportion of incapacitating/fatal injuries than other manners of collision. The distribution of driver injury severity by air bag deployment shows that drivers whose air bags deployed experienced more severe injuries. Such trends reflect the marginal associations between driver injury severity and explanatory variables without controlling for other variables.

Table 1
Sample counts for categorical variables.

Variable | Value label | O | C | B | A/K
Manner of collision | Not collision with motor vehicle in transport (a) | 688 | 184 | 278 | 278
 | Front-to-rear | 828 | 376 | 324 | 163
 | Angle | 110 | 44 | 41 | 29
 | Sideswipe | 441 | 189 | 149 | 67
Relation to junction | Not within interchange area (a) | 1619 | 634 | 623 | 420
 | Within interchange area | 448 | 159 | 169 | 117
Relation to traffic way | On roadway (a) | 1525 | 619 | 538 | 273
 | Not on roadway | 542 | 174 | 254 | 264
Light condition | Daylight (a) | 1342 | 566 | 512 | 317
 | Not daylight | 725 | 227 | 280 | 220
Atmospheric conditions | Clear (a) | 1326 | 516 | 505 | 355
 | Rain or snow | 413 | 140 | 128 | 76
 | Cloudy | 328 | 137 | 159 | 106
Presence of passengers | Driver only (a) | 1544 | 590 | 580 | 408
 | With passengers | 523 | 203 | 212 | 129
Body type | Automobiles (a) | 1170 | 528 | 506 | 327
 | Utility vehicles | 263 | 127 | 149 | 111
 | Trucks | 634 | 138 | 137 | 99
Vehicle trailing | No trailing units (a) | 1780 | 768 | 766 | 533
 | With trailing units | 287 | 25 | 26 | 4
Rollover | No rollover (a) | 1997 | 722 | 667 | 386
 | With rollover | 70 | 71 | 125 | 151
Total lanes in roadway | One lane (a) | 132 | 58 | 56 | 28
 | Two lanes | 573 | 176 | 161 | 102
 | More than two lanes | 1362 | 559 | 575 | 407
Roadway surface condition | Dry (a) | 1560 | 617 | 623 | 437
 | Wet | 507 | 176 | 169 | 100
Age | <=30 (a) | 919 | 312 | 337 | 212
 | >=31 & <=45 | 537 | 249 | 250 | 164
 | >=46 & <=60 | 446 | 158 | 146 | 114
 | >=61 | 165 | 74 | 59 | 47
Sex | Male (a) | 1398 | 383 | 384 | 287
 | Female | 669 | 410 | 408 | 250
Restraint system use | Used (a) | 2055 | 779 | 753 | 443
 | Not used | 12 | 14 | 39 | 94
Air bag deployed | Not deployed (a) | 1717 | 567 | 462 | 252
 | Deployed | 350 | 226 | 330 | 285
Ejection | Not ejected (a) | 2067 | 793 | 788 | 498
 | Ejected | 0 | 0 | 4 | 39
Condition at time of crash | Normal (a) | 1943 | 741 | 696 | 409
 | Impaired | 124 | 52 | 96 | 128
(a) This category was used as the “reference” in the following models.

4. Empirical results and analyses
The observed data were fitted with the proposed HFM model and four other models: the OL and MNL models, which are restricted versions of the HFM model, and the finite mixture multinomial logit (FMMNL) and finite mixture ordered logit (FMOL) models, which can be considered non-hybrid finite mixture models. The estimation results, modeling performance and an elasticity analysis for these models are presented below.
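The estimation procedure of Section 3.2 can be illustrated with a small sketch. It is a simplified illustration under stated assumptions, not the authors' implementation: it evaluates the MNL and OL component probabilities of Eqs. (3) and (4), the mixture of Eq. (2) and the log-likelihood of Eq. (5), but it holds the MNL and OL coefficients fixed and iterates only the E step (Eq. (8)) and the weight update (Eq. (10)); the full algorithm would also refit the weighted MNL and OL models in every M step. The component values are borrowed from the stand-alone MNL and OL estimates in Table 2 (intercepts, thresholds and the "Sex: female" coefficients) purely for illustration, the data are simulated, and the function names (`hfm_pmf`, `em_mixing_weight`, `simulate`) are hypothetical. Categories are indexed 0 to 3 (O, C, B, A/K) rather than 1 to J.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def logistic_cdf(z: float) -> float:
    """Standard logistic CDF Phi(z); overflow-safe and accepts +/-inf."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(b: Vector, x: Vector) -> float:
    return sum(bi * xi for bi, xi in zip(b, x))


def mnl_pmf(s: int, x: Vector, betas: List[Vector]) -> float:
    """Eq. (3): softmax of category scores; betas[0] is the all-zero reference (O)."""
    scores = [dot(b, x) for b in betas]
    top = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(v - top) for v in scores]
    return exps[s] / sum(exps)


def ol_pmf(s: int, x: Vector, beta0: Vector, alphas: Vector) -> float:
    """Eq. (4): difference of logistic CDFs at adjacent thresholds."""
    cuts = [-math.inf, *alphas, math.inf]  # alpha_0 = -inf, alpha_J = +inf
    eta = dot(beta0, x)
    return logistic_cdf(cuts[s + 1] - eta) - logistic_cdf(cuts[s] - eta)


def hfm_pmf(s: int, x: Vector, pi1: float, betas, beta0, alphas) -> float:
    """Eq. (2) with pi2 = 1 - pi1."""
    return pi1 * mnl_pmf(s, x, betas) + (1.0 - pi1) * ol_pmf(s, x, beta0, alphas)


def em_mixing_weight(data, pi1, betas, beta0, alphas, n_iter=500, tol=1e-10):
    """Iterate the E step (Eq. (8)) and the weight update (Eq. (10)).

    Simplification: component coefficients stay fixed, so only pi1 is estimated.
    Returns the final pi1 and the log-likelihood (Eq. (5)) before and after each update.
    """
    # With fixed coefficients the component probabilities never change: compute once.
    m = [mnl_pmf(y, x, betas) for y, x in data]
    o = [ol_pmf(y, x, beta0, alphas) for y, x in data]

    def loglik(p: float) -> float:  # Eq. (5)
        return sum(math.log(p * mi + (1.0 - p) * oi) for mi, oi in zip(m, o))

    trace = [loglik(pi1)]
    for _ in range(n_iter):
        # E step: posterior probability that each observation came from the MNL component
        post = [pi1 * mi / (pi1 * mi + (1.0 - pi1) * oi) for mi, oi in zip(m, o)]
        # Weight update: the new MNL weight is the mean posterior membership
        new_pi1 = sum(post) / len(post)
        trace.append(loglik(new_pi1))
        converged = abs(new_pi1 - pi1) < tol
        pi1 = new_pi1
        if converged:
            break
    return pi1, trace


# Illustrative component values copied from the stand-alone fits in Table 2
# (intercepts, OL thresholds, "Sex: female" coefficients); these are NOT HFM estimates.
BETAS = [[0.0, 0.0], [-2.797, 0.732], [-1.953, 0.804], [-4.076, 0.825]]  # O, C, B, A/K
BETA0 = [0.0, 0.611]  # no OL intercept: the thresholds absorb it
ALPHAS = [1.637, 2.619, 4.033]


def simulate(n: int, pi1: float, seed: int = 0) -> List[Tuple[int, List[float]]]:
    """Hypothetical data: x = [1, female dummy with share 0.3], y drawn from the HFM pmf."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [1.0, float(rng.random() < 0.3)]
        probs = [hfm_pmf(s, x, pi1, BETAS, BETA0, ALPHAS) for s in range(4)]
        data.append((rng.choices(range(4), weights=probs)[0], x))
    return data


data = simulate(2000, pi1=0.8)
pi1_hat, trace = em_mixing_weight(data, 0.5, BETAS, BETA0, ALPHAS)
print(f"estimated MNL weight: {pi1_hat:.3f} after {len(trace) - 1} updates")
```

Because each weight update maximizes the third part of Q(·) exactly, the log-likelihood recorded in `trace` should never decrease; checking this is a convenient sanity test for any EM implementation, including one that also refits the component coefficients.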

4.1. The MNL and OL models
Before examining the HFM model, it is useful to present the estimation results of the MNL and OL models. Table 2 presents the estimated coefficients. Note that the “no injury” (O) category is the reference case, which also applies to the HFM and FMMNL models. Since the MNL model has more parameters than the OL model, it may provide a more flexible econometric structure for data fitting. All inferences are based on the magnitude of the coefficients associated with each explanatory variable.

In the OL model, a positive coefficient indicates an increase in the latent injury severity propensity, and hence an increase in the probability of the last category (incapacitating/fatal injury) and a decrease in the probability of the first category (no injury). For the interior injury categories, however, the effect of a factor on their probabilities also depends on the status of other factors. In the MNL model, the exponential of the coefficient of a categorical variable is the odds ratio of the particular severity level relative to the reference severity level, comparing the particular status of the variable with its reference status (Morgan and Teachman, 1988; Shankar and Mannering, 1996; Agresti, 2002; Kim et al., 2007; Eluru, 2013). For example, according to the MNL model, the relative probability of possible injury rather than no injury is exp(0.732) = 2.079 times greater for females than for males, with all other variables fixed. However, the actual probability of a particular severity level depends on the statuses of other variables. Under the MNL specification, it is guaranteed only that the probability of the severity category with the largest coefficient will increase and that of the category with the smallest coefficient will decrease, counting the reference category's coefficient as zero.

In terms of the behavioral interpretation of the explanatory variables' impacts on driver injury severity, the two models reach similar conclusions for most variables, but some variables exhibit different patterns of influence. This may indicate a heterogeneous ordering pattern of driver injury severity.

Fig. 1. Spineplots between driver injury severity and some explanatory variables. [Left panel: Manner of Collision (Not collision with motor vehicle in transport, Front-to-rear, Angle, Sideswipe); right panel: Air Bag Deployed (NotDeployed, Deployed); vertical axis: Driver Injury Severity (O, C, B, A/K), proportion from 0.0 to 1.0.]
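The two interpretation rules of Section 4.1 can be checked numerically. The sketch below uses only estimates quoted from Table 2 (the female coefficient 0.732 in the MNL possible-injury equation, the female coefficient 0.611 in the OL model and the three OL thresholds). The baseline values 0.0 and 3.0 of the latent index are hypothetical, chosen only to show that in the OL model a positive coefficient always lowers P(O) and raises P(A/K), while its effect on an interior category such as C can change sign with the status of the other factors. The function names are illustrative.

```python
import math


def logistic_cdf(z: float) -> float:
    """Standard logistic CDF (overflow-safe, accepts +/-inf)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ol_probs(eta: float, alphas) -> list:
    """Ordered logit probabilities of every category for latent index eta (Eq. (4))."""
    cuts = [-math.inf, *alphas, math.inf]
    return [logistic_cdf(cuts[s + 1] - eta) - logistic_cdf(cuts[s] - eta)
            for s in range(len(cuts) - 1)]


def ol_shift(eta: float, delta: float, alphas) -> list:
    """Change in each category probability when the latent index rises by delta."""
    before = ol_probs(eta, alphas)
    after = ol_probs(eta + delta, alphas)
    return [a - b for a, b in zip(after, before)]


ALPHAS = [1.637, 2.619, 4.033]  # OL thresholds (Table 2)
FEMALE_OL = 0.611               # OL coefficient, "Sex: female" (Table 2)
FEMALE_MNL_C = 0.732            # MNL coefficient for C vs O, "Sex: female" (Table 2)

# MNL: exp(coefficient) is the odds ratio of C vs O for females relative to males.
odds_ratio = math.exp(FEMALE_MNL_C)
print(f"odds ratio, C vs O, female vs male: {odds_ratio:.3f}")  # 2.079, as in the text

# OL: the same positive coefficient at two hypothetical baseline latent indices.
for baseline in (0.0, 3.0):
    changes = ol_shift(baseline, FEMALE_OL, ALPHAS)  # order: O, C, B, A/K
    print(baseline, [round(c, 4) for c in changes])
```

At the lower baseline the female effect raises P(C), whereas at the higher baseline it lowers P(C), even though P(O) falls and P(A/K) rises in both cases; this is the dependence on "the status of other factors" described above.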

Table 2
Estimated coefficients for the MNL and OL models.

Explanatory variable | MNL: possible injury (C) | MNL: non-incapacitating injury (B) | MNL: incapacitating/fatal injury (A/K) | Ordered logit model
Threshold 1 | NA | NA | NA | 1.637 (9.426)
Threshold 2 | NA | NA | NA | 2.619 (14.827)
Threshold 3 | NA | NA | NA | 4.033 (21.958)
Intercept | −2.797 (−9.819) | −1.953 (−18.864) | −4.076 (−19.259) | NA
Manner of collision: front-to-rear | 1.458 (5.275) | 0.298 (3.169) | 0.596 (3.706) | 1.004 (5.982)
Manner of collision: angle | 1.287 (4.039) | – | 0.753 (2.982) | 1.041 (5.038)
Manner of collision: sideswipe | 1.407 (5.042) | – | – | 0.752 (4.313)
Relation to traffic way: not on roadway | 0.883 (3.182) | – | 0.731 (4.624) | 0.916 (5.371)
Light condition: not daylight | −0.186 (−2.069) | – | – | –
Body type: utility vehicles | – | – | – | –
Body type: trucks | – | – | – | −0.194 (−2.229)
Vehicle trailing: with trailing units | −1.318 (−5.916) | −1.150 (−5.080) | −2.554 (−4.850) | −1.242 (−7.319)
Rollover: with rollover | 1.640 (8.530) | 2.053 (11.772) | 2.445 (12.882) | 1.580 (14.432)
Total lanes in roadway: two lanes | −0.291 (−2.819) | −0.536 (−4.981) | – | −0.500 (−6.722)
Total lanes in roadway: more than two lanes | – | – | 0.764 (5.992) | –
Roadway surface condition: wet | – | – | −0.325 (−2.449) | −0.191 (−2.481)
Age: >=31 & <=45 | 0.466 (4.404) | 0.466 (4.492) | 0.725 (5.338) | 0.517 (6.952)
Age: >=46 & <=60 | 0.377 (3.120) | 0.323 (2.662) | 0.943 (6.242) | 0.532 (6.269)
Age: >=61 | 0.428 (2.784) | – | 0.821 (4.132) | 0.561 (4.794)
Sex: female | 0.732 (8.154) | 0.804 (8.756) | 0.825 (7.189) | 0.611 (9.458)
Restraint system use: not used | 1.138 (2.813) | 1.970 (5.694) | 2.635 (7.642) | 1.669 (9.004)
Air bag deployed: deployed | 0.675 (6.531) | 1.196 (12.127) | 1.479 (12.502) | 0.992 (14.492)
Ejection: ejected | – | – | 2.339 (4.128) | 2.112 (3.688)
Condition at time of crash: impaired | – | – | 0.640 (4.411) | 0.567 (5.275)
Values in parentheses are t-statistics associated with the estimated coefficients. – indicates that the coefficient is statistically insignificant; NA indicates a parameter not included in that model.

The manner of collision was found to be an important factor affecting driver injury severity. The MNL model indicates that the odds of suffering an injury are higher for front-to-rear collisions than for collisions that are not with other motor vehicles in transport. According to the OL model, front-to-rear, angle and sideswipe collisions tend to show an increased latent injury propensity relative to the reference category, with angle collisions having the largest latent injury propensity of all collision types. Roughly speaking, both the MNL and OL models suggest that collisions with motor vehicles in transport have the potential to cause more severe injuries to drivers. Some studies (e.g. Yasmin and Eluru, 2013) have reported that striking a stationary object leads to serious injuries. However, the reference category (not collision with motor vehicle in transport) may include collisions with motor vehicles in parking lots or with pedestrians or bicycles, situations likely to lead to lower driver injury severities.

The MNL model indicates that accidents occurring on roadways tend to have higher odds of possible injury than collisions occurring in other locations, such as on the shoulder, in the parking lane or in the median. In the OL model, the latent injury propensity is higher for accidents that occur on roadways. One possible reason is that vehicle speeds before a collision are higher on the roadway than in other locations, increasing the chance of a more severe crash.

The coefficients associated with light conditions are revealed only by the MNL model. Accidents that occur in daylight are found to be less likely to lead to possible injury than nighttime accidents, which indicates that the odds of no injury, non-incapacitating injury or incapacitating/fatal injury are higher for daylight accidents. This finding is partly in line with past studies (e.g. Krull et al., 2000) suggesting that good light conditions can increase vehicle speed and consequently lead to serious injuries. However, other studies (e.g. Rifaat et al., 2011; Klop and Khattak, 1999) claim that daylight is negatively correlated with more severe injuries because of better visibility for drivers and more time for them to take evasive action.

Vehicle body type affects driver injury severity only in the OL model: trucks have a lower latent injury propensity than automobiles and utility vehicles. This is in line with previous findings (e.g. Yasmin and Eluru, 2013) and is expected, since light vehicles are structurally more vulnerable than heavy vehicles, which can result in more harmful impacts on drivers during crashes. The MNL model indicates that vehicles with trailing units have lower odds of incapacitating/fatal injuries, and a similar finding is observed in the OL model. Drivers may pay more care and attention when the vehicle is attached to a trailing unit, and there is also less potential for speeding in such cases. Both the MNL and OL models suggest that rollover greatly increases driver injury severity, which is consistent with past studies (e.g. Krull et al., 2000). Rollover can directly injure the driver and also create difficulties for rescuers.

In the MNL model, roadways with more than two lanes are associated with a higher likelihood of incapacitating/fatal injuries than single-lane roadways. Roadways with more lanes may have higher operating speeds, which is an adverse factor for injury severity. In the OL model, however, two-lane roadways tend to have lower latent injury propensities. Compared with dry road surfaces, wet surfaces are associated with lower odds of incapacitating/fatal injuries in the MNL model and with a lower latent injury propensity in the OL model, which could be attributed to more careful driving and lower speeds on wet roadways. In both the MNL and OL models, older drivers are more likely than drivers aged 30 or younger to experience incapacitating/fatal injuries, which is generally consistent with previous findings (e.g. Carson and Mannering, 2001). In the MNL model, female drivers are associated with a higher risk of serious injuries; similarly, the OL model shows that female drivers have higher latent injury propensities than male drivers.
With respect to restraint system use, drivers who did not use restraints were more likely to suffer serious injuries, especially incapacitating/fatal injuries, according to both the MNL and OL models. Airbag deployment is found to be an indication of more serious injuries. According to the MNL model, the odds of a driver sustaining incapacitating/fatal injuries are significantly higher when the driver is ejected from the vehicle, and the OL model likewise shows a higher injury severity propensity for ejected drivers. The driver's physical condition has a significant influence on injury severity. The MNL model indicates that drivers in an impaired condition (including under the influence of alcohol or drugs, fatigued, or otherwise impaired) are more likely to have incapacitating/fatal injuries than other injury levels. The OL model also finds that impaired drivers have a higher propensity to experience a serious injury.

4.2. The HFM model
Table 3 presents the parameter estimates of the HFM model. The HFM model reveals the influence of several explanatory variables that are not significant in either the MNL or the OL model, namely relation to junction, atmospheric conditions, presence of passengers and speed limit. The weight parameters of the two components are statistically significant: the OL process accounts for 19.4% of the data (0.194 in Table 3; likelihood ratio (LR) test statistic 303.700, degrees of freedom (DF) 54, p-value < 0.001), while the unordered MNL process accounts for the remaining 80.6% (0.806 in Table 3; LR = 106.895, DF = 24, p-value < 0.001). Thus, the nominal/unordered features of the crash injury data may dominate the ordinal features. More importantly, the HFM model allows for heterogeneous ordering patterns of driver injury severity with respect to the explanatory variables.
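The reported p-values can be reproduced from the LR statistics and degrees of freedom alone. The sketch below is an illustration rather than the authors' code: it assumes, as the text implies, a chi-square reference distribution for these LR statistics. Because both DF values (54 and 24) are even, it uses the closed-form chi-square upper-tail probability instead of a statistics library (`scipy.stats.chi2.sf` should give the same values); the function name `chi2_sf_even_df` is hypothetical.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable with an even df.

    Uses the identity P(chi2_{2k} > x) = P(Poisson(x / 2) <= k - 1),
    with the Poisson terms summed in log space to avoid overflow.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    logs = [i * math.log(half) - math.lgamma(i + 1) - half for i in range(df // 2)]
    top = max(logs)
    return min(1.0, math.exp(top) * sum(math.exp(v - top) for v in logs))


# LR statistics and degrees of freedom reported for the HFM weight parameters.
tests = {
    "OL component (weight 0.194)": (303.700, 54),
    "MNL component (weight 0.806)": (106.895, 24),
}
for name, (lr, df) in tests.items():
    print(f"{name}: LR = {lr}, DF = {df}, p = {chi2_sf_even_df(lr, df):.2e}")
```

Both tail probabilities are far below 0.001, consistent with the significance levels reported above.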
Many variables present “weakly” heterogeneous ordering patterns, in which the patterns of influence of a variable partly conflict between the MNL and OL components, whereas several variables exhibit “strongly” heterogeneous ordering patterns, in which the directions of a variable’s impacts are opposite in the two components. The coefficients of the variable “relation to junction” in the MNL component indicate that no injury and incapacitating/fatal injury have larger odds than possible injury and non-incapacitating injury when an accident is located within an interchange area. Vehicles within interchange areas usually travel at relatively low speeds, which could lead to a higher chance of no injury, whereas the speeds of vehicles involved in the same accident could differ substantially under certain circumstances (for example, a vehicle that has just come off a ramp entering a freeway versus a vehicle already traveling on the freeway), which could lead to a higher chance of incapacitating/fatal injuries. Under the HFM model, the MNL component represents only part of the behavior of driver injury severity. The remaining behavior is represented by the OL component, in which driving within interchange areas is associated with a higher propensity for severe injury. This could be due to greater speed differences between vehicles. In fact, many accidents within interchange areas involve vehicles falling from overpasses, which could lead to very serious injuries. For accidents that do not occur on roadways, the MNL component indicates a higher chance of incapacitating/fatal injuries and a lower chance of no injury, but the OL component does not detect a significant effect of this variable. For accidents occurring at nighttime, the pattern of influence in the MNL component is similar to that in the MNL model. In the OL component, nighttime is associated with lower driver injury severity.
It is possible that daytime conditions


L. Ma et al. / Accident Analysis and Prevention 89 (2016) 62–73

Table 3 Estimated coefficients for the HFM model. Explanatory variables

Threshold 1 Threshold 2 Threshold 3 Intercept Manner of collision Front-to-rear Angle Sideswipe Relation to Junction: within interchange area Relation to traffic way: not on roadway Light condition: not on daylight Atmospheric conditions Rain or snow Cloudy Presence of passengers: with passengers Body type Utility vehicles Trucks Vehicle trailing: with trailing units Rollover: with rollover Total lanes in roadway Two lanes More than two lanes Speed limit Roadway surface condition: wet Age >=31 & =46 & =61 Sex: female Restraint system use: not used Air bag deployed: deployed Ejection: ejected Condition at time of crash: impaired Mixture weight parameters

Component1-MNL

Component2-OL

Possible injury (C)

Non-incapacitating injury (B)

Incapacitating/fatal injury (A/K)

NA NA NA −2.615 (−6.328)

NA NA NA −2.785 (−5.392)

NA NA NA −3.517 (−9.938)

9.005 (9.272) 10.943 (10.865) 16.383 (13.745) NA

2.007 (5.724) 2.404 (6.049) 2.159 (6.065) −0.307 (−2.310) 1.420 (4.024) −0.214 (−2.142)

0.865 (3.190) 1.630 (4.865) 0.948 (3.372) −0.355 (−2.403) 0.870 (3.183) –

1.026 (2.983) 1.867 (4.560) 0.733 (2.025) – 1.454 (4.242) –

2.410 (10.289) −6.076 (−7.241) – 3.145 (10.709) – −0.887 (−4.219)

– – –

– – 0.274 (2.607)

– – –

−1.556 (−3.062) 1.644 (6.289) −1.037 (−4.557)

– −0.311 (−2.334) −0.970 (−3.884) 2.316 (9.439)

– – −0.924 (−3.845) 2.629 (11.375)

– – −2.653 (−4.977) 3.001 (12.561)

−0.798 (−3.028) – −6.339 (−5.141) 1.667 (4.744)

−0.871 (−3.911) −0.644 (−3.030) – –

−1.310 (−5.441) −0.800 (−3.574) 0.016 (2.413) −0.574 (−4.446)

−0.767 (−5.090) – – −0.900 (−5.762)

4.878 (8.660) 6.360 (10.842) −0.036 (−3.037) 4.891 (9.587)

0.717 (4.959) 0.733 (4.736) 0.771 (3.611) 0.708 (5.735) 1.840 (5.120) 1.187 (9.346) 3.425 (3.785) 0.924 (5.362)

1.140 (4.799) 3.207 (10.821) 2.942 (7.849) 2.423 (11.061) 9.484 (12.964) 4.557 (15.711) – – 0.194 (LR = 106.895, DF = 24, p-value < 0.001)

0.505 (4.317) 0.557 (4.961) 0.309 (2.409) – 0.501 (2.941) – 0.697 (6.891) 0.714 (6.672) 0.834 (1.990) 1.513 (4.172) 0.479 (4.180) 0.894 (7.724) – – – 0.492 (2.853) 0.806 (LR = 303.700, DF = 54, p-value < 0.001)

Values in parentheses are t-statistics associated with estimated coefficients. – indicates that the coefficient is statistically insignificant.

could increase vehicles’ speeds and consequently lead to more serious injuries. In the MNL component, a non-incapacitating injury has higher odds when there are passengers in the vehicle. The presence of passengers could obstruct the driver’s field of vision or distract the driver through conversation, and hence contribute to injuries. In the OL component, the latent injury propensity is lower when there are passengers. In certain scenarios, passengers, especially those who are experienced drivers, might warn drivers about potential collisions and help them avoid serious injuries. The MNL component indicates that the odds of non-incapacitating injuries are higher when the speed limit is higher, whereas the OL component indicates that the propensity for injury severity is inversely associated with the speed limit. The same variable can also have opposing impacts on driver injury severity under different unobserved scenarios related to the collision. Taking the variable “wet road surface” as an example, in the first scenario the vehicle loses control on the slippery surface, leading to a severe physical impact on the vehicle and the driver. Such a situation could raise the odds of incapacitating/fatal collisions and reduce the odds of no-injury collisions. In the second scenario, drivers may lower their speed and pay more attention when driving on wet roads, which could reduce the odds of incapacitating/fatal collisions and raise the odds of no-injury collisions. However, the MNL and OL models capture only the effect of a wet road surface corresponding to the second scenario.
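The odds-ratio reading of the MNL-component coefficients, and the sign reading of the OL component, can be illustrated with the Table 3 estimates for the variables discussed above. This is a hedged sketch: we assume the Table 3 columns are MNL possible injury (C), MNL non-incapacitating injury (B), MNL incapacitating/fatal injury (A/K) and OL, in that order, with no injury (O) as the MNL base category. Under that reading, exp(beta) is the multiplicative change in the odds of a category relative to no injury within the MNL component. The dictionary `HFM_EFFECTS` and the helper functions are hypothetical.

```python
import math

# Coefficients as read from Table 3 under the assumed column order
# (MNL C, MNL B, MNL A/K, OL). Insignificant entries ("–") are omitted.
HFM_EFFECTS = {
    "wet road surface": {"mnl": {"B": -0.574, "A/K": -0.900}, "ol": 4.891},
    "with passengers": {"mnl": {"B": 0.274}, "ol": -1.037},
    "speed limit (per unit)": {"mnl": {"B": 0.016}, "ol": -0.036},
}


def odds_ratio(beta: float) -> float:
    """Factor by which an MNL coefficient multiplies the odds of a category versus no injury (O)."""
    return math.exp(beta)


def describe(name: str) -> str:
    """Summarize one variable's effect in both HFM components."""
    effects = HFM_EFFECTS[name]
    parts = [f"{cat} vs O odds x{odds_ratio(b):.3f}" for cat, b in effects["mnl"].items()]
    direction = "raises" if effects["ol"] > 0 else "lowers"
    parts.append(f"OL latent propensity {direction} (beta = {effects['ol']})")
    return f"{name}: " + "; ".join(parts)


for variable in HFM_EFFECTS:
    print(describe(variable))
```

For the wet-surface indicator, this reading gives odds of an incapacitating/fatal injury (relative to no injury) multiplied by about 0.41 in the MNL component, while the OL component shifts the latent severity upward, which is the opposing pattern described in the text.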

The empirical results from the HFM model are able to illustrate such a pattern of influence. The MNL component indicates that driving on a wet road surface lowers the odds of sustaining an incapacitating/fatal injury, whereas the OL component indicates that driving on a wet road surface increases the latent propensity for injury. Therefore, a single factor can affect injury severity levels in opposite directions. The different coefficients could reflect such behavioral characteristics in the data, giving a representation closer to reality.

4.3. The FMMNL and FMOL models

Conventional mixture models for injury severity have overlooked the possibility that their component models come from different econometric structures. For example, the FMMNL and FMOL models adopt two MNL components and two OL components, respectively. This section presents the results from these models for discussion purposes. Table 4 presents the estimation results for the FMMNL model. The observations are assumed to come from two multinomial processes, so the interpretation of each variable depends on both MNL components. For some variables, it is difficult to judge the direction of influence on injury severity because each variable has several sets of coefficients across the two components. Many factors, such as vehicle trailing, age, sex and restraint system use, show patterns of influence similar to those in the HFM model. However, some discrepancies also exist between the FMMNL


Table 4 Estimated coefficients for the FMMNL model. Explanatory variables

Intercept Manner of collision Front-to-rear Angle Sideswipe Relation to Junction: within interchange area Relation to traffic way: not on roadway Light condition: not on daylight Atmospheric conditions Rain or snow Cloudy Presence of passengers: with passengers Body type Utility vehicles Trucks Vehicle trailing: with trailing units Rollover: with rollover Total lanes in roadway Two lanes More than two lanes Speed limit Roadway surface condition: wet Age >=31 & =46 & =61 Sex: female Restraint system use: not used Air bag deployed: deployed Condition at time of crash: impaired Mixture weight parameters

Component1

Component2

Possible injury (C)

Non-incapacitating injury (B)

−0.747 (−1.411)

−4.898 (−6.500)

1.495 (8.494) – 2.895 (13.954) – – −0.537 (−3.962)

Possible injury (C)

Non-incapacitating injury (B)

Incapacitating/fatal injury (A/K)

−3.098 (−11.040)

−7.502 (−8.517)

−4.962 (−9.050)

−6.595 (−9.898)

4.669 (11.580) – 4.870 (11.379) 0.640 (3.534)

– 1.334 (3.543) 4.870 (11.379) –

4.577 (6.693) 5.498 (7.504) – –

– 2.172 (7.834) – –

1.174 (5.866) 1.442 (3.511) – 0.469 (2.725)



1.266 (4.679)

4.505 (6.511)

1.529 (9.930)

1.285 (5.689)



−0.904 (−3.928)





0.365 (2.585)

1.363 (3.461) 0.762 (4.086) −0.635 (−4.128)

1.861 (3.695) 1.241 (5.860) −0.423 (−2.287)

– −0.599 (−3.109) −2.284 (−5.066)

– −0.778 (−3.377) −1.479 (−3.106)

1.468 (6.156)

Incapacitating/fatal injury (A/K)



−1.418 (−3.609) −0.688 (−3.016) 0.906 (5.961)

−1.285 (−3.612) −0.454 (−2.589) 0.438 (3.216)

−0.751 (−1.973) – –

1.790 (7.084) 0.807 (2.970) −2.727 (−3.354)

– – −0.865 (−2.876)

– – −1.302 (−3.762)

−0.737 (−3.708) −0.428 (−2.112) −3.108 (−4.075)

0.579 (2.060)

4.565 (6.078)

5.679 (7.974)

6.548 (8.975)

−0.730 (−4.580) – 0.041 (4.722) 1.029 (3.212)

– 0.794 (4.576) 0.026 (2.704) 0.899 (2.566)

0.677 (4.699) – – 0.647 (5.032) 1.699 (3.842) 1.300 (9.100) –

1.190 (6.503) 1.109 (5.665) 1.835 (7.279) 1.171 (7.699) 2.668 (5.907) 1.492 (9.338) –

– – –

– – −0.029 (−3.474) −1.468 (−4.106)

−0.543 (−2.905) – −0.025 (−2.480) −1.560 (−3.448)

−1.295 (−4.854) – −2.361 (−7.024)

−1.254 (−4.626) −0.619 (−2.537) 0.027 (3.017) 1.658 (4.599)

0.487 (3.454) – – 1.485 (10.607) – 1.037 (6.467) –

0.685 (3.550) 0.903 (4.652) 1.089 (4.331) 1.512 (9.053) 1.529 (2.224) 1.652 (8.655) 1.288 (4.017)

– 0.813 (3.393) −1.051 (−2.029) – 4.663 (9.034) 1.788 (8.134) 1.649 (6.021)

0.779 (4.240) 0.984 (5.370) 1.634 (6.648) – 1.459 (2.747) 0.457 (2.621) 0.464 (2.010)

0.473 (LR = 350.008, DF = 52, p-value < 0.001)

0.527 (LR = 458.768, DF = 54, p-value < 0.001)

Values in parentheses are t-statistics associated with estimated coefficients. – indicates that the coefficient is statistically insignificant.

and HFM models. For example, the FMMNL model suggests that sideswipe collisions are associated with a higher probability of incapacitating/fatal injury, whereas in the HFM model they are linked to a higher probability of possible injury. Table 5 presents the estimation results for the FMOL model. The observations are assumed to come from two OL processes. This structure is simpler than those of the HFM and FMMNL models because there are only two coefficients for each variable. If the two coefficients have the same sign, the direction of influence is unambiguous, whereas if their signs are opposite, the direction of influence depends on the values of the other variables. Most of the variables’ impacts are consistent between the two OL components. For example, the positive coefficients indicate that front-to-rear collisions are associated with a propensity for higher injury severity.
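The sign rule for the FMOL model can be written as a small classification function. This sketch assumes each OL component uses P(y ≤ j) = F(mu_j − xb), so a positive coefficient raises the probability of exceeding every threshold in that component, and that the mixture weights are fixed rather than functions of covariates. With fixed weights, the mixture exceedance probability is a weighted sum of component exceedance probabilities, so coefficients with agreeing signs move it in one direction. Insignificant coefficients are passed as 0. The front-to-rear (3.322, 0.434) and interchange (−0.448, 0.430) values are as we read them from the Component1 and Component2 columns of Table 5; the function names are hypothetical.

```python
import math
from typing import Sequence


def fmol_direction(b1: float, b2: float) -> str:
    """Classify a variable's effect in a two-component FMOL model (0.0 = insignificant)."""
    if b1 == 0.0 and b2 == 0.0:
        return "no effect"
    if b1 >= 0.0 and b2 >= 0.0:
        return "higher severity"
    if b1 <= 0.0 and b2 <= 0.0:
        return "lower severity"
    return "depends on other variables"


def mixture_exceedance(xb: Sequence[float], mu: Sequence[float], w: Sequence[float]) -> float:
    """Mixture P(y > j) = sum_k w_k * logistic(xb_k - mu_k) for a single threshold j."""
    return sum(wk / (1.0 + math.exp(m - x)) for x, m, wk in zip(xb, mu, w))


print(fmol_direction(3.322, 0.434))   # front-to-rear collision
print(fmol_direction(-0.448, 0.430))  # within interchange area
```

`mixture_exceedance` is included only to make the monotonicity argument checkable: raising the linear predictor in both components can only raise the mixture probability of exceeding a threshold.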

4.4. Modeling comparisons

It is interesting to compare the proposed HFM model with the other four models, namely the MNL, OL, FMMNL and FMOL models. In fact, each has advantages in certain respects. Simple models can provide clearer relationships between factors and responses, but they cannot reveal some of the more complicated behavioral differences among cohorts. Complex models are designed to reflect specific data-generating characteristics, at the cost of more complicated interpretations of factors. The following analysis compares these models through the log-likelihood ratio test, the Akaike Information Criterion (AIC), the corrected Akaike Information Criterion (AICc) and the Bayesian Information Criterion (BIC). Eq. (11) presents the formulas for calculating the AIC, AICc and BIC values, where LL is the maximized log-likelihood, p is the number of parameters of the model and n is the sample size. These information criteria can be treated as penalized log-likelihoods; therefore, models with smaller information criterion values are preferable. Table 6 presents these values for the five models under full model specifications that contain all variables, and under model specifications that contain only significant variables.

$$\mathrm{AIC} = -2LL + 2p, \qquad \mathrm{AICc} = \mathrm{AIC} + \frac{2p(p+1)}{n-p-1}, \qquad \mathrm{BIC} = -2LL + p\log(n) \tag{11}$$

Under the full models, the MNL model is nested within the HFM (LR = 157.106, DF = 30, p-value < 0.001) and FMMNL (LR = 187.730, DF = 83, p-value < 0.001) models, while the OL model is nested within the HFM (LR = 338.898, DF = 82, p-value < 0.001) and FMOL (LR = 116.420, DF = 30, p-value < 0.001) models. These likelihood ratio test results indicate that the finite mixture models (HFM, FMMNL and FMOL) are preferable to the simpler models (MNL and OL), which is also consistent with the AIC and AICc values. According to the AIC and AICc values, the HFM model exhibits the best ability to capture the information in the data. Under the models containing only the significant parameters, the proposed HFM model


Table 5 Estimated coefficients for the FMOL model. Explanatory variables Threshold 1 Threshold 2 Threshold 3 Manner of collision Front-to-rear Angle Sideswipe Relation to Junction: within interchange area Relation to traffic way: not on roadway Light condition: not on daylight Atmospheric conditions Rain or snow Cloudy Body type Utility vehicles Trucks Vehicle trailing: with trailing units Rollover: with rollover Total lanes in roadway Two lanes More than two lanes Roadway surface condition: wet Age >=31 & =46 & =61 Sex: female Restraint system use: not used Air bag deployed: deployed Condition at time of crash: impaired Mixture weight parameters

Component1 4.546 (9.315) 6.713 (13.215) 9.378 (16.674)

Component2 1.659 (12.470) 2.417 (17.616) 3.870 (25.795)

3.322 (7.157) 1.484 (2.778) 3.365 (7.175) −0.448 (−2.990)

0.434 (4.358) 0.980 (5.547) – 0.430 (4.531)

0.993 (2.175)

0.904 (8.165)

−0.367 (−2.742)



It should be noted that the purpose of this paper is not to decide which model is the best, since these performance measures could vary with the dataset. The behavioral insights offered by the modeling structures are more important. As mentioned above, the proposed model allows more than one ordering pattern for the contributing factors. For example, a front-to-rear collision could affect crash severity in two major ways. If the impact force between the two vehicles is large (because of a large difference in size or speed between them), there could be a higher chance of an incapacitating/fatal injury. On the other hand, if the impact force is small, less severe injuries may be more likely. This situation is consistent with the results provided by the proposed HFM model, in which the OL component has a positive coefficient (2.410), indicating a higher chance of an incapacitating/fatal injury, while the MNL component has its largest front-to-rear coefficient (2.007) in the utility function of possible injury. However, such a behavioral characteristic is not captured by the OL, MNL, FMMNL or FMOL models.

1.699 (4.724) 0.717 (4.360)

−0.460 (−4.474) –

– −0.845 (−4.651) −1.533 (−3.673)

– – −1.368 (−7.467)

2.678 (11.966)

1.657 (12.301)

4.5. Elasticity analyses

– – −0.875 (−2.695)

– 0.715 (8.175) –

1.109 (7.599) 0.967 (5.724) 0.881 (3.910) 1.335 (10.423) 6.736 (16.058)

0.405 (4.460) 0.498 (4.835) 0.585 (4.070) 0.488 (6.303) 0.955 (4.621)

1.419 (10.332) –

1.012 (12.049) 0.772 (5.865)

The coefficients presented in Table 3 contain two profiles, belonging to the MNL and OL components, respectively. These coefficients cannot directly reflect the overall impact of a particular variable on injury severity, which also depends on the values of all other exogenous variables. The influence of a factor might not be monotonic, and the directions of influence in the two component models can oppose one another. The elasticity analysis can, to some extent, reflect the overall impacts of variables by summarizing their effects over the entire dataset. For each explanatory variable, a new set of observations was created by increasing this variable by one in each original observation. Then, the percentage changes in the predicted probabilities of the severity categories between the new and the original sets of observations were computed (see Eluru et al., 2012 for more details on how to calculate the elasticities). Table 7 presents the elasticities for the proposed HFM model and for the OL, MNL, FMMNL and FMOL models. For example, the HFM model indicates that the probability of a no-injury crash would decrease by 49.6% for front-to-rear collisions compared with other manners of collision, with all other variables held constant. On the whole, the elasticities show minor discrepancies among the models for many explanatory variables, whereas substantial differences exist for some variables across the five models. These differences could be attributed to two major sources. First, the different econometric structures are expected to cause certain discrepancies in elasticities because of the different functional forms connecting factors to the distribution of injury severity. Second, the error terms that capture the influences of unobserved factors are specified differently in different models. The variables (e.g. age, sex and use of restraint system) that have more consistent elasticities across models could affect injury severity with less interference from unobserved factors. For other variables (e.g. manner of

0.347 (LR = 126.910, DF = 22, p-value < 0.001)

0.653 (LR = 164.858, DF = 19, p-value < 0.001)

Values in parentheses are t-statistics associated with estimated coefficients. – indicates that the coefficient is statistically insignificant.

would be favored over the MNL, OL and FMOL models, while the FMMNL model performs slightly better. However, under both the full models and the models that contain only significant parameters, the BIC values indicate that the OL model should be preferred to the other four models. This conflicts with the findings from the likelihood ratio test and the AIC and AICc values. In fact, for fairly large sample sizes, BIC imposes heavy penalties on the number of parameters, which favors the selection of simple models (Zou et al., 2013; Hastie et al., 2009). Among the three mixture models, the FMMNL model provides a very flexible modeling structure but the interpretation of the factors is difficult, whereas the FMOL model is simpler to interpret but less capable of uncovering complicated patterns of influence. The proposed HFM model strikes an appropriate balance between modeling flexibility and ease of interpretation.

Table 6 Summary of the modeling performance of the five models.

Full models
Model    Number of parameters    Log-likelihood    AIC       AICc      BIC
MNL      81                      −4523.5           9209.1    9209.5    9722.6
OL       29                      −4599.1           9256.2    9256.6    9440.1
HFM      111                     −4445.0           9111.9    9118.0    9815.7
FMMNL    164                     −4429.7           9185.3    9198.6    10218.8
FMOL     59                      −4540.9           9199.8    9201.5    9573.9

Models with only significant parameters
Model    Number of parameters    Log-likelihood    AIC       AICc      BIC
MNL      41                      −4553.4           9188.9    9189.7    9448.8
OL       20                      −4604.6           9249.2    9249.4    9376.0
HFM      77                      −4490.4           9134.8    9137.7    9623.0
FMMNL    105                     −4457.8           9125.6    9131.1    9791.3
FMOL     40                      −4563.3           9206.5    9207.3    9460.1
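The likelihood ratio tests and information criteria in this section follow directly from the log-likelihoods and parameter counts in Table 6 and the formulas in Eq. (11). The sketch below recomputes them with the Python standard library only; the chi-square tail probability is evaluated through the closed form of the regularized upper incomplete gamma function, which exists for integer degrees of freedom. The sample size n is not stated in this section, so `aicc` and `bic` take it as an argument; the function names are illustrative.

```python
import math
from typing import Tuple


def aic(ll: float, p: int) -> float:
    """Akaike Information Criterion: -2LL + 2p."""
    return -2.0 * ll + 2.0 * p


def aicc(ll: float, p: int, n: int) -> float:
    """Small-sample corrected AIC: AIC + 2p(p + 1) / (n - p - 1)."""
    return aic(ll, p) + 2.0 * p * (p + 1) / (n - p - 1)


def bic(ll: float, p: int, n: int) -> float:
    """Bayesian Information Criterion: -2LL + p * log(n), natural log."""
    return -2.0 * ll + math.log(n) * p


def chi2_sf(x: float, k: int) -> float:
    """P(X > x) for a chi-square variable with integer k >= 1 degrees of freedom.

    Equals Q(k/2, x/2), the regularized upper incomplete gamma function, which has
    a finite-sum closed form when k/2 is an integer or a half-integer.
    """
    if x <= 0.0:
        return 1.0
    h = x / 2.0
    if k % 2 == 0:
        # Q(m, h) = exp(-h) * sum_{i=0}^{m-1} h^i / i!  (terms computed in log space)
        return sum(math.exp(-h + i * math.log(h) - math.lgamma(i + 1)) for i in range(k // 2))
    # Q(m + 1/2, h) = erfc(sqrt(h)) + sum_{j=1}^{m} exp(-h) * h^(j - 1/2) / Gamma(j + 1/2)
    m = (k - 1) // 2
    tail = sum(
        math.exp(-h + (j - 0.5) * math.log(h) - math.lgamma(j + 0.5)) for j in range(1, m + 1)
    )
    return math.erfc(math.sqrt(h)) + tail


def lr_test(ll_restricted: float, p_restricted: int,
            ll_full: float, p_full: int) -> Tuple[float, int, float]:
    """LR statistic, degrees of freedom and p-value for a nested model comparison."""
    lr = 2.0 * (ll_full - ll_restricted)
    df = p_full - p_restricted
    return lr, df, chi2_sf(lr, df)


# Full-model log-likelihoods and parameter counts taken from Table 6.
MNL_FULL = (-4523.5, 81)
HFM_FULL = (-4445.0, 111)

print(f"AIC(MNL) = {aic(*MNL_FULL):.1f}, AIC(HFM) = {aic(*HFM_FULL):.1f}")
lr, df, p_value = lr_test(*MNL_FULL, *HFM_FULL)
print(f"MNL nested in HFM: LR = {lr:.1f}, DF = {df}, p-value = {p_value:.1e}")
```

For the MNL versus HFM comparison, this gives LR = 2 × (−4445.0 − (−4523.5)) = 157.0 with 30 degrees of freedom, consistent with the reported 157.106 given that Table 6 lists log-likelihoods to one decimal place. The same rounding explains small differences such as 9209.0 versus the tabulated AIC of 9209.1.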

Table 7 Elasticity effects of variables. Variables

Atmospheric conditions Rain or snow Cloudy Presence of passengers: with passengers Body type Utility vehicles Trucks Vehicle trailing: with trailing units Rollover: with rollover Total lanes in roadway Two lanes More than two lanes Speed limit Roadway surface condition: wet Age >=31 & =46 & =61 Sex: female Restraint system use: not used Air bag deployed: deployed Ejection: ejected Condition at time of crash: impaired

Possible injury (C)

MNL

OL

HFM

FMMNL

FMOL

MNL

−34.7 −33.1 −29.5 0

−39.5 −40.9 −29.8 0

−49.6 −41.2 −45.1 −1.9

−53.4 −52.2 −39.0 −5.0

−41.9 −39.5 −30.3 −6.4

103.9 112.1 132.0 0

−22.5

−37.2

−38.3

−45.0

−33.5

64.6 −14.2

OL

Non-incapacitating injury (B)

HFM

FMMNL

FMOL

MNL

OL

HFM

FMMNL

13.0 1.5 5.7 0

104.7 107.7 133.6 −14.0

103.1 152.4 26.9 −10.7

29.5 7.0 9.5 −5.1

−17.0 −40.2 −34.0 0

39.7 41.0 30.9 0

13.7 46.9 −12.7 −15.6 −2.5 83.7 1.8 11.1

7.5

62.2

101.3

8.2

−29.9

40.2

0

−14.3

−18.3

−4.8

3.7

0 0 0

0 0 0

−1.8 0.8 −5.6

3.3 −3.4 7.0

9.4 7.9 0

0 0 0

Incapacitating/fatal injury (A/K) FMOL

MNL

46.3 42.6 43.9 5.6

5.0 21.3 −31.0 0

OL

HFM

FMMNL

FMOL

75.3 96.0 61.7 0

16.3 18.6 −20.2 25.5

−16.2 −1.1 −13.1 18.2

52.0 81.6 39.9 24.7

73.9

50.2

18.0

65.9

3.7

3.6

36.2

35.2

0

−1.5

4.1

−4.3

3.2

0

0.7

1.3

−1.8

0 0 0

−7.6 8.9 10.7

2.1 11.4 6.5

12.3 9.7 0

0 0 0

0 0 0

−4.7 6.3 −8.5

−11.9 6.1 −10.3

−12.1 4.1 0

3.2

0

5.9

5.1

3.9

0 0 0

0 0 0

4.8 −5.4 0.3

1.1 −4.7 −2.5

−5.1 −7.7 0

0 0 54.6 −71.8

0 8.0 49.0 −60.6

2.6 4.3 52.9 −73.4

−4.0 6.6 52.5 −73.6

0 9.0 50.0 −65.2

0 0 −53.1 31.1

0 −3.4 −34.5 −9.2

−0.9 −18.9 −42.4 43.3

−4.9 −12.2 −52.3 53.7

0 −12.1 −37.8 −4.3

0 0 0 −8.8 −35.2 −52.4 80.0 57.7

−4.3 3.7 4.5 −9.6 −39.1 −28.4 72.2 39.4

0 −9.0 −51.4 65.8

0 0 −79.9 112.4

0 −13.2 −61.8 163.8

−2.3 4.7 −83.9 112.8

17.0 6.8 −81.3 143.3

0 −3.6 −63.6 165.1

13.0 −5.5 0 2.4

20.6 0 0 7.9

18.4 1.0 −0.1 −3.2

17.3 0.4 −0.3 3.8

0 −18.8 0 9.0

−12.4 −8.2 0 3.7

−9.6 0 0 −3.2

−29.8 −19.8 −0.3 16.4

−15.4 −22.9 −0.4 8.7

0 4.0 0 −11.6

−33.0 −22.5 −11.1 0 0 0 4.9 −8.6

−29.4 −30.6 −12.3 −3.7 0.7 0.8 2.0 −1.5

0 19.4 0 −9.4

17.1 49.5 0 −21.9

−32.5 0 0 −13.0

16.7 43.5 −0.4 −15.0

1.3 37.1 0.5 −24.7

0 39.1 0 −4.0

−21.1 −19.3 −15.3 −32.6 −66.1 −44.1 −35.7 −6.0

−21.3 −21.7 −22.9 −25.7 −61.5 −42.2 −71.5 −23.4

−22.2 −20.3 −22.8 −31.6 −60.3 −42.0 −49.0 −45.4

−21.8 −18.8 −24.7 −29.3 −64.8 −42.2 0 −14.1

−22.5 −23.1 −24.3 −27.9 −64.4 −42.5 0 −19.8

17.7 9.7 22.4 31.8 −15.4 5.8 −45.9 −8.9

6.5 5.7 4.9 9.7 −14.7 9.6 −28.8 4.9

16.5 11.5 23.2 29.2 −16.4 2.8 −60.5 −56.2

19.9 8.0 27.3 26.7 −40.5 2.4 0 −10.5

13.3 10.6 8.7 20.1 −43.5 14.4 0 −1.4

14.3 0.1 −22.4 34.7 75.6 59.7 −52.3 −11.4

20.5 1.4 −3.7 34.8 53.2 57.4 −57.2 −47.5

25.1 25.3 26.1 29.7 42.3 47.4 0 17.7

34.1 59.9 58.8 27.3 165.5 73.0 282.3 53.1

39.0 41.4 45.5 43.7 183.4 79.0 251.9 46.0

31.0 59.6 59.1 27.5 178.8 73.3 364.2 329.4

29.6 56.3 52.8 25.9 278.9 69.9 0 30.9

31.2 37.3 44.0 35.4 255.4 75.0 0 53.4

23.1 23.2 24.4 28.1 52.4 47.9 46.3 25.4

16.6 2.2 0.7 31.7 17.5 59.3 0 26.0


Manner of collision Front-to-rear Angle Sideswipe Relation to Junction: within interchange area Relation to traffic way: not on roadway Light condition: not on daylight

No injury (O)

Zero elasticity means that the variable is insignificant.


Fig. 2. Relationships between presence of passengers and driver injury severities. (Three panels, Overall, Component1-MNL and Component2-OL, plot driver injury severity (O, C, B, A/K; vertical axis 0.0 to 1.0) against presence of passengers (driver only vs. with passengers).)

collision, light conditions, roadway surface conditions), their impacts on injury severity could depend greatly on the unobserved factors, as their elasticities are less consistent across different model structures.
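The elasticity procedure described in this section can be sketched generically: shift one explanatory variable by one unit in every observation, recompute the predicted category probabilities, and report the percentage change in their sample averages. This is a minimal sketch under our own assumption that the percentage change is taken on the sample-average predicted probability (Eluru et al., 2012 give the exact definition used in the paper). `predict` can wrap any fitted model that returns category probabilities, such as a two-component mixture; `toy_binary_logit` is a hypothetical two-category model used only to show the call.

```python
import math
from typing import Callable, List, Sequence

Row = List[float]


def mean_probs(predict: Callable[[Row], Sequence[float]], rows: Sequence[Row]) -> List[float]:
    """Average predicted probability of each severity category over the sample."""
    probs = [predict(r) for r in rows]
    return [sum(p[c] for p in probs) / len(probs) for c in range(len(probs[0]))]


def aggregate_elasticity(predict: Callable[[Row], Sequence[float]],
                         rows: Sequence[Row], j: int) -> List[float]:
    """Percentage change in mean category probabilities after adding 1 to variable j in every row."""
    base = mean_probs(predict, rows)
    shifted = [r[:j] + [r[j] + 1.0] + r[j + 1:] for r in rows]
    new = mean_probs(predict, shifted)
    return [100.0 * (b1 - b0) / b0 for b0, b1 in zip(base, new)]


def toy_binary_logit(r: Row) -> List[float]:
    """Hypothetical model: P(severe) = logistic(-1 + 0.8 * x0), for illustration only."""
    p = 1.0 / (1.0 + math.exp(-(-1.0 + 0.8 * r[0])))
    return [1.0 - p, p]


data = [[0.0], [1.0], [0.0]]
print(aggregate_elasticity(toy_binary_logit, data, 0))
```

Because the elasticity is computed from predicted probabilities rather than raw coefficients, the same function summarizes a variable whose coefficients point in opposite directions in the two components, which is the motivation given for Table 7.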

5. Discussion and conclusions

The overall objective of this study is to develop a finite mixture model in which an unordered/nominal data-generating process is fused with an ordered data-generating process. Such a specification is intended to capture the heterogeneous ordering patterns of driver injury severity. In the proposed model, an MNL component and an OL component contribute to the likelihood function through fixed weight parameters, and the EM algorithm was adopted to obtain the maximum likelihood estimators of the HFM model. Conceptually, the proposed model accommodates additional unobserved heterogeneity in the data, because it accounts not only for heterogeneous coefficients but also for heterogeneous econometric structures. The empirical results indicate that the HFM model presents a strong ability to extract information from the data and, more importantly, to uncover heterogeneous ordering relationships between factors and driver injury severity. Nevertheless, the HFM model performs worse than the MNL, OL and FMOL models in terms of BIC (Table 6), which might limit its application in situations with a large sample size or a large number of explanatory variables. In regression models, the coefficients reflect average relationships between a particular explanatory variable and the response variable, with the other variables controlled for. The impact of some variables could be concealed if the relationships between the explanatory variable and the response variable run in opposite directions in different data-generating processes. Therefore, the hybrid mixture model could reveal the influences of some variables that appear statistically insignificant in the MNL or OL model. The overall correlation between a certain variable and driver injury severity might provide insight into its pattern of influence, especially when the variable is not strongly correlated with other explanatory variables. Taking the variable “presence of passengers” as an example, Fig. 2 indicates that there is no evident correlation between the presence of passengers and driver injury severity. This is consistent with the results from both the MNL model and the OL model, in which the impact of the presence of passengers is not

significant. However, this variable has a significant impact in the HFM model through both the MNL component and the OL component. The HFM model effectively partitions the observations, probabilistically, into two groups according to the influential behaviors of each sample. Fig. 2 shows that the correlations between the presence of passengers and driver injury severity differ between the groups. Specifically, the MNL component suggests that driving with passengers leads to a higher chance of sustaining a non-incapacitating injury, while the OL component suggests that driving with passengers leads to a lower chance of sustaining a non-incapacitating or incapacitating/fatal injury.

This study could be extended by linking the weight parameter to the explanatory variables through appropriate functions. This would better capture the heterogeneous data-generating process and could also improve modeling performance. For example, a logit form of the weight parameter has been adopted in modeling crash severity with a latent class OL model (Eluru et al., 2012) and a latent class MNL model (Xie et al., 2012), and in modeling crash counts with a finite mixture negative binomial model (Zou et al., 2013). Further, Zou et al. (2014) extended this feature by investigating the effects of eleven different functional forms of the weight parameter. Finally, it would be interesting to consider a nonparametric or random-parameter specification for the weight parameters.

Acknowledgment

This research was supported by the National Natural Science Foundation of China (No. 51208032 and No. 71210001).

References

Abdel-Aty, M., 2003. Analysis of driver injury severity levels at multiple locations using ordered probit models. J. Saf. Res. 34 (5), 597–603.
Abdel-Aty, M., Abdelwahab, H., 2004. Modeling rear-end collisions including the role of driver’s visibility and light truck vehicles using a nested logit structure. Accid. Anal. Prev. 36 (3), 447–456.
Abay, K., Paleti, R., Bhat, C., 2013. The joint analysis of injury severity of drivers in two-vehicle crashes accommodating seat belt use endogeneity. Transp. Res. B 50, 74–89.
Agresti, A., 2002. Categorical Data Analysis, 2nd ed. Wiley, New York.
Aziz, A., Ukkusuri, S., Hasan, S., 2013. Exploring the determinants of pedestrian–vehicle crash severity in New York City. Accid. Anal. Prev. 50, 1298–1309.
Behnood, A., Roshandeh, A.M., Mannering, F.L., 2014. Latent class analysis of the effects of age, gender, and alcohol consumption on driver-injury severities. Anal. Methods Accid. Res. 3–4, 56–91.
Bhat, C., 1997. An endogenous segmentation mode choice model with an application to intercity travel. Transp. Sci. 31 (1), 34–48.
Bigdeli, M., Khorasani-Zavareh, D., Mohammadi, R., 2010. Pre-hospital care time intervals among victims of road traffic injuries in Iran. A cross-sectional study. BMC Public Health 10, 406.
Carson, J., Mannering, F., 2001. The effect of ice warning signs on ice-accident frequencies and severities. Accid. Anal. Prev. 33 (1), 99–109.
Castro, M., Paleti, R., Bhat, C., 2013. A spatial generalized ordered response model to examine highway crash injury severity. Accid. Anal. Prev. 52, 188–203.
Çelik, A.K., Oktay, E., 2014. A multinomial logit analysis of risk factors influencing road traffic injury severities in the Erzurum and Kars Provinces of Turkey. Accid. Anal. Prev. 72, 66–67.
Cerwick, D.M., Gkritza, K., Shaheed, M.S., Hans, Z., 2014. A comparison of the mixed logit and latent class methods for crash severity analysis. Anal. Methods Accid. Res. 3–4, 11–27.
Chiou, Y., Fu, C., Chih-Wei, H., 2014. Incorporating spatial dependence in simultaneously modeling crash frequency and severity. Anal. Methods Accid. Res. 2, 1–11.
Dempster, A.P., Laird, N.M., Rubin, D.B., 1977. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B 39 (1), 1–38.
Eluru, N., 2013. Evaluating alternate discrete choice frameworks for modeling ordinal discrete variables. Accid. Anal. Prev. 55, 1–11.
Eluru, N., Bhat, C., Hensher, D., 2008. A mixed generalized ordered response model for examining pedestrian and bicyclist injury severity level in traffic crashes. Accid. Anal. Prev. 40 (3), 1033–1054.
Eluru, N., Paleti, R., Pendyala, R., Bhat, C., 2010. Modeling multiple vehicle occupant injury severity: a copula-based multivariate approach. Transp. Res. Rec. 2165, 1–11.
Eluru, N., Bagheri, M., Miranda-Moreno, Fu, L., 2012. A latent class modeling approach for identifying vehicle driver injury severity factors at highway-railway crossings. Accid. Anal. Prev. 47 (1), 119–127.
Elvik, R., Myssen, A., 1999. Incomplete accident reporting: meta-analysis of studies made in 13 countries. Transp. Res. Rec. 1665, 133–140.
Hastie, T., Tibshirani, R., Friedman, J., 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer, New York, NY, USA.
Hauer, E., Hakkert, S., 1989. The extent and some implications of incomplete crash reporting. Transp. Res. Rec. 1185, 1–10.
Huang, H., Chor, C.H., Haque, M.M., 2008. Severity of driver injury and vehicle damage in traffic crashes at intersections: a Bayesian hierarchical analysis. Accid. Anal. Prev. 40, 45–54.
Islam, S., Jones, S.L., Dye, D., 2014. Comprehensive analysis of single- and multi-vehicle large truck at-fault crashes on rural and urban roadways in Alabama. Accid. Anal. Prev. 67, 148–158.
Jiang, X., Huang, B., Zaretzki, R., Richards, S., Yan, X., Zhang, H., 2013. Investigating the influence of curbs on single-vehicle crash injury severity utilizing zero-inflated ordered probit models. Accid. Anal. Prev. 57, 55–56.
Kim, J.-K., Kim, S., Ulfarsson, G., Porrello, L., 2007. Bicyclist injury severities in bicycle–motor vehicle accidents. Accid. Anal. Prev. 39 (2), 238–251.
Klop, J., Khattak, A., 1999. Factors influencing bicycle crash severity on two-lane, undivided roadways in North Carolina. Transp. Res. Rec. 1674, 78–85.
Krull, K., Khattak, A., Council, F., 2000. Injury effects of rollovers and events sequence in single-vehicle crashes. Transp. Res. Rec. 1717, 46–54.
Lee, C., Abdel-Aty, M., 2008. Presence of passengers: does it increase or reduce driver’s crash potential? Accid. Anal. Prev. 40 (5), 1703–1712.
Malyshkina, N., Mannering, F., 2009. Markov switching multinomial logit model: an application to accident-injury severities. Accid. Anal. Prev. 41, 829–838.
Mannering, F., Bhat, C.R., 2014. Analytic methods in accident research: methodological frontier and future directions. Anal. Methods Accid. Res. 1, 1–22.
Mergia, W., Eustace, D., Chimba, D., Qumsiyeh, M., 2013. Exploring factors contributing to injury severity at freeway merging and diverging locations in Ohio. Accid. Anal. Prev. 55, 202–210.
Morgan, S., Teachman, J., 1988. Logistic regression: description, examples, and comparisons. J. Marriage Fam. 50 (4), 929–936.
Patil, S., Geedipally, S., Lord, D., 2012. Analysis of crash severities using nested logit model—accounting for the underreporting of crashes. Accid. Anal. Prev. 45 (1), 646–653.
Quddus, M., Wang, C., Ison, S., 2010. Road traffic congestion and crash severity: an econometric analysis using ordered response models. J. Transp. Eng. 136 (5), 424–435.
Rifaat, S., Tay, R., deBarros, A., 2011. Effect of street pattern on the severity of crashes involving vulnerable road users. Accid. Anal. Prev. 43 (1), 276–283.
Roque, C., Moura, F., Cardoso, J.L., 2015. Detecting unforgiving roadside contributors through the severity analysis of ran-off-road crashes. Accid. Anal. Prev. 80, 262–273.
Rosman, D.L., 2001. The Western Australian Road Injury Database (1987–1996): ten years of linked police, hospital and death records of road crashes and injuries. Accid. Anal. Prev. 33, 81–88.
Russo, B., Savolainen, P., Schneider, W., 2014. Comparison of factors affecting injury severity in angle collisions by fault status using a bivariate ordered probit model. Anal. Methods Accid. Res. 2, 21–29.
Sasidharan, L., Menéndez, M., 2014. Partial proportional odds model: an alternate choice for analyzing pedestrian crash injury severities. Accid. Anal. Prev. 72, 330–340.
Savolainen, P., Mannering, F., Lord, D., Quddus, M., 2011. The statistical analysis of crash-injury severities: a review and assessment of methodological alternatives. Accid. Anal. Prev. 43 (5), 1666–1676.
Shaheed, S.M., Gkritza, K., 2014. A latent class analysis of single-vehicle motorcycle crash severity outcomes. Anal. Methods Accid. Res. 2, 30–38.
Shankar, V., Mannering, F., 1996. An exploratory multinomial logit analysis of single-vehicle motorcycle accident severity. J. Saf. Res. 27 (3), 183–194.
Sobhani, A., Eluru, N., Faghih-Imani, A., 2013. A latent segmentation based multiple discrete continuous extreme value model. Transp. Res. B 58, 154–169.
Tsui, K.L., So, F.L., Sze, N.N., Wong, S.C., Leung, T.F., 2009. Misclassification of injury severity among road casualties in police reports. Accid. Anal. Prev. 41, 84–89.
Weiss, H., Kaplan, S., Prato, C., 2014. Analysis of factors associated with injury severity in crashes involving young New Zealand drivers. Accid. Anal. Prev. 65, 142–155.
World Health Organization (WHO), 2015. Global status report on road safety 2013. Available at: http://www.who.int/iris/bitstream/10665/78256/1/9789241564564_eng.pdf?ua=1 (accessed 01.06.15).
Xie, Y., Zhao, K., Huynh, N., 2012. Analysis of driver injury severity in rural single-vehicle crashes. Accid. Anal. Prev. 47, 36–44.
Xiong, Y., Mannering, F., 2013. The heterogeneous effects of guardian supervision on adolescent driver-injury severities: a finite-mixture random-parameters approach. Transp. Res. B 49, 39–54.
Xiong, Y., Tobias, J., Mannering, F., 2014. The analysis of vehicle crash injury-severity data: a Markov switching approach with road-segment heterogeneity. Transp. Res. B 67, 109–128.
Yamamoto, T., Hashiji, J., Shankar, V., 2008. Underreporting in traffic accident data, bias in parameters and the structure of injury severity models. Accid. Anal. Prev. 40 (4), 1320–1329.
Yasmin, S., Eluru, N., Bhat, C.R., Tay, R., 2014. A latent segmentation based generalized ordered logit model to examine factors influencing driver injury severity. Anal. Methods Accid. Res. 1, 23–28.
Yasmin, S., Eluru, N., 2013. Evaluating alternate discrete outcome frameworks for modeling crash injury severity. Accid. Anal. Prev. 59, 506–521.
Ye, F., Lord, D., 2011. Investigation of effects of underreporting crash data on three commonly used traffic crash severity models: multinomial logit, ordered probit, and mixed logit. Transp. Res. Rec. 2241, 51–58.
Ye, F., Lord, D., 2014. Comparing three commonly used crash severity models on sample size requirements: multinomial logit, ordered probit, and mixed logit. Anal. Methods Accid. Res. 1, 72–85.
Zhao, S., Khattak, A., 2015. Motor vehicle drivers’ injuries in train–motor vehicle crashes. Accid. Anal. Prev. 74, 162–168.
Zhu, X., Srinivasan, S., 2011. A comprehensive analysis of factors influencing the injury-severity of large-truck crashes. Accid. Anal. Prev. 43 (1), 49–57.
Zou, Y., Zhang, Y., Lord, D., 2013. Application of finite mixture of negative binomial regression models with varying weight parameters for vehicle crash data analysis. Accid. Anal. Prev. 50, 1042–1051.
Zou, Y., Zhang, Y., Lord, D., 2014. Analyzing different functional forms of the varying weight parameter for finite mixture of negative binomial regression models. Anal. Methods Accid. Res. 1, 39–52.