Psychometrika
DOI: 10.1007/s11336-015-9449-x
© 2015 The Psychometric Society

A BAYESIAN VECTOR MULTIDIMENSIONAL SCALING PROCEDURE INCORPORATING DIMENSION REPARAMETERIZATION WITH VARIABLE SELECTION

Duncan K. H. Fong, Wayne S. DeSarbo, Zhe Chen, and Zhuying Xu

PENNSYLVANIA STATE UNIVERSITY

We propose a two-way Bayesian vector spatial procedure incorporating dimension reparameterization with a variable selection option to determine the dimensionality and simultaneously identify the significant covariates that help interpret the derived dimensions in the joint space map. We discuss how we solve, in a Bayesian context, the identifiability problems associated with the two-way vector spatial model, and demonstrate through a simulation study how our proposed model outperforms a popular benchmark model. In addition, an empirical application dealing with consumers' ratings of large sport utility vehicles is presented to illustrate the proposed methodology. We obtain interpretable and managerially insightful results from our proposed model with variable selection, in comparison with the benchmark model.

Key words: Bayesian analysis, multidimensional scaling, vector model, variable selection, QR decomposition, reparameterization, consumer psychology.

Zhe Chen is currently working at Google Inc. Correspondence should be made to Duncan K. H. Fong, Department of Marketing, Pennsylvania State University, University Park, PA 16802, USA. Email: [email protected]

1. Introduction

Originally, the term "multidimensional scaling" (MDS) was utilized to refer to spatial techniques that explicitly fit distances to input dissimilarity data (see Kruskal, 1964; Shepard, 1962). More recently, Carroll and Arabie (1980) extended the definition of multidimensional scaling to encompass a much wider family of geometric models (both spatial and non-spatial) for the multidimensional representation of data and a corresponding set of methods for fitting such models to actual data. These authors provide a very broad taxonomy of spatial and non-spatial models under the MDS rubric that accommodate a variety of different forms of data (not solely proximity data). While MDS had its roots in the mathematical psychology literature (cf. Shepard, 1980), the techniques are now widely used in the social and behavioral sciences (see DeSarbo & Kim, 2013; Fong, 2010 for recent reviews of MDS applications in marketing, psychology, political science, etc.).

In consumer surveys dealing with the evaluation of multiple brands (as in our consumer psychology context), various questions of managerial interest may be investigated via an MDS analysis of collected two-way dominance data, which produces a joint space map of consumers and brands in a reduced dimensionality. For example, such MDS solutions are frequently sought to answer the following questions: What are the competing brands, and who are the potential customers for any specific brand? How many underlying dimensions influence consumers when they form their purchasing decisions, and what exactly are those dimensions? Also, for purposes of product positioning and making strategic decisions, one would like to know which product attributes and features contribute most to a product's position on the map. To answer this important question, it would be desirable to have an appropriate MDS procedure that uniquely relates the derived latent dimensions to known product attributes and features, as well as performing variable selection to select the most significant of them.



There are many traditional MDS procedures for analyzing two-way dominance data and providing a joint space representation of the row (e.g., consumer) and column (e.g., brand) elements of the input data matrix in a reduced dimensionality. For such data, there are multidimensional unfolding, correspondence analysis (and related optimal scaling methods), and vector MDS models (see, for example, Borg & Groenen, 2005; Cox & Cox, 2001; DeSarbo & Rao, 1986; Gifi, 1990; Takane, Young, & de Leeuw, 1977). In two-way multidimensional unfolding (e.g., Schonemann, 1970), both row and column entities are represented by points in a T-dimensional space, and the Euclidean distance between these row and column points is indicative of the dominance relations shared between them in the data. In a vector MDS procedure (e.g., MDPREF; Carroll, 1980), one set of entities is represented by vectors emanating from the origin of the derived joint space, while the alternative set of entities is represented by points. Here, the orthogonal projection of the points onto these vectors in a T-dimensional space renders information on the dominance relationships contained in the data. Correspondence analysis (e.g., Benzecri, 1992; Shin, Fong, & Kim, 1998) is an exploratory technique typically used to analyze contingency tables reflecting relationships between the rows and columns; however, unlike in the unfolding or vector models, associations between row and column entities are often more difficult to assess directly.

In this paper, we devise a Bayesian approach and develop a new two-way vector MDS procedure incorporating dimension reparameterization with a variable selection option to select the most significant product attributes and features. As is well known, traditional two-way vector MDS models, such as MDPREF and PREFMAP (Model 1), are not identified, as the resultant solutions are subject to both scale and orthogonal rotational indeterminacies. As such, the interpretations of the derived dimensions change depending upon which rotation one chooses to utilize (akin to factor analysis). In selected applications where low-dimensional solutions are obtained, freedom of rotation allows users to select the directions they find most interpretable; as the dimensionality increases, this benefit subsides. While reparameterized versions of such vector MDS models have been devised which enforce linear restrictions of the coordinate points and/or vectors as specified functions of designated background variables (e.g., product attributes and features), such approaches (see CANDELINC by Carroll, Pruzansky, & Kruskal, 1980; Scott & DeSarbo, 2011; Takane, 2013; Ter Braak, 1986) still suffer from rotational indeterminacies and questions concerning model selection (i.e., the number of dimensions; see also the permutation tests suggested by Buja & Eyuboglu, 1992). In addition, it is often impossible to specify a priori the complete set of attributes and features in a reparameterization, given the need for the number of stimuli to exceed the number of attributes and features for identification.
We address all these issues in our proposed Bayesian procedure, which allows the number of attributes and features to exceed the number of stimuli (quite common in situations where there is no theory to indicate a priori which attributes and features are significant) and uniquely relates the derived latent dimensions to known product attributes and features (see also Jolliffe, Trendafilov, & Uddin, 2003). As with traditional vector MDS models, a challenge for our proposed Bayesian vector MDS (BVMDS) modeling is to deal with parameter indeterminacies in a Bayesian formulation.

In the Bayesian literature, various approaches have been proposed to address the (model) identifiability issue. By imposing identification constraints as part of the prior specification, Park, DeSarbo, and Liechty (2008) provide a Bayesian MDS method that combines both the vector model and the ideal point model in a generalized framework for modeling metric dominance data. Also, DeSarbo, Park, and Rao (2011) use a stochastic ordered preference BVMDS model to analyze ordered successive category measurements. When informative prior information is available, Gustafson (2005) has advocated the use of such derived priors to resolve identification problems. Employing informative prior distributions, DeSarbo, Kim, and Fong (1999) present a BVMDS procedure for the spatial analysis of binary choice data, and Fong, DeSarbo, Park, and Scott (2010) offer a BVMDS model for the analysis of ordered successive categories preference data.


When informative priors are not available, a post-processing approach may be employed, imposing identification constraints on the MCMC sample generated from the unconstrained posterior distribution to obtain estimates of identified parameters. Convergence of the Markov chain in such cases is checked through inspection of the derived sample of identified parameters. By means of post-processing the unconstrained MCMC outputs, Oh and Raftery (2001) devise a Bayesian metric MDS method to analyze (dis)similarity measures between pairs of objects. Gormley and Murphy (2006) use a latent space ideal point model to analyze Irish election data and employ a Procrustean method to solve the identification problem. Lee (2008) presents Bayesian analyses of three influential psychological models and performs post-processing to accommodate the translation, reflection, and axis permutation invariances inherent in the MDS models.

Here, the post-processing approach is used in our BVMDS procedure. We first develop a Gibbs sampling algorithm to generate random deviates from the unconstrained posterior distribution and then apply the QR decomposition method (cf. Golub & Van Loan, 1996) to the generated MCMC sample to obtain estimates of identified parameters. In addition to the intrinsic identifiability issue associated with the vector model, when one relates product positions to product attributes and features in a regression setting, an identification problem arises in the estimation of regression coefficients if there are more product attributes and features than available brands. Since the Stochastic Search Variable Selection approach of Brown, Vannucci, and Fearn (1998) is known to handle variable selection for such cases, we adopt that approach in our procedure for the variable selection option. Note that the procedure in Brown et al. (1998) is a multivariate regression extension of the work in George and McCulloch (1993, 1997); see O'Hara and Sillanpaa (2009) for a recent review of Bayesian variable selection methods. Furthermore, we show that variable selection from our BVMDS procedure is invariant under the proposed post-processing approach. Also, we use well-established model comparison criteria such as the Bayes factor (e.g., Bolton, Fong, & Mosquin, 2003; Kass & Raftery, 1995; Raftery, Newton, Satagopan, & Krivitsky, 2007) and the DIC (e.g., Spiegelhalter, Best, Carlin, & van der Linde, 2002) to determine the number of dimensions.

This paper is organized as follows. In Section 2, we describe the proposed Bayesian vector MDS model and the corresponding estimation procedure. In Section 3, we present the results of a simulation study examining the performance of several competing models with known data structures and parameters. Section 4 gives an empirical application analyzing consumer ratings of large sport utility vehicles. Finally, we summarize our contributions and discuss future research directions in Section 5.

2. The Proposed Bayesian Vector Multidimensional Scaling Model with Variable Selection

We first describe our Bayesian vector MDS model, which directly relates brand positions to brand attributes and features but has no variable selection option. We also explain how the intrinsic parameter identification problem associated with a vector model is solved here. Then, we include a variable selection option to identify significant attributes and features.

2.1. The Proposed BVMDS Model Employing Covariates Without Variable Selection

To produce a T-dimensional joint space map of N consumers and J brands, we use the vector a_i (T × 1) to represent consumer i, i = 1, ..., N, and the vector b_j (T × 1) to represent brand j, j = 1, ..., J, in a vector model (see Slater, 1960; Tucker, 1960) for the preference rating response Z_ij:
Z_ij = a_i' b_j + ε_ij,    (1)

where the error terms are independent and normally distributed with ε_ij ~ N(0, σ^2). Note that Z_ij is observed, but a_i and b_j are latent. To relate brand positions to corresponding brand covariates (e.g., attributes and features), we assume a multivariate normal regression model for b_j:

b_j ~ N(b_0 + Θ X_j, Λ), independently,    (2)

where X_j is the vector of K observed covariates for brand j, b_0 is the common intercept, Θ is the T × K matrix of regression coefficients, and Λ is the T × T error covariance matrix. For the consumer vectors a_i, we assume that they are independent and follow a multivariate normal distribution:

a_i ~ N(a_0, c I_T), independently,    (3)

where a_0 is the common mean vector, c is a pre-specified scalar, and I_T denotes the T × T identity matrix. The assumption that the covariance matrix of a_i is proportional to the identity matrix helps reduce the number of constraints required to obtain identified parameters, as explained below. For the remaining parameters, we employ conjugate priors:

σ^{-2} ~ Ga(m_1, m_2),    (4)
a_0 ~ N(0, G_a),    (5)
b_0 ~ N(0, G_b),    (6)
Θ | Λ ~ MN(Θ_0, Λ, H),    (7)
Λ^{-1} ~ W(ν, V I_T),    (8)

where Ga(m_1, m_2) represents a Gamma distribution with parameters m_1 and m_2; MN(Θ_0, Λ, H) denotes a matrix normal distribution with mean Θ_0, column covariance matrix Λ, and row covariance matrix H (cf. Dawid, 1981; Gupta & Nagar, 2000); and W(ν, V) represents a Wishart distribution with parameters ν and V. The hyper-parameters m_1, m_2, G_a, G_b, Θ_0, H, ν, and V are all pre-specified.

It is well known that the vector model in (1) is under-identified, as (a_i' M)(M^{-1} b_j) = a_i' b_j for any T × T non-singular matrix M. With our assumption on the covariance matrix of a_i, the model is invariant under the following types of parameter transformation: orthogonal rotation, permutation, and reflection. It can be shown that the parameter indeterminacy arising from these transformations is eliminated when the following constraints on T of the a_i vectors (without loss of generality, the first T vectors) are imposed:

a_ii is positive and a_it = 0, t > i, i = 1, ..., T,    (9)

where a_i = (a_i1, ..., a_iT)'. However, if one includes these identification constraints as part of the prior distribution, it will change the exchangeability assumption on the a_i and complicate the Bayesian computation (e.g., random deviates would have to be generated from non-standard, instead of standard, probability distributions). To alleviate this concern, we adopt the approach of post-processing the MCMC outputs generated from the unconstrained posterior distribution and provide the details of the procedure in Section 2.3.
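For concreteness, the generative process in (1)-(3) can be sketched in a few lines of code. The following is a minimal illustration (not the authors' implementation); the dimension settings and parameter values are arbitrary choices for the example, with Theta and Lam standing for the regression coefficients Θ and covariance Λ of (2):

```python
import numpy as np

rng = np.random.default_rng(0)
N, J, K, T = 50, 14, 15, 2    # consumers, brands, covariates, dimensions
c, sigma2 = 1.0, 0.1          # prior scale for a_i in (3); error variance in (1)

a0 = rng.normal(size=T)                       # common consumer mean, eq. (3)
b0 = rng.normal(size=T)                       # common brand intercept, eq. (2)
Theta = rng.normal(size=(T, K))               # T x K regression coefficients
Lam = 0.01 * np.eye(T)                        # brand-level error covariance
X = rng.normal(size=(K, J))                   # observed brand covariates

A = a0[:, None] + np.sqrt(c) * rng.normal(size=(T, N))       # a_i ~ N(a0, c I_T)
B = (b0[:, None] + Theta @ X
     + np.linalg.cholesky(Lam) @ rng.normal(size=(T, J)))    # b_j ~ N(b0 + Theta X_j, Lam)
Z = A.T @ B + np.sqrt(sigma2) * rng.normal(size=(N, J))      # Z_ij = a_i' b_j + eps_ij
```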


2.2. The Proposed BVMDS Model With Variable Selection

Here, we include a variable selection option in the proposed BVMDS model given in Section 2.1 and denote this new model as BVMDS-VS. Let γ = (γ_k) be a latent indicator vector of length K. For k = 1, ..., K, γ_k = 1 indicates that the kth covariate contributes significantly to the brand vectors, while γ_k = 0 indicates that it does not. With the introduction of this selection variable, we assume a new matrix normal prior distribution for Θ, which is also conditional on γ:

Θ ~ MN(Θ_0, Λ, H_γ),    (10)

where H_γ is a K × K diagonal matrix with kth diagonal element

h_kk = v_1k if γ_k = 1; h_kk = v_0k if γ_k = 0,    (11)

where v_1k >> v_0k, and v_1k, v_0k, and Θ_0 are pre-specified. In this paper, we set v_0k = 0 and Θ_0 equal to the zero matrix. Note that, when γ_k = 0 and v_0k = 0, H_γ becomes singular and the kth column of Θ contains zero elements only. To deal with that case, we introduce a working prior for (10) on the submatrix of Θ whose corresponding h_kk of H_γ are all greater than 0:

Θ_(γ) ~ MN(0_{T×l}, Λ, H_(γ)),    (12)

where 0_{T×l} is a T × l matrix of zeroes, l = Σ_k γ_k, and Θ_(γ) is the submatrix consisting of the columns of Θ that have γ_k = 1. Next, we assume an exchangeable prior for γ_k, k = 1, ..., K:

γ_k | w ~ Ber(w),    (13)
w ~ Beta(p, q),    (14)

where Ber(w) represents a Bernoulli distribution with parameter w. The parameters p and q in the above Beta distribution are pre-specified, and they may be set to 1 (p = q = 1) to represent vague prior information. Thus, our BVMDS model incorporating variable selection (BVMDS-VS) is given by (1), (2), (3), (4), (5), (6), (8), (10) or (12), (13), and (14).
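To make the spike-and-slab structure of (10)-(14) concrete, the sketch below draws w and γ and then samples the working prior (12) for the active columns of Θ. It is a schematic under the paper's choice v_0k = 0 (so inactive columns are exactly zero); the values of K, T, p, q, v_1k, and Λ are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)
K, T = 15, 2
p, q = 1.0, 1.0            # Beta hyper-parameters in (14)
v1 = np.full(K, 10.0)      # slab variances v_1k in (11); v_0k = 0 as in the paper

w = rng.beta(p, q)                      # inclusion probability, eq. (14)
gamma = rng.binomial(1, w, size=K)      # gamma_k | w ~ Ber(w), eq. (13)

active = np.flatnonzero(gamma)          # indices k with gamma_k = 1 (assume at least one)
H_g = np.diag(v1[active])               # H_(gamma) for the working prior (12)
Lam = 0.01 * np.eye(T)                  # column covariance of Theta in (10)/(12)

# Matrix normal draw Theta_(gamma) ~ MN(0, Lam, H_(gamma)):
# rows covary through Lam, columns through H_(gamma).
Theta_g = (np.linalg.cholesky(Lam)
           @ rng.normal(size=(T, active.size))
           @ np.linalg.cholesky(H_g).T)
Theta = np.zeros((T, K))
Theta[:, active] = Theta_g              # inactive columns stay exactly zero
```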

2.3. The Proposed MCMC Estimation Procedure

For the post-processing approach, we first generate an approximate sample from the unconstrained posterior distribution by drawing random deviates iteratively and recursively from the following full conditional distributions (see Appendix 1 for their formal derivations):

(i) π(σ^{-2} | all others) = Ga(m_1*, m_2*),    (15)

where m_1* = NJ/2 + m_1, m_2* = m_2 + (1/2) tr[(Z − A'B)(Z − A'B)'], Z is the data matrix, A = [a_1 ... a_N], and B = [b_1 ... b_J].

(ii) π(A | all others) = MN(Ā, I_N, A_l),    (16)

where A_l = (σ^{-2} BB' + I_T/c)^{-1} and Ā = A_l(σ^{-2} BZ' + A_0/c), with A_0 = 1 ⊗ a_0.

(iii) π(a_0 | all others) = N(ā, G_an),    (17)

where G_an = ((N/c) I_T + G_a^{-1})^{-1} and ā = G_an Σ_{i=1}^N a_i/c.

(iv) π(B | all others) = MN(B̄, I_J, B_l),    (18)

where B_l = (σ^{-2} AA' + Λ^{-1})^{-1} and B̄ = B_l(σ^{-2} AZ + Λ^{-1}(B_0 + ΘX)), with B_0 = 1 ⊗ b_0.

(v) π(b_0 | all others) = N(b̄, G_bn),    (19)

where G_bn = (J Λ^{-1} + G_b^{-1})^{-1} and b̄ = G_bn Λ^{-1} Σ_{j=1}^J (b_j − ΘX_j).

(vi) π(Λ^{-1} | all others) = W(J + Σ_k γ_k + ν, V_n),    (20)

where V_n = [(B − B_0 − Θ_(γ) X_(γ))(B − B_0 − Θ_(γ) X_(γ))' + V^{-1} I_T + Θ_(γ) H_(γ)^{-1} Θ_(γ)']^{-1}.

(vii) π(w | all others) = Beta(p + Σ_{k=1}^K γ_k, q + K − Σ_{k=1}^K γ_k).    (21)

(viii) π(Θ_(γ) | all others) = MN(Θ̃_(γ), K_(γ)^{-1}, Λ),    (22)

where K_(γ) = X_(γ) X_(γ)' + H_(γ)^{-1} and Θ̃_(γ) = (B − B_0) X_(γ)' K_(γ)^{-1}.

(ix) π(γ | all others except Λ and Θ_(γ)) ∝ |H_(γ) K_(γ)|^{-T/2} |V_(γ)|^{-(ν+J)/2} Π_k w^{γ_k} (1 − w)^{1−γ_k},    (23)

where K_(γ) = X_(γ) X_(γ)' + H_(γ)^{-1}, V_(γ) = V^{-1} I_T + (B − 1 ⊗ b_0)(I_J − X_(γ)' K_(γ)^{-1} X_(γ))(B − 1 ⊗ b_0)', 1 is a column vector of ones, "⊗" is the Kronecker product, and X_(γ) is the matrix containing the rows of X = [X_1 ... X_J] that have γ_k = 1.

Note that (23) is proportional to a weighted product of Bernoulli densities, so each component of γ, conditional on all other components, can be sampled directly from a Bernoulli distribution. Also, it is sufficient to generate the submatrix Θ_(γ) instead of Θ because ΘX = Θ_(γ) X_(γ) and the extra columns of Θ contain zeroes only. For the BVMDS procedure without the variable selection option, we skip the steps that generate w in (vii) and γ in (ix), and we sample Θ in (viii) according to (22) with an obvious modification.

To obtain a sample of identified parameters, after each iteration of the MCMC we apply the QR decomposition method to the generated A to get a unique orthogonal matrix Ω such that ΩA satisfies the identification constraint in (9). We then compute the corresponding identified brand vectors, given by ΩB (as (ΩA)'(ΩB) = A'B), and similarly the other identified parameters. When the variable selection option is used, it is important to check that the selection results are not affected by this post-processing procedure. We establish this result by proving the following theorem:

Theorem 1. The posterior distribution of γ given by (23) is unchanged when the unidentified parameters are replaced by the identified parameters.

The proof of this theorem is given in Appendix 2.


Theorem 1 shows that the same set of variables will be selected with or without the post-processing, which is needed to obtain estimates of identified parameters. At the end of the MCMC procedure, the frequency count of the realized γ is used to determine the final model, and we pick the one with the highest frequency, which is the highest posterior probability model (HPM). An alternative is to use the median probability model (MPM) advocated in Barbieri and Berger (2004). Below is a summary of the proposed MCMC sampling algorithm:

1. Initialize σ^{-2}, A, a_0, B, b_0, Λ^{-1}, w, Θ_(γ), γ;
2. Generate the random deviates iteratively from their full conditional distributions given in (15)-(23) (drop (21) and (23) and make an obvious modification to (22) if the variable selection option is not used);
3. Compute the log likelihood function for the generated set of parameters;
4. Compute the identified parameters: perform a QR decomposition on A to obtain a unique orthogonal matrix Ω such that ΩA satisfies the identification constraint in (9); then calculate Ωa_0, ΩB, Ωb_0, ΩΘ_(γ), and ΩΛ^{-1}Ω';
5. Save σ^{-2}, ΩA, Ωa_0, ΩB, Ωb_0, ΩΛ^{-1}Ω', w, ΩΘ_(γ), and γ for the MCMC iteration;
6. Repeat Steps 2 to 5 a pre-specified number of times;
7. Subtract the log likelihood evaluated at the mean parameter values from the average log likelihood function and compute the DIC for model selection.
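Step 4 can be implemented with a standard QR factorization. Below is a minimal sketch (our illustration, not the authors' code) in which the signs of the QR factors are fixed so that ΩA satisfies (9) with a positive constrained diagonal; rotating Λ^{-1} as ΩΛ^{-1}Ω' keeps the brand-level regression model (2) internally consistent:

```python
import numpy as np

def identify(A, B, Theta, Lam_inv, a0, b0):
    """Post-process one MCMC draw so that Omega @ A satisfies constraint (9).

    A is T x N; the first T consumer vectors must be linearly independent
    (true with probability one under the continuous posterior).
    """
    T = A.shape[0]
    Q, R = np.linalg.qr(A[:, :T])          # A[:, :T] = Q R, with R upper triangular
    s = np.sign(np.diag(R))                # fix signs so the diagonal becomes positive
    Omega = s[:, None] * Q.T               # the unique orthogonal rotation
    return (Omega @ A, Omega @ B, Omega @ Theta,
            Omega @ Lam_inv @ Omega.T, Omega @ a0, Omega @ b0)
```

Because (ΩA)'(ΩB) = A'B, this rotation leaves the likelihood of every saved draw unchanged while making the draws directly comparable across iterations.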

3. A Simulation Study

In this section, we examine the performance of the proposed model in a simulation study with known data structures and parameters. For comparison purposes, we formulate a benchmark model starting with the traditional vector model MDPREF, for which software is available (no computer software was available for Carroll et al., 1980 or Scott & DeSarbo, 2011; moreover, neither procedure provides any form of variable selection). The MDPREF solutions are post-processed via the QR decomposition to obtain estimates of identified parameters, and then a regression analysis is used to relate the derived brand positions to the corresponding covariates (including sequential variable selection if needed). Thus, our benchmark model is a two-stage procedure employing two commonly used methods, namely MDPREF and linear regression. While it can be criticized as being ad hoc, software for the benchmark model is widely available and easy to use. We now describe the experimental design and the factors that are investigated in the study.

3.1. Simulation Study Design

We generated data according to the model presented in (1) to (3) in Section 2. We experimentally manipulate the following six factors:

F1: Underlying dimension T, with levels (1) T = 2, (2) T = 3;
F2: Sample size, with levels (1) N = 150, (2) N = 300;
F3: Number of brands, with levels (1) J = 14, (2) J = 20;
F4: Number of covariates, with levels (1) K = 15, (2) K = 25;
F5: Percentage of active covariates, with levels (1) 20%, (2) 40%;
F6: Value of σ^2, with levels (1) 0.1, (2) 0.5.

All six factors specify the data-generating mechanism, which covers many scenarios, including one where there are more covariates than brands. Consistent with past methodological literature published in Psychometrika (e.g., DeSarbo, 1982; DeSarbo & Carroll, 1985; DeSarbo, Fong, Liechty, & Saxton, 2004; Fong, Ebbes, & DeSarbo, 2012; Jedidi & DeSarbo, 1991), we employ a fractional factorial design (Addelman, 1961) to compare the performance of BVMDS, BVMDS-VS, and the benchmark model under different conditions. Utilizing an orthogonal design, we generated a total of 16 synthetic datasets, where the true values for the identified model parameters were determined as described below.

For each of the 16 datasets, we generate a_i from its prior, N(a_0, I_T), where a_0 ~ N(0, 100 I_T); Θ_(γ) from its prior in (12), where Λ is set equal to a diagonal matrix with the inverse of each diagonal entry drawn from N(100, 10) and v_1k = 10; each element of X_j from the standard normal distribution; and b_j according to (2), where b_0 ~ N(0, 100 I_T). The following hyper-parameter values are used for our Bayesian procedure in this simulation study: c = 1 in (3), m_1 = 80T, m_2 = 6 in (4), G_a = 100 I_T in (5), G_b = 100 I_T in (6), Θ_0 = 0_{T×K} in (7) and (10), ν = 100, V = 1 in (8), v_1k = 10 in (11), and p = q = 1 in (14). After generating the parameters A, B, etc., we apply the QR decomposition method to obtain the identified parameters. In all cases, the posterior estimates from our Bayesian model were computed based on the last 5000 MCMC iterations out of a total of 25,000 iterations (i.e., the first 20,000 iterations as burn-in).

3.2. The Simulation Results

We compare the performance of BVMDS-VS, BVMDS, and the benchmark model on four aspects: (1) whether the models are able to identify the true dimensionality, (2) how well they estimate the parameters, (3) how well they perform out-of-sample prediction, and (4) whether the models can be used to select the significant covariates accurately.

On the first aspect, to determine the dimensionality, we compute the log marginal likelihood for our Bayesian procedure under each dimensionality and the variance-accounted-for (VAF) statistic for MDPREF. Table 1 reports the values of these measures together with the combination of factors for each dataset. For all 16 simulation datasets, the log marginal likelihood values peak at the true dimension for both BVMDS-VS and BVMDS, indicating that they have successfully identified the correct dimensionality. Also, using a scree plot, MDPREF is able to identify the true dimensionality for these datasets.

For the second aspect, to compare parameter estimation performance, we compute the root mean square error (RMSE) of the parameter estimates from each model based on the true dimensionality for each simulation dataset. In Table 2, we list the average RMSE of BVMDS-VS, BVMDS, and the benchmark model for various parameters across the 16 simulation datasets. We also compute the related t-statistic between BVMDS-VS and the benchmark model, the t-statistic between BVMDS and the benchmark model, and the t-statistic between BVMDS-VS and BVMDS. It is clear from the table that both Bayesian models beat the benchmark model in every case. For this particular performance aspect, there does not seem to be a significant difference between the two proposed Bayesian MDS models.

For the third aspect, out-of-sample prediction, we randomly assign two brands to the validation set for each of the 16 datasets. Table 3 shows the out-of-sample prediction results of the three competing models for each dataset. The RMSEs between the predicted and true ratings of the two validation brands are reported there. In all cases, BVMDS-VS performs the best with the smallest RMSE, followed by BVMDS, and the benchmark model performs the worst.
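The paper does not spell out the exact formula used to score a held-out brand, but a prediction consistent with (1)-(2) positions the validation brand from its covariates via the estimated regression and projects the estimated consumer vectors onto it. The sketch below illustrates this under that assumption:

```python
import numpy as np

def predict_holdout(A_hat, b0_hat, Theta_hat, x_new):
    """Predict all N consumers' ratings of a held-out brand from its covariates.

    A_hat: T x N posterior mean of the (identified) consumer vectors
    b0_hat, Theta_hat: posterior means of the intercept and T x K coefficients
    x_new: length-K covariate vector of the validation brand
    """
    b_new = b0_hat + Theta_hat @ x_new   # implied brand position, eq. (2)
    return A_hat.T @ b_new               # predicted ratings Z_hat, eq. (1)

# RMSE against the observed validation ratings z_obs (length N):
# rmse = np.sqrt(np.mean((predict_holdout(A, b0, Theta, x) - z_obs) ** 2))
```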
Concerning the fourth aspect, to evaluate the variable selection performance, we consider the correct selection of active (i.e., significant) covariates as well as the incorrect selection of inactive covariates. Since BVMDS does not include variable selection in its setup, we propose the following procedure for comparison purposes: a covariate is active for a given dimension if the posterior probability of the corresponding regression coefficient being positive is greater than 0.9 or less than 0.1 (see Rossi, McCulloch, & Allenby, 1996). This is comparable to the p value used for the benchmark model in the regression phase. Table 4 presents the variable selection results for each simulation dataset. We report the percentages of the significant variables and insignificant variables selected per dimension for BVMDS-VS, BVMDS, and MDPREF. The average of each column is shown in the last row. In terms of these averages, BVMDS-VS performs the best, followed by BVMDS, and the benchmark model performs the worst.

Table 1. Simulation analysis—dimension determination heuristics, with log marginal likelihood (LML) for the Bayesian procedures and VAF for the benchmark model. For each dataset, values are listed for DIM = 1, 2, 3, 4, 5 in order. Bold highlights the optimal dimension under each model.

Dataset 1 (T = 2, N = 150, J = 14, K = 25, 20% active, σ^2 = 0.5):
  BVMDS-VS (LML):  −2178, −690, −696, −716, −717
  BVMDS (LML):     −2164, −672, −679, −701, −710
  Benchmark (VAF): 0.856, 0.968, 0.972, 0.976, 0.979

Dataset 2 (T = 3, N = 300, J = 20, K = 15, 40% active, σ^2 = 0.1):
  BVMDS-VS (LML):  −3766, −2083, −706, −745, −746
  BVMDS (LML):     −3753, −2077, −703, −717, −719
  Benchmark (VAF): 0.859, 0.974, 0.993, 0.994, 0.995

Dataset 3 (T = 2, N = 300, J = 20, K = 15, 20% active, σ^2 = 0.1):
  BVMDS-VS (LML):  −6610, −4803, −4949, −5104, −5260
  BVMDS (LML):     −6605, −4796, −4937, −5079, −5189
  Benchmark (VAF): 0.757, 0.910, 0.920, 0.930, 0.939

Dataset 4 (T = 3, N = 150, J = 20, K = 15, 20% active, σ^2 = 0.5):
  BVMDS-VS (LML):  −6616, −5039, −4955, −5128, −5293
  BVMDS (LML):     −6602, −5037, −4953, −5099, −5220
  Benchmark (VAF): 0.743, 0.894, 0.915, 0.925, 0.935

Dataset 5 (T = 2, N = 150, J = 20, K = 15, 40% active, σ^2 = 0.5):
  BVMDS-VS (LML):  −5247, −3493, −3608, −3787, −3939
  BVMDS (LML):     −5228, −3460, −3615, −3743, −3904
  Benchmark (VAF): 0.666, 0.907, 0.916, 0.924, 0.931

Dataset 6 (T = 3, N = 300, J = 14, K = 15, 20% active, σ^2 = 0.5):
  BVMDS-VS (LML):  −6319, −3924, −3791, −3967, −4121
  BVMDS (LML):     −6297, −3930, −3805, −3933, −4095
  Benchmark (VAF): 0.613, 0.923, 0.941, 0.947, 0.953

Dataset 7 (T = 2, N = 300, J = 14, K = 15, 40% active, σ^2 = 0.5):
  BVMDS-VS (LML):  −8967, −1710, −1732, −1744, −1774
  BVMDS (LML):     −8965, −1709, −1719, −1731, −1749
  Benchmark (VAF): 0.826, 0.985, 0.986, 0.987, 0.988

Dataset 8 (T = 3, N = 150, J = 14, K = 15, 40% active, σ^2 = 0.1):
  BVMDS-VS (LML):  −6993, −5241, −1827, −1877, −1889
  BVMDS (LML):     −6993, −5244, −1853, −1860, −1877
  Benchmark (VAF): 0.858, 0.928, 0.978, 0.980, 0.982

Dataset 9 (T = 2, N = 150, J = 14, K = 15, 20% active, σ^2 = 0.1):
  BVMDS-VS (LML):  −4358, −2574, −2737, −2903, −3032
  BVMDS (LML):     −4289, −2574, −2715, −2870, −2994
  Benchmark (VAF): 0.510, 0.916, 0.926, 0.935, 0.944

Dataset 10 (T = 3, N = 300, J = 20, K = 25, 40% active, σ^2 = 0.1):
  BVMDS-VS (LML):  −4225, −3738, −2733, −2863, −2957
  BVMDS (LML):     −4219, −3734, −2729, −2845, −2940
  Benchmark (VAF): 0.675, 0.844, 0.947, 0.955, 0.962

Dataset 11 (T = 2, N = 300, J = 20, K = 25, 20% active, σ^2 = 0.1):
  BVMDS-VS (LML):  −8582, −1269, −1295, −1303, −1310
  BVMDS (LML):     −8574, −1264, −1282, −1292, −1292
  Benchmark (VAF): 0.557, 0.986, 0.987, 0.989, 0.990

Dataset 12 (T = 3, N = 150, J = 20, K = 25, 20% active, σ^2 = 0.1):
  BVMDS-VS (LML):  −6509, −4345, −1357, −1361, −1374
  BVMDS (LML):     −6509, −4341, −1352, −1358, −1361
  Benchmark (VAF): 0.840, 0.950, 0.988, 0.990, 0.991

Dataset 13 (T = 2, N = 150, J = 20, K = 25, 40% active, σ^2 = 0.1):
  BVMDS-VS (LML):  −4170, −850, −873, −913, −932
  BVMDS (LML):     −4160, −847, −864, −912, −931
  Benchmark (VAF): 0.851, 0.984, 0.985, 0.987, 0.988

Dataset 14 (T = 3, N = 300, J = 14, K = 25, 20% active, σ^2 = 0.1):
  BVMDS-VS (LML):  −5608, −3281, −951, −999, −1030
  BVMDS (LML):     −5589, −3241, −962, −998, −1018
  Benchmark (VAF): 0.627, 0.928, 0.984, 0.986, 0.987

Dataset 15 (T = 2, N = 300, J = 14, K = 25, 40% active, σ^2 = 0.1):
  BVMDS-VS (LML):  −8930, −6774, −6907, −7089, −7273
  BVMDS (LML):     −8899, −6768, −6903, −7044, −7228
  Benchmark (VAF): 0.823, 0.923, 0.930, 0.935, 0.941

Dataset 16 (T = 3, N = 150, J = 14, K = 25, 40% active, σ^2 = 0.5):
  BVMDS-VS (LML):  −10,694, −9301, −7058, −7227, −7422
  BVMDS (LML):     −10,685, −9296, −7073, −7226, −7369
  Benchmark (VAF): 0.686, 0.826, 0.925, 0.931, 0.937

Table 2. Simulation analysis—RMSE means (SD) and t-test results.

Parameter   BVMDS-VS        BVMDS           Benchmark       t (VS vs. Bench)   t (BVMDS vs. Bench)   t (VS vs. BVMDS)
σ           0.027 (0.020)   0.028 (0.021)   0.042 (0.021)   −2.091*            −1.858                −0.197
A           0.255 (0.129)   0.242 (0.130)   0.726 (0.381)   −4.687**           −4.805**              0.267
B           0.142 (0.070)   0.120 (0.081)   0.393 (0.191)   −4.953**           −5.264**              0.788
Θ           0.157 (0.088)   0.164 (0.040)   0.318 (0.086)   −5.227**           −6.472**              −0.291
Λ           0.002 (0.001)   0.001 (0.001)   0.108 (0.172)   −2.476*            −2.489*               1.723
a_0         0.134 (0.103)   0.125 (0.105)   0.685 (0.356)   −5.934**           −6.025**              0.242
b_0         0.117 (0.054)   0.104 (0.069)   0.376 (0.206)   −4.862**           −5.006**              0.597

* p value is less than 0.05; ** p value is less than 0.01.

Table 3. Simulation analysis—out-of-sample prediction comparison (RMSE) among the three models. Bold highlights the minimum RMSE.

Group      BVMDS-VS   BVMDS   Benchmark
Group 1    0.666      0.746   0.835
Group 2    1.245      1.308   1.832
Group 3    1.023      1.193   1.437
Group 4    0.862      0.946   1.476
Group 5    0.959      1.126   1.414
Group 6    1.134      1.195   1.690
Group 7    0.660      0.728   0.852
Group 8    0.628      0.709   0.912
Group 9    0.753      0.976   1.208
Group 10   1.134      1.887   1.998
Group 11   0.958      1.200   1.796
Group 12   1.005      1.051   1.718
Group 13   1.424      1.492   1.526
Group 14   1.134      1.370   1.904
Group 15   0.805      1.332   1.935
Group 16   0.919      1.261   1.994

Table 4. Simulation analysis—variable selection results (percentages per dimension).

           Significant variables selected    Insignificant variables selected
Group      BVMDS-VS   BVMDS   Benchmark     BVMDS-VS   BVMDS   Benchmark
1          66         83      66            0          0       8
2          83         61      44            0          4       4
3          83         66      42            0          0       11
4          100        78      66            0          0       6
5          100        83      83            0          16      0
6          100        55      33            0          0       0
7          100        83      50            0          4       0
8          100        66      5             0          11      0
9          100        60      30            0          0       7
10         50         43      13            0          0       2
11         50         40      35            0          10      16
12         100        53      40            0          2       6
13         80         65      35            0          20      13
14         100        66      66            0          3       6
15         100        90      30            0          2       15
16         100        46      26            0          0       0
Average    88         65      42            0          5       6


For starting values of our Bayesian procedures, we either generate them from the prior distributions or use the MDPREF solutions. The average CPU time per group (all dimensions considered) is 45.1 min for BVMDS-VS, with a standard deviation of 17.5 min, and 1.8 min for BVMDS, with a standard deviation of 0.1 min. For comparison, the average CPU time for the benchmark model is 1.1 min with a standard deviation of 0.9 min.

4. An Empirical Application: Large SUVs

4.1. The Large SUV Dataset

A large US automotive consumer research supplier administered a tracking study in late 2002 to gauge automotive marketing awareness and shopping behavior among typical consumers: 200 to 300 respondents were asked to provide their image attribute ratings in 16 car segments and ten light truck and sport utility vehicle (SUV) segments. Typically, a sport utility vehicle or "SUV" refers to a passenger vehicle with off-road and towing capabilities. As the SUV market is commonly categorized into small SUVs, large SUVs, and luxury SUVs, the large SUV market is the main focus of this application study (see also DeSarbo et al., 2011 for additional details). The data were collected in December 2002, with 279 consumer intenders rating 16 different large SUVs. By "intender," we mean a consumer who plans to purchase a vehicle in the next 6-12 months.

We divide the 16 vehicles included in this image attribute rating data into 14 calibration brands and 2 validation brands. The 14 calibration brands account for 97.6% of 2002 large SUV sales; the brands utilized for the analysis are Chevy Suburban, Chevy Tahoe, Ford Expedition, Ford Excursion, Lincoln Navigator, Toyota Sequoia, Toyota Land Cruiser, GMC Yukon, Escalade ESV, Escalade EXT, Hummer H1, Lexus LX470, and Yukon Denali. The two brands used for prediction/validation are Mercedes G-Class and Hummer H2. Among all these brands, 18.75% are Ford brands and 68.75% are US brands (i.e., domestic). Also, following Harshman and Lundy (1984), double centering was applied to the ratings to remove effects due to market share (brands) and consideration set sizes (consumers). Furthermore, we have access to a dataset of 21 attributes subjectively rated by this same set of intenders, covering the physical, perceptual, and reliability characteristics of the vehicles. The attribute names and their descriptive statistics are shown in Table 5. In addition to these attributes, we also created two dummy variables, "Ford (1) versus not Ford (0)" and "Domestic (1) versus Foreign (0)," to be included in the analysis. In creating the joint space representation of the consumer and brand locations based on the rating data, we wish to simultaneously relate the resulting latent dimensions to these attributes to examine the resulting managerial implications.
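Double centering in the sense of Harshman and Lundy (1984) subtracts the row and column means and adds back the grand mean; a minimal sketch of this preprocessing step (an illustration, not the authors' code) is:

```python
import numpy as np

def double_center(Z):
    """Remove row (consumer) and column (brand) main effects from an N x J ratings matrix."""
    row_means = Z.mean(axis=1, keepdims=True)
    col_means = Z.mean(axis=0, keepdims=True)
    return Z - row_means - col_means + Z.mean()
```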
Table 5. Empirical analysis—descriptive statistics of brand attributes.

Attribute                 Mean   Std
Market share              0.06   0.08
Gas mileage               4.60   0.42
Value for money           5.72   0.50
Workmanship               6.85   0.26
Good ride handling        6.82   0.61
Luxurious                 6.73   0.94
Safety                    7.16   0.26
Rugged                    6.75   0.68
Towing capacity           6.84   0.70
Low price                 5.10   0.68
Sporty                    6.16   0.21
Good looking              6.68   0.62
Good for family usage     6.97   0.88
Fun to drive              6.88   0.27
Easy to enter/exit        6.91   0.47
Dependable                7.09   0.26
Good acceleration         6.53   0.35
Big cargo space           6.93   0.69
Lasts a long time         6.99   0.25
Prestigious               7.00   0.66
Good trade-in value       6.64   0.33
Ford versus not Ford      0.19   0.40
Domestic versus Foreign   0.69   0.48

4.2. The Proposed BVMDS-VS Model Solution

We assume relatively vague priors with the following hyper-parameter values: c = 0.1 in (3), m_1 = m_2 = 0.05 in (4), G_a = 100 I_T in (5), G_b = 100 I_T in (6), Θ_0 = 0_{T×K} in (7) and (10), ν = 1000, V = 1 in (8), v_1k = 500 in (11), and p = q = 1 in (14). Table 6 presents both the DIC and the log marginal likelihood for the proposed BVMDS-VS method by dimension. As shown, the four-dimensional solution (T = 4) appears to be the most parsimonious.

Table 6. Empirical analysis—DIC and log marginal likelihood (LML) of BVMDS-VS under various dimensionalities. Bold highlights the minimum DIC and/or maximum LML.

Criterion   Dim = 1    Dim = 2    Dim = 3    Dim = 4    Dim = 5
DIC         10,322.0   10,316.7   10,246.9   10,163.6   10,203.6
LML         −5832.6    −5239.6    −4432.4    −4264.8    −4936.2

We show the resulting joint space under the four-dimensional solution in Figure 1, where the brand coordinates and the consumer vectors (their termini) are plotted in pair-wise dimension fashion (dimension 1 vs. dimension 2, dimension 3 vs. dimension 4). The consumer vectors (which emanate from the origin) are represented by their termini as black dots, and the brand coordinates as red squares with the appropriate brand names next to them. We remark that the consumer vectors are normalized to unit length for ease of interpretation, as is common for vector MDS solutions (e.g., DeSarbo & Cho, 1989; DeSarbo, Grewal, & Scott, 2008; DeSarbo, Howard, & Jedidi, 1991; DeSarbo & Jedidi, 1995; DeSarbo, Oliver, & DeSoete, 1986; Fong et al., 2010), even though one is unable to reproduce the predicted utility scores by consumer according to the proposed vector model when such normalized solutions are presented.

Figure 1. Joint space map of the four-dimensional solution for BVMDS-VS.

Before investigating the interpretation underlying each dimension, we consider an important feature of this model: variable selection. As discussed in Section 2.3, the optimal model is determined based on the posterior distribution of the latent indicator vector γ. Table 7 presents the indicator vectors with the top three posterior probabilities (the remaining models have negligible probabilities). The top model based on BVMDS-VS indicates that nine attributes are useful in the reparameterization of brand coordinates, and this model has a 92.4% chance of selection during the MCMC procedure.

Table 7. Empirical analysis—variable selection result of BVMDS-VS.

Variable name             Model 1   Model 2   Model 3
Market share              1         1         1
Gas mileage               0         0         0
Value for money           1         0         1
Workmanship               0         0         0
Good ride handling        0         0         0
Luxurious                 1         1         1
Safety                    0         0         0
Rugged                    1         1         1
Towing capacity           0         0         0
Low price                 0         1         1
Sporty                    0         0         0
Good looking              0         0         0
Good for family usage     0         0         0
Fun to drive              0         0         0
Easy to enter/exit        0         0         0
Dependable                1         1         1
Good acceleration         1         1         1
Big cargo space           1         1         1
Lasts a long time         1         1         1
Prestigious               0         0         0
Good trade-in value       0         0         0
Ford versus not Ford      1         1         1
Domestic versus Foreign   0         0         0
Model probability (%)     92.4      7.22      0.34

Since we now know that the model with nine attributes is optimal, we use these attributes to ascribe interpretations to our derived four dimensions. Table 8 presents the correlations between the brand coordinates and the attributes involved in the optimal model.

Table 8. Empirical analysis—correlations between brand attributes and dimensions. Bold highlights the largest absolute value for each row.

Attribute              Dimension 1   Dimension 2   Dimension 3   Dimension 4
Market share           −0.38         0.67          −0.22         0.06
Value for money        −0.49         0.68          −0.15         −0.31
Luxurious              0.01          −0.58         −0.56         −0.13
Rugged                 −0.47         0.64          −0.03         0.23
Dependable             −0.64         0.31          −0.36         −0.46
Good acceleration      −0.59         0.27          −0.70         −0.18
Big cargo space        −0.62         0.61          −0.58         0.31
Lasts a long time      −0.68         0.34          −0.11         −0.54
Ford versus not Ford   0.55          −0.19         −0.16         0.74

As shown, dimension 1 describes a practicality latent construct, having high (negative) correlations with "lasts a long time" (−0.68) and "dependable" (−0.64). Dimension 2 appears to be a luxury dimension, with a relatively large negative correlation with "luxurious" (−0.58) and a high positive correlation with "value for money" (0.68). It is also positively correlated with "market share," indicating high popularity. Dimension 3 reflects a performance latent structure that is highly (negatively) related to "good acceleration" (−0.70). Finally, dimension 4 is characterized by the manufacturer "Ford" (0.74).

Clearly, the joint space maps in Figure 1 provide useful information to management, especially regarding the positioning of different brands from the same manufacturer. Let us first examine the various GM brands. The most obvious phenomenon occurs between the two Cadillac Escalade brands (Escalade ESV and Escalade EXT). Their brand coordinates on all dimensions are quite close to each other, which indicates that they are attracting the same group of consumer intenders and are thus actively competing with one another. Since the number of intenders in a specific group remains constant, this indicates that these two sister brands are cannibalizing each other's market share. GM is not doing a good job of positioning its other brands in relation to each other, as the GMC Yukon, Yukon Denali, Chevy Suburban, and Chevy Tahoe are located quite close to one another, especially with respect to the first two dimensions. On the positive side, one sees that the two Cadillac brands are not closely located to these other four GM brands. A similar phenomenon can be observed for the two Toyota brands, Sequoia and Land Cruiser. The Ford-sponsored brands do somewhat better in positioning themselves distinctively from each other, especially the Lincoln Navigator, thereby avoiding severe competition among brands made by the same manufacturer. Ideally, a manufacturer would like to have its multiple brands located far apart, competing with brands made by other manufacturers.

4.3. Comparison with Other Models

In addition to the proposed BVMDS-VS, we also run BVMDS and the MDPREF/benchmark model on this same dataset for comparison purposes. Note that, since the number of attributes is greater than the number of available brands, there is an identification problem with BVMDS which may affect its performance. As shown in Table 9, the DIC measurements of BVMDS keep decreasing, but the log marginal likelihood values do indicate a four-dimensional solution, and so we fix T = 4 for the comparison. For MDPREF, the values of the VAF statistic for dimensions 1-9 are listed in Table 10. In such a deterministic model, it is difficult to declare the optimal dimensionality by use of a scree plot. We make use of the two validation brands to compare the prediction capability of the three models. Similar to the simulation study, the RMSEs between the predicted ratings and the actual observed data are presented in Table 11. Unsurprisingly, BVMDS-VS is the best of the three competing options when considering both brands. BVMDS does not perform badly either, and it outperforms the benchmark model in all cases. For the final model (T = 4), we ran the Gibbs sampler for 25,000 iterations, of which the last 5000 were used to compute all posterior estimates based on BVMDS and BVMDS-VS, respectively. CPU time is about 7.3 minutes for BVMDS-VS and 1.1 minutes for BVMDS.

Table 9. Empirical analysis—DIC and log marginal likelihood (LML) of BVMDS under various dimensionalities. Bold highlights the minimum DIC and/or maximum LML.

Criterion   Dim = 1    Dim = 2    Dim = 3    Dim = 4    Dim = 5
DIC         10,314.7   10,300.8   10,250.6   10,236.8   10,180.4
LML         −5361.5    −4445.2    −4123.5    −3844.5    −4849.1

Table 10. Empirical analysis—VAF of the benchmark model for various dimensions.

Dim   1      2      3      4      5      6      7      8      9
VAF   0.26   0.42   0.55   0.64   0.72   0.78   0.83   0.87   0.91

Table 11. Empirical analysis—out-of-sample prediction comparison (RMSE) among the three models. Bold highlights the minimum RMSE.

Brand              BVMDS-VS (4-Dim)   BVMDS (4-Dim)   Benchmark (4-Dim)   Benchmark (5-Dim)   Benchmark (6-Dim)
Mercedes G-Class   0.60               0.62            0.73                0.73                0.73
Hummer H2          0.72               0.75            0.81                0.81                0.81

5. Conclusion

We introduce a new Bayesian vector MDS procedure incorporating dimension reparameterization with variable selection options for the analysis of metric rating data typically collected in marketing positioning studies. The advantages of the proposed Bayesian method are:

1. The underlying dimensionality can be determined by well-established Bayesian model comparison criteria;
2. We make use of attribute data to better interpret the brand coordinates. For both the simulation examples and the application study, the proposed method shows better performance than the benchmark model;
3. A variable selection option is included in our Bayesian framework to identify the optimal model containing the significant brand attributes. The simulation and application results show that the model with the variable selection procedure dominates the one without it, and that it can produce more managerially insightful results;
4. We discuss the Bayesian identification issues of the vector MDS model and suggest a post-processing method that solves the indeterminacy problem without affecting the variable selection results. In addition, an efficient Gibbs sampling algorithm is proposed to obtain a sample from the unconstrained posterior distribution.

Concerning future research opportunities, there are a number of potential extensions to our model. For example, one may consider modifying this model to analyze ordered preference or binary choice data. Also, we only considered reparameterizing brands in this study; however, if data are available, one can also use consumer-based information (demographics, psychographics, attitudes, etc.) to construct a similar prior on the consumer vectors and apply the variable selection procedure there as well. Such a model may yield a better positioning strategy and provide insights into the heterogeneity of consumer preferences. Here, we assume that all significant attributes contribute to every dimension of our MDS solution. A possible generalization is to allow different attributes to affect different dimensions (but not necessarily all dimensions). However, this generalization can be a challenge, as the number of possible models to compare in the variable selection step increases from 2^K to 2^{TK}, and thus the computational burden will increase dramatically. Finally, further Monte Carlo simulations are required with comparisons made to other benchmark models whose software is available.

Acknowledgments

The authors wish to thank the editor, associate editor, and three anonymous referees for their constructive comments. This research was funded in part by the Smeal College of Business.

Appendix 1: Full Conditional Distributions

(i) Let Z be the data matrix. Since

π(σ^{-2} | all others) ∝ (σ^{-2})^{NJ/2} etr{−(σ^{-2}/2)(Z − A'B)(Z − A'B)'} (σ^{-2})^{m_1 − 1} exp{−m_2 σ^{-2}}
= (σ^{-2})^{NJ/2 + m_1 − 1} exp{−σ^{-2} (m_2 + (1/2) tr[(Z − A'B)'(Z − A'B)])},

where etr refers to the exponential function of the trace of a matrix, the full conditional distribution of σ^{-2} is Ga(m_1*, m_2*), where m_1* = NJ/2 + m_1 and m_2* = m_2 + (1/2) tr[(Z − A'B)(Z − A'B)'].

(ii) Let A_0 = 1 ⊗ a_0. Since

π(A | all others) ∝ etr{−(1/2)[σ^{-2}(Z − A'B)(Z − A'B)' + (A − A_0)'(A − A_0)/c]}
∝ etr{−(1/2)[A'(σ^{-2} BB' + I_T/c)A − 2A'(σ^{-2} BZ' + A_0/c)]},

the full conditional distribution of A is MN(Ā, I_N, A_l), where A_l = (σ^{-2} BB' + I_T/c)^{-1} and Ā = A_l(σ^{-2} BZ' + A_0/c).

(iii) Since

π(a_0 | all others) ∝ exp{−(1/2)[Σ_{i=1}^N (a_i − a_0)'(a_i − a_0)/c + a_0' G_a^{-1} a_0]}
∝ exp{−(1/2)[a_0'((N/c) I_T + G_a^{-1}) a_0 − 2a_0' Σ_{i=1}^N a_i/c]},

the full conditional distribution of a_0 is N(ā, G_an), where G_an = ((N/c) I_T + G_a^{-1})^{-1} and ā = G_an Σ_{i=1}^N a_i/c.

(iv) Let B_0 = 1 ⊗ b_0. Since

π(B | all others) ∝ etr{−(1/2)[σ^{-2}(Z − A'B)(Z − A'B)' + (B − B_0 − ΘX)' Λ^{-1} (B − B_0 − ΘX)]}
∝ etr{−(1/2)[B'(σ^{-2} AA' + Λ^{-1})B − 2B'(σ^{-2} AZ + Λ^{-1}(B_0 + ΘX))]},

the full conditional distribution of B is MN(B̄, I_J, B_l), where B_l = (σ^{-2} AA' + Λ^{-1})^{-1} and B̄ = B_l(σ^{-2} AZ + Λ^{-1}(B_0 + ΘX)).

(v) Since

π(b_0 | all others) ∝ exp{−(1/2)[Σ_{j=1}^J (b_j − b_0 − ΘX_j)' Λ^{-1} (b_j − b_0 − ΘX_j) + b_0' G_b^{-1} b_0]}
∝ exp{−(1/2)[b_0'(J Λ^{-1} + G_b^{-1}) b_0 − 2b_0' Λ^{-1} Σ_{j=1}^J (b_j − ΘX_j)]},

the full conditional distribution of b_0 is N(b̄, G_bn), where G_bn = (J Λ^{-1} + G_b^{-1})^{-1} and b̄ = G_bn Λ^{-1} Σ_{j=1}^J (b_j − ΘX_j).

(vi) Since

π(Λ^{-1} | all others) ∝ |Λ^{-1}|^{J/2} etr{−(1/2)(B − B_0 − Θ_(γ) X_(γ))' Λ^{-1} (B − B_0 − Θ_(γ) X_(γ))}
× |Λ^{-1}|^{(Σ_k γ_k)/2} etr{−(1/2) Θ_(γ) H_(γ)^{-1} Θ_(γ)' Λ^{-1}} × |Λ^{-1}|^{(ν − T − 1)/2} etr{−(V^{-1}/2) Λ^{-1}}
∝ |Λ^{-1}|^{(J + Σ_k γ_k + ν − T − 1)/2} etr{−(1/2)[(B − B_0 − Θ_(γ) X_(γ))(B − B_0 − Θ_(γ) X_(γ))' + V^{-1} I_T + Θ_(γ) H_(γ)^{-1} Θ_(γ)'] Λ^{-1}},

the full conditional distribution of Λ^{-1} is W(J + Σ_k γ_k + ν, V_n), where V_n = [(B − B_0 − Θ_(γ) X_(γ))(B − B_0 − Θ_(γ) X_(γ))' + V^{-1} I_T + Θ_(γ) H_(γ)^{-1} Θ_(γ)']^{-1}.

(vii) Since

π(w | all others) ∝ w^{p−1}(1 − w)^{q−1} Π_{k=1}^K w^{γ_k}(1 − w)^{1−γ_k} = w^{p + Σ_{k=1}^K γ_k − 1} (1 − w)^{q + K − Σ_{k=1}^K γ_k − 1},

the full conditional distribution of w is Beta(p + Σ_{k=1}^K γ_k, q + K − Σ_{k=1}^K γ_k).

(viii) Since

π(Θ_(γ) | all others) ∝ etr{−(1/2) Λ^{-1}[(B − B_0 − Θ_(γ) X_(γ))(B − B_0 − Θ_(γ) X_(γ))' + Θ_(γ) H_(γ)^{-1} Θ_(γ)']}
∝ etr{−(1/2) Λ^{-1}[Θ_(γ)(X_(γ) X_(γ)' + H_(γ)^{-1}) Θ_(γ)' − 2(B − B_0) X_(γ)' Θ_(γ)']},

the full conditional distribution of Θ_(γ) is MN(Θ̃_(γ), K_(γ)^{-1}, Λ), where K_(γ) = X_(γ) X_(γ)' + H_(γ)^{-1} and Θ̃_(γ) = (B − B_0) X_(γ)' K_(γ)^{-1}.

(ix) Since

π(Θ_(γ), Λ^{-1}, γ | all others) ∝ |Λ^{-1}|^{J/2} etr{−(1/2)(B − B_0 − Θ_(γ) X_(γ))' Λ^{-1} (B − B_0 − Θ_(γ) X_(γ))}
× |Λ^{-1}|^{(Σ_k γ_k)/2} |H_(γ)|^{-T/2} etr{−(1/2) Θ_(γ) H_(γ)^{-1} Θ_(γ)' Λ^{-1}} Π_{k=1}^K [w^{γ_k}(1 − w)^{1−γ_k}]
× |Λ^{-1}|^{(ν − T − 1)/2} etr{−(V^{-1}/2) Λ^{-1}}
∝ |H_(γ)|^{-T/2} |Λ^{-1}|^{(J + Σ_k γ_k + ν − T − 1)/2}
× etr{−(1/2)[(Θ_(γ) − Θ̃_(γ)) K_(γ) (Θ_(γ) − Θ̃_(γ))' − Θ̃_(γ) K_(γ) Θ̃_(γ)'] Λ^{-1}}
× etr{−(1/2)[(B − B_0)(B − B_0)' + V^{-1} I_T] Λ^{-1}} Π_{k=1}^K [w^{γ_k}(1 − w)^{1−γ_k}],

we first integrate out Θ_(γ) to get the distribution π(Λ^{-1}, γ | all others except Θ_(γ)), which is proportional to

|H_(γ) K_(γ)|^{-T/2} Π_k w^{γ_k}(1 − w)^{1−γ_k} |Λ^{-1}|^{(J + ν − T − 1)/2} etr{−(1/2)[(B − B_0)(B − B_0)' + V^{-1} I_T − Θ̃_(γ) K_(γ) Θ̃_(γ)'] Λ^{-1}}.

Then, the required result in (23) is obtained by integrating out Λ^{-1} from this last expression.

Appendix 2: Proof of Theorem 1

For any generated A, by applying the QR decomposition method, one can obtain a unique orthogonal matrix Ω such that ΩA satisfies the identification constraint given in (9). Other identified parameters are then obtained by multiplying those parameters by Ω (e.g., ΩB). Now V_(γ) in (23) can be re-written as

V_(γ) = V^{-1} I_T + (B − 1 ⊗ b_0)(I_J − X_(γ)' K_(γ)^{-1} X_(γ))(B − 1 ⊗ b_0)'
      = V^{-1} Ω'Ω + Ω'(ΩB − Ω[1 ⊗ b_0])(I_J − X_(γ)' K_(γ)^{-1} X_(γ))(ΩB − Ω[1 ⊗ b_0])' Ω
      = Ω'[V^{-1} I_T + (ΩB − 1 ⊗ Ωb_0)(I_J − X_(γ)' K_(γ)^{-1} X_(γ))(ΩB − 1 ⊗ Ωb_0)'] Ω,

so

|V_(γ)| = |Ω'| |V^{-1} I_T + (ΩB − 1 ⊗ Ωb_0)(I_J − X_(γ)' K_(γ)^{-1} X_(γ))(ΩB − 1 ⊗ Ωb_0)'| |Ω|
        = |V^{-1} I_T + (ΩB − 1 ⊗ Ωb_0)(I_J − X_(γ)' K_(γ)^{-1} X_(γ))(ΩB − 1 ⊗ Ωb_0)'|.

Since |V_(γ)|, as well as each remaining term in (23), is unchanged when the unidentified parameters (e.g., B) are replaced by the identified parameters (e.g., ΩB), the posterior distribution of γ is unchanged when the substitution is made. Thus, the variable selection results are not affected by the proposed post-processing procedure.
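As a quick numerical sanity check of this invariance (our illustration, not part of the original paper), one can verify that |V_(γ)| is unchanged when B and b_0 are rotated by an arbitrary orthogonal Ω:

```python
import numpy as np

rng = np.random.default_rng(2)
T, J, l, V = 3, 14, 5, 1.0
B = rng.normal(size=(T, J))
b0 = rng.normal(size=T)
Xg = rng.normal(size=(l, J))                      # X_(gamma): rows of the active covariates
Hg_inv = np.diag(1.0 / rng.uniform(1, 10, l))     # H_(gamma)^{-1}
Kg = Xg @ Xg.T + Hg_inv                           # K_(gamma)

def det_V_gamma(B, b0):
    R = B - b0[:, None]                           # B - 1 (x) b0
    M = np.eye(J) - Xg.T @ np.linalg.solve(Kg, Xg)
    return np.linalg.det(np.eye(T) / V + R @ M @ R.T)

Omega = np.linalg.qr(rng.normal(size=(T, T)))[0]  # a random orthogonal matrix
print(np.isclose(det_V_gamma(B, b0), det_V_gamma(Omega @ B, Omega @ b0)))  # True
```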

References

Addelman, S. (1961). Irregular fractions of the 2^n factorial experiments. Technometrics, 3, 479-496.
Barbieri, M. M., & Berger, J. (2004). Optimal predictive model selection. Annals of Statistics, 32, 870-897.
Benzecri, J. P. (1992). Correspondence analysis handbook. New York: Marcel Dekker.
Bolton, G. E., Fong, D. K. H., & Mosquin, P. (2003). Bayes factors with an application to experimental economics. Experimental Economics, 6, 311-325.
Borg, I., & Groenen, P. J. F. (2005). Modern multidimensional scaling: Theory and applications (2nd ed.). New York: Springer.
Brown, P. J., Vannucci, M., & Fearn, T. (1998). Multivariate Bayesian variable selection and prediction. Journal of the Royal Statistical Society, Series B, 60, 627-641.
Buja, A., & Eyuboglu, N. (1992). Remarks on parallel analysis. Multivariate Behavioral Research, 27(4), 509-540.
Carroll, J. D. (1980). Models and methods for multidimensional analysis of preferential choice (or other dominance) data. In E. D. Lantermann & H. Feger (Eds.), Similarity and choice (pp. 234-289). Vienna: Hans Huber Publishers.
Carroll, J. D., & Arabie, P. (1980). Multidimensional scaling. Annual Review of Psychology, 31, 607-649.
Carroll, J. D., Pruzansky, S., & Kruskal, J. B. (1980). CANDELINC: A general approach to multidimensional analysis of many-way arrays with linear constraints on parameters. Psychometrika, 45(1), 3-24.
Cox, T. F., & Cox, M. A. A. (2001). Multidimensional scaling (2nd ed.). Boca Raton, FL: Chapman and Hall/CRC.
Dawid, A. (1981). Some matrix-variate distribution theory: Notational considerations and a Bayesian application. Biometrika, 68(1), 265-274.
DeSarbo, W. S. (1982). GENNCLUS: New models for general nonhierarchical clustering analysis. Psychometrika, 47(4), 449-475.
DeSarbo, W. S., & Carroll, J. D. (1985). Three-way metric unfolding via alternating weighted least squares. Psychometrika, 50(3), 275-300.
DeSarbo, W. S., & Cho, J. (1989). A stochastic multidimensional scaling vector threshold model for the spatial representation of pick any/N data. Psychometrika, 54(1), 105-121.
DeSarbo, W. S., Fong, D. K. H., Liechty, J., & Saxton, K. (2004). A hierarchical Bayesian procedure for two-mode cluster analysis. Psychometrika, 69, 547-572.
DeSarbo, W. S., Grewal, R., & Scott, C. J. (2008). A clusterwise bilinear multidimensional scaling methodology for simultaneous segmentation and positioning analysis. Journal of Marketing Research, 45(2), 280-292.
DeSarbo, W. S., Howard, D. J., & Jedidi, K. (1991). MULTICLUS: A new method for simultaneously performing multidimensional scaling and cluster analysis. Psychometrika, 56(1), 121-136.
DeSarbo, W. S., & Jedidi, K. (1995). The spatial representation of heterogeneous consideration sets. Marketing Science, 14(3), 326-342.
DeSarbo, W. S., & Kim, S. (2013). A review of the major multidimensional scaling models for the analysis of preference/dominance data in marketing. In L. Moutinho & K.-H. Huarng (Eds.), Quantitative modeling in marketing and management (pp. 3-27). London: World Scientific Press.
DeSarbo, W. S., Kim, Y., & Fong, D. K. H. (1999). A Bayesian multidimensional scaling procedure for the spatial analysis of revealed choice data. Journal of Econometrics, 89, 79-108.
DeSarbo, W. S., Oliver, R. L., & DeSoete, G. (1986). A probabilistic multidimensional scaling vector model. Applied Psychological Measurement, 10(1), 79-98.
DeSarbo, W. S., Park, J., & Rao, V. (2011). Deriving joint space positioning maps from consumer preference ratings. Marketing Letters, 22(1), 1-14.
DeSarbo, W. S., & Rao, V. R. (1986). A constrained unfolding methodology for product positioning. Marketing Science, 5, 1-19.
Fong, D. K. H. (2010). Bayesian multidimensional scaling and its applications in marketing research. In M.-H. Chen, D. K. Dey, P. Mueller, D. Sun, & K. Ye (Eds.), Frontiers of statistical decision making and Bayesian analysis (pp. 410-417). Berlin: Springer.
Fong, D. K. H., DeSarbo, W. S., Park, J., & Scott, C. J. (2010). A Bayesian vector multidimensional scaling procedure for the analysis of ordered preference data. Journal of the American Statistical Association, 105(490), 482-492.
Fong, D. K. H., Ebbes, P., & DeSarbo, W. S. (2012). A heterogeneous Bayesian regression model for cross sectional data involving a single observation per response unit. Psychometrika, 77(2), 293-314.
George, E. I., & McCulloch, R. E. (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association, 88, 881-889.
George, E. I., & McCulloch, R. E. (1997). Approaches for Bayesian variable selection. Statistica Sinica, 7, 339-373.
Gifi, A. (1990). Nonlinear multivariate analysis. Chichester, England: Wiley.
Golub, G. H., & Van Loan, C. F. (1996). Matrix computations (3rd ed.). Baltimore, MD: Johns Hopkins University Press.
Gormley, I. C., & Murphy, T. B. (2006). A latent space model for rank data. In Statistical network analysis: Models, issues and new directions. Lecture notes in computer science. New York: Springer. Available as technical report at http://www.tcd.ie/Statistics/postgraduate/0602.pdf.
Gupta, A. K., & Nagar, D. K. (2000). Matrix variate distributions. Monographs and surveys in pure and applied mathematics (Vol. 104). London: Chapman & Hall/CRC.
Gustafson, P. (2005). On model expansion, model contraction, identifiability and prior information: Two illustrative scenarios involving mismeasured variables. Statistical Science, 20, 111-140.
Harshman, R. A., & Lundy, M. E. (1984). Data preprocessing and the extended PARAFAC model. In H. G. Law, C. W. Snyder Jr., J. Hattie, & R. P. McDonald (Eds.), Research methods for multimode data analysis (pp. 216-284). New York: Praeger.
Jedidi, K., & DeSarbo, W. S. (1991). A stochastic multidimensional scaling procedure for the spatial representation of three-mode, three-way pick any/J data. Psychometrika, 56(3), 471-494.
Jolliffe, I. T., Trendafilov, N. T., & Uddin, M. (2003). A modified principal component technique based on the LASSO. Journal of Computational and Graphical Statistics, 12(3), 531-547.
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 773-795.
Kruskal, J. B. (1964). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29, 1-27.
Lee, M. D. (2008). Three case studies in the Bayesian analysis of cognitive models. Psychonomic Bulletin & Review, 15, 1-15.
Oh, M.-S., & Raftery, A. E. (2001). Bayesian multidimensional scaling and choice of dimension. Journal of the American Statistical Association, 96, 1031-1044.
O'Hara, R. B., & Sillanpaa, M. J. (2009). A review of Bayesian variable selection methods: What, how and which. Bayesian Analysis, 4(1), 85-118.
Park, J., DeSarbo, W. S., & Liechty, J. (2008). A hierarchical Bayesian multidimensional scaling methodology for accommodating both structural and preference heterogeneity. Psychometrika, 73(3), 451-472.
Raftery, A. E., Newton, M. A., Satagopan, J. M., & Krivitsky, P. N. (2007). Estimating the integrated likelihood via posterior simulation using the harmonic mean identity. In J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith, & M. West (Eds.), Bayesian statistics 8 (pp. 1-45). Oxford: Oxford University Press.
Rossi, P. E., McCulloch, R. E., & Allenby, G. M. (1996). The value of purchase history data in target marketing. Marketing Science, 15(4), 321-340.
Schonemann, P. H. (1970). On metric multidimensional unfolding. Psychometrika, 35(3), 349-366.
Scott, C. J., & DeSarbo, W. S. (2011). A new constrained stochastic multidimensional scaling vector model: An application to the perceived importance of leadership attributes. Journal of Modeling in Management, 6(1), 7-32.
Shepard, R. N. (1962). The analysis of proximities: Multidimensional scaling with an unknown distance function, Parts I and II. Psychometrika, 27, 125-140, 219-246.
Shepard, R. N. (1980). Multidimensional scaling, tree-fitting, and clustering. Science, 210, 390-398.
Shin, J. S., Fong, D. K. H., & Kim, K. J. (1998). Complexity reduction of a house of quality chart using correspondence analysis. Quality Management Journal, 5, 46-58.
Slater, P. (1960). The analysis of personal preferences. The British Journal of Statistical Psychology, 13, 119-135.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & van der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society, Series B, 64, 583-639.
Takane, Y. (2013). Constrained principal component analysis. New York, NY: Chapman & Hall.
Takane, Y., Young, F., & de Leeuw, J. (1977). Nonmetric individual differences multidimensional scaling: An alternating least squares method with optimal scaling features. Psychometrika, 42(1), 7-67.
Ter Braak, C. J. F. (1986). Canonical correspondence analysis: A new eigenvector technique for multivariate direct gradient analysis. Ecology, 67(5), 1167-1179.
Tucker, L. R. (1960). Intra-individual and inter-individual multidimensionality. In H. Gullikson & S. Messick (Eds.), Psychological scaling: Theory and applications. New York, NY: Holt, Rinehart, & Winston.

Manuscript Received: 18 FEB 2013