Substance Use & Misuse, 49:1546–1554, 2014
Copyright © 2014 Informa Healthcare USA, Inc.
ISSN: 1082-6084 print / 1532-2491 online
DOI: 10.3109/10826084.2014.913388

ORIGINAL ARTICLE

Analyzing the Effect of Selected Control Policy Measures and Socio-Demographic Factors on Alcoholic Beverage Consumption in Europe within the AMPHORA Project: Statistical Methods

Michela Baccini¹ and Giulia Carreras²∗

¹Department of Statistics, University of Florence, Florence, Italy; ²Cancer Prevention and Research Institute (ISPO), Florence, Italy

∗Address correspondence to Giulia Carreras, Cancer Prevention and Research Institute (ISPO), via delle Oblate 2, 50139, Florence, Italy; E-mail: [email protected].

This paper describes the methods used to investigate variations in total alcoholic beverage consumption as related to selected control intervention policies and other socioeconomic factors (unplanned factors) within 12 European countries involved in the AMPHORA project. The analysis presented several critical points: presence of missing values, strong correlation among the unplanned factors, and long-term waves or trends in both the time series of alcohol consumption and the time series of the main explanatory variables. These difficulties were addressed by implementing a multiple imputation procedure for filling in missing values, then specifying for each country a multiple regression model which accounted for time trend, policy measures and a limited set of unplanned factors, selected in advance on the basis of sociological and statistical considerations. This approach allowed estimating the “net” effect of the selected control policies on alcohol consumption, but not the association between each unplanned factor and the outcome.

Keywords multiple imputation, multiple regression, time series, alcoholic beverage consumption

INTRODUCTION

One of the aims of the AMPHORA project¹ was to investigate variations in alcoholic beverage consumption in Europe as related to selected intervention control policies and other socioeconomic factors, called unplanned indicators, such as a growing economy, increasing gender equity or population mobility (e.g., movement from rural to urban areas). We were interested in results at both the country and European level, in order to compare the estimated associations and possibly detect similarities among countries.

The analysis aimed to estimate the “net” effect on total alcoholic beverage consumption of intervention control policies and of measured unplanned factors which vary over time and reflect socioeconomic and demographic dynamics of the population. This presented several critical points. The first concerned the strong correlation among the unplanned factors, which made it impossible to distinguish the effect of each of them. Moreover, due to the time series nature of the data, it was also difficult to disentangle the overall effect of the measured unplanned variables from the temporal behavior of alcoholic beverage consumption related to other potentially relevant, but unobserved, factors. The second challenging point was the need to investigate the effect of policy measures of similar or different nature and efficacy, implemented within the same country during the study period, the effects of which were probably overlapping. The third critical point was the high percentage of missing values for several variables. Limiting the statistical analysis to the subset of units (i.e., years) for which information was complete was not feasible. In fact, the number of units with a missing value for at least one variable was so large for some countries that the regression models could not be estimated. In addition, such an approach would destroy the time series structure of the data, lead to biased results unless the missing data mechanism is completely at random, and underestimate the uncertainty around the final estimates.

We addressed these difficulties by implementing a multiple imputation (MI) procedure for filling in missing values (Little & Rubin, 2002), and then specifying a multiple regression model for each country, which accounted for both policy measures and a limited set of unplanned factors, chosen in advance on the basis of sociological and statistical considerations. The regression model follows that developed by Nelson in the context of evaluating the relationship between unemployment and alcoholic beverage consumption, and estimating the effect of advertising bans for an international panel of countries in the Organization for Economic Cooperation and Development (Nelson, 2010; Nelson & Young, 2001). In a first stage of the analysis, the policy measures were introduced in the model one at a time and a country-specific choice of the most relevant unplanned variable was made (country level analysis). In a second stage, all of the policies were included in the model at the same time, and the number of parameters was reduced through simple assumptions on the effects. Moreover, we used the same unplanned variables for all countries, in order to facilitate comparison. The results from these second stage analyses were finally combined and compared in a random effects meta-analysis, and the heterogeneity among countries was evaluated.

This paper details the statistical approach adopted for the country-specific analyses; as an example, we report a few results to clarify the methods. The complete results are presented in the paper by Allamani, Pepe, Baccini, Massini, and Voller (2014) and in the paper by Baccini & Carreras (2014), where a description of the meta-analysis model adopted to combine the country-specific findings is also provided. We first illustrate the MI procedure used for filling in missing data. The regression models specified to estimate the associations of interest are then described. We finally discuss the strengths and drawbacks of the statistical methods employed for the analysis, and provide recommendations to correctly interpret the statistical results and avoid inappropriate conclusions.

¹ The AMPHORA project involved 12 European countries: Austria, Finland, France, Hungary, Italy, Netherlands, Norway, Poland, Spain, Sweden, Switzerland, United Kingdom.

MULTIPLE IMPUTATION

Due to the incompleteness of the time series data, an MI of missing values was performed. This is not the first time that such an approach has been applied to aggregate time series of alcoholic beverage consumption; for an example, see Grittner, Gmel, Ripatti, Bloomfield, & Wicki (2011). MI is a useful strategy for dealing with incomplete data sets by filling in the missing entries with a set of plausible values. MI involves specifying a posterior distribution for all incomplete variables, which accounts for correlation among the variables themselves. By randomly drawing values from this distribution to fill in missing values, one simulates the uncertainty due to the lacking information. The final result consists of n different imputed data sets (for example, 5), on which separate statistical analyses can be carried out, obtaining n different results. The heterogeneity among these n results expresses the uncertainty due to missingness. Several procedures for multiply imputing data have been proposed. In the present work, Multivariate Imputation by Chained Equations (MICE) was carried out (Raghunathan, Lepkowski, van Hoewyk, & Solenberger, 2001). A separate MI procedure was applied for each country. All the information concerning alcoholic beverage consumption, unplanned factors and policy measures was included in the MI procedure. Both data collected by the AMPHORA study partners and data from international sources were considered. In order to avoid computational problems, variables with more than 40 missing values (out of a series of 50 values) were removed from the dataset. The limit of 40 was chosen to retain variables with values every 5 years (i.e., WB variables).

Implementing MICE required a preliminary step consisting of the selection of a matrix of predictors for each variable with missing values. For this purpose, the missing data were first filled in by randomly sampling from the empirical marginal distributions of the variables with missing values; we then linearly regressed each of them on each of the other variables included in the dataset and calculated the Akaike information criterion (AIC) (Akaike, 1974). For each incomplete variable, we selected as predictors the four variables associated with the minimum AIC values. This set of predictors was enriched by including terms capturing the time pattern ($t$, $t^2$, and $t^3$). When two sources of data were available for the same variable (from study partners and from international sources), the information arising from the alternative source was forced to be one of the predictors. Once the predictor matrices were defined, the MICE package implemented in R was used to produce five different imputed datasets. In imputing data, a square root transformation of the positive continuous variables was eventually adopted in order to avoid imputation of negative values (van Buuren & Groothuis-Oudshoorn, 2011; R Core Team, 2013). MI was performed on a restricted set of variables for France and Spain, because of collinearity problems that hampered the MICE procedure.
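The predictor-selection step described above can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the study; the function names (`fill_from_marginal`, `aic_simple_regression`, `select_predictors`) and the synthetic data are assumptions introduced here, and the Gaussian AIC formula is one common convention.

```python
import numpy as np

def fill_from_marginal(x, rng):
    """Provisionally replace NaNs with random draws from the observed values."""
    x = np.array(x, dtype=float)
    missing = np.isnan(x)
    x[missing] = rng.choice(x[~missing], size=missing.sum())
    return x

def aic_simple_regression(y, x):
    """Gaussian AIC of the linear regression y ~ 1 + x."""
    X = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n = len(y)
    sigma2 = resid @ resid / n
    # three estimated parameters: intercept, slope, error variance
    return n * (np.log(2 * np.pi * sigma2) + 1) + 2 * 3

def select_predictors(data, target, n_keep=4):
    """Return the n_keep variables whose single-predictor model has the lowest AIC."""
    scores = {name: aic_simple_regression(data[target], x)
              for name, x in data.items() if name != target}
    return sorted(scores, key=scores.get)[:n_keep]

# Synthetic illustration (hypothetical series, not the AMPHORA data)
rng = np.random.default_rng(1)
n = 50
a = rng.normal(size=n)
raw = 2 * a + rng.normal(scale=0.1, size=n)   # incomplete variable, related to "a"
raw[[3, 10, 20, 30, 40]] = np.nan
filled = fill_from_marginal(raw, rng)

data = {"target": filled, "a": a}
for name in "bcdef":                           # unrelated candidate predictors
    data[name] = rng.normal(size=n)
selected = select_predictors(data, "target")
print(selected)
```

In the procedure described in the text, the selected set would then be extended with the time terms and, where available, the alternative data source before being passed to the imputation routine.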
As an example, Figure 1 shows the observed values of the variable “female educational level” for Italy (dots) together with the five imputed time series obtained from MICE. All of the planned statistical analyses were separately performed on each of the five imputed datasets, and the five results were combined according to Little & Rubin (2002) (see Appendix).

REGRESSION MODEL

The analyses focused on per capita total alcoholic beverage consumption. Consumption by type of beverage (wine, beer, spirits) was not analyzed. The annual time series of total per capita alcoholic beverage consumption ranged approximately from 1961 to 2008, for a maximum of 48 observations for each country. On the basis of theoretical considerations about the phenomenon and practical considerations about data coverage (the idea being that variables with a very large amount of missing values are less reliable than variables with a small amount of missing values, even after MI), we selected as potential predictors of alcoholic beverage consumption the following unplanned variables: income, alcoholic beverage prices, percentage of males over 65 (as an indicator of the demographic structure of the population), female educational level, female employment, mother’s mean age at all childbirths, and urban level (for a detailed description of these variables see Voller, Maccari, Pepe, & Allamani, 2014). For these variables, when two sources of data were available, the source with the largest number of complete original observations was usually preferred, as shown in Table 1. Due to the large number of control policies implemented during the study period, a preliminary selection of the most relevant measures was made for each country, up to a maximum of six (for more detail see the country reports in the same issue). This pre-selection may have excluded some potentially relevant policy actions, but focusing on a few specific events allowed us to better identify the effect of each control policy by limiting overlapping interventions, whose individual contributions could not otherwise be detected.

FIGURE 1. Female educational level by year for Italy (dots) and the five corresponding imputed time series obtained from MICE (lines).

Country Level Analysis

The analysis at the country level consisted of three steps:

1. specification of a core model for total alcoholic beverage consumption;
2. selection of the “best” model according to the minimum AIC criterion;
3. estimation of the marginal association of each policy measure with total alcoholic beverage consumption.

Core Model Definition

For each country, a core model was specified for per capita total alcoholic beverage consumption on a log scale (Ledermann, 1956). Given the importance of economic factors in determining changes in alcoholic beverage consumption (see, e.g., Nelson, 2010), we included in the model income and the prices of the two alcoholic beverages which were, on average, most often drunk over the study period (see Allamani et al., 2014 for more detail). We took into account the demographic age structure of the population through the percentage of men over 65 years of age (Nelson, 2010; Nelson & Young, 2001). All these variables were included in the linear predictor after a logarithmic transformation. Finally, a time trend was added to capture the long-term behavior of consumption that could be related to unobservable factors. Error terms were assumed to be independent and normally distributed.

TABLE 1. Number of missing values for the unplanned variables selected for the analysis and for which data from different sources were available. In bold the source used in the analysis

Country         Female education    Age at childbirth   Female employment   Urban level         Income
                Partners  WB        Partners  ES        Partners  OECD      Partners  WB        Partners  WB
Austria         45        40        25        0         46        9         44        0         44        0
Finland         19        40        1         0         10        4         31        0         27        0
France          24        40        14        38        11        8         44        0         33        0
Hungary         45        40        39        0         28        32        45        0         8         10
Italy           22        40        50        1         0         0         18        0         18        0
Netherlands     39        40        1         0         11        38        1         0         29        0
Norway          20        40        2         1         12        12        27        0         32        0
Poland          4         40        25        30        11        33        1         0         16        10
Spain           46        40        16        11        46        12        45        0         39        0
Sweden          26        40        11        8         11        10        43        0         19        0
Switzerland     46        40        12        0         31        0         32        0         41        0
United Kingdom  46        40        5         13        12        12        50        0         18        0

Unplanned Variables Selection

The other unplanned variables selected a priori for the analysis (female educational level, female employment, mother’s mean age at all childbirths, and urban level) were inserted in the core model one at a time after a log transformation (Nelson, 2010; Nelson & Young, 2001). Since the effect of these variables was expected to be delayed in time, their lagged association with the outcome was considered by introducing in the model the mean of the unplanned variable in the current and in the previous two years:

$$Z_{t(0-2)} = \frac{Z_t + Z_{t-1} + Z_{t-2}}{3},$$

where $Z_t$ generically indicates the unplanned variable in year $t$. The resulting model was the following:

$$\log C_t = \beta_0 + \beta_1 t + \beta_2 \log I_t + \beta_3 \log P_t + \beta_4 \log P_t^C + \beta_5 \log A_t + \beta_6 \log Z_{t(0-2)} + \varepsilon_t,$$

where
$C_t$: total per capita alcoholic beverage consumption at time $t$
$I_t$: income at time $t$
$P_t$: price of the main alcoholic beverage at time $t$
$P_t^C$: price of its competitor at time $t$
$A_t$: percentage of men over 65 years of age at time $t$
$Z_{t(0-2)}$: mean from lag 0 to lag 2 of the unplanned variable $Z$
$\varepsilon_t$: error term at time $t$.

The AIC was calculated for each model and the model with minimum AIC was selected for the subsequent analyses. The idea was that, by introducing only the “most explanatory” unplanned variable in the model, we could capture the short/medium-term influence on total alcoholic beverage consumption of a mix of social factors, while retaining model parsimony.

Model for the Policies Effect

After selecting the best unplanned predictor on the basis of the minimum AIC, the effect of each policy was investigated by including in the selected model a dummy variable $\mathrm{Pol}_t$, equal to 0 for $t < t^*$ and 1 otherwise, where $t^*$ is the year when the policy was introduced:

$$\log C_t = \beta_0 + \beta_1 t + \beta_2 \log I_t + \beta_3 \log P_t + \beta_4 \log P_t^C + \beta_5 \log A_t + \beta_6 \log Z_{t(0-2)} + \gamma\, \mathrm{Pol}_t + \varepsilon_t.$$

In this way, we estimated the association between the policy measure and alcoholic beverage consumption, given the long-term trend and social, demographic, and economic factors. At this stage of the analysis, we did not account for possible cumulative or synergistic effects of succeeding and previous interventions, so that the coefficient for the policy indicator $\mathrm{Pol}_t$ estimated the discrepancy between the average levels of alcoholic beverage consumption before and after $t^*$.

We did not include tax policies in the subset of interventions to be investigated, despite current research which considers the introduction of taxes on alcoholic beverages to be one of the most effective and cost-effective policy options (Elder et al., 2010). This choice was motivated by the fact that, since alcoholic beverage prices include taxes, introducing price variables in the model should partly capture the effect of taxes on consumption. This is in agreement with recent reviews in this field, where a clear distinction between the effects of price and tax levels is not made (see, e.g., Elder et al., 2010; Nelson, 2013; Sornpaisarn, Shield, Cohen, Schwartz, & Rehm, 2013; Wagenaar, Salois, & Komro, 2009; Xu & Chaloupka, 2011).

As an example, in Figure 2 the time series of total per capita alcoholic beverage consumption in Italy (upper graph) is plotted together with the estimated associations for four policy measures (lower graph). Each policy measure is identified by the year in which it was introduced:

• 1988: introduction of the limit of 0.8 grams per liter for blood-alcohol concentration (BAC) as the threshold above which driving is forbidden
• 1991: no alcoholic beverage sales during concerts, sporting events, or other events
• 1998: introduction of a limit for the quantity of alcoholic beverages sold on highways
• 2001: general alcoholic beverage control policy laws (restrictive measures on alcoholic beverage advertising; BAC level threshold at 0.5 grams per liter).

These findings indicate that the average alcoholic beverage consumption levels before and after the introduction of each policy are not significantly different. The results are adjusted for the selected socioeconomic factors, but the policy measures were considered one at a time, so that it was not possible to disentangle the effect of each policy from the effects of the others, in particular for actions which were introduced close in time (1988 and 1991; 1998 and 2001). In this specific context (but this is also the case for other countries), the absence of a significant association between policies and total alcoholic beverage consumption, together with the evidence of a strong decreasing trend of the outcome during the study period, suggests that variations in consumption could be mainly related to gradual cultural changes.

European Level Analysis

FIGURE 2. Time series of total alcoholic beverage consumption (liters per capita) in Italy; the vertical lines indicate the occurrence of a policy (upper graph). Estimates and 90% confidence intervals for the association between policies and alcoholic beverage consumption in Italy (country level analysis); the horizontal line indicates no association (lower graph).

The country-level analysis provided useful information on the association between policy measures and alcoholic beverage consumption. We used this information to develop a model which allowed comparison of the country-specific results. This model differed from the previous one in two main respects. First, in order to avoid possible heterogeneity deriving from model specification, we did not select the unplanned variable $Z$ according to the minimum AIC, but updated the core model for all countries by including in the linear predictor the mean of urbanization from lag 0 to lag 2:

$$U_{t(0-2)} = \frac{U_t + U_{t-1} + U_{t-2}}{3},$$

where $U_t$ is the urbanization value at time $t$. This choice derived from the fact that urbanization appeared to be one of the unplanned variables most related to alcoholic beverage consumption on the basis of the AIC in all countries (Table 2).

Second, all of the alcohol policies were simultaneously introduced in the model. This required making a number of assumptions about the policies’ effects, both to reduce the number of parameters and to allow comparability among countries. We defined five subgroups of alcoholic beverage control policies:

• Restrictive advertising policies: introduction of limitations in alcoholic beverage advertising;
• Restrictive availability policies: restrictive policies influencing licensing rules and trading hours;
• Permissive availability policies: permissive policies influencing licensing rules and trading hours;
• Changes of the minimum age to buy alcoholic beverages; and
• Changes of the BAC limit.

Then, one regression term for each kind of policy was defined. The effects of restrictive advertising policies and of restrictive and permissive availability policies were modeled by step variables. In particular, for restrictive (advertising and availability) policies a variable was defined which was equal to 0 up to the first intervention, 1 from the first to the second, 2 from the second to the third, and so on. For permissive policies, analogous but decreasing step functions were specified. In this way, we assumed that policies belonging to the same class had the same effect and that this effect was immediate and constant over time, with the coefficient of the step function expressing the average effect of the policies. The effect of BAC limits was investigated by including in the model a continuous variable corresponding to the BAC limit, setting the BAC limit to 1% by volume before the introduction of any policy. Under this parametrization, a positive regression coefficient for the BAC variable indicates that lowering the limit is associated with a reduction in alcoholic beverage consumption.
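The coding of the policy terms described above can be sketched as follows. This is a minimal Python illustration, not the authors' code: the helper names (`step_variable`, `level_variable`) are hypothetical, the intervention years are those reported in the text for Italy, and the pre-policy BAC value of 1.0 is taken from the text without reconciling its unit with the limits in grams per liter.

```python
import numpy as np

def step_variable(years, intervention_years, direction=1):
    """Number of interventions in force in each year (0, 1, 2, ...).

    direction=+1 gives the increasing step used for restrictive policies,
    direction=-1 the decreasing step used for permissive policies.
    """
    years = np.asarray(years)
    steps = np.zeros(len(years), dtype=int)
    for y in intervention_years:
        steps += (years >= y).astype(int)   # policy counts from its year t* onward
    return direction * steps

def level_variable(years, changes, baseline):
    """Piecewise-constant legal limit: baseline until the first change,
    then the most recently introduced limit."""
    years = np.asarray(years)
    level = np.full(len(years), float(baseline))
    for y in sorted(changes):
        level[years >= y] = changes[y]
    return level

years = np.arange(1961, 2009)

# Italy, as reported in the text
avail_restrictive = step_variable(years, [1991, 1998])
adv_restrictive = step_variable(years, [2001])
bac = level_variable(years, {1988: 0.8, 2001: 0.5}, baseline=1.0)  # baseline: assumption from the text

# A hypothetical permissive policy in 1975, shown only to illustrate the sign convention
avail_permissive = step_variable(years, [1975], direction=-1)
```

The same `level_variable` helper could in principle code any other legal limit that changes over time, given an assumed baseline.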

TABLE 2. Values of AIC associated with regression models where a different unplanned variable was added to the core model, by country (n.a.: value not reported)

Country         Female education    Female employment   Urbanization        Age at childbirth
Italy           −148.85             −119.99             −181.35             −163.06
Austria         n.a.                −126.13             −149.08             −125.86
Finland         −119.99             −102.96             −112.89             −96.95
France          −170.86             −207.29             −167.99             −179.82
Hungary         −120.52             −123.38             −135.16             −139.58
Netherlands     −156.98             −146.13             −160.97             −163.03
Norway          −166.71             −159.83             −170.83             −185.37
Poland          −82.9               −62.45              −62.3               −70.54
Spain           −118.97             −115.43             −123.47             −117.36
Sweden          −129.36             −133.18             −141.8              −126.84
Switzerland     −184.75             −184.77             −203.74             −192.98
United Kingdom  −147.55             −147.97             −148.37             −149.16

The effect of the minimum age limit was investigated by including in the model a continuous variable corresponding to the minimum age, setting the minimum age to 21 in the absence of any policy (Nelson, 2010). A negative regression coefficient for the minimum age variable indicates that an increase of the minimum age needed to buy alcoholic beverages is associated with a reduction in total consumption. Summarizing, the final model was the following:

$$\log C_t = \beta_0 + \beta_1 t + \beta_2 \log I_t + \beta_3 \log P_t + \beta_4 \log P_t^C + \beta_5 \log A_t + \beta_6 \log U_{t(0-2)} + \gamma_1 \mathrm{Adv}_t^{\mathrm{restrictive}} + \gamma_2 \mathrm{Avail}_t^{\mathrm{restrictive}} + \gamma_3 \mathrm{Avail}_t^{\mathrm{permissive}} + \gamma_4 \mathrm{MinAge}_t + \gamma_5 \mathrm{BAC}_t + \varepsilon_t.$$
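As an illustration of how a model of this form can be fitted, the sketch below builds the lag 0 to lag 2 mean, assembles a design matrix and estimates the coefficients by ordinary least squares. It is written in Python on synthetic data, not the AMPHORA series; the coefficient values, the noise level and the helper names (`lagged_mean`, `gaussian_aic`, `fit_ols`) are assumptions made here, and only the policy terms that vary over time in the Italian example (restrictive availability and BAC) are included.

```python
import numpy as np

def lagged_mean(x):
    """Mean of x at lags 0, 1 and 2; undefined (NaN) for the first two years."""
    x = np.asarray(x, dtype=float)
    out = np.full(len(x), np.nan)
    out[2:] = (x[2:] + x[1:-1] + x[:-2]) / 3.0
    return out

def gaussian_aic(resid, n_coef):
    """AIC of a Gaussian linear model (the error variance counts as one parameter)."""
    n = len(resid)
    sigma2 = resid @ resid / n
    return n * (np.log(2 * np.pi * sigma2) + 1) + 2 * (n_coef + 1)

def fit_ols(y, X):
    """Least-squares coefficients and the corresponding AIC."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, gaussian_aic(y - X @ beta, X.shape[1])

# Synthetic inputs, for illustration only
rng = np.random.default_rng(0)
years = np.arange(1961, 2009)
n = len(years)
income, price, price_c, men65, urban = np.exp(0.2 * rng.normal(size=(5, n)))
avail = (years >= 1991).astype(float) + (years >= 1998)            # restrictive step
bac = np.where(years >= 2001, 0.5, np.where(years >= 1988, 0.8, 1.0))

names = ["const", "t", "logI", "logP", "logPc", "logA", "logU", "avail", "bac"]
X = np.column_stack([
    np.ones(n), years - years[0],
    np.log(income), np.log(price), np.log(price_c), np.log(men65),
    np.log(lagged_mean(urban)), avail, bac,
])[2:]                                   # drop the two years without a lagged mean

true_beta = np.array([2.0, -0.01, 0.3, -0.4, 0.1, 0.2, 0.5, -0.05, 0.1])  # assumed
log_c = X @ true_beta + rng.normal(scale=0.001, size=X.shape[0])

beta_hat, aic = fit_ols(log_c, X)
print(dict(zip(names, beta_hat.round(3))))
```

In the multiply imputed setting described earlier, a fit of this kind would be repeated on each imputed dataset and the results combined afterwards.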

As an example, Table 3 reports the results for Italy. During the study period two BAC limit policies (1988, 2001), two restrictive availability policies (1991, 1998) and one advertising policy (2001) were implemented. This analysis confirmed the marginal results reported in Figure 2: the policies did not appear significantly correlated with a change in total alcoholic beverage consumption.

TABLE 3. Estimates and 90% confidence intervals (CI) for the association between control policies and alcoholic beverage consumption in Italy, arising from the regression model (European level analysis)

Policies                            Policy effect   90% CI
Changes of the BAC limit            −0.182          −0.432, 0.069
Restrictive advertising policies    −0.008          −0.045, 0.028
Restrictive availability policies   0.018           −0.025, 0.062

DISCUSSION

This paper describes the statistical methods used within the AMPHORA project, together with their advantages and drawbacks. The statistical methodology adopted for the country-specific analyses derives from conventional economic models, in which alcohol demand is studied in relation to real income, real price, and population age (Nelson, 2010; Nelson & Young, 2001). This allowed us to estimate the association between prevention policies and alcoholic beverage consumption from aggregate time series data, taking into account selected socioeconomic unplanned factors which can influence the pattern of alcoholic beverage consumption over time. By performing separate analyses by country, we preserved the specificity of the relationship between policies/contextual factors and alcoholic beverage consumption in different areas of Europe. On the other hand, by specifying the same regression model in each country (in particular in the European level analysis), we avoided possible heterogeneity among country-specific estimates due to differences in model specification or in the choice of explanatory variables. This favored the second step of the analysis, consisting of the comparison among countries (see the paper in the same issue by Baccini & Carreras, 2014). Using MI, it was possible to perform analyses even in the presence of missing entries in the original data sets.

The limitations of the proposed statistical approach mostly arise from the nature of the data (aggregate annual time series) and are well known in the literature (Rehm & Gmel, 2001). First, both the time series of alcohol consumption and the time series of the main measured unplanned factors usually show long-term waves or trends. In order to avoid potentially spurious results, de-trending is required to remove the influence of potential unobserved variables which could confound the association of interest. However, de-trending is in some sense a preventive and strong intervention that could preclude the possibility of discovering actual causal associations² between unplanned variables and outcome: in de-trending, we exclude the possibility that coinciding trends are causally related.
Moreover, de-trending removes most of the variability from the outcome time series, so that not much more is left to be explained by other variables.
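The point about de-trending removing most of the variability can be illustrated numerically. The sketch below uses a synthetic series with an assumed linear decline and assumed noise level (not the AMPHORA data) and computes the share of the outcome variance that disappears once a deterministic linear trend is removed; `detrend_linear` is a hypothetical helper name.

```python
import numpy as np

def detrend_linear(y, t):
    """Residuals of y after removing a fitted intercept and linear time trend."""
    X = np.column_stack([np.ones(len(t)), t])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

rng = np.random.default_rng(0)
t = np.arange(48)                                           # 48 annual observations
y = 3.0 - 0.02 * t + rng.normal(scale=0.05, size=len(t))   # synthetic log-consumption

resid = detrend_linear(y, t)
share_removed = 1 - resid.var() / y.var()
print(f"share of variance removed by the trend: {share_removed:.2f}")
```

With a strong trend of this kind, only a small fraction of the original variance remains for the other explanatory variables to account for.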

² The reader is referred to Hill’s criteria for causation, which were developed to help researchers and clinicians determine whether risk factors are causes of a particular disease or outcome or are merely associated with it [Hill, A. B. (1965). The environment and disease: Association or causation? Proceedings of the Royal Society of Medicine, 58, 295–300]. Editor’s note.

We decided to account for the long-term trend in the time series by adjusting for time as a deterministic variable. An alternative approach would be to model differences, that is, to specify an integrated model (Box & Jenkins, 1970). We preferred the first solution because model interpretation was easier and results were similar. We assumed independent error terms because, after de-trending, no strong evidence of residual autocorrelation arose.

A second difficulty is that predictor variables are often highly correlated with one another. Higher income, later age at childbirth, increasing urbanization, increasing prices, and higher female education may all just reflect a single underlying development, that is, the development of a better welfare state. The high collinearity between unplanned indicators increases the chance of spurious results when more than one of these variables is entered into the model. In conclusion, as a consequence of the high correlation between the time series involved in the analysis, we were not able to identify the effect of each unplanned variable on total alcoholic beverage consumption, nor to measure the extent to which the observed unplanned variables, together, explained the time pattern of alcoholic beverage consumption.

A third relevant issue regards the possible endogeneity of the explanatory variables, mainly alcoholic beverage prices or policies. For example, one cannot exclude the existence of a causal relation which goes from the social costs associated with alcohol use to the implementation of policies, originating a loop of causality between outcome and explanatory variables. In the presence of this loop, explanatory variables (policies in this case) and error terms are correlated and ordinary regression models are inappropriate. The most common alternative estimator used in this context is the instrumental variables (IV) estimator (Kennedy, 2003). This analysis did not address the complicated issue of endogeneity, assuming exogeneity of policies and prices, in accordance with the approach followed in other studies of alcoholic beverage consumption (Nelson, 2010).

Some problems are also related to the estimation of the policies’ effects. Modeling the association between each policy and consumption separately from the associations of the other policies can result in biased estimates, because the step function will also capture the effect of interventions implemented before or after the year in which the policy was introduced. When we introduced all policies in the same model, as in the European model, we estimated their net association with the outcome. However, other difficulties can arise. In order to avoid the inclusion of too many terms in the model and to simplify the comparison among countries, we assumed that similar policies introduced during different years had the same effect on alcoholic beverage consumption. This is clearly a strong and limiting hypothesis, because it does not account for a possible scaling of alcohol policies within and across countries (Karlsson & Österberg, 2001). For BAC limits and minimum ages, we relaxed this assumption by stating that the effect of each policy was proportional to the difference between the old limit and the new one. This probably allowed for a better comparison among countries.

Another relevant point concerns the possible lagged effect of the policy measures. The model relies on the assumption that their effect is immediate and constant over time and, due to the small sample size (50 years), it was impossible to investigate delayed effects and time-varying associations. Finally, our model did not provide reliable results for interventions occurring at the beginning or at the end of the study period, when the evaluation of alcoholic beverage consumption before or after the policy relies on only a few observations. All these limitations should be considered in order to avoid over-interpretation of the statistical findings.

Declaration of Interest

The authors declare no conflicts of interest. The authors alone are responsible for the content and writing of the paper.

THE AUTHORS

Michela Baccini, Ph.D., is a researcher in medical statistics at the University of Florence. The author of several papers in the field of environmental epidemiology and biostatistics, she has worked on time series analysis, meta-analysis, health impact assessment, and multiple imputation.

Giulia Carreras, Ph.D., is a statistician at the Cancer Prevention and Research Institute in Florence. She is the author of several papers in the field of primary prevention and environmental epidemiology. Her key research areas are Markov models and dynamic models for decision making, analysis of prevention studies, and health impact assessment.

GLOSSARY

Akaike Information Criterion (AIC): Measure which expresses the quality of a statistical model when applied to a specific set of data. It considers the trade-off between goodness of fit and complexity of the model itself. The lower the AIC, the higher the quality of the model.
Collinearity: Statistical term used to describe the situation in which two or more variables are highly correlated.
Confidence interval (CI): Interval calculated from data which expresses a “plausible” range for the parameter of interest. The level of the interval (usually 95% or 90%) is defined as the probability that, in hypothetical repeated experiments, the calculated interval includes the true value of the parameter.
Endogeneity: Statistical term used to describe the situation in which, in a regression model (see Multiple regression model), the explanatory variable depends on the dependent variable, producing a loop of causality.
Exogeneity: Absence of endogeneity (see Endogeneity).
Linear predictor: See Multiple regression model.
Meta-analysis: Procedure that combines results from independent studies conducted on the same issue. The random effects meta-analysis accounts for heterogeneity among studies.
Multiple imputation (MI): Procedure that creates k different data sets from the original one, by replacing missing values with values randomly drawn from an “appropriate” distribution. Each imputed data set is analyzed separately and the results are combined in order to obtain a result which accounts for the increased uncertainty due to the presence of missing data.
Multiple regression model: Statistical model used for investigating the relationship between a dependent variable and several explanatory variables. The model assumes that the dependent variable is the sum of a linear predictor (a linear combination of the explanatory variables) and a random term centered around 0.
Posterior distribution: Probability distribution of the parameter of interest, given the observed data. The posterior distribution arises from the combination of prior knowledge about the parameter and observed data (likelihood) through Bayes' rule.
Time series: Series of data measured at different instants in time.
Residual autocorrelation: In the context of regression analysis on time series data (see Multiple regression model and Time series), correlation between values of the dependent variable observed at different times, even after accounting for the influence of the explanatory variables on the dependent variable itself.

REFERENCES

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716–723.
Allamani, A., Pepe, P., Baccini, M., Massini, G., & Voller, F. (2014). Europe. An analysis of changes in the consumption of alcoholic beverages: The interaction among consumption, related harms, contextual factors and alcoholic beverage control policies. Substance Use & Misuse, 49.
Baccini, M., & Carreras, G. (2014). Analyzing and comparing the association between control policy measures and alcohol consumption in Europe. Substance Use & Misuse, 49.
Box, G., & Jenkins, G. (1970). Time series analysis: Forecasting and control. San Francisco, CA: Holden-Day.
Elder, R. W., Lawrence, B., Ferguson, A., Naimi, T. S., Brewer, R. D., Chattopadhyay, S. K., ... Fielding, J. E. (2010). Task force on community preventive services. The effectiveness of tax policy interventions for reducing excessive alcohol consumption and related harms. American Journal of Preventive Medicine, 38, 217–229.
Grittner, U., Gmel, G., Ripatti, S., Bloomfield, K., & Wicki, M. (2011). Missing value imputation in longitudinal measures of alcohol consumption. International Journal of Methods in Psychiatric Research, 20, 50–61.
Karlsson, T., & Österberg, E. (2001). A scale of formal alcohol control policy in 15 European countries. Nordic Studies on Alcohol and Drugs. English Supplement, 18, 117–131.
Kennedy, P. (2003). A Guide to Econometrics (5th edn). Somerset, NJ: Wiley-Blackwell.
Ledermann, S. (1956). Alcool, alcoolisme, alcoolisation. Données scientifiques de caractère physiologique, économique et social. Paris: Presses Universitaires de France.
Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data. New Jersey, NJ: Wiley.
Nelson, J. P. (2010). Alcohol, unemployment rates and advertising bans: international panel evidence, 1975–2000. Journal of Public Affairs, 10, 74–87.
Nelson, J. P. (2013). Meta-analysis of alcohol price and income elasticities – with corrections for publication bias. Health Economics Review, 3, 17.
Nelson, J. P., & Young, D. J. (2001). Do advertising bans work? An international comparison. International Journal of Advertising, 20, 273–96.
R Core Team. (2013). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.
Raghunathan, T. E., Lepkowski, J. M., van Hoewyk, J., & Solenberger, P. (2001). A multivariate technique for multiply imputing missing values using a sequence of regression models. Survey Methodology, 27, 85–95.
Rehm, J., & Gmel, G. (2001). Aggregate time series regression in the field of alcohol. Addiction, 96, 945–954.
Sornpaisarn, B., Shield, K., Cohen, J., Schwartz, R., & Rehm, J. (2013). Elasticity of alcohol consumption, alcohol-related harms, and drinking initiation in low- and middle-income countries: A systematic review and meta-analysis. IJADR, 2, 45–58.
van Buuren, S., & Groothuis-Oudshoorn, K. (2011). MICE: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45(3), 1–67.
Voller, F., Maccari, F., Pepe, P., & Allamani, A. (2014). Changing Europe: Trends in social, economic demographic factors, alcoholic beverage drinking and prevention policies between 1960s and 2000s. Substance Use & Misuse, 49.
Wagenaar, A. C., Salois, M. J., & Komro, K. A. (2009). Effects of beverage alcohol price and tax levels on drinking: a meta-analysis of 1003 estimates from 112 studies. Addiction, 104, 179–190.
Xu, X., & Chaloupka, F. J. (2011). The effects of prices on alcohol use and its consequences. Alcohol Res Health, 34, 236–245.

APPENDIX

Combining Results Arising from the Multiple Imputed Data Sets

Let $Q$ denote the unknown parameter that we want to estimate, for example, the regression coefficient which expresses the effect of a policy measure on total alcohol consumption, and let $\hat{Q}_i$ and $\hat{U}_i$ ($i = 1, \ldots, 5$) be the point estimate of $Q$ and its associated variance, derived from the analysis of the $i$th imputed data set. Then, the estimate of $Q$ arising from the MI can be obtained as the average of the 5 estimates $\hat{Q}_i$:

\[ \bar{Q} = \frac{1}{5} \sum_{i=1}^{5} \hat{Q}_i ; \]

the variance of $\bar{Q}$ can be estimated according to the following formula:

\[ T = \bar{U} + \left(1 + \frac{1}{5}\right) B, \]

where $\bar{U}$ is the within-imputation variance (i.e., the average of the 5 estimates $\hat{U}_i$), and $B$ the between-imputation variance,

\[ B = \frac{1}{4} \sum_{i=1}^{5} \left(\hat{Q}_i - \bar{Q}\right)^2 . \]

The confidence intervals for $Q$ can be obtained as usual from $\bar{Q}$ and its standard error $\sqrt{T}$, assuming a normal distribution for $\bar{Q}$ (the normality assumption being valid for quite large sample sizes). Unless otherwise specified, all the results in this report were obtained according to this procedure.
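The combination rules above can be illustrated with a minimal Python sketch. It is not the authors' code (the analysis was carried out in R); the function name `pool_rubin`, the default normal quantile 1.96 for a 95% interval, and the numerical inputs are all assumptions introduced here for illustration only.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances, z=1.96):
    """Pool point estimates and their variances from m imputed data sets.

    Hypothetical helper: z=1.96 assumes a 95% normal-based interval.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)        # pooled point estimate (average of the m estimates)
    u_bar = mean(variances)        # within-imputation variance
    b = variance(estimates)        # between-imputation variance, divisor m - 1
    t = u_bar + (1 + 1 / m) * b    # total variance of the pooled estimate
    half_width = z * math.sqrt(t)  # normal approximation for the interval
    return q_bar, t, (q_bar - half_width, q_bar + half_width)


# Invented coefficients and variances from 5 imputed data sets (not results from the paper).
q_hat = [-0.52, -0.47, -0.55, -0.49, -0.50]
u_hat = [0.010, 0.012, 0.011, 0.009, 0.010]
q_bar, t, ci = pool_rubin(q_hat, u_hat)
print(q_bar, t, ci)
```

With 5 imputations the factor multiplying the between-imputation variance is 1 + 1/5, so the total variance is always at least as large as the average within-imputation variance; this is how the pooled interval reflects the extra uncertainty due to the missing data.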

