ISSN 0017-8748 doi: 10.1111/head.12457 Published by Wiley Periodicals, Inc.

Headache © 2014 American Headache Society

Research Submissions

Statistical Testing of Association Between Menstruation and Migraine

Mathias Barra, PhD; Fredrik A. Dahl, PhD; Kjersti G. Vetvik, MD

Objective.—To repair and refine a previously proposed method for statistical analysis of association between migraine and menstruation. Background.—Menstrually related migraine (MRM) affects about 20% of female migraineurs in the general population. The exact pathophysiological link from menstruation to migraine is hypothesized to be through fluctuations in female reproductive hormones, but the exact mechanisms remain unknown. Therefore, the main diagnostic criterion today is concurrency of migraine attacks with menstruation. Methods aiming to exclude spurious associations are wanted, so that further research into these mechanisms can be performed on a population with a true association. Methods.—The statistical method is based on a simple two-parameter null model of MRM (which allows for simulation modeling), and Fisher’s exact test (with mid-p correction) applied to standard 2 × 2 contingency tables derived from the patients’ headache diaries. Our method is a corrected version of a previously published flawed framework. To our best knowledge, no other published methods for establishing a menstruation–migraine association by statistical means exist today. Results.—The probabilistic methodology shows good performance when subjected to receiver operator characteristic curve analysis. Quick reference cutoff values for the clinical setting were tabulated for assessing association given a patient’s headache history. Conclusions.—In this paper, we correct a proposed method for establishing association between menstruation and migraine by statistical methods. We conclude that the proposed standard of 3-cycle observations prior to setting an MRM diagnosis should be extended with at least one perimenstrual window to obtain sufficient information for statistical processing. 
Key words: menstrual migraine, classification, statistical association, diagnostic criteria, Fisher’s exact mid-p, ROC curve analysis

Abbreviations: ICHD International Classification of Headache Disorders, MRM menstrually related migraine, MO migraine without aura, OR odds ratio, PC probability menstrual migraine diagnostic criteria, PMM pure menstrual migraine, ROC receiver-operator characteristic, RR risk ratio

(Headache 2015;55:229-240)

From the Health Services Research Center, Akershus University Hospital, Lørenskog, Akershus, Norway (M. Barra and F.A. Dahl); Institute of Clinical Medicine, University of Oslo, Oslo, Norway (K.G. Vetvik); Head and Neck Research Group, Research Centre, Akershus University Hospital, Lørenskog, Akershus, Norway (K.G. Vetvik). Address all correspondence to M. Barra, Health Services Research Center, Akershus University Hospital, Postboks 1000, Lørenskog, Akershus 1478, Norway, email: [email protected] Accepted for publication September 02, 2014. Conflict of Interest: None.


Migraine is a common, painful, and occasionally debilitating condition which adversely affects the quality of life and the economic productivity of both migraineurs and their families.1,2 The prevalence of migraine is similar between the sexes until adolescence.2,3 Between menarche and menopause migraine is 2 to 3 times more common in women than in men, adjusted for age,2,4-6 suggesting that female sex hormones play a role. Prospective diary studies have shown that female migraineurs experience an increased risk of attacks of migraine without aura (MO) during the (peri)menstrual window, ie, day −2 to day +3 of the menstrual cycle7-9 (where day 1 corresponds to the first day of bleeding). Furthermore, studies suggest that attacks occurring at menstruation are more painful, longer lasting, more disabling and less responsive to treatment compared to other migraine attacks.10,11 As a result of these observations, diagnostic criteria for menstrual migraine without aura were proposed.12,13 The criteria are placed in the appendix of the International Classification of Headache Disorders III beta version (ICHD-III beta), because more research is warranted. Menstrual migraine (MM) is classified as subtypes of 1.1. MO (see Table 1). It is debated whether menstrual migraine should be regarded as MO triggered by menstruation, or if it constitutes a distinct entity.14 The suspected culprit is the abrupt drop in estrogen levels normally occurring premenstrually.9,15 Other factors, eg, prostaglandin release,16 are also thought to play a role. The important feature of these diagnostic criteria here – given an established MO diagnosis – is that they are exclusively concerned with (1) concurrence of attacks and menstruation, and (2) the frequency of the attacks in relation to menstruation.
One of the aims of the ICHD-III classification was to exclude women whose migraine attacks might be only spuriously associated with their menses.14 For researchers aiming to unveil the (possibly various) pathophysiological mechanisms of menstrual attacks, it is particularly important to exclude such cases. If not, the results could be diluted to the point where they are no longer detectable by an otherwise sufficiently powered test.

February 2015

Table 1.—Diagnostic Criteria for Menstrual Migraine According to the International Classification of Headache Disorders (ICHD) III Beta Version

A1.1.1. Diagnostic Criteria (PMM):
  A. Attacks, in a menstruating woman*, fulfilling criteria for 1.1. Migraine without aura and criterion B below
  B. Documented and prospectively recorded evidence over at least three consecutive cycles has confirmed that attacks occur exclusively on day 1 ± 2 (ie, days −2 to +3)† of menstruation* in at least two out of three menstrual cycles and at no other times of the cycle.

A1.1.2. Diagnostic Criteria (MRM):
  A. Attacks, in a menstruating woman*, fulfilling criteria for 1.1. Migraine without aura and criterion B below
  B. Documented and prospectively recorded evidence over at least three consecutive cycles has confirmed that attacks occur on day 1 ± 2 (ie, days −2 to +3)† of menstruation* in at least two out of three menstrual cycles, and additionally at other times of the cycle.

Notes:
  * For the purposes of ICHD-III beta, menstruation is considered to be endometrial bleeding resulting from either the normal menstrual cycle or from the withdrawal of exogenous progesterone, as in the use of combined oral contraceptives or cyclical hormone replacement therapy.
  † The first day of menstruation is day 1 and the preceding day is day −1; there is no day 0.

Definitions of A 1.1.1. Pure Menstrual Migraine (PMM) and A 1.1.2. Menstrually Related Migraine (MRM) according to the International Classification of Headache Disorders III Beta Version.13

Restricting inclusion criteria to only accept patients with pure menstrual migraine (PMM) for such studies would make the design robust in theory. Unfortunately, very few women present with PMM, thus rendering this strategy impractical. In a recent study, only 19 out of 141 (13.4%) of the study participants with menstrual migraine were categorized with PMM.17 Another possible weakness of the current definitions is how to categorize a woman who presents, eg, with a history of severe MO attacks during every second perimenstrual window (on average), and never at other times. Such a patient does not fulfill the criteria for PMM (nor menstrually related migraine [MRM]) since attacks are too infrequent (< 2/3 of menstrual windows). Yet, the relation to her menstrual cycle appears clear, and clinical studies aiming to explore the pathophysiology of menstrual migraine could quite possibly benefit from including all women with a clear association. What we have is a classic case of the trade-off between sensitivity (the ability to correctly identify cases of MRM) and specificity (the ability to correctly exclude non-MRM). However, what constitutes true negatives and true positives is also partly a matter of definition, and is relative to the current understanding of pathogenesis. If statistical methods are to be used, the problem at hand needs to be precisely formulated as a mathematical question. One attempt to mathematize and meet this challenge was made in the fairly* recent article by Marcus et al: A Prospective Comparison Between ICHD-II and Probability Menstrual Migraine Diagnostic Criteria.18 We commend the authors for undertaking this important task, but unfortunately the mathematics presented there are flawed. The concept of a P-value appears to be confused with that of a probability, and the binomial distribution is incorrectly applied where the hypergeometric distribution correctly describes the model.

This paper is organized as follows: In Methods and Analysis, we first state the mathematical assumptions necessary for the formulation of the Probability Menstrual Migraine Diagnostic Criteria (PC) from the previously published paper.18 This yields a mathematical model of MRM, which may be implemented as a simulation model on standard computer software. We then explain in some detail why and how the mathematics behind the previously proposed PC18 fails. Next, a statistical test which is appropriate for investigating association between menstruation and migraine under the assumed MRM model is developed. The corrected PC is tested on a panel of simulated MRM patients in adherence to the previously developed MRM model. In addition to stating the properties of the test, we will report on the test’s sensitivity and specificity under various parameters, as obtained by experimenting with the simulation model. Finally, we will discuss possible shortcomings of the PC method, and briefly outline ongoing work to further improve the method.

METHODS AND ANALYSIS

Mathematical Formulation of MRM.—The original18 PC is based on counting and classifying migraine days into 2 disjoint categories: menstrual migraine days and non-menstrual migraine days, and diagnosing MRM only if the association is statistically significant at some predefined level α. As such, the PC is a family of criteria – one for each 0 < α < 1. After observing a patient with suspected MRM for a period of N days one thus obtains an observation consisting of a pair of numbers† k and n. Here n is the total number of migraine days, and k ≤ n is the total number of menstrual migraine days. An assumption made for the PC is that, for any given patient, the occurrence of a migraine on a given day is independent of the history of migraine up to that day. Mathematically this means that for each individual i and day d, there is a probability μ_{i,d} of observing a migraine day, and this probability depends on i and d only. Moreover, we know the number K of menstrual days out of the N days of observation. Since we want to identify women with migraine attacks which are associated with their menstruation, we need to formulate a model for women whose migraine attacks have no such association. To this end, we follow Marcus et al18 in assuming independence of such women’s attack patterns, so that the probability of migraine on a given day is simply μ_i (ie, we may omit the subscripted d). We will also omit the subscript i in the following. The aim now becomes to devise a statistical test which may reject the null-hypothesis that a woman’s migraine pattern is a Bernoulli trial‡. The test proposed by Marcus et al18 (p. 542) is the following:

*The paper is 4 years old, is not yet widely cited and to our best knowledge, only 1 paper14 reports on the actual methods proposed therein.

†These are called r and N in.18 ‡A Bernoulli trial is a series of independent observations with a dichotomous outcome.


1. Compute the fraction p = K/N;
2. then compute the number

$$t(n, k, p) = \binom{n}{k} p^{k} (1 - p)^{n-k}; \tag{1}$$

3. reject the null-hypothesis at the α-level when t(n, k, p) < α.

We assume that the somewhat unusual notation $\left(\begin{smallmatrix} n \\ k \end{smallmatrix}\right)$ is intended to represent the binomial coefficient $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$. The notation might instead be interpreted as a fraction, but in that case we are unable to provide a statistical meaning of the formula. Under the binomial coefficient interpretation, the formula gives the probability of observing exactly k successes in a Bernoulli trial of size n with probability p of success:

$$X \sim \mathrm{Binom}(n, p) \;\Rightarrow\; P(X = k) = \binom{n}{k} p^{k} (1 - p)^{n-k}.$$

So, for n, k and p = K/N, the number t(n, k, p) is the probability of observing exactly k menstrual day migraines out of a total of n migraine days, if each migraine day has a p = K/N probability of being classified as menstrual. When we perform tests of statistical significance, however, we are not concerned with the probability of the specific outcome, as this will usually be small, simply because the number of possible outcomes is large. When we compute a P-value, we evaluate the probability of getting an outcome which is at least as extreme as the one observed. We trust that the intention behind the formula was to sum the values t(n, i, p) for all i greater than or equal to the observed one, although this was not mentioned. Even with this benevolent interpretation, however, the formula is inaccurate because it assumes that the menstrual days are drawn with replacement. In reality, a migraine day occurrence on a menstrual day blocks other migraine occurrences from happening on that day. Therefore, the correct probability distribution is the hypergeometric one, with probability function:

$$\theta(k, n, K, N) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}. \tag{2}$$
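The difference between the point probability of equation (1), its summed (tail) version, and the hypergeometric alternative can be illustrated with a short sketch. This is a minimal illustration in Python rather than the authors' R code; the function names and the diary counts in the usage lines are hypothetical, and the argument order (k, n, K, N) is a choice made here.

```python
from math import comb


def binom_point(n: int, k: int, p: float) -> float:
    """t(n, k, p): probability of exactly k successes in n binomial draws."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binom(n, p), i.e. the summed version of t."""
    return sum(binom_point(n, i, p) for i in range(k, n + 1))


def hypergeom_pmf(k: int, n: int, K: int, N: int) -> float:
    """Probability that exactly k of n migraine days fall on the K menstrual
    days of an N-day diary (sampling days without replacement)."""
    if k < 0 or k > min(n, K) or n - k > N - K:
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def hypergeom_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) under the hypergeometric model."""
    return sum(hypergeom_pmf(i, n, K, N) for i in range(k, min(n, K) + 1))


if __name__ == "__main__":
    # Hypothetical diary: 87 days, 15 menstrual days, 10 migraine days, 4 menstrual.
    N, K, n, k = 87, 15, 10, 4
    p = K / N
    print("point probability t(n, k, p):", binom_point(n, k, p))
    print("binomial tail P(X >= k):     ", binom_tail(n, k, p))
    print("hypergeometric tail:         ", hypergeom_tail(k, n, K, N))
```

The point probability is always at most the corresponding tail probability, which is why comparing it directly against α overstates the evidence for association.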

Statistical tests of this sort are usually represented through a 2 × 2 contingency table of the form

                 Menstrual    Non-menstrual
Non-migraine     K − k        N − (K + n − k)
Migraine         k            n − k

where K is the number of menstrual days during the observational period, k is the observed number of migraine attacks on menstrual days, n is the total number of observed migraine days (whence n − k is the number of migraine attacks not on menstrual days), and where N is the length (total number of days) of the observational period. Note that the sum of the table is N. There exists a large body of literature on exact tests for 2 × 2 tables. An excellent and recent review is provided by Lydersen et al.19 Because the number of migraine days is itself a random variable, the Fisher’s exact test can be applied with so-called mid-p correction for greater power (see19-21). The formula for the one-sided, mid-p corrected, P value is exact, and readily computable as

$$\hat{p}(k, n, K, N) = \sum_{k \le i \le \min(n, K)} \theta(i, n, K, N) \;-\; \frac{\theta(k, n, K, N)}{2}$$

for θ as defined by equation (2) above. The “mid-p” name derives from the subtracted term θ(k, n, K, N)/2, which increases power without compromising the size of the test.19 The above argument also implies that under the null hypothesis of non-association with menstruation and independence of daily migraine probability, the Fisher’s exact test is conservative, whence the fraction of false positives is strictly less than the test’s significance level. In the present context, we are only concerned with the possibility that migraine be positively associated with menstruation. This leads to one-sided testing. Given a predefined significance level α, a one-sided test for association of migraine attacks with menstruation is thus to reject the null-hypothesis in favor of the alternative hypothesis of association in the case p̂(k, n, K, N) < α, since this indicates that there are relatively too many migraine days on menstrual days, and that this would be expected to happen less than a fraction α of the time under the null-hypothesis. The power of our test – ie, the test’s ability to correctly classify women with MRM as having an association between their menses and their migraine attacks – is dependent upon several factors. The most important ones are the underlying migraine frequency μ, the length N of the observational period, the number K of menstrual days, and the distinct probability μ′ of having migraine on menstrual days. Under MRM, we have μ′ > μ. In the next section, we report on simulation-model experiments which establish empirical receiver-operator characteristic (ROC) curves22,23 for this test with respect to realistic (empirical) values for the parameters N, μ and μ′.

Experiments.—As pointed out above, the test is conservative with respect to its significance level. Furthermore, it is easy to compute directly that any woman with a PMM diagnosis, set according to the criteria in Table 1, will have an association at a significance level < .02 (this means that PMM will be subsumed by PC). Exact values for the specificity of the test depend upon the parameters of the alternative hypothesis: the baseline migraine probability μ, the elevated migraine probability μ′ on the menstrual days, and the length N of the observational period (number of days observed). We therefore set up a simulation model in the statistical software R24 in order to obtain empirical estimates. Our simulation model is defined with respect to the two parameters μ and ρ, where μ′ will be computed as the product μ·ρ. The model simulates the observation of one patient with an S-day-long cycle for a period of N days. On each simulated day exactly 1 of 4 observations is possible. By a 2-step procedure, it is first recorded if the current day is a menstrual day. In step 2, a random draw determines if the patient will have a migraine attack or not.
When the current day is not a menstrual day, then a probability of μ is assigned to the outcome of observing a migraine attack. This μ is the baseline migraine frequency. If the day is a menstrual day, an adjusted probability of μ′ = ρ·μ is employed instead. The parameter ρ is thus interpretable as a risk ratio (RR), since we obtain that ρ = μ′/μ is the ratio of the probability of migraine attack on a menstrual day to the probability of a migraine attack on a regular (non-menstrual) day. Such a simulation is set up with an observational length N. The average cycle length is S = 29 days (day 1–day 1), and this value was used in the simulations reported on here. Since there are 5 menstrual days per 29 days in an observational period of N days, if N is eg, 87 = 3·29, we will have K = 3·5 = 15 menstrual days and 72 = 87 − 15 non-menstrual days. Hence, the expected result of one simulation run may be summarized as

                 Menstrual        Non-menstrual
Non-migraine     K − k            N − (K + n − k)
Migraine         E(k) = 5μ·ρ      E(n − k) = 24μ

where n is the total number of migraine days, k is the number of migraine days on menstrual days – menstrual migraines – and N and K are as described above. The test is quite sensitive to the number of cycles observed, and we return to this point in the Results and Discussion. In order to employ the test, one would need data on a patient’s migraine pattern. This is quite often collected via migraine diaries, where women record their migraine attacks in relation to their menstrual days – indeed one of the changes proposed in the ICHD-III beta13 is to require a migraine diary prior to diagnosing MRM. Clinical experience suggests that it is difficult to have patients record this type of data for periods exceeding 100–110 days. There are some natural constraints imposed on the parameters μ and ρ by the MRM context. The first constraint is (1 − μ·ρ)^5 < 1/3. This is so because the term to the left is the probability of observing 5 consecutive days without a migraine during the menstrual window. Hence, a woman with a lower aggregated probability of experiencing migraine attacks on her menstrual days would (probably) not be suspected of having MRM, since she would be expected to have attacks on her menstrual days during fewer than 1 − 1/3 = 2/3 of her cycles. Solving the inequality for μ·ρ, we obtain that this value should exceed 1 − (1/3)^{1/5} ≈ .20. Second, it is natural to restrict μ·ρ away from 1, since the contrary would lead to simulations of women who always have an attack on every menstrual day. While this remains a possibility, we believe that for such women the link between menstruation and migraine is sufficiently clear. Our test will be adequate also for such women; however, we will not simulate such patients. In order to decide on realistic values for the baseline migraine rate μ, we were forced to make some semi-arbitrary choices. A baseline rate range of 0.02–0.40 seems reasonable. We therefore populated the simulation-model with a beta-distribution supported on this range, and with an expected migraine frequency μ of approximately 0.13. This means that the typical simulated woman will have migraine attacks on just over 3 non-menstrual days per cycle. Of course, some simulated women will have more frequent attacks, while others will have them more rarely. The RRs for the simulation runs were also generated by a beta distribution. Parameters for the RR distribution were tuned by a manual binary search, until the authors deemed output as having good face validity. The simulation runs then consistently produced pooled RRs near those reported in the MRM literature (1.45 in9 and 2.5 in1), and this is also consistent with reported odds ratios (1.91 in,11 1.66 in,25 2.01 in7). See Results for the actual figures. Finally, in each individual migraine diary simulation, if the combination μ·ρ fell outside the range 0.20–0.90, then the RR was increased or decreased until 0.20 ≤ μ·ρ ≤ 0.90. The final parameter to be specified is the observation length N. Preliminary testing revealed that, as expected, the test’s performance is rather insensitive to the length of the cycles in the normal range of 25–35 days, that is to say, the test’s performance changes very little. On the other hand, the test is quite sensitive to the number of cycles. Intuitively, this is a consequence of what the test is designed to do: detecting the signal from the RR’s effect on the underlying baseline migraine frequency during the 5-day menstrual window.
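The two-step daily draw and the constraint on μ·ρ described above can be sketched as follows. This is a minimal Python illustration, not the authors' R implementation: the function names are hypothetical, μ and ρ are passed in as fixed numbers rather than drawn from the beta distributions (whose parameters are not given in the text), and the menstrual window is assumed to occupy the first 5 days of each 29-day block, so that a tailed diary starts on day −2.

```python
import random

WINDOW = 5  # menstrual days per cycle (days -2 to +3, there is no day 0)


def clamp_rr(mu: float, rho: float, lo: float = 0.20, hi: float = 0.90) -> float:
    """Return a risk ratio adjusted so that lo <= mu * rho <= hi."""
    return min(max(mu * rho, lo), hi) / mu


def simulate_diary(mu, rho, cycles=3, tail=True, cycle_len=29, rng=None):
    """Simulate one diary and return the counts (k, n, K, N).

    k: migraine days on menstrual days, n: all migraine days,
    K: menstrual days, N: length of the diary in days.
    """
    rng = rng or random.Random()
    n_days = cycles * cycle_len + (WINDOW if tail else 0)
    k = n = K = 0
    for day in range(n_days):
        menstrual = day % cycle_len < WINDOW        # step 1: classify the day
        prob = mu * rho if menstrual else mu        # elevated risk in the window
        migraine = rng.random() < prob              # step 2: random draw
        K += menstrual
        n += migraine
        k += menstrual and migraine
    return k, n, K, n_days


if __name__ == "__main__":
    # Lower bound on mu * rho implied by (1 - mu * rho)^5 < 1/3.
    print("threshold:", 1 - (1 / 3) ** (1 / 5))
    rho = clamp_rr(0.13, 2.3)
    print("one '3 + 1' diary:", simulate_diary(0.13, rho, rng=random.Random(1)))
```

Passing a seeded `random.Random` instance keeps individual runs reproducible, which is convenient when comparing observational lengths on the same simulated patient.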
We therefore ran simulations to investigate how the test’s performance was improved by prolonging the observational period to include an extra menstrual window. In the clinical setting, this would entail instructing women to start their diary on the first day of the next menstruation – recording any migraine attacks of the previous 2 days in order to

cover days −1 and −2 – and then logging migraines until the end of the fourth menstrual window. Note that this entails prolonging the observational period by only 5 days instead of by a full menstrual cycle. In what follows, we refer to this as a tailed observational period and denote an observational period of, eg, 3 cycles with an additional 5 menstrual days as “N = 3 + 1,” and eg, 4 full cycles (without a tail) as “N = 4 + 0.” We simulated runs of 1 million women with randomly drawn baseline migraine frequencies and menstrual day RRs from the above distributions, and analyzed the test’s performance by ROC curves.22,23 We also experimented with simulation runs of more extreme parameters to test performance in different ranges of the parameters. All simulations were implemented in the statistical programming environment R version 3.0.1 (2013-05-16).24 Obviously, given a fixed number k of observed menstrual migraine days, the statistical probability of an association between menstruation and migraine declines when the number of migraine attacks n − k on non-menstrual days increases. We therefore computed (see section 3.2) exact quick-reference tables of the maximal number of non-menstrual migraine days permissible for various combinations of observed menstrual migraine days, specificity of the test for association and length of observational period.
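An empirical ROC curve area for the mid-p test can be estimated along the following lines. This Python sketch rests on assumptions that the text does not spell out: the negatives are simulated with the same μ but ρ = 1, the area is computed as the probability that a simulated MRM patient receives a smaller mid-p value than a simulated non-MRM patient (ties counted as one half), and μ and ρ are held fixed. All function names are hypothetical, and the sample size is far smaller than the runs reported here.

```python
import random
from math import comb


def mid_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher mid-p value for k menstrual migraine days."""
    def theta(i: int) -> float:
        return comb(K, i) * comb(N - K, n - i) / comb(N, n)
    return sum(theta(i) for i in range(k, min(n, K) + 1)) - theta(k) / 2


def diary(mu, rho, rng, cycles=3, cycle_len=29, window=5):
    """Simulate a tailed diary ('cycles + 1' windows); return (k, n, K, N)."""
    N = cycles * cycle_len + window
    k = n = K = 0
    for d in range(N):
        menstrual = d % cycle_len < window
        hit = rng.random() < (mu * rho if menstrual else mu)
        K += menstrual
        n += hit
        k += menstrual and hit
    return k, n, K, N


def auc(cases, controls):
    """Probability that a case has a smaller p-value than a control."""
    wins = sum((c < d) + 0.5 * (c == d) for c in cases for d in controls)
    return wins / (len(cases) * len(controls))


def roc_area(mu, rho, patients=300, seed=0):
    """Estimate the ROC curve area for fixed mu and rho by simulation."""
    rng = random.Random(seed)
    cases = [mid_p(*diary(mu, rho, rng)) for _ in range(patients)]
    controls = [mid_p(*diary(mu, 1.0, rng)) for _ in range(patients)]
    return auc(cases, controls)


if __name__ == "__main__":
    print("estimated area, mu = 0.05, rho = 5:", roc_area(0.05, 5.0))
```

With only a few hundred simulated patients the estimate is noisy, so it should be read as an indication of how the analysis is organized rather than as a reproduction of the reported areas.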

RESULTS

Test Performance.—In the simulation model, a baseline migraine rate of μ = 0.129 and ρ = 2.301 yields a menstrual day rate of 0.129 · 2.301 = 0.297, which means that the typical simulated patient will experience 5 · 0.297 = 1.484 migraine-days per 5-day menstrual window, which does not appear inconsistent with the MRM literature we have reviewed in the introduction. In the Figure, we see the results of a simulation run with N = 3 · 29 + 5 = 92 days of observation (what we dub an “N = 3 + 1” observational period). The Figure shows the ROC curve of the simulation run for a “3 + 1” diary. Using the standard classification26 of clinical tests by ROC curve area, we see that our test rates good (0.80–0.90) for this particular observational length, and that we obtain approximately 70–75% sensitivity in the simulated population if we are willing to accept 25–30% false positives. The histograms (Fig. b,c) are presented to convey the modeled patient population. Note how the pooled RR of the 10^6 simulation runs at 1.952 is consistent with frequently cited RRs, while the mean taken over the 10^6 simulations’ individual RRs at 2.301 is slightly higher. In Table 2, the ROC curve areas and parameter means for various observational lengths are summarized. The results presented in Table 2 are based on a patient population with a random mixture of patients with regard to their (simulated) baseline migraine frequencies and RRs. In Table 3, the same summary statistics are shown for selected simulation runs where the μ and ρ have been kept constant. This table serves to highlight the combinations of parameters (ie, the segments of the patient population) for which we can expect better or poorer test properties.

Figure.—(a)–(c) show simulation results after 10^6 iterations of simulated “3 + 1” (92 days) observational periods with randomly drawn μ and ρ. (a) shows the empirical receiver-operator characteristic (ROC) curve; (b) and (c) show the underlying distributions of the two parameters μ and ρ.

Cutoff Tables.—For easy reference in a clinical setting, we here provide tabulated critical values (Table 4) for the maximal number of non-menstrual migraine days permissible, given a prespecified significance level and observational length. In a worked example, we demonstrate the use of the table, and give a caveat for situations when cycle lengths deviate substantially from the mean of 29 days. Table 4 shows cutoff values based on Fisher’s exact mid-p values for observational periods of cycle length 29 days. This means that for women with substantially longer [shorter] cycles, the tabulated values may be too small [large].
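A cutoff of the kind tabulated can be found by a simple search over the number of non-menstrual migraine days. The Python sketch below is a hedged illustration with hypothetical function names; it applies the one-sided mid-p formula with a strict inequality and is not guaranteed to reproduce the published entries, which may rest on conventions not stated in the text.

```python
from math import comb


def mid_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher mid-p value for k menstrual migraine days."""
    def theta(i: int) -> float:
        return comb(K, i) * comb(N - K, n - i) / comb(N, n)
    return sum(theta(i) for i in range(k, min(n, K) + 1)) - theta(k) / 2


def max_nonmenstrual_days(k: int, K: int, N: int, alpha: float):
    """Largest number of non-menstrual migraine days for which the mid-p
    value stays below alpha, given k menstrual migraine days.

    Returns None when even zero non-menstrual migraine days is not enough.
    The search stops at the first failure because, for fixed k, the mid-p
    value does not decrease as migraine days are added.
    """
    best = None
    for extra in range(0, N - K + 1):
        if mid_p(k, k + extra, K, N) < alpha:
            best = extra
        else:
            break
    return best


if __name__ == "__main__":
    # "3 + 1" diary with 29-day cycles: N = 92 days, K = 20 menstrual days.
    for alpha in (0.25, 0.20, 0.10, 0.05):
        print(alpha, max_nonmenstrual_days(7, 20, 92, alpha))
```

For cycle lengths other than 29 days, the same function can be called with the patient's actual N and K instead of relying on a precomputed table.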

Example 1 (Use of cutoff table). Suppose a patient’s migraine diary yields the following 2 × 2 table after a tailed 3 cycle observational period (“N = 3 + 1”):

                 Menstrual    Non-menstrual
Non-migraine     13           70
Migraine         7            14

Consulting the cutoff table (Table 4) shows that under 3 + 1 periods and 0.10 specificity, the maximal number of permissible non-menstrual migraine days (lower right of the 2 × 2 table) is 15. Since we observed only 14 such days, the association would be established at the 0.10 level. For a 0.05 level the maximal number of days is 13, so this patient should be excluded at the 0.05 level. Scrutinizing the numbers though, we see that the total number of observational days is 13 + 7 + 70 + 14 = 104 > 92 = 29 · 3 + 5, indicative of a woman with a relatively long 33-day cycle. Recall that the table is based on 29-day cycle length. Hence, there are a total of 12 extra possibilities for non-menstrual migraine-days outside the K = 20 days pertaining to the four menstrual windows of the observational period. This may mean that the tabulated values are too low. Indeed, computing the exact mid-p value shows that the exact value is 0.044, indicating inclusion. This example shows that when cycles are several days longer (or shorter) than 29 days, exact values should be computed when the number of non-menstrual migraine days is close to the tabulated value.

Table 2.—ROC Curve Area Analysis

N      ROC     μ       RR      Est. RR   Pooled RR
2+0    0.732   0.128   2.301   2.651     1.949
2+1    0.767   0.129   2.300   2.674     1.950
3+0    0.776   0.129   2.300   2.604     1.953
3+1    0.802   0.129   2.301   2.597     1.952
4+0    0.809   0.129   2.298   2.546     1.952
4+1    0.828   0.128   2.302   2.547     1.950
5+0    0.834   0.129   2.301   2.506     1.952
5+1    0.849   0.129   2.300   2.490     1.951
6+0    0.853   0.129   2.301   2.474     1.950
6+1    0.867   0.129   2.301   2.473     1.952

The table demonstrates the added power derived from longer observational periods. Summary of simulation runs: Each simulation run entailed simulating 10^6 patients for the number of days specified in the row header. Migraine baseline frequencies and risk ratios (RRs) were drawn at random from beta distributions as described in the Experiments section. On the right the receiver-operator characteristic (ROC) curve areas are plotted against the observational lengths. Note how almost all improvement in power is derived from adding a tail.

Table 3.—ROC Curve Area Analysis (Fixed Parameters)

N = 3 + 1

ROC     μ       RR      Estimated RR   Pooled RR   Quality†
0.942   0.050   5.000   5.854          5.015       Excellent
0.978   0.050   7.000   7.587          6.992       Excellent
0.863   0.100   2.500   2.915          2.502       Good
0.951   0.100   3.500   4.033          3.496       Excellent
0.755   0.150   1.667   1.833          1.670       Fair
0.900   0.150   2.333   2.562          2.336       Excellent
0.629   0.200   1.250   1.332          1.251       Poor
0.824   0.200   1.750   1.860          1.749       Good
0.500   0.250   1.000   1.047          0.999       Poor
0.728   0.250   1.400   1.467          1.402       Fair

Receiver-operator characteristic (ROC) curve areas for fixed (non-random) µ and risk ratio (RR) for simulation runs of 10^5 women. The table shows improved quality of test for larger RRs. †rated as per.26
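The exact value quoted in Example 1 can be recomputed directly from the patient's 2 × 2 table. The following Python sketch (function and variable names are hypothetical) reads k, n, K and N off the table and applies the one-sided mid-p formula.

```python
from math import comb


def mid_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher mid-p value for k menstrual migraine days."""
    def theta(i: int) -> float:
        return comb(K, i) * comb(N - K, n - i) / comb(N, n)
    return sum(theta(i) for i in range(k, min(n, K) + 1)) - theta(k) / 2


# Example 1 diary: rows are non-migraine / migraine,
# columns are menstrual / non-menstrual.
table = [[13, 70],
         [7, 14]]

k = table[1][0]                       # menstrual migraine days
n = table[1][0] + table[1][1]         # all migraine days
K = table[0][0] + table[1][0]         # menstrual days
N = sum(sum(row) for row in table)    # length of the diary in days

p = mid_p(k, n, K, N)
print(f"k={k}, n={n}, K={K}, N={N}, mid-p={p:.3f}")  # the text reports 0.044
```

Because the computation uses the patient's actual N = 104 rather than the 92 days assumed by the cutoff table, it accounts for the longer cycle automatically.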

Table 4.—Cutoff Values Table

        3+0                     3+1                     4+1
MMD     0.25  0.20  0.10  0.05  0.25  0.20  0.10  0.05  0.25  0.20  0.10  0.05
2       6     5     3     2     /     /     /     /     /     /     /     /
3       10    9     6     4     7     6     4     3     /     /     /     /
4       14    13    9     7     10    9     7     5     11    10    7     5
5       18    17    13    10    14    12    9     7     14    13    10    8
6       23    21    17    14    17    15    12    10    18    16    13    10
7       27    25    21    18    20    19    15    13    21    20    16    13
8       32    30    26    22    23    22    18    16    25    23    19    16
9       36    35    30    26    27    25    21    19    28    27    22    19
10      41    40    35    31    30    29    25    22    32    30    26    22
11      46    45    40    36    34    32    28    25    36    34    29    26
12      51    50    45    41    37    36    32    28    39    37    33    29
13      57    55    51    47    41    39    35    32    43    41    36    33
14      62    61    57    53    45    43    39    36    47    45    40    36
15      69    68    64    61    48    47    43    39    51    49    44    40
16      /     /     /     /     52    51    47    43    55    53    48    44
17      /     /     /     /     56    55    51    48    58    57    52    48
18      /     /     /     /     60    59    56    52    62    61    56    52
19      /     /     /     /     65    64    60    57    66    65    60    56
20      /     /     /     /     70    69    66    64    71    69    64    60
21      /     /     /     /     /     /     /     /     75    73    69    65
22      /     /     /     /     /     /     /     /     79    78    73    70
23      /     /     /     /     /     /     /     /     84    82    78    75
24      /     /     /     /     /     /     /     /     88    87    84    80
25      /     /     /     /     /     /     /     /     94    93    90    87

The column group heading indicates observational period length; the column heading gives the level of specificity. A “/” indicates that the combination is impossible; eg, under a “3 + 1” observational period the maximal number of possible menstrual migraine days is 20 = 4 × 5.

DISCUSSION

We have pointed out mathematical errors in the previously published Probability Menstrual Migraine Diagnostic Criteria.18 Because the use of the formula from18 yields a value which is not a P-value – even though the contrary is stated – the use of values thus computed may lead to erroneous conclusions about association between migraine attacks and menstrual patterns. Even when the appropriate summation is performed, the binomial distribution should be replaced by the hypergeometric distribution. We therefore conclude that the framework for investigating statistical association between menstrual cycles and migraine attacks in possible MRM patients presented in18 should not be implemented by researchers. In this article, we have corrected these flaws and described the performance of the resulting test in some detail. Contrary to Marcus et al,18 we find that such a probabilistic approach deteriorates quickly with decreased observational length, in particular for patients with a weak association. This discrepancy may be due to the fact that in the previously published paper18 the P-values may have been incorrectly computed, which could lead to severe overestimation of the test’s power on data derived from shorter observational periods. The choice of the erroneous binomial distribution is of minor impact here; it is the lack of the appropriate summation which leads to figures off by up to several orders of magnitude. Our MRM simulation model’s output indicates that it would be a great advantage to instruct women to maintain their diaries for at least four menstrual windows, albeit not necessarily for four full months (what we have dubbed a “3 + 1” period).

Statistical intuition (and extensive empirical testing on our part) reveals that the test’s specificity depends first and foremost on the number of 5-day menstrual windows occurring during the observational period, and to a much lesser degree on cycle length. This is also why an “n + 1” observational period performs substantially better than an “n + 0” observational period, even though the extra effort invested in data collection is relatively small (cf. Table 2). On the other hand, the test rates as excellent for women with a low or moderate baseline migraine frequency coupled with a high RR for migraine during the menstrual window: for women with an RR > 3, simulation runs yield ROC curve areas > 0.90. This means that even for a strict inclusion regime where specificity is set

to 0.05 (or even lower) the prospective participants with a strong association are unlikely to be excluded by the PC as we have (re)defined it here.

The average baseline migraine probability employed (0.13) is slightly elevated compared to what would be expected in the general population. We still chose this rate for the following two reasons. First, our test’s context is a population in which migraines are frequent to the point where the association with menstruation is not clear. Second, we consider this as erring on the right side: a lower baseline risk μ will in general be paired with a higher ρ, which, as we have demonstrated, improves our test’s performance.

There is one serious issue with this type of test which merits some attention: the assumption of independence. That the migraine probability on a given day is not affected by the migraine attack history of the previous day(s) is likely overly simplistic. This has implications for the test’s performance, in particular when it comes to specificity. If migraine attacks have a strong tendency to cluster together (ie, attacks which last for “days on end”), then attack patterns which are independent of the menstrual cycle may appear to be associated more often than they should, thus compromising the size of the test. Other factors such as recurrence of attacks and post-dromal effects would compromise both the test’s sensitivity and specificity.

One way to account for dependencies is to construct a more elaborate model with parameters in addition to μ and ρ. A natural extension is to include a parameter δ which, once a migraine attack has begun, determines when it will end. This would result in a Markov-chain type model. If one is to pursue such a path, different statistical tools will be needed: when the assumption of independence is violated, the hypergeometric distribution can no longer be applied.
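
A minimal simulation sketch may make the two-parameter model and the δ extension concrete. Several details are assumptions made for illustration only: μ is taken as the daily migraine probability outside the menstrual window, μ·ρ as the probability inside it, each 5-day window is placed at the start of a 28-day cycle, and δ is read as the probability that an ongoing attack continues into the next day. All function names are hypothetical. With δ = 0 the days are independent, as in the basic model.

```python
import random

def menstrual_mask(full_cycles, extra_window=True, cycle_length=28, window=5):
    """Per-day flags, True inside a menstrual window ("n + 1" if extra_window)."""
    mask = []
    for _ in range(full_cycles):
        mask += [True] * window + [False] * (cycle_length - window)
    if extra_window:
        mask += [True] * window
    return mask

def simulate_diary(mask, mu, rho, delta=0.0, rng=None):
    """Simulate one headache diary; returns a per-day list of migraine flags."""
    rng = rng or random.Random()
    diary, ongoing = [], False
    for in_window in mask:
        if ongoing and rng.random() < delta:
            migraine = True  # assumed delta mechanism: the attack persists
        else:
            p = min(1.0, mu * rho) if in_window else mu
            migraine = rng.random() < p
        diary.append(migraine)
        ongoing = migraine
    return diary

def two_by_two(mask, diary):
    """2 x 2 table: rows = menstrual / non-menstrual day, cols = migraine / none."""
    pairs = list(zip(mask, diary))
    a = sum(1 for m, h in pairs if m and h)
    b = sum(1 for m, h in pairs if m and not h)
    c = sum(1 for m, h in pairs if not m and h)
    d = sum(1 for m, h in pairs if not m and not h)
    return (a, b), (c, d)

if __name__ == "__main__":
    mask = menstrual_mask(full_cycles=3)  # a "3 + 1" period: 89 days, 20 window days
    diary = simulate_diary(mask, mu=0.13, rho=3.0, delta=0.3, rng=random.Random(2015))
    print(two_by_two(mask, diary))
```

Running many such diaries with ρ = 1 and δ > 0 is one way to probe how clustering of attacks affects the size of a test that assumes independent days.
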
Another direction for the refinement of the model is to alter the basic unit of time: the day. Such an approach could allow detection of association to be more sensitive for women whose attacks on menstrual days are generally longer in hours but not in days. One possible unit of time could, eg, be 6-hour periods, thus subdividing the day into four. Suitably tuning the parameters and rerunning the simulations showed (as expected) that the performance of our

test was unaltered. However, we do have caveats before implementing such a refinement. First, it may introduce practical problems such as lower real accuracy and compliance of respondents due to the added complexity. Another pitfall with this refinement is tied to the assumption of independence of attacks. As pointed out, this assumption is problematic. A little thought also shows that the validity of this assumption deteriorates as the unit of time is shortened. In the extremes, this is most readily appreciable: migraine attacks in 2 consecutive weeks might actually be independent, while whether one has a headache in the next second is all but determined by the current state of affairs. Hence, the question, ubiquitous in all modeling, which must be answered empirically prior to adopting the PC as a standard is not whether the model faithfully represents reality, but rather whether it represents reality sufficiently well for the task at hand. We only remark that such a time refinement could of course be combined with a more elaborate extension of the two-parameter model, as outlined above.

For the time being, if a true association is paramount for study design, we encourage the use of the PC (as presented in this work, not as originally proposed) as a method to exclude patients whose menstruation–migraine correlation might be spurious. Of course, an important step before integrating this methodology in official diagnostic criteria will be to validate the PC against clinical data.

STATEMENT OF AUTHORSHIP

Category 1
(a) Conception and Design: Mathias Barra; Kjersti G Vetvik; Fredrik A Dahl
(b) Acquisition of Data: N/A
(c) Analysis and Interpretation of Data: Mathias Barra; Kjersti G Vetvik; Fredrik A Dahl

Category 2
(a) Drafting the Manuscript: Mathias Barra
(b) Revising It for Intellectual Content: Mathias Barra; Kjersti G Vetvik; Fredrik A Dahl

Category 3
(a) Final Approval of the Completed Manuscript: Mathias Barra; Kjersti G Vetvik; Fredrik A Dahl

REFERENCES

1. MacGregor EA, Brandes J, Eikermann A, Giammarco R. Impact of migraine on patients and their families: The Migraine and Zolmitriptan Evaluation (MAZE) survey-phase III. Curr Med Res Opin. 2004;20:1143-1150.
2. Hutchinson SH, Peterlin BL, eds. Menstrual Migraine. New York: Oxford University Press; 2008.
3. Bille B. A 40-year follow-up of school children with migraine. Cephalalgia. 1997;17:488-491.
4. Lipton RB, Bigal ME, Diamond M, Freitag F, Reed ML, Stewart WF. Migraine prevalence, disease burden, and the need for preventive therapy. Neurology. 2007;68:343-349.
5. Stewart WF, Wood C, Reed ML, Roy J, Lipton RB. Cumulative lifetime migraine incidence in women and men. Cephalalgia. 2008;28:1170-1178.
6. Buse DC, Loder EW, Gorman JA, et al. Sex differences in the prevalence, symptoms, and associated features of migraine, probable migraine and other severe headache: Results of the American Migraine Prevalence and Prevention (AMPP) study. Headache. 2013;53:1278-1299.
7. Stewart WF, Lipton RB, Chee E, Sawyer J, Silberstein SD. Menstrual cycle and headache in a population sample of migraineurs. Neurology. 2000;55:1517-1523.
8. MacGregor EA, Hackshaw A. Prevalence of migraine on each day of the natural menstrual cycle. Neurology. 2004;63:351-353.
9. MacGregor EA, Frith A, Ellis J, Aspinall L, Hackshaw A. Incidence of migraine relative to menstrual cycle phases of rising and falling estrogen. Neurology. 2006;67:2154-2158.
10. MacGregor EA, Victor TW, Hu X, et al. Characteristics of menstrual vs nonmenstrual migraine: A post hoc, within-woman analysis of the usual-care phase of a nonrandomized menstrual migraine clinical trial. Headache. 2010;50:528-538. doi: 10.1111/j.1526-4610.2010.01625.x.
11. Pinkerman B, Holroyd K. Menstrual and nonmenstrual migraines differ in women with menstrually-related migraine. Cephalalgia. 2010;30:1187-1194. doi: 10.1177/0333102409359315.
12. Headache Classification Subcommittee of the International Headache Society. The International Classification of Headache Disorders: 2nd edition. Cephalalgia. 2010;24(Suppl. 1):9-160.
13. Headache Classification Subcommittee of the International Headache Society. The International Classification of Headache Disorders: 3rd edition (beta version). Cephalalgia. 2013;33:629-808.
14. MacGregor EA. Classification of perimenstrual headache: Clinical relevance. Curr Pain Headache Rep. 2012;16:452-460. doi: 10.1007/s11916-012-0282-y.
15. Somerville BW. The role of estradiol withdrawal in the etiology of menstrual migraine. Neurology. 1972;22:355-365.
16. Nattero G, Allais G, De Lorenzo C, et al. Relevance of prostaglandins in true menstrual migraine. Headache. 1998;29:233-238.
17. Vetvik KG, MacGregor EA, Lundqvist C, Russell MB. Prevalence of menstrual migraine: A population-based study. Cephalalgia. 2014;34:280-288. doi: 10.1177/0333102413507637.
18. Marcus DA, Bernstein CD, Sullivan EA, Rudy TE. A prospective comparison between ICHD-II and probability menstrual migraine diagnostic criteria. Headache. 2010;59:539-550. doi: 10.1111/j.1526-4610.2010.01627.x.
19. Lydersen S, Fagerland MW, Laake P. Recommended tests for association in 2 × 2 tables. Stat Med. 2009;28:1159-1175. doi: 10.1002/sim.3531.
20. Berry G, Armitage P. Mid-P confidence intervals: A brief review. The Statistician. 1995;44:417-423.
21. Hwang JTG, Yang M-C. An optimality theory for mid p-values in 2 × 2 contingency tables. Statist Sin. 2001;11:807-826.
22. Metz CE. Basic principles of ROC analysis. Semin Nucl Med. 1978;VIII:282-298.
23. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143:29-36.
24. R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0. (Available at: http://www.R-project.org/).
25. Johannes CB, Linet MS, Stewart WF, Celentano DD, Lipton RB, Szklo M. Relationship of headache to phase of the menstrual cycle among young women: A daily diary study. Neurology. 1995;45:1076-1082.
26. Sicignano A, Carozzi C, Giudici D, Merli G, Arlati S, Pulici M. The influence of length of stay in the ICU on power of discrimination of a multipurpose severity score (SAPS). ARCHADIA. Intensive Care Med. 1996;22:1048-1051.

