International Journal of Radiation Oncology • Biology • Physics

www.redjournal.org

EDITORIAL

Breast Brachytherapy Versus Whole-Breast Irradiation: Reported Differences May Be Statistically Significant but Clinically Trivial

Robert R. Kuske, MD,* and S. Stanley Young, PhD†

*Arizona Breast Cancer Specialists, Scottsdale, Arizona; and †National Institute of Statistical Sciences, Research Triangle Park, North Carolina

Received Dec 3, 2013. Accepted for publication Dec 4, 2013. Conflict of interest: none.

The article by Smith et al in this issue of the Red Journal (1) represents an extension of their prior work in this area (2), with the addition of patients treated by lumpectomy alone and the presentation of outcomes by modified ASTRO guidelines. Like the earlier study, which used a full Medicare sample, the current analysis, which uses the SEER-Medicare database, has significant limitations that, in our opinion, prevent the data from being practice changing. Indeed, despite the well-recognized statistical flaws in this type of analysis, the very small absolute differences between these 2 nonrandomized treatments could inspire women to choose the more convenient 5-day alternative that exposes less normal tissue to ionizing radiation. This study is based on "national average" outcomes, a methodology that is widely criticized in the literature.

Making valid claims from observational data sets is widely acknowledged as problematic (3). Recent evidence indicates that, when claims coming from observational studies are rigorously retested, for example in randomized clinical trials, they seldom replicate. Ioannidis (4) noted that only 1 in 6 claims coming from highly cited observational studies replicated when retested. Young and Karr (5) found 12 papers in which 52 hypotheses suggested by observational studies were tested in randomized clinical trials. None of the claims were confirmed in the expected direction; 5 were significant but in the wrong direction. Somewhat ominously, it is reported that even experimental biology claims fail to replicate 75% to 90% of the time (6, 7). Poor reproducibility of research is even being reported in the mainstream media (8). Obviously, great care needs to be exercised in the analysis of observational studies, and readers need to be wary of reported claims and conclusions.

The time period over which the data were collected (2002-2007) represents the early period of applicator-based accelerated partial breast irradiation (APBI), when treatment was limited to single-lumen balloon applicators. Breast surgeons and radiation oncologists were just entering the learning curve after the FDA approved the device in 2002. It is well known in medicine that the initial years after the introduction of a new method or technology may not reflect the point further down the learning curve when the practice and art improve. Imagine if heart transplant surgery had been condemned after its early failure rates and high complications. Most notably, the technology and dosimetry limits have vastly improved since 2007. Narrow spacing between the skin surface and the closest point of the balloon is associated with very high skin doses, complications, and poor cosmesis. Dose-volume constraints are now incorporated in newer applicators with dose-sculpting capability unavailable in the years of this study. As a result, more recent data demonstrate lower brachytherapy-related morbidities.

Logically, any comment on an observational study starts with the quality of the data and the treatment of the data even before analysis begins. Subtle nuances in the decisions made when abstracting raw data, such as the time era, patient age, and use of endocrine therapy, make the outcomes not usable by modern brachytherapists and breast surgeons. Here we are dealing with Medicare payment records rather than medical records, so imprecision and ambiguity are expected. Recent research from the Observational Medical Outcomes Partnership (OMOP) (9) delves into seemingly simple things that, at the time of their initial research, were thought to be completely unimportant: How were the raw data converted into the data file that was subjected to statistical analysis? Consider the statement "…evaluate all women in the country age ≥67 years with fee-for-service Medicare coverage treated with lumpectomy for invasive breast cancer from 2002-2007." A number of choices go into that decision: age, fee-for-service, invasive breast cancer, and years of coverage. OMOP in its research systematically tried multiple combinations of these decisions. For example, they might use several cut points of 60, 63, 66, 69, and 71 years for age, in combination with invasive/noninvasive disease and different start and end years of coverage. Indeed, the choices changed from the initial article by Smith et al to this one, with the inclusion of lumpectomy alone and the addition of ductal carcinoma in situ (DCIS). Using a computer farm, all combinations were used to create hundreds of "analysis data sets." Each analysis data set was then subjected to a number of standard statistical analysis methods. Unexpectedly, the results ranged widely from "protection" to "harm" as a function of how the data were staged. Risk ratios varied from 0.50 (protection) to 2.00 (harm), or even wider. OMOP researchers comment that a risk ratio needs to be 3 or 4 before causation should be considered.

If experts use intimate knowledge of the subject matter to stage the data, they can select 1 of the many possible analytical data sets. At a minimum, a systematic sensitivity analysis should be conducted to better understand the range of answers that are possible from the data. Important data sets should be available so that the robustness of claims can be examined. Without access to the data set, the process comes down to "trust me science," meaning that the reader has to trust that all data quality and data handling decisions were sufficiently correct that claims would not change if alternatives were used. For example, interstitial brachytherapy was the only APBI method for the 11 years before 2002 and treats 2 cm of tissue rather than the 1 cm treated with balloon catheters, yet those years were excluded. The reader has to hope that some sort of exploratory analysis did not guide the choices to a particular conclusion.

Let us now turn to statistical analysis. Assuming that the base method of statistical analysis is sound, most likely true, there is the important issue of how many statistical tests were computed. How many questions are at issue? With very large, complex observational data sets, it is common to test many questions. Smith et al had their pick of a number of endpoints. Death is a common and unambiguous endpoint. Mastectomy is a more ambiguous endpoint, and as they and others have noted, the Medicare records do not note whether it is the same breast as the original tumor. Furthermore, there are many noncancer reasons women may have had a mastectomy: false-positive MRI findings (an issue with balloon brachytherapy during the early years, when rim enhancement was misinterpreted by radiologists); compromised cosmesis as a result of an inordinately high skin dose before dose limits (NSABP/RTOG quality assurance) were known; significant breast asymmetry/volume loss, with plastic surgeons offering to "fix the appearance of the breast"; persistent seroma, which is now unacceptable as an indication for mastectomy; uncontrolled pain; or subsequent positive genetic susceptibility (BRCA) testing. One unknown of public health significance is the number of women between 2002 and 2007 who had unnecessary mastectomies because they did not have access to brachytherapy APBI and either could not travel for the 6 to 7 weeks of whole-breast irradiation (WBI) or rejected that option.
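
Returning to the data-staging issue, the sketch below illustrates the kind of systematic sensitivity analysis OMOP describes. Everything in it is hypothetical: the records are simulated, and the staging choices (age cut points, histology rule, start and end years) are stand-ins for the decisions discussed above, not the ones Smith et al actually faced. The point is only that a handful of defensible choices multiply into dozens of analysis data sets, each with its own risk ratio.

```python
# Illustrative OMOP-style sensitivity analysis over data-staging choices.
# All data are simulated; none of this reflects the SEER-Medicare records.
from itertools import product

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical claims-like records: age, histology, diagnosis year,
# treatment arm (APBI vs WBI), and subsequent mastectomy (yes/no).
n = 20_000
records = pd.DataFrame({
    "age": rng.integers(55, 90, n),
    "invasive": rng.random(n) < 0.8,
    "year": rng.integers(2000, 2010, n),
    "apbi": rng.random(n) < 0.05,
    "mastectomy": rng.random(n) < 0.02,
})

# Alternative, equally defensible staging decisions.
age_cutpoints = [60, 63, 66, 69, 71]
histology_rules = ["invasive_only", "all"]
start_years = [2000, 2002, 2004]
end_years = [2007, 2009]

risk_ratios = []
for age_min, hist, y0, y1 in product(age_cutpoints, histology_rules,
                                     start_years, end_years):
    subset = records[(records.age >= age_min) & records.year.between(y0, y1)]
    if hist == "invasive_only":
        subset = subset[subset.invasive]
    p_apbi = subset.loc[subset.apbi, "mastectomy"].mean()
    p_wbi = subset.loc[~subset.apbi, "mastectomy"].mean()
    if p_wbi > 0 and not np.isnan(p_apbi):
        risk_ratios.append(p_apbi / p_wbi)

print(f"{len(risk_ratios)} analysis data sets; "
      f"risk ratios range from {min(risk_ratios):.2f} to {max(risk_ratios):.2f}")
```

Even with purely random data, the estimated risk ratio moves from one staging to the next; with real records, where the choices interact with genuine clinical differences, that spread is exactly the sensitivity analysis we argue should accompany any single reported estimate.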
Multiple endpoints were examined, each with 1 or 2 statistical methods. Was the statistical analysis protocol prespecified, as would have been the case in a clinical trial? And was this analysis protocol filed before the study was conducted, again as would be the case with a randomized clinical trial?
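
For a rough sense of scale (again with simulated data, not the study's records), the snippet below tests many endpoint-by-model combinations in which there is no true treatment effect at all and counts how many clear the conventional P < .05 threshold before and after a simple Bonferroni correction.

```python
# Multiplicity illustration: many endpoints x many model specifications,
# all simulated under the null (no true effect), still yield nominally
# "significant" unadjusted P values at roughly the expected 5% rate.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)

n_endpoints = 6    # e.g. mastectomy, infection, pain, fat necrosis, ...
n_models = 20      # alternative covariate-adjustment specifications
alpha = 0.05

p_values = []
for _ in range(n_endpoints * n_models):
    # Both arms drawn from the same distribution: any "difference" is noise.
    apbi_arm = rng.normal(0.0, 1.0, 500)
    wbi_arm = rng.normal(0.0, 1.0, 500)
    p_values.append(ttest_ind(apbi_arm, wbi_arm).pvalue)

p_values = np.array(p_values)
print("nominally significant, unadjusted:", int((p_values < alpha).sum()))
print("significant after Bonferroni:", int((p_values < alpha / p_values.size).sum()))
```

Roughly 5% of such null comparisons fall below .05 by chance alone, which is why unadjusted P values from many looks at the same data overstate certainty.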

Another multiplicity question relates to the statistical models used to attempt to equalize the risk profiles of the 2 groups. Multiple linear regression can be used to try to make the 2 groups comparable for risk factors. We count 15 covariates that can be used for adjustment, so a very large number of models is possible: 2^15 = 32,768. In fact, getting comparable groups in this case is problematic, as doctors will likely channel patients to 1 treatment or another. The starting data set is impressively large, on the order of 130,000 patients, but there are only about 7000 patients treated with brachytherapy; the effective sample size is much closer to 7000. P values quoted for this study do not adjust for multiple endpoints and multiple modeling and therefore do not fully reflect the variability in the study.

Biases are systematic differences between groups. Biases will be present unless all the correct covariates are used, and used in the correct statistical model. As the sample size increases, the effects of random error can be reduced, but any bias is expected to remain (10). Depending on how the data are staged and how the modeling is done, this bias can favor either treatment. OMOP recommends that risk ratios less than 3 or 4 not be taken as supportive of cause and effect. Based on practical experience, Temple (11) also suggests risk ratios of 3 or 4 as the level at which observed effects might be considered real.

With large data sets, small P values are easy to obtain. In this study, are the differences clinically important or meaningful? The rate of subsequent mastectomy with brachytherapy APBI is 2.8%, versus 1.3% with WBI. Even with older technology, poorly understood selection criteria, and unknown dosimetric limits to normal tissues in the early years of APBI, should a 1.5% difference be enough to change a woman's mind about a 5-day treatment/procedure? Similarly, most of the toxicities that were analyzed are expected with the invasive insertion of a catheter that remains in the patient for 5 to 7 days. Infection rates differed by 5%, breast pain by 6%, and fat necrosis by 7.6% between APBI and WBI. These outcomes are statistically different but clinically trivial, and at best they should be presented to the patient for her decision about their importance to her, along with the cardiac risk of WBI in left-breast cancers and the doubled pneumonitis risk of WBI relative to APBI.

Age is a critical factor in the ASTRO guidelines for APBI. Because all patients in this study are over the age of 67, a key patient selection factor is eliminated from the analysis, making the division of outcomes into "Suitable," "Cautionary," or "Unsuitable Unless on Clinical Trial" of questionable value.

The larger scientific body of literature supports the continued use and study of brachytherapy APBI, and this study should not slow its momentum. Multiple randomized, prospective clinical trials have been completed, with maturing data, so ultimately we will have a scientifically valid head-to-head comparison of APBI versus WBI. Of course, neither treatment may be the winner in every situation. Depending on details mostly not presented in this study, WBI might have a better risk/benefit trade-off in some subgroups, and brachytherapy APBI in others. Both options should be available to select women with breast cancer.
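
Finally, the thesis of this editorial, statistically significant but clinically trivial, can be put in numbers. The sketch below applies a standard two-proportion z test to mastectomy rates of the magnitude reported (2.8% vs 1.3%), with group sizes only approximated from the figures above; the exact counts and the choice of test are ours, not the authors'.

```python
# A 1.5 percentage-point absolute difference is "highly significant" at this
# sample size, which says little about whether it is clinically meaningful.
# Group sizes are rough approximations from the text, not the actual counts.
import numpy as np
from scipy.stats import norm

n_apbi, n_wbi = 7_000, 120_000      # approximate group sizes
p_apbi, p_wbi = 0.028, 0.013        # subsequent-mastectomy rates

# Standard two-proportion z test with a pooled variance estimate.
pooled = (n_apbi * p_apbi + n_wbi * p_wbi) / (n_apbi + n_wbi)
se = np.sqrt(pooled * (1 - pooled) * (1 / n_apbi + 1 / n_wbi))
z = (p_apbi - p_wbi) / se
p_value = 2 * norm.sf(abs(z))

print(f"absolute difference = {(p_apbi - p_wbi) * 100:.1f} percentage points, "
      f"z = {z:.1f}, two-sided P = {p_value:.1e}")
```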

References

1. Smith GL, Jiang J, Buchholz TA, et al. Benefit of adjuvant brachytherapy versus external beam radiation for early breast cancer: Impact of patient stratification on breast preservation. Int J Radiat Oncol Biol Phys 2014;88:274-284.
2. Smith GL, Xu Y, Buchholz TA, et al. Association between treatment with brachytherapy vs whole breast irradiation and subsequent mastectomy, complications, and survival among older women with invasive breast cancer. JAMA 2012;307:1827-1837.
3. Taubes G, Mann CC. Epidemiology faces its limits. Science 1995;269:164-169.
4. Ioannidis J. Why most published research findings are false. PLoS Med 2005;2:e124.
5. Young SS, Karr A. Deming, data and observational studies. Significance 2011;8:116-120.
6. Prinz F, Schlange T, Asadullah K. Believe it or not: How much can we rely on published data on potential drug targets? Nat Rev Drug Discov 2011;10:712-713.
7. Begley CG, Ellis LM. Raise standards for preclinical cancer research. Nature 2012;483:531-533.
8. Unreliable research: Trouble at the lab. Economist. October 19, 2013. Available at: www.economist.com/news/briefing/21588057.
9. Observational Medical Outcomes Partnership. Available at: omop.fnih.org.
10. Young SS. Statistical analyses and interpretation of complex studies: A little math will help us get started. 2008. Available at: http://www.medscape.org/viewarticle/571523_2.
11. Temple R. Meta-analysis and epidemiologic studies in drug development and post marketing surveillance. JAMA 1999;281:841-844.
