RESEARCH—HUMAN—CLINICAL STUDIES RESEARCH—HUMAN—CLINICAL STUDIES

Helen Kim, PhD*‡§ Adib A. Abla, MD¶ Jeffrey Nelson, MS*§

Validation of the Supplemented Spetzler-Martin Grading System for Brain Arteriovenous Malformations in a Multicenter Cohort of 1009 Surgical Patients

Charles E. McCulloch, PhD‡§ David Bervini, MDk Michael K. Morgan, MDk Christopher Stapleton, MD# Brian P. Walcott, MD# Christopher S. Ogilvy, MD# Robert F. Spetzler, MD** Michael T. Lawton, MD*¶§ *Department of Anesthesia and Perioperative Care, ‡Department of Epidemiology and Biostatistics, §Center for Cerebrovascular Research, and ¶Department of Neurological Surgery, University of California, San Francisco, California; kDepartment of Neurological Surgery, Macquarie University, Sydney, Australia; #Department of Neurological Surgery, Massachusetts General Hospital, Boston, Massachusetts; **Division of Neurological Surgery, Barrow Neurological Institute, Phoenix, Arizona Correspondence: Michael T. Lawton, MD, Department of Neurological Surgery, University of California, San Francisco, 505 Parnassus Ave, M780, Box 0112, San Francisco, CA 94143-0112. E-mail: [email protected] Received, June 23, 2014. Accepted, August 25, 2014. Published Online, September 22, 2014. Copyright © 2014 by the Congress of Neurological Surgeons.

SANS LifeLong Learning and NEUROSURGERY offer CME for subscribers that complete questions about featured articles. Questions are located on the SANS website (http://sans.cns.org/). Please read the featured article and then log into SANS for this educational offering.

NEUROSURGERY

BACKGROUND: The supplementary grading system for brain arteriovenous malformations (AVMs) was introduced in 2010 as a tool for improving preoperative risk prediction and selecting surgical patients. OBJECTIVE: To demonstrate in this multicenter validation study that supplemented Spetzler-Martin (SM-Supp) grades have greater predictive accuracy than Spetzler-Martin (SM) grades alone. METHODS: Data collected from 1009 AVM patients who underwent AVM resection were used to compare the predictive powers of SM and SM-Supp grades. Patients included the original 300 University of California, San Francisco patients plus those treated thereafter (n = 117) and an additional 592 patients from 3 other centers. RESULTS: In the combined cohort, the SM-Supp system performed better than SM system alone: area under the receiver-operating characteristics curve (AUROC) = 0.75 (95% confidence interval, 0.71-0.78) for SM-Supp and AUROC = 0.69 (95% confidence interval, 0.65-0.73) for SM (P , .001). Stratified analysis fitting models within 3 different follow-up groupings (,6 months, 6 months-2 years, and .2 years) demonstrated that the SM-Supp system performed better than SM system for both medium (AUROC = 0.71 vs 0.62; P = .003) and long (AUROC = 0.69 vs 0.58; P = .001) follow-up. Patients with SMSupp grades #6 had acceptably low surgical risks (0%-24%), with a significant increase in risk for grades .6 (39%-63%). CONCLUSION: This study validates the predictive accuracy of the SM-Supp system in a multicenter cohort. An SM-Supp grade of 6 is a cutoff or boundary for AVM operability. Supplemented grading is currently the best method of estimating neurological outcomes after AVM surgery, and we recommend it as a starting point in the evaluation of AVM operability. KEY WORDS: Arteriovenous malformation, Microsurgery, Patient selection, Risk prediction, Spetzler-Martin grading system, Supplementary grading system Neurosurgery 76:25–33, 2015

DOI: 10.1227/NEU.0000000000000556

T

he supplementary grading system for brain arteriovenous malformations (AVMs) was introduced in 2010 as a tool for improving preoperative risk prediction and selecting patients for surgery.1 Through analysis of additional factors outside the Spetzler-Martin (SM) grading ABBREVIATIONS: AUROC, area under the receiveroperating characteristic curve; AVM, arteriovenous malformation; mRS, modified Rankin Scale; SM, Spetzler-Martin; SM-Supp, supplemented SpetzlerMartin; UCSF, University of California, San Francisco

www.neurosurgery-online.com

system that influence patient outcome after AVM resection, an analogous grading system was constructed that would supplement, rather than replace, the entrenched SM system.2 Points were assigned for the ABCs of AVMs: patient age, bleeding or hemorrhagic presentation, and AVM compactness, analogous to SM scoring (Table 1). Pediatric patients (,20 years of age) were assigned 1 point; adults (20-40 years of age) were assigned 2 points; and older patients (.40 years of age) were assigned 3 points. Patients presenting with unruptured AVMs were assigned

VOLUME 76 | NUMBER 1 | JANUARY 2015 | 25

Copyright © Congress of Neurological Surgeons. Unauthorized reproduction of this article is prohibited

KIM ET AL

TABLE 1. Comparison of the Spetzler-Martin and Supplementary Grading Systems Spetzler-Martin Grading Size, cm ,3 3-6 .6 Venous drainage Superficial Deep Eloquence No Yes Total

Points 1 2 3 0 1 0 1 5

Supplementary Grading Age, y ,20 20-40 .40 Bleeding Yes No Compactness Yes No

1 point, and those with ruptured AVMs were assigned 0 points. Diffuse AVMs were assigned 1 point; compact AVMs, 0 points. These points were added together for a supplementary AVM grade that ranged from 1 to 5. Simplicity is critical to the popularity of grading scales, and the supplementary grading scale was designed with this in mind. In addition, the 2 grading systems are analogous in their structure to make the supplementary grading scale memorable. In a consecutive surgical series of 300 patients, the supplementary grading system, sometimes referred to as the Lawton-Young grading system, had a higher predictive accuracy than the SM system and stratified surgical risk more evenly.1 A sum of the 2 scores, or the supplemented SM (SM-Supp) score, #6 identified patients with acceptably low surgical morbidity, providing a bedside numerical boundary for safe surgery. This combination of 2 grading systems has proved to be a useful way of simplifying complex treatment decisions and has become an integral part of our thinking about AVMs. However, the supplementary grading system was derived retrospectively from a single surgical series from 1 institution.1 Validation of this grading system in a larger surgical series that includes other institutions and other neurosurgeons would encourage broader application. Therefore, we assembled such a cohort from 4 recognized centers with high-volume AVM practices and highlevel neurosurgical expertise. In this report, we demonstrate that SM-Supp scores have greater predictive accuracy than SM scores alone and that supplemented scoring is currently the best method of estimating neurological outcomes after AVM surgery.

METHODS Study Population The study was approved by the University of California, San Francisco (UCSF) Committee on Human Research and was conducted in compliance with Health Insurance Portability and Accountability Act regulations. Four institutions were invited to participate: UCSF, Barrow Neurological Institute (Phoenix, Arizona), Massachusetts General

26 | VOLUME 76 | NUMBER 1 | JANUARY 2015

Hospital (Boston, Massachusetts), and Macquarie University (Sydney, Australia). The study sample consisted of patients with AVMs treated with microsurgical resection. UCSF patients included the original 300 patients plus those treated thereafter (n = 117). Patients from outside institutions included those patients treated in the last 10 years with complete clinical, angiographic, and outcome data (Barrow Neurological Institute, n = 266; Massachusetts General Hospital, n = 153; and Macquarie University, n = 173).

Study Variables Data collected from 1009 AVM patients who underwent microsurgical AVM resection were used to compare the predictive power of the SM score and the SM-Supp score. Variables included the SM grade components (AVM size, venous drainage pattern, and eloquence),2 the supplementary grade components (age at resection, hemorrhage before resection, and compactness of the AVM nidus),1 preoperative modified Rankin Scale (mRS) score, last available postoperative mRS score, and the time from resection to postoperative mRS. Data collection was harmonized through the use of standard terminology.3

Statistical Analysis Summary statistics of clinical characteristics were calculated by cohort. Differences among cohorts were assessed with x2 tests for categorical characteristics and analysis of variance for continuous characteristics. Outcomes were defined as the difference in mRS scores before and after surgery and were dichotomized into good (unchanged or improved mRS) or poor (worse mRS). The primary predictor variables were SM score (ranging from 1 to 5) and SM-Supp score (ranging from 2 to 10). Logistic regression models were adjusted further for log time from resection to last postoperative mRS assessment because patients with longer follow-up have more time to recover and time may confound the relationship between mRS score and outcome. The performance of the SM and SM-Supp scores was evaluated by comparing the areas under the receiver-operating characteristic curves (AUROCs) corresponding to SM and SM-Supp models, sometimes also referred to as c statistics.4 An AUROC of 0.5 indicates no discrimination, whereas an AUROC of 1 indicates perfect discrimination. In general, an AUROC of 0.8 is considered clinically useful.4 Data from all 4 institutions were combined to assemble 1 large cohort with a large sample size and tighter confidence intervals. To assess model fit, AUROC values were computed by use of 10-fold cross-validation, which corrects for possible overfitting and inflated AUROC values.5,6 For this procedure, we partitioned the data set into 10 mutually exclusive subsamples, each containing 10% of the data. Predicted values for subjects within a given subsample were calculated from model fitting using the other 90% of the data. Because timing of last mRS assessment is an important confounding variable that needed to be included in the logistic regression models, we performed a sensitivity analysis that removed the effect of time from the models. The data were subdivided into 3 groupings based on follow-up time (,6 months, 6 months-2 years, and .2 years), and SM and SM-Supp coefficients were calculated in each of these mutually exclusive groups. Given the lack of external validation data, 10-fold cross-validation was used for more honest assessments of the AUROC. Values of P , .05 were considered to be statistically significant. All analyses were performed with Stata/SE 13.1 software (StataCorp, College Station, Texas).

www.neurosurgery-online.com

Copyright © Congress of Neurological Surgeons. Unauthorized reproduction of this article is prohibited

VALIDATION OF THE SUPPLEMENTARY AVM GRADING SCALE

RESULTS Overall, 1009 patients were included in the combined cohort (Table 2). UCSF patients included 300 used for the original fitting of the SM-Supp model1 and 117 used as an internal replication cohort.7 Cohorts differed with respect to a number of AVM characteristics, including age at resection, distribution of AVM size, deep venous drainage, eloquence, unruptured presentation, and diffuse AVM nidus. The Massachusetts General Hospital cohort had a greater proportion of SM grade I AVMs (41% vs approximately 20% for other cohorts; P , .001). The UCSF cohort had a greater proportion of eloquent AVMs (P , .001). Compared with other cohorts, the Macquarie University cohort had a greater proportion of diffuse AVMs (40% vs 18%; P , .001) and neurologically normal patients preoperatively (preoperative mRS score of 0-1, 69% vs 52% for other cohorts; P , .001), as well as the longest average follow-up time (2.5 6 3.2 vs 1.7 6 2.0 years for other cohorts; P = .004). The predictive accuracy of the SM-Supp scores was the same in the combined replication cohort of 709 patients from all 4 institutions as in the original cohort of 300 UCSF patients used to generate the SM-Supp model (AUROC = 0.7482 vs 0.7428, respectively; P = .89; Figure 1). The UCSF internal replication cohort of 117 patients represents a prospective application of the SM-Supp system, and the SM-Supp system outperformed the SM system alone using a model fitting based on the original 300 UCSF patients (AUROC = 0.77 vs 0.71, respectively). Because of the small sample size, this difference was not statistically significant (P = .10). However, when the model was refitted in the combined cohort of 1009 patients, analysis of AUROC values for the SM and SM-Supp systems showed that the SM-Supp system still performed significantly better than the SM system alone: AUROC = 0.69 (95% confidence interval, 0.65-0.73) for SM and AUROC = 0.75 (95% confidence interval, 0.71-0.78) for SM-Supp (Figure 2). This difference was highly statistically significant (P , .001). Stratified analysis fitting models within the 3 different followup groupings (,6 months, 6 months-2 years, and .2 years) demonstrated that the SM-Supp system performed better than SM system alone for both medium follow-up (AUROC = 0.71 vs 0.62; P = .003) and long follow-up (AUROC = 0.69 vs 0.58; P = .001). Differences were not significant for short follow-up (AUROC = 0.70 vs 0.67; P = .21). Therefore, the SM-Supp system predicted poor outcomes more accurately than the SM system alone beyond 6 months independently of timing of last mRS assessment. We explored adding other risk factors or changing the age cutoffs currently in the SM-Supp model. However, neither the addition of deep perforator supply (odds ratio = 0.88; 95% confidence interval, 0.59-1.30; P = .51) nor the use of alternative age cutoffs significantly increased the predictive power of the SM-Supp score or improved the grading system (data not shown). Neurological outcomes were stratified by SM-Supp grades, with surgical morbidity increasing with increasing grade (Table 3).

NEUROSURGERY

SM-Supp grade 6 was a boundary for AVM operability. Patients with SM-Supp grades #6 had acceptably low surgical risks (0%24%), with a significant increase in risk in patients with SM-Supp grades .6 (39%-63%).

DISCUSSION Validation of the SM-Supp Grading System Patient selection is the key to avoiding surgical complications and poor neurological outcomes with microsurgical resection of brain AVMs. The wide variety of AVM anatomy, size, location, and clinical presentation makes patient selection for surgery a difficult process. Neurosurgeons have analyzed their surgical experiences to identify factors that determine the risks of surgery to assist them in this selection process. Numerous classification schemes have been developed, each with its own emphasis, accuracy, advantages, and disadvantages.8-13 These classification schemes have value because they transform complex decisions into algorithms. Some are complex and others simple, each striving to predict surgical risk and to achieve bedside applicability. The SM grading system continues to withstand the test of time and remains the predominant classification scheme.2 However, it is crude and has deficiencies such as lumping the heterogeneous group of grade III AVMs together without clarifying the management of subtypes.14 The SM system also has redundancies, with low-grade AVMs managed similarly with surgery and high-grade AVMs managed conservatively. Therefore, Spetzler and Ponce15 condensed the original 5-tier grading system into 3 tiers and made broad treatment recommendations based on AVM class. Proponents of this simplification emphasize that the fewer classes correspond more directly with treatment recommendations. Opponents of this simplification argue that it does not simplify the analysis because the same scoring steps of the original SM scale are required along with an additional step to reclassify the AVM. Opponents also emphasize that the classspecific recommendations are vague and encumbered with exceptions. For example, the class system still does not shed light on the heterogeneous grade III lesions. Patient selection is a sophisticated process that requires more complexity, not less, which is why the supplementary grading system was proposed. This study validates the SM-Supp grading system in a large, multicenter cohort of .1000 surgical patients. The AUROC value for the SM-Supp system was better than for the SM system alone (AUROC = 0.75 vs 0.69; P , .001; Figure 2). Prospective application of the SM-Supp system in a small subgroup of patients resulted in an even higher AUROC value (AUROC = 0.77),7 indicating improved predictive accuracy when the system becomes part of the decision-making process. This study also confirmed that an SM-Supp grade of 6 was a cutoff or boundary for AVM operability. SM-Supp grading embodies 6 of the most important surgical variables, stratifies AVMs across 9 possible grades, and has 1 clear threshold value, yet it remains simple, memorable, and applicable at the bedside in our clinical experience. Statistical validation showing improved predictive

VOLUME 76 | NUMBER 1 | JANUARY 2015 | 27

Copyright © Congress of Neurological Surgeons. Unauthorized reproduction of this article is prohibited

KIM ET AL

TABLE 2. Comparison of Patients and Arteriovenous Malformation Characteristics in Cohorts at 4 Institutionsa

Age at resection, y ,20 20-40 .40 AVM size, cm ,3 3-6 .6 Deep venous drainage Eloquence Nonhemorrhagic presentation Diffuse nidus SM grade 1 2 3 4 5 SM-Supp grade 2 3 4 5 6 7 8 9 10 Preoperative mRS score 0 1 2 3 4 5 Postoperative mRS score 0 1 2 3 4 5 6 Worsening of mRS score Time from surgery to last mRS, y

UCSF Retrospective (n = 300), n (%)

UCSF Prospective (n = 117), n (%)

Barrow Neurological Institute (n = 266), n (%)

Massachusetts General Hospital (n = 153), n (%)

Macquarie University (n = 173)

47 (16) 113 (38) 140 (47)

33 (28) 29 (25) 55 (47)

61 (23) 91 (34) 114 (43)

10 (7) 44 (29) 99 (65)

29 (17) 74 (43) 70 (40)

194 101 5 130

75 40 2 48

154 105 7 107

P Value ,.001

,.001 (65) (34) (2) (43)

(64) (34) (2) (41)

(58) (39) (3) (40)

110 42 1 40

(72) (27) (1) (26)

63 102 8 62

(36) (59) (4) (36)

.007

160 (53) 143 (48)

70 (60) 48 (41)

118 (44) 138 (52)

43 (28) 81 (53)

73 (42) 112 (65)

,.001 .001

37 (12)

15 (13)

59 (22)

42 (27)

69 (40)

,.001

55 122 92 29 2

(18) (41) (31) (10) (1)

25 36 43 12 1

(21) (31) (37) (10) (1)

64 92 79 30 1

(24) (35) (30) (11) (,1)

63 56 31 3 0

(41) (37) (20) (2) (0)

28 69 48 24 6

(16) (40) (28) (14) (2)

8 20 55 92 72 41 9 3 0

(3) (7) (18) (31) (24) (14) (3) (1) (0)

5 8 27 31 31 9 3 2 1

(4) (7) (23) (27) (27) (8) (3) (2) (1)

8 23 54 66 65 32 15 3 0

(3) (9) (20) (25) (24) (12) (6) (1) (0)

2 5 35 63 19 23 5 1 0

(1) (3) (23) (41) (12) (15) (3) (1) (0)

1 4 32 39 45 33 16 2 1

(1) (2) (19) (23) (26) (19) (9) (1) (1)

.001

,.001 85 65 33 55 33 29

(28) (22) (11) (18) (11) (10)

26 33 21 16 9 12

(22) (28) (18) (14) (8) (10)

41 109 60 26 12 18

(15) (41) (23) (10) (5) (7)

22 53 39 17 13 9

(14) (35) (25) (11) (9) (6)

94 26 25 13 3 12

(54) (15) (14) (8) (2) (7) ,.001

96 88 52 21 15 1 27 77

(32) (29) (17) (7) (5) (,1) (9) (26)

1.69 6 1.69

43 34 25 9 5 0 1 25

(37) (29) (21) (8) (4) (0) (1) (21)

0.99 6 1.05

91 74 54 22 13 6 6 51

(34) (28) (20) (8) (5) (2) (2) (19)

2.12 6 2.35

28 54 45 16 3 2 5 35

(18) (35) (29) (10) (2) (1) (3) (23)

1.80 6 2.42

99 27 33 6 0 1 7 34

(57) (16) (19) (3) (0) (1) (4) (20)

2.51 6 3.17

.008b

a

mRS, modified Rankin Scale; SM, Spetzler-Martin; SM-Supp, supplemented Spetzler-Martin; UCSF, University of California, San Francisco. Values are the number observed with the specified value (and percentage) or the mean 6 SD. P values are derived from a x2 test or analysis of variance. b This P value is based on a comparison of log-transformed values.

28 | VOLUME 76 | NUMBER 1 | JANUARY 2015

www.neurosurgery-online.com

Copyright © Congress of Neurological Surgeons. Unauthorized reproduction of this article is prohibited

VALIDATION OF THE SUPPLEMENTARY AVM GRADING SCALE

FIGURE 1. Graph showing receiver-operating characteristic analyses for the supplemented Spetzler-Martin (SM-Supp) grading system in the combined replication cohort of 709 patients from all 4 institutions (orange curve) and the original cohort of 300 University of California, San Francisco patients used to generate the SM-Supp model (green curve). There were no significant differences in predictive accuracy in these 2 groups (area under the receiveroperating characteristic curve = 0.7482 vs 0.7428, respectively; P = .89).

accuracy in a large cohort should encourage others to adopt this augmented grading system. Elements of Supplementary Grading The supplementary grading system works because patient age, hemorrhagic presentation, and compactness have important surgical consequences. Youth is associated with tolerance of surgery, recoverability, and neural plasticity and is therefore weighted heavily in the supplementary grading system. AVM hemorrhage separates portions of the nidus from adjacent brain, thereby performing some parenchymal dissection. Hematoma evacuation immediately accesses this plane as if it were a free cortical surface and opens working room. Hematomas often extend to the brain surface and localize AVMs that are not cortically based. Hematomas lead to deep AVMs without a superficial draining vein, and clot evacuation opens nonanatomic corridors of exposure that do not exist in patients with unruptured AVMs. Hematoma above the nidus opens a transcortical corridor to its top side, whereas hematoma adjacent to the nidus dissects its parenchymal sides, and hematoma beneath the nidus facilitates deep dissection. Accessing an AVM through the hematoma cavity requires no corticectomy, produces no additional morbidity, and reaches AVMs in deep lobar white matter, basal ganglia, and

NEUROSURGERY

FIGURE 2. Graph showing receiver-operating characteristic analyses for the Spetzler-Martin grading system (blue curve) and the supplemented SpetzlerMartin (SM-Supp) grading system (red curve) in the combined cohort of 1009 patients. The predictive accuracy of the SM-Supp grading system was greater than that of the Spetzler-Martin grading system (area under the receiveroperating characteristics curve = 0.75 vs 0.69, respectively; P , .001).

thalamus. Absorption of hematoma after rupture reduces, but does not eliminate, these surgical advantages. Blood is replaced with chronic inflammation, scar tissue, gliosis, and encephalomalacia. Old hemorrhage cavities can be exploited as routes of AVM access, and thickened gliotic planes handle better and separate more cleanly than soft, friable, unbled planes. A compact AVM is wound tightly with distinct margins, no intermixed brain, and separability from adjacent brain, all of which clarify parenchymal dissection. In contrast, a diffuse AVM is wound loosely, as if pulled apart or unraveled, with indistinct margins, intermixed brain, and poor separability, all of which complicate parenchymal dissection. Diffuseness comes in many variations: a lacy web of small arterioles; tentacles of large arteries; a ragged lattice of arteries and arterioles spread across a pial surface; a network of leptomeningeal collateral arteries in a watershed zone between vascular territories; or wisps of perforating arteries penetrating the nidus from below. Compact AVMs are easily deciphered and followed, whereas diffuse AVMs force the neurosurgeon to establish the plane of separation between the AVM and brain. The margin of a diffuse AVM can be an unruly fringe that must be encircled either widely at the expense of intervening brain tissue or closely in a way that crosses more of this arterial fringe. The interplay of eloquence and hemostasis pulls in and pushes back the circumdissection, challenging the neurosurgeon to find the right dissection distance. In

VOLUME 76 | NUMBER 1 | JANUARY 2015 | 29

Copyright © Congress of Neurological Surgeons. Unauthorized reproduction of this article is prohibited

KIM ET AL

TABLE 3. Neurological Outcomes Associated With Spetzler-Martin Grades, Supplementary Grades, and Supplemented Spetzler-Martin Grades Neurological Outcome Worse

Spetzler-Martin grade I II III IV V Supplementary grade I II III IV V Supplemented Spetzler-Martin grade 2 3 4 5 6 7 8 9 10

Improved or Unchanged

n

%

n

%

17 80 84 37 4

7 21 29 38 50

218 295 209 61 4

93 79 71 62 50

5 30 70 92 25

6 16 18 32 44

84 161 314 196 32

94 84 82 68 56

0 1 21 54 56 54 30 6 0

0 2 10 19 24 39 63 55 0

24 59 182 237 176 84 18 5 2

100 98 90 81 76 61 38 45 100

some cases, this judgment can bring the dissection too close to the nidus and cause bleeding or leave behind abnormal portion of the nidus. In other cases, this judgment can bring the dissection too far from the nidus, resulting in complete and easier AVM resection but removing more brain tissue than is necessary. The difficulties in circumdissecting diffuse AVMs can lead to tissue injury, parenchymal bleeding, contusions, postoperative edema, seizures, and delayed hemorrhagic complications, all of which affect patient outcomes and account for its place in the supplementary grading system. Limitations This study is limited by selection and inclusion biases. Operated patients were highly selected by the participating neurosurgeons, and selection criteria were not uniform across centers. Only those patients with complete clinical and radiographic data necessary for grading and outcome assessment were included, resulting in an unknown number of excluded patients. Participating neurosurgeons have different surgical techniques, technical skills, and experience, but each neurosurgeon directs a busy AVM service, has experience with .500 AVM operations, and has a high level of surgical expertise. This study was a retrospective review of clinical and radiographic data, with the exception of the 117 patients

30 | VOLUME 76 | NUMBER 1 | JANUARY 2015

from UCSF analyzed prospectively after the creation of the supplementary grading system.7 However, each of these centers maintains an active vascular registry, and much of the data in these other 892 patients were collected prospectively. Scoring diffuseness is a limitation of the supplementary grading system because it is difficult to quantify and may vary between observers. In fact, diffuseness scores varied between 12% and 40% at different centers. We developed methods using digital angiography, contour plots, and area-intensity profiles to quantify diffuseness.16 Computer algorithms generate outlines around AVMs at set threshold image intensities, which are converted to AVM areas.16 As the threshold intensity is dialed down, the area of the contour plot increases linearly with compact AVMs but exponentially with diffuse AVMs. In other words, as the AVM is outlined more generously, compact AVMs grow a little and diffuse AVMs grow a great deal. Area-intensity profiling is a cumbersome research tool for defining diffuseness and is not applicable at the bedside. Instead, as a general definition, a circle can be drawn easily around a compact AVM on an angiogram but not so easily around a diffuse one. The subjectivity of diffuseness should not taint the supplementary grading system. Eloquence is also somewhat subjective and imprecise, given the known translocations of neurological function associated with AVMs, and this has not tainted the SM grading system. In fact, eloquence scores also varied in this study between 28% and 60% at different centers, similar to the range in diffuseness scores. Although critics might suggest that an AUROC value of 0.75 is low and question the predictive power of the SM-Supp grading system, this value exceeds that of the SM grading system (0.69), which remains the gold standard with which all other grading systems are compared. Establishing a cutoff for AVM operability requires an evaluation of the natural history of the AVM, the morbidity of therapeutic alternatives, the patient’s age and comorbidities, and his or her unique emotional state, including treatment preferences, expectations, risk tolerance, and anxieties. These factors influence the cutoff for AVM operability in each individual patient and make it difficult to establish a universal cutoff. Although it may seem arbitrary in this context, our cutoff score of 6 reflects a significant increase in morbidity observed in patients with SM-Supp score .6, as well as a risk (0%-24%) that many AVM patients might find acceptable in pursuit of AVM cure. Our stated cutoff should be viewed as a guideline, and we encourage individualizing AVM treatment recommendations.

CONCLUSION This study demonstrates the predictive accuracy of the SMSupp system in a large, multicenter cohort and validates the use of SM-Supp scores. We recommend it as a starting point in the evaluation of AVM operability. No grading system, combination of grading systems, or simple algorithm can replace the discriminating process of patient selection. We offer this supplementary grading system as just another tool to guide the process of analyzing some of the critical factors that influence patient

www.neurosurgery-online.com

Copyright © Congress of Neurological Surgeons. Unauthorized reproduction of this article is prohibited

VALIDATION OF THE SUPPLEMENTARY AVM GRADING SCALE

outcome to help neurosurgeons to make more rational choices when weighing known risk of spontaneous AVM rupture against risk of intervention. The supplementary grading system is intended to improve preoperative risk prediction, and we expect that it will assist in patient selection for surgery. The current crop of grading systems is generally useful but imperfect and evolving. As the pathophysiology, hemodynamics, and genetics of AVMs are elucidated through research, grading systems will incorporate these advances. Therefore, the work of developing AVM grading systems should be viewed as an ongoing process, and clinicians should be open to revising established, proven grading systems. Disclosure This work was funded in part by a grant from the National Institutes of Health (R01 NS034949; principal investigator, Dr Kim). The authors have no personal, financial, or institutional interest in any of the drugs, materials, or devices described in this article.

REFERENCES 1. Lawton MT, Kim H, McCulloch CE, Mikhak B, Young WL. A supplementary grading scale for selecting patients with brain arteriovenous malformations for surgery. Neurosurgery. 2010;66(4):702-713; discussion 713. 2. Spetzler RF, Martin NA. A proposed grading system for arteriovenous malformations. J Neurosurg. 1986;65(4):476-483. 3. Joint Writing Group of the Technology Assessment Committee American Society of Interventional and Therapeutic Neuroradiology; Joint Section on Cerebrovascular Neurosurgery a Section of the American Association of Neurological Surgeons and Congress of Neurological Surgeons; Section of Stroke and the Section of Interventional Neurology of the American Academy of Neurology, Atkinson RP, Awad IA, Batjer HH, et al. Reporting terminology for brain arteriovenous malformation clinical and radiographic features for use in clinical trials. Stroke. 2001;32(6):1430-1442. 4. Harrell FE. Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. New York, NY: Springer; 2001. 5. Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection. Paper presented at: International Joint Conference on Artificial Intelligence; August 20-25, 1995; Montreal, Quebec, Canada. 6. Westerhuis JA, Hoefsloot HC, Smit S, et al. Assessment of PLSDA cross validation. Metabolomics. 2008;4(1):81-89. 7. Kim H, Pourmohamad T, Westbroek EM, McCulloch CE, Lawton MT, Young WL. Evaluating performance of the Spetzler-Martin supplemented model in selecting patients with brain arteriovenous malformation for surgery. Stroke. 2012;43(9):2497-2499. 8. Davies JM, Kim H, Young WL, Lawton MT. Classification schemes for arteriovenous malformations. Neurosurg Clin N Am. 2012;23(1):43-53. 9. Griessenauer CJ, Miller JH, Agee BS, et al. Observer reliability of arteriovenous malformations grading scales using current imaging modalities. J Neurosurgery. 2014;120(5):1179-1187. 10. Hollerhage HG, Dewenter KM, Dietz H. Grading of supratentorial arteriovenous malformations on the basis of multivariate analysis of prognostic factors. Acta Neurochir (Wien). 1992;117(3-4):129-134. 11. Pertuiset B, Ancri D, Kinuta Y, et al. Classification of supratentorial arteriovenous malformations: a score system for evaluation of operability and surgical strategy based on an analysis of 66 cases. Acta Neurochir (Wien). 1991; 110(1-2):6-16. 12. Spears J, Terbrugge KG, Moosavian M, et al. A discriminative prediction model of neurological outcome for patients undergoing surgery of brain arteriovenous malformations. Stroke. 2006;37(6):1457-1464. 13. Tamaki N, Ehara K, Lin TK, et al. Cerebral arteriovenous malformations: factors influencing the surgical difficulty and outcome. Neurosurgery. 1991;29(6):856861; discussion 861-863.

NEUROSURGERY

14. Lawton MT; UCSF Brain Arteriovenous Malformation Study Project. SpetzlerMartin grade III arteriovenous malformations: surgical results and a modification of the grading scale. Neurosurgery. 2003;52(4):740-748; discussion 748-749. 15. Spetzler RF, Ponce FA. A 3-tier classification of cerebral arteriovenous malformations: clinical article. J Neurosurg. 2011;114(3):842-849. 16. Du R, Keyoung HM, Dowd CF, Young WL, Lawton MT. The effects of diffuseness and deep perforating artery supply on outcomes after microsurgical resection of brain arteriovenous malformations. Neurosurgery. 2007;60(4):638646; discussion 646-648.

COMMENTS

T

he authors present a detailed analysis of a multicenter cohort of 1009 surgical patients in whom the predictive powers of the SpetzlerMartin (SM) grades and supplemented Spetzler-Martin (SM-Suppl) grades were compared. Included in the cohort were the initial 300 patients from the SM-Suppl article published in 2010. The study variables included the SM grade components (size, venous drainage, and eloquence), the SM-Suppl grade components (age, hemorrhage, and compactness), preoperative and postoperative modified Rankin Scale (mRS) score, and the time from surgery to postoperative mRS. Outcomes were categorized as good (unchanged or improved mRS score) or poor (worse mRS score). Statistical analysis used included x2 tests, analysis of variance, and logistic regression modeling. The main findings were as follows:

1. The predictive accuracy of the SM-Suppl scores was the same in the original cohort of 300 patients as in the combined cohort of 709 patients from all 4 institutions. 2. When the original cohort of 300 patients was compared with the internal cohort of 117 from the same institution, the SMSuppl system outperformed the SM system alone, but the difference was not statistically significant. When the larger combined cohort of 1009 patients was analyzed with the 2 systems, the SM-Suppl performed significantly better than the SM system (P , .001). 3. The SM-Suppl system also performed better than the SM for patients in the medium follow-up group (6 months-2 years) and the long follow-up group (.2 years). The difference was not significant for patients in the short follow-up group. 4. Patients with SM-Suppl grades of #6 had an acceptable surgical risk compared with those with a higher grade. The authors conclude that the results validate the use of the SM-Suppl scores and recommend that it be used as a starting point for arteriovenous malformation (AVM) operability. This is a well-done and timely study, given the recent results of the controversial A Randomized Trial of Unruptured Brain Arteriovenous Malformations (ARUBA) study. A conscious effort to preoperatively assign the surgical risk will facilitate patient selection and will clearly benefit our patients. As the authors point out, the diffuseness score and the eloquence scores are more subjective than the others and could lead to a discrepancy in scoring. In fact, there were clear differences between the trial sites with respect to the proportion of patients with eloquent and diffuse AVMs. It is unclear how much of an effect this subjectivity could have on the overall validity of the system. The authors should be congratulated on “fine-tuning” (without overcomplicating) and validating this AVM grading system. Cargill H. Alleyne Jr Augusta, Georgia

VOLUME 76 | NUMBER 1 | JANUARY 2015 | 31

Copyright © Congress of Neurological Surgeons. Unauthorized reproduction of this article is prohibited

KIM ET AL

T

he authors performed a multi-institutional analysis of a large cohort of operated arteriovenous malformation (AVM) patients to determine the accuracy of the “supplemented” Spetzler-Martin grading system vs the traditional Spetzler-Martin grading system in predicting patient outcome. The additional elements included in the supplemented system are young vs medium vs older age, hemorrhagic vs nonhemorrhagic presentation, and compact vs diffuse nidus. Adding these elements to the traditional Spetzler-Martin grading system expands AVM subcategories from 5 to 9. Patient outcome was assessed per the modified Rankin Scale, with patients having worse modified Rankin Scale score on later follow up categorized as having poor outcome. Overall, the authors show convincing evidence that the supplemented Spetzler-Martin grading system performs better at predicting outcome after surgery vs the traditional system. Moreover, the authors suggest that supplemented Spetzler-Martin grade 6 is a cutoff for AVM operability. The strengths of this study are many: (1) use of a novel grading system that is based on previous studies that identified the additional elements used in the system (age, presentation, and nidus morphology); (2) a very large cohort from 4 well-established, high-volume AVM surgery centers; (3) internal validity with different subgroups of the overall cohort; and (4) sound statistical analyses. The main weakness of this study is that it lacks third-party adjudication of outcomes. This is a very well-conceived and well-executed study with important outcomes. In this era when the value of AVM treatment is being questioned, improving the predictive value of the most commonly used AVM grading system to predict surgical outcome is an important line of inquiry. My only caution concerning the authors’ conclusions relates to the identification of supplemented Spetzler-Martin grade 6 as a guideline cutoff for AVM operability. As the authors point out, establishing operability for a particular AVM patient requires evaluation of multiple factors: (1) the natural history of the AVM, (2) the morbidity of treatment alternatives, (3) the age and comorbidities of the patient, and (4) the mental makeup of the patient vis-à-vis the impact of the AVM on the patient’s level of anxiety and well-being. With these factors in mind, I would argue that 24% permanent surgical morbidity (the outcome of grade 6 AVMs in this study) is likely acceptable only in a portion of AVM patients, whereas in many others, this level of morbidity may be considered too high. The authors are to be congratulated for providing us with an improved surgical grading system for AVM patients, one that I plan to adopt in my own practice from this point forward. Gregory J. Zipfel St. Louis, Missouri

H

arvey Cushing once wrote, “All surgeons who make for themselves opportunities to observe the manipulative work of their fellows must appreciate the present tendency toward the abandonment of the applauded methods of comparatively few years ago.”1 This aphorism may be aptly applied not just to operative technique but also to the decision-making process as to whether a patient is an appropriate candidate for surgery. The benefits of an operation must outweigh the risks of surgery for any procedure to be justified. Often, classification schemes are used to weigh various patient, disease, and procedure-related characteristics to decide if surgical intervention is the most appropriate course of treatment for a given pathology. In neurosurgery, the Spetzler-Martin grading scale (SM) for arteriovenous malformations (AVMs) has been widely adopted for this purpose.

32 | VOLUME 76 | NUMBER 1 | JANUARY 2015

The authors present an excellent follow-up to the original publication that described the supplemented SM grading scale (SM-Supp; the original SM plus patient age, hemorrhage at presentation, and nidus compactness)2 in which they validate that the SM-Supp more accurately predicts postoperative morbidity than the original SM. The cohort consisted of 1009 patients who harbored AVMs operated on by highly experienced vascular neurosurgeons at 4 high-volume cerebrovascular centers. The authors applied analysis of variance and logistic regression techniques to determine that the SM-Supp more accurately predicted poor surgical outcome than the SM alone as measured by the dichotomized modified Rankin Scale. The strengths of this study are obvious. The statistics are robust, and the data clearly demonstrate that a large proportion of AVMs, in the hands of experienced cerebrovascular neurosurgeons, can be safely removed to immediately eliminate the risk of hemorrhage with acceptable morbidity. The timeliness of this well-done study cannot be emphasized enough following the recent publication of the dubious A Randomized Trial of Unruptured Brain Arteriovenous Malformations (ARUBA) (a startlingly nonsurgical trial wherein only 18 of the 114 patients in the “interventional arm” underwent surgical resection).3 Moreover, the SM-Supp, although slightly more complicated, remains easy to incorporate into decision making. The additional characteristics are those that experienced surgeons have long recognized as important when deciding to recommend surgery to a patient with an AVM. In particular, the additional factors considered in the SM-Supp significantly improve risk stratification for patients with an AVM that previously fell into the heterogeneous grade 3 within the original SM rubric. The weaknesses of this report are minor. Some disparity in scoring will be inevitable because the determination of whether the AVM nidus is compact or not is nonstandardized. In addition, the recommendation that an SMSupp score of 6 be used as the cutoff for surgical intervention is seemingly capricious and highlights the shortcomings of any grading system. No grading system can supplant the doctor-patient relationship. Our role is to provide accurate information to the patient to empower the individual to make the best decision for him or her. However, these minor criticisms should not overshadow such laudable work, which truly embodies Dr Cushing’s 100-year-old sentiment. That is to say, we should continuously endeavor to improve on the great work of the past. Brian M. Howard Daniel L. Barrow Atlanta, Georgia

1. Cushing H. The control of bleeding in operations for brain tumors: with the description of silver “clips” for the occlusion of vessels inaccessible to the ligature. Ann Surg. 1911;54(1):1-19. 2. Lawton MT, Kim H, McCulloch CE, Mikhak B, Young WL. A supplementary grading scale for selecting patients with brain arteriovenous malformations for surgery. Neurosurgery. 2010;66(4):702-713; discussion 713. 3. Mohr JP, Parides MK, Stapf C, et al. Medical management with or without interventional therapy for unruptured brain arteriovenous malformations (ARUBA): a multicentre, non-blinded, randomised trial. Lancet. 2014;383 (9917):614-621.

CME QUESTIONS: 1. A 35 year-old male presents to the emergency room with a ruptured right 4cm fronto-polar arteriovenous malformation (AVM). On angiography,

www.neurosurgery-online.com

Copyright © Congress of Neurological Surgeons. Unauthorized reproduction of this article is prohibited

VALIDATION OF THE SUPPLEMENTARY AVM GRADING SCALE

this AVM is noted to be compact in architecture and have superficial drainage. What is the supplemental (ie, Lawton-Young) grade of the patient’s AVM?

A. 1 B. 2 C. 3 D. 4 E. 5 2. What AVM characteristic considered in the supplemented Spetzler-Martin grading system for AVMs portends a positive prognosis with respect to surgical outcome?

A. Presence of a nidal aneurysm.

B. Presence of feeding artery aneurysm. C. Diffuse AVM architecture D. Hemorrhagic presentation E. Frontal location 3. What is the cutoff score for operability of AVMs in the supplemented Spetzler-Martin Grading system?

A. 2 B. 4 C. 6 D. 8 E. 10

“1984” “1984” is an American television commercial which introduced the Apple Macintosh personal computer. It aired on US national broadcast 31 years ago this month, in January 1984. It was aired only twice on American television. In one interpretation of the commercial, “1984” used the unnamed heroine to represent the coming of the Macintosh as a means of saving humanity from “conformity”. These images were an allusion to George Orwell’s noted novel, Nineteen Eighty-Four, which described a dystopian future ruled by a televised “Big Brother”. Originally a subject of contention within Apple, it has subsequently been called a watershed event.

NEUROSURGERY

VOLUME 76 | NUMBER 1 | JANUARY 2015 | 33

Copyright © Congress of Neurological Surgeons. Unauthorized reproduction of this article is prohibited

Validation of the supplemented Spetzler-Martin grading system for brain arteriovenous malformations in a multicenter cohort of 1009 surgical patients.

The supplementary grading system for brain arteriovenous malformations (AVMs) was introduced in 2010 as a tool for improving preoperative risk predict...
436KB Sizes 2 Downloads 5 Views