JOURNAL OF NEUROTRAUMA Volume 9, Number 2, 1992 Mary Ann Liebert, Inc., Publishers

Comment: Need for Standardization of Animal Models of Spinal Cord Injury

ALAN I. FADEN

The selection of appropriate end points and solutions is an important issue, and we need to emphasize potential long-term goals. I want to spend a few minutes summarizing my own experiences and present some points for discussion. We have reviewed possible pharmacologic strategies, including problematic issues and potential pitfalls. We have talked about the appropriateness of the models for spinal cord injury, their clinical relevance, and whether we should have more than one kind of model. Drug administration issues have been addressed by Ed Hall. In this session, we have focused on outcome measures and biostatistics. Certainly, more attention to these issues is needed.

Standardization. Appropriate animal models for SCI must minimize variability. Important parameters include injury level, mechanical factors, and strain and sex differences. For example, the NYU group uses Long-Evans rats, whereas the Ohio State group uses Sprague-Dawley rats. It may be instructive and useful to examine a series of strains within a given model to see which one yields maximal consistency. In addition, the severity of injury is most important as far as drug evaluation is concerned. One can easily miss a drug effect if one examines too severe or too minimal levels of injury. Let me give an example from our work related to effects of opiate blockade (using nalmefene) after spinal cord injury. After a 50 g cm contusion injury (a moderately severe injury), nalmefene showed a very consistent dose-related protective effect with regard to motor recovery. After a more severe injury (75 g cm) in which none of the animals in the control group recovered walking ability, nalmefene had no significant effect. Perhaps this is not surprising, but we need to be aware that relatively modest changes in the injury severity can mask the potential effectiveness of a drug. Choice of outcome measure is also crucial. When we examined tissue cation data in these studies, we found the reverse effect: the beneficial actions of nalmefene were less evident at the moderately severe injury level than at the severe injury level. So, both injury severity and type of outcome measure are important in determining potential protective actions with a given drug.

Anesthetic Effects. With most of the models we use, the animals are anesthetized (there are ischemia models done in awake animals). We have preliminary results looking at anesthetic effects on posttraumatic outcome that show marked differences among various anesthetic agents. There is remarkably little information available on this subject. One of the few published works has come from Steve Salzman's group comparing anesthetic treatments in an impact model. A related issue that has not been addressed concerns the interaction of test drugs with anesthetic agents. Let me review two examples illustrating the potential pitfalls if one does not pay sufficient attention to neuroprotective actions of anesthetics or drug-anesthetic effects. We compared two anesthetic regimens, isoflurane and pentobarbital-ketamine, in an impact model using Sprague-Dawley rats. The animals were intubated, and respiration was controlled. Animals anesthetized with pentobarbital-ketamine did not regain walking ability, whereas those anesthetized with isoflurane did. If we are to reach the stage of a consortium among laboratories, we must all decide to use the same appropriate anesthetic agents.

Office of the Dean of Research, Georgetown University Medical Center, Washington, DC.


Drug-anesthetic interactions also must be considered. We examined the TRH analog YM14673, which, according to the structure-activity data we had accumulated previously, should have had neuroprotective actions in CNS trauma. Our initial study used our standard injury regimen, including a pentobarbital-ketamine combination anesthetic. There was absolutely no drug effect on inclined plane scores, Tarlov scores, percent walkers, and so on. We were also doing a study in traumatic brain injury, where we had begun to look at NMDA receptor-mediated actions. There we used pentobarbital alone as the anesthetic and saw a very striking neuroprotective effect with the same compound. Subsequently, we did another randomized study in spinal cord injury with pentobarbital alone and saw a very significant effect. Thus, the presence of ketamine appeared to block the therapeutic effect of a TRH analog. We have found a similar antagonistic effect between another noncompetitive NMDA antagonist and a TRH analog. In this case, treatment with YM14673 or dextrorphan alone improved outcome after fluid percussion-induced traumatic brain injury, whereas a combination of these drugs had no protective action. There is a general warning here, namely, that we must pay very careful attention to drug-anesthetic or drug-drug interactions.

Outcome Measures. Several investigators have pointed out the need to use multiple measures. One is more comfortable in drawing a conclusion about the effect of a drug if beneficial actions are found with several independent outcome measures.

Sensitivity. We have stated that the noncompetitive NMDA antagonist MK-801 at a dose of 1 mg/kg produced significant improvement of posttraumatic walking ability and inclined plane scores. When we measured residual cord volume, similar to the Ohio State measurements described earlier, we saw trends toward improvement with drug, but this was not statistically significant. In the same system, looking only at preservation of the 5-HT fibers, which in the rat spinal cord run immediately adjacent to the motor control fibers, we saw a significant effect of MK-801 vs. saline. So, again, the issue is choice of outcome measure, with sufficient sensitivity to detect drug effects.

Statistics. We need to pay much more attention to study design, for example, choosing sample sizes based on an understanding of expected differences and a power analysis. We are all comfortable looking for type I errors, but, in fact, we need to be very mindful of false negatives. Adequate dose-response data, pharmacokinetics, and choice of appropriate outcome measures are critical issues. When we look at a drug and it doesn't show an intended effect, we have to be very cautious in saying that it is ineffective in the treatment of spinal cord injury; rather, we should say that the treatment failed to show a statistically significant effect under the conditions of the study.

Address reprint requests to:
Alan I. Faden, M.D.
Office of the Dean of Research
NW 105 Medical Dental Building
Georgetown University Medical Center
3900 Reservoir Road, NW
Washington, DC 20007
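As an illustration of the power analysis mentioned under Statistics, the following is a minimal sketch, not the author's method, of a sample-size calculation for comparing a mean outcome score between a drug group and a control group. It uses the standard normal approximation, n = 2((z_(1-alpha/2) + z_(1-beta)) * sigma / delta)^2 animals per group. The function name, the default alpha and power, and the example values for the expected difference and standard deviation are hypothetical and are not taken from the studies described above.

```python
import math
from statistics import NormalDist


def sample_size_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate animals per group for a two-sided, two-sample comparison of means.

    delta: smallest difference in mean outcome worth detecting (assumed value)
    sigma: expected standard deviation of the outcome within each group (assumed value)
    alpha: accepted type I (false-positive) error rate
    power: 1 - beta, where beta is the accepted type II (false-negative) error rate
    """
    if delta <= 0 or sigma <= 0:
        raise ValueError("delta and sigma must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = standard_normal.inv_cdf(power)           # quantile matching the desired power
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical example: outcome score with a standard deviation of 10 units.
    print(sample_size_per_group(delta=10, sigma=10))  # difference of one SD  -> 16
    print(sample_size_per_group(delta=5, sigma=10))   # difference of half SD -> 63
```

Halving the expected difference roughly quadruples the required group size, which is one reason a small study that finds no significant effect says little about whether a drug is ineffective. The normal approximation slightly underestimates the group size needed for a t test with few animals, and ordinal scales may call for a different calculation.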

DISCUSSION

Dr. Choi: Relating to cancellation of effects with a combination of drugs, it would be important to determine if it is true cancellation or a masking of effects. When you put the two (Yamanouchi and dextrorphan) together, are results similar to the untreated controls? That is really a fascinating effect.

Dr. Young: We have seen an antagonistic effect between lazaroid and nimodipine in our middle cerebral artery occlusion model. Recently, we found a complete cancellation of the methylprednisolone effects by GM-1 when the two were given together.

Dr. Choi: Often we think of negative studies as lacking much value, but I think that these cancellation effects are really fascinating, and are worth pursuing at the level of mechanism. There is obviously a very big clue here, and there are results in the literature that our current concepts of how these drugs act do not predict. They are really important clues.

Dr. Tator: Regarding the statistics of combining tests, as Dr. Wrathall has done with the combined behavioral score (CBS), what about combining scores across types of outcome measures, such as anatomy and physiology?

Dr. Stokes: One can look at those separately and together. We do that routinely with regard to traumatic spinal cord injury. Some years ago, we were not convinced about the predictiveness or sensitivity of any one individual measure, so we came up with a grouping of five outcome measures that looked at different aspects of performance and put them in a composite score. It is still largely motor, like Dr. Wrathall's CBS. The problem with many of the acute measures is that we kill the animal at 4-24 h, and combination with other measures is impossible. We have used spectroscopy to measure acute biochemical tests for the first 4-18 h. Then we follow those animals for a period of weeks with behavior, kill them and look at histology, and compare the acute biochemical effects with later outcomes. That adds predictiveness, but there is a question of cost and practicality of that kind of pursuit.

Dr. Wrathall: Looking at combined measures, you risk combining a plus and a negative and missing both effects. The only reason to combine results would be that it had greater statistical power than the individual results.

Dr. Stokes: Our data have suggested that for a number of the behavioral trials, some may be very good early and others may be much better at later survival periods. So we favor keeping some outcomes separate over time. At a given end point, however, there may be value in combining scores as long as you have some realistic expectation of what that combination is going to tell you.

Dr. Faden: You gain power by combining a series of complementary outcomes in a fashion that would increase sensitivity and predictiveness.

Dr. Walker: Four tests mildly positive might be more persuasive than just one of them strongly positive and the other three negative. To see all of them as an array would be valuable.

Dr. Grüner: Having done this type of work, I can point out several very important caveats if you are going to do multiple behavioral measures. For example, at postinjury day 2 and at day 28 you get the same results, whereas on days 7 and 14, you do not. It is important to keep such things in mind when you are comparing outcome measures as a function of time. It is fine to use repeated measures, but I think, if at all possible, you should keep your measures separate, do a total analysis of variance, multiple regression, multiple ANOVA, whatever. Find out if there is a total difference in your population, and then break it down. You have to keep the effects separate until you are sure what you are dealing with.

Dr. Beattie: It becomes a problem if you make up a combined score, then run analysis of variance. You are taking assumptions of the analysis of variance into the other analysis. I think you may not have created a linear or ratio scale. You may have created some kind of highly skewed scale that will then cause problems.

Dr. Wrathall: As you increase multiple tests, your probability of finding one that shows a random difference increases. I think what is really critical is making a prior decision on how you will analyze data. It is not fair to go through and do all these sets, then select effects from noneffects, or vice versa. Unfortunately, we really don't know what we should expect on some of these things.

Dr. Faden: The following issues are important with regard to the kinds of outcome measures that are chosen: (1) the sensitivity of tests in predicting injury level or recovery, (2) the predictivity of the outcome measure, (3) resource and time intensiveness of the outcome measure, (4) the discriminating ability of a test, particularly to discriminate a drug effect, (5) reliability or standardization within and between laboratories, (6) interrater reliability, and (7) clinical relevance. How might we stratify the issues? Can I get some opinions?

Dr. Young: I know we don't know enough about the subject to say we can choose specific models, specific parameters, or specific outcomes.
Certainly, there are outcome measures that we all agree would be superb (NMR, 17-μm resolution, repeated measures, and so forth). But is it practical for drug screening? If something is extremely expensive or extremely difficult to do, shouldn't we save that for the last 100 yards of the marathon, the finish line being a clinical trial? I would regard quantitative morphometry and many of the trained behavioral analysis approaches to outcome measures as being the final 100 yards of the marathon. We should orient as much of our initial preclinical testing as possible so that we minimize the number of false leads that reach that final state, and we can spend our resources on the most promising drug. From that point of view, in the initial stages, we should use the most sensitive tests, but I think the real issue that faces us here is what we should use as a screening test. At some point, we must initiate multicenter studies, in the same way and for the same reasons multicenter studies are done clinically. I am convinced by what I have seen today and from our experience that no single laboratory can carry out the entire process of systematic dose-response, replication, going across several different models, and then solving the effects of anesthesia. We should have some kind of cooperation among laboratories. It need not be a formal consortium, but at the very least, if Alan Faden is going to study MK-801 and Jean Wrathall is going to study MK-801, the two of you should be talking.

Dr. Faden: Your point is well taken and should be addressed. Brad (Stokes, OSU), what are your views on the two issues: one, the choice of outcomes, and two, the idea of either a consortium or a series of groups that begin to work together to standardize methodology?

Dr. Stokes: I think it's absolutely necessary. But our group might see it a little bit differently, obviously, from the data we have presented. From the data that we have seen today, I am not convinced that there are acute, functional, physiologic, or morphometric indices that are going to have enormous predictability in chronic outcome, which is our ultimate aim. I would not advocate, for screening purposes, long duration, multicenter, complicated behavioral trials. On the other hand, it seems to me we can supply some index, seen acutely, that might have some behavioral meaning, at least in the subchronic phase. I think we must do behavioral testing, perhaps at some simpler level, with maybe a selected number of behavioral trials over a circumscribed time period. I think we must corroborate some of these acute measures with some of the longer-term outcomes.

Dr. Tator: I think that we need to use the most accurate injury models that we can, and if that means standardizing, we should move in that direction. Furthermore, I agree that we can't give up the long-term outcome measures. That would be a mistake. In spite of how keen we are at developing low-labor, low-cost, early acute measures, I don't think they will convince anyone. Even for screening, I fear that we'll miss some things. Maybe I need to look at the tissue or cell volume measures in more detail to appreciate the meaning in terms of tissue loss, but I think we need to stay with the long-term effects.

Dr. Young: I must really clarify. In fact, I developed the Na minus K method not for screening drugs but to allow models to be compared across institutions. What are the reasons for monitoring our injury? It is primarily so that data can be shared and compared across institutions. I really do not think that we should say, "Let us go with one model, let us go with one outcome measure, let us not try any drugs for chronic functional studies until we have screened them." We are rapidly reaching the point where our resources are very limited, and there are 40 drugs that have been reported to be effective for spinal cord injury. If we are to put all these 40 drugs through their paces for chronic study and systematic dose-response, it will be many years from now before we will have a drug for clinical trial.

Dr. Faden: I do want to get input from others here. Can we hear your points on standardization and interinstitutional collaboration?

Dr. Hogan: I am inclined to think that we are not ready for multicenter studies, that we need to have more definitive focus on drugs that will be given after methylprednisolone. I must say you give me pause when you tell me you have 40 drugs worth studying. That's all I have to say.

Dr. Wrathall: I think some things we should standardize. For instance, if we have some methods that are easily exportable, such as Dr. Young's, we should begin to share them. We have been using techniques that have been very hard to export, and it has been very difficult to combine or standardize in terms of some of the behavior, but not impossible. We could now make some videotapes of Dr. Tator and his inclined plane, and those of us who use it might actually replicate his methods instead of doing it an entirely different way. We can talk to each other in the same language, in the same numbers. But I think we would lose a lot making one injury model with one anesthetic and one particular set of outcomes at this point. We will learn more from seeing an effect of a drug in different models than only in one.

Dr. Anderson: What do we mean by "screening"? I have always been a believer in a hierarchy of models, that there should be different models, that you learn a lot from diversity. I strongly agree with Dr. Young that there is too much work to be done by a single laboratory, and we are going to have to try to come up with some ways of looking at these compounds in a manner to increase the data. I am not sure what that is. I work with a cat model with behavioral testing at 4-6 weeks. We have done dose-response studies on methylprednisolone, dose-response studies on lazaroid, and delayed administration with lazaroid, and it takes a long time and many resources. It is one of the better models around, but is it practical for widespread use? Perhaps not, but I certainly do not think it should be eliminated.
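
The point raised in the discussion that the chance of a random difference grows with the number of tests can be made concrete with a short sketch. It is an illustration only, under the simplifying assumption that the outcome measures are statistically independent, which behavioral measures from the same animals generally are not. The function names and the choice of the Bonferroni correction are assumptions introduced here, not methods proposed by the participants.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false-positive result among n_tests
    independent tests when every null hypothesis is true."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(n_tests, familywise_alpha=0.05):
    """Per-test significance threshold that keeps the familywise error rate
    at or below familywise_alpha (Bonferroni correction)."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return familywise_alpha / n_tests


if __name__ == "__main__":
    # Hypothetical numbers of outcome measures analyzed in one study.
    for k in (1, 3, 5, 7):
        print(
            f"{k} tests: chance of a spurious 'effect' = "
            f"{familywise_error_rate(k):.3f}, "
            f"Bonferroni per-test alpha = {bonferroni_alpha(k):.4f}"
        )
```

With five independent measures each tested at 0.05, the chance of at least one spurious difference is roughly 23%. Deciding in advance which comparisons will be made, and correcting the threshold accordingly, is one way to act on the call for a prior decision on how the data will be analyzed, at the cost of some power for each individual test.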
