Some of the problems and difficulties associated with clinical studies of antidepressant agents.

Br. J. clin. Pharmac. (1977), 4, 199S-207S

SOME OF THE PROBLEMS AND DIFFICULTIES ASSOCIATED WITH CLINICAL STUDIES OF ANTIDEPRESSANT AGENTS

F.A. JENNER

Medical Research Council Unit for Metabolic Studies in Psychiatry, University Department of Psychiatry, Middlewood Hospital, PO Box 134, Sheffield S6 1TP, UK

1 The fact that the classification of psychopharmacological agents depends on clinical studies is emphasized.
2 The difficulties in defining homogeneous groups of depressed patients are emphasized.
3 Some contradictions in the literature are considered.
4 Caution in interpreting the literature on antidepressant agents is recommended.

Introduction

Questions about how best to acquire knowledge in philosophy and clinical pharmacology have much in common. They are repeatedly posed because they are so important, but also because they are not easily answered. Upon them rests so much else. Establishing, for example, the clinical effectiveness of an antidepressant drug is required before its christening. Unfortunately, it is often forgotten that the possibility of acquiring confidence in the clinic is less than in the laboratory. In the so-called more fundamental sciences, elegance of experimental design and strictness of criteria can be of a high order not to be found in the more relevant studies of patients. The elegant design cannot, however, circumvent the linguistic problem that the word 'antidepressant' itself means a clinically demonstrated effect. Further, the questions posed in the clinic, such as "What is the treatment of choice for psychotic depression?", have hidden semantic depths not so apparent in studies of the reversal of reserpine-induced catatonic states in rats.

The art of living, and certainly of medicine, involves an appropriate compromise between critical thought and action. Choosing to give a drug, taking a wife or applying for a job all require acts of faith and are often much more effective if done decisively. Others are influenced by the confidence involved and react appropriately. Not only in the clinic but also in industry, decisions cannot await impeccable evidence. The decision to go ahead must be made at some stage, and the moment of inertia of the whole system, once it starts to roll, carries much with it. Nevertheless, history probably tends to show that short cuts to the truth are too attractive to men and ultimately very expensive. This may well be very true of the study of the treatment of depression by drugs. The questions posed in assessing a new antidepressant, however, include:

(1) Has it any antidepressant activity?
(2) For which sorts of patients is it effective?
(3) For which of those types of patients is it more efficacious than other preparations?
(4) What are the dangers involved in its use?
(5) What is its relative cost:effectiveness ratio?
(6) Is drug treatment of the individuals involved really justified and, if so, in whose interest? Or does it involve a refusal to attempt to deal with their loneliness, bereavement or other personal problems?
(7) How does it work clinically?

Patients to be studied

Efficacy of a drug is meaningless unless it is clear for whom or what it is of value. It is, however, of scientific interest and importance to know whether a compound has any antidepressant effect. It does not matter how slight the effect; the scientific as distinct from clinical interest is considerable if the effect is real. If there is some effect, playing with the molecule might improve it; if there is no demonstrable effect, there is no reason to continue with the series. Also, if the action exists, that result must be explicable in a viable neurophysiological theory of mood and its disorders. Moods are cerebral states (what else could they be?) and products of the individual's genes, memory, past experience and present environment. Pathological states are probably simply those which are felt to be undesirable. For modern psychopharmacological purposes, we must still see men as essentially machines, but with intentions, usually computing the probable outcome of their own actions and using their own P values for decision making. The majority of the depressed do seem to have learned helplessness, and require more evidence of possibilities than do others before they will have a go and hope (Seligman, 1975).

A severely non-ecological view is probably outmoded except for a few rare persons who may have a syndrome arising from the almost complete penetrance of a gene, or a fairly predictable periodicity of the affective disorder. Winokur & Tanna's (1969) demonstration of the chromosome linkage in some families between colour blindness, the Xg protein system and affective disorders may highlight a group whose life situation and mood are unrelated. Jenner (1974), among others, has also repeatedly emphasized that predictable periodic conditions at least present a temporal course of affective difficulties which cannot be explained by environmental changes. Pharmacological studies of such persons are therefore often more securely based, though difficult to extrapolate. Clinical trials are not, however, usually carried out on such people. Most studies are likely to be on patients whom Brown et al. (1973), for example, have shown to suffer the powerful effects of life events and their social situation. Large homogeneous groups are therefore difficult to find, or more accurately homogeneity in this respect is difficult to define, but confidence in the similarity of groups before treatments is a requirement of statistical methods.

Further, the very word 'depression' can only be used to classify a poorly and arbitrarily delineated quorum of behaviour (including verbal reports of experience). There do not seem to be clearly discernible articulated joints in the universe of psychopathology (Hempel, 1965).
As Kendell (1975) has pointed out, there has been no unequivocally obvious line demonstrated between depression and schizophrenia, depression and paranoid illnesses, depression and anxiety states, or depression and obsessional neuroses, and so on. Although claims have been made using symptomatology alone to divide depression into psychotic and reactive, they have used statistical methods inappropriately. Most modern studies show that endogenous means either little more than idiopathic, or else conforms to a picture which has been seen traditionally as signifying a state analogous to those seen in internal medicine. For then a warm humanistic approach to the person, although desirable, is thought to be essentially irrelevant to his illness. Unfortunately, the degree to which one can be so confident in most individual cases cannot be known.

Scientific progress, however, seldom depends on absolute philosophical certainty; it proceeds by operationalism. An operational definition is used for the group of patients to be assessed. This can be done while realising that the diagnostic criteria available must have been delineated in an arbitrary or traditional fashion. There is therefore great advantage to be gained from using one of the more popular diagnostic inventories. Doing so would necessarily make attempts to repeat the work and compare it with the literature simpler. Diagnostic recommendations are contaminated by linguistic tradition, and successful varieties represent diplomatic victories of one or another group. Scientifically, those which can be shown to correlate best with prognosis, genetic studies, efficacy of treatment methods or something else uncontaminated with the questions posed clearly have most to recommend them.

Currently, the Present State Examination (Wing et al. 1974) holds a special position in acceptable labelling procedures. It could be used for purposes of defining the diagnosis of depression. The Newcastle Rating Scale (Kiloh et al. 1972) is of value in giving a measure along the neurotic to psychotic dimension which is defined for the purpose by the scale. The intensity of depression can be rated using the Hamilton Scale (Hamilton, 1967). The results of the Newcastle Scale are of use in predicting the likely value of electroconvulsive therapy in reducing the Hamilton rating. Insofar as that is possible, the scale must reflect a 'real characteristic'. Kendell (1975), discussing and almost dismissing treatment as a basis for diagnosis, quotes Hamilton, who apparently pointed out what a strange nosology might be produced if one had a disease called 'aspirin responsive'. Nevertheless, that classification is obviously of some value. For many purposes, certainly for the patient, such nosologies used accurately are the best one can have.
Unfortunately, studies of imipramine responsiveness and electroconvulsive therapy responsiveness do not always produce coextensive groups. In the MRC (1965) trial, for example, depressed males responded relatively better to drugs and women to electroconvulsive therapy. Classification by quorum (that is, by a collection of symptoms which can be itemized, 'quantified' and summated) cannot give a totally satisfactory basis for a nosology. It adds little information about the real world and cannot on theoretical grounds alone determine treatment. The abscissa may not be linear, or to be more precise the concept of linearity is difficult to use but is essential for the argument. The procedure can, however, be defended on the grounds that that is what ordinary language is like. We are in a sense doing the same thing, if intuitively, to various objects when we classify tables, desks, stools and chairs as things. That classification usefully mixes structural and functional factors and is adequate for some purposes. It would not be appropriate in studying the chemical nature of chairs, tables, and so on, nor in predicting the effects of fire or acid on the group. A classification of metallic, wooden, plastic, and so on, would have been better for that purpose. For other purposes, such as insurance policies, inflammable or non-inflammable is all that is required. Language involves classification, but the commonsense use of language and of diagnoses can lead one to take words as things which are more concrete than they are, and so to expect that the same nosological vocabulary will do for different purposes. Often it won't. The most useful classification is the one with the widest application, but this is tautological, and as in so many branches of science the truth is only the simplest adequate explanation for the purpose. Here it is required to find out who will benefit by taking a particular drug.

The study of general paralysis of the insane is perhaps instructive here. There was considerable debate about the relation of dementia to paralysis until, as some historians would have us believe, Anton Bayle (1822) intuitively grasped the unity of dementia paralytica. Re-reading his own writings on the subject does not, however, confirm this view. He didn't carry out an intuitive cluster analysis; his newly described condition was an arachnoiditis. The roughened meninges gave the game away. That single criterion defined dementia paralytica, and his classification became undeniable. Most of us feel it will outlive Kraepelin's because it is dependent on an essentially unambiguous criterion. This group proved amenable to specific treatments and was caused by a specific organism discovered later. Science is usually the struggle to produce concepts of that sort. In the meantime, concepts like depression must be used. The word and its precursor melancholia have been used for centuries, but it is vital to emphasize that its meaning changes from epoch to epoch.
Melancholia for many ages included paranoia, anxiety, obsessional neuroses and catatonia. Modern claims of efficacy for chlorimipramine, electroconvulsive therapy, monoamine oxidase (MAO) inhibitors, amitriptyline, thioridazine and fluphenazine are still made in ways as appropriate for a pre-Esquirol nosology of melancholia as for a post-Kraepelinian diagnosis of endogenous depression. A group of patients to be studied must necessarily be a somewhat arbitrary group, but possibly well defined between borders agreed diplomatically, as are those of countries. With certain political alignment of forces, border disputes can be rare!

Placebos in antidepressant trials

The reasons for stating that double-blind controlled trials are necessary are obvious. The ethics of using placebos is not so very complicated either. This depends on one's belief in the value of alternative treatments and cannot be the same for everyone. Ethics always depend on intentions and are relative to belief about the consequences of actions. If a patient is profoundly depressed and the doctor believes that there are clearly more effective treatments than the placebo, he must, to be highly ethical, declare this fact to the patient. To be beyond reproach, he must explain why and in whose interest he is performing the trial. By and large, it is part of the professional contract that the doctor acts in the patient's interest. If the doctor really believes there is real doubt about the efficacy of available pharmacological treatments, and side-effects can be significantly weighed in arguments against their use, he is perfectly justified in carrying out a placebo trial. He is, in fact, to be applauded for trying to find out what he should do. He is then, however, likely to be in difficulties trying to explain why, despite the ineffective other antidepressant drugs, he really believes this one might work. In the current climate, one feels a friendly psychotherapist, who has little faith in the value of drugs, might nevertheless be open-minded enough to allow a colleague to administer a simultaneous drug trial on his patients, who could then be made aware of the situation and possibly get the best of both worlds.

The great scientific advantage of a placebo trial is that it is the easiest way to demonstrate a real, even if only limited, pharmacological action. A failure to demonstrate a difference statistically implies a negligible clinical value for the group studied. Further, the placebo response found helps to identify the cohort studied. Low placebo response rates in bipolar manic-depressive patients might be acceptable, but negligible placebo responses in an ordinary cohort of depressed patients, in an ordinary out-patient or in-patient service, show there is something different about the trial, the assessment, or the patient group from those used in other studies.
An impressive apparent placebo response is usually found in about 30% of patients, as, for example, in the MRC trial (1965). Rogers & Clay (1975) have surveyed controlled trials of imipramine against a placebo. The overwhelming tendency to demonstrate efficacy of imipramine is emphasized, both in the treatment of neurotic and acute endogenous depression. It is also very striking, however, that in trials which did not attempt to separate those subtypes of depression, the results obtained did not show any clear effect of imipramine. Kerry & Orme (1977) have pointed out that, if trials with less than 20 persons are omitted, the larger and statistically more significant trials show that in endogenous depression 35% responded to a placebo, in studies of mixed groups of patients 65% responded, whereas only 9% of the neurotic patients were improved by a placebo. From such results, taken at face value, one must conclude that imipramine is equally effective irrespective of subdivisions of depressed patients, whereas endogenously depressed patients are more suggestible than neurotic patients, and that the patients of doctors who do not subdivide their depressed patients are statistically significantly the best placebo responders. The complicated nature of such results must instil great caution in simply adding the results together.

Because of the low placebo response in studies reported by Rogers & Clay (1975), the statistical significance of the advantages gained from the drug for neurotic depression was much greater than for endogenous depression. On that basis, placebos would not be justifiable in the trials on neurotic patients, as extrapolation of results would suggest a small degree of contamination by placebo responses, in fact less than one in ten patients. In the endogenous patients, one would predict a placebo contamination in more than one in three, and in the mixed group in more than one in two.

Claghorn (1976), although recommending three-way comparisons of new drugs, takes a by no means unique point of view when he writes: "... a failure to differentiate placebo from either or both of the other compounds indicates that either the assessment techniques or patient population used were inappropriate". There is an alternative explanation, for if editors take Claghorn's view, and other factors reduce the publication of such results, a sociological explanation of Rogers & Clay's (1975) findings is very conceivable. Further, if one takes Rogers & Clay's results very seriously, one must conclude that Kuhn (1957) originally identified the imipramine response as specific to "vital depression", because of the increased placebo response they show compared with neurotic patients. Either something is wrong in handling the literature in the way Rogers & Clay do, or such intrinsically unlikely conclusions must be drawn.

The very classical studies of Uhlenhuth et al. (1959) showed that the results of so-called double-blind controlled trials correlate with the investigators' prediction of their outcome. The subtle ways in which the control and non-control subjects can be detected are difficult to obviate, and Rosenthal (1964) has shown just how powerful experimenters' expectations can be. Sceptical caution seems imperative.

Comparative trials

For obvious reasons of convenience and ethics, it is easier to undertake comparative drug trials, especially if it is felt the new drug is likely to be as effective as any available. In this field, for example, we might compare neothyme and imipramine or amitriptyline. It is most important to remember, however, that with a placebo response of about 30% and a drug response of 50-70%, the pharmacological effectiveness of the test substance is low. In fact, the evidence is such that only between one in three and one in five patients are being pharmacologically helped. For 60-80% of the imipramine group, the patient may just as well have been given chalk, or perhaps nothing, as we accept placebo effects lightly. What are they? Perhaps measures of perception of the physician's interest and ability.

In a study with small numbers, the failure to show a statistically significant difference between neothyme and imipramine does not necessarily mean that neothyme would have been more effective than a placebo. There are many situations in which imipramine would not be distinguished from a placebo (see, for example, Angst & Theobald, 1970). To substantiate a claim that a comparative trial demonstrates any action at all of a new drug, it is necessary to include reasons for one's confidence that the standard drug would have been distinguished from the placebo in these conditions, and to recall the wide variations of placebo response reported. The MRC multicentre trial in this country is usually taken very seriously, although it too produced confusing results. Using their figures of a 30% placebo response and a 50% drug response, we can calculate the numbers of patients probably required to produce evidence of effectiveness of a drug using the χ² statistic. In fact, with 160 patients in each group one could expect an 80% chance of demonstrating statistical significance at P
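The arithmetic of this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the calculation used in the paper: the function names are hypothetical, the group size uses the usual normal approximation to the 2x2 χ² test without continuity correction, and because the significance level is cut off in this copy of the text, alpha is left as a free parameter. The figures it produces therefore need not reproduce the 160 patients per group quoted above.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def pharmacological_fraction(placebo_rate, drug_rate):
    """Excess of the drug response rate over the placebo response rate."""
    return drug_rate - placebo_rate


def patients_per_group(p_placebo, p_drug, alpha=0.05, power=0.80):
    """Approximate group size for a two-sided comparison of two proportions.

    Assumption: normal approximation to the 2x2 chi-squared test,
    equal group sizes, no continuity correction.
    """
    diff = abs(p_drug - p_placebo)
    p_bar = (p_placebo + p_drug) / 2
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    null_sd = sqrt(2 * p_bar * (1 - p_bar))
    alt_sd = sqrt(p_placebo * (1 - p_placebo) + p_drug * (1 - p_drug))
    return ceil(((z_alpha * null_sd + z_power * alt_sd) / diff) ** 2)


def approximate_power(p_placebo, p_drug, n_per_group, alpha=0.05):
    """Approximate chance of a significant result with n_per_group patients."""
    diff = abs(p_drug - p_placebo)
    p_bar = (p_placebo + p_drug) / 2
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    null_sd = sqrt(2 * p_bar * (1 - p_bar))
    alt_sd = sqrt(p_placebo * (1 - p_placebo) + p_drug * (1 - p_drug))
    z = (diff * sqrt(n_per_group) - z_alpha * null_sd) / alt_sd
    return _STD_NORMAL.cdf(z)


if __name__ == "__main__":
    # Placebo response of about 30%, drug response of 50-70% (figures from the text).
    for drug_rate in (0.50, 0.70):
        helped = pharmacological_fraction(0.30, drug_rate)
        print(f"drug response {drug_rate:.0%}: about 1 in {1 / helped:.1f} helped by the drug itself")

    # Group sizes for 30% versus 50% at several assumed significance levels.
    for alpha in (0.05, 0.01, 0.001):
        n = patients_per_group(0.30, 0.50, alpha=alpha, power=0.80)
        pw = approximate_power(0.30, 0.50, 160, alpha=alpha)
        print(f"alpha={alpha}: n per group={n}, power with 160 per group={pw:.2f}")
```

The first function restates the point made above: the difference between the drug and placebo response rates, not the drug response rate alone, measures how many patients are pharmacologically helped. The other two show how strongly the required numbers depend on the significance level chosen.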
