Int. J. Cancer: 46, 761-769 (1990) © 1990 Wiley-Liss, Inc.

Publication of the International Union Against Cancer
Publication de l'Union Internationale Contre le Cancer

REPORT ON A WORKSHOP OF THE UICC PROJECT ON EVALUATION OF SCREENING FOR CANCER Meeting held at Selwyn College, Cambridge, UK, on April 2-5, 1990

A.B. MILLER¹, J. CHAMBERLAIN, N.E. DAY, M. HAKAMA and P.C. PROROK

This is the 5th report of the UICC Project on the Evaluation of Screening for Cancer. Previous reports were based on our evaluation of screening for individual sites or groups of sites. The present report is based on a workshop at which most of the sites were re-evaluated in the light of new information that had become available since we previously considered the sites (4 years for breast cancer to 7 years for cancer of the cervix) together with an evaluation of 4 sites not previously considered (melanoma, neuroblastoma, nasopharyngeal carcinoma and prostate cancer). We elected not to re-evaluate screening for lung, bladder and oral cancer (considered in 1984) and endometrial cancer (considered in 1985) as we were not aware of any new data that would have led us to reconsider our previous conclusion, that screening should not be considered as public health policy for these sites.

The present report comprises a summary of the communications presented at the workshop, together with our conclusions on the state of the art of screening for the cancers considered. At the end of the report we summarize some advances in the methodology of the evaluation of screening. A full report on the workshop will be published elsewhere. In drawing our conclusions, we have incorporated the evidence previously available (Chamberlain et al., 1986; Day et al., 1986; Hakama et al., 1985; Prorok et al., 1984) as well as that presented at the workshop.

We emphasize that screening, as considered in our reports, is the detection of unrecognized disease by the application of tests in the general population, or an important subsegment of that population. We have not evaluated medical surveillance or public education campaigns, except to the extent that they have an impact on screening.
Our recommendations are, in general, related to the application of screening as public health policy, and the research that we feel should be conducted before such policies on screening are implemented. Further, we are largely concerned with organized programmes of screening, as described in our report on cervical cancer screening (Hakama et al., 1985).

Breast cancer screening

Recent results from breast screening trials in Sweden, the UK and Canada were considered. Updated mortality data to December 1989 in the Swedish 2-county (WE) trial show that the relative risk (RR) of dying from breast cancer in the study group allocated to screening has remained around 0.7 since the first publication in 1985. The effects in each 10-year age-group are relatively unchanged, with no reduction in mortality in those aged 40-49 on entry. For women aged 50-69 on entry, the reduction in breast cancer mortality is approximately 40%. Death rates due to other causes among women with breast cancer were close in the 2 study arms.

Updated mortality data in the Malmo study show an increasing reduction in breast cancer mortality in the study arm in women aged 55-64 on entry, now approximately 20%. No reduction in breast cancer mortality is seen for women aged 45-54. Poor survival of patients with interval cancers in this younger age-group was noteworthy. The differences in the effect on breast cancer mortality between the Swedish 2-county trial and the Malmo trial are explicable in terms of lack of compliance in the study group and the level of screening in the control group in the Malmo trial. In both trials high breast cancer mortality was seen in the non-compliers. This resulted in an apparent but probably spurious effect of screening in a case-control analysis of breast cancer mortality within the Malmo study group.

Data on sensitivity and specificity are now available from the UK trial, the design allowing a comparison of physical examination (PE) alone with PE plus mammography (MA). The sensitivity of PE alone was considerably lower than that of PE plus MA, particularly at the later screens, suggesting the unsuitability of PE alone as performed in this trial as a screening test. PE alone also had worse specificity and positive predictive value.
Both cohort and case-control mortality analyses indicated that refusers in Guildford were at slightly greater risk of death from breast cancer than a control population (Stoke), with a relative risk (RR) of 1.13. Women screened had an RR of 0.63 relative to the women in the control population. This compares with an estimated RR of 0.94 that the screened women would have had in the absence of screening, relative to the control population; i.e., the corrected relative risk for screening is 0.63/0.94 = 0.67.

Only intermediate outcome measures are available in the Canadian National Breast Screening Study as linkage to the National Mortality Data Base has not yet been completed. In women aged 50-59 on entry the design involves a comparison of PE alone with PE plus MA. Although the sensitivity of PE alone was lower than that of PE plus MA, the specificity of the screening process was higher, suggesting the possibility that PE alone as performed in this trial may be appropriate for screening under circumstances where high-quality MA is not available. The rate of interval cancers in the MA arm decreased as the trial proceeded, suggesting that the sensitivity of MA progressively improved. The benign to malignant ratio at biopsy was very much higher than the ratios seen in the European studies, reflecting North American practice. In women aged 40-49 on entry, an excess of stage 2 or worse cancers has been ascertained in the group allocated MA plus PE compared to those who had a single PE alone. In those aged 50-59 on entry there has been no reduction in advanced disease in those who had MA plus PE compared to those who had PE alone. Elucidation of these findings must await complete mortality data.

A study of the efficacy of breast self-examination (BSE) is under way in the USSR and GDR, under the auspices of WHO, in women aged 40-64. A total of 200,000 women in Leningrad and Moscow and 150,000 in the GDR will be entered, randomization being on a factory basis in Moscow and a policlinic basis in Leningrad. In the GDR there will be a non-randomized (quasi-experimental) comparison. The active intervention is BSE, taught on a group basis in Moscow and individually in Leningrad and the GDR. PE is given to both groups in Leningrad and the GDR but not in Moscow. Initial results from Leningrad indicate that more breast cancers have been detected in the BSE arm, with a more favourable stage and node distribution than in the control arm.

When national programmes of breast cancer screening are introduced, the opportunity to build in features permitting rigorous evaluation should be taken whenever possible. This has been accomplished in Finland by basing the date of entry into the programme on a woman's year of birth, in a way which allows the identification of comparable groups of women with different screening histories. In the first year of the programme, wide differences were seen between the counties in the false-positive rate for MA (varying from 2.3 to 6.0%) and in the ratio of screening prevalence to annual incidence (varying from 1.6 to 5.0). Compliance so far has been 88%, an extremely high figure for a public health programme.

Projections have been made of the effect of introducing mass screening for breast cancer in the Netherlands for women aged 50-70. Model-based predictions derived from data from other studies have been made for the period 1990-2017, during which an increase of 25% in crude breast cancer incidence is expected to occur because of aging of the female population. Some of the main conclusions were that a programme of MA every 2 years would prevent 17,000 deaths with a saving of 260,000 life-years; the overall cost would be 460 million Dutch guilders, i.e., 7,650 guilders per life-year saved, much lower than for heart transplants and lower than for screening for cancer of the cervix.

¹To whom correspondence and reprint requests should be sent, at the Dept. of Preventive Medicine and Biostatistics, University of Toronto, Toronto, Ontario M5S 1A8, Canada.
Screening more frequently than every 2 years leads to a rapid increase in the ratio of marginal cost to marginal improvement in mortality. Adjustment of life years saved for quality of life made only minor changes (Department of Public Health and Social Medicine, 1990).

There are certain critical requirements if a breast cancer screening programme is to succeed in a population, particularly high compliance and high quality. There has been concern over the results of some recent studies in that the degree of mortality reduction expected from earlier studies does not appear to have been achieved. These apparently disappointing results can be entirely explained on the basis of a failure to achieve early or intermediate outcome events, in particular high compliance with invitations to attend for screening. In the UK and Malmo studies, for example, because the underlying risk of breast cancer mortality was lower in the compliers than in the refusers, the mortality reduction seen with a 67 to 70% compliance was less than might have been anticipated. Nevertheless, the reduction in breast cancer mortality among women aged 50 or more who attended for screening appears to be similar in these studies to that seen in earlier studies, of the order of 35 to 40%. Whether the high compliance so far achieved in Finland will be duplicated in the emerging programmes in the Netherlands, UK and Canada is yet to be determined, but the initial compliance levels reported from the UK, an average of 67% with a range of 38% to 88%, indicate that high compliance can be achieved in some districts.

A further question to be borne in mind by those planning screening programmes is when the anticipated mortality reduction can be expected to be seen. The HIP and Swedish 2-county studies suggested mortality reduction commencing at about 3-4 years after the initiation of screening.
More recent studies suggest a delay to about 6 years after start of screening, with no clear indication of when the maximum effect could be expected. Part of the difficulty could have been the learning curve before high-sensitivity MA was achieved, though it is likely that population programmes will have the same delay.

Conclusions on breast cancer screening

Screening for breast cancer by mammography every 1 to 3 years can reduce breast cancer mortality substantially in women aged 50-70. In women under age 50 there is little evidence of a benefit, at least in the first 10 years after screening is initiated.

The cost-effectiveness of screening every 2-3 years by mammography for women aged 50-70 compares well with many other medical procedures.

The time taken for a reduction in breast cancer mortality to appear will depend on the initial quality of the screening modalities used. The level of effect in the target population will be strongly dependent on the degree of compliance and on the quality of the mammography. The effect on mortality will be reduced if the screening sensitivity is inadequate.

Physical examination, for women over 50, has lower sensitivity than mammography, and in some studies was inferior to mammography in specificity and predictive value. In programmes with high-quality mammography, physical examination may not be a cost-effective adjunct to mammography as a screening procedure.

The introduction of mass screening into a population should be planned in such a way that the initial and long-term outcome measures can be evaluated.

Non-randomized studies of the effectiveness of breast screening in which screened women are compared with women who refused screening will give an incorrect estimate of the effect of screening in the population if the non-compliers are at substantially different underlying risk from the rest of the population.
The results of studies in which this source of bias is not specifically examined need to be treated with extra caution. Study designs which specify women who attended, those who refused and those who had no opportunity to be screened may reduce this problem.

Recommendations for research

The relative benefit of different screening intervals and their cost-effectiveness.

Measures to guarantee high compliance with invitations to screening.

The effectiveness of physical examination alone, when mammography is inappropriate or unavailable for all.

The effectiveness of breast self-examination, both alone and as an adjunct to other screening tests for breast cancer.
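The correction for self-selection quoted for the UK trial in this section (refusers RR 1.13, screened women RR 0.63, corrected RR 0.67) can be retraced with simple arithmetic. The sketch below is illustrative only. It assumes that, without screening, the invited population as a whole would have had the same breast cancer mortality as the control population (RR = 1.0), and it takes a compliance of 67%, the average UK figure mentioned in this section; neither assumption is stated in the text for the Guildford analysis itself, and the function names are hypothetical.

```python
def expected_rr_attenders(rr_refusers, compliance, rr_invited=1.0):
    """RR the attenders would have had in the absence of screening.

    Solves: compliance * rr_att + (1 - compliance) * rr_refusers = rr_invited.
    rr_invited = 1.0 is an assumption (invited and control populations alike).
    """
    return (rr_invited - (1.0 - compliance) * rr_refusers) / compliance


def corrected_rr(rr_observed_attenders, rr_expected_attenders):
    """Effect of screening among attenders after removing self-selection."""
    return rr_observed_attenders / rr_expected_attenders


rr_refusers = 1.13  # refusers vs. control population (from the text)
rr_screened = 0.63  # screened women vs. control population (from the text)
compliance = 0.67   # assumed: average UK compliance quoted in this section

rr_expected = expected_rr_attenders(rr_refusers, compliance)
rr_corrected = corrected_rr(rr_screened, rr_expected)

print(f"expected RR of attenders without screening: {rr_expected:.2f}")
print(f"corrected RR for screening: {rr_corrected:.2f}")
```

Under these assumptions the sketch gives about 0.94 and 0.67, in line with the figures reported for the trial. A different assumed compliance would shift both values.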

Screening for cancer of the cervix

More is known about the natural history of cervical cancer, the protective effect of screening based on cytology, and the mechanisms through which the protection appears than for any other cancer. Data continue to accumulate, confirming our previous conclusions that screening for cancer of the cervix is effective in reducing the incidence and mortality from the disease (Hakama et al., 1985). This evidence is derived from non-experimental cohort, case-control and ecological studies. The IARC collaborative study showed that an approximately 90% protective effect can be achieved at the individual level, while experience from the Nordic countries indicates that organized programmes in such populations can achieve an 80% protective effect at the population level in the targeted age groups.

Organized programmes have so far resulted in a larger reduction in the risk of invasive cervical cancer than opportunistic screening, because they achieve higher compliance by the at-risk groups and can ensure a higher quality of the screening examination in taking the smear and in the laboratory interpretation of the smear. There is also a potential for major cost saving from organized programmes because they could prevent the overuse of services as compared to unorganized programmes. However, for this to be achieved in countries such as Canada and the United States, it will be necessary to persuade the medical profession to accept less frequent screening for many women not at increased risk, so that resources can be released to ensure that higher-risk women are brought into the programme. It has yet to be demonstrated that this is possible. It should be re-emphasized that in all programmes highest priority must be given to ensure that women at higher risk, especially at the ages of maximum incidence of cancer of the cervix, are adequately covered by screening programmes.

Screening more frequently than every 3 years provides only marginally improved protection. Even 5-year intervals between the screening rounds in Finland and Aberdeen resulted in protection as satisfactory as that achieved by more frequent screening in other countries and at far lower cost.
In addition, screening of women under 25 provides marginal extra benefit at the population level because of the infrequency of invasive cancer at young ages, while the costs are substantial because of the high prevalence of preclinical lesions, the majority of which will not progress within the next few years after detection, while many will regress. Nevertheless, in many countries decisions have been taken to start screening at younger ages because of the substantial weight given to cases of invasive cancer of the cervix at young ages. The age-specific incidence of cancer of the cervix, and the trends in recent years, should be considered in each country, therefore, to guide such decisions. For maximal cost-effectiveness screening should commence at an age only a few years before invasive disease appears at a level that is high enough to justify screening. Because the protective effect is known, it is possible to estimate the marginal benefit corresponding to any change in the age at which to start screening and make recommendations that are specific to the age-specific rates in the population.

The protective effect at ages older than 65 is largely unknown in women who have been previously screened. Because of problems in taking smears, attendance for screening and possibly a lesser protection at older ages (because of more rapid progression of preclinical lesions at older than at younger ages) it may be appropriate not to extend screening beyond the age of 65 for women who have had a number of negative smears in the previous 10 years and no positive smears. Women older than 65 who have not had a number of negative smears should continue to be screened until they have achieved such a record (at least 2 negative smears).
Decisions on the age at which to start and stop screening, and the frequency of rescreening, that depart from those recommended here are dependent on an assessment of the marginal benefits and disadvantages of such changes and the resources that are available in the country concerned. Apart from the age at which to stop screening, which can still be considered to be a research issue, quantitative data are available to facilitate such decisions.

Not all lesions that are diagnosed as neoplastic and that receive treatment would have progressed to invasive cancer. Estimates of rates of progression are strongly related to local diagnostic practices and age. The problem of overdiagnosis is particularly great if viral abnormalities on cytology or viral lesions on colposcopy are considered preneoplastic in the absence of moderate or severe dysplasia. There is evidence that the introduction of colposcopically directed therapy for such lesions in younger women has led to substantially increased costs and anxiety, without any apparent benefit on the rates of occurrence of more severe lesions.

Screening for cancer of the cervix in developing countries deserves special consideration. The resources are limited, there are other health problems with a higher priority and there may be little opportunity for diagnostic work-up and treatment. Without such facilities, screening should not be undertaken. However, if such facilities are available, consideration should be given to even one smear at about the age of 40, as the risk of cervical cancer is high and a single smear may result in substantial protection. It has been suggested that, in the absence of laboratory facilities, a programme of clinical examination with a speculum should be introduced to find invasive cancers and result in “down-staging”. There is no evidence available on the effectiveness of such an approach.
Given the costs associated with locating and examining women, a more cost-effective approach may be to combine a cervical smear with the speculum examination of all women included in such programmes. This would seem to be a high priority research issue for any country that considers adopting such a programme.

Conclusions on screening for cancer of the cervix

Screening for cancer of the cervix is effective in reducing the incidence and mortality from the disease, and is applicable as public health policy.

Almost maximal effectiveness is achieved by an organized programme with high coverage that initiates screening at the age of 25 and continues with 3- or 5-yearly screening to the age of 60. Variations from this approach should only be considered if maximal coverage has been obtained, the resources are available and the marginal cost-effectiveness of the change recommended has been evaluated.

In developing countries, high priority should be placed on a single smear at age 40, and the programme extended to that recommended above only when high coverage has been achieved and the necessary resources are available.

Recommendations for research

Methods to ensure high coverage of the at-risk population.

The benefit from screening previously screened women at ages older than 60.

The cost-effectiveness of repeat cytology versus immediate referral to colposcopy.

Screening for ovarian cancer

The incidence of ovarian cancer in Western populations is relatively high and in many the mortality from ovarian cancer exceeds that from all other gynaecological cancers combined, in large part because of the inaccessibility of the ovaries and the late stage at diagnosis of many cancers. The screening tests so far available are based on tumour markers, especially CA 125, and ultrasound. High specificity of the tests is a prerequisite because diagnostic confirmation is invasive. In pre-menopausal women, low specificity of the tests and a relatively low incidence make screening inapplicable. In post-menopausal women, acceptable levels of specificity appear to be obtained by optimally combining several tests or by applying the same test twice within a given interval. However, it seems likely that the sensitivity of the screens has been reduced in such combinations, though the extent of this reduction is not known. Further evaluation of the specificity and sensitivity of such combinations should be obtained.

Conclusions on screening for ovarian cancer

At the present time there are no data on the effect of screening on ovarian cancer mortality, and screening for ovarian cancer cannot be recommended as public health policy.

Recommendations for research

Data on intermediate outcomes (e.g., on the sensitivity and specificity of a combination of tests, the extent to which early cancers are diagnosed and the effect of screening on advanced cancer) should be obtained.

When satisfactory data on intermediate outcomes are available, well-planned randomized studies of screening for ovarian cancer with mortality as an endpoint should be supported. However, such trials, either individually or in combination, will have to be large, involving at least 150,000 post-menopausal women.
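The trade-off noted in this section, that requiring a positive result on two tests raises specificity while lowering sensitivity, can be illustrated numerically. The sketch below assumes that the two tests err independently given disease status, which may not hold in practice, and the sensitivity and specificity values are hypothetical, not taken from any ovarian cancer screening study.

```python
def serial_combination(se1, sp1, se2, sp2):
    """Combined performance when BOTH tests must be positive.

    Assumes the tests are conditionally independent given disease status.
    """
    # A case is detected only if both tests are positive.
    sensitivity = se1 * se2
    # A healthy woman is a false positive only if both tests are falsely positive.
    specificity = 1.0 - (1.0 - sp1) * (1.0 - sp2)
    return sensitivity, specificity


# Hypothetical values for a marker test followed by ultrasound.
se_marker, sp_marker = 0.80, 0.97
se_scan, sp_scan = 0.85, 0.98

se_both, sp_both = serial_combination(se_marker, sp_marker, se_scan, sp_scan)
print(f"combined sensitivity: {se_both:.2f}")
print(f"combined specificity: {sp_both:.4f}")
```

With these illustrative inputs the combined specificity exceeds that of either test, while the combined sensitivity falls below both, which is the pattern the text anticipates.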

Screening for colorectal cancer

We previously concluded that screening for colorectal cancer was not recommended as public health policy, but that data that could become available from the controlled trials then in progress might justify reconsideration of this conclusion in 5 years (Chamberlain et al., 1986). Unfortunately, this hope has not been realized. The data available from the 4 randomized trials of screening and 1 case-control study are still interim, and no data on mortality reduction following screening with a faecal occult blood test (FOBT) are yet available from them, nor from the 5th trial in New York, still to be published.

The Minnesota trial started in 1975 and recruited 46,550 volunteers who were randomized into 3 groups over a 3-year period: a biennial screen group, an annual screen group and the controls. Screening continued for 5 years, but during the subsequent 5-year follow-up screening was resumed because fewer deaths than anticipated had occurred in the control group. Compliance with screening has been high but falls with increasing age. To increase sensitivity, rehydration of all FOBTs has been performed since 1982, and currently 12% of those tested are positive. Colonoscopy is used for diagnosis and by the end of 1993 about 30% of subjects in the annual screen group will have received at least one colonoscopy. Extensive death review is performed in this trial according to pre-agreed criteria, with similar but rather small proportions of circumstances in which the review committee concluded that death was due to colorectal cancer which was not recorded on the death certificate, or in which the death certificate indicated death was due to colorectal cancer but the committee did not agree. It has been decided not to publish mortality results from this trial until a significant difference emerges or it can be concluded that a 25% reduction in mortality in the annual screen group will not be seen.
The study may be able to show whether reduction in incidence of colorectal cancer results from the screening process applied, but interpretation will be difficult because the occurrence of adenomas is not recorded in the control group. After a pilot study in 1984, the Nottingham study in the UK started in 1986 and has so far identified over 140,000 men and women aged 50-74 from general practitioners’ lists and randomly allocated them to study and control groups. The planned sample size is 160,000. An FOBT every 2 years is offered to the study group. Compliance was low in the early part of the trial but, after changing the letters of invitation so that they came from general practitioners, compliance increased and averages 53%. Two percent of the subjects screened have positive tests and the detection rate at first screen so far is 2.3 per 1,000. The sensitivity of the test has been estimated at 65%. Rehydration has not been used to avoid a large number of false positives. Nearly 6 times as many adenomas over 2 cm in diameter have been detected in the screened group as in the control group. Dukes’ stage-A cancers comprised 53% of those detected at the prevalence screen and 50% at subsequent screens compared to 13% in the control group, though so far there has been no reduction in the rate of advanced disease in the screened group. There has been a high incidence of advanced cancers in the non-responders. Because of this, the planned 23% reduction in mortality in the screened group compared to the control is now considered unrealistic and a reduction of 10 to 15% more likely, with longer follow-up planned. The trial in Gothenburg, Sweden, started in 1982 and has enrolled nearly 52,000 subjects through the general population register, randomized to a study group offered an initial FOBT with a repeat screen 16-22 months later and a control group. Compliance was 65%; 5% of those screened had false-positive tests; 2.3 cancers have been found per 1,000 persons screened. 
Sensitivity of the test was only 22% initially so rehydration was introduced. This increased the positivity from 1.9% to 6.1% and the sensitivity to 89%. Those who test positive are now asked to repeat the test with more attention to the dietary restrictions, and specificity has improved from 95% to 98%. Eleven times as many adenomas 1 cm or more in diameter have been detected in the screened group as in the control group. Dukes’ stage-A cancers comprise 50% of those detected at the prevalence screen and 31% at subsequent screens compared to 12% in the control group. However, there has so far been no reduction in the absolute number of advanced cancers in the study group compared to the controls.

The trial in Funen, Denmark, started in 1985 and 62,000 people aged 45-74 have been randomized to a study group offered an FOBT every 2 years and a control group. Compliance was 65%, the positivity rate 1% and the detection rate at the first screen 1.8 per 1,000. There has been a shift to an earlier stage distribution of screen-detected cancers, twice the number of adenomas in the study as in the control group, but a relatively high incidence of interval cancers. There is a non-significant reduction of 22% in mortality from colorectal cancer in the study compared to the control group. It has been estimated that the study will have a 70% power to demonstrate a 25% reduction in colorectal cancer mortality when all subjects have been followed for 5 years.

In the Federal Republic of Germany screening for rectal cancer by digital examination has been public health policy for those over 45 since 1971, and for colorectal cancer by FOBT since 1977. It was estimated that in 1986 14% of men and 25% of women were screened and that by then 50% of eligible subjects had been screened at least once. Evaluation of the effect of this programme is difficult because of the FRG’s privacy laws. However, a case-control study has been started in which deaths from colorectal cancer are identified by pathologists enquiring about the follow-up status of patients aged 45 to 74 diagnosed between 1979 and 1985. The screening history and 4 matched controls are then sought by the pathologist from the patient’s general practitioner. Findings are anticipated in about 2 years.

Conclusions on screening for colorectal cancer

The design of the current trials will eventually give an answer on the effect of screening on colorectal cancer mortality, but each trial alone will not be definitive and most will still take several more years. It will also take several more years of follow-up before it can be determined if removal of adenomas has an effect on cancer incidence. In the meantime, screening for colorectal cancer or its precursors is not justified as public health policy.

Recommendations for research

Evaluate the validity of new screening tests for faecal occult blood.

Conduct combined analyses of the currently ongoing FOBT trials.

Evaluate the efficacy of flexible sigmoidoscopy performed every 3 years.
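The interim figures quoted in this section for the Nottingham and Funen trials allow a rough check of how many positive FOBTs lead to a cancer diagnosis. The sketch below simply divides the cancer detection rate at the first screen by the test positivity rate. It counts cancers only (adenomas are ignored) and assumes that both rates refer to the same screening round, which the text does not state explicitly; it is an illustration, not an analysis reported by the trials, and the function name is hypothetical.

```python
def ppv_for_cancer(detection_per_1000, positivity_percent):
    """Approximate share of positive tests that lead to a cancer diagnosis."""
    detected_fraction = detection_per_1000 / 1000.0
    positive_fraction = positivity_percent / 100.0
    return detected_fraction / positive_fraction


# Figures as quoted in the text for the first screen.
trials = {
    "Nottingham": {"detection_per_1000": 2.3, "positivity_percent": 2.0},
    "Funen": {"detection_per_1000": 1.8, "positivity_percent": 1.0},
}

ppv = {name: ppv_for_cancer(**values) for name, values in trials.items()}
for name, value in ppv.items():
    print(f"{name}: about {value:.1%} of positive tests yield a cancer")
```

On these figures roughly one positive test in nine (Nottingham) or one in five to six (Funen) corresponds to a detected cancer, which gives a sense of the diagnostic work-up generated per cancer found.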

Screening for stomach cancer

Stomach cancer is one of the most common cancers in the Western Pacific, Central and South America and Eastern Europe, and the possibility that screening can reduce mortality requires careful consideration. Screening for stomach cancer for those aged 40 and over began in Japan in 1960 using barium X-ray. In Miyagi prefecture from 1960 to 1988, 2,859,000 persons were screened and 5,350 cases of stomach cancer detected. A trend towards increasing incidence of early cancer and decreasing incidence of advanced cancer was observed over this period.

Studies to evaluate the effectiveness of the Japanese screening programme have used several designs including time-trend analyses, case-control studies and a trial, currently in progress, with group randomization of a special invitation letter to encourage participation. Completed studies have suggested a benefit, but several possible biases cannot be ruled out. Recently completed time-trend analyses, for example, show a drop in mortality which cannot be explained by decreased incidence. However, lesser falls in incidence than in mortality could be explained by the effect of screening in detecting earlier disease or lesions which would not have surfaced otherwise. A recent case-control study using 367 advanced cancers as cases and 367 matched controls obtained a relative protection of 65%, but no attention was paid to the possibility that the underlying risk was different in the 2 groups.

A case-control study has also recently been initiated in Venezuela where a programme of screening using barium X-ray has been in progress in a high-risk area of the country.

Conclusions on screening for stomach cancer

There are data from Japan which suggest that stomach cancer screening can reduce mortality.
Screening programmes should continue in those regions with high stomach cancer incidence where they are already under way, but stomach cancer screening cannot be recommended in other countries as public health policy.

Recommendations for research

Where screening has not already been implemented as public health policy (i.e., outside Japan), the opportunity should be taken to investigate the effectiveness of barium X-ray screening in high-risk populations by means of a randomized, controlled trial.

Screening for melanoma of the skin

Malignant melanoma is a candidate site for screening as the disease is increasing in incidence in many populations and there are major differences in long-term survival in terms of lesion thickness. The only screening test which has been proposed is visual examination of the skin. This is relatively simple and non-invasive though rather non-specific and, if used on a large scale, such testing could involve a substantial time commitment on the part of many health professionals.

Campaigns aimed at encouraging limitation of sun exposure, use of sun-screens, self-examination and prompt response to early suspicious signs have been undertaken in many parts of the world. Some programmes also encourage professional screening by offering free skin checks. The earliest campaign was started in Queensland, Australia, where a shift has been observed in recent years to a lower proportion of thick melanomas, though the incidence of thick as well as thin melanomas continues to rise. Similar programmes have begun in other parts of Australia and in New Zealand. However, mortality and incidence rates for the whole of Australia continue to rise as well.

In 1985, the American Academy of Dermatology began a melanoma/skin cancer screening campaign which has so far screened 260,000 persons and yielded 1,644 suspected melanomas and 20,000 non-melanoma skin cancers. This programme has involved volunteer dermatologists screening self-selected individuals. In the UK, campaigns aimed at increasing awareness of melanoma and encouraging early referral to special clinics, begun in 1987, have been monitored in 9 districts over a 2.5-year period.

The sensitivity and specificity of the skin examination are unknown. Sensitivity is difficult to assess in programmes encouraging mass self-examination. However, programmes can and should monitor specificity, costs and compliance with follow-up.
It is uncertain who can best perform examinations and which is the appropriate target population. The fundamental issue is whether the programmes can reduce mortality from the disease. Only then will it be possible to determine whether the thickness of the lesion can be used as an intermediate end-point.

Controversy exists over the role of the dysplastic naevus syndrome in the natural history of, and screening for, melanoma. It is not known whether this is a true precursor lesion or a high-risk indicator. The syndrome occurs in 2-8% of the melanoma-free population, and in about one-third of melanoma patients. However, the clinical and histologic features are not generally agreed upon.

Conclusions on screening for melanoma
Screening for malignant melanoma is in an early stage of development. No evaluation studies have been completed to determine the impact of screening on mortality. Until such data are available, screening for malignant melanoma is not recommended as public health policy. Health promotion programmes advocating enhanced individual awareness may be beneficial but data are not available to confirm this.

Recommendations for research
Monitor trends in the distribution of lesion depth at diagnosis. Determine the value of using the early signs of melanoma in population programmes. Determine the value of high-risk markers, including their frequency in the population and their reproducibility. Conduct a randomized controlled trial to determine the value of screening for melanoma in reducing the mortality from the disease, if a suitable opportunity occurs. Ongoing programmes of public education and self-examination as well as of physician examination should be evaluated for reduction in mortality.

Screening for neuroblastoma

Neuroblastoma is a common malignancy in children, with most cases diagnosed after 2 years of age when survival is poorer. The disease has an unusual natural history, with a high incidence of in situ lesions, most of which spontaneously regress. Survival is best in stages I, II and IVs, and many lesions spontaneously regress from stage IVs. This raises the question of the value of finding such lesions by screening.

Pioneering work in Japan, beginning in 1974, has shown that it is possible to detect neuroblastoma cases by screening. The test has been shown to be acceptable to parents in Japan and elsewhere. Screening in Japan is done by measuring the excretion of the phenolic acids vanillylmandelic acid (VMA) and homovanillic acid (HVA) in the urine using quantitative measurement by high-performance liquid chromatography. Recent results from Japan of such testing performed at 6 months of age indicate an increase in incidence, a shift to earlier ages at diagnosis and to stage I and II disease, and a decrease in deaths from neuroblastoma since the beginning of the programme. Although these findings have been interpreted as indicating that the screening programme has been effective, it is pointed out that 2 different peaks of neuroblastoma incidence suggest that those cases detectable at 6 months may be prenatal in origin, while the later peak after 1 year may represent disease with later onset and poorer prognosis, not so readily influenced by screening.

Additional studies have recently been proposed. One is a controlled trial with sequential screening of one-year birth cohorts in Scotland and 5 regions of England. This will involve 561,000 screened children and 1,123,000 controls over 5 years, with neuroblastoma mortality as the end-point. A second controlled study was started recently in North America with screening in the Canadian province of Quebec, with other regions in North America as control areas.

Conclusions on screening for neuroblastoma
There is some suggestion from Japanese studies that screening for neuroblastoma can reduce mortality. There is a need to determine if the results from Japan can be duplicated in other countries. At present, screening for neuroblastoma in countries other than Japan cannot be recommended as public health policy.

Recommendations for research
Complete the evaluation studies in the UK and North America. Determine the optimum age for screening.

Screening for nasopharyngeal cancer

Populations at high risk for nasopharyngeal cancer include those in South Eastern China, Taiwan, Chinese migrants to Singapore and the USA, natives of Alaska, Canada and Greenland, with intermediate risk in Thailand, Vietnam, the Philippines and certain North African groups. The disease seems a good candidate for screening in high-risk populations because it is a very serious health problem in countries with a high prevalence. However, it is not known that treatment of “early” disease is effective, nor has the optimal test been identified.

Tests based on Epstein-Barr virus serology are available, and several different methods are used to perform the tests. Population screening programmes have been initiated in the Guang Xi and Guang Dong areas of the People’s Republic of China. Data are not available for complete evaluation, however, as exemplified by information from China in which prevalence screens were performed with no repeat screens. There are no data on selection of screenees, compliance or active follow-up of negatives, so no valid estimation of sensitivity and specificity is possible. An increase in the proportion of stage I and II disease from 30% to 70% has been reported, but no mortality data are available.

Conclusions on screening for nasopharyngeal cancer
While the existing screening tests show some promise, no study has been done to examine the impact of screening on mortality. Therefore, screening for nasopharyngeal cancer cannot currently be recommended as public health policy.

Recommendations for research
Standardize available tests with respect to definitions of positives and negatives. Estimate sensitivity and specificity of the tests under screening conditions.

Evaluate therapy for cancers suspected on the basis of abnormal serology. A randomized controlled trial should be conducted in unscreened high-risk areas, using mortality from the disease as the endpoint.

Screening for cancer of the prostate

Screening for prostate cancer is confronted by a number of questions including: sensitivity and specificity of the screening tests; appropriate treatment to be used for “early” disease; morbidity and mortality associated with the detection and treatment of non-progressive cancer; costs of screening; the few potential years of life saved, given the age distribution of prostate cancer patients; and the ethics of screening for the disease. Post mortem studies have shown many times more latent prostate cancers than will ever surface in life, so that uncovering and operating on these cancers would lead to great over-diagnosis and over-treatment. No data are available on the accuracy of any screening test in detecting with reasonable sensitivity and specificity those cancers that will progress to clinical disease.

Conclusion on screening for prostate cancer
Screening for prostate cancer cannot be recommended at present as public health policy. There are strong contraindications for any screening on a large scale, given the likelihood of over-treatment.

Recommendations for research
Obtain data on the sensitivity and specificity of the tests proposed on those cancers that will progress to clinical disease, the morbidity and mortality from treatment for early-stage prostate cancer and the resulting quality of life of those treated. If the above data are satisfactory, perform large-scale, randomized, controlled efficacy and effectiveness trials as a prerequisite before screening for prostate cancer is introduced as public health policy. These trials should collect data on the quality of life of those with screen-detected disease in comparison to the controls, as well as on mortality as the definitive end-point.

Issues relevant to the evaluation of screening

Privacy laws
In some countries data relevant to ongoing screening programmes are kept in a way which makes them inaccessible for evaluating the effectiveness of the programme. This is the result of attitudes towards data privacy and the confidentiality of medical information. The restrictions so caused inhibit research on the provision of optimal cost-efficient health services. It is clear that this deterrent to health service evaluation is not in the best interests of public health. Over-restrictive rules on the confidentiality of medical and other records and their linkage, where they can be shown to be inhibitory to such research, should be reconsidered.

Sensitivity/proportionate incidence
The approach normally used to evaluate the sensitivity of a screening test is heavily influenced by the extent to which non-progressive lesions, even if apparently identical histologically to cancers diagnosed clinically, are counted. This “overdiagnosis” problem is avoided if the sensitivity measure used is one minus the ratio of the incidence rate of interval cancers to the expected incidence in the absence of screening (derived from a control group or a comparable unscreened population). This ratio has also been called the proportionate incidence. Although dependent on the complete ascertainment of interval cancers and their distinction from screen-detected cancers, and on the appropriateness of the comparison group from which the expected incidence is derived, it completely avoids the problem of the definition of false negatives. Its use is recommended.

Intermediate end-points
It is accepted that reduction in mortality is the definitive end-point for evaluation of screening for cancer, while reduction in incidence may be used in addition when screening for a precursor. There is, however, a need for intermediate outcome measures that can be used before mortality or incidence data become available, to serve as an early check on the effectiveness of a programme for a cancer where previous study has shown that a favourable mortality reduction can be expected. One approach to determining such intermediate outcome events is consideration of the process of effective screening. This includes high compliance with invitations to attend screening, low interval cancer rates and reduction in the cumulative incidence of advanced disease. The last two will be accompanied by a high proportion of small invasive cancers among screen-detected cancers in the prevalent and subsequent screens.
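The sensitivity measure based on proportionate incidence, described above as one minus the ratio of the interval cancer incidence rate to the expected incidence in the absence of screening, can be written out as a short calculation. The sketch below is illustrative only: the function names and the example figures (30 interval cancers, 50,000 person-years, an expected rate of 2 per 1,000 person-years) are hypothetical and are not taken from any programme discussed at the workshop.

```python
def proportionate_incidence(interval_cancers, person_years, expected_rate):
    """Ratio of the observed interval-cancer incidence rate to the rate
    expected in the absence of screening (both per person-year)."""
    if person_years <= 0 or expected_rate <= 0:
        raise ValueError("person_years and expected_rate must be positive")
    observed_rate = interval_cancers / person_years
    return observed_rate / expected_rate


def sensitivity_from_interval_cancers(interval_cancers, person_years, expected_rate):
    """Sensitivity estimated as one minus the proportionate incidence."""
    return 1.0 - proportionate_incidence(interval_cancers, person_years, expected_rate)


# Hypothetical figures, for illustration only.
ratio = proportionate_incidence(30, 50_000, 2.0 / 1000)
sensitivity = sensitivity_from_interval_cancers(30, 50_000, 2.0 / 1000)
print(f"proportionate incidence = {ratio:.2f}, sensitivity = {sensitivity:.2f}")
```

With these assumed inputs the interval cancer rate is 30% of the expected rate, giving an estimated sensitivity of 70%. The estimate is only as good as the ascertainment of interval cancers and the choice of comparison group supplying the expected rate.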
For breast cancer screening using MA, a prevalence of invasive cancer of the order of 3 times the expected annual incidence can be expected for women in their fifties, with higher prevalence-to-incidence ratios in older women. This ratio can be affected, however, by the degree of over-diagnosis of cancers. For other cancers, this ratio will be different, depending on the average sojourn time of the asymptomatic lesions and the sensitivity of the test.

Studies of breast screening based on the Swedish 2-county trial show that the survival differences among different groups of cancers (prevalence-screen-detected, other screen-detected, interval cancers and control group cancers) can be accounted for by 3 prognostic variables: tumour size, nodal status and malignancy grade. Future breast cancer mortality in new breast-cancer screening programmes can therefore be predicted if these variables are assessed. However, to do so requires a programme with constant standards applied to determination of tumour size and malignancy grade. This means that it may be difficult to apply these measures in a programme without prior experience that confirms their validity in predicting survival. If the measures can be validated, however, greater statistical power will be achieved by using them as an end-point for decision making, rather than by basing decisions, after a much longer time interval, on mortality.

Provided the screening test has a high and relatively constant sensitivity, the total number of cancers occurring in the interval between and excluding the prevalent screen and up to and including a subsequent screening test should be approximately equal to the expected incidence in a comparable unscreened group. This has been called the “unbiased” set of cancers, since it should not suffer from length bias. Provided prognostic variables have been identified that account for survival differences by mode of detection, then trials of different screening frequencies or modalities can be based on comparison of these tumour characteristics. Trials based on such comparisons will be more powerful and achieve a result in a considerably shorter time span than trials based on mortality differences. Nevertheless, it is important that all programmes should be monitored to confirm that the expected mortality reduction is seen.

For screening for cancer at other sites, before such an approach could be used, results would have to confirm (1) that a set of prognostic factors can be identified which are sufficient to account for the survival difference between screen-detected, interval and control group cancers, and (2) that the sensitivity of the test is high. It is likely that neither condition is met by current screening tests for lung cancer.

Models
Models are being developed that should facilitate greater understanding of the natural history of screen-detected cancer and of measures of the effectiveness of screening. These, however, should be interpreted recognizing the assumptions that go into the models.

In the model developed by Walter and Stitt (1987), the magnitude of survival benefit for screen-detected cases is estimated allowing for lead time. Incidence and prevalence functions are developed based on the expected incidence of clinical disease, the distribution of the sojourn times and the screen sensitivity. The model estimates the distribution of the extra survival after the expected time of clinical detection, and it is then possible to compare the total survival distribution to that of cases not detected by screening, to examine whether improvement has occurred. A disadvantage of this model is that it does not account fully for length, nor possibly selection, bias.
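To see why survival comparisons must allow for lead time, a small simulation may help. The sketch below is not the Walter and Stitt model. It is a minimal illustration in which screening is assumed to confer no benefit at all, lead times and clinical survival times are assumed to be exponentially distributed, and each case's lead time is assumed to be known (in practice only its distribution can be estimated). The function name, parameter values and seed are hypothetical.

```python
import random


def simulate_lead_time_bias(n_cases=10_000, mean_lead_time=2.0,
                            mean_clinical_survival=5.0, seed=0):
    """Simulate screen-detected cases whose date of death is unchanged by
    screening, and compare survival measured from screen detection with
    survival measured from the time of clinical diagnosis."""
    rng = random.Random(seed)
    clinical_total = apparent_total = corrected_total = 0.0
    for _ in range(n_cases):
        lead = rng.expovariate(1.0 / mean_lead_time)              # years gained in date of diagnosis
        survival = rng.expovariate(1.0 / mean_clinical_survival)  # years from clinical diagnosis to death
        clinical_total += survival
        apparent_total += lead + survival           # survival counted from screen detection
        corrected_total += (lead + survival) - lead  # lead time removed again
    return {
        "clinical": clinical_total / n_cases,
        "apparent": apparent_total / n_cases,
        "corrected": corrected_total / n_cases,
    }


result = simulate_lead_time_bias()
print(result)
```

Under these assumptions the mean survival from screen detection exceeds the mean survival from clinical diagnosis by roughly the mean lead time, although no death has been postponed. Removing the lead time restores the clinical figure, which is the kind of adjustment a lead-time-based model has to make before any genuine benefit can be assessed.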
Another model, entitled Periodic Screening Evaluation (Baker and Chu, 1990), estimates the age-specific probability of cancer incidence in the absence of screening from the experience after screening of older individuals, applied to that of younger screenees at corresponding ages. Using data on case-fatality in the absence of screening from an unscreened control group, the model estimates the age-specific mortality from cancer given asymptomatic status at a certain age. The key assumption of the model is that the incidence of cancer does not vary by birth cohort.

Some models attempt to extrapolate various process measures to the expected final result. An example is the Stage-shift model (Connor et al., 1989), which facilitates exploratory analyses of a randomized controlled trial of screening by estimating the proportion of cancers that have been shifted by screening to an earlier stage, and the proportion that have been detected earlier in the same stage. Mortality benefits can then be estimated according to stage. The model requires, however, that screening has been completed, and that the follow-up has reached the point in time when comparable sets of cancer cases have accumulated in the study and control groups.

The Peak analysis model (data not shown) uses data from a randomized trial to determine the time period when the effect of screening on mortality reduction is maximum. The results of the trial can then be analysed restricting attention to that time period, providing a more powerful statistical test. For breast cancer screening, for example, this could mean excluding the mortality experience in the first few years after the initiation of screening. A disadvantage of this model is that the selection of the peak time period for the mortality comparison could be regarded as “data-driven” and subject to the usual problems of a post hoc analysis.
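The data-driven nature of a peak-period analysis, and one way of attaching a standard error that reflects it, can be sketched as follows. This is not the Peak analysis model itself, for which no data or details are given here. It is a minimal illustration under stated assumptions: each trial arm is a list holding, for every participant, the year of cancer death or None; the peak is the fixed-width window with the largest relative mortality reduction; and the standard error is obtained by resampling participants and repeating both the window selection and the estimate. All names and the toy data are hypothetical.

```python
import random
from collections import Counter


def deaths_by_year(arm, n_years):
    """Count cancer deaths in each follow-up year (None means no cancer death)."""
    counts = Counter(year for year in arm if year is not None)
    return [counts.get(year, 0) for year in range(n_years)]


def peak_window_reduction(study, control, n_years, width):
    """Return (start_year, reduction) for the window of the given width with
    the largest relative mortality reduction, or None if no window is usable."""
    s, c = deaths_by_year(study, n_years), deaths_by_year(control, n_years)
    best = None
    for start in range(n_years - width + 1):
        control_rate = sum(c[start:start + width]) / len(control)
        if control_rate == 0:
            continue  # reduction undefined without control deaths
        study_rate = sum(s[start:start + width]) / len(study)
        reduction = 1.0 - study_rate / control_rate
        if best is None or reduction > best[1]:
            best = (start, reduction)
    return best


def bootstrap_se(study, control, n_years, width, n_boot=200, seed=0):
    """Standard error of the peak reduction, re-selecting the peak window
    within every bootstrap replicate."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        s = rng.choices(study, k=len(study))
        c = rng.choices(control, k=len(control))
        result = peak_window_reduction(s, c, n_years, width)
        if result is not None:
            estimates.append(result[1])
    mean = sum(estimates) / len(estimates)
    variance = sum((e - mean) ** 2 for e in estimates) / (len(estimates) - 1)
    return variance ** 0.5


def make_arm(deaths_per_year, size):
    """Build a toy arm: listed deaths per year, everyone else without cancer death."""
    arm = [year for year, n in enumerate(deaths_per_year) for _ in range(n)]
    return arm + [None] * (size - len(arm))


# Toy data: 2,000 participants per arm, 10 years of follow-up.
control_arm = make_arm([10] * 10, 2000)
study_arm = make_arm([10, 10, 10, 5, 5, 5, 5, 9, 9, 9], 2000)

peak = peak_window_reduction(study_arm, control_arm, n_years=10, width=3)
se = bootstrap_se(study_arm, control_arm, n_years=10, width=3)
print(peak, se)
```

Because the window is chosen again in every replicate, the spread of the bootstrap estimates includes the variability introduced by the selection step, which a standard error computed within a single pre-selected window would ignore.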
An attempt to overcome this disadvantage is made by computing standard errors by bootstrapping the entire procedure of selecting a peak time period and analysing mortality within this restricted period.

It is clear that these, and other models already developed or under consideration, may enhance our understanding of the natural history of screen-detected lesions and the process of screening. However, they require validation with the best available data, preferably derived from randomized trials, before they can be extrapolated in ways that might guide policy decisions. As such data become available, assumption-based models need to be modified to incorporate this extra information, in order to improve the extrapolations needed to make policy decisions.

Case-control studies
Understanding of case-control studies of screening has increased substantially since we first considered this issue (Prorok et al., 1984). It is now recognized that different considerations apply to case-control studies used to evaluate screening when mortality is the preferred end-point (or when advanced disease is used as a substitute) than when incidence of disease is the end-point. For example, the screen that led to the diagnosis of a case is included in the exposure in the former circumstance, but excluded in the latter.

Case-control studies that have been undertaken on breast cancer screening have often not addressed the problem of selection bias. Recent experience in studies in Sweden and the UK, where case-control studies were performed within trials, shows that although those who refuse invitations for screening have a breast cancer incidence similar to that of controls, their breast cancer mortality experience is worse than that of controls. This means that the estimate of effect of screening in such case-control studies will show a greater effect than could be expected in the total population.
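The direction of this selection bias can be shown with simple arithmetic. The sketch below assumes that, before any screening effect, the invited group as a whole has the same breast cancer mortality as the controls, so that the baseline mortality of those who accept screening follows from the compliance rate and the mortality of refusers. It then compares attenders with refusers, which is roughly what an unadjusted case-control comparison of screened and unscreened women estimates when the disease is rare. All figures and names are hypothetical.

```python
def screening_effect_estimates(compliance, control_mortality,
                               refuser_mortality, true_rr):
    """Return (naive_rr, population_rr).

    naive_rr:      mortality of attenders relative to refusers
    population_rr: mortality of the whole invited group relative to controls
    true_rr:       assumed effect of screening among attenders
    """
    # Unscreened mortality of attenders implied by the invited group having
    # the same overall baseline as the controls.
    attender_baseline = (control_mortality
                         - (1 - compliance) * refuser_mortality) / compliance
    if attender_baseline <= 0:
        raise ValueError("inputs imply a non-positive baseline for attenders")
    attender_mortality = true_rr * attender_baseline
    naive_rr = attender_mortality / refuser_mortality
    invited_mortality = (compliance * attender_mortality
                         + (1 - compliance) * refuser_mortality)
    return naive_rr, invited_mortality / control_mortality


# Hypothetical rates per 100,000 person-years: controls 50, refusers 70;
# 80% compliance and a true 30% reduction among attenders.
naive_rr, population_rr = screening_effect_estimates(0.8, 50.0, 70.0, 0.7)
print(f"attenders vs refusers: {naive_rr:.2f}; invited vs controls: {population_rr:.3f}")
```

With these assumed inputs, the comparison of attenders with refusers suggests a 55% reduction in mortality, whereas the assumed effect among attenders is 30% and the reduction in the invited population as a whole is about 22%.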
This bias might be corrected if data relating to risk factors for prognosis of breast cancer were available for a comparable unscreened population and for the non-compliers.

For case-control studies used for evaluation of cervix cancer screening, care is required in selecting controls; thus controls for screen-detected cases, if identified as such, should be women screened at the same time, with the prior screening history used as the exposure for both cases and controls. In theory it may be easier to correct for a selection bias in this type of study, as more information is available on the determinants of incidence than on those of mortality. However, in studies where data on risk factors for the disease were available, adjusting for these had little effect.

For other cancers, caution is appropriate in using the case-control approach, as in any observational study. The design does, however, have application in considering questions such as frequency of rescreening. Nevertheless, the findings of all case-control studies should be interpreted with care before they are extrapolated in ways that could guide policy decisions.

PARTICIPANTS

J.H.M. BLOM, Rotterdam, The Netherlands.
J. CHAMBERLAIN, Sutton, Surrey, UK.
H. CUCKLE, London, UK.
A.W. CRAFT, Newcastle-upon-Tyne, UK.
T. CHURCH, Minneapolis, MN, USA.
J. CUZICK, London, UK.
A.P. DAVIES, London, UK.
V.R. DOHERTY, Glasgow, Scotland.
N.E. DAY, Cambridge, UK.
N. EINHORN, Stockholm, Sweden.
R. ELLMAN, Sutton, Surrey, UK.
M. HAKAMA, Tampere, Finland.
J.D.F. HABBEMA, Rotterdam, The Netherlands.
S. HISAMICHI, Sendai, Japan.
L. JANZON, Malmö, Sweden.
J. KEWENTER, Gothenburg, Sweden.
O. KRONBORG, Odense, Denmark.
V. KOROLTCHOUK, Geneva, Switzerland.
H.K. KOH, Boston, MA, USA.
A.B. MILLER, Toronto, Canada.
S. MOSS, Sutton, Surrey, UK.
P.C. PROROK, Bethesda, MD, USA.
D.M. PARKIN, Lyon, France.
T. SAWADA, Kyoto, Japan.
A.J. SASCO, Lyon, France.
M. THOMAS, Nottingham, UK.
L. TABAR, Falun, Sweden.
J. WAHRENDORF, Heidelberg, FRG.

ACKNOWLEDGEMENTS

The authors (members of the Core Committee of the UICC Project on Screening for Cancer) express their grateful thanks for the support of the Cancer Research Campaign, of the Imperial Cancer Research Fund and of the International Union Against Cancer (UICC) for the workshop on which this report is based. A.B.M. is supported in part by a National Health Scientist Award of Health and Welfare, Canada.

REFERENCES

BAKER, S.G. and CHU, K.C., Evaluating screening for the early detection and treatment of cancer without using a randomized control group. J. Amer. statist. Ass., 410, 321-327 (1990).
CHAMBERLAIN, J., DAY, N.E., HAKAMA, M., MILLER, A.B. and PROROK, P.C., UICC Workshop of the Project on Evaluation of Screening Programmes for Gastrointestinal Cancer. Int. J. Cancer, 37, 329-334 (1986).
CONNOR, R.J., CHU, K.C. and SMART, C.R., Stage-shift cancer screening model. J. clin. Epidemiol., 42, 1083-1095 (1989).
DAY, N.E., BAINES, C.J., CHAMBERLAIN, J., HAKAMA, M., MILLER, A.B. and PROROK, P.C., UICC project on screening for cancer: Report of the workshop on screening for breast cancer. Int. J. Cancer, 38, 303-308 (1986).
DEPARTMENT OF PUBLIC HEALTH AND SOCIAL MEDICINE, The costs and effects of screening for breast cancer. Final report. Erasmus University, Rotterdam (1990).
HAKAMA, M., CHAMBERLAIN, J., DAY, N.E., MILLER, A.B. and PROROK, P.C., Evaluation of screening programmes for gynaecological cancer. Brit. J. Cancer, 52, 669-673 (1985).
PROROK, P.C., CHAMBERLAIN, J., DAY, N.E., HAKAMA, M. and MILLER, A.B., UICC Workshop on the Evaluation of Screening Programmes for Cancer. Int. J. Cancer, 34, 1-4 (1984).
WALTER, S.D. and STITT, L.W., Evaluating the survival of cancer cases detected by screening. Stat. Med., 6, 885-900 (1987).
