Am. J. Hum. Genet. 48:819-823, 1991

Invited Editorial: Research on DNA Typing Catching Up with Courtroom Application

Eric S. Lander
Whitehead Institute for Biomedical Research and Department of Biology, Massachusetts Institute of Technology, Cambridge

Received February 19, 1991. Address for correspondence and reprints: Eric S. Lander, Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142.

© 1991 by The American Society of Human Genetics. All rights reserved. 0002-9297/91/4805-0001$02.00

Throughout the century, basic advances in the study of human variability have consistently provided new tools for the analysis of evidence samples in criminal cases and paternity disputes. Forensic genetic typing began with the ABO blood group but soon expanded to other blood groups, serum proteins, red blood cell enzymes, and (in applications, principally paternity testing, in which fresh samples can be compared) histocompatibility antigens. With the recognition (Botstein et al. 1980) that the human genome is replete with DNA sequence polymorphisms such as RFLPs, it was only a small leap to imagine that DNA could, in principle, provide the ultimate identifier.

Forensic DNA typing emerged as a practical reality with the important papers of Jeffreys et al. (1985a, 1985b), reporting the discovery of hypervariable "minisatellite" regions in the human genome having common core sequences. According to the authors, a multilocus probe detecting many such regions simultaneously could provide "individual-specific DNA fingerprints." In a linguistic stroke, the term "DNA fingerprinting" changed the entire paradigm used in forensics. Previously, forensic scientists had used genetic markers to "include" or "exclude" a suspect from the individuals who might have been the source of an evidence sample. With the prospect of individual-specific DNA patterns, forensic scientists could hope to achieve absolute identification. The field has been in ferment ever since.

The British Home Office and Scotland Yard soon began using multilocus probes for immigration cases and for criminal investigations in 1985, including the famous Colin Pitchfork case described in Joseph Wambaugh's The Blooding. In the United States, private companies began to market DNA-typing services to police (using hypervariable single-locus VNTR probes, which many considered technically more robust for use with dirty evidence samples), and criminal cases based on DNA results began to come to trial by 1988. The Federal Bureau of Investigation (FBI) proceeded more cautiously but still felt enough pressure to open its own DNA-typing laboratory by 1989.

The desire to introduce the powerful new system as rapidly as possible was understandable, but an unfortunate consequence was the lack of careful scientific papers describing and justifying the specific DNA-typing procedures being used, including determining the detailed properties of the probes on forensic samples; defining the matching rule for deciding when two DNA samples should be considered indistinguishable; specifying the statistical procedure for assigning weight to the occurrence of a match; establishing the population data bases upon which these conclusions were based; and presenting the results of rigorous proficiency studies revealing the laboratory error rate. On the one hand, private companies were initially reluctant to disclose much of this information, feeling that it should be kept a trade secret. On the other hand, not realizing the many differences that make forensic applications technically more demanding, most observers, including the present author, initially reasoned that DNA forensics was so closely analogous to DNA diagnostics that careful characterization was not a pressing need (Lander 1989). Finally, some felt that high-quality scientific journals would not be willing to publish detailed characterizations of the procedures.

It is not surprising that, although DNA evidence proved generally successful, problems began to crop up. After a 15-wk pretrial hearing in New York v.
Castro, a court threw out DNA evidence of a match because the testing laboratory's procedures and interpretations failed to follow generally accepted standards (Lander 1989). One week into a hearing in Maine v. McLeod, a prosecutor suddenly dropped all criminal charges against an accused child rapist, when it became clear that the testing laboratory could not defend its procedures for correcting DNA patterns for bandshifting during electrophoresis (Norman 1989). More recently, some courts have grown skittish about population genetic calculations reported by some testing laboratories, calculations in which the reported odds against a match occurring by chance have reached such dizzying heights as 739,000,000,000:1. In January 1991, the Supreme Judicial Court of Massachusetts in Commonwealth v. Curnin rejected DNA evidence, pending a resolution of the population genetic issues. Within weeks, trial courts reached the same conclusion in Arizona v. Despain and Illinois v. Fleming.

Against this background, it is encouraging to see the recent appearance of a number of serious scientific studies, including papers in this issue of the Journal from the FBI (Budowle et al. 1991) and from Jeffreys et al. (1991), aiming to provide firmer foundations for procedures used in forensic DNA typing. While they necessarily leave much unresolved, the papers represent an excellent first step.

Laboratory Interpretation: What's a Match?

A key component of any DNA-typing procedure is the matching rule, the precise criterion for determining whether two DNA patterns match. A formal matching rule is hardly necessary in DNA diagnostics, which involves a simple binary decision about which of a parent's two alleles have been passed to a child, but it is essential in DNA forensics, in which one is comparing unknown samples on the basis of hypervariable loci having dozens of closely spaced alternative alleles. A valid matching rule must be based on empirical studies of reproducibility that use typical evidence samples. It is remarkable that many forensic testing laboratories initially failed to perform such measurements. One private testing laboratory used no reproducibility studies at all; it simply based its assumed resolution capacity on the distance between consecutive resolvable fragments in its molecular-weight ladder, thereby ignoring all sources of experimental variability other than the final step of selecting the center of a band on an autoradiogram. Another private laboratory based its matching rule on studies with fresh blood samples, neglecting the artifacts that arise with degraded and contaminated evidence samples.

Budowle and his colleagues at the FBI report in this issue of the Journal (Budowle et al. 1991) the first properly designed studies to support a matching rule. Criticizing previous efforts ("it is not possible to define the resolution capacity of a system based solely on the minimum physical distance separating two resolvable fragments" and "special attention should be paid . . . to determine the measurement imprecision for forensic applications"), Budowle et al. examine a cleverly chosen set of matched samples. For 111 rape cases, they compared fresh DNA samples from the victim's blood with forensic DNA samples from the victim's vaginal epithelial cells isolated from the police rape kit. They find that corresponding fragment measurements can differ by up to 5% of molecular weight, with an SD of about 1.5%. These figures are nearly threefold larger than those reported by some private laboratories. Although the level of reproducibility need not be identical among laboratories, owing to differences in methodology, the disparity suggests that some laboratories may have underestimated the true degree of variation inherent in working with forensic samples. Budowle et al. also find that forensic samples show a statistically significant tendency to migrate faster in the gel than do the fresh blood samples, possibly because of degradation. Their data also indicate that the variability has somewhat wider tails than does a normal distribution, which cautions against some statistically elegant proposals to construct matching criteria based on likelihood ratios by assuming a normal distribution.

Studies such as that by Budowle et al. are especially important because the FBI plans to create a national computerized data bank of DNA patterns from convicted felons, just as is done for ordinary fingerprints. Accurate characterization of variability is essential.
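
To make the idea of a matching rule concrete, the following is a minimal Python sketch, not the FBI's actual procedure. It assumes a rule of the form "two fragments match if their measured sizes differ by no more than a fixed fraction of their mean size," with the 5% figure quoted above as the default window. The function names, the symmetric form of the window, and the fragment sizes are all illustrative assumptions.

```python
def bands_match(size_a, size_b, tolerance=0.05):
    """Return True if two fragment sizes (in base pairs) differ by no more
    than `tolerance`, expressed as a fraction of their mean size."""
    mean_size = (size_a + size_b) / 2.0
    return abs(size_a - size_b) <= tolerance * mean_size


def patterns_match(pattern_a, pattern_b, tolerance=0.05):
    """Compare two single-locus patterns, each a list of fragment sizes.

    Patterns match only if they have the same number of bands and every
    band pairs up, in size order, within the tolerance window."""
    if len(pattern_a) != len(pattern_b):
        return False
    pairs = zip(sorted(pattern_a), sorted(pattern_b))
    return all(bands_match(a, b, tolerance) for a, b in pairs)


# Hypothetical measurements of the same two-band pattern from a fresh
# sample and from a degraded evidence sample.
fresh = [2200, 4400]
evidence = [2150, 4310]

wide_window = patterns_match(fresh, evidence, tolerance=0.05)
narrow_window = patterns_match(fresh, evidence, tolerance=0.017)
print(wide_window, narrow_window)
```

With these hypothetical sizes the two patterns match under a 5% window but not under a window about a third as wide. A laboratory that underestimates its measurement variability could in this way declare a non-match between two samples from the same person.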
To provide the foundation for such a data bank, the present work will need to be supplemented with published studies of secular variation within individual laboratories (e.g., measurements of the same samples taken a year apart to study "genetic-typing drift") and laboratory-to-laboratory variation among groups using the FBI system.

One important and still unresolved issue concerns the interpretation of apparent bandshifts exceeding the tolerance of the usual matching rule, which can occur because of sample contamination, degradation, or gross concentration differences. It is generally agreed that the solution lies in the use of nonpolymorphic fragments as "internal" molecular-weight standards, but there are no studies addressing the proper calculation to correct for a bandshift or the number of monomorphic bands needed to achieve adequate accuracy. The FBI currently terms such cases "inconclusive," although some private laboratories are more venturesome.

Statistical Interpretation of Single-Locus Probes

If two samples show matching DNA patterns for the loci studied, the second key step is to estimate the probability that the match might have occurred by chance in the population. As courts have recognized, evidence of a match is meaningless if one does not know the approximate population frequency of the DNA pattern. In a panmictic population, genotype frequencies can be computed by multiplying the individual allele frequencies ("the multiplication rule"), because each allele represents a statistically independent event. Unfortunately, some private testing laboratories initially applied the multiplication rule without regard to whether the requirements for panmixis were satisfied, ignoring the fact that the rule may give incorrect results if applied to populations having genetically differentiated subgroups.

Population substructure causes statistical dependence which cannot necessarily be ignored: if blond hair, blue eyes, and fair skin are each found at 10% frequency in the population, one cannot conclude that the proportion of blond-haired, blue-eyed, fair-skinned individuals in the population is 1/1,000. Genes on different chromosomes can show correlation due to population substructure, notwithstanding Mendel's law of independent assortment: for example, individuals heterozygous for mutations at the Tay-Sachs disease locus on chromosome 15 are also more likely to be heterozygous for mutations at the Gaucher disease locus on chromosome 1, since mutations at both loci are more common among descendants of Ashkenazi Jews. Whether there is significant subpopulation differentiation for a particular locus (and whether it might significantly affect the multiplication rule) is an empirical question requiring population studies.

Originally, forensic laboratories focused on comparing broad groups (Caucasians, blacks, and Hispanics, which is not even a meaningful genetic classification) rather than on narrower ethnic subgroups.
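
The blond-hair example can be worked through numerically. The Python sketch below assumes a hypothetical population made of two subgroups, in which each of three traits is independent within a subgroup but has very different frequencies in the two subgroups. The subgroup shares and trait frequencies are invented for illustration, chosen so that each trait has a pooled frequency of 10%.

```python
def pooled_trait_frequency(subgroups):
    """Frequency of a single trait in the pooled population.

    `subgroups` is a list of (population_share, trait_frequency) pairs."""
    return sum(share * freq for share, freq in subgroups)


def pooled_joint_frequency(subgroups, n_traits):
    """Frequency of carrying all `n_traits` traits in the pooled population,
    assuming the traits are independent within each subgroup."""
    return sum(share * freq ** n_traits for share, freq in subgroups)


# Hypothetical numbers: 10% of people belong to a subgroup in which each
# trait has frequency 0.82; in the remaining 90% each trait has frequency 0.02.
subgroups = [(0.10, 0.82), (0.90, 0.02)]

single = pooled_trait_frequency(subgroups)      # pooled frequency of each trait
naive = single ** 3                             # multiplication rule on pooled data
actual = pooled_joint_frequency(subgroups, 3)   # joint frequency with substructure

print(f"pooled frequency of each trait: {single:.3f}")
print(f"multiplication rule estimate:   {naive:.6f}")
print(f"actual joint frequency:         {actual:.6f}")
```

Under these assumed numbers each trait appears in 10% of the pooled population, yet about 5.5% of individuals carry all three, roughly 55 times the 1/1,000 given by the multiplication rule. The size of the discrepancy depends entirely on how differentiated the subgroups are, which is why it has to be measured rather than assumed.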
The choice of broad racial groups seemed to be based on a notion that there was more genetic variability among races than within each race, although population geneticists showed decades ago that the reverse was true (e.g., see Lewontin 1972). Statistical tests were used to try to detect population substructure within the racial subgroups by looking for deviations from either Hardy-Weinberg equilibrium or linkage equilibrium, but the results are virtually meaningless because the tests have such low statistical power to detect substructure even if it is present. The task is made especially difficult by the large number of alleles and by the measurement error inherent in hypervariable genetic systems. The issue has recently attracted the interest of noted population geneticists such as Richard Lewontin and Dan Hartl, who have called for more detailed population studies.

Budowle et al. (1991) wade into the population genetic controversy, but with less success than attaches to their experimental contributions. After announcing that "at present, it is not possible to assess whether, for the alleles at a particular VNTR locus analyzed by Southern blotting, a population sample is in Hardy-Weinberg equilibrium," the authors argue that "a reasonable, empirical assumption of random association of alleles can be made." In other words, the result cannot be proved but should be assumed. They offer three justifications, none convincing:
(1) "People are unaware of their VNTR genetic composition, and their VNTR genotype does not enter into their decision to have offspring. Therefore, the algebraic approaches put forth in the Hardy-Weinberg rule can be applied" (p. 851). While genetic composition may not be a direct basis for choice, it is correlated with ethnicity, religion, and geography, which are powerful bases for marriage choices.
(2) Population samples within a racial group showed similar frequencies in studies in Texas and Florida, suggesting that there cannot be much population heterogeneity (p. 852). One might analogously conclude that blond hair, blue eyes, and fair skin are not correlated because each trait shows similar frequencies in Florida and Texas; examining average frequencies in mixed populations sheds no light on substructure.
(3) The fact that the FBI calculates allele frequencies in a conservative way (by aggregating alleles covering ranges of about 8% of molecular weight, a proportion which is somewhat larger than that of the 5% matching rule) should offset any bias arising from the use of the multiplication rule (p. 851). One cannot compensate for a bias without knowing how large it is.

As most observers agree, the right way to settle the question of population heterogeneity is to sample ethnically distinct populations and to observe the actual degree of genetic differentiation. It is encouraging that Budowle et al. report that such studies are underway.

Statistical Interpretation: Multilocus Probes

While the statistical interpretation of single-locus probes remains unresolved, the paper by Jeffreys et al. (1991) in this issue of the Journal goes a long way toward resolving a number of important issues concerning the use of multilocus probes. As the authors candidly state, "the limited data published to date ... do not address a number of points of concern. These include the degree of independence of the DNA fingerprints detected by [different probes], the validity of the assumption of band independence in DNA fingerprints, the potential problems of individual variation in band number and of relatively invariant subpopulations, and the effect of minisatellite mutation on parentage testing." In a monumental study, 1,702 paternity cases were examined by using the multilocus minisatellite probes 33.6 and 33.15, with each revealing 8-33 bands/sample. All told, nearly 200,000 bands were recorded. By studying the proportion of band sharing among related and unrelated individuals, the authors demonstrate well-separated thresholds for declaring paternity (>40% band sharing) and nonpaternity (20% band sharing) from their speculations based on extrapolating the Poisson distribution to the extreme (i.e., that the probability of a match occurring at random may be
