Am J Hum Genet 31:680-696, 1979

A Maximum Likelihood Map of Chromosome 1

D. C. RAO,1 B. J. KEATS, J. M. LALOUEL, N. E. MORTON, AND S. YEE

SUMMARY

Thirteen loci are mapped on chromosome 1 from genetic evidence. The maximum likelihood map presented permits confirmation that Scianna (SC) and a fourteenth locus, phenylketonuria (PKU), are on chromosome 1, although the location of the latter on the PGM1-AMY segment is uncertain. Eight other controversial genetic assignments are rejected, providing a practical demonstration of the resolution which maximum likelihood theory brings to mapping.

INTRODUCTION

Tentative maps of human chromosome 1 were constructed by Cook et al. [1], Sturt [2], and Meyers et al. [3]. Later, maximum likelihood methods were developed to combine data efficiently from chiasma maps, chromosomal markers, and genetic loci, with likelihood ratio tests of hypotheses [4]. Here we construct a maximum likelihood map for chromosome 1, compare it with a nonparametric method [5], and test various hypotheses on chiasma maps, interference, and sex differences.

THEORY

Consider a linear sequence of $n$ loci $L_1, L_2, \ldots, L_n$, whose genetic map locations are $w_{1s}, w_{2s}, \ldots, w_{ns}$, where $s = 1$ for male, $s = 2$ for female, and $s = 3$ for unknown sex. When the second subscript is suppressed, we refer to males only ($w_i \equiv w_{i1}$). Some of the $n$ markers may be genetic or chromosomal polymorphisms; others are breakpoints of translocations, inversions, deletions, and insertions. Let $L_1, L_2, \ldots,$

Received December 6, 1978; revised April 20, 1979. This study is Population Genetics Laboratory paper no. 200 and was supported by grant NF 1-475 from the National Foundation and grants GM-23021 and GM 24941 from the National Institutes of Health. 1 All authors: Population Genetics Laboratory, University of Hawaii, Honolulu, HI 96822. © 1979 by the American Society of Human Genetics. 0002-9297/79/3106-0001$1.79


$L_k$ be polymorphisms, and $L_{k+1}, L_{k+2}, \ldots, L_n$ represent breakpoints ($k \le n$). A chiasma map gives the genetic map locations of the breakpoints, leaving the other $k$ locations to be estimated. Denote the map interval between $L_i$ and $L_j$ by $d_{ijs} = |w_{is} - w_{js}|$, the absolute difference between $w_{is}$ and $w_{js}$. We use a general mapping function to obtain the recombination fraction $\theta_{ijs}$ for a given map interval $d_{ijs}$ in centimorgans (cMo), and vice versa (Rao et al. [6]):

$$d = \bigl[\,p(2p-1)(1-4p)\ln(1-2\theta) + 16p(p-1)(2p-1)\tan^{-1}(2\theta) + 2p(1-p)(8p+2)\tanh^{-1}(2\theta) + 6(1-p)(1-2p)(1-4p)\theta\,\bigr]/6 . \tag{1}$$
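The following is a minimal Python sketch of how equation (1) might be evaluated and inverted, where `p` is the interference parameter appearing in the equation. It rests on two assumptions that are not stated in the text: that equation (1) returns the distance in morgans (so centimorgan values are divided by 100), and that bisection is an acceptable way to do the numerical inversion. The function names `map_distance` and `recombination_fraction` are hypothetical.

```python
import math


def map_distance(theta, p):
    """Equation (1): map distance (assumed to be in morgans) for a
    recombination fraction theta in [0, 0.5) and interference parameter p."""
    x = 2.0 * theta
    total = (
        p * (2 * p - 1) * (1 - 4 * p) * math.log(1 - x)
        + 16 * p * (p - 1) * (2 * p - 1) * math.atan(x)
        + 2 * p * (1 - p) * (8 * p + 2) * math.atanh(x)
        + 6 * (1 - p) * (1 - 2 * p) * (1 - 4 * p) * theta
    )
    return total / 6.0


def recombination_fraction(d_cmo, p, tol=1e-12):
    """Invert equation (1) by bisection; d_cmo is in centimorgans.
    Assumes the distance increases with theta for the chosen p."""
    target = d_cmo / 100.0        # assumption: equation (1) is in morgans
    lo, hi = 0.0, 0.5 - 1e-12     # stay just below theta = 0.5
    if target >= map_distance(hi, p):
        return 0.5                # interval too long to resolve below 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if map_distance(mid, p) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Male PGM1-AMY interval from table 1 (86.4 - 53.4 = 33.0 cMo), p = .351
theta_male = recombination_fraction(33.0, 0.351)
print(round(theta_male, 3))  # close to the .307 tabulated for this interval
```

Setting `p` to 1, 1/2, or 0 in `map_distance` reduces the expression to the Haldane, Kosambi, and complete-interference forms, which gives a quick check on the coefficients.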

This is a function of an unknown interference parameter $p$, where $p = 1$ in the absence of interference (Haldane's function) and $p = 0$ for complete interference. Also, $p = 1/2$ and $p = 1/4$ yield, respectively, the mapping functions of Kosambi and Carter-Falconer. Based on chiasma distributions, Rao et al. [6] estimated $p = .351$ for the human male. By inverting equation (1) numerically, we can obtain $\theta$ for a given value of $d$ (and $p$).

At this stage, we make two assumptions concerning the map locations of females and unknown sex: (1) female map locations depend on those of the males, and (2) the recombination fraction for the unknown sex is the average of the male and female values; that is, $\theta_{ij3} = (\theta_{ij1} + \theta_{ij2})/2$. Under the first assumption, we estimate the male location $w_{j1}$ of a given locus $L_j$ and obtain the female value $w_{j2}$ by linear interpolation. Consider neighboring locations on each side of $w_{j1}$: $w_{i1} < w_{j1} < w_{l1}$, where $w_{i1}$ and $w_{l1}$ are breakpoints or telomeres. Then, in each iteration, $w_{j2}$ is calculated as

$$w_{j2} = w_{i2} + (w_{l2} - w_{i2})(w_{j1} - w_{i1})/(w_{l1} - w_{i1}), \tag{2}$$

or $w_{j2} = a + R_j w_{j1}$, where $a = w_{i2} - R_j w_{i1}$ and $R_j = (w_{l2} - w_{i2})/(w_{l1} - w_{i1})$ is the ratio of the female to male segments containing $L_j$. In this way, female locations are made functions of male locations and need not be estimated separately.

For a given pair of loci $L_i$ and $L_j$, the lod score corresponding to their map locations $w_{is}$ and $w_{js}$ for the $s$th sex, denoted by $z(w_{is}, w_{js})$, is calculated as follows: the map distance $d_{ijs}$ is first converted into $\theta_{ijs}$ by inversion of equation (1), and the corresponding lod score for $\theta = \theta_{ijs}$ is obtained from our lod tables by quadratic interpolation. For unknown sex ($s = 3$), a slightly different procedure is followed: we first obtain $\theta_{ij1}$ and $\theta_{ij2}$ from $d_{ij1}$ and $d_{ij2}$, calculate $\theta_{ij3} = (\theta_{ij1} + \theta_{ij2})/2$, and then obtain the corresponding lod score from the lod table on the unknown sex for $\theta = \theta_{ij3}$. The total lod score, summed over all three sexes and all pairs of loci $L_i$, $L_j$ ($i = 1, 2, \ldots, k$; $j = 2, \ldots, n$), is

$$\sum_{j=i+1}^{n} \sum_{i=1}^{k} \sum_{s=1}^{3} z(w_{is}, w_{js}) . \tag{3}$$

Since this depends only on male locations $w_1, w_2, \ldots, w_n$, let us denote the total lod score by $z(w_1, \ldots, w_k; w_{k+1}, \ldots, w_n)$ or simply $z(w)$. Therefore, the overall log-likelihood is given by

$$\ln L = [z(w)][\ln 10] + \text{Constant} . \tag{4}$$
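As an illustration of equation (2), of the averaging rule for unknown sex, and of equation (4), here is a short Python sketch. The function names are hypothetical, and the worked example uses the male and female positions that table 1 lists for the bands 01P34 and 01P32 flanking PGM1.

```python
import math


def female_location(w_male, lo_male, hi_male, lo_female, hi_female):
    """Equation (2): interpolate a female location from the male location,
    given the male and female positions of the two flanking breakpoints."""
    ratio = (hi_female - lo_female) / (hi_male - lo_male)  # R_j in the text
    return lo_female + ratio * (w_male - lo_male)


def theta_unknown_sex(theta_male, theta_female):
    """Second assumption: unknown-sex value is the male/female average."""
    return 0.5 * (theta_male + theta_female)


def log_likelihood(total_lod):
    """Equation (4), omitting the additive constant."""
    return total_lod * math.log(10)


# PGM1 (male 53.4) lies between 01P34 (male 48.0, female 86.5)
# and 01P32 (male 60.6, female 109.1) in table 1.
pgm1_female = female_location(53.4, 48.0, 60.6, 86.5, 109.1)
print(round(pgm1_female, 1))  # 96.2
```

The interpolated value agrees with the female estimate given for PGM1 in table 1, which suggests the sketch matches the rule used there.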


The $k$ unknown locations, $w_1, w_2, \ldots, w_k$, are estimated by maximizing $\ln L$ (see APPENDIX). Let the maximum likelihood estimates be denoted by $\hat{w}_i$, and the corresponding value of the total lod score (constrained maximum) of equation (3) by $Z_c = z(\hat{w}_1, \ldots, \hat{w}_k; w_{k+1}, \ldots, w_n)$. If $Z_{u,ijs}$, or simply $Z_u$, denotes the maximum observed lod score for $L_i$, $L_j$ and $s$th sex (unconstrained maximum), then goodness of fit of the estimated map to the observed lod scores is tested by the large-sample likelihood ratio criterion [7]:

$$2(\ln 10)\Bigl(\sum_{j=i+1}^{n} \sum_{i=1}^{k} \sum_{s=1}^{3} Z_u - Z_c\Bigr), \tag{5}$$

which is distributed as a $\chi^2$ with degrees of freedom (df) $= m - k$, where $m$ = number of entries in the summation of equation (5). Specific null hypotheses can be tested in terms of these residual $\chi^2$ values.

LINKAGE MAPS

FIG. 1.-Physical and male genetic maps of chromosome 1. Panels: genetic, cotransference, nonparametric, and physical maps.

A physical map gives bands and assigned markers. Figure 1 shows the mitotic diagram [8] and locus assignments by the likely region of overlap (LRO) method [9]. Symbols follow Rao et al. [10]. Each assignment has a substantial error, indicated by
the LRO and smallest region of overlap (SRO) intervals. However, broad features have been established: for example, that PGD is distal to PGM1 on the short arm (p), and PEPC is distal on the long arm (q). These assignments are supported by cotransference frequencies [11]. The cotransference map in figure 1 was obtained by nonparametric analysis of the published data from 10 J kg$^{-1}$ irradiation, taking the complement of cotransference as the analogue of recombination [5].

A chiasma map gives frequencies of exchanges between cytological points (in cMo). Morton et al. [12] presented a chiasma map after deleting the distal chiasmata. Here we use the revised chiasma map of Keats et al. [13], which redistributes terminal chiasmata in each arm over the distal 1/3 of that chromosome arm according to a triangular distribution with terminal vertex. The female map was taken as 1.8 times the male map, based on the observed ratio of recombination [14]. These assumptions gave a higher likelihood than alternative chiasma maps.

For the genetic map inferred from meiotic recombination between loci, we have data on 13 polymorphisms, including the C-band heteromorphism at 01Q12 [13]. The highest likelihood was obtained for $p = .351$ in both sexes, as suggested by chiasma distributions in spermatogenesis [6]. These and other assumptions are tested in the next section. The best-fitting map has a total unconstrained score $Z_u = 197.69$ (representing 140 observed maximum lods by pair of loci and sex) and a total constrained score $Z_c = 176.74$ (for corresponding recombination values predicted from the map). Goodness of fit is tested by $\chi^2 = 2(\ln 10)(Z_u - Z_c) = 96.47$, with $140 - 13 = 127$ df, since 13 male locations are estimated. This remarkably good fit suggests that errors in lod scores, breakpoints, and sex-specific chiasma distributions and interference parameters are small. Inspection of table 1 reveals certain inconsistencies which further study of the human linkage map must explain.
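The goodness-of-fit arithmetic quoted above can be reproduced with a few lines of Python. This is only a sketch of the criterion in equation (5); the function name `goodness_of_fit` is hypothetical, and the inputs are the totals reported in the text.

```python
import math


def goodness_of_fit(z_unconstrained, z_constrained, n_entries, n_estimated):
    """Likelihood ratio chi-square of equation (5) and its degrees of freedom."""
    chi2 = 2.0 * math.log(10) * (z_unconstrained - z_constrained)
    df = n_entries - n_estimated
    return chi2, df


chi2, df = goodness_of_fit(197.69, 176.74, n_entries=140, n_estimated=13)
# The text reports 96.47, using the rounded factor 2 ln 10 = 4.605.
print(f"chi-square = {chi2:.2f} on {df} df")
```

Because the expected value of a chi-square variable equals its degrees of freedom, a statistic well below 127 is in line with the good fit described in the text.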
Several LRO assignments do not agree with the genetic map, which places PGD distal to ENO1 and both FUCA and UMPK distal to PGM1, for which physical assignments are inconsistent (Williams et al. [9]). When the order of PGD and ENO1 is reversed, the conditionally best map gives $Z_c = 176.66$, which is negligibly different from the best map. PEPC is more proximal on the genetic map than its consistent physical assignment would suggest. However, the conditional best map confining PEPC to the SRO gives $Z_c = 176.37$, and hence the maximum likelihood estimate of PEPC is not significantly inconsistent with the SRO [$\chi^2_1 = 4.605(176.74 - 176.37) = 1.70$]. Also, consistent with our estimate, Cook et al. [15] suggested that PEPC is more proximal on the long arm than the hybrid studies claim. When FUCA is confined to its SRO, the best map gives $Z_c = 173.42$, making the estimate significantly inconsistent with its physical assignment ($\chi^2_1 = 15.29$). On one hand, the chiasma map is approximate and may be substantially erroneous around FUCA. When the genetic map becomes more precise through accumulation of data, it may be used to construct an accurate chiasma map [16]. On the other hand, the SRO for FUCA is based on a single segment, which may have been incorrectly interpreted [17]. At present, the order of closely linked loci is uncertain. As pointed out above, forcing ENO1 to be distal to PGD gives a high lod score (176.66). Similarly, reversing the order of EL1, RH, and FUCA gives $Z_c = 176.65$, and making UMPK distal to SC

TABLE 1
GENETIC MAP LOCATIONS (cMo)

                        MALE                            FEMALE
MARKER      Estimate    SE      Interval          Estimate    SE
01PTER          0       ...     ...                   0       ...
PGD             1.9     1.9     00.0-52.8             3.5     3.4
ENO1            3.9     4.3     00.0-35.2             7.1     7.7
01P36          17.7     ...     00.0-35.2            31.9     ...
EL1            19.0     1.6     ...                  34.2     2.8
RH             21.3     1.4     00.0-68.0            38.3     2.5
FUCA           23.0     1.9     40.7-68.0            41.4     3.4
SC             36.6     2.4     ...                  65.9     4.4
UMPK           37.6     2.0     54.7-68.0            67.7     3.5
01P34          48.0     ...     40.7-52.8            86.5     ...
PGM1           53.4     1.7     42.2-63.3            96.2     3.1
01P32          60.6     ...     54.7-68.0           109.1     ...
AMY            86.4     3.8     ...                 155.7     6.9
01P13          96.0     ...     94.4-96.8           172.9     ...
01CEN          97.0     ...     97.0-97.4           174.7     ...
01Q12         103.2     3.4     98.3-109.1          185.8     6.1
CAE1          108.2     1.7     ...                 194.8     3.0
FY            108.2     1.6     ...                 194.8     2.9
01Q21         112.5     ...     109.1-116.1         202.6     ...
01Q23         123.5     ...     119.2-127.2         222.4     ...
01Q25         131.9     ...     129.4-134.3         237.6     ...
PEPC          137.5     6.4     161.7-179.5         247.6    11.5
01Q32         147.5     ...     142.2-154.6         265.6     ...
01Q42         170.3     ...     161.7-179.5         306.7     ...
01Q43         183.0     ...     179.5-186.6         329.6     ...
01QTER        196.0     ...     ...                 353.0     ...

NOTE.-Standard errors are given for locations estimated from recombination data. For physically assigned genetic loci, intervals are SRO if available; otherwise, LRO. For cytological markers, point estimates refer to midbands and intervals to margins [13].

gives $Z_c = 176.67$. None of these orders is excluded. Additional data will increase the power of this approach in resolving different orders. However, the following order of loosely linked segments seems established: PGD-ENO1, EL1-RH-FUCA, SC-UMPK, PGM1, AMY, Q12, FY-CAE1, PEPC. Standard errors relate to the map for the 13 estimated loci conditional on other information and assumptions (see APPENDIX) and so may be underestimated. It is noteworthy that PEPC has the largest standard error. The cotransference map confirms genetic assignments for seven loci (fig. 1). The nonparametric map uses only observed recombination values and their estimated amounts of information, excluding breakpoints and PEPC, which is remote from markers on the p arm and has too many missing data to be mappable by the nonparametric method. For other loci, the nonparametric map agrees precisely in order with the maximum likelihood map. There are few disagreements with tentative genetic maps based on part of the present material, although Cook et al. [1] assigned FY to the p arm, and Sturt [2] reported three orders with slightly greater likelihood than PGD-RH-PGM1-FY-PEPC. Detailed examination of the genetic data reveals a good fit (table 2). For each pair of loci and sex of the informative parent, the unconstrained maximum likelihood estimate


TABLE 2 GOODNESS OF FIT OF THE GENETIC MAP First locus

Second locus

AMY ..

FY

AMY ..

PGD

AMY ..

PGM1

AMY ..

RH

CAEI ..

FY

CAEI ..

RH

ELI

..

FUCA

ELI

..

FY

ELI

..

PGD

ELI

..

PGMI

ELI

..

RH

ENOI .. ENOI .. ENOI ..

FY PGD RH

FUCA ..

FY

FUCA ..

PEPC

FUCA ..

PGD

FUCA ..

PGMI

FUCA ..

RH

FY

..

I0101

FY FY

..

I0102 PEPC

..

FY. FY .

PGD .

PGMP

FY .

.

RH

FY .

.

SC

Sex

oc

M F M F M F M F M F M F M F M F M F M F M F F F M F M F M F M F M F M F M F M M F M F U

10.36 -0.75 -0.00 -0.00

M F U M F U M F U

3.12 0.24 0.04 0.02 2.47 1.79 -0.01 -0.00 2.90 1.33 0.03 0.00 2.82 0.06 0.64 0.05 18.38 21.10 -0.00 2.28 2.30 -0.79 0.01 0.00 -0.00 0.00 0.79 -0.18 0.26 -0.07 8.81 3.72 1.20 0.07 0.07 0.12 -0.05 -0.00 -0.00 -0.00 1.98 -0.00 0.06 -0.06 -0.00 -0.00 -0.01 -0.00 -0.00

,.

Vs

.190 .472 .500 .500 .293 .419 .470 .466 0 0

.500 .500 0

.095 .211 .352 .077 .378 .188 .420 .025 .040 .500 .013 .013 .500 .330 .013 .500 .300 .010 .500 .216 .500 .025 .030 0 .252 0 .329 .500 .500 .500 .500 .443 .500 .372 .500 .500 .500 .500 .500 .500

e .212 .350 .485 .500 .307 .443 .458 .498 .000 .000 .487 .500 .040 .072 .489 .500 .169 .290 .317 .450 .023 .041 .500 .036 .171 .294 .486 .500 .497 .500 .206 .342 .287 .428 .017 .031 .013 .102 .393 .278 .420 .496 .500 .498 .428 .493 .461 .487 .500 .493 .470 .499 .485

testinqg

v.,

0.85 3.63 0.00 0.00 0.15 0.19 0.03 0.56 0.00 0.00 0.03 0.00 0.76 0.08 1.64 0.20 1.29 0.54 0.85 0.04 0.04 0.00 0.00 0.14 3.70 3.62 0.13 1.24 0.00 0.09 1.84 0.82 0.24 0.32 0.15 0.00 0.04 0.72 2.46 0.13 0.24 0.00 0.00 0.00 0.72 0.01 0.24 0.26 0.00 0.01 0.04 0.00 0.01


TABLE 2 (continued)

First locus

FY . I0102 .

-Second locus

Sex*

z

Hu

k

.

UMPK

-0.02 -0.00

.500 .500 .194

.

PGM1

M F U M F F M F M F M F M F U M F U M F U F M F U M F M F U M F U M F U M F M F M F M F F M F M F M M F M F

.469 .499 .484 .013 .023 .445 .499 .500 .485 .500 .498 .500 .415 .491 .453 .190 .321 .255 .319 .451 .385 .456 .300 .438 .369 .166 .286 .156 .271 .214 .151 .263 .207 .161 .278 .220 .010 .018 .371 .477 .475 .499 .398 .486 .129 .351 .469 .256 .400 .490 .327 .456 .036 .065

10102 .. PEPC .. PEPC .

RH PGD .

PEPC .. PGD .

PGM1 RH

.

PGM1

PGD ..

RH

PGD ..

SC

PGD .. PGM1 .

.

UMPK RH

PGM1... SC PGMI

...

UMPK

RH ..

SC

RH ..

UMPK

SC .

.

OIP13 ..

PGMI

0OP13 ..

01P32 .

UMPK

RH .

01P32 ..

FY

O1P32 ..

PGM1 RH

01P34 ..

RH

01P36 . OIP36 ..

.

FY PGM1

OIP36............. RH

0.04 0.60 0.30 -0.03 -0.00 -0.00 -0.00 0.00 0.02 0.00 0.13 -0.01 0.05 19.75 4.94 -0.23 0.19 0.11 -0.02 -0.00 17.65 1.65 1.98 1.38 -0.26 0.05 0.34 -0.12 10.33 -0.67 0.59 3.88 2.88 0.29 1.42 1.13 0.05 -0.01 -0.00 0.00

0.22 0.03 0.88 -0.17 -0.09 0.04 0.03 0.01 -0.04 -0.01 1.05

0.90

0 0

.500 .500 .500 .500 .468 .280 .296 .411 .500 .331 .200 .312 .500 .395 .384 .500 .500 .308 .411 .295 .143 .500 .294 .316 .500 .131 .432 .025 .205 .267 .243 .013 .013 0

.500 .500 0

.203 0 0 .500 .500 0

.332 .352 .500 .500 .001 .001

X,!etesting vs. 0,

0.09 0.00 1.74 0.02 0.02 0.13 0.00 0.00 0.00 0.04 3.65 3.01 0.00 0.03 0.27 0.27 0.12 1.05 0.91 0.26 0.08 0.00 0.35 1.51 2.84 0.11 1.19 1.26 0.25 0.53 0.38 3.29 2.12 1.46 0.16 0.06 0.00 0.00 1.12 0.03 0.01 1.16 0.89 2.69 1.04 0.79 0.42 1.26 0.05 0.28 0.18 0.04 0.07 0.36


TABLE 2 (continued) First locus

Secondlocus

Sex

OlQ12 ... OlQ12 ...

AMY

M F

FUCA

M

-OQ12

FY

...

OlQ12 ... OlQ12 ... OlQ12 ...

PEPC PGD

PGM1

OlQ12 ... OlQ12 ...

RH

O1Q21 ...

FY

O1Q23 .. O1Q25 ..

FY FY

O1Q25 ..

O1Q43

PEPC

FY O1Q32 .. PGM1 O1Q32 .. RH O1Q32 .. O1Q42 ..F Y RH 01Q42 ..

O1Q43 ..F Y PGM1 O1Q43 ..

O1Q43 ..

RH

F M F M M F M F U M F M M F. F M F M F M F M M M M M F M

F

Totals:

X2 testing

VS. 0, ie

3.16 0.14 -0.00 0.00 3.96 12.20 0.32 -0.00 0.00 -0.33 -0.00 -0.02 -0.03 -0.00 -0.00 0.93 -0.37 0.33 -0.64 0.06 0.86 -0.22 0.08 -0.00 0.00 -0.00 0.00 -0.00 -0.03 0.00 0.00 0.00 0.00

.064 .361 .500 .300

.1Z3 .061 0

.500 .264 .500

.500 .500 .500 .500 .500 .001 .500 0

.500 0 0 .437 0

.500 0^ .500 0 .500' .500 .253 .500 0 .385

.165 .285 .481 .500 .050 .090 .317 .494 .500 .408 .489 .448 .483 .500 .481 .043 .078 .264 .230 .372 .056 .100 .351 .500 .499 .451 .500 .500 .475 .499 .500 .500 .500

2.69 0.61 0.00 0.09 4.18 0.65 2.69 0.01 0.42 1.50 0.00 0.08 0.14 0.00 0.01 0.18 1.69 1.74 2.97 0.46 0.20 0.99 1.01 0.00 0.76 0.02 2.77 0.00 0.15 1.03 0.00 3.20 0.58

$Z_c$ = 176.74. Goodness of fit $\chi^2$ = 96.47 (df = 140 - 13 = 127). Heterogeneity $\chi^2$ = 147.04† (df = 171).

* M = male, F = female, U = unknown sex.
† Significance level = pe' 212, where Pe is the nominal (tabular) significance level. See Rao et al. [10] for details.

of recombination is $\theta_u$, and the value predicted from the map is $\theta_e$, with corresponding lod scores $Z_u$ and $Z_c$. Goodness of fit of $\theta_e$ is tested by $\chi^2 = 2(\ln 10)(Z_u - Z_c)$, which may be treated approximately as having 1 df, since the number of parameters estimated (13) is small relative to the total entries in table 2. The sum of these individual $\chi^2$ values has $140 - 13 = 127$ df.

Tests of Hypotheses

Summarizing genetic evidence for the 13 markers to be on chromosome 1, the conventional significance level $z > 3$ is satisfied for all except PEPC, where the


recombinational evidence is weak ($Z_c = 1.05$). However, somatic cell studies leave no doubt that this locus is in or near 01Q42. Discrepancies from predicted sex-specific recombination values, over all other markers and breakpoints at their assigned locations, are tested by $\chi^2$. Agreement is good.

To determine if maps differ between sexes by more than a scalar, separate maps were obtained for males, females, and unknown sex. The total of the three constrained maximum lod scores is 182.27. Even though the three sex-specific maps are slightly different, the variation is not significant since the large-sample test yields $\chi^2 = 4.605(182.27 - 176.74) = 25.47$, which has 19 df (total number of parameters estimated in the three separate maps $- 13$).

To test adequacy of chiasma maps, the data were split into two parts according to whether or not a breakpoint was involved. Separate maps gave a total constrained maximum lod score of 177.02. Discrepancy between these two maps is not significant since $\chi^2 = 4.605(177.02 - 176.74) = 1.29$. Therefore, genetic and chiasma maps are in satisfactory agreement.

Other loci [13] were tested for linkage to chromosome 1 by finding the location which maximized the likelihood (table 3). No lod score approaches the conventional level $z > 3$, including dominant retinitis pigmentosa (RP1), auriculo-osteodysplasia (AOD), pseudocholinesterase-1 (E1), transferrin (TF), and Dombrock blood group (DO), each of which suggests linkage to particular loci on chromosome 1. The data on elliptocytosis, Rhesus-unlinked (EL2) include one pedigree not differentiated from EL1. When this pedigree is deleted, EL2 is suggestive of tight linkage to FY ($\hat{w} = 108.2$, $z = 2.06$), as reported by Keats [18].

As a final attempt to falsify the map, we varied the female chiasma distribution, genetic length, and sex-specific interference parameter $p$. At first we assumed $p = .351$.
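The comparison of the three sex-specific maps with the common map reported above can be checked with a short Python sketch. The tail probability uses the Wilson-Hilferty normal approximation, which is an assumption introduced here so that only the standard library is needed; the paper does not say how significance was assessed. The function names are hypothetical.

```python
import math


def lr_chi2(z_general, z_restricted):
    """Likelihood ratio chi-square from two total lod scores."""
    return 2.0 * math.log(10) * (z_general - z_restricted)


def chi2_upper_tail_approx(chi2, df):
    """Approximate P(X >= chi2) for a chi-square variable with df degrees
    of freedom, using the Wilson-Hilferty cube-root normal approximation."""
    if chi2 <= 0:
        return 1.0
    mean = 1.0 - 2.0 / (9.0 * df)
    sd = math.sqrt(2.0 / (9.0 * df))
    z = ((chi2 / df) ** (1.0 / 3.0) - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Separate male, female, and unknown-sex maps (182.27) vs. the common map (176.74)
chi2_sex = lr_chi2(182.27, 176.74)
p_sex = chi2_upper_tail_approx(chi2_sex, df=19)
print(f"chi-square = {chi2_sex:.2f}, approximate P = {p_sex:.2f}")
```

Under this approximation the probability is well above .05, which is consistent with the statement that the sex-specific maps do not differ significantly.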
Both a uniform distribution and Drosophila distribution [16] for female chiasmata require large genetic length, and the lod score is smaller than for our best map. Proportionality to the chiasma map of the human male gives a maximum lod score when the female length is 1.8 times the male length. Having explored the chiasma map of the female, we varied the interference parameter $p$ between two extreme values (.125 and .5) for each sex. The combination $p = .125$ for male and $p = .5$ for female yields a lod score marginally greater than our map ($Z_c = 176.82$ compared to $Z_c = 176.74$), requiring considerably greater genetic length for the female. It is possible that interference is greater in males and less in females than we supposed, but until additional data become available, we suggest $p = .351$ for both sexes.

HETEROGENEITY

Of the 55 tests for heterogeneity among sources within sex in table 2, only one is significant (female recombination between FY and PGM1, $\chi^2_7 = 15.08$, PO = .02). Seven of these studies are homogeneous ($\chi^2_6 = 5.21$), but give a recombination fraction significantly greater than the data of the Rh Laboratory in Winnipeg. Although little weight can be attached to one significant result among many tests, we are led to examine heterogeneity among the four major contributors of data for chromosome 1: the Galton Laboratory in London, Rh Laboratory in Winnipeg,


TABLE 3 LOD SCORES FOR LINKAGE Locus AG

TO

-0.00

JK

ALB

.0.59

JR

AN

.0.03

K

zi 0.44

.............................

.............................

..............................

KOA .0.80 KS BP .0.41

0.68 0.02

AOD .1.75KM KO AU .0.50 BCNS

1

BLocus

c

..............................

CHROMOSOME

0.00

0.33 0.08

..............................

.............................

LDC

0.73

0.17 0.01 C3 ............................. -0.01 LP 0.71 0.43 C5 LW .............................. 0.30 CAE2 ............................. 0.21 MNS -2.58 CAT .............................. 0.06 MS ............................. 0.15 CF 0.13 NA .............................. 0.55 0.16 CO ............................. NB -0.00 CU ............................... 0.43 NS 0.21 DHDR ............................ 0.39 0.02 DI ............................. 0.37ORO P ............................. -1.36 DIAl 1.05 PH -0.00 DYS .............................. -0.00 PI .............................. 0.01 0.02 E2................................ P11 .............................. 0.39 EBS2 ............................. 0.55 BY

.............................

-0.00 LE

.............................

.............................

............................

.............................

..............................

...................

EM

.............................

FR GALT

.............................

GC GE

.............................. ..............................

GM

..............................

GN .............................. GPT .............................. HA5 ............................. HA9

HC

0.02 -0.00

.............................. .............................. .............................

0.10

RMI

.............................

.............................

-0.00

-0.00

TR

VEL 0.93 WB 0.20 WR -0.00

HHG

-0.00 .0.04

0.01

-0.00

SPD

.............................

0.01

0.02 0.80

SPH1

SW .............................. 0.07 SYN ............................. 0.02 0.00TDO

HCH

ISF

PTC ............................. RD .............................

1.21RHM

0.12 -0.00

-0.00

HBA HBB

0.16

PR -0.00

EBS3

..................

.............................

0.25 1.47 -0.00

-0.00

0.71

.............................

-0.00

.............................

-0.00

WS1 ............................. YE .............................. YT ..............................

0.07 0.68 0.52

NOTE. -Assigned loci and the controversial assignments of table 6 are omitted.

University of Rochester (Weitkamp), and Universities of Indiana and Oregon (Merritt). We partitioned all data on chromosome 1 according to these four major sources and tested each source against the rest for heterogeneity. Best maps were obtained for each source and its complement. Heterogeneity between source $i$ and its complement is tested by $\chi^2 = 4.605(Z_{ic} + Z_{jc} - 176.74)$ on $n_i + n_j - 13$ df, where $Z_{ic}$ = maximum constrained lod score for source $i$, $Z_{jc}$ = the same for the complement of source $i$, and $n_i$ and $n_j$ are the numbers of parameters estimated from source $i$ and its complement, respectively. Results are given in table 4. It is impressive that no source shows significant heterogeneity from its complement. Finally, heterogeneity among all sources is tested by $\chi^2 = 4.605(68.08 + 29.71 + 19.34 + 21.60 + 41.46 - 176.74) = 15.89$, which is not significant on $9 + 7 + 4 + 5 + 7 - 13 = 19$ df, where 41.46 is the $Z_c$ for the remainder of data on chromosome 1

TABLE 4
HETEROGENEITY AMONG MAJOR SOURCES

                              MAXIMUM LOD ($Z_c$)     ESTIMATED PARAMETERS      HETEROGENEITY
SOURCE                        Source    Complement    Source    Complement    $\chi^2$   df     P
Galton Laboratory, London     68.08     109.28          9         10           2.86      6    >.80
Rh Laboratory, Winnipeg       29.71     148.97          7         11           8.93      5    >.10
Rochester (Weitkamp)          19.34     157.40          4         13           0.00      4    >.99
Indiana/Oregon (Merritt)      21.60     155.97          5         13           3.82      5    >.55
Others                        41.46     ...             7         ...          ...      ...   ...

(denoted by "others" in table 4). We therefore conclude that, whatever reservations one laboratory may have about others, the data for chromosome 1 used in this paper are reasonably homogeneous.

PROVISIONAL ASSIGNMENT OF PKU TO THE PGM1-AMY SEGMENT

Berg and Saugstad [19] suggested that the locus for PKU may be linked to PGM1. When RH and FY are included, a proximal location is suggested ($\hat{w} = 55.7$, $Z = 1.22$). Recently, close linkage to AMY has been proposed [20], based on carrier tests which are known to be fallible. The pooled estimate ($\hat{w} = 86.4$, $Z_c = 4.92$) strongly favors linkage to chromosome 1, but the data are significantly heterogeneous (table 5). Perhaps there was a tendency to minimize recombinants in classifying ambiguous carrier tests. When PKU is classified as recessive, the overall evidence goes down considerably ($\hat{w} = 86.1$, $Z_c = 3.20$). In any case, PKU is provisionally assigned to the PGM1-AMY segment, but its exact location is in doubt. If established, linkage may contribute to carrier identification for genetic counseling.

TABLE 5
GOODNESS OF FIT OF PKU TO CHROMOSOME 1

Other locus   Sex    $Z_c$    $Z_u$    $\theta_e$   $\theta_u$   $\chi^2$ testing $\theta_u$ vs. $\theta_e$
AMY           M      1.51     1.51     .001         0             .00
              F      2.41     2.41     .001         0             .01
              U       .30      .30     .001         0             .00
FY            U       .05      .05     .282         .280          .00
PGM1          M       .53     1.57     .307         0            4.81
              F       .06      .97     .443         0            4.19
              U      -.04     0        .375         .5            .19
RH            M       .04      .14     .458         .249          .45
              F       .01     1.43     .498         0            6.51
              U       .05      .09     .478         .438          .17
Totals               4.92     8.47     ...          ...         16.33


DISCUSSION

The fifth edition of McKusick's catalog [21] lists 10 questionable loci on chromosome 1. Of these, we confirm linkage for SC, whose constrained lod score is 10.25 to RH and 3.94 to other loci on chromosome 1. Evidence of linkage for RP1 is suggestive, but does not approach significance. The other eight loci give no evidence of linkage to the linear map of chromosome 1 (table 6). This demonstrates the clarity that maximum likelihood theory brings to gene mapping in man. Once such maps are available for all chromosomes, the simple lod score tests on pairs of loci will be absorbed in the larger theory, which provides a linear map as well as the most powerful test for linkage.

TABLE 6
CONTROVERSIAL GENETIC ASSIGNMENTS TO CHROMOSOME 1

Locus   Name                              McKusick code   McKusick interpretation*   $Z_c$    Our interpretation
ANR2    Aniridia, type II Baltimore       10620           L                           0       No evidence of linkage
AT3     Antithrombin III                  10730           L                           0.56    No evidence of linkage
DO      Dombrock blood group              11060           L                           0       No evidence of linkage
E1      Pseudocholinesterase-1            17740           L                           0       No evidence of linkage
TF      Transferrin                       19000           L                          -.21     No evidence of linkage
LU      Lutheran blood group              11120           ?                           0       No evidence of linkage
SE      Secretion of ABH                  18210           ?                          -.62     No evidence of linkage
DM      Myotonic dystrophy                16090           ?                           0.13    No evidence of linkage
RP1     Retinitis pigmentosa              18010           I                           1.31    Nonsignificant evidence of linkage
EL2     Elliptocytosis, Rhesus-unlinked   13060           ...                         2.08    Nonsignificant evidence of linkage
SC      Scianna blood group               11175           P                          14.19    Linkage confirmed
PKU     Phenylketonuria                   26160           ...                         4.92    Linkage confirmed

* L = limbo, ? = questionable, I = inconsistent, P = provisional.

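APPENDIX

MAXIMUM LIKELIHOOD ESTIMATION OF LINKAGE MAPS BY NEWTON-RAPHSON TECHNIQUE

The log-likelihood of equation (4) in the text may be rewritten as

$$\ln L = \sum_{m=l+1}^{n} \sum_{l=1}^{k} \sum_{s=1}^{3} z(w_{ls}, w_{ms})\,(\ln 10) . \tag{A-1}$$

For parameters estimated ($w_i$, $i = 1, 2, \ldots, k$), the scores are derived as follows:

$$U_i = \frac{\partial \ln L}{\partial w_i} = \sum_{s=1}^{3} U_{is}\,(\ln 10), \qquad i = 1, \ldots, k,$$

where

$$U_{is} = \frac{\partial}{\partial w_i} \sum_{l} z(w_{is}, w_{ls}) = \sum_{l=1}^{n} \frac{\partial z(w_{is}, w_{ls})}{\partial w_i}, \tag{A-2}$$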


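since for pairs not involving $L_i$, the derivatives vanish. The terms in the last summation are separately evaluated for each sex.

Male

$$\frac{\partial z(w_{i1}, w_{l1})}{\partial w_i} = \left(\frac{\partial z}{\partial d_{il1}}\right)\left(\frac{\partial d_{il1}}{\partial w_i}\right) = \left(\frac{\partial z}{\partial d_{il1}}\right)\Delta_{il}, \tag{A-3}$$

where

$$\Delta_{il} = \begin{cases} -1 & \text{if } w_i < w_l \\ +1 & \text{if } w_i > w_l \end{cases}$$

and $\partial z/\partial d$ is evaluated algebraically as follows. Standard lod score tables are maintained in terms of $\theta$ and its corresponding $d$. We take the unconstrained maximum point $(d^u_{il1}, z^u_{il1})$ and another point in the standard lod table which is closest to the estimated value $d_{il1}$, say $(d, z)$, and fit the constrained quadratic $z = z^u_{il1} + b(d_{il1} - d^u_{il1})^2$, where $b = (z - z^u_{il1})/(d - d^u_{il1})^2$. This yields $\partial z/\partial d = 2b(d_{il1} - d^u_{il1})$.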
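The constrained quadratic described above can be sketched in Python as follows. The function name `quadratic_lod` and the numerical values are hypothetical and serve only to illustrate the fit; the tabulated point must differ from the unconstrained maximum so that `b` is defined.

```python
def quadratic_lod(d_est, d_max, z_max, d_tab, z_tab):
    """Return (z, dz/dd) at d_est from the quadratic z = z_max + b (d - d_max)^2,
    whose vertex is the unconstrained maximum (d_max, z_max) and which passes
    through a second tabulated point (d_tab, z_tab)."""
    b = (z_tab - z_max) / (d_tab - d_max) ** 2
    z_est = z_max + b * (d_est - d_max) ** 2
    slope = 2.0 * b * (d_est - d_max)
    return z_est, slope


# Hypothetical lod curve: maximum 3.0 at 20 cMo, tabulated 2.5 at 25 cMo,
# evaluated at a current map estimate of 24 cMo.
z_est, slope = quadratic_lod(d_est=24.0, d_max=20.0, z_max=3.0, d_tab=25.0, z_tab=2.5)
print(round(z_est, 2), round(slope, 2))  # 2.68 -0.16
```

The negative slope says that moving the estimated distance back toward the unconstrained maximum would raise this pair's lod score, which is the direction the score in (A-3) pushes the map.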

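Female

$$\frac{\partial z(w_{i2}, w_{l2})}{\partial w_i} = \left(\frac{\partial z}{\partial d_{il2}}\right)\left(\frac{\partial d_{il2}}{\partial d_{il1}}\right)\left(\frac{\partial d_{il1}}{\partial w_i}\right) = \left(\frac{\partial z}{\partial d_{il2}}\right)\left(\frac{\partial d_{il2}}{\partial d_{il1}}\right)\Delta_{il}, \tag{A-4}$$

where $\partial z/\partial d$ is evaluated as above, except that this is done on the female lod table, and $\partial d_{il2}/\partial d_{il1}$ is evaluated numerically. Assuming without loss of generality that $w_i > w_l$,

$$d_{il1} = w_{i1} - w_{l1},$$

$$d_{il1} + \Delta = w_{i1} - w_{l1} + \Delta = (w_{i1} + \Delta/2) - (w_{l1} - \Delta/2),$$

$$d_{il2} = w_{i2} - w_{l2} = (a + R_i w_{i1}) - (b + R_l w_{l1}),$$

which follows from equation (2) in the text. Therefore, the value of $d_{il2}$, when $d_{il1}$ is incremented by $\Delta$, is

$$d^*_{il2} = [a + R_i(w_{i1} + \Delta/2)] - [b + R_l(w_{l1} - \Delta/2)],$$

so that

$$\frac{\partial d_{il2}}{\partial d_{il1}} = \frac{d^*_{il2} - d_{il2}}{\Delta} = \frac{(R_i + R_l)\Delta/2}{\Delta} = \frac{R_i + R_l}{2},$$

which holds if both $w_i$ and $w_l$ are estimated. If one of them, say $w_l$, is a breakpoint, this reduces to $R_i$.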
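A small finite-difference check of this result, written as a Python sketch. The intercepts, ratios, and locations are hypothetical values chosen for illustration, and the function names are not from the paper.

```python
def female_distance(w_i, w_l, a, r_i, b, r_l):
    """d_il2 = (a + R_i w_i1) - (b + R_l w_l1), from equation (2)."""
    return (a + r_i * w_i) - (b + r_l * w_l)


def distance_ratio(w_i, w_l, a, r_i, b, r_l, delta=1e-3, l_is_breakpoint=False):
    """Finite-difference change in d_il2 per unit change in d_il1."""
    base = female_distance(w_i, w_l, a, r_i, b, r_l)
    if l_is_breakpoint:
        # w_l is fixed, so the whole increment is applied to w_i
        moved = female_distance(w_i + delta, w_l, a, r_i, b, r_l)
    else:
        # both loci are estimated: split the increment symmetrically
        moved = female_distance(w_i + delta / 2, w_l - delta / 2, a, r_i, b, r_l)
    return (moved - base) / delta


# Hypothetical segment ratios R_i = 1.9 and R_l = 1.7
both = distance_ratio(50.0, 20.0, a=2.0, r_i=1.9, b=1.0, r_l=1.7)
fixed = distance_ratio(50.0, 20.0, a=2.0, r_i=1.9, b=1.0, r_l=1.7, l_is_breakpoint=True)
print(round(both, 6), round(fixed, 6))  # 1.8 1.9
```

The two printed values correspond to $(R_i + R_l)/2$ when both loci are estimated and to $R_i$ when $w_l$ is a breakpoint.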

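Unknown Sex

$$\frac{\partial z(w_{i3}, w_{l3})}{\partial w_i} = \left(\frac{\partial z}{\partial \theta_{il3}}\right)\left(\frac{\partial \theta_{il3}}{\partial \theta_{il1}}\right)\left(\frac{\partial \theta_{il1}}{\partial d_{il1}}\right)\left(\frac{\partial d_{il1}}{\partial w_i}\right), \tag{A-5}$$

where $\partial z/\partial \theta$ is evaluated by fitting a quadratic function in terms of $\theta$ on the lod table for unknown sex, $\partial \theta/\partial d$ is numerically evaluated using equation (1) in the text, and

$$\frac{\partial \theta_{il3}}{\partial \theta_{il1}} = \frac{\partial}{\partial \theta_{il1}}\left[\frac{\theta_{il1} + \theta_{il2}}{2}\right] = \frac{1}{2}\left[1 + \frac{\partial \theta_{il2}}{\partial \theta_{il1}}\right] = \frac{1}{2}\left[1 + \left(\frac{\partial \theta_{il2}}{\partial d_{il2}}\right)\left(\frac{\partial d_{il2}}{\partial d_{il1}}\right)\left(\frac{\partial d_{il1}}{\partial \theta_{il1}}\right)\right] = R_{il}, \text{ say (same as } R_{li}\text{)}.$$

EMPIRICAL INFORMATION MATRIX

Consider two parameters being estimated, $w_i$ and $w_j$. By definition, elements of the empirical information matrix are given by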

6

6

(u-) ((1 uKijj

[ii

{6wj

- s=1

s=1

Uis u (In

10)

6zw8,w18](n

0

3

-

*

Kijs,

(A-6)

say.

s=1

Male

Male contribution corresponds to the term $s = 1$ in the above summation. For $i \neq j$ we get,

using (A-3), E __ A111j in 10

6W 6wj [1i dill1

K13-= -

_6 [

[l

1=1

A j )+1

l 6(

=[ 62z

(6d,11 ~

zA1j. (In 10)

Ai

+

6&2z2 6d,,

Aj

(In

10),

and therefore,

_____

82

(In 10) where

82z/Sdj12

I

1

Z

8dill 8djll

A1.1 +

2ZAj Ai

8dj 2

is evaluated in terms of the quadratic function fitted previously to evaluate

Wz/Ud, and we approximate ) (in 10) =

d, 6d

Kdill d jll-product of the two u-scores (

z

i

n nlO

6z

in

10)

or

K,,1 = (in 10)2

( 6 1=i

6dnY1

)\( 1

6)(z

8A-2 Ail Ail + d,2 (In 10)

(A-7)

Since

Aj

=-4j, when i = j, it is easy to see that

Kjjj= K.-.

=

(In 10) I ()

(A-8)

1=1

since $\Delta_j^2 = 1$.

Female

The female contribution to the information matrix is

Kij2

Ri)](i10 [n~ (z )(I, + R, )( 8d2 )( 2d) Aj] (In 10)

FI

I+

[

(

1=1

[ i (S (

= [j

2 )( Sd,8d,,, )(djll )4 2 (2Z ) , MM

82i)2

+ Au(R2 R)( Y.

( 10)forAji Ai

k (d,12

(90)

(In 2

2

2'

(In

10)0

1=1

8dj2 )(i 2) R

(

(In

IO) fori ij

(A-9)

and for $i = j$, this simplifies to

[H

Y.

dU2)

I )] (In 0)

2

(A-10)

1=1

Unknown Sex

The term $s = 3$ in (A-6) contributes

8 Swj

K,,3 - -

[

/

8z

4 ( [*i Ri 13

S6,l d

/i

1=1

8zj

+

[jE

=

+

+

Rij 4i (Sezi) (In 10) (S ')( Sz )\Ri.Rj41

.2 ()SOj1 \2 ( Rij2 ( 8dj, )

(

Sdi l

82 Z

8

o32) (In 10) fori $1j

)( 86,11 )] (in 10)2

(A-11)

and for $i = j$,

K

[3 1=1Y.

(

2) R2 (

) ]

(In 10)

(A-12)


In summary, the maximum likelihood scores are given in equations (A-2) to (A-12). These scores are valid if each "marker" is either a polymorphism or one breakpoint (translocations, deletions, and insertions). If "$w_l$" in the above equations represents an inversion, which has two breakpoints, the scores are evaluated as follows.

Inversion with a polymorphism: Consider one pair of "markers", $L_i$ = polymorphism and $L_l$ = inversion and hence not estimated. On the genetic map, let the two breakpoints of the inversion ($L_l$) be denoted by $a_{ls}$ and $b_{ls}$ ($a_{ls} < b_{ls}$) for the $s$th sex. Values are taken from the chiasma map of Keats et al. [13] for $a_{ls}$ and $b_{ls}$ ($s = 1, 2$). Depending on where $w_{is}$ falls on the genetic map, three cases arise. For each case, however, the following simplifications hold:

$$\frac{\partial d_{il2}}{\partial d_{il1}} = R_i,$$

since only $w_i$ is an estimated parameter and $L_l$ corresponds to an inversion with fixed breakpoints. Also,

$$R_{il} = \frac{1}{2}\left[1 + R_i\left(\frac{\partial\theta_{il2}}{\partial d_{il2}}\right)\left(\frac{\partial d_{il1}}{\partial\theta_{il1}}\right)\right].$$

Case (1): $w_{is} < a_{ls}$ (note that if $w_{i1} < a_{l1}$, then also $w_{i2} < a_{l2}$). In this case, we treat $L_l$ also as a locus whose genetic location, though not estimated, is taken as $w_{ls}$ = the nearest breakpoint ($a_{ls}$), and hence $d_{ils} = a_{ls} - w_{is}$.

Case (2): $w_{is} > b_{ls}$. This case goes through just as above, where $w_{ls}$ = the higher breakpoint ($b_{ls}$).

Case (3): $a_{ls} < w_{is} < b_{ls}$. For this case, the apparent recombination frequency ($\theta$) between the locus ($L_i$) and the inversion is calculated first following Morton [4], and is then converted into a map distance for each sex. The usual formulae apply to this case also, except that $\partial d_{ils}/\partial w_i$ is not necessarily equal to $\pm 1$ as before. This quantity is now evaluated numerically. Since only one gene location is estimated ($w_i$), the second terms in (A-7), (A-9), and (A-11) do not arise.

If the row vector $u' = (u_1, \ldots, u_k)$ denotes the score vector, summed over sexes, and $K$ denotes the information matrix, also summed over sexes, Newton-Raphson iteration yields $w'_{n+1} = w'_n + u'K^{-1}$, where $w_n$ are the gene locations in the $n$th iteration, and $w_{n+1}$ are the improved values in the next iteration. Iterations continue until $\ln L$ fails to increase over successive iterations. MAP, a FORTRAN program written for the HARRIS 125 and incorporating these methods, also uses Fletcher-Powell logic, as outlined by Lawley and Maxwell [22], when Newton-Raphson breaks down. In addition, positive definiteness of the $K$-matrix in the Newton-Raphson method is guaranteed by using Greenstadt's method [23], as briefly discussed in [24].
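The Newton-Raphson update and its stopping rule can be sketched as follows. This is an illustrative outline under stated assumptions, not the MAP program: the log likelihood, score, and information matrix are supplied as hypothetical callables, a toy concave function stands in for the lod-based likelihood, and the Fletcher-Powell fallback and Greenstadt's adjustment are omitted.

```python
def solve(K, u):
    """Solve K x = u by Gaussian elimination with partial pivoting."""
    n = len(u)
    A = [[float(v) for v in K[r]] + [float(u[r])] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n + 1):
                A[r][k] -= f * A[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(A[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (A[r][n] - s) / A[r][r]
    return x


def newton_raphson(w, log_lik, score, info, max_iter=50):
    """Iterate w <- w + K^{-1} u until ln L fails to increase."""
    best = log_lik(w)
    for _ in range(max_iter):
        step = solve(info(w), score(w))  # K symmetric, so K^{-1} u equals (u' K^{-1})'
        w_new = [wi + si for wi, si in zip(w, step)]
        new = log_lik(w_new)
        if new <= best:  # stopping rule: no further increase in ln L
            break
        w, best = w_new, new
    return w


# Toy concave "log likelihood" in two locations (hypothetical, for illustration).
def toy_log_lik(w):
    return -(w[0] - 10) ** 2 - 2 * (w[1] - 30) ** 2 - 0.5 * (w[1] - w[0] - 20) ** 2

def toy_score(w):
    gap = w[1] - w[0] - 20
    return [-2 * (w[0] - 10) + gap, -4 * (w[1] - 30) - gap]

def toy_info(w):
    # Negative Hessian of toy_log_lik; constant because the toy function is quadratic.
    return [[3.0, -1.0], [-1.0, 5.0]]

w_hat = newton_raphson([0.0, 0.0], toy_log_lik, toy_score, toy_info)
```

For the toy function the maximum lies at locations 10 and 30, and since the function is exactly quadratic a single step reaches it; the second pass then fails to increase $\ln L$ and the iteration stops.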

REFERENCES

1. COOK PJL, ROBSON EB, BUCKTON KE, JACOBS PA, POLANI PE: Segregation of genetic markers in families with chromosomal polymorphisms and structural rearrangements involving chromosome 1. Ann Hum Genet 37:261-274, 1974
2. STURT E: The use of lod scores for the determination of the order of loci on a chromosome. Ann Hum Genet 39:255-260, 1975
3. MEYERS DA, MERRITT AD, CONNEALLY PM ET AL.: Linkage group I: a statistically significant locus order from family studies, in Winnipeg Conference (1977) Fourth International Workshop on Human Gene Mapping, New York, The National Foundation, 1978, pp 396-400
4. MORTON NE: Analysis of crossingover in man, in Winnipeg Conference (1977) Fourth International Workshop on Human Gene Mapping, New York, The National Foundation, 1978, pp 15-36
5. LALOUEL JM: Linkage mapping from pair-wise recombination data. Heredity (Lond) 38:61-77, 1977
6. RAO DC, MORTON NE, LINDSTEN J, HULTÉN M, YEE S: A mapping function for man. Hum Hered 27:99-104, 1977
7. MORTON NE: The detection and estimation of linkage between the genes for elliptocytosis and the Rh blood type. Am J Hum Genet 8:80-96, 1956
8. Paris Conference Supplement, Birth Defects: Orig Art Ser XI(9), New York, The National Foundation, 1975
9. WILLIAMS WR, MORTON NE, LEW R, YEE S: The likely region of overlap (LRO) method for physical assignment of loci. Hum Genet 47:297-304, 1979
10. RAO DC, KEATS BJB, MORTON NE, YEE S, LEW R: Variability in human linkage data. Am J Hum Genet 30:516-529, 1978
11. GOSS SJ, HARRIS H: Gene transfer by means of cell fusion. II. The mapping of 8 loci on human chromosome 1 by statistical analysis of gene assortment in somatic cell hybrids. J Cell Sci 25:39-58, 1977
12. MORTON NE, RAO DC, LINDSTEN J, HULTÉN M, YEE S: A chiasma map of man. Hum Hered 27:38-51, 1977
13. KEATS BJB, MORTON NE, RAO DC, WILLIAMS WR: A Source Book for Linkage in Man. Baltimore, Johns Hopkins University Press, 1979
14. WEITKAMP LR: Population differences in meiotic recombination frequency between loci on chromosome 1, in New Haven Conference (1973) First International Workshop on Human Gene Mapping, Birth Defects: Orig Art Ser X(3), New York, The National Foundation, 1974, pp 179-182
15. COOK PJL, FEAR CN, POVEY S: A 1q translocation family segregating for peptidase C, in Winnipeg Conference (1977) Fourth International Workshop on Human Gene Mapping, New York, The National Foundation, 1978, pp 375-377
16. MORTON NE, RAO DC, YEE S: An inferred chiasma map of Drosophila melanogaster. Heredity (Lond) 37:405-411, 1976
17. BURGERHOUT WG, LEUPE-DE SMIT S, JONGSMA APM: The regional map of chromosome 1 of man (abstr.). Presented at Fourth International Workshop on Human Gene Mapping, Winnipeg, 1977
18. KEATS BJB: Another elliptocytosis locus on chromosome 1? Hum Genet. In press, 1979
19. BERG K, SAUGSTAD LF: A linkage study of phenylketonuria. Clin Genet 6:147-152, 1974
20. KAMARYT J, MRSKOS A, PODHRADSKA O ET AL.: PKU locus: genetic linkage with human amylase (AMY) loci and assignment to linkage group I. Hum Genet 43:205-210, 1978
21. McKUSICK VA: Mendelian inheritance in man, in Catalogs of Autosomal Dominant, Autosomal Recessive, and X-linked Phenotypes, 5th ed. Baltimore, Johns Hopkins University Press, 1978
22. LAWLEY DN, MAXWELL AE: Factor Analysis as a Statistical Method. London, Butterworths, 1963
23. GREENSTADT JL: On the relative efficiencies of gradient methods. Math Comput 21:360-367, 1967
24. RAO DC, MORTON NE, GULBRANDSEN CL, RHOADS GG, KAGAN A, YEE S: Cultural and biological determinants of lipoprotein concentrations. Ann Hum Genet 42:467-477, 1979
