J. theor. Biol. (1979) 77, 505-512

Reliability in the Genetic Code†

S. ALVARADO, R. FIGUEROA, A. SEPÚLVEDA, M. A. SOTO AND J. TOHA

Biofísica, Departamento de Física y Departamento de Matemáticas, Facultad de Ciencias Físicas y Matemáticas, Universidad de Chile, Chile

(Received 10 April 1978, and in revised form 5 October 1978)

Base sequences of the φX174 and MS2 virus genomes and of some mRNAs (coat protein of fd virus, rabbit β-globin, rat growth hormone and human chorionic somatomammotropin) show a preferential use of some amino-acid codons. Based on this observation, the reliability of three non-degenerate codes is analyzed. All of them display higher reliability than the standing genetic code, especially one formed by a set of non-directly related codons. The absence of this type of code in Nature is discussed in terms of a balance between reliability and mutability of the genetic information, able to preserve species and maintain evolution.

1. Introduction

Reliability and a high amount of genetic information in living organisms are attained by using all 64 available triplets of the degenerate code. Nevertheless, a preferential utilization of some codons over the rest has been found (Sanger et al., 1977; Min Jou, Haegeman, Ysebaert & Fiers, 1972; Fiers et al., 1975; Fiers et al., 1976; Efstratiadis, Kafatos & Maniatis, 1977; Seeburg, Shine, Martial, Baxter & Goodman, 1977; Shine, Seeburg, Martial, Baxter & Goodman, 1977; Sugimoto, Sugisaki, Okamoto & Takanami, 1977). This redundant code affords the high fidelity of the genetic message, but it could eventually evolve into a non-degenerate one formed by selected codons, in such a way that each amino acid would be coded preferentially by only one triplet of a set in which, in most cases, more than one base change would be necessary to generate the triplet corresponding to a different amino acid. In this paper we discuss that eventual evolution in terms of the reliability of some non-degenerate codes, including one in which, in all cases, at least two base changes per triplet are necessary to generate a codon with a different meaning. In this theoretical code, formed by 16 selected triplets, the 48 excluded codons can be considered as missense or used (with some degree of ambiguity) as equivalent to the nearest translatable ones.

† This work was partially supported by OEA and Servicio de Desarrollo Científico y Creación Artística, Universidad de Chile.

0022-5193/79/080505+08 $02.00/0 © 1979 Academic Press Inc. (London) Ltd.


2. Methods

Four different codes were analyzed by evaluating the following parameters:

(a) The average probability of error (or mutation) induced after one base change, considering in this calculation all the possible changes in the first, second and third bases of the triplet.
(b) The average experimental probability of error found after a random change in only one base of every triplet, maintaining the rest of the codons unvaried.
(c) The total number of codon pairs (codons of different amino acids) that differ by one, two or three of their bases.

In all cases, a computer BASIC programme was run.
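Parameter (c) is the simplest of the three to state exactly. The following is a minimal Python sketch of it for a non-degenerate code, written as a dictionary from triplet to meaning; the names `hamming`, `pair_distance_counts` and `CODE_II` are illustrative only, since the original BASIC programme is not given. Applied to the triplets of Table 2 it returns the Code II pair counts reported under Results.

```python
from itertools import combinations


def hamming(t1: str, t2: str) -> int:
    """Number of positions (0 to 3) at which two triplets differ."""
    return sum(a != b for a, b in zip(t1, t2))


def pair_distance_counts(code: dict) -> tuple:
    """Parameter (c): numbers of pairs of codons with different meanings
    that differ by exactly one, two or three bases."""
    counts = {1: 0, 2: 0, 3: 0}
    for (c1, m1), (c2, m2) in combinations(code.items(), 2):
        if m1 != m2:  # synonymous pairs are not mutations
            counts[hamming(c1, c2)] += 1
    return counts[1], counts[2], counts[3]


# Code II, one triplet per meaning (Table 2); "Ter" is the termination triplet.
CODE_II = {
    "UUC": "Phe", "UCC": "Ser", "UAU": "Tyr", "UAG": "Ter", "UGU": "Cys",
    "UGG": "Trp", "CUG": "Leu", "CCU": "Pro", "CAC": "His", "CAG": "Gln",
    "CGU": "Arg", "AUC": "Ile", "ACC": "Thr", "AAC": "Asn", "AUG": "Met",
    "AAA": "Lys", "GUG": "Val", "GCU": "Ala", "GAC": "Asp", "GAG": "Glu",
    "GGU": "Gly",
}

print(pair_distance_counts(CODE_II))  # (31, 90, 89)
```

The same function gives (16, 91, 83) for the 20 triplets of Table 3. It is meant for the non-degenerate Codes II to IV; how pairs were counted for the degenerate Code I is not spelled out in the text, so that case is not attempted here.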

Code I. The standing degenerate, non-ambiguous genetic code, represented by 64 triplets (see Table 1).

Code II. A non-degenerate, non-ambiguous code consisting of 21 triplets (Table 2), those most often used in some described sequences: the φX174 (Sanger et al., 1977) and MS2 genomes (Min Jou, Haegeman, Ysebaert & Fiers, 1972; Fiers et al., 1975; Fiers et al., 1976), and the messenger RNAs of the coat protein of bacteriophage fd (Sugimoto, Sugisaki, Okamoto & Takanami, 1977), rabbit β-globin (Efstratiadis, Kafatos & Maniatis, 1977), rat growth hormone (Seeburg, Shine, Martial, Baxter & Goodman, 1977) and human chorionic somatomammotropin (Shine, Seeburg, Martial, Baxter & Goodman, 1977).

TABLE 1
Code I. Standing genetic code

UUU: Phe   UCU: Ser   UAU: Tyr   UGU: Cys
UUC: Phe   UCC: Ser   UAC: Tyr   UGC: Cys
UUA: Leu   UCA: Ser   UAA: Ter   UGA: Ter
UUG: Leu   UCG: Ser   UAG: Ter   UGG: Trp

CUU: Leu   CCU: Pro   CAU: His   CGU: Arg
CUC: Leu   CCC: Pro   CAC: His   CGC: Arg
CUA: Leu   CCA: Pro   CAA: Gln   CGA: Arg
CUG: Leu   CCG: Pro   CAG: Gln   CGG: Arg

AUU: Ile   ACU: Thr   AAU: Asn   AGU: Ser
AUC: Ile   ACC: Thr   AAC: Asn   AGC: Ser
AUA: Ile   ACA: Thr   AAA: Lys   AGA: Arg
AUG: Met   ACG: Thr   AAG: Lys   AGG: Arg

GUU: Val   GCU: Ala   GAU: Asp   GGU: Gly
GUC: Val   GCC: Ala   GAC: Asp   GGC: Gly
GUA: Val   GCA: Ala   GAA: Glu   GGA: Gly
GUG: Val   GCG: Ala   GAG: Glu   GGG: Gly

TABLE 2
Code II. Non-degenerate and non-ambiguous code consisting of 21 codons, those most often used in two virus genomes and four mRNAs of well-known base sequence (see Methods)

UUC: Phe   UCC: Ser   UAU: Tyr   UGU: Cys
                      UAG: Ter   UGG: Trp
CUG: Leu   CCU: Pro   CAC: His   CGU: Arg
                      CAG: Gln
AUC: Ile   ACC: Thr   AAC: Asn
AUG: Met              AAA: Lys
GUG: Val   GCU: Ala   GAC: Asp   GGU: Gly
                      GAG: Glu
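Table 2 can be checked against Table 1 mechanically. The sketch below assumes the usual compact encoding of the standard code as a 64-character string of one-letter amino-acid symbols, with first, second and third bases each running through U, C, A, G and `*` standing for Ter. It reports the degeneracy of Code I and confirms that Code II keeps every triplet's standard meaning while using one triplet per meaning. The names `STANDARD`, `ONE_LETTER` and `is_nondegenerate_subcode` are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard code: first, second and third bases each run through U, C, A, G.
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(t): aa for t, aa in zip(product(BASES, repeat=3), AA_STRING)}

# Three-letter names used in the tables -> one-letter symbols ("*" = Ter).
ONE_LETTER = {
    "Phe": "F", "Leu": "L", "Ser": "S", "Tyr": "Y", "Ter": "*", "Cys": "C",
    "Trp": "W", "Pro": "P", "His": "H", "Gln": "Q", "Arg": "R", "Ile": "I",
    "Met": "M", "Thr": "T", "Asn": "N", "Lys": "K", "Val": "V", "Ala": "A",
    "Asp": "D", "Glu": "E", "Gly": "G",
}


def is_nondegenerate_subcode(code: dict) -> bool:
    """True if every triplet keeps its standard meaning and no meaning repeats."""
    same_meaning = all(STANDARD[c] == ONE_LETTER[aa] for c, aa in code.items())
    return same_meaning and len(set(code.values())) == len(code)


CODE_II = {
    "UUC": "Phe", "UCC": "Ser", "UAU": "Tyr", "UAG": "Ter", "UGU": "Cys",
    "UGG": "Trp", "CUG": "Leu", "CCU": "Pro", "CAC": "His", "CAG": "Gln",
    "CGU": "Arg", "AUC": "Ile", "ACC": "Thr", "AAC": "Asn", "AUG": "Met",
    "AAA": "Lys", "GUG": "Val", "GCU": "Ala", "GAC": "Asp", "GAG": "Glu",
    "GGU": "Gly",
}

degeneracy = Counter(STANDARD.values())       # triplets per meaning in Code I
print(len(STANDARD), len(degeneracy))                      # 64 21
print(degeneracy["L"], degeneracy["*"], degeneracy["M"])   # 6 3 1
print(is_nondegenerate_subcode(CODE_II))                   # True
```

The same check passes for the 20 triplets of Table 3.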

Code III. A non-degenerate, non-ambiguous code formed by 20 triplets selected from the present genetic code, optimized for the lowest probability of mutation (amino-acid change) after one base change in any triplet (Table 3).

Code IV. A theoretical non-degenerate code consisting of 16 selected triplets, such that a mutation is attained only after at least two base changes per triplet (Table 4).

Only Code I uses all 64 possible triplets. The others use 21, 20 or 16 triplets, and when another triplet is generated, it can be read in two ways:

(a) As the original triplet from which it arises, or as any other from which it differs in only one base. For instance, see Table 4.

TABLE 3
Code III. Non-degenerate and non-ambiguous code consisting of 20 less related codons

UUC: Phe   UCA: Ser   UAC: Tyr   UGU: Cys
                                 UGG: Trp
CUU: Leu   CCG: Pro   CAU: His   CGC: Arg
                      CAA: Gln
AUA: Ile   ACU: Thr   AAU: Asn
AUG: Met              AAG: Lys
GUC: Val   GCG: Ala   GAC: Asp   GGA: Gly
                      GAA: Glu


TABLE 4
Code IV. Theoretical non-degenerate code, consisting of 16 non-directly related codons

UUU (Phe)                          original triplet
   ↓
CUU                                not considered in Code IV
   ↓
UUU (Phe), CUG (Leu), CAU (His)    near triplets

(b) As a nonsense triplet; that is, when any such triplet is generated, reading of the message is stopped and the incomplete, wrong messenger or protein is hydrolyzed. It is assumed that the information in DNA is repeated, so that a correct transcription can be accomplished from a different, redundant segment. Obviously, in this case, no error is detected after one base change in any triplet of Code IV. Codes III and IV do not have termination triplets, but the end statement can be accomplished, for instance, by the repetitive use of some given triplets.

3. Results

Table 1 shows the standing genetic code, with an average probability of error, after one base change in any triplet, of 0.817. As an example, the average probability of error after 22 random changes of only one base in different triplets of the rat growth hormone mRNA (Seeburg, Shine, Martial, Baxter & Goodman, 1977) is 0.818, and the total numbers of pairs of triplets differing by one, two or three bases (mutation changes) are 84, 112 and 14, respectively.

In the case of the non-degenerate and non-ambiguous Code II, the average probability of error after one base change in any triplet is 0.771. The average experimental error after 22 random changes of only one base in different triplets is 0.781, and the total numbers of pairs of triplets differing by one, two or three bases are 31, 90 and 89, respectively.

Code III displays a lower probability of error (0.761) after changing one base in any triplet; the average experimental error found after 22 changes of only one base in different triplets is 0.726, and the total numbers of pairs of triplets differing by one, two or three bases are 16, 91 and 83, respectively.

Finally, Code IV, without degeneracy, has 16 triplets such that, after a base change in any triplet, no error is induced when the codons excluded from the Code are considered as missense. Otherwise, the average probability of error is 0.667 when triplets not represented in this Code are generated, for instance CUU, which is generated by a change in any of the triplets UUU, CUG or CAU (see Table 4). In this case, the experimental average probability of error after 22 random changes of one base in any triplet is 0.648. Assuming that any base change corresponding to a transition (U ↔ C and A ↔ G) is recognized and read as the original triplet (for instance, CUU would be read as UUU), the average probability of error in this Code is 0.445. In Code IV, the total numbers of pairs of triplets differing by one, two or three bases are 0, 72 and 48, respectively.

The code shown in Table 4 is one of the 576 ((4!)²) solutions, or sets of triplets, with the above-mentioned characteristics. The calculation of this figure can be represented by a magic square in which the numbers 1, 2, 3 and 4 correspond to the four nitrogen bases in the third position of all possible codons (see Appendix).

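[Figure: schematic magic square, a 4 × 4 array with rows and columns indexed 1 to 4, each cell holding one of the numbers 1 to 4, and every row and column summing to 10.]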
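The values 0 and 0.667 reported for Code IV follow directly from the two reading rules (a) and (b) of the Methods. The sketch below computes parameter (a) exactly with fractions under a hypothetical formalisation of those rules: a triplet outside the code is either dropped as nonsense (no error counted) or read as one of the code triplets at distance 1, each with equal probability, while a mutant that is itself in the code is read with its own meaning. The function names, the seeded sampler standing in for parameter (b), and the use of the Appendix example (2) as the 16 triplets of Code IV are assumptions for illustration (any of the 576 admissible sets gives the same exact values). The sketch is not claimed to reproduce the other figures reported above.

```python
import random
from fractions import Fraction

BASES = "UCAG"


def hamming(t1, t2):
    return sum(a != b for a, b in zip(t1, t2))


def one_base_mutants(triplet):
    """The 9 triplets obtained by changing exactly one base."""
    return [triplet[:i] + b + triplet[i + 1:]
            for i in range(3) for b in BASES if b != triplet[i]]


def read_options(code, mutant):
    """Triplets the mutant may be read as: itself if it is in the code,
    otherwise every code triplet at distance 1 (ambiguous reading)."""
    if mutant in code:
        return [mutant]
    return [c for c in code if hamming(c, mutant) == 1]


def average_error(code, nonsense=False):
    """Parameter (a): mean error probability over all one-base changes."""
    total, n = Fraction(0), 0
    for codon, meaning in code.items():
        for mutant in one_base_mutants(codon):
            n += 1
            if nonsense and mutant not in code:
                continue  # reading stops; counted as no error
            options = read_options(code, mutant)
            wrong = sum(code[o] != meaning for o in options)
            total += Fraction(wrong, len(options))
    return total / n


def sampled_error(code, n_changes=22, seed=0):
    """Rough analogue of parameter (b): random one-base changes in randomly
    chosen triplets, each read under the ambiguous rule."""
    rng = random.Random(seed)
    codons = list(code)
    errors = 0
    for _ in range(n_changes):
        codon = rng.choice(codons)
        mutant = rng.choice(one_base_mutants(codon))
        read = rng.choice(read_options(code, mutant))
        errors += code[read] != code[codon]
    return errors / n_changes


CODE_IV = ["AAA", "ACC", "AUG", "AGU", "CAC", "CCG", "CUU", "CGA",
           "UAG", "UCU", "UUA", "UGC", "GAU", "GCA", "GUC", "GGG"]
code_iv = {t: t for t in CODE_IV}  # 16 distinct meanings, labelled by triplet

print(average_error(code_iv))                 # 2/3
print(average_error(code_iv, nonsense=True))  # 0
```

Every one-base mutant of a Code IV triplet lies outside the code and has exactly three code triplets at distance 1, one of them the original, which is why the ambiguous rule gives exactly 2/3. The sampled value depends on the seed and on the small number of changes.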

As can be seen in Table 4, there are four ambiguous triplets, each shared by two equivalent amino acids, namely those displaying the highest rate of replacement in Nature (Dayhoff, 1972). Even counting this kind of replacement as an error, the average probability of error after changing only one base in any triplet is only 0.673 in this Code.

4. Discussion

The non-degenerate codes analyzed above display a lower probability of error than the standing natural code, especially the one formed by unrelated codons (Code IV). The absence of such a code in Nature is understandable considering the random appearance of early codons and the difficult transition from these codons to a set of non-related ones. In fact, Code IV, and even Code III, bring together the most independent sets of triplets. On the other hand, reliability and mutability of the genetic information have to be balanced to preserve species and maintain evolution; in this sense, a rather immutable code would not be a paramount aim.

APPENDIX

Let $B$ be the set formed by the four bases, $B = \{A, U, G, C\}$, and let $B^3$ be the triple Cartesian product of $B$:

$$B^3 = \{(x_1, x_2, x_3) : x_i \in B\}$$

(for instance, $(A, U, C) \in B^3$). A distance $d$ in $B^3$ can be defined as

$$d : B^3 \times B^3 \to \{0, 1, 2, 3\}, \qquad (T_1, T_2) \mapsto d(T_1, T_2) = \text{number of different coordinates}.$$

Example: $T_1 = (A, U, G)$, $T_2 = (A, C, G) \Rightarrow d(T_1, T_2) = 1$.

We must find a subset $M$ of $B^3$ such that:
(i) $d(T_1, T_2) > 1$ for all $T_1, T_2 \in M$ with $T_1 \neq T_2$;
(ii) cardinal$(M)$ is maximum.
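The objects defined so far translate directly into code. This minimal sketch represents triplets as strings, implements the distance $d$ as a count of differing coordinates, and checks condition (i) for a candidate set $M$, here the 16-triplet example (2) given later in this Appendix; the names `d`, `condition_i` and `B3` are illustrative.

```python
from itertools import combinations, product

B = "AUGC"
B3 = ["".join(t) for t in product(B, repeat=3)]  # the 64 elements of B^3


def d(t1, t2):
    """Distance on B^3: number of coordinates in which t1 and t2 differ."""
    return sum(a != b for a, b in zip(t1, t2))


def condition_i(M):
    """Condition (i): d(T1, T2) > 1 for every pair of distinct elements."""
    return all(d(t1, t2) > 1 for t1, t2 in combinations(M, 2))


M = ["AAA", "ACC", "AUG", "AGU", "CAC", "CCG", "CUU", "CGA",
     "UAG", "UCU", "UUA", "UGC", "GAU", "GCA", "GUC", "GGG"]

print(len(B3), d("AUG", "ACG"))  # 64 1
print(condition_i(M), len(M))    # True 16
# No further element of B^3 can be added to M without breaking (i):
print(any(condition_i(M + [t]) for t in B3 if t not in M))  # False
```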

Hypothesis: if $M$ fulfils points (i) and (ii), then cardinal$(M) = 16$.

Consider the triplets of $B^3$ arranged by their first two coordinates, with the third one left free:

$$\begin{array}{cccc}
(A, A, \cdot) & (A, C, \cdot) & (A, U, \cdot) & (A, G, \cdot) \\
(C, A, \cdot) & (C, C, \cdot) & (C, U, \cdot) & (C, G, \cdot) \\
(U, A, \cdot) & (U, C, \cdot) & (U, U, \cdot) & (U, G, \cdot) \\
(G, A, \cdot) & (G, C, \cdot) & (G, U, \cdot) & (G, G, \cdot)
\end{array} \qquad (1)$$

If cardinal$(M) > 16$, there would be at least one pair of elements $T_i, T_j$ of $M$ with equal first two coordinates, so that $d(T_i, T_j) = 1$. Hence cardinal$(M) \le 16$. In (1), the third element must be selected in such a way that, to maintain inter-triplet distances greater than 1, no third component is repeated within a column or a row. Example:

AAA  ACC  AUG  AGU
CAC  CCG  CUU  CGA
UAG  UCU  UUA  UGC
GAU  GCA  GUC  GGG        (2)

This shows that the bound of 16 is attained, so any subset $M$ of $B^3$ with conditions (i) and (ii) has 16 elements.

Proposition 2. The total number of subsets $M$ of $B^3$ with conditions (i) and (ii) is $(4!)^2$.

Proof: If we associate 1-A, 2-C, 3-G, 4-U and consider the distribution of the elements in the third coordinate, the sum of the figures associated with the third component in each row or column is 10. In example (2) we have:

$$\begin{array}{cccc|c}
1 & 2 & 3 & 4 & 10 \\
2 & 3 & 4 & 1 & 10 \\
3 & 4 & 1 & 2 & 10 \\
4 & 1 & 2 & 3 & 10 \\ \hline
10 & 10 & 10 & 10 &
\end{array}$$

This type of configuration is known as a "magic square" (more precisely, a Latin square: each number appears exactly once in every row and every column, which is what condition (i) requires). Moreover, combinatorial analysis shows that the total number of different "magic squares" of this kind for four different numbers is $(4!)^2$.
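The count in Proposition 2 can be confirmed by brute force. The sketch below enumerates, by plain backtracking, every 4 × 4 array whose rows and columns each contain 1, 2, 3 and 4 exactly once, maps each array to a set of 16 triplets using the association 1-A, 2-C, 3-G, 4-U and the row and column order A, C, U, G of array (1), and checks that every resulting set satisfies condition (i). It tests the no-repetition property directly, because a row sum of 10 alone would also admit rows such as 1, 1, 4, 4. The function names are illustrative.

```python
from itertools import combinations, permutations
from math import factorial

ORDER = "ACUG"                                     # row/column bases of array (1)
NUMBER_TO_BASE = {1: "A", 2: "C", 3: "G", 4: "U"}  # association used above


def latin_squares(n=4):
    """Yield all n x n arrays over 1..n with no repeats in any row or column."""
    rows = list(permutations(range(1, n + 1)))  # rows never repeat a number

    def extend(square):
        if len(square) == n:
            yield tuple(square)
            return
        for row in rows:
            # keep every column free of repeated numbers
            if all(row[j] != prev[j] for prev in square for j in range(n)):
                yield from extend(square + [row])

    yield from extend([])


def code_from_square(square):
    """Triplet = (row base, column base, base of the number in that cell)."""
    return [ORDER[i] + ORDER[j] + NUMBER_TO_BASE[square[i][j]]
            for i in range(4) for j in range(4)]


def min_distance(triplets):
    return min(sum(a != b for a, b in zip(s, t))
               for s, t in combinations(triplets, 2))


squares = list(latin_squares())
print(len(squares), factorial(4) ** 2)                               # 576 576
print(all(min_distance(code_from_square(s)) == 2 for s in squares))  # True
print(code_from_square(((1, 2, 3, 4), (2, 3, 4, 1), (3, 4, 1, 2), (4, 1, 2, 3)))[:4])
# ['AAA', 'ACC', 'AUG', 'AGU'], the first row of example (2)
```

The cyclic array in the last line reproduces example (2) exactly, and the 576 arrays give 576 distinct triplet sets.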

REFERENCES

DAYHOFF, M. O. (1972). Atlas of Protein Sequence and Structure, p. 90. Washington, D.C.: National Biomedical Research Foundation.
EFSTRATIADIS, A., KAFATOS, F. C. & MANIATIS, T. (1977). Cell 10, 571.
FIERS, W., CONTRERAS, R., DUERINCK, F., HAEGEMAN, G., MERREGAERT, J., MIN JOU, W., RAEYMAEKERS, A., VOLCKAERT, G., YSEBAERT, M., VAN DE KERCKHOVE, J., NOLF, F. & VAN MONTAGU, M. (1975). Nature 256, 273.
FIERS, W., CONTRERAS, R., DUERINCK, F., HAEGEMAN, G., ISERENTANT, D., MERREGAERT, J., MIN JOU, W., MOLEMANS, F., RAEYMAEKERS, A., VAN DEN BERGHE, A., VOLCKAERT, G. & YSEBAERT, M. (1976). Nature 260, 500.
MIN JOU, W., HAEGEMAN, G., YSEBAERT, M. & FIERS, W. (1972). Nature 237, 81.
SANGER, F., AIR, G. M., BARRELL, B. G., BROWN, N. L., COULSON, A. R., FIDDES, J. C., HUTCHISON III, C. A., SLOCOMBE, P. M. & SMITH, M. (1977). Nature 265, 687.
SEEBURG, P. H., SHINE, J., MARTIAL, J. A., BAXTER, J. D. & GOODMAN, H. M. (1977). Nature 270, 486.
SHINE, J., SEEBURG, P. H., MARTIAL, J. A., BAXTER, J. D. & GOODMAN, H. M. (1977). Nature 270, 494.
SUGIMOTO, K., SUGISAKI, H., OKAMOTO, T. & TAKANAMI, M. (1977). J. molec. Biol. 111, 487.
