Early Assembly Pathways of Type I Collagen

DONALD C. WALLACE

Celtrix Laboratories, 2500 Faber Place, Palo Alto, California 94303

SYNOPSIS

A method was developed for computing the free energy ($\Delta F_i$) of aggregates of type I collagen. The method was based on a treatment of Matheson and Flory describing phase equilibria of rigid rod polymers. It included a polymer-solvent interaction term that depended on near neighbor transfer energies. Extrahelical portions of the molecule were assigned local interaction energies differing from that assigned to the helix. Free energies of reaction for successive steps along assembly pathways ($\Delta F_{i \to i+1}$) were computed. When allowance was made for specific pairing between extrahelical and helical domains, the so-called D-staggered (D = 670 Å) alignment of molecules was preferred, as opposed to a nonstaggered, or nematic, alignment. Based on $\Delta F_{i \to i+1}$ alone, it appeared that 1D-staggered oligomers arise first in assembly, followed later by addition of molecules in 4D alignment. Neither 4D dimers nor 4D-8D trimers were predicted to be major intermediates in assembly. This result is contrary to previous hypotheses. When energies of activation were included in the analysis, the prediction was less certain, and specific circumstances were identified in which 4D dimers and 4D-8D trimers were the earliest aggregated species in assembly.

Biopolymers, Vol. 32, 497-515 (1992) © 1992 John Wiley & Sons, Inc. CCC 0006-3525/92/050497-19$04.00

INTRODUCTION

Over the past decade there have been repeated attempts to define the early steps in assembly of type I collagen (reviewed in Ref. 1). Based on experimental data and model building, two qualitatively distinct events have been discussed. First, Helseth et al.² have proposed that the N-terminal extrahelical peptide (NTEHP) on one molecule undergoes a conformational change that creates a binding domain for a complementary helical region on a second molecule. (The collagen molecule is understood to consist of the triple helix plus extrahelical peptides, EHP.) The bond that forms links two molecules in the so-called 4D (D = 670 Å) configuration (Figure 1 and Table I). The C-terminal extrahelical peptide (CTEHP) of the second molecule participates in the bimolecular linkage, binding to a different complementary region on the helix of the first molecule. This 4D-linked oligomer is the first stage of the second event, namely, the association of monomers into oligomers.

Many morphologies are possible. Silver has suggested that 4D-linked trimers form as well-defined intermediates. Such trimeric intermediates may assemble into larger structures containing 12-15 molecules. Finally, the dodecamers and pentadecamers associate into large fibrils. It is proposed that the entire fibril is constructed from such oligomeric building blocks, but the possibility of individual molecules adding to the growing fibril is not ruled out. A further proposal is that the 4D dimers and trimers constitute the dominant solution components during the so-called lag phase of fibril assembly. There has been an intensive search for such aggregates in the lag phase, but the evidence for their existence is contradictory.¹ They may be present in relatively small amounts (< 2-5% by weight of total protein), in which case they could escape detection by dynamic light scattering and electric birefringence.

It is also conceivable that initial aggregates arise by 1D-staggered alignment (Figure 1C). Otter et al.⁵ have noted that one segment of the CTEHP, residues 6-9, closest to the termination of the helix, is strongly hydrophobic and can bind to the region near residue 780 on an adjacent helix. The resultant stagger of the two molecules is 1D. In this configuration, a portion of the CTEHP, residues 15-24, is still available to bond 4D to a third molecule near


residue 87. Thus, one could have 1D-aligned dimers, which then add a third molecule 4D to the first, to produce a 1D-4D trimer. Again, such oligomeric intermediates, if present as dominant solution components, should be detectable in the lag phase of assembly.

In this paper the nature of early assembly pathways is examined theoretically. A model is constructed in which the collagen triple helix plus nonhelical peptides is simulated by a simplified geometry. The result is a multisegmented rod whose phase equilibria can be computed by a theory of Matheson and Flory.⁶ The two phases in equilibrium are an isotropic solution of rods and an anisotropic aggregation of rods, which represents the collagen fibril. Intermediates in fibril assembly, such as the dimer


Figure 1. Schematic diagram of distinct domains on the collagen molecule and the orientation of these domains in oligomers. Details of the molecular dimensions for each domain are given in Table I. The D scale is depicted alongside the molecules. A: The NTEHP is depicted as a circle at the top of the molecule and has a sequence length $\eta_1$. The vertical line represents the helical domain $\eta_2$, which bears the domains $\eta_{232}$, $\eta_{231}$, and $\eta_{21}$, which pair with specific EHP domains. On the bottom is a pair of circles, the CTEHP, consisting in this example of two subdomains, $\eta_{31}$ and $\eta_{32}$.⁵ B: Two molecules paired together in the 4D configuration, with pairing (arrows) between (a) regions $\eta_1(2)$ of molecule (2) and $\eta_{21}(1)$ of molecule (1), (b) between $\eta_{232}(2)$ and $\eta_{32}(1)$, and (c) between the short stretch of overlapping helical domains, designated bimolecular ("bi"). Regions in which no helical overlap occurs are designated unimolecular ("uni"). Molecules in an oligomer are numbered starting from top to bottom, with numbers in parentheses. C: Two molecules paired in the 1D configuration, with pairing between (a) $\eta_{31}(1)$ and $\eta_{231}(2)$ and (b) between the overlapping helical domains. D: An alternative model for collagen, in which there are only 2 subdomains in the helix that pair with EHP. In the 4D configuration, pairing occurs between overlapping helical domains, between $\eta_1(2)$ and $\eta_{21}(1)$, and between $\eta_{23}(2)$ and $\eta_3(1)$. In the 1D configuration there is only pairing between overlapping helical domains; no specific pairing of EHP and complementary pairing domains on the helix occurs. E: Equivalent cylinder for the 4D dimer depicted in 1B. The central rigid region is represented as a right cylinder. The EHP are represented as very short cylindrical appendages connected to the central cylinder by flexible joints. Strips circumscribing the central cylinder at specific D positions represent the helical domains which can pair with EHP of adjacent molecules. Under the noncooperative assumption, the strip $\eta_{232}(1)$ will have $\chi = 1.59$, but under the cooperative assumption this same strip could take on several values, depending on whether the 4D dimer is a reaction product ($\chi = 0.094$, inactive) or a reactant ($\chi = 1.59$). The intervening helical domain has $\chi = 0.107$ for the noncooperative assumption. Under the cooperative assumption, $\chi = 0.094$ for the 4D dimer as product. As a reactant (step 24, Figures 4 and 5), uni-, bi-, and trimolecular domains will be created (see the 1D-4D trimer in Table IV) by the reaction, and $\chi = 0.094$ for uni- and bimolecular associations, but $\chi = 0.099$ for the short trimolecular segment.


Table I. Position of Pairing Regions and Dimensions of Each Region Along the Collagen Moleculeᵃ

Region            D Position         Position by Amino   Position in     Length of Pairing
                                     Acid No.ᵇ           Å Units         Region in D Units
$\eta_1$ᶜ         -0.0522 to 0.0     -16 to 0            -35 to 0        0.0522
$\eta_{232}$      +0.354 to 0.397    83 to 96            237 to 275      0.0427
$\eta_{231}$      3.32 to 3.37       779 to 784          2228 to 2256    0.0427
$\eta_{21}$       3.96 to 4.02       928 to 938          2654 to 2689    0.0522
$\eta_{31}$       4.33 to 4.37       1014 to 1026        2900 to 2929    0.0427
$\eta_{32}$       4.37 to 4.42       1026 to 1039        2929 to 2957    0.0427

Length of Regions in Axial Ratio Units: $\eta_1 = 2.41$, $\eta_2 = 200$, $\eta_3 = 3.94$.

ᵃ Symbol explanation: in $\eta_{ijk}(l)$, i is the domain location, helical or nonhelical, where the sequence resides; i = 1 is NTEHP, i = 2 is central helical, i = 3 is CTEHP; j or jk is the domain location of the pairing partner (e.g., the pairing domain could be 1 for NTEHP, but 31 or 32 for CTEHP); and (l) is the molecule number in a multimolecular aggregate.
ᵇ From Refs. 2, 3, and 5.
ᶜ Symbols: $\eta_1$, N-terminal EHP; $\eta_{232}$, helical region pairing with the $\eta_{32}$ portion of CTEHP in 4D alignment⁵ (see Figure 1 for an illustration of this and subsequent alignments); $\eta_{231}$, helical region pairing with the $\eta_{31}$ portion of CTEHP in 1D alignment⁵; $\eta_{21}$, helical region pairing with NTEHP in 4D alignment; $\eta_{31}$ and $\eta_{32}$, proximal and distal regions of CTEHP. The helix extends from 0D to 4.33D.
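The unit conversions behind Table I can be checked with a short sketch. It assumes only the helix length (2900 Å, equal to 4.33D) and the 14.5 Å helix diameter quoted in the text; the function names are illustrative and do not come from the paper.

```python
# Conversion between D units, angstroms, and axial ratio units.
# Constants are the values quoted in the paper; function names are hypothetical.
HELIX_LENGTH_A = 2900.0    # helix length in angstroms
HELIX_LENGTH_D = 4.33      # the same helix length in D units
HELIX_DIAMETER_A = 14.5    # diameter of a single triple helix in angstroms


def d_to_angstrom(length_d):
    """Convert a length in D units to angstroms using 2900 A per 4.33D."""
    return length_d * HELIX_LENGTH_A / HELIX_LENGTH_D


def d_to_axial_ratio_units(length_d):
    """Convert a length in D units to axial ratio units (length / helix diameter)."""
    return d_to_angstrom(length_d) / HELIX_DIAMETER_A


if __name__ == "__main__":
    for name, length_d in [("NTEHP", 0.0522), ("helix", 4.33), ("CTEHP", 0.0854)]:
        print(name, d_to_angstrom(length_d), d_to_axial_ratio_units(length_d))
```

With these inputs the three segment lengths come out close to the tabulated axial ratio values of 2.41, 200, and 3.94.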

or trimer species mentioned above, can be treated in a similar manner by reducing the oligomer to an equivalent multisegmented cylinder, for which phase equilibria can also be computed (see Figure 1E). This information then permits the determination of free energies of reaction between oligomers of size i and i + 1 (i being the number of molecules in the oligomer). Oligomers in which different modes of attachment (0D, 1D, 4D, etc.) occur can be examined, and pathways of assembly involving these modes can be compared. From the free energies of reaction along alternative paths, it is possible to predict which paths are preferred thermodynamically.

Path preference can be controlled by kinetics as well as thermodynamics. Kinetically, the preferred path will be the one with the lowest energy barrier. The energy barrier can include positive differences in free energy of reaction (from state i to i + 1) and activation energy arising from a variety of molecular mechanisms (Figure 2). It is proposed that an important component of the activation energy is the rotation of bonds in extrahelical peptides and in helical amino acid side chains, in order to facilitate molecular association. Thus, the theoretical model indirectly addresses the effect of conformational changes. It addresses directly the attachment geometry (e.g., 1D vs 4D) and attempts to quantify its role in the assembly process.

GENERAL FORMULATION

It is convenient to utilize terminology and symbolism from nucleation and growth theory to describe the problem, even though the outcome may not necessarily conform to the classical description of these phenomena. During the transformation of an isotropic parent phase to an anisotropic, crystalline phase, one can envision association of molecules of the parent phase $a_1$ in a stepwise manner:

$$a_1 + a_1 \rightleftharpoons a_2, \qquad a_2 + a_1 \rightleftharpoons a_3, \qquad \ldots, \qquad a_i + a_1 \rightleftharpoons a_{i+1} \qquad (1)$$

The clusters, or nuclei, $a_i$, can be considered embryos of the anisotropic phase. Classically, such nuclei require free energy in order to grow until a certain size i* is reached, at which point growth continues with decreasing free energy. The barrier to growth is associated with the construction of oligomers below the critical size, and the driving force is due to the bulk free energy difference between the parent phase and the new phase (see also Refs. 11-13). The energy along a reaction path from state i to i + 1 is depicted in Figure 2.

Figure 2. Representation of the energy barrier in an assembly pathway. See text for details. (Axes: free energy versus step in pathway, from i to i + 1; the activated complex and the energy barrier are marked.)

In the context of collagen assembly, the energy barrier may include a positive free energy difference between states i and i + 1, $\Delta F_{i \to i+1}$, as well as an activation energy. The activation energy for a polymeric molecule like collagen is undoubtedly complex. It presumably consists of the following components: (1) $\Delta F'$, the energy of activation for transport of molecules a short distance from solution to the surface of the nucleus¹¹,¹³,¹⁴; (2) the energy associated with conformational changes of reacting groups on amino acid side chains³; and (3) an orientational, or steric, term, describing the fraction of encounters in which molecules are in the proper alignment for stable bonding. There may be several paths of assembly, such as Eqs. (1). There is a hypothetical path with the lowest energy barrier $\Delta\phi^*$, and the rate of nucleation I depends exponentially on this barrier:

$$I \propto (kT/h)\exp(-\Delta\phi^*/kT) \qquad (2)$$

where k is Boltzmann's constant, T is the temperature, and h is Planck's constant. In a molecular theory of assembly, $\Delta\phi^*$ will be taken as a barrier that arises from the microscopic details of the particular system under study; in the classical theory $\Delta\phi^*$ is proportional to the interfacial free energy between the parent phase and the newly forming anisotropic phase.
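To give a feel for how strongly Eq. (2) weights the barrier, the following sketch evaluates its right-hand side. It is only an illustration: the barrier is supplied per mole, so the gas constant replaces k in the exponent, the barrier values in the usage example are hypothetical, and the temperature of 302 K is the one used in the paper's worked example.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J s
R_CAL = 1.987         # gas constant, cal/(mol K)


def relative_nucleation_rate(barrier_cal_per_mol, temperature_k):
    """Evaluate (kT/h) * exp(-barrier / RT), the right-hand side of Eq. (2).

    The barrier is given in cal/mol, so R is used in place of k in the exponent.
    The result is proportional to the nucleation rate, not the rate itself.
    """
    prefactor = K_B * temperature_k / H            # attempt frequency, 1/s
    return prefactor * math.exp(-barrier_cal_per_mol / (R_CAL * temperature_k))


if __name__ == "__main__":
    T = 302.0
    # Hypothetical barriers, chosen only to show the exponential sensitivity.
    for barrier in (0.0, 5000.0, 10000.0):
        print(barrier, relative_nucleation_rate(barrier, T))
```

Because only a proportionality is stated, ratios of these values between competing paths are more meaningful than the absolute numbers.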

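CALCULATION OF THE FREE ENERGIES OF EACH PATHWAY STEP, $\Delta F_{i \to i+1}$

In order to calculate $\Delta F_{i \to i+1}$ for a given step,

$$\text{molecule} + \text{molecule} \overset{\Delta F_{mb}}{\rightleftharpoons} \text{bimolecular aggregate (e.g., a 4D dimer)}$$

(where the subscript mb in $\Delta F_{mb}$ refers to "molecule-to-bimolecule"), one may employ the expressions of Matheson and Flory⁶ for equilibria of rod-like polymers in solution vs a crystalline phase of the same polymers. The procedure follows that described previously,¹⁵,¹⁶ with some modifications. One proceeds by an indirect method, utilizing the equilibrium of each species with respect to the fibrillar state:

$$\text{molecule} \overset{\Delta F_{mf}}{\rightleftharpoons} \text{fibril} \qquad (3a)$$

$$\text{bimolecule} \overset{\Delta F_{bf}}{\rightleftharpoons} \text{fibril} \qquad (3b)$$

where the subscript mf in $\Delta F_{mf}$ refers to "molecule-to-fibril" and so forth. Written as chemical potentials:

$$\mu_{\text{fibril}} - \mu_m = \Delta F_{mf} \qquad (4a)$$

$$\mu_{\text{fibril}} - \mu_b = \Delta F_{bf} \qquad (4b)$$

Subtracting Eq. (4b) from 2 × Eq. (4a), one obtains

$$\mu_b - 2\mu_m = 2\,\Delta F_{mf} - \Delta F_{bf} = \Delta F_{mb} \qquad (4c)$$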
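The subtraction described here is a thermodynamic cycle through the fibrillar state, and it can be sketched numerically. The function below is a hypothetical helper, not the author's code; the two input energies are the values quoted later in the paper for the monomer and the 4D dimer at 302 K.

```python
def reaction_free_energy(reactants_to_fibril, product_to_fibril):
    """Free energy of reactants -> product, routed through the fibril.

    reactants_to_fibril: free energies (cal/mol) for each reactant going to the fibril.
    product_to_fibril: free energy (cal/mol) for the product going to the fibril.
    The fibril terms cancel when both are put on the same molar basis.
    """
    return sum(reactants_to_fibril) - product_to_fibril


if __name__ == "__main__":
    dF_monomer_fibril = -7120.0    # cal/mol, quoted in the paper
    dF_dimer_fibril = -18690.0     # cal/mol, quoted in the paper for the 4D dimer
    dF_step = reaction_free_energy([dF_monomer_fibril, dF_monomer_fibril], dF_dimer_fibril)
    print(dF_step)
```

A positive result means the step is uphill, which matches the paper's conclusion that formation of the 4D dimer from two monomers is unfavorable under the noncooperative assumption.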

The $\mu_{\text{fibril}}$ in Eq. (4a) is on a monomeric basis (per mole of nonaggregated molecules), whereas $\mu_{\text{fibril}}$ in Eq. (4b) is on a dimeric basis (per mole of bimolecules), and $2\mu_{\text{fibril}}$ [Eq. (4a)] = $\mu_{\text{fibril}}$ [Eq. (4b)]. The same general approach is followed for all pathway steps. The problem then is reduced to computing $\Delta F$ between any specified collagen oligomer and the mature fibrillar phase.

To calculate these free energies we first need to define the dimensions of the molecule and oligomers, which will then be applied to the expressions of Matheson and Flory⁶ for phase equilibria of rodlike macromolecules. The collagen triple helix can be divided into segments 670 Å long, the so-called D spacing. A complete collagen triple helix is 4.33D long (or 1014 amino acid residues), which is equivalent to 2900 Å. The molecule has a hydrodynamic diameter of 13-15 Å. The EHP extend outward from each end, and are of dimension 0.0522D for the NTEHP and 0.0854D for the CTEHP.¹⁷ It is assumed that the EHP have specific conformations in the collagen fibril. Presumably they also have a preferred conformation in the isolated molecule. Otter et al. have presented nmr data suggesting that the NTEHP can adopt a hairpin conformation, and the CTEHP, a relatively extended conformation. For the purposes of the model employed here, these EHP are taken to be rigid regions connected by flexible joints to the rigid helix.

Using the theoretical formulation of Matheson and Flory, the complete molecule has an axial ratio of x = 206 with a central rigid region $\eta_2 = 200$, $\eta_1 = 2.41$ for the NTEHP, and $\eta_3 = 3.94$ for the CTEHP. The relationships between molecule dimensions are given in Figure 1 and Table I. The axial ratio units are derived by taking the diameter of a single helix as 14.5 Å, which gives an aspect ratio of 2900/14.5 = 200. The lengths of the EHP in axial ratio units are then taken from the equivalence of 200 axial ratio units per 4.33D for the helix; e.g., the length of the NTEHP = 0.0522D × 200/4.33D = 2.41. The dimensions used in this paper differ slightly from those of Wallace.¹⁶ The current values are more precise representations of the dimensions of collagen¹⁷; higher precision is required in this study because 1D and 4D pairing alignments in collagen oligomers are very sensitive to the specified dimensions. To compute phase equilibria for these rod-like structures, the following expressions are used:

Equation (5) represents the difference in chemical potential of solvent in the anisotropic ($\mu_1'$) and isotropic ($\mu_1$) phases, and Eq. (6) represents the analogous difference for anisotropic solute ($\mu_x'$) and isotropic solute ($\mu_x$). The $v_p'$ and $v_p$ are the solute volume fractions in the anisotropic and isotropic phases, respectively. The $\chi$ is the solvent-polymer interaction parameter, which can be related to a transfer free energy per polymer segment, $\Delta f$⁶,¹⁵,¹⁶:

$$\chi = -\Delta f / RT \qquad (7)$$
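Equation (7) is a simple rescaling, and a small sketch makes the magnitudes concrete. The helper names are hypothetical; the example inputs ($\chi = 0.107$ for helix-helix contacts and T = 302 K) are values the paper uses further on.

```python
R_CAL = 1.987  # gas constant, cal/(mol K)


def transfer_free_energy(chi, temperature_k):
    """Per-segment transfer free energy (cal/mol) from chi, using chi = -delta_f / RT."""
    return -chi * R_CAL * temperature_k


def chi_from_transfer(delta_f, temperature_k):
    """Inverse relation: chi from a per-segment transfer free energy in cal/mol."""
    return -delta_f / (R_CAL * temperature_k)


if __name__ == "__main__":
    print(transfer_free_energy(0.107, 302.0))   # roughly -64 cal/mol per segment
    print(transfer_free_energy(1.59, 302.0))    # a strongly pairing segment
```

A positive $\chi$ therefore corresponds to a negative (favorable) transfer free energy from solution to the fibril, and larger $\chi$ means stronger local bonding.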

$\Delta f$ incorporates all the local (per segment) bonding forces that stabilize the fibril by short distance interactions. Such interactions include predominantly hydrophobic and electrostatic forces, although hydrogen bonding and other bonding types may participate. Equations (5) and (6) represent a specific region of the phase diagram for which y = 1. They are analogous to Eqs. (19') minus (21) and Eqs. (20') minus (22) in Ref. 6. They are derived from Eq. (12) of Ref. 6 with y = 1 and $\eta_1$, $\eta_2$, and $\eta_3$ specified as the three rigid sequences of the molecule, all connected by nondimensional flexible joints. No random coil sequences are present. The standard state is taken to be the pure fibrillar phase, that is, a perfectly ordered collagen fiber with its tightly bound water.⁶ [The current Eqs. (5) and (6) are valid only for y < $\eta_1$, $\eta_3$; additional variations of these equations are necessary if the above conditions do not hold.] At equilibrium both equations are satisfied and equal zero. When conditions for equilibrium are not fulfilled, e.g., when $v_p$ is greater than the equilibrium solubility, Eq. (6) is the free energy driving that molecular species to the fibrillar state, $\Delta F_{mf}$ in Eq. (3a).

When intact collagen is subjected to protease digestion, the EHP may be partially or totally removed. Such truncated molecules are more soluble than intact collagen¹⁶ and provide additional information needed to parameterize the model described below. When intact collagen is treated with the protease pepsin, the NTEHP is entirely removed and the CTEHP is partially removed. For pepsin-digested collagen, equations analogous to (5) and (6) can be constructed, in which there are only two rigid segments, $\eta_2 = 200$ and $\eta_3 = 1.97$, and x = 202. The $\eta_3$ represents the truncated C1 portion of the CTEHP, which is 0.0427D in length (see Table I), or 0.0427D × 200 axial ratio units/4.33D = 1.97 axial ratio units in length. (This truncated $\eta_3$ EHP will hereafter be designated as $\eta_{31}$.) Digestion with the protease pronase removes both EHP completely. There is a third set of equations for pronase-digested collagen, with one rigid segment, x = $\eta_2$ = 200, and no EHP (see Ref. 16 for a detailed discussion).

The next step is to take the experimental solubilities of intact (native) collagen ($v_p = 6.5 \times 10^{-6}$, or 9 μg collagen/mL), pepsin-digested collagen ($v_p = 6.8 \times 10^{-5}$, or 98 μg/mL),¹³ and pronase-digested collagen ($v_p = 3 \times 10^{-4}$, or 433 μg/mL), and determine $\chi$ at equilibrium for each species. At equilibrium, Eqs. (5) and (6) are equal to zero, and $v_p$, $\eta_1$, $\eta_2$, $\eta_3$, and x are known for each species. The two equations are solved iteratively for the two unknowns, $v_p'$ and $\chi$. The resultant composite $\chi$ factors are $\chi_n = 0.175$ for native collagen, $\chi_p = 0.136$ for pepsin-digested collagen, and $\chi = 0.107$ for pronase-digested collagen. Finally, one may resolve these composite $\chi$ factors into components $\chi_1$, $\chi_2$, and $\chi_3$ representing the interaction parameter for each segment. For the native molecule,

$$\chi_n = (\chi_1\eta_1 + \chi_2\eta_2 + \chi_3\eta_3)/x \qquad (8)$$

$$\chi_2\eta_2 = \chi_{22}\eta_{22} + \chi_1\eta_{21} + \chi_3(\eta_{231} + \eta_{232}) \qquad (9)$$

where $\chi_n$ is the composite value for the entire three segment species; $\eta_2$ is the length of the central rigid sequence [axial ratio units or D units may be used until application in Eqs. (5) and (6), for which only axial ratio units may be used]; $\chi_{22}$ and $\eta_{22}$ designate

the subdomain of the helix that interacts with other helical domains; $\eta_{21}$ designates the helical subdomain which can form strong pairing interactions with $\eta_1$, the NTEHP; $\eta_{231}$ and $\eta_{232}$ designate helical subdomains that pair strongly with the CTEHP. The identification of possible pairing domains is presented in detail in Figure 1 and Table I. The NTEHP (designated $\eta_1$) pairs with a portion of the helical domain designated $\eta_{21}$. The pairing of the CTEHP with the helix may involve two domains on the helix, sequences $\eta_{232}$ and $\eta_{231}$, and the complementary sequences $\eta_{31}$ and $\eta_{32}$ on the CTEHP.⁵ It must be emphasized that the identification of such pairing domains is at present hypothetical; this is particularly the case for the $\eta_{231}$-$\eta_{31}$ pair, since there is no cross-linking evidence to support the proposal. Figure 1D considers a simpler pairing arrangement, in which there are only two subdomains, designated $\eta_{23}$ and $\eta_{21}$, on the helix. For pepsin-treated collagen, the composite $\chi_p$ is

$$\chi_p = (\chi_{2p}\eta_2 + \chi_3\eta_{31})/x' \qquad (10)$$

$$\chi_{2p}\eta_2 = \chi_{22}\eta_{22}' + \chi_3\eta_{231} \qquad (11)$$

where $\chi_{2p}$ is the interaction parameter for the total central helical domain, $x'$ is the axial length of the truncated pepsin-treated molecule, and $\eta_{22}'$ is the helical subdomain that interacts with other pepsin-treated helical domains. (In this species there is no NTEHP, so the region corresponding to $\eta_{21}$ on the pepsin-treated helix has no pairing partner and has a $\chi$ value like that of the $\eta_{22}$ subdomain; likewise for the $\eta_{232}$ subdomain, which has no pairing $\eta_{32}$ CTEHP domain.) Finally, for pronase-digested collagen, $\chi_{22} = \chi_2$ represents the whole polymer-solvent interaction term, since there is only one helical domain with no EHP pairing partners.

For all three collagen species, the following values (in D units) were deduced from the dimensions of the different domains (see also Table I): $x' = \eta_2 + \eta_{31} = 4.33 + 0.0427 = 4.37$; $x = \eta_1 + \eta_2 + \eta_3 = 0.0522 + 4.33 + 0.0854 = 4.47$; $\eta_{22} = \eta_2 - \eta_{21} - \eta_{231} - \eta_{232} = 4.33 - 0.0522 - 0.0427 - 0.0427 = 4.19$; $\eta_{22}' = \eta_2 - \eta_{231} = 4.33 - 0.0427 = 4.29$. These values, together with $\chi_n$, $\chi_p$, and $\chi_{22}$ for pronase-digested collagen, permit determination of $\chi_1 = 0.596$, $\chi_2 = 0.142$, and $\chi_3 = 1.59$ from Eqs. (8)-(11).

For intact collagen, Eqs. (5) and (6) are satisfied when $v_p = 6.5 \times 10^{-6}$, $v_p' = 0.921$, $\chi = 0.175$, and x, $\eta_1$, $\eta_2$, and $\eta_3$ are as given above. The equilibrium constant $K_{eq}$ for the reaction collagen molecules ⇌ collagen fibrils is $[a_f]/[a_1]$, where $[a_f]$ and $[a_1]$ are the molarities of collagen in the fibril and solution phases, respectively. Since the molecular weight of a fibril is variable, it is simpler to express $[a_f]$ as moles of monomeric collagen per liter. Then $[a_i] = v_i/(1000\,\bar{v}_i M)$, where $v_i$ is the volume fraction, M is the molecular weight, and $\bar{v}_i$ is the partial specific volume of collagen (0.695 for dry collagen,¹³,¹⁵ or 1.06 for collagen with hydrated length of 2990 Å and 14.5 Å in diameter). The standard reaction free energy for assembly of molecules to fibrils, $\Delta F^\circ_{mf}$, is given by²¹

$$\Delta F^\circ_{mf} = -RT \ln K_{eq} \qquad (12)$$
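The resolution of the composite $\chi$ values and the standard free energy can be reproduced approximately with the sketch below. It rests on two assumptions that should be read as such: the composite $\chi$ is taken to be a length-weighted average over segments, with each helical pairing subdomain carrying the $\chi$ of its EHP partner, and the equilibrium constant is taken as the ratio of volume fractions (the same partial specific volume in both phases). Function names are illustrative.

```python
import math

R_CAL = 1.987  # gas constant, cal/(mol K)

# Domain lengths in D units, from Table I.
ETA1, ETA2, ETA3 = 0.0522, 4.33, 0.0854
ETA21, ETA231, ETA232, ETA31 = 0.0522, 0.0427, 0.0427, 0.0427


def resolve_chi(chi_native, chi_pepsin, chi_helix):
    """Back out chi1 (NTEHP), chi2 (whole helix) and chi3 (CTEHP).

    Assumes length-weighted averaging of chi over segments (an assumption
    of this sketch), with chi_helix taken from pronase-digested collagen.
    """
    x_native = ETA1 + ETA2 + ETA3
    x_pepsin = ETA2 + ETA31
    eta22 = ETA2 - ETA21 - ETA231 - ETA232    # helix-helix subdomain, native
    eta22_pepsin = ETA2 - ETA231              # helix-helix subdomain, pepsin
    # Pepsin-digested species: chi3 is the only unknown.
    chi3 = (chi_pepsin * x_pepsin - chi_helix * eta22_pepsin) / (ETA231 + ETA31)
    # Native species: chi1 is the remaining unknown.
    rest = chi_native * x_native - chi_helix * eta22 - chi3 * (ETA231 + ETA232 + ETA3)
    chi1 = rest / (ETA1 + ETA21)
    chi2 = (chi_helix * eta22 + chi1 * ETA21 + chi3 * (ETA231 + ETA232)) / ETA2
    return chi1, chi2, chi3


def standard_free_energy(v_fibril, v_solution, temperature_k):
    """-RT ln K in cal/mol, with K approximated by the volume fraction ratio."""
    return -R_CAL * temperature_k * math.log(v_fibril / v_solution)


if __name__ == "__main__":
    print(resolve_chi(0.175, 0.136, 0.107))
    # 6.5e-6 is consistent with 9 ug/mL and a partial specific volume of 0.695.
    print(standard_free_energy(0.921, 6.5e-6, 302.0))
```

Because the inputs are rounded to three figures, the recovered $\chi_1$ and $\chi_3$ agree with the published 0.596 and 1.59 only to about two significant figures, and the free energy lands within a few cal/mol of the -7120 cal/mol quoted for monomer to fibril.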

Example Calculation of $\Delta F_{i \to i+1}$ for Isolated Molecule to 4D-Staggered Bimolecule (See Figure 1)

Two molecules of configuration A react to form one molecule of configuration B. In this example, it is assumed that the energies of pairing of EHP and their complementary regions on helices are not dependent on the size of the oligomer. This particular method of calculating $\Delta F_{i \to i+1}$ will be referred to as the "noncooperative assumption." An alternative method invoking cooperativity will be described in a subsequent section.

For all oligomers, it is assumed that there is a complex central sequence, which holds the component molecules in a single rigid structure because of pairing interactions (e.g., the region from 0D to 8.33D in Figure 1B). Within the rigid central segment there are stretches (sequences) that have multiple molecules aligned side by side (the bimolecular helix overlap of Figure 1B); other parts of the central segment are unimolecular. At either end of the oligomer, there is an NTEHP and CTEHP that are connected by flexible joints. Thus, all molecular structures being considered consist of three rigid segments connected by flexible joints; the single molecule has a unimolecular central segment, but the oligomers have a more complex structure in the central segment.

To use the Matheson-Flory equations to calculate the standard free energy of reaction from the 4D dimer to the fibril, $\Delta F^\circ_{4Dd \to f}$, one needs values for the molecular dimensions of the structure, in terms of x, the axial ratio (also termed the aspect ratio), and for $\chi$. These values can then be inserted into Eqs. (5) and (6). The calculation of x is considered first. Again referring to Figure 1B, the central rigid region is 8.33D long. This central rigid region contains two segments, each consisting of one helix: (1) between 0 and 4D and (2) between 4.33 and 8.33D; total extent, 4D + 4D = 8.0D. This type of helical segment will be referred to as unimolecular; each collagen helix, of course, is itself a triple helix of three α-chains. The central rigid region also contains a bimolecular segment, consisting of two overlapping helices, from 4D to 4.33D, or 0.33D in length.

The problem now is to reduce this complex geometry to an equivalent cylinder (Figure 1E). The unimolecular segment is already a perfect right cylinder 8.0D (5360 Å) in length and 14.5 Å in diameter. For the bimolecular segment, the diameter of the equivalent cylinder is obtained by apposing the two constituent cylinders side by side horizontally and then inclining them 45° to the horizontal. The diameter of this inclined cross section (24.8 Å) thus approximates the average diameter of the bimolecule as it rotates about its long axis (Figure 3 and Table II). The diameter of the equivalent cylinder is then simply the mean diameter averaged over the component segments:

$$\text{diam}_{\text{4D dimer}} = (8.0D \times 14.5\ \text{Å} + 0.33D \times 24.8\ \text{Å})/8.33D = 14.9\ \text{Å} \qquad (13)$$
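The geometry behind Eq. (13) can be sketched as follows. The projection formula d(1 + cos 45°) for two apposed cylinders is an interpretation assumed here because it reproduces the quoted 24.8 Å; the paper does not state the formula explicitly, and the function names are hypothetical.

```python
import math

HELIX_DIAMETER_A = 14.5  # diameter of one triple helix in angstroms


def inclined_pair_width(diameter=HELIX_DIAMETER_A, angle_deg=45.0):
    """Horizontal width of two touching circles whose centre line is inclined.

    Assumed interpretation: centres are one diameter apart, so the projected
    width is d + d*cos(angle).
    """
    return diameter * (1.0 + math.cos(math.radians(angle_deg)))


def mean_diameter(segments):
    """Length-weighted mean diameter over (length, diameter) segments."""
    total_length = sum(length for length, _ in segments)
    return sum(length * diam for length, diam in segments) / total_length


if __name__ == "__main__":
    print(inclined_pair_width())
    # Unimolecular stretch of 8.0D and bimolecular overlap of 0.33D.
    print(mean_diameter([(8.0, 14.5), (0.33, 24.8)]))
```

The same length-weighted average extends directly to larger oligomers once the effective diameters of their multimolecular stretches are known.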

Figure 3. Equivalent cylindrical diameter of aggregates of helices, approximated as the widest diameter inclined 45° to the horizontal.

Table II. Parameter Assignments for Lateral Association of Helices

Number of Molecules   Effective         Approximate % Solvent   χ Under
in Overlap            Diameter (Å)ᵃ     Accessibilityᵇ          Cooperativity
1                     14.5              100                     0.094
2                     24.8              83                      0.094
3                     27.1              67                      0.099
4                     35.0              58                      0.102
5                     38.8              53                      0.104
6                     38.8              50                      0.105
7                     38.8              43                      0.107
                      46.3              42                      0.107
                      52.8              41                      0.107

ᵃ Calculated as illustrated in Figure 3 (assuming that the cross section is inclined at 45° and calculating the width, or diameter, at inclination).
ᵇ Calculated by the method of Lee and Richards.²³
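The cooperative $\chi$ column of Table II happens to be reproduced by a linear interpolation in solvent accessibility between 83% ($\chi = 0.094$) and 43% ($\chi = 0.107$). The sketch below encodes that observation; the linear form and the function name are assumptions of this sketch, not a rule stated in the paper.

```python
def cooperative_chi(accessibility_pct, chi_min=0.094, chi_max=0.107,
                    acc_high=83.0, acc_low=43.0):
    """Interpolate chi linearly in solvent accessibility, clamped at both ends.

    Assumption of this sketch: chi rises linearly as accessibility falls
    from acc_high to acc_low, and is constant outside that range.
    """
    if accessibility_pct >= acc_high:
        return chi_min
    if accessibility_pct <= acc_low:
        return chi_max
    fraction = (acc_high - accessibility_pct) / (acc_high - acc_low)
    return chi_min + fraction * (chi_max - chi_min)


# (accessibility %, chi under cooperativity) pairs as listed in Table II.
TABLE_II = [(100, 0.094), (83, 0.094), (67, 0.099), (58, 0.102), (53, 0.104),
            (50, 0.105), (43, 0.107), (42, 0.107), (41, 0.107)]

if __name__ == "__main__":
    for acc, chi_table in TABLE_II:
        print(acc, chi_table, cooperative_chi(acc))
```

Every tabulated value is matched to within rounding, which suggests, without proving, that the author scaled $\chi$ in this way.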

To calculate the axial ratio, one converts the total molecule length, 0.0522 + 8.33 + 0.0854 = 8.47D, to Å units. The conversion can be derived from Table I. The most precisely known ratio is 2900 Å/4.33D, corresponding to the helix length for a single molecule.¹⁷ The total length then becomes 8.47D × 2900 Å/4.33D = 5670 Å for the 4D dimer. The axial ratio for the 4D dimer = 5670 Å/14.9 Å = 380 = x. Finally, the lengths of the three component segments ($\eta_1$, $\eta_2$, and $\eta_3$) in axial ratio units are $\eta_1$ = 0.0522D × 380/8.47D = 2.35; $\eta_2$ = 8.33D × 380/8.47D = 374; and $\eta_3$ = 0.0854D × 380/8.47D = 3.84.

Next, the composite $\chi$ factor for the 4D dimer is calculated. One identifies regions of the helix that have different $\chi$ values. Helical regions with no specific pairing interactions with EHP are assigned $\chi = 0.107$. This value was obtained from the fit to experimental solubility of pronase-digested collagen.¹⁶,²⁰ Pronase-digested collagen has no EHP; therefore, its helical domain can only interact at low specificity with other helical domains. As such helical regions are aligned with adjacent helices, some hydrophobic and electrostatic contacts can occur. However, some regions of each helix are still exposed to solvent. The exposed regions contain hydrophobic and electrostatic groups that could potentially interact with additional helices. Only in the interior of a large fibril would all available energetic pairs be made, and the value $\chi = 0$ is assigned to these regions. The exterior of the equivalent cylinder of associated helices in bimolecular, trimolecular, etc., bundles, because of solvent exposure, is assigned the same $\chi$ value as a unimolecular domain, 0.107. (Note


that the treatment of pairing of EHP and complementary domains on the helix is different; once they pair, they are assigned $\chi = 0$, independent of whether they are on the surface of the aggregate or buried in its interior. See footnote c, Table III.) Referring now to Figure 1B, one identifies regions of the central domain that are unimolecular and a short region that is bimolecular. One subtracts from those regions the helical subdomains that can pair with EHP. Table III outlines these operations in detail. Considerable complexity can arise in regions where several distinct domains are exposed within the same D interval [e.g., $\eta_{31}(1)$ and a short sequence of $\eta_2(2)$ in the interval D = 4.33 to 4.37; footnote d, Table III]. All regions of the molecule are listed, with the appropriate D length and $\chi$ factor. These values are then inserted into equations analogous to Eqs. (8) and (9) for a single molecule; for the more complex central region, Eq. (9) becomes

$$\chi_2\eta_2 = \chi_{22\mathrm{uni}}\,\eta_{22\mathrm{uni}} + \chi_{22\mathrm{bi}}\,\eta_{22\mathrm{bi}} + \sum_j \chi_j\eta_j \qquad (14)$$

(the sum running over the EHP-pairing subdomains of the central region), where $\eta_2 = 8.33$, $x = 8.47$, $\eta_{22\mathrm{uni}} = 7.68$, $\eta_{22\mathrm{bi}} = 0.33$, and the remaining parameter values are given in Table III. The result is $\chi_2 = 0.135$ and $\chi_n = \chi = 0.153$.

Now, from the final $\chi$ value for the aggregate, and from x, $\eta_1$, $\eta_2$, and $\eta_3$ (given above), Eqs. (5) and (6) are solved for the equilibrium $v_p$ ($2.83 \times 10^{-14}$) and $v_p'$ (0.951) values. Then from Eq. (12) the standard free energy of reaction for the conversion of the 4D dimer to the fibril, $\Delta F^\circ_{4Dd \to f}$, is -18,690 cal/mol (at T = 302 K). The corresponding energy for the

Table III. Determination of Composite χ Factor for the 4D Dimolecular Aggregate (Detailed Allocation of Energies Along the Central Rigid Segment, $\eta_2$)ᵃ

Region                                            χ × D
Unimolecular helix, $\eta_{22\mathrm{uni}}$       0.822
Bimolecular helix overlap, $\eta_{22\mathrm{bi}}$ 0.035
EHP-pairing subdomains,ᵇ in source order          0.068, 0.068, 0ᶜ, 0.036ᵈ, 0ᶜ, 0.068, 0.031
Total, $\chi_2\eta_2$                             1.13ᵉ

$\chi_n$ = (0.596 × 0.0522 + 1.13 + 1.59 × 0.0854)/8.47 = 0.153ᶠ

ᵃ One additional category of $\eta$ sequences: $\eta_{22\mathrm{uni}}$ refers to a helical sequence (i = 2) that pairs with other helices (j = 2) and is a unimolecular region (index = "uni"); $\eta_{22\mathrm{bi}}$ is the corresponding sequence length for a bimolecular sequence. In some cases, e.g., when cooperativity of helical domains is invoked, these distinct $\eta$ sequences will be assigned distinct $\chi$ values. Analogous $\chi$ subscripts are used.
ᵇ Each helical region that pairs with an EHP partner must, for consistency, have the same $\chi$ as that EHP partner.
ᶜ When two specific pairing regions interact, that pair then possesses no net transfer energy between solvent and solute phases; i.e., $\chi = -\Delta f/(RT) = 0$.
ᵈ For this region of the equivalent cylinder, there are two types of domain exposed to solvent: $\eta_{31}(1)$ on molecule (1), which has a $\chi$ value of 1.59, and a segment of helical domain $\eta_2(2)$, which has a $\chi$ value of 0.107. The average $\chi$ value for this region is thus the arithmetic average: $\chi$ = (1.59 + 0.107)/2 = 0.851.
ᵉ This is the application of Eq. (14) to the specific example.
ᶠ Using Eq. (8).
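The dimer calculation can be retraced with the sketch below. The per-region ($\chi$, length) pairs are inferred from the $\chi$ × D products that survive in Table III together with the $\chi$ values and domain lengths quoted in the text; the assignment of each product to a particular subdomain is an assumption of this sketch, as are the function names.

```python
ANGSTROM_PER_D = 2900.0 / 4.33   # conversion quoted in the text

CHI_NTEHP, CHI_CTEHP = 0.596, 1.59
ETA1_D, ETA3_D = 0.0522, 0.0854  # EHP lengths in D units

# (chi, length in D) along the central rigid region of the 4D dimer.
# Region labels are inferred, not taken verbatim from the paper.
CENTRAL_REGIONS = [
    (0.107, 7.68),     # unimolecular helix
    (0.107, 0.33),     # bimolecular helix overlap
    (1.59, 0.0427),    # unpaired CTEHP-pairing subdomain
    (1.59, 0.0427),    # unpaired CTEHP-pairing subdomain
    (0.0, 0.0522),     # NTEHP paired with its helical partner
    (0.851, 0.0427),   # mixed region described in footnote d
    (0.0, 0.0427),     # CTEHP paired with its helical partner
    (1.59, 0.0427),    # unpaired CTEHP-pairing subdomain
    (0.596, 0.0522),   # unpaired NTEHP-pairing subdomain
]


def axial_ratio(total_length_d, mean_diameter_a):
    """Axial ratio = total length in angstroms / equivalent cylinder diameter."""
    return total_length_d * ANGSTROM_PER_D / mean_diameter_a


def composite_chi(regions, chi1, eta1, chi3, eta3):
    """Length-weighted composite chi for the central region and the whole rod."""
    central_length = sum(length for _, length in regions)
    central_sum = sum(chi * length for chi, length in regions)
    chi2 = central_sum / central_length
    total_length = eta1 + central_length + eta3
    chi_total = (chi1 * eta1 + central_sum + chi3 * eta3) / total_length
    return chi2, chi_total


if __name__ == "__main__":
    print(axial_ratio(8.47, 14.9))
    print(composite_chi(CENTRAL_REGIONS, CHI_NTEHP, ETA1_D, CHI_CTEHP, ETA3_D))
```

Under these assumptions the axial ratio comes out near 380 and the two composite values near the published 0.135 and 0.153, which supports the inferred region assignments without confirming them.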

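monomer to fibril is -7120 cal/mol, and utilizing Eqs. (4a-c):

$$2(\mu_{\text{fibril}} - \mu_m) = 2\,\Delta F^\circ_{m \to f} = 2 \times (-7120)\ \text{cal/mol} \qquad (15)$$

$$\mu_{\text{fibril}} - \mu_{4Dd} = \Delta F^\circ_{4Dd \to f} = -18{,}690\ \text{cal/mol} \qquad (16)$$

and subtracting Eq. (16) from (15):

$$\mu_{4Dd} - 2\mu_m = \Delta F^\circ_{m \to 4Dd} = 2(-7120) - (-18{,}690) = +4450\ \text{cal/mol} \qquad (17)$$

Thus, this step in the pathway is highly unfavorable.

Cooperative Interactions Assumption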

Since mechanisms of action of biological macromolecules are often cooperative,²² this feature was incorporated into an alternative method for calculating $\Delta F^\circ_{i \to i+1}$. In cooperative associations, it is proposed that all domains on the reacting molecule initially are in an "unactivated" state. Whenever two collagen molecules associate with potential high-affinity sites closely apposed, then side chains on the respective sites undergo rearrangement and position themselves for the highest energy of interaction. Such rearrangements are termed cooperative because they are presumed to involve several side chains simultaneously and only achieve a high-energy bond when the appropriate pairing partner regions are accessible.

In the cooperative model employed here, we will include cooperative interactions between EHP and EHP-pairing domains on helices, such as $\eta_{21}(1)$ and $\eta_1(2)$ in Figure 1B. We also include modest cooperative interactions between helices. This feature is based on the hypothesis that clusters of helices can maximize contact area between helices and reduce solvent exposure. The contact area is approximately inversely proportional to solvent accessibility²³ (Table II), and this provides a means for scaling $\chi$ from 0.094 (chosen arbitrarily to generate a modest energy effect) for isolated helices and dimeric associations, to 0.107 for associations of seven and greater numbers of helices. There is an important distinction between activation of side chains due to the association of several helices, leading to rising $\chi$ for any new helices that attach to the surface of the growing aggregate, and the eventual burying of helices within the interior of the growing aggregate. Buried helices are no longer part of the exterior of the equivalent cylinder, have $\chi = 0$, and do not contribute to the composite $\chi$.

Following the cooperative bonding scheme, when two molecules approach, only the interacting sites are converted to the activated state; otherwise, nonassociating domains remain at the unactivated $\chi$ level, chosen as 0.094, which is the level for an isolated helical domain. Taking again the example in Figure 1A and B, when molecules (1) and (2) are considered as reactants, the pairing $\eta_{21}(1)$ site and the $\eta_1(2)$ site are assigned $\chi = 0.596$. In the dimeric product, the corresponding segment [the paired $\eta_{21}(1)$-$\eta_1(2)$ region in Figure 1B] is assigned $\chi = 0$. Sites that do not pair in the formation of the 4D dimer, such as $\eta_{232}(1)$, remain at the same $\chi$ value (0.094) as the surrounding helix. Helical domains in this reaction are uni- or bimolecular, and remain at $\chi = 0.094$. The composite $\chi$ for each monomer reactant turns out to be 0.128, which is lower than for a monomer reactant ($\chi = 0.175$) when no cooperativity is assumed. The difference is due to nonpairing sites remaining at low $\chi$ values in the cooperative case, whereas the same sites in the noncooperative case would remain at high energy, e.g., $\chi = 0.596$ or 1.59.

In both models, when EHP and EHP-pairing domains on helices are paired, they are assigned $\chi = 0$. When helical domains associate with helical domains, their exterior (solvent-exposed region of the equivalent cylinder) surfaces are assigned $\chi$ = 0.094, 0.099, etc., depending on the number of helices overlapping in a bundle (see Table II). When a collagen molecule adds to large aggregates, all sites on the reacting monomer will find pairing partners, and helix clusters will be very large, so that helical $\chi$ factors will be at the maximum value. In such a case the $\Delta F^\circ_{i \to i+1}$ will be the same for the noncooperative and cooperative assumptions. Thus, the effect of cooperativity is to depress the energetics of early assembly steps relative to later ones.

Calculation of Further Steps in the Pathway

The same approach is followed, but the intricacy of the paired and overlapped regions increases rapidly with oligomer size. For several oligomers, Table IV presents dimensions and $\chi$ factors of the whole oligomer and of the central rigid segment. As one moves to oligomers with increasing diameters, a distortion of the lattice model occurs. In the original formulation of the theory,24 the polymer segment was assumed to occupy the same volume as a molecule of solvent. In a hydrogen-bonded solvent like water, however, the effective size of a solvent lattice cell could be much larger, perhaps 20-50 molecules. The number of solvent molecules occupying a lattice site may differ from unity for other reasons, in particular, because of the simplifications inherent in a lattice model. In any case, as the collagen oligomers grow in diameter, the polymer segment $x$ may contain differing numbers of atoms. Compare, for example, a single molecule, $x = 206$, with oligomer 7 in Table IV, a tetramolecular species with $x = 201$. The effect of the distortion is to give solubilities that are too high for larger oligomers, which in turn will understate $\Delta F^{\circ}$ (species to fibril). The effect is not expected to be large, generally less than 10%; this is insufficient to invalidate the conclusions of the paper. For reasons that are outside the scope of this work, it appears to be difficult to correct the theory for this effect.

GENERATION OF PATHWAYS

Using the methodology described above and in Ref. 16, $\Delta F^{\circ}_{i \to i+1}$ were computed for numerous intermediates. Intermediates were chosen that permitted a progression to structures approaching the dimensions of microfibrils observed by electron microscopy.4,13,25-27 In future explorations of the mechanism of collagen assembly, it is anticipated that simulations can be designed in which the path is not chosen a priori; rather, some random attachment process will be used, such as a Monte Carlo method, and the resulting pathway will arise from the energetics themselves. In this study the objective is less ambitious; plausible alternative paths are generated and their energetics are compared. Paths that have been proposed based on experimental data1,2,4 will receive particular attention. Pathways in which assembly occurs by successive addition of a monomer to an oligomer will be discussed at greatest length. In a later section the question of assembly from oligomeric building blocks will be addressed.

Figure 4 depicts the possible modes of attachment of molecules to form aggregates. Modes of attachment in which the directional sense is reversed (molecules in antiparallel orientation) will not be considered. In general, $\Delta F^{\circ}_{i \to i+1}$ for any desired step can be computed, using equations analogous to (4a-c).

The noncooperative assumption will be considered first (Figure 4A). Formation of 0D, 1D, and 2D dimers is thermodynamically favorable, but formation of the 3D and 4D dimers is not. If the case of the 0D dimer is set aside temporarily, it will be observed that the more asymmetric the dimer, the less favorable is its formation. The aspect ratios $x$ of the dimers are as follows: 1D, 175; 2D, 237; 3D, 305; and 4D, 380. The reason for the energy barrier in the monomer-to-4D dimer and monomer-to-3D dimer reactions is that both the 4D dimer and the 3D dimer have equilibrium solubilities much lower than that of the monomer: $v_p = 5.29 \times 10^{[\ldots]}$ and $3.97 \times 10^{[\ldots]}$, respectively, for 4D and 3D vs $v_p = 6.5 \times 10^{-6}$ for the monomer, all computed using Eqs. (5) and (6). Thus, the monomer is more stable in solution (in a lower energy state) than the 3D and 4D dimers. These results explain why end-linked oligomers of collagen are of lower solubility than monomers and precipitate more readily.26,29 The solubility of the 2D dimer ($1.54 \times 10^{[\ldots]}$) is also lower than that for the monomer; however, in this case the difference is not large enough to compensate for the relatively high level of local energy favoring association ($\chi = 0.178$).

As one proceeds to trimers the pathways branch. From the 4D dimer one can add a molecule 1D to molecule (1) and form a 1D-4D trimer (step $2^{4}$), or one can add a molecule 4D to molecule (2) to form a 4D-8D linear trimer (step $2^{4'}$). The formation of the latter species is again unfavorable, largely due to the drastic increase in $x$. Step $2^{4}$, on the other hand, is extremely favorable, and partially offsets the preceding unfavorable step. From the 1D dimer one may add a molecule 1D (step $2^{1}$) and form a 1D-1D trimer, which is again favorable. One may also add a molecule 4D to molecule (1) to create the 1D-4D trimer, but this step is unfavorable (+420 cal/mole, not shown in Figure 4A). In Figure 5 additional steps are depicted, proceeding from the 1D-4D trimer and from the 1D-1D trimer. The energies of these steps are presented in Table V. The latter pathways are examined in more detail below.

Returning to the other dimeric products of Figure 4A, the 2D dimer and 3D dimer do not lead to unique pathways. If one continues to add molecules exclusively 2D or 3D, structures are generated that contain large gaps in the lattice. Filling in the gaps requires addition of molecules in 1D or 4D alignment. If this is done, the effect is to merge with oligomers formed by 1D and 4D alignment. For example, addition of a molecule 1D to the 2D dimer generates a 1D-1D trimer (step $2^{2}$). Likewise, 1D addition to the 3D dimer (step $2^{3}$) produces a 1D-4D trimer. For purposes of simplification, it will be assumed that 2D and 3D structures merge with those formed by 1D and 4D alignment. The 2D and 3D paths are thus considered to be minor excursions, and will not be seriously explored.

The last dimeric product is aligned 0D (step $1^{0}$). As one continues to add molecules 0D, one generates tetramers, pentamers, and higher oligomers. The ultimate end point is not a collagen fibril. It is a different phase altogether, a nematic liquid crystal, with the ends of the molecules in register. Morphologically, the phase resembles so-called segment long spacing (SLS) crystallites. SLS crystallites are formed when soluble acidic collagen molecules are bridged by ATP,30 but are never observed under conditions favoring fibril assembly. In the collagen fibril, specific pairing regions are able to achieve their lowest energy states by pairing with the complementary partner, which can only occur if molecules are in staggered alignment. Thus, the 0D crystallite, in which specific pairing regions are not apposed, is not as stable thermodynamically as the staggered collagen fibril.

Figure 4. Schematic diagram of types of oligomer bonding. Individual molecules are represented by single vertical lines. Oligomers are represented by two or more closely apposed vertical lines. The oligomers are shown in an "unrolled" configuration for ease of visualization. Their actual three-dimensional arrangement is the same as for the Smith microfibril25 with cross sections like those depicted in Figure 3. See also Fig. 10 of Ref. 28. Five different bonding configurations are depicted (0D, 1D, 2D, 3D, and 4D), referring to the stagger separating the N-terminals of apposed molecules. Assembly steps are labeled with step numbers. Superscripts designate 0D, 1D, etc., pathways. Values in parentheses are free energies of formation for the product of the step. All steps occur by addition of a monomer, which is not depicted to avoid crowding. A: $\Delta F^{\circ}_{i \to i+1}$ calculated with $\chi$ computed as in Figure 1A-C and in Table III, using EHP $\chi$ values ($\chi_1$, $\chi_3$) and helical domain $\chi$ values. No assumption of cooperativity. B: $\Delta F^{\circ}_{i \to i+1}$ assuming cooperativity and no $\eta_{231}$-$\eta_{31}$ pairing (as in Figure 1D).

Figure 5. Schematic diagram of paths 1 and 4. The pathway with numbers bearing superscript 4 (including the shunt with 4 primed superscript numbers) is referred to as path 4 because it proceeds through intermediates containing only 4D alignment before molecules are added at 1D positions. The pathway numbered with superscript 1 is referred to as path 1 for the corresponding reason. Both paths merge at step 10. The largest intermediate depicted is an undecamer, of length 13.5 D, or 9020 A, and diameter 30.0 A.

Table V Standard Free Energies of Reaction for Steps Depicted in Figure 5

Free Energy for Step ($\Delta F^{\circ}_{i \to i+1}$)a

Step       Constant $\chi$   Noncooperative   Cooperative
$1^{1}$    -9490             -9320            -2710
$2^{1}$    -5700             -6300            -2060
$3^{1}$    -5850             -6050            -1760
$4^{1}$    -5650             -6750            -4440
$5^{1}$    -5580             -6600            -2540
$6^{1}$    -5530             -6520            -4020
$7^{1}$    -5490             -6410            -2700
$8^{1}$    -5470             -6390            -2260
$9^{1}$    -5170             -6350            -2950
$1^{4}$    +8280             +4450            +378
$2^{4}$    -8970             -13400           -4730
$3^{4}$    -8020             -9130            -3530
$4^{4}$    -8820             -7920            -2750
$5^{4}$    +2340             +315             +324
$6^{4}$    -6780             -8340            -4590
$7^{4}$    -8790             -8870            -4640
$8^{4}$    -8960             -8710            -3430
$9^{4}$    -8480             -9150            -5290
$2^{4'}$   +9580             +5180            +1830
$3^{4'}$   -16500            -13900
$4^{4'}$   -13100            -13300
$5^{4'}$   -9210             -8130

Shunt steps $2^{a}$, $3^{a}$, $3^{b}$, $4^{b}$, $5^{b}$, $4^{c}$ (entries in source order; column alignment not recoverable): +420, -2340, -9060, -10,500, -9070, -7990, -2280, -1840, -3120.

a In cal/mole at T = 302 K, I = 0.2, and pH = 7.2.

The question also arises as to whether 0D alignment is found in D-staggered fibrils. Experimental data imply that some molecules in the fibril are aligned 0D,31 but based on models of packing order in the fibril, this is expected to occur at crystal defects. Both the sheet model and the compressed microfibril model predict that 0D molecules are separated from each other, and trimeric, tetrameric, etc., 0D associations cannot develop.32 Thus, progression by 0D alignment (path 0) can only result in the formation of nematic crystallites, not D-staggered fibrils.

The values for $\Delta F^{\circ}_{i \to i+1}$ along path 0 have been estimated as follows. Since the equilibria between collagen molecules and collagen nematic crystallites have not been measured (indeed, the existence of a nematic crystalline phase under conditions of fibril assembly has never been demonstrated), one can only guess what the appropriate energetics would be. In a nematic crystallite only helical-helical pairing is possible, since EHP and their complementary regions are not apposed in 0D alignment. It is proposed that the appropriate $\chi$ factor describing molecule-crystallite equilibria is equal to or less than that for the helical-helical interaction in pronase-digested collagen, namely $\chi \le 0.107$. Based on this conjecture, steps $1^{0}$ and $2^{0}$ were computed, using $\chi = 0.107$ for all species. The standard free energy of assembly of monomers to a nematic crystalline phase, $\Delta F^{\circ}_{m \to n}$, was found to be -2640 cal/mol, and that of the 0D dimer, $\Delta F^{\circ}_{\mathrm{0D\,dimer} \to n}$, -526 cal/mol. Following equations analogous to (15) and (16), $\Delta F^{\circ}_{m \to \mathrm{0D\,dimer}} = -4750$ cal/mol. Subsequent steps along the 0D path had less negative $\Delta F^{\circ}_{i \to i+1}$, which stabilized near -2600 cal/mol. These energies are far less favorable than those observed along the pathways depicted in Figure 5, and, as will be confirmed below, the 0D pathway is not a viable alternative under physiological conditions.

The pathways of Figure 5 result from addition of molecules to either the 1D dimer or the 4D dimer. Two major paths are shown, referred to as path 1 (numbers with superscript 1) and path 4 (superscript 4). There are also several shunts that are labeled with different superscripts. The shunt labeled with 4' superscripts passes through a 4D-8D linear trimer (step $2^{4'}$). Table V shows that the construction of the linear trimer is quite unfavorable, but that the subsequent steps are very favorable. A method for deciding which of several paths is preferred will be presented below. The result of continued addition of molecules along these paths is a Smith microfibril25 of any desired length, but with constant pentamolecular cross section, depicted in Figure 3.

How are thicker fibrils constructed? Figure 6 depicts two modes of thickening. In Figure 6A a single molecule attaches to the preexisting pentamolecular bundle. The initial step is favorable ($\Delta F^{\circ}_{i \to i+1} = -7600$ cal/mol, when one molecule adds noncooperatively to a dodecamer), although the new molecule cannot form pairs with EHP on the preexisting oligomer. Additional molecules can add 1D, one at a time, to the newly bonded molecule, and a second microfibrillar bundle can be constructed alongside the first. A second process involves the fusion of two preexisting Smith microfibrils (Figure 6B). This step is thermodynamically favorable, but is expected to be slow, due to hindered diffusion of the reacting species. (An example of such a reaction is the lateral fusion of two dodecamers to form a 24-mer; the noncooperative $\Delta F^{\circ}_{i \to i+1} = -11,000$ cal/mole.) Presumably mechanism 6B will be preferred when the concentration of monomers is depleted.

Figure 6. Formation of fibrils with thicker cross sections. A: New molecules (thin lines) add on to a preexisting microfibril with pentamolecular cross section (bold lines). The newly added molecules will not form EHP-helical pairing partners with the preexisting bundle; they can only interact through helical-helical pairing. B: Two preexisting microfibrils can bond by lateral collision and association.

The results for the cooperative assumption are presented in Figure 4B and Table V. Compared to the noncooperative assumption, the picture is qualitatively similar; path 4 still exhibits two unfavorable steps, $1^{4}$ and $5^{4}$, but path 1 has only favorable steps, based on standard free energies of reaction.
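The path 0 estimate above closes a thermodynamic cycle through the end phase. The following is a minimal sketch of that bookkeeping, assuming the relation implied by the quoted numbers (reactant-to-end-phase terms minus the product-to-end-phase term). The function name `step_free_energy` is hypothetical, and Eqs. (15) and (16) themselves are not reproduced here.

```python
def step_free_energy(dF_oligomer_to_end, dF_monomer_to_end, dF_product_to_end):
    """Free energy (cal/mol) of: oligomer i + monomer -> oligomer i+1.

    Assumed cycle: send both reactants to the end phase, then bring the
    product back from the end phase.
    """
    return dF_oligomer_to_end + dF_monomer_to_end - dF_product_to_end


# Values quoted in the text for the nematic (0D) end phase, in cal/mol.
dF_monomer_to_nematic = -2640.0
dF_0D_dimer_to_nematic = -526.0

# Monomer + monomer -> 0D dimer (the "oligomer" reactant is itself a monomer).
dF_0D_dimer_formation = step_free_energy(
    dF_monomer_to_nematic, dF_monomer_to_nematic, dF_0D_dimer_to_nematic
)

# Limiting case: if reactant and product are equally far from the end phase,
# the step energy reduces to the monomer-to-end-phase value.
dF_large_oligomer_step = step_free_energy(-100.0, -7120.0, -100.0)

if __name__ == "__main__":
    print(dF_0D_dimer_formation)    # close to the -4750 cal/mol quoted above
    print(dF_large_oligomer_step)
```

Under this reading, the dimer value differs from the quoted -4750 cal/mol only by rounding, and the limiting case shows why late steps settle near a constant per-step energy.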

IDENTIFICATION OF PREFERRED PATHWAYS

Initially, paths 1 and 4 will be examined. Later, the significance of shunts in Figure 5 will be considered. Pathways may be compared by adding the free energies of each step.33 For example, noncooperative paths 1 and 4 differ by the first 9 steps and then are joined. Over these 9 different steps path 1 has a total free energy of -60,700 cal/mole, and path 4, -60,800, nearly the same. Thus on a thermodynamic basis, both appear almost equivalent. Conclusions based on such short path fragments must be viewed with caution, however. It is presumed that collagen assembly progresses rapidly from monomers to some large aggregate that is the end product. If this is correct, there can be no equilibration or steady state condition over steps 1-9, by which thermodynamic control can prevail over kinetic control.34 Thermodynamically, the end product is an anisotropic phase whose dimensions are so large that surface effects are negligible,11 and it is the free energy state of this aggregate that drives the reaction. Practically, the physical size of each fibril is presumed to be limited by depletion of molecules that can diffuse to its surface. Once molecules are depleted from solution, further growth may involve the slow coalescence of thin fibrils to form thicker ones.

For the purposes of comparison of pathways, it is probably sufficient to approximate the end point with an aggregate consisting of an arbitrary but definite number of molecules. Microfibrils are in the neighborhood of 50-100 A in diameter and perhaps 15,000-20,000 A long25-27; thus they contain 100-300 molecules, which implies as many assembly steps, if molecules add one at a time. In order to estimate the energetics of numerous steps, one may approximate $\Delta F^{\circ}_{i \to i+1}$ for an average step: As the assembly reaction proceeds, eventually the oligomeric structure reaches a degree of polymerization $j$ at which the dimensions ($x$) and overall $\chi$ value are only slightly altered by the addition of a new molecule. At such a point the standard free energy of conversion of oligomer $j$ to a fibrillar end point, $\Delta F^{\circ}_{j \to f}$, will be negligibly different from the analogous energy for oligomer $j+1$, $\Delta F^{\circ}_{j+1 \to f}$. Then from equations analogous to (15) and (16), $\Delta F^{\circ}_{j \to j+1}$ is

$$\Delta F^{\circ}_{j \to j+1} \cong \Delta F^{\circ}_{m \to f} \qquad (17)$$

Thus, as a rough approximation for comparative purposes, pathways that join at some step (e.g., step 10 in Figure 5) can be arbitrarily extended by adding -7120 cal/mole ($= \Delta F^{\circ}_{m \to f}$) for each additional step. In the comparison cited above, noncooperative paths 1 and 4, additional steps beyond 9 are unnecessary in order to understand their relative thermodynamics. For noncooperative path 0, $\Delta F^{\circ}_{i \to i+1}$ approaches -2600 cal/mole and the overall free energy for 9 steps is -25,600 cal/mole. The analogue of Eq. (17) yields $\Delta F^{\circ}_{m \to n} = -2640$ cal/mole for all subsequent steps, and obviously this path is thermodynamically unfavorable, compared to paths leading to the native fibril.

For the cooperative assumption after 9 steps, paths 1, 4, and 0 are not very distinct: overall free energy of path 1, -25,500; of path 4, -28,300; and of path 0, -24,600 cal/mole. In this case, because $\Delta F^{\circ}_{i \to i+1}$ are depressed in the early steps for paths 1 and 4, it is necessary to extend the analysis beyond 9 steps. At 50 steps (adding -7120 cal/mole per step to paths 1 and 4 and -2640 cal/mole to path 0), the simulated overall free energy of each is as follows: path 1, -317,000; path 4, -320,000; and path 0, -133,000 cal/mole. Again, path 0 is seen to be unfavorable, and paths 1 and 4 are similar. (These large energies can be divided by the number of steps; the overall pathway energy is then per mole of collagen monomer, and the values approach those for $\Delta F^{\circ}_{m \to f}$.)

To choose between paths 1 and 4, kinetic factors must be considered. Referring to Eq. (2), the preferred pathway kinetically will be the one with the lowest maximum $\Delta F^{\circ}_{i \to i+1}$. In order to compare pathways, it is necessary to identify the least favorable step of each. For the noncooperative assumption, the slow step of path 1 is $3^{1}$, -6050 cal/mole; for path 4, it is $1^{4}$, +4450 cal/mole. An analogous situation exists for the cooperative pathway. Under experimental conditions for fibril assembly, e.g., at 0.1 mg/mL,35

$$\Delta F = \Delta F^{\circ} + RT \ln K \qquad (18)$$

where $K = v_f / v_p$ at the specified conditions. The term $RT \ln K$ is about +5700 cal/mole at 0.1 mg/mL ($v_p = 6.95 \times 10^{[\ldots]}$; $v_f$ remains at 0.921; T = 302 K), and as a result, $\Delta F$ for each slow step grows even more unfavorable. Nevertheless, path 1 always presents the lowest barrier to assembly, compared to path 4.

As noted above, $\Delta F^{\circ}_{i \to i+1}$ constitutes only part of the energy barrier. The other components are activation energies and will be considered in turn. If the activation energies are all identical for slow steps in alternative pathways, then only $\Delta F^{\circ}_{i \to i+1}$ will differ among competing pathways, and it is sufficient to compare that component. The probable components of the activation energy include (1) $\Delta F_{[\ldots]}$, (2) conformational changes of helical amino acid side chains or of EHP, and (3) a steric factor. A separate study will address these components of the activation energy in detail. In brief, it appears that (1) and (3) are identical for associations involving 1D and 4D alignment. Only in the case of component (2) is it deemed likely that differences exist. For 1D association, helices are apposed and side chains on each helix presumably reorient and extend outward from the helix in order to form hydrophobic, electrostatic, and other bonds with the neighboring molecule. The rotational barriers to these reorientations and extensions thus constitute an activation energy. For 4D association, there may be a corresponding reorientation of EHP and pairing side chains on the helix. The magnitude of the energy associated with such conformational changes is unknown at present. To the extent that this energy term is larger in 1D association, 1D association could be less favorable than 4D association. In the absence of rigorous quantitation of these energies, one can only return to comparisons of $\Delta F^{\circ}_{i \to i+1}$ to propose preferred pathways. On the basis of $\Delta F^{\circ}_{i \to i+1}$ alone, one is forced to conclude that path 1 is the preferred one under both assumptions.

Pathway Shunts
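Before turning to the shunts, the arithmetic used to compare paths 1 and 4 can be sketched as follows. This is an illustrative sketch only: the step energies are transcribed from Table V, the helper names are hypothetical, and the exponent of $v_p$ at 0.1 mg/mL is not legible in the source, so the value `6.95e-5` below is an assumption chosen because it reproduces the quoted $RT \ln K$ of about +5700 cal/mole.

```python
import math

R_CAL = 1.987   # gas constant, cal/(mol K)
T_K = 302.0     # temperature quoted in the text

# Step energies (cal/mole) for steps 1-9, transcribed from Table V.
NONCOOP = {
    "path 1": [-9320, -6300, -6050, -6750, -6600, -6520, -6410, -6390, -6350],
    "path 4": [4450, -13400, -9130, -7920, 315, -8340, -8870, -8710, -9150],
}
COOP = {
    "path 1": [-2710, -2060, -1760, -4440, -2540, -4020, -2700, -2260, -2950],
    "path 4": [378, -4730, -3530, -2750, 324, -4590, -4640, -3430, -5290],
}


def path_total(steps):
    """Overall free energy of a path fragment: the sum of its step energies."""
    return sum(steps)


def extended_total(steps, n_steps, per_step):
    """Extend a path to n_steps by adding a constant energy per extra step."""
    return sum(steps) + (n_steps - len(steps)) * per_step


def slow_step(steps):
    """Return (step number, energy) of the least favorable step."""
    number = max(range(len(steps)), key=lambda k: steps[k]) + 1
    return number, steps[number - 1]


def concentration_term(v_f, v_p, temperature=T_K):
    """RT ln K with K = v_f / v_p, as in Eq. (18)."""
    return R_CAL * temperature * math.log(v_f / v_p)


if __name__ == "__main__":
    for name, steps in NONCOOP.items():
        print(name, path_total(steps), slow_step(steps))
    for name, steps in COOP.items():
        print(name, extended_total(steps, 50, -7120), slow_step(steps))
    # v_p exponent is an assumption (illegible in the source).
    print(concentration_term(0.921, 6.95e-5))
```

The kinetic criterion in the text then amounts to picking the path whose `slow_step` energy is lowest; with these numbers that is path 1 under both assumptions.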

The 4' shunt has the same overall free energy as path 4. Thus, it is thermodynamically equivalent to path 4, but because of the higher barrier at step $2^{4'}$, the whole shunt is less favorable than path 4 and, of course, path 1. Inspection of Figure 5 and Table V will confirm that there are no other pathways that are more favorable than path 1, with one exception. This is a path under the cooperative assumption consisting of the following steps in succession: $1^{1}$, $2^{a}$, $3^{b}$, $4^{b}$, $5^{1}$, thereafter continuing with path 1. This shunt has a slightly lower free energy for 9 steps than path 1, but its slowest step ($3^{b}$) has $\Delta F^{\circ}_{i \to i+1} = -1840$ cal/mole, which is faster (lower barrier) than the corresponding slow step ($3^{1}$, $\Delta F^{\circ}_{i \to i+1} = -1760$ cal/mole) for path 1. It is noteworthy that this shunt exhibits 4D alignment in the third step ($2^{a}$) along the pathway, indicating that 4D alignment can occur early in assembly, even if 1D association is initially preferred.

Alternative Assumptions

Two variations of the noncooperative assumption were explored. In the first, $\eta_{231}$-$\eta_{31}$ pairing was eliminated, as depicted in Figure 1D. When this was done, the overall result was a lower energy barrier for the formation of the 4D dimer (+1960 cal/mol; compare to column 2, Table V, for the previous version of the noncooperative assumption). $\Delta F^{\circ}_{i \to i+1}$ for the 1D dimer was less negative (-7600 cal/mol). Nevertheless, the overall conclusion was not changed: path 4 was expected to be less favorable kinetically, compared to path 1. This result does suggest that part of the difference observed above between the noncooperative assumption and the cooperative assumption (Table V) is due to the elimination of $\eta_{231}$-$\eta_{31}$ pairing in the latter. The remainder of the difference is due to the cooperative assumption per se.

A second alternative tested the sensitivity of results to errors in experimental solubilities. Because the equilibrium solubility for pronase-digested collagen (433 µg/mL) is approximate,20,26 calculations were repeated under the condition that the equilibrium solubility was much higher: 800 µg/mL. This solubility, when introduced into the appropriate equations for phase equilibria analogous to Eqs. (5) and (6) [see Eqs. (10) and (11) in Ref. 16], yielded $\chi = 0.102$ (in contrast to $\chi = 0.107$, used elsewhere in this work). For EHP regions of the molecule, the new values were $\chi_1 = 0.380$ and $\chi_3 = 1.85$. When incorporated into a noncooperative model with no $\eta_{231}$-$\eta_{31}$ pairing, the energy for formation of the 4D dimer (step $1^{4}$) was +980 cal/mole, still unfavorable; that for the 1D dimer was -3010 cal/mole. Again, path 4 was unfavorable relative to path 1, based on $\Delta F^{\circ}_{i \to i+1}$. In the extreme, it was estimated that if the solubility of pronase-digested collagen were as high as 3.9 mg/mL, then the standard free energy of formation of the 4D dimer would be zero. Thus, it appears that if pronase-digested (i.e., EHP-free) collagen has an equilibrium solubility between 0.3 and 3 mg protein/mL at T = 302 K, I = 0.2, and pH = 7.2, the overall predictions of the model are unchanged. In other tests, the solubilities of pepsin-treated and native collagens were varied twofold, and an additional set of values for $\chi_1$ and $\chi_3$ was computed. When pathways were generated, the overall picture was again unchanged.
A completely different approach, the constant $\chi$ assumption, was also investigated. $\Delta F^{\circ}_{i \to i+1}$ were computed with $\chi$ held at a constant value. The effect of this constraint is to make all oligomers identical in local, i.e., near neighbor, energetic interactions. Differences in $\Delta F^{\circ}_{i \to i+1}$ then become more sensitive to the polymer configuration, expressed as $x$, the axial ratio.24 It was of interest to see to what extent the pathway could be controlled by this factor alone; furthermore, if the major features of the system were captured, a considerable saving in computational effort would result. The $\chi$ was set at 0.175 for the monomer-to-fibril reaction; it was held constant at a slightly lower value, $\chi = 0.17$, for all oligomers. These values provide a relatively high level of local energy favoring assembly. (Compare to $\chi$ values actually calculated under the noncooperative assumption in Table IV.) The $\Delta F^{\circ}_{i \to i+1}$ for paths 1 and 4 under constant $\chi$ were similar to those for the noncooperative assumption, and path 1 was preferred over path 4, based on the free energy of the slowest step at 0.1 mg/mL (see Table V). However, since there was no identification of pairing elements, path 0 was even more favorable, both thermodynamically and kinetically (data not shown). Therefore, the constant $\chi$ assumption leads to results that are contrary to experiment; i.e., the nematic crystallite is the preferred assembly product.

Assembly from Oligomeric Intermediates

Three separate modes of assembly were explored, using the noncooperative assumption. In the first, the elemental building block was taken to be the 1D dimer. Two dimers associated to form a 1D-aligned tetramer, and higher oligomers (hexamer, octamer, and decamer) were generated by adding 1D dimers in a stepwise fashion (Figure 7 and Table VI). At step 5 this pathway merges with those depicted in Figure 5. When the overall free energy to reach the decamer was computed, it was very similar to that for paths 1 and 4. $\Delta F^{\circ}_{i \to i+1}$ for the slow step (2) was -3030 cal/mole, compared to -6050 cal/mole for step $3^{1}$ of path 1. Thus, this mode of assembly is not more favorable than path 1 of Figure 5 and Table V, in which molecules, and not dimers, are successively added to the aggregate.

In the second mode of assembly, the 4D dimer was successively added to the aggregate. The 6 steps depicted in Figure 7 generate a pathway that merges with those of Figure 5, and the overall free energy to reach the decamer was slightly greater than for paths 1 and 4. $\Delta F^{\circ}_{i \to i+1}$ for the slow step (1') was +4450 cal/mole, identical to path 4. Therefore, this pathway is as unfavorable as path 4, in which monomers are successively added, and less favorable than path 1.

Third, the proposal of Silver4 and Trelstad and Silver36 was tested by computing pathways utilizing the 4D-8D linear trimer as a unit of assembly. In this case, the computed steps generated a dodecamer, and path 4 was extended to provide a comparison at this point. Nevertheless, the overall free energy to reach the dodecamer was similar to path 4 with monomer addition. The slow step (2'') was, of course, very unfavorable, both under standard conditions and at 0.1 mg/mL. This mode of assembly is also not preferred over path 1.

Figure 7. Assembly pathways involving oligomeric building blocks. A: Stepwise addition of 1D dimers. B: Stepwise addition of 4D dimers. C: Stepwise addition of 4D-8D linear trimers.

Table VI Standard Free Energies of Reaction for Assembly by Stepwise Addition of Oligomers, as Depicted in Figure 7a

Unit: 1D Dimer       Unit: 4D Dimer       Unit: 4D-8D Trimer
Step   Energy        Step   Energy        Step   Energy
1      -9320         1'     +4450         1''    +4450
2      -3030         2'     -26900        2''    +5180
3      -4040         3'     -19400        3''    -45300
4      -3610         4'     -19400        4''    -32600
5      -3420         5'     -8710         5''    -34000
                     6'     -9150

a Noncooperative assumption only. Energies in cal/mole.
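The statement that the 1D dimer route reaches the decamer with nearly the same overall free energy as path 1 can be cross-checked from Tables V and VI. The sketch below assumes one particular bookkeeping, which is not spelled out in the text: every 1D dimer consumed is first formed from two monomers at the step 1 cost, and Table VI steps 2-5 are the subsequent dimer additions. The function names are hypothetical.

```python
# Noncooperative path 1 step energies (cal/mole), steps 1-9 of Table V.
PATH1_STEPS = [-9320, -6300, -6050, -6750, -6600, -6520, -6410, -6390, -6350]

# Table VI, 1D dimer unit: step 1 forms the dimer, steps 2-5 add further dimers.
DIMER_FORMATION = -9320
DIMER_ADDITIONS = [-3030, -4040, -3610, -3420]


def monomer_route(n_molecules):
    """Cumulative free energy to build an n-mer by single-monomer additions."""
    return sum(PATH1_STEPS[: n_molecules - 1])


def dimer_route(n_molecules):
    """Cumulative free energy to build an even n-mer from preformed 1D dimers.

    Assumption: each dimer used costs DIMER_FORMATION, and joining k dimers
    takes the first k - 1 dimer-addition steps of Table VI.
    """
    n_dimers = n_molecules // 2
    return n_dimers * DIMER_FORMATION + sum(DIMER_ADDITIONS[: n_dimers - 1])


if __name__ == "__main__":
    for n in (4, 6, 8, 10):
        print(n, monomer_route(n), dimer_route(n))
```

Under this assumed bookkeeping the two routes agree to within table rounding at the tetramer, hexamer, octamer, and decamer, which is what one expects if both routes pass through the same intermediates.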

DISCUSSION

The theory of Matheson and Flory describes phase equilibria of rods to which appendages are added.6 If both appendages and central rod have identical $\chi$ factors, the tendency for model rods to form fibrils is decreased, compared to rods without appendages. In the case of the rod-like collagen helix, the reverse is true: the presence of EHP appendages increases the tendency to form fibrils. The discrepancy between theory and experiment is resolved if drastically differing $\chi$ factors are assigned to helix and EHP. The $\chi$ factors assigned to EHP strongly favor molecular association ($\chi > 0.4$), and this nearest neighbor effect favoring fibril formation counteracts the strictly configurational forces, which disfavor assembly (as appendages are added).6,24

Given that the model requires a high energy association between EHP and some complementary region in the fibril, what is the nature of the EHP association? Because it is known that collagen molecules pack in the fibril in staggered alignment17 and because molecules in the fibril are physically separated from each other head-to-tail (due to the gap region; see Fig. 10 of Ref. 28), it is inferred that the associative interaction is not head to tail. That is, the CTEHP of molecule (1) does not bond with the NTEHP of molecule (6), which is directly behind it. Instead, the EHP in the fibril are expected to be adjacent to helical domains of neighboring molecules and to bond with helical subdomains, which presumably are complementary to the EHP. Thus, based on the Flory theory, on experimental data from collagen solubility,13,19,20 and on models and data concerning packing structure of fibrils,32 one is led to invoke distinct modes of pairing between EHP and the helix. Starting from an independent set of data and models, the same notion has been proposed by several groups, as already cited in the Introduction. Thus, although there is some ambiguity about the precise size and number of helical EHP-pairing domains, several lines of evidence support their existence.

In this paper the identification and quantitation of putative complementary pairing domains was an integral part of the approach for computing composite $\chi$ factors, and, ultimately, free energies of stepwise assembly. The computations were carried out under several different assumptions: (1) cooperativity or noncooperativity, (2) presence or absence of specific pairing domains, (3) addition of molecules one at a time or as oligomeric building blocks, and (4) different equilibrium solubilities of collagen species. All of the assumptions yielded the same general picture: Unless activation energies are overriding, 1D association appears to be favored over 4D association in the first step. However, addition of molecules 4D to the growing aggregates can occur as early as trimers and pentamers. The role of 4D dimers and 4D-8D trimers as important intermediates is not supported by the model. The only other way, in the context of the model, by which 4D dimers and 4D-8D trimers could prevail is if EHP-helix pairing were highly favorable and helix-helix pairing were unfavorable.
If one admits that collagen devoid of EHP can form fibrils at protein concentrations below 3-4 mg/mL and that pronase-treated collagen in fact represents such a species, then helix-helix association is favorable, since the computed χ factor is positive (near 0.1). Therefore, the best argument in favor of 4D and 4D-8D species over 1D still appears to be differential activation energies of 4D vs 1D association, as discussed above and in a separate study (in progress). It is worth noting that experimental data on 4D alignment of molecules is not necessarily in conflict with path 1 and early 1D alignment. That molecules are cross-linked 4D in mature collagen fibrils does not prove that this arrangement of molecules is the earliest step during assembly. Cross-linking of 4D-aligned molecules could occur after fibrils have formed.

Further findings are the following: Slow steps are identifiable in all pathways, and the oligomeric species at such steps may be considered as nuclei. If path 1 or a related sequence of events is followed, only primary nucleation is predicted. Path 4, including 4', contains two barriers, in which step l4 could be interpreted as primary nucleation and 24' as secondary nucleation. Previous analyses of collagen assembly kinetics utilized classical nucleation-growth theory. Both types of nucleation (primary only vs primary and secondary) were found to be roughly consistent with experimental data.13-16,37 Classical theory provides methods for discriminating between the two mechanisms, but in the above-mentioned studies such methods were not applied. The results of the current study provide a motivation for a reexamination of classical theory in this context. In principle, the method presented above allows the simulation of fibril assembly to any desired level of detail and to any desired size of fibril. Further exploration of this model system may shed light on the relative extent of linear and lateral assembly, and the possible existence of hierarchies of fibrillar structure.38
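The distinction between one barrier and two barriers along a pathway can be sketched as follows. The sketch treats a nucleus as an oligomer size at which the cumulative free energy along the pathway passes through a local maximum, which is the classical nucleation picture and an assumption here, not necessarily the criterion for slow steps used in this study. The function `find_barriers` is hypothetical, and any step free energies supplied to it are placeholders, not values computed in this paper.

```python
from itertools import accumulate


def find_barriers(step_free_energies):
    """Return (oligomer_size, cumulative_free_energy) at each local maximum.

    step_free_energies[k] is the free energy change for growing an
    aggregate from size k + 1 to size k + 2, so index 0 is the
    monomer -> dimer step. Units are whatever the caller supplies.
    """
    # Cumulative free energy relative to the monomer (index 0 = monomer).
    cumulative = [0.0] + list(accumulate(step_free_energies))
    barriers = []
    for i in range(1, len(cumulative) - 1):
        rises = cumulative[i] > cumulative[i - 1]
        falls = cumulative[i] >= cumulative[i + 1]
        if rises and falls:
            # Index i corresponds to an aggregate of i + 1 molecules.
            barriers.append((i + 1, cumulative[i]))
    return barriers
```

With a made-up profile such as `[2.0, 1.0, -3.0, -3.0, -3.0]` the function reports a single barrier (primary nucleation only), whereas `[2.0, -1.0, -1.0, 1.5, -3.0]` yields two barriers, the second of which would be read as secondary nucleation.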

REFERENCES

1. Veis, A. & Payne, K. (1988) in Collagen, Vol. I, Nimni, M., Ed., CRC Press, Boca Raton, FL, pp. 114-137.
2. Helseth, D. L., Lechner, J. H. & Veis, A. (1979) Biopolymers 19, 3005-3014.
3. Capaldi, M. J. & Chapman, J. A. (1982) Biopolymers 21, 2291-2313.
4. Silver, F. H. (1981) J. Biol. Chem. 256, 4973-4977.
5. Otter, A., Scott, P. G. & Kotovych, G. (1988) Biochemistry 27, 3560-3567.
6. Matheson, R. R. & Flory, P. J. (1981) Macromolecules 14, 954-960.
7. Binsbergen, F. L. (1970) Kolloid Z. Z. Polym. 237, 289-297.
8. Binsbergen, F. L. (1973) J. Polym. Sci. Polym. Phys. Ed. 11, 117-135.
9. Jencks, W. P. (1969) Catalysis in Chemistry and Enzymology, McGraw-Hill, New York, p. 612.
10. Turnbull, D. & Fisher, J. C. (1949) J. Chem. Phys. 17, 71-73.
11. Hoffman, J. D., Weeks, J. J. & Murphey, W. M. (1959) J. Res. Natl. Bur. Stds. 63A, 67-98.
12. Mandelkern, L. (1964) Crystallization of Polymers, McGraw-Hill, New York, pp. 273-285.
13. Wallace, D. G. & Thompson, A. (1983) Biopolymers 22, 1793-1811.
14. Wallace, D. G. (1990) Biopolymers 29, 1015-1026.
15. Wallace, D. G. (1985) Biopolymers 24, 1705-1720.
16. Wallace, D. G. (1990) Biopolymers 30, 889-897.
17. Piez, K. A. (1984) in Extracellular Matrix Biochemistry, Piez, K. A. & Reddi, A. H., Eds., Elsevier, New York, p. 18.
18. Otter, A., Kotovych, G. & Scott, P. G. (1988) Biochemistry 28, 8003-8010.
19. Cooper, A. (1970) Biochem. J. 118, 355-365.
20. Helseth, D. L. & Veis, A. (1981) J. Biol. Chem. 256, 7118-7128.
21. Wall, F. T. (1958) Chemical Thermodynamics, W. H. Freeman, San Francisco.
22. Lehninger, A. L. (1975) Biochemistry, Worth, New York, p. 236.
23. Lee, B. & Richards, F. M. (1971) J. Mol. Biol. 55, 379-400.
24. Flory, P. J. (1953) Principles of Polymer Chemistry, Cornell University Press, Ithaca, NY.
25. Smith, J. W. (1968) Nature 219, 157-158.
26. Comper, W. D. & Veis, A. (1977) Biopolymers 16, 2113-2131.
27. Trelstad, R. L., Hayashi, K. & Gross, J. (1976) Proc. Natl. Acad. Sci. USA 73, 4027-4031.
28. Nimni, M. E. & Harkness, R. D. (1988) in Collagen, Vol. 1, Nimni, M. E., Ed., CRC Press, Boca Raton, FL, p. 14.
29. Na, G. C. (1989) Biochemistry 28, 7161-7167.
30. Schmitt, F. O., Gross, J. & Highberger, J. H. (1953) Proc. Natl. Acad. Sci. USA 39, 459-470.
31. Zimmerman, B. K., Pikkarainen, I., Fietzek, P. P. & Kuhn, K. (1970) Eur. J. Biochem. 16, 217-225.
32. Fraser, R. D. B., MacRae, T. P. & Miller, A. (1987) J. Mol. Biol. 193, 115-125.
33. Klotz, I. M. (1967) Energy Changes in Biochemical Reactions, Academic Press, New York, p. 58.
34. Hine, J. (1962) Physical Organic Chemistry, McGraw-Hill, New York, p. 69.
35. Williams, B. R., Gelman, R. A., Poppke, D. C. & Piez, K. A. (1978) J. Biol. Chem. 253, 6578-6585.
36. Trelstad, R. L. & Silver, F. H. (1981) in Cell Biology of Extracellular Matrix, Hay, E. D., Ed., Plenum Press, New York, p. 184.
37. Cassel, J. M., Mandelkern, L. & Roberts, D. E. (1962) J. Am. Leath. Chem. Assoc. 57, 556-575.
38. Baer, E., Cassidy, J. J. & Hiltner, A. (1988) in Collagen, Vol. 2, Nimni, M., Ed., CRC Press, Boca Raton, FL, pp. 177-200.

Received May 28, 1991 Accepted November 20, 1991
