J. Mol. Biol. (1977) 112, 535-542

The Protein Data Bank: A Computer-based Archival File for Macromolecular Structures

The Protein Data Bank is a computer-based archival file for macromolecular structures. The Bank stores in a uniform format atomic co-ordinates and partial bond connectivities, as derived from crystallographic studies. Text included in each data entry gives pertinent information for the structure at hand (e.g. species from which the molecule has been obtained, resolution of diffraction data, literature citations and specifications of secondary structure). In addition to atomic co-ordinates and connectivities, the Protein Data Bank stores structure factors and phases, although these latter data are not placed in any uniform format. Input of data to the Bank and general maintenance functions are carried out at Brookhaven National Laboratory. All data stored in the Bank are available on magnetic tape for public distribution, from Brookhaven (to laboratories in the Americas), Tokyo (Japan), and Cambridge (Europe and worldwide). A master file is maintained at Brookhaven and duplicate copies are stored in Cambridge and Tokyo. In the future, it is hoped to expand the scope of the Protein Data Bank to make available co-ordinates for standard structural types (e.g. α-helix, RNA double-stranded helix) and representative computer programs of utility in the study and interpretation of macromolecular structures.

The Protein Data Bank† (1971,1973) was established in 1971 as a computer-based archival file for macromolecular structures. The purpose of the Bank is to collect, standardize, and distribute atomic co-ordinates and other data from crystallographic studies. As the number of solved protein and nucleic acid structures has grown to the point where some 10^7 characters are necessary to represent the co-ordinate information currently held, the need for such a computer-readable file has become very clear, and demands for the Bank's services have increased accordingly.
The Protein Data Bank is one of several data base activities in the field of crystallography, e.g. the Bibliographic (Kennard et al., 1972) and Structural (Allen et al., 1973) Data Files for organic and organometallic compounds, the Atlas of Macromolecular Structure on Microfiche (AMSOM) (Feldmann, 1977), the Bond Index to the Determination of Inorganic Crystal Structures (BIDICS)‡ and the Powder Diffraction File.§

(a) Scope

The Protein Data Bank covers atomic co-ordinates, structure factors and phases from diffraction studies of macromolecules. Since most of this information is not generally published in the primary literature, the Bank depends for comprehensiveness on data supplied directly by the investigators. It is essentially a depository of data, held in computer-readable form, in contrast to other data banks that are based

† Protein Data Bank is a misnomer of historical origin, since the file now contains entries for a nucleic acid.
‡ I. D. Brown, Bond Index to the Determination of Inorganic Crystal Structures, McMaster University, Hamilton, Ontario, Canada, L8S 4M1.
§ American Society for Testing Materials, 1916 Race St., Philadelphia, PA 19103, U.S.A.

F. C. BERNSTEIN ET AL.

TABLE 1

Protein Data Bank holdings

IDENT
CODE   MOLECULE                                     DEPOSITOR

1ADK   ADENYLATE KINASE                             G. SCHULZ
1ADH   ALCOHOL DEHYDROGENASE (ADP-RIB)              C.-I. BRANDEN
2ADH   ALCOHOL DEHYDROGENASE (ORTHOPHEN)            C.-I. BRANDEN
2CHA   ALPHA-CHYMOTRYPSIN (TOSYL)                   D. BLOW
3CHA   ALPHA-CHYMOTRYPSIN                           A. TULINSKY
1FAB   ANTIGEN BINDING FRAGMENT (NEW)               R. POLJAK
1REI   BENCE-JONES IMMUNOGLOBULIN REI               O. EPP, R. HUBER
1CPV   CALCIUM-BINDING PARVALBUMIN SET 6A           R. KRETSINGER
2CPV   CALCIUM-BINDING PARVALBUMIN SET 6H           R. KRETSINGER
3CPV   CALCIUM-BINDING PARVALBUMIN SET E1           R. KRETSINGER
1CAB   CARBONIC ANHYDRASE B                         K. KANNAN
1CAC   CARBONIC ANHYDRASE C                         K. KANNAN
1CPA   CARBOXYPEPTIDASE A                           W. LIPSCOMB
1CHG   CHYMOTRYPSINOGEN                             J. KRAUT
2CNA   CONCANAVALIN A                               G. REEKE, G. EDELMAN
3CNA   CONCANAVALIN A                               K. HARDMAN
1B5C   CYTOCHROME B5                                F. S. MATHEWS
1CYT   CYTOCHROME C (ALBACORE, OXIDIZED)            R. DICKERSON
2CYT   CYTOCHROME C (ALBACORE, REDUCED)             R. DICKERSON
1CYC   CYTOCHROME C (BONITO, HEART)                 M. KAKUDO
1C2C   CYTOCHROME C2                                J. KRAUT
155C   CYTOCHROME C550                              R. TIMKOVICH
1EST   ELASTASE                                     H. WATSON
1FDX   FERREDOXIN                                   L. JENSEN
1FXN   FLAVODOXIN (CLOSTRIDIUM MP)                  M. LUDWIG
1GCH   GAMMA-CHYMOTRYPSIN                           COHEN, DAVIES, SILVERTON
1GPD   GLYCERALDEHYDE-3-P-DEHYDROGENASE (LOBSTR)    M. ROSSMANN
2MHB   HEMOGLOBIN (HORSE, AQUO MET)                 LADNER, HEIDNER, PERUTZ
1DHB   HEMOGLOBIN (HORSE, DEOXY)                    M. PERUTZ, G. FERMI
1HHB   HEMOGLOBIN (HUMAN, DEOXY)                    M. PERUTZ, G. FERMI
1FDH   HEMOGLOBIN (HUMAN, FETAL, DEOXY)             J. FRIER
1LHB   HEMOGLOBIN (LAMPREY)                         W. HENDRICKSON
1YHX   HEXOKINASE (YEAST) B III                     T. STEITZ
1HIP   HIGH POTENTIAL IRON PROTEIN                  J. KRAUT
2LDH   LACTATE DEHYDROGENASE                        M. ROSSMANN
3LDH   LACTATE DEHYDROGENASE/NAD/PYRUVATE           M. ROSSMANN
1LYZ   LYSOZYME (HEN EGG-WHITE, SET W2)             R. DIAMOND
2LYZ   LYSOZYME (HEN EGG-WHITE, SET RS5D)           R. DIAMOND
3LYZ   LYSOZYME (HEN EGG-WHITE, SET RS6A)           R. DIAMOND
4LYZ   LYSOZYME (HEN EGG-WHITE, SET RS9A)           R. DIAMOND
5LYZ   LYSOZYME (HEN EGG-WHITE, SET RS12A)          R. DIAMOND
6LYZ   LYSOZYME (HEN EGG-WHITE, SET RS16)           R. DIAMOND
1MDH   MALATE DEHYDROGENASE                         L. BANASZAK
1MBN   MYOGLOBIN (SPERM WHALE)                      H. WATSON
2MBN   MYOGLOBIN (SPERM WHALE, MET)                 T. TAKANO
3MBN   MYOGLOBIN (SPERM WHALE, DEOXY)               T. TAKANO
3PTI   PANCREATIC TRYPSIN INHIBITOR                 R. HUBER
8PAP   PAPAIN, NATIVE                               J. DRENTH
2PAP   PAPAIN (ACE-ALA-ALA-PHE-ALA, CYS-25)         J. DRENTH
3PAP   PAPAIN (CYS DERIV OF CYS-25)                 J. DRENTH
4PAP   PAPAIN (OXIDIZED CYS-25)                     J. DRENTH
5PAP   PAPAIN (TOS-LYS, CYS-25)                     J. DRENTH
6PAP   PAPAIN (BZOXY-GLY-PHE-GLY, CYS-25)           J. DRENTH
7PAP   PAPAIN (BZOXY-PHE-ALA, CYS-25)               J. DRENTH
1PGK   PHOSPHOGLYCERATE KINASE (YEAST)              H. WATSON
2PGK   PHOSPHOGLYCERATE KINASE (HORSE)              P. EVANS, D. PHILLIPS
1PAB   PREALBUMIN (HUMAN, PLASMA)                   S. OATLEY, D. PHILLIPS
1RNS   RIBONUCLEASE S                               H. WYCKOFF
2RXN   RUBREDOXIN                                   L. JENSEN
1SNS   STAPHYLOCOCCAL NUCLEASE                      F. A. COTTON, E. HAZEN
1SGB   STREPTOMYCES GRISEUS PROTEINASE B            M. JAMES
1SBT   SUBTILISIN BPN'                              J. KRAUT
2SBT   SUBTILISIN NOVO                              J. DRENTH
1SOD   SUPEROXIDE DISMUTASE                         J. AND D. RICHARDSON
1TLN   THERMOLYSIN (UNREFINED)                      B. MATTHEWS
2TLN   THERMOLYSIN (REFINED)                        B. MATTHEWS
1SRX   THIOREDOXIN                                  B.-O. SODERBERG
1TNA   TRANSFER RNA (YEAST, PHE)                    J. SUSSMAN, S.-H. KIM
2TNA   TRANSFER RNA (YEAST, PHE)                    M. SUNDARALINGAM
3TNA   TRANSFER RNA (YEAST, PHE)                    JACK, LADNER, KLUG
1TIM   TRIOSE PHOSPHATE ISOMERASE                   I. WILSON, D. PHILLIPS
1PTN   TRYPSIN (NATIVE, PH8)                        FEHLHAMMER, BODE, SCHWAGER
2PTB   TRYPSIN (BENZAMIDINE INHIBITED, PH7)         FEHLHAMMER, BODE, SCHWAGER
1PTC   TRYPSIN/TRYPSIN INHIBITOR COMPLEX            BODE ET AL.

STATUS CODES

BLANK  STANDARD ENTRY AVAILABLE FOR DISTRIBUTION
A      ALPHA CARBON ATOMS ONLY
B      BACKBONE ONLY
D      NEW DATA HAS BEEN PROMISED
N      NEW ENTRY WITH DEPOSITOR FOR APPROVAL
P      IN PREPARATION
R      REPLACES AN OUT OF DATE PARAMETER SET

LETTERS TO THE EDITOR

on data abstracted from scientific publications. The Bank contains 77 atomic co-ordinate entries for 47 macromolecules (Table 1),† and 13 sets of structure factors and phases. The atomic co-ordinate entries, which include descriptive text and partial bond connectivities, conform to a uniform format (see below), but the structure factors and phases are stored in the format received from depositors. All co-ordinate entries are referred to depositors for verification, before being made available publicly through the Bank.

(b) Record structure of atomic co-ordinate entries

Atomic co-ordinate entries consist of records each of 80 characters.‡ Using the punched card analogy, columns 1 to 6 contain a record type identifier, and columns 7 to 70 contain data.§ Columns 71 to 80 are normally blank, but may contain sequence information which is added by the library-file management program UPDATE¶ used to maintain the file on the Brookhaven CDC CYBER 70/76 computing system. In order to facilitate retrieval of data from the file, the first four characters of each record define the unique record type, and the syntax of each record is independent of the order of records within any entry for a particular macromolecule. (In the master file, this order is always fixed.) Atomic co-ordinate data contributed by depositors are processed into the standard format with program I~IACi~OL,‖ which also subjects the data to certain nomenclature and connectivity checking procedures. A sample partial entry for the protein ribonuclease S is shown in Table 2.†† The unique code 1RNS identifying this entry is given in the HEADER record, along with the date these data were entered into the Bank, and a provisional classification based on function, intended for future use in indexing and subdividing the file.
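The card-image layout described above can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function names `split_record` and `group_by_type` are hypothetical and do not correspond to any Bank program, and nothing is assumed about the field layout inside columns 7 to 70. Only the column boundaries and the four-character record type given in the text are used.

```python
from collections import defaultdict


def split_record(line):
    """Split one 80-column record into (record type, data, sequence field).

    Hypothetical helper: only the column boundaries stated in the text are used.
    """
    card = line.rstrip("\n").ljust(80)[:80]   # pad or trim to exactly 80 columns
    record_type = card[0:6].strip()            # columns 1 to 6: record type identifier
    data = card[6:70]                          # columns 7 to 70: data
    sequence = card[70:80].strip()             # columns 71 to 80: normally blank
    return record_type, data, sequence


def group_by_type(lines):
    """Group the data fields of an entry by the first four characters of each record."""
    groups = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue                           # skip empty lines between records
        record_type, data, _ = split_record(line)
        groups[record_type[:4]].append(data.rstrip())
    return dict(groups)


if __name__ == "__main__":
    sample = [
        "HEADER    HYDROLASE",
        "REMARK   1",
        "REMARK   2",
        "TER",
    ]
    for key, values in group_by_type(sample).items():
        print(key, len(values))
```

Because each record is interpreted independently of its neighbours, records grouped in this way can be retrieved by type without relying on their order within an entry.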
Text giving the name of the molecule, the species from which it has been obtained, authors, literature citations, and other general description is presented in records COMPND through REMARK. SEQRES gives the amino acid sequence, and FTNOTE records are footnotes keyed to particular residues or atoms. Records HELIX through TURN describe the secondary structure as stated or approved by the depositor. Record CRYST1 defines the unit cell, while ORIGX and SCALE respectively give transformations relating the orthogonal Ångström co-ordinates stored in the file to those originally supplied by the depositor (these frequently are referred to an oblique or non-isometric system) and to standard crystallographic fractional co-ordinates. ATOM records give the IUPAC-IUB (1969) standard atom names (IUPAC-IUB, 1970), and residue abbreviations (IUPAC-IUB, 1971), along with sequence identifiers (cf. SEQRES, above), co-ordinates in Ångström units, and occupancies and thermal

† In addition to current co-ordinate entries shown in Table 1, the Bank contains obsolete entries (for adenylate kinase, tosyl α-chymotrypsin, concanavalin A, lactate dehydrogenase, horse methemoglobin, papain, rubredoxin, benzamidine-inhibited trypsin and pancreatic trypsin inhibitor), which have been superseded by later, more accurate data. These obsolete data are available on special request.
‡ Originally, the Bank used a 140-character format, similar to that employed in the protein refinement programs of Diamond (1966,1971). The 140-character format has been superseded by the 80-character format.
§ A detailed description of the file formats is available from Brookhaven on request.
¶ Control Data Corporation, UPDATE Reference Manual, Publication No. 60342500, Control Data Corporation, Arden Hills, Minnesota, 1974.
‖ G. J. B. Williams, unpublished. For the 140-character data, program PROIN by E. F. Meyer was utilized.
†† The file is organized in a similar way for proteins and nucleic acids, although certain differences exist, e.g. with regard to details of atom and residue names.


TABLE 2

Abbreviated sample atomic co-ordinate entry (ribonuclease S)

HEADER    HYDROLASE (PHOSPHORIC DIESTER, RNA)     01-APR-73   1RNS
COMPND    RIBONUCLEASE-S (E.C.3.1.4.22)
SOURCE    BOVINE (BOS TAURUS) PANCREAS
AUTHOR    F. M. RICHARDS AND H. W. WYCKOFF
JRNL      R. J. FLETTERICK AND H. W. WYCKOFF, PRELIMINARY REFINEMENT
JRNL      OF PROTEIN COORDINATES IN REAL SPACE, ACTA CRYST., VOL. A31,
JRNL      P698 (1975).
REMARK   1
REMARK   1 REFERENCE 1. F. M. RICHARDS AND H. W. WYCKOFF, ATLAS OF
REMARK   1 STRUCTURES FOR MOLECULAR BIOLOGY, VOL. 1. RIBONUCLEASE-S,
REMARK   1 CLARENDON PRESS (1973).
REMARK   1 REFERENCE 2. F. M. RICHARDS AND H. W. WYCKOFF, BOVINE
REMARK   1 PANCREATIC RIBONUCLEASE, THE ENZYMES, EDITED BY P. D.
REMARK   1 BOYER, VOL. IV, THIRD EDITION, P647, ACADEMIC PRESS (1971)
REMARK   1 REFERENCE 3. F. M. RICHARDS, H. W. WYCKOFF, W. D. CARLSON,
REMARK   1 N. M. ALLEWELL, B. LEE AND Y. MITSUI, PROTEIN STRUCTURE,
REMARK   1 RIBONUCLEASE-S AND NUCLEOTIDE INTERACTIONS, COLD SPRING
REMARK   1 HARBOR SYMPOSIA ON QUANTITATIVE BIOLOGY, VOL. XXXVI, P35
REMARK   1 (1971).
REMARK   1 REFERENCE 4. N. M. ALLEWELL AND H. W. WYCKOFF,
REMARK   1 CRYSTALLOGRAPHIC ANALYSIS OF THE INTERACTION OF CUPRIC
REMARK   1 ION WITH RIBONUCLEASE S, J. BIOL. CHEM., VOL. 246, P4657
REMARK   1 (1971).
REMARK   1 REFERENCE 5. H. W. WYCKOFF, D. TSERNOGLOU, A. W. HANSON,
REMARK   1 J. R. KNOX, B. LEE AND F. M. RICHARDS, THE THREE-
REMARK   1 DIMENSIONAL STRUCTURE OF RIBONUCLEASE-S. INTERPRETATION
REMARK   1 OF AN ELECTRON DENSITY MAP AT A NOMINAL RESOLUTION OF 2
REMARK   1 ANGSTROMS, J. BIOL. CHEM., VOL. 245, P305 (1970).
REMARK   1 REFERENCE 6. H. W. WYCKOFF, K. D. HARDMAN, N. M. ALLEWELL,
REMARK   1 T. INAGAMI, D. TSERNOGLOU, L. N. JOHNSON AND F. M.
REMARK   1 RICHARDS, THE STRUCTURE OF RIBONUCLEASE-S AT 6 ANGSTROM
REMARK   1 RESOLUTION, J. BIOL. CHEM., VOL. 242, P3749 (1967).
REMARK   2
REMARK   2 RESOLUTION. 2.0 ANGSTROMS.
REMARK   3
REMARK   3 REFINEMENT. BY A STEEPEST-DESCENTS PROCEDURE. REFER TO THE
REMARK   3 JRNL CITATION ABOVE.
REMARK   4
REMARK   4 THIS COORDINATE SET IS DESIGNATED bO BY THE DEPOSITOR.
REMARK   5
REMARK   5 THE *S-PEPTIDE* (RESIDUES 1-20) WHICH FORMS A SEPARATE
REMARK   5 CHAIN FROM THE REMAINDER OF THE MOLECULE IS GIVEN THE
REMARK   5 CHAIN IDENTIFIER S.
SEQRES   1 S   20  LYS GLU THR ALA ALA ALA LYS PHE GLU ARG GLN HIS MET
SEQRES   2 S   20  ASP SER SER THR SER ALA ALA
SEQRES   1    104  SER SER SER ASN TYR CYS ASN GLN MET MET LYS SER ARG
SEQRES   2    104  ASN LEU THR LYS ASP ARG CYS LYS PRO VAL ASN THR PHE
SEQRES   3    104  VAL HIS GLU SER LEU ALA ASP VAL GLN ALA VAL CYS SER
SEQRES   4    104  GLN LYS ASN VAL ALA CYS LYS ASN GLY GLN THR ASN CYS
SEQRES   5    104  TYR GLN SER TYR SER THR MET SER ILE THR ASP CYS ARG
SEQRES   6    104  GLU THR GLY SER SER LYS TYR PRO ASN CYS ALA TYR LYS
SEQRES   7    104  THR THR GLN ALA ASN LYS HIS ILE ILE VAL ALA CYS GLU
SEQRES   8    104  GLY ASN PRO TYR VAL PRO VAL HIS PHE ASP ALA SER VAL
FTNOTE   1
FTNOTE   1 THE MAIN CHAIN AND MOST OF THE ASSOCIATED SIDE CHAINS ARE
FTNOTE   1 NOT WELL-DEFINED IN THE REGIONS OF RESIDUES 2, 65-72 AND
FTNOTE   1 119-123.
FTNOTE   2
FTNOTE   2 THE MAIN CHAIN IS VERY POORLY DEFINED OR NOT VISIBLE AT ALL
FTNOTE   2 IN THE ELECTRON DENSITY MAP IN THE REGIONS OF RESIDUES 1,
FTNOTE   2 18-20, 21-23 AND 124.
HELIX    1 H1 THR S   3  MET S  13  1
HELIX    2 H2 ASN    24  ASN    34  1
HELIX    3 H3 SER    50  ALA    56  1
SHEET    1 S1 3 LYS  41  HIS  48  0
SHEET    2 S1 3 MET  79  THR  87 -1  N  ASN  44   O  CYS  84
SHEET    3 S1 3 ALA  96  LYS 104 -1  N  ASP  83   O  THR 100
SHEET    1 S2 4 LYS  61  ALA  64  0
SHEET    2 S2 4 ASN  71  SER  75 -1  N  VAL  63   O  CYS  72
SHEET    3 S2 4 HIS 105  GLU 111 -1  N  TYR  73   O  VAL 108
SHEET    4 S2 4 VAL 116  VAL 124 -1  O  ALA 109   N  VAL 118
TURN     1 T1 VAL  54  VAL  57   PSEUDO 3/10 HELIX
TURN     2 T2 ALA  56  SER  59   PSEUDO 3/10 HELIX
TURN     3 T3 CYS  65  GLY  68   BETW STRNDS 1,2 OF SHEET S2
TURN     4 T4 THR  87  SER  90   END OF STRAND 2 OF SHEET S1
CRYST1   44.650   44.650   97.150  90.00  90.00 120.00 P 31 2 1      6
ORIGX1      1.000000  0.000000  0.000000        0.000000
ORIGX2      0.000000  1.000000  0.000000        0.000000
ORIGX3      0.000000  0.000000  1.000000        0.000000
SCALE1       .022396   .012931  0.000000        0.000000
SCALE2      0.000000   .025861  0.000000        0.000000
SCALE3      0.000000  0.000000   .010293        0.000000
ATOM      1  N   LYS S   1     -15.394   7.914  20.202  1.00  0.00   2
ATOM      2  CA  LYS S   1     -15.145   7.636  18.730  1.00  0.00   2
ATOM      3  C   LYS S   1     -14.982   6.107  18.763  1.00  0.00   2
ATOM      4  O   LYS S   1     -15.145   5.351  19.732  1.00  0.00   2
ATOM      5  CB  LYS S   1     -13.872   8.244  18.185  1.00  0.00   2
ATOM      6  CG  LYS S   1     -12.693   7.654  18.794  1.00  0.00   2

ATOM    927  N   ASP   121      -6.795  -9.247   7.034  1.00  0.00   1
ATOM    928  CA  ASP   121      -5.813  -9.425   5.935  1.00  0.00   1
ATOM    929  C   ASP   121      -6.217 -10.156   4.789  1.00  0.00   1
ATOM    930  O   ASP   121      -5.828  -9.850   3.652  1.00  0.00   1
ATOM    931  CB  ASP   121      -4.529 -10.015   6.6~8  1.00  0.00   1
ATOM    932  CG  ASP   121      -3.471  -9.503   5.687  1.00  0.00   1
ATOM    933  OD1 ASP   121      -3.320  -8.082   5.636  1.00  0.00   1
ATOM    934  OD2 ASP   121      -2.718 -10.333   4.799  1.00  0.00   1
ATOM    935  N   ALA   122      -7.049 -11.201   5.013  1.00  0.00   1
ATOM    936  CA  ALA   122      -7.865 -12.086   4.084  1.00  0.00   1
ATOM    937  C   ALA   122      -8.554 -13.331   4.724  1.00  0.00   1
ATOM    938  O   ALA   122      -8.495 -13.636   5.925  1.00  0.00   1
ATOM    939  CB  ALA   122      -6.991 -12.510   2.881  1.00  0.00   1
ATOM    940  N   SER   123      -8.885 -13.915   3.717  1.00  0.00   1
ATOM    941  CA  SER   123      -9.758 -15.155   3.627  1.00  0.00   1
ATOM    942  C   SER   123      -8.915 -16.127   2.880  1.00  0.00   1
ATOM    943  O   SER   123      -8.372 -15.812   1.810  1.00  0.00   1
ATOM    944  CB  SER   123     -10.877 -14.659   2.597  1.00  0.00   1
ATOM    945  OG  SER   123     -10.157 -14.035   1.530  1.00  0.00   1
ATOM    946  N   VAL   124      -8.845 -17.415   3.438  1.00  0.00   2
ATOM    947  CA  VAL   124      -8.591 -18.490   2.596  1.00  0.00   2
ATOM    948  C   VAL   124      -9.235 -18.381   1.209  1.00  0.00   2
ATOM    949  O   VAL   124      -8.580 -17.735    .377  1.00  0.00   2
ATOM    950  CB  VAL   124      -8.937 -19.929   3.162  1.00  0.00   2
ATOM    951  CG1 VAL   124      -9.135 -20.905   2.012  1.00  0.00   2
ATOM    952  CG2 VAL   124      -7.784 -20.573   4.226  1.00  0.00   2
ATOM    953  OXT VAL   124     -10.419 -19.165   1.046  1.00  0.00   2
TER     954      VAL   124
CONECT  196  195  644
CONECT  312  311  729
CONECT  448  447  844
CONECT  498  497  549
CONECT  549  498  548
CONECT  644  196  643
CONECT  729  312  728
CONECT  844  448  843
MASTER       36   In    0    3    7    4    0    6  952    2    8   10
END
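The relation between the CRYST1 and SCALE records of Table 2 can be checked numerically. The Python sketch below is an illustration only, not the procedure used by the Bank: it assumes the common convention in which the a axis lies along x and the b axis lies in the xy plane, ignores the translation column of the SCALE records (zero in the sample entry), and uses hypothetical function names. Under those assumptions the cell of the ribonuclease S entry gives a matrix that agrees with the tabulated SCALE values to the precision printed.

```python
import math


def orth_to_frac_matrix(a, b, c, alpha, beta, gamma):
    """3x3 matrix taking orthogonal Angstrom co-ordinates to fractional ones.

    Assumed convention (not stated in the text): a along x, b in the xy plane.
    Cell edges are in Angstroms, angles in degrees.
    """
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    # Fractional -> orthogonal matrix, upper triangular in this convention.
    m11, m12, m13 = a, b * cg, c * cb
    m22, m23 = b * sg, c * (ca - cb * cg) / sg
    m33 = c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg) / sg
    # Closed-form inverse of an upper-triangular 3x3 matrix.
    return [
        [1 / m11, -m12 / (m11 * m22), (m12 * m23 - m13 * m22) / (m11 * m22 * m33)],
        [0.0, 1 / m22, -m23 / (m22 * m33)],
        [0.0, 0.0, 1 / m33],
    ]


def to_fractional(scale, xyz):
    """Apply the 3x3 matrix to one orthogonal co-ordinate triple."""
    return tuple(sum(row[j] * xyz[j] for j in range(3)) for row in scale)


if __name__ == "__main__":
    # Cell constants from the CRYST1 record of the sample entry.
    scale = orth_to_frac_matrix(44.650, 44.650, 97.150, 90.0, 90.0, 120.0)
    for row in scale:
        print(" ".join(f"{value:9.6f}" for value in row))
    # Atom 1 (N of LYS S 1) from the same entry.
    print(to_fractional(scale, (-15.394, 7.914, 20.202)))
```

The matrix is built by inverting the fractional-to-orthogonal matrix rather than from a memorized closed form, so that the only crystallographic assumption is the axis convention named above.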


motion factors, if these latter data are provided. Within each residue, atoms are ordered in a standard manner, starting with the backbone (N-Cα-C-O) and proceeding in increasing remoteness from the alpha carbon atom along the side-chain. Cα has been encoded CA, Cβ as CB, etc. Where the sequence is known, but atoms have not been located in the structure analysis, gaps have been left in the atom serial numbers, to allow for future insertion. A TER record denotes an explicit chain-terminating residue. CONECT records give bond connectivity for all atoms where the covalent connectivity is not specified completely by the atom name and order of serial numbers within the entry (i.e. the primary structure of standard residues). CONECT records may also be used to denote hydrogen bond and salt bridge interactions. Each entry is terminated with a MASTER record, which gives checksums of the number of records, broken down by record type, and an END of data record.

(c) Services

The activities of the Protein Data Bank are to collect and standardize data from laboratories engaged in the analysis of macromolecular structures, and to distribute these data within the scientific community. As a service to depositors, data are checked for errors of a clerical nature but no exhaustive verification is attempted. The Data Bank is located at Brookhaven National Laboratory and input of data to the Bank and general maintenance functions are carried out at Brookhaven.†

TABLE 3

Protein Data Bank activities

       Co-ordinate
       entries held     Co-ordinate entries distributed        Laboratories receiving data
Year   at end of year   Brookhaven  Cambridge  Tokyo  Total    Brookhaven  Cambridge  Tokyo  Total

1973         15              106        30       --     136         14          2       --     16
1974         17               99       102       --     201         14          6       --     20
1975         40              513       241       --     754         31          6       --     37
1976         69             1920       600      246    2766         47         14        9     70

Duplicate copies of the Brookhaven master file are maintained at Cambridge and Tokyo. Data are available on magnetic tape for public distribution, from Brookhaven‡ (to laboratories in the Americas), Tokyo (Japan), and Cambridge (Europe and worldwide). The data are also available in a limited way within the United States over the Crystallographic Computing Network (Koetzle et al., 1975). Retrieval programs to access data in the file have been described (Meyer, 1974; M. Tasumi, personal communication). These efforts represent modest initial attempts at interactive access: up to the present the Protein Data Bank has served almost exclusively the demands of off-line users. A statistical summary of the Bank's activities is given in Table 3. These statistics show rapid growth, both in number of holdings and in requests filled.

† Details of operations are announced periodically in a Newsletter (current issue is number 3, November 1976). Copies may be obtained from Brookhaven.
‡ Requests should be accompanied with a new 2400 ft reel of magnetic tape, and a check or purchase order for U.S. $34.30 made to the order of Brookhaven National Laboratory, to cover postage and handling. This charge is subject to change in the future.


(d) Future developments

In the future, it is hoped to expand the scope of the Protein Data Bank to make available co-ordinates for standard structural types (e.g. α-helix, RNA double-stranded helix) and representative computer programs of utility in the study of macromolecular structures. A small number of programs will be written to calculate useful quantities derived from the co-ordinates (namely torsion angles, wire-model bender angles, full covalent connectivities, etc.). In addition to this software, the Bank will undertake to distribute contributed programs, provided that documentation is deposited in machine-readable form with source code.† In order to facilitate use of atomic co-ordinate data to assemble models from standardized parts, it is intended to offer a "model-builder's kit", to consist of header information, a compact list of co-ordinates to 0.1 Å precision, and torsion angles. This kit would be distributed principally in the form of printed listings, but also would be available on magnetic tape.

As the size of the Bank grows, possibilities for wide interactive access to the atomic co-ordinate data via a network will become more attractive. This type of access would increase the file's utility, particularly to those outside the fields of crystallography and analysis of protein and nucleic acid conformations, by removing the necessity for repeated development of retrieval programs. For example, graphics terminals could be used to draw pictures of macromolecules for research or instructional purposes. Network access will be a topic of continued exploration in the next few years.

Suggestions regarding possible future improvements in Protein Data Bank services will be appreciated by the authors.

The late Walter C. Hamilton was one of the founders of this project. Helen M. Berman participated in the initial organization of the Data Bank and also was active in preparation of some of the first data entries, as were Betty R. Davis and Daniel D. Jones. Numerous individuals have offered helpful suggestions and criticisms: Herbert J. Bernstein, David M. Blow, John S. Coggins, Robert Diamond, Anthony C. T. North, Michael G. Rossmann, and David G. Watson. Richard J. Feldmann generously has transferred numerous atomic co-ordinate entries originally collected for inclusion in AMSOM. The members of the Protein Data Bank Advisory Group, David R. Davies, Kenneth Neet and Frederic M. Richards, have overseen our operations and engaged in many useful discussions. This work was performed under the auspices of the U.S. Energy Research and Development Administration and supported by the U.S. National Science Foundation under grants AG-370, GJ 33248X, DCR-75-07702, and PCM75-18956.

Chemistry Department
Brookhaven National Laboratory
Upton, N.Y. 11973, U.S.A.
    FRANCES C. BERNSTEIN
    THOMAS F. KOETZLE‡
    GRAHEME J. B. WILLIAMS
    EDGAR F. MEYER, JR§

University Chemical Laboratory
Lensfield Road, Cambridge CB2 1EW, England
    MICHAEL D. BRICE
    JOHN R. RODGERS
    OLGA KENNARD‡¶

University of Tokyo
Hongo, Tokyo, Japan
    TAKEHIKO SHIMANOUCHI
    MITSUO TASUMI

Received 21 February 1977

† The Bank will assume no responsibility for checkout or correction of errors in such deposited programs.
‡ To whom correspondence should be addressed.
§ Permanent address: Department of Biochemistry and Biophysics, Texas A & M University, College Station, Texas 77843. Research collaborator at Brookhaven National Laboratory.
¶ External staff, Medical Research Council.


REFERENCES

Allen, F. H., Kennard, O., Motherwell, W. D. S., Town, W. G. & Watson, D. G. (1973). J. Chem. Doc. 13, 119-123.
Diamond, R. (1966). Acta Crystallogr. 21, 253-266.
Diamond, R. (1971). Acta Crystallogr. sect. A, 27, 436-452.
Feldmann, R. J. (1977). Atlas of Macromolecular Structure on Microfiche, Tracor Jitco Inc., Rockville.
IUPAC-IUB Commission on Biochemical Nomenclature (1970). J. Biol. Chem. 245, 6489-6497.
IUPAC-IUB Commission on Biochemical Nomenclature (1971). J. Biol. Chem. 247, 977-983.
Kennard, O., Watson, D. G. & Town, W. G. (1972). J. Chem. Doc. 12, 14-19.
Koetzle, T. F., Andrews, L. C., Bernstein, F. C. & Bernstein, H. J. (1975). In Computer Networking and Chemistry (Lykos, P., ed.), ACS Symposium Series, vol. 19, p. 1, American Chemical Society, Washington.
Meyer, E. F. (1974). Biopolymers, 13, 419-422.
Protein Data Bank (1971). Nature New Biol. 233, 223.
Protein Data Bank (1973). Acta Crystallogr. sect. B, 29, 1746.
