Vol.8, no.5. 1992 Pages 425-431

CABIOS

DIROM: an experimental design interactive system for directed mutagenesis and nucleic acids engineering

Kira S.Makarova, Alexander V.Mazin, Yury I.Wolf and Victor V.Soloviev²

Institute of Cytology and Genetics and ¹Institute of Bioorganic Chemistry, Siberian Division of the Academy of Sciences of USSR, Novosibirsk 630090, USSR

²To whom correspondence should be addressed

© Oxford University Press

Downloaded from http://bioinformatics.oxfordjournals.org/ at University of Birmingham on June 7, 2015

Abstract

A computer system DIROM for oligonucleotide-directed mutagenesis and artificial gene design has been designed for better experimental planning and control. DIROM permits searching for optimal oligonucleotides with respect to certain important parameters, namely sufficient energy of oligonucleotide-target hybridization, the secondary structure of oligonucleotide and target DNA, the presence of alternate binding sites in the target DNA and terminal G/C pairs. It can also be used to plan polymerase chain reaction experiments, for optimal primer selection, in sequencing, etc. DIROM enables one to search for both existing and potential restriction sites and to perform vector + target sequence construction. The system consists of a set of original algorithms that formalize the empirical knowledge of oligonucleotide action as primers.

Introduction

Rapidly improving methods of chemical DNA synthesis have resulted in an abundance of synthetic oligonucleotides. Oligonucleotides are widely used for site-directed mutagenesis, the polymerase chain reaction, dideoxy DNA sequencing and artificial gene assembly. However, further progress in these methods is complicated by the extensive pre-experimental work required to estimate oligonucleotide reliability. Essential oligonucleotide parameters include the stability of the internal secondary structure, the presence of alternate binding sites and competitive secondary structure in the target DNA, and sufficient free energy of the oligonucleotide/target duplex. Although some of these parameters (e.g. the secondary structure of an oligonucleotide) can be estimated using existing software, no computer system that tackles the whole problem has been designed so far.

Oligonucleotide-directed mutagenesis and artificial gene assembly experiments can readily be optimized by means of the DIROM computer system, which permits one to estimate the above parameters, and in particular the structure of oligonucleotides, thereby making the experimenter's task easier. The algorithms used formalize the empirical knowledge of oligonucleotides to be used in different kinds of priming experiments (mainly oligonucleotide-directed mutagenesis). Methods were found to combine separate criteria of oligonucleotide validity into hierarchical rules for quality estimation in both single-strand and double-strand vector mutagenesis. DIROM can also be used in the search for an optimal primer in the polymerase chain reaction, sequencing, etc. The system also enables one to search for existing and potential restriction sites and to assemble target + vector sequences. The possibility of working simultaneously with amino acid and nucleotide sequences makes it beneficial for protein design experiments as well.

Hardware platform and system requirements

The DIROM system is written in Microsoft C and contains a FORTRAN 77 module. The user interface libraries were designed in the Laboratory of Theoretical Molecular Genetics of the Institute of Cytology and Genetics (Novosibirsk, USSR). The system operates on IBM PC/XT/AT and compatibles. It requires DOS 2.0 or higher and at least 300K RAM. A minimum system configuration requires a single 360 Kbyte disk drive, but the system can also be installed on a hard disk.

Algorithms

DIROM algorithms fall into three groups: selection of optimal priming and mutagenic oligonucleotides; search for and generation of restriction sites; and utilities (mainly manipulations with nucleotide and amino acid sequences).

Search for an optimal oligonucleotide

An optimal selection of an oligonucleotide for the role of a primer or mutagen is usually required in experiments on site-directed mutagenesis of single- and double-stranded vectors and for omega-mutagenesis (Mazin et al., 1990). In his review, Smith (1985) considers five features of an oligonucleotide with respect to its suitability as a primer. These features have been formalized as parts of the algorithm that decide whether a given oligonucleotide is suitable in a given experimental situation. Each criterion is specified by a set of parameters, and the corresponding values are computed for each oligonucleotide under investigation. The results are compared with the threshold values. Although the computations are quantitative, the test results are given in qualitative form: an oligonucleotide either satisfies a given criterion or does not. The five criteria of an optimal oligonucleotide are as follows:

(i) The energy of oligonucleotide-target annealing must be sufficient for duplex stability. We calculate the free energy of hybridization of an oligonucleotide with its complementary site on a target DNA. The free energy of the interaction of two DNA fragments is computed by a method similar to that described by Breslauer et al. (1986). Because data on DNA loop energies are unavailable in the literature, we have used RNA thermodynamic parameters (Jacobson et al., 1984).

(ii) The absence of alternate sites of oligonucleotide binding to a single-stranded region of a target DNA. For each oligonucleotide, all the potential sites of attachment are checked, and the free energy is calculated only for those sites that have at least 50% of bases complementary to the oligonucleotide. If this energy is above the threshold, the oligonucleotide is regarded as unsatisfactory.

(iii) The absence of stable secondary structure in the oligonucleotide under investigation, which can greatly impair hybridization. All the hairpin structures that can be formed by an oligonucleotide with a stem length from minimum to maximum and a loop length up to a certain maximal size (all these parameters are defined by the user) can be tested. If complementarity in a potential stem is greater than 50%, its free energy is calculated. This energy is compared with the threshold values in the same way as for the second criterion.

(iv) The absence of competitive secondary structures in the target. The size of a single-stranded region is defined as its physical length with respect to the experimental conditions. The energy of all possible hairpin structures whose stems overlap the hybridization site of the oligonucleotide is estimated. Energy estimation and oligonucleotide qualification are performed in the same way as for the third criterion.

(v) The presence of terminal G/C pairs necessary for the stability of the termini. An unstable 3'-end can decrease the efficiency of priming, whereas instability of the 5'-end may lead to displacement of the mutagenic oligonucleotide by DNA polymerase during gap repair. The user specifies the test window size for both termini and the number of G/C pairs the termini should comprise. If the tested oligonucleotide has fewer G/C pairs at either terminus, it is regarded as unsatisfactory.

Four classes of experiments require an estimate of oligonucleotide quality: search for an optimal primer, and single-stranded, double-stranded and omega-mutagenesis. These differ in the set of all possible oligonucleotides generated and in the size of the target DNA region that is open for hybridization. In the search for a primer the user specifies a DNA fragment that may serve as an attachment site for the primer, the direction of priming, and the minimum and maximum size of the priming oligonucleotide. The program generates all possible sequences (within the specified length limits) complementary to the given fragment. When mutagenic oligonucleotides are generated, the following rule is applied: the oligonucleotide should contain the mutation fragment flanked at both ends by at least six bases complementary to the target. All the appropriate sequences (the length range is specified by the user) are regarded as potential mutagenic oligonucleotides. In the single-stranded vector mutagenesis mode (and its derivative, omega-mutagenesis) the oligonucleotide is taken to be complementary to the (+)-strand of the target DNA; in the double-stranded vector mutagenesis mode the strand complementary to the oligonucleotide is determined by the vector nicking site (Smith, 1985). For optimal primer search and single-stranded vector mutagenesis it is presumed that the whole DNA sequence is open for hybridization. In the case of double-stranded vector mutagenesis the gap size is presumed to be 500 nucleotides, its location depending on the position of the nicking site.

Restriction site generation

Both the generation of a restriction site (with or without conservation of the amino acid sequence and the reading frame) and the generation of a nucleotide sequence from an amino acid sequence need a search algorithm for potential restriction sites. The difficulty is that both the target sequence and the recognition site can be degenerate. Pressnell and Benner (1988) used a complex hierarchical notation to deal with degenerate sequences. We propose an original algorithm capable of operating on degenerate sequences, using a matrix representation.

This algorithm can be described as follows: a nucleotide sequence (either native or back-translated) and the restrictase recognition sequence (with regard to possible degeneracy) are transformed to matrices [N × 4] and [M × 4], where N and M are the lengths of the target DNA (or its fragment) and of the recognition sequence respectively (Figure 1). Non-zero elements of the matrices show that the jth nucleotide can occupy the ith position of the sequence (site); zero elements mean the contrary. Depending on the class of experiment, the sequence (the non-zero elements of its matrix) or the genetic code can be fully or partially degenerate. The site in question can be found (generated) in the given position if the sum of products of corresponding matrix elements is non-zero (Figure 1). The details of the algorithm for determining the best site position (with respect to the number of required substitutions) are omitted. If an amino acid is encoded by six codons, this method can produce false-positive results, so such cases are additionally tested by translation of the resulting DNA fragment. According to our estimates, this algorithm is faster than that described by Pressnell and Benner (1988).

Fig. 1. An example of localization of a potential recognition site for TaqI restriction endonuclease with amino acid sequence conservation.

  Amino acid sequence:  G A F H S K L T M D
  Nucleotide sequence:  GGTGCATTCCATTCTAAATTGACGATGGAG
  Sequence matrix D (one row per nucleotide, one column per sequence position):
    A  001001000010101111001101100011
    C  001011001101011000101011000000
    G  111101000000011001001001001101
    T  001001111001101000111001010000
  Site matrix S for TCGA (TaqI), one group per site position, elements in the order A, C, G, T:
    0001 0100 0010 1000

N and M denote the sequence (fragment in question) and site lengths respectively. For the ith position in the sequence (i from 1 to N) a value F_i is calculated from the products of corresponding elements of the two matrices, with one index running over the four nucleotides (1 to 4) and the other over the site positions (1 to M); the displayed formula itself is not legible in this copy. D is the sequence matrix (N × 4) and S is the site matrix (M × 4). In this picture, for i = 13, F_i is non-zero.

Fig. 2. General structure of the DIROM system. Hierarchical menu tree. Top-level items: Load sequence; Double-strand vector mutagenesis; Single-strand vector mutagenesis; Restriction analysis; Restriction site generation; Search for optimal primer; Artificial gene construction; Cassette mutagenesis.

Utilities

The last group contains the following blocks: disk services (reading and writing data files, browsing disk directories, elementary file manipulations, etc.); sequence manipulation functions (sequence editing and viewing); routine gene engineering procedures (assembly of vector + target constructions by sequence coordinates or restriction fragments); and different kinds of restriction analysis and calculation of the oligonucleotide/target duplex dissociation temperature. Restriction analysis is performed by a method similar to that described by Cockwell and Giles (1989). The dissociation temperature is calculated from the formula of Rychlik and Rhoads (1989):

$T_d = \frac{\Delta H}{\Delta S + R \ln(c/4)}$

where ΔH and ΔS are the enthalpy and entropy of duplex formation (these thermodynamic parameters are calculated as described in Breslauer et al., 1986), c is the concentration of the oligonucleotide probe, and R is the universal gas constant.
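The dissociation temperature formula of Rychlik and Rhoads (1989) quoted in the Utilities description can be evaluated in a few lines. The sketch below is only an illustration and not the DIROM code: it assumes the form T = ΔH / (ΔS + R ln(c/4)) with ΔH in cal/mol, ΔS in cal/(mol·K), c in mol/l and the result in kelvin. The function name and the numerical inputs are hypothetical, and no salt correction is applied.

```python
import math

R_CAL = 1.987  # universal gas constant in cal/(mol*K)


def dissociation_temperature(delta_h, delta_s, conc):
    """Return the duplex dissociation temperature in kelvin.

    delta_h: enthalpy of duplex formation, cal/mol (negative for a stable duplex)
    delta_s: entropy of duplex formation, cal/(mol*K) (negative)
    conc:    oligonucleotide concentration, mol/l (assumed unit)
    """
    if conc <= 0:
        raise ValueError("concentration must be positive")
    return delta_h / (delta_s + R_CAL * math.log(conc / 4.0))


# Illustrative, made-up thermodynamic values (not taken from the paper).
t_kelvin = dissociation_temperature(-150_000.0, -400.0, 1e-6)
t_celsius = t_kelvin - 273.15
print(f"{t_celsius:.1f} C")
```

In practice ΔH and ΔS would be summed over nearest-neighbour pairs as in Breslauer et al. (1986); here they are passed in directly so that the example stays independent of any particular parameter table.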


DIROM system description

The general structure of the system as a hierarchical menu tree is presented in Figure 2. The system includes the following blocks.

Load sequence

The 'Load sequence' block implements most routines for file and sequence manipulation (including the assembly of gene engineering constructions). Using different modes of sequence loading, the user can edit and view sequences, browse disk directories, etc. In the 'Vector + Target sequence construction' mode the user can assemble a construction from either sequence coordinates or restriction fragments of its expected components. To make orientation within an assembled construction easier, the user can specify coordinates of target sequence insertions and take advantage of double numbering of sequence bases.

Restriction analysis

The 'Restriction analysis' block is intended for different kinds of restriction analysis. The search for recognition sites can be carried out for the whole set or only a subset of the restrictases from the selected file, or for restrictases whose names and recognition sequences are entered by the user directly from the keyboard. The results of restriction analysis can be presented in the following three ways: (i) a table of restriction sites, which contains restrictase names, recognition sites, and the number and position of the sites found; (ii) a graphic circular or linear map; and (iii) a table of fragments resulting from digestion by the selected restrictases (fragments are sorted by length in descending order).

Single-strand vector mutagenesis

This block of the system is designed for selection of an optimal mutagenic oligonucleotide for single-strand vector mutagenesis. The user specifies the location and type (substitution, deletion or insertion) of a mutation using the built-in line sequence editor. Mutations can be defined on either the nucleotide or the amino acid level (in the latter case the program automatically finds the nucleotide equivalent). In the 'Select criteria' menu item the user is prompted to select a subset of the above five criteria of oligonucleotide validity. The set of potential mutagenic oligonucleotides can be generated automatically or manually. Oligonucleotide testing by the first four criteria can be performed in three modes. In the single-parameter mode the threshold energy is specified directly by the user. In the multiple-parameter mode the user specifies a number (up to 10) of threshold values, then looks through the results and selects the most appropriate threshold level. In the optimized-parameter mode the program uses an iterative algorithm to find a threshold value such that a given percentage of unsatisfactory oligonucleotides is discarded. After each criterion has been tested the user can look through the results and select a subset of the oligonucleotides for the next step. In the results table, the outcome for each oligonucleotide is given with respect to each of the above criteria, and for the first four criteria the threshold energies are shown. After the table has been browsed the user is prompted to select an oligonucleotide for the experiment. The final picture shows the juxtaposition of the oligonucleotide and the target DNA, the mismatched duplex bases, and the dissociation temperature of the oligonucleotide-target duplex. These results can be saved in a file.

Double-strand vector mutagenesis

Double-strand vector mutagenesis requires nicking of the target DNA near the mutation site by a restriction endonuclease in the presence of ethidium bromide. After a mutation has been selected (as in single-stranded vector mutagenesis) the system performs restriction analysis in order to find suitable nicking sites. Restriction analysis can be performed in different modes, including a search for unique sites. Sites located within 400 nucleotides of the mutation site are considered potential nicking sites and are presented to the user. Depending on the position of the selected nicking site with respect to the mutation site, the mutagenic oligonucleotide should be complementary to the (+)- or (-)-strand of the target DNA, and this is taken into account automatically. The gap produced by exonuclease III is assumed to be 500 nucleotides. Oligonucleotide testing and presentation of the results are essentially similar to those for single-stranded vector mutagenesis.

Search for optimal primer

In the 'Select fragment' menu item the user specifies the coordinates of a target sequence and the direction of priming. If the fragment synthesized corresponds to the (+)-strand sequence of the target DNA, the direction of priming is regarded as positive (the primer is complementary to the (-)-strand of the target DNA), and the opposite direction is negative. The criteria are selected, the oligonucleotides tested and the results presented as in single-stranded vector mutagenesis.

Restriction site generation

In the 'Restriction site generation' block the search for and generation of potential restrictase recognition sites is performed in two modes: with and without amino acid sequence conservation. In the line sequence editor the user specifies a fragment of up to 500 bp to be searched for potential sites and selects, from the file, the names of restrictases whose sites are to be incorporated into the sequence. In the amino acid conservation mode, the system also takes the reading frame of the gene into account. After the parameters are selected, the number and location of the necessary substitutions are determined automatically for all possible positions of the selected restriction sites. In the interactive mode the user can view each site and arbitrarily select its final localization and the required number of substitutions. In the end, all the results of restriction site generation (including the positions of sites on the fragment map and the number and location of the required substitutions) can be viewed and saved.
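The matrix search for potential sites described in the Algorithms section and illustrated in Figure 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the DIROM implementation: the criterion is interpreted as requiring that every site position shares at least one allowed nucleotide with the corresponding sequence position, the function names are hypothetical, and the sequence matrix is typed in from the four rows printed in Figure 1.

```python
BASES = "ACGT"


def rows_to_matrix(rows):
    """Turn four 0/1 strings (rows A, C, G, T) into an N x 4 matrix of ints."""
    return [[int(row[i]) for row in rows] for i in range(len(rows[0]))]


def site_matrix(site):
    """M x 4 matrix for an unambiguous recognition site such as 'TCGA'."""
    return [[1 if base == ch else 0 for base in BASES] for ch in site]


def potential_sites(D, S):
    """Return 1-based positions where the site is compatible with the sequence.

    A position is reported when, for every site position k, at least one
    nucleotide is allowed by both the sequence row and the site row.
    """
    hits = []
    for i in range(len(D) - len(S) + 1):
        if all(any(d and s for d, s in zip(D[i + k], S[k])) for k in range(len(S))):
            hits.append(i + 1)
    return hits


# Rows A, C, G, T of the degenerate sequence matrix shown in Figure 1.
FIG1_ROWS = [
    "001001000010101111001101100011",
    "001011001101011000101011000000",
    "111101000000011001001001001101",
    "001001111001101000111001010000",
]

D = rows_to_matrix(FIG1_ROWS)
S = site_matrix("TCGA")  # TaqI recognition site
print(potential_sites(D, S))
```

With the Figure 1 data the only compatible position is 13, in agreement with the figure legend. Because each sequence position is treated independently, a hit that involves an amino acid with six codons would still need to be confirmed by translating the modified fragment, as the paper notes.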

Artificial gene construction

The 'Artificial gene construction' block provides two modes for two different purposes. The first mode permits generation of a nucleotide sequence from an amino acid sequence and incorporation (if possible) of restrictase recognition sites. The search and the presentation of the results are essentially the same as described for the 'Restriction site generation' block (except that in the absence of a pre-existing gene the number of substitutions is undefined). The ambiguity of back-translated DNA (outside the sites generated) is resolved by means of codon preference tables. The second mode implements a specific method of site-specific insertion of long sequences. This method, called omega-mutagenesis, is described in detail in Mazin et al. (1990) and is a multi-step process of oligonucleotide-directed mutagenesis in which each mutagenic oligonucleotide carries a part of a long insertion. For each step an optimal oligonucleotide is selected in a fashion similar to single-stranded vector mutagenesis. The number of steps is determined by the overall insertion length and the maximum insertion length for a single step. Those parameters, along with the insertion sequence, are specified by the user. In the end, a list of all necessary oligonucleotides is available and can be saved.

Cassette mutagenesis

Cassette mutagenesis includes cutting out a part of an existing gene with a pair of restriction endonucleases and inserting an artificial fragment with the mutated sequence (mutagenic cassette) into the 'opened' vector. If the cassette is so long that it cannot be synthesized as a single oligonucleotide, it should be 'broken' into fragments that can be ligated in vitro. The user specifies a mutation (see 'Single-strand vector mutagenesis') and the maximum length of the mutagenic cassette. DIROM searches for a pair of flanking sites and thus determines the cassette borders. A set of restriction sites can be incorporated into the sequence of the cassette (see 'Restriction site generation'). If the cassette is too long, it can be broken into oligonucleotide pairs of the specified length. The system chooses the break sites so that the resulting cohesive ends are unique. All the results can be saved.

Implementation results

Let us demonstrate how the system works with the following three examples.

Search for an optimal oligonucleotide for double-stranded vector mutagenesis (Figure 3)

pBR322 plasmid was used as both the vector and the target sequence. Our objective was to incorporate a unique restriction site into the tetracycline resistance gene of the plasmid. Using the 'Restriction analysis' block we found a set of restrictases that do not cut pBR322 DNA, and using the 'Restriction site generation' functions we searched the tet-gene fragment for potential sites with conservation of the amino acid sequence. It was found that a BglII site could be created at position 348. The necessary substitutions were then specified for double-stranded vector mutagenesis. EcoRV restrictase (the unique site at position 186) was selected for nicking. The location of the site (upstream of the mutation site) determines the target strand (positive) which the mutagenic oligonucleotide should complement. In accordance with these data, a set of possible mutagenic oligonucleotides (10 oligonucleotides of length up to 19) was generated and tested by four criteria (1st, 2nd, 3rd and 5th) in the optimal-parameter mode (discarding 40% of oligonucleotides at each step). Of the two oligonucleotides satisfying all the criteria, one was selected and its dissociation temperature was calculated. The results were saved in a file.

Fig. 3. Selection of an optimal oligonucleotide for the insertion of a unique recognition site of BglII restrictase into the tet-gene of pBR322 plasmid. (A) Search for a potential site and the required substitutions. (B) Mutation specification in the line sequence editor. (C) General results of oligonucleotide testing. (D) Oligonucleotide testing results: OK, oligonucleotide satisfies a given criterion; ALAS, oligonucleotide does not satisfy a given criterion; NT, oligonucleotide has not been tested. Threshold energy values are shown over the first three criteria columns. (E) Juxtaposition of a mutagenic oligonucleotide and a target sequence. Asterisks denote mismatches in the oligonucleotide-target duplex. (F) Resulting file. As far as the panels are legible, the resulting file reports a substitution from C number 348 to AGATCT, the double-strand vector mode with the positive strand, the vector nicking site EcoRV (186), a dissociation temperature of 79.0 °C and a filter hybridization temperature of 71.4 °C.

Artificial gene construction (Figure 4)

For an amino acid sequence we took the N-terminal part of the translation elongation factor EF-1α from Drosophila melanogaster (Hovemann et al., 1988). Assuming that the designed gene should be cloned and expressed in Escherichia coli, we selected the codon preference table for this species. A set of restrictases with hexanucleotide recognition sites was tested for the possibility of incorporating their sites into the gene sequence. For three of them (ClaI, CfrI and HindII) the sites were generated (in a non-overlapping way). The results were saved in a file.

Fig. 4. Generation of an artificial gene sequence. The gene encodes the N-terminal fragment of translation elongation factor EF-1α and has recognition sites for CfrI, ClaI and HindII restrictases. The E.coli codon preference table was used. (A) General results of search for potential sites. (B) Incorporation of restriction sites into the degenerated gene sequence. (C) Resulting file.

Cassette mutagenesis (Figure 5)

To demonstrate the work of the 'Cassette mutagenesis' block we introduced a mutation in the site of alternate processing in the NS3 protein of tick-borne encephalitis virus (Pletnev et al., 1990). Inside the NS3 gene, the sequence CGCCGTGGG (5975-5983) is replaced by GAAGAAGCA. On the amino acid level the mutation is RRG → EEA. With a maximum cassette length of 400 nucleotides the system found a pair of restriction sites: BglI (5959) and NheI (6237). The cassette was broken into six oligonucleotide pairs, each no longer than 50 nucleotides. The results were saved in a file.

Fig. 5. Cassette mutagenesis. Mutation at the alternate processing site of the NS3 gene of the tick-borne encephalitis virus. Resulting file.

Discussion

The proposed interactive computer system is intended for molecular biologists who use gene engineering methods in their routine experimental work. On the one hand, there are a number of commercial software products designed for this group of users. These systems implement mainly 'low-level' operations on sequences: sequence database support, sequence statistics, automatic transcription and translation, etc. This software has become an essential tool for molecular biologists. Nevertheless, despite its helpfulness for experimenters, more complex operations concerning experiment planning and prediction of results remain non-automated. On the other hand, the literature contains an abundance of publications (e.g. Jiang et al., 1990) describing software systems oriented to those 'high-level' goals. Using the ideology of so-called 'knowledge-based design', those systems take full advantage of object-oriented programming, implement complex heuristic algorithms, etc. Such software requires a high-specification computer (RAM and disk size, performance, etc.), and its complexity and cost restrict its use in a laboratory.

Our system is an attempt to find a compromise between those extremes. On the one hand, its modest size, user-friendly interface, acceptable performance (a full-scale test task takes no more than 1-2 h) and implementation within the IBM PC standard make DIROM an inexpensive and easy-to-use tool. On the other hand, the system helps solve non-trivial problems that can usually be solved only by human intuition. The system formalizes empirical knowledge on the experimental use of oligonucleotides and provides an algorithm for qualitative estimation of oligonucleotide validity. In addition, the proposed original algorithm enables one to search for potential recognition sites in the generated sequences. Moreover, oligonucleotide-directed mutagenesis, although a widely used experimental technique, is strangely neglected by the computer systems known from the literature, so we hope to fill a vacant 'ecological niche' in the world of application software.

Part of our work, such as the search for potential restriction sites in artificially designed genes, is self-evident. The other part, concerning oligonucleotide testing, needs further experimental proof of the validity of our criteria. This work is being carried out now and will be the subject of future publications.

Received on April 20, 1991; accepted in January 1992

References

Breslauer,K.J., Frank,R., Blocker,H. and Marky,L.A. (1986) Predicting DNA duplex stability from the base sequence. Proc. Natl Acad. Sci. USA, 83, 3746-3750.

Cockwell,K.Y. and Giles,I.G. (1989) Software tools for motif and pattern scanning: program description including a universal sequence reading algorithm. Comput. Applic. Biosci., 5, 227-232.

Hovemann,B., Richter,S., Waldorf,U. and Czeipluch,C. (1988) Two genes encode related cytoplasmic elongation factor 1α (EF-1α) in Drosophila melanogaster with continuous and specific expression. Nucleic Acids Res., 16, 3175-3194.

Jacobson,A.B., Good,L., Simonetti,J. and Zuker,M. (1984) Some simple computational methods to improve the folding of large RNAs. Nucleic Acids Res., 12, 45-52.

Jiang,K., Zheng,J., Higgins,S.B., Watterson,D.M., Craig,T.A., Lukas,T.J. and Van Eldik,L.J. (1990) A knowledge-based experimental design system for nucleic acid engineering. Comput. Applic. Biosci., 6, 205-212.

Mazin,A.V., Saparbaev,M.K., Ovchinnikova,L.P., Dianov,G.L. and Salganik,R.I. (1990) Site-directed insertion of long single-stranded DNA fragments into plasmid DNA. DNA Cell Biol., 9, 63-69.

Pletnev,A.G., Yamshchikov,V.F. and Blinov,V.M. (1990) Nucleotide sequence of genome and complete amino acid sequence of the polyprotein of tick-borne encephalitis virus. Virology, 174, 250-263.

Pressnell,S. and Benner,S. (1988) The design of synthetic genes. Nucleic Acids Res., 16, 1693-1702.

Rychlik,W. and Rhoads,R.E. (1989) A computer program for choosing optimal oligonucleotides for filter hybridization, sequencing and in vitro amplification of DNA. Nucleic Acids Res., 17, 8543-8552.

Smith,M. (1985) In vitro mutagenesis. Annu. Rev. Genet., 19, 423-462.
