DNA strand displacement system running logic programs.

BioSystems 115 (2014) 5–12


Alfonso Rodríguez-Patón a, Iñaki Sainz de Murieta a,c, Petr Sosík b,a,∗

a Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo s/n, Boadilla del Monte, 28660 Madrid, Spain
b Research Institute of the IT4Innovations Centre of Excellence, Faculty of Philosophy and Science, Silesian University in Opava, 74601 Opava, Czech Republic
c Centre for Synthetic Biology and Innovation and Department of Bioengineering, Imperial College London, London SW7 2AZ, UK

Article info

Article history: Received 12 July 2013; received in revised form 8 October 2013; accepted 28 October 2013.

Keywords: DNA strand displacement; DNA computing; Logic program; Horn clause

Abstract

The paper presents a DNA-based computing model that is enzyme-free and autonomous, requiring no human intervention during the computation. The model performs iterated resolution steps on logical formulae in conjunctive normal form. The implementation is based on the technique of DNA strand displacement, with each clause encoded in a separate DNA molecule. Propositions are encoded by assigning a strand to each proposition p and its complementary strand to the proposition ¬p; clauses are encoded by placing several propositions in the same strand. The model makes it possible to run logic programs composed of Horn clauses by cascading resolution steps. The potential of the model is also demonstrated by its theoretical capability of solving SAT. The resulting SAT algorithm has a linear time complexity in the number of resolution steps, whereas its spatial complexity is exponential in the number of variables of the formula.

© 2013 Elsevier Ireland Ltd. All rights reserved.

1. Introduction

The construction of programmable biomolecular devices is one of the ultimate goals at the crossroads of biotechnology, nanotechnology and computer science. The applications proposed so far in the literature include, among others, genetic engineering (Deans et al., 2007), biomedicine and drug delivery (Benenson et al., 2004) and programmed molecule pattern formation (Basu et al., 2005). Besides the input and output interface, i.e., the ability to sense and act in the complex inter- or intracellular environment, the key part of such a device is the internal logic controlling its behavior. Promising solutions for a physical implementation of logical operations with biomolecules have been proposed in the area of DNA computing. After the initial years, when the discipline studied mainly combinatorial problems, another research line emerged, focusing on molecular logic for automated nano-devices. Among recent promising approaches we mention, e.g., the Rondelez toolkit (Montagne et al., in press) and genelets (Franco et al., 2011). Among the many contributions to this broad topic that focus more on logic and automata, we cite for example the design of logic gates (Seelig et al., 2006; Frezza et al., 2007), DNA automata

∗ Corresponding author at: Research Institute of the IT4Innovations Centre of Excellence, Faculty of Philosophy and Science, Silesian University in Opava, 74601 Opava, Czech Republic. Tel.: +420 553 684 364. E-mail addresses: arpaton@fi.upm.es (A. Rodríguez-Patón), [email protected] (I. Sainz de Murieta), [email protected] (P. Sosík). http://dx.doi.org/10.1016/j.biosystems.2013.10.006

(Takahashi et al., 2006), as well as theoretical models (Cardelli, 2009). These selected papers share a common property: the use of competitive hybridization, commonly known as DNA strand displacement. Two complementary single strands of nucleic acids are joined to form a double helix under conditions of suitable temperature and pH.

A further step in DNA-based logic is the development of computing models that implement logical inference. Several results demonstrating theoretical principles of DNA-based inference were previously published, such as Kobayashi (1999), based on the technique of Whiplash PCR, Uejima et al. (2002), applying DNA self-assembly, or Wasiewicz et al. (2000), which used plasmid molecules and also presented experimental results using a human-assisted protocol. Recently, in the work of Shapiro et al. (Ran et al., 2009), applying an automaton concept similar to the one previously proposed in Benenson et al. (2001), the authors developed and tested a system able to autonomously perform simple logical deductions with DNA molecules, based on hairpin manipulation and unfolding. Later, Rodríguez-Patón et al. (Rodríguez-Patón et al., 2010; Sainz de Murieta and Rodríguez-Patón, 2012) presented another inference model with DNA.

This article presents a DNA computing model based on strand displacement of four-way junctions (Panyutin and Hsieh, 1994; Biswas et al., 1998) that encodes logical formulae in conjunctive normal form (CNF) and autonomously performs logical resolution of their clauses. Unlike Ran et al. (2009), the model presented here is enzyme-free, and its implementation can be built on the general protocol derived in Panyutin and Hsieh (1994) and Biswas et al. (1998). According to the terminology introduced in Chiniforooshan


et al. (2011) (see the next section for details), our solution is partly scalable (this term refers only to the preparation of initial DNA molecules), time-responsive and energy-efficient. We also demonstrate that our model is theoretically capable of solving the Boolean satisfiability problem (SAT). The SAT problem has continued to be studied in DNA computing since Lipton's work (Lipton, 1995). Most of the subsequent proposals still follow the approach of generating all possible solutions and then selecting the right ones using laboratory operations; improvements were introduced by changing the way the initial set of solutions is encoded and by reducing the number of laboratory steps (Sakamoto et al., 2000; Liu et al., 2000; Manca and Zandron, 2002). Others have adapted classical SAT resolution algorithms such as Davis–Putnam (Ogihara and Ray, 1996; Wang et al., 2008; Brun, 2012) into DNA algorithms that generate solutions constructively (instead of generating all candidates and then selecting the right ones). Here we propose an alternative approach: we encode the full formula into DNA molecules representing clauses and let the system perform massively parallel logical resolution operations between the clauses.

The rest of the paper is structured as follows: Section 2 includes a description of the DNA strand displacement process, a quick review of the main concepts in propositional logic and the principles of the proposed DNA model. Section 3 describes how the model performs the autonomous resolution. Section 4 shows how the parallel application of multiple resolution steps, in combination with a few laboratory operations, can solve the SAT problem. Finally, Section 5 summarizes the conclusions and future work.

2. Principles of the model

Basic concepts of propositional logic used throughout the article are summarized first:

Proposition: The minimal syntactic entity in propositional logic. A proposition P can take the values True or False. P = True is represented as P, whilst P = False is represented as ¬P.

Formula: A formula in propositional logic can be recursively defined as follows:
• Each propositional variable is a formula.
• If ϕ is a formula, then ¬ϕ is a formula.
• If ϕ and ψ are formulas and ◦ is any binary connective, then (ϕ ◦ ψ) is a formula. Here ◦ could be ∨, ∧, →, or ↔.

Clause: A logical formula consisting of a disjunction of propositions.

Conjunctive normal form (CNF): A logical formula is in conjunctive normal form iff it is expressed as a conjunction of clauses.

Logical resolution: If C and C′ are clauses and p is a proposition, then any assignment that satisfies both (C ∨ p) and (C′ ∨ ¬p) also satisfies (C ∨ C′).

Hypothetical syllogism: Inference rule stating the following: if we have A → B and B → C, we can deduce A → C.

SAT: The Boolean satisfiability problem is the problem of determining whether the variables of a given formula can be assigned values in such a way as to make the formula evaluate to True.

2.1. Encoding of propositions

Every proposition A is assigned a single nucleotide sequence denoting that A = True, whereas its logical complement A = False (or ¬A) is encoded with the complementary nucleotide sequence. For example, if the proposition A = True is encoded with the DNA sequence 5′-ATAAGG-3′, then A = False will be encoded with its complementary strand 3′-TATTCC-5′. This dual encoding allows logical resolution to be applied to different clauses in an autonomous way. The same encoding principle is also used, e.g., in Rodríguez-Patón et al. (2010), Sainz de Murieta and Rodríguez-Patón (2012) and Sakamoto et al. (2000).

Each proposition is annealed to a "cover" strand of the same length with the following structure:
• A sticky end is located at the 3′ end when the cover anneals to an affirmed proposition (e.g. ¬aT in Fig. 1), or at the 5′ end when the cover anneals to a negated proposition (e.g. zT in Fig. 1). In both cases the sticky end stays single stranded and does not anneal to the proposition strand.
• In affirmed propositions, the cover strand anneals its 5′ region with the proposition strand. A free sticky end stays single stranded at the 3′ end of the cover strand (e.g. aT in Fig. 1).
• In negated propositions, the cover strand anneals its 3′ region with the proposition strand. A free sticky end stays single stranded at the 5′ end of the cover strand (e.g. ¬zT in Fig. 1).
• The cover strand of a given proposition P must be complementary to the cover strand of ¬P.

2.2. Encoding of clauses

Assume that propositions are named with capital letters from A to Z. Clauses in conjunctive normal form (CNF) are initially encoded as follows (see Fig. 1):
• Propositions belonging to the same clause are encoded in the same DNA strand and sorted: positive propositions in ascending order, starting from the 3′ end of the strand; then negative propositions in descending order, finishing at the 5′ end.
• Each cover strand is labeled with a fluorophore reporter. It does not play any role in the resolution mechanism, but it will be required, together with a length separation technique (electrophoresis), to decide whether the empty clause was obtained during the resolution.

Fig. 1. The DNA encoding of clauses with sticky ends in cover strands.

During the reactions corresponding to resolution steps, double-stranded parts with no cover strand can appear in the molecules, see Fig. 2. These double-stranded sequences do not encode any proposition. Therefore, in general, a molecule encoding a clause with x propositions consists of these (and only these) parts:
• x proposition strands annealed with their corresponding cover strands;
• y ≥ 0 proposition strands fully annealed with their complementary sequences. These parts do not encode propositions any more.

The representation of the empty clause results as a special case: a non-fluorescent double-stranded molecule containing y ≥ 0 proposition strands, with nicks in one strand between all parts encoding propositions. The mechanism introducing the nicks is explained in detail in Section 4.
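The dual encoding of Section 2.1 and the ordering rule of Section 2.2 can be illustrated in software. The following is a minimal sketch, not part of the original model description: the function names `complement` and `clause_layout` and the representation of a literal as a `(name, is_positive)` pair are assumptions made only for illustration, and the sketch ignores cover strands, sticky ends and fluorophores.

```python
# Watson-Crick base pairing used for the dual encoding of P and its negation.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-wise complement of a strand.

    If the input is read 5'->3', the result is read 3'->5'.
    """
    return "".join(PAIR[base] for base in strand)


def clause_layout(literals):
    """Order the literals of a clause as they appear from the 3' end to the 5' end.

    Each literal is a hypothetical (name, is_positive) pair. Positive literals
    come first in ascending order, then negative literals in descending order.
    """
    positives = sorted(lit for lit in literals if lit[1])
    negatives = sorted((lit for lit in literals if not lit[1]), reverse=True)
    return positives + negatives


a_true = "ATAAGG"             # 5'-ATAAGG-3', the example sequence for A = True
a_false = complement(a_true)  # 3'-TATTCC-5', the strand encoding A = False

# The clause B v C v (not A) v (not D), given in arbitrary order.
layout = clause_layout([("A", False), ("C", True), ("B", True), ("D", False)])
```

Here `layout` lists B and C first (ascending, from the 3′ end) and then ¬D and ¬A (descending, towards the 5′ end).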



Fig. 2. Basic resolution step. The clauses A ∨ B and ¬B ∨ C are resolved on B, producing the resolvent clause A ∨ C.
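The logical content of the resolution step in Fig. 2 can be sketched in a few lines. This is only a symbolic analogue under stated assumptions: clauses are modeled as `frozenset`s of hypothetical `(name, is_positive)` pairs, and the function name `resolve` is introduced here for illustration; none of the molecular details are represented.

```python
def resolve(c1, c2, name):
    """Resolve two clauses on the proposition `name`.

    Returns the resolvent as a frozenset, or None when the two clauses do not
    contain complementary occurrences of `name`.
    """
    pos, neg = (name, True), (name, False)
    if pos in c1 and neg in c2:
        return (c1 - {pos}) | (c2 - {neg})
    if neg in c1 and pos in c2:
        return (c1 - {neg}) | (c2 - {pos})
    return None


# The example of Fig. 2: (A v B) and (not B v C) resolved on B.
a_or_b = frozenset({("A", True), ("B", True)})
not_b_or_c = frozenset({("B", False), ("C", True)})
a_or_c = resolve(a_or_b, not_b_or_c, "B")

# Two unary clauses B and not B resolve into the empty clause.
empty = resolve(frozenset({("B", True)}), frozenset({("B", False)}), "B")
```

In this sketch the empty clause is simply the empty `frozenset`, whereas in the DNA model it is a double-stranded molecule without cover strands.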

2.3. Physical implementation

There already exist several experimental implementations of molecular logic based on strand displacement, so we can to a large extent rely on the obtained results and properties. The pioneering work of Seelig et al. (2006) presented the construction of enzyme-free DNA logic gates based on strand displacement. Their "dual-rail" representation can be considered as a base for the physical encoding in our design, in the sense that both a logical proposition and its negation are represented by the presence of a certain DNA strand species; in our case, however, these species are mutually complementary. Later, building on the recent and general work of Soloveichik et al. (2010), Chiniforooshan et al. (2011) introduced an improved design of molecular circuits with the following declared properties: scalable, time-responsive, energy-efficient and digital. We briefly summarize these properties and note how they apply to our construction.

• Scalable: The preparation of the DNA strands can be done in a single test tube instead of a separate tube for each DNA species. This is only partly true in our case: all cover strands with sticky ends encoding propositions can be divided into two groups, positive and negative, and each group can be prepared together in a single test tube. However, the long DNA strands encoding clauses of the initial program must be prepared separately since they may contain mutually complementary parts.
• Time-responsive: After some initial computation, the system is still able to react to changes of the input and recompute the output accordingly. In our design, some initial resolution steps can proceed in the mixture of DNA species encoding clauses, but when new species encoding new facts or rules are added, new resolution steps will start, hence the system responds. However, whenever there are enough facts for refutation, the species corresponding to the empty clause is irreversibly produced.
• Energy-efficient: If inputs are given ideally (i.e., with zero concentration of unwanted species), then the system converges to a steady state in which outputs are also ideal and the system is in an energetic equilibrium; this is true in our design, too.
• Digital: The DNA reactions are able to perform signal restoration when the concentration of DNA species encoding specific logic information is low, to obtain a digital abstraction of analog concentration values. Such a property is the

Fig. 3. 4-Way branch migration process that implements the resolution step of two unary clauses B and ¬B.

subject of further research in our case, since a large number of intermediate clauses can emerge during the resolution and some kind of "universal signal restoration gate" would be needed.

3. Autonomous resolution using DNA

Resolution is perhaps the most studied propositional refutation system due to its simplicity and importance in many automatic theorem proving procedures (Davis and Putnam, 1960). The resolution principle states that, if C and C′ are clauses and p is a proposition, then any assignment that satisfies both (C ∨ p) and (C′ ∨ ¬p) also satisfies (C ∨ C′). Consequently, (C ∨ p) ∧ (C′ ∨ ¬p) is unsatisfiable if and only if (C ∨ C′) is unsatisfiable.

Looking at Fig. 2, we can see how resolution is automatically performed between two clauses encoded as described in Section 2: the molecules encoding clauses A ∨ B and ¬B ∨ C merge together into another molecule that encodes the resolvent clause, A ∨ C, which can later be used in further resolution iterations. Although the resolvent contains an extra double-stranded region (formed by B and ¬B), this region has no free sticky end and cannot alter computations by reacting with other clauses.

The example depicted in Fig. 3 shows a resolution reaction of two unary clauses, B and ¬B, in detail.
• In the first step, the sticky ends of both cover strands anneal (¬bT with bT), and so do the sticky ends of the proposition strands (¬bT with bT). A Holliday junction (Stahl, 1994; Liu and West, 2004) is formed as a result of both hybridizations.
• At this point, the junction starts a spontaneous branch migration in a random walk (Panyutin and Hsieh, 1994; Biswas et al., 1998).
• When the Holliday junction reaches the distal end of the double-stranded region, the strand exchange completes, producing two stable duplexes: one of them made of the cover strands, the other one of B and ¬B.

At the end of the process, B and ¬B form a fully hybridized double-stranded molecule; unlike in Fig. 2, it contains no single



stranded segments that encode propositions, meaning that the formula B ∧ ¬B is not satisfiable. This resolution reaction has produced the empty clause, represented as Ø.

This principle can be cascaded as in Murphy et al. (2006), resulting in more complex logic steps such as the hypothetical syllogism: if we have A → B and B → C, we can deduce A → C. As every implication can be formulated as a disjunction, we can rewrite the previous statement as follows: if we have ¬A ∨ B and ¬B ∨ C, we can deduce ¬A ∨ C, which also matches the example of Fig. 2. The model also allows multiple iterations of the hypothetical syllogism.

Finally, consider a logic program consisting of a sequence of Horn clauses of the form (A1 ∧ A2 ∧ · · · ∧ An) → B, which can be rewritten as ¬A1 ∨ ¬A2 ∨ · · · ∨ ¬An ∨ B. Each such clause (and, of course, each unary clause representing a fact) can be encoded as a separate DNA species as described above, and then the autonomous resolution can proceed until the empty clause is eventually obtained, indicating that the refutation process was successful. The detection of the empty clause is described in the next section.

3.1. Preventing undesired hybridization

When preparing a library of molecules for pre-programmed DNA reactions, it is always important to design the DNA sequences in such a way that no other reactions between molecules may occur which would (i) lead to the production of false results (false positives), or (ii) bind the molecules in unexpected reactions that would prevent them from reacting in the planned manner (false negatives). The library of molecules we use here can be prepared so that undesired reactions cannot occur, unless the number of encoded logical variables is too high. The principles described in Section 2 can guarantee that each single-stranded molecule encoding a proposition can hybridize only to the corresponding cover strand, and to no other DNA molecule in the test tube.
Furthermore, there exist further types of DNA bonds which are undesired in our protocol: hairpins (unlike in Sakamoto et al. (2000), where hairpins are applied to compute) and bulges. A hairpin is a single-stranded DNA molecule with two complementary parts which bind to each other. Many computational and experimental methods are known (Kari, 2012) which can be used to design a library of DNA molecules encoding propositions without possible hairpins within them. Consider now the longer molecules encoding clauses: if two parts of such a molecule form a hairpin, it means that the same clause contains a proposition and its negation, in which case the clause is always true. For example, the clause A ∨ B ∨ ¬A is true regardless of the values assigned to A and B. Therefore, the formation of a hairpin does not alter the result of the refutation, as this clause cannot contribute to eventually obtaining the empty clause.

Similarly, the formation of bulges can be prevented within short molecules encoding propositions by a proper encoding. Considering long molecules encoding clauses, for a bulge to be formed we would need two clauses that can be resolved on two different propositions, such that in one of the clauses both resolving propositions are consecutive, and in the other clause the same two propositions are not consecutive. Fig. 4 shows an example of clauses A ∨ B ∨ C and ¬A ∨ ¬C that comply with this statement. The motif at the top shows what the bulge structure would look like if it could be formed, whereas the motifs at the bottom show the initial configuration of the clauses. Given the stiffness of double-stranded DNA, it seems quite difficult that the central double-stranded region in the clause

Fig. 4. A hypothetical possibility of a bulge formation. The sequence encoding B in both parts of the figure is double-stranded.

A ∨ B ∨ C encoding proposition B can be folded such that both ¬A and ¬C can attach to A and C at the same time. Therefore, the encoding model should prevent the formation of bulges.

3.2. Modus Ponens and Modus Tollens

These two simple but important cases of resolution with DNA are demonstrated in detail. Modus Ponens is the reasoning defined as follows: from the formulas A and A → B, deduce B. The DNA implementation is illustrated in Fig. 5. Similarly, Modus Tollens is the deduction rule that derives ¬A from the formulas A → B and ¬B. Its implementation is illustrated in Fig. 6. We note that previous attempts to implement these two classical resolution steps in terms of DNA molecules can be found, e.g., in Rodríguez-Patón et al. (2010) and Sainz de Murieta and Rodríguez-Patón (2012).

4. The Boolean satisfiability problem (SAT)

The problem of satisfiability of Boolean formulae in CNF was the first decision problem proved to be NP-complete (Cook, 1971). Given any Boolean formula, we can rewrite it to CNF in such a way that both of them are equisatisfiable, and then encode it using our

Fig. 5. A molecular implementation of Modus Ponens. The clausal form of the implication, ¬A ∨ B, is used.



Fig. 6. A molecular implementation of Modus Tollens. The clausal form of the implication, ¬A ∨ B, is used.

model. Our algorithm will apply resolution to all clauses and propositions, looking for potential refutations of the initial formula. A resolution refutation of a non-satisfiable CNF formula ϕ is a sequence of clauses C1, . . ., Cs, where each Ci is either a clause of ϕ or is inferred from earlier clauses by the resolution rule, and Cs is the empty clause. If a refutation can be derived from an initial formula, then the formula is unsatisfiable. We can encode the initial formula following our DNA model, adding several copies of each molecule to allow all possible resolution reactions to be performed in parallel.

4.1. Identifying the empty clause

The question now is how our model determines whether the empty clause has been derived or not. Let us start with some simple cases to understand the general rule:

• Fig. 7 shows a simple case of an unsatisfiable formula: ¬A ∧ (A ∨ B) ∧ ¬B. The logical resolution sequence would be the following:

¬A, (A ∨ B), ¬B ⊢ A, ¬A ⊢ Ø

As in Fig. 3, we can see a molecule containing no cover strands, labeled as Ø. It matches the expected logical output detailed in the previous lines, and the formula is UNSAT.

• Fig. 8 shows a case of a satisfiable formula: (A ∨ B) ∧ (¬B ∨ ¬A). When trying to derive an empty clause, the logical resolution sequence would be the following:

(A ∨ B), (¬B ∨ ¬A) ⊢ (A ∨ ¬A) ⊢ TRUE

Although we can find a molecule containing no cover strands, it does not represent an empty clause in this case. This is because the second reaction depicted in Fig. 8 does not correspond to a resolution operation, since the annealed propositions belong to the same clause.

Fig. 7. Unsatisfiable formula: ¬A ∧ (A ∨ B) ∧ ¬ B. The first resolution between the clauses (A ∨ B) and ¬B yields the clause A. In the second reaction, A and ¬A are resolved into the empty clause Ø, meaning the formula is UNSAT.
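The two cases of Figs. 7 and 8 can be reproduced symbolically by saturating a clause set under resolution. The sketch below is a software analogue only, written under assumptions that are not in the original text: literals are hypothetical `(name, is_positive)` pairs, the function names are illustrative, and tautological resolvents such as A ∨ ¬A are discarded explicitly, mirroring the observation that they cannot contribute to the empty clause.

```python
from itertools import combinations


def resolvents(c1, c2):
    """All clauses obtainable from c1 and c2 by one resolution step."""
    out = set()
    for name, positive in c1:
        if (name, not positive) in c2:
            out.add(frozenset((c1 - {(name, positive)}) | (c2 - {(name, not positive)})))
    return out


def is_tautology(clause):
    """True if the clause contains a proposition together with its negation."""
    return any((name, not positive) in clause for name, positive in clause)


def refutable(clauses):
    """Saturate the clause set; return True iff the empty clause is derived."""
    known = {frozenset(c) for c in clauses}
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            for r in resolvents(c1, c2):
                if not r:
                    return True  # empty clause: the formula is UNSAT
                if not is_tautology(r) and r not in known:
                    new.add(r)
        if not new:
            return False  # saturation reached without the empty clause
        known |= new


A, B = "A", "B"
fig7 = [{(A, False)}, {(A, True), (B, True)}, {(B, False)}]   # not A, (A v B), not B
fig8 = [{(A, True), (B, True)}, {(B, False), (A, False)}]     # (A v B), (not B v not A)
modus_ponens = [{(A, True)}, {(A, False), (B, True)}, {(B, False)}]  # A, A -> B, negated goal B
```

With these inputs, `refutable(fig7)` and `refutable(modus_ponens)` report a refutation, while `refutable(fig8)` does not, because both possible resolvents of Fig. 8 are tautologies.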

The cases described in Figs. 7 and 8 are almost identical except for the nick between ¬A and ¬B in Fig. 7. If they occurred simultaneously, it would be impossible to distinguish them by electrophoresis. Therefore, we add to the 5′ end of each non-cover strand a few extra nucleotides which do not bind to cover strands but instead form a single-stranded free end, as depicted in Fig. 9. Their purpose is to increase the weight of the molecule during electrophoresis and to allow the differentiation of these two cases.

Fig. 8. Satisfiable formula: (A ∨ B) ∧ (¬B ∨ ¬A). The first resolution between the clauses (A ∨ B) and (¬B ∨ ¬A) yields the clause (A ∨ ¬A), which is always True. Thus the second hybridization does not represent a resolution operation, since the propositions involved belong to the same clause, but the logic operation A ∨ ¬A, which is always true.

Fig. 9. An example of molecules with free ends allowing to distinguish satisfiable and non-satisfiable formulas.

Denote by n the length of non-cover strands encoding propositions and by k the length of the extra single-stranded end. The following cases may occur:

(a) The empty clause produced by a refutation of x literals contains x double-stranded sequences, each of length n, hence 2xn nucleotides in total. Furthermore, it contains x + 1 single-stranded free ends (one in the initial clause, and one more is added during each step), each of length k < n, hence (x + 1)k nucleotides in total. The sum is

$$L_{\mathrm{UNSAT}} = 2xn + (x+1)k, \qquad x \ge 1. \tag{1}$$

Simultaneously, any molecule composed of non-cover strands with x literals and x + 1 free ends had to be produced by a correct refutation process. Therefore, a formula is UNSAT iff such a molecule can be produced.

(b) Any other molecule composed of x non-cover strands but not encoding the empty clause has the total size (i.e., the number of nucleotides)

$$L_{\mathrm{SAT}} = 2xn + yk, \qquad x \ge 1, \quad 2 \le y \le x. \tag{2}$$

This is because such a molecule could only be produced by a reaction of two (or more) pairs of positive and negative literals in two different molecules, as in Fig. 8. Therefore, the number of refutation steps and, hence, free ends is strictly smaller.

(c) Assume the special variant 3-CNF, in which each clause consists of at most 3 literals. Two cases of molecules composed of non-cover strands which do not encode the empty clause may occur:
1. Two molecules of the form (A ∨ B ∨ C) and (¬A ∨ ¬B ∨ ¬C) interact (where A, B, C are literals). The resulting molecule will consist of two annealed non-cover strands of length 3n with two free ends.
2. More complex structures of non-cover strands are produced. Within them, each non-cover strand of length 3n is annealed to at least two other non-cover strands. A simple inductive argument shows that the number of non-cover strands (and hence free ends) is at least (2/3)x, where x is the number of literals involved.

In both cases mentioned above we obtain a molecule with the total size

$$L_{3\text{-}\mathrm{CNF}} = 2xn + yk, \qquad x \ge 1, \quad \tfrac{2}{3}x \le y \le x. \tag{3}$$

Now we are interested in distinguishing the case (a) from (b) or (c). The following result holds true (we denote by gcd(x, y) the greatest common divisor of x and y).

Theorem 1. Consider our DNA encoding of clauses with parameters n (the length of molecules encoding literals) and k (the length of free ends of non-cover strands) such that gcd(2n + k, k) = 1. Let there be a molecule encoding the empty clause obtained after x refutation steps.
1. If x ≤ 2n, then there is no other molecule of the same length without cover strands, but not encoding the empty clause.
2. In the case of 3-CNF the same holds for each x ≤ 6n + 2k + 2.

Proof. Assume two molecules of the same size composed only of non-cover strands, but only the first one encoding the empty clause, i.e., $L_{\mathrm{SAT}} = L_{\mathrm{UNSAT}}$. By (1) and (2), there must be x, x′ ≥ 1 such that

$$2xn + (x+1)k = 2x'n + yk, \qquad 2 \le y \le x'. \tag{4}$$

The following derivation can be done:

$$x(2n+k) + k = x'(2n+k) - x'k + yk \tag{5}$$

$$(x'-x)(2n+k) = (z+1)k, \qquad \text{where } z = x'-y, \quad 0 \le z \le x'-2 \tag{6}$$

$$x'-x = \frac{(z+1)k}{2n+k} \tag{7}$$

Since x′ − x must be an integer and since gcd(2n + k, k) = 1, a minimum z which conforms with this condition is $z_{\min} = 2n + k - 1$ which, by the inequality in (6), corresponds to

$$x'_{\min} = 2n + k + 1. \tag{8}$$

Then, by (7),

$$x_{\min} = x'_{\min} - (z_{\min}+1)\frac{k}{2n+k} = x'_{\min} - k = 2n + 1. \tag{9}$$

One might argue that $x_{\min}$ could be even smaller if we used a value $z > z_{\min}$ in (9). Assume $z = z_{\min} + m$, for some m ≥ 1. By (6) and (8) we obtain $x' \ge x'_{\min} + m$. By (7) we can write

$$x \ge x'_{\min} + m - (z_{\min} + m + 1)\frac{k}{2n+k} = x_{\min} + m\left(1 - \frac{k}{2n+k}\right)$$

and since k/(2n + k) < 1, we conclude that $x > x_{\min}$, hence the value of $x_{\min}$ is truly the minimal one.

Finally, in the case of 3-CNF the condition $\tfrac{2}{3}x' \le y \le x'$ applies. Then the inequality in (6) adopts the form

$$0 \le z \le \tfrac{1}{3}x'. \tag{10}$$

An analogous derivation as above leads to

$$x_{\min} = 6n + 2k + 3. \tag{11}$$
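The first claim of Theorem 1 can be checked numerically for concrete parameters. The following brute-force sketch covers only the general case, formulas (1) and (2); the function names and the search strategy are assumptions made for illustration and are not part of the original proof. It searches for the smallest number of refutation steps x at which the length of an empty-clause molecule coincides with the length of some molecule of type (b).

```python
from math import gcd


def l_unsat(x, n, k):
    """Length of an empty-clause molecule after x refutation steps, formula (1)."""
    return 2 * x * n + (x + 1) * k


def has_collision(x, n, k):
    """True if some L_SAT = 2*x'*n + y*k with 2 <= y <= x' equals l_unsat(x, n, k)."""
    target = l_unsat(x, n, k)
    x_prime = 1
    # L_SAT is at least 2*x'*n + 2*k, so larger x' cannot match the target.
    while 2 * x_prime * n + 2 * k <= target:
        rest = target - 2 * x_prime * n
        if rest % k == 0 and 2 <= rest // k <= x_prime:
            return True
        x_prime += 1
    return False


def smallest_ambiguous_x(n, k, x_limit):
    """Smallest x in 1..x_limit whose empty-clause length is ambiguous, else None."""
    assert gcd(2 * n + k, k) == 1, "Theorem 1 assumes gcd(2n + k, k) = 1"
    for x in range(1, x_limit + 1):
        if has_collision(x, n, k):
            return x
    return None


first_ambiguous = smallest_ambiguous_x(40, 3, 200)
```

For n = 40 and k = 3 the search finds no collision up to x = 2n = 80 and the first one at x = 2n + 1 = 81, in agreement with (9).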

By Theorem 1, molecules encoding all successful refutations up to a certain length are clearly distinguishable from other molecules producible by the possible reactions. Realistic parameters could be, e.g., n = 40 and k = 3. In this case we can reliably distinguish refutations of length up to x = 80 steps for general clauses, and up to x = 248 steps for 3-CNF. The maximum size of the resulting molecules in the general case, by (1), will be $L_{\mathrm{UNSAT}} = 2xn + (x+1)k = 6400 + 81 \times 3 = 6643$ nucleotides. We consider these values more than sufficient for possible practical nano-applications: at this size of refutations other problems would prevail, such as the unreliability of reactions or the concentration bottleneck.

4.2. The DNA algorithm

We outline, step by step, the application of our model to the resolution of a set of clauses, and how to determine whether the formula in CNF is satisfiable or not. Let n be the length of non-cover strands encoding propositions and k the length of free ends. Let the formula contain v ≤ 2n variables.

Algorithm 2.
1. Encode the clauses as DNA molecules, according to Section 2 and Fig. 1. Attach a fluorophore reporter to all cover strands.
2. Mix all the clauses in the same solution and let all possible resolution reactions take place.
3. Separate the products using (gel or capillary) electrophoresis. If all electrophoretic bands show fluorescence, no empty clause has been generated by any resolution reaction (as in Fig. 2), hence the formula is SAT. The process ends here.
4. Otherwise, if there are bands without fluorescence and at least one of them corresponds to the length of 2xn + (x + 1)k nucleotides for some 1 ≤ x ≤ 2n, then the formula is UNSAT.

A short analysis of the time and space complexity of the algorithm follows:
• Step 1 (clause encoding) can be considered constant in time.
• Step 2 (massively parallel resolutions) depends on the minimal number of parallel resolution steps needed to derive the empty clause.
• Step 3 (gel electrophoresis) is another constant-time operation.

The necessary number of parallel resolution steps is equal to the minimal depth of a refutation tree or, more generally, DAG (Directed Acyclic Graph) for a given formula, since any such DAG is a part of the resolution performed by our algorithm. Assuming regular tree resolution, which is complete (see, e.g., Esteban and Torán (2001) for definitions), the depth of the tree is at most the number of variables in the formula, see Theorem 3.1 in Esteban and Torán (2001) and its proof. Therefore, the time complexity of the algorithm is O(v), where v is the number of variables in the original formula.

The spatial complexity of the algorithm depends generally on its Step 2, more precisely on the size of the refutation, i.e., the number of occurrences of clauses. It was proven already in Haken (1985)


that even the size of a general refutation (the smallest refutation with no restrictions on the sequence of resolution steps) is exponential in the number of variables of the formula, establishing the lower bound for our space complexity. Since our algorithm cannot generate more than all possible clauses on v variables, whose number is $3^v$ (each variable is either affirmed, negated or absent), the upper bound is asymptotically the same. Hence, the space complexity of the algorithm, measured as the number of different DNA molecules, is $2^{\Theta(v)}$. This implies that the algorithm must also start with this number of copies of the molecules encoding the original clauses, to be able to perform all possible resolution steps. The following result follows:

Theorem 3. A propositional formula in CNF with v logical variables is unsatisfiable if and only if Algorithm 2 produces a molecule encoding the empty clause during v reaction steps. The amount of DNA molecules used during the algorithm is $2^{\Theta(v)}$.

5. Conclusions and future works

We introduced a new DNA model for the representation of logical formulas in CNF which contains an implicit resolution mechanism that acts on any pair of clauses containing complementary propositions. The model is based on strand displacement techniques, and the resolution process is completely autonomous, except for the last detection step, when a possibly human-assisted electrophoresis is performed. Our model is completely enzyme-free and its implementation can be based on the experimentally verified and general design derived in Panyutin and Hsieh (1994) and Biswas et al. (1998). According to the terminology defined in Chiniforooshan et al. (2011), the model can be characterized as partly scalable, time-responsive and energy-efficient.
The first presented application of our model is autonomous resolution, which includes the hypothetical syllogism as a special case, and, by cascading the resolution steps, the refutation proving technique for sentences in propositional logic. In this way, the model could serve as a part of a programmable nano-automaton sensitive to various inputs and running a simple logic program. The main drawback we see in this respect is the detection step based on electrophoresis, in which the output signal is analogue and its level can be low (it depends on the presence of molecules of certain lengths, which might be present only in low concentrations).

To demonstrate the computational potential of the model, we have shown that it can theoretically solve any instance of SAT by applying the parallel refutation technique and eventually deriving the empty clause. If no molecule is a candidate for representing the empty clause, we can automatically answer that the formula is satisfiable. An interesting feature is that, if several refutations derive the empty clause, all of them are “registered” in the output. By examining the output we can also find the shortest refutation path. The algorithm runs in time proportional to the number v of logical variables in the original formula. In contrast, the number of distinct DNA molecules used during the algorithm, which defines its space complexity, grows exponentially with the number of variables in the formula.

The application to SAT is based on the assumption that all the original and resolvent clauses are available during the whole reaction process. In practice, however, some of them could be consumed during previous reactions. This obstacle can be overcome by using multiple copies of each molecule encoding an input clause, which is necessary anyway, as the spatial analysis in the previous section confirms. A deeper analysis could provide insight into how the probability of finding a solution depends on the input redundancy.
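As an illustration only, the parallel refutation procedure summarised above can be mimicked in software. The following minimal Python sketch is not the DNA implementation: the function names `resolve` and `parallel_refutation`, the encoding of literals as signed integers, and the choice to discard tautological resolvents (so that the number of distinct clauses stays below the 3^v bound) are all assumptions made for this example. Each round of the loop stands for one parallel reaction step in which every pair of available clauses may be resolved.

```python
from itertools import combinations


def resolve(c1, c2):
    """Return the non-tautological resolvents of two clauses.

    A clause is a frozenset of nonzero ints: k stands for variable k,
    -k for its negation (an encoding assumed for this sketch).
    """
    resolvents = set()
    for lit in c1:
        if -lit in c2:
            r = (c1 - {lit}) | (c2 - {-lit})
            # Assumption: drop tautologies (clauses holding both p and not-p),
            # so at most 3**v distinct clauses can ever appear.
            if not any(-x in r for x in r):
                resolvents.add(frozenset(r))
    return resolvents


def parallel_refutation(clauses, num_vars):
    """Run at most num_vars rounds; each round resolves all known pairs.

    Returns (unsat, rounds_run, distinct_clauses).
    """
    known = {frozenset(c) for c in clauses}
    if frozenset() in known:
        return True, 0, len(known)
    rounds = 0
    for rounds in range(1, num_vars + 1):
        new = set()
        for a, b in combinations(known, 2):
            new |= resolve(a, b)
        if new <= known:          # fixpoint: nothing left to derive
            break
        known |= new
        if frozenset() in known:  # empty clause found: formula is unsatisfiable
            return True, rounds, len(known)
    return False, rounds, len(known)


# (p) AND (NOT p OR q) AND (NOT q), with p = 1 and q = 2
unsat, rounds, size = parallel_refutation([[1], [-1, 2], [-2]], num_vars=2)
print(unsat, rounds, size)
```

The sketch keeps every clause available in every round, which corresponds to the assumption stated above that original and resolvent clauses are not used up; the number of rounds plays the role of the time measure and the size of the clause set that of the space measure.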
There is room for improvement in the presented model. It would be highly desirable to achieve spatial conformations that significantly differ for the cases of the empty clause and double implication, which would eliminate the need for electrophoresis. This would allow the algorithm to be connected directly to further biochemical steps based on the result of the resolution process. We also need to include some kind of signal restoration at the output, to be able to react to the presence of output molecules in low concentrations. These two goals are interconnected: both aim at making the output digital and autonomously connected to further biochemical processes.

Acknowledgements

The research was partially supported by the project BACTOCOM funded by a grant from the European Commission under the VII Framework Programme (FET Proactive area), by the Ministerio de Ciencia e Innovación, Spain, under project TIN2012-36992, by the European Regional Development Fund in the IT4Innovations Centre of Excellence project (CZ.1.05/1.1.00/02.0070), and by the Silesian University in Opava under project SGS/7/2011.

References

Basu, S., Gerchman, Y., Collins, C.H., Arnold, F.H., Weiss, R., 2005. A synthetic multicellular system for programmed pattern formation. Nature 434 (7037), 1130–1134.
Benenson, Y., Paz-Elizur, T., Adar, R., Keinan, E., Livneh, Z., Shapiro, E., 2001. Programmable and autonomous computing machine made of biomolecules. Nature 414 (6862), 430–434.
Benenson, Y., Gil, B., Ben-Dor, U., Adar, R., Shapiro, E., 2004. An autonomous molecular computer for logical control of gene expression. Nature 429, 423–429.
Biswas, I., Yamamoto, A., Hsieh, P., 1998. Branch migration through DNA sequence heterology. Journal of Molecular Biology 279 (4), 795–806.
Brun, Y., 2012. Efficient 3-SAT algorithms in the tile assembly model. Natural Computing 11 (2), 209–229.
Cardelli, L., 2009. Strand algebras for DNA computing. In: Deaton, R., Suyama, A. (Eds.), DNA Computing and Molecular Programming, vol. 5877. Springer, Berlin, Heidelberg, pp. 12–24 (Chapter 2).
Chiniforooshan, E., Doty, D., Kari, L., Seki, S., 2011. Scalable, time-responsive, digital, energy-efficient molecular circuits using DNA strand displacement. In: Sakakibara, Y., Mi, Y. (Eds.), DNA Computing 16, vol. 651 of Lecture Notes in Computer Science. Springer, Berlin, Heidelberg, pp. 25–36.
Cook, S.A., 1971. The complexity of theorem-proving procedures. In: STOC’71: Proceedings of the Third Annual ACM Symposium on Theory of Computing, New York, NY, USA, pp. 151–158.
Davis, M., Putnam, H., 1960. A computing procedure for quantification theory. Journal of the ACM 7 (3), 201–215.
Deans, T.L., Cantor, C.R., Collins, J.J., 2007. A tunable genetic switch based on RNAi and repressor proteins for regulating gene expression in mammalian cells. Cell 130 (2), 363–372.
Esteban, J.L., Torán, J., 2001. Space bounds for resolution. Information and Computation 171 (1), 84–97.
Franco, E., Friedrichs, E., Kim, J., Jungmann, R., Murray, R., Winfree, E., Simmel, F.C., 2011. Timing molecular motion and production with a synthetic transcriptional clock. Proceedings of the National Academy of Sciences 108 (40), 784–793.
Frezza, B.M., Cockroft, S.L., Ghadiri, M.R., 2007. Modular multi-level circuits from immobilized DNA-based logic gates. Journal of the American Chemical Society 129 (48), 14875–14879.
Haken, A., 1985. The intractability of resolution. Theoretical Computer Science 39, 297–308.
Kari, L. (Section Ed.), 2012. Section IV: Molecular Computation. In: Rozenberg, G., Bäck, T., Kok, J. (Eds.), Handbook of Natural Computing. Springer, Berlin, Heidelberg, pp. 1071–1377.
Kobayashi, S., 1999. Horn clause computation with DNA molecules. Journal of Combinatorial Optimization 3, 277–299.
Lipton, R.J., 1995. DNA solution of hard computational problems. Science 268 (5210), 542–545.
Liu, Y., West, S.C., 2004. Happy Hollidays: 40th anniversary of the Holliday junction. Nature Reviews Molecular Cell Biology 5 (11), 937–944.
Liu, Q., Wang, L., Frutos, A.G., Condon, A.E., Corn, R.M., Smith, L.M., 2000. DNA computing on surfaces. Nature 403 (6766), 175–179.
Manca, V., Zandron, C., 2002. A clause string DNA algorithm for SAT. In: Jonoska, N., Seeman, N.C. (Eds.), DNA Computing, vol. 2340 of Lecture Notes in Computer Science. Springer, Berlin, Heidelberg, pp. 172–181 (Chapter 16).
Montagne, K., Plasson, R., Sakai, Y., Fujii, T., Rondelez, Y., 2011. Programming an in vitro DNA oscillator using a molecular networking strategy. Molecular Systems Biology 7, 466.
Murphy, N., Woods, D., Naughton, T.J., 2006. Bio-computation using Holliday junctions. In: 4th International Conference on Information and 4th Irish Conference on the Mathematical Foundations of Computer Science and Information Technology (MFCSIT), pp. 317–320.
Ogihara, M., Ray, A., 1996. Breadth First Search 3SAT Algorithms for DNA Computers. University of Rochester, Rochester, NY.
Panyutin, I.G., Hsieh, P., 1994. The kinetics of spontaneous DNA branch migration. Proceedings of the National Academy of Sciences 91 (6), 2021–2025.
Ran, T., Kaplan, S., Shapiro, E., 2009. Molecular implementation of simple logic programs. Nature Nanotechnology 4 (10), 642–648.
Rodríguez-Patón, A., Larrea, J.M., Sainz de Murieta, I., 2010. Inference with DNA molecules. In: Calude, C.S., Hagiya, M., Morita, K., Rozenberg, G., Timmis, J. (Eds.), Unconventional Computation, vol. 6079. Springer, Berlin, Heidelberg, p. 192 (Chapter 25).
Sainz de Murieta, I., Rodríguez-Patón, A., 2012. DNA biosensors that reason. Biosystems 109 (2), 91–104.
Sakamoto, K., Gouzu, H., Komiya, K., Kiga, D., Yokoyama, S., Yokomori, T., Hagiya, M., 2000. Molecular computation by DNA hairpin formation. Science 288 (5469), 1223–1226.
Seelig, G., Soloveichik, D., Zhang, D.Y., Winfree, E., 2006. Enzyme-free nucleic acid logic circuits. Science 314 (5805), 1585–1588.
Soloveichik, D., Seelig, G., Winfree, E., 2010. DNA as a universal substrate for chemical kinetics. Proceedings of the National Academy of Sciences 107 (12), 5393–5398.
Stahl, F.W., 1994. The Holliday junction on its thirtieth anniversary. Genetics 138 (2), 241–246.
Takahashi, K., Yaegashi, S., Kameda, A., Hagiya, M., 2006. Chain reaction systems based on loop dissociation of DNA. In: Carbone, A., Pierce, N.A. (Eds.), DNA Computing, vol. 3892. Springer, Berlin, Heidelberg, pp. 347–358 (Chapter 27).
Uejima, H., Hagiya, M., Kobayashi, S., 2002. Horn clause computation by self-assembly of DNA molecules. In: Jonoska, N., Seeman, N.C. (Eds.), DNA Computing 7, vol. 2340 of Lecture Notes in Computer Science. Springer, pp. 308–320.
Wang, X., Bao, Z., Hu, J., Wang, S., Zhan, A., 2008. Solving the SAT problem using a DNA computing algorithm based on ligase chain reaction. Biosystems 91 (1), 117–125.
Wasiewicz, P., Janczak, T., Mulawka, J.J., Plucienniczak, A., 2000. The inference based on molecular computing. Cybernetics and Systems: An International Journal 31 (3), 283–315.
