HHS Public Access Author manuscript Author Manuscript
Inf Comput. Author manuscript; available in PMC 2016 June 01. Published in final edited form as: Inf Comput. 2015 June ; 242: 340–353. doi:10.1016/j.ic.2015.04.001.
Existence of constants in regular splicing languages
Paola Bonizzoni (a) and Nataša Jonoska (b)
Paola Bonizzoni: [email protected]; Nataša Jonoska: [email protected]
(a) Dipartimento di Informatica Sistemistica e Comunicazione, Univ. degli Studi di Milano-Bicocca, Viale Sarca 336, 20126 Milano, Italy
(b) Department of Mathematics and Statistics, University of South Florida, Tampa, FL, USA
Abstract
In spite of extensive investigation of finite splicing systems in formal language theory, basic questions, such as their characterization, remain unsolved. It has been conjectured that a necessary condition for a regular language L to be a splicing language is that L must have a constant in the Schützenberger sense. We prove this longstanding conjecture to be true. The result is based on properties of strongly connected components of the minimal deterministic finite state automaton for a regular splicing language. Using constants of the corresponding languages, we also provide properties of transitive automata and path-automata.
1. Introduction
A splicing system, originally introduced in [12], is a formal model that uses a contextual cross-over operation over words to generate languages called splicing languages. This cross-over splicing formalizes the behavior of basic biomolecular processes involving cut and paste of DNA performed by restriction enzymes and a ligase. Restriction enzymes act on double stranded DNA molecules by cleaving certain recognized segments, leaving short single stranded overhangs. Molecules with the same overhangs can join (in a cross-over fashion) in the presence of a ligase enzyme. In the introductory paper, T. Head proved that if the splicing is performed by a finite set of certain simple rules, then splicing of a finite set of words can generate the class of strictly locally testable languages [9]. The splicing notion was reformulated by G. Paun at a less restrictive level of generality, giving rise to the splicing operation that is commonly adopted and is now standard [17].
Theoretical results in splicing systems have contributed to new research in formal language theory focused on modeling of biochemical processes [18]. On the other hand, the field has suggested new ideas in the framework of biomolecular science, for example, the design of automated enzymatic processes. In this paper, we focus on finite splicing systems, called here simply splicing systems. A splicing system is meant to have a finite set of rules (modeling enzymes) applied on a finite set of initial strings (modeling DNA sequences). A splicing system (or H-system) is a triple
Correspondence to: Paola Bonizzoni,
[email protected].
H = (A, I, R), where A is a finite alphabet, I ⊆ A* is the initial language and R is the set of rules (see Section 4 for the definitions). The formal language generated by the splicing system is the smallest language containing I and closed under the splicing operation. There have been successes in characterizing certain subclasses of splicing languages, for example those generated by reflexive rules and those generated by symmetric rules [2]. Reflexivity and symmetry are natural properties for splicing systems because they ensure splicing of molecules cut with the same enzyme, as well as recombination of molecules resulting from the same type of cut [12]. The formal language of a general splicing system may have a set of rules R that is neither symmetric nor reflexive. Under the formal model, a splicing system is a generative mechanism for a language which belongs to a proper subclass of the regular languages. This basic result was first proved in [8], and later proved in several other papers by using different approaches (see for example [19,21]).
In spite of the vast literature on the topic, a structural characterization of the finite splicing systems is still an open problem, although decidability of regular splicing languages has been recently proved in [15]. On the other hand, progress has been made towards the characterization of certain subclasses of splicing systems. The authors of [11] prove that it is decidable whether a regular language is a reflexive splicing language and provide an example of a regular splicing language that is neither reflexive nor symmetric. A quite different characterization of reflexive symmetric splicing languages is given in [3] and it has been extended to the general class of reflexive regular languages in [4,5]. This characterization has been given by using the concept of a constant of a language introduced by Schützenberger [20].
In order to solve the open problem of characterizing the whole class of splicing languages, it seems necessary to understand the role of constants. Indeed, since the introduction of splicing languages it has been conjectured, and stated more formally in [10] and in [11], that existence of a constant is a necessary condition for a regular language to be splicing. In this paper we solve this longstanding open question by proving this conjecture true. This result is proved by investigating structural properties of connected components of the transition graph given by the minimal finite state automaton for a regular splicing language. More precisely, properties of the factor language of transitive components are related to the notion of synchronizing words [7]. Synchronizing words have been studied in automata theory for a long time and are of interest in both coding theory [1] and symbolic dynamics [16,14]. Our proof uses an old observation that a synchronizing word for an automaton is a constant for the language recognized by the automaton [20].
The paper is organized as follows. In Section 2 we introduce preliminary concepts, including the notion of a synchronizing word and a constant. In Section 3 we introduce the notion of a transitive automaton and a path-automaton, and show several results connecting terminal components of automata and synchronizing words. Moreover, we show a relationship between transitive languages, transitive automata, transitive components, and constants of the language. Then in Section 4 we recall the basic notion of a splicing system and revisit the notion of splicing rules of a splicing system by providing properties that are necessary in proving the main result of the paper. Finally in Section 5 we give examples of non-reflexive splicing languages, show a relationship between transitive languages and splicing languages and we prove the main result of the paper. A preliminary extended abstract of this paper appeared in [6].
2. Preliminaries
We refer the reader to [13] for the background on automata theory, and assume some familiarity with the subject. Let A* be the free monoid over a finite alphabet A and let A+ = A* \ {1}, where 1 is the empty word. A deterministic finite state automaton (DFA) is a 5-tuple $\mathcal{A}$ = (Q, A, I, T, δ), where Q is a finite set of states, I ⊆ Q is the set of initial states, T ⊆ Q is the set of terminal (final) states and δ ⊆ Q × A × Q is the set of transitions such that for every q ∈ Q and every a ∈ A the set {q′ | (q, a, q′) ∈ δ} consists of at most one element. Given a deterministic finite state automaton $\mathcal{A}$, the set of transitions defines a partial action of A* on Q. It is generated by the partial maps a : Q → Q for a ∈ A defined by a(q) = q′ iff q′ ∈ Q is the unique state with (q, a, q′) ∈ δ. We use the standard notation qa to denote q′. If such q′ does not exist, we write qa = ∅. Inductively, we extend the notation to words with qwa = (qw)a. Similarly, we write Qw for the image of the set Q under the map w : Q → Q defined by w(q) = qw. If qa is defined for all q ∈ Q and a ∈ A we say that $\mathcal{A}$ is complete. A deterministic finite state automaton is usually depicted as a directed graph with vertices Q and a set of directed edges δ. For an edge e = (q, a, q′) we say that q is its “start” state, q′ is its “end” state (also referred to as an end-point) and a is its label. A word w is accepted by an automaton $\mathcal{A}$ if there is a path with label w that starts at an initial state and ends at a terminal state. We denote with L($\mathcal{A}$) the language recognized by $\mathcal{A}$, that is, the set of all words accepted by $\mathcal{A}$ [13]. Given a regular language L ⊆ A* it is well-known that there is a unique minimal complete deterministic finite state automaton (mDFA) $\mathcal{A}$ = (Q, A, {q0}, T, δ) that recognizes L such that all other complete DFA with one initial state that recognize L map homomorphically onto $\mathcal{A}$ [13]. This automaton is unique up to possible renaming of the states, i.e., up to an isomorphism.
We reserve the notation $\mathcal{A}(L)$ to denote this automaton. Given a language L, the language F(L) is the set of all factors of words in L, where x is a factor of a word w if w = zxy for z, y ∈ A*. We say L is factor-closed if F(L) = L. The right context of a word w ∈ A* with respect to a language L is defined as $R_L(w)$ = {x ∈ A* | wx ∈ L}. Symmetrically, the left context of w with respect to L is the set $L_L(w)$ = {x ∈ A* | xw ∈ L}.
The right context of a state q in $\mathcal{A}$ is $R_{\mathcal{A}}(q)$ = {x ∈ A* | qx ∈ T}. An automaton $\mathcal{A}$ is said to be reduced if there are no two states in $\mathcal{A}$ with the same right context. Observe that the right context depends only on the terminal states in the automaton. In other words, if the initial state(s) are changed in $\mathcal{A}$ but the transitions and the set of terminal states remain, the right contexts of the states do not change. It is well-known (see, e.g., [13]) that given a regular language L, there is a one-to-one correspondence between the right contexts of words with respect to L and the right contexts of the states in the minimal deterministic finite state automaton for L, i.e.,
In fact, in the mDFA $\mathcal{A}(L)$, it also holds that $R_L(w) = R_{\mathcal{A}}(q)$ iff $R_L(wa) = R_{\mathcal{A}}(qa)$ for all a ∈ A, and therefore $R_{\mathcal{A}}(q) = R_{\mathcal{A}}(q')$ implies q = q′.
When the language and the DFA are fixed, we drop the subscripts and write R(w) and R(q).
Note that every state in an mDFA is accessible, i.e., for each state q ∈ Q there is an x ∈ A* such that q0x = q. A state q is co-accessible if R(q) ≠ ∅. In an mDFA, there is at most one state that is not co-accessible, since for each q ∈ Q, there is u ∈ A* such that qu ∈ T iff R(q) ≠ ∅. If such a state in $\mathcal{A}$ exists, we call it zero and denote it with z. A trimmed mDFA for language L is the DFA obtained from the mDFA for L by erasing the state z and all transitions that terminate in z. The trimmed mDFA is denoted trim $\mathcal{A}$. More generally, a trimmed DFA is an automaton in which all states are both accessible and co-accessible.
Finally, for a finite set S, by #S, we denote the cardinality of the set S.
Definition 1—Given a DFA $\mathcal{A}$ and a state q of the automaton, the set of follower words for q relative to $\mathcal{A}$ is the set $F_{\mathcal{A}}(q)$ = {x | qx ≠ ∅}. For states q and q′ of $\mathcal{A}$, we say that they are follower-equivalent if $F_{\mathcal{A}}(q) = F_{\mathcal{A}}(q')$. For a state q the set of states in $\mathcal{A}$ that are follower-equivalent to q is denoted μq($\mathcal{A}$).
For a state q of $\mathcal{A}$ we say that it is minimal-follower with respect to $\mathcal{A}$ if whenever $F_{\mathcal{A}}(q') \subseteq F_{\mathcal{A}}(q)$ for a state q′ of $\mathcal{A}$, it implies that q and q′ are follower-equivalent.
Recall the definition of a constant of a language L introduced by Schützenberger in [20].
Definition 2—A word w ∈ A+ is a constant of a language L if w is a factor of some word in L and for all words u1, u2, v1, v2 in A* we have:
$$u_1 w u_2 \in L \ \text{and}\ v_1 w v_2 \in L \ \Longrightarrow\ u_1 w v_2 \in L.$$
A characterization of constants, which is more or less folklore, is stated below.
Proposition 1—Let L ⊆ A* be a regular language and let $\mathcal{A}$ be the mDFA recognizing L. A word w ∈ A+ is a constant of L if and only if Qw \ {z} is a singleton, i.e., there is a unique non-zero state $q_w$ such that qw ≠ z implies qw = $q_w$ for all q ∈ Q.
Suppose w is a label of a path in a finite state automaton. If for a word w there is a state $q_w$ such that every path in the automaton with label w terminates in $q_w$, we say that w is a synchronizing word and we say that $q_w$ is a synchronizing state, synchronized by w. By Proposition 1, in a trimmed mDFA, trim $\mathcal{A}$, of a regular language L, the set of synchronizing words for trim $\mathcal{A}$ coincides with the set of constants of L. In general, if w is a synchronizing word for an automaton $\mathcal{A}$ then it is a constant for the language recognized by $\mathcal{A}$.
The context of w with respect to L is the set $C_L(w)$ = {(u, v) | u, v ∈ A*, uwv ∈ L}. We define the left projection of the context of w (resp. right projection) as the set (respectively ). A constant w of L defines a constant language Const(w) with respect to the language L with the set . Given two constants w1 and w2 of L, a split language for w1 and w2 with respect to L is a language where is a prefix (possibly empty) of w1 and is a suffix (possibly empty) of w2.
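The link between constants and synchronizing words stated in Proposition 1 can be tried out on small examples. The following Python sketch rests on assumptions that are not part of the paper: the dictionary encoding of a trimmed DFA, the helper names, and the two toy languages (words over {a, b} with no factor bb, and words of even length over {a}). The function `counterexample` searches only contexts of bounded length, so it can refute that a word is a constant but cannot prove it.

```python
from itertools import product

def image(delta, states, word):
    """Set of states reached from `states` by reading `word` (partial DFA)."""
    current = set(states)
    for a in word:
        current = {delta[(q, a)] for q in current if (q, a) in delta}
    return current

def is_synchronizing(delta, states, word):
    """True when every path labeled `word` ends in one and the same state."""
    return len(image(delta, states, word)) == 1

def counterexample(accepts, alphabet, w, max_len):
    """Look for u1, v2 (up to max_len) violating the implication of Definition 2."""
    ws = ["".join(t) for n in range(max_len + 1)
          for t in product(alphabet, repeat=n)]
    contexts = [(u, v) for u in ws for v in ws if accepts(u + w + v)]
    for u1, _ in contexts:
        for _, v2 in contexts:
            if not accepts(u1 + w + v2):
                return (u1, v2)
    return None

# Toy language 1: no factor "bb"; trimmed mDFA with states 0 and 1.
no_bb = lambda s: "bb" not in s
delta_no_bb = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0}

# Toy language 2: words a^n with n even; the letter a swaps the two states.
even = lambda s: set(s) <= {"a"} and len(s) % 2 == 0
delta_even = {(0, "a"): 1, (1, "a"): 0}

print(is_synchronizing(delta_no_bb, {0, 1}, "b"), counterexample(no_bb, "ab", "b", 2))
print(is_synchronizing(delta_even, {0, 1}, "a"), counterexample(even, "a", "a", 2))
```

In the first language the word b is synchronizing and no violating context is found; in the second no power of a is synchronizing, and the search returns a pair of contexts that breaks the implication.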
3. Transitive components and synchronizing words
In this section we provide structural characterizations of transitive components in a minimal DFA using the notion of synchronizing words. We define the notions of a transitive automaton and of a path-automaton, and give properties that are used to prove the main result of the paper. We first introduce definitions and properties that are used in the rest of the paper.
Recall the notion of a transitive component in a deterministic automaton. A strongly connected component of the directed graph for a deterministic automaton $\mathcal{A}$ is called a transitive component for $\mathcal{A}$. If in a transitive component, every edge that starts at a state in this component also ends in the same component, then the transitive component is called terminal. For every state in the mDFA of a language L, there is a path that leads from that state to a terminal component. For a transitive component $\mathcal{C}$, we say that $\mathcal{C}$ is induced by q if q is a state in $\mathcal{C}$. We write L($\mathcal{C}$) for the set of labels of all paths in $\mathcal{C}$ and say that $\mathcal{C}$ recognizes L($\mathcal{C}$). A transitive component $\mathcal{C}$ is called trivial if L($\mathcal{C}$) = {1}. A language L is said to be transitive if for every pair of words u, v ∈ L there is a word w ∈ A* such that uwv ∈ L. Note that for a transitive component $\mathcal{C}$ the language L = L($\mathcal{C}$) is transitive.
Remark 1—Notice that if $\mathcal{C}$ is a transitive component, then L($\mathcal{C}$) is factor-closed, i.e., F(L($\mathcal{C}$)) = L($\mathcal{C}$).
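Transitive components are the strongly connected components of the transition graph, so they can be computed with any standard graph routine. The Python sketch below uses a simple mutual-reachability test rather than a linear-time algorithm; the encoding, the function names and the four-state example automaton are assumptions made for illustration only.

```python
def reachable(delta, q):
    """States reachable from q (including q itself) in the transition graph."""
    seen, stack = {q}, [q]
    while stack:
        p = stack.pop()
        for (s, _letter), t in delta.items():
            if s == p and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def transitive_components(delta, states):
    """Group states into classes of mutually reachable states."""
    reach = {q: reachable(delta, q) for q in states}
    components = []
    for q in sorted(states):
        comp = frozenset(p for p in states if p in reach[q] and q in reach[p])
        if comp not in components:
            components.append(comp)
    return components

def is_terminal(delta, comp):
    """A component is terminal when no transition leaves it."""
    return all(t in comp for (s, _letter), t in delta.items() if s in comp)

# Hypothetical automaton: 0 -a-> 1, a b-loop at 1, 1 -c-> 2, and a 2-cycle on {2, 3}.
delta = {(0, "a"): 1, (1, "b"): 1, (1, "c"): 2, (2, "a"): 3, (3, "a"): 2}
comps = transitive_components(delta, {0, 1, 2, 3})
print(comps)
print([c for c in comps if is_terminal(delta, c)])
```

In this example the state 0 forms a trivial component, the looping state 1 a non-terminal one, and {2, 3} is the only terminal component.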
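Two transitive components $\mathcal{C}$ and $\mathcal{C}'$ are called factor-equivalent if L($\mathcal{C}$) = L($\mathcal{C}'$). In the following we often use the term component to denote a transitive component. A component $\mathcal{C}$ is said to be maximal for a collection of components C if for every transitive component $\mathcal{C}'$ in C, we have that L($\mathcal{C}$) ⊆ L($\mathcal{C}'$) implies L($\mathcal{C}$) = L($\mathcal{C}'$). Analogously, a transitive component $\mathcal{C}$ is called minimal for a collection C if whenever L($\mathcal{C}'$) ⊆ L($\mathcal{C}$) for a component $\mathcal{C}'$ in C, we have L($\mathcal{C}'$) = L($\mathcal{C}$).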
3.1. Transitive automata
In this section we relate the notion of a synchronizing word to properties of a transitive automaton. An automaton is called transitive if it consists of only one transitive component.
Remark 2—Note that if $\mathcal{A}$ is transitive, then L($\mathcal{A}$) is also transitive. Consider two words u, v ∈ L($\mathcal{A}$). There are initial states q0 and q0′ such that q0u, q0′v ∈ T. Since $\mathcal{A}$ is transitive, there is a word w that is a label of a path from q0u to q0′ in $\mathcal{A}$, so uwv ∈ L($\mathcal{A}$).
Example 3.1—Consider the example shown in Fig. 1. This language is transitive, the automaton is reduced and deterministic, hence it is the mDFA for the language. However, there is no deterministic transitive automaton that recognizes this language. Notice that this language has no constants.
Remark 3—If L is transitive such that L = L($\mathcal{C}$) for a transitive component $\mathcal{C}$, then for each state q in $\mathcal{C}$, $F_{\mathcal{C}}(q) = R_{\mathcal{C}}(q)$ since all states in $\mathcal{C}$ are terminal.
We consider several observations about transitive automata, transitive components and languages. The following observations are proved in [14] (see also [16]):
Lemma 2—For every regular factor-closed transitive language L there is a unique minimal deterministic transitive automaton recognizing L.
Lemma 3—For a regular factor-closed transitive language L and its unique minimal deterministic transitive automaton $\hat{\mathcal{A}}$ the following properties hold:
i. Every state in $\hat{\mathcal{A}}$ is synchronizing.
ii. A word w ∈ L is a constant for L if and only if w is synchronizing for $\hat{\mathcal{A}}$.
iii. Every two states q̂ and p̂ in $\hat{\mathcal{A}}$ (q̂ ≠ p̂) are not follower-equivalent.
iv. For every transitive DFA $\mathcal{A}$ with L($\mathcal{A}$) = L there is an onto homomorphism ϕ : $\mathcal{A}$ → $\hat{\mathcal{A}}$ such that for every state q̂ in $\hat{\mathcal{A}}$, $F_{\mathcal{A}}(q) = F_{\hat{\mathcal{A}}}(\hat{q})$ for each q ∈ ϕ−1(q̂).
Observe that if a state q of a transitive component $\mathcal{C}$ is synchronizing, then all states in $\mathcal{C}$ are synchronizing.
Consider the action of A* on the set of states Q of $\mathcal{A}$. In order to simplify the notation, the image of the set Q under w is denoted as $\mathcal{A}$w instead of Qw and moreover we say that q is a state of $\mathcal{A}$ if q ∈ $\mathcal{A}$.
Remark 4—If c is a constant of L($\mathcal{A}$) for a transitive automaton $\mathcal{A}$, and $\hat{\mathcal{A}}$ is the minimal transitive deterministic automaton for L($\mathcal{A}$) such that c synchronizes $\hat{\mathcal{A}}$ onto q̂, then by Remark 3 and Lemma 3(ii–iv) every state q in $\mathcal{A}$c maps with ϕ onto q̂, and has the same follower set as q̂. We say that q is follower-equivalent to q̂. In particular, if c is a constant such that qc = q in $\mathcal{A}$, then q̂c = q̂ in $\hat{\mathcal{A}}$, and for every q ∈ $\mathcal{A}$c, the state qc is in $\mathcal{A}$c and is follower-equivalent to q̂, and to q.
Remark 5—If q is a state in a transitive automaton $\mathcal{A}$ and $\hat{\mathcal{A}}$ is the minimal transitive deterministic automaton as in Lemma 3 with q̂ ∈ $\hat{\mathcal{A}}$ follower-equivalent to q, then for all q′ ∈ μq($\mathcal{A}$) there are constants c, c′ ∈ L($\mathcal{A}$) such that q̂c = q̂c′ = q̂, q′c = q and qc′ = q′. Take any constant c1 ∈ L($\mathcal{A}$) such that q̂c1 = q̂. Then by transitivity there are x, y such that qc1x = q′ and q′c1y = q. By Remark 4, c = c1y and c′ = c1x are the constants sought.
If $\mathcal{A}$ is a transitive deterministic automaton without a synchronizing word, then there is some k ≥ 2 such that #$\mathcal{A}$w ≥ k for every word w that labels a path in $\mathcal{A}$. We call the smallest value of #$\mathcal{A}$w over all such words the degree of $\mathcal{A}$. If k is the degree, a word w such that #$\mathcal{A}$w = k is called k-synchronizing. Therefore, the minimal transitive DFA $\hat{\mathcal{A}}$ for L($\mathcal{A}$) has degree 1, and all constants coincide with (1-)synchronizing words. It follows from Lemmas 2 and 3 that if $\mathcal{A}$ has degree k and w is a constant that is k-synchronizing, then all states in $\mathcal{A}$w are follower-equivalent. Moreover, in that case for all x ∈ A* with $\mathcal{A}$wx ≠ ∅, wx is also k-synchronizing.
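The degree of a small transitive automaton can be computed by exploring the images of the full state set under all words. The Python sketch below is one possible way to do this, by breadth-first search over subsets of states; it is exponential in the worst case, and the encoding, the function names and both example automata are hypothetical.

```python
from collections import deque

def step(delta, subset, a):
    """Image of a set of states under one letter (undefined moves are dropped)."""
    return frozenset(delta[(q, a)] for q in subset if (q, a) in delta)

def degree(delta, states, alphabet):
    """Smallest size of a non-empty image of `states` under a word."""
    start = frozenset(states)
    seen, queue, best = {start}, deque([start]), len(start)
    while queue:
        subset = queue.popleft()
        best = min(best, len(subset))
        for a in alphabet:
            nxt = step(delta, subset, a)
            if nxt and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return best

def is_k_synchronizing(delta, states, word, k):
    """True when the image of `states` under `word` has exactly k elements."""
    current = frozenset(states)
    for a in word:
        current = step(delta, current, a)
    return len(current) == k

# A 3-cycle on one letter: no word merges states.
cycle = {(1, "a"): 2, (2, "a"): 3, (3, "a"): 1}
# Trimmed automaton for words with no factor "bb": the letter a merges both states.
no_bb = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0}

print(degree(cycle, {1, 2, 3}, "a"))
print(degree(no_bb, {0, 1}, "ab"))
```

A result of 1 means the automaton has a synchronizing word; any larger value is the degree in the sense used above.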
The following lemma relates the right contexts of states in a transitive automaton reached by reading a word w that is k-synchronizing.
Lemma 4—Let $\mathcal{A}$ = (Q, A, Q, Q, δ) be a transitive DFA with degree k ≥ 2 and let $\mathcal{A}'$ = (Q, A, I, T, δ) be a reduced DFA obtained from $\mathcal{A}$ by choosing a subset of states I as initial states, and some proper subset T of Q as terminal states. If w is k-synchronizing and q, q′ ∈ $\mathcal{A}$w with q ≠ q′, then $R_{\mathcal{A}'}(q) \setminus R_{\mathcal{A}'}(q')$ ≠ ∅.
Proof: Let k ≥ 2 be the degree of $\mathcal{A}$, w a k-synchronizing word, and q1,
k-synchronizing, for all words z ∈ A*, either both q1z, rest of the proof we drop the subscript
from
are undefined or
. Suppose
. Since
because otherwise, if
is transitive, there is x2 such that
Again, similarly as with q2 and and
. Denote
.
with q3, and set
.
which implies that both
are in T. In fact,
. We continue in this way where
and consider the pairs of states
. Since
is
for some i < j. But . Because
contradiction with the assumption that
.
. We have that
, we have that
finite, there are i and j such that
is
then
which is a contradiction with our assumption that Since
. In the
and q1x1 ∉ T. Set q2 = q1x1 and
reduced, there is a word x1 such that Then
. Since w is
is reduced. Therefore,
, this is a .
Example 3.2—Consider the reduced automaton in Fig. 2(a). It contains two terminal components that are mutually factor-equivalent, both recognizing a*. Moreover, the states q1, q2 and q3 in Fig. 2(a) are follower-equivalent, and the component that contains these three states has degree 3. Every word $a^k$ for k ≥ 0 is 3-synchronizing. Consider the automaton that consists of {q1, q2, q3} with q1 being an initial and also the terminal state. Then R(q1) = $(a^3)^*$, R(q2) = $aa(a^3)^*$ and R(q3) = $a(a^3)^*$.
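The right contexts listed in Example 3.2 can be checked mechanically. Since Fig. 2(a) is not reproduced here, the sketch below assumes that the three states form the cycle q1 → q2 → q3 → q1 on the letter a, which is the only arrangement consistent with the right contexts stated in the example; the function names are hypothetical.

```python
# Assumed transitions of the three-state component (inferred from the stated right contexts).
delta = {("q1", "a"): "q2", ("q2", "a"): "q3", ("q3", "a"): "q1"}
terminal = {"q1"}
states = {"q1", "q2", "q3"}

def act(q, word):
    """Return q·word, or None when the action is undefined."""
    for a in word:
        q = delta.get((q, a))
        if q is None:
            return None
    return q

def right_context_lengths(q, max_len):
    """Lengths n <= max_len such that a^n belongs to the right context of q."""
    return [n for n in range(max_len + 1) if act(q, "a" * n) in terminal]

def image_size(word):
    """Number of distinct states reached from all three states by `word`."""
    return len({act(q, word) for q in states} - {None})

for q in sorted(states):
    print(q, right_context_lengths(q, 7))
print([image_size("a" * k) for k in range(5)])
```

The three lists of lengths are pairwise disjoint, so each right context contains a word missing from each of the others, in line with Lemma 4, and every power of a keeps all three states apart.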
Example 3.3—The transitive component $\mathcal{C}$ of the automaton $\mathcal{A}$ in Fig. 2(b) has degree 2. All words that end with symbol a are labels of paths that end in states q2 and q3, and all words that end with symbol b are labels of paths that end in states q1 and q2. The action of c on the states of $\mathcal{C}$ is the identity. Hence every word that contains symbols a or b is 2-synchronizing. Note that all states are follower-equivalent but $\mathcal{A}$ is reduced. Moreover, a ∈ R(q1) \ R(q2) and aa ∈ R(q2) \ R(q1), also aa ∈ R(q2) \ R(q3) and a ∈ R(q3) \ R(q2). However, c ∈ R(q1) \ R(q3), but R(q3) ⊂ R(q1). This last condition does not violate Lemma 4 because there are no 2-synchronizing words that label paths ending at states q1 and q3.
3.2. Path-automata
In this section we provide structural characterizations of path-automata that do not have synchronizing words. More precisely, we show that a path-automaton having no synchronizing words has a unique maximal component, which is the terminal one, whose language contains all factors of the language accepted by the path-automaton.
Definition 3 (Path-automaton)—An automaton $\mathcal{A}$ with an initial state q0 is called a path-automaton if the following is satisfied:
i. There is at most one transition in $\mathcal{A}$ which starts at the component induced by q0 and terminates in another component.
ii. There is only one terminal transitive component in $\mathcal{A}$.
iii. For every transitive component $\mathcal{C}$ which does not contain q0 there is precisely one transition that starts in a state outside $\mathcal{C}$ but terminates in $\mathcal{C}$, and if $\mathcal{C}$ is not terminal, there is precisely one transition that starts at a state in $\mathcal{C}$ but terminates in a state outside $\mathcal{C}$.
Let $\mathcal{A}$ be a path-automaton and $\mathcal{C}$ one of its transitive components. The state of $\mathcal{C}$ that is the end point of the transition starting outside $\mathcal{C}$ but ending at $\mathcal{C}$ is called the entrance state for $\mathcal{C}$ and the state that is the start point of a transition that starts in $\mathcal{C}$ but terminates outside $\mathcal{C}$ is called the exit of $\mathcal{C}$. The initial component of $\mathcal{A}$ has no entrance, and the terminal component has no exit. A path π from an initial state in an automaton $\mathcal{A}$ to a terminal component in $\mathcal{A}$ induces a path-automaton which consists of all transitive components in $\mathcal{A}$ induced by states visited by π.
Let $\mathcal{C}$ be a terminal component of the path-automaton $\mathcal{A}$, and let q be the entrance of $\mathcal{C}$. We define the language accepted by the component $\mathcal{C}$ induced by the path π, denoted by $L_\pi(\mathcal{C})$, as the language accepted by the automaton $\mathcal{C}$ with initial state q.
Lemma 5—Every trimmed deterministic path-automaton with two transitive components, and whose terminal component is trivial, has a synchronizing word.
Proof: Let q0 be an initial state for a path-automaton $\mathcal{A}$ with two transitive components having a trivial terminal component. Let x be the label of the edge starting from the initial component (say state s) and ending at the terminal component (say state t). Let $\mathcal{C}$ be the initial component for $\mathcal{A}$, and let k be the degree of $\mathcal{C}$. Let w be k-synchronizing for $\mathcal{C}$. Because $\mathcal{C}$ is transitive, we can extend w such that there is a path with label w that ends at s, i.e., #$\mathcal{C}$w = k and s ∈ $\mathcal{C}$w. Since k is the degree of $\mathcal{C}$, for all z ∈ A*, either wz ∉ L($\mathcal{C}$) or $\mathcal{C}$wz ⊆ $\mathcal{C}$ with #$\mathcal{C}$wz = k. As $\mathcal{A}$ is deterministic, there is only one edge starting at s with label x and this edge leads outside $\mathcal{C}$. Therefore wx ∉ L($\mathcal{C}$), but wx ∈ F(L($\mathcal{A}$)). Hence wx is a synchronizing word that synchronizes onto t.
We first give a technical lemma that is used later.
Lemma 6—Given a deterministic path-automaton $\mathcal{A}$, let $\mathcal{C}$ be the terminal component of $\mathcal{A}$. If $\mathcal{A}$ has no synchronizing word, then $\mathcal{C}$ is a unique maximal transitive component in $\mathcal{A}$.
Proof: Assume
$\mathcal{A}$ is an automaton with transition function δ that has no synchronizing words. Let $\mathcal{C}_1, \ldots, \mathcal{C}_k = \mathcal{C}$ be the transitive components of $\mathcal{A}$. For each component $\mathcal{C}_i$ (i = 2, …, k − 1) consider its entrance state and its exit state. For i = 1 we only have the exit state and for i = k we only have the entrance state; we take the initial state q0 as the entrance state of $\mathcal{C}_1$, and a fixed terminal state q′ as the exit state of $\mathcal{C}_k$. For i = 1, …, k − 1, let xi be the label of the transition from the exit state of $\mathcal{C}_i$ to the entrance state of $\mathcal{C}_{i+1}$. Consider L($\mathcal{C}_1$), …, L($\mathcal{C}_k$). Because these languages are all transitive, there is a maximal transitive language among them. Assume L1, …, Ls are all distinct maximal transitive languages such that for each j = 1, …, s, there is a transitive component $\mathcal{C}_i$ with Lj = L($\mathcal{C}_i$). Then for each i = 1, …, s, there are words wi ∈ Li such that wi ∉ Lj if i ≠ j (as Li is maximal, for each j ≠ i there is wij ∈ Li \ Lj, and due to the transitivity of Li, there are zij such that wi = wi1zi1wi2 ··· zis−1wis−1). Note that for each language Li there might be several transitive components that recognize it.
We consider words yi (i = 1, …, k) such that yi is a label of a path in $\mathcal{C}_i$ from the entrance state to the exit state of $\mathcal{C}_i$, chosen in the following way. (i) If L($\mathcal{C}_i$) = Lj is a maximal transitive language, then wj is a factor of yi, and yi is a constant for L($\mathcal{C}_i$) which uniquely determines the follower-equivalence class of the exit state of $\mathcal{C}_i$. This is always possible by Lemma 3 and the transitivity of L($\mathcal{C}_i$). (ii) If L($\mathcal{C}_i$) is not a maximal transitive language, then yi is a label of the shortest path between the entrance state and the exit state of $\mathcal{C}_i$.
Consider the word y1x1 ··· yk−1xk−1yk. Let p be the smallest index of 1, …, k such that L($\mathcal{C}_p$) is maximal transitive and r be the largest index such that L($\mathcal{C}_r$) is maximal transitive. Then u = ypxpyp+1 ··· xr−1yr is a label of a path that starts at a maximal transitive component, visits all maximal transitive components, and terminates at the last maximal transitive component. Since $\mathcal{A}$ has no synchronizing words, there must be at least one more path in $\mathcal{A}$ with label u. But, by the choice of yi and Lemma 5, every path with label ypxp must start in a transitive component recognizing L($\mathcal{C}_p$) and must have a transition with label xp leading outside the component, because ypxp ∉ L($\mathcal{C}_p$). Let i1, i2, …, iν be all indexes between p and r such that i1 = p, iν = r and L($\mathcal{C}_{i_j}$) is a maximal transitive language for each j. By the choice of the yi's, yi1 = yp, yi2, …, yiν = yr uniquely determine the languages of the transitive components $\mathcal{C}_{i_1}, \ldots, \mathcal{C}_{i_\nu}$, that is, yij ∈ L($\mathcal{C}_{i_j}$) but yij ∉ L($\mathcal{C}_{i_t}$) if j ≠ t. Therefore, there is a one-to-one correspondence between the order of appearance of yp, yp+1, …, yr in u and the order of the maximal
transitive components. Hence, the only possibility for existence of another path with label u is if such a path also starts at $\mathcal{C}_p$. Although there might be many paths with label yp in $\mathcal{C}_p$, by Lemma 3 they all end at follower-equivalent states, and due to determinism, there is at most one of those states that is the start of a transition with label xp, and that is the exit state of $\mathcal{C}_p$ (by Lemma 5, ypxp is synchronizing). Hence, u (or uxr) is a synchronizing word unless p = r = k, i.e., xp does not exist. As we assumed that there are no synchronizing words for $\mathcal{A}$, there is at most one maximal transitive language and it must be recognized by the terminal transitive component.
The following proposition characterizes a path-automaton with no synchronizing words.
Proposition 7—Given a deterministic path-automaton $\mathcal{A}$, let $\mathcal{C}$ be the terminal component of $\mathcal{A}$. Then one of the following holds:
a. $\mathcal{A}$ has a synchronizing word, or,
b. F(L($\mathcal{A}$)) = L($\mathcal{C}$).
Proof: We prove the proposition by induction on the number of transitive components in the path-automaton. If $\mathcal{A}$ consists of a single component, then the proposition holds trivially as $\mathcal{A}$ = $\mathcal{C}$. Now assume that the proposition holds for all path-automata with fewer than k transitive components, and suppose that $\mathcal{A}$ has k transitive components $\mathcal{C}_1, \ldots, \mathcal{C}_k$ with $\mathcal{C}_1$ being initial and $\mathcal{C}_k$ = $\mathcal{C}$ terminal. Denote with
the entrance of
and
the exit of . Consider the
and transitive components , …, . As this path path automaton with initial state automaton has k − 1 components, by the inductive hypothesis, either has a synchronizing word, or F(L( )) = L( ). Note that L( ) ⊆ F(L( )) holds trivially, so we only consider the converse inequality. Case 1: The path automaton for the automaton of
has a synchronizing word. Let y be the synchronizing word
which consists of
with trivial terminal component consisting
. By Lemma 5, y exists, and we can assume that y synchronizes onto
synchronizing for assume that
, and since for every state q in
there is a path from
. Let w be to q, we can
. We observe that yw is synchronizing for . There is no path in
with
label y, since y is synchronizing for , hence every path in that has a label y terminates in a state in . Since w is synchronizing for , every path in with label yw terminates in a single state. Thus yw is synchronizing for and part (a) is satisfied.
Case 2: The path automaton has no synchronizing word. Then by the inductive hypothesis, F(L( )) ⊆ L( ). Assume that has no synchronizing word. We show that all words in F(L( )) appear as labels of paths in = . As in Case 1, consider which consists of with trivial terminal component consisting of path in . If there is a path in with label w then, w ∈ L( ).
. Let w be a label of a
Assume now that all paths with label w start in . If all paths with label w also end at
then, by Lemma 5, w is a factor of a word y that synchronizes onto synchronizing for , and lemma holds.
of , and hence y is
Suppose there is a path in with label w that starts at and terminates in . We observe that in this case also, has a synchronizing word. Let u be the shortest word such that w =
uxv where x is a symbol,
. Let c ∈ L( ) be a constant
and
for L( ) that fixes the follower-equivalence class of , meaning, . Such c exists by Lemma 2 and Lemma 3. By transitivity of , there is a word c′ such that cc ′u also fixes the follower equivalence class for
and is a label of a path that terminates at
. Consider cc′w = cc′uxv. Then cc′ux is synchronizing for in , by Lemma 5. But cc ′w is not synchronizing for , hence there must be another path in with label cc′w, and by our assumption, it starts in and must terminate in . Such a path must use the transition , either with a portion of the path labeled cc′ or with a portion labeled w. In the first case w is a label of a path in , hence w ∈ L( ). In the second case, there must be u′ and v′ such that cc′w = cc′u′xv′ = cc′uxv. Since u was the shortest word such that , it must be that u = u′, in which case cc′ux is synchronizing for . It is impossible that u is a proper prefix of u′ because this would imply cc′ux ⊆ which would contradict the fact that cc′ux synchronizes onto
in .
Example 3.4—The automaton in Fig. 3 is a path-automaton with no synchronizing words. It has only one terminal component which is maximal and the factors of all words in the language are labels of paths in the terminal component. This illustrates the situation (b) in Proposition 7.
The following result is used to prove the main result (Theorem 15, Section 5.3) of the paper.
Proposition 8—Let L be a regular language, x ∈ F(L) and trim $\mathcal{A}$ be the trimmed mDFA for L. At least one of the two cases holds:
i. x is a factor of a constant for L,
ii. there is a path-automaton induced by a path of trim $\mathcal{A}$ containing a path labeled x and having a non-trivial terminal transitive component with at least two states.
Author Manuscript
Proof: Let trim = (Q, A, {q0}, T, ) be the trimmed mDFA for the language L. Suppose x ∈ F(L) is not a factor of a constant, i.e., for every v, v′ ∈ A*, vxv′ is not a constant for L, and therefore not synchronizing for trim . Consider a word w such that #Q xw = min#{Q xu|u ∈ A*} and let Pw = Q xw. Since xw is not synchronizing, by Proposition 1, #Pw > 1. Then for every word u ∈ A* we have that either Q xwu = ∅ or #Q xwu = #Pw. Therefore, we can assume that all states in Pw are in terminal components of trim , (if not, we can concatenate w with words that are labels of paths that lead to terminal components). If all terminal components in trim are trivial, then because trim is reduced, there is only one trivial terminal transitive component implying #Pw = 1, which is a contradiction with our assumption that x does not extend to a constant. Thus there must be at least one terminal
transitive component which is not trivial. If there is a state in Pw that belongs to a component that is not a single-state component, then (ii) holds. Assume to the contrary that each state in Pw is in a distinct transitive component consisting of only one state with loops at itself. Let y be a label of one of these loops. Since Pw y ≠ ∅ implies Pw y = Pw, for every q ∈ Pw we have qy = q. This means that all states in Pw are terminal, their loops must have the same labels, and therefore their right contexts are equal. Hence the states in Pw cannot be distinct in a reduced automaton. This again implies that Pw has cardinality 1, a contradiction. Hence, there must be at least one state in Pw that belongs to a terminal transitive component with at least two states.
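The choice of w in the proof above (a word minimizing #Q xw over the non-empty images) can be explored on small examples with a breadth-first search over subsets of states. The following is a minimal sketch, not the authors' procedure: the dictionary encoding of the partial transition function and the names `image` and `min_rank_extension` are assumptions made for illustration.

```python
from collections import deque


def image(states, word, delta):
    """Apply a word to a set of states of a partial DFA.

    delta maps (state, symbol) to a state; states with no outgoing
    transition for a symbol are simply dropped from the image.
    """
    current = frozenset(states)
    for symbol in word:
        current = frozenset(
            delta[(q, symbol)] for q in current if (q, symbol) in delta
        )
    return current


def min_rank_extension(states, alphabet, delta, x):
    """Return (w, Q.xw) with #Q.xw minimal among non-empty images of xw.

    Returns None when x labels no path at all. The search explores each
    reachable subset once, so it is exponential in the worst case.
    """
    start = image(states, x, delta)
    if not start:
        return None
    best_word, best_set = "", start
    seen = {start}
    queue = deque([("", start)])
    while queue:
        w, subset = queue.popleft()
        if len(subset) < len(best_set):
            best_word, best_set = w, subset
        for a in alphabet:
            nxt = image(subset, a, delta)
            if nxt and nxt not in seen:
                seen.add(nxt)
                queue.append((w + a, nxt))
    return best_word, best_set


# Hypothetical toy automaton: an a-cycle on 0, 1, 2 and a b-loop at 0 only.
toy_delta = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 0, (0, "b"): 0}
print(min_rank_extension([0, 1, 2], "ab", toy_delta, "a"))
```

If the returned set has a single element, xw is synchronizing in the sense used above; if it has more than one element, x does not extend to a synchronizing word, which is the case the proof goes on to analyze.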
4. Splicing languages and properties of splicing rules
As mentioned, in this paper we consider the general notion of the splicing operation and the splicing system given by Paun [17], as defined below.
Definition 4—A finite splicing system is a triple S = (A, I, R) where I ⊂ A* is a finite set of strings, called the initial language, and R is a finite set of splicing rules of the form r = (u1, u2) (u3, u4), with ui ∈ A* for i = 1, 2, 3, 4. Given two words x = x1u1u2x2, y = y1u3u4y2, with x1, x2, y1, y2 ∈ A*, and a rule r = (u1, u2) (u3, u4), the splicing rule produces w = x1u1u4y2, denoted (x, y) ⊢r w. We also say that u1u2 and u3u4 are the splice sites of r and u1u4 is the paste site of r. To simplify the notation, in the following, by a splicing system we mean a finite splicing system.
Let L ⊆ A*. We denote σ(L) = {w ∈ A* | (x, y) ⊢r w, x, y ∈ L, r ∈ R}. The (iterated) splicing operation is defined as follows: σ^0(L) = L, σ^(i+1)(L) = σ^i(L) ∪ σ(σ^i(L)), i ≥ 0. Finally, σ*(L) = ⋃i≥0 σ^i(L).
Definition 5 (Splicing language)—Given a finite splicing system S = (A, I, R), the language L(S) = σ*(I) is the language generated by S. A language L is a splicing language if there is a splicing system S such that L = L(S).
For a word w and a set of states Q, we use notation (Q w) for ⋃q∈Q (qw).
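Definitions 4 and 5 translate directly into a few lines of code. The sketch below is an illustration under stated assumptions: rules are encoded as 4-tuples (u1, u2, u3, u4), the function names are hypothetical, and the iteration is cut off at a maximum word length, so it only approximates σ*(I) from below (a short word whose every derivation passes through longer words would be missed).

```python
def splice(x, y, rule):
    """All words w with (x, y) |-r w for the rule r = (u1, u2)(u3, u4)."""
    u1, u2, u3, u4 = rule
    site_x, site_y = u1 + u2, u3 + u4
    results = set()
    for i in range(len(x) - len(site_x) + 1):
        if not x.startswith(site_x, i):
            continue
        prefix = x[: i + len(u1)]                    # x1 u1
        for j in range(len(y) - len(site_y) + 1):
            if y.startswith(site_y, j):
                results.add(prefix + y[j + len(u3):])  # x1 u1 u4 y2
    return results


def sigma(language, rules):
    """One splicing step: every result of every rule on every pair."""
    return {
        w
        for x in language
        for y in language
        for r in rules
        for w in splice(x, y, r)
    }


def iterate_splicing(initial, rules, max_len):
    """Iterate L -> L ∪ σ(L), keeping only words of length <= max_len."""
    current = set(initial)
    while True:
        new = {w for w in sigma(current, rules) if len(w) <= max_len} - current
        if not new:
            return current
        current |= new


# Hypothetical toy system: I = {ab}, R = {(ab, 1)(a, b)}, where 1 is the empty word.
toy_rule = ("ab", "", "a", "b")
print(sorted(iterate_splicing({"ab"}, [toy_rule], 5)))
```

For this toy system each splicing step appends one more b, so the bounded iteration collects the words ab, abb, abbb and abbbb.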
Definition 6 (Paste site at p)—Let be a DFA for a regular splicing language L. The word u1u4 is said to be a paste site at a state p ∈ Q for a splicing rule r = (u1, u2)(u3, u4) if (Q u3u4) ⊆ (pu1u4) and pu1u2 ≠ ∅.
More precisely, the notion of a paste site at a state p is used to identify states of the automaton where a rule can be applied. Fig. 4 depicts the situation for a paste site at state p. The dotted path with label u3 may not exist in the automaton, but the right context of qu3u4 (wherever a path with such a label exists) must be included in the right context of pu1u4. In what follows we assume that every splicing system is such that all rules are applied at least once during the generation of the splicing language. The following lemma shows an
equivalence between splicing systems with respect to the extension of sites and paste sites of rules.
Lemma 9—Let S = (A, I, R) be a finite splicing system and r = (u1, u2)(u3, u4) be a splicing rule in R. Let c ∈ A*. Then L(S) is the language generated by the splicing system S′ = (A, I, R′) where R′ = R ∪ {r′} for r′ = (u1, u2)(u3, u4c).
Proof: It is clear that L(S) ⊆ L(S′) since R′ contains R. The converse also holds since whenever we have (x, y) ⊢r′ w we also have (x, y) ⊢r w.
Lemma 10—Let S = (A, I, R) be a finite splicing system and a DFA for L = L(S). If u1u4 is a paste site at state p for a rule r = (u1, u2)(u3, u4) ∈ R then for every c ∈ A* with pu1u4c ≠ ∅, u1u4c is a paste site at p for a rule r′ = (u1, u2)(u3, u4c).
Proof: Suppose that u1u4 is a paste site at state p for rule r = (u1, u2)(u3, u4), and let pu1u4c ≠ ∅. Then by Lemma 9, L is also generated by the splicing system S′ = (A, I, R′) for the set of rules R′ = R ∪ {r′} where r′ = (u1, u2)(u3, u4c). The first splice site of r′ equals that of r, thus pu1u2 ≠ ∅. It only remains to show that (Q u3u4c) ⊆ (pu1u4c). But if y ∈ (Q u3u4c) then cy ∈ (Q u3u4) ⊆ (pu1u4) and so y ∈ (pu1u4c). It follows that u1u4c is a paste site at state p for rule r′.
5. Splicing languages must have a constant
5.1. Reflexive and non-reflexive splicing languages
It is known that every language generated by a finite splicing system is regular [8,19]. More precisely, splicing languages form a proper subclass of the class of regular languages.
Recall that a splicing system S is said to be reflexive if for every rule r = (u1, u2)(u3, u4) in R, both (u1, u2)(u1, u2) and (u3, u4)(u3, u4) are rules in R. A language L is said to be a reflexive splicing language if there is a reflexive splicing system S such that L = L(S). The system S is said to be symmetric if (u1, u2)(u3, u4) being in R implies that (u3, u4)(u1, u2) is in R. The notion of a constant of a language turned out to be essential in providing a characterization of the class of reflexive regular splicing languages [11,3]. Indeed, a fundamental property of a reflexive regular splicing language L is that there exists a splicing system generating L whose rules have splice sites consisting of constants for the language L. A more precise characterization shows that the class of reflexive and symmetric splicing languages is equivalent to a class of regular languages, the so-called PA-con-split languages [3]. This result has been extended to the non-symmetric case [4]. In [4], it is shown that each language L in this class is constructed from a finite set of constants for L, as L is expressed as a union consisting of a finite set X, and a finite union of constant and split languages (see end of Section 2). The characterization is given by the following proposition.
Proposition 11. (See [4].)—A regular language L is a reflexive splicing language if and only if there is a finite set X ⊂ A*, a finite set of constants K1 of L and a finite set K2 of pairs of constants of L such that
The characterization of reflexive languages in Proposition 11 helps to describe factor-closed transitive regular languages as reflexive splicing languages. Proposition 12—If L is a factor-closed transitive regular language then L is a reflexive splicing language.
Proof: Since L is factor-closed, by Lemma 2, consider its minimal deterministic transitive automaton . By Lemma 3, all states in are synchronizing. Every word w ∈ L is a label of a path in that passes through some state q, hence w = w′w″ where w′ is in the left context of a constant that labels a path starting at q and w″ is in the right context of a constant that synchronizes onto q. Because all states in are initial and terminal, we have that and for every constant w of L. Let M be a set that consists of constants by choosing one constant mq for each state q in that is a label of a path starting and ending at the state q. Then L = ⋃mq∈M Split(mq, mq) where and splitting is performed by taking the empty prefix and the empty suffix of mq. The conclusion follows directly from the characterization in Proposition 11.
An example of a non-reflexive regular splicing language is given in [11]; this is the language L = a^+b^+a^+b^+a^+ ∪ a^+b^+a^+.
Example 5.1—The path-automaton of Fig. 5 generalizes the example of a regular non-reflexive language given in [11]. More precisely, it is possible to show, similarly as in [11], that any splicing system for the language Lk = ⋃1≤i≤k (a^+b^+)^i a^+ with k ≥ 3 must have a rule both of whose splice sites are not constants for the language. A splicing system S for Lk can be defined with an initial language Ik = ⋃1≤i≤k {a(ba)^i, a^2(ba)^i, (ab)^i a^2} and rules:
The proof that such a splicing system generates the language Lk is along the same lines as the one given in [11]. Observe that both splice sites of the rules r3,i = (a, (ba)^i)((ab)^(k−i), ab), for i > 1, are not constants for the language Lk. More precisely, rules r1,k and r2,k are used to increase the initial and final number of a's in the language a^+(ab)^k a^+, respectively. Rules r3,i are used to increase the number of a's in the (k − i)th appearance of a's in (a^+b^+)^k a^+, for i ≤ k. Similarly, rules r4,i are used to increase the number of b's in the language Lk. The rules r3,i are also used to obtain (a^+b^+)^j a^+, for j < k.
The following lemma shows another example of a non-reflexive splicing language, one whose trimmed mDFA is not a path-automaton (Fig. 6).
Lemma 13—The regular language L = b(a^3)* + cba* + da(a^3)* is a non-reflexive splicing language.
Proof: First we note that L ⊆ A*, for A = {a, b, c, d}, is a splicing language. A splicing system S = (A, I, R) for L consists of the rules R = {r1 = (cba, 1)(cb, a), r2 = (daa^3, 1)(da, 1), r3 = (b, a^3)(da, 1)} and the initial language I = {ba^3, b, cba, cb, daa^3, da}. By induction on the number k of iteration steps of splicing rules, we first show that L(S) ⊆ L. If k = 0, since I ⊆ L, the inclusion holds. Assume that w ∈ L(S) is generated with k > 0 iterations by applying a rule r to a pair of words w1, w2 ∈ L(S). The words w1, w2 are obtained with at most k − 1 iterations, so by induction w1, w2 ∈ L. Checking splice sites in w1 and w2 for all of the rules, it is immediate to see that w ∈ L. In order to show that L ⊆ L(S), we observe that the language L1 = da(a^3)* is generated by rule r2 applied to words of the same language, starting from the initial words da and daa^3. Similarly, the language L2 = cba* is generated by rule r1 starting from words of the same language. The language L3 = b(a^3)* is generated by rule r3 applied to words of da(a^3)* and of b(a^3)*. Indeed, by induction on i ≥ 0 we can observe that b(a^3)^i ∈ L(S). If i = 0 or i = 1, since b, ba^3 ∈ I, the result is immediate. Otherwise, given the words b(a^3)^(i−1) ∈ L(S), for i > 1, and da(a^3)^i ∈ L(S), rule r3 immediately generates the word b(a^3)^i ∈ L(S).
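The two inclusions can be sanity-checked mechanically on short words. The sketch below is an illustration only, not part of the proof: it encodes the rules and initial language of the system S above (with the empty word 1 written as an empty string), closes I under splicing up to a length bound, and compares the result with L. The length bound and all function names are assumptions; the bounded closure is compared with the words of L of length at most 13 because deriving b(a^3)^i uses the longer word da(a^3)^i.

```python
import re

EMPTY = ""  # the paper writes the empty word as 1

# Rules r1, r2, r3 as tuples (u1, u2, u3, u4), with a^3 spelled out as aaa.
RULES = [
    ("cba", EMPTY, "cb", "a"),
    ("daaaa", EMPTY, "da", EMPTY),
    ("b", "aaa", "da", EMPTY),
]
INITIAL = {"baaa", "b", "cba", "cb", "daaaa", "da"}

# Membership test for L = b(a^3)* + cba* + da(a^3)*.
in_L = re.compile(r"b(aaa)*|cba*|da(aaa)*").fullmatch


def splice(x, y, rule):
    """All words x1 u1 u4 y2 obtained from x = x1 u1 u2 x2 and y = y1 u3 u4 y2."""
    u1, u2, u3, u4 = rule
    site_x, site_y = u1 + u2, u3 + u4
    out = set()
    for i in range(len(x) - len(site_x) + 1):
        if x.startswith(site_x, i):
            for j in range(len(y) - len(site_y) + 1):
                if y.startswith(site_y, j):
                    out.add(x[: i + len(u1)] + y[j + len(u3):])
    return out


def bounded_closure(initial, rules, max_len):
    """Close a finite set under splicing, discarding words longer than max_len."""
    current = set(initial)
    while True:
        new = {
            w
            for x in current
            for y in current
            for r in rules
            for w in splice(x, y, r)
            if len(w) <= max_len
        } - current
        if not new:
            return current
        current |= new


def words_of_L(max_len):
    """All words of L of length at most max_len."""
    candidates = (p + "a" * n for n in range(max_len + 1) for p in ("b", "cb", "da"))
    return {w for w in candidates if len(w) <= max_len and in_L(w)}


generated = bounded_closure(INITIAL, RULES, 14)
print(all(in_L(w) for w in generated), words_of_L(13) <= generated)
```

The same `splice` function also shows one instance of the obstruction used in the rest of the proof: adding the reflexive rule (b, a^3)(b, a^3) and applying it to x = ba^3 and y = cba^4 yields ba^4, which is not in L.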
Finally, notice that L is not reflexive, that is, it cannot be generated by a splicing system with reflexive splicing rules. Suppose L is a reflexive splicing language generated by a reflexive system S. We obtain a contradiction by considering the generation of words in b(a^3)*. Since the only words of L that start with b are those in b(a^3)*, and there must be splicing rules in S that generate words of the form b(a^3)^k for arbitrarily large k, there must be a rule r with splice site u1u2 that is a factor of b(a^3)*. But, because S is reflexive, S must also contain the rule (u1, u2)(u1, u2). Then this rule can be applied to x = b(a^3)^k and y = cb(a^3)^k a, for some large k > 0, to generate a word w = b(a^3)^k a ∉ L. Therefore the language L cannot be generated by reflexive rules.
5.2. Canonical and special words
The proof of the main result (Theorem 15) is based on special words in a regular splicing language L that must be generated by a splicing rule whose splice site u3u4 is a constant of L. For lack of a better name, we call these words q-canonical and k-special words.
Informally, the q-canonical word of a component is a word c such that qc = q and every such path with label c crosses all states in the component, and moreover, the word c is able to identify the language L( ) of the component .
Definition 7 (q-Canonical)—Let be an automaton and let be a component of . Let q be a state of . Then a word c ∈ A^+ such that c ∈ L( ) and qc = q is called q-canonical for with respect to if, whenever c ∈ L( ) for another component of , it holds that L( ) ⊆ L( ).
In the following we show the existence of a q-canonical word for every state in every transitive component in . We give a constructive proof based on the notion of a k-special word for L( ) as defined below.
Definition 8 (k-Special)—Let be an automaton. A word c in L( ) is k-special for the language L if every word of F(L) of length ≤ k is a factor of c.
Example 5.2—Consider the automaton of Fig. 6. Then the word a^3 is q2-canonical for the terminal component consisting of states q2, q3, q4. Given the language L = L( ), the word a^3 is k-special for the language L for k ≤ 3.
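Definition 8 can be checked by enumeration when the language is given by a small automaton. The following sketch is an illustration under assumptions: the component of Example 5.2 is taken to be a single cycle of a-transitions through q2, q3, q4 (the figure is not reproduced here, so the order of the states on the cycle is hypothetical), and the helper names are invented for this example.

```python
def path_labels(delta, states, alphabet, k):
    """All labels of paths of length at most k in a partial automaton."""
    labels = {""}
    frontier = {("", q) for q in states}
    for _step in range(k):
        frontier = {
            (w + a, delta[(q, a)])
            for (w, q) in frontier
            for a in alphabet
            if (q, a) in delta
        }
        labels |= {w for (w, _q) in frontier}
    return labels


def is_k_special(c, delta, states, alphabet, k):
    """True if c labels a path and contains every path label of length <= k."""
    if c not in path_labels(delta, states, alphabet, len(c)):
        return False
    return all(f in c for f in path_labels(delta, states, alphabet, k))


# Assumed shape of the terminal component of Fig. 6: one a-cycle on three states.
component = {("q2", "a"): "q3", ("q3", "a"): "q4", ("q4", "a"): "q2"}
component_states = ["q2", "q3", "q4"]

print(is_k_special("aaa", component, component_states, "a", 3))
print(is_k_special("aaa", component, component_states, "a", 4))
```

Under this assumption the first call succeeds and the second fails, because the path label a^4 is not a factor of a^3; this agrees with the bound k ≤ 3 stated in Example 5.2.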
Lemma 14—Given a non-trivial transitive component in a DFA , let k = (#Q)^2. Then for every state q in there is a q-canonical word of that is a k-special constant of L( ).
Proof: Let {x1,…, xn} = L( ) ∩ A^≤k. Since the component is transitive, there are y1,…, yn−1 such that x1 y1x2 ···yn−1xn ∈ L( ). Set c = x1 y1x2 ···yn−1xn. Due to transitivity, for every q ∈
there are yq and
such that
is a label of a path that starts and ends at q. By
Remark 5, yq, can be chosen so that wq is a constant. We show that wq is q-canonical. Assume that wq ∈ L( ) for some transitive component . Take the shortest word z ∈ L( ) \ L( ). Since L( ) \ L( ) = L( ) ∩ (L( ))^c can be recognized by an automaton with at most #Q ( ) · #Q ( ) ≤ k states [13], the shortest word in this language has length at most k. Thus |z| ≤ k and therefore z must be a factor of c, i.e., z must be in L( ), contradicting the existence of z.
5.3. Proof of the main result
Considering the importance of constants in the characterization of sub-classes of regular splicing languages, it has been conjectured that every splicing language must have a constant [10,11]. Our main result proves this conjecture to be true.
Theorem 15 (Main result)—If L is a regular splicing language, then L has a constant.
Example 5.3—The path-automaton of Fig. 3 has no synchronizing word (see Example 3.4) and thus the language L( ) = a*c(c*ac*a)* has no constant. By Theorem 15, L( ) is not a regular splicing language.
Example 5.4—The transitive regular language L recognized by the automaton in Fig. 1 has no constants. By Theorem 15, the language L is not a splicing language.
Example 5.5—The regular language L = b(a^3)* + cba* + da(a^3)* is another example of a non-reflexive splicing language, as proved in Lemma 13. Fig. 2(a) shows the trimmed mDFA graph for language L. Observe that not every path-automaton induced by a path in the mDFA from the initial state q0 to a terminal component necessarily has a constant of L.
Indeed, the path-automaton in Fig. 2(a) recognizing language b(a3)* ⊂ L does not have any constant of the language L because every word in b(a3)* is also a substring of a word in cba* and therefore is not a synchronizing word for the automaton of L. Given a splicing regular language L, the proof of Theorem 15 shows existence of a splicing rule r = (u1, u2)(u3, u4) such that the word u1u4 ends in a non-trivial terminal component of the trimmed mDFA trim . More precisely, u1u4 ends in a state which we show to be synchronizing for the automaton trim . Let L be a regular splicing language and let trim = (Q, A, {q0}, T, ) be the trimmed mDFA for language L. We introduce some basic notations that are used in the proof. We are interested in states of the automaton trim that are found as follows.
Consider a non-trivial terminal component that is minimal among the non-trivial terminal components in the automaton trim . If a non-trivial component does not exist, then by Proposition 8, trim must have a constant and Theorem 15 holds. Let q ∈ be a minimal-follower state with respect to and recall that with μq( ) we denote the set of states in that are follower-equivalent to q. Let C = { = , , ···, } be the set of all terminal components of the automaton that are factor-equivalent to . Consider the set (note that by Lemma 3, for each i = 1,…, k, the collection of follower sets in coincides with the collection of follower sets in ).
Then, a candidate state of trim is a state q̄ ∈ F with q̄ ∈ for some component ∈ C such that (q̄) is minimal in the following sense: for all q ∈ F, whenever (q) ⊆ (q̄), it holds that (q) = (q̄), i.e., since trim is reduced, it holds that q = q̄.
The main idea of the proof is to show that either the automaton has no non-trivial components, in which case a constant exists (see Proposition 8), or there exists a candidate state that is synchronizing for the automaton trim .
Example 5.6—Consider the automaton in Fig. 6. Observe that the minimal terminal component induced by state q2 has language L( ) = a*, with L( ) = {a^3}*, and is factor-equivalent to the component induced by the state q1. Then the set F, corresponding to the candidate component , is the set of states F = {q1, q2, q3, q4} because all these states belong to only one follower-equivalence class, which is therefore the minimal follower-equivalence class. Then the candidate states are q2, q3, q4. We will use the following lemma.
Lemma 16—Let q̄ ∈ be a candidate state and let q̄1 ∈ F ∩ . Then q̄1 is also a candidate state.
Proof: Let be the minimal deterministic transitive automaton for ∈ C from Lemma 3. Suppose q̄ ∈ is a candidate state and let q̄1 ∈ F ∩ . Let further q̂ ∈ be the followerequivalent state to q̄. Then by Remark 5 there is a constant c of L( ) such that q̂c = q̂ and q̄1c = q̄.
Let q′ ∈ F. First, suppose that q′ ∈
for some
Because c is a constant, by Remark 4,
≠ . Consider
such that
.
is follower-equivalent to q̂, so we have
.
Because q̄ is a candidate state, there are two possibilities (a) right contexts of q̄ and incomparable, i.e.,
and
, or (b)
(equality cannot hold because trim must be a Therefore, in both cases Also, if q′ ∈
(q′) ⊈
∩ F then by Lemma 4,
are
is reduced). In both cases here
. Then q′cz is a terminal state while q̄1cz is not. (q̄1). (q′) \
(q̄1) ≠ ∅.
Therefore q̄1 is a candidate state.
Before we present the details of the proof of Theorem 15 we outline the steps involved by illustrating the situation in Example 5.6 shown in Fig. 6.
i. In trim we identify a candidate state q̄ within a non-trivial component C̄ as outlined above. (For Example 5.6 we choose state q2.)
ii. We consider a q̄-canonical word c and observe that there must be a rule (u1, u2)(u3, u4) with a paste site u1u4 at a state p that lies on a path labeled wc^s x, for some s, where q0 w = q̄ and q̄x is terminal (see Fig. 7). (For Example 5.6, p = q0 and wc^s x = b(a^3)^s with w = b and c = a^3; the rule in question is r3 = (b, a^3)(da, 1).)
iii. We observe that there is a state q ∈ Q u3u4 such that, for arbitrarily large i, c^i is a factor of the right context of q. (For Example 5.6, such states are only q2, q3, q4, because q1 ∉ Q daa^3x for any x.) We choose a sufficiently large i such that for some z, all states in Q u3u4zc^i belong to non-trivial components and we set . We observe that all states in end in non-trivial components that are factor-equivalent to the non-trivial component . By Lemma 10 we obtain that is a paste site for p, given rule . (For Example 5.6, we can choose z = 1 and have a new rule r3 = (b, a^3)(da, a^3), and Q daa^3 = q2.)
iv. We show that for every , it must be , therefore is synchronizing.
We now present the proof of the main result.
Proof of Theorem 15—Let L be a regular splicing language, and let trim = (Q, A, {q0}, T, ) be its trimmed mDFA. By Proposition 8, if the automaton trim has only a trivial terminal component (note that since trim is reduced, there could be only one such component), it must have a constant and thus the theorem holds. Therefore we consider the case that trim has at least one non-trivial terminal component. i.
Consider a non-trivial terminal component that is minimal among the non-trivial terminal components in the automaton trim and let q ∈ be minimal-follower state in . Let C = { = , ···, } be the set of all terminal components of the
automaton that are factor-equivalent to and set . We choose a candidate state q̄ ∈ F in a component ∈ C.
ii. Let w ∈ A* be the shortest word such that q0w = q̄. Consider a word c which is a constant of L( ) and is q̄-canonical for . Such a word exists by Lemma 14. Then wc*x ⊆ L for some x ∈ A*. Since there is a finite number of rules in the splicing system, there are infinitely many indices s such that the words wc^s x are obtained by using the same splicing rule r = (u1, u2)(u3, u4), where u1u4 is a subword of wc^s x for every such s. More precisely, there must exist an infinite number of pairs of words v = v′u1u2 v″ ∈ L and w′u3u4 w″ ∈ L such that v′u1u4 w″ ∈ wc*x. Thus v′u1 is a prefix of wc^i x for some i ≥ 0. Let p = q0 v′, so that pu1u2 ≠ ∅. Moreover, if y″ ∈ (Qu3u4), since there is y′ such that y = y′u3u4 y″ ∈ L, by splicing words v = v′u1u2 v″ and y = y′u3u4 y″ with rule r, we obtain v′u1u4 y″ ∈ L and thus y″ ∈ (pu1u4). Therefore, (Qu3u4) ⊆ (pu1u4). We obtain that u1u4 is a paste site at state p for rule r = (u1, u2)(u3, u4). Refer to Fig. 7.
iii. In the following we show that there are states in Q u3u4 such that c^i is a factor of a word in their right context for arbitrarily large i.
Let p′ = q0 v′u1u4 where v′ is a prefix of wc*x. Since trim is deterministic and is terminal, by the choice of p′ it must be that p′ is either inside the component or otherwise lies along a path with label w from state q0 to state q̄. In the latter case, when p′ is not a state inside , v′u1u4 is a prefix of w. In this case c^i x must be a suffix of w″ in the splicing of v′u1u2 v″ ∈ L and w′u3u4 w″ ∈ L that produces v′u1u4 w″ = wc^i x. Hence, for arbitrarily large i, c^i is a factor of a word in the right context of a state q ∈ Q u3u4. Since there are infinitely many i with this property and Q u3u4 is finite, there is a state q ∈ Q u3u4 such that c^i ∈ F ( (qu3u4)) for arbitrarily large i.
Now suppose p′ is a state in (see Fig. 7). By Proposition 8, u3u4 is either a factor of a constant, which proves the statement of the theorem, or there is a path-automaton (a sub-automaton of trim ) with a non-trivial terminal component such that u3u4 is a label of a path π in . Then, by Lemma 14, there is a q-canonical word c′ for some q ∈ such that u3u4zc′ is a label of a path in for some z, that is zc′ ∈ (qu3u4). Since u1u4 is a paste site for the rule r at state p, we have that zc′ ∈ (qu3u4) ⊆ (pu1u4) = (p′) ⊆ L( ). But because c′ is q-canonical for it follows that L( ) ⊆ L( ) and by the minimality of component and since is factor-equivalent to we have that L( ) = L( ), i.e., c* ⊆ L( ). Therefore c^i is a factor of the right context of a state q ∈ Q u3u4 for arbitrarily large i.
We now consider states in Q u3u4 whose right context has words with factors c^i for arbitrarily large i. We fix i sufficiently large, such that for some z, every state in Q u3u4zc^i belongs to a non-trivial component, and for every state q̂ ∈ Q u3u4zc^i, the language of the component containing q̂ contains the word c. Given , , by Lemma 10, is a paste site at the same state p for the rule .
Observe that z can be chosen such that . If p′ = pu1u4 is not in , by the argument above, c^i x is a suffix of w″, hence we can choose z such that w″ = zc^i x, i.e., p′z = q̄. Because c is a constant for L( ) such that q̄c = q̄, by Lemma 3 and Remark 4, every state q ∈ c is follower-equivalent to q̄. Therefore, the state is follower-equivalent to q̄, and hence in F. Having q̄0 ∈ F ∩ , by Lemma 16, q̄0 is also a candidate state.
iv. Let q be a state in . We conclude with the observation that q = q̄0 and therefore is a synchronizing word for q̄0, proving the theorem. The proof of this last step consists first in showing that L( ) = L( ) where is the component in trim containing q. Then we show that is terminal and thus ∈ C. By the fact that q̄0 is a candidate state we are able to show that q̄0 = q.
We first observe that L( ) = L( ). As , by Definition 6 of paste site, it holds that
(q) ⊆ (q̄0). (*)
Since c is q̄-canonical, by Definition 7 we have that L( ) ⊆ L( ). If L( ) \ L( ) ≠ ∅ then (q) ⊈ (q̄0) which contradicts (*). Therefore it must be that L( ) = L( ), that is and are factor-equivalent.
Next we see that is terminal. Assume to the contrary that is not terminal and thus there is an edge labeled a that starts in and terminates in a state q′ outside . By Lemma 5 the automaton that consists of together with the edge labeled a ending at q′ has a synchronizing word that ends at q′. Let ua be that word. Then ua ∉ L( ) = L( ), because otherwise ua would not be synchronizing. By the transitivity of we can assume that ua is a label of a path that starts at q. This implies that ua must be a prefix of a word in (q) \ (q̄0), again contradicting (*). Consequently is terminal and in C. Moreover q ∈ F, since by the choice of the constant c, by Remark 4, q is factor-equivalent to q̄ (hence, to q̄0), and F consists of all states that are factor-equivalent to q̄ and belong to components in C. Thus by (*) (i.e., (q) ⊆ (q̄0)) and the fact that q̄0 is a candidate state, (q̄0) = (q). Because trim is reduced, q̄0 = q, which concludes the proof.
The proof of Theorem 15 is based on the effective computation of a synchronizing state in the automaton for a regular splicing language in the case of an automaton having non-trivial terminal components. As a main corollary of the above theorem we can state the following fact.
Corollary 17—Let trim be the trimmed minimal deterministic automaton recognizing a regular splicing language. Then every state in a terminal component that contains a candidate state for trim is synchronizing.
6. Concluding remarks
In this paper we solve a conjecture posed by T. Head in his seminal works on regular splicing languages about the existence of a constant as a necessary condition for a regular language to be splicing. We settle this open problem in the affirmative, by providing a constructive proof that leads to a procedure for finding a synchronizing state in an mDFA for a regular splicing language. The use of constants makes it possible to determine a necessary and sufficient condition for a regular language to be reflexive splicing [3,4]; identifying such a condition for non-reflexive splicing languages is still an open problem.
Recently, it has been proved in [15] that it is decidable whether a regular language is a splicing language, by providing an upper bound on the lengths of the words included in the splicing rules. This bound is quadratic with respect to the size of the syntactic monoid of the language. The decidability follows from the fact that the bound allows brute-force search and comparison of the given language with the splicing languages obtained through all possible finite sets of rules of a certain size. Although the existence of the algorithm was long awaited, the procedure it provides is not usable in practice. Having a practical procedure to decide whether a regular language is splicing remains a challenging open problem. We believe that finding a characterization of minimal splicing systems generating splicing languages, where minimality of the system is given in terms of both the number of splice sites of rules and the length of the splice sites, would be a promising direction for obtaining a practical decision procedure. Moreover, since splicing rules are built from constants in reflexive languages, the notions of constants and synchronizing words again seem to be vital for answering most of the above questions.
Acknowledgments
We thank the reviewers for numerous valuable comments. P. Bonizzoni is partially supported by MIUR PRIN 2010–2011 grant “Automi e Linguaggi Formali: Aspetti Matematici e Applicativi”, code H41J12000190001. N. Jonoska is supported in part by the NSF grant CCF-1117254 and the NIH grant R01GM109459-01.
References
1. Berstel, J.; Perrin, D. Theory of Codes. Academic Press, Inc; Orlando, Florida: 1985.
2. Bonizzoni, P.; De Felice, C.; Mauri, G.; Zizza, R. Regular languages generated by reflexive finite linear splicing systems. Lect Notes Comput Sci; Proc. Development in Language Theory; Berlin: Springer; 2003. p. 134-145.
3. Bonizzoni P, De Felice C, Zizza R. The structure of reflexive regular splicing languages via Schützenberger constants. Theor Comput Sci. 2005; 334(1–3):71–98.
4. Bonizzoni P, Mauri G. Regular splicing languages and subclasses. Theor Comput Sci. 2005; 340:349–363.
5. Bonizzoni P. Constants and label-equivalence: a decision procedure for reflexive regular splicing languages. Theor Comput Sci. 2010; 411(6):865–877.
6. Bonizzoni, P.; Jonoska, N. Regular splicing languages must have a constant. Lect Notes Comput Sci; Proc. Developments in Language Theory; Berlin: Springer; 2011. p. 82-92.
7. Černý J. Poznámka k homogénnym eksperimentom s konecnými automatami. Mat-Fyz čas Slov Akad Vied. 1964; 14:208–216.
8. Culik K, Harju T. Splicing semigroups of dominoes and DNA. Discrete Appl Math. 1991; 31:261–277.
9. De Luca A, Restivo A. A characterization of strictly locally testable languages and its application to semigroups of free semigroup. Inf Control. 1980; 44:300–319.
10. Goode, E. PhD Thesis. Binghamton University; 1999. Constants and splicing systems.
11. Goode E, Pixton D. Recognizing splicing languages: syntactic monoids and simultaneous pumping. Discrete Appl Math. 2007; 155:989–1006.
12. Head T. Formal language theory and DNA: an analysis of the generative capacity of specific recombinant behaviours. Bull Math Biol. 1987; 49:737–759. [PubMed: 2832024]
13. Hopcroft, JE.; Motwani, R.; Ullman, JD. Introduction to Automata Theory, Languages, and Computation. Addison–Wesley; Reading, Mass: 2001.
14. Jonoska N. Sofic systems with synchronizing representations. Theor Comput Sci. 1996; 158(1–2):81–115.
15. Kari, L.; Kopecki, S. Deciding if a regular language is generated by a splicing system. Lect Notes Comput Sci; Proc. DNA Computing and Molecular Programming – 18th International Conference; Berlin: Springer; 2012. p. 98-109.
16. Lind, D.; Marcus, B. An Introduction to Symbolic Dynamics. Cambridge University Press; New York: 1995.
17. Paun G. On the splicing operation. Discrete Appl Math. 1996; 70:57–79.
18. Paun, G.; Rozenberg, G.; Salomaa, A. New Computing Paradigms. Springer-Verlag; Berlin: 1998. DNA Computing.
19. Pixton D. Regularity of splicing languages. Discrete Appl Math. 1996; 69:101–124.
20. Schützenberger MP. Sur certaines opérations de fermeture dans le langages rationnels. Symp Math. 1975; 15:245–253.
21. Verlan, S. PhD Thesis. University of Metz; 2004. Head systems and applications to bioinformatics.
Fig. 1.
A transitive language that does not have a transitive deterministic automaton. The initial state is indicated with an arrow and the terminal states are shaded.
Fig. 2.
Two automata. Initial states are indicated with an arrow pointing to them, and terminal states are shaded.
Fig. 3.
A path-automaton with no synchronizing words.
Fig. 4.
Paste site at state p. The dotted path with label u3 may or may not exist, but the right context of qu3u4 is included in the right context of pu1u4 for every q.
Fig. 5.
A path-automaton that recognizes a non-reflexive splicing language.
Fig. 6.
The automaton of Fig. 2(a), with its minimal terminal component shown in detail. This component consists of states q2, q4 and q3.
Fig. 7.
A possible paste site at state p.