
Finite-Horizon Near-Optimal Output Feedback Neural Network Control of Quantized Nonlinear Discrete-Time Systems With Input Constraint

Hao Xu, Member, IEEE, Qiming Zhao, and Sarangapani Jagannathan, Senior Member, IEEE

Abstract— The output feedback-based near-optimal regulation of uncertain and quantized nonlinear discrete-time systems in affine form with control constraint over finite horizon is addressed in this paper. First, the effect of the input constraint is handled using a nonquadratic cost functional. Next, a neural network (NN)-based Luenberger observer is proposed to reconstruct both the system states and the control coefficient matrix so that a separate identifier is not needed. Then, an approximate dynamic programming-based actor-critic framework is utilized to approximate the time-varying solution of the Hamilton–Jacobi–Bellman equation using NNs with constant weights and time-dependent activation functions. A new error term is defined and incorporated in the NN update law so that the terminal constraint error is also minimized over time. Finally, a novel dynamic quantizer for the control inputs with adaptive step size is designed to eliminate the quantization error over time, thus overcoming the drawback of the traditional uniform quantizer. The proposed scheme functions in a forward-in-time manner without an offline training phase. Lyapunov analysis is used to investigate the stability. Simulation results are given to show the effectiveness and feasibility of the proposed method.

Index Terms— Approximate dynamic programming, finite horizon, Hamilton–Jacobi–Bellman (HJB) equation, neural network (NN), optimal regulation, quantization.

NOMENCLATURE

$q_d(\bullet)$: Dynamic quantizer.
$u_{qk}$: Quantized control input.
$\psi(x_N)$: Terminal constraint.
$W(u_k)$: Nonquadratic function.
$\varphi(\bullet)$: Bounded one-to-one function.
$A$: Hurwitz matrix.
$W$: NN-based observer target weights.
$\sigma(x_k)$: NN-based observer activation function.
$\hat W_k$: NN-based observer estimated weights at time $k$.
$W_V$: Critic NN target weights.
$\sigma_V(x_k, k)$: Critic NN time-varying activation function at time $k$.
$\hat W_{Vk}$: Estimated weight matrix of critic NN at time $k$.
$e_{B,k}$: Bellman equation residual error.
$e_{N,k}$: Terminal constraint error.
$W_u$: Actor NN target weights.
$\sigma_u(\hat x_k, k)$: Actor NN time-varying activation function at time $k$.
$e_{uk}$: Actor NN estimation error.
$\alpha_I, \alpha_V, \alpha_u$: Tuning parameters for the NN-based observer, critic NN, and actor NN, respectively.
$\bar\varepsilon_k, \varepsilon_{Vk}, \varepsilon_{uk}$: Reconstruction errors for the NN-based observer, critic NN, and actor NN, respectively.

Manuscript received November 5, 2013; revised October 1, 2014, January 14, 2015, and February 13, 2015; accepted February 21, 2015. Date of publication March 18, 2015; date of current version July 15, 2015. This work was supported in part by the National Science Foundation under Grant ECCS-1128281 and in part by the Intelligent Systems Center. H. Xu is with the College of Science and Engineering, Texas A&M University—Corpus Christi, Corpus Christi, TX 78412 USA (e-mail: [email protected]). Q. Zhao is with DENSO International America, Inc., Southfield, MI 48033 USA (e-mail: [email protected]). S. Jagannathan is with the Department of Electrical and Computer Engineering, Missouri University of Science and Technology, Rolla, MO 65409 USA (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TNNLS.2015.2409301

I. INTRODUCTION

ACTUATOR saturation is very common in practical applications due to physical limitations. Control of systems with saturating actuators has been one of the focuses of many researchers for several years [1], [2]. However, most of these approaches considered only stabilization, whereas optimality was not considered. To address the optimal control problem with an actuator constraint, Lyshevski [3] presented a general framework for the design of optimal control laws based on dynamic programming. It has been shown in [3] that the use of a nonquadratic functional can effectively tackle the input constraint while achieving optimality. On the other hand, in practice, the interface between the plant and the controller is often connected via analog-to-digital and digital-to-analog devices, which normally quantize the signals [4]. As a result, the design of control systems with quantization effects has attracted a great deal of attention from control researchers, since the quantization process is unavoidable in computer-based control systems. However, the quantization error never vanishes when the signals are processed by a traditional uniform quantizer [5]–[7].



In addition, in many practical situations, the state vector is difficult or expensive to measure. Several traditional nonlinear observers, such as high-gain or sliding-mode observers, have been developed [8], [9]. However, these observer designs [8], [9] are applicable only to systems expressed in a specific structure and require the system dynamics a priori.

On the other hand, the optimal regulation of nonlinear systems can be addressed either over an infinite horizon or over a finite fixed horizon. Finite-horizon optimal regulation still remains unresolved for the following reasons. First, the solution to the optimal control of a nonlinear system over a finite horizon is essentially time-varying, thus complicating the analysis, in contrast with the infinite-horizon case, where the solution is time independent. Second, in finite-horizon problems, the terminal constraint is explicitly imposed in the cost function, whereas the terminal constraint is normally ignored in the infinite-horizon case. Wei and Liu [10], Liu and Wei [11], [12], and Wei et al. [13], [14] provided some insights into solving finite-horizon optimal regulation of nonlinear systems. However, these schemes are either backward-in-time or require offline training with iterative approaches, which are not suitable for real-time implementation. Furthermore, all the existing literature considered only the state feedback case without quantization effects. Therefore, a finite-horizon optimal regulation scheme for uncertain nonlinear quantized systems with an actuator constraint, which can be implemented in an online and forward-in-time manner with output measurements and without value and policy iterations, is yet to be developed.

Motivated by the aforementioned deficiencies, in this paper, an extended neural network (NN)-based Luenberger observer is first proposed to estimate the system state vector as well as the control coefficient matrix. Note that the proposed observer can maintain the system stability while the NN learns the system dynamics. The actor-critic architecture is then utilized to generate the near-optimal control policy, wherein the value function is approximated using the critic NN and the optimal policy is generated using the approximated value function and the control coefficient matrix produced by the observer, given an initial admissible control. Finally, a novel dynamic quantizer is proposed to mitigate the effect of quantization error on the control inputs. Due to the presence of observer errors, the control policy will be near optimal.

To handle the time-varying nature of the solution to the Hamilton–Jacobi–Bellman (HJB) equation, or value function, NNs with constant weights and time-varying activation functions are utilized. In addition, in contrast with [12] and [15], the control policy is updated once per sampling instant, and hence value/policy iterations are not performed. An error term corresponding to the terminal constraint is defined and minimized over time so as to satisfy the terminal constraint. A novel update law for tuning the NNs is developed such that the critic NN weights are tuned using not only the Bellman error but also the terminal constraint error. Finally, the stability of the proposed design scheme is demonstrated using Lyapunov stability analysis.

Therefore, the main contribution of this paper includes the development of a novel approach to solve the finite-horizon

output feedback-based near-optimal control of uncertain quantized nonlinear discrete-time systems in affine form in an online and forward-in-time manner without utilizing value and/or policy iterations. A novel dynamic quantizer as well as an online NN observer is introduced, the former for eliminating the quantization errors and the latter for generating both the state vector and the control coefficient matrix, so that the explicit need for an identifier is relaxed. Tuning laws for all the NNs are also derived, and Lyapunov stability is demonstrated.

Fig. 1. Block diagram of the quantized system with input saturation.

The remainder of this paper is organized as follows. In Section II, the background and formulation of the finite-horizon optimal control problem for nonlinear quantized systems are given. Section III presents the main algorithm developed for the finite-horizon problem. In Section IV, simulation results are shown to verify the feasibility of the proposed method. Finally, conclusions are drawn in Section V.

II. PROBLEM FORMULATION

In this paper, the finite-horizon optimal control of a general quantized nonlinear discrete-time system in affine form is studied. Consider the nonlinear discrete-time system of the form

$$x_{k+1} = f(x_k) + g(x_k)\, u_{qk}, \qquad y_k = C x_k \tag{1}$$

where $x_k \in \Omega_x \subset \mathbb{R}^n$ and $y_k \in \Omega_y \subset \mathbb{R}^p$ are the system state and output vectors, respectively, $u_{qk} = q_d(u_k) \in \Omega_u \subset \mathbb{R}^m$ is the quantized control input vector, where $q_d(\bullet)$ is the dynamic quantizer defined later, $u_k \in U \subset \mathbb{R}^m$, where $U = \{u = (u_1, u_2, \ldots, u_m) \in \mathbb{R}^m : a_i \le u_i \le b_i,\ i = 1, 2, \ldots, m\}$ with $a_i$, $b_i$ being the constant bounds [16], $f(x_k): \mathbb{R}^n \to \mathbb{R}^n$ is the unknown nonlinear internal dynamics, $g(x_k): \mathbb{R}^n \to \mathbb{R}^{n \times m}$ represents the unknown nonlinear control coefficient matrix, and $C \in \mathbb{R}^{p \times n}$ is the known output matrix. In addition, the input matrix $g(x_k)$ is considered to be bounded such that $0 < \|g(x_k)\| < g_M$, where $g_M$ is a positive constant.

The general structure of the quantized nonlinear discrete-time system considered in this paper is shown in Fig. 1. It is important to note that a digital communication network is usually used to connect the sensor, controller, and actuator in practical scenarios [17]. Due to limited communication bandwidth, system states and control inputs should be quantized before transmission [4]. State quantization has been considered in [18]; therefore, control input quantization is considered here.
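Although no code accompanies the paper, the plant interface of Fig. 1 is straightforward to prototype. The sketch below simulates one step of the affine dynamics (1) under a saturated and quantized input; the placeholder dynamics f, g, C, the saturation bounds, and the pass-through quantizer are illustrative assumptions, not the paper's robot-arm example.

```python
import numpy as np

def plant_step(x, u, f, g, C, quantize):
    """One step of the affine discrete-time system (1):
    x_{k+1} = f(x_k) + g(x_k) u_qk,  y_k = C x_k."""
    u_q = quantize(u)                      # u_qk = q_d(u_k)
    return f(x) + g(x) @ u_q, C @ x

# Placeholder dynamics for illustration only (not the two-link arm of Section IV).
f = lambda x: 0.9 * x
g = lambda x: np.eye(2)
C = np.eye(2)
sat = lambda u: np.clip(u, -1.5, 1.5)      # input constraint a_i <= u_i <= b_i
x_next, y = plant_step(np.array([1.0, -0.5]), sat(np.array([2.0, 0.3])),
                       f, g, C, quantize=lambda u: u)
```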


Assumption 1: The nonlinear system given in (1) is controllable and observable [9]. Here, the system output, $y_k \in \Omega_y$, is considered measurable.

The controllability ensures that a stabilizing controller can be designed, while the observability guarantees that the state vector can be estimated from output measurements. The objective of the control design is to determine a feedback control policy that minimizes the following time-varying cost function:

$$V(x_k, k) = \psi(x_N) + \sum_{i=k}^{N-1} \big( Q(x_i, i) + W(u_i) \big) \tag{2}$$

subject to the system dynamics (1), where $[k, N]$ is the time interval of interest, $\psi(x_N)$ is the terminal constraint that penalizes the terminal state $x_N \in \Omega_x$, $Q(x_k, k) \in \mathbb{R}$ is a positive semidefinite function, and $W(u_k) \in \mathbb{R}$ is positive definite. It should be noted that in the finite-horizon scenario, the control inputs can be time-varying, i.e., $u_k = \mu(x_k, k) \in \Omega_u$. Setting $k = N$, the terminal constraint for the value function is given as

$$V(x_N, N) = \psi(x_N). \tag{3}$$
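The cost (2) transcribes directly into code. A minimal sketch, assuming the stage cost Q(x, i), input penalty W(u), and terminal penalty psi(x) are supplied as callables and a trajectory has already been stored:

```python
def finite_horizon_cost(xs, us, Q, W, psi, k=0):
    """V(x_k, k) = psi(x_N) + sum_{i=k}^{N-1} (Q(x_i, i) + W(u_i)), Eq. (2),
    evaluated along a stored trajectory xs[0..N], us[0..N-1]."""
    N = len(us)
    return psi(xs[N]) + sum(Q(xs[i], i) + W(us[i]) for i in range(k, N))
```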

For unconstrained control inputs, $W(u_k)$ generally takes the form $W(u_k) = u_k^T R u_k$, with $R \in \mathbb{R}^{m \times m}$ being a positive definite and symmetric weighting matrix. However, in this paper, to confront the actuator saturation, we employ a nonquadratic functional [3] as

$$W(u_k) = 2 \int_0^{u_k} \big(\varphi^{-1}(v)\big)^T R\, dv \tag{4}$$

with

$$\varphi(v) = [\phi(v_1)\ \cdots\ \phi(v_m)]^T, \qquad \varphi^{-1}(u_k) = [\phi^{-1}(u_{1,k})\ \cdots\ \phi^{-1}(u_{m,k})] \tag{5}$$

where $v \in \mathbb{R}^m$ and $\varphi(\bullet)$ is a bounded function that belongs to the continuous function set $C_\varphi$. Define the notation $w(v) = \varphi^{-1}(v) R$; then

$$\int_0^{u_k} w^T(v)\, dv \equiv \int_0^{u_1(k)} w_1(v_1)\, dv_1 + \cdots + \int_0^{u_m(k)} w_m(v_m)\, dv_m \tag{6}$$

is a scalar, for $u_k \in \Omega_u \subset \mathbb{R}^m$, $v \in \Omega_v \subset \mathbb{R}^m$, and $w(v) = [w_1, \ldots, w_m] \in \Omega_w \subset \mathbb{R}^m$. Moreover, $\phi(\bullet)$ is a monotonic odd function with its first derivative bounded by a constant $U$. An example is the hyperbolic tangent function $\phi(\bullet) = \tanh(\bullet)$. Note that $W(u_k)$ is positive definite, since $\phi^{-1}(u_k)$ is monotonic odd and $R$ is positive definite.

By Bellman's principle of optimality [19], [20], the optimal value function should satisfy the HJB equation

$$V^*(x_k, k) = \min_{u_k}\Big\{ Q(x_k, k) + 2 \int_0^{u_k} \big(\varphi^{-1}(v)\big)^T R\, dv + V^*(x_{k+1}, k+1) \Big\}. \tag{7}$$
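For the hyperbolic tangent choice of phi suggested above, the penalty (4) can be evaluated componentwise as in (6) by numerical quadrature. A sketch assuming a diagonal R and the bounded inverse U*arctanh(v/U) used later in (41); the clip is only a numerical guard near the saturation bound:

```python
import numpy as np
from scipy.integrate import quad

def nonquadratic_cost(u, R, U):
    """W(u) = 2 * int_0^u (phi^{-1}(v))^T R dv, Eq. (4), with
    phi^{-1}(v) = U * arctanh(v / U), evaluated componentwise as in (6)."""
    total = 0.0
    for i, ui in enumerate(np.atleast_1d(u)):
        w_i = lambda v, i=i: U * np.arctanh(np.clip(v / U, -1 + 1e-9, 1 - 1e-9)) * R[i, i]
        val, _ = quad(w_i, 0.0, ui)
        total += 2.0 * val
    return total
```

As a quick check, the result grows steeply as any component of u approaches the bound U, which is what makes the functional suitable for enforcing the constraint.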

The optimal control policy $u_k^* \in \Omega_u$ that minimizes the value function $V^*(x_k, k)$ is revealed to be

$$u_k^* = \arg\min_{u_k}\Big\{ Q(x_k, k) + 2 \int_0^{u_k} \big(\varphi^{-1}(v)\big)^T R\, dv + V^*(x_{k+1}, k+1) \Big\}.$$

According to Lagrange theory, the optimal control can be attained by solving

$$\frac{\partial}{\partial u_k}\Big\{ Q(x_k, k) + 2 \int_0^{u_k} \big(\varphi^{-1}(v)\big)^T R\, dv + V^*(x_{k+1}, k+1) \Big\} = 0$$

which yields

$$u_k^* = -\varphi\Big( \frac{1}{2} R^{-1} g^T(x_k)\, \frac{\partial V^*(x_{k+1}, k+1)}{\partial x_{k+1}} \Big). \tag{8}$$

It is clear from (8) that the optimal control policy cannot be obtained for the nonlinear discrete-time system even with an available system state vector, due to its dependence on the future state vector $x_{k+1} \in \Omega_x$. To avoid this drawback and relax the requirement for system dynamics, iteration-based schemes are normally utilized using NNs with offline training [21]. However, iteration-based schemes are not preferable for hardware implementation, since the number of iterations needed to ensure stability cannot be easily determined [17]. Moreover, the iterative methods require the control coefficient matrix $g(x_k)$ to generate the control policy [22]. Therefore, in this paper, a solution is found with system outputs and completely unknown system dynamics, without utilizing the iterative approach and in the presence of quantization effects.

In order to incorporate the quantization effect on the control inputs, the uniform quantizer with a finite number of bits, shown in Fig. 2, is utilized.

Fig. 2. Ideal and realistic quantizer.

Let $z$ be the signal to be quantized and $M$ be the quantization range for the quantizer. If $z$ does not belong to the quantization range, the quantizer saturates. Let $e$ be the quantization error, and assume that the following two conditions hold:

1) if $|z| \le M$, then $0 \le e = |q(z) - z| \le \Delta/2$;
2) if $|z| > M$, then $0 \le e = |q(z) - z| < \infty$

where the quantization value

$$q(z) = \begin{cases} M, & z > M \\ \Delta \cdot \big(\lfloor z/\Delta \rfloor + 1/2\big), & -M \le z \le M \\ -M, & z < -M \end{cases} \tag{9}$$

is a nonlinear mapping that represents a general uniform quantizer with the step size $\Delta$ defined as $\Delta = M/2^R$, with $R$ being the number of bits of the quantizer.
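A direct transcription of (9); the step-size symbol Delta = M / 2**R and the floor-based mid-riser form are reconstructed assumptions where the original glyphs were lost:

```python
import numpy as np

def uniform_quantize(z, M, R_bits):
    """Uniform quantizer (9): saturates outside [-M, M], otherwise maps z to
    Delta * (floor(z / Delta) + 1/2) with step size Delta = M / 2**R_bits,
    so the error obeys |q(z) - z| <= Delta / 2 inside the range."""
    delta = M / 2 ** R_bits
    if z > M:
        return M
    if z < -M:
        return -M
    return delta * (np.floor(z / delta) + 0.5)
```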


In addition, theoretically, when the number of bits of the quantizer approaches infinity, the quantization error reduces to zero and hence infinite precision of the quantizer can be achieved. In a realistic scenario, however, both the quantization range and the number of bits cannot be arbitrarily large. To circumvent these drawbacks, a dynamic quantizer scheme is proposed in this paper in a form similar to [23] as

$$z_q = q_d(z) = \mu\, q(z/\mu) \tag{10}$$

where $\mu$ is a scaling factor.

III. FINITE-HORIZON NEAR-OPTIMAL REGULATOR DESIGN USING OUTPUT FEEDBACK WITH CONTROL CONSTRAINT

In this section, the output feedback-based near-optimal regulation scheme over finite horizon for uncertain quantized nonlinear discrete-time systems with an input constraint is addressed. First, due to the unavailability of the system state vector and the uncertain system dynamics, an extended version of the Luenberger observer using an NN is proposed to reconstruct both the system state vector and the control coefficient matrix in an online manner. Thus, the proposed observer design relaxes the need for an explicit identifier and naturally lends itself to stability analysis. Next, the approximate dynamic programming framework is utilized to approximate the time-varying value function with an actor-critic structure, while both NNs are represented by constant weights and time-varying activation functions. An error term corresponding to the terminal constraint is defined and minimized over time. Finally, a novel dynamic quantizer is proposed to reduce the quantization error over time. The stability of the closed-loop system is demonstrated using Lyapunov theory to show that the parameter estimation remains bounded as the system evolves, provided an initial admissible control input is chosen.

A. Observer Design

The system dynamics (1) can be reformulated as

$$x_{k+1} = A x_k + F(x_k) + g(x_k) u_{qk}, \qquad y_k = C x_k \tag{11}$$

where $A$ is a Hurwitz matrix such that $(A, C)$ is observable and $F(x_k) = f(x_k) - A x_k$.

An NN has been proved to be an effective method in the estimation and control of nonlinear systems due to its online learning capability [24]. According to the universal approximation property [25], [26], the system state vector can be represented using an NN on a compact set $\Omega$ as

$$\begin{aligned} x_{k+1} &= A x_k + F(x_k) + g(x_k) u_{qk} \\ &= A x_k + W_F^T \sigma_F(x_k) + W_g^T \sigma_g(x_k) u_{qk} + \varepsilon_{Fk} + \varepsilon_{gk} u_{qk} \\ &= A x_k + \begin{bmatrix} W_F \\ W_g \end{bmatrix}^T \begin{bmatrix} \sigma_F(x_k) & 0 \\ 0 & \sigma_g(x_k) \end{bmatrix} \begin{bmatrix} 1 \\ u_{qk} \end{bmatrix} + [\varepsilon_{Fk}\ \ \varepsilon_{gk}] \begin{bmatrix} 1 \\ u_{qk} \end{bmatrix} \\ &= A x_k + W^T \sigma(x_k)\, \bar u_{qk} + \bar\varepsilon_k \end{aligned} \tag{12}$$

where

$$W = \begin{bmatrix} W_F \\ W_g \end{bmatrix} \in \mathbb{R}^{L \times n}, \quad \sigma(x_k) = \begin{bmatrix} \sigma_F(x_k) & 0 \\ 0 & \sigma_g(x_k) \end{bmatrix} \in \mathbb{R}^{L \times (1+m)}, \quad \bar u_{qk} = \begin{bmatrix} 1 \\ u_{qk} \end{bmatrix} \in \mathbb{R}^{1+m}$$

and $\bar\varepsilon_k = [\varepsilon_{Fk}\ \ \varepsilon_{gk}]\, \bar u_{qk} \in \mathbb{R}^n$, with $L$ being the number of hidden neurons. In addition, the target NN weights, activation function, and reconstruction errors are assumed to be upper bounded by $\|W\| \le W_M$, $\|\sigma(x_k)\| \le \sigma_M$, and $\|\bar\varepsilon_k\| \le \bar\varepsilon_M$, where $W_M$, $\sigma_M$, and $\bar\varepsilon_M$ are positive constants. Then, the system states $x_{k+1} = A x_k + F(x_k) + g(x_k) u_{qk}$ can be identified by estimating the NN weight matrix $W$.

Since the true system state vector is unavailable for the controller, we propose the following extended Luenberger observer using an NN described by

$$\hat x_{k+1} = A \hat x_k + \hat W_k^T \sigma(\hat x_k)\, \bar u_{qk} + L(y_k - C \hat x_k), \qquad \hat y_k = C \hat x_k \tag{13}$$

where $\hat W_k$ is the estimated value of the target NN weights $W$, $\hat x_k$ is the reconstructed system state vector, $\hat y_k$ is the estimated output vector, and $L \in \mathbb{R}^{n \times p}$ is the observer gain selected by the designer, respectively. Now, define the state estimation error as

$$\begin{aligned} \tilde x_{k+1} &= x_{k+1} - \hat x_{k+1} \\ &= A x_k + W^T \sigma(x_k)\, \bar u_{qk} + \bar\varepsilon_k - \big( A \hat x_k + \hat W_k^T \sigma(\hat x_k)\, \bar u_{qk} + L(y_k - C \hat x_k) \big) \\ &= A_c \tilde x_k + \tilde W_k^T \sigma(\hat x_k)\, \bar u_{qk} + \bar\varepsilon_{Ok} \end{aligned} \tag{14}$$

where $A_c = A - LC$ is the closed-loop matrix, $\tilde W_k = W - \hat W_k$ is the NN weight estimation error, $\tilde\sigma(x_k, \hat x_k) = \sigma(x_k) - \sigma(\hat x_k)$, and $\bar\varepsilon_{Ok} = W^T \tilde\sigma(x_k, \hat x_k)\, \bar u_{qk} + \bar\varepsilon_k$ is a bounded term due to the bounded values of the ideal NN weights, activation functions, and reconstruction errors.

Remark 1: The observer presented in (13) is novel, since it generates both the reconstructed system state vector and the control coefficient matrix $g(x_k)$ for the near-optimal controller design, and can hence be viewed as an NN-based identifier.

Now, select the tuning law for the NN weights as

$$\hat W_{k+1} = (1 - \alpha_I) \hat W_k + \beta_I\, \sigma(\hat x_k)\, \bar u_{qk}\, \tilde y_{k+1}^T l^T \tag{15}$$

where $\alpha_I$ and $\beta_I$ are the tuning parameters, $\tilde y_{k+1} = y_{k+1} - \hat y_{k+1}$ is the output error, and $l \in \mathbb{R}^{n \times p}$ is selected as column vectors with all ones to match the dimension. Hence, the NN weight estimation error dynamics, recalling (14), are revealed to be

$$\begin{aligned} \tilde W_{k+1} &= W - \hat W_{k+1} \\ &= (1 - \alpha_I) \tilde W_k + \alpha_I W - \beta_I\, \sigma(\hat x_k)\bar u_{qk}\, \tilde y_{k+1}^T l^T \\ &= (1 - \alpha_I) \tilde W_k + \alpha_I W - \beta_I\, \sigma(\hat x_k)\bar u_{qk}\, \tilde x_k^T A_c^T C^T l^T - \beta_I\, \sigma(\hat x_k)\bar u_{qk} \bar u_{qk}^T \sigma^T(\hat x_k)\, \tilde W_k\, C^T l^T - \beta_I\, \sigma(\hat x_k)\bar u_{qk}\, \bar\varepsilon_{Ok}^T C^T l^T. \end{aligned} \tag{16}$$
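A compact sketch of one observer iteration combining the state update (13) with the weight tuning (15); the activation sigma(.) returning the L x (1+m) block matrix of (12), the gains, and the all-ones matching matrix l are user-supplied, following the dimensions above.

```python
import numpy as np

def observer_step(x_hat, W_hat, u_q, y, y_next, A, C, L_gain, l_mat,
                  sigma, alpha_I, beta_I):
    """One iteration of the NN Luenberger observer (13) and tuning law (15)."""
    u_bar = np.concatenate(([1.0], u_q))             # \bar u_qk = [1; u_qk]
    basis = sigma(x_hat) @ u_bar                     # sigma(x_hat_k) \bar u_qk, an L-vector
    x_hat_next = A @ x_hat + W_hat.T @ basis + L_gain @ (y - C @ x_hat)   # (13)
    y_tilde_next = y_next - C @ x_hat_next           # output error \tilde y_{k+1}
    W_hat_next = (1 - alpha_I) * W_hat \
        + beta_I * np.outer(basis, y_tilde_next) @ l_mat.T                # (15)
    return x_hat_next, W_hat_next
```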

Next, the boundedness of the NN weight estimation error $\tilde W_k$ will be demonstrated in Theorem 1. Before proceeding, the following definitions are required.


Definition 1 [26]: An equilibrium point $x_e$ is said to be uniformly ultimately bounded (UUB) if there exists a compact set $\Omega_x \subset \mathbb{R}^n$ so that for all initial values $x_0 \in \Omega_x$, there exist a bound $B$ and a time $T(B, x_0)$ such that $\|x_k - x_e\| \le B$ for all $k \ge k_0 + T$.

Definition 2 (Persistence of Excitation): The signal $x_k \in \mathbb{R}^n$ is said to be persistently exciting (PE) if and only if there exist positive constants $\delta$ and $l$ such that for all $k_0 \ge 0$

$$\sum_{i=k_0}^{k_0+l-1} x_i x_i^T > \delta I_n$$

where $I_n \in \mathbb{R}^{n \times n}$ is the identity matrix and $l$ is termed the excitation period of $x_k$.
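The PE condition can be verified numerically on recorded data by checking the smallest eigenvalue of each windowed sum of outer products; a sketch:

```python
import numpy as np

def is_persistently_exciting(xs, delta, l):
    """Check Definition 2 over every window of length l:
    sum_{i=k0}^{k0+l-1} x_i x_i^T > delta * I_n (via the minimum eigenvalue)."""
    for k0 in range(len(xs) - l + 1):
        S = sum(np.outer(xs[i], xs[i]) for i in range(k0, k0 + l))
        if np.linalg.eigvalsh(S).min() <= delta:
            return False
    return True
```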

Theorem 1 (Boundedness of the Observer Error): Let the initial NN observer weights $\hat W_k$ be selected within the compact set $\Omega_{ID}$. Given an admissible control input $u_0 \in \Omega_u$ and Assumption 1, let the proposed observer be given by (13), with its NN weight tuning law given by (15). Using both the PE condition on the control input signals and the pole placement method [15] to select $A$ and $L$ satisfying $\|A_c\| = \|A - LC\| \le \big(1/(1 + 4C_M^2(1 + \sigma_{\min}^2))\big)^{1/2}$, there exist positive constants $\alpha_I$ and $\beta_I$ satisfying $(2-\sqrt{2})/2 < \alpha_I < 1$ and $0 < \beta_I < 2(1-\alpha_I)C_M/(\sigma_{\min}^2 + 1)$, with $0 < \|C\| \le C_M$, such that the state estimation error $\tilde x_k$ and the NN weight estimation error $\tilde W_k$ are UUB, with the bounds given by (A.6) and (A.7).

where $\alpha_u > 0$ is a design parameter. To find the error dynamics for the actor NN weights, first observe that

$$0 = W_u^T \sigma_u(x_k, k) + \varepsilon_u(x_k, k) + \varphi\Big( \frac{1}{2} R^{-1} g^T(x_k)\big[ \nabla\sigma_V^T(x_{k+1}, k+1)\, W_V + \nabla\varepsilon_V(x_{k+1}, k+1) \big] \Big). \tag{33}$$

Subtracting (33) from (31), we have

$$\begin{aligned} e_{uk} = {}& -\tilde W_{uk}^T \sigma_u(\hat x_k, k) - \frac{1}{2} L_\phi R^{-1} g^T(x_k)\, \nabla\sigma_V^T(\hat x_{k+1}, k+1)\, \tilde W_{Vk} \\ & - \frac{1}{2} L_\phi R^{-1} \tilde g^T(\hat x_k)\, \nabla\sigma_V^T(\hat x_{k+1}, k+1)\, W_V \\ & - \frac{1}{2} L_\phi R^{-1} \big( g^T(\hat x_k) - g^T(x_k) \big) \nabla\sigma_V^T(\hat x_{k+1}, k+1)\, \tilde W_{Vk} \\ & + \frac{1}{2} L_\phi R^{-1} \tilde g^T(\hat x_k)\, \nabla\sigma_V^T(\hat x_{k+1}, k+1)\, \tilde W_{Vk} + \bar\varepsilon_u(x_k, k) \end{aligned} \tag{34}$$

where $\tilde W_{uk} = W_u - \hat W_{uk}$, $L_\phi$ is the positive Lipschitz constant for the saturation function $\phi(\bullet)$, $\tilde\sigma_u(x_k, \hat x_k, k) = \sigma_u(x_k, k) - \sigma_u(\hat x_k, k)$,

$$\tilde\phi_k = \phi\Big(\tfrac{1}{2} R^{-1} \hat g^T(\hat x_k)\, \nabla\sigma_V^T(\hat x_{k+1}, k+1)\, \hat W_{V,k}\Big) - \phi\Big(\tfrac{1}{2} R^{-1} g^T(x_k)\big( \nabla\sigma_V^T(x_{k+1}, k+1)\, W_V + \nabla\varepsilon_V(x_{k+1}, k+1) \big)\Big)$$

and

$$\begin{aligned} \bar\varepsilon_u(x_k, k) = {}& -\varepsilon_u(x_k, k) + \frac{1}{2} L_\phi R^{-1} g^T(x_k)\, \nabla\tilde\sigma_V^T(x_{k+1}, \hat x_{k+1}, k+1)\, W_V \\ & + \frac{1}{2} L_\phi R^{-1} \big( g^T(\hat x_k) - g^T(x_k) \big) \nabla\sigma_V^T(\hat x_{k+1}, k+1)\, W_V \\ & - \frac{1}{2} L_\phi R^{-1} g^T(x_k)\, \nabla\varepsilon_V^T(\hat x_{k+1}, k+1) - W_u^T \tilde\sigma_u(x_k, \hat x_k, k). \end{aligned}$$

Note that $\tilde\sigma_u(x_k, \hat x_k, k)$ and $\bar\varepsilon_u(x_k, k)$ are bounded due to the boundedness of the NN activation function and reconstruction error. Then, the error dynamics for the actor NN weights are revealed to be

$$\tilde W_{uk+1} = \tilde W_{uk} + \alpha_u \frac{\sigma_u(\hat x_k, k)\, e_{uk}^T}{1 + \sigma_u^T(\hat x_k, k)\, \sigma_u(\hat x_k, k)}. \tag{35}$$
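The explicit actor tuning law (32) lies in a portion of the derivation not reproduced here, but the error dynamics (35) pin down its normalized-gradient form. A sketch under that assumption, taking the actor error e_uk and activation sigma_u as given:

```python
import numpy as np

def actor_weight_update(W_u_hat, sigma_u, e_u, alpha_u):
    """Normalized-gradient actor NN update consistent with (35):
    since W_tilde = W_u - W_hat, the estimate moves opposite to the
    error-dynamics increment alpha_u * sigma_u e_u^T / (1 + sigma_u^T sigma_u)."""
    norm = 1.0 + sigma_u @ sigma_u
    return W_u_hat - alpha_u * np.outer(sigma_u, e_u) / norm
```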

Remark 2: The actor NN weight tuning based on the gradient descent approach is similar to that in [22], the difference being that the estimated state vector $\hat x_k$ is utilized as the input to the actor NN activation function instead of the measured state vector $x_k$. In addition, the total error comprising the Bellman and terminal constraint errors is utilized to tune the weights, whereas in [22], the terminal constraint is ignored. Furthermore, the optimal control scheme in this paper utilizes the identified control coefficient matrix $\hat g(\hat x_k)$, whereas in [22], the control coefficient matrix $g(x_k)$ is considered known. Due to these differences, the stability analysis differs significantly from [22].

C. Dynamic Quantizer Design

To handle the saturation caused by the limited quantization range in a realistic quantizer, a new parameter $\mu_k$ is introduced. The proposed dynamic quantizer for the control input is defined as

$$u_{qk} = q_d(u_k) = \mu_k\, q(u_k/\mu_k) \tag{36}$$

where $\mu_k$ is a time-varying scaling parameter, defined later, for the control input quantizer. Normally, the dynamics of the quantization error cannot be established, since it is mainly a roundoff error. Instead, we consider the quantization error bound presented next, which aids in the stability analysis. Given the dynamic quantizer in the form (36), the quantization error for the control inputs (31) is bounded, as long as saturation does not occur, with the bound

$$\|e_{uk}\| \le \frac{1}{2}\mu_k \Delta = e_{M,k} \tag{37}$$

where $e_{M,k}$ is the upper bound for the control input quantization error. Next, define the scaling parameter $\mu_k$ as

$$\mu_k = \|u_k\|/(\lambda^k M) \tag{38}$$

where $0 < \lambda < 1$.
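Combining (36) and (38) with the uniform quantizer sketched after (9) gives a minimal dynamic quantizer; the zero-input guard is an added assumption for numerical safety:

```python
import numpy as np

def dynamic_quantize(u, k, M, R_bits, lam):
    """Dynamic quantizer (36) with adaptive scaling (38):
    mu_k = ||u_k|| / (lam**k * M), u_qk = mu_k * q(u_k / mu_k)."""
    mu = np.linalg.norm(u) / (lam ** k * M)
    if mu == 0.0:                                   # zero input: nothing to quantize
        return np.zeros_like(np.atleast_1d(u), dtype=float)
    z = np.atleast_1d(u) / mu
    return mu * np.array([uniform_quantize(zi, M, R_bits) for zi in z])
```

Because mu_k shrinks with lam**k, the effective step size mu_k * Delta, and hence the bound (37), decays over time, which is the behavior later observed in Fig. 5(a).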

Recall from the representation (36) that the signals to be quantized can be scaled back into the quantization range with the decaying rate of $\lambda^k$, thus eliminating the saturation effect.

To complete this section, the flowchart of our proposed finite-horizon near-optimal regulation scheme is shown in Fig. 3.

Fig. 3. Flowchart of the proposed finite-horizon near-optimal regulator.

We initialize the system with an admissible control input and, with proper parameter selection, the NN weights are initialized. The control input is then quantized using the proposed dynamic quantizer. The NNs for the observer, critic, and actor are updated based on our proposed weight tuning

laws at each sampling interval, beginning at the initial time and proceeding until the final fixed time instant, in an online and forward-in-time fashion.

D. Stability Analysis

In this section, the system stability will be investigated. It will be shown that the overall closed-loop system remains bounded under the proposed near-optimal regulator design. Before proceeding, the following lemma is needed.

Lemma [24] (Bounds on the Optimal Closed-Loop Dynamics): Consider the nonlinear discrete-time system (1) with Assumption 1. There exists an optimal control policy $u_k^*$ such that the closed-loop system dynamics $f(x_k) + g(x_k) u_k^*$ can be bounded as

$$\|f(x_k) + g(x_k) u_k^*\|^2 \le \rho \|x_k\|^2 \tag{39}$$

where $0 < \rho < 1$ is a constant.

Theorem 3 (Boundedness of the Closed-Loop System): Let Assumption 1 hold and let the initial control input be admissible. Let the NN weights be selected within a compact set, with the observer provided by (13) and the NN weight tuning for the observer, critic network, and action network given by (15), (24), and (32), respectively. Then, there exist positive constants $(2-\sqrt{2})/2 < \alpha_I < 1$, $0 < \alpha_V < 1/6$, and


$0 < \alpha_u < 1$, such that the system state $x_k$, observer error $\tilde x_k$, observer weight estimation error $\tilde W_k$, and critic and action NN weight estimation errors $\tilde W_{Vk}$ and $\tilde W_{uk}$ are all UUB, with the ultimate bounds given by (A.22)–(A.26). In addition, the estimated control input is bounded close to its optimal value such that $\|u^*(x_k, k) - \hat u(\hat x_k, k)\| \le \varepsilon_{uo}$ for a small positive constant $\varepsilon_{uo}$.

Proof: See the Appendix.

IV. SIMULATION RESULTS

In this section, a practical example is considered to illustrate our proposed near-optimal regulation design scheme. Consider the two-link planar robot arm [21]

$$\dot x = f(x) + g(x)u, \qquad y = Cx \tag{40}$$

Fig. 4. System response and control inputs.

where $f(x)$ and $g(x)$ can be found in [21]. The system is discretized with a sampling time of $h = 5$ ms, and the control constraint is set to $U = 1.5$, i.e., $-1.5 \le u_1 \le 1.5$ and $-1.5 \le u_2 \le 1.5$. Define the performance index

$$V(x_k, k) = \psi(x_N) + \sum_{i=k}^{N-1}\left( Q(x_i, i) + 2\int_0^{u_i} U \tanh^{-T}\!\Big(\frac{v}{U}\Big) R\, dv \right) \tag{41}$$

where $Q(x_k, k)$, for simplicity, is selected as the standard quadratic form of the system states, $Q(x_k, k) = x_k^T \bar Q x_k$ with $\bar Q = 0.1 I_4$, and the weighting matrix $R$ is selected as $R = 0.001 I_2$, where $I$ denotes the identity matrix with appropriate dimension. The Hurwitz matrix $A$ is selected as a $4 \times 4$ block diagonal matrix whose blocks $A_{ii}$ are chosen to be

$$A_{ii} = \begin{bmatrix} 0.9 & 0.1 \\ 0 & 0.9 \end{bmatrix}.$$

The terminal constraint is chosen as $\psi(x_N) = 3$, and the horizon length is 5 s.

For the NN setup, the inputs for the NN observer are selected as $z_k = [\hat x_k, u_k]$. Inspired by [27], the time-varying activation functions for the critic and actor networks are chosen as sigmoid functions with inputs $[\hat x_1, \ldots, \hat x_4, \tau, \hat x_1\hat x_2, \ldots, \hat x_3\hat x_4, \tau^2, \hat x_1\tau, \ldots, \hat x_4\tau, \hat x_1^2, \ldots, \hat x_4^2]$ and $[\hat x_1, \ldots, \hat x_4, \hat x_1\tau, \ldots, \hat x_4\tau]$, which result in 24 and 8 neurons, respectively, where $\tau = (N-k)/N$ is the normalized time-to-go. The number of bits for the quantizer is chosen to be 4, while the design parameters are selected as $\alpha_I = 0.7$, $\beta_I = 0.01$, $\alpha_V = 0.1$, $\alpha_u = 0.03$, and $\lambda = 0.9$. The initial system states and observer states are selected as $x_0 = [\pi/3, \pi/6, 0, 0]^T$ and $\hat x_0 = [0, 0, 0, 0]^T$, respectively. Using the pole placement method, the initial admissible control input is chosen as $u(0) = [0.2; -1]$, the observer gain is chosen as $L = [-0.3, 0.1, 0.7, 1]^T$, and the matching matrices $B_l$ and $l$ are selected as column vectors with all ones. All the NN weights are initialized at random.

First, the system response and control input are shown in Fig. 4. Both the system states and the control input clearly converge close to the origin within finite time, which illustrates the stability of the proposed design scheme. Next, the quantization errors for the control inputs with the proposed dynamic quantizer and the traditional uniform quantizer are shown in Fig. 5.

Fig. 5. (a) Quantization error with dynamic quantizer. (b) Quantization error with static quantizer.

Comparing Fig. 5(a) and (b), it is clear that the quantization errors decrease over time instead of remaining bounded as with the traditional uniform quantizer, illustrating the effectiveness of the proposed dynamic quantizer design. Next, the error history during the design procedure is given in Fig. 6. From the figure, it can be seen that the Bellman equation error eventually converges close to zero, which illustrates that optimality is indeed achieved. More importantly, the convergence of the terminal constraint error demonstrates that the terminal constraint is also satisfied with our proposed design.
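For reference, the time-varying activation input listed in the NN setup can be assembled as below; the exact ordering and the 24-element count in the paper may differ slightly from this sketch, and the elementwise sigmoid wrapping is an assumption:

```python
import numpy as np

def critic_features(x_hat, k, N):
    """Critic activation input built from the estimated states and the
    normalized time-to-go tau = (N - k) / N, as described in Section IV."""
    tau = (N - k) / N
    x = np.asarray(x_hat, dtype=float)
    cross = [x[i] * x[j] for i in range(len(x)) for j in range(i + 1, len(x))]
    feats = np.concatenate([x, [tau], cross, [tau ** 2], x * tau, x ** 2])
    return 1.0 / (1.0 + np.exp(-feats))             # sigmoid activation
```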


Finally, the convergence of the critic and actor NN weights is shown in Fig. 7. It can be observed from the results that the novel NN structure with our proposed tuning law guarantees that the NN weights converge to constants and remain bounded, as desired. This illustrates the feasibility of NN approximation for time-varying functions.

Fig. 6. History of error terms.

Fig. 7. Convergence of critic/actor NN weights.

V. CONCLUSION

In this paper, the NN-based finite-horizon near-optimal regulator design using output feedback for uncertain quantized nonlinear discrete-time systems in affine form is addressed. Compared with traditional finite-horizon optimal regulator designs, the proposed scheme not only relaxes the need for the system state vector and control coefficient matrix, but also takes the input constraint and quantization effect into account while functioning in an online and forward-in-time manner without using value/policy iterations. An initial admissible control input helps to stabilize the system while the NN learns the uncertain dynamics. The nonquadratic cost functional handles the input constraint. The dynamic quantizer effectively mitigates the quantization error for the control inputs, while the NN-based Luenberger observer relaxes the need for an additional identifier. The time-dependent nature of the finite horizon is handled by an NN structure with constant weights and time-varying activation functions. The terminal constraint is satisfied by minimizing an additional error term along the system trajectory. All NN weights are tuned online using the proposed update laws, and Lyapunov stability theory demonstrated that the approximated control inputs converge close to their optimal values as time evolves. The performance of the proposed finite-time near-optimal regulator is demonstrated via simulation.

APPENDIX

Proof of Theorem 1: Consider the following Lyapunov candidate:

$$L_{IO}(k) = L_{\tilde x,k} + L_{\tilde W,k} \tag{A.1}$$

where $L_{\tilde x,k} = \tilde x_k^T \tilde x_k$, $L_{\tilde W,k} = \mathrm{tr}\{\tilde W_k^T \Lambda \tilde W_k\}$, and $\Lambda = \big(2(1+\sigma_{\min}^2)/\beta_I\big) I$ with $I \in \mathbb{R}^{L\times L}$ the identity matrix, where $0 < \sigma_{\min}^2 < \|\sigma(\hat x_k)\|^2 < \|\sigma(\hat x_k)\bar u_{qk}\|^2$ is ensured to exist by the PE condition and $\mathrm{tr}\{\bullet\}$ denotes the trace operator. The first difference of $L_{IO}(k)$ is given by

$$\Delta L_{IO}(k) = \Delta L_{\tilde x,k} + \Delta L_{\tilde W,k}. \tag{A.2}$$

Next, we consider each term in (A.2) individually. First, recalling the observer error dynamics (14), we have

$$\begin{aligned} \Delta L_{\tilde x,k} &= \tilde x_{k+1}^T \tilde x_{k+1} - \tilde x_k^T \tilde x_k \\ &= \big( A_c \tilde x_k + \tilde W_k^T \sigma(\hat x_k)\bar u_{qk} + \bar\varepsilon_{Ok} \big)^T \big( A_c \tilde x_k + \tilde W_k^T \sigma(\hat x_k)\bar u_{qk} + \bar\varepsilon_{Ok} \big) - \tilde x_k^T \tilde x_k \\ &= \tilde x_k^T A_c^T A_c \tilde x_k + [\sigma(\hat x_k)\bar u_{qk}]^T \tilde W_k \tilde W_k^T \sigma(\hat x_k)\bar u_{qk} + \bar\varepsilon_{Ok}^T \bar\varepsilon_{Ok} + 2\tilde x_k^T A_c^T \tilde W_k^T \sigma(\hat x_k)\bar u_{qk} \\ &\quad + 2\tilde x_k^T A_c^T \bar\varepsilon_{Ok} + 2\bar\varepsilon_{Ok}^T \tilde W_k^T \sigma(\hat x_k)\bar u_{qk} - \tilde x_k^T \tilde x_k \\ &\le -(1-\gamma)\|\tilde x_k\|^2 + 3\|\tilde W_k\|^2 \|\sigma(\hat x_k)\bar u_{qk}\|^2 + 3\|\bar\varepsilon_{Ok}\|^2 \end{aligned} \tag{A.3}$$

where $\gamma = 3\|A_c\|^2$. Next, recalling (16), we have

$$\begin{aligned} \Delta L_{\tilde W,k} &= \mathrm{tr}\{\tilde W_{k+1}^T \Lambda \tilde W_{k+1}\} - \mathrm{tr}\{\tilde W_k^T \Lambda \tilde W_k\} \\ &\le -\big(1 - 2(1-\alpha_I)^2\big)\|\tilde W_k\|^2 + 6\beta_I^2 \|\sigma(\hat x_k)\bar u_{qk}\|^2 \|A_c\|^2 \|l\|^2 \|C\|^2 \|\tilde x_k\|^2 \\ &\quad - \big( 2\beta_I(1-\alpha_I)\|l\|\|C\| - \beta_I \sigma_{\min}^2 \big)\|\tilde W_k\|^2 \|\sigma(\hat x_k)\bar u_{qk}\|^2 + \varepsilon_{WM} \end{aligned} \tag{A.4}$$

where $\varepsilon_{WM} = 6\alpha_I^2 W_M^2 + 6\beta_I^2 \sigma_M^2 C_M^2 \bar\varepsilon_{OM}^2$, with $0 \le \|\sigma(x_k)\|^2 \le \sigma_M^2$ and $\|\bar\varepsilon_{Ok}\|^2 \le \bar\varepsilon_{OM}^2$. Therefore, the first difference of the total Lyapunov candidate, combining (A.3) and (A.4), is given as

$$\begin{aligned} \Delta L_{IO}(k) &= \Delta L_{\tilde x,k} + \Delta L_{\tilde W,k} \\ &\le -\Big( 1 - \big(1 + 4C_M^2(1+\sigma_{\min}^2)\big)\gamma \Big)\|\tilde x_k\|^2 - \|\tilde W_k\|^2 \|\sigma(\hat x_k)\bar u_{qk}\|^2 - \big(1 - 2(1-\alpha_I)^2\big)\|\tilde W_k\|^2 + \varepsilon_{OM} \end{aligned} \tag{A.5}$$

where $\varepsilon_{OM} = 3\|\bar\varepsilon_{Ok}\|^2 + \varepsilon_{WM}$. Using Lyapunov stability [25], $\Delta L_{IO}(k)$ is less than zero outside a compact set when $(2-\sqrt{2})/2 < \alpha_I < 1$, $0 < \beta_I < 2(1-\alpha_I)C_M/(\sigma_{\min}^2 + 1)$, and the following conditions hold:

$$\|\tilde x_k\| > \sqrt{\frac{\varepsilon_{OM}}{1 - \big(1 + 4C_M^2(1+\sigma_{\min}^2)\big)\gamma}} \equiv b_{\tilde x} \tag{A.6}$$


or

$$\|\tilde W_k\| > \sqrt{\frac{\varepsilon_{OM}}{\chi_{\min}^2 + \big(1 - 2(1-\alpha_I)^2\big)}} \equiv b_{\tilde W}. \tag{A.7}$$

Note that in (A.6), the denominator is guaranteed to be positive, i.e., $0 < \gamma < 1/\big(1 + 4C_M^2(1+\sigma_{\min}^2)\big)$ with $0 < \|C\| \le C_M$, provided the design parameters $A$ and $L$ are selected using pole placement [15] such that $\|A_c\| = \|A - LC\| \le \big(1/(1 + 4C_M^2(1+\sigma_{\min}^2))\big)^{1/2}$.

Proof of Theorem 2: First, for simplicity, denote $\hat\sigma_{Vk} = \sigma_V(\hat x_k, k)$, $\hat\sigma_{Vk+1} = \sigma_V(\hat x_{k+1}, k+1)$, $\varepsilon_{VBk} = \varepsilon_{VB}(x_k, k)$, and $\hat\sigma_{VN} = \sigma_V(\hat x_N, N)$. Consider the following Lyapunov candidate:

$$L_{\tilde W_V}(k) = L(\tilde W_{Vk}) + L(\tilde x_k) + L(\tilde W_k) \tag{A.8}$$

where $L(\tilde W_{Vk}) = \tilde W_{Vk}^T \tilde W_{Vk}$, $L(\tilde x_k) = \Xi(\tilde x_k^T \tilde x_k)^2$, $L(\tilde W_k) = \Xi\big(\mathrm{tr}\{\tilde W_k^T \tilde W_k\}\big)^2$, and $\Xi = \alpha_V(1+3\alpha_V)(L_Q + W_{VM} L_{\sigma V})^2/\big((1+\sigma_{\min}^2)(1-3\gamma^2)\big)$. Next, take each term in (A.8) individually. The first difference of $L(\tilde W_{Vk})$, recalling (28), is given by

$$\begin{aligned} \Delta L(\tilde W_{Vk}) ={}& \tilde W_{Vk+1}^T \tilde W_{Vk+1} - \tilde W_{Vk}^T \tilde W_{Vk} \\ ={}& \Big( \tilde W_{Vk} - \alpha_V \frac{(\hat\sigma_{Vk} + \hat\sigma_{Vk+1} + \sigma_{VM} B_l)\, e_{B,k}}{1 + \|\hat\sigma_{Vk} + \hat\sigma_{Vk+1} + \sigma_{VM} B_l\|^2} - \alpha_V \frac{\hat\sigma_{VN}\, e_{N,k}}{1 + \hat\sigma_{VN}^T \hat\sigma_{VN}} \Big)^T \\ &\times \Big( \tilde W_{Vk} - \alpha_V \frac{(\hat\sigma_{Vk} + \hat\sigma_{Vk+1} + \sigma_{VM} B_l)\, e_{B,k}}{1 + \|\hat\sigma_{Vk} + \hat\sigma_{Vk+1} + \sigma_{VM} B_l\|^2} - \alpha_V \frac{\hat\sigma_{VN}\, e_{N,k}}{1 + \hat\sigma_{VN}^T \hat\sigma_{VN}} \Big) - \tilde W_{Vk}^T \tilde W_{Vk} \\ ={}& -2\alpha_V \frac{\tilde W_{Vk}^T (\hat\sigma_{Vk} + \hat\sigma_{Vk+1} + \sigma_{VM} B_l)\, e_{B,k}}{1 + \|\hat\sigma_{Vk} + \hat\sigma_{Vk+1} + \sigma_{VM} B_l\|^2} - 2\alpha_V \frac{\tilde W_{Vk}^T \hat\sigma_{VN}\, e_{N,k}}{1 + \hat\sigma_{VN}^T \hat\sigma_{VN}} \\ &+ \alpha_V^2 \frac{e_{B,k}^2\, (\hat\sigma_{Vk} + \hat\sigma_{Vk+1} + \sigma_{VM} B_l)^T (\hat\sigma_{Vk} + \hat\sigma_{Vk+1} + \sigma_{VM} B_l)}{\big(1 + \|\hat\sigma_{Vk} + \hat\sigma_{Vk+1} + \sigma_{VM} B_l\|^2\big)^2} + \alpha_V^2 \frac{e_{N,k}^2\, \hat\sigma_{VN}^T \hat\sigma_{VN}}{\big(1 + \hat\sigma_{VN}^T \hat\sigma_{VN}\big)^2} \\ &+ 2\alpha_V^2 \frac{e_{B,k}\, e_{N,k}\, (\hat\sigma_{Vk} + \hat\sigma_{Vk+1} + \sigma_{VM} B_l)^T \hat\sigma_{VN}}{\big(1 + \|\hat\sigma_{Vk} + \hat\sigma_{Vk+1} + \sigma_{VM} B_l\|^2\big)\big(1 + \hat\sigma_{VN}^T \hat\sigma_{VN}\big)}. \end{aligned} \tag{A.9}$$

Recalling (26) and (27), the first difference of $L(\tilde W_{Vk})$ can be further bounded as

$$\begin{aligned} \Delta L(\tilde W_{Vk}) \le {}& -\frac{\alpha_V(1-6\alpha_V)}{2}\frac{\omega_{\min}^2}{1+\omega_{\min}^2}\|\tilde W_{Vk}\|^2 - \alpha_V(1-2\alpha_V)\frac{\sigma_{\min}^2}{1+\sigma_{\min}^2}\|\tilde W_{Vk}\|^2 \\ &+ \frac{\alpha_V(1+3\alpha_V)}{1+\omega_{\min}^2}(L_Q + W_{VM} L_{\sigma V})^2 \|\tilde x_k\|^4 + \varepsilon_{VTM} \end{aligned} \tag{A.10}$$

where $0 < \omega_{\min}^2 < \|\hat\sigma_{Vk} + \hat\sigma_{Vk+1} + \sigma_{VM} B_l\|^2$ and

$$\varepsilon_{VTM} = \alpha_V(1+3\alpha_V)\frac{\varepsilon_{VBk}^2}{1 + \|\hat\sigma_{Vk} + \hat\sigma_{Vk+1} + \sigma_{VM} B_l\|^2} + 2W_{VM}\sigma_{VM} + \alpha_V(1+2\alpha_V)\frac{\varepsilon_{VN}^2}{1 + \hat\sigma_{Vk}^T \hat\sigma_{Vk}}.$$

Next, consider $\Delta L(\tilde x_k)$. Recalling (A.3) and applying the Cauchy–Schwartz inequality, the first difference of $L(\tilde x_k)$ is given by

$$\begin{aligned} \Delta L(\tilde x_k) &= \big(\tilde x_{k+1}^T \tilde x_{k+1}\big)^2 - \big(\tilde x_k^T \tilde x_k\big)^2 \\ &\le \big[ -(1-\gamma)\|\tilde x_k\|^2 + 3\|\tilde W_k\|^2\|\sigma(\hat x_k)\bar u_{qk}\|^2 + 3\|\bar\varepsilon_{Ok}\|^2 \big]\big[ (1+\gamma)\|\tilde x_k\|^2 + 3\|\tilde W_k\|^2\|\sigma(\hat x_k)\bar u_{qk}\|^2 + 3\|\bar\varepsilon_{Ok}\|^2 \big] \\ &\le -(1-2\gamma^2)\|\tilde x_k\|^4 + 36\|\tilde W_k\|^4\|\sigma(\hat x_k)\bar u_{qk}\|^4 + 36\|\bar\varepsilon_{Ok}\|^4. \end{aligned} \tag{A.11}$$

Next, recalling (A.4), write the difference $\Delta L(\tilde W_k)$ as

$$\begin{aligned} \Delta L(\tilde W_k) &= \big(\mathrm{tr}\{\tilde W_{k+1}^T \tilde W_{k+1}\}\big)^2 - \big(\mathrm{tr}\{\tilde W_k^T \tilde W_k\}\big)^2 \\ &\le -\big(1 - 8(1-\alpha_I)^4\big)^2\|\tilde W_k\|^4 - 4\big(2(1-\alpha_I)^2 - 3\eta\big)\beta_I \eta^2 \|\sigma(\hat x_k)\bar u_{qk}\|^4\|\tilde W_k\|^4 \\ &\quad + 210\|A_c\|^4\|l\|^4\|C\|^4\|\tilde x_k\|^4 + 5\varepsilon_{WM}^2 \end{aligned} \tag{A.12}$$

where $\eta = 2(1-\alpha_I)C_M - \beta_I\|\sigma(\hat x_k)\bar u_{qk}\|^2$. Therefore, combining (A.11) and (A.12) yields

$$\begin{aligned} \Delta L(\tilde x_k) + \Delta L(\tilde W_k) \le -(1-3\gamma^2)\|\tilde x_k\|^4 - 4\|\sigma(\hat x_k)\bar u_{qk}\|^4\|\tilde W_k\|^4 - \big(1 - 8(1-\alpha_I)^4\big)^2\|\tilde W_k\|^4 + \varepsilon_4 \end{aligned} \tag{A.13}$$

where $\varepsilon_4 = 36\|\bar\varepsilon_{Ok}\|^4 + 5\varepsilon_{WM}^2$. Finally, combining (A.10) and (A.13) yields the first difference of the total Lyapunov candidate

$$\begin{aligned} \Delta L_{\tilde W_V}(k) ={}& \Delta L(\tilde W_{Vk}) + \Delta L(\tilde x_k) + \Delta L(\tilde W_k) \\ \le{}& -\frac{\alpha_V(1-6\alpha_V)}{2}\frac{\omega_{\min}^2}{1+\omega_{\min}^2}\|\tilde W_{Vk}\|^2 - \alpha_V(1-2\alpha_V)\frac{\sigma_{\min}^2}{1+\sigma_{\min}^2}\|\tilde W_{Vk}\|^2 \\ & - \frac{\alpha_V(1+3\alpha_V)}{1+\sigma_{\min}^2}(L_Q + W_{VM} L_{\sigma V})^2\|\tilde x_k\|^4 - 4\|\sigma(\hat x_k)\bar u_{qk}\|^4\|\tilde W_k\|^4 - \big(1 - 8(1-\alpha_I)^4\big)^2\|\tilde W_k\|^4 + \varepsilon_1 \end{aligned} \tag{A.14}$$

where $\varepsilon_1 = \varepsilon_{VTM} + \varepsilon_4$. Using standard Lyapunov stability analysis [26], $\Delta L_{\tilde W_V}$ is less than zero outside a compact set as long as $0 < \alpha_V < 1/6$ and the following conditions hold:

$$\|\tilde x_k\| > \sqrt[4]{\frac{\varepsilon_1}{\dfrac{\alpha_V(1+3\alpha_V)}{1+\omega_{\min}^2}(L_Q + W_{VM} L_{\sigma V})^2}} \equiv b_{\tilde x} \tag{A.15}$$

or

$$\|\tilde W_{Vk}\| > \sqrt{\frac{\varepsilon_1}{\dfrac{\alpha_V(1-6\alpha_V)}{2}\dfrac{\omega_{\min}^2}{1+\omega_{\min}^2} + \alpha_V(1-2\alpha_V)\dfrac{\sigma_{\min}^2}{1+\sigma_{\min}^2}}} \equiv b_{\tilde W_V} \tag{A.16}$$

or

$$\|\tilde W_k\| > \sqrt[4]{\frac{\varepsilon_1}{4\sigma_{\min}^4 + \big(1 - 8(1-\alpha_I)^4\big)^2}} \equiv b_{\tilde W}. \tag{A.17}$$

Proof of Theorem 3: Consider the following Lyapunov function candidate:

$$L(k) = L_x(k) + L_{IO}(k) + L_{\tilde W_V}(k) + \frac{2^b}{\sigma_{uM}^2}\big( L_{\tilde W_u}(k) + L_{e_u}(k) \big) \tag{A.18}$$

where $L_{IO}(k)$ and $L_{\tilde W_V}(k)$ are defined in (A.1) and (A.8), respectively, $L_{\tilde W_u}(k) = \|\tilde W_{uk}\|$, and $L_{e_u}(k) = e_{Mu,k}^2$, with $e_{Mu,k}$ the upper bound for the quantization error defined later. Moreover, the first term is defined as $L_x(k) = \Theta\|x_k\|$ with $\Theta = (\alpha_u/2\sigma_{uM})\big(\sigma_{u\min}^2/(1+\sigma_{u\min}^2)\big)$. Denote $\sigma_{uk} = \sigma_u(x_k, k)$, $\hat\sigma_{uk} = \sigma_u(\hat x_k, k)$, $\nabla\hat\sigma_{Vk+1} = \nabla\sigma_V(\hat x_{k+1}, k+1)$, $\bar\varepsilon_{uk} = \bar\varepsilon_u(x_k, k)$, $g_k^T = g^T(x_k)$, $\hat g_k^T = g^T(\hat x_k)$, and $\tilde g_k^T = \tilde g^T(\hat x_k)$ for simplicity. Then, the first difference of $L_{\tilde W_u}(k)$, recalling (34) and (35), is given by

$$\begin{aligned} \Delta L_{\tilde W_u}(k) ={}& \|\tilde W_{uk+1}\| - \|\tilde W_{uk}\| \\ \le{}& -\alpha_u \Theta \frac{\sigma_{u\min}^2}{1+\sigma_{u\min}^2}\|\tilde W_{uk}\| + \alpha_u\Big( 1 + \frac{5}{4\|\bar\varepsilon_{uk}\|} \Big)\nabla\sigma_{VM}^2\|\tilde W_{Vk}\|^2 \\ & + \alpha_u L_\phi^2 \sigma_g^2\Big( \frac{1}{4\|\bar\varepsilon_{uk}\|} + \frac{\lambda_{\max}(R^{-1})}{4(1+\sigma_{u\min}^2)} \Big)\|\tilde W_k\|^2 + \bar\varepsilon_{TM} \end{aligned} \tag{A.19}$$

where

$$\bar\varepsilon_{TM} = \alpha_u\Big( \frac{5 L_\phi^2 \lambda_{\max}(R^{-2}) g_M^2}{4(1+\sigma_{u\min}^2)} + \frac{L_\phi^2 \lambda_{\max}(R^{-2}) \nabla\sigma_{VM}^2 W_{VM}^2}{4(1+\sigma_{u\min}^2)} \Big) + \frac{\sigma_{uM}\,\bar\varepsilon_{uM}}{1+\sigma_{u\min}^2}$$

and $0 < \sigma_{u\min} < \|\hat\sigma_{uk}\| < \sigma_{uM}$. Next, consider $\Delta L_{e_u}(k)$. The control input is given as $u(\hat x_k, k) = \hat W_{uk}^T \hat\sigma_{uk} = W_u^T \hat\sigma_{uk} - \tilde W_{uk}^T \hat\sigma_{uk}$. Then, the quantization error bound is given by

$$e_{Mu,k} = \frac{\|u(\hat x_k, k)\|}{2^b} = \frac{\|W_u^T \hat\sigma_{uk} - \tilde W_{uk}^T \hat\sigma_{uk}\|}{2^b} \le \frac{W_{uM}\sigma_{uM}}{2^b} + \frac{\sigma_{uM}}{2^b}\|\tilde W_{uk}\|.$$

The first difference of $L_{e_u}(k)$ is given as

$$\begin{aligned} \Delta L_{e_u}(k) &= e_{Mu,k+1}^2 - e_{Mu,k}^2 \\ &= \Big( \frac{W_{uM}\sigma_{uM}}{2^b} + \frac{\sigma_{uM}}{2^b}\|\tilde W_{uk+1}\| \Big) - \Big( \frac{W_{uM}\sigma_{uM}}{2^b} + \frac{\sigma_{uM}}{2^b}\|\tilde W_{uk}\| \Big) \\ &= \frac{\sigma_{uM}}{2^b}\big( \|\tilde W_{uk+1}\| - \|\tilde W_{uk}\| \big). \end{aligned} \tag{A.20}$$

Recalling (A.19), we further have

$$\begin{aligned} \Delta L_{e_u}(k) &= \frac{\sigma_{uM}}{2^b}\big( \|\tilde W_{uk+1}\| - \|\tilde W_{uk}\| \big) \\ &\le -\alpha_u \Theta \frac{\sigma_{uM}}{2^b}\frac{\sigma_{u\min}^2}{1+\sigma_{u\min}^2}\|\tilde W_{uk}\| + \frac{\sigma_{uM}}{2^b}\bar\varepsilon_{TM} + \alpha_u \frac{\sigma_{uM}}{2^b}\Big( 1 + \frac{5}{4\|\bar\varepsilon_{uk}\|} \Big)\nabla\sigma_{VM}^2\|\tilde W_{Vk}\|^2 \\ &\quad + \alpha_u L_\phi^2 \sigma_g^2 \frac{\sigma_{uM}}{2^b}\Big( \frac{1}{4\|\bar\varepsilon_{uk}\|} + \frac{\lambda_{\max}(R^{-1})}{4(1+\sigma_{u\min}^2)} \Big)\|\tilde W_k\|^2. \end{aligned} \tag{A.21}$$

Combining (A.5), (A.14), (A.19), and (A.21) yields the first difference of the total Lyapunov candidate as

$$\begin{aligned} \Delta L(k) ={}& \Delta L_x(k) + \Delta L_{IO}(k) + \Delta L_{\tilde W_V}(k) + \frac{2^b}{\sigma_{uM}^2}\big( \Delta L_{\tilde W_u}(k) + \Delta L_{e_u}(k) \big) \\ \le{}& \Theta\|f(x_k) + g(x_k)u_k^*\| - \Theta\|x_k\| + \Theta g_M\big( \|W_u^T(\sigma_{uk} - \hat\sigma_{uk})\| + \|\tilde W_{uk}^T \hat\sigma_{uk}\| \big) \\ & - \Big( 1 - \big(1 + 4C_M^2(1+\sigma_{\min}^2)\big)\gamma \Big)\|\tilde x_k\|^2 - \|\tilde W_k\|^2\|\sigma(\hat x_k)\bar u_{qk}\|^2 - \frac{1}{2}\big(1 - 2(1-\alpha_I)^2\big)\|\tilde W_k\|^2 \\ & - \frac{\alpha_V(1-6\alpha_V)}{2}\frac{\omega_{\min}^2}{1+\omega_{\min}^2}\|\tilde W_{Vk}\|^2 - \frac{\alpha_V(1-2\alpha_V)}{2}\frac{\omega_{\min}^2}{1+\omega_{\min}^2}\|\tilde W_{Vk}\|^2 - \alpha_u\Theta\frac{\sigma_{u\min}^2}{1+\sigma_{u\min}^2}\|\tilde W_{uk}\| \\ & - \frac{\alpha_V(1+3\alpha_V)}{1+\omega_{\min}^2}(L_Q + W_{VM}L_{\sigma V})^2\|\tilde x_k\|^4 - 4\|\sigma(\hat x_k)\bar u_{qk}\|^4\|\tilde W_k\|^4 - \big(1 - 8(1-\alpha_I)^4\big)^2\|\tilde W_k\|^4 + \varepsilon_{CLM} \\ \le{}& -(1-\rho)\Theta\|x_k\| - \Big( 1 - \big(1 + 4C_M^2(1+\sigma_{\min}^2)\big)\gamma \Big)\|\tilde x_k\|^2 - \|\tilde W_k\|^2\|\sigma(\hat x_k)\bar u_{qk}\|^2 \\ & - \frac{1}{2}\big(1 - 2(1-\alpha_I)^2\big)\|\tilde W_k\|^2 - \frac{\alpha_V(1-6\alpha_V)}{2}\frac{\omega_{\min}^2}{1+\omega_{\min}^2}\|\tilde W_{Vk}\|^2 - \frac{\alpha_V(1-2\alpha_V)}{2}\frac{\omega_{\min}^2}{1+\omega_{\min}^2}\|\tilde W_{Vk}\|^2 \\ & - \alpha_u\Theta\frac{\sigma_{u\min}^2}{1+\sigma_{u\min}^2}\|\tilde W_{uk}\| - \frac{\alpha_V(1+3\alpha_V)}{1+\omega_{\min}^2}(L_Q + W_{VM}L_{\sigma V})^2\|\tilde x_k\|^4 \\ & - 4\|\sigma(\hat x_k)\bar u_{qk}\|^4\|\tilde W_k\|^4 - \big(1 - 8(1-\alpha_I)^4\big)^2\|\tilde W_k\|^4 + \varepsilon_{CLM} \end{aligned}$$

where $\varepsilon_{CLM} = \varepsilon_{OM} + \varepsilon_1 + \bar\varepsilon_{TM} + g_M W_{uM}\sigma_{uM}$. Using Lyapunov stability analysis [26], $\Delta L$ is less than zero outside a compact set as long as $(2-\sqrt{2})/2 < \alpha_I < 1$, $0 < \alpha_V < 1/6$, $0 < \alpha_u < 1$, and the following conditions hold:

$$\|x_k\| > \frac{2\varepsilon_{CLM}}{(1-\rho)\Theta} \equiv b_x \tag{A.22}$$

or

$$\|\tilde W_{Vk}\| > \sqrt{\frac{\varepsilon_{CLM}}{\dfrac{\alpha_V(1-6\alpha_V)}{2}\dfrac{\omega_{\min}^2}{1+\omega_{\min}^2} + \dfrac{\alpha_V(1-2\alpha_V)}{2}\dfrac{\omega_{\min}^2}{1+\omega_{\min}^2}}} \equiv b_{\tilde W_V} \tag{A.23}$$

or

$$\|\tilde x_k\| > \min\left\{ \sqrt{\frac{\varepsilon_{CLM}}{1 - \big(1 + 4C_M^2(1+\sigma_{\min}^2)\big)\gamma}},\ \sqrt[4]{\frac{\varepsilon_{CLM}}{\dfrac{\alpha_V(1+3\alpha_V)}{1+\omega_{\min}^2}(L_Q + W_{VM}L_{\sigma V})^2}} \right\} \equiv b_{\tilde x} \tag{A.24}$$

or

$$\|\tilde W_k\| > \min\left\{ \sqrt{\frac{\varepsilon_{CLM}}{\sigma_{\min}^2 + \frac{1}{2}\big(1 - 2(1-\alpha_I)^2\big)}},\ \sqrt[4]{\frac{\varepsilon_{CLM}}{4\sigma_{\min}^4 + \big(1 - 8(1-\alpha_I)^4\big)^2}} \right\} \equiv b_{\tilde W} \tag{A.25}$$

or

$$\|\tilde W_{uk}\| > \frac{\varepsilon_{CLM}}{\alpha_u \Theta \dfrac{\sigma_{u\min}^2}{1+\sigma_{u\min}^2}} \equiv b_{\tilde W_u} \tag{A.26}$$

where

$$\Theta = \min\left\{ \frac{\alpha_V(1-2\alpha_V)\,\sigma_{\min}^2\,\bar\varepsilon_{uM}}{\alpha_u(5 + 4\bar\varepsilon_{uM})\,\nabla\sigma_{VM}^2\,(\sigma_{\min}^2 + 2\bar\varepsilon_{uM})},\ \frac{\big(1 - 2(1-\alpha_I)^2\big)(1+\sigma_{\min}^2)\,\bar\varepsilon_{uM}}{2\alpha_u \sigma_g^2 L_\phi^2\big(1+\sigma_{\min}^2 + \bar\varepsilon_{uM}\lambda_{\max}(R^{-1})\big)} \right\}.$$

Finally, recalling (A.21), the difference between the ideal optimal control input and the proposed near-optimal control input is represented as

$$\begin{aligned} \|u^*(x_k, k) - \hat u(\hat x_k, k)\| &= \|W_u^T \sigma_u(x_k, k) + \varepsilon_u(x_k, k) - \hat W_{uk}^T \sigma_u(\hat x_k, k)\| \\ &= \|\tilde W_{uk}^T \sigma_u(\hat x_k, k) + W_u^T \tilde\sigma_u(x_k, \hat x_k, k) + \varepsilon_u(x_k, k)\| \\ &\le b_{\tilde W_u}\sigma_{uM} + W_{uM}\|\tilde\sigma_u(x_k, \hat x_k, k)\| + \varepsilon_{uM} \\ &\le b_{\tilde W_u}\sigma_{uM} + l_\sigma W_{uM}\|\tilde x_k\| + \varepsilon_{uM} \\ &\le b_{\tilde W_u}\sigma_{uM} + l_\sigma W_{uM} b_{\tilde x} + \varepsilon_{uM} \equiv \varepsilon_{uo} \end{aligned} \tag{A.27}$$

where $l_\sigma$ is the Lipschitz constant of $\sigma_u(\bullet)$, and $b_{\tilde W_u}$ and $b_{\tilde x}$ are given in (A.26) and (A.24).


REFERENCES

[1] H. J. Sussmann, E. D. Sontag, and Y. Yang, "A general result on the stabilization of linear systems using bounded controls," IEEE Trans. Autom. Control, vol. 39, no. 12, pp. 2411–2425, Dec. 1994.
[2] A. Saberi, Z. Lin, and A. R. Teel, "Control of linear systems with saturating actuators," IEEE Trans. Autom. Control, vol. 41, no. 3, pp. 368–378, Mar. 1996.
[3] S. E. Lyshevski, "Optimal control of nonlinear continuous-time systems: Design of bounded controllers via generalized nonquadratic functionals," in Proc. Amer. Control Conf., Philadelphia, PA, USA, Jun. 1998, pp. 205–209.
[4] D. Tse and P. Viswanath, Fundamentals of Wireless Communication. Cambridge, U.K.: Cambridge Univ. Press, 2005.
[5] L. Tan, Digital Signal Processing: Fundamentals and Applications. New York, NY, USA: Academic, 2007, ch. 2.
[6] D. F. Delchamps, "Stabilizing a linear system with quantized state feedback," IEEE Trans. Autom. Control, vol. 35, no. 8, pp. 916–924, Aug. 1990.
[7] R. W. Brockett and D. Liberzon, "Quantized feedback stabilization of linear systems," IEEE Trans. Autom. Control, vol. 45, no. 7, pp. 1279–1289, Jul. 2000.
[8] J. E. Slotine and W. Li, Applied Nonlinear Control. Englewood Cliffs, NJ, USA: Prentice-Hall, 1991.
[9] H. K. Khalil and L. Praly, "High-gain observers in nonlinear feedback control," Int. J. Robust Nonlinear Control, vol. 24, no. 6, pp. 993–1015, 2014.
[10] Q. Wei and D. Liu, "A novel iterative θ-adaptive dynamic programming for discrete-time nonlinear systems," IEEE Trans. Autom. Sci. Eng., vol. 11, no. 4, pp. 1176–1190, Oct. 2014.
[11] D. Liu and Q. Wei, "Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems," IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 3, pp. 621–634, Mar. 2014.
[12] D. Liu and Q. Wei, "Finite-approximation-error-based optimal control approach for discrete-time nonlinear systems," IEEE Trans. Cybern., vol. 43, no. 2, pp. 779–789, Apr. 2013.
[13] Q. Wei, F.-Y. Wang, D. Liu, and X. Yang, "Finite-approximation-error-based discrete-time iterative adaptive dynamic programming," IEEE Trans. Cybern., vol. 44, no. 12, pp. 2820–2833, Dec. 2014.
[14] Q. Wei, D. Liu, and G. Shi, "A novel dual iterative Q-learning method for optimal battery management in smart residential environments," IEEE Trans. Ind. Electron., vol. 62, no. 4, pp. 2509–2518, Oct. 2014.
[15] C. T. Chen, Linear System Theory and Design. New York, NY, USA: Oxford Univ. Press, 2012.
[16] M. Abu-Khalaf and F. L. Lewis, "Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach," Automatica, vol. 41, no. 5, pp. 779–791, 2005.
[17] H. Xu and S. Jagannathan, "Stochastic optimal controller design for uncertain nonlinear networked control system via neuro dynamic programming," IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 3, pp. 471–484, Mar. 2013.
[18] Q. Zhao, H. Xu, and S. Jagannathan, "Adaptive dynamic programming-based state quantized networked control system without value and/or policy iterations," in Proc. Int. Joint Conf. Neural Netw., Jun. 2012, pp. 1–7.
[19] Q. Wei and D. Liu, "Numerical adaptive learning control scheme for discrete-time non-linear systems," IET Control Theory Appl., vol. 7, no. 11, pp. 1472–1486, Jul. 2013.
[20] F. L. Lewis and V. L. Syrmos, Optimal Control, 2nd ed. New York, NY, USA: Wiley, 1995.
[21] Z. Chen and S. Jagannathan, "Generalized Hamilton–Jacobi–Bellman formulation-based neural network control of affine nonlinear discrete-time systems," IEEE Trans. Neural Netw., vol. 19, no. 1, pp. 90–106, Jan. 2008.
[22] T. Dierks and S. Jagannathan, "Online optimal control of affine nonlinear discrete-time systems with unknown internal dynamics by using time-based policy update," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 7, pp. 1118–1129, Jul. 2012.
[23] D. Liberzon, "Hybrid feedback stabilization of systems with quantized signals," Automatica, vol. 39, no. 9, pp. 1543–1554, 2003.
[24] Q. Wei and D. Liu, "Neural-network-based adaptive optimal tracking control scheme for discrete-time nonlinear systems with approximation errors," Neurocomputing, vol. 149, pp. 106–115, Feb. 2015.
[25] K. S. Narendra and K. Parthasarathy, "Identification and control of dynamical systems using neural networks," IEEE Trans. Neural Netw., vol. 1, no. 1, pp. 4–27, Mar. 1990.
[26] S. Jagannathan, Neural Network Control of Nonlinear Discrete-Time Systems. Boca Raton, FL, USA: CRC Press, 2006.
[27] A. Heydari and S. N. Balakrishnan, "Finite-horizon control-constrained nonlinear optimal control using single network adaptive critics," IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 1, pp. 145–157, Jan. 2013.

Hao Xu (M’12) was born in Nanjing, China, in 1984. He received the master’s degree in electrical engineering from Southeast University, Nanjing, in 2009, and the Ph.D. degree from the Missouri University of Science and Technology, Rolla, MO, USA, in 2012. He is currently with Texas A&M University–Corpus Christi, Corpus Christi, TX, USA, where he is an Assistant Professor with the School of Engineering and Computing Sciences and the Director of Unmanned Systems Research Laboratory. His current research interests include autonomous unmanned aircraft systems, wireless passive sensor network, localization, detection, networked control system, cyber-physical system, distributed network protocol design, optimal control, and adaptive control.

Qiming Zhao was born in Xi'an, China, in 1985. He received the master's degree from the Department of Automation, Northwestern Polytechnical University, Xi'an, China, in 2010, and the Ph.D. degree in electrical engineering from the Missouri University of Science and Technology, Rolla, MO, USA, in 2013. He is currently a Research and Development Engineer with DENSO International America, Inc., Southfield, MI, USA. His current research interests include approximate/adaptive dynamic programming, neural network-based control, optimal control, and adaptive control.

Sarangapani Jagannathan (SM'99) is currently with the Missouri University of Science and Technology, Rolla, MO, USA, where he is a Rutledge-Emerson Endowed Chair Professor of Electrical and Computer Engineering and the Site Director with the NSF Industry/University Cooperative Research Center on Intelligent Maintenance Systems. He has co-authored 129 peer-reviewed journal articles, most of them in IEEE TRANSACTIONS, and 235 refereed IEEE conference articles, several book chapters, and three books, and holds 20 U.S. patents. He has supervised to graduation around 18 doctoral and 29 M.S.-level students, and his funding is in excess of $14 million from various U.S. federal and industrial members. His current research interests include neural network control, adaptive event-triggered control, secure networked control systems, prognostics, and autonomous systems/robotics. Mr. Jagannathan is a fellow of the Institute of Measurement and Control, U.K. He has received many awards and has served on the organizing committees of several IEEE conferences. He is the IEEE CSS Technical Committee Chair on Intelligent Control. He was the Co-editor for the IET book series on control from 2010 to 2013 and serves as the Editor-in-Chief for Discrete Dynamics in Nature and Society and on many editorial boards.
