CONVERGENCE OF SPLIT-COMPLEX BACKPROPAGATION ALGORITHM WITH A MOMENTUM*

Huisheng Zhang†, Wei Wu‡

Abstract: This paper investigates a split-complex backpropagation algorithm with momentum (SCBPM) for complex-valued neural networks. Convergence results for SCBPM are proved under relaxed conditions and compared with the existing results. Monotonicity of the error function during the training iteration process is also guaranteed. Two numerical examples are given to support the theoretical findings.

Key words: Complex-valued neural networks, split-complex backpropagation algorithm, convergence

Received: November 18, 2008
Revised and accepted: January 10, 2011

Neural Network World 1/11, 75-90. ©ICS AS CR 2011

* This work is partially supported by the National Natural Science Foundation of China (10871220, 70971014) and by the Fundamental Research Funds for the Central Universities (2009QN075).
† Huisheng Zhang, Department of Mathematics, Dalian Maritime University, Dalian, China, 116026
‡ Wei Wu – Corresponding author. School of Mathematical Sciences, Dalian University of Technology, Dalian, China, 116024, E-mail: [email protected]

1. Introduction

In recent years great interest has been aroused in complex-valued neural networks (CVNNs) for their powerful capability in processing complex-valued signals [1, 2, 3]. CVNNs are extensions of real-valued neural networks [4, 5]. The fully complex backpropagation (BP) algorithm and the split-complex BP algorithm are the two types of complex backpropagation algorithms for training CVNNs. Unlike the fully complex BP algorithm [6, 7], the split-complex BP algorithm applies the activation function separately to the real part and the imaginary part [2, 4, 5, 8]. This splitting avoids the occurrence of singular points in the adaptive training process. This paper considers the split-complex BP algorithm.

Convergence properties are an important issue for the successful training of neural networks. The convergence of the BP algorithm for real-valued neural networks has been analyzed by many authors from different aspects [9, 10, 11]. By using the contraction mapping theorem, the convergences in the mean and in the mean square for recurrent neurons were obtained by Mandic and Goh [12, 13]. For recent convergence analyses of complex-valued perceptrons and of the BP algorithm for complex-valued neural networks, we refer the reader to [1, 14, 15].

It is well known that the learning process of the BP algorithm can be very slow. To speed up and stabilize the learning procedure, a momentum term is often added to the BP algorithm. The BP algorithm with a momentum (BPM for short) can be viewed as a memory gradient method in optimization theory [16]. Starting from a point close enough to a minimum point of the objective function, the memory gradient method converges under certain conditions [16, 17]. This local convergence result is also obtained in [18], where the convergence of BPM for a neural network is considered. However, these results cannot be applied to the more general case where the initial weights are chosen stochastically. Some convergence results for the BPM are given in [19]; these results are of global nature since they are valid for arbitrarily given initial values of the weights. Our contribution in this paper is to present some convergence results for the split-complex backpropagation algorithm with a momentum (SCBPM).
We borrow some ideas from [19], but we employ different proof techniques, resulting in a new learning rate restriction which is much more relaxed and easier to check than its counterpart in [19]. In fact, our approach can also be applied to the BPM in [19] to relax the learning rate restriction there. Two numerical examples are given to support our theoretical findings. We also mention a recent paper on split quaternion nonlinear adaptive filtering [20], and we expect to extend our results to quaternion networks in the future.

The rest of this paper is organized as follows. A CVNN model with one hidden layer and the SCBPM are described in the next section. Section 3 presents the main convergence theorem. Section 4 gives two numerical examples to support our theoretical findings. The proof of the theorem is presented in Section 5.

2. The Structure of the Network and the Learning Method

Fig. 1 shows the CVNN structure considered in this paper. It is a network with one hidden layer and one output neuron. Let the numbers of input and hidden neurons be $L$ and $M$, respectively. We use $(\cdot)^T$ as the vector transpose operator, and write $\mathbf{w}_m = \mathbf{w}_m^R + i\mathbf{w}_m^I = (w_{m1}, w_{m2}, \cdots, w_{mL})^T \in \mathbb{C}^L$ as the weight vector between the input neurons and the $m$-th hidden neuron, where $w_{ml} = w_{ml}^R + iw_{ml}^I$, $w_{ml}^R, w_{ml}^I \in \mathbb{R}$, $i = \sqrt{-1}$, $m = 1, \cdots, M$, and $l = 1, \cdots, L$. Similarly, write $\mathbf{w}_{M+1} = \mathbf{w}_{M+1}^R + i\mathbf{w}_{M+1}^I = (w_{M+1,1}, w_{M+1,2}, \cdots, w_{M+1,M})^T \in \mathbb{C}^M$ as the weight vector between the hidden neurons and the output neuron, where $w_{M+1,m} = w_{M+1,m}^R + iw_{M+1,m}^I$, $m = 1, \cdots, M$. For simplicity, all the weight vectors are incorporated into a total weight vector

$$\mathbf{W} = \big((\mathbf{w}_1)^T, (\mathbf{w}_2)^T, \cdots, (\mathbf{w}_{M+1})^T\big)^T \in \mathbb{C}^{ML+M}. \tag{1}$$

Fig. 1 Structure of a CVNN (input neurons $z_1, \cdots, z_L$, one hidden layer, and one output neuron $O$).

For an input signal $\mathbf{z} = (z_1, z_2, \cdots, z_L)^T = \mathbf{x} + i\mathbf{y} \in \mathbb{C}^L$, where $\mathbf{x} = (x_1, x_2, \cdots, x_L)^T \in \mathbb{R}^L$ and $\mathbf{y} = (y_1, y_2, \cdots, y_L)^T \in \mathbb{R}^L$, the input to the $m$-th hidden neuron is

$$U_m = U_m^R + iU_m^I = \sum_{l=1}^{L}\big(w_{ml}^R x_l - w_{ml}^I y_l\big) + i\sum_{l=1}^{L}\big(w_{ml}^I x_l + w_{ml}^R y_l\big) = \mathbf{x}^T\mathbf{w}_m^R - \mathbf{y}^T\mathbf{w}_m^I + i\big(\mathbf{x}^T\mathbf{w}_m^I + \mathbf{y}^T\mathbf{w}_m^R\big). \tag{2}$$

In order to apply the SCBPM to train a CVNN, we consider the following popular real-imaginary-type activation function [22]

$$f_{\mathbb{C}}(U) = f(U^R) + if(U^I) \tag{3}$$

for any $U = U^R + iU^I \in \mathbb{C}$, where $f$ is a given real function (e.g., a sigmoid function). The output $H_m$ of the hidden neuron $m$ is given by

$$H_m = H_m^R + iH_m^I = f(U_m^R) + if(U_m^I). \tag{4}$$

Similarly, the input to the output neuron is

$$S = S^R + iS^I = \sum_{m=1}^{M}\big(w_{M+1,m}^R H_m^R - w_{M+1,m}^I H_m^I\big) + i\sum_{m=1}^{M}\big(w_{M+1,m}^I H_m^R + w_{M+1,m}^R H_m^I\big) = (\mathbf{H}^R)^T\mathbf{w}_{M+1}^R - (\mathbf{H}^I)^T\mathbf{w}_{M+1}^I + i\big((\mathbf{H}^R)^T\mathbf{w}_{M+1}^I + (\mathbf{H}^I)^T\mathbf{w}_{M+1}^R\big) \tag{5}$$

and the output of the network is given by

$$O = O^R + iO^I = g(S^R) + ig(S^I), \tag{6}$$

where $\mathbf{H}^R = (H_1^R, H_2^R, \cdots, H_M^R)^T$, $\mathbf{H}^I = (H_1^I, H_2^I, \cdots, H_M^I)^T$, and $g$ is a given real function. A bias weight can be added to each neuron, but it is omitted for simplicity of presentation and derivation. Let the network be supplied with a given set of training examples $\{\mathbf{z}^j, d^j\}_{j=1}^{J} \subset \mathbb{C}^L \times \mathbb{C}$.
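To make the forward computation (2)–(6) concrete, the following minimal sketch evaluates one split-complex forward pass in NumPy. It is our own illustration, not the authors' code: the names (forward, WR, WI, wR_out, wI_out) are assumed for this sketch, and np.tanh merely stands in for the generic real functions f and g.

```python
import numpy as np

def forward(x, y, WR, WI, wR_out, wI_out, f=np.tanh, g=np.tanh):
    """Split-complex forward pass for one input z = x + i*y.

    WR, WI         : (M, L) real and imaginary parts of the hidden weights w_m
    wR_out, wI_out : (M,)   real and imaginary parts of w_{M+1}
    Returns the network output O = O^R + i*O^I and the hidden outputs H^R, H^I.
    """
    # Hidden-layer net input, Eq. (2): U_m = U_m^R + i*U_m^I
    UR = WR @ x - WI @ y
    UI = WI @ x + WR @ y
    # Split activation, Eqs. (3)-(4): H_m = f(U_m^R) + i*f(U_m^I)
    HR, HI = f(UR), f(UI)
    # Output-neuron net input, Eq. (5)
    SR = HR @ wR_out - HI @ wI_out
    SI = HR @ wI_out + HI @ wR_out
    # Network output, Eq. (6): O = g(S^R) + i*g(S^I)
    return g(SR) + 1j * g(SI), HR, HI

# Example with L = 2 inputs and M = 3 hidden neurons, weights in [-0.5, 0.5]
rng = np.random.default_rng(0)
L, M = 2, 3
WR, WI = rng.uniform(-0.5, 0.5, (M, L)), rng.uniform(-0.5, 0.5, (M, L))
wR_out, wI_out = rng.uniform(-0.5, 0.5, M), rng.uniform(-0.5, 0.5, M)
z = np.array([1.0 + 1.0j, -1.0 + 0.5j])
O, HR, HI = forward(z.real, z.imag, WR, WI, wR_out, wI_out)
```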
For each input $\mathbf{z}^j = \mathbf{x}^j + i\mathbf{y}^j$ $(1 \le j \le J)$ from the training set, we write $U_m^j = U_m^{j,R} + iU_m^{j,I}$ $(1 \le m \le M)$ for the input to the hidden neuron $m$, $H_m^j = H_m^{j,R} + iH_m^{j,I}$ $(1 \le m \le M)$ for the output of the hidden neuron $m$, $S^j = S^{j,R} + iS^{j,I}$ for the input to the output neuron, and $O^j = O^{j,R} + iO^{j,I}$ for the final output. The square error function of the CVNN trained by the SCBPM can be represented as follows:

$$E(\mathbf{W}) = \frac{1}{2}\sum_{j=1}^{J}(O^j - d^j)(O^j - d^j)^* = \frac{1}{2}\sum_{j=1}^{J}\big[(O^{j,R} - d^{j,R})^2 + (O^{j,I} - d^{j,I})^2\big] = \sum_{j=1}^{J}\big[g_{jR}(S^{j,R}) + g_{jI}(S^{j,I})\big], \tag{7}$$

where $(\cdot)^*$ denotes the complex conjugate operator, $d^{j,R}$ and $d^{j,I}$ are the real and imaginary parts of the desired output $d^j$, respectively, and

$$g_{jR}(t) = \frac{1}{2}\big(g(t) - d^{j,R}\big)^2, \quad g_{jI}(t) = \frac{1}{2}\big(g(t) - d^{j,I}\big)^2, \quad t \in \mathbb{R}, \ 1 \le j \le J. \tag{8}$$

The purpose of the network training is to find $\mathbf{W}^F$ that minimizes $E(\mathbf{W})$. By writing

$$\mathbf{H}^j = \mathbf{H}^{j,R} + i\mathbf{H}^{j,I} = (H_1^{j,R}, H_2^{j,R}, \cdots, H_M^{j,R})^T + i(H_1^{j,I}, H_2^{j,I}, \cdots, H_M^{j,I})^T, \tag{9}$$

we get the gradients of $E(\mathbf{W})$ with respect to $\mathbf{w}_m^R$ and $\mathbf{w}_m^I$, respectively, as follows:

$$\frac{\partial E(\mathbf{W})}{\partial \mathbf{w}_{M+1}^R} = \sum_{j=1}^{J}\big[g_{jR}'(S^{j,R})\mathbf{H}^{j,R} + g_{jI}'(S^{j,I})\mathbf{H}^{j,I}\big], \tag{10}$$

$$\frac{\partial E(\mathbf{W})}{\partial \mathbf{w}_{M+1}^I} = \sum_{j=1}^{J}\big[-g_{jR}'(S^{j,R})\mathbf{H}^{j,I} + g_{jI}'(S^{j,I})\mathbf{H}^{j,R}\big], \tag{11}$$

$$\frac{\partial E(\mathbf{W})}{\partial \mathbf{w}_m^R} = \sum_{j=1}^{J}\Big[g_{jR}'(S^{j,R})\big(w_{M+1,m}^R f'(U_m^{j,R})\mathbf{x}^j - w_{M+1,m}^I f'(U_m^{j,I})\mathbf{y}^j\big) + g_{jI}'(S^{j,I})\big(w_{M+1,m}^I f'(U_m^{j,R})\mathbf{x}^j + w_{M+1,m}^R f'(U_m^{j,I})\mathbf{y}^j\big)\Big], \quad 1 \le m \le M, \tag{12}$$

$$\frac{\partial E(\mathbf{W})}{\partial \mathbf{w}_m^I} = \sum_{j=1}^{J}\Big[g_{jR}'(S^{j,R})\big(-w_{M+1,m}^R f'(U_m^{j,R})\mathbf{y}^j - w_{M+1,m}^I f'(U_m^{j,I})\mathbf{x}^j\big) + g_{jI}'(S^{j,I})\big(-w_{M+1,m}^I f'(U_m^{j,R})\mathbf{y}^j + w_{M+1,m}^R f'(U_m^{j,I})\mathbf{x}^j\big)\Big], \quad 1 \le m \le M. \tag{13}$$

Write $\mathbf{W}^n = \big((\mathbf{w}_1^n)^T, (\mathbf{w}_2^n)^T, \cdots, (\mathbf{w}_{M+1}^n)^T\big)^T$ $(n = 0, 1, \cdots)$. Let $\mathbf{w}_m^0 = \mathbf{w}_m^{0,R} + i\mathbf{w}_m^{0,I}$ be arbitrarily chosen initial weights, and let $\Delta\mathbf{w}_m^{0,R} = \Delta\mathbf{w}_m^{0,I} = \mathbf{0}$. Then the SCBPM algorithm updates the real part $\mathbf{w}_m^R$ and the imaginary part $\mathbf{w}_m^I$ of the weights $\mathbf{w}_m$ separately:

$$\Delta\mathbf{w}_m^{n+1,R} = \mathbf{w}_m^{n+1,R} - \mathbf{w}_m^{n,R} = -\eta\frac{\partial E(\mathbf{W}^n)}{\partial \mathbf{w}_m^R} + \tau_m^{n,R}\Delta\mathbf{w}_m^{n,R}, \qquad \Delta\mathbf{w}_m^{n+1,I} = \mathbf{w}_m^{n+1,I} - \mathbf{w}_m^{n,I} = -\eta\frac{\partial E(\mathbf{W}^n)}{\partial \mathbf{w}_m^I} + \tau_m^{n,I}\Delta\mathbf{w}_m^{n,I}, \tag{14}$$

where $\eta \in (0,1)$ is the learning rate, $\tau_m^{n,R}$ and $\tau_m^{n,I}$ are the momentum factors, $m = 1, 2, \cdots, M+1$, and $n = 0, 1, \cdots$. For simplicity, let us denote

$$\mathbf{p}_m^{n,R} = \frac{\partial E(\mathbf{W}^n)}{\partial \mathbf{w}_m^R}, \qquad \mathbf{p}_m^{n,I} = \frac{\partial E(\mathbf{W}^n)}{\partial \mathbf{w}_m^I}. \tag{15}$$

Then (14) can be rewritten as

$$\Delta\mathbf{w}_m^{n+1,R} = \tau_m^{n,R}\Delta\mathbf{w}_m^{n,R} - \eta\mathbf{p}_m^{n,R}, \qquad \Delta\mathbf{w}_m^{n+1,I} = \tau_m^{n,I}\Delta\mathbf{w}_m^{n,I} - \eta\mathbf{p}_m^{n,I}. \tag{16}$$

Similarly to the BPM [19], we choose the adaptive momentum factors $\tau_m^{n,R}$ and $\tau_m^{n,I}$ as follows:

$$\tau_m^{n,R} = \begin{cases} \dfrac{\tau\|\mathbf{p}_m^{n,R}\|}{\|\Delta\mathbf{w}_m^{n,R}\|}, & \text{if } \|\Delta\mathbf{w}_m^{n,R}\| \ne 0, \\[2mm] 0, & \text{otherwise}, \end{cases} \tag{17}$$

$$\tau_m^{n,I} = \begin{cases} \dfrac{\tau\|\mathbf{p}_m^{n,I}\|}{\|\Delta\mathbf{w}_m^{n,I}\|}, & \text{if } \|\Delta\mathbf{w}_m^{n,I}\| \ne 0, \\[2mm] 0, & \text{otherwise}, \end{cases} \tag{18}$$

where $\tau \in (0,1)$ is a constant parameter and $\|\cdot\|$ is the usual Euclidean norm.
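As a minimal sketch of the update rule (16) with the adaptive momentum factors (17)–(18), the step below can be applied to the stacked real and imaginary weight blocks. It is a hedged illustration, not the authors' implementation: it assumes the gradient blocks $\mathbf{p}_m^{n,R}$, $\mathbf{p}_m^{n,I}$ of (10)–(13) have already been computed, and the names scbpm_step, weights, deltas and grads are ours, not the paper's.

```python
import numpy as np

def scbpm_step(weights, deltas, grads, eta=0.1, tau=0.01):
    """One SCBPM update, Eqs. (16)-(18).

    weights, deltas, grads: lists over the blocks
    w_1^R, ..., w_{M+1}^R, w_1^I, ..., w_{M+1}^I (NumPy arrays).
    deltas holds the previous increments Delta w^n (all zeros at n = 0).
    Returns the updated weights and the new increments Delta w^{n+1}.
    """
    new_weights, new_deltas = [], []
    for w, dw, p in zip(weights, deltas, grads):
        norm_dw = np.linalg.norm(dw)
        # Adaptive momentum factor, Eqs. (17)-(18)
        tau_m = tau * np.linalg.norm(p) / norm_dw if norm_dw != 0 else 0.0
        # Increment, Eq. (16): Delta w^{n+1} = tau_m * Delta w^n - eta * p^n
        dw_new = tau_m * dw - eta * p
        new_weights.append(w + dw_new)
        new_deltas.append(dw_new)
    return new_weights, new_deltas
```

Iterating this step from zero initial increments reproduces the recursion (14); by construction the momentum term never exceeds $\tau\|\mathbf{p}_m^n\|$ in norm, which is the property exploited in the proofs below.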
3. Main Results

The following assumptions will be used in our discussion.

(A1): There exists a constant $C_1 > 0$ such that $\max_{t\in\mathbb{R}}\{|f(t)|, |g(t)|, |f'(t)|, |g'(t)|, |f''(t)|, |g''(t)|\} \le C_1$.

(A2): There exists a constant $C_2 > 0$ such that $\|\mathbf{w}_{M+1}^{n,R}\| \le C_2$ and $\|\mathbf{w}_{M+1}^{n,I}\| \le C_2$ for all $n = 0, 1, 2, \cdots$.

(A3): The set $\Phi_0 = \big\{\mathbf{W} \,\big|\, \frac{\partial E(\mathbf{W})}{\partial \mathbf{w}_m^R} = \mathbf{0},\ \frac{\partial E(\mathbf{W})}{\partial \mathbf{w}_m^I} = \mathbf{0},\ m = 1, \cdots, M+1\big\}$ contains only finitely many points.

Theorem 1. Suppose that Assumptions (A1) and (A2) are valid, and that $\{\mathbf{w}_m^n\}$ are the weight vector sequences generated by (14). Then there exists a constant $C^F > 0$ such that for $0 < s < 1$, $\tau = s\eta$ and $\eta \le \frac{1-s}{C^F(1+s)^2}$, the following results hold:

(i) $E(\mathbf{W}^{n+1}) \le E(\mathbf{W}^n)$, $n = 0, 1, 2, \cdots$;

(ii) There is $E^F \ge 0$ such that $\lim_{n\to\infty} E(\mathbf{W}^n) = E^F$;

(iii) $\lim_{n\to\infty}\big\|\frac{\partial E(\mathbf{W}^n)}{\partial \mathbf{w}_m^R}\big\| = 0$ and $\lim_{n\to\infty}\big\|\frac{\partial E(\mathbf{W}^n)}{\partial \mathbf{w}_m^I}\big\| = 0$, $1 \le m \le M+1$.

Furthermore, if Assumption (A3) also holds, then there exists a point $\mathbf{W}^F \in \Phi_0$ such that

(iv) $\lim_{n\to\infty}\mathbf{W}^n = \mathbf{W}^F$.

The monotonicity and convergence of the error function $E(\mathbf{W})$ during the learning process are shown in Conclusions (i) and (ii), respectively. Conclusion (iii) indicates the convergence of $\frac{\partial E(\mathbf{W}^n)}{\partial \mathbf{w}_m^R}$ and $\frac{\partial E(\mathbf{W}^n)}{\partial \mathbf{w}_m^I}$ to zero, referred to as weak convergence. The strong convergence of $\mathbf{W}^n$ is given in Conclusion (iv). We note that the restriction $\eta \le \frac{1-s}{C^F(1+s)^2}$ in Theorem 1 is less restrictive and easier to check than the corresponding condition in [19]. We also mention that our results are of a deterministic nature, in contrast with the related work in [1], where the convergences in the mean and in the mean square for complex-valued perceptrons are obtained.

4. Numerical Examples

In the following subsections we illustrate the convergence behavior of the SCBPM with two numerical examples. In both examples, we set the transfer function to be tansig(·) in MATLAB, which is a commonly used sigmoid function, and carry out 10 independent tests with the initial components of the weights stochastically chosen in [-0.5, 0.5]. The average of the errors and the average of the gradient norms over all the tests in each example are plotted.

4.1 XOR problem

The well-known XOR problem is a benchmark used in the literature on neural networks. As in [22], the training samples of the encoded XOR problem for a CVNN are given as follows: $\{z_1 = -1-i,\ d_1 = 1\}$, $\{z_2 = -1+i,\ d_2 = 0\}$, $\{z_3 = 1-i,\ d_3 = 1+i\}$, $\{z_4 = 1+i,\ d_4 = i\}$. This example uses a network with one input node, three hidden nodes, and one output node. The learning rate $\eta$ and the momentum parameter $\tau$ are set to 0.1 and 0.01, respectively. The simulation results are presented in Fig. 2, which shows that the gradient tends to zero and the square error decreases monotonically as the number of iterations increases, finally tending to a constant. This supports our theoretical findings.

Fig. 2 Convergence behavior of SCBPM for solving the XOR problem (norm of gradient $= \sum_{m=1}^{M+1}(\|\mathbf{p}_m^{n,R}\|^2 + \|\mathbf{p}_m^{n,I}\|^2)$).

4.2 Approximation problem

In this example, the synthetic complex-valued function [23] defined as

$$h(\mathbf{z}) = (z_1)^2 + (z_2)^2 \tag{19}$$

is approximated, where $\mathbf{z}$ is a two-dimensional complex-valued vector comprised of $z_1$ and $z_2$. 10000 input points are selected from an evenly spaced $10 \times 10 \times 10 \times 10$ grid on $-0.5 \le \mathrm{Re}(z_1), \mathrm{Im}(z_1), \mathrm{Re}(z_2), \mathrm{Im}(z_2) \le 0.5$, where $\mathrm{Re}(\cdot)$ and $\mathrm{Im}(\cdot)$ denote the real and imaginary parts of a complex number, respectively. We use a network with 2 input neurons, 25 hidden neurons and 1 output neuron. The learning rate $\eta$ and the momentum parameter $\tau$ are set to 0.1 and 0.01, respectively. Fig. 3 illustrates the simulation results, which also support our convergence theorem.

Fig. 3 Convergence behavior of SCBPM for solving the approximation problem (norm of gradient $= \sum_{m=1}^{M+1}(\|\mathbf{p}_m^{n,R}\|^2 + \|\mathbf{p}_m^{n,I}\|^2)$).
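For readers who wish to reproduce the two experiments, the training data can be generated as in the sketch below. This is our own reconstruction of the stated setups (variable names are not from the paper); the training loop itself would combine the forward pass and the SCBPM step sketched in Section 2.

```python
import numpy as np

# XOR problem (Section 4.1): four complex-valued training pairs
xor_inputs  = np.array([-1 - 1j, -1 + 1j, 1 - 1j, 1 + 1j])
xor_targets = np.array([1 + 0j, 0 + 0j, 1 + 1j, 0 + 1j])

# Approximation problem (Section 4.2): h(z) = z1^2 + z2^2 sampled on an
# evenly spaced 10 x 10 x 10 x 10 grid over [-0.5, 0.5] in Re/Im of z1, z2
ticks = np.linspace(-0.5, 0.5, 10)
re1, im1, re2, im2 = np.meshgrid(ticks, ticks, ticks, ticks, indexing="ij")
z1 = (re1 + 1j * im1).ravel()
z2 = (re2 + 1j * im2).ravel()
approx_inputs  = np.stack([z1, z2], axis=1)   # 10000 two-dimensional complex inputs
approx_targets = z1**2 + z2**2                # desired outputs h(z)
```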
5. Proofs

In this section, we first present two lemmas, and then use them to prove the main theorem.

Lemma 1. Suppose that $E : \mathbb{R}^{2ML+2M} \to \mathbb{R}$ is continuous and differentiable on a compact set $\Phi \subset \mathbb{R}^{2ML+2M}$, and that $\Phi_1 = \{\mathbf{v} \mid \frac{\partial E(\mathbf{v})}{\partial \mathbf{v}} = \mathbf{0}\}$ contains only finitely many points. If a sequence $\{\mathbf{v}^n\}_{n=1}^{\infty} \subset \Phi$ satisfies

$$\lim_{n\to\infty}\|\mathbf{v}^{n+1} - \mathbf{v}^n\| = 0, \qquad \lim_{n\to\infty}\|\nabla E(\mathbf{v}^n)\| = 0,$$

then there exists a point $\mathbf{v}^F \in \Phi_1$ such that $\lim_{n\to\infty}\mathbf{v}^n = \mathbf{v}^F$.

Proof. This result is almost the same as Theorem 14.1.5 in [21], and the details of the proof are omitted. □

For any $1 \le j \le J$, $1 \le m \le M$ and $n = 0, 1, 2, \cdots$, the following symbols will be used in our proof later on:

$$U_m^{n,j} = U_m^{n,j,R} + iU_m^{n,j,I} = (\mathbf{x}^j)^T\mathbf{w}_m^{n,R} - (\mathbf{y}^j)^T\mathbf{w}_m^{n,I} + i\big((\mathbf{x}^j)^T\mathbf{w}_m^{n,I} + (\mathbf{y}^j)^T\mathbf{w}_m^{n,R}\big),$$
$$H_m^{n,j} = H_m^{n,j,R} + iH_m^{n,j,I} = f(U_m^{n,j,R}) + if(U_m^{n,j,I}),$$
$$\mathbf{H}^{n,j,R} = (H_1^{n,j,R}, \cdots, H_M^{n,j,R})^T, \qquad \mathbf{H}^{n,j,I} = (H_1^{n,j,I}, \cdots, H_M^{n,j,I})^T,$$
$$S^{n,j} = S^{n,j,R} + iS^{n,j,I} = (\mathbf{H}^{n,j,R})^T\mathbf{w}_{M+1}^{n,R} - (\mathbf{H}^{n,j,I})^T\mathbf{w}_{M+1}^{n,I} + i\big((\mathbf{H}^{n,j,R})^T\mathbf{w}_{M+1}^{n,I} + (\mathbf{H}^{n,j,I})^T\mathbf{w}_{M+1}^{n,R}\big),$$
$$\boldsymbol{\psi}^{n,j,R} = \mathbf{H}^{n+1,j,R} - \mathbf{H}^{n,j,R}, \qquad \boldsymbol{\psi}^{n,j,I} = \mathbf{H}^{n+1,j,I} - \mathbf{H}^{n,j,I}. \tag{20}$$

Lemma 2. Suppose that Assumptions (A1) and (A2) hold. Then for any $1 \le j \le J$ and $n = 0, 1, 2, \cdots$, we have

$$|O^{j,R}| \le C_0, \quad |O^{j,I}| \le C_0, \quad \|\mathbf{H}^{n,j,R}\| \le C_0, \quad \|\mathbf{H}^{n,j,I}\| \le C_0, \tag{21}$$

$$|g_{jR}'(t)| \le C_3, \quad |g_{jI}'(t)| \le C_3, \quad |g_{jR}''(t)| \le C_3, \quad |g_{jI}''(t)| \le C_3, \quad t \in \mathbb{R}, \tag{22}$$

$$\max\{\|\boldsymbol{\psi}^{n,j,R}\|^2, \|\boldsymbol{\psi}^{n,j,I}\|^2\} \le C_4(\tau+\eta)^2\sum_{m=1}^{M}\big(\|\mathbf{p}_m^{n,R}\|^2 + \|\mathbf{p}_m^{n,I}\|^2\big), \tag{23}$$

$$\sum_{j=1}^{J}\Big(g_{jR}'(S^{n,j,R})\big((\mathbf{H}^{n,j,R})^T\Delta\mathbf{w}_{M+1}^{n+1,R} - (\mathbf{H}^{n,j,I})^T\Delta\mathbf{w}_{M+1}^{n+1,I}\big) + g_{jI}'(S^{n,j,I})\big((\mathbf{H}^{n,j,R})^T\Delta\mathbf{w}_{M+1}^{n+1,I} + (\mathbf{H}^{n,j,I})^T\Delta\mathbf{w}_{M+1}^{n+1,R}\big)\Big) \le (\tau-\eta)\big(\|\mathbf{p}_{M+1}^{n,R}\|^2 + \|\mathbf{p}_{M+1}^{n,I}\|^2\big), \tag{24}$$

$$\sum_{j=1}^{J}\Big(g_{jR}'(S^{n,j,R})\big((\boldsymbol{\psi}^{n,j,R})^T\mathbf{w}_{M+1}^{n,R} - (\boldsymbol{\psi}^{n,j,I})^T\mathbf{w}_{M+1}^{n,I}\big) + g_{jI}'(S^{n,j,I})\big((\boldsymbol{\psi}^{n,j,R})^T\mathbf{w}_{M+1}^{n,I} + (\boldsymbol{\psi}^{n,j,I})^T\mathbf{w}_{M+1}^{n,R}\big)\Big) \le \big(\tau-\eta+C_5(\tau+\eta)^2\big)\sum_{m=1}^{M}\big(\|\mathbf{p}_m^{n,R}\|^2 + \|\mathbf{p}_m^{n,I}\|^2\big), \tag{25}$$

$$\sum_{j=1}^{J}\Big(g_{jR}'(S^{n,j,R})\big((\boldsymbol{\psi}^{n,j,R})^T\Delta\mathbf{w}_{M+1}^{n+1,R} - (\boldsymbol{\psi}^{n,j,I})^T\Delta\mathbf{w}_{M+1}^{n+1,I}\big) + g_{jI}'(S^{n,j,I})\big((\boldsymbol{\psi}^{n,j,R})^T\Delta\mathbf{w}_{M+1}^{n+1,I} + (\boldsymbol{\psi}^{n,j,I})^T\Delta\mathbf{w}_{M+1}^{n+1,R}\big)\Big) \le C_6(\tau+\eta)^2\sum_{m=1}^{M+1}\big(\|\mathbf{p}_m^{n,R}\|^2 + \|\mathbf{p}_m^{n,I}\|^2\big), \tag{26}$$

$$\frac{1}{2}\sum_{j=1}^{J}\Big(g_{jR}''(t_1^{n,j})\big(S^{n+1,j,R} - S^{n,j,R}\big)^2 + g_{jI}''(t_2^{n,j})\big(S^{n+1,j,I} - S^{n,j,I}\big)^2\Big) \le C_7(\tau+\eta)^2\sum_{m=1}^{M+1}\big(\|\mathbf{p}_m^{n,R}\|^2 + \|\mathbf{p}_m^{n,I}\|^2\big), \tag{27}$$

where $C_i$ $(i = 0, 3, \cdots, 7)$ are constants independent of $n$ and $j$, each $t_1^{n,j} \in \mathbb{R}$ lies on the segment between $S^{n+1,j,R}$ and $S^{n,j,R}$, and each $t_2^{n,j} \in \mathbb{R}$ lies on the segment between $S^{n+1,j,I}$ and $S^{n,j,I}$.

Proof. The validity of (21) results easily from (4)–(6) when the set of samples is fixed and Assumptions (A1) and (A2) are satisfied. By (8), we have

$$g_{jR}'(t) = g'(t)\big(g(t) - d^{j,R}\big), \qquad g_{jI}'(t) = g'(t)\big(g(t) - d^{j,I}\big),$$
$$g_{jR}''(t) = g''(t)\big(g(t) - d^{j,R}\big) + \big(g'(t)\big)^2, \qquad g_{jI}''(t) = g''(t)\big(g(t) - d^{j,I}\big) + \big(g'(t)\big)^2, \quad 1 \le j \le J, \ t \in \mathbb{R}.$$

Then (22) follows directly from Assumption (A1) by defining $C_3 = C_1(C_1 + C_0) + (C_1)^2$. By (16), (17) and (18), for $m = 1, \cdots, M+1$,

$$\|\Delta\mathbf{w}_m^{n+1,R}\| = \|\tau_m^{n,R}\Delta\mathbf{w}_m^{n,R} - \eta\mathbf{p}_m^{n,R}\| \le \tau_m^{n,R}\|\Delta\mathbf{w}_m^{n,R}\| + \eta\|\mathbf{p}_m^{n,R}\| \le (\tau+\eta)\|\mathbf{p}_m^{n,R}\|. \tag{28}$$

Similarly, we have

$$\|\Delta\mathbf{w}_m^{n+1,I}\| \le (\tau+\eta)\|\mathbf{p}_m^{n,I}\|. \tag{29}$$
If Assumption (A1) is valid, it follows from (20), (28), (29), the Cauchy-Schwarz inequality and the mean value theorem for multivariate functions that for any $1 \le j \le J$ and $n = 0, 1, 2, \cdots$

$$\|\boldsymbol{\psi}^{n,j,R}\|^2 = \|\mathbf{H}^{n+1,j,R} - \mathbf{H}^{n,j,R}\|^2 = \left\|\begin{pmatrix} f(U_1^{n+1,j,R}) - f(U_1^{n,j,R}) \\ \vdots \\ f(U_M^{n+1,j,R}) - f(U_M^{n,j,R}) \end{pmatrix}\right\|^2 = \left\|\begin{pmatrix} f'(s_1^{n,j})\big((\mathbf{x}^j)^T\Delta\mathbf{w}_1^{n+1,R} - (\mathbf{y}^j)^T\Delta\mathbf{w}_1^{n+1,I}\big) \\ \vdots \\ f'(s_M^{n,j})\big((\mathbf{x}^j)^T\Delta\mathbf{w}_M^{n+1,R} - (\mathbf{y}^j)^T\Delta\mathbf{w}_M^{n+1,I}\big) \end{pmatrix}\right\|^2$$
$$= \sum_{m=1}^{M}\Big(f'(s_m^{n,j})\big((\mathbf{x}^j)^T\Delta\mathbf{w}_m^{n+1,R} - (\mathbf{y}^j)^T\Delta\mathbf{w}_m^{n+1,I}\big)\Big)^2 \le 2(C_1)^2\sum_{m=1}^{M}\Big(\big((\mathbf{x}^j)^T\Delta\mathbf{w}_m^{n+1,R}\big)^2 + \big((\mathbf{y}^j)^T\Delta\mathbf{w}_m^{n+1,I}\big)^2\Big)$$
$$\le 2(C_1)^2\sum_{m=1}^{M}\Big(\|\Delta\mathbf{w}_m^{n+1,R}\|^2\|\mathbf{x}^j\|^2 + \|\Delta\mathbf{w}_m^{n+1,I}\|^2\|\mathbf{y}^j\|^2\Big) \le C_4\sum_{m=1}^{M}\Big(\|\Delta\mathbf{w}_m^{n+1,R}\|^2 + \|\Delta\mathbf{w}_m^{n+1,I}\|^2\Big) \le C_4(\tau+\eta)^2\sum_{m=1}^{M}\big(\|\mathbf{p}_m^{n,R}\|^2 + \|\mathbf{p}_m^{n,I}\|^2\big), \tag{30}$$

where $C_4 = 2(C_1)^2\max_{1\le j\le J}\{\|\mathbf{x}^j\|^2, \|\mathbf{y}^j\|^2\}$ and $s_m^{n,j}$ is on the segment between $U_m^{n+1,j,R}$ and $U_m^{n,j,R}$ for $m = 1, \cdots, M$. Similarly, we can get

$$\|\boldsymbol{\psi}^{n,j,I}\|^2 \le C_4(\tau+\eta)^2\sum_{m=1}^{M}\big(\|\mathbf{p}_m^{n,R}\|^2 + \|\mathbf{p}_m^{n,I}\|^2\big). \tag{31}$$

Thus, we have (23).

According to (16), (17) and (18), we have

$$(\mathbf{p}_m^{n,R})^T\Delta\mathbf{w}_m^{n+1,R} + (\mathbf{p}_m^{n,I})^T\Delta\mathbf{w}_m^{n+1,I} = -\eta\big(\|\mathbf{p}_m^{n,R}\|^2 + \|\mathbf{p}_m^{n,I}\|^2\big) + \tau_m^{n,R}(\mathbf{p}_m^{n,R})^T\Delta\mathbf{w}_m^{n,R} + \tau_m^{n,I}(\mathbf{p}_m^{n,I})^T\Delta\mathbf{w}_m^{n,I}$$
$$\le -\eta\big(\|\mathbf{p}_m^{n,R}\|^2 + \|\mathbf{p}_m^{n,I}\|^2\big) + \tau_m^{n,R}\|\Delta\mathbf{w}_m^{n,R}\|\,\|\mathbf{p}_m^{n,R}\| + \tau_m^{n,I}\|\Delta\mathbf{w}_m^{n,I}\|\,\|\mathbf{p}_m^{n,I}\| \le (\tau-\eta)\big(\|\mathbf{p}_m^{n,R}\|^2 + \|\mathbf{p}_m^{n,I}\|^2\big), \tag{32}$$

where $m = 1, \cdots, M, M+1$. This together with (10) and (11) validates (24):

$$\sum_{j=1}^{J}\Big(g_{jR}'(S^{n,j,R})\big((\mathbf{H}^{n,j,R})^T\Delta\mathbf{w}_{M+1}^{n+1,R} - (\mathbf{H}^{n,j,I})^T\Delta\mathbf{w}_{M+1}^{n+1,I}\big) + g_{jI}'(S^{n,j,I})\big((\mathbf{H}^{n,j,R})^T\Delta\mathbf{w}_{M+1}^{n+1,I} + (\mathbf{H}^{n,j,I})^T\Delta\mathbf{w}_{M+1}^{n+1,R}\big)\Big)$$
$$= \sum_{j=1}^{J}\Big(g_{jR}'(S^{n,j,R})(\mathbf{H}^{n,j,R})^T\Delta\mathbf{w}_{M+1}^{n+1,R} + g_{jI}'(S^{n,j,I})(\mathbf{H}^{n,j,I})^T\Delta\mathbf{w}_{M+1}^{n+1,R} - g_{jR}'(S^{n,j,R})(\mathbf{H}^{n,j,I})^T\Delta\mathbf{w}_{M+1}^{n+1,I} + g_{jI}'(S^{n,j,I})(\mathbf{H}^{n,j,R})^T\Delta\mathbf{w}_{M+1}^{n+1,I}\Big)$$
$$= (\mathbf{p}_{M+1}^{n,R})^T\Delta\mathbf{w}_{M+1}^{n+1,R} + (\mathbf{p}_{M+1}^{n,I})^T\Delta\mathbf{w}_{M+1}^{n+1,I} \le (\tau-\eta)\big(\|\mathbf{p}_{M+1}^{n,R}\|^2 + \|\mathbf{p}_{M+1}^{n,I}\|^2\big). \tag{33}$$

Next, we prove (25). By (2), (4), (20) and Taylor's formula, for any $1 \le j \le J$, $1 \le m \le M$ and $n = 0, 1, 2, \cdots$, we have

$$H_m^{n+1,j,R} - H_m^{n,j,R} = f(U_m^{n+1,j,R}) - f(U_m^{n,j,R}) = f'(U_m^{n,j,R})\big(U_m^{n+1,j,R} - U_m^{n,j,R}\big) + \frac{1}{2}f''(t_m^{n,j,R})\big(U_m^{n+1,j,R} - U_m^{n,j,R}\big)^2 \tag{34}$$

and

$$H_m^{n+1,j,I} - H_m^{n,j,I} = f(U_m^{n+1,j,I}) - f(U_m^{n,j,I}) = f'(U_m^{n,j,I})\big(U_m^{n+1,j,I} - U_m^{n,j,I}\big) + \frac{1}{2}f''(t_m^{n,j,I})\big(U_m^{n+1,j,I} - U_m^{n,j,I}\big)^2, \tag{35}$$

where $t_m^{n,j,R}$ is an intermediate point on the line segment between the two points $U_m^{n+1,j,R}$ and $U_m^{n,j,R}$, and $t_m^{n,j,I}$ between the two points $U_m^{n+1,j,I}$ and $U_m^{n,j,I}$.
Thus, according to (2), (12), (13), (14), (20), (32) and (34)–(35), we have

$$\sum_{j=1}^{J}\Big(g_{jR}'(S^{n,j,R})\big((\boldsymbol{\psi}^{n,j,R})^T\mathbf{w}_{M+1}^{n,R} - (\boldsymbol{\psi}^{n,j,I})^T\mathbf{w}_{M+1}^{n,I}\big) + g_{jI}'(S^{n,j,I})\big((\boldsymbol{\psi}^{n,j,R})^T\mathbf{w}_{M+1}^{n,I} + (\boldsymbol{\psi}^{n,j,I})^T\mathbf{w}_{M+1}^{n,R}\big)\Big)$$
$$= \sum_{j=1}^{J}\sum_{m=1}^{M}\Big(g_{jR}'(S^{n,j,R})\,w_{M+1,m}^{n,R}f'(U_m^{n,j,R})\big((\mathbf{x}^j)^T\Delta\mathbf{w}_m^{n+1,R} - (\mathbf{y}^j)^T\Delta\mathbf{w}_m^{n+1,I}\big) - g_{jR}'(S^{n,j,R})\,w_{M+1,m}^{n,I}f'(U_m^{n,j,I})\big((\mathbf{x}^j)^T\Delta\mathbf{w}_m^{n+1,I} + (\mathbf{y}^j)^T\Delta\mathbf{w}_m^{n+1,R}\big)$$
$$\quad + g_{jI}'(S^{n,j,I})\,w_{M+1,m}^{n,I}f'(U_m^{n,j,R})\big((\mathbf{x}^j)^T\Delta\mathbf{w}_m^{n+1,R} - (\mathbf{y}^j)^T\Delta\mathbf{w}_m^{n+1,I}\big) + g_{jI}'(S^{n,j,I})\,w_{M+1,m}^{n,R}f'(U_m^{n,j,I})\big((\mathbf{x}^j)^T\Delta\mathbf{w}_m^{n+1,I} + (\mathbf{y}^j)^T\Delta\mathbf{w}_m^{n+1,R}\big)\Big) + \delta_1$$
$$= \sum_{m=1}^{M}\Bigg(\bigg(\sum_{j=1}^{J}\Big[g_{jR}'(S^{n,j,R})\big(w_{M+1,m}^{n,R}f'(U_m^{n,j,R})\mathbf{x}^j - w_{M+1,m}^{n,I}f'(U_m^{n,j,I})\mathbf{y}^j\big) + g_{jI}'(S^{n,j,I})\big(w_{M+1,m}^{n,I}f'(U_m^{n,j,R})\mathbf{x}^j + w_{M+1,m}^{n,R}f'(U_m^{n,j,I})\mathbf{y}^j\big)\Big]\bigg)^T\Delta\mathbf{w}_m^{n+1,R}$$
$$\quad + \bigg(\sum_{j=1}^{J}\Big[g_{jR}'(S^{n,j,R})\big(-w_{M+1,m}^{n,R}f'(U_m^{n,j,R})\mathbf{y}^j - w_{M+1,m}^{n,I}f'(U_m^{n,j,I})\mathbf{x}^j\big) + g_{jI}'(S^{n,j,I})\big(-w_{M+1,m}^{n,I}f'(U_m^{n,j,R})\mathbf{y}^j + w_{M+1,m}^{n,R}f'(U_m^{n,j,I})\mathbf{x}^j\big)\Big]\bigg)^T\Delta\mathbf{w}_m^{n+1,I}\Bigg) + \delta_1$$
$$= \sum_{m=1}^{M}\Big((\mathbf{p}_m^{n,R})^T\Delta\mathbf{w}_m^{n+1,R} + (\mathbf{p}_m^{n,I})^T\Delta\mathbf{w}_m^{n+1,I}\Big) + \delta_1 \le (\tau-\eta)\sum_{m=1}^{M}\big(\|\mathbf{p}_m^{n,R}\|^2 + \|\mathbf{p}_m^{n,I}\|^2\big) + \delta_1, \tag{36}$$

where

$$\delta_1 = \frac{1}{2}\sum_{j=1}^{J}\sum_{m=1}^{M}\Big(g_{jR}'(S^{n,j,R})\,w_{M+1,m}^{n,R}f''(t_m^{n,j,R})\big((\mathbf{x}^j)^T\Delta\mathbf{w}_m^{n+1,R} - (\mathbf{y}^j)^T\Delta\mathbf{w}_m^{n+1,I}\big)^2 - g_{jR}'(S^{n,j,R})\,w_{M+1,m}^{n,I}f''(t_m^{n,j,I})\big((\mathbf{x}^j)^T\Delta\mathbf{w}_m^{n+1,I} + (\mathbf{y}^j)^T\Delta\mathbf{w}_m^{n+1,R}\big)^2$$
$$\quad + g_{jI}'(S^{n,j,I})\,w_{M+1,m}^{n,I}f''(t_m^{n,j,R})\big((\mathbf{x}^j)^T\Delta\mathbf{w}_m^{n+1,R} - (\mathbf{y}^j)^T\Delta\mathbf{w}_m^{n+1,I}\big)^2 + g_{jI}'(S^{n,j,I})\,w_{M+1,m}^{n,R}f''(t_m^{n,j,I})\big((\mathbf{x}^j)^T\Delta\mathbf{w}_m^{n+1,I} + (\mathbf{y}^j)^T\Delta\mathbf{w}_m^{n+1,R}\big)^2\Big). \tag{37}$$

Using Assumptions (A1) and (A2), (17), (18), (22) and the triangle inequality, we immediately get

$$\delta_1 \le |\delta_1| \le C_5\sum_{m=1}^{M}\Big(\|\Delta\mathbf{w}_m^{n+1,R}\|^2 + \|\Delta\mathbf{w}_m^{n+1,I}\|^2\Big) \le C_5\sum_{m=1}^{M}\Big(\big(\tau_m^{n,R}\|\Delta\mathbf{w}_m^{n,R}\| + \eta\|\mathbf{p}_m^{n,R}\|\big)^2 + \big(\tau_m^{n,I}\|\Delta\mathbf{w}_m^{n,I}\| + \eta\|\mathbf{p}_m^{n,I}\|\big)^2\Big) \le C_5(\tau+\eta)^2\sum_{m=1}^{M}\big(\|\mathbf{p}_m^{n,R}\|^2 + \|\mathbf{p}_m^{n,I}\|^2\big), \tag{38}$$

where $C_5 = 2JC_1C_2C_3\max_{1\le j\le J}\{\|\mathbf{x}^j\|^2 + \|\mathbf{y}^j\|^2\}$. Now, (25) results from (36) and (38).

According to (20), (22), (23) and (28) we have

$$\sum_{j=1}^{J}\Big(g_{jR}'(S^{n,j,R})\big((\boldsymbol{\psi}^{n,j,R})^T\Delta\mathbf{w}_{M+1}^{n+1,R} - (\boldsymbol{\psi}^{n,j,I})^T\Delta\mathbf{w}_{M+1}^{n+1,I}\big) + g_{jI}'(S^{n,j,I})\big((\boldsymbol{\psi}^{n,j,R})^T\Delta\mathbf{w}_{M+1}^{n+1,I} + (\boldsymbol{\psi}^{n,j,I})^T\Delta\mathbf{w}_{M+1}^{n+1,R}\big)\Big)$$
$$\le C_3\sum_{j=1}^{J}\Big(\|\Delta\mathbf{w}_{M+1}^{n+1,R}\|^2 + \|\Delta\mathbf{w}_{M+1}^{n+1,I}\|^2 + \|\boldsymbol{\psi}^{n,j,R}\|^2 + \|\boldsymbol{\psi}^{n,j,I}\|^2\Big)$$
$$\le C_3\sum_{j=1}^{J}\bigg((\tau+\eta)^2\big(\|\mathbf{p}_{M+1}^{n,R}\|^2 + \|\mathbf{p}_{M+1}^{n,I}\|^2\big) + 2C_4(\tau+\eta)^2\sum_{m=1}^{M}\big(\|\mathbf{p}_m^{n,R}\|^2 + \|\mathbf{p}_m^{n,I}\|^2\big)\bigg) \le C_6(\tau+\eta)^2\sum_{m=1}^{M+1}\big(\|\mathbf{p}_m^{n,R}\|^2 + \|\mathbf{p}_m^{n,I}\|^2\big) \tag{39}$$

and

$$\frac{1}{2}\sum_{j=1}^{J}\Big(g_{jR}''(t_1^{n,j})\big(S^{n+1,j,R} - S^{n,j,R}\big)^2 + g_{jI}''(t_2^{n,j})\big(S^{n+1,j,I} - S^{n,j,I}\big)^2\Big) \le \frac{C_3}{2}\sum_{j=1}^{J}\Big(\big(S^{n+1,j,R} - S^{n,j,R}\big)^2 + \big(S^{n+1,j,I} - S^{n,j,I}\big)^2\Big)$$
$$= \frac{C_3}{2}\sum_{j=1}^{J}\bigg(\Big((\mathbf{H}^{n+1,j,R})^T\Delta\mathbf{w}_{M+1}^{n+1,R} - (\mathbf{H}^{n+1,j,I})^T\Delta\mathbf{w}_{M+1}^{n+1,I} + (\boldsymbol{\psi}^{n,j,R})^T\mathbf{w}_{M+1}^{n,R} - (\boldsymbol{\psi}^{n,j,I})^T\mathbf{w}_{M+1}^{n,I}\Big)^2$$
$$\quad + \Big((\mathbf{H}^{n+1,j,R})^T\Delta\mathbf{w}_{M+1}^{n+1,I} + (\mathbf{H}^{n+1,j,I})^T\Delta\mathbf{w}_{M+1}^{n+1,R} + (\boldsymbol{\psi}^{n,j,R})^T\mathbf{w}_{M+1}^{n,I} + (\boldsymbol{\psi}^{n,j,I})^T\mathbf{w}_{M+1}^{n,R}\Big)^2\bigg)$$
$$\le 2C_3J\big((C_0)^2 + (C_2)^2\big)\Big(\|\Delta\mathbf{w}_{M+1}^{n+1,R}\|^2 + \|\Delta\mathbf{w}_{M+1}^{n+1,I}\|^2 + \|\boldsymbol{\psi}^{n,j,R}\|^2 + \|\boldsymbol{\psi}^{n,j,I}\|^2\Big) \le C_7(\tau+\eta)^2\sum_{m=1}^{M+1}\big(\|\mathbf{p}_m^{n,R}\|^2 + \|\mathbf{p}_m^{n,I}\|^2\big), \tag{40}$$

where $C_6 = JC_3\max\{1, 2C_4\}$ and $C_7 = 2JC_3\big((C_0)^2 + (C_2)^2\big)\max\{1, 2C_4\}$. Finally we obtain (26) and (27). □

Now we are ready to prove Theorem 1 with the help of the two lemmas above.

Proof of Theorem 1. First we prove (i).
By (24)–(27) and Taylor's formula we have

$$E(\mathbf{W}^{n+1}) - E(\mathbf{W}^n) = \sum_{j=1}^{J}\Big(g_{jR}(S^{n+1,j,R}) - g_{jR}(S^{n,j,R}) + g_{jI}(S^{n+1,j,I}) - g_{jI}(S^{n,j,I})\Big)$$
$$= \sum_{j=1}^{J}\Big(g_{jR}'(S^{n,j,R})\big(S^{n+1,j,R} - S^{n,j,R}\big) + g_{jI}'(S^{n,j,I})\big(S^{n+1,j,I} - S^{n,j,I}\big) + \frac{1}{2}g_{jR}''(t_1^{n,j})\big(S^{n+1,j,R} - S^{n,j,R}\big)^2 + \frac{1}{2}g_{jI}''(t_2^{n,j})\big(S^{n+1,j,I} - S^{n,j,I}\big)^2\Big)$$
$$= \sum_{j=1}^{J}\Big(g_{jR}'(S^{n,j,R})\big((\mathbf{H}^{n,j,R})^T\Delta\mathbf{w}_{M+1}^{n+1,R} - (\mathbf{H}^{n,j,I})^T\Delta\mathbf{w}_{M+1}^{n+1,I}\big) + g_{jI}'(S^{n,j,I})\big((\mathbf{H}^{n,j,R})^T\Delta\mathbf{w}_{M+1}^{n+1,I} + (\mathbf{H}^{n,j,I})^T\Delta\mathbf{w}_{M+1}^{n+1,R}\big)$$
$$\quad + g_{jR}'(S^{n,j,R})\big((\boldsymbol{\psi}^{n,j,R})^T\mathbf{w}_{M+1}^{n,R} - (\boldsymbol{\psi}^{n,j,I})^T\mathbf{w}_{M+1}^{n,I}\big) + g_{jI}'(S^{n,j,I})\big((\boldsymbol{\psi}^{n,j,R})^T\mathbf{w}_{M+1}^{n,I} + (\boldsymbol{\psi}^{n,j,I})^T\mathbf{w}_{M+1}^{n,R}\big)$$
$$\quad + g_{jR}'(S^{n,j,R})\big((\boldsymbol{\psi}^{n,j,R})^T\Delta\mathbf{w}_{M+1}^{n+1,R} - (\boldsymbol{\psi}^{n,j,I})^T\Delta\mathbf{w}_{M+1}^{n+1,I}\big) + g_{jI}'(S^{n,j,I})\big((\boldsymbol{\psi}^{n,j,R})^T\Delta\mathbf{w}_{M+1}^{n+1,I} + (\boldsymbol{\psi}^{n,j,I})^T\Delta\mathbf{w}_{M+1}^{n+1,R}\big)$$
$$\quad + \frac{1}{2}g_{jR}''(t_1^{n,j})\big(S^{n+1,j,R} - S^{n,j,R}\big)^2 + \frac{1}{2}g_{jI}''(t_2^{n,j})\big(S^{n+1,j,I} - S^{n,j,I}\big)^2\Big)$$
$$\le \big(\tau - \eta + (C_6 + C_7)(\tau+\eta)^2\big)\big(\|\mathbf{p}_{M+1}^{n,R}\|^2 + \|\mathbf{p}_{M+1}^{n,I}\|^2\big) + \big(\tau - \eta + (C_5 + C_6 + C_7)(\tau+\eta)^2\big)\sum_{m=1}^{M}\big(\|\mathbf{p}_m^{n,R}\|^2 + \|\mathbf{p}_m^{n,I}\|^2\big), \tag{41}$$

where $t_1^{n,j} \in \mathbb{R}$ is on the segment between $S^{n+1,j,R}$ and $S^{n,j,R}$, and $t_2^{n,j} \in \mathbb{R}$ is on the segment between $S^{n+1,j,I}$ and $S^{n,j,I}$. Obviously, by choosing $C^F = C_5 + C_6 + C_7$ and the learning rate $\eta$ to satisfy

$$0 < \eta < \frac{1-s}{C^F(1+s)^2}, \tag{42}$$

we have

$$\tau - \eta + (C_6 + C_7)(\tau+\eta)^2 \le 0, \qquad \tau - \eta + (C_5 + C_6 + C_7)(\tau+\eta)^2 \le 0.$$

Then there holds

$$E(\mathbf{W}^{n+1}) \le E(\mathbf{W}^n), \quad n = 0, 1, 2, \cdots. \tag{43}$$

Conclusion (ii) immediately results from Conclusion (i), since $E(\mathbf{W}^n) \ge 0$.

Next, we prove (iii). In the following we suppose (42) is valid. Let

$$\alpha = -\big(\tau - \eta + (C_6 + C_7)(\tau+\eta)^2\big), \qquad \beta = -\big(\tau - \eta + (C_5 + C_6 + C_7)(\tau+\eta)^2\big);$$

then $\alpha > 0$ and $\beta > 0$. According to (41), we have

$$E(\mathbf{W}^{n+1}) \le E(\mathbf{W}^n) - \alpha\big(\|\mathbf{p}_{M+1}^{n,R}\|^2 + \|\mathbf{p}_{M+1}^{n,I}\|^2\big) - \beta\sum_{m=1}^{M}\big(\|\mathbf{p}_m^{n,R}\|^2 + \|\mathbf{p}_m^{n,I}\|^2\big)$$
$$\le \cdots \le E(\mathbf{W}^0) - \sum_{k=0}^{n}\bigg(\alpha\big(\|\mathbf{p}_{M+1}^{k,R}\|^2 + \|\mathbf{p}_{M+1}^{k,I}\|^2\big) + \beta\sum_{m=1}^{M}\big(\|\mathbf{p}_m^{k,R}\|^2 + \|\mathbf{p}_m^{k,I}\|^2\big)\bigg). \tag{44}$$

Since $E(\mathbf{W}^{n+1}) \ge 0$, there holds

$$\sum_{k=0}^{n}\bigg(\alpha\big(\|\mathbf{p}_{M+1}^{k,R}\|^2 + \|\mathbf{p}_{M+1}^{k,I}\|^2\big) + \beta\sum_{m=1}^{M}\big(\|\mathbf{p}_m^{k,R}\|^2 + \|\mathbf{p}_m^{k,I}\|^2\big)\bigg) \le E(\mathbf{W}^0).$$

Letting $n \to \infty$, we get

$$\sum_{k=0}^{\infty}\bigg(\alpha\big(\|\mathbf{p}_{M+1}^{k,R}\|^2 + \|\mathbf{p}_{M+1}^{k,I}\|^2\big) + \beta\sum_{m=1}^{M}\big(\|\mathbf{p}_m^{k,R}\|^2 + \|\mathbf{p}_m^{k,I}\|^2\big)\bigg) \le E(\mathbf{W}^0) < \infty.$$

Hence, there holds

$$\lim_{n\to\infty}\bigg(\Big\|\frac{\partial E(\mathbf{W}^n)}{\partial\mathbf{w}_m^R}\Big\|^2 + \Big\|\frac{\partial E(\mathbf{W}^n)}{\partial\mathbf{w}_m^I}\Big\|^2\bigg) = \lim_{n\to\infty}\big(\|\mathbf{p}_m^{n,R}\|^2 + \|\mathbf{p}_m^{n,I}\|^2\big) = 0,$$

which implies

$$\lim_{n\to\infty}\Big\|\frac{\partial E(\mathbf{W}^n)}{\partial\mathbf{w}_m^R}\Big\| = \lim_{n\to\infty}\Big\|\frac{\partial E(\mathbf{W}^n)}{\partial\mathbf{w}_m^I}\Big\| = 0, \quad 1 \le m \le M+1. \tag{45}$$

Finally, we prove (iv). We use (14), (15), (28), (29) and (45) to obtain

$$\lim_{n\to\infty}\|\mathbf{w}_m^{n+1,R} - \mathbf{w}_m^{n,R}\| = 0, \qquad \lim_{n\to\infty}\|\mathbf{w}_m^{n+1,I} - \mathbf{w}_m^{n,I}\| = 0, \quad m = 1, \cdots, M+1. \tag{46}$$

Write $\mathbf{v} = \big((\mathbf{w}_1^R)^T, \cdots, (\mathbf{w}_{M+1}^R)^T, (\mathbf{w}_1^I)^T, \cdots, (\mathbf{w}_{M+1}^I)^T\big)^T$; then $E(\mathbf{W})$ can be viewed as a function of $\mathbf{v}$, denoted by $E(\mathbf{v})$: $E(\mathbf{W}) \equiv E(\mathbf{v})$. Obviously, $E(\mathbf{v})$ is a continuously differentiable real-valued function and

$$\frac{\partial E(\mathbf{W})}{\partial\mathbf{w}_m^R} \equiv \frac{\partial E(\mathbf{v})}{\partial\mathbf{w}_m^R}, \qquad \frac{\partial E(\mathbf{W})}{\partial\mathbf{w}_m^I} \equiv \frac{\partial E(\mathbf{v})}{\partial\mathbf{w}_m^I}, \quad m = 1, \cdots, M+1.$$

Let $\mathbf{v}^n = \big((\mathbf{w}_1^{n,R})^T, \cdots, (\mathbf{w}_{M+1}^{n,R})^T, (\mathbf{w}_1^{n,I})^T, \cdots, (\mathbf{w}_{M+1}^{n,I})^T\big)^T$; then by (45) we have

$$\lim_{n\to\infty}\Big\|\frac{\partial E(\mathbf{v}^n)}{\partial\mathbf{w}_m^R}\Big\| = \lim_{n\to\infty}\Big\|\frac{\partial E(\mathbf{v}^n)}{\partial\mathbf{w}_m^I}\Big\| = 0, \quad m = 1, \cdots, M+1. \tag{47}$$

From Assumption (A3), (46), (47) and Lemma 1 we know that there is a $\mathbf{v}^F$ satisfying $\lim_{n\to\infty}\mathbf{v}^n = \mathbf{v}^F$. By considering the relationship between $\mathbf{v}^n$ and $\mathbf{W}^n$, we immediately get the desired result. This completes the proof. □

References

[1] Mandic D. P., Goh S. L.: Complex Valued Nonlinear Adaptive Filters: Noncircularity, Widely Linear and Neural Models. New York: Wiley, 2009.
[2] Hirose A.: Complex-Valued Neural Networks. New York: Springer-Verlag New York Inc., 2006.
[3] Sanjay S. P. R., William W. H.: Complex-valued neural networks for nonlinear complex principal component analysis. Neural Networks, 18, 2005, pp. 61-69.
[4] Georgiou G. M., Koutsougeras C.: Complex domain backpropagation. IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing, 39, 5, 1992, pp. 330-334.
[5] Benvenuto N., Piazza F.: On the complex backpropagation algorithm. IEEE Trans. Signal Processing, 40, 4, 1992, pp. 967-969.
[6] Kim T., Adali T.: Fully complex backpropagation for constant envelope signal processing. In: Proc. IEEE Workshop Neural Network Signal Process., Sydney, Australia, 2000, pp. 231-240.
[7] Kim T., Adali T.: Nonlinear satellite channel equalization using fully complex feed-forward neural networks. In: Proc. IEEE Workshop Nonlinear Signal Image Process., Baltimore, MD, 2001, pp. 141-150.
[8] Yang S. S., Siu S., Ho C. L.: Analysis of the initial values in split-complex backpropagation algorithm. IEEE Trans. Neural Networks, 19, 9, 2008, pp. 1564-1573.
[9] Wu W., Feng G. R., Li Z. X., Xu Y. S.: Deterministic convergence of an online gradient method for BP neural networks. IEEE Trans. Neural Networks, 16, 3, 2005, pp. 533-540.
[10] White H.: Some asymptotic results for learning in single hidden-layer feedforward network models. J. Amer. Statist. Assoc., 84, 1989, pp. 1003-1013.
[11] Zhang H., Wu W., Liu F., Yao M.: Boundedness and convergence of online gradient method with penalty for feedforward neural networks. IEEE Trans. Neural Networks, 20, 6, 2009, pp. 1050-1054.
[12] Mandic D. P., Chambers J. A.: Recurrent Neural Networks for Prediction: Learning Algorithms, Architectures and Stability. New York: Wiley, 2001.
[13] Goh S. L., Mandic D. P.: A complex-valued RTRL algorithm for recurrent neural networks. Neural Computation, 16, 12, 2004, pp. 2699-2713.
[14] Xu D. P., Zhang H. S., Liu L. J.: Convergence analysis of three classes of split-complex gradient algorithms for complex-valued recurrent neural networks. Neural Computation, 22, 10, 2010, pp. 2655-2677.
[15] Zhang H. S., Xu D. P., Wang Z. P.: Convergence of an online split-complex gradient algorithm for complex-valued neural networks. Discrete Dyn. Nat. Soc., 2010, pp. 1-27, DOI: 10.1155/2010/829692.
[16] Polyak B. T.: Introduction to Optimization. New York: Optimization Software, Inc., Publications Division, 1987.
[17] Shi Z. J., Guo J.: Convergence of memory gradient method. International Journal of Computer Mathematics, 85, 2008, pp. 1039-1053.
[18] Qian N.: On the momentum term in gradient descent learning algorithms. Neural Networks, 12, 1999, pp. 145-151.
[19] Wu W., Zhang N. M., Li Z. X., Li L., Liu Y.: Convergence of gradient method with momentum for back-propagation neural networks. Journal of Computational Mathematics, 26, 2008, pp. 613-623.
[20] Ujang B. C., Took C. C., Mandic D. P.: Split quaternion nonlinear adaptive filtering. Neural Networks, 23, 2010, pp. 426-434.
[21] Ortega J. M., Rheinboldt W. C.: Iterative Solution of Nonlinear Equations in Several Variables. New York: Academic Press, 1970.
[22] Nitta T.: Orthogonality of decision boundaries in complex-valued neural networks. Neural Computation, 16, 2003, pp. 73-97.
[23] Savitha R., Suresh S., Sundararajan N.: A fully complex-valued radial basis function network. International Journal of Neural Systems, 19, 4, 2009, pp. 253-267.