Communication: Separable potential energy surfaces from multiplicative artificial neural networks
Werner Koch and Dong H. Zhang
The Journal of Chemical Physics 141, 021101 (2014); doi: 10.1063/1.4887508

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP: 155.247.166.234 On: Sat, 22 Nov 2014 12:23:52

THE JOURNAL OF CHEMICAL PHYSICS 141, 021101 (2014)

Communication: Separable potential energy surfaces from multiplicative artificial neural networks

Werner Koch a) and Dong H. Zhang
State Key Laboratory of Molecular Reaction Dynamics and Center for Theoretical Computational Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 457 Zhongshan Road, Dalian, China

(Received 16 May 2014; accepted 26 June 2014; published online 8 July 2014)

We present a potential energy surface fitting scheme based on multiplicative artificial neural networks. It has the sum-of-products form required for efficient computation of the dynamics of multidimensional quantum systems with the multi-configuration time-dependent Hartree method. Moreover, it results in analytic potential energy matrix elements when combined with quantum dynamics methods using Gaussian basis functions, eliminating the need for a local harmonic approximation. Scaling behavior with respect to the complexity of the potential as well as the requested accuracy is discussed. © 2014 AIP Publishing LLC. [http://dx.doi.org/10.1063/1.4887508]

A substantial part of the computational effort in the theoretical investigation of physical systems, an effort that grows exponentially with the number of degrees of freedom, is due to the evaluation of the potential energy surface (PES) of the system and the computation of matrix elements thereof. Avoiding the need to do so on a grid in high-dimensional space is key for efficient approaches. This Communication introduces a new representation of surfaces that can instead be grown only in the regions that are explored by the wave packet, as has been done in other interpolation approaches.[1,2] It inherits the flexibility and graceful asymptotic behavior of artificial neural network function approximators[3] and can be used on its own or as a building block of compound methods.[4]

This new representation provides key advantages for two particularly successful classes of approaches for solving the time-dependent Schrödinger equation (TDSE). The first is the Multi Configuration Time Dependent Hartree (MCTDH) family of methods.[5-7] By expanding the f-dimensional wave function of the system, Ψ(x), in a sum of products of basis functions φ_i^{(j)}, each depending on only one coordinate x_j,
\[
\Psi(\mathbf{x};t) = \sum_i c_i(t) \prod_{j=1}^{f} \varphi_i^{(j)}(x_j;t), \qquad (1)
\]

but coupled through variational equations of motion, these approaches have significantly ameliorated the aforementioned exponentially growing computational effort.8, 9 To make efficient use of this improvement, however, the potential of the system V (x) needs to be given as a sum of products of one dimensional terms such that the f dimensional integral of the potential energy matrix elements is reduced to nf one dimensional integrals  (2) i |V|l  = df x∗i (x)V (x)l (x) =

f  n  

(j )∗

dxj φi

(j )

(j )

(xj )vk (xj )φl (xj ) . (3)

k=1 j =1 a) Electronic mail: [email protected]
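The factorization from Eq. (2) to Eq. (3) is easy to verify numerically. The following Python sketch is illustrative only and not from the paper; the grid, the Gaussian basis functions, and the two-term separable potential V(x1, x2) = x1^2 + x2^2 are all assumptions made for the example:

```python
import numpy as np

# Sketch: for a sum-of-products potential, the 2D matrix element <phi_i|V|phi_l>
# factorizes into a sum of products of 1D integrals, cf. Eqs. (2)-(3).
x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]

def phi(q, s):
    """Normalized 1D Gaussian basis function centered at q with width s."""
    return np.exp(-(x - q) ** 2 / (2 * s ** 2)) / (np.pi * s ** 2) ** 0.25

# Hypothetical separable potential: V(x1, x2) = x1^2 * 1 + 1 * x2^2 (n = 2 terms)
v_terms = [(x ** 2, np.ones_like(x)), (np.ones_like(x), x ** 2)]

bra = (phi(0.5, 1.0), phi(-0.3, 0.8))   # phi_i^{(1)}, phi_i^{(2)}
ket = (phi(-0.2, 1.2), phi(0.4, 1.0))   # phi_l^{(1)}, phi_l^{(2)}

# Eq. (3): sum over potential terms of products of 1D quadratures
sop = sum(np.prod([np.sum(bra[j] * v[j] * ket[j]) * dx for j in range(2)])
          for v in v_terms)

# Eq. (2): brute-force 2D quadrature on the full grid
V2d = v_terms[0][0][:, None] * v_terms[0][1][None, :] \
    + v_terms[1][0][:, None] * v_terms[1][1][None, :]
full = np.sum(bra[0][:, None] * bra[1][None, :] * V2d
              * ket[0][:, None] * ket[1][None, :]) * dx * dx

print(abs(sop - full))  # agreement to floating-point accuracy
```

On a product grid the two routes are algebraically identical, which is exactly the point: the f-dimensional quadrature collapses into products of cheap one-dimensional sums.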


Equations (1)–(3) are given for a single-state system, but the extension to multiple states follows naturally. For potentials that are not given in a sum-of-products form, notably those obtained by means of ab initio quantum chemistry methods for use in molecular quantum dynamics, very accurate fits can be obtained from the POTFIT algorithm.[10,11] Unfortunately, it requires the potential to be known on a full grid and has to be restarted if the configuration space needs to be extended.

The approaches of the second class addressed here, such as ab initio multiple spawning,[12] methods derived from MCTDH such as variational multi-configuration Gaussian,[13,14] or even semiclassical methods,[15] use a basis of multidimensional (stationary or evolving) Gaussian wave packets (GWPs) of the form
\[
g_{\mathbf{q}_i,\boldsymbol{\sigma}_i,\mathbf{p}_i}(\mathbf{x}) = \exp\left\{(\mathbf{x}-\mathbf{q}_i)^T\left[A_i(\mathbf{x}-\mathbf{q}_i) + i\,\mathbf{p}_i\right] + \eta_i\right\}, \qquad (4)
\]

where q_i and p_i are vectors of position and momentum parameters, A_i is a diagonal matrix whose elements A_{i,jj} = −(2σ_{i,j})^{−2} are composed of the elements of the vector of widths σ_i, and η_i is a real normalization parameter. They are ideally suited for the expansion of strongly localized wave packets such as those often found in molecular quantum dynamics. The integral of Eq. (2) is typically computed in the local harmonic approximation, in which the Taylor series of the potential around the center coordinates q_i of the GWPs is truncated at second order. To avoid inaccuracies, however, the width parameters σ_i need to be chosen conservatively narrow, requiring a large number of basis functions to cover the extent of the wave packet.

Our new approach for fitting potential surfaces allows the potential matrix elements of GWP propagation methods to be computed analytically, without resorting to approximations, while also maintaining the sum-of-products form required for MCTDH-type methods. It employs a special type of artificial neural network,[16] a concept that has been used with considerable success, among many other applications, as a highly accurate and flexible means of approximating potential energy surfaces for almost two decades.[3] The most common neural network structure for this application is a network of nodes
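As a concrete illustration of Eq. (4), the following sketch (a toy with arbitrarily chosen parameters, not the paper's code) evaluates a 2D GWP with diagonal width matrix and fixes the real parameter η_i so that the packet has unit norm:

```python
import numpy as np

def gwp(x, q, p, sigma):
    """Gaussian wave packet of Eq. (4) with diagonal A_i, A_{i,jj} = -(2 sigma_j)^-2.
    eta is chosen real such that the packet is normalized to 1."""
    A = -1.0 / (4.0 * sigma ** 2)                          # diagonal entries of A_i
    eta = -0.25 * np.sum(np.log(2.0 * np.pi * sigma ** 2))  # real normalization
    d = x - q
    return np.exp(np.sum(A * d ** 2 + 1j * p * d, axis=-1) + eta)

# 2D grid check of the norm (toy parameters, chosen arbitrarily)
q = np.array([0.3, -0.5]); p = np.array([1.0, 0.0]); sigma = np.array([0.4, 0.7])
u = np.linspace(-6, 6, 601)
X = np.stack(np.meshgrid(u, u, indexing="ij"), axis=-1)    # shape (601, 601, 2)
g = gwp(X, q, p, sigma)
norm = np.sum(np.abs(g) ** 2) * (u[1] - u[0]) ** 2
print(norm)  # ~1.0
```

The momentum p only rotates the phase and leaves |g|^2, and hence the norm, untouched, which is why η depends only on the widths.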



organized in layers. The values z^{(i)} of the nodes of a layer i are computed through application of the transfer function F(·) to the sum of the values of the previous layer z^{(i−1)}, weighted by the weights w^{(i)} and shifted by the biases μ^{(i)}:
\[
z_j^{(i)} = F\!\left(\mu_j^{(i)} + \sum_{k=1}^{n_{i-1}} w_{jk}^{(i)}\, z_k^{(i-1)}\right). \qquad (5)
\]

The last layer is often computed only from the weighted sum, without a transfer function. For evaluation of a potential surface, the spatial coordinates are assigned to the initial layer, z^{(0)} = x, and the last layer contains a single node representing the desired function value V^{NN}(x). A network with a single internal layer thus results in the following functional form:
\[
V^{NN}(\mathbf{x}) = \mu_1^{(2)} + \sum_{k_1=1}^{n_1} w_{1k_1}^{(2)}\, F\!\left(\mu_{k_1}^{(1)} + \sum_{k_0=1}^{f} w_{k_1 k_0}^{(1)}\, x_{k_0}\right). \qquad (6)
\]
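In code, the single-hidden-layer sum network of Eq. (6) is only a few lines. This sketch uses hypothetical random weights and tanh as an example transfer function (neither is prescribed by the paper) and maps a batch of geometries to potential values:

```python
import numpy as np

def vnn_sum(x, W1, mu1, w2, mu2, F=np.tanh):
    """Single-hidden-layer sum network, Eq. (6).
    x: (batch, f) coordinates; W1: (n1, f); mu1: (n1,); w2: (n1,); mu2: scalar."""
    hidden = F(mu1 + x @ W1.T)   # Eq. (5) applied to the single internal layer
    return mu2 + hidden @ w2     # linear output node, no transfer function

# Toy usage with random (hypothetical) parameters: f = 3 coordinates, n1 = 5 nodes
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
W1, mu1 = rng.normal(size=(5, 3)), rng.normal(size=5)
w2, mu2 = rng.normal(size=5), 0.1
print(vnn_sum(x, W1, mu1, w2, mu2).shape)  # (4,)
```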

Any exchange symmetry of the investigated physical system can be accounted for exactly, without increasing the total number of weights and biases N, by including mirror nodes in the first internal layer of the network with identical, but suitably exchanged, weights w^{(1)}. Using a set of sample geometries and corresponding potential values Γ = {x_i, V(x_i)}, 1 ≤ i ≤ n_Γ, a process called "training" adjusts the weights from random initial values such that the root mean square error (RMSE)
\[
\delta = \left[\frac{1}{n_\Gamma}\sum_{i=1}^{n_\Gamma} \left(V^{NN}(\mathbf{x}_i) - V(\mathbf{x}_i)\right)^2\right]^{1/2} \qquad (7)
\]

of the results computed by the network is minimized. We use the Levenberg–Marquardt algorithm[17] for this highly non-trivial process because of its fast convergence and high stability. If the training stagnates, the number of nodes of the network is grown and training continues until a chosen limiting RMSE is surpassed. The numerical effort of the training algorithm scales roughly as O(N^2 n_Γ).

Even though it has been shown that such neural networks can approximate any functional dependence to any desired accuracy with just a single internal layer and a suitable nonlinear transfer function,[18] the number of nodes of such a single-layer network would be impractically large. Especially if the function to be reproduced is multivariate and complex in structure, multilayer networks result in more modest network sizes since they provide a much higher degree of internal flexibility. However, this internal complexity prevents the computation of the potential matrix elements (2) without resorting to numerical quadrature or some form of approximation.

We therefore propose an alternative based on multiplicative artificial neural networks. Such networks have been demonstrated to achieve a similar degree of complexity and non-linearity as multilayer sum networks.[19,20] The network structure we employ is a combination of product and sum nodes that results in the functional form
\[
V^{NN}(\mathbf{x}) = \mu_1^{(2)} + \sum_{k_1=1}^{n_1} w_{1k_1}^{(2)} \prod_{k_0=1}^{f} F\!\left(\mu_{k_1 k_0}^{(1)} + w_{k_1 k_0}^{(1)}\, x_{k_0}\right). \qquad (8)
\]
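A forward pass through the product network of Eq. (8) differs from Eq. (6) only in that the f transfer-function values of each node are multiplied, rather than the coordinates being summed before a single transfer function. A sketch with the erf transfer function (all parameter values hypothetical):

```python
import math
import numpy as np

erf = np.vectorize(math.erf)   # elementwise error function without SciPy

def vnn_prod(x, W1, M1, w2, mu2, F=erf):
    """Multiplicative network, Eq. (8).
    x: (batch, f); W1, M1: (n1, f) weights and biases; w2: (n1,); mu2: scalar."""
    # F(mu_{k1 k0} + w_{k1 k0} * x_{k0}) for every node k1 and coordinate k0
    t = F(M1[None, :, :] + W1[None, :, :] * x[:, None, :])  # (batch, n1, f)
    return mu2 + t.prod(axis=2) @ w2                        # product over f, sum over n1

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 3))
W1, M1 = rng.normal(size=(5, 3)), rng.normal(size=(5, 3))
w2, mu2 = rng.normal(size=5), 0.0
print(vnn_prod(x, W1, M1, w2, mu2).shape)  # (4,)
```

Note that the parameter count is (2f + 1) n_1 + 1, linear in both the dimension and the node count, which is the structural reason the product network needs far fewer adjustable weights than a dual-layer sum network of comparable flexibility.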

This construction does not fit individual degrees of freedom separately with a subsequent combination of the results. Instead, since each of the n_1 network nodes computes a nonlinear combination of the f transfer functions, each additional degree of freedom also increases the flexibility of the network fit. This allows the complexity of the network construction to grow with the system size and to describe ever more complex potentials without requiring excessively many internal nodes.

Equation (8) is exactly of the sum-of-products form required for an efficient MCTDH computation. For special combinations of the transfer function F(·) and basis functions φ^{(j)} of the dynamical method, the potential matrix elements can even be computed analytically. Product network potential surfaces are therefore an ideal companion for MCTDH. However, due to the even greater simplification possible with Gaussian basis functions, we will now focus on those and leave the application to MCTDH to be investigated at a later time.

If the transfer function of the network is chosen to be the error function, F(·) = erf(·), inserting Eqs. (4) and (8) into the matrix element of Eq. (2) yields
\[
\langle g_i|V^{NN}|g_l\rangle = S_{il}\cdot\left\{\mu_1^{(2)} + \sum_{k_1=1}^{n_1} w_{1k_1}^{(2)} \prod_{k_0=1}^{f} \operatorname{erf}\!\left(\frac{\mu_{k_1 k_0}^{(1)} - \dfrac{w_{k_1 k_0}^{(1)}\,\tilde{\xi}_{k_0}}{2\tilde{\sigma}_{k_0}}}{\left[1 - \dfrac{\big(w_{k_1 k_0}^{(1)}\big)^2}{\tilde{\sigma}_{k_0}}\right]^{1/2}}\right)\right\}, \qquad (9)
\]

where S_il = ⟨g_i|g_l⟩ denotes the overlap, σ̃ = σ_l + σ_i, and ξ̃ = −2(A_l q_l + A_i q_i) + i(p_l − p_i). Evaluation of Eq. (9) is fairly costly due to the special function evaluations, but it scales linearly with the number of nodes n_1. Other PES interpolation approaches are cheaper to evaluate for small systems but scale quadratically with the number of points (for instance, the modified Shepard interpolation[1]) or, as in the case of multilayer neural networks, with the product of the numbers of nodes of consecutive layers. The computational cost of Eq. (9) is therefore expected to be lower for larger systems. Furthermore, Eq. (9) computes the entire f-dimensional integral in one step without resorting to an approximation. The accuracy of this expression is solely determined by the network fit procedure and is thus controlled before the propagation starts. There is no need to check the validity of the local harmonic approximation of the potential with respect to the GWP widths during the propagation, and no risk of deviations going unnoticed.

Even simple, one-dimensional systems can pose severe challenges if strongly anharmonic potentials require excessively narrow GWPs, and each additional degree of freedom that contains such features exacerbates the problem. While in model systems critical potential regions are easily identified and can be accounted for by choosing appropriate basis set parameters, in real physical systems such information may not be available. Obtaining it from the potential surface itself can become extremely laborious for multidimensional systems, leaving multiple dynamics calculations with successively narrower basis functions to check for convergence as the only option. As demonstrated below, this issue can be significant even for relatively smooth surfaces.
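The closed form of Eq. (9) rests on the standard one-dimensional Gaussian–erf integral: for X ~ N(μ, σ²), E[erf(aX + β)] = erf((aμ + β)/√(1 + 2a²σ²)). This identity is easily checked against quadrature (sketch with arbitrary parameters, not the paper's code):

```python
import math
import numpy as np

# 1D check of the Gaussian-erf identity underlying Eq. (9):
# int erf(a*t + b) * N(t; mu, sigma^2) dt = erf((a*mu + b) / sqrt(1 + 2 a^2 sigma^2))
a, b, mu, sigma = 0.8, -0.3, 0.5, 1.2

t = np.linspace(mu - 12 * sigma, mu + 12 * sigma, 20001)
gauss = np.exp(-(t - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
numeric = np.sum(np.vectorize(math.erf)(a * t + b) * gauss) * (t[1] - t[0])

closed = math.erf((a * mu + b) / math.sqrt(1 + 2 * a ** 2 * sigma ** 2))
print(numeric, closed)  # agree to quadrature accuracy
```

Because such integrals stay in closed form coordinate by coordinate, the full f-dimensional matrix element factorizes exactly as written in Eq. (9).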


In the remainder we will demonstrate that our construction can indeed fit the PES of a well-known molecular system with network sizes competitive with traditional multilayer sum networks. We apply the presented approach to the photodissociation of NOCl on the S1 surface. This process is routinely used when investigating the performance of new quantum dynamics methods and potential surface fit procedures, and the results obtained for it are well established.[10,21] The potential fit of the multireference (single and double) configuration interaction surface of Schinke et al.[22] in 3D Cartesian body-fixed coordinates, available in the Heidelberg MCTDH program package,[23] is used here as the reference potential. The initial state is a single stationary Gaussian with initial coordinates q_0 = (x_O, z_O, z_Cl)^T = (0.7937, −2.9353, 1.9921)^T a.u. and widths σ_0 = (0.0533, 0.0769, 0.0367)^T a.u.; atomic units with ℏ = 1 are used throughout.

The dissociation spectrum is calculated from the autocorrelation function of the evolving wave packet obtained using the Basis Expansion Leaping Multi Configuration Gaussian (BELMCG) method.[24] This method expands the wave packet in a set of stationary Gaussian basis functions, with equations of motion for the time-dependent coefficients c(t) obtained from the Schrödinger equation:
\[
i\,\dot{\mathbf{c}}(t) = S^{-1} H\, \mathbf{c}(t), \qquad (10)
\]
\[
c_i(0) = \sum_j (S^{-1})_{ij}\, \langle g_j|\Psi_0\rangle, \qquad (11)
\]
where S and H denote the overlap and Hamilton matrices with respective elements S_il = ⟨g_i|g_l⟩ and H_il = ⟨g_i|Ĥ|g_l⟩, and Ψ_0 is the initial wave function of the system. To avoid the need for a huge basis set spanning the entire space, Eq. (10) is propagated for a short time using a small set sufficient to span the immediate vicinity of Ψ_0. The overcompleteness of the Gaussian wave packets then allows for a change of expansion {g_{q_i,σ_i,p_i}, c_i} → {g_{q'_i,σ'_i,p'_i}, c'_i} so that the new set more appropriately describes the evolved population (see Ref. 24 for details of this process). The long-term time evolution is thus obtained by alternating between propagating the expansion coefficients {c_i(t)}_m with a fixed basis {g_{q_i,σ_i,p_i}}_m and updating the basis parameters {q_i, σ_i, p_i}_m → {q_i, σ_i, p_i}_{m+1}. This procedure results in very small basis sizes along with extremely simple equations of motion.

The Gaussian center coordinates {{q_i}_m} of the sequence of expansions can serve directly as sampling points for the potential surface, along the lines of Ref. 2, as follows. A sample of 20 vectors is chosen randomly from the initial wave packet expansion {q_i}_0 and serves, along with the corresponding values of the reference PES, as the initial training set Γ_0 = {q_i, V(q_i)} for a neural network fit. A BELMCG propagation on the resulting surface V^{NN}(x) then yields a sequence of sets of coordinates {{q_i}_m}. From these sets we choose another 20 vectors and add them to the set Γ. The random choice from {{q_i}_m} is weighted with the associated probabilities |c_i|^2 of the respective expansions to ensure that PES regions that are more important for the wave packet dynamics are represented more accurately. Point selection, network training, and wave packet propagation are repeated until the spectrum computed from the propagated wave packet has converged.
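The coefficient propagation of Eq. (10) is a linear ODE and can be integrated with any standard scheme once S and H have been assembled. A minimal sketch (toy 2×2 matrices and a fixed-step RK4 integrator; this is not the BELMCG implementation):

```python
import numpy as np

def propagate(c, S, H, dt, steps):
    """Integrate i dc/dt = S^-1 H c, Eq. (10), with fixed-step RK4."""
    K = -1j * np.linalg.solve(S, H)   # generator of the linear flow
    def f(c):
        return K @ c
    for _ in range(steps):
        k1 = f(c); k2 = f(c + 0.5 * dt * k1)
        k3 = f(c + 0.5 * dt * k2); k4 = f(c + dt * k3)
        c = c + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return c

# Toy basis: Hermitian H and, for simplicity, an orthonormal set (S = 1) -- assumptions
S = np.eye(2, dtype=complex)
H = np.array([[1.0, 0.4], [0.4, 2.0]], dtype=complex)
c0 = np.array([1.0, 0.0], dtype=complex)
c = propagate(c0, S, H, dt=0.001, steps=1000)
print(np.vdot(c, S @ c).real)  # norm <c|S|c> is conserved, ~1.0
```

For a non-orthogonal Gaussian basis, S is the full overlap matrix and the same code applies; conservation of ⟨c|S|c⟩ is then the natural convergence check.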

[FIG. 1. Dissociation spectrum σ(E) (arb. units) of NOCl on the S1 surface vs. E (eV), from wave packet propagation until T = 10, 20, 30 fs, successively shifted upwards by 20 units. Results from PES with RMSE Δ = 3 × 10^−3 a.u. (dashed red), 3 × 10^−4 a.u. (short-dashed green), 3 × 10^−5 a.u. (dotted blue), and reference surface (solid black). All results computed with BELMCG accuracy of 10^−3.]

Three such surface-growing procedures were performed for wave packet propagation until T = 10, 20, 30 fs. The results shown in Fig. 1 demonstrate the strong dependence of the convergence of the spectra on the PES complexity. Due to a combination of two factors, the intermediate propagation time T = 20 fs leads to the fastest convergence. At T = 20 fs the autocorrelation function has almost vanished, requiring low accuracy in the outlying regions of the wave packet. On the other hand, the wave packet is spread out far enough for the point selection procedure to select sufficiently many points away from the initial coordinates q_0 such that even the lowest accuracy considered here (Δ = 3 × 10^−3 a.u.) converges the spectrum to within line thickness. At the shorter time T = 10 fs, the wave packet is still so contracted that the point selection occurs close to q_0, but the outlying regions are still important as the autocorrelation function has not yet vanished, making a more accurate PES fit necessary. Conversely, at the longest time T = 30 fs the wave packet explores such a large region of the potential that many more points are necessary to obtain useful results at all, and a high accuracy is required to converge the spectrum. The most accurate surfaces with Δ = 3 × 10^−5 a.u. required n_{10 fs} = 160, n_{20 fs} = 220, and n_{30 fs} = 380 points for convergence of the autocorrelation functions (not shown) and spectra.

These final point sets Γ_{10 fs}, Γ_{20 fs}, Γ_{30 fs} were used to train both product and dual-layer sum networks in order to evaluate the performance of this new approach. Figure 2 shows the network sizes required to fit the three point sets for residual errors 3 × 10^−5 a.u. ≤ Δ ≤ 3 × 10^−3 a.u. Even though the spectra for T = 20 fs and T = 30 fs in Fig. 1 are quite similar, the wave packet propagates in the last 10 fs across a much larger extent of the potential surface, with more complex features. Therefore, the node count for fitting with residual error Δ = 3 × 10^−5 a.u. grows from fairly modest numbers around 10 and 15 nodes per layer for the 10 fs and 20 fs point sets to around 30 nodes for the much more spread out set Γ_{30 fs}. It should be noted that since the network training starts from randomized initial weights, the results in Fig. 2 fluctuate to a certain extent. A change of the parameters of the training algorithm may reduce the final node count by a few nodes in some cases, but this comes at the price of much longer training times and is rarely justified.


[FIG. 2. Network training: nodes per layer (lower panel) and total CPU time in seconds (upper panel) vs. requested RMSE Δ (a.u.) for sum (blue circles) and product (red crosses) networks. Connecting lines as guides for the eye for point sets Γ_{10 fs} (dotted), Γ_{20 fs} (dashed), Γ_{30 fs} (solid).]

The product networks require more nodes per layer than the traditional dual-layer sum networks. Nevertheless, the product networks contain just over half the total nodes, and a fraction of the adjustable weights, of the sum networks, yet still achieve the same quality of PES reproduction. Due to the O(N^2 n_Γ) scaling of the training procedure with the number of weight parameters mentioned above, dual-layer sum networks take substantially longer to train to the lowest residual errors for the largest point set.

Moreover, the multiplicative surface allows for wave packet propagation with much wider, and thus far fewer, basis functions, leading to substantially lower computational effort. The propagation until T = 30 fs shown in Fig. 1 took 2170 s on the multiplicative surface and 31 031 s on the additive surface (both with RMSE Δ = 3 × 10^−5 a.u.), even though computing a single potential matrix element in the local harmonic approximation for the latter takes only 5% of the CPU time of evaluating Eq. (9) for the former. The total CPU time on the additive surface is larger because up to 3.4 times more Gaussian basis functions were required towards the end of the propagation: their widths needed to be restricted to σ ≤ 1.5 × 10^−2 a.u., the largest value that did not incur significant deviations due to the approximation. This limit could be chosen separately for each degree of freedom to reduce the required effort, but that would incur even more test calculations to ascertain that no unphysical deviations occur. On the multiplicative surface no such parameter tuning is necessary.

In conclusion, this paper presents a novel way to construct separable potential energy surfaces with multiplicative neural networks. It is demonstrated that they exhibit performance similar to that of traditional sum networks, at lower computational cost, over two orders of magnitude of accuracy.
If applied to Gaussian basis propagation methods, product networks allow for fast and accurate quantum dynamics calculations that are free of the local harmonic approximation and thus eliminate a potential source of error. The basis function widths can therefore be chosen without restriction to minimize the basis set size, leading to ever larger savings in computational time as systems with a larger number of strongly anharmonic degrees of freedom are investigated.

If a new wave packet propagation with different initial conditions is desired, the training process does not have to be restarted. Since the network construction and training set selection are combined with the wave packet propagation, points from newly explored PES regions are simply added to the existing point set and the network training continues with little extra effort.

Due to its sum-of-products form, the same neural network structure that has been applied here to multidimensional Gaussian basis functions can seamlessly be used with MCTDH-type propagation methods. Without ever requiring a potential expression on a full grid, this form of potential construction can combine the high flexibility, accuracy, and computational efficiency of neural network approaches with the dramatic gain in propagation speed of MCTDH dynamics calculations when used with separable potentials. Application of product networks to MCTDH propagation will be pursued at a later time. The present approach was developed within the MCTDH framework and will be made available in the development version of the Heidelberg MCTDH package. It should be noted that all quoted computational times are single-core-equivalent CPU times, and both training and propagation parallelize trivially, with a speedup >15.7 on a 16-core system.

The authors would like to thank Terry Frankcombe and Michael Collins for valuable insights into the sampling strategies employed with GROW. This work was supported by the National Natural Science Foundation of China (Grant No. 90921014) and the Ministry of Science and Technology of China (Grant No. 2013CB834601).

1. J. Ischtwan and M. A. Collins, J. Chem. Phys. 100, 8080 (1994).
2. T. J. Frankcombe, M. A. Collins, and G. A. Worth, Chem. Phys. Lett. 489, 242 (2010).
3. T. B. Blank, S. D. Brown, A. W. Calhoun, and D. J. Doren, J. Chem. Phys. 103, 4129 (1995).
4. S. Manzhos and T. Carrington, Jr., J. Chem. Phys. 125, 084109 (2006).
5. H.-D. Meyer, U. Manthe, and L. Cederbaum, Chem. Phys. Lett. 165, 73 (1990).
6. H. Wang and M. Thoss, J. Chem. Phys. 119, 1289 (2003).
7. U. Manthe, J. Chem. Phys. 128, 164116 (2008).
8. H.-D. Meyer, F. Gatti, and G. A. Worth, Multidimensional Quantum Dynamics: MCTDH Theory and Applications (Wiley-VCH, Weinheim, 2009).
9. H.-D. Meyer, Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2, 351 (2012).
10. A. Jäckle and H.-D. Meyer, J. Chem. Phys. 104, 7974 (1996).
11. D. Peláez and H.-D. Meyer, J. Chem. Phys. 138, 014108 (2013).
12. M. Ben-Nun, J. Quenneville, and T. J. Martínez, J. Phys. Chem. A 104, 5161 (2000).
13. G. A. Worth, M. A. Robb, and I. Burghardt, Faraday Discuss. 127, 307 (2004).
14. B. Lasorne, M. J. Bearpark, M. A. Robb, and G. A. Worth, Chem. Phys. Lett. 432, 604 (2006).
15. M. F. Herman and E. Kluk, Chem. Phys. 91, 27 (1984).
16. W. S. McCulloch and W. Pitts, Bull. Math. Biophys. 5, 115 (1943).
17. M. Hagan and M.-B. Menhaj, IEEE Trans. Neural Networks 5, 989 (1994).
18. K. Hornik, Neural Networks 4, 251 (1991).
19. M. Schmitt, Neural Comput. 14, 241 (2002).
20. C. Hervás-Martínez, F. J. Martínez-Estudillo, and M. Carbonero-Ruz, Neural Networks 21, 951 (2008).
21. B. Lasorne, M. A. Robb, and G. A. Worth, Phys. Chem. Chem. Phys. 9, 3210 (2007).
22. R. Schinke, M. Nonella, H. U. Suter, and J. R. Huber, J. Chem. Phys. 93, 1098 (1990).
23. G. A. Worth, M. H. Beck, A. Jäckle, and H.-D. Meyer, The MCTDH Package, Version 8.4 (University of Heidelberg, 2007). See http://mctdh.uni-hd.de.
24. W. Koch and T. J. Frankcombe, Phys. Rev. Lett. 110, 263202 (2013).
