Evaluation of the Factors Impacting the Accuracy of 13C NMR Chemical Shift Predictions using Density Functional Theory - The Advantage of Long-Range Corrected Functionals.

Subscriber access provided by Gothenburg University Library

Article

Evaluation of the Factors Impacting the Accuracy of 13C NMR Chemical Shift Predictions using Density Functional Theory – The Advantage of Long-Range Corrected Functionals Mark A. Iron J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.7b00772 • Publication Date (Web): 10 Oct 2017 Downloaded from http://pubs.acs.org on October 11, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Chemical Theory and Computation is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 26

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Evaluation of the Factors Impacting the Accuracy of 13C NMR Chemical Shift Predictions using Density Functional Theory – The Advantage of Long-Range Corrected Functionals Mark A. Iron* Computational Chemistry Unit, Department of Chemical Research Support, Weizmann Institute of Science, Rehovot 7610001, Israel.

Supporting Information Placeholder

The various factors influencing the accuracy of 13C NMR calculations using density functional theory (DFT), including the basis set, exchange-correlation (XC) functional and isotropic shielding calculation method, are evaluated. A wide selection of XC functionals (over 70) were considered, and it was found that long-range corrected functionals offer a significant improvement over the other classes of functionals. Based on a thorough study, it is recommended that for calculating NMR chemical shifts (δ) one should use the CSGT method, the COSMO solvation model and the LCTPSSTPSS exchange-correlation functional in conjunction with the cc-pVTZ basis set. A selection of problems in natural product identification are considered in light of the newly-recommended level of theory.

Introduction Nuclear magnetic resonance (NMR) is an indispensable tool in organic, organometallic and inorganic chemistry. Its use is key in determining the structure of compounds – including stereochemistry – and in monitoring reactions. On occasion, however, it is not always definitive, and the actual structure may be debated. One recent example that generated some discussion in the literature is the structure of hexacyclinol.1 Despite a substantive NMR study, subsequent experimental2 and computational3 studies revealed a structure that differs from that originally proposed. This highlights how computational determination of chemical shifts can greatly assist in assigning structure.4 In another case – vannusal B – the authors noted that calculations would have greatly saved effort in the determination of its structure.5 In fact Willoughby et al. have published a Nature Protocol for assigning 1H and 13C chemical shifts.6 Moreover, it is one of the few methods that can be used to identify reaction intermediates, such as in our recent study of the reaction of hypervalent iodine reagents with ketones and

enolates; it was found that the reaction proceeds via the less stable enolonium intermediate rather than the more stable ketone form based on the comparison of the experimental low-temperature 13C{1H} NMR spectrum and the calculated spectra of the different potential intermediates.7 In recent years, there has been a paradigm shift in the field of computational chemistry. Whereas until a few years ago the art of using calculations to understand and/or predict chemical properties was relegated to a select group of experts armed with specialized computational equipment, recent advances in method development and computers have allowed experimentalists to carry out their own computational studies who otherwise would never have considered such a daunting task. The computational tools needed to predict NMR spectra are implemented in a number of commercial software packages, such as GAUSSIAN8 (which is used in this study), QCHEM9 and AMSTERDAM DENSITY FUNCTIONAL (ADF),10 as well as several free (at least for academic use) packages including ORCA,11 NWCHEM,12 GAMESS,13 and DALTON.14 While these are primarily intended to be run under Linux, many of these packages have Microsoft Windows and/or Apple Mac OS X versions, and calculations on small to medium molecules can even be run on desktop computers within a reasonable amount of time. These calculations, once a particular level of theory has been chosen, can almost be run in a “black box” style after careful consideration of the results presented herein. A number of evaluation studies of NMR chemical shift predictions have recently be reported. Toomsalu and Burk published a study where a select number of common exchange-correlation functionals, basis sets and NMR methods were considered.15 Pierens considered linear regression as a means of determining chemical shift but used a limited set of methods.16 There is an older study by Benzi et al. where a number of cases were examined.17 Teale et al. recently considered the perfor-

ACS Paragon Plus Environment


1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

mance of various methods, including coupled-cluster and DFT, for the prediction of absolute isotropic shifts and coupling constants of small molecules.18 Meanwhile, Krivdin and co-workers evaluated the performance of selected DFT functionals and basis sets in predicting chemical shifts for 15N,19 31P20 and a few other nuclides,21 and also considered solvent effects in 15N22 and 29Si (and also relativistic effects)23 NMR. Likewise, Toukach and Ananikov recently reviewed the computational prediction of the NMR parameters of carbohydrates,24 while Vícha et al. examined various factors in the prediction of the NMR spectra of square-planar transition metal (Pd, Pt, Au, Rh) complexes.25 Furthermore, fragmentation methods have been considered for the calculation of the NMR spectra of very large molecules (e.g., large peptides) with remarkable accuracy compared to direct calculations on the whole system.26 Based on some initial findings, we recently used calculated 13C chemical shifts to identify the key intermediate in the reaction of ketones or enolates with hypervalent iodine reagents.7 Herein is an in-depth evaluation of the different factors affecting the prediction of chemical shifts, including the exchange-correlation method, the basis set, the integration grids, solvation model and the NMR methodology. Over seventy functionals are considered, including many newer ones that have yet to be considered. Based on these results, four case studies are considered: the hitherto unknown structure of the benzoquinazoline alkaloid samoquasine A, the four stereoisomers of artaboral, glabramycins B and C and the use of a characteristic 3 JHH coupling constant to help differentiate between stereoisomers, and the six stereocenters of elatenyne and the recommendation of using a second, complementary spectroscopic method such as vibrational circular dichroism (VCD) to aid in discriminating between the thirty-two diasteriomeric pairs. There are two parts to this paper. The first deals with the technical aspects of predicting NMR spectra using DFT. The second is comprised of the four case studies based on the results in the first section. The reader primarily interested in the applications to organic chemistry is invited to skip directly to the second section.

o

•

Local spin-density approximation (LSDA)

SVWN5: Slater ρ4/³ exchange (S)29 with Vosko–Wilk–Nusair correlation functional V (VWN5)30

•

Exchange functionals o Becke88 (B) exchange31 o Perdew–Wang 1991 (PW91)32 o Adamo and Barone’s33 modified PW91 (mPW)34 o Gill-96 (G96)35 o The exchange component of the Perdew–Burke–Ernzerhof (PBE)36 o Handy’s OPTX modification of Becke88 (O)37 o The exchange component of the Tao– Perdew–Staroverov–Scuseria (TPSS)38 o The revised version of TPSS (revTPSS)39 o The 1989 exchange functional of Becke and Roussel (BRx)40 o The exchange component of the Perdew–Kurth–Zupan–Blaha (PKZB) functional41 o The exchange part of the screened Coulomb potential of Heyd, Scuseria and Ernzerhof (alternatively knowns as ωPBEh or HSE)42 o The 1998 revision of PBE (PBEh)43

•

Correlation functionals o The correlation component of the Lee– Yang–Parr (LYP) correlation44 o The Perdew-86 (P86) functionals45 o Perdew and Wang’s 1991 (PW91) gradient-corrected correlation functional32 o Becke’s 1995 τ-dependent (meta-GGA) correlation functional46 o The correlation component of the Perdew–Burke–Ernzerhof (PBE)36 o The τ-dependent correlation part of TPSS38 o The revised version of TPSS (revTPSS)39 o The Krieger–Chen–Iafrate–Savin correlation functional (KCIS)47 o Becke–Roussel (BRc) correlation functional40

•

Exchange-correlation combinations o A variety combinations of the above exchange and correlation functionals – denoted by combining their abbreviations

Computational Methods Most calculations were performed using GAUSSIAN09 REVISIONS D.0127 and E.01;28 calculations involving MN15, MN15-L, M08-HX and PW6B95 (vide infra) were done using GAUSSIAN16 REVISION A.03.8 Some calculations were also done using NWCHEM 6.612 (vide infra). A large number of DFT exchange-correlation functionals we used:

Page 2 of 26

2


Page 3 of 26

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


o

o o

o o

o

o

o o o o o o

o

o

o o

o

Becke’s 3-parameter hybrid functional48 using B exchange and LYP correlation (B3LYP)49 Austin–Frisch–Petersson functional with dispersion (APFD)50 Cohen and Handy’s 3-parameter hybrid functional using OPTX exchange (O3LYP)51 Xu and Goddard’s XC functional (X3LYP)52 Becke 1-paramter functionals46 using B exchange and B95 correlation (B1B95),46 B exchange and LYP correlation (B1LYP)53 and mPW exchange34 and PW91 correlation32 34 (mPW1PW91) Handy and coworkers’ long range corrected version of B3LYP using the Coulomb-attenuating method (CAMB3LYP)54 Adamo and Barone’s hybrid XC function using PBE exchange and correlation (PBE0)55 Hamprecht–Cohen–Tozer–Handy GGA functional (HCTH, the 407 version)56 The τ-dependent version of HCTH (τHCTH)57 The hybrid version of τHCTH (τHCTHhyb)57 Boese and Martin’s τ-dependent hybrid functional for kinetics (BMK)58 The hybrid version of TPSSTPSS (TPSSh)38 Grimme’s XC functional including dispersion (B97-D)59 based on the Becke97 functional form60 The modification by Hamprecht et al. of the B97 functional (B97-1),56c including exact HF exchange Wilson, Bradley and Tozer’s modification to B97 (B97-2),61 including exact HF exchange Becke’s 1998 modification of the B97 XC functional (B98)60,62 Head-Gordon and coworkers’ dispersion corrected long-range corrected hybrid functional (ωB97X-D)63 Cramer and coworkers’ parameterized functionals designed for 13C and 1H NMR chemical shifts (WC04 and WP04, respectively)64,65

van Voorhis and Scuseria’s τ-dependent gradient-corrected correlation functional (VSXC)66 o Zhao and Truhlar’s hybrid meta-GGA functional designed for broad accuracy for thermochemistry (PW6B95)67 including PW91 exchange32 and B95 correlation46 o Truhlar’s Minnesota-06 (M06) family of functions, including M06 and M062X with twice the exchange,68 M06-HF with 100% HF exchange69 and the local (non-hybrid) version M06-L33 o Truhlar’s M0570 and M05-2X71 functionals, earlier versions of the Minnesota-06 functionals, o Truhlar’s Minnesota-11 family of functionals including the M1172 and the local version M11-L73 o Truhlar’s 2011 second-order GGA functional (SOGGA11)74 and its hybrid variant (SOGGA11X)75 o Truhlar’s Minnesota-12 family of functionals, including the local nonseparable gradient approximation (NGA) N12,76 the τ-dependent MN12L77 and the screened exchange hybrid versions (N12-SX and MN12-SX)78 o The Heyd–Scuseria–Ernzerhof rangeseparated hybrid functional (HSE06)79 o Henderson, Izmaylov, Scuseria and Savin’s range-separated hybrid functional (HISS)80 o Swart–Solà–Bickelhaupt dispersion corrected function based on spin states and SN2 barriers (SSB-D)81 run with NWCHEM o The long-range corrected version of ωPBE (LC-ωPBE)82 o Hirao and coworkers’ long range correction method (LC-) that can be applied to any pure (non-hybrid) functional83 o Truhlar’s Minnesota-15 (MN15) hybrid meta-GGA functional84 and its local counterpart MN15-L85 o Truhlar’s Minnesota-08 functionals with “high exchange” (M08-HX)86 In addition to the above DFT methods, from wavefunction theory, the second-order Møller–Plesset perturbation theory87 and the Hartree–Fock reference were used. o

3



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

With the above computational methods, a number of families of basis sets were used: •

Dunning’s cc-pVnZ (n = D, T, Q, 5) correlation consistent basis sets and the corresponding augmented (diffuse) versions (aug-cc-pVnZ)88

•

Selected members of Truhlar’s “calendar” basis sets where only s and p diffuse functions are added to the heavy atoms (i.e., beyond He)89

•

Dunning’s cc-pCVnZ (n = D, T, Q, 5) correlation consistent core-valence basis sets and the corresponding augmented (aug-cc-pCVnZ) versions (which by definition are cc-pVnZ and aug-cc-pVnZ for H and He)88a,90

•

Pople’s 6-31G and 6-311G family of functionals without and with diffuse functions (identified by “++” before the “G”) and with various sets of polarization functions91

•

The second revision (def2) of Ahlrichs and coworkers’ basis sets92 (def2-SVP, def2-SVPD, def2-TZVP, def2-TZVPP, def2-TZVPD, def2TZVPPD, def2-QZVP)93

•

Jensen’s polarized-consistent basis sets (pc-n, n=0-4) and the augmented versions (aug-pc-n)94

•

Jensen’s pc-n basis sets for nuclear magnetic shielding (pcS-n) and their augmented versions95

•

coupling constants is beyond the scope of this study, although it does merit a study in of itself. Relativistic effects, when considered, were included using a second-order Douglas–Kroll–Hess (DKH2) Hamiltonian102 run using NWCHEM with the TZP-DKH (triple-ζ plus polarization specifically recontracted for DKH2 calculations) basis set.103 Geometries were optimized with the PBE functional with the def2-SVP basis set. To improve efficiency of the calculations using pure XC functionals, densityfitting (DF) was used.104 This requires the use of a density fitting basis set, specifically def2-SV designed for use with def2-SVP.93e This combination is denoted as DFPBE/def2-SVP/def2-SV. For technical reasons, densityfitting was not used during NMR chemical shift calculations. Bulk solvent effects were approximated by single point energy calculations using various implicit models:

The basis sets IGLO-II and IGLO-III from Kutzelnigg et al.’s individual gauge for localized orbital (IGLO) NMR method.97 Where required, basis sets were obtained from the Basis Set Exchange.98 Four methods for calculating chemical shifts were used: •

Continuous (CSGT)100

•

Individual gauges for atoms in molecules (IGAIM)100a,b

set

of

gauge

a polarizable continuum model (PCM),105 specifically the integral equation formalism model (IEF-PCM),105a,b,106

•

the CPCM polarizable conductor calculation model,107

•

a PCM calculation using Klamt’s form of the conductor reaction field (COSMO),108

Truhlar’s empirically parameterized version of IEF-PCM – Solvation Model Density (SMD).109 Accurate energies were calculated using a Kozuch and Martin’s dispersion corrected (D3BJ), spin component scaled (i.e., an SCS110-MP287-like correlation contribution), double hybrid (DSD) functional, specifically DSD-PBEP86:111 this functional incorporates the PBE exchange36 and the Perdew-86 (P86) correlation45 functionals. This class of DFT functionals has been shown to provide energies approaching that of the “Gold Standard” in computational chemistry, specifically CCSD(T). There are a number of reviews and benchmark studies of double-hybrid functionals, which clearly show that the use of this class of functionals is highly recommended.111-112 The def2-TZVPP basis set and SMD solvation model (vide supra) with the relevant solvent were used for these calculations. Conformational analyses were done using VEGA ZZ113,114 starting from DF-PBE/def2-SVP/def2SV optimized geometries; this allows for conformers whose bonds and bond angles should not be too far from ideal. The energies of the top fifty conformers provided were calculated at the DF-PBE/6-31G*/auto level of theory (where “auto” indicates that the automatic density fitting basis set generation algorithm in GAUSSIAN09 was used). Of these, the lowest ten structures were optimized at the final level of theory. As noted above, the DSDPBEP86 double-hybrid functional was used to obtain accurate energies for the Boltzmann distribution. (In no case were all ten structures energetically relevant ac-

•

Gauge-independent atomic orbitals (GIAO)99

•

•

Jensen’s segment contracted pcS-n basis sets (pcSseg-n) and their augmented versions (augpcSseg-n)96

•

Page 4 of 26

transformations

• Single origin (SO) When NMR coupling constants (J) are calculated, for technical reasons the GIAO method was used using Deng et al.’s two-step method (“mixed” option in GAUSSIAN09).101 Truhlar’s minimally-augmented (i.e., calendar) may-cc-pVTZ basis set was used89 in conjunction with the LC-TPSSTPSS long-range corrected exchange-correlation functional. This functional was chosen based on the results presented hereinafter for chemical shifts; finding the best conditions for calculating

4


Page 5 of 26

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


𝛿& = 𝑚 ⋅ 𝜎 & + 𝑏

cording to the Boltzmann distribution – defined as >5% probability.) NMR chemical shifts were then calculated for any conformers with >5% probability in the Boltzmann distribution; when weighting the chemical shifts to obtain the final average chemical shifts, the Boltzmann distributions were recalculated using the final set of structures. (After completion of this study, an alternate means of performing conformer searches was found using Grimme’s QMDFF force-field, which is determined based on the DFT energy surface of the molecule in question,115 or using a tight-binding method specifically parameterized for geometries, vibrations and noncovalent interactions – GFN-xTB.116) Vibrational circular dichroism (VCD) spectra117 were calculated118 at the PBE/def2-SVP level of theory (i.e., the same level of theory as the geometry optimization but without density fitting, which is incompatible with the GIAO part of the VCD calculation). The mathematical derivation of the modified DP4 probabilities (𝑃"#$ ) was assisted by WOLFRAM MATHEMATICA 10 version 10.4.0.0 for Mac OS X (Wolfram Research Inc.).

Pierens recently used this method – but flipped the axis definitions – to evaluate a few DFT methods.16 For the set of calibration compounds, the reported experimental chemical shifts of common solvents in various NMR solvents was used.120 Specifically, the list of chosen compounds is: acetic acid, acetone, acetonitrile, benzene, tert-butanol, carbon dioxide (13C shifts only), carbon disulphide (13C shifts only), carbon tetrachloride (13C shifts only), chloroform, cyclohexane, 1,2dichloroethane, dichloromethane (DCM), diethyl ether, dimethylformamide (DMF), 1,4-dioxane, dimethoxyethane (DME), ethane, ethanol, ethyl acetate, ethylene, hexamethyldisiloxane (HDMSO), hexamethylphosphoramide (HMPA), hydrogen (1H shifts only), imidazole, methane, methanol, nitromethane, n-pentane, propane, 2-propanol, propylene, pyridine, pyrrole, pyrrolidine, tetrahydrofuran (THF), tetramethylsilane (TMS, standard 0.00 ppm reference for 13C and 1H NMR spectroscopy), toluene, triethylamine (TEA) and water (1H shifts only). There are several computational aspects that potentially could affect the accuracy of the chemical shift predictions: (i) the exchange-correlation (XC) functional, (ii) basis set, (iii) integration grids, (iv) solvation model, (v) NMR method, (vi) relativistic effects, (vii) zero-point vibrational contributions, and (viii) choice of starting geometry. Each factor will be considered in turn (but not necessarily in this order). It is assumed that the effects of each factor are (mostly) independent of changes of the other factors. There are another two issues that in general should be considered regardless of the underlying computational methodology: averaging chemically equivalent signals and Boltzmann weighting of conformers.6 The NMR calculations are done on a single, optimized geometry, and this results in separate signals for “chemically equivalent” protons or carbon atoms. For example, a methyl or tert-butyl group is expected to rotate quickly relative to the NMR measurement timescale. Thus, a single proton signal is observed for the methyl group and a single 13C signal is observed for the tert-butyl methyl carbons (obviously the central quaternary carbon will have a separate signal). Therefore, when one has such a group in one’s molecule, the “calculated” isotropic shift of the group is the average of the individual isotropic shifts obtained from the NMR calculations. This is done for all such cases in this study. The second issue is more complicated and to a degree related to the first. A molecule may have multiple conformers that can rapidly convert on the NMR timescale. For example, for methylcyclohexane there are two chair configurations – one with the methyl in an axial position and another with the methyl in an equatorial position. This will give rise to two distinct sets of NMR signals, but since the two conformers can rapidly interchange,

Results and Discussion – DFT and NMR One needs to obtain chemical shifts, which can be compared to experimental measurements, from the isotropic shifts (IS) obtained from the calculations. There are two common methods. One could either use a reference compound (tetramethylsilane is the experimental calibration standard for 1H and 13C NMR) or by linear regression from a set of calibration molecules.16 In the former method, the IS of the reference compound is calculated, along with the ISs of the molecule of interest. The chemical shift is equal to: 𝛿 & = 𝜎)*+ − 𝜎& + 𝛿)*+ where δi and δref are the chemical shifts of the molecule of interest and of the reference, respectively, while σi and σref are the corresponding ISs. This is the simpler and more commonly used method. One problem is that there is not a systematic method of evaluating the associated error in δi. One assumption of this method is that any errors associated with calculating σi would be cancelled by the errors associated with σref. However, the more δi and δref differ, the less ideal this error cancellation would be. One could try using a reference more similar to the molecule of interest, but what happens if one has a series of molecules to consider?119 The alternative – linear regression – is to use a series of reference compounds. The wider range would provide more error cancellation as well as a measure of the error in the calculation. Here, the ISs of a series of molecules are calculated and the experimental δref (i.e., on y-axis) is plotted versus the calculated σref (i.e., on x-axis). From linear regression of this plot, one obtains the slope (m) and y-axis intercept (b) and the chemical shifts of the molecule of interest are:

5



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

the observed spectrum will be a Boltzmann-weighted average of the two. In such a case, one needs to optimize the structures of all possible conformers, calculate their relative energies and average the signals appropriately: −∆𝐺& exp 𝑅⋅𝑇 𝑝& = −∆𝐺9 9 exp 𝑅 ⋅ 𝑇 where pi is the probability of species i and ∆Gi is the relative Gibbs free energy of species i (relative to the most stable conformer). From this it follows that the observed chemical shift δobs would be the weighted average of the chemical shifts δi of each conformer: −∆𝐺& & δ> ⋅ exp 𝑅 ⋅ 𝑇 𝛿:;< = 𝑝& ⋅ 𝛿& = −∆𝐺9 & 9 exp 𝑅 ⋅ 𝑇 In the above case, the energy difference was calculated to be 2.00 kcal/mol in favor of the equatorial conformer, resulting in a 97:3 equatorial:axial ratio.6 The observed chemical shift would be the weighted average of the calculated shifts of each conformer. In the evaluation set used, only small molecules are chosen to avoid instances of multiple conformers that would need to be considered. In the second part of this study, where selected case studies are examined using our methodology, this conformer issue is crucial. In their Nature Protocol, Willoughby et al. use the M06-2X/631+G(d,p) level of theory for determining the energy differences. However, the Boltzmann weighting is sensitive to the accuracy of the energy calculation, and therefore it is recommended that the energies be calculated as accurately as possible. Double-hybrid functionals112a are a newer class of DFT exchange-correlation functionals that incorporate an MP2-like correlation term. Their accuracy has been shown to rival the ab initio CCSD(T) “Golden Standard.”111-112 Therefore, here the relative energies are calculated using the DSD-PBEP86 double hybrid functional111a with a much larger basis set (def2TZVPP – see Computational Details section for full details). Using various techniques – such as the RIJCOSX approximation121 in ORCA11 – energy calculations can be run quite efficiently even for large molecules with large basis sets. For evaluation purposes, initially the PBE0 XC functional, the def2-TZVPD basis set, the SMD solvation model (in dichloromethane) and the GIAO NMR method were used. PBE0 is a popular functional that has been shown to be reasonable, def2-TZVPD is a reasonably large basis set and GIAO is often used. Likewise, the linear regression method of calibrating the NMR chemical shifts will be used. The geometries of all species were optimized at the DF-PBE/def2-SVP/def2-SV level of theory (see Computational Details for full explanations of all methods described hereinafter).

Page 6 of 26

Integration Grids. Integration grids are essential to the calculation of the XC energy. Generally, a larger grid leads to more reliable results. The five built-in grids in GAUSSIAN09 were considered (Table 1). Clearly, at least with this combination of functional and basis set (e.g., the Minnesota-06 functionals have been shown to be sensitive to the integration grid for the calculations of energy and geometry122), the choice of integration grid does not have any significant impact on the accuracy of the chemical shift predictions. One could thus choose to use any grid, and the default fine grid was chosen for the rest of the evaluations hereinafter. Table 1. Evaluation (coefficient of determination R2 and standard error se(δ) in ppm) of the impact of the integration grid on the accuracy of the 1H and 13C NMR chemical shift predictions using the PBE0 exchange-correlation functional, the def2-TZVPD basis set, and the SMD solvation model.

δ(13C)

grida

δ(1H)

R²

se(δ)

R²

se(δ)

coarse (cg)

(35,110)

0.977

8.42

0.986

0.303

SG1

(50,194)

0.977

8.40

0.986

0.306

fine (fg)

(75,302)

0.977

8.41

0.986

0.307

ultrafine (ufg)

(99,590)

0.977

8.40

0.986

0.303

superfine (sfg)

(150,974)

0.977

8.40

0.986

0.303

a

The names are those given to the grids in GAUSSIAN09. In parentheses are the numbers of radial and angular points in the integration grid.

Basis Set Effects. Five families of basis sets were evaluated: (i) Ahlrichs – specifically the def2 version, (ii) Jensen’s polarization consistent (pc) basis sets, including variants specially optimized for NMR shielding (pcS and pcSseg), (iii) Dunning’s correlation consistent (cc) basis sets, (iv) Pople’s 6-31G and 6-311G sets of basis sets with varying levels of diffuse and polarization functions, and (v) Kutzelnigg’s IGLO basis sets specifically designed for NMR calculations. With the exception of the single-ζ pcS-0 basis set, which was considered only for completeness, within a family of basis sets, there is little variation (Table 2). Neither increasing ζ nor adding polarization and/or diffuse functions has a significant impact on the linear regression. Of the five families, the Dunning and IGLO basis sets are the best performers, but the differences are small. This is somewhat unexpected given that the pcS-n and pcSseg-n basis sets were designed for chemical shifts.95-96 Because the IGLO-III basis set is available only for a limited set of atoms, the Dunning basis set (specifically cc-pVTZ) is recommended. It should be noted that there were some serious convergence problems for some members of the calibration set for the aug-pc-n, aug-pcS-n and augpcSseg-n basis sets for n ≥ 3.

6


Page 7 of 26

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


One thing that is clear is that using basis sets beyond triple-ζ quality does not meaningfully improve the results (and in some case, the errors slightly increase for the larger basis sets). The use of augmented basis sets, beyond double-ζ, does not meaningfully improve the results, especially given the significant increase in computer time, but the use of Truhlar’s “calendar” basis sets, which add only s and p diffuse functions on the heavy atoms,89d actually leads to slightly worse results than either the fully-augmented or unaugmented basis sets. Previous studies also found that larger basis sets were not beneficial.15 Moreover, the use of a core-valence (ccpCVTZ) basis set, which would capture any influence of the core on the NMR calculation, was found to have a negligible impact on the accuracy of the NMR calculations.

Basis Set

R²

se(δ)

9.96

0.985

0.318

aug-pcS-2

0.969

9.77

0.984

0.320

pcSseg-0

0.965

10.38

0.955

0.540

pcSseg-1

0.967

10.03

0.981

0.349

pcSseg-2

0.968

9.91

0.985

0.313

pcSseg-3

0.968

9.86

0.984

0.319

pcSseg-4

0.968

9.86

0.984

0.321

aug-pcSseg-0

0.965

10.42

0.971

0.436

aug-pcSseg-1

0.968

9.93

0.985

0.314

aug-pcSseg-2

0.967

10.00

0.984

0.321

cc-pVDZ

0.971

9.40

0.978

0.377

cc-pVTZ

0.978

8.28

0.987

0.294

cc-pVQZ

0.978

8.17

0.988

0.283

cc-pV5Z

0.979

8.08

0.988

0.278

a

0.966

10.23

cc-pV{T,Q}Za

0.972

9.31

aug-cc-pVDZ

0.975

8.85

0.986

0.300

aug-cc-pVTZ

0.977

8.33

0.988

0.285

0.978

8.13

0.988

0.279

0.988

0.279

cc-pV{D,T}Z

δ(1H) R²

0.968

Dunning’s correlation consistent basis sets

Table 2. Evaluation (coefficient of determination R2 and standard error se(δ) in ppm) of selected basis sets using the PBE0 exchange-correlation functional and the SMD solvation model for predicting 1H and 13C NMR chemical shifts.

δ(13C)

aug-pcS-1

se(δ)

Ahlrichs Basis Sets SVP

0.962

10.83

0.976

0.396

aug-cc-pVQZ

SVPD

0.963

10.60

0.983

0.335

aug-cc-pV5Z

0.979

8.04

a

TZVP

0.966

10.24

0.979

0.367

aug-cc-pV{D,T}Z

0.969

9.79

TZVP (ufg)

0.966

10.24

0.979

0.367

aug-cc-pV{T,Q}Za

0.971

9.42

TZVPP

0.966

10.22

0.982

0.346

jun-cc-pVDZ

0.968

9.84

0.986

0.298

0.972

9.22

0.988

0.282

TZVPD

0.966

10.20

0.980

0.359

may-cc-pVTZ

TZVPPD

0.966

10.19

0.982

0.339

apr-cc-pVQZ

0.973

9.13

0.988

0.278

QZVP

0.966

9.80

0.979

0.326

mar-cc-pV5Z

0.973

9.04

0.988

0.279

cc-pCVDZ

0.965

10.38

0.979

0.374

0.489

cc-pCVTZ

0.978

8.16

0.987

0.294

0.973

9.03

0.986

0.299

Jensen’s Polarization Consistent Basis Sets pc-0

0.967

10.13

0.963

pc-1

0.954

11.94

0.979

0.375

cc-pCVQZ

pc-2

0.968

9.89

0.985

0.316

cc-pCV5Z

0.973

9.03

0.988

0.281

pc-3

0.969

9.81

0.984

0.320

aug-cc-pCVDZ

0.969

9.80

0.986

0.301

0.973

9.09

0.988

0.284

0.973

9.01

0.988

0.279

0.974

9.00

0.988

0.287

pc-4

0.968

9.89

0.984

0.321

aug-cc-pCVTZ

aug-pc-0

0.967

10.14

0.970

0.443

aug-cc-pCVQZ

aug-pc-1

0.957

11.44

0.983

0.333

aug-cc-pCV5Z

aug-pc-2

0.966

10.17

0.984

Pople’s basis sets

0.326

Jensen’s Polarization Consistent Basis Sets for NMR Shielding

6-31G(d,p)

0.973

9.09

0.983

0.338

6-31G(2d,p)

0.973

9.14

0.979

0.367

pcS-0

0.966

10.20

0.965

0.479

6-31G(2d,2p)

0.972

9.2

0.982

0.343

pcS-1

0.968

9.85

0.981

0.348

6-31G(2df,2pd)

0.974

8.90

0.985

0.315

pcS-2

0.969

9.74

0.985

0.309

6-31G(3df,3pd)

0.973

9.06

0.986

0.303

pcS-3

0.968

9.86

0.984

0.320

6-311G(d,p)

0.974

8.97

0.982

0.339

pcS-4

0.968

9.86

0.984

0.321

6-311G(2d,p)

0.971

9.37

0.982

0.340

aug-pcS-0

0.968

9.88

0.971

0.434

6-311G(2d,2p)

0.971

9.34

0.986

0.301

7



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

6-311G(2df,2pd)

0.972

9.19

0.987

0.292

6-311G(3df,3pd)

0.974

8.92

0.987

0.291

6-31++G(d,p)

0.974

8.98

0.988

0.283

6-31++G(2d,p)

0.973

9.00

0.987

0.291

6-31++G(2d,2p)

0.973

9.04

0.988

0.286

6-31++G(2df,2pd)

0.976

8.65

0.988

0.278

6-31++G(3df,3pd)

0.974

8.89

0.988

0.276

6-311++G(d,p)

0.974

9.00

0.986

0.303

6-311++G(2d,p)

0.972

9.25

0.985

0.309

6-311++G(2d,2p)

0.972

9.24

0.987

0.288

6-311++G(2df,2pd)

0.973

9.11

0.988

0.278

6-311++G(3df,3pd)

0.975

8.75

0.988

0.275

Page 8 of 26

ppm for 1H does not allow for meaningful predictions, although trends should still be reliable. In Table 3, the top six performers for each nucleus are highlighted. The top XC functional for 13C predictions is LC-ωPBE, while, for what it is worth, the best functional for 1H is MN12-SX. Two methods are on the top-six list for both nuclei: LC-BLYP and HISS. For carbon shifts, a number of functionals outperform MP2, a wavefunction method that is often used when a “benchmark standard” is needed; moreover, its higher costs makes it unfeasible for larger systems. Given the performance of MP2 one might expect the double-hybrid functionals (5th rung of Perdew’s Ladder of DFT functionals124) would perform well, but evaluation of this class of functionals was not possible for technical reasons.

Kutzelnigg’s IGLO basis sets IGLO-II

0.977

8.40

0.983

0.329

IGLO-III

0.979

8.05

0.987

0.293

a

Two-point extrapolation to the complete basis set limit using the two n-tuple basis sets listed in the braces; see text.

One method for improving energy or property predictions is to extrapolate to the basis set limit.123 Using the 13 C isotropic shifts, the complete basis set (CBS) limits for the cc-pVnZ and aug-cc-pVnZ basis sets were determined using an 𝐴 + 𝐵 ⋅ 𝐿BC (where L is the highest angular momentum in the larger basis set, which in practice is n) extrapolation; for these two basis set groups, exponents α of 2.07 and 1.77, respectively, were found. Thus, this property seems to converge relatively slowly with respect to basis set size. The basis set convergence for five selected carbon signals are shown in Figure 1. One could use a two-point extrapolation to find the CBS limit: 𝐸G − 𝐸G 𝐸 𝐿 → ∞ = 𝐸GH + H C I 𝐿J −1 𝐿K However, it is apparent from the data in Table 2 that this yields poorer results than simply using the larger basis set in the extrapolation. Therefore, despite its advantages for other properties (notably energies), likely because of its slow convergence, its use here is not beneficial. (See SI for more details.) DFT Exchange-Correlation Functional. There is a virtual alphabet soup of XC functionals available. A large variety of functions was evaluated and the results are listed in Table 3. In this table, the functionals are sorted by class (see Computational Methods for explanation) and then alphabetically by name. One thing that is readily apparent is that the range of the errors is fairly small (6.8-11.2 ppm for 13C and 0.27-0.44 ppm for 1H). While an error of ±7 ppm for 13C NMR chemical shifts would still allow for reasonable predictive ability, ±0.3

Figure 1. Basis set convergence for five selected 13C chemical isotopic shifts (ppm) for the cc-pVnZ (top) and aug-ccpVnZ (bottom) basis sets as a function of L. Note that the CH3CN (red) and CH4 (green) shifts are on the right axis.

8


Page 9 of 26

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


Table 3. Evaluation (coefficient of determination R2 and standard error se(δ) in ppm) of selected DFT exchangecorrelation functionals using the def2-TZVPD basis set and the SMD solvation model for predicting 1H and 13C NMR chemical shifts. The six functionals for each nucleus with the lowest errors are highlighted.

functional SVWN5 (LSDA)

δ(13C)

M08-HX

0.977

8.43

0.986

0.306

0.977

8.35

0.987

0.290

PW6B95

0.971

9.37

0.987

0.293

τ-HCTHhyb

0.972

9.36

0.985

0.311

0.972

9.25

0.985

0.310

0.987

0.288

MN15

a

TPSSh

δ(1H)

R²

se(δ)

R²

se(δ)

0.966

10.18

0.980

0.363

range-separated GGA LC-BLYP

0.984

6.97

range-separated hybrid

GGA

CAM-B3LYP

0.980

7.86

0.988

0.284

B97-D

0.963

10.71

0.982

0.339

HISS

0.982

7.50

0.987

0.288

BLYP

0.962

10.82

0.981

0.352

HSE06

0.977

8.41

0.986

0.307

HCTH

0.964

10.46

0.982

0.342

LC-ωPBE

0.985

6.83

0.986

0.305

N12a

0.963

10.65

0.983

0.336

M11a

0.981

7.61

0.985

0.315

SOGGA11

0.962

10.86

0.980

0.365

MN12-SX

meta-GGA

a

0.978

8.20

0.989

0.270

N12-SXa

0.978

8.18

0.985

0.295

ωB97X-D

0.980

7.92

0.988

0.295

M06-L

0.963

10.65

0.983

0.335

M11-La

0.960

11.15

0.984

0.323

MN12-La

0.972

9.21

0.896

0.302

HF (SCF)

0.979

7.99

0.987

0.290

MN15-La

0.968

9.83

0.986

0.303

MP2

0.979

7.96

0.989

0.272

τ-HCTH

0.964

10.49

0.983

0.334

a

TPSS

0.967

10.03

0.983

0.331

VSXC

0.961

10.98

0.981

0.356

WFT

Two functionals from the Cramer group – WP04 and WC04 – were specifically designed to predict proton and carbon chemical shifts, respectively (hence the “P” and “C” in their names).64 While not in the top-five list for their respective nuclei, they do perform well. They should, however, not be used for other nuclei. Of the top six functionals, four are range-separated. Furthermore, most of the range-separated functionals appear at the top of the list. They correct, using various schemes, the tendency of the non-Coulomb part of the exchange functional to die off too quickly. It would appear that this is a significant factor affecting the performance. The “LC” correction of Hirao and coworkers83 can be applied to any pure (i.e., non-hybrid) functional. Thus, it would be of interest to see if there is a pair of exchange and correlation functionals that, within this LC-correction scheme, gives better chemical shift predictions. Due to the large number of available exchange and correlation functionals, this would lead to too many combinations to be easily managed. Thus, initially the Becke88 (B) exchange functional was chosen and used with a variety of correlation functionals (Table 4). For 13 C chemical shifts, many of the correlation functionals have similar performance (±6.7 ppm) and are an improvement on LC-BLYP. Of these correlation functionals, PBE was chosen for the screening of the exchange functionals. Twelve exchange functionals were next considered using the PBE correlation functional (Table 5). With one

hybrid APFD

0.977

8.46

0.986

0.306

B1LYP

0.975

8.84

0.987

0.294

B3LYP

0.973

9.13

0.986

0.302

B97-1

0.974

8.96

0.986

0.306

B97-2

0.975

8.70

0.986

0.302

B98

0.974

9.07

0.986

0.300

mPW1PW91

0.977

8.36

0.986

0.301

O3LYP

0.971

9.46

0.985

0.316

PBE0

0.966

10.20

0.980

0.359

SOGGA11X

0.981

7.61

0.985

0.310

WC04

0.979

7.95

0.982

0.345

WP04

0.964

10.55

0.987

0.290

X3LYP

0.974

9.01

0.986

0.299

hybrid meta-GGA B1B95

0.978

8.25

0.986

0.299

BMK

0.977

8.45

0.986

0.303

M05

0.973

9.05

0.983

0.330

M05-2X

0.982

7.40

0.986

0.306

M06

0.972

9.27

0.986

0.305

M06-2X

0.979

8.01

0.986

0.305

M06-HF

0.973

9.17

0.971

0.437

NGA-based rather than GGA-based.

9



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

exception, the choice of the exchange functional had little impact of the accuracy of carbon shift predictions. Two exchange functionals lowered the errors on 13C chemical shifts: PKZB and BRx. These are parts of a designed pair of exchange-correlation functionals. Thus, along with a number of other functional XC pairs, these were also evaluated (Table 6). Not surprisingly, these two pairs performed well for 13C chemical shifts, as did LC-TPSSTPSS. Reviewing all of the many functional combinations considered here, clearly a range-separated functional is recommended. For predicting carbon shifts, the preferred combinations, in order of increasing errors, are: LC-BRxPBE (±6.6 ppm), LC-PKZBPKZB (±6.7 ppm), LC-PKZBPBE (±6.7 ppm), LC-TPSSTPSS (±6.7 ppm), LC-revTPSSrevTPSS (±6.7 ppm) and LC-OPBE (±6.7 ppm); it should be noted that many of the XC combination with the LC correction have very similar performance. If the LC correction is unavailable, another range separated functional, such as HISS (±7.5 ppm), M11 (±7.1 ppm), CAM-B3LYP( ±7.9 ppm) or ωB97X-D (±7.9 ppm), can be used. If these are not available, M05-2X (±7.4 ppm), SOGGA11X (±7.6 ppm), WC04 (±7.9 ppm) – a reparameterization of B3LYP – or M06-2X (±8.0 ppm) can be used.

Cencek and Szalewicz examined the behavior of longrange corrected functionals and concluded that, in general, a single “universal” (i.e., system-independent) ω of 0.4 bohr–1 does not perform well and that ω should be tuned (based on the ionization potential) for each system;125 this is in agreement with the conclusions of this paper in that a LC-functional with a standard value of ω is recommended. Needless to say, this is too complicated for a general purpose methodology, especially if several candidate structures (vide infra) are being considered. Nonetheless, Alipour and Fallahzadeh did consider this and tuned ω when predicting 31P–1H spin-spin coupling constants.126 NMR Method. There are four methods for calculating NMR shielding tensors, from which the chemical shifts are determined. In GAUSSIAN09, the default is to use the gauge-independent atomic orbital (GIAO) method, and, therefore, this method was chosen for the evaluation of the various components of the chemical shift calculations. The other methods available are the continuous set of gauge transformations (CSGT), individual gauges for atoms in molecules (IGAIM, a variation on CSGT), and single origin (SO). The comparison of all four methods is listed in Table 7. SO was included for completeness and, as advertised in the GAUSSIAN09 users’ manual, does quite poorly. CSGT and IGAIM perform similarly and both outperform GIAO; similar observations were made by Toomsalu and Burk.15

Table 4. Evaluation (coefficient of determination R2 and standard error se(δ) in ppm) of LC-based functionals using the Becke88 (B) exchange functional and various correlation functionals and using the def2-TZVPD basis set and the SMD solvation model for predicting 1H and 13C NMR chemical shifts.

functional

δ(13C)

Table 5. Evaluation (coefficient of determination R2 and standard error se(δ) in ppm) of LC-based functionals using the PBE correlation functional and various exchange functionals and using the def2-TZVPD basis set and the SMD solvation model for predicting 1H and 13C NMR chemical shifts.

δ(1H)

R²

se(δ)

R²

se(δ)

LC-BLYP

0.984

6.97

0.987

0.288

LC-BP86

0.985

6.75

0.986

0.303

LC-BB95

0.985

6.84

0.985

0.310

LC-BPW91

0.985

6.75

0.986

0.306

LC-BPBE

0.985

6.73

0.986

0.307

LC-BTPSS

0.985

6.72

0.985

0.312

LC-BrevTPSS

0.985

6.74

0.985

0.310

LC-BKCIS

0.985

6.71

0.985

0.316

LC-BBRc

0.984

6.94

0.986

0.304

LC-BPKZB

0.985

6.73

0.986

0.306

Page 10 of 26

functional

The question remains why the range-separated and long-range corrected functionals perform so well. To the best of our knowledge, they have not previously been evaluated for NMR chemical shift predictions. One hypothesis is that they perform well specifically because they correct the long-range behavior of the functional, and it is known that NMR spectra are influenced by long range effects (e.g., long-range through-space and through-bond coupling constants).

δ(13C)

δ(1H)

R²

se(δ)

R²

se(δ)

LC-BPBE

0.985

6.73

0.986

0.307

LC-PW91PBE

0.985

6.74

0.986

0.307

LC-mPWPBE

0.985

6.74

0.986

0.307

LC-G96PBE

0.985

6.75

0.986

0.306

LC-PBEPBE

0.985

6.72

0.985

0.308

LC-OPBE

0.985

6.71

0.986

0.306

LC-TPSSPBE

0.985

6.75

0.987

0.295

LC-revTPSSPBE

0.985

6.72

0.987

0.294

LC-BRxPBE

0.986

6.61

0.986

0.302

LC-PKZBPBE

0.986

6.67

0.986

0.306

LC-ωPBEhPBE (HSEPBE)

0.981

7.56

0.984

0.326

LC-PBEhPBE

0.985

6.72

0.985

0.309

10


Page 11 of 26

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


Table 6. Evaluation (coefficient of determination R2 and standard error se(δ) in ppm) of LC-based functionals using various exchange and correlation functionals using the def2TZVPD basis set and the SMD solvation model for predicting 1 H and 13C NMR chemical shifts.

functional

δ(13C)

a

See Computational Methods section for details on each method.

Relativistic Effects. Given that both carbon and hydrogen are at the top of the periodic table, one would expect that relativistic effects would be negligible. However, looking beyond simple organic chemistry, one may be interested in systems that include atoms – such as transition metals or the heavy main-group elements – where one cannot ignore their contributions. The effects of using the Douglas-Kroll-Hess second order scalar relativistic Hamiltonian (DKH) was considered (Table 9). While the impact is minimal, one needs a heavy atom to properly consider the impact of relativistic effects and the model used to account for them. One common method of accounting for relativistic effects is to use a relativistic effective core potential (RECP) on heavy atoms. However, in some cases this has been shown to be insufficient. (This is particularly true when calculating coupling constants (J) where an RECP could lead to physically meaningless results.) For instance, Iché-Tarrat and Marsden found that the RECPs that they tried yielded poor 19F NMR predictions in UVI complexes,128 yet Bayse had success in using an RECP in the prediction of 77 Se chemical shifts (at least similar errors were obtained with an RECP and in all-electron calculations).129 Cosentino et al. found that the choice of RECP core was significant in the prediction of 13C chemical shifts in lanthanide(III) complexes.130 In particular, a heavy atom situated next to light atoms has been shown to have a significant influence on the chemical shifts of the light atoms – the so-called HALA effect.131 Rusakov et al. systematically considered the effects of 3p and heavier groups 13-17 atoms (i.e., the heavy atom or HA) on 13C (i.e., the light atom or LA) chemical shifts and found that these effects become increasingly more important (and hence less ignorable) as one descends the periodic table.131 To properly account for relativistic effects, more rigorous two- or four-components methods, with special basis sets, may be required. Because of the inherent complexity of considering relativistic effects, and the need for specialized codes and basis sets, this is beyond the scope of this study, especially given the intended focus on organic molecules, where heavy atoms are less common. Nonetheless, it should be noted that Vícha et al. have presented a recommended methodology for calculating chemical shifts in platinum and iridium complexes;132 it is reasonable that this should be transferable to other transition metal and heavy main-group systems.

δ(1H)

R²

se(δ)

R²

se(δ)

LC-TPSSTPSS

0.985

6.71

0.986

0.299

LC-revTPSSrevTPSS

0.985

6.71

0.987

0.296

LC-PKZBPKZB

0.986

6.66

0.986

0.305

LC-BRxBRc

0.985

6.81

0.986

0.298

LC-HCTH

0.983

7.23

0.988

0.274

LC-τHCTH

0.982

7.44

0.989

0.270

LC-M06-L

0.983

7.32

0.987

0.292

LC-M11-L

0.960

11.15

0.984

0.323

LC-N12

0.983

7.30

0.987

0.295

Table 7. Evaluation (coefficient of determination R2 and standard error se(δ) in ppm) of the choice of NMR method on the accuracy of the 1H and 13C NMR chemical shift predictions using the COSMO-PBE0/def2-TZVPD level of theory.

functional

δ(13C)

δ(1H)

R²

se(δ)

R²

se(δ)

GIAO

0.966

10.20

0.980

0.359

CSGT

0.980

7.88

0.986

0.302

IGAIM

0.980

7.88

0.986

0.302

SO

0.926

15.07

0.158

2.35

Solvation Model. The SMD model has been shown to be reliable in many applications,109 and therefore was selected as the default method. However, as is evident from the data in Table 8, the COSMO method is more accurate for these purposes. It should be noted that when solute-solvent hydrogen bonds are present, inclusion of an explicit solvent molecule is beneficial;127 this is beyond the scope here and the systems where chosen such that this issue is avoided. Table 8. Evaluation (coefficient of determination R2 and standard error se(δ) in ppm) of the solvation model on the accuracy of the 1H and 13C NMR chemical shift predictions using the GIAO NMR method at the COSMO-PBE0/def2TZVPD level of theory.

Solvation Modela

δ(13C)

δ(1H)

R²

se(δ)

R²

se(δ)

SMD

0.966

10.20

0.980

0.359

IEF-PCM

0.978

8.29

0.985

0.314

COSMO

0.978

8.22

0.988

0.283

CPCM

0.978

8.28

0.985

0.310

11



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

tend to be more sensitive to grid issues112f but this was not observed here (see SI, Table S1).

Table 9. Evaluation (coefficient of determination R2 and standard error se(δ) in ppm) of the impact of relativistic effects (DKH model) on the accuracy of the 1H and 13C NMR chemical shift predictions using the PBE0 exchange-correlation functional and the SMD solvation model and run using NWCHEM.

functional

δ(13C)

Table 10. Evaluation (coefficient of determination R2 and standard error se(δ) in ppm) of the recommended methodology (COSMO-CSGT-LC-TPSS/cc-pVTZ) for 1H and 13C NMR chemical shift predictions using geometries optimized at different levels of theory.

δ(1H)

R²

se(δ)

R²

se(δ)

PBE0/def2-TZVP

0.977

8.36

0.986

PBE0/TZP-DKH

0.976

8.60

SSB-D/def2-TZVP

0.967

10.04

SSB-D/TZP-DKS

0.966

10.24

Page 12 of 26

δ(13C)

δ(1H)

0.299

level of theory for geometry optimization

R²

se(δ)

R²

se(δ)

0.987

0.295

DF-PBE/def2-SVP

0.982

7.45

0.984

0.327

0.979

0.372

DF-TPSSTPSS/def2-SVP

0.982

7.51

0.983

0.336

0.980

0.365

DF-LC-TPSSTPSS/def2SVP

0.986

6.48

0.981

0.350

Recommended Sets of Parameters. Based on the above results, a recommended methodology can be suggested. From Table 7 either CSGT or IGAIM is recommended over the more popular GIAO method. Of the DFT exchange-correlation functionals tested, LCBRxPBE (±6.6 ppm), LC-PKZBPKZB (±6.7 ppm), LCPKZBPBE (±6.7 ppm), LC-TPSSTPSS (±6.7 ppm), LCrevTPSSrevTPSS (±6.7 ppm) and LC-OPBE (±6.7 ppm) were the best performers. While the choice of grids did not seem to affect the accuracy of the results (Table 1), too small a grid is not recommended and, thus, the default grids in Gaussian09 can be used. Based on Table 2, one could use either a Dunning basis set (either ccpVTZ or aug-cc-pVDZ) or the IGLO-III basis set. However, since the latter is available only for a limited set of elements, the former is recommended. Not only are they available for the first row transition metals, but if a heavier nucleus is present in the molecule, Peterson’s ccpVnZ-PP RECP-basis set133 can be used in conjunction with the corresponding Dunning basis set (after first evaluating its performance as noted above). While the accuracy provided by DFT is sufficient for predicting 13C chemical shifts, even the smallest error (MN12-SX and LC-τHCTH at 0.27 ppm) is far too large for accurate 1H chemical shift predictions, although this might suffice if one were interested in trends rather that absolute values. In summary, the recommended method would be CSGT-LC-TPSSTPSS/cc-pVTZ. While LC-BRxPBE and LC-PKZBPKZB did have a slightly lower errors, LC-TPSSTPSS is recommended since this functional is more commonly available in the various DFT codes. Having determined a recommended level of theory, the effect of the choice of the level of theory of the geometry optimizations was considered. All geometries were reoptimized using the LC-TPSSTPSS and the related TPSSTPSS XC functionals (with density-fitting), and the NMR evaluations are given in Table 10. Clearly there is advantage to using LC-TPSSTPSS/def2-SVP geometries. It has been noted that meta-GGA functionals

Rotational-Vibrational Corrections. Molecular properties that do not depend on the nuclear momentum – such as NMR isotropic shifts (σ) – can be calculated within the Born–Oppenheimer approximation. Within this approximation, the value is not equal to the value for the equilibrium (i.e., corresponding to a minimum on the potential energy surface) geometry (i.e., σeq) – as is customarily done – but rather as the expectation value of the property as a function of nuclear coordinates 𝜎 𝐪 : 𝛹𝜎 𝐪 𝛹 𝜎 = 𝛹𝛹 where 𝛹 is the nuclear wavefunction of the appropriate vibrational state. This difference 𝛥𝜎 = 𝜎 − 𝜎*P can be calculated in different ways, although a convenient derivation using perturbation theory to give the zero-point vibrational (ZPV) correction is:134 K

𝛥𝜎 Q#R = $

YZ[\]

TWI

K UH V ⋅ ST UPTH

K

𝐪WX

−$

YZ[\]

YZ[\]

^T__ K UV ⋅ ST UPT 𝐪WX S_ TWI _WI

where 𝑛+)*P is the number of vibrational frequencies (equal to the number of degrees of freedom), ωi is the harmonic frequency of the ith normal mode, 𝜙&9b is the reduced cubic force constant with respect to these normal modes, and 𝑞& is the reduced normal coordinate.134135 The corresponding rotational-vibrational correction, which adds temperature dependence, is: 𝛥𝜎

):Bd&;

1 =− 2

jZ[\]

1 𝜕𝜎 ⋅ 𝜔& 𝜕𝑞&

&hK jZ[\]

⋅ 9hK

−

+

12


𝐪hi

𝜙&99 ℎ𝑐𝜔& ⋅ coth 𝜔9 2𝑘r 𝑇

𝑘r 𝑇 1 2𝜋𝑐 ℎ𝑐𝜔& 1 4

jZ[\]

&hK

Chw,y,z

𝑎&CC ℯ 𝐼CC

1 ℎ𝑐𝜔& 𝜕J𝜎 ⋅ coth ⋅ J ω> 2𝑘r 𝑇 𝜕𝑞&

𝐪hi

Page 13 of 26

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


Table 11. Evaluation (coefficient of determination R2 and standard error se(δ) in ppm) of the effects of the zero-point vibrational (ZPV) corrections on the preferred methodology (COSMO-CSGT-LC-TPSS/cc-pVTZ) for 1H and 13C NMR chemical shift.

ℯ where 𝐼CC is the effective moment of inertia with respect to the normal mode and 𝑎&CC is the linear expansion coℯ 136 efficient of 𝐼CC . Because one needs the cubic force constants and the derivatives of the isotropic shifts (which themselves are already a second-derivative property – once with respect to the external magnetic field and once with respect to the nuclear magnetic moments of the atomic nuclei), the calculation of these corrections becomes increasingly most computationally expensive as the size of the system increases. Auer et al. calculated 𝛥𝜎 Q#R for a series of small molecules (acetone and smaller) at the HF and MP2 levels and found corrections in the range of -0.5 to -4.5 ppm.135b While not negligible when considering absolute shieldings, they are highly systematic and their impact will be lessened when calculating chemical shifts. Moreover, if one uses linear regression to determine chemical from isotropic shifts, any missing systematic error should, at least in part, be accounted for in the regression, thereby even further reducing their impact. In a different study, Teale et al. found that including vibrational corrections to the equilibrium geometry (i.e., adding 𝛥𝜎 Q#R ) actually worsened the results when using DFT, although they did help when using coupled cluster methods.137 The rotational corrections were found to be around two orders of magnitude or more smaller,134 and thus will have even less impact on the chemical shifts. The zero-point vibrational corrections 𝛥𝜎 Q#R were calculated using the suggested level of theory (COSMO(CH2Cl2)-LC-TPSSTPSS/cc-pVTZ//DF-LCTPSSTPSS/def2-SVP). For the most part, the corrections for 13C are between -1 and 0 and the average value is -0.15 ppm. However, this value is skewed by CCl4, where the correction is actually positive and large (13.44 ppm), possible because the carbon atom is surrounded by four heavier atoms. Other systems that have large magnitude corrections include TMS (average -2.25 ppm), CHCl3 (-2.36 ppm) and formic acid (0.49 ppm). For 1H, the corrections are between -0.5 and 0.4 ppm and therefore will have a larger impact. However, for both nuclei the improvement obtained by calculating the zero-point vibrational corrections (Table 11) is small, especially considering the need of the cubic force constants that are so costly to calculate and scale so rapidly with system size (see Table S2 for a selected timings); this is going to be especially true with the sizes of the systems considered in the Case Studies section. Given that the rotational-vibrational corrections were previously found to be at least 2 orders of magnitude smaller,134 they were not considered here.

δ(13C)

δ(1H)

R²

se(δ)

R²

se(δ)

uncorrected

0.982

7.51

0.983

0.336

ZPV-corrected

0.984

7.07

0.986

0.306

A Modified DP4 Probability. Smith and Goodman recently proposed the DP4 probability to assign computed structures to a set of experimental data138 (and subsequently improved upon by Grimblt et al.139). If the molecule has N NMR signals and M possible structures being considered, they define the DP4 probability as: „ bhK

𝑃"#$ 𝑖 =

1 − 𝑇•

… 9hK

„ bhK

& 𝛿€:•‚,b − 𝛿*w‚,b − 𝜇 𝜎

1 − 𝑇•

𝛥𝛿 − 𝜇 𝜎

where 𝑇 • is the cumulative Student t-distribution function with 𝜈 degrees of freedom, µ is the mean of the t distribution and σ is the standard deviation. Since the NMR chemical shifts are scaled against the empirical data, µ = 0. Smith and Goodman found σ = 2.306 ppm and 𝜈 = 11.38 by fitting calculated (gas-phase GIAOB3LYP/6-31G(d,p)) and experimental 13C NMR shifts. The DP4 parameters for the method of choice found here were determined using the approach used by Stevens and Goodman.138 Starting from the geometries in their paper, the geometries were reoptimized at the DFLC-TPSSTPSS/def2-SVP/def2-SV level of theory and the NMR isotropic shifts were calculated at the COSMO(solvent)-CSGT-LC-TPSSTPSS/cc-pVTZ level of theory (solvent = solvent used in the experimental studies).140 The original DP4 equation has terms involving the ‡ˆ B‰ cumulative Student t-distribution 1 − 𝑇 • . The V

intent is to have terms involving the probability of the error being greater than 𝛥𝛿 (i.e., the 𝛥𝛿 , ∞ tail of the t-distribution function), but the use of the cumulative w distribution function 𝑇 • 𝑥 = B• 𝑡 • 𝑡 𝑑𝑡 (where 𝑡 • 𝑥 is the probability distribution function used in the fitting), which is the probability that the error is less than 𝑥, means that the −∞, − Δδ tail is not considered – the Student t-distribution is after all an even function symmetric around zero. The t-distribution curve is fit using 𝛥𝛿, which can be both positive or negative. Thus, what one really wants are terms involving one minus twice the probability the error is between 0 and 𝛥𝛿 , or, with the assistance of Wolfram Mathematica 10: 1 𝜈 1 − 2 𝑇 • 𝑥 − 𝑇 • 0 = 1 − 𝐼z , 2 2

13



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

served in C6D6 but not in DCM-d2. b The alcohol OH and the pyrrole NH signals were not observed in CD3OD but were (except tBuOH) in DCM-d3.

where: 𝑥J +𝜈 𝛥𝛿 − 𝜇 𝑥= 𝜎 and 𝐼z [𝑎, 𝑏] is the regularized incomplete beta function (equivalent to the cumulative beta distribution function).141 (A more complete explanation and derivation is presented in the SI along with two alternate derivations.) Thus, the modified DP4 probabilities are: 1 𝜈 „ bhK 1 − 𝐼z“T 2 , 2 𝑃"#$ 𝑖 = 1 𝜈 … „ bhK 1 − 𝐼z _ 2 , 2 9hK “ where 𝑧=

𝑧b& = 𝑥&,b =

𝑥J

Samoquasine A. Samoquasine A, a benzoquinazoline alkaloid isolated from the seeds of Annona squamosa (custard apple), was isolated in 2000 and assigned structure 1a (Scheme 1).142 Compounds isolated from this tree have been shown to have various medicinal properties, including anti-ulcer143 and cytotoxic144 activities. After preparing 1a and determining that this is not the structure of the isolated samoquasine A, Yang et al. proposed an alternate structure (1b).145 A DFT study by Timmons and Wipf later proposed this same alternate structure (1b) based on comparison of calculated and experimental 13C NMR spectra.146 Nonetheless, 1b (also known as perlolidine) had previously been ruled out.147 Two other isomers (1c and 1d) were prepared by Monsieurs et al., characterized by NMR and powder X-ray diffraction, and conclusively demonstrated not to be samoquasine A.148 Thus, to the best of our knowledge, the structure of samoquasine A remains uncertain. Here, starting from the geometries of the 48 isomers provided by Timmons and Wipf in the SI to their paper,146 we applied our recommended methodology. All the geometries were reoptimized so that all calculation are consistent. From these structures, the 1H and 13C NMR spectra were calculated. As Timmons and Wipf did, the experimental and calculated chemical shifts were sorted numerically (vide infra) and the absolute differences ( 𝛥𝛿 ) between the experimental and each calculated spectrum were determined. Plotting the experimental shifts versus the calculated isotropic shifts gives the regression parameters to determine the calculated chemical shifts. Whereas Timmons and Wipf only used max 𝛥𝛿 as the determining factor in which is the most likely structure of samoquasine A, here the mean absolute deviation (MAD) is also used.

J 𝑥&,b J 𝑥&,b +𝜈

& & 𝛿

1H and 13C NMR scaling factors for the calculation of chemical shifts in commonly used solvents using density functional theory.

Calculating Electron-Transfer Coupling with Density Functional Theory: The Long-Range-Corrected Density Functionals.

Evaluation of Range-Corrected Density Functionals for the Simulation of Pyridinium-Containing Molecular Crystals.

Density functional investigation of intermolecular effects on 13C NMR chemical-shielding tensors modeled with molecular clusters.

Aqueous acidities of primary benzenesulfonamides: Quantum chemical predictions based on density functional theory and SMD.

Ions in solution: density corrected density functional theory (DC-DFT).

Correction: Fragment density functional theory calculation of NMR chemical shifts for proteins with implicit solvation.

Successes and failures of Hubbard-corrected density functional theory: the case of Mg doped LiCoO2.

Accurate predictions of C-SO2R bond dissociation enthalpies using density functional theory methods.

Symmetry-adapted perturbation theory with Kohn-Sham orbitals using non-empirically tuned, long-range-corrected density functionals.

Prediction of 13C NMR chemical shift sum using topological indices: Role of recently introduced Balaban F and G indices.

Metallophilic interactions from dispersion-corrected density-functional theory.

Fully automatic assignment of small molecules' NMR spectra without relying on chemical shift predictions.

Using Wannier functions to improve solid band gap predictions in density functional theory.

Perspective: Fifty years of density-functional theory in chemical physics.

Long-Range Corrected Hybrid Density Functionals with Improved Dispersion Corrections.

Calculation of Hyperfine Tensors and Paramagnetic NMR Shifts Using the Relativistic Zeroth-Order Regular Approximation and Density Functional Theory.

On the accuracy of density functional theory and wave function methods for calculating vertical ionization energies.

13C-NMR of Clostridium pasteurianum ferredoxin after reductive methylation of the amines using [13C]formaldehyde.

Quantum chemical density functional theory studies on the molecular structure and vibrational spectra of mannitol.

FT-IR spectroscopy and density functional theory calculations of 13C isotopologues of the helical peptide Z-Aib6-OtBu.

Density functional theory in the solid state.

Quest for a universal density functional: the accuracy of density functionals across a broad spectrum of databases in chemistry and physics.

1H, 15N, and 13C chemical shift assignments of cyanobacteriochrome NpR6012g4 in the red-absorbing dark state.