International Journal of Biological Macromolecules 79 (2015) 983–987


Rapid analysis of polysaccharides contents in Glycyrrhiza by near infrared spectroscopy and chemometrics

Ci-Hai Zhang a, Yong-Huan Yun a, Wei Fan b, Yi-Zeng Liang a,∗, Yue Yu a, Wen-Xian Tang a

a Research Center of Modernization of Chinese Medicines, Central South University, Changsha 410083, China
b Joint Lab for Biological Quality and Safety, College of Bioscience and Biotechnology, Hunan Agriculture University, Changsha 410128, China

Article info

Article history:
Received 25 March 2015
Received in revised form 3 June 2015
Accepted 14 June 2015
Available online 17 June 2015

Keywords: NIR spectroscopy; Polysaccharides; Partial least squares regression; Variable selection

Abstract

A method for the quantitative analysis of polysaccharide content in Glycyrrhiza was developed based on near infrared (NIR) spectroscopy, with the phenol–sulphuric acid method as the reference method. This is the first use of this method for predicting polysaccharide content in Glycyrrhiza. To improve the predictive ability (or robustness) of the model, the competitive adaptive reweighted sampling (CARS) strategy was used to select relevant wavelengths. With the restricted set of relevant wavelengths, the PLS model was more efficient and parsimonious. The coefficient of determination of prediction (Rp²) and the root mean square error of prediction (RMSEP) of the optimum model were 0.9119 and 0.4350, respectively, for polysaccharides. The selected wavelengths were also interpreted, and all the wavelengths selected by CARS proved to be related to functional groups of polysaccharides. The overall results show that NIR spectroscopy combined with chemometrics can be efficiently utilised for the analysis of polysaccharide content in Glycyrrhiza. © 2015 Elsevier B.V. All rights reserved.

∗ Corresponding author. Tel.: +86 731 88830831; fax: +86 731 88830831. E-mail address: yizeng [email protected] (Y.-Z. Liang).
http://dx.doi.org/10.1016/j.ijbiomac.2015.06.025
0141-8130/© 2015 Elsevier B.V. All rights reserved.

1. Introduction

Glycyrrhiza, known as Chinese liquorice, is widespread in many districts of China. It contains a variety of active substances and has a long history of use. The first recorded use was as a drug for the treatment of wounds, as described in the book Wu Shi Er Bing Fang, dating to the second century B.C. [1]. Polysaccharides are among the main active ingredients of Glycyrrhiza. Recently, Glycyrrhiza polysaccharides have been reported to have many functions, such as immunity regulation [2], phagocytosis [3], antiviral [4], antioxidant [5], antitumor [6] and anticomplement [7] activities, together with low cellular toxicity [4]. To date, quantitative analysis of polysaccharide content in Glycyrrhiza has mainly been carried out by colorimetric methods, such as the anthrone–sulphuric acid method or the phenol–sulphuric acid method [8,9]. However, these methods are tedious and destructive, require severe conditions of high temperature and strong acid, and are not suitable for fast determination or real-time analysis.

As a rapid, economical and nondestructive analytical technique, NIR spectroscopy has gained wide acceptance in the food, agricultural, pharmaceutical and petrochemical industries [10–14]. The principle of NIR measurement is based on the correlation between the NIR absorption at different wavelengths and the sample composition data obtained by a reference method. The absorption peaks of NIR spectra are broad and overlapping, which makes single-wavelength calibration impossible because much of the information is hidden in the spectral data. In such cases, the data are often calibrated with classical partial least squares (PLS) regression. However, this calibration is built on the full-range spectra (all variables). It does not involve a preliminary selection of variables but instead introduces latent variables composed of combinations of the original features, which makes the original features hard to interpret. Data that do not contain critical information (such as noise) can also severely corrupt the resulting calibration models. Therefore, variable selection in multivariate analysis is very important. Liang and co-workers confirmed that predictive ability can be increased and model complexity reduced by a judicious pre-selection of wavelengths [15–17]. The competitive adaptive reweighted sampling (CARS) method is a recently proposed variable selection method. By comparing CARS with moving window (MW) and Monte Carlo uninformative variable elimination (MC-UVE) selection methods, Li et al. demonstrated that CARS performs a competitive selection of a few key wavelengths that are interpretable in terms of the chemical property of interest [18].

In this study, we aimed to evaluate the feasibility of using NIR spectroscopy for rapid and nondestructive measurement of polysaccharide content in Glycyrrhiza. The specific objectives


Table 1. A summary of the tested samples.

Sample no.    Origin (province)    Collection time
1–6           Shandong             September 2013
7–18          Neimenggu            September 2013
19–28         Anhui                June 2014
29–39         Shanxi               June 2014
40–53         Ningxia              July 2014
54–89         Gansu                July 2014

of this research were (1) to obtain informative variables by applying the CARS strategy; (2) to establish quantitative models relating the NIR spectra to polysaccharide content using PLS on the full spectra and CARS-PLS, and to compare their predictive abilities; and (3) to analyze the effective wavelengths in the models selected by the CARS algorithm.

2. Materials and methods

2.1. Samples

A total of 89 samples were collected from different provinces of China between September 2013 and July 2014 (see Table 1). They provided a representative set of the Glycyrrhiza consumed in China and were used to test the robustness of the NIR models against the variability due to sample nature and origin. This work was carried out with the root of Glycyrrhiza. Upon acquisition, all the samples were dried in an oven at 50 °C for about 4 h, ground into powder and passed through a 60-mesh (0.3 mm) sieve. The sieved powders were used for further analysis.

2.2. Reagents

Analytical grade D-glucose was purchased from Sigma–Aldrich (Sigma, St. Louis, MO, USA). Other reagents, including H2SO4, ethanol and phenol, were of analytical grade from Peking Chemical Co. (Peking, China). The water used in all tests was treated in a Milli-Q water purification system (Millipore, Bedford, MA, USA).

2.3. Chemical analysis

Polysaccharide content was determined using the improved phenol–sulphuric acid method [19] as the reference method.

2.4. NIR analysis

2.4.1. NIR spectroscopy measurement

Spectral data were collected by measuring the diffuse reflectance of the Glycyrrhiza samples in the NIR region of 6000–11,000 cm−1 using the i-Spec system (BWS015, BWTEK, USA). Each spectrum was the average of 64 scans at a resolution of 8 cm−1, with air as the background. Duplicates of each sample were scanned three times. The average spectrum of each sample was used in subsequent data analysis.

2.4.2. Spectral preprocessing methods

The objective of spectral preprocessing is to remove physical phenomena from the spectra in order to improve the subsequent multivariate regression; careful selection of preprocessing methods yields better final models. In this study, two standard data preprocessing methods, standard normal variate (SNV) [20] and the first-order Savitzky–Golay derivative (SG1) [21], were applied to reduce multi-collinearity and the baseline offset arising from scattering effects, thus enhancing the information related to chemical constituents.
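As an illustration of the two preprocessing steps named above, the following Python sketch applies SNV to each spectrum and then a Savitzky–Golay first derivative. It is a minimal sketch rather than the authors' MATLAB code: the function names `snv`, `sg_first_derivative` and `preprocess` are hypothetical, the 5-point window and 2nd-order polynomial follow the settings reported in Section 3.2, and the derivative is expressed per index step rather than per nanometre.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def sg_first_derivative(spectra, window=5, polyorder=2):
    """Savitzky-Golay first derivative along the wavelength axis.

    A polynomial is fitted by least squares in each moving window and the
    slope at the window centre is returned (edge points are dropped).
    """
    spectra = np.asarray(spectra, dtype=float)
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # The pseudo-inverse maps window values to polynomial coefficients;
    # row 1 gives the linear coefficient, i.e. the slope at the centre point.
    design = np.vander(offsets, polyorder + 1, increasing=True)
    kernel = np.linalg.pinv(design)[1]
    return np.array([np.correlate(row, kernel, mode="valid") for row in spectra])

def preprocess(spectra):
    """SNV followed by the SG first derivative (SNV + SG1)."""
    return sg_first_derivative(snv(spectra))
```

SNV is applied row by row because it corrects each spectrum for its own scatter level, whereas the derivative is a local operation along the wavelength axis that removes constant baseline offsets.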

2.4.3. Sample subsets partitioning

The samples were divided into calibration and prediction subsets for multivariate modeling using the sample set partitioning based on joint x–y distances (SPXY) algorithm. The SPXY method employs a stepwise procedure to select samples according to their differences in both x (instrumental responses) and y (predicted parameter) spaces. Models validated by full cross-validation showed, in F-tests at the 95% confidence level, that SPXY may be an advantageous alternative to the Kennard–Stone and random sampling methods [22].

2.4.4. Variable selection: an introduction to CARS

As stated above, the CARS method is a recently developed strategy for wavelength selection. Its central idea is the 'survival of the fittest' principle of Darwin's theory of evolution. The key wavelengths selected by CARS are defined as those with large absolute coefficients in a multivariate linear regression model. The algorithm is set up as follows. Suppose that the data matrix X contains m samples in rows and p variables in columns, and that the vector y, of size m × 1, denotes the measured property of interest. Before modeling, both X and y are mean-centered [18].

2.4.5. Calibration and prediction

Polysaccharide content was quantified by the PLS algorithm. The optimum number of PLS factors was determined by 10-fold cross-validation [23]. Outliers were detected by a Monte Carlo method [24]. Both procedures were applied to ensure the predictive ability of the calibration model and to avoid overfitting. External prediction was carried out after the calibration procedure described above.

2.4.6. Evaluation of the performance of the models

The best calibration model for each analysis was selected in terms of the root-mean-square error of cross-validation (RMSECV), the root-mean-square error (RMSE), the coefficient of determination in the calibration set (Rc²) and the coefficient of determination in the prediction set (Rp²).
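The error measures named above can be written out explicitly. The sketch below computes RMSE and a coefficient of determination from measured and predicted values; the same `rmse` function gives RMSEC, RMSECV or RMSEP depending on which predictions are passed in. The function names are hypothetical, and the R² definition used here (one minus the ratio of the residual to the total sum of squares) is an assumption, since the source does not state which formula was applied.

```python
import math

def rmse(measured, predicted):
    """Root-mean-square error between reference and predicted values."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)

def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

# Illustrative values only (not data from the study), in % polysaccharide content.
measured = [7.0, 8.0, 9.0, 10.0]
predicted = [7.5, 7.5, 9.5, 9.5]
print(rmse(measured, predicted))       # 0.5
print(r_squared(measured, predicted))  # 0.8
```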
Depending on the sample set used, the RMSE is reported as the root-mean-square error of calibration (RMSEC) for the calibration set or the root-mean-square error of prediction (RMSEP) for the prediction set [25]. Good models should have low RMSEC, RMSEP and RMSECV values and high coefficients of determination, as well as small differences between RMSEP and RMSEC.

2.4.7. Software

Data pretreatment was performed in MATLAB for Windows (version 7.8, MathWorks). The CARS code is available for academic research at http://code.google.com/p/carspls/

3. Results and discussion

3.1. The reference value range

To formulate a robust calibration, an appropriate experimental design has to be implemented. One of the most important principles is to collect samples whose reference values span a relevant range, so that all expected sources of variance are expressed in the training data. The reference values for polysaccharide content in all samples were determined by the improved phenol–sulphuric acid method [19]. The polysaccharide content ranged from 6.52% to 11.65%, with a mean of 8.88% and a standard deviation of 1.38%.

3.2. Spectral analysis

In Fig. 1, the raw spectra of the 89 Glycyrrhiza samples display similar trends and have remarkable absorption bands around


Table 2. Effects of pretreatment methods on performance of PLS calibration model (PLS results for polysaccharides).

Pretreatment method    Wavelength range (nm)    Variable number    RMSECV    Factors    Rc²       RMSEC
Original               950–1650                 432                0.8191    8          0.9484    0.2840
SNV + SG1 (a)          950–1650                 432                0.6575    6          0.9822    0.1620

(a) SG1: Savitzky–Golay first derivative.

Fig. 1. Raw NIR spectra.

1000 nm, 1200 nm and 1450 nm. These peaks correspond to the overtone and combination bands of C–H, O–H and N–H. No distinct dissimilarities among the original spectra can be observed by the naked eye. In this study, preprocessing methods (SNV and the Savitzky–Golay first derivative with a 2nd-order polynomial and a 5-point window) were applied to enhance the spectral information by eliminating baseline shifts and noise from the spectra. The effects of the pretreatment methods on the PLS calibration models are shown in Table 2. With SNV + SG1, the RMSEC (0.1620) was lower and Rc² (0.9822) was higher than with the original spectra (0.2840 and 0.9484, respectively).
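Because the acquisition range in Section 2.4.1 is given in wavenumbers while the bands and tables are given in wavelengths, a quick conversion helps to relate the two. The following sketch uses the standard relation λ (nm) = 10^7 / ν (cm−1); the function name is hypothetical. It suggests that the 950–1650 nm modelling range lies inside the 6000–11,000 cm−1 acquisition range.

```python
def wavenumber_to_nm(wavenumber_cm1):
    """Convert a wavenumber in cm^-1 to a wavelength in nm (1 cm = 1e7 nm)."""
    return 1e7 / wavenumber_cm1

low_nm = wavenumber_to_nm(11000)   # short-wavelength end of the acquisition range
high_nm = wavenumber_to_nm(6000)   # long-wavelength end of the acquisition range
print(round(low_nm, 1), round(high_nm, 1))  # 909.1 1666.7
```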

3.3. PLS model building

3.3.1. Sample outlier detection

Construction of high-quality PLS models depends on several steps, one of the most important of which is outlier detection. Outliers in the calibration data set may be caused by the instrument, operation or sample preparation, and may have a significant effect on the quality of the model. A method based on Monte Carlo cross-validation is a recent strategy for outlier detection and has been shown to be more efficient than other methods [24]. Therefore, this method was used for outlier detection in our work. The diagnostic plot for outlier detection is shown in Fig. 2. The prediction errors of the normal samples have an approximately zero mean and a small standard deviation. The top left area contains outliers in the X direction (samples 67 and 68), which have a large standard deviation, and the lower right area contains outliers in the y direction and model outliers (samples 45 and 78), which have a large mean value. The three abnormal points in the top right area (samples 3, 37 and 87) are outliers in both the X and y directions. After outlier removal, 82 samples remained for polysaccharide analysis. The calibration and validation sets were then divided using the SPXY algorithm.

Fig. 2. The diagnostic plot for outlier detection.
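The idea behind the diagnostic plot can be sketched as follows: the data are repeatedly split at random into training and test parts, a model is fitted on the training part, and for each sample the mean and standard deviation of its prediction errors over all the splits in which it was left out are recorded. This is a simplified illustration and not the procedure of [24] in detail: ordinary least squares stands in for PLS, and the function name, the split ratio, the number of splits and the synthetic data are all assumptions.

```python
import numpy as np

def mc_prediction_errors(X, y, n_splits=500, train_fraction=0.8, seed=0):
    """Mean and standard deviation of each sample's out-of-sample prediction errors.

    Ordinary least squares (with an intercept) is used here as a stand-in
    for the PLS model of the paper.
    """
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    n_train = int(train_fraction * m)
    Xa = np.column_stack([np.ones(m), X])      # add intercept column
    errors = [[] for _ in range(m)]
    for _ in range(n_splits):
        order = rng.permutation(m)
        train, test = order[:n_train], order[n_train:]
        coef, *_ = np.linalg.lstsq(Xa[train], y[train], rcond=None)
        residuals = y[test] - Xa[test] @ coef
        for idx, res in zip(test, residuals):
            errors[idx].append(res)
    mean_err = np.array([np.mean(e) for e in errors])
    std_err = np.array([np.std(e) for e in errors])
    return mean_err, std_err

# Synthetic example (not data from the study): sample 5 is given a wrong y value.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=40)
y[5] += 5.0
mean_err, std_err = mc_prediction_errors(X, y)
suspect = int(np.argmax(np.abs(mean_err)))  # sample with the largest mean error
```

Plotting `mean_err` against `std_err` for all samples gives a plot of the same kind as Fig. 2, in which samples far from the origin along either axis are candidates for removal.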

3.3.2. Wavelength regions for model development

NIR spectroscopy involves energy transfer between light and matter. The spectral features of samples in the near-infrared region are associated with the vibrational modes of functional groups. Organic matter has distinct spectral fingerprints in the NIR region because of the relatively strong absorption of the overtones and combination modes of several functional groups usually present in organic compounds, such as C–H (aliphatic), C–H (aromatic), C=O (carboxyl), O–H (hydroxyl) and N–H (amine and amide). Restricted wavelength regions were therefore tested to see whether excluding the regions where N–H overtone bands absorb (1000–1100 nm and 1450–1600 nm) [26] would give a model of reduced complexity that is more targeted towards polysaccharides and possibly yields improved predictions. The performance of the NIR calibration models with the different wavelength ranges is presented in Table 3. With the restricted spectral ranges, the RMSEC decreased (0.1527) and Rc² increased (0.9841) compared with the full spectrum, which indicated that the restricted range removed uninformative variables.

3.3.3. Selecting key variables by CARS

With the most appropriate wavelength range and the best spectral pretreatment selected, PLS models for polysaccharides can be developed. However, many uninformative variables remain that cannot be identified from prior knowledge alone, so computer-aided variable selection is very important. In this study, the variables for polysaccharides were selected by CARS. During CARS, RMSECV decreased as the more informative wavelengths were retained and the unimportant ones were eliminated. Once any key wavelength was removed, the RMSECV value rose sharply. So the critical point with the lowest RMSECV


Table 3. Parameters of the optimal model for Glycyrrhiza polysaccharides, and their corresponding results.

                         PLS          PLS                               PLS + CARS
Pretreatment method      SNV + SG1    SNV + SG1                         SNV + SG1
Wavelength range (nm)    950–1650     Excluding 1000–1100, 1450–1600    Excluding 1000–1100, 1450–1600
Variable number          432          279                               39
RMSECV                   0.6575       0.6354                            0.2291
Factors                  6            7                                 7
Calibration
  Rc²                    0.9822       0.9841                            0.9897
  RMSEC                  0.1620       0.1527                            0.1231
Prediction
  Rp²                    0.7283       0.8382                            0.9119
  RMSEP                  0.8158       0.6296                            0.4350

Fig. 3. Scatter plot of the PLS model calibrated on the wavelengths selected by CARS.

corresponded to the optimal wavelength subset, which implied that valuable information is best retained only when the variables are appropriately selected. Ultimately, 39 key wavelengths were selected for polysaccharides. The calibration models were then established on the selected wavelengths. The results are given in Table 3. The RMSEC (0.1231) was smaller and Rc² (0.9897) was larger than those obtained by PLS alone, which demonstrated that better results were obtained by CARS combined with PLS.

3.3.4. External validation

We further checked the robustness of the model by applying it to an independent prediction set of samples. The plot of predicted versus measured values for the optimal model in the prediction set is shown in Fig. 3. The points are randomly distributed around the bisector line, and the prediction results are given in Table 3. The RMSEP was 0.4350 and Rp² was 0.9119. Thus, the proposed procedure allows direct determination of polysaccharides in Glycyrrhiza samples of different origins.

3.3.5. Interpretation of selected wavelengths

Polysaccharides are carbohydrates. They consist mostly of aliphatic cyclic groups with attached OH groups and ether linkages. Thus, the bands normally associated with these functional groups may be observed in the NIR spectra of polysaccharide molecules. The wavelengths selected by CARS are shown in Fig. 4. They were mainly distributed in four ranges (950–980, 1125–1235, 1355–1450 and 1610–1650 nm). The absorption at 950–980 nm was related to the second overtone of O–H around 970 nm [26]. The absorptions at 1125–1235 nm and 1610–1650 nm were in agreement with the second and first overtones of C–H, respectively [26]. The absorption around 1355–1450 nm could be attributed to the first overtone of O–H [27,28] in polysaccharides.

Fig. 4. Selected wavelengths of polysaccharides by CARS.
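To make the selection procedure of Section 2.4.4 more concrete, the sketch below outlines one simplified reading of CARS: in each run a PLS model is fitted on a random subset of samples, wavelengths are weighted by the absolute value of their regression coefficients, the number of retained wavelengths shrinks exponentially, and the subset with the lowest cross-validated error is kept. It is not the authors' MATLAB implementation [18]; the function names, default parameters, the compact PLS1 routine, the cross-validation scheme and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def pls1_coef(X, y, n_comp):
    """Regression coefficients of a PLS1 model (NIPALS) on mean-centred X and y."""
    n_comp = min(n_comp, X.shape[1], X.shape[0] - 1)
    Xk, yk = X.copy(), y.copy()
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xk.T @ yk
        norm = np.linalg.norm(w)
        if norm < 1e-10:              # nothing left to explain
            break
        w = w / norm
        t = Xk @ w                    # scores
        tt = t @ t
        load = Xk.T @ t / tt          # X loadings
        qk = yk @ t / tt              # y loading
        Xk = Xk - np.outer(t, load)   # deflate X and y
        yk = yk - qk * t
        W.append(w)
        P.append(load)
        q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)

def cv_rmse(X, y, n_comp, n_folds=10):
    """Root-mean-square error of k-fold cross-validation for a PLS1 model."""
    folds = np.arange(len(y)) % n_folds
    sq_err = 0.0
    for f in range(n_folds):
        tr, te = folds != f, folds == f
        mx, my = X[tr].mean(axis=0), y[tr].mean()
        b = pls1_coef(X[tr] - mx, y[tr] - my, n_comp)
        sq_err += np.sum((y[te] - my - (X[te] - mx) @ b) ** 2)
    return np.sqrt(sq_err / len(y))

def cars(X, y, n_runs=30, n_comp=3, sample_fraction=0.8, seed=0):
    """Return the variable subset with the lowest RMSECV and that RMSECV."""
    rng = np.random.default_rng(seed)
    m, p = X.shape
    # Exponentially decreasing ratio: 1 at the first run, 2/p at the last run.
    k = np.log(p / 2) / (n_runs - 1)
    a = (p / 2) ** (1 / (n_runs - 1))
    selected = np.arange(p)
    best_subset, best_rmse = selected, np.inf
    for i in range(1, n_runs + 1):
        rows = rng.choice(m, size=int(sample_fraction * m), replace=False)
        Xs = X[np.ix_(rows, selected)]
        b = pls1_coef(Xs - Xs.mean(axis=0), y[rows] - y[rows].mean(), n_comp)
        weights = np.abs(b) / np.abs(b).sum()
        # Enforced elimination: keep only the variables with the largest weights.
        n_keep = max(2, int(round(a * np.exp(-k * i) * p)))
        n_keep = min(n_keep, len(selected))
        top = np.argsort(weights)[::-1][:n_keep]
        # Adaptive reweighted sampling: draw with probability proportional to weight.
        prob = weights[top] / weights[top].sum()
        drawn = rng.choice(top, size=n_keep, replace=True, p=prob)
        selected = selected[np.unique(drawn)]
        score = cv_rmse(X[:, selected], y, n_comp)
        if score < best_rmse:
            best_subset, best_rmse = selected.copy(), score
    return best_subset, best_rmse

# Synthetic example (not data from the study): only variables 3, 10 and 17 matter.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 25))
y = 2.0 * X[:, 3] - 1.5 * X[:, 10] + 1.0 * X[:, 17] + rng.normal(scale=0.1, size=60)
subset, rmsecv = cars(X, y)
print(sorted(subset.tolist()), round(float(rmsecv), 3))
```

Because both the sample subsets and the reweighted draws are random, the selected wavelengths can differ between seeds; repeating the procedure and checking which variables recur is a reasonable way to judge stability.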

So all the wavelengths selected by CARS are related to functional groups of polysaccharides.

4. Conclusion

In this work, an NIR model combined with PLS was investigated for predicting polysaccharide content in Glycyrrhiza. This method for predicting polysaccharide content in Glycyrrhiza has not been reported previously. To improve the predictive ability (or robustness) of the model, the CARS strategy was used to select relevant wavelengths, and all the selected wavelengths were interpreted. With the restricted set of relevant wavelengths, the PLS model was more efficient and parsimonious. The coefficient of determination of prediction (Rp²) and the root mean square error of prediction (RMSEP) of the optimum model were 0.9119 and 0.4350, respectively. The overall results show that NIR spectroscopy combined with chemometrics can be efficiently utilised for the analysis of polysaccharide content in Glycyrrhiza.

Acknowledgements

This work was financially supported by the National Nature Foundation Committee of P.R. China (Grant Nos. 21275164, 21075138) and the Students Innovation Training of Central South University (2282014bks015).

References

[1] Z. Liu, L. Liu, Essentials of Chinese Medicine, Springer-Verlag, London, 2009.
[2] A. Cheng, F. Wan, J. Wang, Z. Jin, X. Xu, Int. Immunopharmacol. 8 (2008) 43–50.

[3] M. Nose, K. Terawaki, K. Oguri, Y. Ogihara, K. Yoshimatsu, K. Shimomura, Chem. Pharm. Bull. (1998) 1110–1111.
[4] Y.W. Wang, H.B. Zhang, J. Lv, Y.R. Shi, M. He, S.S. Wang, Y.F. Ding, Acta Sci. Nat. Univ. Nankaiensis (2000) 46–48.
[5] L. Yang, H. Wang, F. Luo, J. Tarim Univ. 19 (2007) 1–3.
[6] C. Wang, G.R. Xi, Y.R. Shi, L.H. Zhang, Chin. J. Clin. Oncol. (2003) 85–87.
[7] K. Takada, M. Tomoda, N. Shimizu, Chem. Pharm. Bull. 40 (1992) 2487–2490.
[8] M. Dubois, K.A. Gilles, J.K. Hamilton, P.A. Rebers, F. Smith, Anal. Chem. (1956) 350–356.
[9] R. Dreywood, Ind. Eng. Chem. – Anal. Ed. 18 (1946) 499.
[10] W. Li, J. Chen, B.R. Xiang, D.K. An, Anal. Chim. Acta 408 (2000) 39–47.
[11] S. Macho, M.S. Larrechi, TrAC Trends Anal. Chem. 21 (2002) 799–806.
[12] H. Büning-Pfaue, Food Chem. 82 (2003) 107–115.
[13] G.W. Small, TrAC Trends Anal. Chem. 25 (2006) 1057–1066.
[14] D.W. Lachenmeier, Food Chem. 101 (2007) 825–832.
[15] Y.-H. Yun, W.-T. Wang, B.-C. Deng, G.-B. Lai, X.-B. Liu, D.-B. Ren, Y.-Z. Liang, W. Fan, Q.-S. Xu, Anal. Chim. Acta 862 (2015) 14–23.
[16] H.D. Li, Y.Z. Liang, X.X. Long, Y.H. Yun, Q.S. Xu, Chemom. Intell. Lab. 122 (2013) 23–30.
[17] Y.H. Yun, Y.Z. Liang, G.X. Xie, H.D. Li, D.S. Cao, Q.S. Xu, Analyst 138 (2013) 6412–6421.
[18] H. Li, Y. Liang, Q. Xu, D. Cao, Anal. Chim. Acta 648 (2009) 77–84.
[19] Y. Chen, M.Y. Xie, H. Zhang, Y.X. Wang, S.P. Nie, C. Li, Food Chem. 135 (2012) 268–275.
[20] R.J. Barnes, M.S. Dhanoa, S.J. Lister, Appl. Spectrosc. (1989) 772–777.
[21] A. Savitzky, M.J.E. Golay, Anal. Chem. (1964) 1627–1639.
[22] R.K.H. Galvao, M.C.U. Araujo, G.E. Jose, M.J.C. Pontes, E.C. Silva, T.C.B. Saldanha, Talanta 67 (2005) 736–740.
[23] Q.S. Xu, Y.Z. Liang, Chemom. Intell. Lab. 56 (2001) 1–11.
[24] D.-S. Cao, Y.-Z. Liang, Q.-S. Xu, H.-D. Li, X. Chen, J. Comput. Chem. 31 (2010) 592–602.
[25] J.Y. Shi, X.B. Zou, J.W. Zhao, M. Holmes, K.L. Wang, X. Wang, H. Chen, Spectrochim. Acta A 94 (2012) 271–276.
[26] X.B. Zou, J.W. Zhao, M.J.W. Povey, M. Holmes, H.P. Mao, Anal. Chim. Acta 667 (2010) 14–32.
[27] W.Z. Lu, H.F. Yuan, G.T. Xu, D.M. Qiang, The Technology of Modern Near Infrared Spectral Analysis, China Petrochemical Press, Beijing, 2000.
[28] J. Workman Jr., L. Weyer, Practical Guide to Interpretive Near-Infrared Spectroscopy, CRC Press, 2007.
