Accepted Manuscript

Using FT-NIR spectroscopy technique to determine arginine content in fermented Cordyceps sinensis mycelium

Chuanqi Xie, Ning Xu, Yongni Shao, Yong He

PII: S1386-1425(15)00616-2
DOI: http://dx.doi.org/10.1016/j.saa.2015.05.028
Reference: SAA 13695

To appear in: Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy

Received Date: 19 October 2014
Revised Date: 8 May 2015
Accepted Date: 9 May 2015

Please cite this article as: C. Xie, N. Xu, Y. Shao, Y. He, Using FT-NIR spectroscopy technique to determine arginine content in fermented Cordyceps sinensis mycelium, Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy (2015), doi: http://dx.doi.org/10.1016/j.saa.2015.05.028

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Using FT-NIR spectroscopy technique to determine arginine content in fermented Cordyceps sinensis mycelium

Chuanqi Xie a, Ning Xu b, Yongni Shao a, Yong He a,*

a College of Biosystems Engineering and Food Science, Zhejiang University, 866 Yuhangtang Road, Hangzhou 310058, China
b College of Pharmaceutical Science, Zhejiang University of Technology, Hangzhou 310014, China

* Corresponding author: Yong He, Tel: +86-571-88982143, Fax: +86-571-88982143, Email: [email protected]

ABSTRACT

This research investigated the feasibility of using the Fourier transform near-infrared (FT-NIR) spectral technique to determine arginine content in fermented Cordyceps sinensis (C. sinensis) mycelium. Three different models were established to predict the arginine content. Wavenumber selection methods, namely competitive adaptive reweighted sampling (CARS) and the successive projections algorithm (SPA), were used to identify the most important wavenumbers and reduce the high dimensionality of the raw spectral data. Only a few wavenumbers were selected by CARS and by CARS-SPA as the optimal wavenumbers. Among the prediction models, the CARS-least squares-support vector machine (CARS-LS-SVM) model performed best, with the highest coefficient of determination of prediction (Rp² = 0.8370) and residual predictive deviation (RPD = 2.4741) and the lowest root mean square error of prediction (RMSEP = 0.0841). Moreover, this model used only forty-five input variables, which account for 2.04% of the full set of wavenumbers. The results showed that the FT-NIR spectral technique has the potential to be an objective and non-destructive method for detecting arginine content in fermented C. sinensis mycelium.

Keywords: Fourier transform near-infrared (FT-NIR) spectra; Arginine; Competitive adaptive reweighted sampling (CARS); Successive projections algorithm (SPA); Prediction

Introduction

Cordyceps sinensis (Clavicipitaceae, Ascomycete; C. sinensis), also called “winter-worm and summer-grass”, is composed of a parasitic fungus of Cordyceps sp. and its host [1]. It is one of the most famous traditional medicines and is found only on the Qinghai-Tibet Plateau of China. It is valued for its pharmacological activities, such as protecting lung and kidney function, modulating the immune response, improving hyperlipidemia, hyperglycemia and sexual function, inhibiting tumor growth and scavenging free radicals [2-4]. Since natural C. sinensis is expensive and in short supply, cultivated mycelia of C. sinensis, which possess the same functions, have become the major substitute for the natural species [5]. Amino acids are considered to be among the most important components for the function of C. sinensis [6]. Arginine is a common amino acid in C. sinensis. It is known to have many modulatory functions in the endocrine and immune systems, and it can improve nitrogen balance after trauma and promote wound healing through angiogenesis, cell proliferation, collagen synthesis and epithelialization [7]. Although arginine is not considered an essential amino acid for adults, it is regarded as a conditionally essential amino acid and is equally important to human beings [8].

The most common method for determining amino acid content is high performance liquid chromatography (HPLC) combined with an amino acid analyzer [9]. Other detection methods include mass spectrometry (MS) and capillary electrophoresis (CE) [10]. However, all of these methods have limitations: they are time-consuming, laborious and inefficient, and they require professional operation. In addition, they cannot be applied to on-line detection. Therefore, a fast, objective and non-destructive method is urgently needed.

At present, the Fourier transform near-infrared (FT-NIR) spectral technique is widely used in many fields [11-12] because it is fast, non-destructive, low-cost, simple and accurate and requires little sample preparation. The principle of the FT-NIR spectral technique is to obtain the content of, or information about, different components of samples by analyzing spectral data in the region of 12500 to 4000 cm-1 [13]. Three different models, namely partial least squares regression (PLSR), principal components regression (PCR) and least squares-support vector machine (LS-SVM), were established in this study. Establishing a prediction model on the full spectrum is time-consuming, which runs counter to the high-speed character of the spectral technique [14]. For this reason, this study also selected effective wavenumbers from the full spectral data.

In this study, the FT-NIR spectral technique was used to determine arginine content in fermented C. sinensis mycelium. The objectives of this work were: (1) to find the quantitative relationships between the FT-NIR spectral data and arginine content; (2) to acquire optimal wavenumbers using competitive adaptive reweighted sampling (CARS) and the successive projections algorithm (SPA), respectively; (3) to compare the performance of different models based on the full spectrum and on selected wavenumbers; (4) to identify the optimal model for the prediction of arginine content; and (5) to explain why the selected wavenumbers could be used to detect arginine content.

Materials and methods

Instrument and software

In this study, a Bruker MPA FT-NIR spectrometer (Bruker Optics, Ettlingen, Germany) with a spectral range of 12500-4000 cm-1 and a resolution of 8 cm-1 was used to obtain the spectral absorbance information of the samples. The spectrometer includes a Rock-Solid interferometer and an integrating sphere diffuse reflector. The signal-to-noise ratio (SNR) of this system is not less than 800:1. The detector was inserted into the sample powder to acquire the spectral absorbance information, and each spectrum was obtained as the average of 32 scans. The spectral data were acquired with OPUS 5.5 software and saved as OPUS files. The Unscrambler 9.7 (CAMO Software AS, Oslo, Norway) and MATLAB R2009a (The MathWorks, Natick, USA) were used to preprocess the spectral data and establish models.

Samples preparation

A total of 195 dried fermented C. sinensis mycelium samples, provided by Hangzhou Zhongmei Huadong Pharmaceutical Co. Ltd, were used in this study. The arginine content was determined with an Automatic Amino Acids Analyzer system (Biochrom 30+ Series, Biochrom Ltd, Cambridge, UK). A total of 60 mg of each sample was placed in a headspace vial (volume 20 ml), and 10 ml of HCl (6 mol/L) was added. After the air had been removed with nitrogen, the sample was kept in an oven at 110±1 ℃ for 24 h. After cooling, the hydrolyzed sample was transferred into a volumetric flask (volume 50 ml) and diluted to 50 ml with purified water. Then 5 ml of the solution was filtered, and 0.5 ml of the filtrate was dried in a vacuum drying oven at 60 ℃. About 1 ml of sodium citrate buffer (pH 2.2) was added to the residue. Finally, 50 µl of the solution was injected into the Automatic Amino Acids Analyzer system. The detection wavelengths were 440 and 570 nm.

To avoid bias in the subset partition, the 195 samples were arranged in ascending order of their Y values (arginine content) and then divided into a calibration set and a prediction set at a ratio of 2:1 [15]. One sample was picked out of every three consecutive samples for the prediction set, which resulted in 130 samples for the calibration set and 65 for the prediction set. Full cross-validation was performed for the calibration and validation sets. The arginine content statistics of each set are shown in Table 1.
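The partition described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the function name `split_calibration_prediction`, the choice of the third sample in each consecutive triple for the prediction set, and the synthetic `y_demo` values are all assumptions made for the example.

```python
def split_calibration_prediction(y_values):
    """Rank samples by reference value and send every third one to prediction."""
    # Indices of the samples in ascending order of their reference (Y) values.
    order = sorted(range(len(y_values)), key=lambda i: y_values[i])
    calibration, prediction = [], []
    for rank, idx in enumerate(order):
        # Assumed rule: the third sample of each consecutive triple is held out.
        if rank % 3 == 2:
            prediction.append(idx)
        else:
            calibration.append(idx)
    return calibration, prediction


# Synthetic stand-ins for the 195 measured arginine contents (not real data).
y_demo = [((i * 37) % 195) / 100.0 for i in range(195)]
cal_idx, pred_idx = split_calibration_prediction(y_demo)
print(len(cal_idx), len(pred_idx))
```

Because the samples are ranked before the split, both subsets span the whole range of reference values, which is the stated purpose of the ordering step.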

Test flow chart

The main steps of this study are illustrated in Fig. 1. In the first step, the spectral information of arginine was obtained with the FT-NIR spectrometer over the wavenumber range of 12500 to 4000 cm-1. Then the arginine content was determined using the HPLC method. All samples were divided into two sets (calibration and prediction) at a ratio of 2:1. After several pre-processing methods and effective variable selection had been applied, different models were established to predict arginine content based on the full spectral wavenumbers and on the selected wavenumbers, respectively. Finally, the optimal model was determined on the basis of the coefficient of determination (R²), residual predictive deviation (RPD), root mean square error of calibration (RMSEC) and root mean square error of prediction (RMSEP). Each of these steps is described below.

Calibration algorithms

PLSR establishes a linear regression model between the variable matrix Y (arginine content) and the variable matrix X (spectral information), and it has been widely used in many studies [16-18]. The prediction is obtained by extracting a set of orthogonal factors that have strong predictive ability [19]. The PLSR algorithm can be described as follows:

$$Y = aX + b \qquad (1)$$

where Y is the response matrix of the samples, X is the predictor matrix of the samples, a is the matrix of regression coefficients obtained from PLSR, and b is the matrix of residual information.

PCR can effectively compress the high dimensionality of the original variables and accelerate the calculation by ignoring the minor components. This method has been widely studied and has produced many successful applications [20-21]. PCR effectively avoids the multicollinearity problem, which may make the prediction model unstable.

LS-SVM can handle both linear and nonlinear multivariate problems quickly and has therefore been widely used in many fields [22-23]. It employs a nonlinear mapping function to map the input features to a high-dimensional space, thus changing the optimization problem into one with equality constraints [24]. The LS-SVM model can be written as follows:

$$y(x) = \sum_{k=1}^{N} \alpha_k K(x, x_k) + b \qquad (2)$$

where $\alpha_k$ are the Lagrange multipliers, $K(x, x_k)$ is the kernel function and b is the bias value. The RBF kernel was used as the kernel function of LS-SVM in this study. The LS-SVM parameters were the regularization parameter gam (γ) and the width parameter sig2 (σ²). The parameter γ determines the tradeoff between minimizing the training error and minimizing model complexity, and σ² is the bandwidth, which implicitly defines the nonlinear mapping from the input space to the high-dimensional feature space [25]. Grid search was used to find the optimal values of (γ, σ²) in this study. The calculations were performed with the free LS-SVM toolbox (LS-SVM v1.5, Suykens, Leuven, Belgium) in MATLAB R2009a.
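To make Eq. (2) concrete, the following Python sketch trains and evaluates a tiny LS-SVM regressor. It is an illustration under stated assumptions, not the toolbox implementation used in the study: it solves the standard LS-SVM linear system (first row and column of ones, kernel matrix plus I/γ), which the manuscript does not spell out; the kernel scaling exp(-d²/(2σ²)) is one common convention and may differ from the toolbox; the function names and the four-point demo data are hypothetical. Grid search over (γ, σ²) is not shown.

```python
import math


def rbf_kernel(u, v, sig2):
    """RBF kernel; the 2*sig2 scaling is an assumed convention."""
    dist2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-dist2 / (2.0 * sig2))


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def lssvm_train(xs, ys, gam, sig2):
    """Return (b, alphas) by solving [[0, 1^T], [1, K + I/gam]] [b; alpha] = [0; y]."""
    n = len(xs)
    system = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0]
        for j in range(n):
            k = rbf_kernel(xs[i], xs[j], sig2)
            row.append(k + (1.0 / gam if i == j else 0.0))
        system.append(row)
    solution = solve_linear_system(system, [0.0] + list(ys))
    return solution[0], solution[1:]


def lssvm_predict(x, xs, alphas, b, sig2):
    """Evaluate y(x) = sum_k alpha_k K(x, x_k) + b."""
    return sum(a * rbf_kernel(x, xk, sig2) for a, xk in zip(alphas, xs)) + b


# Hypothetical one-dimensional demo data (not from the study).
xs_demo = [[0.0], [1.0], [2.0], [3.0]]
ys_demo = [0.10, 0.80, 0.90, 0.20]
b_demo, alphas_demo = lssvm_train(xs_demo, ys_demo, gam=10.0, sig2=1.0)
fitted = [lssvm_predict(x, xs_demo, alphas_demo, b_demo, 1.0) for x in xs_demo]
print([round(v, 3) for v in fitted])
```

In this formulation the training residual of sample k equals α_k/γ, so a larger γ fits the training data more closely, which matches the tradeoff role of γ described above.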

Wavenumbers selection

The spectral data over the wavenumber range of 12500 to 4000 cm-1 are characterized by high dimensionality, with redundancy among contiguous wavenumbers. In most cases, using all wavenumbers does not improve model performance, since some wavenumbers carry irrelevant information while others have low SNR [26]. Selecting a few wavenumbers that are related to the chemical information is a critical step in spectral analysis [27]. Models based on the selected wavenumbers can be as efficient as, or more efficient than, models based on the full set of wavenumbers [28].

Competitive adaptive reweighted sampling (CARS) is an effective variable selection method. It selects optimal wavenumbers from the full spectral wavenumbers according to the “survival of the fittest” principle. In the first step, CARS removes wavenumbers with small regression coefficients by means of an exponentially decreasing function (EDF), and in the second step the ratio of wavenumbers to be retained is calculated with an EDF equation [29]. Each sampling run contains four steps [30]: (a) model sampling based on Monte Carlo (MC); (b) wavenumber selection by EDF; (c) competitive wavenumber selection by adaptive reweighted sampling (ARS); (d) evaluation of the subset using cross validation. In this way, wavenumbers that carry little or no effective information are eliminated and effective wavenumbers are retained.
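The EDF step can be illustrated with a short Python sketch. The manuscript does not give the formula; the one used here, r_i = a·exp(-k·i) with a and k chosen so that all p variables are kept in the first run and 2 remain in the last, follows the general CARS description in the literature and is an assumption in this context. The number of sampling runs (50) and the function name are also assumptions; the total of 2203 variables is taken from the Conclusions.

```python
import math


def edf_retained_counts(n_variables, n_runs):
    """Number of variables kept by the EDF in each sampling run (assumed formula)."""
    # Constants chosen so that ratio = 1 at run 1 and ratio = 2/p at the last run.
    a = (n_variables / 2.0) ** (1.0 / (n_runs - 1))
    k = math.log(n_variables / 2.0) / (n_runs - 1)
    counts = []
    for i in range(1, n_runs + 1):
        ratio = a * math.exp(-k * i)
        counts.append(max(1, round(ratio * n_variables)))
    return counts


counts = edf_retained_counts(n_variables=2203, n_runs=50)
print(counts[:5], counts[-5:])
```

The counts drop steeply in the early runs and slowly in the late runs, which is the two-stage behaviour usually described for the EDF: fast elimination first, refined selection afterwards.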

SPA is a forward variable selection method designed to solve collinearity problems by selecting optimal variables with minimal redundancy [31]. This method applies a projection operation in a vector space to select variables with small collinearity [32]. The CARS and SPA algorithms were run in MATLAB R2009a.
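The projection idea behind SPA can be sketched as follows. This is a simplified illustration, not the authors' MATLAB code: it shows only the projection chain from one starting variable, and it omits the usual outer loop that tries several starting variables and subset sizes and scores them by validation error. The function name and the toy columns are hypothetical.

```python
def spa_select(columns, start, n_select):
    """Pick n_select column indices with low mutual collinearity.

    columns: list of variables, each a list of floats (one value per sample).
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    work = [col[:] for col in columns]  # working copies that get projected
    selected = [start]
    while len(selected) < n_select:
        ref = work[selected[-1]]
        ref_norm2 = dot(ref, ref)
        best, best_norm2 = None, -1.0
        for j, col in enumerate(work):
            if j in selected:
                continue
            # Project column j onto the subspace orthogonal to the last pick.
            coef = dot(col, ref) / ref_norm2
            work[j] = [c - coef * r for c, r in zip(col, ref)]
            norm2 = dot(work[j], work[j])
            # Keep the column with the largest remaining (non-redundant) part.
            if norm2 > best_norm2:
                best, best_norm2 = j, norm2
        selected.append(best)
    return selected


# Toy data: column 1 is collinear with column 0 and should never be picked.
toy_columns = [
    [1.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, 0.5, 0.1],
]
print(spa_select(toy_columns, start=0, n_select=3))
```

In the toy data the column that duplicates the starting variable is projected to zero and is passed over, which is the redundancy-minimizing behaviour the method is used for.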

Model evaluation index

The performance of the models was evaluated by the values of R², RPD, RMSEC and RMSEP [33]. A robust and accurate model should have low values of RMSEC and RMSEP and high values of R² and RPD [34]. An RPD value of less than 1.0 indicates a very poor model; a value between 1.0 and 1.4 indicates a poor model in which only high and low values can be distinguished; between 1.4 and 1.8, a fair model that may be used for assessment and correlation; between 1.8 and 2.0, a good model; between 2.0 and 2.5, a very good quantitative model; and a value greater than 2.5 indicates excellent model performance [35].
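The evaluation indices and the RPD bands can be sketched in Python as follows. This is an illustrative sketch using the usual definitions of these indices, with some assumptions: STD is taken as the sample standard deviation (n − 1 in the denominator) of the measured values, the handling of values that fall exactly on a band boundary is a choice made here, and the function names and the four-sample toy data are hypothetical.

```python
import math


def evaluate(measured, predicted):
    """Compute R2, RMSE, Bias, SEP and RPD for paired measured/predicted values."""
    n = len(measured)
    x_bar = sum(measured) / n
    y_bar = sum(predicted) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(measured, predicted))
    sxx = sum((x - x_bar) ** 2 for x in measured)
    syy = sum((y - y_bar) ** 2 for y in predicted)
    r2 = sxy ** 2 / (sxx * syy)
    rmse = math.sqrt(sum((y - x) ** 2 for x, y in zip(measured, predicted)) / n)
    bias = sum(y - x for x, y in zip(measured, predicted)) / n
    sep = math.sqrt(
        sum((y - x - bias) ** 2 for x, y in zip(measured, predicted)) / (n - 1)
    )
    std = math.sqrt(sxx / (n - 1))  # assumed: sample standard deviation
    return {"R2": r2, "RMSE": rmse, "Bias": bias, "SEP": sep, "RPD": std / sep}


def classify_rpd(rpd):
    """Map an RPD value to the qualitative bands listed in the text."""
    if rpd < 1.0:
        return "very poor"
    if rpd < 1.4:
        return "poor"
    if rpd < 1.8:
        return "fair"
    if rpd < 2.0:
        return "good"
    if rpd <= 2.5:
        return "very good"
    return "excellent"


# Toy values for illustration only (not data from the study).
metrics = evaluate([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print({name: round(value, 4) for name, value in metrics.items()})

# RPD values reported in this paper, mapped to their bands.
for reported in (2.1789, 1.9880, 2.4741):
    print(reported, classify_rpd(reported))
```

Applied to the RPD values reported in this paper, the bands place 2.1789 and 2.4741 in the "very good" range and 1.9880 in the "good" range.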

The R², RMSE and RPD were calculated with the following equations:

$$R^2 = \frac{\left[\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})\right]^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2} \qquad (3)$$

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - x_i)^2} \qquad (4)$$

$$RPD = \frac{STD}{SEP} \qquad (5)$$

$$SEP = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (y_i - x_i - Bias)^2}, \qquad Bias = \frac{1}{n} \sum_{i=1}^{n} (y_i - x_i) \qquad (6)$$

where $x_i$ is the measured value of sample i; $\bar{x}$ is the average value of $x_i$; $y_i$ is the predicted value of sample i; $\bar{y}$ is the average value of $y_i$; STD is the standard deviation of the measured values; and n is the number of samples.

Results and Discussion

Spectral feature

Fig. 2 shows the absorbance spectra of arginine over the wavenumber range of 12500 to 4000 cm-1 (x-axis: wavenumber/cm-1; y-axis: absorbance). Overall, the absorbance first decreased and then increased. Low absorbance was seen in the region from 10000 to 7550 cm-1, with higher absorbance in the first overtone region from 7550 to 5250 cm-1 and the highest absorbance in the combination region from 5250 to 4000 cm-1. The most dominant absorption bands in the NIR spectral region are due to hydrogen-containing bonds such as C-H, O-H, N-H, S-H and P-H, as they give strong overtones and combinations [36]. For arginine, the strong absorptions observed between 7000 and 4000 cm-1 correspond to C=O, N-H, C-H and C-C bonds [37]. The bands from 7500 to 5500 cm-1 are related to C-H first overtone stretch vibration modes in CH3 and CH2 groups. The absorption bands from 5000 to 4000 cm-1 are due to amide [38] and C-H combination bands, which are characteristic bands for proteins and amino acids [39].

Pre-processing and PLSR models

In this study, PLSR models were used to determine the best pre-processing method in terms of the values of R², RPD, RMSEC and RMSEP. Nine different pre-processing methods were used, including moving average smoothing (MAS), Savitzky-Golay smoothing (SGS), median filter smoothing (MFS), Gaussian filter smoothing (GFS), multiplicative scatter correction (MSC), Savitzky-Golay derivatives (SGD) and standard normal variate (SNV). The results for the raw data and the different pre-processing methods in the calibration and prediction sets are shown in Table 2. According to the model evaluation standard, the raw spectral data without any pre-processing performed best, with the highest values of Rc² (0.8463), Rp² (0.7862) and RPD (2.1789) and the lowest values of RMSEC (0.0807) and RMSEP (0.0962). Therefore, all of the subsequent analysis was carried out on the raw spectral data.

Regression models based on full wavenumbers

Two further models (PCR and LS-SVM) were established to predict arginine content. Table 3 shows that, of these two, the PCR model performed better, with a higher Rp² (0.7470) and RPD (1.9880) and a lower RMSEP (0.1047). The LS-SVM model also obtained an acceptable result. Among the three models, the PLSR model performed best. However, the number of input variables used in the three models was too large.

Effective wavenumbers

In order to simplify the model and improve its prediction ability, CARS and SPA were used to select the optimal wavenumbers in this study. Most of the selected wavenumbers are concentrated in the region of 8000 to 4000 cm-1. This is because most of the sensitive wavenumbers that are correlated with the chemical groups in the arginine molecule are located in this region. The region from 10000 to 7500 cm-1, which is assigned to second and third overtones, has low intensity and low SNR [40]. That is to say, this region contains little useful information for arginine detection.

Models based on CARS

In order to improve the performance of the prediction model, CARS was first used to select the effective wavenumbers. Fig. 3(a) shows that the number of sampled variables decreased quickly in the first stage and then slowly in the second stage of the EDF. In Fig. 3(b), the root mean square error of cross validation (RMSECV) first decreased, which indicates that some uninformative variables were eliminated; it then changed only slightly, showing that the variables did not change significantly; and it finally increased owing to the elimination of some useful variables. Each line in Fig. 3(c) represents the regression coefficient of one variable at different sampling runs. Some variables were extracted in each sampling run, and the optimal variable subset, with the lowest RMSECV, is marked by the vertical asterisk line. After the asterisk line, the RMSECV began to increase, which was ascribed to the removal of some effective variables. Table 4 shows that a total of 45 wavenumbers were selected by CARS. The number of selected variables was only 2.04% of that of the full set of wavenumbers. These wavenumbers were then used in place of the full wavenumbers for building prediction models.
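As a quick check of the reported proportion, taking the total of 2203 spectral variables stated in the Conclusions, the share of wavenumbers retained by CARS is:

$$\frac{45}{2203} \approx 0.0204 = 2.04\%$$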

Three models were established based on the selected wavenumbers. The results are shown in Table 5. The CARS-LS-SVM model obtained the best results, with the highest values of Rp² (0.8370) and RPD (2.4741) and the lowest value of RMSEP (0.0841). Compared with the other two models (CARS-PLSR and CARS-PCR), the Rp² of the CARS-LS-SVM model increased by 6.49% and 17.89%, the RPD increased by 14.30% and 34.21%, and the RMSEP decreased by 12.49% and 25.51%, respectively. Compared with the LS-SVM model, the R² of the CARS-LS-SVM model increased by 5.29% in the calibration set and 15.93% in the prediction set, and the RMSE decreased by 17.55% in the calibration set and 23.48% in the prediction set. Although the Rp² values of the CARS-PLSR and CARS-PCR models decreased slightly, the results were acceptable. Thus, CARS was effective in searching for the optimal wavenumbers in this study.

Models based on CARS-SPA

However, 45 wavenumbers are still rather many for spectral analysis. Therefore, CARS combined with SPA (CARS-SPA) was finally applied to select the most useful wavenumbers. Table 4 shows that fourteen wavenumbers (12459, 12420, 12378, 8278, 7541, 7047, 6172, 5929, 5145, 4980, 4868, 4355, 4154, and 4065 cm-1) were selected as the optimal input variables. The number of selected variables was only 0.64% of that of the full wavebands. The fourteen selected wavenumbers were then treated as new input variables for establishing prediction models. The prediction results based on CARS-SPA are also shown in Table 5. Among the three models established based on CARS-SPA, the CARS-SPA-LS-SVM model performed excellently, with the highest values of R² and RPD and the lowest values of RMSEC and RMSEP (Rc² = 0.8560, Rp² = 0.8160, RPD = 2.3277, RMSEC = 0.0798, RMSEP = 0.0894). Compared with the other two models (CARS-SPA-PLSR and CARS-SPA-PCR), in the CARS-SPA-LS-SVM model the Rc² increased by 3.98% and 5.74%, the Rp² increased by 0.62% and 6.39%, the RPD increased by 1.03% and 12.54%, the RMSEC decreased by 7.85% and 11.23%, and the RMSEP decreased by 1.00% and 11.331%, respectively. Each model based on CARS-SPA obtained a better result than the corresponding model built on the full spectral wavebands, indicating that useful wavebands were selected while those containing redundant information were rejected by the CARS-SPA method. Compared with the CARS-PLSR and CARS-PCR models, the Rp² increased in the CARS-SPA-PLSR and CARS-SPA-PCR models. Although the Rp² decreased slightly in CARS-SPA-LS-SVM, the result is still excellent (Rp² = 0.816). These results demonstrate that CARS combined with SPA is also good at selecting effective wavebands.

Optimal models

Among the models established on the full wavebands, the Raw-PLSR model performed best, with the highest values of Rc², Rp² and RPD and the lowest values of RMSEC and RMSEP (Rc² = 0.8463, Rp² = 0.7862, RPD = 2.1789, RMSEC = 0.0807, RMSEP = 0.0962). However, the number of input variables was too large. Among the models based on selected wavebands, the CARS-LS-SVM model obtained an excellent result, with the highest values of Rc², Rp² and RPD and the lowest values of RMSEC and RMSEP (Rc² = 0.8950, Rp² = 0.8370, RPD = 2.4741, RMSEC = 0.0686, RMSEP = 0.0841). Moreover, the number of input variables decreased greatly, which means that simpler models can be obtained. Thus, the selected wavenumbers are more efficient than the full wavenumbers. The prediction results of the Raw-PLSR and CARS-LS-SVM models are shown in Fig. 4.

Discussion

In this study, nine different pre-processing methods were compared in order to select the best one. The worst result was obtained with SGD pre-processing, with an Rc² of 0.6954 and an Rp² of 0.1172. The large difference between Rc² and Rp² also shows that the SGD-based model did not perform well. For the other eight pre-processing methods, the results were very similar. However, the best result was obtained with the raw data, with an Rc² of 0.8463, Rp² of 0.7862, RPD of 2.1789, RMSEC of 0.0807 and RMSEP of 0.0962. Among all the results, these values of Rc², Rp² and RPD are the highest, and the RMSEC and RMSEP are the lowest. The RPD value (2.1789) lies between 2.0 and 2.5, indicating that the prediction model is very good. Thus, all subsequent analyses were based on the original data. For the PCR and LS-SVM models, the results were slightly worse than those of the PLSR models, except for the SGD-based model. However, the number of input variables for these models is too large.

Thus, the CARS and SPA methods were applied to identify the useful wavenumbers. CARS and CARS-SPA yielded forty-five and fourteen wavenumbers, respectively. Based on these selected wavenumbers, PLSR, PCR and LS-SVM models were re-established. Table 5 shows that the models based on CARS-SPA performed better than the full spectrum-based models. This is because the raw spectral data contain too much useless information at some wavenumbers, whereas CARS-SPA selects effective wavenumbers, which helps to build an accurate and robust model. After wavenumber selection, CARS-LS-SVM and CARS-SPA-LS-SVM performed best. The RPD values of the two models are 2.4741 and 2.3277, which means that they are very good models and very close to excellent models. The results obtained with the different pre-processing methods and models show that the FT-NIR spectral signature can be used for arginine content detection. The CARS-LS-SVM model performed better than the full spectrum-based models because it rejected redundant information and retained useful information from the full wavenumbers.

There are many different groups in the arginine molecule, such as C-H, N-H, C-C, C-N, C=O and O-H (Fig. 5). The sensitive wavenumbers that correspond to different groups are not the same. Therefore, the results obtained with different wavenumber selection methods also vary. The wavenumbers of 7093, 7070 and 7047 cm-1 were assigned to the first NH/OH stretching overtones (6200-7400 cm-1); 4991, 4987, 4983, 4980, 4976 and 4868 cm-1 were assigned to the vibrational overtone of combined C=O amide and amino acid N-H (5000-4000 cm-1); and 4359, 4355, 4154 and 4065 cm-1 were assigned to C-H stretching vibration and C-H deformation [41-43]. Many of the selected wavenumbers have a close correlation with the arginine content. This might be the reason why FT-NIR spectra could be used to detect arginine content in fermented C. sinensis mycelium. Thirteen of the wavenumbers suggested by CARS (7093, 7070, 7047, 4991, 4987, 4983, 4980, 4976, 4868, 4359, 4355, 4154, and 4065 cm-1) and six of those suggested by CARS-SPA (7047, 4980, 4868, 4355, 4154 and 4065 cm-1) could be considered to have a correlation with the arginine content. This might be the reason why the models based on the CARS method gave the best prediction results among all models.

Conclusions

This study evaluated the feasibility of using an FT-NIR spectrometer covering the spectral range of 12500-4000 cm-1 to determine arginine content in fermented C. sinensis mycelium. The results indicate that the FT-NIR spectral technique has the potential to be adopted as a fast, objective and non-destructive method for predicting arginine content. Out of the 2203 variables, only a few effective variables were selected by the CARS and CARS-SPA methods. On the basis of the selected wavenumbers, the CARS-LS-SVM model performed best. In addition, a simple system based on the selected wavebands could be developed to replace the current FT-NIR spectrometer for detecting arginine content. The selected wavenumbers not only simplified and improved the prediction models but also helped to explain why FT-NIR spectra can be used to detect arginine content.

However, this research represents only preliminary work. In future studies, more samples should be used to improve the robustness and accuracy of the predictions. Other algorithms that select optimal wavenumbers with higher accuracy and fewer variables should also be considered.

Acknowledgements

This work was supported by the 863 National High-Tech Research and Development Plan (2013AA102301, 2011AA100705), the Zhejiang Provincial Natural Science Foundation of China (Z3090295) and the Fundamental Research Funds for the Central Universities of China (2012FZA6005).

References

[1] Z.Y. Zhang, Z.F. Lei, Y. Lü, Z.Z. Lü, Y. Chen, J. Biosci. Bioeng. 106(2) (2008) 188-193.
[2] J.W. Bok, L. Lermer, J. Chilton, H.G. Klingeman, G.H.N, Phytochemistry 51(7) (1999) 891-898.
[3] B.J. Wang, S.J. Won, Z.R. Yu, C.L. Su, Food Chem. Toxicol. 43(4) (2005) 543-552.
[4] T.H. Hsu, L.H. Shiao, C. Hsieh, D.M. Chang, Food Chem. 78(4) (2002) 463-469.
[5] J.Y. Yang, W.Y. Zhang, P.H. Shi, J.P. Chen, X.D. Han, Y. Wang, Pathol. Res. Pract. 201(11) (2005) 745-750.
[6] X.Z. Zhou, Z.H. Gong, Y. Su, J. Lin, K.X. Tang, J. Pharm. Pharmacol. 61(3) (2009) 279-291.
[7] M.B. Witte, F.J. Thornton, U. Tantry, A. Barbul, Metabolism 51(10) (2001) 1269-1273.
[8] W.J.D. Jonge, B. Marescau, R.D. Hooge, P.P.D. Deyn, Nutr. Neurosci. 131(10) (2001) 2732-2740.
[9] T. Teerlink, R.J. Nijveldt, S.D. Jong, P.A.M.V. Leeuwen, Anal. Biochem. 303(2) (2002) 131-137.
[10] C.H. Petter, N. Heigl, S. Bachmann, V.A.C. Huck-Pezzei, M. Najam-ul-Haq, R. Bakry, A. Bernkop-Schnürch, G. Bonn, C.W. Huck, Amino Acids 34(4) (2008) 605-616.
[11] H.Y. Cen, Y. He, Trends Food Sci. Tech. 18(2) (2007) 72-93.
[12] B.B. Wedding, C. Wright, S. Grauf, R.D. White, B. Tilse, P. Gadek, Postharvest Biol. Tec. 75 (2013) 9-16.
[13] D. Wu, J.Y. Chen, B.Y. Lu, L.N. Xiong, Y. He, Y. Zhang, Food Chem. 135(4) (2012) 2147-2156.
[14] D. Wu, X.J. Chen, P.Y. Shi, S.H. Wang, F.Q. Feng, Y. He, Anal. Chim. Acta 634 (2009) 166-171.
[15] C.Q. Xie, H.L. Wang, Y.N. Shao, Y. He, Intell. Autom. Soft Co. 21(3) (2015) 395-407.
[16] D. Wu, X. Chen, X. Zhu, X. Guan, G. Wu, Anal. Methods 3(8) (2011) 1790-1796.
[17] Y. He, M. Huang, A. Garcia, A. Hernandez, H. Song, Comput. Electron. Agr. 58 (2007) 144-153.
[18] L.L. Jiang, F. Liu, Y. He, Sensors 12 (2012) 3498-3511.
[19] M. Kamruzzaman, G. ElMasry, D.W. Sun, P. Allen, Anal. Chim. Acta 714 (2012) 57-67.
[20] X.G. Shao, W. Wang, Z.Y. Hou, W.S. Cai, Talanta 69(3) (2006) 676-680.
[21] W. Wang, Y.K. Peng, H. Huang, J.H. Wu, Sens. Lett. 9(3) (2011) 1024-1030.
[22] X.J. Chen, D. Wu, Y. He, S. Liu, Food Bioprocess Tech. 4(5) (2011) 753-761.
[23] X.L. Zhang, F. Liu, Y. He, X.L. Li, Sensors 12 (2012) 17234-17246.
[24] D. Wu, D.W. Sun, Talanta 111(15) (2013) 39-46.
[25] F. Liu, Y.H. Jiang, Y. He, Anal. Chim. Acta 635(1) (2009) 45-52.
[26] D. Wu, D.W. Sun, Innov. Food Sci. Emerg 19 (2013) 1-14.
[27] D.F. Barbin, G. ElMasry, D.W. Sun, P. Allen, Anal. Chim. Acta 719 (2012) 30-42.
[28] M. Kamruzzaman, G. ElMasry, D.W. Sun, P. Allen, J. Food Eng. 104(3) (2011) 332-340.
[29] H.D. Li, Y.Z. Liang, Q.S. Xu, D.S. Cao, Anal. Chim. Acta 648 (2009) 77-84.
[30] X. Wei, N. Xu, D. Wu, Y. He, Food Bioprocess Tech. 7 (2014) 184-190.
[31] M.C.U. Araújo, T.C.B. Saldanha, R.K.H. Galvão, T. Yoneyama, H.C. Chame, V. Visani, Chemometr. Intell. Lab. 57(2) (2001) 65-73.
[32] R.K.H. Galvão, M.C.U. Araújo, W.D. Fragoso, E.C. Silva, G.E. José, S.F.C. Soares, H.M. Paiva, Chemometr. Intell. Lab. 92(1) (2008) 83-91.
[33] A.H. Gómez, Y. He, A.G. Pereira, J. Food Eng. 77(2) (2006) 313-319.
[34] X.L. Li, Y. He, Food Bioprocess Tech. 3(5) (2010) 651-661.
[35] R.A. Viscarra Rossel, R.N. McGlynn, A.B. McBratney, Geoderma 137(1-2) (2006) 70-82.
[36] P.K. Ghosh, D.S. Jayas, Sens. & Instrumen. Food Qual. 3(1) (2009) 3-11.
[37] M. Mecozzi, M. Pietroletti, A. Tornambè, Spectrochim. Acta A 78(5) (2011) 1572-1580.
[38] S.W. Bruun, J. Holm, S.I. Hansen, S. Jacobsen, Appl. Spectrosc. 60(7) (2006) 737-746.
[39] Y. Chen, M.Y. Xie, H. Zhang, Y.X. Wang, S.P. Nie, C. Li, Food Chem. 135(1) (2012) 268-275.
[40] N. Gierlinger, M. Schwanninger, R. Wimmer, J. Near Infrared Spec. 12(2) (2004) 113-119.
[41] J. Wang, M.G. Sowa, M.K. Ahmed, H.H. Mantsch, J. Phys. Chem. 98(17) (1994) 4748-4755.
[42] M. Miyazawa, M. Sonoyama, J. Near Infrared Spec. 6(1) (1998) 253-257.
[43] X.L. Chu, Y.P. Xu, G.Y. Tian, Chemical Industry Press, Beijing, 2009, pp. 82-84.

353

13

Figure captions

Fig. 1. Main steps of the study.

Fig. 2. Raw spectral absorbance curves of arginine in fermented C. sinensis mycelium.

Fig. 3. CARS calculation: (a) the changing trend of the number of sampled variables, (b) 10-fold root mean square error of cross-validation (RMSECV) values, and (c) regression coefficients of each variable as the number of sampling runs increases. The line marked with an asterisk indicates the optimal point, where the 10-fold RMSECV reaches its lowest value.

Fig. 4. Scatter plots of measured vs. predicted arginine values for the Raw-PLSR and CARS-LS-SVM models: (a) calibration, (b) prediction, (c) calibration, (d) prediction.

Fig. 5. Chemical structure of the arginine molecule.


Table 1

Statistical values of arginine content in calibration and prediction sets (g/100 g)

| Data sets | Number | Range | Mean | S.D. |
|---|---|---|---|---|
| Calibration | 130 | 2.5826-3.5633 | 3.0740 | 0.2067 |
| Prediction | 65 | 2.6595-3.6615 | 3.0825 | 0.2097 |
| All | 195 | 2.5826-3.6615 | 3.0768 | 0.2072 |

S.D.: Standard Deviation

Table 2

Predicted results of PLSR models using different pre-processing methods

| Preprocessing | Calibration R²c | Calibration RMSEC | Prediction R²p | Prediction RMSEP | Latent variables | RPD |
|---|---|---|---|---|---|---|
| Raw | 0.8463 | 0.0807 | 0.7862 | 0.0962 | 11 | 2.1789 |
| MAS | 0.8318 | 0.0845 | 0.7823 | 0.0970 | 11 | 2.1581 |
| SGS | 0.8301 | 0.0849 | 0.7802 | 0.0975 | 11 | 2.1469 |
| MFS | 0.8329 | 0.0842 | 0.7800 | 0.0975 | 11 | 2.1471 |
| GFS | 0.8412 | 0.0821 | 0.7827 | 0.0970 | 11 | 2.1601 |
| Normalize | 0.8451 | 0.0810 | 0.7832 | 0.0965 | 10 | 2.1640 |
| MSC | 0.8122 | 0.0892 | 0.7572 | 0.1025 | 8 | 2.0443 |
| SGD | 0.6954 | 0.1136 | 0.1172 | 0.2022 | 3 | 1.0333 |
| Baseline | 0.8045 | 0.0910 | 0.7494 | 0.1038 | 9 | 2.0129 |
| SNV | 0.8253 | 0.0860 | 0.7693 | 0.1000 | 9 | 2.0979 |

MAS: Moving Average Smoothing; SGS: Savitzky-Golay Smoothing; MFS: Median Filter Smoothing; GFS: Gaussian Filter Smoothing; MSC: Multiplicative Scatter Correction; SGD: Savitzky-Golay Derivatives; SNV: Standard Normal Variate
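The statistics reported in these tables (R², RMSE and RPD) can be reproduced from paired measured and predicted values. The following is a minimal sketch, not the authors' code: the function name `regression_metrics` is hypothetical, and RPD is assumed here to be the sample standard deviation of the reference values divided by the RMSE, a common convention that this part of the manuscript does not state explicitly.

```python
import math


def regression_metrics(measured, predicted):
    """Return (R^2, RMSE, RPD) for paired measured/predicted values.

    Assumption: RPD = sample S.D. of the measured values / RMSE.
    """
    n = len(measured)
    mean_y = sum(measured) / n
    # Residual and total sums of squares
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    rmse = math.sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot
    sd = math.sqrt(ss_tot / (n - 1))  # sample standard deviation
    rpd = sd / rmse
    return r2, rmse, rpd


# Toy values for illustration only (not data from the study)
r2, rmse, rpd = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

Under this assumption, dividing the prediction-set S.D. in Table 1 (0.2097) by the Raw RMSEP in Table 2 (0.0962) gives roughly 2.18, close to the tabulated RPD of 2.1789.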

Table 3

Predicted results of different models based on full wavenumbers

| Model | Input | Sets | No. | R² | RMSE | RPD |
|---|---|---|---|---|---|---|
| PCR | 2203 | Calibration | 130 | 0.7505 | 0.1029 | |
| | | Prediction | 65 | 0.7470 | 0.1047 | 1.9880 |
| LS-SVM | 2203 | Calibration | 130 | 0.8500 | 0.0832 | |
| | | Prediction | 65 | 0.7220 | 0.1099 | 1.8951 |

Table 4

Effective wavenumbers selected by CARS

| Algorithm | Number | Selected wavenumber/cm⁻¹ |
|---|---|---|
| CARS | 45 | 12478, 12459, 12420, 12378, 12362, 12247, 12200, 12115, 12112, 12073, 12065, 12019, 11734, 11599, 11595, 11564, 10777, 8278, 7541, 7537, 7533, 7093, 7070, 7047, 6172, 6168, 6164, 5948, 5944, 5940, 5936, 5932, 5929, 5211, 5145, 4991, 4987, 4983, 4980, 4976, 4868, 4359, 4355, 4154, 4065 |
| CARS-SPA | 14 | 12495, 12420, 12378, 8278, 7541, 7047, 6172, 5929, 5145, 4980, 4868, 4355, 4154, 4065 |

Table 5

Predicted results of different models based on CARS

| Model | Input | Sets | No. | R² | RMSE | RPD |
|---|---|---|---|---|---|---|
| CARS-PLSR | 45 | Calibration | 130 | 0.8918 | 0.0677 | |
| | | Prediction | 65 | 0.7860 | 0.0961 | 2.1645 |
| CARS-PCR | 45 | Calibration | 130 | 0.9049 | 0.0635 | |
| | | Prediction | 65 | 0.7100 | 0.1129 | 1.8435 |
| CARS-LS-SVM | 45 | Calibration | 130 | 0.8950 | 0.0686 | |
| | | Prediction | 65 | 0.8370 | 0.0841 | 2.4741 |
| CARS-SPA-PLSR | 14 | Calibration | 130 | 0.8232 | 0.0866 | |
| | | Prediction | 65 | 0.8110 | 0.0903 | 2.3040 |
| CARS-SPA-PCR | 14 | Calibration | 130 | 0.8095 | 0.0899 | |
| | | Prediction | 65 | 0.7670 | 0.1008 | 2.0683 |
| CARS-SPA-LS-SVM | 14 | Calibration | 130 | 0.8560 | 0.0798 | |
| | | Prediction | 65 | 0.8160 | 0.0894 | 2.3277 |

Highlights

1) The spectral features of arginine were studied.
2) The FT-NIR technique was a non-destructive method for detecting arginine content.
3) CARS and SPA were effective methods for selecting useful wavenumbers.
