Research Article Received: 6 July 2013

Revised: 9 December 2013

Accepted article published: 24 December 2013

Published online in Wiley Online Library: 24 January 2014

(wileyonlinelibrary.com) DOI 10.1002/jsfa.6548

Fast and nondestructive determination of protein content in rapeseeds (Brassica napus L.) using Fourier transform infrared photoacoustic spectroscopy (FTIR-PAS)

Yuzhen Lu,a Changwen Du,a∗ Changbing Yub and Jianmin Zhoua

Abstract

BACKGROUND: Fast and non-destructive determination of rapeseed protein content carries significant implications for rapeseed production. This study presents the first attempt to use Fourier transform mid-infrared photoacoustic spectroscopy (FTIR-PAS) to quantify the protein content of rapeseed. A full-spectrum model was first built using partial least squares (PLS). Interval selection methods including interval partial least squares (iPLS), synergy interval partial least squares (siPLS), backward elimination interval partial least squares (biPLS) and dynamic backward elimination interval partial least squares (dyn-biPLS) were then employed to select the relevant band or band combination for PLS modeling.

RESULTS: The full-spectrum PLS model achieved a ratio of prediction to deviation (RPD) of 2.047. In comparison, all interval selection methods produced better results than full-spectrum modeling. siPLS achieved the best predictive accuracy, with an RPD of 3.215, when the spectrum was sectioned into 25 intervals and two intervals (1198–1335 and 1614–1753 cm−1) were selected. iPLS outperformed biPLS and dyn-biPLS, and dyn-biPLS performed slightly better than biPLS.

CONCLUSION: FTIR-PAS was verified as a promising analytical tool to quantify rapeseed protein content. Interval selection could extract the relevant individual band or synergy band associated with the sample constituent of interest, and thereby improve on the prediction accuracy of the full-spectrum model.

© 2013 Society of Chemical Industry

Keywords: rapeseed; photoacoustic spectroscopy; protein content; quantification; partial least squares (PLS); interval selection

INTRODUCTION

J Sci Food Agric 2014; 94: 2239–2245

tool in grain and food industries owing to its rapidity, favorable economics, simplicity of sample preparation and absence of chemicals.7,8 For the determination of rapeseed protein content by NIRS, despite desirable results achieved by some authors,9–16 two omissions in previous works can be pinpointed. One is that there is no report of any application of mid-infrared spectroscopy (MIRS) or other competing measurement modes for determination of rapeseed protein content. Weak absorption of overtone and combination bands, broad bands and a lack of characteristic features are typical of NIRS, thus making NIRS calibration models largely dependent on chemometric methods.17 Instead, MIRS is seen to have more potential for deriving models with improved interpretability and prediction accuracy, since it displays stronger basic



∗ Correspondence to: Changwen Du, Institute of Soil Science, National Key Laboratory of Soil and Sustainable Agriculture, Chinese Academy of Sciences, Nanjing 21008, China. E-mail: [email protected]

a Institute of Soil Science, National Key Laboratory of Soil and Sustainable Agriculture, Chinese Academy of Sciences, Nanjing 21008, China

b Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture, Wuhan 430062, China

www.soci.org

© 2013 Society of Chemical Industry


Rapeseed is one of the most important oilseed crops.1 On average, rapeseed contains 22.5% crude protein.2 Rapeseed protein has a well-balanced amino acid composition and is an important source of nutrition,3 especially for livestock. Development of high-protein cultivars has always been advocated as a breeding objective.4 Moreover, the protein content of rapeseed is also valued as an indicator of the nitrogen content of rapeseeds and, further, of the supply condition of soil nitrogen, which can serve to guide nitrogen fertilization for high yields. Thus quantitation of rapeseed protein content is essential in rapeseed production. Traditional procedures to measure protein content include the Kjeldahl and Dumas combustion methods.5 However, both methods involve tedious sample grinding and wet or dry chemical procedures, which are time consuming, laborious, expensive and even dangerous. Furthermore, they are destructive and therefore unsuitable for in situ quality monitoring. Thus, a fast and non-destructive method to determine the protein content of rapeseed is in high demand. For this purpose, much attention has been focused on near-infrared reflectance spectroscopy (NIRS) as an alternative to traditional means. NIRS was developed in 1964 by Norris for the measurement of moisture,6 and has evolved into a powerful

frequency absorption and more well-resolved spectral features associated with the sample component of interest.18,19 Also, Fourier transform infrared photoacoustic spectroscopy (FTIR-PAS) has emerged as a novel sampling technology.20 Unlike reflectance spectroscopy, including NIRS and MIRS, PAS has the merits of being less susceptible to light scattering by sample particles,21 depth-resolved characterization,22 suitability for highly absorbing solid samples23 and no requirement for sample preparation.24 At present, PAS has found success as a quantitative analytical tool in many cases involving drugs,25 woods,26 pulps27,28 and soils.29,30 However, no application of PAS has been reported in the evaluation of rapeseed qualities.

Another matter deserving attention is that all established models are based on full-spectrum information rather than on certain wavenumbers associated with the rapeseed proteins. Recently, however, both theoretical31 and experimental32,33 evidence has shown that spectral variable selection can significantly improve the performance of full-spectrum calibration models in terms of predictive accuracy, robustness, interpretability and computational speed. Also, parsimonious models have clear benefits in certain production situations where high-resolution instruments are too expensive or full-spectrum scanning takes too much time.34 Thus further study to explore the performance of spectral variable selection in the quantification of rapeseed protein content would be of benefit. At present, several techniques such as interactive variable selection,35 uninformative variable elimination (UVE),36 iterative predictor weighting,37 genetic algorithm (GA)38 and successive projections algorithm (SPA)39 have been proposed for spectral variable selection in partial least squares (PLS) modeling.
Among the diverse selection methods, interval PLS (iPLS) developed by Norgaard et al.40 is a very straightforward and effective one.41,42 In iPLS the full spectrum is divided into non-overlapping equidistant subintervals, and the subinterval most relevant to the target sample constituent is selected by comparing the calibration models built on each subinterval. iPLS also has several variants, including synergy interval PLS (siPLS),43 backward elimination interval PLS (biPLS) and dynamic backward elimination interval PLS (dyn-biPLS).45

In our study, Fourier transform photoacoustic spectroscopy in the mid-infrared range, combining the advantages of PAS and MIRS, was utilized to quantify the protein content of rapeseeds. PLS was adopted for building calibration models, and interval selection methods including iPLS, siPLS, biPLS and dyn-biPLS were used to assist in PLS modeling. The main objectives of our study were: (i) to investigate photoacoustic spectral features of rapeseed; (ii) to evaluate the potential of FTIR-PAS for quantification of rapeseed protein content; and (iii) to evaluate the effectiveness of various interval selection methods for model calibration.


MATERIALS AND METHODS

Samples

The sample set consisted of 180 rapeseed samples provided by the Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences. They were collected from an experimental field in the Yingtan Ecology Experimental Station. Each sample represented one cultivar, thus providing samples with a relatively broad span of protein content. Before spectral scanning, all samples were air dried to make the moisture content equivalent among samples, and then stored in plastic bags at room temperature.

Reference methods

Rapeseed samples were ground in an agate mortar. Individual ground samples of about 0.5 g were weighed on a lab balance (Sartorius BS210S, Germany) with a readability of 0.0001 g, and then used for determination of nitrogen content by the Kjeldahl method; the nitrogen content was multiplied by a factor of 6.25 to estimate the protein content. In the procedure each sample was digested with 8 mL concentrated sulfuric acid and an appropriate amount of hydrogen peroxide in a digestion instrument (LabTech EHD36, USA), and analyzed using an automatic Kjeldahl apparatus (Buchi 339, Switzerland). The repeatability error for three replications was less than 2% in the determination of rapeseed protein content using the Kjeldahl procedure.

FTIR-PAS measurements

Air-dried intact samples were used directly for spectral measurement. Photoacoustic spectra were recorded for all samples using a Fourier transform infrared spectrometer (Nicolet 6700, Thermo Scientific, USA) equipped with a photoacoustic cell (model 300, MTEC, USA). After placing the sample (about 100 mg) in the cell holding cup (diameter 5 mm, height 3 mm) and purging the cell with dry helium (10 mL min−1) for 10 s to ensure a CO2-free and H2O-free environment, the scans were conducted in the mid-infrared wavenumber range of 500–4000 cm−1 with a resolution of 4 cm−1 and a mirror velocity of 0.32 cm s−1.
A carbon black reference was used to collect the corresponding reference spectra for spectral intensity normalization. Thirty-two successive scans were recorded and the average for each sample was used in the chemometric analysis.

Chemometric methods and software

Savitzky–Golay smoothing46 with 17 points and a third-order polynomial, followed by autoscaling, was applied as the spectral pretreatment. The pretreatment was implemented in Unscrambler v 9.8 (CAMO Software AS, Norway). The spectral matrix was then divided into calibration and prediction subsets consisting of 135 and 45 samples, respectively, according

Table 1. Statistics of protein content in rapeseeds measured by the Kjeldahl method

Parameter        Number of samples  Maximum (g kg−1)  Minimum (g kg−1)  Mean (g kg−1)  SD (g kg−1)  CV (%)
Calibration set  135                235.94            167.98            199.70         13.01        6.51
Prediction set   45                 228.25            178.12            200.61         12.73        6.35
All              180                235.94            167.98            199.92         12.91        6.46

SD, standard deviation; CV, coefficient of variation.


to the Kennard–Stone (KS) algorithm47 in Matlab R2011b (The MathWorks, Natick, MA, USA). KS selected calibration samples from the spectral matrix by maximizing the Euclidean distances between the spectral variables of the selected samples,48 which guaranteed the representativity of the calibration set. The detailed statistics of protein content in sample sets are given in Table 1.

PLS was used to build calibration models since it is the most popular multivariate calibration methodology.49 The SIMPLS algorithm50 was adopted for PLS modeling, and the number of latent variables was chosen based on the minimal root mean square error of cross validation (RMSECV).51 The leave-five-out cross-validation procedure was performed here. RMSECV was calculated as follows:

$$\mathrm{RMSECV} = \sqrt{\frac{1}{I_c}\sum_{i=1}^{I_c}\left(\hat{y}_i - y_i\right)^2}$$

Figure 1. Smoothed infrared photoacoustic spectra of rapeseeds. Different shading corresponds to different rapeseed samples.

where $\hat{y}_i$ is the predicted value of the ith observation, $y_i$ is the measured value of the ith observation and $I_c$ is the number of observations in the calibration set.

PLS models were established using full-spectrum variables and using variables optimized through the interval selection methods, i.e. iPLS, siPLS, biPLS and dyn-biPLS. In the first three selection methods, the full spectrum was divided into 5, 15, 25 and 35 non-overlapping equidistant subintervals, respectively. For iPLS, the optimal interval was the one with the lowest RMSECV. For siPLS, the PLS regression was performed for all possible combinations of two and three intervals, respectively. Likewise, the interval combination with the lowest RMSECV was chosen. For biPLS, interval selection was realized through backward elimination, in which each step removed the interval whose removal resulted in the lowest RMSECV, and eventually the best interval or interval combination was also decided based on the smallest RMSECV. Finally, for dyn-biPLS, the number of intervals was varied to avoid some undesirable effects associated with a fixed number of intervals.44 Here, the number of intervals for dyn-biPLS was varied between 16 and 25, with four runs, and the maximum number of selected variables was set to 400. Details of the four interval selection methods may be found elsewhere.40,43–45

The PLS_Toolbox 4.2 from Eigenvector Research was used for the full-spectrum PLS. The iToolbox 2.0 from the Chemometrics Group, KVL, Copenhagen, Denmark, was used to develop calibration models of iPLS, siPLS, biPLS and dyn-biPLS. The modeling was also implemented in Matlab R2011b (The MathWorks).
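The Kennard–Stone split described in this section can be illustrated with a minimal sketch. It is written in Python with the standard library only, not the Matlab code used in the study; the function name `kennard_stone` and the toy one-dimensional data are illustrative assumptions. The sketch starts from the two most distant samples and then repeatedly adds the sample whose nearest already-selected neighbor is farthest away.

```python
from math import dist


def kennard_stone(X, n_select):
    """Return indices of n_select samples chosen by the Kennard-Stone rule."""
    n = len(X)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    # Step 1: seed the selection with the two most distant samples.
    best = (-1.0, 0, 1)
    for i in range(n):
        for j in range(i + 1, n):
            d = dist(X[i], X[j])
            if d > best[0]:
                best = (d, i, j)
    selected = [best[1], best[2]]
    remaining = [k for k in range(n) if k not in selected]
    # Distance from each remaining sample to its nearest selected sample.
    min_d = {k: min(dist(X[k], X[s]) for s in selected) for k in remaining}
    # Step 2: repeatedly add the sample that is farthest from the selected set.
    while len(selected) < n_select:
        nxt = max(remaining, key=lambda k: min_d[k])
        selected.append(nxt)
        remaining.remove(nxt)
        for k in remaining:
            min_d[k] = min(min_d[k], dist(X[k], X[nxt]))
    return selected


# Toy "spectra" with a single variable each (hypothetical data).
toy = [[0.0], [1.0], [2.0], [10.0], [5.0]]
calibration_idx = kennard_stone(toy, 3)
prediction_idx = [k for k in range(len(toy)) if k not in calibration_idx]
print(calibration_idx, prediction_idx)
```

Applied to the study's setting, the same call with `n_select=135` on the 180 pretreated spectra would leave the 45 unselected samples as the prediction set.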

Spectral investigation

According to the smoothed photoacoustic spectra of all samples (Fig. 1), the general spectral outlines were quite homogeneous and no outliers were identified by visual inspection and robust principal components analysis (RobPCA).57 The spectral features were related to the chemical components of rapeseed, including fats, proteins and carbohydrates. The major absorption peaks could be interpreted as follows.58 A broad and strong band between 3100 and 3600 cm−1 corresponded to O–H stretching vibration. The band overlapped with the N–H stretching vibration band at 3100–3500 cm−1, due to the presence of proteins. The peak around 2850–3000 cm−1 was assigned to C–H stretching vibration caused by cellulose and fats.59 A broad band around 1530–1730 cm−1 represented a complicated peak involving amide I C=O stretching,60 fatty acid C=O stretching, amide II (the out-of-phase combination of N–H deformation and C–N stretching vibrations),60 O–H deformation vibration of cellulose or water and C=C stretching vibration of unsaturated fats. The broad peak around 1370–1420 cm−1 resulted from C–H deformation vibration of fats and cellulose. A shoulder peak around 1200–1350 cm−1 was assigned to amide III (the in-phase combination of N–H deformation and C–N stretching vibrations).61 The range 500–1200 cm−1 was the fingerprint region, in which there was a strong peak near 1000–1200 cm−1 corresponding to the C–O and C–C stretching modes. The spectral interpretation showed that the spectral variables contained much information irrelevant to proteins, which formed the theoretical basis for using variable selection to quantify rapeseed protein content.


RESULTS AND DISCUSSION


Full-spectrum PLS model

A PLS model based on full-spectrum variables was first built to predict the protein content of rapeseed. The optimal model was obtained when the first nine latent variables were retained. The prediction results are summarized in Table 2, and the scatter plot of the predicted versus reference values is shown in Fig. 2, where the solid line is the reference line corresponding to exact prediction and samples are basically distributed along the reference line. The model achieved an RMSECV of 0.568% and RMSEP of 0.622% for quantification of rapeseed protein. Compared to previous results, the RMSECV was lower than the 0.87% reported by Velasco and Mollers12 and the 0.74% reported by Hom et al.15


Model evaluation standard

The RMSECV of the calibration set and the root mean square error of prediction (RMSEP) of the prediction set were calculated and compared for model evaluation.52 In evaluating models, an integrated criterion, RPD (ratio of prediction to deviation, i.e. the ratio of the standard deviation (SD) of the prediction dataset to RMSEP), was taken into account. The RPD was introduced to rank the efficiency of calibration models;53 an RPD between 2 and 2.5 indicated a good quantitative model, an RPD larger than 2.5 indicated an excellent quantitative model, and the lower limit of RPD for quantitative prediction was 1.8.54 Generally, RPD should be as high as possible. Also, to evaluate model robustness, the ratio RMSEP/RMSECV was adopted. A model with RMSEP/RMSECV lower than 1.2 was usually considered robust.55,56
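The evaluation criteria above reduce to a few lines of arithmetic. The following is a minimal Python sketch, assuming the thresholds quoted in the text; the function names and the label "marginal" for the 1.8–2 range are illustrative assumptions, not terms from the cited references. The worked values use the prediction-set SD from Table 1 (12.73 g kg−1, i.e. 1.273%) and the full-spectrum errors from Table 2.

```python
from math import sqrt


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    n = len(measured)
    return sqrt(sum((p - m) ** 2 for m, p in zip(measured, predicted)) / n)


def rpd(sd_prediction_set, rmsep):
    """Ratio of the prediction-set standard deviation to RMSEP."""
    return sd_prediction_set / rmsep


def robustness_ratio(rmsep, rmsecv):
    """RMSEP/RMSECV; values below 1.2 are treated as robust in the text."""
    return rmsep / rmsecv


def grade(rpd_value):
    """Map an RPD value onto the categories quoted in the text."""
    if rpd_value > 2.5:
        return "excellent"
    if rpd_value >= 2.0:
        return "good"
    if rpd_value >= 1.8:
        return "marginal"  # assumed label for the range between 1.8 and 2
    return "not suitable"


# Full-spectrum PLS model: SD = 1.273 %, RMSEP = 0.622 %, RMSECV = 0.568 %.
full = rpd(1.273, 0.622)
ratio = robustness_ratio(0.622, 0.568)
print(round(full, 3), grade(full), round(ratio, 2), ratio < 1.2)
```

Because RPD scales with the SD of the reference values, the same RMSEP paired with a wider protein range gives a higher RPD, which is why RPD values from sample sets with different spreads are not directly comparable.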


Table 2. Quantification results of protein content of rapeseed from the full-spectrum PLS model and different interval PLS models. The number after ‘PLS’ means the number of subintervals divided from the full spectrum, and the number before ‘PLS’ denotes the number of subintervals combined in siPLS

Model              Variables  Wavenumber range (cm−1)                     LVs  RMSECV (%)  RMSEP (%)  RMSEP/RMSECV  RPD
Full-spectrum PLS  1816       500–4000                                    9    0.568       0.622      1.10          2.05
iPLS5              361        1196–1890                                   6    0.515       0.523      1.02          2.43
iPLS15             121        1429–1660                                   4    0.532       0.531      1.00          2.40
iPLS25             73         1614–1753                                   4    0.477       0.451      0.95          2.82
iPLS35             53         1603–1703                                   8    0.465       0.421      0.91          3.02
si2PLS5            722        500–1890                                    3    0.554       0.587      1.06          2.17
si3PLS5            1083       1196–3278                                   2    0.560       0.603      1.08          2.11
si2PLS15           242        1196–1660                                   4    0.530       0.541      1.02          2.35
si3PLS15           361        1196–1890                                   6    0.515       0.523      1.02          2.43
si2PLS25           146        1198–1335; 1614–1753                        7    0.452       0.396      0.88          3.22
si3PLS25           217        640–779; 1475–1753                          3    0.511       0.529      1.04          2.41
si2PLS35           104        1502–1701                                   4    0.507       0.519      1.02          2.45
si3PLS35           156        1201–1300; 1603–1801                        5    0.471       0.486      1.03          2.62
biPLS5             720        1892–3278                                   2    0.568       0.626      1.10          2.03
biPLS15            360        1892–2121; 2355–2584; 3049–3278             2    0.553       0.611      1.11          2.08
biPLS25            216        1753–2029; 3142–3278                        2    0.542       0.554      1.02          2.30
biPLS35            206        1703–1801; 1903–2002; 2401–2497; 3188–3284  2    0.534       0.546      1.02          2.33
dyn-biPLS          362        815–887; 1660–1990; 2324–2445; 3203–3371    3    0.524       0.531      1.01          2.40

Variables, number of variables; LVs, number of latent variables.

of 2.5% in the study of Petisco et al.16 Thus further study should be conducted to improve the model by collecting more samples from a wide span of cultivation seasons and agronomic conditions to guarantee a large variation of rapeseed protein content. Overall, however, our results demonstrated the potential of FTIR-PAS to measure rapeseed protein content in terms of both model predictive ability and robustness. Moreover, FTIR-PAS was a worthwhile option because it required just 100 mg of sample and no sample pretreatment, compared to the several grams of sample, sample grinding or uniformity of sample particles usually required by NIRS. Full-spectrum modeling seemed cumbersome and would include uninformative variables that could degrade calibration equations. Relevant variable selection was therefore performed to refine calibration models.

Figure 2. Scatter plots of observed and predicted values obtained from the full-spectrum PLS model.


The RMSEP of 0.622% was lower than the 0.65% reported by Mika et al.13 and 0.77% reported by Velasco and Mollers.12 However, Petisco et al.16 presented a better result with an RMSEP of 0.53%. The RMSECV, however, was not provided in their work. Also, the ratio RMSEP/RMSECV was 1.10, showing that the full-spectrum PLS model was robust. Previous studies did not give this statistic. Judging from the RPD of 2.05, the model was good for the quantification of rapeseed protein content. However, the RPD was far lower than the 3.57 published by Velasco and Mollers12 and the 4.72 published by Petisco et al.16 The poorer RPD might be associated with the accuracy of the reference values obtained by the Kjeldahl method, but was mainly explained by the small standard deviation (SD) of the reference values. In our study, the SD in the prediction set was 1.273%, which was less than half the SD of 2.7% in the study of Velasco and Mollers,12 and was also far lower than the SD


iPLS model

The best interval was selected and used for PLS modeling when the full spectrum was divided into 5, 15, 25 and 35 subintervals. Table 2 shows the optimal prediction results obtained from the iPLS5, iPLS15, iPLS25 and iPLS35 models. Every subinterval PLS model performed better than the full-spectrum PLS model in terms of RMSECV, RMSEP, RMSEP/RMSECV and RPD. This result verified the advantage of variable selection in improving calibration models. The best prediction results were achieved by the iPLS35 model, i.e. where the full spectrum was divided into 35 subintervals. The RPD of the model was 3.02, which was an increase of about 47.7% compared to the RPD of 2.05 in the full-spectrum PLS model. Figure 3 shows an overview of the selection results of iPLS with the 35 subintervals. In the figure, the heights of the rectangles represent the RMSECV of each subinterval PLS model and the horizontal line corresponds to the RMSECV of the full-spectrum PLS model.


Figure 3. RMSECV for the 35 interval models obtained by iPLS, with the number of latent variables of each local PLS model shown as italic numbers inside the rectangles, and RMSECV for the full-spectrum model with nine latent variables (horizontal black line).

the protein C=O stretching and N–H deformation vibrations, and the second was 1198–1335 cm−1, associated with the C–N stretching vibration. From Table 2, it can be observed that the first region selected is identical to that selected by iPLS25, and the second region was an additionally selected region. The fact that adding a new region led to an improved model indicated that spectral variables associated with rapeseed protein were not confined to one spectral interval, which aligned well with the chemical assignment of the two selected regions. The siPLS method was therefore preferable when more than one characteristic band was related to the sample component of interest. Besides, siPLS also showed the advantage of significantly reducing the number of modeling variables compared to the full-spectrum PLS. In the si2PLS25 model, only 146 variables were employed.

Figure 4. Scatter plots of observed and predicted values obtained by the siPLS model with 25 intervals and a combination of two intervals.
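The exhaustive search behind siPLS can be sketched briefly. The following Python example, using only the standard library, is an illustration and not the iToolbox implementation; `sipls_search` and the toy scoring function are hypothetical names, and in a real analysis the score would be the RMSECV of a PLS model built on the combined intervals. It also counts how many candidate models each setting used in this study implies.

```python
from itertools import combinations
from math import comb


def sipls_search(n_intervals, n_combined, score):
    """Score every combination of n_combined intervals and return the best one.

    `score` takes a tuple of interval indices and returns a value to minimize
    (RMSECV in the study; a toy function below).
    """
    return min(combinations(range(n_intervals), n_combined), key=score)


# Number of candidate models for each interval setting, for 2 and 3 intervals.
for n in (5, 15, 25, 35):
    print(n, comb(n, 2), comb(n, 3))


def toy_rmsecv(subset):
    """Hypothetical score: intervals 7 and 11 carry signal, the rest add noise."""
    informative = {7, 11}
    chosen = set(subset)
    return 1.0 - 0.3 * len(chosen & informative) + 0.05 * len(chosen - informative)


best_pair = sipls_search(25, 2, toy_rmsecv)
print(best_pair)
```

The search is exhaustive only over combinations of a fixed size, so its cost grows quickly with the number of intervals and the number combined, which is one practical reason for limiting the combinations to two or three intervals.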

Evidently, some subinterval PLS models produced lower RMSECV than the full-spectrum PLS model, and the 12th subinterval was the best, with the lowest RMSECV. Further, the selected variables, located in the range 1603–1703 cm−1, corresponded well to the C=O stretching of the amide I band and the N–H deformation of proteins, which lent interpretability to the resultant model. Also, only 53 variables were retained, thus greatly reducing the number of modeling variables and leading to a parsimonious model.
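The iPLS procedure discussed here can be illustrated with a small, self-contained sketch. It is written in Python with the standard library only, uses a basic NIPALS PLS1 routine rather than the SIMPLS algorithm and iToolbox used in the study, and runs on synthetic data; all function names, the random data and the cap of three latent variables are assumptions made for illustration. Each interval gets its own PLS model, the number of latent variables is chosen by the lowest leave-five-out RMSECV, and the interval with the lowest RMSECV is selected.

```python
import random
from math import sqrt


def split_intervals(n_vars, n_intervals):
    """Contiguous, non-overlapping, near-equal (start, stop) column ranges."""
    base, extra = divmod(n_vars, n_intervals)
    bounds, start = [], 0
    for k in range(n_intervals):
        stop = start + base + (1 if k < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def pls1_fit(X, y, n_comp):
    """NIPALS PLS1 on mean-centered data; returns means and (w, p, c) per component."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    f = [v - y_mean for v in y]
    comps = []
    for _ in range(n_comp):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # no covariance left between X and y
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        c = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - c * t[i] for i in range(n)]
        comps.append((w, p, c))
    return x_mean, y_mean, comps


def pls1_predict(model, x):
    """Predict one sample by applying the same deflation sequence as in fitting."""
    x_mean, y_mean, comps = model
    e = [xi - mi for xi, mi in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, c in comps:
        t = sum(ei * wi for ei, wi in zip(e, w))
        y_hat += c * t
        e = [ei - t * pi for ei, pi in zip(e, p)]
    return y_hat


def rmsecv(X, y, n_comp, fold_size=5):
    """Leave-five-out cross-validation error using consecutive blocks of samples."""
    n, sq_err = len(X), 0.0
    for start in range(0, n, fold_size):
        held = set(range(start, min(start + fold_size, n)))
        train = [i for i in range(n) if i not in held]
        model = pls1_fit([X[i] for i in train], [y[i] for i in train], n_comp)
        sq_err += sum((pls1_predict(model, X[i]) - y[i]) ** 2 for i in held)
    return sqrt(sq_err / n)


def ipls(X, y, n_intervals, max_comp=3):
    """Return (RMSECV, interval index, latent variables) of the best interval, plus all results."""
    results = []
    for k, (a, b) in enumerate(split_intervals(len(X[0]), n_intervals)):
        Xk = [row[a:b] for row in X]
        score, n_comp = min((rmsecv(Xk, y, lv), lv) for lv in range(1, max_comp + 1))
        results.append((score, k, n_comp))
    return min(results), results


# Synthetic "spectra": only columns 10-14 (interval 2 of 6) relate to y.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(30)] for _ in range(40)]
y = [sum(row[10:15]) + rng.gauss(0, 0.1) for row in X]
(best_rmsecv, best_interval, best_lv), all_results = ipls(X, y, n_intervals=6)
print(best_interval, best_lv)
```

On this toy data the informative interval stands out clearly; with real spectra the RMSECV differences between intervals are much smaller, which is why the selected band should also be checked against the chemical assignment.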


dyn-biPLS model

In the interval selection by biPLS, it could happen that the border between two adjacent subintervals was located within a specific band, and then one of the neighboring subintervals could be removed while the other was retained, which would impair the integrity of the band and cause a loss of relevant variables in


siPLS model

In performing siPLS, the full spectrum was also divided into 5, 15, 25 and 35 subintervals. For each case, the number of synergy intervals was set to 2 and 3, respectively, which led to eight siPLS models in total. Table 2 gives the prediction results of the optimal siPLS models. Likewise, all models gave better prediction results than the full-spectrum PLS model. In particular, the si2PLS25 model, where the full spectrum was divided into 25 subintervals and the number of combined intervals was set to 2, gave the best prediction accuracy. Moreover, the model was better than the aforementioned iPLS35 model, owing to its higher RPD of 3.22 and lower RMSEP/RMSECV of 0.88. Figure 4 shows the scatter plot of the si2PLS25 model between the observed and predicted values for both the calibration and prediction sets. A much tighter cluster of samples along the reference line could be observed compared to the full-spectrum PLS model. Further, the selected spectral variables were situated in two regions. The first region was 1614–1753 cm−1, corresponding to

biPLS model

biPLS was implemented with the full spectrum divided into 5, 15, 25 and 35 subintervals. The prediction results of the optimal model in each case are given in Table 2. First, no overfitting appeared in any of the resultant models according to the RMSEP/RMSECV values. Except for the biPLS5 model, all others achieved better RPD values than the full-spectrum PLS model. The highest RPD of 2.33 was obtained by the biPLS35 model. The model included four subintervals, i.e. the ranges of 1703–1801, 1903–2002, 2401–2497 and 3188–3284 cm−1. Those subintervals seemed not to form a direct association with rapeseed protein. However, variables relevant to rapeseed protein were indeed concentrated through the selection of biPLS35, since the biPLS35 model performed better than the full-spectrum PLS model. However, the RPD of 2.33 in this model was much lower than the value of 3.02 in the iPLS35 model and of 3.22 in the si2PLS25 model, which implied that more uninformative variables were incorporated in the biPLS35 model compared to the two other models. Moreover, biPLS did not necessarily achieve better prediction accuracy than iPLS or siPLS. Although biPLS allowed model optimization from a wide range of choices, it was not a true global search method. Of course, biPLS35 also achieved an evident reduction of modeling variables compared to full-spectrum PLS modeling.
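The remark that biPLS is not a true global search follows from its greedy structure, which a short sketch makes concrete. The Python example below is an illustration under assumptions, not the iToolbox code: `bipls_search` and the toy scoring function are hypothetical, and the score stands in for the RMSECV of a PLS model built on the retained intervals. At each step only single-interval removals from the current subset are tried, so most of the possible subsets are never evaluated.

```python
def bipls_search(n_intervals, score):
    """Greedy backward elimination over intervals.

    `score` takes a list of retained interval indices and returns a value to
    minimize. Returns the best subset seen along the elimination path and its score.
    """
    current = list(range(n_intervals))
    best_subset, best_score = list(current), score(current)
    while len(current) > 1:
        # Try dropping each remaining interval; keep the drop with the lowest score.
        trial_score, drop = min(
            (score([k for k in current if k != d]), d) for d in current
        )
        current.remove(drop)
        if trial_score < best_score:
            best_subset, best_score = list(current), trial_score
    return best_subset, best_score


def toy_rmsecv(subset):
    """Hypothetical score: intervals 2 and 7 help, every other interval hurts."""
    informative = {2, 7}
    chosen = set(subset)
    return 1.0 + 0.1 * len(chosen - informative) - 0.5 * len(chosen & informative)


best_subset, best_score = bipls_search(10, toy_rmsecv)
print(best_subset, best_score)
```

With ten intervals the greedy path evaluates a few dozen subsets, whereas the number of non-empty subsets is 1023; a removal made early on cannot be undone later, which is how informative intervals can be lost.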


ACKNOWLEDGEMENTS

This work was supported by the National Natural Science Foundation of China (40871113) and the Innovative Project for Young Scientists from the Chinese Academy of Sciences (KZCX2YW-QN411). We are genuinely grateful to three anonymous reviewers for valuable comments on an earlier manuscript.

REFERENCES

Figure 5. Frequency for selection of each variable by dyn-biPLS. High frequencies correspond to high possibility for selection of variables. The horizontal line indicates a frequency threshold above which variables are selected.

the resultant model. The problem could be handled by using a dynamic number of subintervals in place of a fixed number of subintervals. Here, dyn-biPLS was performed with the number of subintervals varying from 16 to 25, and with four successive runs. The selection frequency of each variable is shown in Fig. 5, and the prediction results of the resultant model are given in Table 2. From Fig. 5 and Table 2, dyn-biPLS selected four subintervals, i.e. 815–887, 1660–1990, 2324–2445 and 3203–3371 cm−1. Compared to the four regions selected by biPLS35, these four regions seemed more correlated with nitrogenous compounds in rapeseed, since the bands of 1660–1990, 2324–2445 and 3203–3371 cm−1 involved the C=O stretching vibration from proteins or amides,58 the C≡N stretching vibration58 and the N–H stretching vibration,62 respectively. Thus the selected subintervals could be anticipated to derive a model better than the biPLS35 model, which was confirmed by a slightly higher RPD of 2.40. The number of retained variables, however, was larger than that in biPLS35. Further selection could be done through the GA-PLS or SPA-PLS algorithm to obtain a simplified model.
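The frequency-based selection summarized in Fig. 5 can be sketched in a few lines. The Python example below is only an illustration of the idea, assuming that each backward-elimination run returns the indices of the variables it retained and that variables selected more often than a threshold are kept; the function names, the toy runs and the threshold value are hypothetical, and the exact rule used by the iToolbox may differ.

```python
from collections import Counter


def selection_frequency(runs, n_vars):
    """Count in how many runs each variable index was retained."""
    counts = Counter(v for selected in runs for v in set(selected))
    return [counts[v] for v in range(n_vars)]


def select_above_threshold(freq, threshold):
    """Keep variables whose selection frequency lies above the threshold."""
    return [v for v, f in enumerate(freq) if f > threshold]


# Four hypothetical runs over 8 variables, each with a different interval layout.
runs = [[0, 1, 2, 5], [1, 2, 3], [2, 5, 6], [1, 2, 5]]
freq = selection_frequency(runs, n_vars=8)
kept = select_above_threshold(freq, threshold=2)
print(freq, kept)
```

Because interval borders shift between runs, a variable near a band edge can still accumulate a high frequency even if one particular layout splits the band, which is the motivation given in the text for varying the number of subintervals.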

CONCLUSIONS


The first attempt to quantify rapeseed protein content using FTIR-PAS was made in this study. The full-spectrum PLS model achieved an RPD of 2.05, which showed the potential of FTIR-PAS to measure the protein content of rapeseed. FTIR-PAS was a promising analytical tool, particularly considering that it required tiny amounts of sample and no sample pretreatment. However, further study should be done by collecting more samples with a wide background variation to enhance the predictive ability of models. Interval selection is a useful method to simplify PLS models and improve their prediction accuracy. Compared to the full-spectrum PLS model, all four interval selection methods adopted in this study (iPLS, siPLS, biPLS and dyn-biPLS) produced better models. Among the resultant models, the best result was achieved by the si2PLS25 model, with the highest RPD of 3.22 and relevant bands of 1198–1335 and 1614–1753 cm−1. The iPLS35 model was second best, with an RPD of 3.02, and extracted the relevant band of 1603–1703 cm−1. siPLS outperformed iPLS when the spectral information associated with the sample component of interest was distributed over several bands. dyn-biPLS was expected to derive better prediction results than biPLS since the former could capture more relevant variables.


1 Cardone M, Mazzoncini M, Menini S, Rocco V, Senatore A, Seggiani M et al., Brassica carinata as an alternative oil crop for the production of biodiesel in Italy: agronomic evaluation, fuel production by transesterification and characterization. Biomass Bioenerg 25:623–636 (2003).
2 Li PW, Yang M and Zhang W, Studies on quality of oilseed products and its improvement strategy in China. Chinese J Oil Crop Sci 26:84–88 (2004).
3 Bunting ES, Production and Utilization of Amino acid in Oilseed Crops. Martinus Nijhoff, Leiden, pp. 3–11 (1981).
4 Gomez-Campo C, Biology of Brassica Coenospecies. Elsevier, Amsterdam, pp. 413–460 (1999).
5 Simonne AH, Simonne EH and Eitenmiller RR, Could the Dumas method replace the Kjeldahl digestion for nitrogen and crude protein determinations in foods? J Sci Food Agric 73:39–45 (1997).
6 Norris KH, Design and development of a new moisture meter. Can Agric Eng 45:370–372 (1964).
7 Batten GD, Plant analysis using near infrared reflectance spectroscopy: the potential and the limitations. Aust J Exp Agric 38:697–706 (1998).
8 Williams PC and Norris K, Near Infrared Technology in the Agriculture and Food Industries. American Association of Cereal Chemists, St Paul, MN (2001).
9 Tkachuk R, Oil and protein analysis of whole rapeseed kernels by near infrared reflectance spectroscopy. J Am Oil Chem Soc 58:819–822 (1981).
10 Panford JA, Williams PC and de Man JM, Analysis of oilseeds for protein, oil, fiber and moisture by near-infrared reflectance spectroscopy. J Am Oil Chem Soc 65:1627–1634 (1988).
11 Tillmann P, Reinhardt TC and Paul C, Networking of near infrared spectroscopy instruments for rapeseed analysis: a comparison of different procedures. J Near Infr Spectrosc 8:101–107 (2000).
12 Velasco L and Mollers C, Nondestructive assessment of protein content in single seeds of rapeseed (Brassica napus L.) by near-infrared spectroscopy. Euphytica 123:89–93 (2002).
13 Mika V, Tillimann P, Koprna R, Nerusil P and Kucera V, Fast prediction of quality parameters in whole seeds of oilseed rape (Brassica napus L.). Plant Soil Environ 49:141–145 (2003).
14 Font R, Del Rio M and De Haro Bailon A, The use of near-infrared spectroscopy in the study of seed quality components in plant breeding programs. Ind Crop Prod 24:307–313 (2006).
15 Hom N, Becker HC and Mollers C, Non-destructive analysis of rapeseed quality by NIRS of small seed samples and single seeds. Euphytica 153:27–34 (2007).
16 Petisco C, Garcia-Criado CB, Vazquez-de-Aldana BR, de Haro A and Garcia-Ciudad A, Measurement of quality parameters in intact seeds of Brassica species using visible and near-infrared spectroscopy. Ind Crop Prod 32:139–146 (2010).
17 Bokobza L, Near infrared spectroscopy. J Near Infr Spectrosc 6:3–17 (1998).
18 Yang H and Irudayaraj J, Rapid determination of vitamin C by NIR, MIR and FT-Raman techniques. J Pharm Pharmacol 54:1247–1255 (2002).
19 Wu D, Nie PC, He Y and Bao YD, Determination of calcium content in powdered milk using near and mid-infrared spectroscopy with variable selection and chemometrics. Food Bioprocess Tech 5:1402–1410 (2012).
20 McClelland JF, Jones RW and Bajic SJ, FT-IR photoacoustic spectroscopy, in Handbook of Vibrational Spectroscopy, ed. by Chalmers JM and Griffiths PR. Wiley, Chichester, pp. 1231–1250 (2002).
21 Schmid T, Photoacoustic spectroscopy for process analysis. Anal Bioanal Chem 384:1071–1086 (2006).
22 Zhang Y, Barber A, Maxted J, Lowe C, Smith R and Li T, The depth profiling of TiO2 pigmented coil coatings using step scan phase modulation photoacoustic FTIR. Prog Org Coat 76:131–136 (2013).

c 2013 Society of Chemical Industry 

J Sci Food Agric 2014; 94: 2239–2245

Determination of protein content in rapeseeds using FTIR-PAS 23 Du CW, Linker R and Shaviv A, Characterization of soil using photoacoustic mid-infrared spectroscopy. Appl Spectrosc 61:1063–1067 (2007). 24 Michaelian KH and Wen Q, Photoacoutic infrared spectroscopy of solids. J Phys Conf Ser 214:012004 (2010). 25 Neubert R, Collin B and Wartewig S, Direct determination of drug content in semisolid formulations using step-scan FT-IR photoacoustic spectroscopy. Pharmacol Res 14:946–948 (1997). 26 Bjarnestad S and Dahlman O, Chemical compositions of hardwood and softwood pulps employing photoacoustic Fourier transform infrared spectroscopy in combination with partial least-squares analysis. Anal Chem 74:5851–5858 (2002). 27 Nishi K, Dang VQ and Nguyen KL, Determination of carboxyl content in high-yield kraft pulps using photoacoustic rapid-scan Fourier transform infrared spectroscopy. Anal Chem 78:6818–6825 (2006). 28 Dang VQ, Bhardwaj NK, Hoang V and Nguyen KL, Determination of lignin content in high-yield kraft pulps using photoacoustic rapid scan Fourier transform infrared spectroscopy. Carbohydr Polym 68:489–494 (2007). 29 Du CW, Zhou JM and Wang HY, Determination of soil properties using Fourier transform mid-infrared photoacoustic spectroscopy. Vib Spectrosc 49:32–37 (2009). 30 Du CW and Zhou JM, Application of infrared photoacoustic spectroscopy in soil analysis. Appl Spectrosc Rev 46:405–422 (2011). 31 Spiegelman CH, McShane MJ, Goetz MJ, Motamedi M, Yue QL and Cote GL, Theoretical justification of wavelength selection in PLS calibration: development of a new algorithm. Anal Chem 70:35–44 (1998). 32 Borin A and Poppi PJ, Application of mid infrared spectroscopy and iPLS for the quantification of contaminants in lubricating oil. Vib Spectrosc 37:27–32 (2005). 33 Roman MB and Sergey VS, Variable selection in near-infrared spectroscopy: benchmarking of feature selection methods on biodiesel data. Anal Chim Acta 692:63–72 (2011). 
34 Anderson CM and Bro R, Variable selection in regression: a tutorial. J Chemometr 24:728–737 (2010). 35 Lindfnen F, Geladi P, Rannar S and Wold S, Interactive variable selection (IVS) for PLS. Part 1: Theory and algorithms. J Chemometr 8:349–363 (1994). 36 Centner V, Massart DL, de Noord OE and de Jong S, Elimination of uninformative variables for multivariate calibration. Anal Chem 68:3851–3858 (1996). 37 Forina M, Casolino C and Pizarro MC, Iterative predictor weighting (IPW) PLS: a technique for the elimination of useless predictors in regression problems. J Chemometr 13:165–184 (1999). 38 Leardi R, Application of genetic algorithm-PLS for feature selection in spectral data sets. J Chemometr 14:643–655 (2000). 39 Araujo MCU, Saldanha TCB, Galvao RKH, Yoneyama T, Chame HC and Visani V, The successive projections algorithm for variable selection in spectroscopic multicomponent analysis. Chemometr Intell Lab Syst 57:65–73 (2001). 40 Norgaard L, Saudland A, Wanger J, Nielsen JP, Munck L and Engelsen SB, Interval partial least-squares regression (iPLS): a comparative chemometric study with an example from near-infrared spectroscopy. Appl Spectrosc 54:413–419 (2000). 41 Zou XB, Zhao JW and Li YX, Selection of the efficient wavelength regions in FT-NIR spectroscopy for determination of SSC of ‘Fuji’ apple based on BiPLS and FiPLS models. Vibr Spectrosc 44:220–227 (2007). 42 Ferrao MF, Viera MS, Pazos REP, Fachini D, Gerbase AE and Marder L, Simultaneous determination of quality parameters of

www.soci.org

43

44 45

46 47 48 49 50 51 52 53 54 55

56

57 58 59 60 61 62

biodiesel/diesel blends using HATR-FTIR spectra and PLS, iPLS or siPLS regressions. Fuel 90:701–706 (2011). Munck L, Nielsen JP, Moller B, Jacobsen S, Sondergaard I, Engelsen SB et al., Exploring the phenotypic expression of a regulatoryproteome-altering gene by spectroscopy and chemometrics. Anal Chim Acta 446:171–186 (2001). Leardi R and Norgaard L, Sequential application of backward interval partial least squares and genetic algorithms for the selection of relevant spectral regions. J Chemometr 18:486–497 (2004). Norgaard L, Hahn MT, Knudsen LB, Farhat LA and Engelsen SB, Multivariate near-infrared and Raman spectroscopic quantifications of the crystallinity of lactose in whey permeate powder. Int Dairy J 15:1261–1270 (2005). Savitzky A and Golay MJE, Smoothing and differentiation of data by simplified least squares procedures. Anal Chem 36:1627–1632 (1964). Kennard RW and Stone LA, Computer aided design of experiments. Technometrics 11:137–148 (1969). Galvao RKH, Araujo MCU, Marcio GEJ, Pontes JC, Silva EC and Saldanha TCB, A method for calibration and validation subset partitioning. Talanta 67:736–740 (2005). Wold S, Sjostrom M and Eriksson L, PLS-regression: a basic tool of chemometrics. Chemometr Intell Lab Syst 58:109–130 (2001). De Jong S, SIMPLS: an alternative approach to partial least squares regression. Chemometr Intell Lab Syst 18:251–263 (1993). Browne MW, Cross-validation methods. J Math Psychol 44:108–132 (2000). Wu D, He Y, Nie PC, Cao F and Bao YD, Hybrid variable selection in visible and near-infrared spectral analysis for non-invasive quality determination of grape juice. Anal Chim Acta 659:229–237 (2010). Williams PC and Sobering DC, Comparison of commercial near infrared transmittance and reflectance instruments for analysis of whole grains and seeds. J Near Infrared Spectrosc 1:25–32 (1993). Viscarra Rossel RA, McGlynn RN and McBratney AB, Determining the composition of mineral-organic mixes using UV-vis-NIR diffuse reflectance spectroscopy. 
Geoderma 137:70–82 (2006). Marie-Madeleine C, Lina S, Dominique H and Dimas A. Determination of water-soluble and total extractable polyphenolics in biomass, necromass and decomposing plant material using near-infrared reflectance spectroscopy (NIRS). Soil Biol Biochem 37:795–799 (2005). Ana A, Antonio S, Philippe R, Luc EP, Jean-Paul C, Manfred S et al., A common near infrared-based partial least squares regression model for the prediction of wood density of Pinus pinaster and Larix × eurolepis. Wood Sci Tchnol 46:157–175 (2012). Hubert M, Rousseeuw PJ and Vanden K, ROBPCA: a new approach to robust principal component analysis, Technometrics 47:64–79 (2005). Fan KN, Introduction to Spectroscopy. High Education Press, Beijing, pp. 63–73 (2001). Yang H and Irudayaraj J, Characterization of semisolid fats and edible oils by Fourier transform infrared photoacoustic spectroscopy. J Am Oil Chem Soc 7:291–295 (2000). Barth A, Infrared spectroscopy of proteins. Biochim Biophys Acta 1767:1073–1101 (2007). Cai S and Singh BR, A distinct utility of the amide III infrared band for secondary structure estimation of aqueous protein solutions using partial least squares methods. Biochemistry 43:2541–2549 (2004). Irudayaraj J, Sivakesava S, Kamath S and Yang H, Monitoring chemical changes in some foods using Fourier transform photoacoustic spectroscopy. J Food Sci 66:1416–1421 (2001).
