Food Chemistry 176 (2015) 403–410

Contents lists available at ScienceDirect

Food Chemistry journal homepage: www.elsevier.com/locate/foodchem

Analytical Methods

Estimating cocoa bean parameters by FT-NIRS and chemometrics analysis

Ernest Teye a,b,⇑, Xingyi Huang a, Livingstone K. Sam-Amoah b, Jemmy Takrama c, Daniel Boison d, Francis Botchway e, Francis Kumi b

a School of Food and Biological Engineering, Jiangsu University, Xuefu Road 301, Zhenjiang 212013, Jiangsu, PR China
b School of Agriculture, Department of Agricultural Engineering, University of Cape Coast, Cape Coast, Ghana
c Cocoa Research Institute, Physiology & Biochemistry Division, New Tafo-Akim, Ghana
d School of Biological Sciences, Department of Biochemistry, University of Cape Coast, Cape Coast, Ghana
e Quality Control Company Limited, Cocobod Kade, Ghana

Article info

Article history:
Received 17 March 2014
Received in revised form 20 August 2014
Accepted 11 December 2014
Available online 18 December 2014

Keywords: FT-NIRS; Cocoa bean categories; pH; Fermentation index; Multivariate algorithms

Abstract

Rapid analysis of cocoa beans is important for quality assurance and control. In this study, Fourier transform near infrared spectroscopy (FT-NIRS) and chemometric techniques were used to estimate cocoa bean quality categories, pH and fermentation index (FI). The models were optimised by cross-validation and assessed by identification rate (%), correlation coefficient (Rpre) and root mean square error of prediction (RMSEP) in the prediction set. The optimal identification model, a back propagation artificial neural network (BPANN), achieved an identification rate of 99.73% with 5 principal components. The efficient variable selection model derived by synergy interval back propagation artificial neural network regression (Si-BPANNR) was superior for pH and FI estimation: the Si-BPANNR model gave Rpre = 0.98 and RMSEP = 0.06 for pH, and Rpre = 0.98 and RMSEP = 0.05 for FI. The results demonstrate that FT-NIRS together with BPANN and Si-BPANNR models could successfully be used for the examination of cocoa beans.

© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Ghana is the second leading producer of cocoa beans worldwide, and its cocoa beans continue to command a high premium price on the world market because of their high quality. They are also the preferred choice of chocolate and beverage producers of high reputation (COCOBOD, 2013). Cocoa-derived products are reported to have several health benefits: they reduce coronary artery disease and act as a myocardial stimulant, diuretic, coronary dilator and muscle relaxant (Di Castelnuovo, di Giuseppe, Iacoviello, & de Gaetano, 2012; Kim & Keeney, 2006; Kris-Etherton & Keen, 2002). The post-harvest handling of cocoa beans involves two main processes, fermentation and drying. After harvesting, the seeds housed in the mucilaginous pulp of the pod are taken out, fermented and dried. These two activities are particularly crucial in determining the flavour and aroma quality of the cocoa bean.

⇑ Corresponding author at: School of Agriculture, Department of Agricultural Engineering, University of Cape Coast, PMB Cape Coast, Ghana. Tel.: +233 2431 70302; fax: +233 3321 32709. E-mail addresses: [email protected], [email protected] (E. Teye).
http://dx.doi.org/10.1016/j.foodchem.2014.12.042
0308-8146/© 2014 Elsevier Ltd. All rights reserved.

For instance, good flavour in chocolate is closely linked to good fermentation and drying. Errors committed during these two processes cannot be corrected in subsequent processing (Minifie, 1989). Partially fermented or unfermented cocoa beans result in bitter and astringent cocoa-derived products with no chocolate flavour (Jalil & Ismail, 2008). Fermentation also produces flavour and aroma precursors and reduces the astringency and bitterness of the beans. Drying is equally essential, as it prevents the growth of moulds that impart unpleasant flavours to the beans. Fermentation and drying therefore work synergistically to give high quality. The quality of cocoa beans is examined by cut-test score or sensory evaluation. However, cut-test and sensory analysis are subjective and often inconsistent because of human error arising from the fatigue or mood of the assessor. For instance, cut-test and sensory evaluation results have been shown to be variable and inconsistent (Ilangantileke, Wahyudi, & Bailon, 1991). Furthermore, the analytical methods normally used to examine cocoa beans are expensive, time consuming and destructive, involve chemical usage, and are very often tedious, particularly when many samples are analysed. Near infrared spectroscopy (NIRS) is an advanced analytical tool. It is fast, simple, non-destructive and does not involve


chemical use or elaborate sample preparation. Coupled with recent advances in computing and chemometrics, NIRS has been applied in various sectors, including the agricultural, pharmaceutical, petrochemical, medical, polymer and food industries. NIRS has been used to determine phytochemicals and other food quality parameters, for example xanthines and polyphenols in bakery products (Bedini et al., 2013); fats, caffeine, theobromine and epicatechin in unfermented criollo cocoa (Álvarez et al., 2012); and polyphenol contents in green tea (Chen, Zhao, Liu, Cai, & Liu, 2008). However, a thorough literature search found no report on the use of NIRS and chemometric techniques for the simultaneous estimation of cocoa bean category, pH and fermentation index.

Fermentation index (FI) and pH are essential attributes of cocoa bean quality. FI is a good marker of the degree of fermentation of cocoa beans (Pettipher, 1986). FI also correlates significantly with reducing sugars, free amino acids, pH and the cotyledon colour of cocoa beans (Ilangantileke et al., 1991). FI > 1 means that the cocoa bean mass was adequately fermented (Nazaruddin, Seng, Hassan, & Said, 2006). Moreover, a pH < 4.5 is not accepted by cocoa bean processors because it leads to low levels of flavour precursors and over-acidic derived products, while a pH of 5–6 is considered good for flavour development (Saltini, Akkerman, & Frosch, 2013). Consequently, FI and pH could be used to assess the quality of cocoa beans and to check fraudulent activities in the cocoa industry.

The objective of the present study was to develop models for the rapid, non-destructive estimation of cocoa bean category (fermented, partially fermented and unfermented), pH and fermentation index (FI) by FT-NIR spectroscopy together with linear and non-linear algorithms. Partial least square discriminant analysis (PLSDA) and back propagation artificial neural network (BPANN) were used to identify the quality categories. Different partial least squares algorithms (PLS, iPLS and SiPLS), back propagation artificial neural network regression (BPANNR) and an efficient variable selection technique, synergy interval back propagation artificial neural network regression (Si-BPANNR), were used to develop prediction models for pH and FI. The combination of synergy interval selection and BPANNR was tried as a new technique and compared with the others. Theoretical and experimental evidence has shown that spectral band selection can significantly improve the performance of a model (Chen, Zhao, Liu, Cai, & Liu, 2008; Nørgaard et al., 2000).

2. Materials and methods

2.1. Samples preparation

The samples used in this study were acquired from Ghana and comprised three main cocoa bean categories: fermented (FM = 80 samples), partly fermented (PF = 25 samples) and unfermented (UFM = 25 samples). The cocoa bean samples were powdered separately by grinding with a small multi-purpose grinder (QE-100, Zhejiang YiLi Tool Co., Ltd., China) for 15 s and sieved through a 400 µm mesh. The grinder was allowed to cool down between successive grindings to reduce the loss of volatile compounds. The samples were then analysed immediately.

2.2. FT-NIR spectral acquisition

The spectrum of each sample was taken in reflectance mode using an Antaris II Fourier transform near infrared spectrophotometer (Thermo Electron Company, USA) with an integrating sphere. For each sample, 10 ± 0.1 g was collected into a standard sample cup and scanned three times, with the cup rotated by 120° after each scan. All experiments were conducted at an ambient temperature of 25 ± 1 °C with humidity kept at 60%. Each spectrum was the average of 32 scans over the range 10,000–4000 cm⁻¹. The raw data were recorded at 3.856 cm⁻¹ intervals with a spectral resolution of 8.0 cm⁻¹, resulting in 1557 variables. The reflectance (R) data were stored as log(1/R). The mean of the spectra from the same sample was used for further analysis.

2.3. Software

All chemometric techniques were carried out in Matlab Version 7.14 (Mathworks Inc., USA) under Windows 7 Ultimate.

2.4. Pre-processing attempt

Mathematical pre-processing methods have become extremely useful for the initial processing of spectral data sets. There are several pre-processing techniques, with the central aim of reducing the number of dimensions and eliminating errors while maintaining as much as possible the differences and similarities between the observations. Derivative methods replace each spectrum with its derivative. They are very powerful for correcting additive and multiplicative baseline variations in the spectra, but they also enhance noise. This problem is resolved by the Savitzky-Golay smoothing algorithm. Smoothing removes random noise from spectral data and improves the visual aspect of the NIR spectra (Næs, Isaksson, Fearn, & Davies, 2002). Savitzky-Golay is a moving window method in which the data in the window are fitted by a polynomial of a certain degree and the central point of the window is replaced by the value of the polynomial. In an initial trial, smoothing plus first derivative (Smooth-1der) was found to be superior to multiplicative scatter correction (MSC), mean centring (MC) and standard normal variate (SNV); hence it was applied in this study.

Further, principal component analysis (PCA) was applied as an unsupervised pattern recognition method. PCA was used to reduce the dimensions of the data matrix by compressing the information into a few new variables known as principal components (PCs). PC1, PC2 and PC3 normally explain the useful information in descending order.

2.5. Data processing and analysis

In this study, all the samples were divided into two subsets: a calibration set (for developing the models) and a prediction set (for evaluating the actual predictive ability of the constructed models). As seen in Table 1, 90 samples were selected at random for the calibration set and 40 for the prediction set. The qualitative model was constructed by partial least square discriminant analysis (PLSDA) and back propagation artificial neural network (BPANN), and the identification rate (%) was estimated as the number of samples correctly identified divided by the total number of samples used.

Table 1
Reference measurements of pH and FI in the calibration and prediction sets.

Subsets      NS*   pH                          Fermentation index (FI)
                   Range       Mean    Std     Range         Mean    Std
Calibration  90    4.84–6.25   5.455   0.359   0.367–1.368   0.897   0.307
Prediction   40    4.85–6.12   5.469   0.352   0.375–1.358   0.905   0.310

NS* = number of samples.

Partial least square discriminant analysis (PLSDA) is a linear identification method that applies the principle of partial least


squares regression to variables that indicate group membership. For more details, see Chevallier, Bertrand, Kohler, and Courcoux (2006). An artificial neural network (ANN) is a typical non-linear and non-parametric technique designed to mimic the biological nervous system and capable of learning from examples (Kovalenko, Rippke, & Hurburgh, 2006). There are several types of ANN; the back propagation neural network was adopted because of its simplicity and strength. BPANN is based on an algorithm that corrects the weights within each layer in proportion to the error obtained from the previous layer (Ripley, 2008). The data are fed forward through the connected neurons of the network without feedback, while the error computed at the output side is propagated backward from the output layer to the hidden layer and finally to the input layer. Most applications of ANN to spectral information use PCs as input variables, and the efficiency of the ANN model is improved by additional pre-processing (Pérez-Marín, Garrido-Varo, & Guerrero, 2007). Therefore, Smooth-1der was applied in this work.

Models for the simultaneous prediction of pH and fermentation index (FI) were developed by partial least square (PLS), interval partial least square (iPLS), synergy interval partial least square (SiPLS), back propagation artificial neural network regression (BPANNR) and the efficient variable selection technique, synergy interval back propagation artificial neural network regression (Si-BPANNR). Partial least squares regression (PLS) is a classical multivariate linear regression tool that works on the full spectrum, and it is the foundation for iPLS, SiPLS and related methods. Because of the complex correlation between the samples and the NIR spectral data, using the full spectrum may result in over-fitting of the prediction model: the full spectrum contains irrelevant variables that reduce the precision of the model. Hence, selecting spectral ranges whose variables correlate with the target quality parameters is a vital step in developing an accurate prediction model. Nørgaard and co-workers proposed interval partial least squares (iPLS) and synergy interval partial least square (SiPLS) (Nørgaard et al., 2000), and each of these regression methods has its own strengths and weaknesses. Other researchers have shown that SiPLS is superior to iPLS and PLS (Chen, Zhao, Liu, Cai, & Liu, 2008; Jiang et al., 2012a).

SiPLS uses the principle of PLS. However, it is a variable selection method in which the data set is split into a number of intervals (variable wise) and PLS models are calculated for all possible combinations of 2, 3 or 4 intervals. The full spectrum of the samples (10,000–4000 cm⁻¹) was divided into 10, 11, 12, ..., 22 intervals, which were combined in groups of 2, 3 or 4 subintervals. The combination of intervals and the number of PLS factors were optimised by cross-validation according to the lowest root mean square error of cross-validation (RMSECV). The best combination of intervals was then selected and modelled.

The models were optimised by a leave-one-out cross-validation (LOO-CV) procedure and assessed according to the RMSECV, the correlation coefficient (Rcal) and the bias in the calibration set. The models were then tested with the prediction set using the correlation coefficient (Rpre), the root mean square error of prediction (RMSEP) and the bias. These parameters were calculated by Eqs. (1)–(4):

\[ \mathrm{RMSECV} = \sqrt{\frac{\sum_{i=1}^{n} \left(\hat{y}_{\setminus i} - y_i\right)^2}{n}} \tag{1} \]

\[ \mathrm{RMSEP} = \sqrt{\frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{n}} \tag{2} \]

\[ R = \sqrt{1 - \frac{\sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2}} \tag{3} \]

\[ \mathrm{Bias} = \frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)}{n} \tag{4} \]

where $n$ is the number of samples, $y_i$ is the reference measurement result for sample $i$, $\hat{y}_{\setminus i}$ is the estimated result for sample $i$ when the model is constructed with sample $i$ removed, $\hat{y}_i$ is the estimated result of the model for sample $i$, and $\bar{y}$ is the mean of the reference measurement results for all samples.

2.6. Reference measurement

2.6.1. Measurement of pH

The pH of the cocoa bean samples was determined according to the method used by other researchers (Nazaruddin et al., 2006). Fifty millilitres of boiled distilled water was added to 5 g of previously ground cocoa bean sample (400 µm) and allowed to stand for 30 min. The extract was filtered through Whatman No. 4 filter paper and the pH was measured with a sensitive digital pH meter (PHS-3TC; Shanghai Tianda Instrument Co., Ltd., China) with an accuracy of 0.001. All samples were measured in triplicate.

2.6.2. Measurement of fermentation index (FI)

Fermentation index was determined according to the method proposed by Gourieva and Tserevitinov (1979) and subsequently used by other researchers (Ilangantileke et al., 1991; Nazaruddin et al., 2006). Five hundred milligrams of previously ground (400 µm) sample was weighed into a 125 ml conical flask. Fifty millilitres of methanol:hydrochloric acid (93:3 v/v) was added and the mixture was stored in a refrigerator (8 ± 2 °C) for 16–18 h. The extract was filtered through Whatman No. 1 filter paper. The absorbance of the filtrate was read at 460 nm and 530 nm with a UV/Vis spectrophotometer (Rayleigh UV-1601; BRAIC, Beijing, China). The fermentation index was calculated as the ratio of the absorbance at 460 nm to the absorbance at 530 nm. All measurements were done in triplicate.

3. Results and discussion

3.1. Spectra examination and principal component analysis (PCA)

Fig. 1A shows the raw spectra of the original data after FT-NIRS measurement. It reveals major peaks caused by the stretching of the hydrogen-containing groups C–H, O–H and N–H in the cocoa bean samples. Fig. 1B shows that the Smooth-1der method gives the spectra a distinct profile, which could influence the model. Pre-processing is necessary in spectral analysis because the data acquired from a spectrometer contain background information and noise as well as relevant sample information. A close look at Fig. 1B shows major peaks that are possibly caused by phytochemicals such as polyphenols, alkaloids, proteins, and volatile, non-volatile and aromatic compounds present in the samples. Principal component analysis (PCA) after Smooth-1der gave a satisfactory cluster trend. The top three PCs extracted from the 130 samples explained 82.62%, 8.24% and 4.71% of the variance, respectively. Together, the top three PCs therefore explain 95.57% of the variance in the spectral data, which covers most of the useful chemical compositional information in the cocoa bean samples. As an unsupervised pattern recognition method, PCA brings out relevant information and discards the rest, so that similar samples are grouped closer to each other. Therefore, the graphical output could be used to ascertain the differences


Fig. 1. Raw spectra (A), Smooth-1derivative pre-processing (B), Smooth-1der PCA cluster score plot (C) and eigenvectors of the top three PCs (D) (in parentheses is the amount of variation explained) for the cocoa bean categories.

between the categories of samples used and to compare these differences with the distribution trend within each group. Fig. 1C shows that there were three major groups of cocoa bean samples in this experiment, and that the groups cover a wide range of cocoa beans. Some samples of fermented cocoa beans overlapped with the partly fermented cocoa beans, which suggests that some cocoa bean samples were not properly fermented but were mixed with fermented ones. This could be due to partly fermented beans being wrongly mixed with fermented ones. Unfermented cocoa beans were neatly isolated, while fermented cocoa beans from different origins showed minor differences, which could be due to their geographical origin and the type of fermentation used. The graphical plot thus provides useful information on differences within and between groups. However, PCA is not a classification tool; it only indicates the trend of the data in a space that can be visualised (Yu et al., 2008).

To explain the observed spectral discrimination between the different cocoa bean categories, the PCA eigenvectors of the Smooth-1der spectra were examined for qualitative differences (Fig. 1D), as done elsewhere (Cozzolino, Chree, Scaife, & Murray, 2005). PC1 explains 82.62% of the total variance in the samples, and its largest eigenvectors were located around 5762 cm⁻¹, associated with the first overtone of CH2, and around 4296 and 4312 cm⁻¹, associated with CH3 combination and C=C stretching, respectively. These are characteristic of fat, proteins and polysaccharides in cocoa beans (Hourant, Baeten, Morales, Meurens, & Aparicio, 2000; Vesela et al., 2007; Westad, Schmidt, & Kermit, 2008). PC2 explains 8.24% of the variation

and the largest eigenvectors were located around 7113 and 5804 cm⁻¹, related to the first overtone of OH and to CH stretching, respectively. These are associated with water and sugar content (Cozzolino, Smyth, & Gishen, 2003). The PC2 peaks at 5291, 4952 and 4464 cm⁻¹ are related to the second overtone of C=O, to CH=CH and to CH3 combination, respectively, and are characteristic of polyphenols, aromatics and fatty acids (Chen et al., 2013; Ozaki, McClure, & Christy, 2006; Vesela et al., 2007). For instance, cocoa beans contain about 55% fat, and the region between 5500 and 6000 cm⁻¹ is the first overtone of CH stretching arising from the methylene (CH2), methyl (CH3) and ethenyl (CH=CH) functional groups of edible oils and fats (Hourant et al., 2000). PC3 explains 4.71% of the variation; it appears to be the mirror image of the spectra of the cocoa bean samples and would account for slight variations in particle size (Cozzolino et al., 2005). A single spectral data set can reveal several distinct groups through geometrical exploration of the scores plot (Indahl, Sahni, Kirkhus, & Næs, 1999). The variations are related to compositional differences among the cocoa bean categories. This suggests that a particular chemical parameter, alone or in combination with others, can contribute the largest influence behind the observed differences between the cocoa beans.

3.2. Reference measurement data

Table 1 shows that the cocoa bean samples used covered a very wide range of pH and FI. The groups included in this study were a true reflection of the cocoa bean samples on the market. Also, the reference measurement


results of FI and pH in the calibration set cover the range in the prediction set, and the standard deviations of the two sets are not significantly different. This means that the samples are appropriately distributed between the calibration set and the prediction set.

3.3. Identification model

The samples used in this study were fermented, partly fermented and unfermented cocoa beans. These groups were assigned arbitrary reference values (1, 2 and 3). PLSDA was done with an arbitrary cut-off value of ±0.5. The prediction set was used to assess the predictive ability of the PLSDA model. Table 2 shows the performance of the PLSDA and BPANN techniques. Of the two, the BPANN model was superior to PLSDA. The identification rate of the BPANN model was 99.73% at 5 PCs in the prediction set when Smooth-1der pre-processing was included (Table 2). A possible explanation is that BPANN, as a non-linear algorithm, has stronger self-learning and self-adjusting properties than PLSDA, which is a linear algorithm. Furthermore, the quality parameters that differentiate the three cocoa bean categories involve many hydrogen-containing bonds (C–H, O–H, S–H and N–H), whose overtones and combinations of fundamental vibrations make up the FT-NIR spectrum. These differences might follow a non-linear rather than a linear pattern. Also, the BPANN model used 5 PCs while the PLSDA model used 9, which suggests that the BPANN model is simpler than the PLSDA model in this study. Normally, including a higher number of PCs in the training model might bring in too much redundant information, which inevitably affects the robustness of the model (Chen, Zhao, Liu, & Cai, 2008) and leads to poor performance when new samples are predicted.
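The decision rule described above can be sketched as follows. This is a minimal illustration, assuming that a sample is assigned to class 1, 2 or 3 when the model output lies within 0.5 of that class code, and that the identification rate is the percentage of correctly identified samples. The function names and the toy outputs are hypothetical and are not the authors' Matlab code or data.

```python
def assign_class(score, class_codes=(1, 2, 3), cutoff=0.5):
    """Return the class code lying within `cutoff` of `score`, else None."""
    for code in class_codes:
        if abs(score - code) < cutoff:
            return code
    return None  # outside every class window: counted as not identified


def identification_rate(scores, true_classes):
    """Percentage of samples whose assigned class equals the reference class."""
    correct = sum(assign_class(s) == t for s, t in zip(scores, true_classes))
    return 100.0 * correct / len(true_classes)


# Hypothetical model outputs for five samples (not data from the study).
toy_scores = [1.1, 0.8, 2.3, 3.4, 2.6]
toy_truth = [1, 1, 2, 3, 2]
print(identification_rate(toy_scores, toy_truth))  # 80.0
```

In this toy case the last sample (output 2.6, reference class 2) falls in the window of class 3, so four of the five samples are identified correctly.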
The superior performance of BPANN is in line with other researchers, who found that ANN has better predictive ability than linear models (such as PCR and PLS) for varietal classification (Cen, He, & Huang, 2007) and for the determination of soluble solid content (Chia, Rahim, & Rahim, 2012).

3.4. Prediction model development

3.4.1. Efficient variable selection

Table 3 shows the optimal interval selection for pH and fermentation index (FI) by Si-PLS. The intervals selected for pH were [5, 12, 15, 17], with RMSECV = 0.0751. As seen from Table 3, the efficient variables selected for pH (296 in total) correspond to the spectral ranges 5153–5434, 7151–7432, 8007–8289 and 8578–8859 cm⁻¹ of the full spectrum. The spectral subintervals selected by SiPLS are related to the organic acids (acetic, citric and lactic) present in cocoa beans. For instance, the range between 5153 and 7432 cm⁻¹ arises from the second overtone stretching of C=O, the asymmetric stretching and rocking of O–H in weakly bonded water, fatty acids and aromatics, while the region of 8007–8859 cm⁻¹ is associated with the second overtone of CH3 and CH2 stretching modes and their combination (Ozaki, Morita, & Du,

Table 2
Comparison of identification results from two algorithms.

Algorithms   Preprocessing   Optimal PCs   Identification rate (%)
                                           Calibration set   Prediction set
PLSDA        Raw             12            95.01             94.55
             Smooth-1der     9             95.05             93.58
BPANN        Raw             6             96.69             90.85
             Smooth-1der     5             100               99.73

2007; Vesela et al., 2007). The pH (also called active acidity) reflects the concentration of hydronium ions (H3O+) formed when hydrogen ions combine with water (Sadler & Murphy, 2010). Thus, the region around 5153–8859 cm⁻¹ might be sensitive to the pH of cocoa beans, as it is related mainly to hydrogen ions (acids) and H2O.

The optimal intervals selected by Si-PLS for FI were [2, 4, 7, 9], with RMSECV = 0.0642. As seen from Table 3, these correspond to 4300–4597, 4902–5199, 5805–6102 and 6406–6703 cm⁻¹ of the full spectrum. These spectral regions are associated with carbonyl groups (CH2 and CH3, –CH=CH–), the first overtone stretching of aromatic C–H, C=C and C=N combinations, and the second overtone of N–H (Hourant et al., 2000; Vesela et al., 2007). These vibrations are caused by constituents such as polyphenols, alkaloids, vicilin-class globulins, proteins, amines, acids, polysaccharides and other aroma compounds (Chen et al., 2013; Vesela et al., 2007). Fermented cocoa beans contain different amounts of these constituents from unfermented or partly fermented ones (Aculey et al., 2010; Kim & Keeney, 2006). Therefore, the spectral bands selected for FI suggest that the range around 4300–6703 cm⁻¹ might be usefully related to compositional differences between fermented, partly fermented and unfermented cocoa beans.
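The variable counts and spectral ranges quoted for the selected intervals can be checked with a short sketch. It assumes that the 1557 variables are evenly spaced from 4000 to 10,000 cm⁻¹, that intervals are numbered from the low-wavenumber end, and that any remainder is shared out one variable each to the first intervals. These conventions and the function names are assumptions made for illustration; the paper does not state how the software split the spectrum.

```python
def interval_bounds(n_vars, n_intervals):
    """Split n_vars consecutive variables into n_intervals near-equal blocks.

    Assumption: any remainder is given, one variable each, to the first blocks.
    Returns (start, stop) index pairs, 0-based, stop exclusive.
    """
    base, extra = divmod(n_vars, n_intervals)
    bounds, start = [], 0
    for k in range(n_intervals):
        size = base + (1 if k < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def selected_ranges(n_vars, n_intervals, chosen, low=4000.0, high=10000.0):
    """Variable count and approximate wavenumber ranges for 1-based intervals."""
    step = (high - low) / (n_vars - 1)  # about 3.856 cm^-1 for 1557 variables
    bounds = interval_bounds(n_vars, n_intervals)
    ranges = [(low + bounds[c - 1][0] * step,
               low + (bounds[c - 1][1] - 1) * step) for c in chosen]
    total = sum(bounds[c - 1][1] - bounds[c - 1][0] for c in chosen)
    return total, ranges


ph_total, ph_ranges = selected_ranges(1557, 21, [5, 12, 15, 17])
fi_total, fi_ranges = selected_ranges(1557, 20, [2, 4, 7, 9])
print(ph_total, fi_total)  # 296 312
for lo, hi in ph_ranges + fi_ranges:
    print(f"{lo:.0f}-{hi:.0f} cm^-1")
```

Under these assumptions the counts match the 296 and 312 variables reported for pH and FI, and the computed ranges agree with the quoted ones to within about 1 cm⁻¹.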

3.4.2. Optimal performance of Si-BPANNR

After the efficient variable selection by Si-PLS, BPANNR was used to develop the best model for pH and FI. The selection reduced the number of spectral variables from 1557 to 296 for pH and to 312 for FI. Collinear or irrelevant variables that could have influenced the model were thus eliminated (too much unwanted information normally weakens a model and makes it unstable). Table 4 compares the performance of the different multivariate calibration methods with that of the efficient variable selection technique, Si-BPANNR. Si-BPANNR was superior to the others for both pH and FI. The outputs of the new Si-BPANNR models were the pH and FI of cocoa beans. While other

Table 3
Performance of SiPLS models with selected optimal spectral regions. Best results (bold in the original) are marked †.

Item   Number of subintervals   PLS factors   Selected subintervals   RMSECV
pH     10                       10            [4 6 7 10]              0.0857
       11                       11            [3 8 9 11]              0.0873
       12                       12            [4 7 9 12]              0.0854
       13                       13            [3 5 9 13]              0.0854
       14                       13            [5 8 14]                0.0791
       15                       13            [4 11 12 15]            0.0798
       16                       15            [4 9 15]                0.0859
       17                       13            [6 9 10 14]             0.0801
       18                       14            [6 10 11]               0.0812
       19                       12            [7 10 16 19]            0.0820
       20                       11            [7 11 12 18]            0.0794
       21                       12            [5 12 15 17]            0.0751 †
       22                       11            [8 12 13 22]            0.0758
FI     10                       10            [4 6 7]                 0.0756
       11                       11            [3 5 6 11]              0.0697
       12                       12            [2 4 7 12]              0.0667
       13                       11            [2 5 11 13]             0.0654
       14                       10            [4 5 6 12]              0.0686
       15                       12            [4 5 7 12]              0.0804
       16                       10            [6 11 12]               0.0760
       17                       13            [4 5 6 7]               0.0691
       18                       12            [3 6 15]                0.0713
       19                       12            [5 6 7 8]               0.0703
       20                       11            [2 4 7 9]               0.0642 †
       21                       13            [5 7 9]                 0.0671
       22                       10            [2 3 5 10]              0.0712


Table 4
Comparison of results based on different regression models for pH and FI analysis. Best results (bold in the original) are marked †.

Model   Algorithms    Variables   PCs   Calibration set      Prediction set
                                        Rcal      RMSECV     Rpre      RMSEP
pH      PLS           1557        12    0.9584    0.1015     0.9751    0.0810
        iPLS          156         10    0.9197    0.1398     0.8579    0.1887
        Si-PLS        296         12    0.9767    0.0758     0.9728    0.0875
        BPANNR        1557        7     0.9754    0.0923     0.9493    0.1141
        Si-BPANNR*    296         4     0.9883    0.0423     0.9841    0.0620 †
FI      PLS           1557        12    0.9729    0.0721     0.9629    0.0828
        iPLS          104         8     0.9210    0.1215     0.8734    0.1469
        Si-PLS        312         11    0.9782    0.0672     0.9647    0.0783
        BPANNR        1557        6     0.9785    0.0554     0.9571    0.0790
        Si-BPANNR*    312         5     0.9895    0.0348     0.9809    0.0591 †

Si-BPANNR*: BPANNR model based on efficient spectral variable selection by Si-PLS.

Fig. 2. Scatter plot of reference measured versus FT-NIRS predicted for (A) pH and for (B) FI by Si-BPANNR model.

factors such as the number of neurons in the input and hidden layers were determined by the lowest RMSECV, the scale function was set to tanh, the initial weight was set to 0.3, and the learning rate factor and momentum factor were both set to 0.1. The error goal was set to 0.0002 and the number of iterations to 1000. The best Si-BPANNR network architectures for pH and FI were obtained with 4 and 5 PCs, respectively, and were superior to the other models, as seen in Table 4.

3.4.3. Si-BPANNR model for pH

The best Si-BPANNR model for pH was obtained at 4 PCs, with Rpre = 0.9841, RMSEP = 0.0620 and bias = 0.0075 in the prediction set. Fig. 2A shows the scatter plot of reference measurement versus FT-NIRS prediction for pH. The correlation between the reference values and the FT-NIR estimates of pH was very good in both the calibration and prediction sets. The data points from both sets fall close to the unity line, which implies that the pH estimated by FT-NIRS is not significantly different from the reference measurement. The result shows that the spectral regions selected and modelled by Si-BPANNR have a high correlation with the pH values. The correlation coefficient and root mean square error in the prediction set were comparable to those reported by others using few principal components (Jiang et al., 2012a, 2012b).

3.4.4. Si-BPANNR model for fermentation index (FI)

The best Si-BPANNR model for FI was obtained at 5 PCs, with Rpre = 0.9809, RMSEP = 0.0591 and bias = 0.0141 in the prediction set. Fig. 2B shows the scatter plot of reference measurement versus FT-NIRS prediction for FI. It could be seen that the correlation between the reference measurement values and FT-NIR estimation values for FI in the calibration set and prediction

sets was very good. The data points fall close to the unity line, which implies that the FI estimated by FT-NIRS is not significantly different from the reference measurement. The result shows that the spectral regions selected and modelled by Si-BPANNR have a high correlation with the FI values.
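The prediction-set statistics quoted above (Rpre, RMSEP and bias) follow the definitions in Eqs. (1)–(4). A minimal sketch of those definitions is given below. The function names and the four toy values are hypothetical and are not data from the study. For RMSECV, the predicted values would be the leave-one-out estimates.

```python
import math


def rmse(reference, predicted):
    """Root mean square error, the form shared by Eqs. (1) and (2)."""
    n = len(reference)
    return math.sqrt(sum((p - y) ** 2 for y, p in zip(reference, predicted)) / n)


def correlation_coefficient(reference, predicted):
    """R as defined in Eq. (3): sqrt(1 - SS_residual / SS_total).

    Raises ValueError if the residual sum of squares exceeds the total one.
    """
    mean_y = sum(reference) / len(reference)
    ss_res = sum((p - y) ** 2 for y, p in zip(reference, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in reference)
    return math.sqrt(1.0 - ss_res / ss_tot)


def bias(reference, predicted):
    """Mean of (reference - predicted), as in Eq. (4)."""
    return sum(y - p for y, p in zip(reference, predicted)) / len(reference)


reference = [5.0, 5.5, 6.0, 4.9]  # hypothetical reference pH values
predicted = [5.1, 5.4, 6.0, 5.0]  # hypothetical FT-NIRS estimates
print(round(rmse(reference, predicted), 4))
print(round(correlation_coefficient(reference, predicted), 4))
print(round(bias(reference, predicted), 4))
```

Note that R computed this way is tied to the residual sum of squares, so it cannot be high unless the prediction errors are small relative to the spread of the reference values.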

3.5. Discussions

Table 4 reveals the superiority of the Si-BPANNR model for the simultaneous analysis of pH and FI. The performances of the other linear and non-linear regression models were compared. Si-BPANNR was the best, and iPLS was the worst in this study. The weaker performance of classical PLS and iPLS can be explained as follows. The classical PLS regression was built on the full spectral range. The full spectrum contains both useful information and irrelevant information such as noise, and the noise significantly reduced the performance of the model. iPLS, although it eliminated some noisy regions, selected only one interval to calibrate the PLS model. The selected interval may not be the only one related to the quality parameters of interest (pH and FI) in cocoa beans. Selecting a single interval left out other relevant spectral data that could be related to these quality indicators, which inevitably weakened the model. On the other hand, SiPLS was quite good because it eliminated the demerits of classical PLS and enhanced the strength of iPLS, i.e. noise was removed and multiple spectral regions related to pH and FI were selected for calibrating the model. However, SiPLS was not comparable to Si-BPANNR because Si-BPANNR combines the strengths of SiPLS and BPANNR. Also, SiPLS is a linear method, and non-linear methods are sometimes stronger than linear ones in self-learning and self-adjustment (Teye, Huang, Dai, & Chen, 2013).
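The idea behind synergy interval selection discussed here (split the spectrum into intervals, then keep the combination of intervals with the lowest cross-validation error) can be sketched in Python as below. This is only a sketch under stated assumptions: ordinary least squares on the mean signal of each chosen interval stands in for the PLS and BPANN regressions used in the study, the cross-validation is leave-one-out, the data are synthetic, and every function name is hypothetical.

```python
import math
import random
from itertools import combinations


def split_intervals(n_vars, n_intervals):
    """Return (start, stop) index pairs splitting n_vars into near-equal intervals."""
    bounds = [round(i * n_vars / n_intervals) for i in range(n_intervals + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(n_intervals)]


def interval_means(spectrum, chosen):
    """Stand-in features: the mean signal inside each chosen interval."""
    return [sum(spectrum[a:b]) / (b - a) for a, b in chosen]


def fit_ols(features, targets):
    """Least squares with intercept via normal equations and Gauss-Jordan."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
           + [sum(row[i] * y for row, y in zip(rows, targets))]
           for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(aug[idx][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("singular design matrix")
        for other in range(k):
            if other != col:
                factor = aug[other][col] / aug[col][col]
                aug[other] = [x - factor * p
                              for x, p in zip(aug[other], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def rmsecv(features, targets):
    """Leave-one-out root mean square error of cross-validation."""
    n = len(targets)
    squared_error = 0.0
    for i in range(n):
        coef = fit_ols(features[:i] + features[i + 1:],
                       targets[:i] + targets[i + 1:])
        pred = coef[0] + sum(c * x for c, x in zip(coef[1:], features[i]))
        squared_error += (pred - targets[i]) ** 2
    return math.sqrt(squared_error / n)


def select_intervals(spectra, targets, n_intervals, n_combined):
    """Return (RMSECV, interval indices) of the best interval combination."""
    intervals = split_intervals(len(spectra[0]), n_intervals)
    best = None
    for combo in combinations(range(n_intervals), n_combined):
        chosen = [intervals[i] for i in combo]
        feats = [interval_means(s, chosen) for s in spectra]
        try:
            score = rmsecv(feats, targets)
        except ValueError:
            continue  # skip combinations with a singular design
        if best is None or score < best[0]:
            best = (score, combo)
    return best


# Synthetic data: 12 "spectra" of 20 variables; the target depends only on
# intervals 1 and 3 (0-based), so those are the ones the search should keep.
rng = random.Random(0)
spectra = [[rng.random() for _ in range(20)] for _ in range(12)]
targets = [5.0 + 2.0 * sum(s[5:10]) / 5 - sum(s[15:20]) / 5 for s in spectra]
best_score, best_combo = select_intervals(spectra, targets,
                                          n_intervals=4, n_combined=2)
print(best_combo, best_score)
```

The exhaustive search over combinations mirrors the reasoning in this section: a single interval (as in iPLS) can miss regions that also carry information about the target, whereas a combination of intervals retains them while still discarding uninformative parts of the spectrum.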


However, SiPLS was also better than BPANNR; this could be attributed to the fact that SiPLS used fewer variables (owing to variable-wise selection), with a lower computational burden and lower error, while BPANNR used the full spectral range (1557 variables). Moreover, when the two non-linear methods were compared, Si-BPANNR was superior to BPANNR. This could be explained by the synergy interval selection complementing the strength of BPANNR in modelling pH and FI. From Table 4, it can be observed that the basic BPANNR model had comparatively larger differences between Rcal and Rpre for both pH and FI. This implies an over-fitting problem (Cozzolino, Cynkar, Shah, & Smith, 2011; Hawkins, 2004). A typical BPANN model normally has difficulty with generalisation and often produces models that over-fit the data (Chen, Zhao, & Lin, 2009; Teye et al., 2013). Also, the number of full-spectrum variables (1557) used for the BPANNR model increased the computational difficulty and introduced instability and unreliability in this study (Chen, Ding, Cai, & Zhao, 2012; Chen, Guo, Zhao, & Ouyang, 2012). This means that the collinear and irrelevant variables in the full spectrum weakened the performance of the model, as they could be unrelated to pH and FI. However, the efficient variable selection in Si-BPANNR eliminated those irrelevant and collinear variables that would have affected the model. The speed of the Si-BPANNR model was also improved, with excellent stability. As seen from Fig. 2A and B, the reference values and the FT-NIRS estimated values for pH and FI, respectively, were highly correlated (based on the high correlation coefficients). This means that the FT-NIRS prediction of pH and FI was not significantly different from the reference measurement results in this study. Si-BPANNR had a higher accuracy and was more stable than the single models. This suggests that better results could be achieved by combining efficient variable selection by SiPLS with the BPANNR model (i.e. fusing linear and non-linear models). This is because, by combining the strengths of linear and non-linear models, the weaknesses of each may be offset by the other, increasing predictive performance (Chia et al., 2012).

4. Conclusions

This study has demonstrated that cocoa beans of different quality categories can be non-destructively identified, and some quality parameters such as pH and FI simultaneously measured, by FT-NIRS together with appropriate non-linear multivariate analysis. The overall results show that the BPANN model could be used to identify the different cocoa bean quality categories. The Si-BPANNR model revealed its superiority and can be used for the simultaneous prediction of pH and FI in cocoa beans. This technique could be used for rapid estimation of the category, pH and fermentation index of merchantable cocoa beans. It could also be useful for checking adulteration of fermented cocoa beans.

Acknowledgements

The authors wish to acknowledge the financial assistance provided by the University of Cape Coast (AS/86A/V6/1735) and the National Natural Science Foundation of China (No. 31071549). We are also grateful to the Quality Control Company of the Ghana Cocoa Board for their support.

References

Aculey, P. C., Snitkjaer, P., Owusu, M., Bassompiere, M., Takrama, J., Nørgaard, L., et al. (2010). Ghanaian cocoa bean fermentation characterized by spectroscopic and chromatographic methods and chemometrics. Journal of Food Science, 75(6), S300–S307.


Álvarez, C., Pérez, E., Cros, E., Lares, M., Assemat, S., Boulanger, R., et al. (2012). The use of near infrared spectroscopy to determine the fat, caffeine, theobromine and (−)-epicatechin contents in unfermented and sun-dried beans of Criollo cocoa. Journal of Near Infrared Spectroscopy, 20, 307.
Bedini, A., Zanolli, V., Zanardi, S., Bersellini, U., Dalcanale, E., & Suman, M. (2013). Rapid and simultaneous analysis of xanthines and polyphenols as bitter taste markers in bakery products by FT-NIR spectroscopy. Food Analytical Methods, 6(1), 17–27.
Cen, H., He, Y., & Huang, M. (2007). Combination and comparison of multivariate analysis for the identification of orange varieties using visible and near infrared reflectance spectroscopy. European Food Research and Technology, 225(5–6), 699–705.
Chen, Q., Ding, J., Cai, J., & Zhao, J. (2012). Rapid measurement of total acid content (TAC) in vinegar using near infrared spectroscopy based on efficient variables selection algorithm and nonlinear regression tools. Food Chemistry, 135, 590–596.
Chen, Q., Guo, Z., Zhao, J., & Ouyang, Q. (2012). Comparisons of different regressions tools in measurement of antioxidant activity in green tea using near infrared spectroscopy. Journal of Pharmaceutical and Biomedical Analysis, 60, 92–97.
Chen, Q., Zhao, J., & Lin, H. (2009). Study on discrimination of roast green tea (Camellia sinensis L.) according to geographical origin by FT-NIR spectroscopy and supervised pattern recognition. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 72(4), 845–850.
Chen, Q., Zhao, J., Liu, M., & Cai, J. (2008). Nondestructive identification of tea (Camellia sinensis L.) varieties using FT-NIR spectroscopy and pattern recognition. Czech Journal of Food Science, 26(5), 360–367.
Chen, Q., Zhao, J., Liu, M., Cai, J., & Liu, J. (2008). Determination of total polyphenols content in green tea using FT-NIR spectroscopy and different PLS algorithms. Journal of Pharmaceutical and Biomedical Analysis, 46(3), 568–573.
Chen, Y., Deng, J., Wang, Y., Liu, B., Ding, J., Mao, X., et al. (2013). Study on discrimination of white tea and albino tea based on near infrared spectroscopy and chemometrics. Journal of the Science of Food and Agriculture.
Chevallier, S., Bertrand, D., Kohler, A., & Courcoux, P. (2006). Application of PLS-DA in multivariate image analysis. Journal of Chemometrics, 20(5), 221–229.
Chia, K., Rahim, H. A., & Rahim, R. A. (2012). Neural network and principal component regression in non-destructive soluble solids content assessment: A comparison. Journal of Zhejiang University Science B, 13(2), 145–151.
COCOBOD. (2013). Accessed on 25.06.2013.
Cozzolino, D., Chree, A., Scaife, J., & Murray, I. (2005). Usefulness of near-infrared reflectance (NIR) spectroscopy and chemometrics to discriminate fishmeal batches made with different fish species. Journal of Agricultural and Food Chemistry, 53(11), 4459–4463.
Cozzolino, D., Cynkar, W., Shah, N., & Smith, P. (2011). Multivariate data analysis applied to spectroscopy: Potential application to juice and fruit quality. Food Research International, 44, 1888–1896.
Cozzolino, D., Smyth, H. E., & Gishen, M. (2003). Feasibility study on the use of visible and near-infrared spectroscopy together with chemometrics to discriminate between commercial white wines of different varietal origins. Journal of Agricultural and Food Chemistry, 51(26), 7703–7708.
Di Castelnuovo, A., di Giuseppe, R., Iacoviello, L., & de Gaetano, G. (2012). Consumption of cocoa, tea and coffee and risk of cardiovascular disease. European Journal of Internal Medicine, 23(1), 15–25.
Gourieva, K., & Tserevitinov, O. (1979). Method of evaluating the degree of fermentation of cocoa beans. USSR Patent No. 646,254.
Hawkins, D. M. (2004). The problem of overfitting. Journal of Chemical Information and Computer Sciences, 44, 1–12.
Hourant, P., Baeten, V., Morales, M. T., Meurens, M., & Aparicio, R. (2000). Oil and fat classification by selected bands of near-infrared spectroscopy. Applied Spectroscopy, 54(8), 1168–1174.
Ilangantileke, S. G., Wahyudi, T., & Bailon, M. G. (1991). Assessment methodology to predict quality of cocoa beans for export. Journal of Food Quality, 14(6), 481–496.
Indahl, U. G., Sahni, N. S., Kirkhus, B., & Næs, T. (1999). Multivariate strategies for classification based on NIR-spectra – With application to mayonnaise. Chemometrics and Intelligent Laboratory Systems, 49(1), 19–31.
Jalil, A. M. M., & Ismail, A. (2008). Polyphenols in cocoa and cocoa products: Is there a link between antioxidant properties and health? Molecules, 13(9), 2190–2219.
Jiang, H., Liu, G., Mei, C., Yu, S., Xiao, X., & Ding, Y. (2012a). Measurement of process variables in solid-state fermentation of wheat straw using FT-NIR spectroscopy and synergy interval PLS algorithm. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 97, 277–283.
Jiang, H., Liu, G., Mei, C., Yu, S., Xiao, X., & Ding, Y. (2012b). Rapid determination of pH in solid-state fermentation of wheat straw by FT-NIR spectroscopy and efficient wavelengths selection. Analytical and Bioanalytical Chemistry, 404(2), 603–611.
Kim, H., & Keeney, P. (2006). Epicatechin content in fermented and unfermented cocoa beans. Journal of Food Science, 49(4), 1090–1092.
Kovalenko, I. V., Rippke, G. R., & Hurburgh, C. R. (2006). Measurement of soybean fatty acids by near-infrared spectroscopy: Linear and nonlinear calibration methods. Journal of the American Oil Chemists’ Society, 83(5), 421–427.
Kris-Etherton, P. M., & Keen, C. L. (2002). Evidence that the antioxidant flavonoids in tea and cocoa are beneficial for cardiovascular health. Current Opinion in Lipidology, 13(1), 41–49.
Minifie, B. W. (1989). Chocolate, cocoa and confectionery: Science and technology. Springer.
Næs, T., Isaksson, T., Fearn, T., & Davies, T. (2002). A user-friendly guide to multivariate calibration and classification (Vol. 6). Chichester: NIR Publications.


Nazaruddin, R., Seng, L. K., Hassan, O., & Said, M. (2006). Effect of pulp preconditioning on the content of polyphenols in cocoa beans (Theobroma Cacao) during fermentation. Industrial Crops and Products, 24(1), 87–94.
Nørgaard, L., Saudland, A., Wagner, J., Nielsen, J. P., Munck, L., & Engelsen, S. (2000). Interval partial least-squares regression (iPLS): A comparative chemometric study with an example from near-infrared spectroscopy. Applied Spectroscopy, 54(3), 413–419.
Ozaki, Y., McClure, W. F., & Christy, A. A. (2006). Near-infrared spectroscopy in food science and technology. Wiley-Interscience.
Ozaki, Y., Morita, S., & Du, Y. (2007). Spectral analysis. In Y. Ozaki, W. F. McClure, & A. A. Christy (Eds.), Near-infrared spectroscopy in food science and technology. Hoboken, New Jersey: Wiley-Interscience, John Wiley & Sons Inc.
Pérez-Marín, D., Garrido-Varo, A., & Guerrero, J. E. (2007). Non-linear regression methods in NIRS quantitative analysis. Talanta, 72(1), 28–42.
Pettipher, G. M. (1986). An improved method for the extraction and quantitation of anthocyanins in cocoa beans and its use as an index of the degree of fermentation. Journal of the Science of Food and Agriculture, 37(3), 289–296.
Ripley, B. D. (2008). Pattern recognition and neural networks. Cambridge University Press.
Sadler, G. D., & Murphy, P. A. (2010). pH and titratable acidity. In S. S. Nielsen (Ed.), Food Analysis (pp. 219–238). New York: Springer Science + Business Media, LLC.
Saltini, R., Akkerman, R., & Frosch, S. (2013). Optimizing chocolate production through traceability: A review of the influence of farming practices on cocoa bean quality. Food Control, 29(1), 167–187.
Teye, E., Huang, X., Dai, H., & Chen, Q. (2013). Rapid differentiation of Ghana cocoa beans by FT-NIR spectroscopy coupled with multivariate classification. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 114, 183–189.
Vesela, A., Barros, A. S., Synytsya, A., Delgadillo, I., Čopíková, J., & Coimbra, M. A. (2007). Infrared spectroscopy and outer product analysis for quantification of fat, nitrogen, and moisture of cocoa powder. Analytica Chimica Acta, 601(1), 77–86.
Westad, F., Schmidt, A., & Kermit, M. (2008). Incorporating chemical band-assignment in near infrared spectroscopy regression models. Journal of Near Infrared Spectroscopy, 16, 265–273.
Yu, H., Lin, H., Xu, H., Ying, Y., Li, B., & Pan, X. (2008). Prediction of enological parameters and discrimination of rice wine age using least-squares support vector machines and near infrared spectroscopy. Journal of Agricultural and Food Chemistry, 56(2), 307–313.
