CN103499616A

CN103499616A - Selection method of sensors in producing area models for quality detection of Longjing tea on basis of genetic algorithm

Info

Publication number: CN103499616A
Application number: CN201310323318.9A
Authority: CN
Inventors: 赵镭; 史波林; 支瑞聪; 汪厚银; 裴高璞; 刘宁晶; 解楠; 张璐璐
Original assignee: China National Institute of Standardization
Current assignee: China National Institute of Standardization
Priority date: 2013-07-30
Filing date: 2013-07-30
Publication date: 2014-01-08

Abstract

A selection method of sensors in producing area models for the quality detection of Longjing tea on the basis of genetic algorithm is disclosed. After 3 rounds of genetic algorithm on the sensor responding spectrum of the producing area models, and aiming at the Yangmeiling and Meijiawu producing area models, 7 sensors, namely LY2/G, LY2/AA, T30/1, P10/1, P40/1, T70/2, and PA/2, are selected and 11 least-used sensors, namely LY2/LG, LY2/GH, LY2/gCTL, LY2/gCT, P10/2, P30/1, P40/2, P30/2, T40/2, T40/1, and TA/2 are eliminated.

Description

Method for selecting sensors in producing-area models for Longjing tea quality detection based on a genetic algorithm

Technical field

The application relates to a method, based on a genetic algorithm, for selecting sensors in producing-area models for Longjing tea quality detection.

Background technology

Sensory evaluation has long been an important method for assessing tea quality, but it requires extensive knowledge of tea science and considerable evaluation experience. Apart from professional tea tasters, dealers, and manufacturers, ordinary tea buyers find it hard to judge tea quality; without sufficient accumulated experience, reliable results are difficult to obtain. Training a tea assessor requires careful selection and substantial expense, and the training period is long. Even for a professional tea taster, sensory sensitivity changes under the influence of external factors, which affects the accuracy, objectivity, and consistency of the evaluation results. For example, human olfactory discrimination is disturbed by extraneous odors; taste sensitivity is affected by other stimulating foods and by their temperature; human vision involves optical, physiological, and psychological factors, and color discrimination differs to some degree between individuals. The sensory sensitivity of assessors is also affected by other factors, such as regional differences, sex differences, mental state, and health. In addition, sensory evaluation must be carried out against physical standard samples, and the preparation of these standard samples is subject to various constraints, so it is difficult to keep them consistent over several consecutive years. Because standard samples are made from products of the previous year or of earlier years, they are affected by season, weather, and geographical conditions, so in practice the quality of a standard sample can hardly serve as an absolute standard.

The present invention examines Longjing tea from different harvest periods, tree varieties, and producing areas in terms of physicochemical and sensory indicators. It combines intelligent sensory analysis, multivariate statistics, and modern instrumental analysis to characterize Longjing tea comprehensively, analyzes the internal relationships among the tea's indicators, and builds mathematical models for the qualitative and quantitative evaluation of Longjing tea quality, so that the tea can be accurately identified and graded. This provides a sound basis for establishing a unified evaluation standard system for green tea. In theory, this research provides a basis and support for the quality evaluation of other Chinese teas. In practice, it is of considerable significance, with notable social and economic benefits, for improving the stability of Chinese tea quality, strengthening the grading and classification of Chinese tea through standardization, achieving prices that match quality, ending the tradition of exporting high-quality Chinese tea at low prices, dispelling doubts in developed countries about Chinese products that are both high in quality and low in price, maintaining order in the domestic market, protecting the vital interests of consumers, defending the international reputation of Chinese tea products, and promoting international trade.

With the development of modern instrumental analysis, physicochemical research on tea has made corresponding progress in recent years. Techniques for separating and analyzing tea aroma substances have gradually moved from conventional gas chromatography (GC) or gas chromatography-mass spectrometry (GC-MS) to gas chromatography-olfactometry (GC-O). More than 700 tea aroma components have been detected so far, including aliphatic derivatives, terpene derivatives, aromatic derivatives, and nitrogen- and oxygen-containing heterocyclic compounds. Even so, it is difficult to reflect the overall characteristics and the aroma quality of tea from composition alone. Instrumental techniques for analyzing tea taste compounds mainly include liquid chromatography, spectroscopy, mass spectrometry, and nuclear magnetic resonance. At present, some 600 organic chemical components and more than 40 inorganic mineral elements have been identified in tea. However, because different tastes interact (for example through taste contrast, modification, synergy, and mutual suppression), the measured chemical parameters cannot fully and truly reflect the taste characteristics of a sample.

The emergence of intelligent sensory analysis has further raised the level of tea quality detection. The technology imitates human perception. The sensors correspond to the sense organs of a biological system and generate response signals for the attributes of a sample; the signal collector, like the nervous system, transmits the response signals and performs simple processing; the computer, like the human brain, performs complex processing, analysis, and recognition of the signal data and forms a comprehensive overall judgment. Intelligent sensory analysis offers short detection times and good reproducibility, needs no complicated sample pretreatment, does not suffer from sensory fatigue, and gives objective, reliable results. More importantly, it can simulate human senses to some extent and provide evaluation results and fingerprint information on tea aroma, taste, and appearance, which makes it a current focus and development trend in tea quality detection research. For sensory attributes of tea such as color and shape, the intelligent sensory analysis techniques currently used are mainly machine vision, the electronic nose, and the electronic tongue. The workflow mainly comprises generating response signals with the sensors, preprocessing the response signals, extracting sample feature information, building the relevant models, and performing pattern recognition. Pattern recognition is an important component of an intelligent sensory system. The main methods currently applied are principal component analysis (PCA), artificial neural networks, and fuzzy recognition. PCA is used for signal processing, to suppress noise in the multidimensional sensor response signals and to compress the signal data. An artificial neural network learns from and is trained on the processed signals to build a network model. Fuzzy recognition applies fuzzy reasoning to carry out fuzzy identification and fuzzy quantification of complex objects.

Intelligent sensory technology is used to simulate the functions and characteristics of human sensory evaluation, and multiple algorithms are studied to process the rich product-quality information contained in intelligent sensory measurements and thereby derive corresponding computational models and methods. Algorithms aimed at solving the end problem analyze the statistical regularities among multiple intelligent sensory objects and multiple interrelated product indicators, which suits the characteristics of food science research well. Integrating multiple algorithms, intelligent sensory analysis, and modern instrumental analysis overcomes the statistical and analytical difficulties of comprehensive multi-indicator evaluation and makes full use of the experimental data to obtain the implicit details related to the characteristic quality of tea, so that statistical analysis and pattern discrimination of characteristic tea quality can be carried out together, both quickly and accurately. This supports the establishment of a characteristic-quality database and an intelligent quality evaluation system for Chinese tea and the fast, accurate, and comprehensive analysis of tea quality; it offers reference and guidance for the scientific evaluation and reasonable definition of the characteristic quality of Chinese tea; and it provides core technical support for quality assurance, protection of distinctive characteristics, and authentication of Chinese tea.

The electronic nose, a new type of odor scanner developed in the 1990s, is now widely used in fields such as food, beverages, cosmetics, environmental monitoring, and the control of agricultural product processing. Compared with ordinary chemical analysis methods, the electronic nose exploits its cross-sensitivity to multiple gases to evaluate the overall information of a gas; compared with human olfaction, its measurement results are more objective and reliable.

The electronic tongue is a new analytical tool, developed in the mid-1980s, for analyzing and identifying the taste of liquids, and it has been applied in fields such as food, medicine, cosmetics, the chemical industry, and environmental monitoring. Unlike ordinary chemical analysis methods, the electronic tongue does not output an analysis of the taste components of a sample but a signal pattern related to the sample; after analysis by software with pattern recognition capability, an overall evaluation of the sample's taste characteristics can be obtained.

In summary, intelligent sensory analysis technologies (machine vision, the electronic nose, and the electronic tongue) have achieved good results in tea quality detection and show good application prospects. However, these technologies are still some way from practical application, and several key problems remain to be solved, for example:

(1) Research on the key technologies of the electronic nose and electronic tongue: machine vision is already widely used in practice, whereas the electronic nose and electronic tongue are still under development. To build a comprehensive intelligent sensory system, the electronic nose and electronic tongue therefore need to be studied in depth and their key problems solved.

(2) Development and screening of specific sensors: because each type of sample has its own specific substance system, different types of sensors respond differently to different substances. Further in-depth research is therefore needed to establish, for a specific substance system, sensor arrays that respond quickly, are highly sensitive, have a long service life, are easy to clean, and are economical and practical.

(3) Representativeness of samples and soundness of sampling: most current research reports show high discrimination rates for tea classification or grading. In these studies, however, the tea samples are not sufficiently representative and the sample sets are incomplete; the sample information collected consists essentially of parallel samples, with the tea of each grade measured repeatedly many times, so the resulting models are not stable and their range of application is narrow. Only a scientific sample collection method and sound criteria for judging sample representativeness can ensure that subsequent models are built successfully.

(4) Signal drift and denoising: changes in instrument measurement parameters, measurement methods, measurement environment, sample sources, and other factors easily cause the sensor response curve to drift, which introduces errors into intelligent sensory detection and makes it unsuitable for long-term continuous industrial operation. Research on reducing response signal drift and on signal noise analysis and processing therefore needs to be strengthened.

(5) Robustness of models: some studies build discrimination models without discussing the models in detail and without testing their robustness on independent prediction samples. In addition, the models built for quality discrimination are not sufficiently stable, so research on and improvement of the algorithms need to be strengthened to improve pattern recognition.

An electronic nose system is an array of multiple sensors. Because tea aroma has a complex composition, each sensor responds to many aroma components, and each aroma component produces a response on many sensors. The sensor fingerprint array therefore retains as much aroma information as possible but also tends to introduce a large amount of redundant information, which makes quality modeling computationally expensive and time-consuming and makes the resulting models complex and unstable. The main reasons are as follows: (1) in the intelligent sensory fingerprint, the sample response of some sensors is very weak, which directly affects the prediction accuracy of the model; (2) because of electronic nose instrument noise, the signal-to-noise ratio (SNR) of the sample information on some sensors is low, and external disturbances (such as temperature and humidity) have a larger effect on the fingerprint response at some sensors, which reduces the robustness of the model; (3) tea aroma contains many components, each of which gives a strong response on one or several sensors, so detecting the overall tea aroma information requires an optimized combination of sensors with specific responses to different aromas in order to capture the characteristic aroma fingerprint comprehensively and effectively.

Reasonable selection and combination of sensors removes irrelevant or nonlinear olfactory sensors and redundant sensor data and extracts the most effective intelligent olfactory fingerprint information for aroma, which gives the calibration model better predictive ability and simplifies computation. It also makes it possible to omit sensors that have no significant effect, or even a negative effect, on pattern recognition, which helps reduce the manufacturing cost of the electronic nose and improve system stability.

Sensor selection is an optimization problem frequently encountered in practice. Although the optimized-combination methods currently in use apply combinatorial ideas to some extent, they combine grouped sensor arrays after a preliminary elimination step and therefore do not achieve a globally optimized combination. The loading-value method avoids adding redundant sensors, but it does not analyze the response performance of the selected sensors, namely the repeatability of a sensor's response to the same sample and its ability to distinguish different samples. The genetic algorithm (GA) is an optimization method based on Darwin's theory of evolution by natural selection and survival of the fittest that simulates the heredity and evolution of living organisms. It is derivative-free, performs stochastic global optimization, tends to avoid becoming trapped in local minima, and is easy to implement.

Summary of the invention

A method, based on a genetic algorithm, for selecting sensors in producing-area models for Longjing tea quality detection, characterized by comprising the following steps:

dividing the samples of the West Lake Longjing tea grade model into a calibration set and a prediction set;

analyzing the electronic nose response patterns of teas of different grades;

analyzing the trend of the principal component scores of the samples of all grades;

analyzing the principal component loadings of the samples of all grades;

selecting the number of principal components for grade modeling according to the similarity classification method;

building the tea grade model with the similarity classification method and making predictions.

The electronic nose is a Fox 4000 electronic nose with an automatic headspace sampler, manufactured by Alpha MOS (France), and comprises 18 sensors. A total of 617 samples are used for tea grade discrimination; two thirds are selected at random as the calibration set, and the remaining third serves as the prediction set. Implementing the genetic algorithm mainly involves the following basic elements: parameter encoding, variable selection, population initialization, fitness function design, genetic operation design, and the convergence criterion. The genetic operation, as the key step, comprises three operators: selection, crossover, and mutation.

The selection method is characterized in that, after three rounds of the genetic algorithm on the sensor response patterns of the producing-area models, for the Yangmeiling and Meijiawu producing-area model the seven sensors LY2/G, LY2/AA, T30/1, P10/1, P40/1, T70/2, and PA/2 are selected, and the 11 least frequently used sensors LY2/LG, LY2/GH, LY2/gCTL, LY2/gCT, P10/2, P30/1, P40/2, P30/2, T40/2, T40/1, and TA/2 are eliminated.

The selection method is characterized in that, after three rounds of the genetic algorithm on the sensor response patterns of the producing-area models, for the Hupao Houshan, Longwu, and Wengjiashan producing-area models the four sensors LY2/LG, PA/2, P30/1, and TA/2 are found to be used least frequently in each genetic process and are therefore eliminated. The remaining 14 sensors, LY2/G, LY2/AA, LY2/GH, LY2/gCTL, LY2/gCT, T30/1, P10/1, P10/2, P40/1, T70/2, P40/2, P30/2, T40/2, and T40/1, are used to build the three producing-area models.
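The genetic algorithm elements named above (binary parameter encoding, population initialization, a fitness function, selection, crossover, mutation, and a convergence criterion) can be illustrated with a minimal Python sketch. It is not the program used in the invention: the population size, probabilities, tournament selection, single-point crossover, fixed generation count, and the function names `run_ga`, `usage_frequency`, and `toy_fitness` are all assumptions made for illustration, and the toy fitness function merely stands in for a model-performance measure that is not specified in this section.

```python
import random

N_SENSORS = 18  # size of the Fox 4000 sensor array stated in the text


def run_ga(fitness, n_genes=N_SENSORS, pop_size=30, generations=50,
           p_cross=0.8, p_mut=0.02, rng=None):
    """Binary-coded GA; returns the best chromosome found (list of 0/1 genes)."""
    rng = rng or random.Random(0)
    # Parameter encoding and population initialization: 1 = sensor kept.
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    # Convergence criterion (assumed): stop after a fixed number of generations.
    for _ in range(generations):
        scores = [fitness(c) for c in pop]

        def pick():
            # Selection (assumed): binary tournament on fitness.
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return pop[a] if scores[a] >= scores[b] else pop[b]

        children = []
        while len(children) < pop_size:
            p1, p2 = pick()[:], pick()[:]
            # Crossover (assumed): single cut point.
            if rng.random() < p_cross:
                cut = rng.randrange(1, n_genes)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            # Mutation: independent bit flips.
            for child in (p1, p2):
                for i in range(n_genes):
                    if rng.random() < p_mut:
                        child[i] = 1 - child[i]
                children.append(child)
        pop = children[:pop_size]
        best = max(pop + [best], key=fitness)  # keep the best chromosome seen
    return best


def usage_frequency(fitness, rounds=3, runs_per_round=10):
    """Count how often each sensor appears in the best chromosome over repeated runs."""
    counts = [0] * N_SENSORS
    for r in range(rounds * runs_per_round):
        best = run_ga(fitness, rng=random.Random(r))
        counts = [c + g for c, g in zip(counts, best)]
    return counts


def toy_fitness(chrom):
    # Hypothetical stand-in: rewards sensors 0-6 and penalizes all others.
    return sum(chrom[:7]) - 0.5 * sum(chrom[7:])


counts = usage_frequency(toy_fitness)
```

Under these assumptions, sensors with the lowest counts in `counts` would be the candidates for elimination, which mirrors the frequency-of-use criterion described above.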

Brief description of the drawings

Fig. 1 PCA score plot (a) and Mahalanobis distance versus residual plot (b) before removal of abnormal samples.

Fig. 2 Signals of different samples at the characteristic response points of the electronic nose sensors.

Fig. 3 PCA score plot (a) and Mahalanobis distance versus residual plot (b) after removal of the abnormal sample LLJ.

Fig. 4 Electronic nose sensor response signal intensities for the four grades of Longjing tea.

Fig. 5 Mean electronic nose sensor responses for teas of different grades after removal of abnormal samples.

Fig. 6 Loading plots for the first four principal components after removal of abnormal samples.

Fig. 7 Relationship between the PRESS value and the number of principal components in the grade model.

Fig. 8 Flow chart of the genetic algorithm.

Fig. 9 Crossover operation.

Fig. 10 Mutation operation.

Fig. 11 Loading plot of the four grade samples on principal components 1 and 2.

Fig. 12 Mean sensor responses of the teas in the producing-area models.

Fig. 13 Principal component score plot (PC1-PC2) of the producing-area models.

Fig. 14 Loading plots of the producing-area models LHT-LMT (a) and LYJ-LWJ (b) on principal components 1 and 2.

Fig. 15 Mean sensor responses of the teas in the tree-variety model.

Fig. 16 Principal component score plot (PC1-PC2) of the tree-variety model.

Detailed description of the embodiments

1 Collection and handling of tea samples

The West Lake Longjing tea samples for the present invention were collected in 2011 from local tea growers in the West Lake Longjing producing region of Hangzhou and cover 4 grades, 2 tree varieties, and 5 producing areas. To make the tea samples easy to distinguish, each was given a systematic code; details are given in Table 1. To keep the quality of each tea sample consistent, the samples were stored in a cold store below -4 ℃, and for each experiment only the small bag needed for the measurement was taken out.

[Table 1: coding and information of the tea samples (image not reproduced)]

2 Electronic nose detection method

The present invention uses a Fox 4000 electronic nose with an automatic headspace sampler, manufactured by Alpha MOS (France). First, 1.00 g of dry Longjing tea is placed in a 20 mL headspace vial, 5 mL of ultrapure water at room temperature is added, and the vial is capped and sealed; every tea sample is prepared in this way and measured in turn. For each measurement, the headspace vial is first transferred to the preheating zone and heated for 900 s at a headspace temperature of 60 ℃ with an agitation speed of 500 rpm; 2.0 mL of headspace gas is then withdrawn and injected at 2.0 mL/s into the electronic nose sensor array chamber, which contains 18 metal oxide sensors. The gas adsorbs onto and desorbs from the semiconductor material on the surface of each of the 18 sensors, which changes the sensor resistance, so that different resistance values are produced at different times. The sample gas remains in the sensor array chamber for 120 s and is sampled every 0.5 s; the electronic nose software records each sampled value automatically.

3 Tea quality modeling method

The samples used to build the tea quality models (grade, producing-area, tree-variety models, and so on) are divided into a calibration set and a prediction set. For each model, two thirds of the samples are selected at random as the calibration set and the remaining third is used as the prediction set. The present invention builds qualitative discrimination models with SIMCA (soft independent modeling of class analogy, also called the similarity analysis method): a PCA model is built for each class of samples, and the SIMCA distance of an unknown sample to each class model is then calculated to determine which class it belongs to. All modeling calculations were performed with in-house MATLAB 7.0 programs.
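The random two-thirds split and the class-wise PCA modeling described above can be sketched in Python with NumPy as follows. This is a simplified illustration, not the MATLAB programs mentioned in the text: the distance used here is a plain root-mean-square residual ratio without the degrees-of-freedom corrections of full SIMCA, the data are synthetic, and the names `split_two_thirds`, `fit_class_pca`, `simca_distance`, and `classify` are hypothetical.

```python
import numpy as np


def split_two_thirds(n_samples, rng):
    """Randomly assign 2/3 of the sample indices to calibration, the rest to prediction."""
    idx = rng.permutation(n_samples)
    n_cal = (2 * n_samples) // 3
    return idx[:n_cal], idx[n_cal:]


def fit_class_pca(X, n_components):
    """PCA model of one class: mean, loadings, and typical residual size s0."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                  # loadings (variables x components)
    E = Xc - Xc @ P @ P.T                    # part of the data the model leaves unexplained
    s0 = np.sqrt((E ** 2).mean())            # simplified residual scale (assumption)
    return mean, P, s0


def simca_distance(x, model):
    """Residual of sample x relative to a class model, scaled by the class residual s0."""
    mean, P, s0 = model
    d = x - mean
    e = d - d @ P @ P.T
    return np.sqrt((e ** 2).mean()) / s0


def classify(x, models):
    """Assign x to the class whose PCA model gives the smallest scaled residual."""
    return min(models, key=lambda label: simca_distance(x, models[label]))


# Synthetic demonstration: two well-separated classes measured on 18 variables.
rng = np.random.default_rng(0)
data = {"A": rng.normal(0.0, 1.0, (30, 18)), "B": rng.normal(5.0, 1.0, (30, 18))}
models, held_out = {}, []
for label, X in data.items():
    cal, pred = split_two_thirds(len(X), rng)
    models[label] = fit_class_pca(X[cal], n_components=2)
    held_out += [(label, x) for x in X[pred]]
accuracy = np.mean([classify(x, models) == label for label, x in held_out])
```

The number of principal components per class is fixed at two here only for brevity; in the text it is chosen separately for each model.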

4 Analysis and removal of abnormal samples

4.1 Principles of abnormal sample analysis

When intelligent sensory signals are used for pattern recognition of tea quality, the reliability of every classification and recognition result depends first on the accuracy of the raw data, that is, on the reliability of the collected intelligent sensory signals and of the original classification information of the tea; the quality of the data set directly determines whether pattern discrimination succeeds. The presence of abnormal (outlier) samples affects, and may even change, the distribution trend of the overall data to some extent, and thus affects the accuracy of the calibration model.

An abnormal sample is not only one whose measured intelligent fingerprint or raw sample information deviates significantly from the true value; the term also covers a sample whose fingerprint differs significantly from the mean fingerprint of the samples in the modeling set. Abnormal samples can generally be divided into those with an abnormal fingerprint and those with abnormal raw tea information.

The main causes of an abnormal fingerprint are:

(1) changes in the measuring instrument and its performance parameters, such as changes in instrument energy, instrument noise, and band drift;

(2) changes in the measurement method, such as differences in sampling, measuring position, and measuring distance;

(3) changes in the measurement environment, such as changes in temperature and humidity;

(4) changes in other physical or mechanical properties of the sample, such as particle size, viscosity, and surface finish;

(5) changes in sample source, such as producing area, storage time, storage conditions, harvest period, and cultivation method, which make the sensor response resistivity or the intensity of certain characteristic peaks abnormal;

(6) errors such as spoiled or mixed-up samples;

(7) operating errors during intelligent sensory signal scanning.

The main sources of abnormal original tea quality information are:

(1) the reliability of the physicochemical instruments and methods used;

(2) changes in the sensory evaluation method;

(3) changes in sample source;

(4) errors by the tea assessor, such as errors during judging and during data entry.

If an abnormal sample results from an operating mistake or an instrument fault, it can be corrected simply by remeasuring once it has been found. If the abnormality comes from the sample itself, it cannot be corrected by remeasuring; whether the sample is reliable depends on how abnormal its sensor response is and on how well the model's predicted value fits this sample. Finding and effectively removing abnormal samples is therefore key to obtaining a reliable calibration model and reliable data analysis results.

4.2 Methods of abnormal sample analysis

The abnormal sample analysis method used in the present invention combines principal component analysis scores with the Mahalanobis distance method.

(1) Principal component analysis score plot method

Principal component analysis (PCA) is a data mining technique from multivariate statistics. It reduces the dimensionality of the data: without losing the main fingerprint information, a small number of new variables replace the larger number of original variables, which removes the overlapping part of the information they share. The large number of original fingerprint variables is transformed so that the few new variables are linear combinations of the original variables.

The principal component scores obtained from PCA reflect the similarity and distinctiveness of samples; each sample has a different score on each principal component. A score plot of the samples can reveal their internal structure and clustering and show whether a given sample differs greatly from the rest of its sample set, which provides a theoretical basis for abnormal sample analysis.

(2) Mahalanobis distance discrimination method

The Mahalanobis distance is an effective way of studying the similarity of vectors in a multidimensional space and is widely used in the qualitative analysis of fingerprints and in outlier discrimination. It is calculated from the response data (such as resistivity) of several sensors together. The Mahalanobis distance of a sample set is calculated in the following steps:

[Formula image not reproduced: calculation steps for the Mahalanobis distance of the sample set]

In the formula, ti is the fingerprint score of calibration sample i, and the mean matrix is taken over the m calibration samples; Tcen is the mean-centered matrix of T; M is the Mahalanobis matrix of the calibration samples; MDi is the Mahalanobis distance of calibration sample i. The Mahalanobis distance threshold for outliers is determined from the permissible error of the quantitative calibration and the corresponding Mahalanobis distances. After the spectral data have been standardized, the Mahalanobis distance of each sample is determined by the following formula:

The value hii measures the influence of a sample on the whole calibration sample set. In intelligent sensory sensor detection, hii expresses the degree to which sample i influences the regression model. If hii is too large, the regression model depends heavily on sample i, which is unfavorable for model stability; in other words, sample i may be an abnormal sample.
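Because the formula images are not available in this text, the following Python sketch uses the standard definition of the Mahalanobis distance, which is consistent with the symbols defined above but is an assumption rather than a transcription: the score matrix T is mean-centered to give Tcen, M is taken as Tcen'Tcen/(m-1), and MDi is the square root of (ti minus the mean) M^-1 (ti minus the mean)'. The function name `mahalanobis_distances` and the synthetic scores are hypothetical, and no threshold is applied because the text does not state its value.

```python
import numpy as np


def mahalanobis_distances(T):
    """Mahalanobis distance of each row of the score matrix T (m samples x k scores)."""
    T = np.asarray(T, dtype=float)
    m = T.shape[0]
    T_cen = T - T.mean(axis=0)                 # mean-centered score matrix
    M = T_cen.T @ T_cen / (m - 1)              # covariance ("Mahalanobis") matrix
    M_inv = np.linalg.inv(M)
    md2 = np.einsum("ij,jk,ik->i", T_cen, M_inv, T_cen)  # squared distance per sample
    return np.sqrt(md2)


# Synthetic scores for 50 samples on 3 components, with one planted outlier.
rng = np.random.default_rng(0)
T = rng.normal(size=(50, 3))
T[0] = [10.0, 10.0, 10.0]
md = mahalanobis_distances(T)
most_distant = int(np.argmax(md))
```

With this definition the squared distances of the calibration samples always sum to (m-1)k, which gives a quick consistency check on an implementation.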

4.3 Analysis and removal of abnormal samples

The principal components (score matrix) are linear combinations of the original variables and characterize the original variables with the minimum sum of squared errors. The first principal component explains the largest share of the variation in the original variables, the second the next largest, and so on, and the principal components are mutually orthogonal. There are many ways to compute principal components; here the nonlinear iterative partial least squares (NIPALS) algorithm with leave-one-out cross-validation is used. The principal component scores of the Longjing tea samples and the corresponding Mahalanobis distance versus residual results are shown in Fig. 1. The LLJ samples lie far from the other sample sets in the principal component plot and their Mahalanobis distances are also very large, so these LLJ samples are abnormal samples. Their sensor response pattern (Fig. 2) differs greatly from that of the other premium-grade samples, so they do not belong to the same grade. A check of the original sample collection records showed that these samples were not genuine premium-grade West Lake Longjing tea from the Longwu producing area of Hangzhou but Zhejiang Longjing tea; the abnormality was caused by an error in sample supply. After these abnormal points were removed, the principal component score and Mahalanobis distance analysis was repeated (Fig. 3). The remaining tea samples are evenly distributed in the principal component plot and show no abnormal Mahalanobis distances, so they are representative and can be used for subsequent model building and mathematical processing.
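The NIPALS algorithm named above extracts one principal component at a time by alternating between a loading estimate and a score estimate and then deflating the data. A minimal Python sketch follows; the starting vector, tolerance, iteration limit, and function name `nipals_pca` are assumptions, and the leave-one-out cross-validation used in the text to choose the number of components is not included.

```python
import numpy as np


def nipals_pca(X, n_components, tol=1e-10, max_iter=500):
    """Extract principal components one at a time with the NIPALS iteration."""
    X = np.asarray(X, dtype=float)
    E = X - X.mean(axis=0)                           # mean-centered working matrix
    scores, loadings = [], []
    for _ in range(n_components):
        t = E[:, np.argmax(E.var(axis=0))].copy()    # start from the highest-variance column
        for _ in range(max_iter):
            p = E.T @ t / (t @ t)                    # project the data onto the current score
            p /= np.linalg.norm(p)                   # normalize the loading vector
            t_new = E @ p                            # updated score vector
            converged = np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        E = E - np.outer(t, p)                       # deflate: remove the extracted component
        scores.append(t)
        loadings.append(p)
    return np.column_stack(scores), np.column_stack(loadings)


# Synthetic data with clearly different variances along each variable.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5, 0.2])
T, P = nipals_pca(X, n_components=2)
```

Up to the sign of each component, the result should agree with a PCA computed by singular value decomposition; NIPALS is attractive mainly when only the first few components are needed.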

To some extent, principal component scores reflect the similarity and distinctiveness of samples; each sample has a different score on each principal component. Fig. 3(a) is the scatter plot of the scores of the tea samples on the first two principal components. It shows the dispersion of and differences between the sample points: samples with identical or similar properties cluster together, and clearly different samples lie far apart. The plot shows that the second-grade tea differs greatly from the other teas and occupies its own region, whereas the premium-grade, special-grade, and first-grade teas are poorly separated and overlap considerably. This agrees with the sensor response curve analysis above. A score plot of the samples can reveal their internal structure and clustering and shows that the samples differ appreciably in their sensor responses, which provides a theoretical basis for using the electronic nose to classify teas of different grades. However, because the regions of the other grades overlap severely, the four kinds of samples can hardly be distinguished by visual inspection of the score plot.

The Mahalanobis distance-residual plot shows how strongly each sample point influences the corresponding principal component model, as determined by the sample's Mahalanobis distance and residual; a sample point with both a high Mahalanobis distance and a high residual is regarded as an outlier. The Mahalanobis distance is the distance from the projection of the sample point in the model to the model center. It expresses how much the sample differs from the other samples in the model and how strongly it influences the model: the larger the value, the greater the influence. The residual is the difference between the observed and fitted values of the sample point and expresses the amount of sample information the model cannot explain: the smaller the value, the better the fit. Fig. 3(b) shows that the residuals and Mahalanobis distances of the sample points are all small, indicating that the calibration samples chosen for each model are representative of the corresponding tea.
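The outlier criterion described above (high Mahalanobis distance together with a high residual) might be sketched as follows. The helper names and the two thresholds `d_limit` and `q_limit` are hypothetical; the patent does not state numerical limits, and the scores are assumed to be mean-centered and uncorrelated, as PCA scores are.

```python
import math


def mahalanobis_from_scores(scores):
    """Mahalanobis distance of each sample to the model center in score space.

    scores[n][i] is the score of sample i on component n. With mean-centered,
    uncorrelated scores the covariance matrix is diagonal, so the distance
    reduces to a variance-weighted Euclidean distance.
    """
    m = len(scores[0])
    variances = [sum(v * v for v in t) / (m - 1) for t in scores]
    return [
        math.sqrt(sum(t[i] ** 2 / var for t, var in zip(scores, variances)))
        for i in range(m)
    ]


def residual_sum_of_squares(E):
    """Sum of squared residuals of each sample (each row of residual matrix E)."""
    return [sum(e * e for e in row) for row in E]


def flag_outliers(distances, residuals, d_limit, q_limit):
    """Indices of samples whose distance AND residual both exceed the limits.

    d_limit and q_limit are hypothetical thresholds chosen by the analyst.
    """
    return [
        i for i, (d, q) in enumerate(zip(distances, residuals))
        if d > d_limit and q > q_limit
    ]
```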

The final numbers of experimental samples after outlier rejection are given in Table 2: there were 667 samples before rejection and 617 after. The main cause of the outliers is that the sample set contained data not belonging to the same population. Such abnormal data make predictions inaccurate, undermine the correctness of statistical inference and adversely affect the measurement results. The influence of outliers on the calibration model cannot be ignored. To ensure the validity of the model, outliers must be found and identified during data processing and removed from the sample set before any follow-up study.

[Table 2 (image in the original publication, not reproduced)]

Outlier sample points in the modeling set were searched for by PCA, the Mahalanobis distance plot and sensor response spectrum analysis. The results show that sensor response fingerprints are easily affected by external interference. Outliers cannot represent the true properties of the samples; their presence strongly affects, or even changes, the distribution of the overall data and has a large impact on modeling. Mathematically, an outlier is a sample far from the centroid in multivariate space. Most importantly, outliers represent properties that do not belong to the model and that a prediction set would not normally contain, so they reduce the predictive ability and robustness of the model. If outliers are not analyzed and rejected, neither fingerprint preprocessing nor other modeling methods can do much to improve the model, so outlier rejection is a problem every modeler must consider.

5 Establishment of the grade model for Xihu Longjing tea

5.1 Division of grade-model samples into calibration and prediction sets

After outlier rejection, 617 samples remained for tea grade discrimination. Two thirds were selected at random as the calibration set and the remaining third was used as the prediction set, so that the calibration set is representative while the prediction range of the model is widened and its adaptability strengthened. The sample distribution is shown in Table 3.

[Table 3 (image in the original publication, not reproduced)]
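A random two-thirds / one-third division of this kind could look like the sketch below. The function name, the fixed seed and the rounding rule for the calibration-set size are assumptions, and the split is done over all samples at once, so the per-grade counts in Table 3 may differ from what this produces.

```python
import random


def split_calibration_prediction(n_samples, seed=0):
    """Randomly assign about 2/3 of the sample indices to calibration."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_cal = round(2 * n_samples / 3)   # assumed rounding rule
    return sorted(indices[:n_cal]), sorted(indices[n_cal:])


calibration, prediction = split_calibration_prediction(617)
```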

5.2 Electronic nose response spectra of teas of different grades

Fig. 4 shows the response curves of the resistance ratio (the change in resistance relative to the initial resistance) of the 18 sensors during tea aroma detection. Each curve corresponds to one sensor, giving 18 curves in total. Each point on a curve represents the relative resistance change at that time as the volatiles of the tea infusion pass through the sensor channel. Depending on the sensing principle, the response may be positive or negative: curves below the horizontal axis belong to the LY-type sensors and those above it to the T- and P-type sensors. As Fig. 4 shows, early in the acquisition the volatile substances in the sample accumulate strongly on the sensor surface, so the curves change quickly and the absolute slopes are large. When adsorption of the volatiles on the sensor reaches equilibrium, the absolute sensor response is at its maximum and best reflects the properties of the sample gas. As acquisition continues, the gas concentration gradually falls, the sensor response decreases, and the curves level off to a relatively stable state. The spectra of premium and special grade are very close, and second grade differs most from the other grades. The first-grade spectrum resembles those of premium and special grade but its response range differs; the higher the grade of the sample, the larger its absolute response. The electronic nose therefore responds clearly to the aroma components of the tea infusion, which shows that measuring tea quality with an electronic nose is feasible.

To compare the differences between samples intuitively from response curves recorded over 120 s, a characteristic response point is needed that represents the characteristic response intensity of each sensor to a given sample. The peak or trough of a response curve has a lower relative standard deviation (RSD) for the same sample and usually gives the greatest discrimination between different samples. The point of maximum absolute sensor response, that is, the peak or valley of the sensor response intensity curve, is therefore chosen as the feature point. To analyze differences in tea quality between grades, production areas and cultivars, Fig. 2 shows the response signals at each sensor's peak or trough for the different teas measured on one day (numbered LLJ, LWJ, LYJ, LHT, LMT, QWJ, QHJ, QLJ, QYT, QMT, 1 and 2). Fig. 2 shows that each sensor responds differently to tea aroma. For the LY-type sensors the amplitude fluctuates markedly with tea quality and the samples are clearly separated, whereas the response curves of the T- and P-type sensors are less dispersed. The red curve of the second-grade sample is clearly separated from the others. The first-grade curve is not clearly separated from the premium- and special-grade curves, but the premium- and special-grade curves both lie between those of first and second grade. The differences in the characteristic responses of the sensor array thus reflect, to some extent, the quality differences of West Lake Longjing tea and have a characteristic, fingerprint-like nature, which provides a mathematical basis for tea classification and discrimination.
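Choosing the point of maximum absolute response as the feature of each sensor can be illustrated in a few lines. The function names are hypothetical, and each curve is assumed to be a list of relative resistance changes sampled over the acquisition period.

```python
def characteristic_response(curve):
    """Signed response value at the point of maximum absolute response."""
    return max(curve, key=abs)


def feature_vector(curves):
    """One characteristic response per sensor curve (18 values for 18 sensors)."""
    return [characteristic_response(curve) for curve in curves]
```

Keeping the sign preserves the distinction between sensors that respond negatively (troughs) and those that respond positively (peaks).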

Fig. 5 shows the mean response of each tea grade. The response of second-grade tea clearly differs from that of the other grades. The response spectra of premium, special and first grade are very similar and differ appreciably only at sensors such as LY2/G, LY2/AA, LY2/gCTL and P30/2. These differences in sensor response are the basis of the subsequent mathematical modeling.

5.3 Trend of the principal component scores of all grade samples

Principal component analysis (PCA) is applied to the data matrix formed by the odor feature parameters of the tea samples of different grades. The principal component model is:

$$A_{m \times p} = T_{m \times f}\, P_{f \times p} + E$$

where $A_{m \times p}$ is the spectrum matrix, $T_{m \times f}$ is the score matrix, $P_{f \times p}$ is the loading matrix, and $E$ is the spectrum residual, with the same dimensions as $A_{m \times p}$; $m$ is the number of samples, $p$ is the number of sensors, and $f$ is the number of principal components.

For each measured value $a_{ij}$ in the matrix $A_{m \times p}$, the principal component decomposition can be written as $a_{ij} = \sum_{n=1}^{f} t_{in}\, p_{nj} + e_{ij}$, where $t_{in}$ is the score of sample $i$ on the $n$-th principal component, $p_{nj}$ is the loading of sensor $j$ on the $n$-th principal component, and $e_{ij}$ is the residual of variable $j$ for sample $i$.

PCA was carried out with leave-one-out cross-validation. Table 4 lists the cumulative contribution rates of the principal components for the tea samples of all grades. The first principal component has a contribution rate of 93% and represents most of the sample information in the raw data; the first four principal components represent 99% of the sensor information. By the properties of principal components, the first four principal components can therefore characterize the structure of the electronic nose intelligent sensory data, which reduces the data dimension and simplifies the data. With the first four principal components selected for modeling, the data matrix is reduced from the original 617 × 18 to 617 × 4 (4 principal components).

[Table 4 (image in the original publication, not reproduced)]

5.4 Principal component loading analysis of all grade samples
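The cumulative contribution rate and the number of components needed to reach a target can be computed as in the sketch below. The function names are assumptions, and the variance list in the usage line is illustrative only: apart from the 93% and 99% figures quoted in the text, the values are made up and are not the entries of Table 4.

```python
def cumulative_contribution(variances):
    """Cumulative fraction of total variance explained by the leading components."""
    total = sum(variances)
    result, running = [], 0.0
    for value in variances:
        running += value
        result.append(running / total)
    return result


def components_needed(variances, threshold):
    """Smallest number of leading components whose cumulative rate reaches threshold."""
    for n, rate in enumerate(cumulative_contribution(variances), start=1):
        if rate >= threshold - 1e-12:   # small slack for floating-point rounding
            return n
    return len(variances)


# Illustrative variances only (not taken from Table 4).
n_components = components_needed([93.0, 3.5, 1.5, 1.0, 1.0], 0.99)
```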

In principal component analysis, the score of sample $i$ on the $n$-th principal component is computed as:

$$t_{in} = \sum_{j=1}^{p} a_{ij}\, p_{nj}$$

where $p_{nj}$ is called the loading of the variable $a_{ij}$. The larger the loading, the stronger the correlation between the principal component and that variable; the variable $a_{ij}$ corresponds to the response of the $j$-th sensor in the sensor response matrix. After PCA of the sensor response signals of teas of different grades, the first four principal components cumulatively account for 99% of the variation in the tea intelligent fingerprint. Fig. 6 plots the loadings of the first four principal components against the sensors and shows the relationship between each principal component and the sensors.

Fig. 6 shows that for PC1 (93%), which carries the most tea information, the largest loadings belong mainly to the four sensors LY2/G, LY2/AA, LY2/GH and LY2/gCTL. For the second principal component, besides sensor LY2/AA, the sensors P10/1, P10/2, P40/1, T40/1 and TA/2 also correlate strongly. For the third principal component, sensors LY2/LG, LY2/G, LY2/AA and LY2/GH correlate strongly; for the fourth, sensors LY2/AA, T30/1, T70/1 and T40/1 do.

5.5 Selection of the number of principal components for SIMCA grade modeling

The similarity classification method (SIMCA, soft independent modeling of class analogy) first builds a PCA model for each class of samples so that samples of the same class gather in the same region of space. Table 5 gives the contribution rates of the principal component model of each grade for different numbers of principal components. The contribution rate of the first principal component exceeds 99% for every grade, and for almost all grades the first five principal components represent the main information of the samples.

[Table 5 (image in the original publication, not reproduced)]

The SIMCA algorithm is based on building a PCA model for each class. The change in the principal components of the sensor response signals shows the trend of the tea quality features very directly, and determining the number of principal components is the key to building a good model. Because SIMCA is concerned with the similarity within each grade and each principal component represents the variation among calibration samples of the same grade, the earlier principal components contain richer grade features and contribute more to classification. Selecting the first several principal components can therefore give the best classification, and the more grade features the selected principal components contain, the better the modeling and prediction.

However, selecting too many principal components causes the model to overfit. In this invention the optimal number of principal components for each grade model is first determined by cross-validation: when the prediction residual sum of squares (PRESS) changes little, the smaller number of principal components is chosen. PRESS decreases as principal components are added, but once their number exceeds a certain value PRESS rises again because of overfitting. Fig. 7 shows the relationship between PRESS and the number of principal components for each grade model. Because the PRESS values of premium and special grade at one and two principal components are very large, they are not drawn in the figure. For premium grade, PRESS is lowest at 9 principal components and changes little between 5 and 8; for special grade, PRESS is lowest at 7 and changes little at 5 and 6; for first grade, PRESS is lowest at 6 and changes little at 4 and 5; for second grade, PRESS is lowest at 6 and changes little at 4 and 5.

5.6 Establishment and prediction of the SIMCA tea grade model
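The rule of preferring fewer principal components when PRESS changes little can be expressed as a small selection function. This is only one possible reading of the rule: the function name and the relative tolerance `rel_tol` are assumptions, since the text does not quantify how small a change counts as "little".

```python
def choose_n_components(press, rel_tol=0.05):
    """Smallest number of components whose PRESS is within rel_tol of the minimum.

    press[k] is the PRESS value obtained with k + 1 principal components.
    rel_tol is a hypothetical tolerance; the text gives no numerical value.
    """
    best = min(press)
    for n, value in enumerate(press, start=1):
        if value <= best * (1.0 + rel_tol):
            return n
    return len(press)
```

With `rel_tol=0` the function simply returns the position of the PRESS minimum; a positive tolerance trades a slightly higher PRESS for a simpler model.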

The predictive performance of the SIMCA grade model is very important and lies mainly in whether the model can be applied to new data. A good model can describe data similar to the modeling data. Validation means feeding new, similar data into the model and checking whether the prediction error meets the preset requirement, which in turn confirms that the selected number of principal components is reasonable.

Validation is of two kinds. External validation uses completely new prediction data to test the model; internal validation uses the modeling data themselves. In theory, the predictive ability of a model can only be tested with completely new data, but cross-validation can also give reasonable results.

When few samples are available, cross-validation uses them more efficiently, although it is computationally slower than external validation. In cross-validation the same samples are used both to build and to test the model. The basic idea is as follows: a certain number of samples are set aside from the calibration set, a calibration model is built with the remaining samples, and the set-aside samples are fed into the model to obtain the prediction error. This is repeated until every sample has been set aside once for prediction, and the prediction errors of the repeated modeling are then used to calculate the overall residual variance and mean square error. Cross-validation is a very good internal validation method. Like external validation, it aims to test the model with independent data; its main advantage is that, unlike external validation, it does not reserve data solely for testing and so does not waste data.

Cross-validation can be further divided into full cross-validation, segmented cross-validation and other variants. Full cross-validation is the earliest form: in each round only one sample is set aside from the whole sample set as the prediction sample and the others are used for modeling, and this is repeated until every sample has been set aside once to test the model. Because full cross-validation is time-consuming and slow, segmented cross-validation instead divides all samples into several segments and validates segment by segment.
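The difference between full and segmented cross-validation comes down to how the samples are set aside in each round, which the sketch below makes explicit. The function name and the interleaved assignment of samples to segments are assumptions; the model fitting itself is left out.

```python
def cross_validation_folds(n_samples, n_segments=None):
    """Return a list of (training indices, held-out indices) pairs.

    n_segments=None gives full (leave-one-out) cross-validation; any other
    value gives segmented cross-validation with that many segments, using an
    interleaved assignment of samples to segments.
    """
    if n_segments is None:
        n_segments = n_samples
    folds = []
    for k in range(n_segments):
        held_out = list(range(k, n_samples, n_segments))
        excluded = set(held_out)
        training = [i for i in range(n_samples) if i not in excluded]
        folds.append((training, held_out))
    return folds
```

In both variants every sample is held out exactly once, so the prediction errors collected over all rounds cover the whole calibration set.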

Nevertheless, full cross-validation is widely used because it works well. First, it can estimate the actual predictive ability of the model: although it is an internal validation, the predicted sample does not take part in modeling, so prediction of unknown samples is simulated. Second, the more samples the calibration set contains and the fewer samples are left out in each round, the better the estimate.

The predictive ability of a model is usually tested by full cross-validation on the calibration set and by external prediction on the prediction set. Full cross-validation measures the model's predictive ability on the calibration set and is a self-check; external prediction measures the model's predictive ability on the prediction set. Full cross-validation generally gives better performance figures than external prediction. It indicates, to some extent, the classification ability of the model and its chosen parameters, whereas external prediction is the more telling index because it reflects the robustness and adaptability of the feature variables and the model. Table 6 gives the results of SIMCA calibration modeling for different combinations of the four grades.

[Table 6 (image in the original publication, not reproduced)]

Table 6 shows that the recognition rate of the four-grade model only slightly exceeds 70%, which is not very high. The main reason is that the aroma features of premium- and special-grade tea are very close, which degrades the predictive performance of the overall model. A model discriminating only these two grades also reaches a recognition rate of only about 67%, showing that the samples of the two grades overlap seriously. The reason is that premium and special grade are distinguished mainly from a commercial tea-evaluation standpoint: they differ very little in aroma, taste and plucking time, and differ mainly in appearance, such as the evenness and size uniformity of the leaves. Tea that is even and free of broken pieces is graded premium, while the other pre-Qingming tea is graded special. The odor features of premium and special grade are therefore very close.

To study the detection ability of the electronic nose further, the premium- and special-grade samples of the four-grade set were merged into a single grade called "premium-special", and a three-grade SIMCA discrimination model was built together with first and second grade. This model performs very well: the recognition rates of the calibration and prediction sets reach 93.43% and 92.72% respectively, both above 92%. Three-grade SIMCA discrimination models were also built separately for premium, first and second grade and for special, first and second grade. These three-grade models have strong recognition ability, with recognition rates all above 90%, which confirms that the poor performance of the four-grade model is caused by the overlap of premium- and special-grade sample information. In addition, SIMCA pattern recognition builds the tea grade models by essentially linear discrimination, and tea recognition has not yet reached a 100% recognition rate. This may be because storage time, storage conditions and the characteristics of the sensor response introduce nonlinear information into the signals, so nonlinear pattern recognition methods could be tried in future work. At present these three-grade models basically meet the needs of market testing.

In the PCA score plot of Fig. 3(a), the second-grade sample set is the most widely separated from the other sample sets and can be distinguished clearly by eye. A two-class SIMCA discrimination model of first and second grade gives recognition rates of 100% for both the calibration and prediction sets, showing that the first- and second-grade samples differ greatly and that this model is fully suitable for application.

6 Sensor selection methods for intelligent sensory fingerprint features

The response performance of a sensor in the electronic nose mainly covers two aspects: whether the same sensor responds stably to the same sample, and whether it discriminates well between different samples.

In the combinatorial optimization method, the electronic nose collects odor response data from samples of different quality. The sensors are first screened and grouped by response performance through analysis of variance of their response values. The grouped sensors are then permuted and combined, and the effective sensor array for classifying the samples is finally determined from the discriminant index (DI) of the PCA results. Although this method does use combination to some extent, it combines grouped sensor arrays only after a preliminary rejection and so does not achieve a globally optimal combination.

In the loading method, the sensors are taken as the objects of analysis. PCA is applied to the sensor responses under different samples, and sensors with similar functions are identified from the PCA plot (the loading plot of the sensors) and rejected. Although this method avoids including redundant sensors, it does not analyze the response performance of the selected sensors, namely the repeatability of a sensor's response to the same sample and the difference in its response to different samples.

The genetic algorithm (GA) is an optimization method built on Darwin's theory of evolution by natural selection and survival of the fittest, which simulates heredity and evolution in the living world. It is derivative-free, performs stochastic global optimization, avoids being trapped in local minima and is easy to implement. Its basic idea is to treat each feasible solution (a particular sensor combination) in the problem domain (the set of all sensor combinations) as an individual, or chromosome, of a population and to encode each individual as a binary string. The genetic algorithm judges the quality of a chromosome by its fitness value: a chromosome with a large fitness value has a high probability of being selected, one with a small fitness value has a low probability, and the selected chromosomes enter the next generation. Chromosomes in the next generation produce new chromosomes, the offspring, through genetic operations such as crossover and mutation. After a number of generations the algorithm converges to the best chromosome, which is the optimal or near-optimal solution of the problem, that is, the selected optimal sensor array. Implementing a genetic algorithm involves five basic elements: parameter encoding and variable selection, population initialization, fitness function design, genetic operation design, and the convergence criterion. The genetic operation, an important step, comprises three operators: selection, crossover and mutation. The procedure is shown in Fig. 8.

The present invention uses a genetic algorithm to select and optimize the sensors used in building the grade, production-area and cultivar models. All genetic algorithm calculations were performed with a self-written MATLAB 7.0 program, and the key parameters are listed in Table 7. The steps of the algorithm are as follows:

(1) Choose suitable parameters: the population size is 40, the crossover probability pc is 0.6, the mutation probability pm is 0.1, and the terminating generation T of the genetic algorithm is 200.

(2) Set k = 0 and randomly generate the initial population (the expression is given as an image in the original publication).

(3) Chromosome encoding: all sensors are binary-encoded, each sensor being one gene (18 genes in total). A gene coded 1 means the sensor is included in modeling; a gene coded 0 means it is excluded. One coding combination is called a chromosome.

(4) Determine the fitness function F(k): cross-validation is used to evaluate the predictive ability of the model, and the recognition rate of the model is to be maximized (the expression of the fitness function is given as an image in the original publication).

(5) Chromosome selection: the commonly used roulette-wheel method passes the information of parent chromosomes with large fitness values on to the next generation.

(6) Chromosome crossover: single-point crossover is used. A number of chromosome pairs are selected at random as parents according to the preset crossover probability pc; a crossover point is then chosen at random and the gene strings to the right of the crossover point are exchanged between the parents to produce new offspring; finally the offspring chromosomes replace the parent chromosomes to form the new population (see Fig. 9). Crossover is the main way of producing new individuals and determines the global search ability of the genetic algorithm.

(7) Chromosome mutation: basic bit mutation is used. Genes of a chromosome are changed with the preset probability pm, 1 and 0 being swapped, and the mutated offspring chromosome replaces the parent (see Fig. 10). Mutating the individuals obtained after crossover gives the next-generation population (the expression is given as an image in the original publication). Mutation is an auxiliary way of producing new individuals; it prevents premature convergence and improves the local search ability of the algorithm.

(8) Stopping criterion: check whether the preset maximum number of generations (Genmax) or the optimal solution has been reached. If so, stop; otherwise return to step (4).

[Table 7 (image in the original publication, not reproduced)]

6.1 Sensor selection in the grade model
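Steps (1) to (8) can be sketched in Python as below. This is a hedged illustration, not the self-written MATLAB program of the patent: `toy_fitness` is a hypothetical stand-in for the cross-validated recognition rate, its target mask is arbitrary, the stopping rule is the generation limit only, and remembering the best chromosome seen so far is bookkeeping added for the example.

```python
import random

N_SENSORS = 18   # one gene per sensor
POP_SIZE = 40    # population size
P_CROSS = 0.6    # crossover probability pc
P_MUT = 0.1      # mutation probability pm
N_GEN = 200      # terminating generation T


def toy_fitness(chrom):
    """Hypothetical fitness; the patent uses a cross-validated recognition rate."""
    target = [1, 0] * (N_SENSORS // 2)   # arbitrary mask, for illustration only
    return sum(g == t for g, t in zip(chrom, target)) / N_SENSORS


def roulette(pop, fits, rng):
    """Pick one chromosome with probability proportional to its fitness."""
    total = sum(fits)
    if total == 0:
        return rng.choice(pop)[:]
    r = rng.uniform(0, total)
    acc = 0.0
    for chrom, fit in zip(pop, fits):
        acc += fit
        if acc >= r:
            return chrom[:]
    return pop[-1][:]


def run_ga(fitness, seed=0):
    """Run the GA for N_GEN generations and return the best chromosome evaluated."""
    rng = random.Random(seed)
    # Step (2): random initial population of binary chromosomes, step (3) coding.
    pop = [[rng.randint(0, 1) for _ in range(N_SENSORS)] for _ in range(POP_SIZE)]
    best, best_fit = pop[0][:], fitness(pop[0])
    for _ in range(N_GEN):
        fits = [fitness(c) for c in pop]                 # step (4): fitness
        for chrom, fit in zip(pop, fits):                # remember best so far
            if fit > best_fit:
                best, best_fit = chrom[:], fit
        pop = [roulette(pop, fits, rng) for _ in range(POP_SIZE)]   # step (5)
        for i in range(0, POP_SIZE - 1, 2):              # step (6): single-point crossover
            if rng.random() < P_CROSS:
                cut = rng.randint(1, N_SENSORS - 1)
                a, b = pop[i], pop[i + 1]
                pop[i], pop[i + 1] = a[:cut] + b[cut:], b[:cut] + a[cut:]
        # Step (7): bit mutation, each gene flipped with probability pm.
        pop = [[1 - g if rng.random() < P_MUT else g for g in c] for c in pop]
    return best, best_fit
```

A gene value of 1 in the returned chromosome means the corresponding sensor is kept for modeling. In a real application `fitness` would build the SIMCA model with the selected sensors and return its cross-validated recognition rate, which is far more expensive than the toy function used here.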

After three rounds of the genetic algorithm on the sensor response spectra of the grade model, the three sensors LY2/LG, P40/1 and TA/2 were found to be used least often in every round and were therefore rejected. The remaining 15 sensors (LY2/G, LY2/AA, LY2/GH, LY2/gCTL, LY2/gCT, T30/1, P10/1, P10/2, T70/2, PA/2, P30/1, P40/2, P30/2, T40/2 and T40/1) were used to build the models for the different grades; the modeling results before and after sensor rejection are given in Table 8. For the first/second-grade model, the samples themselves differ so much that the recognition rate stays at 100% after sensor rejection. For premium and special grade, whose samples differ very little, the model performance hardly changes: the calibration set stays above 67% and the prediction set also changes little. For the model of all four grades (premium, special, first and second), the recognition rate of the prediction set is unchanged at about 70% after sensor rejection. In the three-class model of premium, first and second grade, the recognition rates of the calibration and prediction sets rise from 92.11% and 90.65% to 92.83% and 92.09% respectively. The predictive performance of the special, first and second grade model falls slightly but remains very close, at about 95% both before and after sensor rejection. The prediction recognition rate of the premium-special, first and second grade model also rises, from 92.73% with all sensors to 93.20% with 15 sensors. The performance of the grade discrimination models is thus not reduced by sensor selection and in some cases even improves, while the number of sensors is reduced.

[Table 8 (image in the original publication, not reproduced)]
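Counting how often each sensor is used across genetic algorithm runs, and picking the least-used ones for rejection, might look like the following. The function names are hypothetical and the text does not describe how the usage frequency was actually tallied; the sketch assumes one binary chromosome per run or per retained solution.

```python
def usage_frequency(chromosomes, names):
    """Count how many chromosomes include each sensor (gene value 1)."""
    counts = {name: 0 for name in names}
    for chrom in chromosomes:
        for name, gene in zip(names, chrom):
            counts[name] += gene
    return counts


def least_used(counts, k):
    """Names of the k sensors with the lowest usage counts."""
    return sorted(counts, key=lambda name: counts[name])[:k]
```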

To study further why these sensors were rejected, the response performance of the electronic nose sensors was analyzed in detail. Response performance is measured mainly by whether a sensor responds consistently to samples of the same class and whether it discriminates well between samples of different classes. Following the principle of analysis of variance, each sensor is treated as a factor and the responses to the different samples as its levels; a test of homogeneity of variance is carried out to ensure that the data satisfy the conditions for analysis of variance. One-way analysis of variance was performed with the SPSS data analysis software on the sensor data of all grade samples to calculate the F values (Table 9). The F value indicates how well a sensor distinguishes samples of different classes: the larger the F value, the greater the discrimination.

[Table 9 (image in the original publication, not reproduced)]
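The F value of a one-way analysis of variance for a single sensor can be computed directly from its responses grouped by grade. The sketch below uses plain Python in place of SPSS; the function name is an assumption, and the homogeneity-of-variance test mentioned in the text is not included.

```python
def one_way_f(groups):
    """F statistic of a one-way ANOVA; groups is a list of lists of responses."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of the responses around their group mean.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```

A sensor whose responses vary little within a grade but differ strongly between grades gives a large F value, which is the sense in which a larger F means better discrimination.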

Although the F values of all sensors exceed F_0.05 = 2.60, that is, every sensor discriminates significantly between the four grades, a comparison of the F values shows that those of LY2/LG, TA/2 and T40/1 are all below 25, whereas that of P10/1, the fourth smallest, is more than five times those of these three sensors. LY2/LG has the smallest F value, only 8.003, so this sensor is rejected.

In the loading plot of the four-grade sample data after PCA (Fig. 11), TA/2 and T40/1 lie close together and therefore play similar roles, but the loading of TA/2 on PC2 is lower than that of T40/1, so sensor TA/2 can be rejected. By the same principle, sensors P40/1 and P10/1 almost overlap in the loading plot, and on the basis of the combinatorial optimization method sensor P40/1 is finally rejected.

6.2 Sensor selection, optimization and screening analysis in the production-area model

(1) Division of production-area model samples into calibration and prediction sets

To ensure that the production-area models are comparable, they are built mainly for teas of the same grade and the same cultivar from different production areas. The 617 collected tea samples give the following four production-area models: (1) the Hupao Houshan (LHT) and Meijiawu (LMT) model for special-grade tea of the Longjing 43 cultivar; (2) the Yangmeiling (QYT) and Meijiawu (QMT) model for special-grade tea of the population cultivar; (3) the Yangmeiling (LYJ) and Wengjiashan (LWJ) model for premium-grade tea of the Longjing 43 cultivar; (4) the Hupao Houshan (QHJ), Longwu (QLJ) and Wengjiashan (QWJ) model for premium-grade tea of the population cultivar. For each model, two thirds of the samples are selected at random as the calibration set and the remaining third is used as the prediction set; the sample distribution is shown in Table 10.

Figure 2013103233189100002DEST_PATH_IMAGE018
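The random two-thirds / one-third division described above can be sketched as follows. This is a minimal illustration in Python, not the procedure actually used in the study: the function name split_model, the fixed random seed and the choice to split each producing area separately are assumptions made here, and the sample identifiers are placeholders.

```python
import random


def split_model(samples_by_area, seed=0):
    """Randomly assign 2/3 of each area's samples to calibration, 1/3 to prediction.

    samples_by_area maps a producing-area code (e.g. "QYT") to a list of samples.
    Splitting each area separately is an assumption of this sketch.
    """
    rng = random.Random(seed)
    calibration, prediction = [], []
    for area, samples in samples_by_area.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        n_cal = round(len(shuffled) * 2 / 3)
        calibration += [(area, s) for s in shuffled[:n_cal]]
        prediction += [(area, s) for s in shuffled[n_cal:]]
    return calibration, prediction


# Placeholder sample identifiers, not data from Table 10.
example = {"QYT": list(range(30)), "QMT": list(range(100, 130))}
cal_set, pred_set = split_model(example)
```

Fixing the seed only makes the illustration repeatable; any source of randomness would serve.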

(2) Electronic-nose response spectra and principal component analysis (PCA) of the producing-area models

Figure 12 shows the average response spectra of each of the four producing-area models. As the figure shows, the spectra within model LHT-LMT and within model QYT-QMT differ greatly; the spectra in model LYJ-LWJ differ markedly at sensors LY2/G, LY2/AA, LY2/GH, LY2/gCTL and P30/2; and in model QHJ-QLJ-QWJ the average fingerprints of the three producing areas differ very little.

The principal component score plot (Figure 13) likewise shows that in model QYT-QMT the samples of each producing area occupy clearly separate regions, and the difference between the samples of the two areas is the largest. In model LHT-LMT the samples of each producing area also occupy their own regions, but there is no clear boundary between the two areas. In model LYJ-LWJ the two producing areas not only lack a clear boundary but also intersect and overlap. In model QHJ-QLJ-QWJ the samples intersect so heavily that separate producing-area classes can hardly be formed.

(3) Selection of the sensors in the producing-area models

After three rounds of the genetic algorithm applied separately to the sensors of producing-area model QYT-QMT, seven sensors are selected, namely LY2/G, LY2/AA, T30/1, P10/1, P40/1, T70/2 and PA/2, and the 11 sensors used least frequently are rejected, namely LY2/LG, LY2/GH, LY2/gCTL, LY2/gCT, P10/2, P30/1, P40/2, P30/2, T40/2, T40/1 and TA/2. The selected sensors are used for producing-area discrimination, with the results shown in Table 11. After the 11 sensors are rejected, the prediction performance of the producing-area model for superfine-grade tea of the population cultivar from Yangmeiling and Meijiawu remains 100%, and the number of principal components for the two producing areas falls from 5 and 6 to 2 each, which simplifies the model and greatly reduces the number of sensors. From the average fingerprints and the PCA plot of this model it can be inferred that the samples of the two producing areas differ greatly, so that every sensor performs well; the selection therefore only reduces the number of sensors needed for modeling as far as possible while keeping model performance unchanged. Here, seven sensors suffice to build a good model for superfine-grade tea of the population cultivar from Yangmeiling and Meijiawu.

After three rounds of the genetic algorithm applied to the sensor response spectra of model LHT-LMT, model LYJ-LWJ and model QHJ-QLJ-QWJ respectively, the four sensors LY2/LG, PA/2, P30/1 and TA/2 are found to be used least frequently in every genetic run, so these four sensors are rejected. The remaining 14 sensors, LY2/G, LY2/AA, LY2/GH, LY2/gCTL, LY2/gCT, T30/1, P10/1, P10/2, P40/1, T70/2, P40/2, P30/2, T40/2 and T40/1, are used to build the producing-area models; the modeling results before and after sensor rejection are shown in Table 11.
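The idea of counting how often each sensor is used over three genetic-algorithm runs can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of the patent: the text gives neither the fitness function nor the GA parameters, so the binary-mask encoding, truncation selection, one-point crossover, mutation rate, population size and the definition of usage frequency (share of final-population individuals that include a sensor) are all choices made here. The fitness argument is a placeholder for whatever discrimination score the modeling method provides.

```python
import random


def run_ga(n_sensors, fitness, rng, pop_size=20, generations=30, mutation_rate=0.05):
    """One GA run over boolean sensor masks; returns the final population."""
    pop = [[rng.random() < 0.5 for _ in range(n_sensors)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]            # truncation selection (assumed)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_sensors)        # one-point crossover (assumed)
            child = a[:cut] + b[cut:]
            # Flip each gene with a small probability.
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = parents + children
    return pop


def usage_frequency(n_sensors, fitness, rounds=3, seed=0):
    """Fraction of final-population masks that use each sensor, pooled over all rounds."""
    rng = random.Random(seed)
    counts = [0] * n_sensors
    total = 0
    for _ in range(rounds):
        for mask in run_ga(n_sensors, fitness, rng):
            total += 1
            for i, used in enumerate(mask):
                counts[i] += int(used)
    return [c / total for c in counts]
```

Sensors with the lowest pooled frequency would then be the candidates for rejection, to be confirmed by rebuilding the model without them.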

For model QHJ-QLJ-QWJ (premium-grade tea of the population cultivar from Hupao Houshan, Longwu and Wengjiashan), whose full-spectrum modeling performance was itself unsatisfactory, the overall recognition rates of the calibration set and the prediction set rise after sensor selection from 71.43% and 67.35% to 79.59% and 69.39%, respectively. For model LYJ-LWJ, the calibration-set recognition rate stays at 93.85% before and after sensor rejection, while the prediction-set recognition rate rises from 87.88% to 90.91%. After the number of sensors is reduced to 14, the prediction rate of model LHT-LMT no longer reaches the original 100% but is still 96.97%, which remains above 95% and fully meets application requirements.

Figure 2013103233189100002DEST_PATH_IMAGE019

For these models, the principle of analysis of variance is applied: each sensor is treated as a factor and the responses of the different samples as its levels, and a homogeneity-of-variance test is carried out. Table 12 gives the F-test results of producing-area models LHT-LMT and LYJ-LWJ for sensors LY2/LG and TA/2. Judged against the critical value F_0.05 = 3.84 for these two models, the two sensors do not discriminate the producing areas significantly, so they can be rejected in these two models.
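The F-test screening described above can be illustrated with a short sketch. It is a minimal one-way analysis of variance in Python, assuming that the responses of one sensor are grouped by producing area and that the resulting F statistic is compared with the critical value 3.84 quoted in the text. The function names are hypothetical, and no values from Table 12 are reproduced.

```python
def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups (each a list of numbers)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def discriminates(groups, f_critical=3.84):
    """True if the sensor's F statistic exceeds the given critical value."""
    return one_way_f(groups) > f_critical
```

A sensor for which discriminates returns False for a given model would be a candidate for rejection in that model.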

Meanwhile, in the loading plot (Figure 14(a)) obtained by principal component analysis (PCA) of the sample data of producing-area model LHT-LMT, PA/2 and P30/1 each play a role close to that of the other two sensors in the red-marked and blue-marked groups, respectively; by combinatorial optimization, these two sensors are rejected in model LHT-LMT. By the same principle, in the loading plot of model LYJ-LWJ, PA/2 and T70/2 lie close together and play similar roles, and P30/1 and P40/2 are very close, with similar loading values, and likewise play similar roles; these two sensors (PA/2 and P30/1) are therefore also rejected.
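The search for sensors that nearly coincide in a loading plot can be sketched as follows. This is a minimal illustration in Python, assuming that each sensor is represented by its (PC1, PC2) loading pair and that closeness is measured by Euclidean distance with a user-chosen threshold; neither the distance measure nor a threshold is stated in the text. The loading values below are invented for illustration and are not read from Figure 14.

```python
from itertools import combinations
from math import dist


def close_pairs(loadings, threshold):
    """Return (sensor_a, sensor_b, distance) for loading points within threshold."""
    pairs = []
    for (a, pa), (b, pb) in combinations(loadings.items(), 2):
        d = dist(pa, pb)
        if d <= threshold:
            pairs.append((a, b, d))
    return sorted(pairs, key=lambda p: p[2])


# Hypothetical (PC1, PC2) loadings, NOT values from the patent figures.
example_loadings = {
    "PA/2": (0.30, 0.10),
    "T70/2": (0.31, 0.11),
    "P30/1": (-0.20, 0.40),
    "P40/2": (-0.21, 0.41),
    "LY2/G": (0.05, -0.50),
}
redundant = close_pairs(example_loadings, threshold=0.05)
```

Within each close pair only one sensor needs to be kept; which one is dropped would still be decided by the combinatorial optimization mentioned in the text.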

6.3 Selection, optimization and screening analysis of the sensors in the cultivar models

(1) Division of the calibration-set and prediction-set samples of the cultivar models

To ensure that the cultivar models are comparable, this study mainly models tea of different cultivars at the same grade and from the same producing area. Among the 617 tea samples collected, there are two cultivar models: (1) the Longjing 43 cultivar (LMT) and the population cultivar (QMT) for superfine-grade tea from Meijiawu; (2) the Longjing 43 cultivar (LWJ) and the population cultivar (QWJ) for premium-grade tea from Wengjiashan. In each model, two thirds of the samples are selected at random as the calibration set and the remaining one third as the prediction set; the sample distribution is shown in Table 13.

Figure 2013103233189100002DEST_PATH_IMAGE021

(2) Electronic-nose response spectra and principal component analysis (PCA) of the cultivar models

Figure 15 shows the average response spectra of the cultivars in the two cultivar models; the cultivars are difficult to distinguish directly from the spectra. In the principal component plot of Figure 16, the samples of the different cultivars overlap, so the cultivar cannot be judged by eye.

(3) Selection of the sensors in the cultivar models

After three rounds of the genetic algorithm applied to the sensor response spectra of cultivar model LMT-QMT, the five sensors LY2/AA, LY2/GH, LY2/gCT, T30/1 and TA/2 are found to be used least frequently in every genetic run, so these five sensors are rejected. The remaining 13 sensors, LY2/LG, LY2/G, LY2/gCTL, P10/1, P10/2, P40/1, T70/2, PA/2, P30/1, P40/2, P30/2, T40/2 and T40/1, are used to build the cultivar model; the modeling results before and after sensor rejection are shown in Table 14. When this array of 13 well-performing sensors is used to build the model of the Longjing 43 and population cultivars for superfine-grade tea from Meijiawu, the overall recognition rate increases: the calibration set rises from 95.38% to 96.92% and the prediction set from 93.94% to 96.97%, which is very close to the calibration-set rate and indicates that the model is highly stable.

After three rounds of the genetic algorithm applied to the sensor response spectra of cultivar model LWJ-QWJ, the four sensors P10/1, P40/1, T40/1 and TA/2 are found to be used least frequently in every genetic run, so these four sensors are rejected. The remaining 14 sensors, LY2/LG, LY2/G, LY2/AA, LY2/GH, LY2/gCTL, LY2/gCT, T30/1, P10/2, T70/2, PA/2, P30/1, P40/2, P30/2 and T40/2, are used to build the cultivar model; the modeling results before and after sensor rejection are shown in Table 14. As the table shows, although the number of sensors is reduced from 18 to 14, the prediction performance of this cultivar model is unchanged: the calibration set and the prediction set keep their original rates of 92.31% and 93.34%, respectively.

Figure 2013103233189100002DEST_PATH_IMAGE022

One-way analysis of variance was carried out on the sample sensor data of each cultivar model. The five sensors rejected in cultivar model LMT-QMT all have very low discriminating power, with F values all below F_0.05 = 3.84 (Table 15); in cultivar model LWJ-QWJ, the four sensors rejected are all those whose discrimination is not significant (Table 16).

The three classes of model (grade, producing area and cultivar) have different raw data and different properties, so the number of sensors each uses for modeling after the genetic algorithm also differs. All grade models use 15 sensors. Among the producing-area models, the model for superfine-grade tea of the population cultivar from Yangmeiling and Meijiawu (QYT-QMT) is reduced to 7 sensors, while the other three producing-area models (LHT-LMT, LYJ-LWJ, QHJ-QLJ-QWJ) all use 14. Among the cultivar models, the Longjing 43 and population cultivar model for superfine-grade tea from Meijiawu (LMT-QMT) uses 13 sensors, and the Longjing 43 and population cultivar model for premium-grade tea from Wengjiashan (LWJ-QWJ) uses 14.
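The sensor counts summarized above can be cross-checked against the rejection lists given in sections 6.2 and 6.3. The following Python sketch simply encodes the 18 sensor names and the per-model rejected sets as they appear in the text and derives the retained sensors; the variable and function names are illustrative, and the grade models are left out because their rejection list is not restated in this section.

```python
# The 18 sensors named in the text (7 selected + 11 rejected for model QYT-QMT).
ALL_SENSORS = [
    "LY2/LG", "LY2/G", "LY2/AA", "LY2/GH", "LY2/gCTL", "LY2/gCT",
    "T30/1", "P10/1", "P10/2", "P40/1", "T70/2", "PA/2",
    "P30/1", "P40/2", "P30/2", "T40/2", "T40/1", "TA/2",
]

FOUR_REJECTED = {"LY2/LG", "PA/2", "P30/1", "TA/2"}

# Sensors rejected per model, as listed in the description.
REJECTED = {
    "QYT-QMT": {"LY2/LG", "LY2/GH", "LY2/gCTL", "LY2/gCT", "P10/2", "P30/1",
                "P40/2", "P30/2", "T40/2", "T40/1", "TA/2"},
    "LHT-LMT": FOUR_REJECTED,
    "LYJ-LWJ": FOUR_REJECTED,
    "QHJ-QLJ-QWJ": FOUR_REJECTED,
    "LMT-QMT": {"LY2/AA", "LY2/GH", "LY2/gCT", "T30/1", "TA/2"},
    "LWJ-QWJ": {"P10/1", "P40/1", "T40/1", "TA/2"},
}


def retained(model):
    """Sensors kept for a model, in the order of ALL_SENSORS."""
    return [s for s in ALL_SENSORS if s not in REJECTED[model]]


retained_counts = {model: len(retained(model)) for model in REJECTED}
# Sensors rejected in every producing-area and cultivar model.
rejected_everywhere = set.intersection(*REJECTED.values())
```

The derived counts are 7 for QYT-QMT, 14 for the other three producing-area models, 13 for LMT-QMT and 14 for LWJ-QWJ, and TA/2 is the only sensor rejected in all six models.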

The present invention exploits the parallel optimization and global convergence of the genetic algorithm and applies it to the screening of modeling sensors for electronic-nose analysis of tea quality. This not only effectively reduces the number of modeling sensors, simplifies the models and lowers the instrument's requirement for sensors, thereby saving resources and instrument cost, but also maintains or further improves prediction accuracy, giving good results.

Claims

1. A method for selecting sensors in a producing-area model for Longjing tea quality detection based on a genetic algorithm, characterized by comprising the following steps:

A. dividing the samples of the grade models of West Lake Longjing tea into a calibration set and a prediction set;

B. analyzing the electronic-nose response spectra of tea of different grades;

C. analyzing the trend of the principal component scores of the samples of all grades;

D. analyzing the principal component loadings of the samples of all grades;

E. selecting the number of principal components for building the grade models according to the similarity classification method;

F. building the grade models of the tea by the similarity classification method and performing prediction.

2. The selection method according to claim 1, characterized in that, after three rounds of the genetic algorithm are applied to the sensor response spectra of the producing-area model, for the Yangmeiling and Meijiawu producing-area model the seven sensors LY2/G, LY2/AA, T30/1, P10/1, P40/1, T70/2 and PA/2 are selected, and the 11 sensors used least frequently, namely LY2/LG, LY2/GH, LY2/gCTL, LY2/gCT, P10/2, P30/1, P40/2, P30/2, T40/2, T40/1 and TA/2, are rejected.

3. The selection method according to claim 1, characterized in that, after three rounds of the genetic algorithm are applied to the sensor response spectra of the producing-area model, for the Hupao Houshan, Longwu and Wengjiashan producing-area model the four sensors LY2/LG, PA/2, P30/1 and TA/2 are found to be used least frequently in every genetic run and are therefore rejected; the remaining 14 sensors, LY2/G, LY2/AA, LY2/GH, LY2/gCTL, LY2/gCT, T30/1, P10/1, P10/2, P40/1, T70/2, P40/2, P30/2, T40/2 and T40/1, are used to build the three producing-area models.