CN116539553A - Method for improving robustness of near infrared spectrum model - Google Patents

Method for improving robustness of near infrared spectrum model

Info

Publication number
CN116539553A
Authority
CN
China
Prior art keywords
spectrum
near infrared
data
infrared spectrum
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310519468.0A
Other languages
Chinese (zh)
Inventor
张翼鹏
颜克亮
凌军
朱保昆
陈微
张伟
曾仲大
文里梁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Tobacco Yunnan Industrial Co Ltd
Original Assignee
China Tobacco Yunnan Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Tobacco Yunnan Industrial Co Ltd
Priority to CN202310519468.0A
Publication of CN116539553A


Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G01N21/35 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/30 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/34 Smoothing or thinning of the pattern; Morphological operations; Skeletonisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/58 Extraction of image or video features relating to hyperspectral data

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Investigating Or Analysing Materials By Optical Means (AREA)

Abstract

The invention discloses a method for improving the robustness of a near infrared spectrum model, comprising the following steps. Step 1: remove noise from the spectra using first-order derivative processing, improving the signal-to-noise ratio and the resolution of overlapping peaks. Step 2: apply multiplicative scatter correction to eliminate spectral differences caused by different scattering levels, enhance the correlation between spectra, and correct baseline shift and offset in the spectral data. Step 3: apply autoscaling to remove the dimensional units of the spectra and improve data comparability. Step 4: select characteristic variables from the processed spectral data using the random frog-leaping algorithm. Step 5: construct the near infrared model from the selected characteristic variables. The method selects fewer near infrared spectral characteristic variables without affecting modeling accuracy and effectively enhances the robustness of the near infrared model.

Description

Method for improving robustness of near infrared spectrum model
Technical Field
The invention relates to a method for improving the robustness of a near infrared spectrum model. It belongs to the field of data science and particularly relates to selecting fewer spectral characteristic signals in order to construct a more robust near infrared spectrum model.
Background
Near infrared technology is widely used because it is fast, low-cost and precise. However, a near infrared spectrum records the spectral information of a sample over a fairly wide wavelength range, and only part of those wavelengths express the characteristic information of the sample that is to be detected. If modeling uses only the wavelengths that express, or sufficiently express, that characteristic information, the robustness of the near infrared spectrum model can be greatly improved.
If the near infrared spectral characteristic variables used for modeling are selected improperly, the robustness of the near infrared spectrum model suffers. For example, if too many characteristic variables are selected, the model is affected by noisy spectral features that do not characterize the information to be detected, so the model is over-fitted; if too few are selected, the model is under-fitted because some influential spectral features are not considered.
The invention is proposed for this purpose.
Disclosure of Invention
The invention aims to provide a variable screening method that improves the robustness of a near infrared spectrum model. The method is called SSAS (Savitzky-Golay + Multiplicative Scatter Correction + Auto Scaling + Shuffled Frog Leaping Algorithm). It uses first-order derivative processing to remove noise from the spectra, improving the signal-to-noise ratio and the resolution of overlapping peaks; it uses multiplicative scatter correction to eliminate spectral differences caused by different scattering levels, enhance the correlation between spectra, and correct baseline shift and offset in the spectral data; it uses autoscaling to remove the dimensional units of the spectra and improve data comparability; and finally it uses the random frog-leaping algorithm to select the near infrared modeling variables. The invention selects fewer near infrared spectral characteristic variables without affecting modeling accuracy and effectively enhances the robustness of the near infrared model.
The invention adopts the technical scheme that:
A method for improving the robustness of a near infrared spectrum model, in which the near infrared spectra are preprocessed by first-order derivation, multiplicative scatter correction and the maximum-minimum rules, after which near infrared spectral features are selected by the random frog-leaping method to construct the near infrared spectrum model. The method specifically comprises the following steps:
Step 1: remove noise from the spectra using first-order derivative processing, improving the signal-to-noise ratio and the resolution of overlapping peaks;
Step 2: apply multiplicative scatter correction to eliminate spectral differences caused by different scattering levels, enhance the correlation between spectra, and correct baseline shift and offset in the spectral data;
Step 3: apply autoscaling to remove the dimensional units of the spectra and improve data comparability;
Step 4: select characteristic variables from the processed spectral data using the random frog-leaping algorithm;
Step 5: construct the near infrared model from the selected characteristic variables.
Preferably, the specific method of step 1 is as follows: the standard normal variate (SNV) transformation is used to eliminate the influence of near infrared diffuse reflection on the data, and a first-order derivative method is used to smooth and filter the near infrared spectrum data, reducing interference from noise. The first-order derivative method is an improvement on the moving-window smoothing algorithm, in which the matrix operator is solved as follows. Set the filter window length to n = 2m + 1, with measuring points in the window x = (−m, −m+1, …, −1, 0, 1, …, m−1, m), and fit the n data points with a polynomial of order k−1 (k < n):

$$f(x) = a_0 + a_1 x + a_2 x^2 + \dots + a_{k-1} x^{k-1}$$

The n points in the window give a system of n linear equations in k unknowns. The polynomial parameters A = {a_0, a_1, …, a_{k-1}} are determined by least-squares fitting, and the fitted polynomial is used to process the spectral data and eliminate noise interference.
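The windowed least-squares fit described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the applicant's implementation: the function name `savgol_first_derivative`, the default window half-width `m=5`, the polynomial order, unit spacing between points and the reflective edge padding are all assumptions made for the example. The first derivative at the window centre is the fitted coefficient a_1.

```python
import numpy as np

def savgol_first_derivative(y, m=5, order=2):
    """First derivative of a 1-D spectrum from a least-squares polynomial
    fit in a sliding window of n = 2m + 1 points (unit spacing assumed)."""
    y = np.asarray(y, dtype=float)
    x = np.arange(-m, m + 1)                       # window points -m..m
    # Design matrix with columns x^0, x^1, ..., x^order (k = order + 1)
    A = np.vander(x, order + 1, increasing=True)
    # Row 1 of the pseudo-inverse maps the window values to a_1 = f'(0)
    weights = np.linalg.pinv(A)[1]
    # Reflect the edges so the output has the same length as the input
    # (edge handling is an assumption; edge values are less reliable)
    padded = np.pad(y, m, mode="reflect")
    # Dot the weights with every window position
    return np.correlate(padded, weights, mode="valid")

# Usage: the interior derivative of t^2 is recovered as 2t
t = np.arange(30.0)
d = savgol_first_derivative(t ** 2, m=3, order=2)
```

Because the fit is linear in the window values, the same weight vector is reused at every position, which is why the filter reduces to a single correlation.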
Preferably, the specific method of step 2 is as follows:
(1) For each wavelength point of the spectral data used for modeling, compute the mean value to construct an "ideal spectrum":

$$\overline{Spec}_j = \frac{1}{n}\sum_{i=1}^{n} Spec_{ij}$$

where $\overline{Spec}_j$ is the j-th (j ∈ {1, 2, …, m}) characteristic value of the "ideal spectrum", m is the number of characteristic values of a near infrared spectrum, n is the number of near infrared spectra used for modeling, and $Spec_{ij}$ is the j-th (j ∈ {1, 2, …, m}) characteristic value of the i-th (i ∈ {1, 2, …, n}) spectrum $Spec_i$;
(2) Perform a unary linear regression of each modeling spectrum $Spec_i$ (i ∈ {1, 2, …, n}) on the "ideal spectrum" $\overline{Spec}$ to obtain the relationship between each modeling spectrum and the ideal spectrum:

$$Spec_i = k_i\,\overline{Spec} + b_i$$

where $k_i$ and $b_i$ are, respectively, the baseline shift amount and the offset amount obtained from the unary linear regression of the i-th (i ∈ {1, 2, …, n}) spectrum $Spec_i$ on the ideal spectrum $\overline{Spec}$;
(3) Using the baseline shift and offset obtained in step (2), correct each modeling spectrum $Spec_i$ (i ∈ {1, 2, …, n}) as follows:

$$Spec_{i(MSC)} = \frac{Spec_i - b_i}{k_i}$$

where $Spec_{i(MSC)}$ is the near infrared spectrum $Spec_i$ (i ∈ {1, 2, …, n}) after multiplicative scatter correction.
The purpose of applying scatter correction to the near infrared spectra used to construct the model is to correct baseline shift and offset in the spectral data and to eliminate differences between spectra caused by differing experimental conditions and similar factors.
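The three sub-steps of step 2 (mean spectrum, per-spectrum linear regression, correction) can be sketched as follows. This is a hedged illustration: the function name `msc` and the optional `reference` argument are hypothetical, and the regression is assumed to take the usual form Spec_i ≈ k_i × (ideal spectrum) + b_i with the correction (Spec_i − b_i) / k_i.

```python
import numpy as np

def msc(spectra, reference=None):
    """Scatter-correct an (n spectra x m wavelengths) array against the
    mean ("ideal") spectrum; returns the corrected array and the reference."""
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        reference = spectra.mean(axis=0)           # sub-step (1): ideal spectrum
    corrected = np.empty_like(spectra)
    for i, spec in enumerate(spectra):
        # sub-step (2): fit spec = k_i * reference + b_i
        k_i, b_i = np.polyfit(reference, spec, 1)
        # sub-step (3): remove the fitted offset and slope
        corrected[i] = (spec - b_i) / k_i
    return corrected, reference

# Usage: spectra that differ only by slope and offset collapse onto the reference
base = np.sin(np.linspace(0, 3, 50)) + 2.0
corrected, ref = msc(np.vstack([base, 2.0 * base + 0.5, 0.5 * base - 0.2]))
```

One plausible design choice, not stated in the source, is to pass the training-set `reference` when correcting verification spectra so that both sets are aligned to the same ideal spectrum.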
Preferably, the specific method of step 3 is as follows: the autoscaling method is applied as

$$x'_i = \frac{x_i - \bar{x}}{\sigma}$$

where $x_i$ is the absorbance at the i-th wavenumber of the near infrared spectrum to be processed, n is the number of characteristic variables of the near infrared spectrum, $x'_i$ (i ∈ {1, 2, …, n}) is the dimensionless scaled value, $\bar{x}$ is the mean absorbance of the near infrared spectrum data, and $\sigma$ is the standard deviation of the absorbance of the near infrared spectrum data. The final {$x'_i$} (i ∈ {1, 2, …, n}) is the preprocessed near infrared spectrum data.
The present invention uses an automatic scaling method to eliminate the dimension of the spectra to enhance the comparability between the spectra.
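A minimal sketch of autoscaling follows. It assumes the common chemometric convention that the mean and standard deviation are taken per wavelength across the modeling spectra; the source does not say which axis is used. The function name `autoscale`, the optional `mean` and `std` arguments, and the use of the sample standard deviation are illustrative assumptions.

```python
import numpy as np

def autoscale(X, mean=None, std=None):
    """Subtract the mean and divide by the standard deviation, column-wise.
    X has shape (n spectra, m wavelengths)."""
    X = np.asarray(X, dtype=float)
    if mean is None:
        mean = X.mean(axis=0)
    if std is None:
        std = X.std(axis=0, ddof=1)                # sample standard deviation
    std = np.where(std == 0, 1.0, std)             # guard constant columns
    return (X - mean) / std, mean, std

# Usage: fit the scaling on training spectra, reuse it on new spectra
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
Z_train, mu, sd = autoscale(X_train)
Z_new, _, _ = autoscale(np.array([[2.5, 30.0]]), mu, sd)
```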
Preferably, the specific method of step 4 is as follows: randomly generate an initial variable set containing Q variables of the near infrared spectrum, Q ∈ {1, 2, …, n}, where n is the length of the near infrared spectrum, and denote it $V_0$. Let the current iteration number be i ∈ {0, 1, 2, …}, the number of spectral characteristic variables in this iteration be $Q_i$, and the set of near infrared spectral characteristic variables be $V_i$; iterate according to the following steps:
(a) Generate a random number $rand_i$ from the probability distribution $N(Q_i, \theta \times Q_i)$ and set $Q_{i+1} = [rand_i]$, where θ is a positive real number in the range [0, 1]. The distribution $N(Q_i, \theta \times Q_i)$ ensures that the larger the number of selected characteristic variables $Q_i$, the more likely it is that $Q_{i+1}$ differs substantially from $Q_i$; conversely, the smaller $Q_i$, the less likely a large difference between $Q_{i+1}$ and $Q_i$;
(b) If $Q_{i+1} = Q_i$, then $V_{i+1} = V_i$. If $Q_{i+1} < Q_i$, build a PLS model using the characteristic variable set $V_i$ of the spectral data, sort the absolute values of the variable coefficients in the PLS model from largest to smallest, and select the first $Q_{i+1}$ characteristic variables to form the set $V_{i+1}$. If $Q_{i+1} > Q_i$, select $w(Q_{i+1} - Q_i)$ characteristic variables from the set $V - V_i$, denoted $W_i$, where V is the set of all spectral features and w > 1, and when $w(Q_{i+1} - Q_i) > n - Q_i$, $W_i = V - V_i$; build a PLS model using the characteristic variable set $V_i + W_i$ of the spectral data, sort the absolute values of the variable coefficients in the PLS model from largest to smallest, and select the first $Q_{i+1}$ characteristic variables to form the set $V_{i+1}$;
(c) Repeat the above steps for k cycles to obtain k+1 spectral feature sets $V_A = \{V_0, V_1, V_2, …, V_k\}$; compute the frequency $p_i$ with which each spectral feature $v_i$ (i ∈ {1, 2, …, n}) occurs in $V_A$, and select the features with $p_i \geq p$ (p ∈ [0, 1]) as the set of characteristic spectra finally used for near infrared modeling.
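Steps (a) to (c) can be sketched as below. This is a simplified illustration that follows the steps as worded here, not a reference implementation of any published algorithm. Several details are assumptions: N(Q_i, θ × Q_i) is read as a normal distribution with standard deviation θ × Q_i, [rand_i] is read as rounding, Q_{i+1} is clipped to 1..n, the extra variables W_i are drawn at random, and the helper `pls1_coefficients` is a hypothetical minimal PLS1 (NIPALS) routine written only for this example.

```python
import numpy as np

def pls1_coefficients(X, y, n_components=3):
    """Regression coefficients of a minimal PLS1 (NIPALS) fit on centred data."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    W, P, q = [], [], []
    Xk, yk = X, y
    for _ in range(min(n_components, X.shape[1], X.shape[0] - 1)):
        w = Xk.T @ yk
        norm = np.linalg.norm(w)
        if norm < 1e-12:                            # nothing left to explain
            break
        w = w / norm
        t = Xk @ w                                  # scores
        tt = t @ t
        p = Xk.T @ t / tt                           # X loadings
        c = yk @ t / tt                             # y loading
        Xk = Xk - np.outer(t, p)                    # deflate X and y
        yk = yk - c * t
        W.append(w); P.append(p); q.append(c)
    if not W:
        return np.zeros(X.shape[1])
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)

def random_frog(X, y, q0=10, n_iter=200, theta=0.3, w=3,
                n_components=3, p_min=0.2, seed=0):
    """Return (selected variable indices, selection frequency per variable)."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    current = rng.choice(n, size=q0, replace=False)          # V_0
    counts = np.zeros(n)
    counts[current] += 1
    for _ in range(n_iter):
        q = len(current)
        # step (a): propose the next subset size Q_{i+1}
        q_new = int(round(rng.normal(q, theta * q)))
        q_new = min(max(q_new, 1), n)                        # assumption: clip to 1..n
        if q_new != q:
            pool = current
            if q_new > q:
                # step (b), growth: add w * (Q_{i+1} - Q_i) outside variables
                outside = np.setdiff1d(np.arange(n), current)
                n_extra = min(int(w * (q_new - q)), len(outside))
                extra = rng.choice(outside, size=n_extra, replace=False)
                pool = np.concatenate([current, extra])
            # step (b): rank by |PLS coefficient|, keep the top Q_{i+1}
            coef = pls1_coefficients(X[:, pool], y, n_components)
            current = pool[np.argsort(-np.abs(coef))[:q_new]]
        counts[current] += 1
    # step (c): frequency over the k + 1 sets, thresholded at p
    freq = counts / (n_iter + 1)
    return np.flatnonzero(freq >= p_min), freq
```

A call such as `selected, freq = random_frog(X, y)` on a preprocessed spectral matrix `X` and a reference property vector `y` returns the retained variable indices; the defaults for `theta`, `w`, `n_components` and `p_min` are placeholders, since the source does not give the values it used.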
The invention has the beneficial effects that:
compared with the prior art, the method for improving the robustness of the near infrared spectrum model can select fewer near infrared spectrum characteristic variables for modeling on the basis of not affecting modeling accuracy, and can effectively enhance the robustness of the near infrared spectrum model.
Drawings
FIG. 1 is a schematic diagram of the steps of the method for improving the robustness of a near infrared spectrum model according to the present invention.
FIG. 2 is a plot of raw data of near infrared spectra used for model construction in the examples.
FIG. 3 is a plot of raw data of near infrared spectrum for model verification in an example.
FIG. 4 is a plot of near infrared spectrum data used for model construction after first order derivation to remove noise from the spectrum in the example.
FIG. 5 is a plot of the near infrared spectral data used for model construction in the example after noise removal by first-order derivation and elimination of scattering-level differences by multiplicative scatter correction.
FIG. 6 is a plot of the near infrared spectral data used for model construction in the example after noise removal by first-order derivation, elimination of scattering-level differences by multiplicative scatter correction, and removal of the spectral dimension by autoscaling.
FIG. 7 shows the distribution of the number of spectral feature variables selected according to different near infrared spectrum variable selection schemes in the embodiment.
FIG. 8 is a plot of near infrared spectral data for model verification after removal of noise from the spectrum by first order derivation, according to an embodiment.
FIG. 9 is a plot of the near infrared spectral data used for model verification in the example after noise removal by first-order derivation and elimination of scattering-level differences by multiplicative scatter correction.
FIG. 10 is a plot of the near infrared spectral data used for model verification in the example after noise removal by first-order derivation, elimination of scattering-level differences by multiplicative scatter correction, and removal of the spectral dimension by autoscaling.
FIG. 11 shows the distribution of Q² for the models constructed according to the different near infrared spectrum variable selection schemes.
Detailed Description
The invention is described in further detail below with reference to the attached drawings and specific examples:
examples
Data: near infrared spectrum data from 655 flue-cured tobacco samples of different regions, leaf positions and varieties were used. Using the Kennard-Stone algorithm based on PCA scores (PCA-Score KS), 524 samples were selected uniformly from the PCA score matrix as the model training set, as shown in FIG. 2; the remaining 131 samples were used as the model test set, as shown in FIG. 3.
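For readers unfamiliar with Kennard-Stone selection, the idea can be sketched as follows. This is a generic illustration, not the code used for the example: the function name `kennard_stone` is hypothetical, Euclidean distance is assumed, and the PCA score matrix is assumed to have been computed beforehand. The algorithm starts from the two most distant samples and then repeatedly adds the sample whose nearest already-selected neighbour is farthest away.

```python
import numpy as np

def kennard_stone(scores, n_select):
    """Return indices of n_select samples spread uniformly over score space."""
    scores = np.asarray(scores, dtype=float)
    # Pairwise Euclidean distances between all samples
    dist = np.linalg.norm(scores[:, None, :] - scores[None, :, :], axis=2)
    # Start with the two samples that are farthest apart
    first, second = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(first), int(second)]
    # Distance from every sample to its nearest selected sample
    min_dist = np.minimum(dist[first], dist[second])
    while len(selected) < n_select:
        min_dist[selected] = -1.0                   # never re-pick a sample
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, dist[nxt])
    return selected

# Usage on a toy one-dimensional score matrix
idx = kennard_stone(np.array([[0.0], [1.0], [2.0], [10.0]]), 3)
```

With 655 samples, a call such as `kennard_stone(scores, 524)` would give the training indices and the remaining 131 indices would form the test set.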
The 524 selected samples, forming the model training set, were processed by the following steps:
The near infrared spectra are preprocessed by first-order derivation, multiplicative scatter correction and the maximum-minimum rules, after which near infrared spectral features are selected by the random frog-leaping method to construct the near infrared spectrum model. The method specifically comprises the following steps:
Step 1: remove noise from the spectra using first-order derivative processing, improving the signal-to-noise ratio and the resolution of overlapping peaks;
Step 2: apply multiplicative scatter correction to eliminate spectral differences caused by different scattering levels, enhance the correlation between spectra, and correct baseline shift and offset in the spectral data;
Step 3: apply autoscaling to remove the dimensional units of the spectra and improve data comparability;
Step 4: select characteristic variables from the processed spectral data using the random frog-leaping algorithm;
Step 5: construct the near infrared model from the selected characteristic variables.
The specific method of step 1 is as follows: the standard normal variate (SNV) transformation is used to eliminate the influence of near infrared diffuse reflection on the data, and a first-order derivative method is used to smooth and filter the near infrared spectrum data, reducing interference from noise. The first-order derivative method is an improvement on the moving-window smoothing algorithm, in which the matrix operator is solved as follows. Set the filter window length to n = 2m + 1, with measuring points in the window x = (−m, −m+1, …, −1, 0, 1, …, m−1, m), and fit the n data points with a polynomial of order k−1 (k < n):

$$f(x) = a_0 + a_1 x + a_2 x^2 + \dots + a_{k-1} x^{k-1}$$

The n points in the window give a system of n linear equations in k unknowns. The polynomial parameters A = {a_0, a_1, …, a_{k-1}} are determined by least-squares fitting, and the fitted polynomial is used to process the spectral data and eliminate noise interference.
The specific method of step 2 is as follows:
(1) For each wavelength point of the spectral data used for modeling, compute the mean value to construct an "ideal spectrum":

$$\overline{Spec}_j = \frac{1}{n}\sum_{i=1}^{n} Spec_{ij}$$

where $\overline{Spec}_j$ is the j-th (j ∈ {1, 2, …, m}) characteristic value of the "ideal spectrum", m is the number of characteristic values of a near infrared spectrum, n is the number of near infrared spectra used for modeling, and $Spec_{ij}$ is the j-th (j ∈ {1, 2, …, m}) characteristic value of the i-th (i ∈ {1, 2, …, n}) spectrum $Spec_i$;
(2) Perform a unary linear regression of each modeling spectrum $Spec_i$ (i ∈ {1, 2, …, n}) on the "ideal spectrum" $\overline{Spec}$ to obtain the relationship between each modeling spectrum and the ideal spectrum:

$$Spec_i = k_i\,\overline{Spec} + b_i$$

where $k_i$ and $b_i$ are, respectively, the baseline shift amount and the offset amount obtained from the unary linear regression of the i-th (i ∈ {1, 2, …, n}) spectrum $Spec_i$ on the ideal spectrum $\overline{Spec}$;
(3) Using the baseline shift and offset obtained in step (2), correct each modeling spectrum $Spec_i$ (i ∈ {1, 2, …, n}) as follows:

$$Spec_{i(MSC)} = \frac{Spec_i - b_i}{k_i}$$

where $Spec_{i(MSC)}$ is the near infrared spectrum $Spec_i$ (i ∈ {1, 2, …, n}) after multiplicative scatter correction.
The purpose of applying scatter correction to the near infrared spectra used to construct the model is to correct baseline shift and offset in the spectral data and to eliminate differences between spectra caused by differing experimental conditions and similar factors.
The specific method of step 3 is as follows: the autoscaling method is applied as

$$x'_i = \frac{x_i - \bar{x}}{\sigma}$$

where $x_i$ is the absorbance at the i-th wavenumber of the near infrared spectrum to be processed, n is the number of characteristic variables of the near infrared spectrum, $x'_i$ (i ∈ {1, 2, …, n}) is the dimensionless scaled value, $\bar{x}$ is the mean absorbance of the near infrared spectrum data, and $\sigma$ is the standard deviation of the absorbance of the near infrared spectrum data. The final {$x'_i$} (i ∈ {1, 2, …, n}) is the preprocessed near infrared spectrum data.
The present invention uses an automatic scaling method to eliminate the dimension of the spectra to enhance the comparability between the spectra.
The specific method of step 4 is as follows: randomly generate an initial variable set containing Q variables of the near infrared spectrum, Q ∈ {1, 2, …, n}, where n is the length of the near infrared spectrum, and denote it $V_0$. Let the current iteration number be i ∈ {0, 1, 2, …}, the number of spectral characteristic variables in this iteration be $Q_i$, and the set of near infrared spectral characteristic variables be $V_i$; iterate according to the following steps:
(a) Generate a random number $rand_i$ from the probability distribution $N(Q_i, \theta \times Q_i)$ and set $Q_{i+1} = [rand_i]$, where θ is a positive real number in the range [0, 1]. The distribution $N(Q_i, \theta \times Q_i)$ ensures that the larger the number of selected characteristic variables $Q_i$, the more likely it is that $Q_{i+1}$ differs substantially from $Q_i$; conversely, the smaller $Q_i$, the less likely a large difference between $Q_{i+1}$ and $Q_i$;
(b) If $Q_{i+1} = Q_i$, then $V_{i+1} = V_i$. If $Q_{i+1} < Q_i$, build a PLS model using the characteristic variable set $V_i$ of the spectral data, sort the absolute values of the variable coefficients in the PLS model from largest to smallest, and select the first $Q_{i+1}$ characteristic variables to form the set $V_{i+1}$. If $Q_{i+1} > Q_i$, select $w(Q_{i+1} - Q_i)$ characteristic variables from the set $V - V_i$, denoted $W_i$, where V is the set of all spectral features and w > 1, and when $w(Q_{i+1} - Q_i) > n - Q_i$, $W_i = V - V_i$; build a PLS model using the characteristic variable set $V_i + W_i$ of the spectral data, sort the absolute values of the variable coefficients in the PLS model from largest to smallest, and select the first $Q_{i+1}$ characteristic variables to form the set $V_{i+1}$;
(c) Repeat the above steps for k cycles to obtain k+1 spectral feature sets $V_A = \{V_0, V_1, V_2, …, V_k\}$; compute the frequency $p_i$ with which each spectral feature $v_i$ (i ∈ {1, 2, …, n}) occurs in $V_A$, and select the features with $p_i \geq p$ (p ∈ [0, 1]) as the set of characteristic spectra finally used for near infrared modeling.
The 524 modeling spectra after noise removal by first-order derivation are shown in FIG. 4. The 524 modeling spectra after noise removal by first-order derivation and elimination of scattering-level differences by multiplicative scatter correction are shown in FIG. 5. The 524 modeling spectra after noise removal by first-order derivation, elimination of scattering-level differences by multiplicative scatter correction, and removal of the spectral dimension by autoscaling are shown in FIG. 6.
After noise removal by first-order derivation, elimination of scattering-level differences by multiplicative scatter correction, and removal of the spectral dimension by autoscaling, the spectral feature screening schemes shown in Table 1 were applied to select different sets of spectral characteristic values.
Table 1 different near infrared spectral signature screening schemes
For the 524 training-set spectra used for modeling, after all near infrared spectra had been preprocessed by first-order derivation, multiplicative scatter correction and the maximum-minimum rules, the 8 schemes shown in Table 1 were used to screen spectral features; the results are shown in Table 2 and FIG. 7. The number of near infrared spectral features screened by the SSAS method provided by the invention is clearly lower than the number screened by the other methods (272 variables on average, against 541 to 1557 for the other schemes).
Total sugar, reducing sugar, total nitrogen, potassium, chlorine and nicotine models were then constructed from the spectral features screened by each of the 8 schemes.
Table 2 Number of spectral feature variables screened by the different schemes
Scheme         Total sugar  Reducing sugar  Total nitrogen  Potassium  Chlorine  Nicotine  Average
N-Selection    1557         1557            1557            1557       1557      1557      1557
UVE            581          704             785             579        574       634       643
MC-UVE         481          493             742             527        565       439       541
SSAS           213          116             244             456        558       44        272
Range          678          481             754             1493       845       754       834
Range-UVE      882          899             1003            1525       996       985       1048
Range-MC-UVE   823          774             980             1527       968       893       994
Range-SSAS     781          601             858             1531       1060      781       935
The near infrared spectral features were screened with the schemes whose results are shown in Table 2, and total sugar, reducing sugar, total nitrogen, potassium, chlorine and nicotine models were constructed from each.
The 131 model test spectra were preprocessed by first-order derivation, multiplicative scatter correction and autoscaling. The 131 verification spectra after noise removal by first-order derivation are shown in FIG. 8. The 131 verification spectra after noise removal by first-order derivation and elimination of scattering-level differences by multiplicative scatter correction are shown in FIG. 9. The 131 verification spectra after noise removal by first-order derivation, elimination of scattering-level differences by multiplicative scatter correction, and removal of the spectral dimension by autoscaling are shown in FIG. 10. Finally, the constructed total sugar, reducing sugar, total nitrogen, potassium, chlorine and nicotine models were evaluated on the 131 test-set spectra; the resulting Q² values of all models are summarized in Table 3 and FIG. 11.
Table 3 Evaluation of all near infrared models (Q²)
Scheme            Total sugar  Reducing sugar  Total nitrogen  Potassium  Chlorine  Nicotine  Average
N-Selection       0.956        0.925           0.79            0.931      0.953     0.81      0.894
UVE               0.963        0.926           0.788           0.945      0.958     0.821     0.900
MC-UVE            0.964        0.927           0.793           0.953      0.955     0.821     0.902
SSAS              0.961        0.930           0.781           0.959      0.944     0.829     0.901
Range             0.965        0.916           0.814           0.921      0.96      0.81      0.898
Range-UVE         0.963        0.928           0.792           0.931      0.954     0.826     0.899
Range-MC-UVE      0.964        0.923           0.800           0.931      0.952     0.822     0.899
Range-SSAS        0.964        0.929           0.789           0.932      0.942     0.820     0.896
Statistics (CV)   0.31%        0.43%           1.26%           1.39%      0.63%     0.85%     --
The model evaluation index Q² in Table 3 is defined by the following formula:

$$Q^2 = 1 - \frac{\sum_{i=1}^{n}\left(Pre_i - Act_i\right)^2}{\sum_{i=1}^{n}\left(Act_i - \overline{Act}\right)^2}$$

where n is the number of samples used for model verification (n = 131 in this embodiment), $Pre_i$ is the model-predicted value for the i-th sample, $Act_i$ is the actual value of the i-th sample, and $\overline{Act}$ is the mean of the actual values of all samples.
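The evaluation index can be computed with a few lines of plain Python. This sketch assumes the usual form of the index, Q² = 1 − Σ(Pre_i − Act_i)² / Σ(Act_i − mean(Act))², which matches the quantities defined above; the function name `q_squared` is illustrative.

```python
def q_squared(predicted, actual):
    """Q^2 = 1 - (sum of squared prediction errors) / (total sum of squares)."""
    n = len(actual)
    mean_actual = sum(actual) / n
    # Squared error between predictions and actual values
    press = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    # Spread of the actual values around their mean
    total = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - press / total

# Usage: a perfect prediction gives 1, predicting the mean everywhere gives 0
perfect = q_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
baseline = q_squared([2.0, 2.0, 2.0], [1.0, 2.0, 3.0])
```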
The statistical value (CV) in Table 3 is the coefficient of variation, defined by the following formula:

$$CV = \frac{\sigma}{\mu} \times 100\%$$

where σ is the standard deviation and μ is the mean; the CV value represents the degree of fluctuation among the data.
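As a quick check of the coefficient of variation, the total sugar column of Table 3 can be fed through the definition CV = σ / μ × 100%. The sketch below assumes the sample standard deviation (the source does not say whether the sample or population form was used), and the function name is illustrative.

```python
import statistics

def coefficient_of_variation(values):
    """Coefficient of variation in percent, using the sample standard deviation."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0

# Q^2 values for total sugar across the 8 screening schemes (Table 3)
total_sugar_q2 = [0.956, 0.963, 0.964, 0.961, 0.965, 0.963, 0.964, 0.964]
cv = coefficient_of_variation(total_sugar_q2)
print(f"CV = {cv:.2f}%")
```

On these rounded table values the result is about 0.30%, close to the 0.31% listed in Table 3; the small gap is plausibly due to rounding of the tabulated Q² values.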
As shown in Table 3, after near infrared feature screening with the different schemes, the evaluation values of the models constructed for tobacco leaf constituents such as total sugar, reducing sugar, total nitrogen, potassium, chlorine and nicotine differ very little (CV ≤ 3%; the largest value in Table 3 is 1.39%). For some constituents the evaluation values obtained with the different spectral feature screening schemes differ by less than 1%. This indicates that the method for improving the robustness of the near infrared spectrum model (SSAS) does not degrade the quality of the near infrared spectrum model.
The number of spectral feature variables screened for each model by the different spectral feature screening schemes of Table 1 is shown in Table 2. As Table 2 shows, the number of spectral feature variables screened by the SSAS method is clearly smaller than that of the other schemes, without affecting the modeling result.
In summary, the method (SSAS) of the present invention does not affect the quality of the near infrared spectrum model, and the near infrared spectrum model is more robust because fewer spectral feature variables are selected.
The examples are given solely for the preferred embodiments of the present invention and are not intended to limit the invention thereto, since various modifications and variations will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (5)

1. A method for improving the robustness of a near infrared spectrum model, comprising the steps of:
Step 1: remove noise from the spectra using first-order derivative processing, improving the signal-to-noise ratio and the resolution of overlapping peaks;
Step 2: apply multiplicative scatter correction to eliminate spectral differences caused by different scattering levels, enhance the correlation between spectra, and correct baseline shift and offset in the spectral data;
Step 3: apply autoscaling to remove the dimensional units of the spectra and improve data comparability;
Step 4: select characteristic variables from the processed spectral data using the random frog-leaping algorithm;
Step 5: construct the near infrared model from the selected characteristic variables.
2. The method according to claim 1, wherein the specific method of step 1 is as follows: a standard normal variate transformation is used to eliminate the influence of near infrared diffuse reflection on the near infrared data, and a first-order derivative method is used to smooth and filter the near infrared spectrum data, reducing the interference of noise; the first-order derivative method is an improvement on the moving smoothing algorithm, in which the matrix operator is solved as follows: set the filter window length to $n = 2m + 1$ and the measuring points in the window to $x = (-m, -m+1, \ldots, -1, 0, 1, \ldots, m-1, m)$, and fit the $n$ data points in the window with a polynomial of order $k-1$ ($k < n$), $f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_{k-1} x^{k-1}$; the $n$ points in the window give a system of $n$ linear equations in $k$ unknowns, and the polynomial parameters $A = \{a_0, a_1, \ldots, a_{k-1}\}$ are determined by least squares fitting; the polynomial is then used to process the spectral data so as to eliminate noise interference in the spectral data.
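The window fit in claim 2 can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the applicant's implementation: the function name, the default values of `m` and `k`, unit spacing between wavelength points, and edge-replication padding are all assumptions. The derivative at the window center is taken as the fitted coefficient $a_1$, since $f'(0) = a_1$ for the polynomial above.

```python
import numpy as np


def sg_first_derivative(spectrum, m=2, k=3):
    """First derivative from a moving least-squares polynomial fit.

    m is the half window (window length n = 2m + 1) and k is the number of
    polynomial coefficients (order k - 1, with k < n). Defaults are
    illustrative assumptions only.
    """
    y = np.asarray(spectrum, dtype=float)
    n = 2 * m + 1
    if not 1 < k < n:
        raise ValueError("need 1 < k < n")
    x = np.arange(-m, m + 1)
    # Design matrix with columns x^0 ... x^(k-1): n equations, k unknowns.
    design = np.vander(x, k, increasing=True)
    # Row 1 of the pseudo-inverse maps the window values to a_1 = f'(0).
    weights = np.linalg.pinv(design)[1]
    # Edge replication (an assumption) keeps the output the same length.
    padded = np.pad(y, m, mode="edge")
    # Reversing the weights turns the convolution into a sliding dot product.
    return np.convolve(padded, weights[::-1], mode="valid")


wavelength_index = np.arange(10.0)
derivative = sg_first_derivative(wavelength_index ** 2, m=2, k=3)
```

For interior points of an exactly quadratic signal the fit is exact, so the result matches the analytic derivative there; the first and last `m` points are influenced by the padding choice.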
3. The method according to claim 1, wherein the specific method of step 2 is as follows:
(1) For each wavelength point of the spectral data used for modeling, the mean value is calculated to construct an "ideal spectrum": $\overline{Spec}_j = \frac{1}{n}\sum_{i=1}^{n} Spec_{ij}$, wherein $\overline{Spec}_j$ is the $j$-th ($j \in \{1, 2, \ldots, m\}$) feature value of the "ideal spectrum", $m$ is the number of feature values of a near infrared spectrum, $n$ is the number of near infrared spectra used for modeling, and $Spec_{ij}$ is the $j$-th ($j \in \{1, 2, \ldots, m\}$) feature value of the $i$-th ($i \in \{1, 2, \ldots, n\}$) spectrum $Spec_i$;
(2) Each spectrum $Spec_i$ ($i \in \{1, 2, \ldots, n\}$) used for modeling is regressed on the "ideal spectrum" $\overline{Spec}$ by univariate linear regression, giving $Spec_i = k_i \overline{Spec} + b_i$, wherein $k_i$ and $b_i$ are respectively the baseline shift amount and the offset amount of the $i$-th ($i \in \{1, 2, \ldots, n\}$) spectrum $Spec_i$ relative to the "ideal spectrum" $\overline{Spec}$ obtained from the univariate linear regression;
(3) Based on the baseline shift amount and offset amount obtained in step (2), each spectrum $Spec_i$ ($i \in \{1, 2, \ldots, n\}$) used for modeling is corrected as $Spec_{i(MSC)} = \frac{Spec_i - b_i}{k_i}$, wherein $Spec_{i(MSC)}$ is the spectral data obtained from the near infrared spectrum $Spec_i$ ($i \in \{1, 2, \ldots, n\}$) after multivariate scattering correction.
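The three sub-steps of claim 3 can be sketched as below. The function name `msc` and the use of `numpy.polyfit` for the univariate regression are assumptions made for illustration; the sketch regresses each spectrum on the mean spectrum and then removes the fitted intercept and slope.

```python
import numpy as np


def msc(spectra):
    """Scatter-correct each spectrum against the mean ("ideal") spectrum."""
    spectra = np.asarray(spectra, dtype=float)  # shape: (n spectra, m points)
    ideal = spectra.mean(axis=0)                # sub-step (1): ideal spectrum
    corrected = np.empty_like(spectra)
    for i, spec in enumerate(spectra):
        # Sub-step (2): univariate regression spec ~ k_i * ideal + b_i.
        k_i, b_i = np.polyfit(ideal, spec, 1)
        # Sub-step (3): remove the fitted offset and slope.
        corrected[i] = (spec - b_i) / k_i
    return corrected, ideal


# Three synthetic spectra that differ only by a scale and an offset.
base = np.array([1.0, 2.0, 4.0, 3.0])
corrected, ideal = msc([base, 2 * base + 1, 0.5 * base - 0.2])
```

Because the synthetic spectra differ only by a linear transformation, every corrected spectrum coincides with the ideal spectrum, which is the behaviour the correction is meant to produce.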
4. The method according to claim 1, wherein the specific method of step 3 is as follows: the automatic scaling method is applied as $x'_i = \frac{x_i - \bar{x}}{\sigma}$, wherein $x_i$ is the absorbance at the $i$-th wavenumber of the near infrared spectrum to be processed, $n$ is the number of characteristic variables of the near infrared spectrum, $x'_i \in [0,1]$ ($i \in \{1, 2, \ldots, n\}$) is the dimensionless value, $\bar{x}$ is the mean absorbance of the near infrared spectrum data, and $\sigma$ is the standard deviation of the absorbance of the near infrared spectrum data; the final $\{x'_i\}$ ($i \in \{1, 2, \ldots, n\}$) is the preprocessed near infrared spectrum data.
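A minimal sketch of the scaling in claim 4 follows, applied to a single spectrum as the claim words it (mean and standard deviation taken over the $n$ absorbance values of that spectrum). The function name and the choice of the population standard deviation (`ddof=0`) are assumptions. Values scaled this way have zero mean and unit standard deviation; the sketch does not clip them to any interval.

```python
import numpy as np


def autoscale(spectrum, ddof=0):
    """Subtract the mean absorbance and divide by the standard deviation."""
    x = np.asarray(spectrum, dtype=float)
    sigma = x.std(ddof=ddof)  # ddof=0 (population std) is an assumption
    if sigma == 0:
        raise ValueError("a constant spectrum cannot be scaled")
    return (x - x.mean()) / sigma


scaled = autoscale([0.2, 0.4, 0.6, 0.8])
```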
5. The method according to claim 1, wherein the specific method of step 4 is as follows: an initial variable set containing $Q \in \{1, 2, \ldots, n\}$ near infrared spectrum variables is randomly generated and denoted $V_0$, wherein $n$ is the length of the near infrared spectrum; with the current iteration number $i \in \{0, 1, 2, \ldots\}$, the number of spectral feature variables in this iteration denoted $Q_i$ and the set of near infrared spectrum characteristic variables denoted $V_i$, the iteration proceeds according to the following steps:
(a) A random number $rand_i$ is generated from the probability distribution $N(Q_i, \theta \times Q_i)$ and $Q_{i+1} = [rand_i]$ is recorded, wherein $\theta$ is a positive real number in the range $[0, 1]$; $N(Q_i, \theta \times Q_i)$ ensures that the larger the selected number of characteristic variables $Q_i$, the more likely $Q_{i+1}$ is to differ greatly from $Q_i$, and conversely, the smaller $Q_i$, the less likely $Q_{i+1}$ is to differ greatly from $Q_i$;
(b) If $Q_{i+1} = Q_i$, then $V_{i+1} = V_i$; if $Q_{i+1} < Q_i$, a PLS model is constructed with the characteristic variable set $V_i$ of the spectral data, the absolute values of the characteristic variable coefficients in the PLS model are sorted from large to small, and the first $Q_{i+1}$ characteristic variables form the characteristic variable set $V_{i+1}$; if $Q_{i+1} > Q_i$, $w(Q_{i+1} - Q_i)$ characteristic variables are selected from the set $V - V_i$ and denoted $W_i$, wherein $V$ is the set of all spectral features and $w > 1$, and when $w(Q_{i+1} - Q_i) > n - Q_i$, $W_i = V - V_i$; a PLS model is then constructed with the characteristic variable set $V_i + W_i$ of the spectral data, the absolute values of the characteristic variable coefficients in the PLS model are sorted from large to small, and the first $Q_{i+1}$ characteristic variables form the characteristic variable set $V_{i+1}$;
(c) The above steps are repeated for $k$ cycles to obtain $k+1$ spectral characteristic variable sets $V_A = \{V_0, V_1, V_2, \ldots, V_k\}$; the frequency with which each spectral feature $v_i$ ($i \in \{1, 2, \ldots, n\}$) occurs in $V_A$ is calculated and denoted $p_i$, and the features with $p_i \geq p$ ($p \in [0, 1]$) are selected as the characteristic spectrum set finally used for near infrared modeling.
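Steps (a) to (c) of claim 5 can be sketched as follows. This is a hedged illustration, not the applicant's code. Assumptions: $\theta \times Q_i$ is treated as the standard deviation of the normal distribution, $[rand_i]$ is read as rounding, the proposal is clipped to $[1, n]$, the extra candidates $W_i$ are drawn uniformly at random, and a small NIPALS PLS1 routine with two components stands in for the PLS model. All function names and default parameter values are hypothetical.

```python
import numpy as np


def pls1_coefficients(X, y, n_components=2):
    """Regression coefficients of a PLS1 model (NIPALS on centered data)."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    n_components = min(n_components, X.shape[1], X.shape[0] - 1)
    W, P, q = [], [], []
    for _ in range(n_components):
        w_vec = X.T @ y
        norm = np.linalg.norm(w_vec)
        if norm < 1e-12:          # nothing left to explain
            break
        w_vec = w_vec / norm
        t = X @ w_vec
        tt = t @ t
        p_vec = X.T @ t / tt
        c = (y @ t) / tt
        X = X - np.outer(t, p_vec)  # deflate X and y
        y = y - c * t
        W.append(w_vec); P.append(p_vec); q.append(c)
    if not W:
        return np.zeros(X.shape[1])
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)


def random_frog(X, y, q0=5, iterations=200, theta=0.3, w=3, p=0.5, seed=0):
    """Return (selected indices, selection frequency of every variable)."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    current = rng.choice(n, size=q0, replace=False)   # V_0
    counts = np.zeros(n)
    counts[current] += 1
    for _ in range(iterations):
        q_i = len(current)
        # (a) propose Q_{i+1}; theta * Q_i used as the std (assumption).
        q_next = int(np.clip(np.rint(rng.normal(q_i, theta * q_i)), 1, n))
        # (b) build the candidate pool and keep the top |coefficient| ones.
        if q_next != q_i:
            pool = current
            if q_next > q_i:
                rest = np.setdiff1d(np.arange(n), current)
                extra = min(int(w * (q_next - q_i)), len(rest))
                pool = np.concatenate(
                    [current, rng.choice(rest, size=extra, replace=False)])
            coef = np.abs(pls1_coefficients(X[:, pool], y))
            current = pool[np.argsort(coef)[::-1][:q_next]]
        counts[current] += 1
    # (c) frequency over the k + 1 sets V_0 ... V_k, thresholded at p.
    freq = counts / (iterations + 1)
    return np.flatnonzero(freq >= p), freq
```

Usage on synthetic data might look like `selected, freq = random_frog(X, y, iterations=50)`, where `X` holds the preprocessed spectra row-wise and `y` the reference values. The result depends on the seed and on the assumed defaults, so the thresholds would need tuning on real data.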
CN202310519468.0A 2023-05-10 2023-05-10 Method for improving robustness of near infrared spectrum model Pending CN116539553A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310519468.0A CN116539553A (en) 2023-05-10 2023-05-10 Method for improving robustness of near infrared spectrum model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310519468.0A CN116539553A (en) 2023-05-10 2023-05-10 Method for improving robustness of near infrared spectrum model

Publications (1)

Publication Number Publication Date
CN116539553A true CN116539553A (en) 2023-08-04

Family

ID=87448447

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310519468.0A Pending CN116539553A (en) 2023-05-10 2023-05-10 Method for improving robustness of near infrared spectrum model

Country Status (1)

Country Link
CN (1) CN116539553A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117269109A (en) * 2023-11-23 2023-12-22 中国矿业大学(北京) Method for detecting chloride ion content in concrete structure based on near infrared spectrum
CN117269109B (en) * 2023-11-23 2024-02-23 中国矿业大学(北京) Method for detecting chloride ion content in concrete structure based on near infrared spectrum

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination