CN114113035B - Identification method of transgenic soybean oil - Google Patents



Publication number
CN114113035B
Authority
CN
China
Prior art keywords
raman spectrum
ultraviolet raman
matrix
soybean oil
ultraviolet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111370792.8A
Other languages
Chinese (zh)
Other versions
CN114113035A (en)
Inventor
金伟其
郭宗昱
郭一新
裘溯
何玉青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT
Priority to CN202111370792.8A
Publication of CN114113035A
Application granted
Publication of CN114113035B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/62 Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light
    • G01N21/63 Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light optically excited
    • G01N21/65 Raman scattering

Landscapes

  • Health & Medical Sciences (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Investigating, Analyzing Materials By Fluorescence Or Luminescence (AREA)

Abstract

The invention provides a transgenic soybean oil identification method comprising the following steps: forming an ultraviolet Raman spectrum set X' from the ultraviolet Raman spectra of a plurality of soybean oil samples; determining the label information corresponding to each ultraviolet Raman spectrum in X'; training an influence model with X' and the label information to determine a load matrix; determining the influence intensity of each characteristic peak in the ultraviolet Raman spectrum by using the load matrix; ranking the characteristic peaks of each ultraviolet Raman spectrum by influence intensity in descending order and taking the first S peaks as transgenic influence characteristic peaks; and extracting the transgenic influence characteristic peaks from the ultraviolet Raman spectrum of a soybean oil sample and determining the label information of that sample from them. The method extracts the transgenic influence characteristic peaks from the complete ultraviolet Raman spectrum and identifies transgenic soybean oil from these peaks alone, which reduces the volume of detection data and improves detection efficiency.

Description

Identification method of transgenic soybean oil
Technical Field
The invention relates to the technical field of soybean oil detection, in particular to a transgenic soybean oil identification method.
Background
A transgenic crop is a crop that acquires specific target traits through recombinant DNA technology: a cloned exogenous gene is introduced into the crop tissue and expressed. According to the statistical report of the International Service for the Acquisition of Agri-biotech Applications (ISAAA), the worldwide planting area of transgenic crops grew to 191.7 million hectares over 1996-2018, of which developing and developed countries accounted for 103.1 million and 88.6 million hectares respectively. Transgenic soybean has the highest adoption rate in the world and accounts for 50% of the global transgenic crop area. Although transgenic technology can increase crop yield, improve crop quality and enhance traits such as drought and cold resistance, transgenic crops may also pose potential threats to the ecological environment (for example, to soil ecosystems and biogeochemical cycles) and may even seriously affect biological populations, so the environmental safety assessment of transgenic crops has long been a concern. By 2019, China's soybean consumption had reached 834 million tons, and it is about 1 billion tons, the vast majority of which is transgenic soybean. In 2020, the Ministry of Agriculture and Rural Affairs issued three approval lists of agricultural GMO biosafety certificates covering herbicide-tolerant transgenic soybeans. To prevent the misuse of transgenic soybean in food and to address unclear food labeling and the mixing of genuine and inferior products, methods for detecting transgenic soybean components are urgently needed.
Raman spectroscopy is a non-destructive, non-contact light-scattering analysis method. The position, intensity and shape of its spectral peaks accurately reflect the structural information of a substance or mixture, so it is commonly used for substance identification and component analysis. Raman detection requires no sample pretreatment, produces no chemical pollutants, and is fast, accurate, simple, efficient and highly repeatable. However, soybean oil contains a large number of carbon-carbon double bonds (linear or cyclic unsaturated molecules with extensive π-bond conjugation), which produce a strong fluorescence background that seriously interferes with Raman detection.
Compared with conventional visible and near-infrared Raman spectroscopy, ultraviolet Raman spectroscopy has the following characteristics: (1) the Raman signal is essentially separated from the fluorescence spectrum; (2) because the ozone layer absorbs solar ultraviolet radiation, ultraviolet Raman spectra suffer little interference from ambient light, which makes them suitable for on-site remote measurement and broadens their field of application; (3) the Raman scattering intensity is proportional to the fourth power of the scattered-light frequency (that is, inversely proportional to the fourth power of the wavelength), so under the same conditions ultraviolet excitation is better at detecting weak scattering signals and is better suited to field detection. Ultraviolet Raman spectroscopy is therefore suitable for detecting transgenic soybean oil. In addition, an ultraviolet Raman detector can perform remote measurement over a certain distance in a natural environment, so it can not only detect dangerous goods such as drugs and explosives effectively but also provide an efficient method for screening marketed foods for transgenic ingredients, additives or expired products, which gives it broad application prospects.
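The fourth-power dependence stated above can be checked with simple arithmetic. The sketch below approximates the scattered-light wavelength by the laser wavelength (an approximation that becomes coarser for large Raman shifts) and compares 266 nm excitation with common visible and near-infrared laser lines. It ignores detector response, absorption and resonance effects, so the result is only an idealized scattering-efficiency ratio; the function name is hypothetical.

```python
def relative_raman_gain(uv_nm: float, ref_nm: float) -> float:
    """Ratio of Raman scattering efficiency at uv_nm to that at ref_nm.

    Uses I ~ nu^4 ~ 1 / lambda^4, with the laser wavelength standing in for
    the scattered wavelength (an approximation).
    """
    return (ref_nm / uv_nm) ** 4


if __name__ == "__main__":
    for ref_nm in (532.0, 785.0, 1064.0):
        print(f"266 nm vs {ref_nm:.0f} nm: about {relative_raman_gain(266.0, ref_nm):.1f}x")
```

Because 532 nm and 1064 nm are exactly twice and four times 266 nm, the idealized gain is 16x and 256x respectively.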
In the prior art, once an ultraviolet Raman spectrum has been acquired, transgenic soybean oil can be identified from the complete spectrum. However, the complete ultraviolet Raman spectrum contains a large amount of data, so detection takes a long time and is inefficient.
Disclosure of Invention
To address the large data volume of the complete ultraviolet Raman spectrum and the resulting long detection time and low detection efficiency in the prior art, the invention provides a method for identifying transgenic soybean oil.
To achieve the above object, the method for identifying transgenic soybean oil provided by the invention comprises the following steps: forming an ultraviolet Raman spectrum set X' from the ultraviolet Raman spectra of a plurality of soybean oil samples; determining the label information corresponding to each ultraviolet Raman spectrum in X', where the label information identifies the brand and the transgene status of the soybean oil sample from which the spectrum was acquired; training an influence model with X' and the label information of each spectrum in X' to determine a load matrix; determining the influence intensity of each characteristic peak in the ultraviolet Raman spectrum by using the load matrix, where the influence intensity of a characteristic peak is positively correlated with the accuracy with which that peak determines the label information; ranking the characteristic peaks of each ultraviolet Raman spectrum by influence intensity in descending order and taking the first S peaks as transgenic influence characteristic peaks; and extracting the transgenic influence characteristic peaks from the ultraviolet Raman spectrum of a soybean oil sample and determining the label information of that sample from them.
Further, training the influence model with the ultraviolet Raman spectrum set X' and the label information corresponding to each spectrum in X' to determine the load matrix comprises: dividing X' and its label information into a training set and a validation set according to a set ratio, training the influence model with the training set to determine the load matrix, and validating the influence model with the validation set. Training the influence model with the training set to determine the load matrix comprises: determining a spectrum matrix $E_0$ from the spectra of X' in the training set; determining a label matrix $F_0$ from the label information of those spectra; and determining a load matrix $L_0$ from $E_0$ and $F_0$.
Further, determining the load matrix $L_0$ from the spectrum matrix $E_0$ and the label matrix $F_0$ comprises:
$$t_1 = E_0 w_1$$
$$u_1 = F_0 c_1$$
where $t_1$ is a linear combination of the columns of $E_0$, $u_1$ is a linear combination of the columns of $F_0$, and $w_1$ and $c_1$ are weight vectors;
maximizing the covariance of $t_1$ and $u_1$ to obtain the covariance matrix:
$$\mathrm{Cov}(t_1, u_1) \rightarrow \max$$
$$\|w_1\| = 1, \quad \|c_1\| = 1$$
determining the eigenvectors $[\eta_1, \eta_2, \dots, \eta_m]$ and eigenvalues $[\lambda_1, \lambda_2, \dots, \lambda_m]$ of the covariance matrix;
and determining the load matrix $L_0$ from the eigenvectors and eigenvalues of the covariance matrix.
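As a minimal NumPy sketch of the covariance-maximization step, assuming $E_0$ and $F_0$ are column-centered (the text does not say so), the first weight vectors can be taken as the dominant eigenvectors of $E_0^{\top}F_0F_0^{\top}E_0$ and $F_0^{\top}E_0E_0^{\top}F_0$, which is the standard partial least squares (PLS) result. At the optimum, $t_1^{\top}u_1$ equals the square root of the largest eigenvalue. The function name `first_pls_pair` is hypothetical, and this is not necessarily how the authors build $L_0$.

```python
import numpy as np


def first_pls_pair(E0: np.ndarray, F0: np.ndarray):
    """First PLS weight vectors and score vectors.

    E0: (n, m) column-centered spectrum matrix; F0: (n, p) column-centered label matrix.
    Returns w1, c1, t1, u1 and the largest eigenvalue of E0' F0 F0' E0.
    """
    M = E0.T @ F0                          # (m, p) cross-product matrix
    # w1: unit eigenvector of E0' F0 F0' E0 with the largest eigenvalue
    eigvals_w, eigvecs_w = np.linalg.eigh(M @ M.T)
    w1 = eigvecs_w[:, -1]                  # eigh sorts eigenvalues in ascending order
    # c1: unit eigenvector of F0' E0 E0' F0 with the largest eigenvalue
    _, eigvecs_c = np.linalg.eigh(M.T @ M)
    c1 = eigvecs_c[:, -1]
    # Eigenvector signs are arbitrary; flip c1 so that t1'u1 is positive
    if w1 @ M @ c1 < 0:
        c1 = -c1
    t1 = E0 @ w1                           # spectral score vector
    u1 = F0 @ c1                           # label score vector
    return w1, c1, t1, u1, eigvals_w[-1]
```

For centered data, $\mathrm{Cov}(t_1, u_1) = t_1^{\top}u_1/(n-1)$, so maximizing $t_1^{\top}u_1$ under unit-norm weights is the same problem. When m is a full spectrum, taking the leading singular vectors of $E_0^{\top}F_0$ with `np.linalg.svd` avoids forming the m×m matrix.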
Further, determining the influence intensity of each characteristic peak in the ultraviolet Raman spectrum by using the load matrix comprises: determining a Raman shift-load coefficient curve from the load matrix $L_0$ and the spectrum matrix $E_0$; and determining the influence intensity of each characteristic peak from the peak and trough values of the Raman shift-load coefficient curve.
Further, the set ratio is between 1:1 and 4:1.
Further, forming the ultraviolet Raman spectrum set X' from the ultraviolet Raman spectra of the plurality of soybean oil samples comprises: collecting a plurality of soybean oil samples and performing ultraviolet Raman detection on them to obtain an ultraviolet Raman spectrum set X; and applying, in sequence, polynomial-fitting smoothing, baseline correction and multivariate scatter correction to X to obtain X'.
Further, a Savitzky-Golay convolution smoothing algorithm is adopted to carry out polynomial fitting smoothing pretreatment on the ultraviolet Raman spectrum set X.
Further, the Savitzky-Golay convolution smoothing algorithm uses a fitting order of 3 and a window width of 7.
Further, baseline correction is performed on the ultraviolet Raman spectrum set X by the adaptive iteratively reweighted penalized least squares method.
Further, the Raman shifts of the transgenic influence characteristic peaks are 1100 cm⁻¹, 1400 cm⁻¹, 1515 cm⁻¹, 1600 cm⁻¹, 1656 cm⁻¹, 2871 cm⁻¹, 2933 cm⁻¹ and 2971 cm⁻¹.
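To illustrate how a full spectrum can be reduced to these eight values before classification, the sketch below reads the intensity at each listed Raman shift by linear interpolation. The interpolation rule and the names `GM_PEAKS_CM1` and `extract_peak_features` are assumptions for illustration; the text does not say whether peak height, local maximum or band area is used.

```python
import numpy as np

# Raman shifts (cm^-1) of the transgenic influence characteristic peaks listed above
GM_PEAKS_CM1 = np.array([1100, 1400, 1515, 1600, 1656, 2871, 2933, 2971], dtype=float)


def extract_peak_features(shift_cm1, spectrum, peaks=GM_PEAKS_CM1):
    """Reduce a full spectrum to its intensities at the selected Raman shifts.

    Linear interpolation between sampled points is an illustrative choice.
    """
    shift_cm1 = np.asarray(shift_cm1, dtype=float)
    spectrum = np.asarray(spectrum, dtype=float)
    order = np.argsort(shift_cm1)          # np.interp requires increasing x values
    return np.interp(peaks, shift_cm1[order], spectrum[order])
```

The resulting 8-element vector, instead of the full spectrum, would then be passed to the trained model.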
Through the technical scheme provided by the invention, the invention has at least the following technical effects:
In the transgenic soybean oil identification method, a plurality of soybean oil samples are measured, an ultraviolet Raman spectrum set X' is formed from their ultraviolet Raman spectra, and the label information corresponding to each spectrum in X' is determined; the label information identifies the brand and transgene status of the corresponding soybean oil sample. An influence model is trained with X' and the label information to determine a load matrix, the load matrix is used to determine the influence intensity of each characteristic peak in the ultraviolet Raman spectrum, and the S characteristic peaks with the highest influence intensity are taken as transgenic influence characteristic peaks. To identify an unknown soybean oil sample, its complete ultraviolet Raman spectrum is acquired, the transgenic influence characteristic peaks are extracted from it, the label information of the sample is determined from these peaks, and the transgene status corresponding to that label identifies the sample. Because the method identifies transgenic soybean oil from the extracted transgenic influence characteristic peaks rather than from the complete ultraviolet Raman spectrum, it reduces the volume of detection data and improves detection efficiency.
Additional features and advantages of the invention will be set forth in the detailed description which follows.
Drawings
The accompanying drawings are included to provide a further understanding of embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain, without limitation, the embodiments of the invention. In the drawings:
FIG. 1 is a flow chart of a method for identifying transgenic soybean oil provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of an ultraviolet Raman spectrum set X' in the method for identifying transgenic soybean oil according to the embodiment of the invention;
FIG. 3 is a graph showing the Raman shift-load coefficient curve in the method for identifying transgenic soybean oil according to the embodiment of the present invention;
FIG. 4 is a schematic diagram of the prediction of unknown samples in the method for identifying transgenic soybean oil according to the embodiment of the present invention;
FIG. 5 is a schematic diagram of unknown sample clustering in the method for identifying transgenic soybean oil according to the embodiment of the invention.
Detailed Description
The following describes the detailed implementation of the embodiments of the present invention with reference to the drawings. It should be understood that the detailed description and specific examples, while indicating and illustrating the invention, are not intended to limit the invention.
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.
In the present invention, unless otherwise indicated, terms of orientation such as "upper, lower, top, bottom" generally refer to the orientation shown in the drawings or to the positional relationship of the components relative to one another in the vertical or gravitational direction.
The invention will be described in detail below with reference to the drawings in connection with embodiments.
Referring to FIG. 1, an embodiment of the present invention provides a method for identifying transgenic soybean oil, which includes the following steps: S101: forming an ultraviolet Raman spectrum set X' from the ultraviolet Raman spectra of a plurality of soybean oil samples;
Further, forming the ultraviolet Raman spectrum set X' from the ultraviolet Raman spectra of the plurality of soybean oil samples comprises: collecting a plurality of soybean oil samples and performing ultraviolet Raman detection on them to obtain an ultraviolet Raman spectrum set X; and applying, in sequence, polynomial-fitting smoothing, baseline correction and multivariate scatter correction to X to obtain X'.
Specifically, in the embodiment of the invention, a plurality of soybean oil samples are collected and measured with a self-developed ultraviolet Raman spectroscopy system to obtain the ultraviolet Raman spectrum set X. Spectra are acquired with an Ocean Optics QE Pro spectrometer; the laser wavelength is 266 nm, the power is 30 mW, the pulse width is 5 ns, the resolution is 0.14-7.7 nm (FWHM), and 10 scans are taken. The set X is then preprocessed by, in sequence, polynomial-fitting smoothing, baseline correction and multivariate scatter correction to obtain the ultraviolet Raman spectrum set X'. This preprocessing reduces the influence of the measurement environment and the instrument on the ultraviolet Raman spectra and yields more accurate spectra.
Further, a Savitzky-Golay convolution smoothing algorithm is adopted to carry out polynomial fitting smoothing pretreatment on the ultraviolet Raman spectrum set X.
Specifically, in the embodiment of the invention, the Savitzky-Golay convolution smoothing algorithm is a filtering method based on least-squares fitting. The ultraviolet Raman spectrum set X consists of discrete data points. For each run of 2M+1 consecutive data points (i.e., a moving window of width 2M+1), the algorithm performs a least-squares fit with a polynomial of chosen order P and takes the value of the fitted curve at the center of the window as the filtered value; the window is then moved and the process repeated until all data points in X have been processed. The processed data form the ultraviolet Raman spectrum set $X_1$.
The moving window has width 2M+1, the data points in the window are denoted S[n] with n = −M, …, 0, …, M, and the fitting polynomial in each window is:
$$p(n) = \sum_{k=0}^{P} a_k n^k \qquad (1)$$
The squared fitting error is:
$$E = \sum_{n=-M}^{M} \big(p(n) - S[n]\big)^2 = \sum_{n=-M}^{M} \Big(\sum_{k=0}^{P} a_k n^k - S[n]\Big)^2 \qquad (2)$$
To obtain the best fit, E is minimized by taking its derivative with respect to each coefficient $a_r$ and setting the derivative to 0, i.e.
$$\frac{\partial E}{\partial a_r} = \sum_{n=-M}^{M} 2 n^r \Big(\sum_{k=0}^{P} a_k n^k - S[n]\Big) = 0, \quad r = 0, 1, \dots, P \qquad (3)$$
that is,
$$\sum_{k=0}^{P} a_k \sum_{n=-M}^{M} n^{k+r} = \sum_{n=-M}^{M} n^r S[n] \qquad (4)$$
Let
$$G_{k+r} = \sum_{n=-M}^{M} n^{k+r}, \qquad F_r = \sum_{n=-M}^{M} n^r S[n] \qquad (5)$$
Then formula (4) can be reduced to:
$$\sum_{k=0}^{P} G_{k+r}\, a_k = F_r, \quad r = 0, 1, \dots, P \qquad (6)$$
Given the moving window width, the polynomial order P and the data to be fitted S[n], $F_r$ is calculated from formula (5), and substituting $G_{k+r}$ and $F_r$ into formula (6) yields the polynomial coefficients $a_k$ ($[a_0, a_1, \dots, a_P]$), which determine polynomial (1) within the window. As the window moves, the value of the fitting polynomial at the window's center point, $p(0) = a_0$, is taken as the filtered value, i.e. the output of the Savitzky-Golay convolution smoothing algorithm. Smoothing all data points in X in this way yields the ultraviolet Raman spectrum set $X_1$.
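Because the filtered value is the coefficient $a_0$, and formula (6) is linear in S[n], the whole procedure reduces to a fixed convolution kernel. The NumPy sketch below builds that kernel directly from formulas (5) and (6) and applies it with M = 3 and P = 3 (window width 7 with a cubic fit, the parameters chosen in this embodiment). The function names are hypothetical, and the first and last M points are simply left unsmoothed in this sketch; SciPy's `scipy.signal.savgol_filter(x, window_length=7, polyorder=3)` is a library alternative that also handles the edges.

```python
import numpy as np


def savgol_coefficients(half_width: int, order: int) -> np.ndarray:
    """Kernel h[n], n = -M..M, whose dot product with S[n] gives a_0."""
    n = np.arange(-half_width, half_width + 1, dtype=float)
    V = np.vander(n, order + 1, increasing=True)   # V[n, k] = n**k
    G = V.T @ V                                     # G[r, k] = sum_n n**(k + r), formula (5)
    # Formula (6): G a = V' S, so a = G^{-1} V' S and a_0 = (G^{-1} V')[0] @ S
    return np.linalg.solve(G, V.T)[0]


def savgol_smooth(spectrum, half_width: int = 3, order: int = 3) -> np.ndarray:
    """Smooth a 1-D spectrum with a (2M+1)-point, order-P Savitzky-Golay filter."""
    if half_width < 1:
        raise ValueError("half_width (M) must be at least 1")
    x = np.asarray(spectrum, dtype=float)
    h = savgol_coefficients(half_width, order)
    out = x.copy()                                  # edge points stay unsmoothed
    # Reversing h turns np.convolve into a sliding dot product centred on each point
    out[half_width:-half_width] = np.convolve(x, h[::-1], mode="valid")
    return out
```

For M = 3 and P = 3 the kernel is (-2, 3, 6, 7, 6, 3, -2)/21, and any cubic passes through the filter unchanged, which is why a low-order fit preserves smooth peak shapes while averaging out point-to-point noise.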
Further, the Savitzky-Golay convolution smoothing algorithm uses a fitting order of 3 and a window width of 7.
Specifically, the Savitzky-Golay convolution smoothing algorithm has two important parameters: the fitting order P and the moving window width 2M+1. In general, a narrow window combined with a high fitting order leaves noise in the signal, whereas a wide window combined with a low fitting order distorts it. The Raman spectrum of soybean oil can be modeled as a combination of several Voigt line shapes, noise and a fluorescence background. Taking the spectral characteristics into account (including the overall line shape, the peak widths and the intensities of the characteristic peaks), a fitting order of 3 and a window width of 7 (M = 3) were finally selected as optimal: this setting removes most of the noise without over-smoothing and losing low-intensity characteristic peaks, and gives the best smoothing effect.
Further, baseline correction is performed on the ultraviolet Raman spectrum set X by the adaptive iteratively reweighted penalized least squares method.
Specifically, in an embodiment of the present invention, baseline correction is performed on the ultraviolet Raman spectrum set X with the adaptive iteratively reweighted penalized least squares method (airPLS). In an ideal ultraviolet Raman system, when no valid signal is present, the signal intensity recorded by the detector is 0 (the ideal baseline). In practice, however, electronic drift, dark current and readout noise in the system hardware, together with sample surface properties and other factors, give the raw ultraviolet Raman spectra output by the system a certain fluorescence background and noise, so baseline correction is required. airPLS uses an error-based iterative reweighting strategy: the weight of each point is updated from the difference between the original signal and the baseline fitted in the previous iteration.
The spectral data after Savitzky-Golay smoothing are $X_1$. Denote a spectrum in $X_1$ by the vector x and the fitted baseline by the vector z, both of length m. The fidelity F of z to x can be expressed as the sum of squared errors between the two:
$$F = \sum_{i=1}^{m} (x_i - z_i)^2 \qquad (7)$$
The roughness R of the fitted vector z is expressed as:
$$R = \sum_{i=2}^{m} (z_i - z_{i-1})^2 = \|Dz\|^2 \qquad (8)$$
To obtain a smooth yet undistorted output spectrum, the fidelity and the smoothness of the fit must be balanced. The penalized least-squares function Q is the sum of the fidelity and the roughness weighted by a penalty coefficient λ:
$$Q = F + \lambda R = \|x - z\|^2 + \lambda \|Dz\|^2 \qquad (9)$$
where D is the first-order difference matrix, so that $Dz = \Delta z$. The balance between fidelity and smoothness is adjusted through λ in equation (9): the larger λ, the smoother the fitted vector z, but too large a value may cause distortion.
To minimize the penalized least-squares function Q, take the partial derivative of Q with respect to the fitted vector z and set it to 0, which gives:
$$(I + \lambda D^{\top} D)\, z = x \qquad (10)$$
After a weight vector w is introduced into the fidelity F, with the weights at positions corresponding to peak segments of x set to 0, the fidelity of z to x in formula (7) becomes:
$$F = \sum_{i=1}^{m} w_i (x_i - z_i)^2 = (x - z)^{\top} W (x - z) \qquad (11)$$
where W is the diagonal matrix with w on its diagonal. Introducing iteration, so that the weight of each point is updated from the difference between the original signal x and the baseline z fitted in the previous iteration, formula (9) becomes:
$$Q^t = \sum_{i=1}^{m} w_i^t \left| x_i - z_i^t \right|^2 + \lambda \sum_{j=2}^{m} \left| z_j^t - z_{j-1}^t \right|^2 \qquad (12)$$
The weight vector w is obtained adaptively by iteration. Its initial value is $w^0 = 1$, t is the iteration number, and in each iteration w is given by:
$$w_i^t = \begin{cases} 0, & x_i \ge z_i^{t-1} \\ \exp\!\left(\dfrac{t\,(x_i - z_i^{t-1})}{|d^t|}\right), & x_i < z_i^{t-1} \end{cases} \qquad (13)$$
where the vector $d^t$ consists of the negative elements of the difference between x and $z^{t-1}$ in iteration t.
The iteration stops when the maximum number of iterations or the termination criterion is reached; the termination criterion is:
$$|d^t| < 0.001 \times |x| \qquad (14)$$
In airPLS, peak points are progressively given zero weight, so that eventually only baseline points retain nonzero weights in w. The baseline fitted at convergence is Z, and the baseline-corrected spectral data are $X_2 = X_1 - Z$.
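A compact dense NumPy sketch of the airPLS loop described above is given below. It uses first-order differences for D, λ = 100 and at most 15 iterations; those values, the function names and the dense solver are assumptions for illustration (a sparse solver is preferable for long spectra). Points above the previous fit get zero weight and points below it get an exponential weight; some published implementations put the absolute residual in the exponent instead, so treat the sign convention as an assumption.

```python
import numpy as np


def whittaker_smooth(x, w, lam):
    """Minimize sum_i w_i (x_i - z_i)^2 + lam * ||D z||^2 with first-order differences D."""
    m = x.size
    D = np.diff(np.eye(m), axis=0)          # (m-1, m): (D z)_i = z_{i+1} - z_i
    A = np.diag(w) + lam * (D.T @ D)        # weighted form of (I + lam D'D) z = x
    return np.linalg.solve(A, w * x)


def airpls(x, lam=100.0, max_iter=15):
    """Return (baseline z, corrected x - z) for one spectrum x."""
    x = np.asarray(x, dtype=float)
    w = np.ones_like(x)                     # initial weights w^0 = 1
    z = x.copy()
    for t in range(1, max_iter + 1):
        z = whittaker_smooth(x, w, lam)
        d = x - z
        dssn = np.abs(d[d < 0].sum())       # |d^t|: size of the negative residuals
        if dssn < 0.001 * np.abs(x).sum():  # termination criterion |d^t| < 0.001 |x|
            break
        # Zero weight for points at or above the fit (treated as peaks);
        # exponential weight in (0, 1] for points below it
        w = np.where(d >= 0, 0.0, np.exp(t * np.minimum(d, 0.0) / dssn))
    return z, x - z
```

Applied to each row of $X_1$, the second returned array corresponds to a row of $X_2 = X_1 - Z$.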
Further, uneven surfaces or specular reflection on the sample may introduce offsets and noise into the collected spectra, which reduces the repeatability of spectra from the same sample. Multivariate scatter correction (MSC) corrects the baseline translation and scaling of the spectral data through univariate linear regression, which improves spectral repeatability and enhances the Raman scattering information related to the material's composition and structure.
The input spectral data are $X_2$, which contain s sample types. The spectra of type i are denoted $X_{2i}$ (i = 1, 2, …, s), and the number of spectra collected for type i is $N_i$. Multivariate scatter correction is applied to the spectra of each type as follows:
First, compute the mean of all spectra in the group:
$$\bar{X}_{2i} = \frac{1}{N_i} \sum_{j=1}^{N_i} X_{2i,j} \qquad (15)$$
Next, perform a univariate linear regression of each spectrum on the mean spectrum and solve the least-squares problem to obtain the slope (scaling coefficient) $k_j$ and the offset $b_j$ of each spectrum:
$$X_{2i,j} = k_j \bar{X}_{2i} + b_j \qquad (16)$$
Finally, correct each spectrum by subtracting the offset $b_j$ obtained from formula (16) and dividing by the slope $k_j$, giving the corrected spectrum:
$$X_{2i,j(\mathrm{MSC})} = \frac{X_{2i,j} - b_j}{k_j} \qquad (17)$$
After multivariate scatter correction has been applied to every sample type, the preprocessed spectral data are denoted X', where X' contains $[X_{21(\mathrm{MSC})}, X_{22(\mathrm{MSC})}, \dots, X_{2s(\mathrm{MSC})}]$.
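The per-group correction can be sketched with NumPy as follows. `np.polyfit(..., deg=1)` returns the slope $k_j$ and the offset $b_j$ of the regression of each spectrum on the group mean spectrum; the function names and the array layout (one spectrum per row, one array per sample type) are assumptions.

```python
import numpy as np


def msc(spectra: np.ndarray) -> np.ndarray:
    """Multivariate scatter correction of one sample group.

    spectra: (N_i, m) array with one spectrum per row.
    """
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0)                  # group mean spectrum
    corrected = np.empty_like(spectra)
    for j, s in enumerate(spectra):
        k, b = np.polyfit(ref, s, deg=1)        # s ~ k * ref + b (least squares)
        corrected[j] = (s - b) / k              # remove offset, then scaling
    return corrected


def msc_by_group(groups):
    """Correct each sample type separately and stack the results into X'."""
    return np.vstack([msc(g) for g in groups])
```

As a sanity check, when every spectrum in a group is an exact scaled-and-shifted copy of one profile, all corrected spectra coincide with the group mean.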
S102: determining label information corresponding to each ultraviolet Raman spectrum in the ultraviolet Raman spectrum set X', wherein the label information corresponding to the ultraviolet Raman spectrum is used for identifying brand information and transgene information of a soybean oil sample corresponding to the ultraviolet Raman spectrum;
Specifically, in the embodiment of the present invention, label information, for example 1, 2, 3, … or a, b, c, …, is assigned to each ultraviolet Raman spectrum in X' according to the brand and transgene status of the corresponding soybean oil sample.
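One way to turn brand and transgene status into the label matrix used in the next step is sketched below: each distinct (brand, transgenic) pair receives an integer label, and the labels are one-hot encoded into an n×p matrix. The one-hot encoding and the function name are assumptions; the text only requires distinct labels such as 1, 2, 3, … or a, b, c, ….

```python
import numpy as np


def encode_labels(brands, is_gm):
    """Integer labels and a one-hot label matrix for (brand, transgenic) pairs."""
    keys = list(zip(brands, is_gm))
    classes = sorted(set(keys))                 # one class per (brand, GM status) pair
    index = {key: i for i, key in enumerate(classes)}
    y = np.array([index[key] for key in keys])  # integer label per spectrum
    F = np.eye(len(classes))[y]                 # (n, p) one-hot label matrix
    return y, F, classes
```

Keeping the class list makes it possible to map a predicted label back to its transgene status.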
S103: training the influence model with the ultraviolet Raman spectrum set X' and the label information corresponding to each ultraviolet Raman spectrum in X' to determine a load matrix;
Further, training the influence model with the ultraviolet Raman spectrum set X' and the label information corresponding to each spectrum in X' to determine the load matrix comprises: dividing X' and its label information into a training set and a validation set according to a set ratio, training the influence model with the training set to determine the load matrix, and validating the influence model with the validation set. Training the influence model with the training set to determine the load matrix comprises: determining a spectrum matrix $E_0$ from the spectra of X' in the training set; determining a label matrix $F_0$ from the label information of those spectra; and determining a load matrix $L_0$ from $E_0$ and $F_0$.
Specifically, in the embodiment of the invention, X' and the label information of each spectrum in X' are divided into a training set and a validation set according to a set ratio; the influence model is trained with the training set to determine the load matrix and is validated with the validation set. The set ratio is preferably between 1:1 and 4:1, and more preferably 4:1.
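A random 4:1 split of sample indices, the preferred ratio above, might look like the sketch below. The text does not say whether the split is stratified by label, so plain random shuffling with a fixed seed is an assumption, as is the function name.

```python
import numpy as np


def split_train_val(n_samples: int, ratio: float = 4.0, seed: int = 0):
    """Randomly split sample indices into training and validation sets at ratio:1."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(round(n_samples * ratio / (ratio + 1.0)))
    return idx[:n_train], idx[n_train:]
```

With 100 spectra and the default ratio, 80 indices go to training and 20 to validation.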
The influence model is trained with X' and the label information of each spectrum in X' to determine the load matrix as follows:
A spectrum matrix $E_0$ is determined from the spectra of X' in the training set, and a label matrix $F_0$ is determined from the label information of those spectra. $E_0$ serves as the independent-variable matrix and $F_0$ as the dependent-variable matrix. With n samples in the training set, m variables in $E_0$ and p variables in $F_0$, the independent- and dependent-variable matrices are $E_0$ (n×m) and $F_0$ (n×p).
Respectively in spectrum matrix E 0 And a tag matrix F 0 Extracting component t from the extract 1 And u 1 As a first pair of principal components (also referred to as score vectors), t 1 For spectral matrix E 0 Linear combination of inner matrix elements, u 1 For a tag matrix F 0 Linear combination of inner matrix elements, weight coefficients are w respectively 1 And c 1 I.e. t 1 =E 0 w 1 And u 1 =F 0 c 1 . The requirements are: 1) t is t 1 And u 1 The variance information in the respective data matrix can be represented as large as possible, namely, the variance of the variance information and the variance information reaches the maximum; 2) t is t 1 For u 1 Has the maximum interpretation capability and the maximum correlation degree of the two. To sum up, claim t 1 And u is equal to 1 Is maximized, i.e
$$\operatorname{Cov}(t_1, u_1) \rightarrow \max \qquad (18)$$
where $w_1$ and $c_1$ are unit vectors, i.e.
$$\|w_1\| = 1, \quad \|c_1\| = 1 \qquad (19)$$
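Solving this constrained extremum problem with the method of Lagrange multipliers shows that $w_1$ is the eigenvector of the matrix $E_0^{T} F_0 F_0^{T} E_0$ corresponding to its largest eigenvalue, and $c_1$ is the eigenvector of the matrix $F_0^{T} E_0 E_0^{T} F_0$ corresponding to its largest eigenvalue.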
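The eigenvector characterisation can be checked numerically. The sketch below assumes mean-centred columns (the text does not state how $E_0$ and $F_0$ are scaled) and uses NumPy's `numpy.linalg.eigh`; `first_pls_component` is a hypothetical helper. It derives $c_1$ from $w_1$ instead of from a second eigendecomposition: this gives the same direction but fixes the sign so that $t_1^T u_1 \ge 0$, and at the optimum $t_1^T u_1$ equals the largest singular value of $E_0^T F_0$.

```python
import numpy as np

def first_pls_component(E0, F0):
    """First PLS weight/score pair via the eigenvector characterisation.

    Columns are mean-centred first (an assumption, not stated in the text).
    """
    E = E0 - E0.mean(axis=0)
    F = F0 - F0.mean(axis=0)
    # w1: eigenvector of E^T F F^T E with the largest eigenvalue
    eigvals, eigvecs = np.linalg.eigh(E.T @ F @ F.T @ E)  # ascending eigenvalues
    w1 = eigvecs[:, -1]
    # c1 is proportional to F^T E w1, which is the top eigenvector of
    # F^T E E^T F; deriving it this way makes Cov(t1, u1) non-negative
    c1 = F.T @ E @ w1
    c1 /= np.linalg.norm(c1)
    t1, u1 = E @ w1, F @ c1   # first pair of score vectors
    return w1, c1, t1, u1

rng = np.random.default_rng(1)
E0 = rng.normal(size=(40, 6))          # 40 synthetic spectra, 6 variables
F0 = E0[:, :2] @ rng.normal(size=(2, 3)) + 0.1 * rng.normal(size=(40, 3))
w1, c1, t1, u1 = first_pls_component(E0, F0)
```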
Solving the covariance maximization of formula (18) yields the eigenvectors $[\eta_1, \eta_2, \ldots, \eta_m]$ and eigenvalues $[\lambda_1, \lambda_2, \ldots, \lambda_m]$ of the covariance matrix;
The load matrix $L_0$ is determined from the eigenvectors and eigenvalues of the covariance matrix.
The score vectors give the weight of each principal component: each score vector is in effect the projection of the independent-variable matrix onto the direction of its corresponding load vector, and reflects how much of the independent variables is captured along that direction.
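The text does not spell out how $L_0$ is assembled once more than one component is kept. One common reading, assumed here, is textbook PLS: extract a component, deflate $E$ and $F$ by what its score explains, repeat, and stack the load vectors $p_h = E_h^T t_h / (t_h^T t_h)$ column by column. `pls_load_matrix` is a hypothetical name. A property of this scheme worth knowing is that the score vectors come out mutually orthogonal.

```python
import numpy as np

def pls_load_matrix(E0, F0, n_components):
    """Extract PLS components one at a time, deflating after each.

    Returns weights W, loadings P (stacked as the load matrix) and scores T.
    Mean-centring and the loading formula are textbook PLS assumptions.
    """
    E = E0 - E0.mean(axis=0)
    F = F0 - F0.mean(axis=0)
    n, m = E.shape
    W = np.zeros((m, n_components))
    P = np.zeros((m, n_components))
    T = np.zeros((n, n_components))
    for h in range(n_components):
        _, vecs = np.linalg.eigh(E.T @ F @ F.T @ E)
        w = vecs[:, -1]                 # weight vector of component h
        t = E @ w                       # score: projection of the deflated E onto w
        p = E.T @ t / (t @ t)           # load vector of component h
        r = F.T @ t / (t @ t)           # regression of F on the score
        E = E - np.outer(t, p)          # remove what t explains from E
        F = F - np.outer(t, r)          # ... and from F
        W[:, h], P[:, h], T[:, h] = w, p, t
    return W, P, T

rng = np.random.default_rng(2)
E0 = rng.normal(size=(30, 8))
F0 = rng.normal(size=(30, 2))
W, P, T = pls_load_matrix(E0, F0, n_components=3)
```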
Further, ten-fold cross-validation is used to prevent the model from overfitting. The training-set samples are randomly divided into 10 groups; each group serves once as the validation subset while the remaining 9 groups serve as the training subset. An accuracy is obtained from each test, and the process is repeated 10 times with a different test group each time, so that every group is validated exactly once. After the 10 tests, the mean of the 10 accuracies is taken as the estimate of the algorithm's accuracy.
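The ten-fold procedure can be sketched as follows. `kfold_accuracy` takes the model-fitting and prediction steps as callables because the text does not name the classifier; the nearest-centroid pair below is a hypothetical stand-in used only to make the example runnable, and the data are synthetic.

```python
import numpy as np

def kfold_accuracy(X, y, fit, predict, k=10, seed=0):
    """k-fold cross-validation: every fold is the test subset exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accuracies = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])
        accuracies.append(float(np.mean(predict(model, X[test]) == y[test])))
    return float(np.mean(accuracies)), accuracies

# Hypothetical stand-in classifier: assign the class with the nearest centroid
def fit_centroids(X, y):
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])

def predict_centroids(model, X):
    classes, centroids = model
    dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(10, 1, (50, 2))])
y = np.repeat([0, 1], 50)
mean_acc, per_fold = kfold_accuracy(X, y, fit_centroids, predict_centroids)
```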
S104: determining the influence intensity of each characteristic peak in the ultraviolet Raman spectrum using the load matrix, where the influence intensity of a characteristic peak is positively correlated with the accuracy of determining the label information from that peak;
Further, determining the influence intensity of each characteristic peak in the ultraviolet Raman spectrum using the load matrix comprises: determining a Raman shift-load coefficient curve from the load matrix $L_0$ and the spectrum matrix $E_0$; and determining the influence intensity of each characteristic peak in the ultraviolet Raman spectrum from the peak and trough values of the Raman shift-load coefficient curve.
Specifically, in the embodiment of the present invention, the elements of the load matrix $L_0$ correspond one-to-one to the elements of the spectrum matrix $E_0$ (which carry the Raman shift and its spectral intensity), and hence one-to-one to the Raman shifts, so a Raman shift-load coefficient curve can be plotted. From the horizontal and vertical coordinates of this curve, the most representative transgene-influence characteristic peaks (those able to distinguish transgenic from non-transgenic soybean oil) can be found: the horizontal axis gives the Raman shift of each transgene-influence characteristic peak, and the absolute value of each peak or trough gives its influence intensity.
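A minimal sketch of locating the peaks and troughs of a Raman shift-load coefficient curve and reading off each influence intensity as the absolute load coefficient. It assumes the curve is sampled as two equal-length arrays and counts only strict local extrema; `load_curve_extrema` is a hypothetical helper and the input values are synthetic.

```python
import numpy as np

def load_curve_extrema(raman_shift, load_coeff):
    """Locate strict local peaks and troughs of a Raman shift-load curve.

    Returns the Raman shift of each extremum, its signed load coefficient,
    and its influence intensity (absolute value of the coefficient).
    """
    s = np.asarray(raman_shift, dtype=float)
    v = np.asarray(load_coeff, dtype=float)
    rise = v[1:-1] - v[:-2]          # difference to the left neighbour
    fall = v[1:-1] - v[2:]           # difference to the right neighbour
    is_peak = (rise > 0) & (fall > 0)
    is_trough = (rise < 0) & (fall < 0)
    idx = np.flatnonzero(is_peak | is_trough) + 1
    return s[idx], v[idx], np.abs(v[idx])

shifts = np.arange(1000.0, 1010.0)                  # synthetic shift axis
coeffs = np.array([0, 1, 0, -2, 0, 3, 0, 0, 0, 0])  # synthetic load coefficients
peak_shift, peak_value, intensity = load_curve_extrema(shifts, coeffs)
# peak_shift -> [1001., 1003., 1005.], intensity -> [1., 2., 3.]
```

On real spectra some smoothing or a minimum-prominence rule would likely be needed so that noise ripples are not counted as characteristic peaks.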
S105: after the characteristic peaks of each ultraviolet Raman spectrum are sorted by influence intensity in descending order, the first S characteristic peaks are determined to be the transgene-influence characteristic peaks;
Specifically, in the embodiment of the invention, after the characteristic peaks of each ultraviolet Raman spectrum are sorted by influence intensity in descending order, the several peaks with the highest influence intensity are taken as the transgene-influence characteristic peaks.
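The ranking step amounts to a descending sort on influence intensity followed by a cut at S. In the sketch below, `top_s_peaks` is a hypothetical helper and the numbers are illustrative only (they are not the values of Table 2).

```python
import numpy as np

def top_s_peaks(peak_shift, intensity, S):
    """Sort extrema by influence intensity (descending) and keep the first S."""
    peak_shift = np.asarray(peak_shift, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    order = np.argsort(-intensity, kind="stable")[:S]
    return peak_shift[order], intensity[order]

# Illustrative extrema: Raman shift (cm^-1) and |load coefficient|
shifts = [1050.0, 1250.0, 1450.0, 1650.0, 1850.0]
strength = [0.8, 0.5, 0.6, 0.1, 0.9]
selected, selected_strength = top_s_peaks(shifts, strength, S=3)
# selected -> [1850., 1050., 1450.]
```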
Further, the Raman shifts of the transgene-influence characteristic peaks are 1100 cm⁻¹, 1400 cm⁻¹, 1515 cm⁻¹, 1600 cm⁻¹, 1656 cm⁻¹, 2871 cm⁻¹, 2933 cm⁻¹ and 2971 cm⁻¹.
In particular, in the embodiment of the invention, because the measured ultraviolet Raman spectrum of an unknown sample exhibits drift, the characteristic peaks near 1100, 1400, 1515, 1600, 1656, 2871, 2933 and 2971 cm⁻¹ can be extracted from the spectrum for identification.
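One simple way to tolerate drift is to take the strongest intensity inside a small window around each target shift. The sketch assumes this windowed-maximum rule; the ±10 cm⁻¹ tolerance and the name `extract_peak_features` are hypothetical, and only the eight target shifts come from the text.

```python
import numpy as np

# Raman shifts (cm^-1) of the transgene-influence characteristic peaks
TARGET_SHIFTS = np.array([1100, 1400, 1515, 1600, 1656, 2871, 2933, 2971], dtype=float)

def extract_peak_features(raman_shift, intensity, targets=TARGET_SHIFTS, tol=10.0):
    """Take the maximum intensity within +/- tol cm^-1 of each target shift.

    The window absorbs spectral drift; tol=10 is a hypothetical value.
    """
    raman_shift = np.asarray(raman_shift, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    features = np.empty(len(targets))
    for i, target in enumerate(targets):
        window = np.abs(raman_shift - target) <= tol
        if not window.any():
            raise ValueError(f"no spectral points within {tol} cm^-1 of {target}")
        features[i] = intensity[window].max()
    return features

shift_axis = np.arange(1000.0, 3100.0, 1.0)
spectrum = np.zeros_like(shift_axis)
spectrum[shift_axis == 1103.0] = 5.0   # synthetic peak drifted from 1100 cm^-1
spectrum[shift_axis == 2935.0] = 2.0   # synthetic peak drifted from 2933 cm^-1
features = extract_peak_features(shift_axis, spectrum)
```

The window should stay narrower than half the gap between neighbouring targets (for example 2933 and 2971 cm⁻¹) so that one physical peak is not counted twice.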
S106: extracting the transgene-influence characteristic peaks from the ultraviolet Raman spectrum of the soybean oil sample, and determining the label information of the soybean oil sample from those peaks.
Specifically, in the embodiment of the invention, when an unknown soybean oil sample is identified, its complete ultraviolet Raman spectrum is collected, the transgene-influence characteristic peaks are extracted from it, and the label information of the sample is determined from those peaks.
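The text does not state the decision rule that maps the extracted peaks to a label. As a hedged stand-in in the spirit of PLS discriminant analysis, the sketch regresses one-hot label indicators on the peak features (ridge-regularised least squares) and picks the class with the largest predicted indicator. `fit_label_model`, `predict_labels`, the label names and the data are all hypothetical.

```python
import numpy as np

def fit_label_model(features, labels, ridge=1e-6):
    """Regress one-hot label indicators on peak features (ridge least squares)."""
    X = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    Y = (labels[:, None] == classes[None, :]).astype(float)   # one-hot labels
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc = X - x_mean
    B = np.linalg.solve(Xc.T @ Xc + ridge * np.eye(X.shape[1]), Xc.T @ (Y - y_mean))
    return classes, x_mean, y_mean, B

def predict_labels(model, features):
    """Assign each sample the class whose predicted indicator is largest."""
    classes, x_mean, y_mean, B = model
    scores = (np.asarray(features, dtype=float) - x_mean) @ B + y_mean
    return classes[scores.argmax(axis=1)]

rng = np.random.default_rng(4)
centres = 10 * np.eye(8)[:3]     # three well-separated synthetic classes, 8 peak features
X = np.vstack([c + 0.5 * rng.normal(size=(30, 8)) for c in centres])
y = np.repeat(np.array(["A-GM", "A-nonGM", "C-rice"]), 30)   # hypothetical label names
model = fit_label_model(X, y)
pred = predict_labels(model, X)
```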
The method provided by the invention extracts the transgene-influence characteristic peaks from the complete ultraviolet Raman spectrum and identifies transgenic soybean oil from those peaks alone, which reduces the amount of data to be processed and improves detection efficiency.
Example 1
In this example, a total of 5 oil samples with no significant differences in appearance were used: transgenic soybean oil of brand A, non-transgenic soybean oil of brand A, transgenic soybean oil of brand B, non-transgenic soybean oil of brand B, and rice oil of brand C. During the experiment the samples were kept at room temperature; 2 ml of each sample was placed in a UV-transparent quartz cuvette (12.5 × 40 mm, 3.5 ml capacity), which was placed horizontally in the sample collection area for detection. The ultraviolet Raman signals were collected with an Ocean Optics QE Pro spectrometer, each recorded spectrum being the average of 10 scans. For the four soybean oil samples of brands A and B, 100 spectra were collected per session, and for the brand C sample 20 spectra per session; with 5 sessions in total this gives 4 × 100 × 5 + 20 × 5 = 2100 spectra. To improve robustness, acquisitions of one sample type were interleaved with acquisitions of other types, i.e. the same sample type was not collected continuously, yielding the ultraviolet Raman spectrum set X. Polynomial-fitting smoothing, airPLS baseline correction and multiplicative scatter correction were then applied to X to obtain the ultraviolet Raman spectrum set X'. For ease of viewing, FIG. 2 shows six preprocessed ultraviolet Raman spectra of brand A.
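The three preprocessing steps can be sketched in NumPy as below. The Savitzky-Golay settings (order 3, window 7) follow claim 6; the airPLS routine is a dense-matrix rendering of the adaptive iteratively reweighted penalized least squares idea, and its `lam` and `order` values are illustrative assumptions, since the text does not report them. MSC uses the mean spectrum as reference, which is the usual convention but is also assumed. For full-length spectra a sparse solver, or `scipy.signal.savgol_filter` for the smoothing step, would be more practical.

```python
import numpy as np

def savgol_smooth(y, window=7, order=3):
    """Savitzky-Golay smoothing via local least-squares polynomial fits."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    A = np.vander(offsets, order + 1, increasing=True)   # local polynomial basis
    kernel = np.linalg.pinv(A)[0]                        # fitted value at window centre
    padded = np.pad(np.asarray(y, dtype=float), half, mode="edge")
    return np.correlate(padded, kernel, mode="valid")

def airpls(y, lam=1e5, order=2, max_iter=15):
    """Baseline estimate by adaptive iteratively reweighted penalized least squares."""
    y = np.asarray(y, dtype=float)
    n = y.size
    D = np.diff(np.eye(n), n=order, axis=0)      # finite-difference operator
    H = lam * D.T @ D                            # roughness penalty
    w = np.ones(n)
    z = y
    for i in range(1, max_iter + 1):
        z = np.linalg.solve(np.diag(w) + H, w * y)   # weighted Whittaker smoother
        d = y - z
        neg = d[d < 0]
        dssn = np.abs(neg.sum())
        if neg.size == 0 or dssn < 1e-3 * np.abs(y).sum():
            break
        w[d >= 0] = 0.0                          # points above the baseline: peaks
        w[d < 0] = np.exp(i * np.abs(neg) / dssn)
        w[0] = w[-1] = np.exp(i * neg.max() / dssn)
    return z

def msc(spectra, reference=None):
    """Multiplicative scatter correction against the mean spectrum."""
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for k, s in enumerate(spectra):
        slope, intercept = np.polyfit(ref, s, 1)   # s ~ intercept + slope * ref
        corrected[k] = (s - intercept) / slope
    return corrected

def preprocess(spectra):
    """Smoothing -> baseline removal -> MSC, in the order of claim 4."""
    smoothed = np.array([savgol_smooth(s) for s in spectra])
    flattened = np.array([s - airpls(s) for s in smoothed])
    return msc(flattened)
```

Applied row by row to the set X, `preprocess` would produce a set corresponding to X', up to the unreported airPLS parameters.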
The label information corresponding to each ultraviolet Raman spectrum in the ultraviolet Raman spectrum set X' is determined; see Table 1.
Table 1 sample tag information
The load matrix $L_0$ is determined from the ultraviolet Raman spectrum set X' and the label information corresponding to each spectrum in X'. Because the elements of $L_0$ correspond one-to-one to the elements of the spectrum matrix $E_0$ (which carry the Raman shift and its spectral intensity), and hence to the Raman shifts, a Raman shift-load coefficient curve is obtained; see FIG. 3.
From the horizontal and vertical coordinates of this curve, the most representative transgene-influence characteristic peaks (those able to distinguish transgenic from non-transgenic soybean oil) can be found. In the curve, the horizontal axis gives the Raman shift of each transgene-influence characteristic peak, and the absolute values of the peaks and troughs give their influence intensities; see Table 2.
Table 2 Raman shifts and influence intensities of the transgene-influence characteristic peaks
Please refer to table 3 for the assigned chemical bonds corresponding to the characteristic peaks of the transgene effect.
Table 3 Raman shifts and assigned chemical bonds
Raman shift (cm⁻¹) | Assigned chemical bond
1100 | Phosphate group O=P-O (protein)
1400 | Methyl CH₃
1515 | Cytosine
1600 | Amide band
1656 | C=C (oils and fats), amide I band
2871-2971 | Lipid CH₂
2933 | CH₂ asymmetric stretching
When an unknown soybean oil sample is identified, its complete ultraviolet Raman spectrum is collected, the transgene-influence characteristic peaks are extracted from it, and the label information of the sample is determined from those peaks. FIG. 4 shows the predictions for the unknown samples obtained with the method of this embodiment, and FIG. 5 shows the distribution of the unknown samples. As FIGS. 4 and 5 show, the brand C rice oil differs markedly from the other samples, and only one of its labels is mispredicted. Because the ultraviolet Raman spectra of the different soybean oil types are similar, their data distributions overlap, which introduces some prediction error but does not affect the prediction for most of the data. The resulting identification accuracy on the unknown samples is 70.95%.
The preferred embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the present invention is not limited to the specific details of the above embodiments, and various simple modifications can be made to the technical solution of the present invention within the scope of the technical concept of the present invention, and all the simple modifications belong to the protection scope of the present invention.
In addition, the specific features described in the above embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, various possible combinations are not described further.
Moreover, any combination of the various embodiments of the invention can be made without departing from the spirit of the invention, which should also be considered as disclosed herein.

Claims (8)

1. A method for identifying transgenic soybean oil, the method comprising:
forming an ultraviolet raman spectrum set X' from the ultraviolet raman spectra of the plurality of soybean oil samples;
determining label information corresponding to each ultraviolet Raman spectrum in the ultraviolet Raman spectrum set X', wherein the label information corresponding to the ultraviolet Raman spectrum is used for identifying brand information and transgene information of a soybean oil sample corresponding to the ultraviolet Raman spectrum;
training an influence model by using the ultraviolet raman spectrum set X 'and tag information corresponding to each ultraviolet raman spectrum in the ultraviolet raman spectrum set X' to determine a load matrix, including: dividing the ultraviolet Raman spectrum set X 'and the label information corresponding to each ultraviolet Raman spectrum in the ultraviolet Raman spectrum set X' into a training set and a verification set according to a set proportion, training the influence model by using the training set to determine the load matrix, and verifying the influence model by using the verification set;
wherein training the influence model with the training set to determine the load matrix comprises: determining a spectrum matrix $E_0$ from the ultraviolet Raman spectrum set X' in the training set; determining a label matrix $F_0$ from the label information corresponding to each ultraviolet Raman spectrum of X' in the training set; and determining a load matrix $L_0$ from the spectrum matrix $E_0$ and the label matrix $F_0$;
determining the load matrix $L_0$ from the spectrum matrix $E_0$ and the label matrix $F_0$ comprises:
$$t_1 = E_0 w_1$$
$$u_1 = F_0 c_1$$
wherein $t_1$ is a linear combination of the columns of the spectrum matrix $E_0$, $u_1$ is a linear combination of the columns of the label matrix $F_0$, and $w_1$ and $c_1$ are weight vectors;
maximizing the covariance of $t_1$ and $u_1$ to obtain a covariance matrix:
$$\operatorname{Cov}(t_1, u_1) \rightarrow \max$$
$$\|w_1\| = 1, \quad \|c_1\| = 1$$
determining the eigenvectors $[\eta_1, \eta_2, \ldots, \eta_m]$ and eigenvalues $[\lambda_1, \lambda_2, \ldots, \lambda_m]$ of the covariance matrix;
determining the load matrix $L_0$ from the eigenvectors and eigenvalues of the covariance matrix;
Determining the influence intensity of each characteristic peak in the ultraviolet Raman spectrum by using the load matrix; wherein the influence intensity of the characteristic peak is positively correlated with the accuracy of determining the tag information by using the characteristic peak;
after the characteristic peaks of each ultraviolet Raman spectrum are arranged according to the influence intensity from large to small, the first S characteristic peaks are determined to be transgenic influence characteristic peaks;
and extracting a transgenic influence characteristic peak from the ultraviolet Raman spectrum of the soybean oil sample, and determining the label information of the soybean oil sample according to the transgenic influence characteristic peak.
2. The method of claim 1, wherein determining the impact intensity of each characteristic peak in the ultraviolet raman spectrum using the loading matrix comprises:
determining a Raman shift-load coefficient curve from the load matrix $L_0$ and the spectrum matrix $E_0$;
and determining the influence intensity of each characteristic peak in the ultraviolet Raman spectrum from the peak and trough values of the Raman shift-load coefficient curve.
3. The method of claim 1, wherein the set ratio is between (1-4): 1.
4. The method of claim 1, wherein forming the ultraviolet Raman spectrum set X' from the ultraviolet Raman spectra of the plurality of soybean oil samples comprises:
collecting a plurality of soybean oil samples, and performing ultraviolet Raman spectrum detection on them to obtain an ultraviolet Raman spectrum set X;
and sequentially performing polynomial-fitting smoothing, baseline correction and multiplicative scatter correction on the ultraviolet Raman spectrum set X to obtain the ultraviolet Raman spectrum set X'.
5. The method for identifying transgenic soybean oil according to claim 4, wherein the polynomial-fitting smoothing of the ultraviolet Raman spectrum set X is performed with a Savitzky-Golay convolution smoothing algorithm.
6. The method for identifying transgenic soybean oil according to claim 5, wherein the Savitzky-Golay convolution smoothing algorithm uses a polynomial order of 3 and a window width of 7.
7. The method of claim 4, wherein the ultraviolet Raman spectrum set X is baseline-corrected with the adaptive iteratively reweighted penalized least squares (airPLS) method.
8. The method for identifying transgenic soybean oil according to claim 1, wherein the Raman shifts of the transgene-influence characteristic peaks are 1100 cm⁻¹, 1400 cm⁻¹, 1515 cm⁻¹, 1600 cm⁻¹, 1656 cm⁻¹, 2871 cm⁻¹, 2933 cm⁻¹ and 2971 cm⁻¹.
CN202111370792.8A 2021-11-18 2021-11-18 Identification method of transgenic soybean oil Active CN114113035B (en)

Publications (2)

Publication Number Publication Date
CN114113035A CN114113035A (en) 2022-03-01
CN114113035B true CN114113035B (en) 2024-02-02

Family

ID=80397898

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant