CN114113035A - Transgenic soybean oil identification method - Google Patents


Info

Publication number
CN114113035A
Authority
CN
China
Prior art keywords
raman spectrum
matrix
ultraviolet raman
ultraviolet
soybean oil
Legal status
Granted
Application number
CN202111370792.8A
Other languages
Chinese (zh)
Other versions
CN114113035B (en)
Inventor
金伟其
郭宗昱
郭一新
裘溯
何玉青
Current Assignee
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Application filed by Beijing Institute of Technology BIT
Priority to CN202111370792.8A
Publication of CN114113035A
Application granted
Publication of CN114113035B
Legal status: Active

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/62 Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light
    • G01N21/63 Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light optically excited
    • G01N21/65 Raman scattering

Landscapes

  • Health & Medical Sciences (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Investigating, Analyzing Materials By Fluorescence Or Luminescence (AREA)

Abstract

The invention provides a transgenic soybean oil identification method comprising the following steps: forming an ultraviolet Raman spectrum set X' from the ultraviolet Raman spectra of a plurality of soybean oil samples; determining the label information corresponding to each ultraviolet Raman spectrum in the set X'; training an influence model with the set X' and the label information to determine a load matrix; determining the influence intensity of each characteristic peak in the ultraviolet Raman spectrum by using the load matrix; ranking the characteristic peaks of each ultraviolet Raman spectrum by influence intensity in descending order and taking the first S characteristic peaks as transgene influence characteristic peaks; and extracting the transgene influence characteristic peaks from the ultraviolet Raman spectrum of a soybean oil sample and determining the label information of that sample from them. With this method, the transgene influence characteristic peaks can be extracted from the complete ultraviolet Raman spectrum and used to identify transgenic soybean oil, which reduces the volume of detection data and improves detection efficiency.

Description

Transgenic soybean oil identification method
Technical Field
The invention relates to the technical field of soybean oil detection, in particular to a transgenic soybean oil identification method.
Background
A transgenic crop is a crop with specific target traits obtained by introducing a cloned exogenous gene into crop tissue and expressing it by recombinant DNA technology. According to statistics reported by the International Service for the Acquisition of Agri-biotech Applications (ISAAA), the global planting area of transgenic crops grew to 191.7 million hectares over 1996-2018, of which developing and developed countries accounted for 103.1 million and 88.6 million hectares respectively; transgenic soybean has the highest global adoption, accounting for 50% of the global transgenic crop area. Although transgenic technology can increase crop yield, improve crop quality and enhance traits such as drought and cold resistance, transgenic crops may also pose potential threats to the ecological environment (for example, soil ecosystems and biogeochemical cycles) and may even seriously affect biological populations, so the environmental safety assessment of transgenic crops has long been a concern. China is the world's main soybean consuming and importing country: by 2019 the quantity of imported soybeans had reached 834 million tons and consumption was about 1 billion tons, most of it transgenic soybean. In 2020, the Ministry of Agriculture and Rural Affairs issued an approval list of agricultural GMO safety certificates that included three herbicide-tolerant transgenic soybeans. To prevent the misuse of transgenic soybeans in food production and to address unclear food labeling and even the mixing of genuine and mislabeled products, there is an urgent need to detect transgenic soybean components in food.
Raman spectroscopy is a nondestructive, non-contact light-scattering analysis method: the position, intensity and shape of its spectral peaks accurately reflect the structural information of the substances or mixtures involved, so it is commonly used for substance identification and component analysis. Raman detection requires no sample pretreatment, produces no chemical pollutants, and is rapid, accurate, simple, efficient and highly repeatable. However, soybean oil contains a large number of carbon-carbon double bonds (linear or cyclic unsaturated molecules with extensive conjugated π bonds), which produce a strong fluorescence background that severely interferes with Raman detection.
Compared with conventional visible and near-infrared Raman spectroscopy, ultraviolet Raman spectroscopy has the following characteristics: the Raman signal is essentially separated from the fluorescence spectrum; because the ozone layer absorbs solar ultraviolet radiation, ultraviolet Raman spectra suffer little interference from ambient light, which makes the technique suitable for remote field measurement and widens its application scenarios; and because Raman scattering intensity is inversely proportional to the fourth power of the excitation wavelength, ultraviolet excitation detects weak scattering signals better under the same conditions and is better suited to practical field detection. Ultraviolet Raman spectroscopy is therefore suitable for detecting transgenic soybean oil. In addition, an ultraviolet Raman spectrum detector can measure remotely at a certain distance in a natural environment, so it can not only detect dangerous goods such as drugs and explosives effectively, but also provide an efficient method for detecting transgenic ingredients, additives or expired foods on the market, giving it broad application prospects.
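As a rough illustration of the 1/λ⁴ dependence alone, the sketch below compares 266 nm excitation (the wavelength used in the embodiment) with 532 nm and 785 nm, two common visible and near-infrared laser lines chosen here only as examples. It ignores resonance enhancement, absorption and detector response, so it is an order-of-magnitude illustration, not a prediction of real signal levels; `relative_raman_intensity` is a hypothetical helper name.

```python
def relative_raman_intensity(lambda_exc_nm: float, lambda_ref_nm: float) -> float:
    """Raman intensity at excitation lambda_exc relative to lambda_ref,
    using only the 1/lambda**4 law (same sample, power and collection optics)."""
    return (lambda_ref_nm / lambda_exc_nm) ** 4


for ref_nm in (532.0, 785.0):
    gain = relative_raman_intensity(266.0, ref_nm)
    print(f"266 nm vs {ref_nm:.0f} nm: about {gain:.0f}x stronger")
```

Under this simplification, 266 nm excitation gives roughly 16 times the scattering of 532 nm and about 76 times that of 785 nm.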
In the prior art, after the ultraviolet Raman spectrum is detected, the complete ultraviolet Raman spectrum can be used for detecting the transgenic soybean oil, but the data volume of the complete ultraviolet Raman spectrum is large, the detection time is long, and the detection efficiency is low.
Disclosure of Invention
To address the technical problems in the prior art that the complete ultraviolet Raman spectrum has a large data volume, which leads to long detection time and low detection efficiency, the invention provides a transgenic soybean oil identification method.
To achieve this purpose, the transgenic soybean oil identification method provided by the invention comprises the following steps: forming an ultraviolet Raman spectrum set X' from the ultraviolet Raman spectra of a plurality of soybean oil samples; determining the label information corresponding to each ultraviolet Raman spectrum in the set X', where the label information identifies the brand information and transgene information of the soybean oil sample corresponding to that spectrum; training an influence model with the set X' and the label information corresponding to each of its spectra to determine a load matrix; determining the influence intensity of each characteristic peak in the ultraviolet Raman spectrum by using the load matrix, where the influence intensity of a characteristic peak is positively correlated with the accuracy with which the label information can be determined from that peak; ranking the characteristic peaks of each ultraviolet Raman spectrum by influence intensity in descending order and taking the first S characteristic peaks as transgene influence characteristic peaks; and extracting the transgene influence characteristic peaks from the ultraviolet Raman spectrum of a soybean oil sample and determining the label information of that sample from them.
Further, training the influence model with the set X' and the label information corresponding to each of its ultraviolet Raman spectra to determine the load matrix comprises: dividing the set X' and the corresponding label information into a training set and a validation set according to a set ratio, training the influence model with the training set to determine the load matrix, and validating the influence model with the validation set. Training the influence model with the training set to determine the load matrix comprises: determining a spectral matrix $E_0$ from the ultraviolet Raman spectra of X' in the training set; determining a label matrix $F_0$ from the label information corresponding to those spectra; and determining a load matrix $L_0$ from $E_0$ and $F_0$.
Further, determining the load matrix $L_0$ from the spectral matrix $E_0$ and the label matrix $F_0$ comprises:
$$t_1 = E_0 w_1$$
$$u_1 = F_0 c_1$$
where $t_1$ is a linear combination of the variables of the spectral matrix $E_0$, $u_1$ is a linear combination of the variables of the label matrix $F_0$, and $w_1$ and $c_1$ are weight coefficients;
the covariance of $t_1$ and $u_1$ is maximized to obtain a covariance matrix:
$$\operatorname{Cov}(t_1, u_1) \to \max, \qquad \|w_1\| = 1, \quad \|c_1\| = 1;$$
the eigenvectors $[\eta_1, \eta_2, \ldots, \eta_m]$ and eigenvalues $[\lambda_1, \lambda_2, \ldots, \lambda_m]$ of the covariance matrix are determined;
the load matrix $L_0$ is determined from the eigenvectors and eigenvalues of the covariance matrix:
[Formula for $L_0$ in terms of these eigenvectors and eigenvalues: shown as an image in the original and not recoverable from the text.]
Further, determining the influence intensity of each characteristic peak in the ultraviolet Raman spectrum by using the load matrix comprises: determining a Raman shift-load coefficient curve from the load matrix $L_0$ and the spectral matrix $E_0$; and determining the influence intensity of each characteristic peak in the ultraviolet Raman spectrum from the peaks and troughs of that curve.
Further, the set ratio ranges from 1:1 to 4:1.
Further, forming the ultraviolet Raman spectrum set X' from the ultraviolet Raman spectra of the plurality of soybean oil samples comprises: collecting a plurality of soybean oil samples and performing ultraviolet Raman detection on them to obtain an ultraviolet Raman spectrum set X; and applying, in sequence, polynomial-fitting smoothing, baseline correction and multiplicative scatter correction (MSC) to X to obtain the set X'.
Further, the polynomial-fitting smoothing of the ultraviolet Raman spectrum set X is performed with a Savitzky-Golay convolution smoothing algorithm.
Further, the Savitzky-Golay convolution smoothing algorithm uses a fitting order of 3 and a window width of 7.
Further, baseline correction of the ultraviolet Raman spectrum set X is performed by adaptive iteratively reweighted penalized least squares (airPLS).
Further, the Raman shifts of the transgene influence characteristic peaks are 1100, 1400, 1515, 1600, 1656, 2871, 2933 and 2971 cm⁻¹.
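Once the transgene influence characteristic peaks are fixed, identifying a new sample only needs the spectrum values at those eight shifts. A minimal NumPy sketch follows, assuming each peak is read as the intensity at the nearest sampled Raman shift (integrating a band around each peak would be an alternative; the section does not specify). `extract_peak_features` and `TRANSGENE_PEAKS_CM1` are hypothetical names.

```python
import numpy as np

# Raman shifts (cm^-1) of the transgene influence characteristic peaks listed above
TRANSGENE_PEAKS_CM1 = np.array(
    [1100, 1400, 1515, 1600, 1656, 2871, 2933, 2971], dtype=float
)


def extract_peak_features(shift_cm1, spectrum, peaks=TRANSGENE_PEAKS_CM1):
    """Return the spectrum value at the sampled shift nearest to each peak.

    Assumption: nearest-sample lookup, not band integration.
    """
    shift_cm1 = np.asarray(shift_cm1, dtype=float)
    spectrum = np.asarray(spectrum, dtype=float)
    # (n_peaks, n_points) distance table -> index of the closest sample per peak
    idx = np.abs(shift_cm1[None, :] - peaks[:, None]).argmin(axis=1)
    return spectrum[idx]
```

For a spectrum sampled every 1 cm⁻¹ from 400 to 3199 cm⁻¹ (2800 points, an assumed grid), this reduces the input to the classifier to 8 values, which is the data reduction the method relies on.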
The technical solution provided by the invention achieves at least the following technical effects:
In the transgenic soybean oil identification method, a plurality of soybean oil samples are detected, an ultraviolet Raman spectrum set X' is formed from their ultraviolet Raman spectra, and the label information corresponding to each spectrum in X' is determined; the label information identifies the brand information and transgene information of the corresponding soybean oil sample. An influence model is trained with X' and the corresponding label information to determine a load matrix, the load matrix is used to determine the influence intensity of each characteristic peak in the ultraviolet Raman spectrum, and the S characteristic peaks with the highest influence intensity are taken as transgene influence characteristic peaks. To identify an unknown soybean oil sample, its complete ultraviolet Raman spectrum is acquired, the transgene influence characteristic peaks are extracted from it, and the label information of the sample is determined from those peaks, which yields the corresponding transgene information and thus identifies the sample. With this method, the transgene influence characteristic peaks can be extracted from the complete ultraviolet Raman spectrum and used to identify transgenic soybean oil, which reduces the volume of detection data and improves detection efficiency.
Additional features and advantages of the invention will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the embodiments of the invention without limiting the embodiments of the invention. In the drawings:
FIG. 1 is a flow chart of a method for identifying transgenic soybean oil provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of an ultraviolet Raman spectrum set X' in the method for identifying transgenic soybean oil provided by the embodiment of the invention;
FIG. 3 is a schematic diagram of a Raman shift-load coefficient curve in a transgenic soybean oil identification method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the prediction of an unknown sample in the method for identifying transgenic soybean oil according to the embodiment of the present invention;
FIG. 5 is a schematic diagram of unknown sample clustering in the transgenic soybean oil identification method provided by the embodiment of the invention.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating embodiments of the invention, are given by way of illustration and explanation only, not limitation.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
In the present invention, unless specified to the contrary, use of the terms of orientation such as "upper, lower, top, bottom" or the like are generally described with respect to the orientation shown in the drawings or the positional relationship of the components with respect to each other in the vertical, or gravitational direction.
The present invention is described in detail below through embodiments with reference to the accompanying drawings.
Referring to FIG. 1, an embodiment of the present invention provides a transgenic soybean oil identification method comprising the following steps: S101: forming an ultraviolet Raman spectrum set X' from the ultraviolet Raman spectra of a plurality of soybean oil samples;
Further, forming the ultraviolet Raman spectrum set X' from the ultraviolet Raman spectra of the plurality of soybean oil samples comprises: collecting a plurality of soybean oil samples and performing ultraviolet Raman detection on them to obtain an ultraviolet Raman spectrum set X; and applying, in sequence, polynomial-fitting smoothing, baseline correction and multiplicative scatter correction (MSC) to X to obtain the set X'.
Specifically, in this embodiment, a plurality of soybean oil samples are collected and measured with a self-developed ultraviolet Raman spectroscopy system to obtain the ultraviolet Raman spectrum set X. Spectra are collected with an Ocean Optics QE Pro spectrometer; the laser wavelength is 266 nm, the power is 30 mW, the pulse width is 5 ns, the resolution is 0.14-7.7 nm (FWHM), and 10 scans are acquired. X is then preprocessed by polynomial-fitting smoothing, baseline correction and multiplicative scatter correction, in that order, to obtain the set X'. This preprocessing reduces the influence of the detection environment and equipment on the ultraviolet Raman spectra and yields more accurate spectra.
Further, the polynomial-fitting smoothing of the ultraviolet Raman spectrum set X is performed with a Savitzky-Golay convolution smoothing algorithm.
Specifically, in this embodiment, the Savitzky-Golay convolution smoothing algorithm is a filtering method based on least-squares fitting. The ultraviolet Raman spectrum set X is a set of discrete data points. The algorithm chooses a fitting order P and performs a least-squares polynomial fit over 2M+1 consecutive data points of X (i.e., a moving window of width 2M+1), takes the value of the fitted curve at the window center as the filtered value, then moves the window and repeats the process until all data points in X have been processed. The processed data are denoted as the ultraviolet Raman spectrum set $X_1$.
The moving window has width 2M+1, and the data points in the window are denoted $S[n]$, $n = -M, \ldots, 0, \ldots, M$. The fitting polynomial in each window is:
$$f(n) = \sum_{k=0}^{P} a_k n^k \quad (1)$$
The fitting error (sum of squared residuals) is:
$$E = \sum_{n=-M}^{M} \left( f(n) - S[n] \right)^2 = \sum_{n=-M}^{M} \left( \sum_{k=0}^{P} a_k n^k - S[n] \right)^2 \quad (2)$$
For the best fit, E is minimized: the derivative of E with respect to each coefficient $a_r$ is set to 0, i.e.
$$\frac{\partial E}{\partial a_r} = 2 \sum_{n=-M}^{M} n^r \left( \sum_{k=0}^{P} a_k n^k - S[n] \right) = 0, \quad r = 0, 1, \ldots, P \quad (3)$$
that is,
$$\sum_{k=0}^{P} a_k \sum_{n=-M}^{M} n^{k+r} = \sum_{n=-M}^{M} n^r S[n] \quad (4)$$
Let
$$F_r = \sum_{n=-M}^{M} n^r S[n], \qquad G_{k+r} = \sum_{n=-M}^{M} n^{k+r} \quad (5)$$
Then equation (4) can be simplified as:
$$\sum_{k=0}^{P} a_k G_{k+r} = F_r, \quad r = 0, 1, \ldots, P \quad (6)$$
Given the window width, the polynomial order P and the data $S[n]$ to be fitted, $F_r$ and $G_{k+r}$ are computed from equation (5); substituting them into equation (6) yields the polynomial coefficients $a_k$ ($[a_0, a_1, \ldots, a_P]$), which determines the polynomial (1) within the window. As the window moves, the value of the fitted polynomial at the window center, $f(0) = a_0$, is taken as the filtered value, i.e. the output of the Savitzky-Golay convolution smoothing algorithm. Smoothing all data points in X in this way gives the ultraviolet Raman spectrum set $X_1$.
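The normal equations (5)-(6) can be checked numerically. The minimal NumPy sketch below (the helper name `sg_center_weights` is hypothetical) solves them in matrix form: because $a_0$ depends linearly on $S[n]$, the filtered value $f(0) = a_0$ is a fixed weighted sum of the window, which is why the method is a convolution. For M = 3 and P = 3 (window width 7, fitting order 3, the setting used in this embodiment) the weights should equal the classical 7-point coefficients (-2, 3, 6, 7, 6, 3, -2)/21, and SciPy's `savgol_coeffs` should return the same values.

```python
import numpy as np
from scipy.signal import savgol_coeffs, savgol_filter


def sg_center_weights(M: int, P: int) -> np.ndarray:
    """Weights h[n], n = -M..M, such that a_0 = sum_n h[n] * S[n]."""
    n = np.arange(-M, M + 1, dtype=float)
    V = np.vander(n, P + 1, increasing=True)  # V[n, k] = n**k
    G = V.T @ V                               # G[r, k] = sum_n n**(k + r) = G_{k+r}
    # F_r = (V.T @ S)[r], so eq. (6) gives a = G^{-1} V^T S;
    # row 0 of G^{-1} V^T maps the window data S to a_0 = f(0)
    return np.linalg.solve(G, V.T)[0]


h = sg_center_weights(M=3, P=3)  # window width 7, fitting order 3

# Smooth a stand-in noisy spectrum; h is symmetric, so convolution = correlation.
x = np.random.default_rng(0).normal(size=50)
interior = np.convolve(x, h, mode="valid")  # filtered values at points 3..46
scipy_interior = savgol_filter(x, window_length=7, polyorder=3)[3:-3]
```

In practice, `scipy.signal.savgol_filter(X, window_length=7, polyorder=3, axis=-1)` applies the same filter to every spectrum (one per row) and also handles the window edges, which the plain convolution above does not.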
Further, the Savitzky-Golay convolution smoothing algorithm uses a fitting order of 3 and a window width of 7.
Specifically, the Savitzky-Golay convolution smoothing algorithm has two important parameters: the fitting order P and the moving window width 2M+1. In general, a small window with a high fitting order leaves noise in the signal, whereas a large window with a low fitting order distorts the signal. The Raman spectrum of soybean oil can be regarded as a combination of several Voigt line shapes, noise and a fluorescence background. Based on these spectral characteristics (the overall line shape and the width and intensity of each characteristic peak), a fitting order of 3 and a window width of 7 were finally selected as optimal: this setting removes most of the noise without over-smoothing the low-intensity characteristic peaks, giving the best smoothing effect.
Further, baseline correction of the ultraviolet Raman spectrum set X is performed by adaptive iteratively reweighted penalized least squares (airPLS).
Specifically, in this embodiment, adaptive iteratively reweighted penalized least squares (airPLS) is used to correct the baseline of X. In an ideal ultraviolet Raman system, the signal intensity recorded by the detector when no valid signal is present should be 0 (the ideal baseline). In practice, however, factors such as electronic drift, dark current and readout noise in the system hardware, as well as the surface characteristics of the sample, cause the raw ultraviolet Raman spectra output by the system to contain a fluorescence background and noise, so baseline correction is required. airPLS uses an error-based iterative reweighting strategy in which the weight of each point is updated from the difference between the original signal and the baseline fitted in the previous iteration.
The spectral data after Savitzky-Golay smoothing are $X_1$. Let a spectrum be the vector x and the fitted vector be z, both of length m. The fidelity F of z to x can be expressed as the sum of their squared errors:
$$F = \sum_{i=1}^{m} (x_i - z_i)^2 \quad (7)$$
The roughness R of the fitted vector z is expressed as:
$$R = \sum_{i=2}^{m} (z_i - z_{i-1})^2 \quad (8)$$
To obtain an output spectrum that is smooth but not distorted, fidelity and smoothness must be balanced. The penalized least squares function Q is the sum of the fidelity and the roughness multiplied by a penalty coefficient λ:
$$Q = F + \lambda R = \|x - z\|^2 + \lambda \|Dz\|^2 \quad (9)$$
where D is the difference matrix, so that $Dz = \Delta z$. Adjusting λ in equation (9) balances fidelity against smoothness: the larger λ is, the smoother the fitted vector z, and an excessively large λ distorts it.
To minimize Q, the partial derivative of Q with respect to z is set to 0, which gives:
$$(I + \lambda D^{\top} D) z = x \quad (10)$$
After a weight vector w is introduced into the fidelity term, with the weights at the positions of the peak segments of x set to 0, the fidelity F of z to x in equation (7) becomes:
$$F = \sum_{i=1}^{m} w_i (x_i - z_i)^2 \quad (11)$$
Building on these formulas, an iterative scheme is introduced in which the weight of each point is updated from the difference between the original signal x and the baseline z fitted in the previous iteration; equation (9) then becomes, at iteration t:
$$Q^t = \sum_{i=1}^{m} w_i^t \left| x_i - z_i^t \right|^2 + \lambda \sum_{j=2}^{m} \left| z_j^t - z_{j-1}^t \right|^2 \quad (12)$$
The weight vector w is obtained adaptively by iteration. With the initial value $w^0 = 1$ and t denoting the iteration number, the weights at iteration t are:
$$w_i^t = \begin{cases} 0, & x_i \ge z_i^{t-1} \\ e^{t (x_i - z_i^{t-1}) / |d^t|}, & x_i < z_i^{t-1} \end{cases} \quad (13)$$
where the vector $d^t$ consists of the negative elements of the difference between x and $z^{t-1}$ at iteration t.
The iteration stops when the maximum number of iterations is reached or when the termination criterion is met:
$$|d^t| < 0.001 \times |x| \quad (14)$$
In airPLS, peak points are progressively eliminated (their weights become 0), so that the points finally retained in the weight vector w define the baseline Z, and the baseline-corrected spectral data are $X_2 = X_1 - Z$.
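A minimal SciPy sketch of the iteration described above follows. It assumes first-order differences for D (as in the roughness term R), reads the norms in the stopping rule as sums of absolute values, and uses illustrative defaults λ = 100 and 15 iterations; none of these values is given in this section, and `weighted_whittaker` and `airpls` are hypothetical names. Each pass solves the weighted form of equation (10), $(W + \lambda D^\top D) z = W x$ with $W = \mathrm{diag}(w)$, then zeroes the weights of points lying on or above the fitted baseline.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve


def weighted_whittaker(x, w, lam):
    """Solve (W + lam * D'D) z = W x with first-order differences D."""
    m = x.size
    D = sparse.diags([-1.0, 1.0], [0, 1], shape=(m - 1, m))  # (Dz)_i = z_{i+1} - z_i
    A = sparse.csc_matrix(sparse.diags(w) + lam * (D.T @ D))
    return spsolve(A, w * x)


def airpls(x, lam=100.0, max_iter=15):
    """Return the fitted baseline z of spectrum x (lam, max_iter are assumed values)."""
    x = np.asarray(x, dtype=float)
    w = np.ones_like(x)                       # initial weights w^0 = 1
    z = x
    for t in range(1, max_iter + 1):
        z = weighted_whittaker(x, w, lam)
        d = x - z
        neg = d < 0                           # points below the fitted baseline
        d_norm = np.abs(d[neg]).sum()         # |d^t|: sum of |negative residuals|
        if d_norm < 0.001 * np.abs(x).sum():  # termination criterion (14)
            break
        w = np.zeros_like(x)                  # peak points (x >= z) get weight 0
        w[neg] = np.exp(t * d[neg] / d_norm)  # weight update for points below z
    return z
```

The corrected spectrum is then `x - airpls(x, lam=...)`, i.e. $X_2 = X_1 - Z$ row by row. Because λ trades fidelity against smoothness, it generally has to be tuned to the peak widths of the data; the check below uses a large λ for a narrow synthetic peak on a flat baseline.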
Further, because of unevenness or specular reflection at the sample surface, the acquired spectra may contain some offset and noise, which reduces the repeatability of spectra from the same sample. Multiplicative scatter correction (MSC) corrects baseline shift and offset in the spectral data by univariate linear regression, which improves spectral repeatability and enhances the Raman scattering information related to the composition and structure of the material.
The input spectral data are $X_2$. The collected samples comprise s types; the group of type i is denoted $X_{2i}$ ($i = 1, 2, \ldots, s$), and $N_i$ spectra are collected for type i, the j-th of which is denoted $X_{2i,j}$. Multiplicative scatter correction of the spectral data of one sample group proceeds as follows.
First, the mean of all spectra in the group is computed:
$$\bar{X}_{2i} = \frac{1}{N_i} \sum_{j=1}^{N_i} X_{2i,j} \quad (15)$$
Each spectrum is then regressed on the mean spectrum by univariate linear regression, and the least-squares problem is solved for the baseline offset $k_j$ and baseline translation $b_j$ of each spectrum:
$$X_{2i,j} = k_j \bar{X}_{2i} + b_j \quad (16)$$
Each spectrum is corrected by subtracting the baseline translation $b_j$ obtained from equation (16) and dividing by the baseline offset $k_j$, which gives the corrected spectrum:
$$\mathrm{Data}_{j(\mathrm{MSC})} = \frac{X_{2i,j} - b_j}{k_j} \quad (17)$$
After multiplicative scatter correction has been applied to every sample type, the preprocessed spectral data are denoted X', where X' comprises $[X_{21(\mathrm{MSC})}, X_{22(\mathrm{MSC})}, \ldots, X_{2s(\mathrm{MSC})}]$.
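The MSC steps above (mean spectrum, regression of each spectrum on it, then subtracting $b_j$ and dividing by $k_j$) fit in a few lines of NumPy. This is a minimal sketch assuming one spectrum per row for each sample type; `np.polyfit` with `deg=1` returns the slope $k_j$ and intercept $b_j$ of the least-squares line, and `msc` is a hypothetical helper name.

```python
import numpy as np


def msc(spectra):
    """Multiplicative scatter correction of one sample group (one spectrum per row)."""
    spectra = np.asarray(spectra, dtype=float)
    mean_spectrum = spectra.mean(axis=0)          # average spectrum of the group
    corrected = np.empty_like(spectra)
    for j, x in enumerate(spectra):
        # least-squares line x ~ k_j * mean_spectrum + b_j; polyfit returns [slope, intercept]
        k_j, b_j = np.polyfit(mean_spectrum, x, deg=1)
        corrected[j] = (x - b_j) / k_j            # subtract translation, divide by offset
    return corrected
```

Because the correction is done per sample type, the set X' would be assembled as, for example, `np.vstack([msc(group) for group in groups])`, where `groups` is a hypothetical list of per-type arrays. If the spectra of a group differ from one another only by a scale factor and an offset, MSC maps all of them onto the group mean, which is the repeatability improvement described above.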
S102: determining label information corresponding to each ultraviolet Raman spectrum in the ultraviolet Raman spectrum set X', wherein the label information corresponding to the ultraviolet Raman spectrum is used for identifying brand information and transgenic information of the soybean oil sample corresponding to the ultraviolet Raman spectrum;
Specifically, in this embodiment, label information is assigned to each ultraviolet Raman spectrum in X' according to the brand information and transgene information of the corresponding soybean oil sample, for example represented by 1, 2, 3, … or a, b, c, ….
S103: training the influence model with the ultraviolet Raman spectrum set X' and the label information corresponding to each of its ultraviolet Raman spectra to determine a load matrix;
Further, training the influence model with the set X' and the label information corresponding to each of its ultraviolet Raman spectra to determine the load matrix comprises: dividing the set X' and the corresponding label information into a training set and a validation set according to a set ratio, training the influence model with the training set to determine the load matrix, and validating the influence model with the validation set. Training the influence model with the training set to determine the load matrix comprises: determining a spectral matrix $E_0$ from the ultraviolet Raman spectra of X' in the training set; determining a label matrix $F_0$ from the label information corresponding to each ultraviolet Raman spectrum of X' in the training set; and determining a load matrix $L_0$ from $E_0$ and $F_0$.
Specifically, in this embodiment, the set X' and the corresponding label information are divided into a training set and a validation set according to a set ratio; the influence model is trained with the training set to determine the load matrix and validated with the validation set. The set ratio preferably ranges from 1:1 to 4:1, and is more preferably 4:1.
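A minimal sketch of the random split at a set ratio, assuming the sample indices are shuffled with a fixed seed for reproducibility; `split_by_ratio` is a hypothetical helper, and the section does not say whether the split is stratified by label.

```python
import numpy as np


def split_by_ratio(n_samples: int, ratio: float = 4.0, seed: int = 0):
    """Shuffle indices 0..n_samples-1 and split them training:validation = ratio:1."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(n_samples * ratio / (ratio + 1.0)))
    return order[:n_train], order[n_train:]
```

With the preferred 4:1 ratio and 100 spectra, this yields 80 training and 20 validation indices. Splitting each brand/transgene class separately with the same function would be one way to keep every label present in both sets.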
Training the influence model with X' and the corresponding label information to determine the load matrix proceeds as follows.
A spectral matrix $E_0$ is determined from the ultraviolet Raman spectra of X' in the training set, and a label matrix $F_0$ from the corresponding label information. $E_0$ is the independent-variable matrix and $F_0$ the dependent-variable matrix. With n samples in the training set, m variables in $E_0$ and p variables in $F_0$, the independent- and dependent-variable matrices are $E_0$ ($n \times m$) and $F_0$ ($n \times p$).
A component is extracted from each of $E_0$ and $F_0$, namely $t_1$ and $u_1$, as the first pair of principal components (also called score vectors). $t_1$ is a linear combination of the variables of $E_0$ and $u_1$ a linear combination of the variables of $F_0$, with weight coefficients $w_1$ and $c_1$, i.e. $t_1 = E_0 w_1$ and $u_1 = F_0 c_1$. Two requirements apply: 1) $t_1$ and $u_1$ should each carry as much of the variation in their respective data matrices as possible, i.e. their variances should be maximized; 2) $t_1$ should have maximal explanatory power for $u_1$, i.e. their correlation should be maximized. Together, these require the covariance of $t_1$ and $u_1$ to be maximized, i.e.
$$\operatorname{Cov}(t_1,u_1)\rightarrow\max \tag{18}$$
where w₁ and c₁ are both unit vectors, i.e.
$$\lVert w_1\rVert = 1,\quad \lVert c_1\rVert = 1 \tag{19}$$
Solving this constrained extremum problem with the Lagrange multiplier method gives two results. w₁ is the eigenvector of the matrix
$$E_0^{\mathsf T}F_0F_0^{\mathsf T}E_0$$
corresponding to its largest eigenvalue. c₁ is the eigenvector of the matrix
$$F_0^{\mathsf T}E_0E_0^{\mathsf T}F_0$$
corresponding to its largest eigenvalue.
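t₁ᵀu₁ = w₁ᵀE₀ᵀF₀c₁, so the maximal value of this cross-product equals the largest singular value of E₀ᵀF₀. The sketch below computes the first weight and score pair from the eigen-solution just stated. It assumes E₀ and F₀ are already column-centred NumPy arrays; the function name is hypothetical. c₁ is obtained by normalizing F₀ᵀE₀w₁. This vector is an eigenvector of F₀ᵀE₀E₀ᵀF₀ for the same eigenvalue, and the construction avoids an arbitrary sign flip between the two eigen-solutions.

```python
import numpy as np


def first_pls_pair(E0, F0):
    """First pair of PLS weight/score vectors via the eigen-solution above.

    w1: eigenvector of E0^T F0 F0^T E0 with the largest eigenvalue.
    c1: F0^T E0 w1 normalised, which is an eigenvector of
        F0^T E0 E0^T F0 for the same eigenvalue.
    """
    M = E0.T @ F0 @ F0.T @ E0            # m x m symmetric positive semi-definite
    eigvals, eigvecs = np.linalg.eigh(M)  # eigh returns eigenvalues in ascending order
    w1 = eigvecs[:, -1]                   # unit vector, ||w1|| = 1
    v = F0.T @ E0 @ w1
    c1 = v / np.linalg.norm(v)            # unit vector, ||c1|| = 1
    t1 = E0 @ w1                          # score vector of the spectra
    u1 = F0 @ c1                          # score vector of the labels
    return w1, c1, t1, u1, eigvals[-1]
```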
Eigen-decomposition of the covariance matrix associated with Eq. (18) yields its eigenvectors [η₁, η₂, …, ηₘ] and eigenvalues [λ₁, λ₂, …, λₘ].
The loading matrix L₀ is then determined from these eigenvectors and eigenvalues:
[Formula for L₀: image not reproduced in the source text]
The score vectors express the weight of each principal component. Each score vector is the projection of the independent-variable matrix onto the direction of its corresponding loading vector. It reflects how much of the variation in the independent variables lies along that direction.
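One way to read the eigen-decomposition and the projection statement above is the PCA-style sketch below. Here the eigenvectors ηₖ of the covariance matrix of the centred spectra give the loading directions, and the scores are projections of E₀ onto them. The source text does not reproduce the formula for L₀, so the ηₖ√λₖ scaling used here is an assumption (a common convention). The function name is also hypothetical.

```python
import numpy as np


def scores_and_loadings(E0):
    """Eigen-decompose the covariance of column-centred E0 (n x m).

    Returns scores (n x m), loadings (m x m) and eigenvalues (descending).
    Loadings are scaled as eta_k * sqrt(lambda_k): an assumed convention.
    """
    n = E0.shape[0]
    cov = E0.T @ E0 / (n - 1)                 # m x m covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]         # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = E0 @ eigvecs                     # projection onto each loading direction
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return scores, loadings, eigvals
```

The variance of each score column equals its eigenvalue. This is one concrete sense in which a score vector measures how much spectral variation lies along its loading direction.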
Further, ten-fold cross-validation is used to prevent the model from overfitting:
1) The training-set samples are randomly divided into 10 groups.
2) Each group serves once as the validation subset, while the remaining 9 groups form the training subset, and an accuracy is recorded for each run.
3) The process is repeated 10 times with different validation data each time, so every group is validated exactly once.
After the 10 runs, the mean of the 10 accuracies is taken as the estimate of the algorithm's accuracy.
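The procedure above can be sketched generically. The model-fitting and prediction steps are passed in as callables. `fit` and `predict` are hypothetical placeholders, not functions defined in the patent; the seed is also an illustrative choice.

```python
import numpy as np


def k_fold_accuracy(X, y, fit, predict, k=10, seed=0):
    """k-fold cross-validation: each fold is the validation subset exactly once.

    `fit(X, y)` returns a model and `predict(model, X)` returns labels; both are
    caller-supplied placeholders, not functions defined by the patent.
    """
    X, y = np.asarray(X), np.asarray(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    accuracies = []
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train_idx], y[train_idx])
        accuracies.append(np.mean(predict(model, X[val_idx]) == y[val_idx]))
    return float(np.mean(accuracies)), folds   # mean accuracy over the k runs
```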
S104: determine the influence intensity of each characteristic peak in the UV Raman spectrum using the loading matrix. The influence intensity of a characteristic peak is positively correlated with the accuracy with which that peak determines the label information;
Further, determining the influence intensity of each characteristic peak with the loading matrix includes two steps. First, a Raman shift-loading coefficient curve is determined from the loading matrix L₀ and the spectral matrix E₀. Second, the influence intensity of each characteristic peak in the UV Raman spectrum is determined from the peak and trough values of that curve.
Specifically, in this embodiment, the elements of the loading matrix L₀ correspond one-to-one with the variables of the spectral matrix E₀ (which carry the Raman shifts and their spectral intensities), and therefore one-to-one with the Raman shifts. This correspondence yields a Raman shift-loading coefficient curve, from which the most representative transgene-influence characteristic peaks (those that distinguish transgenic from non-transgenic soybean oil) can be read. The horizontal axis gives the Raman shift of each transgene-influence characteristic peak, and the absolute value of each peak or trough gives that peak's influence intensity.
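Reading peaks and troughs off the loading curve amounts to finding its local extrema and taking their absolute values. The following is a minimal sketch. It assumes the curve is sampled on a Raman-shift grid, and it ignores endpoints and flat plateaus (simplifying assumptions). The function name is hypothetical, and the synthetic curve in the test is made up.

```python
import numpy as np


def peak_influence(raman_shift, loading):
    """Locate local peaks and troughs of a Raman shift-loading coefficient curve.

    Returns the Raman shift of each extremum and its influence intensity,
    taken as the absolute loading value there. Endpoints are ignored and
    plateaus are not treated as extrema (simplifying assumptions).
    """
    shift = np.asarray(raman_shift, dtype=float)
    load = np.asarray(loading, dtype=float)
    mid, left, right = load[1:-1], load[:-2], load[2:]
    is_peak = (mid > left) & (mid > right)
    is_trough = (mid < left) & (mid < right)
    idx = np.flatnonzero(is_peak | is_trough) + 1   # +1 restores full-array indices
    return shift[idx], np.abs(load[idx])
```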
S105: sort the characteristic peaks of each UV Raman spectrum in descending order of influence intensity, and take the first S peaks as the transgene-influence characteristic peaks;
Specifically, in this embodiment, once the characteristic peaks of each UV Raman spectrum are sorted in descending order of influence intensity, the peaks with the highest influence intensities are taken as the transgene-influence characteristic peaks.
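The ranking step reduces to a descending sort and a cut at S. In this sketch S is a free parameter, the function name is hypothetical, and the values in the test are made up for illustration.

```python
import numpy as np


def top_s_peaks(shifts, intensities, s):
    """Sort peaks by influence intensity (largest first) and keep the first S."""
    shifts = np.asarray(shifts, dtype=float)
    intensities = np.asarray(intensities, dtype=float)
    order = np.argsort(-intensities, kind="stable")[:s]   # descending; ties keep input order
    return shifts[order], intensities[order]
```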
Further, the Raman shifts of the transgene-influence characteristic peaks are 1100, 1400, 1515, 1600, 1656, 2871, 2933 and 2971 cm⁻¹.
Specifically, in this embodiment, the UV Raman spectrum measured for an unknown sample may drift. For identification, therefore, the characteristic peaks in the vicinity of 1100, 1400, 1515, 1600, 1656, 2871, 2933 and 2971 cm⁻¹ are extracted from the spectrum.
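Extracting peaks "in the vicinity of" the nominal shifts can be sketched as a windowed maximum around each position. The eight positions come from the text. The ±10 cm⁻¹ tolerance and the function name are hypothetical; the patent gives no window width.

```python
import numpy as np

# Nominal transgene-influence peak positions from the text (cm^-1).
NOMINAL_SHIFTS = (1100.0, 1400.0, 1515.0, 1600.0, 1656.0, 2871.0, 2933.0, 2971.0)


def extract_peak_features(raman_shift, spectrum, centres=NOMINAL_SHIFTS, tolerance=10.0):
    """Take the maximum intensity within +/- tolerance cm^-1 of each nominal shift.

    The search window absorbs small spectral drift; the 10 cm^-1 default is a
    hypothetical value, not one given in the patent.
    """
    shift = np.asarray(raman_shift, dtype=float)
    spec = np.asarray(spectrum, dtype=float)
    features = []
    for centre in centres:
        window = np.abs(shift - centre) <= tolerance
        if not window.any():
            raise ValueError(f"no data within {tolerance} cm^-1 of {centre}")
        features.append(spec[window].max())
    return np.array(features)
```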
S106: extract the transgene-influence characteristic peaks from the UV Raman spectrum of a soybean oil sample, and determine the label information of the sample from those peaks.
Specifically, in this embodiment, an unknown soybean oil sample is identified in three steps: its complete UV Raman spectrum is collected, the transgene-influence characteristic peaks are extracted from that spectrum, and the sample's label information is determined from those peaks.
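The text does not name the classifier that maps the extracted peaks to label information. As a stand-in, the sketch below uses a nearest-centroid rule on the peak-intensity feature vectors. This is a deliberately simple swapped-in technique, not the patent's model. The function names and labels are hypothetical.

```python
import numpy as np


def fit_centroids(features, labels):
    """Mean feature vector per label (a stand-in classifier, not the patent's model)."""
    features, labels = np.asarray(features, dtype=float), np.asarray(labels)
    classes = np.unique(labels)
    centroids = np.array([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict_labels(features, classes, centroids):
    """Assign each sample the label of the nearest centroid (Euclidean distance)."""
    features = np.atleast_2d(np.asarray(features, dtype=float))
    dist = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dist, axis=1)]
```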
The method identifies transgenic soybean oil from the transgene-influence characteristic peaks extracted from the complete UV Raman spectrum rather than from the full spectrum. This reduces the amount of data to be processed and improves detection efficiency.
Example one
In this example there were five oil samples in total, including brand A transgenic soybean oil, brand A non-transgenic soybean oil, brand B non-transgenic soybean oil and brand C rice oil, with no visible difference in appearance. The samples were kept at room temperature during the experiment. For each test, 2 ml of sample was placed in a quartz cuvette (transparent in the solar-blind ultraviolet band) measuring 12.5 × 40 mm with a capacity of 3.5 ml, positioned horizontally in the sample collection area.

The UV Raman signals were collected with an Ocean Optics QE Pro spectrometer, and each recorded spectrum was the average of 10 scans. Each brand A and brand B soybean oil sample was measured 100 times per collection session over 5 sessions. The brand C sample was measured 20 times per session over 5 sessions. This gave 2100 spectra in total. To improve robustness, acquisitions of one sample type were interleaved with acquisitions of other types, i.e. the same sample was not measured continuously. This produced the UV Raman spectrum set X.

Polynomial-fitting smoothing, airPLS baseline correction and multiplicative scatter correction (MSC) were then applied to X to obtain the UV Raman spectrum set X′. Fig. 2 shows six preprocessed UV Raman spectra of brand A for ease of viewing.
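The three preprocessing steps can be sketched with NumPy alone, in the order given in claim 6:
- Savitzky-Golay smoothing with window 7 and polynomial order 3 (the parameters from claim 8).
- airPLS baseline removal.
- MSC against the mean spectrum.

The following choices are illustrative assumptions, not values from the patent: the airPLS parameters (λ = 100, a second-order difference penalty, at most 15 iterations), the edge padding used for smoothing, and all function names. SciPy's `scipy.signal.savgol_filter` provides a library smoother with its own edge-handling modes.

```python
import numpy as np


def savgol_coefficients(window=7, polyorder=3):
    """Savitzky-Golay smoothing weights: least-squares polynomial value at the window centre."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    vander = np.vander(offsets, polyorder + 1, increasing=True)  # columns z**0 .. z**polyorder
    return np.linalg.pinv(vander)[0]  # row 0 evaluates the fitted polynomial at z = 0


def savgol_smooth(spectrum, window=7, polyorder=3):
    """Smooth one spectrum; ends are padded by repeating edge values (an assumption)."""
    coeffs = savgol_coefficients(window, polyorder)
    padded = np.pad(np.asarray(spectrum, dtype=float), window // 2, mode="edge")
    return np.convolve(padded, coeffs[::-1], mode="valid")  # reversed kernel = correlation


def airpls_baseline(spectrum, lam=100.0, differences=2, itermax=15):
    """Baseline by adaptive iteratively reweighted penalized least squares (dense sketch)."""
    x = np.asarray(spectrum, dtype=float)
    D = np.diff(np.eye(x.size), n=differences, axis=0)  # finite-difference operator
    penalty = lam * D.T @ D                             # Whittaker smoothness term
    w = np.ones(x.size)
    for i in range(1, itermax + 1):
        z = np.linalg.solve(np.diag(w) + penalty, w * x)  # weighted smooth fit
        d = x - z
        dssn = np.abs(d[d < 0].sum())
        if dssn < 0.001 * np.abs(x).sum() or i == itermax:
            break
        w[d >= 0] = 0.0  # points above the fit are treated as peaks
        w[d < 0] = np.exp(i * np.abs(d[d < 0]) / dssn)
        w[0] = w[-1] = np.exp(i * d[d < 0].max() / dssn)
    return z


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum."""
    spectra = np.asarray(spectra, dtype=float)
    reference = spectra.mean(axis=0)
    corrected = np.empty_like(spectra)
    for k, row in enumerate(spectra):
        slope, intercept = np.polyfit(reference, row, 1)  # row ~ intercept + slope * reference
        corrected[k] = (row - intercept) / slope
    return corrected


def preprocess(X):
    """X -> X': smoothing, baseline correction, then MSC (the order given in claim 6)."""
    smoothed = np.array([savgol_smooth(s) for s in X])
    flattened = np.array([s - airpls_baseline(s) for s in smoothed])
    return msc(flattened)
```

The dense linear solve keeps the sketch short. For spectra with thousands of points, a sparse solver would be the usual choice.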
The label information corresponding to each UV Raman spectrum in X′ was then determined; see Table 1.
Table 1 Sample label information
[Table 1 image not reproduced in the source text]
The loading matrix L₀ was determined from the UV Raman spectrum set X′ and the label information of each spectrum in X′. The elements of L₀ correspond one-to-one with the variables of the spectral matrix E₀ (the Raman shifts and their spectral intensities), i.e. with the Raman shifts. This correspondence gives the Raman shift-loading coefficient curve shown in Fig. 3.
From this curve, the most representative transgene-influence characteristic peaks (those that distinguish transgenic from non-transgenic soybean oil) can be identified. The horizontal axis gives the Raman shift of each peak, and the absolute values of the peaks and troughs give their influence intensities, as listed in Table 2.
Table 2 Raman shifts and influence intensities of the transgene-influence characteristic peaks
[Table 2 image not reproduced in the source text]
Table 3 lists the chemical bonds assigned to the transgene-influence characteristic peaks.
Table 3 Raman shifts and assigned chemical bonds
Raman shift (cm⁻¹)    Assigned chemical bond
1100                  O-P-O of phosphate group (protein)
1400                  CH₃ (methyl)
1515                  Cytosine
1600                  Amide band
1656                  C=C (lipids) and amide I band
2871-2971             CH₂ of lipids
2933                  CH₂ asymmetric stretching
To identify an unknown soybean oil sample, its complete UV Raman spectrum is collected, the transgene-influence characteristic peaks are extracted from it, and the sample's label information is determined from these peaks. Fig. 4 shows the predictions for the unknown samples obtained with the method of this embodiment, and Fig. 5 shows the distribution of the unknown samples.

Figs. 4 and 5 together show that brand C rice oil differs markedly from the other samples, and only one of its samples received a wrong predicted label. The UV Raman spectra of the different soybean oils are similar, so their data distributions overlap. This overlap introduces some prediction error, although most of the data are still predicted correctly. The calculated identification accuracy for the unknown samples is 70.95%.
The preferred embodiments of the present invention have been described in detail above with reference to the accompanying drawings. However, the invention is not limited to the specific details of these embodiments. Various simple modifications can be made to its technical solution within its technical concept, and all such modifications fall within its scope of protection.
It should be noted that the technical features described in the above embodiments can be combined in any suitable manner where no contradiction arises. To avoid unnecessary repetition, the possible combinations are not described separately.
In addition, the various embodiments of the present invention may be combined in any way. Such combinations should likewise be regarded as disclosed by the invention, provided they do not depart from its spirit.

Claims (10)

1. A transgenic soybean oil identification method is characterized by comprising the following steps:
forming an ultraviolet Raman spectrum set X' according to the ultraviolet Raman spectra of the plurality of soybean oil samples;
determining label information corresponding to each ultraviolet Raman spectrum in the ultraviolet Raman spectrum set X', wherein the label information corresponding to the ultraviolet Raman spectrum is used for identifying brand information and transgenic information of the soybean oil sample corresponding to the ultraviolet Raman spectrum;
training an influence model with the ultraviolet Raman spectrum set X′ and the label information corresponding to each ultraviolet Raman spectrum in X′ to determine a loading matrix;
determining the influence intensity of each characteristic peak in the ultraviolet Raman spectrum by using the load matrix; wherein the influence strength of the characteristic peak is positively correlated with the accuracy of determining the label information by using the characteristic peak;
after the characteristic peaks of each ultraviolet Raman spectrum are arranged from large to small according to the influence intensity, determining the first S characteristic peaks as transgene influence characteristic peaks;
and extracting a transgenic influence characteristic peak from the ultraviolet Raman spectrum of the soybean oil sample, and determining the label information of the soybean oil sample according to the transgenic influence characteristic peak.
2. The method for identifying transgenic soybean oil according to claim 1, wherein training the influence model with the ultraviolet Raman spectrum set X′ and the label information corresponding to each ultraviolet Raman spectrum in X′ to determine the loading matrix comprises:
dividing the ultraviolet Raman spectrum set X′ and the label information corresponding to each of its spectra into a training set and a validation set according to a set ratio, training the influence model with the training set to determine the loading matrix, and validating the influence model with the validation set;
wherein training the influence model with the training set to determine the loading matrix comprises:
determining a spectral matrix E₀ from the ultraviolet Raman spectrum set X′ in the training set, and determining a label matrix F₀ from the label information corresponding to each ultraviolet Raman spectrum in X′ in the training set;
determining a loading matrix L₀ using the spectral matrix E₀ and the label matrix F₀.
3. The method of claim 2, wherein determining the loading matrix L₀ using the spectral matrix E₀ and the label matrix F₀ comprises:
$$t_1 = E_0 w_1$$
$$u_1 = F_0 c_1$$
wherein t₁ is a linear combination of the columns of the spectral matrix E₀, u₁ is a linear combination of the columns of the label matrix F₀, and w₁ and c₁ are weight coefficients;
maximizing the covariance of t₁ and u₁ to obtain a covariance matrix:
$$\operatorname{Cov}(t_1,u_1)\rightarrow\max$$
$$\lVert w_1\rVert = 1,\quad \lVert c_1\rVert = 1$$
determining eigenvectors [η₁, η₂, …, ηₘ] and eigenvalues [λ₁, λ₂, …, λₘ] of the covariance matrix;
determining the loading matrix L₀ from the eigenvectors and eigenvalues of the covariance matrix:
[Formula for L₀: image not reproduced in the source text]
4. The method of claim 3, wherein determining the intensity of the effect of each characteristic peak in the UV Raman spectrum using the loading matrix comprises:
determining a Raman shift-loading coefficient curve using the loading matrix L₀ and the spectral matrix E₀;
determining the influence intensity of each characteristic peak in the ultraviolet Raman spectrum from the peak and trough values of the Raman shift-loading coefficient curve.
5. The method of claim 2, wherein the set ratio is between 1:1 and 4:1.
6. The method of claim 1, wherein forming a set of uv-raman spectra X' from the uv-raman spectra of the plurality of soybean oil samples comprises:
collecting a plurality of soybean oil samples, and carrying out ultraviolet Raman spectrum detection on the plurality of soybean oil samples to obtain an ultraviolet Raman spectrum set X;
sequentially performing polynomial-fitting smoothing, baseline correction and multiplicative scatter correction on the ultraviolet Raman spectrum set X to obtain the ultraviolet Raman spectrum set X′.
7. The method for identifying transgenic soybean oil according to claim 6, wherein the polynomial-fitting smoothing of the ultraviolet Raman spectrum set X uses the Savitzky-Golay convolution smoothing algorithm.
8. The method of claim 7, wherein the Savitzky-Golay convolution smoothing algorithm uses a third-order polynomial fit with a window width of 7.
9. The method for identifying transgenic soybean oil according to claim 6, wherein baseline correction of the ultraviolet Raman spectrum set X is performed by adaptive iteratively reweighted penalized least squares (airPLS).
10. The method of claim 1, wherein the Raman shifts of the transgene influence characteristic peaks are 1100, 1400, 1515, 1600, 1656, 2871, 2933 and 2971 cm⁻¹.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111370792.8A CN114113035B (en) 2021-11-18 2021-11-18 Identification method of transgenic soybean oil

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111370792.8A CN114113035B (en) 2021-11-18 2021-11-18 Identification method of transgenic soybean oil

Publications (2)

Publication Number Publication Date
CN114113035A true CN114113035A (en) 2022-03-01
CN114113035B CN114113035B (en) 2024-02-02

Family

ID=80397898

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111370792.8A Active CN114113035B (en) 2021-11-18 2021-11-18 Identification method of transgenic soybean oil

Country Status (1)

Country Link
CN (1) CN114113035B (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4847198A (en) * 1987-10-07 1989-07-11 The Board Of Governors For Higher Education, State Of Rhode Island And Providence Plantations Detection and indentification of bacteria by means of ultra-violet excited resonance Raman spectra
CN104614336A (en) * 2015-03-08 2015-05-13 王利兵 Infrared spectral feature based chemical rapid discrimination method and device
CN106546553A (en) * 2016-10-31 2017-03-29 浙江大学 A kind of quick nondestructive discrimination method of genetically engineered soybean oil
CN108362659A (en) * 2018-02-07 2018-08-03 武汉轻工大学 Edible oil type method for quick identification based on multi-source optical spectrum parallel connection fusion
CN108802002A (en) * 2018-05-08 2018-11-13 华南农业大学 A kind of quick nondestructive differentiates the silkworm seed Raman spectrum model building method of termination of diapause
CN109001181A (en) * 2018-08-24 2018-12-14 武汉轻工大学 A kind of edible oil type method for quick identification of Raman spectrum canonical correlation analysis fusion
CN109409350A (en) * 2018-10-23 2019-03-01 桂林理工大学 A kind of Wavelength selecting method based on PCA modeling reaction type load weighting
CN109993155A (en) * 2019-04-23 2019-07-09 北京理工大学 For the characteristic peak extracting method of low signal-to-noise ratio uv raman spectroscopy
CN110032988A (en) * 2019-04-23 2019-07-19 北京理工大学 Uv raman spectroscopy system real-time noise-reducing Enhancement Method
CN110672582A (en) * 2019-10-08 2020-01-10 浙江大学 Raman characteristic spectrum peak extraction method based on improved principal component analysis
CN110715917A (en) * 2019-10-08 2020-01-21 浙江大学 Pork and beef classification method based on Raman spectrum
CN112730373A (en) * 2020-12-03 2021-04-30 北京信息科技大学 Raman spectrum data set analysis method for deep learning training
CN112924412A (en) * 2021-01-22 2021-06-08 中国科学院合肥物质科学研究院 Single-grain rice variety authenticity distinguishing method and device based on near infrared spectrum
CN113191618A (en) * 2021-04-25 2021-07-30 南京财经大学 Millet producing area tracing method based on mid-infrared spectrum technology and feature extraction
CN113567417A (en) * 2021-07-23 2021-10-29 青岛农业大学 Method for identifying peanut oil production place based on Raman spectrum fingerprint analysis technology

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
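He Yuqing et al.: "Research progress on remote ultraviolet Raman spectroscopy detection technology", Chinese Optics, vol. 12, no. 06 *
Zhu Jianguo et al.: "Study on rapid identification of transgenic oil based on near-infrared spectroscopy", Optical Instruments, vol. 42, no. 4, pages 62 - 65 *
Zhu Wenchao: "Research on rapid non-destructive spectroscopic detection methods for transgenic rice", China Master's Theses Full-text Database, Agricultural Science and Technology, no. 07, pages 27 - 49 *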

Also Published As

Publication number Publication date
CN114113035B (en) 2024-02-02


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant