CN114113035A

CN114113035A - Transgenic soybean oil identification method

Info

Publication number: CN114113035A
Application number: CN202111370792.8A
Authority: CN
Inventors: 金伟其; 郭宗昱; 郭一新; 裘溯; 何玉青
Original assignee: Beijing Institute of Technology BIT
Current assignee: Beijing Institute of Technology BIT
Priority date: 2021-11-18
Filing date: 2021-11-18
Publication date: 2022-03-01
Anticipated expiration: 2041-11-18
Also published as: CN114113035B

Abstract

The invention provides a transgenic soybean oil identification method comprising the following steps: forming an ultraviolet Raman spectrum set X' from the ultraviolet Raman spectra of a plurality of soybean oil samples; determining the label information corresponding to each ultraviolet Raman spectrum in the set X'; training an influence model with the set X' and the label information to determine a load matrix; determining the influence intensity of each characteristic peak in the ultraviolet Raman spectrum from the load matrix; sorting the characteristic peaks of each ultraviolet Raman spectrum in descending order of influence intensity and determining the first S characteristic peaks as transgenic influence characteristic peaks; and extracting the transgenic influence characteristic peaks from the ultraviolet Raman spectrum of a soybean oil sample and determining the label information of the sample from them. With this method, the transgenic influence characteristic peaks can be extracted from the complete ultraviolet Raman spectrum and used to identify transgenic soybean oil, which reduces the volume of detection data and improves detection efficiency.

Description

Transgenic soybean oil identification method

Technical Field

The invention relates to the technical field of soybean oil detection, in particular to a transgenic soybean oil identification method.

Background

A transgenic crop is a crop with specific target traits, obtained by using recombinant DNA technology to introduce a cloned exogenous gene into crop tissue and express it. According to the statistical report of the International Service for the Acquisition of Agri-biotech Applications (ISAAA), the global planting area of transgenic crops grew to 191.7 million hectares between 1996 and 2018, of which developing and developed countries account for 103.1 million and 88.6 million hectares respectively; transgenic soybeans have the highest global application rate, accounting for 50 percent of the global transgenic crop area. Although transgenic technology can increase crop yield, improve crop quality and improve traits such as drought and cold resistance, transgenic crops may also pose potential threats to the ecological environment (such as the soil ecosystem and biogeochemical cycles) and may even seriously affect biological populations, so the environmental safety evaluation of transgenic crops has always been a concern. China is a major soybean consumer and importer: as of 2019 the amount of imported soybeans reached 8.34 hundred million tons and consumption was about 10 hundred million tons, most of it transgenic soybeans. In 2020, the Ministry of Agriculture and Rural Affairs issued an approval list of agricultural transgenic organism safety certificates covering three herbicide-tolerant transgenic soybeans. To prevent the misuse of transgenic soybeans in food production and to resolve unclear food labelling and the mixing of genuine and mislabelled products, detection of transgenic soybean components in food is urgently needed.

Raman spectroscopy is a nondestructive, non-contact light-scattering analysis method. The position, intensity and shape of a spectral peak accurately reflect structural information about the substance or mixture, so the method is commonly used to identify substances and analyze their components. Raman detection requires no sample pretreatment, generates no chemical pollutants, and is fast, accurate, simple, efficient and highly repeatable. However, because soybean oil contains a large number of carbon-carbon double bonds (linear or cyclic unsaturated molecules with extensive π-bond conjugation), a strong fluorescence background is generated that greatly interferes with detection of the Raman spectrum.

Compared with common visible and near-infrared Raman spectra, the ultraviolet Raman spectrum has the following characteristics: it is substantially separated from the fluorescence spectrum; because the ozone layer blocks solar ultraviolet radiation, the ultraviolet Raman spectrum suffers little interference from ambient light, which makes it suitable for field remote measurement and a wider range of application scenarios; and because the Raman scattering intensity is inversely proportional to the fourth power of the excitation wavelength, ultraviolet excitation is more advantageous for detecting weak scattering signals under otherwise identical conditions and is better suited to detection in the field. The ultraviolet Raman spectrum is therefore suitable for detecting transgenic soybean oil. In addition, an ultraviolet Raman spectrum detector can measure remotely at a certain distance in a natural environment, so it can not only detect dangerous goods such as drugs and explosives effectively but also provide an efficient method for detecting transgenic ingredients, additives or expired foods on the market, and has broad application prospects.

In the prior art, after the ultraviolet Raman spectrum is detected, the complete ultraviolet Raman spectrum can be used for detecting the transgenic soybean oil, but the data volume of the complete ultraviolet Raman spectrum is large, the detection time is long, and the detection efficiency is low.

Disclosure of Invention

Aiming at the technical problems of large data volume of the whole ultraviolet Raman spectrum, long detection time and low detection efficiency in the prior art, the invention provides the transgenic soybean oil identification method.

To achieve this purpose, the transgenic soybean oil identification method provided by the invention comprises the following steps: forming an ultraviolet Raman spectrum set X' from the ultraviolet Raman spectra of a plurality of soybean oil samples; determining the label information corresponding to each ultraviolet Raman spectrum in the set X', wherein the label information identifies the brand information and transgenic information of the soybean oil sample corresponding to that spectrum; training an influence model with the set X' and the label information corresponding to each ultraviolet Raman spectrum in X' to determine a load matrix; determining the influence intensity of each characteristic peak in the ultraviolet Raman spectrum from the load matrix, wherein the influence intensity of a characteristic peak is positively correlated with the accuracy of determining the label information using that peak; sorting the characteristic peaks of each ultraviolet Raman spectrum in descending order of influence intensity and determining the first S characteristic peaks as transgenic influence characteristic peaks; and extracting the transgenic influence characteristic peaks from the ultraviolet Raman spectrum of a soybean oil sample and determining the label information of the sample from them.

Further, training the influence model with the ultraviolet Raman spectrum set X' and the label information corresponding to each ultraviolet Raman spectrum in X' to determine a load matrix includes: dividing the set X' and the corresponding label information into a training set and a verification set according to a set ratio, training the influence model with the training set to determine the load matrix, and verifying the influence model with the verification set; wherein training the influence model with the training set to determine the load matrix comprises: determining a spectral matrix E₀ from the ultraviolet Raman spectrum set X' in the training set; determining a label matrix F₀ from the label information corresponding to each ultraviolet Raman spectrum of X' in the training set; and using the spectral matrix E₀ and the label matrix F₀ to determine the load matrix L₀.

Further, using the spectral matrix E₀ and the label matrix F₀ to determine the load matrix L₀ comprises:

t₁ = E₀w₁;

u₁ = F₀c₁;

wherein t₁ is a linear combination of the matrix elements of the spectral matrix E₀, u₁ is a linear combination of the matrix elements of the label matrix F₀, and w₁ and c₁ are weight coefficients;

maximizing the covariance of t₁ and u₁ to obtain a covariance matrix:

Cov(t₁, u₁) → max;

||w₁|| = 1, ||c₁|| = 1;

determining the eigenvectors [η₁, η₂, ..., η_m] and the eigenvalues [λ₁, λ₂, ..., λ_m] of the covariance matrix;

determining the load matrix L₀ from the eigenvectors and eigenvalues of the covariance matrix.

Further, determining the influence intensity of each characteristic peak in the ultraviolet Raman spectrum from the load matrix comprises: using the load matrix L₀ and the spectral matrix E₀ to determine a Raman shift-load coefficient curve; and determining the influence intensity of each characteristic peak in the ultraviolet Raman spectrum from the peak values and trough values of the Raman shift-load coefficient curve.

Further, the set ratio is (1-4):1.

further, the forming a set of uv-raman spectra X' from the uv-raman spectra of the plurality of soybean oil samples comprises: collecting a plurality of soybean oil samples, and carrying out ultraviolet Raman spectrum detection on the plurality of soybean oil samples to obtain an ultraviolet Raman spectrum set X; and sequentially carrying out polynomial fitting smoothing pretreatment, baseline correction and multivariate scattering correction on the ultraviolet Raman spectrum set X to obtain an ultraviolet Raman spectrum set X'.

Further, performing polynomial fitting smoothing pretreatment on the ultraviolet Raman spectrum set X by adopting a Savitzky-Golay convolution smoothing algorithm.

Further, the fitting order of the Savitzky-Golay convolution smoothing algorithm is 3 and the window width is 7.

Further, baseline correction is carried out on the ultraviolet Raman spectrum set X by adopting an iterative self-adaptive weighted penalty least square method.

Further, the Raman shifts of the transgenic influence characteristic peaks are 1100 cm⁻¹, 1400 cm⁻¹, 1515 cm⁻¹, 1600 cm⁻¹, 1656 cm⁻¹, 2871 cm⁻¹, 2933 cm⁻¹ and 2971 cm⁻¹.

Through the technical scheme provided by the invention, the invention at least has the following technical effects:

The method for identifying transgenic soybean oil detects a plurality of soybean oil samples, forms an ultraviolet Raman spectrum set X' from their ultraviolet Raman spectra, and determines the label information corresponding to each ultraviolet Raman spectrum in X', the label information identifying the brand information and transgenic information of the corresponding soybean oil sample. An influence model is trained with the set X' and the corresponding label information to determine a load matrix, the influence intensity of each characteristic peak in the ultraviolet Raman spectrum is determined from the load matrix, and the S characteristic peaks with the highest influence intensity are determined as transgenic influence characteristic peaks. To identify an unknown soybean oil sample, its complete ultraviolet Raman spectrum is acquired, the transgenic influence characteristic peaks are extracted from the complete spectrum, and the label information of the sample is determined from those peaks, which gives the transgenic information corresponding to the label and thus identifies the sample. With this method, the transgenic influence characteristic peaks can be extracted from the complete ultraviolet Raman spectrum and used to identify transgenic soybean oil, which reduces the volume of detection data and improves detection efficiency.

Additional features and advantages of the invention will be set forth in the detailed description which follows.

Drawings

The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the embodiments of the invention without limiting the embodiments of the invention. In the drawings:

FIG. 1 is a flow chart of a method for identifying transgenic soybean oil provided by an embodiment of the present invention;

FIG. 2 is a schematic diagram of an ultraviolet Raman spectrum set X' in the method for identifying transgenic soybean oil provided by the embodiment of the invention;

FIG. 3 is a schematic diagram of a Raman shift-load coefficient curve in a transgenic soybean oil identification method according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of the prediction of an unknown sample in the method for identifying transgenic soybean oil according to the embodiment of the present invention;

FIG. 5 is a schematic diagram of unknown sample clustering in the transgenic soybean oil identification method provided by the embodiment of the invention.

Detailed Description

The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating embodiments of the invention, are given by way of illustration and explanation only, not limitation.

It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.

In the present invention, unless specified to the contrary, use of the terms of orientation such as "upper, lower, top, bottom" or the like are generally described with respect to the orientation shown in the drawings or the positional relationship of the components with respect to each other in the vertical, or gravitational direction.

The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.

Referring to FIG. 1, an embodiment of the present invention provides a method for identifying transgenic soybean oil, including the following steps: S101: forming an ultraviolet Raman spectrum set X' according to the ultraviolet Raman spectra of the plurality of soybean oil samples;

Specifically, in the embodiment of the present invention, a plurality of soybean oil samples are collected and a self-developed ultraviolet Raman spectroscopy system is used to measure them, giving an ultraviolet Raman spectrum set X. An Ocean Optics QE Pro spectrometer is used for spectrum collection; the laser wavelength is 266 nm, the power is 30 mW, the pulse width is 5 ns, the resolution is 0.14-7.7 nm (FWHM), and each spectrum is the result of 10 scans. The set X is then preprocessed by performing, in order, polynomial fitting smoothing, baseline correction and multivariate scattering correction to obtain the ultraviolet Raman spectrum set X'. Correcting the set X through this preprocessing reduces the influence of the detection environment and detection equipment on the ultraviolet Raman spectrum, so that a more accurate spectrum is obtained.
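Because the spectrometer resolution above is quoted in nanometres while the characteristic peaks are reported as Raman shifts in cm⁻¹, it can help to convert between the two. The following minimal sketch uses the standard relation shift = 10⁷/λ_excitation − 10⁷/λ_scattered (wavelengths in nm) together with the 266 nm excitation stated above; the function names are illustrative and are not part of the described system.

```python
EXCITATION_NM = 266.0  # laser wavelength stated in the embodiment


def raman_shift_cm1(scattered_nm, excitation_nm=EXCITATION_NM):
    """Raman shift (cm^-1) of light scattered at `scattered_nm` (nm)."""
    return 1e7 / excitation_nm - 1e7 / scattered_nm


def scattered_wavelength_nm(shift_cm1, excitation_nm=EXCITATION_NM):
    """Stokes wavelength (nm) that corresponds to a Raman shift in cm^-1."""
    return 1e7 / (1e7 / excitation_nm - shift_cm1)


# Print the wavelengths of a few shifts mentioned in the text.
for shift in (1100, 1656, 2971):
    print(f"{shift} cm^-1 -> {scattered_wavelength_nm(shift):.2f} nm")
```

The conversion shows that the whole 1100-2971 cm⁻¹ region of interest sits within a few tens of nanometres of the excitation line, which is why the spectrometer resolution matters for separating neighbouring peaks.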

Specifically, in the embodiment of the present invention, the Savitzky-Golay convolution smoothing algorithm is a filtering method based on least squares fitting. The ultraviolet Raman spectrum set X is a set of discrete data points. The algorithm selects a fitting order P and performs a least squares polynomial fit on 2M+1 consecutive data points of X (i.e. a moving window of width 2M+1); the value of the fitted curve at the centre of the window is taken as the filtered value. The window is then moved and the process repeated until all data points in X have been processed; the processed data is denoted as the ultraviolet Raman spectrum set X₁.

The width of the moving window is 2M+1, the data points in the window are denoted S[n] with n = −M, ..., 0, ..., M, and the fitting polynomial in each window is:

$$p(n) = \sum_{k=0}^{P} a_k n^k \tag{1}$$

The squared error of the fit is:

$$E = \sum_{n=-M}^{M} \bigl(p(n) - S[n]\bigr)^2 = \sum_{n=-M}^{M} \Bigl(\sum_{k=0}^{P} a_k n^k - S[n]\Bigr)^2 \tag{2}$$

For the best curve fit the error E must be minimized, so the derivative of E with respect to each coefficient a_r (r = 0, 1, ..., P) is set to 0, i.e.

$$\frac{\partial E}{\partial a_r} = \sum_{n=-M}^{M} 2 n^r \Bigl(\sum_{k=0}^{P} a_k n^k - S[n]\Bigr) = 0 \tag{3}$$

that is,

$$\sum_{k=0}^{P} \Bigl(\sum_{n=-M}^{M} n^{k+r}\Bigr) a_k = \sum_{n=-M}^{M} n^r S[n] \tag{4}$$

Let

$$F_r = \sum_{n=-M}^{M} n^r S[n], \qquad G_{k+r} = \sum_{n=-M}^{M} n^{k+r} \tag{5}$$

Then equation (4) can be simplified as:

$$\sum_{k=0}^{P} G_{k+r}\, a_k = F_r, \qquad r = 0, 1, \dots, P \tag{6}$$

Given the width of the moving window, the polynomial order P and the data S[n] to be fitted, substituting into equation (5) gives F_r and G_{k+r}, and substituting these into equation (6) gives the polynomial coefficients a_k ([a₀, a₁, ..., a_P]), which determines the polynomial (1) within the window. As the window moves, the value of the fitting polynomial at the centre point of the window is taken as the filtered value, i.e. the output of the Savitzky-Golay convolution smoothing algorithm. Smoothing all data points in the ultraviolet Raman spectrum set X gives the ultraviolet Raman spectrum set X₁.

Specifically, the Savitzky-Golay convolution smoothing algorithm has two important parameters: the fitting order P and the moving window width 2M+1. In general, a small window width combined with a high fitting order leaves noise in the signal, while a large window width combined with a low fitting order distorts the signal. The Raman spectrum of soybean oil can be described as a combination of several Voigt line shapes, noise and a fluorescence background. Based on the spectral characteristics (including the overall line shape and the width and intensity of each characteristic peak), a fitting order of 3 and a window width of 7 were finally selected for the Savitzky-Golay convolution smoothing algorithm, so that low-intensity characteristic peaks are not over-smoothed while most of the noise is eliminated, giving the best smoothing effect.
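The smoothing step can be sketched in a few lines of NumPy. The example below is a minimal illustration rather than the implementation used in the embodiment: it builds the window sums G and F described above, solves for the polynomial coefficients in each window, and keeps the fitted value at the window centre. The edge padding and the synthetic test signal are assumptions made only for the example.

```python
import numpy as np


def savgol_smooth(signal, half_width=3, order=3):
    """Savitzky-Golay smoothing with window width 2*half_width+1 and polynomial order `order`."""
    s = np.asarray(signal, dtype=float)
    n = np.arange(-half_width, half_width + 1)               # offsets -M..M inside the window
    A = (n[:, None] ** np.arange(order + 1)).astype(float)   # A[i, k] = n_i ** k
    G = A.T @ A                                              # G[r, k] = sum over n of n**(k + r)
    padded = np.pad(s, half_width, mode="edge")              # edge handling is an assumption
    smoothed = np.empty_like(s)
    for i in range(s.size):
        window = padded[i:i + 2 * half_width + 1]            # S[n], centred on sample i
        F = A.T @ window                                     # F[r] = sum over n of n**r * S[n]
        a = np.linalg.solve(G, F)                            # coefficients a_0 .. a_P
        smoothed[i] = a[0]                                   # fitted polynomial evaluated at n = 0
    return smoothed


# Hypothetical noisy signal, used only to exercise the function.
rng = np.random.default_rng(0)
x_axis = np.linspace(0.0, 10.0, 200)
noisy = np.exp(-(x_axis - 5.0) ** 2) + 0.05 * rng.standard_normal(x_axis.size)
smoothed = savgol_smooth(noisy, half_width=3, order=3)       # window width 7, order 3
```

If SciPy is available, `scipy.signal.savgol_filter(noisy, window_length=7, polyorder=3)` should agree with this sketch away from the ends of the signal, where the two differ only in how the boundary is handled.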

Specifically, in the embodiment of the present invention, the adaptive iteratively reweighted penalized least squares method (airPLS) is used to perform baseline correction on the smoothed ultraviolet Raman spectrum set X₁. When an ideal ultraviolet Raman spectroscopy system detects no effective signal, the signal intensity acquired by the detector should be 0 (the ideal baseline). In actual use, however, factors such as electronic drift, dark current and readout noise generated by the system hardware, together with the surface characteristics of the sample, give the raw ultraviolet Raman spectrum output by the system a certain fluorescence background and noise, so baseline correction is required. The airPLS algorithm is an error-based iterative weighting strategy in which the weight at each point is updated according to the difference between the baseline fitted in the previous iteration and the original signal.

Let x be a spectrum vector taken from the Savitzky-Golay-smoothed data X₁, let z be the fitting vector, and let both have length m. The fidelity F of the vector z to the vector x can be expressed as the sum of squared errors between the two:

$$F = \sum_{i=1}^{m} (x_i - z_i)^2 \tag{7}$$

The roughness R of the fitting vector z is expressed as:

$$R = \sum_{i=2}^{m} (z_i - z_{i-1})^2 = \sum_{i=1}^{m-1} (\Delta z_i)^2 \tag{8}$$

To obtain an effectively smooth and undistorted output spectrum, the fidelity and smoothness of the data must be balanced. The penalized least squares function Q is the sum of the fidelity and the roughness weighted by the penalty coefficient λ:

$$Q = F + \lambda R = \|x - z\|^2 + \lambda \|Dz\|^2 \tag{9}$$

where Dz = Δz, D being the first-order difference matrix. Adjusting λ in equation (9) sets the balance between fidelity and smoothness: the larger λ is, the smoother the fitting vector z, but an excessively large value distorts it.

To find the z that minimizes the penalized least squares function Q, the partial derivative of Q with respect to the fitting vector z is taken and set to 0, which gives:

$$(I + \lambda D'D)\, z = x \tag{10}$$

After a weight vector w is introduced for the fidelity F, with the weights at the positions belonging to peak segments of x set to 0, the fidelity of z to x in equation (7) becomes:

$$F = \sum_{i=1}^{m} w_i (x_i - z_i)^2 = (x - z)'\, W (x - z), \qquad W = \mathrm{diag}(w) \tag{11}$$

Building on the above formulas, iteration is introduced so that the weight of each point is updated according to the difference between the baseline z fitted in the previous iteration and the original signal x. At iteration step t, equation (9) becomes:

$$Q^t = \sum_{i=1}^{m} w_i^t \bigl|x_i - z_i^t\bigr|^2 + \lambda \sum_{j=2}^{m} \bigl|z_j^t - z_{j-1}^t\bigr|^2 \tag{12}$$

The weight vector w is obtained adaptively by iteration. With the initial value w⁰ = 1 and iteration step t, the weights in each iteration are:

$$w_i^t = \begin{cases} 0, & x_i \ge z_i^{t-1} \\ e^{\,t\,(x_i - z_i^{t-1}) / |d^t|}, & x_i < z_i^{t-1} \end{cases} \tag{13}$$

where the vector d^t consists of the negative elements of the difference between x and z^{t−1} at iteration step t.

The iteration stops when the maximum number of iterations is reached or when the termination criterion is met, the termination criterion being:

$$|d^t| < 0.001 \times |x| \tag{14}$$

In airPLS the peak points are gradually eliminated so that only the baseline points keep non-zero weight in the weight vector w. The resulting fit is the baseline Z, and the baseline-corrected spectral data is X₂ = X₁ − Z.
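The iteration described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the embodiment's code: it uses dense NumPy matrices (adequate for short spectra), a first-order difference penalty, and illustrative defaults `lam=100.0` and `max_iter=15`; reference airPLS implementations also treat the two end points specially, which is omitted here.

```python
import numpy as np


def airpls_baseline(x, lam=100.0, max_iter=15):
    """Estimate a baseline z for spectrum x by iteratively reweighted penalized least squares."""
    x = np.asarray(x, dtype=float)
    m = x.size
    D = np.diff(np.eye(m), axis=0)            # first-order difference matrix, shape (m-1, m)
    penalty = lam * (D.T @ D)                 # lambda * D'D
    w = np.ones(m)                            # initial weights w^0 = 1
    z = x.copy()
    for t in range(1, max_iter + 1):
        # Weighted form of (I + lambda D'D) z = x: (W + lambda D'D) z = W x.
        z = np.linalg.solve(np.diag(w) + penalty, w * x)
        d = x - z
        below = d < 0                         # points where the signal lies under the fit
        d_norm = np.abs(d[below]).sum()       # |d^t|, summed over the negative elements only
        if d_norm < 0.001 * np.abs(x).sum():  # termination criterion
            break
        w = np.zeros(m)                       # points on or above the fit are treated as peaks
        w[below] = np.exp(t * d[below] / d_norm)
    return z


# Hypothetical spectrum: flat offset plus one Gaussian peak.
idx = np.arange(100)
spectrum = 2.0 + 5.0 * np.exp(-((idx - 50) / 3.0) ** 2)
baseline = airpls_baseline(spectrum)
corrected = spectrum - baseline               # analogue of X2 = X1 - Z
```

For real spectra with thousands of points, a sparse banded solver would be the natural replacement for the dense `np.linalg.solve` call, and `lam` would need tuning to the spectrum length and baseline curvature.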

Further, because the sample surface may be uneven or produce specular reflection, the actually acquired spectrum may contain a certain offset and noise, which reduces the spectral repeatability of the same sample. Multivariate scattering correction (MSC, also known as multiplicative scatter correction) corrects the baseline translation and offset of the spectral data by univariate linear regression, which improves spectral repeatability and enhances the Raman scattering information related to the composition and structure of the material.

The input spectral data is X₂. The collected samples comprise s types; the group of each type is denoted X_{2i} (i = 1, 2, ..., s), and the number of spectra collected for each type is N_i (i = 1, 2, ..., s). Multivariate scattering correction of one group of sample spectral data comprises the following steps:

First, the average of all spectral data in the group is computed, i.e.

$$\overline{X}_{2i} = \frac{1}{N_i} \sum_{j=1}^{N_i} X_{2i,j} \tag{15}$$

where X_{2i,j} denotes the j-th spectrum of the group.

Each spectrum is then regressed on the average spectrum by univariate linear regression, solving a least squares problem to obtain the baseline offset k_j and the baseline translation b_j of each spectrum, i.e.

$$X_{2i,j} = k_j\, \overline{X}_{2i} + b_j \tag{16}$$

Each spectrum is corrected by subtracting the baseline translation b_j obtained in equation (16) and dividing by the baseline offset k_j, which gives the corrected spectrum Data_{j(MSC)}, i.e.

$$\mathrm{Data}_{j(\mathrm{MSC})} = \frac{X_{2i,j} - b_j}{k_j} \tag{17}$$

After multivariate scattering correction has been performed on all types of samples, the preprocessed spectral data is obtained and denoted X', where X' = [X_{21(MSC)}, X_{22(MSC)}, ..., X_{2s(MSC)}].
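The correction of one sample group can be sketched with NumPy as below. It is a minimal illustration of the three steps above (group mean, per-spectrum linear regression against the mean, then subtract the intercept and divide by the slope); the function name and the synthetic spectra are assumptions made for the example.

```python
import numpy as np


def msc(spectra):
    """Multivariate scattering correction of one sample group (one spectrum per row)."""
    X = np.asarray(spectra, dtype=float)
    mean_spectrum = X.mean(axis=0)                    # average spectrum of the group
    corrected = np.empty_like(X)
    for j, spectrum in enumerate(X):
        # Least squares fit spectrum ~ k_j * mean_spectrum + b_j (slope first, then intercept).
        k_j, b_j = np.polyfit(mean_spectrum, spectrum, deg=1)
        corrected[j] = (spectrum - b_j) / k_j         # remove translation, then rescale
    return corrected, mean_spectrum


# Hypothetical group: the same underlying spectrum with different scale and offset.
base = np.sin(np.linspace(0.0, 3.0, 50)) + 2.0
group = np.vstack([1.2 * base + 0.3, 0.8 * base - 0.1, 1.0 * base - 0.2])
corrected_group, group_mean = msc(group)
```

In this synthetic case every spectrum is an exact linear function of the group mean, so all corrected spectra collapse onto the mean; with measured data the residual differences are what remains for the later modelling step.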

S102: determining label information corresponding to each ultraviolet Raman spectrum in the ultraviolet Raman spectrum set X', wherein the label information corresponding to the ultraviolet Raman spectrum is used for identifying brand information and transgenic information of the soybean oil sample corresponding to the ultraviolet Raman spectrum;

Specifically, in the embodiment of the present invention, label information is set for each ultraviolet Raman spectrum according to the brand information and transgenic information of the soybean oil sample corresponding to that spectrum in the ultraviolet Raman spectrum set X', for example represented by 1, 2, 3, ... or a, b, c, ....
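One simple way to assign such labels is to map each distinct (brand, transgenic) combination to an integer. The sketch below is illustrative only; the function name and the example sample list are hypothetical.

```python
def build_label_map(samples):
    """Assign labels 1, 2, 3, ... to distinct (brand, is_transgenic) pairs in order of appearance."""
    label_map = {}
    for pair in samples:
        if pair not in label_map:
            label_map[pair] = len(label_map) + 1
    return label_map


# Hypothetical per-spectrum metadata: (brand, is_transgenic).
metadata = [("A", True), ("A", False), ("B", False), ("A", True)]
label_map = build_label_map(metadata)
labels = [label_map[pair] for pair in metadata]   # one label per spectrum
```

Keeping the mapping in a dictionary makes it easy to translate a predicted label back into brand and transgenic information at identification time.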

S103: training the influence model by using label information corresponding to each ultraviolet Raman spectrum in the ultraviolet Raman spectrum set X 'and the ultraviolet Raman spectrum set X' to determine a load matrix;

Further, training the influence model with the ultraviolet Raman spectrum set X' and the label information corresponding to each ultraviolet Raman spectrum in X' to determine a load matrix includes: dividing the set X' and the corresponding label information into a training set and a verification set according to a set ratio, training the influence model with the training set to determine the load matrix, and verifying the influence model with the verification set; wherein training the influence model with the training set to determine the load matrix comprises: determining a spectral matrix E₀ from the ultraviolet Raman spectrum set X' in the training set; determining a label matrix F₀ from the label information corresponding to each ultraviolet Raman spectrum of X' in the training set; and using the spectral matrix E₀ and the label matrix F₀ to determine the load matrix L₀.

Specifically, in the embodiment of the present invention, the ultraviolet Raman spectrum set X' and the label information corresponding to each ultraviolet Raman spectrum in X' are divided into a training set and a verification set according to a set ratio; the influence model is trained with the training set to determine the load matrix and is verified with the verification set. Preferably, the set ratio is (1-4):1; further preferably, the ratio is 4:1.
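A 4:1 split of paired spectra and labels can be sketched with the standard library alone. The function name, the fixed seed and the use of random shuffling are assumptions for the example; the text specifies only the ratio.

```python
import random


def split_train_validation(spectra, labels, ratio=4, seed=0):
    """Shuffle paired spectra and labels, then split them ratio:1 into training and verification sets."""
    indices = list(range(len(spectra)))
    random.Random(seed).shuffle(indices)              # seeded for reproducibility
    n_train = len(indices) * ratio // (ratio + 1)     # 4:1 means 80 percent for training
    train = [(spectra[i], labels[i]) for i in indices[:n_train]]
    validation = [(spectra[i], labels[i]) for i in indices[n_train:]]
    return train, validation


# Hypothetical data: 100 placeholder "spectra" with matching labels.
train_set, validation_set = split_train_validation(list(range(100)), list(range(100)))
```

Shuffling indices rather than the data itself keeps each spectrum paired with its own label throughout the split.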

The influence model is trained with the ultraviolet Raman spectrum set X' and the label information corresponding to each ultraviolet Raman spectrum in X' to determine the load matrix as follows:

A spectral matrix E₀ is determined from the ultraviolet Raman spectrum set X' in the training set, and a label matrix F₀ is determined from the label information corresponding to each ultraviolet Raman spectrum of X' in the training set. The spectral matrix E₀ is the independent variable matrix and the label matrix F₀ is the dependent variable matrix. With n samples in the training set, E₀ containing m-dimensional variables and F₀ containing p-dimensional variables, the independent and dependent variable matrices are E₀ (n×m) and F₀ (n×p) respectively.

A component t₁ is extracted from the spectral matrix E₀ and a component u₁ from the label matrix F₀ as the first pair of principal components (also called score vectors). t₁ is a linear combination of the matrix elements of E₀ and u₁ is a linear combination of the matrix elements of F₀, with weight coefficients w₁ and c₁ respectively, i.e. t₁ = E₀w₁ and u₁ = F₀c₁. Two requirements are imposed: 1) t₁ and u₁ should each carry as much of the variation in their respective data matrices as possible, i.e. their variances are maximized; 2) t₁ should have the greatest explanatory power for u₁, i.e. the correlation between the two is maximized. In summary, the covariance of t₁ and u₁ must be maximized, i.e.

$$\mathrm{Cov}(t_1, u_1) \to \max \tag{18}$$

where w₁ and c₁ are both unit vectors, i.e.

$$\|w_1\| = 1, \quad \|c_1\| = 1 \tag{19}$$

Solving this constrained extremum problem by the Lagrangian method shows that w₁ is the eigenvector of the matrix $E_0' F_0 F_0' E_0$ corresponding to its largest eigenvalue, and c₁ is the eigenvector of the matrix $F_0' E_0 E_0' F_0$ corresponding to its largest eigenvalue.

Solving the covariance matrix of formula (18) gives the eigenvectors [η₁, η₂, ..., η_m] and the eigenvalues [λ₁, λ₂, ..., λ_m] of the covariance matrix.

The load matrix L₀ is determined from the eigenvectors and eigenvalues of the covariance matrix.

The score vectors are the weights of the principal components: each score vector is the projection of the independent variable matrix onto the direction of its corresponding load vector and reflects the degree of coverage of the independent variables in that direction.
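The extraction of the first pair of components can be sketched with a singular value decomposition, since the leading left and right singular vectors of E₀′F₀ are the dominant eigenvectors of E₀′F₀F₀′E₀ and F₀′E₀E₀′F₀ respectively. This is a minimal sketch under assumptions: column centring of both matrices is assumed (the text does not state it), the data are random placeholders, and the construction of the full load matrix L₀ is not attempted because its formula is not reproduced in the text.

```python
import numpy as np


def first_pls_component(E0, F0):
    """Return (w1, c1, t1, u1) maximizing Cov(t1, u1) subject to ||w1|| = ||c1|| = 1."""
    E = np.asarray(E0, dtype=float)
    F = np.asarray(F0, dtype=float)
    E = E - E.mean(axis=0)                 # column centring (assumed preprocessing)
    F = F - F.mean(axis=0)
    U, s, Vt = np.linalg.svd(E.T @ F, full_matrices=False)
    w1 = U[:, 0]                           # dominant eigenvector of E'F F'E
    c1 = Vt[0]                             # dominant eigenvector of F'E E'F
    t1 = E @ w1                            # score vector for the spectra
    u1 = F @ c1                            # score vector for the labels
    return w1, c1, t1, u1


# Hypothetical data: 30 samples, 6 spectral variables, 2 label columns.
rng = np.random.default_rng(1)
E_demo = rng.standard_normal((30, 6))
F_demo = rng.standard_normal((30, 2))
w1, c1, t1, u1 = first_pls_component(E_demo, F_demo)
```

Library implementations of partial least squares typically continue from here by deflating E₀ and F₀ and extracting further components; only the first pair is shown because that is the step the text derives.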

Further, ten-fold cross-validation is employed to prevent the influence model from overfitting. The samples of the training set are randomly divided into 10 groups; each group serves once as the verification subset while the remaining 9 groups serve as the training subset. An accuracy is obtained after each test. The process is repeated 10 times with different test data each time, so that every group of samples is verified exactly once. After the 10 runs, the average of the 10 accuracies is taken as the estimate of the precision of the algorithm.
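The fold bookkeeping can be sketched as follows. The model itself is left as a caller-supplied function; `train_and_score` is a hypothetical callback that trains on the first pair of arguments and returns the accuracy on the second pair, and the seeded shuffle is an assumption for reproducibility.

```python
import random


def k_fold_accuracy(samples, labels, train_and_score, k=10, seed=0):
    """Average accuracy over k folds; each fold is used exactly once for verification."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]          # k disjoint groups of indices
    accuracies = []
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        accuracy = train_and_score(
            [samples[j] for j in train_idx], [labels[j] for j in train_idx],
            [samples[j] for j in val_idx], [labels[j] for j in val_idx],
        )
        accuracies.append(accuracy)
    return sum(accuracies) / k


def majority_baseline(train_x, train_y, val_x, val_y):
    """Placeholder model: predict the most common training label for every verification sample."""
    most_common = max(set(train_y), key=train_y.count)
    return sum(1 for y in val_y if y == most_common) / len(val_y)


# Hypothetical data: 50 samples that all share one label.
estimate = k_fold_accuracy(list(range(50)), [1] * 50, majority_baseline)
```

Swapping `majority_baseline` for a function that fits the influence model on the training subset gives the accuracy estimate described above.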

S104: determining the influence intensity of each characteristic peak in the ultraviolet Raman spectrum by using the load matrix; wherein the influence strength of the characteristic peak is positively correlated with the accuracy of determining the label information by using the characteristic peak;

Specifically, in the embodiment of the present invention, the matrix elements of the load matrix L₀ correspond one-to-one with the matrix elements of the spectral matrix E₀ (which contain the Raman shifts and their spectral intensity information), i.e. with the Raman shifts, so a Raman shift-load coefficient curve can be obtained. From the horizontal and vertical axes of this curve, the most representative transgenic influence characteristic peaks (those that can distinguish transgenic from non-transgenic soybean oil) can be found. In the curve, the horizontal-axis position of a peak or trough is the Raman shift of a transgenic influence characteristic peak, and the absolute value of the peak or trough value is the influence intensity of that characteristic peak.

S105: after the characteristic peaks of each ultraviolet Raman spectrum are arranged from large to small according to the influence intensity, determining the first S characteristic peaks as transgene influence characteristic peaks;

specifically, in the embodiment of the present invention, after the characteristic peaks of each uv raman spectrum are arranged from large to small according to their influence intensities, several characteristic peaks with higher influence intensities are determined as transgene influence characteristic peaks.
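Steps S104 and S105 can be sketched as a search for local extrema of the Raman shift-load coefficient curve followed by a sort on their absolute values. The function name and the toy curve below are hypothetical; in practice the curve would come from the trained load matrix.

```python
def top_influence_peaks(shifts, loadings, s):
    """Return the S (Raman shift, influence intensity) pairs with the largest |loading| at local extrema."""
    extrema = []
    for i in range(1, len(loadings) - 1):
        previous, current, following = loadings[i - 1], loadings[i], loadings[i + 1]
        is_peak = current > previous and current > following
        is_trough = current < previous and current < following
        if is_peak or is_trough:
            extrema.append((shifts[i], abs(current)))      # influence intensity = |extremum value|
    extrema.sort(key=lambda item: item[1], reverse=True)   # descending influence intensity
    return extrema[:s]


# Hypothetical load coefficient curve sampled at seven Raman shifts (cm^-1).
demo_shifts = [1000, 1100, 1200, 1300, 1400, 1500, 1600]
demo_loadings = [0.0, 0.5, 0.1, -0.8, 0.0, 0.3, 0.0]
selected = top_influence_peaks(demo_shifts, demo_loadings, s=2)
```

Here the trough at 1300 cm⁻¹ ranks above the peak at 1100 cm⁻¹ because only the absolute value of the load coefficient is compared.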

In particular, in the embodiment of the invention, the detected ultraviolet Raman spectrum of an unknown sample may drift, so the characteristic peaks near Raman shifts of 1100 cm⁻¹, 1400 cm⁻¹, 1515 cm⁻¹, 1600 cm⁻¹, 1656 cm⁻¹, 2871 cm⁻¹, 2933 cm⁻¹ and 2971 cm⁻¹ in the ultraviolet Raman spectrum can be extracted for identification.
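Extracting peaks "near" the listed shifts can be sketched by taking the maximum intensity inside a small window around each target. The ±10 cm⁻¹ tolerance and the function name are assumptions for illustration; the text does not state how much drift is tolerated.

```python
TARGET_SHIFTS = (1100, 1400, 1515, 1600, 1656, 2871, 2933, 2971)  # cm^-1, as listed in the text


def extract_peak_features(shifts, intensities, targets=TARGET_SHIFTS, tolerance=10.0):
    """For each target shift, return the maximum intensity within +/- tolerance cm^-1."""
    features = []
    for target in targets:
        window = [y for x, y in zip(shifts, intensities) if abs(x - target) <= tolerance]
        features.append(max(window) if window else float("nan"))  # NaN if nothing was sampled there
    return features


# Hypothetical spectrum on a 5 cm^-1 grid with two drifted peaks.
grid = list(range(1000, 3001, 5))
signal = [0.0] * len(grid)
signal[grid.index(1105)] = 3.0   # peak nominally at 1100 cm^-1, drifted by +5
signal[grid.index(1655)] = 2.0   # peak nominally at 1656 cm^-1, drifted by -1
features = extract_peak_features(grid, signal)
```

The result is an eight-element feature vector per spectrum, which is the reduced data volume the method relies on in place of the complete spectrum.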

S106: extracting the transgenic influence characteristic peaks from the ultraviolet Raman spectrum of a soybean oil sample, and determining the label information of the soybean oil sample according to the transgenic influence characteristic peaks.

Specifically, in the embodiment of the invention, when an unknown soybean oil sample is identified, the complete ultraviolet Raman spectrum of the soybean oil sample is collected, the transgenic influence characteristic peaks are extracted from the complete ultraviolet Raman spectrum, and the label information of the soybean oil sample is determined by using the transgenic influence characteristic peaks.

By this method, the transgenic influence characteristic peaks can be extracted from the complete ultraviolet Raman spectrum and used to identify transgenic soybean oil, which reduces the volume of detection data and improves detection efficiency.

Example one

In this example, there were 5 samples in total, including brand A transgenic soybean oil, brand A non-transgenic soybean oil, brand B non-transgenic soybean oil, and brand C rice oil, with no apparent difference in appearance. The samples were kept at room temperature for the experiment; 2 ml of each sample was placed horizontally in a quartz (solar-blind ultraviolet transparent) cuvette measuring 12.5 × 40 mm with a capacity of 3.5 ml and tested in the sample collection area.

The ultraviolet Raman spectrum signals of the samples were collected with an Ocean Optics QE Pro spectrometer, which takes the average of 10 scans as one acquired sample spectrum. For the brand A and brand B soybean oil samples, 100 groups of spectra were collected at a time, 5 times in total; for the brand C sample, 20 groups were collected at a time, 5 times in total, for a total of 2100 groups of data. To make the data set more robust, acquisitions of one sample type were interleaved with acquisitions of other types, that is, samples of the same type were not acquired consecutively. The result is the ultraviolet Raman spectrum set X.

Polynomial fitting smoothing pretreatment, airPLS baseline correction and multivariate scattering correction were performed on the ultraviolet Raman spectrum set X to obtain the ultraviolet Raman spectrum set X'. See FIG. 2, which, for ease of viewing, shows six preprocessed ultraviolet Raman spectra of brand A.
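The smoothing and scattering-correction steps of the pretreatment can be sketched with NumPy alone. The sketch rests on several assumptions: the window width of 7 and a fitting order of 3 (read here as the polynomial degree) are taken from the claims; "multivariate scattering correction" is read as multiplicative scatter correction (MSC) against the mean spectrum; edges are handled by reflection; and the airPLS baseline step is left as a placeholder rather than implemented. All function names are hypothetical.

```python
import numpy as np

def savgol_smooth(spectra, window=7, order=3):
    """Savitzky-Golay smoothing of each row (one spectrum per row)."""
    spectra = np.asarray(spectra, dtype=float)
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # Least-squares polynomial fit over the window: row 0 of the pseudo-inverse
    # gives the weights that evaluate the fitted polynomial at the window centre.
    vander = np.vander(offsets, order + 1, increasing=True)
    weights = np.linalg.pinv(vander)[0]
    padded = np.pad(spectra, ((0, 0), (half, half)), mode="reflect")
    return np.array([np.convolve(row, weights[::-1], mode="valid") for row in padded])

def msc(spectra):
    """Multiplicative scatter correction: regress each spectrum on the mean spectrum."""
    spectra = np.asarray(spectra, dtype=float)
    reference = spectra.mean(axis=0)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        slope, intercept = np.polyfit(reference, row, 1)
        corrected[i] = (row - intercept) / slope
    return corrected

def preprocess(spectra):
    """X -> X': smoothing, then (placeholder) baseline correction, then MSC."""
    smoothed = savgol_smooth(spectra)
    # Baseline correction (airPLS in the text) would be applied here;
    # it is intentionally omitted from this sketch.
    return msc(smoothed)
```

A cubic signal passes through `savgol_smooth` unchanged away from the edges, which is a quick sanity check that the weights are those of a degree-3 fit.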

The label information corresponding to each ultraviolet Raman spectrum in the ultraviolet Raman spectrum set X' is then determined; see Table 1.

TABLE 1 sample tag information

The load matrix L₀ is determined by using the ultraviolet Raman spectrum set X' and the label information corresponding to each ultraviolet Raman spectrum in the set X'. The matrix elements of the load matrix L₀ correspond one-to-one with the matrix elements of the spectral matrix E₀ (which contain the Raman shifts and their spectral intensity information), that is, they correspond one-to-one with the Raman shifts, so that a Raman shift-load coefficient curve is obtained; see FIG. 3.
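The relations stated in the claims (t₁ = E₀w₁, u₁ = F₀c₁, maximal Cov(t₁, u₁), unit-norm w₁ and c₁) match the first component of a partial least squares model. A minimal sketch follows, under these assumptions: E₀ and F₀ are column-centred before use, w₁ and c₁ are taken as the leading singular vectors of E₀ᵀF₀ (the standard solution of that constrained maximisation), and the load coefficients are taken to be the usual PLS X-loadings p₁ = E₀ᵀt₁ / (t₁ᵀt₁). Whether the load matrix L₀ of the source is exactly this quantity is not confirmed by the text.

```python
import numpy as np

def first_pls_component(E0, F0):
    """First PLS component for spectra E0 (n x m) and labels F0 (n x q)."""
    E0 = np.asarray(E0, dtype=float)
    F0 = np.asarray(F0, dtype=float)
    E0 = E0 - E0.mean(axis=0)  # centre so that inner products act as covariances
    F0 = F0 - F0.mean(axis=0)
    # Leading singular vectors of E0^T F0 maximise w^T E0^T F0 c over unit w, c.
    U, _, Vt = np.linalg.svd(E0.T @ F0, full_matrices=False)
    w1, c1 = U[:, 0], Vt[0]
    t1, u1 = E0 @ w1, F0 @ c1          # score vectors
    p1 = E0.T @ t1 / (t1 @ t1)         # one load coefficient per Raman shift
    return w1, c1, t1, u1, p1
```

Plotting `p1` against the Raman shift axis of the spectra gives a curve of the kind described as the Raman shift-load coefficient curve.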

From the horizontal and vertical axes of the Raman shift-load coefficient curve, the several most representative transgenic influence characteristic peaks (those that can represent transgenic/non-transgenic soybean oil) can be found. In the curve, the horizontal axis gives the Raman shift of a transgenic influence characteristic peak, and the absolute values of the peak and trough values give its influence intensity; see Table 2.

TABLE 2 Raman shifts and influence intensities of the transgenic influence characteristic peaks

See Table 3 for the chemical bonds assigned to the transgenic influence characteristic peaks.

TABLE 3 Raman shifts and corresponding assigned chemical bonds

Raman shift/cm^-1	Assigned chemical bond
1100	Phosphate group O-P-O (protein)
1400	Methyl CH₃
1515	Cytosine
1600	Amide band
1656	C=C (lipids) and amide I band
2871～2971	CH₂ of lipids
2933	CH₂ asymmetric stretching

When an unknown soybean oil sample is identified, the complete ultraviolet Raman spectrum of the sample is collected, the transgenic influence characteristic peaks are extracted from it, and the label information of the sample is determined according to those peaks. Referring to FIG. 4 and FIG. 5, FIG. 4 is a schematic diagram of the predictions for unknown samples obtained with the method of this embodiment, and FIG. 5 shows the distribution of the unknown samples. As can be seen from FIG. 4 and FIG. 5 together, the brand C rice oil differs greatly from the other samples, and only one of its sample labels is predicted incorrectly. Because the ultraviolet Raman spectra of the different soybean oils are similar, their data distributions overlap, which introduces some prediction error but does not affect the prediction of most of the data. The calculated identification accuracy for unknown samples reaches 70.95%.
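An identification accuracy of this kind is the fraction of samples whose predicted label equals the true label. The short sketch below shows that calculation only; the function name and the label strings in the usage line are hypothetical and are not data from the experiment.

```python
def identification_accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label matches the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

# Hypothetical labels, for illustration only.
print(identification_accuracy(["A-GM", "A-nonGM", "C"], ["A-GM", "A-GM", "C"]))
```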

The preferred embodiments of the present invention have been described in detail with reference to the accompanying drawings. However, the present invention is not limited to the specific details of the above embodiments; various simple modifications can be made to the technical solution of the present invention within its technical idea, and these simple modifications fall within the protective scope of the present invention.

It should be noted that the technical features described in the above embodiments can be combined in any suitable manner provided there is no contradiction; to avoid unnecessary repetition, the possible combinations are not described separately.

In addition, the various embodiments of the present invention may also be combined in any manner, and such combinations should likewise be regarded as part of the disclosure of the present invention as long as they do not depart from its spirit.

Claims

1. A transgenic soybean oil identification method is characterized by comprising the following steps:

forming an ultraviolet Raman spectrum set X' according to the ultraviolet Raman spectra of the plurality of soybean oil samples;

determining label information corresponding to each ultraviolet Raman spectrum in the ultraviolet Raman spectrum set X', wherein the label information corresponding to the ultraviolet Raman spectrum is used for identifying brand information and transgenic information of the soybean oil sample corresponding to the ultraviolet Raman spectrum;

training an influence model by using the ultraviolet Raman spectrum set X' and the label information corresponding to each ultraviolet Raman spectrum in the set X' to determine a load matrix;

determining the influence intensity of each characteristic peak in the ultraviolet Raman spectrum by using the load matrix; wherein the influence strength of the characteristic peak is positively correlated with the accuracy of determining the label information by using the characteristic peak;

after the characteristic peaks of each ultraviolet Raman spectrum are arranged from large to small according to the influence intensity, determining the first S characteristic peaks as transgenic influence characteristic peaks;

and extracting a transgenic influence characteristic peak from the ultraviolet Raman spectrum of the soybean oil sample, and determining the label information of the soybean oil sample according to the transgenic influence characteristic peak.

2. The method for identifying transgenic soybean oil according to claim 1, wherein training the influence model by using the ultraviolet Raman spectrum set X' and the label information corresponding to each ultraviolet Raman spectrum in the set X' to determine the load matrix comprises:

dividing the ultraviolet Raman spectrum set X' and the label information corresponding to each ultraviolet Raman spectrum in the set X' into a training set and a verification set according to a set ratio, training the influence model by using the training set to determine the load matrix, and verifying the influence model by using the verification set;

wherein training the influence model with the training set to determine the load matrix comprises:

determining a spectral matrix E₀ according to the ultraviolet Raman spectrum set X' in the training set, and determining a label matrix F₀ according to the label information corresponding to each ultraviolet Raman spectrum in the ultraviolet Raman spectrum set X' in the training set;

determining a load matrix L₀ by using the spectral matrix E₀ and the label matrix F₀.

3. The method of claim 2, wherein determining the load matrix L₀ by using the spectral matrix E₀ and the label matrix F₀ comprises:

t₁ = E₀w₁;

u₁ = F₀c₁;

Cov(t₁, u₁) → max;

||w₁|| = 1, ||c₁|| = 1;

determining the eigenvectors [η₁, η₂, …, η_m] and the eigenvalues [λ₁, λ₂, …, λ_m] of the covariance matrix;

4. The method of claim 3, wherein determining the influence intensity of each characteristic peak in the ultraviolet Raman spectrum by using the load matrix comprises:

determining a Raman shift-load coefficient curve by using the load matrix L₀ and the spectral matrix E₀;

and determining the influence intensity of each characteristic peak in the ultraviolet Raman spectrum according to the peak values and trough values of the Raman shift-load coefficient curve.

5. The method of claim 2, wherein the set ratio is in the range of 1:1 to 4:1.

6. The method of claim 1, wherein forming the ultraviolet Raman spectrum set X' from the ultraviolet Raman spectra of the plurality of soybean oil samples comprises:

collecting a plurality of soybean oil samples, and carrying out ultraviolet Raman spectrum detection on the plurality of soybean oil samples to obtain an ultraviolet Raman spectrum set X;

and sequentially carrying out polynomial fitting smoothing pretreatment, baseline correction and multivariate scattering correction on the ultraviolet Raman spectrum set X to obtain an ultraviolet Raman spectrum set X'.

7. The method for identifying transgenic soybean oil according to claim 6, wherein the polynomial fitting smoothing pretreatment is performed on the ultraviolet Raman spectrum set X by using a Savitzky-Golay convolution smoothing algorithm.

8. The method of claim 7, wherein the Savitzky-Golay convolution smoothing algorithm uses a polynomial fitting order of 3 and a window width of 7.

9. The method for identifying transgenic soybean oil according to claim 6, wherein the ultraviolet Raman spectrum set X is subjected to baseline correction by using an adaptive iteratively reweighted penalized least squares (airPLS) method.

10. The method of claim 1, wherein the Raman shifts of the transgenic influence characteristic peaks are 1100 cm^-1, 1400 cm^-1, 1515 cm^-1, 1600 cm^-1, 1656 cm^-1, 2871 cm^-1, 2933 cm^-1 and 2971 cm^-1.