CN114518337A - Method for qualitatively identifying amino acid mixture based on terahertz spectrum - Google Patents
- Authority
- CN
- China
- Prior art keywords
- amino acid
- terahertz
- sample
- convolution
- acid mixture
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/17—Systems in which incident light is modified in accordance with the properties of the material investigated
- G01N21/25—Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
- G01N21/31—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
- G01N21/35—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
- G01N21/3581—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light using far infrared light; using Terahertz radiation
- G01N21/3586—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light using far infrared light; using Terahertz radiation by Terahertz time domain spectroscopy [THz-TDS]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N2201/00—Features of devices classified in G01N21/00
- G01N2201/12—Circuits of general importance; Signal processing
- G01N2201/129—Using chemometrical methods
- G01N2201/1296—Using chemometrical methods using neural networks
Abstract
The invention discloses a method for qualitatively identifying amino acid mixtures based on terahertz spectra. The method first applies component spatial pattern (CSP) analysis to obtain a chemical diagram giving the probability that a given compound appears at each sampling point, and then uses a convolutional neural network to classify the CSP result and determine whether specific components are present in the mixture. The method allows the components of an unknown mixture to be detected qualitatively in a single measurement and is intended for biomolecule detection in real-life settings.
Description
Technical field:
The invention belongs to the technical field of terahertz spectroscopy and imaging, and particularly relates to a method for qualitatively identifying an amino acid mixture based on terahertz spectra.
Background art:
Terahertz waves are electromagnetic waves with wavelengths between the infrared and microwave bands. Because they lie in the transition region from photonics to electronics, they have many unique properties, such as fingerprint spectra, low energy and special penetrability. Terahertz waves interact with polar materials to produce a unique absorption spectrum, the fingerprint spectrum. Polar molecules, such as water, and nonpolar molecules, such as carbon dioxide, differ markedly in their absorption of terahertz waves, so the terahertz absorption spectrum is also valuable for probing molecular characteristics. Time-domain terahertz spectral scanning uses a high-precision delay line to extend the sampling time of the femtosecond pulse to tens of picoseconds and reduces noise through hardware preprocessing. Fourier transformation of the time-domain signal yields the characteristic absorption spectrum of the measured substance, with a spectral width of more than 5 THz and a dynamic range of more than 70 dB. This performance meets the detection requirements of most compounds and opens up many application scenarios for terahertz time-domain spectral scanning.
Given the fingerprint character of terahertz absorption spectra, they are important for substance identification and classification. Many methods are available for identifying the spectra of mixtures qualitatively or quantitatively, such as partial least squares (PLS) and support vector regression (SVR). However, these methods focus on one-dimensional terahertz spectra, tend to suffer from the broadening and overlap of vibrational peaks, and are feasible only under specific low-humidity conditions, which makes them cumbersome in practical environments. The invention therefore provides a method for identifying the components of a mixture from an absorption spectrum library, based on component spatial pattern analysis and a convolutional neural network.
Summary of the invention:
To solve the above problems, the invention aims to provide a method for identifying the components of a mixture from an absorption spectrum library. The method first applies component spatial pattern (CSP) analysis to obtain a chemical diagram giving the probability that a given compound appears at each sampling point, and then uses a convolutional neural network to classify the CSP result and determine whether a specific component is present in the mixture. The method allows the components of an unknown mixture to be detected qualitatively in a single measurement and is intended for biomolecule detection in real-life settings.
To achieve the above object, the invention relates to a method for qualitatively identifying an amino acid mixture based on terahertz spectra, comprising the following steps:
Step 1: scanning the sample point by point with a transmission-type terahertz time-domain system to obtain the terahertz time-domain spectral data for each sampling point of the amino acid mixture sample;
Step 2: converting the time-domain spectral data into frequency-domain spectral data by Fourier transform, and calculating the absorption with formulas (1) to (2) to obtain the absorption spectrum at each sampling point of the amino acid mixture sample under test;
where n(ω) is the refractive index, ω is the angular frequency, φ(ω) is the phase change induced by propagation through the sample, c is the speed of light in vacuum, d is the sample thickness, α(ω) is the absorption, and ρ(ω) is the amplitude ratio of the sample signal to the reference signal;
Step 3: constructing the terahertz spectral data matrix $F_{N\times L}$ of the amino acid mixture sample and calculating the matrix $P_{M\times L}$, i.e. the chemical diagram of the amino acid mixture sample, with formula (4);
$$[P_{M\times L}] = \left([S_{N\times M}]^{T}[S_{N\times M}]\right)^{-1}[S_{N\times M}]^{T}[F_{N\times L}] \qquad (4)$$
where L indexes the pixels, i.e. the sampling points, of the amino acid mixture sample (the two-dimensional coordinates of the sampling points on the sample are converted into a one-dimensional coordinate beforehand), and N is the number of terahertz spectral frequency components recorded at each pixel; that is, $F_{N\times L}$ is the matrix formed by the N frequency components of the terahertz absorption spectra of the L sampling points in the amino acid mixture sample. $S_{N\times M}$ is the terahertz spectral matrix of the M known amino acids, the terahertz absorption spectrum of each amino acid having N frequency components; that is, $S_{N\times M}$ is the matrix formed by the N frequency components of the terahertz absorption spectra of the M known amino acids. $P_{M\times L}$ gives the probability that a given amino acid appears at each pixel and is also referred to as the chemical diagram;
Step 4: removing the background from each entry of $P_{M\times L}$ with the adaptive threshold given by formula (5): pixel values above the threshold are set to 1 and all others to 0, giving the processed chemical diagram;
$$C_1 = \min(P) + C_0\,[\max(P) - \min(P)] \qquad (5)$$
where $C_1$ is the threshold; max(P) is the largest element of the matrix $P_{M\times L}$ and min(P) is its smallest element; $C_0$ is the weight applied to the value range [max(P) - min(P)] and is set to 0.6;
Step 5: constructing a convolutional neural network based on LeNet-5 and training it;
Step 6: inputting the chemical diagram of the amino acid mixture sample under test, obtained according to steps 1 to 4, into the trained convolutional neural network, which outputs the types of amino acid contained in the sample.
The convolutional neural network comprises a first convolution-pooling layer, a second convolution-pooling layer, a first fully-connected layer, a second fully-connected layer and a planar layer connected in sequence. The first convolution-pooling layer has 6 convolution kernels and the second has 16; the first fully-connected layer has size 120 and the second has size 84; the kernel size of the convolution filter is 5 and the stride is 1. The two fully-connected layers convert the output of the second convolution-pooling layer into the linear input of the planar layer, and a sigmoid activation function is used for binary classification. The network is trained with the Adam strategy, and a binary cross-entropy loss function measures the distance between the prediction and the data label. Training runs for at most 50 iteration periods (epochs) and is terminated early once the accuracy has stopped improving for 10 epochs.
Compared with the prior art, the invention has the following beneficial effects: different amino acid components can be identified in a mixture at room temperature and in humid air, with an accuracy of up to 100%. The method overcomes the influence of adverse conditions such as high humidity and uneven particle size distribution, and has the potential to detect biomolecules qualitatively in real-life settings such as customs checkpoints and airports.
Description of the drawings:
FIG. 1 is a diagram of the experimental setup of the transmission-type terahertz time-domain system used in the invention.
FIG. 2a is the terahertz time-domain spectrum at a sampling point in an amino acid mixture sample.
FIG. 2b is the terahertz frequency-domain spectrum at a sampling point in an amino acid mixture sample.
FIG. 2c is the absorption spectrum of L-leucine.
FIG. 2d is the absorption spectrum of L-tyrosine.
FIG. 2e is the absorption spectrum of a 1:1 mixture of L-leucine and L-tyrosine.
FIG. 3a is the chemical diagram of pure L-leucine.
FIG. 3b is the chemical diagram of pure L-tyrosine.
FIG. 3c is the chemical diagram of a 1:1:1 mixture of L-leucine, L-tyrosine and PEEK. Red dots represent L-tyrosine, green dots represent L-leucine, and yellow dots represent the coexistence of the two.
FIG. 4a is a photograph of the pressed tablets of six 1:1 binary amino acid mixtures, in order from top left to bottom right: L-leucine and L-valine, L-leucine and DL-tyrosine, L-valine and DL-tyrosine, L-leucine and L-tyrosine, DL-tyrosine and L-tyrosine, and L-tyrosine and L-valine.
FIG. 4b is the chemical diagram of L-leucine identified by component spatial pattern analysis (CSP analysis) of the 6 tablets in FIG. 4a. The regions of the tablets containing L-leucine are clearly lighter than the other regions.
FIG. 4c is the chemical diagram of L-tyrosine identified by component spatial pattern analysis (CSP analysis) of the 6 tablets in FIG. 4a. The regions of the tablets containing L-tyrosine are clearly lighter than the other regions.
FIG. 4d shows the absorption spectra of the four amino acids DL-tyrosine, L-leucine, L-valine and L-tyrosine.
Fig. 5a is a frame diagram of a convolutional neural network according to the present invention.
Fig. 5b is a loss function trained using a convolutional neural network.
FIG. 5c is the accuracy of L-leucine testing using a convolutional neural network.
FIG. 5d shows the accuracy of L-leucine testing using a convolutional neural network.
Fig. 5e is a plot of recall as a function of training.
FIG. 6a is a graph of a class confusion matrix for L-leucine recognition results.
FIG. 6b is a class confusion matrix chart of the L-tyrosine recognition results.
Detailed description:
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the invention.
Example 1:
As shown in FIG. 1, the transmission-type terahertz time-domain system used in this embodiment comprises a femtosecond laser 1, an optical delay line 2, a transmitter photoconductive antenna (PCA) 3, a sample stage 4 and a receiver PCA 5.
The data acquisition process of the transmission-type terahertz time-domain system specifically comprises the following steps:
The femtosecond laser 1 emits a laser beam with a center wavelength of 1560 nm, a pulse width of about 100 fs, a repetition rate of about 100 MHz and a power of about 80 mW. After passing through the linear polarizer 6 and the half-wave plate 7, the beam is split equally into a pump beam and a probe beam, used for terahertz generation and detection respectively. The probe beam passes through the optical delay line 2 (sampling range 120 ps), so that the laser pulse is modulated by the amplitude of the received terahertz pulse and then converted into an analog electrical signal for processing by a computer system. The terahertz beam from the transmitter PCA 3 is collimated and focused by two TPX lenses, then collimated and focused again by two further TPX lenses onto the receiver PCA 5. The lateral resolution of the transmission-type terahertz time-domain system is 30 μm. The sample is placed on the sample stage 4 and held by a clamp; translation of the sample stage 4 is controlled by the computer system. The terahertz wave passes through the sample, and the time-domain spectral data of the sample are measured and collected. The scanning step of the system is 0.5 mm along both the horizontal and vertical axes. The experiment was carried out at a temperature of 25 °C and a humidity of about 40%.
The method for qualitatively identifying the amino acid mixture based on the terahertz spectrum comprises the following steps:
Step 1: any two of the four amino acids DL-tyrosine, L-leucine, L-valine and L-tyrosine are mixed with polyether ether ketone (PEEK) in different proportions and pressed into tablets, giving six amino acid mixture samples, as shown in FIG. 4a. PEEK is a specialty engineering plastic that is transparent in the terahertz band.
Step 2: the amino acid mixture samples are scanned point by point with the transmission-type terahertz time-domain system to obtain the time-domain spectral data for each sampling point (FIG. 2a). The time-domain data are converted into frequency-domain spectral data by Fourier transform (FIG. 2b), and the absorption is calculated with formulas (1) to (2) to obtain the absorption spectrum, i.e. the one-dimensional absorption spectrum, at each sampling point (FIGS. 2c, 2d and 2e).
Here n(ω) is the refractive index, ω is the angular frequency, φ(ω) is the phase change caused by propagation through the amino acid mixture sample, c is the speed of light in vacuum, d is the thickness of the amino acid mixture sample, α(ω) is the absorption, and ρ(ω) is the amplitude ratio of the amino acid mixture sample signal to the reference signal.
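A minimal sketch of step 2 follows. It assumes the commonly used thin-sample expressions n(ω) = 1 + cφ(ω)/(ωd) and α(ω) = (2/d) ln{4n(ω) / [ρ(ω)(n(ω)+1)²]}, which are consistent with the symbols defined above but may differ from formulas (1) and (2) of the invention. The function name `absorption_spectrum`, the band limit `f_max` and the synthetic pulse are illustrative assumptions.

```python
import numpy as np

C_LIGHT = 299_792_458.0  # speed of light in vacuum, m/s


def absorption_spectrum(e_sam, e_ref, dt, d, f_max):
    """Return (freqs, n, alpha) for frequency bins 0 < f <= f_max.

    e_sam, e_ref: time-domain sample and reference traces (same length)
    dt: sampling interval in seconds; d: sample thickness in metres.
    Assumes the phase at the first kept bin lies within (-pi, pi].
    """
    freqs = np.fft.rfftfreq(len(e_ref), dt)
    keep = (freqs > 0) & (freqs <= f_max)
    # Complex transfer function: sample spectrum over reference spectrum.
    h = np.fft.rfft(e_sam)[keep] / np.fft.rfft(e_ref)[keep]
    omega = 2.0 * np.pi * freqs[keep]
    rho = np.abs(h)                 # amplitude ratio rho(omega)
    phi = -np.unwrap(np.angle(h))   # phase delay phi(omega), positive for a delay
    n = 1.0 + C_LIGHT * phi / (omega * d)
    alpha = (2.0 / d) * np.log(4.0 * n / (rho * (n + 1.0) ** 2))
    return freqs[keep], n, alpha


# Synthetic check: the "sample" trace is the reference pulse delayed by
# 20 time steps and attenuated to half its amplitude.
dt = 0.05e-12                                   # 0.05 ps sampling interval
t = np.arange(1024) * dt
e_ref = np.exp(-0.5 * ((t - 10e-12) / 0.2e-12) ** 2)
shift = 20
e_sam = 0.5 * np.roll(e_ref, shift)
d = 1e-3                                        # 1 mm thick sample
freqs, n, alpha = absorption_spectrum(e_sam, e_ref, dt, d, f_max=2e12)
n_true = 1.0 + C_LIGHT * shift * dt / d         # index implied by the delay
```

Restricting the calculation to a band where the reference spectrum has appreciable amplitude keeps the division and the phase unwrapping well conditioned; real measurements would also need noise handling that this sketch omits.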
Step 3: the terahertz spectral data matrix $F_{N\times L}$ of the amino acid mixture sample is constructed, and the matrix $P_{M\times L}$, i.e. the chemical diagram of the amino acid mixture sample, is calculated with formula (4).
$$[P_{M\times L}] = \left([S_{N\times M}]^{T}[S_{N\times M}]\right)^{-1}[S_{N\times M}]^{T}[F_{N\times L}] \qquad (4)$$
Here L indexes the pixels, i.e. the sampling points, of the amino acid mixture sample (the two-dimensional coordinates of the sampling points on the sample are converted into a one-dimensional coordinate beforehand), and N is the number of terahertz spectral frequency components recorded at each pixel; that is, $F_{N\times L}$ is the matrix formed by the N frequency components of the terahertz absorption spectra of the L sampling points in the amino acid mixture sample. $S_{N\times M}$ is the terahertz spectral matrix of the M known amino acids, the terahertz absorption spectrum of each amino acid having N frequency components; that is, $S_{N\times M}$ is the matrix formed by the N frequency components of the terahertz absorption spectra of the M known amino acids. $P_{M\times L}$ gives the probability that a given amino acid appears at each pixel and is also referred to as the chemical diagram. In this example, $S_{N\times M}$ is the matrix formed by the N frequency components of the terahertz absorption spectra of the four amino acids used, and $F_{N\times L}$ is the matrix formed by the N frequency components of the terahertz absorption spectra of the L sampling points (all sampling points) in any one sample. If row i corresponds to L-tyrosine, $P_{ij}$ is the probability that L-tyrosine appears at pixel j.
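Formula (4) is the ordinary least-squares solution of F ≈ S·P. The sketch below illustrates it on synthetic data; the function name `chemical_diagram`, the matrix sizes and the random spectra are illustrative assumptions, and `numpy.linalg.lstsq` is used in place of the explicit inverse because it gives the same result when S has full column rank while being numerically more stable.

```python
import numpy as np


def chemical_diagram(S, F):
    """Solve formula (4), P = (S^T S)^{-1} S^T F, by least squares.

    S: (N, M) library of M known absorption spectra with N frequency bins.
    F: (N, L) measured absorption spectra at L sampling points.
    Returns P with shape (M, L).
    """
    P, *_ = np.linalg.lstsq(S, F, rcond=None)
    return P


# Synthetic example: 4 library spectra, 50 frequency bins, 12 pixels.
rng = np.random.default_rng(0)
N, M, L = 50, 4, 12
S = rng.random((N, M))
P_true = rng.random((M, L))
F = S @ P_true                      # noise-free mixture spectra

P = chemical_diagram(S, F)
P_explicit = np.linalg.inv(S.T @ S) @ S.T @ F   # literal form of formula (4)
```

Each row of `P` can be reshaped back to the two-dimensional scan grid to display the chemical diagram of one amino acid.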
Step 4: the background is removed from each entry of $P_{M\times L}$ with the adaptive threshold given by formula (5): pixel values above the threshold are set to 1 and all others to 0, giving the processed chemical diagram.
$$C_1 = \min(P) + C_0\,[\max(P) - \min(P)] \qquad (5)$$
Here $C_1$ is the threshold; max(P) is the largest element of the matrix $P_{M\times L}$ and min(P) is its smallest element; $C_0$ is the weight applied to the value range [max(P) - min(P)] and is set to 0.6 here.
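The thresholding of formula (5) can be sketched as follows. The function name `binarize` and the small example matrix are illustrative assumptions; the threshold is computed over the whole matrix P, as the definitions of max(P) and min(P) suggest.

```python
import numpy as np


def binarize(P, c0=0.6):
    """Apply formula (5): C1 = min(P) + C0 * (max(P) - min(P)).

    Entries strictly above C1 become 1, all others 0.
    Returns the binary chemical diagram and the threshold C1.
    """
    c1 = P.min() + c0 * (P.max() - P.min())
    return (P > c1).astype(np.uint8), c1


# Two amino acids (rows) at four pixels (columns); values are made up.
P = np.array([[0.0, 1.0, 0.5, 0.7],
              [0.2, 0.9, 0.1, 0.3]])
mask, c1 = binarize(P)
```

With min(P) = 0 and max(P) = 1 the threshold is 0.6, so only the entries 1.0, 0.7 and 0.9 survive.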
Step 5: a convolutional neural network is constructed based on LeNet-5. The 50 chemical maps obtained from 25 scans of each of two sets of six tabletted amino acid mixture samples are divided into 300 images (2 × 6 × 25 = 300). 60% of the images are used as training data and fed to the convolutional neural network; the network converges after four iteration periods (as shown in FIG. 5b), which completes the training.
As shown in FIG. 5a, the convolutional neural network comprises a first convolution-pooling layer (Conv1), a second convolution-pooling layer (Conv2), a first fully-connected layer (FC1), a second fully-connected layer (FC2) and a planar layer (FC3) connected in sequence. The first convolution-pooling layer (Conv1) has 6 convolution kernels, the second convolution-pooling layer (Conv2) has 16 convolution kernels, the size of the first fully-connected layer (FC1) is 120, and the size of the second fully-connected layer (FC2) is 84. The kernel size of the convolution filter is 5 and the stride is 1. The two fully-connected layers convert the output of the second convolution-pooling layer (Conv2) into the linear input of the planar layer, and a sigmoid activation function is used for binary classification. The network is trained with the Adam strategy, and a binary cross-entropy loss function measures the distance between the prediction and the data label. Training runs for at most 50 iteration periods (epochs) and is terminated early once the accuracy has stopped improving for 10 epochs.
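To make the layer sizes concrete, the sketch below traces the feature-map dimensions through the network. The kernel size 5, stride 1, kernel counts 6 and 16, and dense sizes 120 and 84 come from the description; the 32 × 32 input, the absence of padding, the 2 × 2 pooling window and the single sigmoid output unit are assumptions borrowed from the classic LeNet-5 layout and are not stated in the text.

```python
def conv_out(size, kernel=5, stride=1):
    """Output side length of an unpadded convolution."""
    return (size - kernel) // stride + 1


def pool_out(size, window=2):
    """Output side length of non-overlapping pooling (assumed 2 x 2)."""
    return size // window


def lenet5_shapes(input_size=32):
    """Trace (name, channels, height, width) through Conv1 and Conv2,
    then list the widths of the dense part of the network."""
    side = input_size
    shapes = []
    for name, channels in (("Conv1", 6), ("Conv2", 16)):
        side = pool_out(conv_out(side))
        shapes.append((name, channels, side, side))
    flat = 16 * side * side          # flattened Conv2 output
    dense = [flat, 120, 84, 1]       # FC1, FC2, then one sigmoid unit (assumed)
    return shapes, dense


shapes, dense = lenet5_shapes()
```

Under these assumptions the second convolution-pooling layer delivers 16 maps of 5 × 5, i.e. 400 values, to the first fully-connected layer; a different input size only changes that first dense width.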
Step 6: the remaining 40% of the images are input as test data into the trained convolutional neural network, which outputs the amino acid type for the sampling points in each sample, as shown in FIGS. 4b and 4c. The recognition accuracy for the mixture components reaches 100%, as shown in FIGS. 5c and 5d. FIG. 6a is the class confusion matrix of the L-leucine recognition results: 31 is the number of true positives, 0 the number of false negatives and 27 the number of true negatives, so the 31 samples containing L-leucine and the 27 samples without L-leucine are all correctly identified. Similarly, in FIG. 6b, 32 is the number of true positives, 0 the number of false negatives and 30 the number of true negatives, so the 32 samples containing L-tyrosine and the 30 samples without L-tyrosine are all correctly identified.
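The confusion-matrix counts can be checked with a few lines of arithmetic. In the sketch below, the true positive, false negative and true negative counts are those quoted for FIGS. 6a and 6b; the false positive count is not stated in the text and is assumed to be 0, as the reported 100% accuracy implies. The function name `metrics` is illustrative.

```python
def metrics(tp, fn, tn, fp=0):
    """Accuracy, precision and recall from confusion-matrix counts.

    fp defaults to 0, an assumption: the text does not quote it.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "total": total,
    }


leucine = metrics(tp=31, fn=0, tn=27)    # counts quoted for FIG. 6a
tyrosine = metrics(tp=32, fn=0, tn=30)   # counts quoted for FIG. 6b

# 40% of the 2 x 6 x 25 = 300 images are held out for testing.
test_images = (2 * 6 * 25) * 40 // 100
```

The two matrices cover 58 and 62 images, 120 in total, which matches the size of the 40% test split if the two evaluations use disjoint images.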
Claims (2)
1. A method for qualitatively identifying an amino acid mixture based on a terahertz spectrum is characterized by comprising the following steps:
Step 1: scanning the sample point by point with a transmission-type terahertz time-domain system to obtain the terahertz time-domain spectral data for each sampling point of the amino acid mixture sample;
Step 2: converting the time-domain spectral data into frequency-domain spectral data by Fourier transform, and calculating the absorption with formulas (1) to (2) to obtain the absorption spectrum at each sampling point of the amino acid mixture sample under test;
where n(ω) is the refractive index, ω is the angular frequency, φ(ω) is the phase change induced by propagation through the sample, c is the speed of light in vacuum, d is the sample thickness, α(ω) is the absorption, and ρ(ω) is the amplitude ratio of the sample signal to the reference signal;
Step 3: constructing the terahertz spectral data matrix $F_{N\times L}$ of the amino acid mixture sample and calculating the matrix $P_{M\times L}$, i.e. the chemical diagram of the amino acid mixture sample, with formula (4);
$$[P_{M\times L}] = \left([S_{N\times M}]^{T}[S_{N\times M}]\right)^{-1}[S_{N\times M}]^{T}[F_{N\times L}] \qquad (4)$$
where L indexes the pixels, i.e. the sampling points, of the amino acid mixture sample (the two-dimensional coordinates of the sampling points on the sample are converted into a one-dimensional coordinate beforehand), and N is the number of terahertz spectral frequency components recorded at each pixel; that is, $F_{N\times L}$ is the matrix formed by the N frequency components of the terahertz absorption spectra of the L sampling points in the amino acid mixture sample; $S_{N\times M}$ is the terahertz spectral matrix of the M known amino acids, the terahertz absorption spectrum of each amino acid having N frequency components; that is, $S_{N\times M}$ is the matrix formed by the N frequency components of the terahertz absorption spectra of the M known amino acids; $P_{M\times L}$ gives the probability that a given amino acid appears at each pixel and is also referred to as the chemical diagram;
Step 4: removing the background from each entry of $P_{M\times L}$ with the adaptive threshold given by formula (5), pixel values above the threshold being set to 1 and all others to 0, to obtain the processed chemical diagram;
$$C_1 = \min(P) + C_0\,[\max(P) - \min(P)] \qquad (5)$$
where $C_1$ is the threshold; max(P) is the largest element of the matrix $P_{M\times L}$ and min(P) is its smallest element; $C_0$ is the weight applied to the value range [max(P) - min(P)] and is set to 0.6;
Step 5: constructing a convolutional neural network based on LeNet-5 and training it;
Step 6: inputting the chemical diagram of the amino acid mixture sample under test, obtained according to steps 1 to 4, into the trained convolutional neural network, which outputs the types of amino acid contained at the sampling points of the sample.
2. The method for qualitatively identifying an amino acid mixture based on a terahertz spectrum as claimed in claim 1, wherein the convolutional neural network comprises a first convolution-pooling layer, a second convolution-pooling layer, a first fully-connected layer, a second fully-connected layer and a planar layer connected in sequence; the first convolution-pooling layer has 6 convolution kernels and the second convolution-pooling layer has 16 convolution kernels; the first fully-connected layer has a size of 120 and the second fully-connected layer has a size of 84; the kernel size of the convolution filter is 5 and the stride is 1; the two fully-connected layers convert the output of the second convolution-pooling layer into the linear input of the planar layer, and a sigmoid activation function is used for binary classification; the network is trained with the Adam strategy, and a binary cross-entropy loss function measures the distance between the prediction and the data label; training runs for at most 50 iteration periods (epochs) and is terminated early once the accuracy has stopped improving for 10 epochs.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210106259.9A CN114518337A (en) | 2022-01-28 | 2022-01-28 | Method for qualitatively identifying amino acid mixture based on terahertz spectrum |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114518337A (en) | 2022-05-20 |
Family
ID=81596915
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210106259.9A Pending CN114518337A (en) | 2022-01-28 | 2022-01-28 | Method for qualitatively identifying amino acid mixture based on terahertz spectrum |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114518337A (en) |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Mittleman et al. | Gas sensing using terahertz time-domain spectroscopy. | |
CN101105446B (en) | Differential optical absorption spectroscopy air quality detection system | |
JPH10153547A (en) | Analysis processing method for medium | |
Jiang et al. | Machine learning and application in terahertz technology: A review on achievements and future challenges | |
Bird et al. | High definition infrared chemical imaging of colorectal tissue using a Spero QCL microscope | |
Reddy et al. | Accurate histopathology from low signal-to-noise ratio spectroscopic imaging data | |
US20090210447A1 (en) | Processing and analyzing hyper-spectral image data and information via dynamic database updating | |
CN112862077B (en) | System and method for replacing traditional spectrometer by multimode optical fiber and deep learning network | |
CN110542668A (en) | method for quantitatively analyzing component distribution condition of blade based on terahertz imaging technology | |
CN105486655A (en) | Rapid detection method for organic matters in soil based on infrared spectroscopic intelligent identification model | |
CN110632002A (en) | Aperture coding spectrum detection device based on compressed sensing | |
CN116399850B (en) | Spectrum detection and identification system for optical signal processing and detection method thereof | |
CN211927689U (en) | Spectrum detection device | |
Zhou et al. | Research on hyperspectral regression method of soluble solids in green plum based on one-dimensional deep convolution network | |
WO2023231903A1 (en) | Spectrometer suitable for detecting trace elements in agricultural product, and application thereof | |
CN114518337A (en) | Method for qualitatively identifying amino acid mixture based on terahertz spectrum | |
CN116883720A (en) | Fruit and vegetable pesticide residue detection method and system based on spatial spectrum attention network | |
AU2021102705A4 (en) | THz-TDS image defocus processing method based on deep learning | |
CN113092402B (en) | Non-contact substance terahertz characteristic spectrum detection and identification system and method | |
Brigada et al. | Chemical identification with information-weighted terahertz spectrometry | |
CN211148422U (en) | Aperture coding spectrum detection device based on compressed sensing | |
CN210376134U (en) | Terahertz-based indoor environmental pollutant detection device | |
CN110443301B (en) | Liquid dangerous goods identification method based on double-layer feature classification | |
CN114112985A (en) | Near-infrared spectrometer and near-infrared online detection method | |
Yeo | Terahertz spectroscopic characterization and imaging for biomedical applications |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||